Using Optimization to find Synthetic Equity Universes that minimize
Survivorship and Selection Biases
Tobias Setz
DSF-R Conference, Vienna, 2019-09-20
Disclaimer
This document is copyrighted, and its content may not be reproduced without the permission of the
authors.
This material has been prepared solely for informational purposes and it is not intended to be and
should not be considered as an offer, or a solicitation of an offer, or an invitation or a personal
recommendation to buy or sell any stocks and bonds, or any other fund, security, or financial instrument,
or to participate in any investment strategy, directly or indirectly. It is intended for use in research only by
those recipients to whom it was made available by the authors of the document.
All information for an investment strategy prior to its launch date is back-tested, based on the
methodology that was in effect on the launch date. Back-tested performance, which is hypothetical and
not actual performance, is subject to inherent limitations because it reflects application of a methodology
and selection of constituents in hindsight. No theoretical approach can take into account all of the factors
in the markets in general and the impact of decisions that might have been made during the actual
operation of an investment strategy. Actual returns may differ from, and be lower than, back-tested
returns.
© OpenMetrics Solutions LLC, Zurich (2019) 2
“It is better to be roughly right than precisely wrong.”
John Maynard Keynes, (1883-1946)
3
Introduction
There were two main factors leading to this presentation:
• The desire / customer request to apply our risk management environment to equities, which so far had been applied exclusively to broad market indices, commodities and currencies.
• The confrontation with various backtests that claim to generate huge premiums (10% and more above the benchmark index) through equity selection.
© OpenMetrics Solutions LLC, Zurich (2019) 4
© OpenMetrics Solutions LLC, Zurich (2019) 5
MSCI World Index
Consists of around 1600 equities.
Colored lines show all equities existing since 1998-12-31. These are 1109 companies.
Black line shows the Equal Weights Portfolio (discrete) of these 1109 equities.
Equity Universe
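A minimal sketch (not from the slides) of how such an equal-weight portfolio can be computed, using one common convention (rebalancing back to equal weights every period). The objects prices (a matrix of daily closes with the 1109 surviving companies as columns) and returnsSurvivors are hypothetical names introduced here for illustration.
# prices: hypothetical matrix of daily closes, columns = the 1109 survivors
returnsSurvivors <- apply(prices, 2, function(p) diff(p) / head(p, -1))  # discrete returns
# equal weights, rebalanced back to 1/N every period
wealthEqualWeights <- cumprod(1 + rowMeans(returnsSurvivors))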
© OpenMetrics Solutions LLC, Zurich (2019) 6
Not considering companies that fall out of the index (e.g. through bankruptcy) leads to a huge survivorship bias.
Backtests based on this universe would be massively flawed.
In practice this might be the driving force behind sensational promises based on backtests.
Idea:
Use an optimizer on these 1109 equities to find a subset that tracks the index.
Problems:
• Very time-consuming
• AMPL restricted to 300 variables
Equity Universe
© OpenMetrics Solutions LLC, Zurich (2019) 7
Procedure (see the sketch below):
• Sort the 1109 equities based on their performance.
• Create a sequence from 1 to 1109 with length 300, i.e. take roughly only every 4th equity.
• In total the diluted universe consists of 300 equities.
Diluted Equity Universe
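A minimal sketch of this dilution step, assuming returnsSurvivors holds the returns of the 1109 surviving equities in its columns (a hypothetical object, as in the sketch above). The result returnsUniverse is the 300-equity matrix used in the optimization code later in the deck.
# total performance of each equity over the sample
performance <- apply(1 + returnsSurvivors, 2, prod)
# sort the equities by performance and take a length-300 sequence over the ranks,
# i.e. roughly every 4th equity
rankOrder <- order(performance)
pick <- round(seq(1, ncol(returnsSurvivors), length.out = 300))
returnsUniverse <- returnsSurvivors[, rankOrder[pick]]   # the diluted 300-equity universe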
© OpenMetrics Solutions LLC, Zurich (2019) 8
Very similar behavior to the full universe.
Still a huge survivorship bias.
“Party in backtest, hangover in production.” (Anonymous)
Diluted Equity Universe
The Representative Universe
• The goal is to find a sub-universe of the 300 equities that tracks the index as well as possible (the representative universe), thus trying to eliminate any biases.
• As an alternative one could of course also take into consideration, at any point in time, the companies that later went bankrupt or fell out of the index for other reasons (the complete universe).
• We are mainly interested in the representative universe because it allows testing trading or risk management algorithms in a fast and efficient way without being greatly biased.
• Should the tests be positive for the representative universe, it would of course make sense to run the final tests on the complete universe.
© OpenMetrics Solutions LLC, Zurich (2019) 9
The Complete Universe
• Exact
• Backtest implementation is quite complex (e.g. dynamic universe over time)
• Computationally expensive
• Data might be hard to get
The Representative Universe
• Approximative
• Backtest implementation is straightforward (e.g. constant universe over time)
• Computationally cheap
• Data is not an issue
Selection: Continuous Weights
• To gauge the potential of a sub-universe of the 300 equities that tracks the index (the representative universe), the optimization was first tested assuming that the resulting weights can be continuous (in the range between 0 and 1).
• The following problem was solved:
min var(r_b − R w)
s.t.
mean(R w) = c · mean(r_b)
sum(w) = 1
0 ≤ w_i ≤ 1
• where r_b are the benchmark returns, R is a matrix holding the returns of the 300 equities, w are the unknown weights and c is a constant to lower or increase the target return. For the moment c = 1 was chosen. Weights: continuous. Objective: quadratic. Constraints: linear.
• Weekly returns were used to calculate the weights; this keeps the calculations less time-consuming, and the results are qualitatively the same as with daily returns.
© OpenMetrics Solutions LLC, Zurich (2019) 10
Selection: Continuous Weights
require(Rsolnp)
nVar <- ncol(returnsUniverse)
pars <- rep(1/nVar, nVar)                 # start from equal weights
# objective: variance of the tracking difference to the benchmark
fun <- function(x) {var(returnsBenchmark - returnsUniverse %*% x)}
c <- 1
# equality constraints: target return and full investment
eqfun <- function(x) {c(mean(returnsUniverse %*% x), sum(x))}
eqB <- c(c*mean(returnsBenchmark), 1)
fit <- solnp(pars, fun,
             eqfun = eqfun, eqB = eqB,
             LB = rep(0, nVar), UB = rep(1, nVar))
# fit$pars holds the optimized continuous weights
© OpenMetrics Solutions LLC, Zurich (2019) 11
© OpenMetrics Solutions LLC, Zurich (2019) 12
Setup:
• Solver: solnp, c = 1
• Returns to calculate the weights: logarithmic (weekly)
• Returns to calculate the backtest: logarithmic (daily)
At first sight the survivorship bias is greatly reduced.
NOTE: Using logarithmic returns for the backtest would imply that there is a constant rebalancing scheme in place.
Selection: Continuous Weights
© OpenMetrics Solutions LLC, Zurich (2019) 13
Setup:
• Solver: solnp, c = 1
• Returns to calculate the weights: logarithmic (weekly)
• Returns to calculate the backtest: discrete (daily)
While the logarithmic backtest assumes a constant rebalancing scheme, the discrete backtest does not; in this case the rebalancing frequency is daily.
This immediately leads to a selection bias.
The assumed premium can almost certainly not be exploited in production.
Selection: Continuous Weights
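A minimal sketch of the two backtest conventions contrasted above, assuming w holds the optimized weights (e.g. fit$pars from the solnp call) and that returnsDailyLog / returnsDailyDiscrete are hypothetical matrices with the daily logarithmic / discrete returns of the 300 equities.
# logarithmic aggregation: implies a constant rebalancing scheme
wealthLog <- exp(cumsum(returnsDailyLog %*% w))
# discrete aggregation: here the portfolio is rebalanced back to w every day
wealthDiscrete <- cumprod(1 + returnsDailyDiscrete %*% w)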
© OpenMetrics Solutions LLC, Zurich (2019) 14
Setup:
• Solver: solnp, c = 1
• Returns to calculate the weights: discrete (weekly)
• Returns to calculate the backtest: discrete (daily)
This shows by how much the biases could potentially be reduced.
Selection: Continuous Weights
© OpenMetrics Solutions LLC, Zurich (2019) 15
NOTE:
• To further reduce the bias, the parameter c could be lowered.
• For the weights based on the discrete returns it would be better practice to use the geometric mean. However, the results are basically the same.
• We are not looking for a perfect track record (which is probably impossible for the discrete case), but for a basket of equities that does not promise too much more return and too much less risk than the index itself.
Selection: Continuous Weights
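As a small illustration of the geometric-mean remark above, for a hypothetical vector r of discrete returns:
arithmeticMean <- mean(r)
geometricMean  <- prod(1 + r)^(1/length(r)) - 1   # compounding-consistent per-period mean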
Selection: Discrete Weights
• The continuous-weights optimization does not yield a discrete subset of the 300 equities; it was only done to get a feeling for how much the biases could potentially be reduced.
• In order to get a discrete subset it is necessary to introduce binary variables b_i. It comes down to solving the following problem:
min var(r_b − R w)
s.t.
mean(R w) = c · mean(r_b)
w_i = b_i / (sum(b) + ε)
b_i ∈ {0, 1}
• where ε is chosen to be the machine precision. This makes sure that there are no divisions by zero, while the impact stays negligible: as soon as sum(b) > 0, sum(b) + ε ≈ sum(b).
• Weights: f(binary). Objective: non-linear. Constraints: linear. If the normalization of the weights cannot be linearized, this problem can only be solved with Mixed-Integer Non-Linear optimizers.
© OpenMetrics Solutions LLC, Zurich (2019) 16
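A tiny illustration (not from the slides) of the ε trick used in the weight normalization above:
eps <- .Machine$double.eps
b <- c(1, 0, 1, 1)                   # example selection vector
b / (sum(b) + eps)                   # ~ (1/3, 0, 1/3, 1/3): eps has a negligible impact
rep(0, 4) / (sum(rep(0, 4)) + eps)   # all zeros instead of a division by zero (NaN)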
Selection: Discrete Weights
• In order to get a discrete subset it is necessary to constrain the weights to be binary. There are various
methods to achieve this:
© OpenMetrics Solutions LLC, Zurich (2019) 17
Simulated Annealing • Might have to be implemented from scratch
• Results might vary greatly based on initial parameters
• Could be further investigated
Genetic Algorithms • Publicly available
• Did not perform well in tests
• Could be further investigated
Differential Evolution • Publicly available
• Performed well in tests
Branch and Bound • Publicly available
• Performed well in tests
Selection: Discrete Weights (Differential Evolution)
require(DEoptimR)
nVar <- ncol(returnsUniverse)
eps <- .Machine$double.eps
# map the continuous solver variables to normalized binary weights
xFun <- function(x) {x <- round(x); x/(sum(x)+eps)}
# objective: variance of the tracking difference to the benchmark
fn <- function(x) {x <- xFun(x); var(returnsBenchmark - returnsUniverse %*% x)}
c <- 1
# equality constraints (meq = 2): target return and full investment
constr <- function(x) {
  x <- xFun(x)
  c(mean(returnsUniverse %*% x) - c*mean(returnsBenchmark), sum(x) - 1)
}
fit <- JDEoptim(fn = fn, constr = constr, meq = 2,
                lower = rep(0, nVar), upper = rep(1, nVar))
© OpenMetrics Solutions LLC, Zurich (2019) 18
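Reading the selected subset off the result, as a usage sketch: fit is the object returned by JDEoptim above, and the return matrix is assumed to carry the equity names as column names.
b <- round(fit$par)                              # binary selection vector
selected <- colnames(returnsUniverse)[b == 1]    # names of the chosen equities
w <- b / (sum(b) + eps)                          # equal weights on the chosen equities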
© OpenMetrics Solutions LLC, Zurich (2019) 19
Setup:
• Solver: DEoptimR, c = 1
• Returns to calculate the weights: discrete (daily)
• Returns to calculate the backtest: discrete (daily)
• Number of assets chosen: 42
In this case daily returns are computationally less expensive than weekly returns.
Selection: Discrete Weights (Differential Evolution)
Selection: Discrete Weights (Branch and Bound)
• For the differential evolution solver the continuous variables were transformed into discrete weights (by rounding). While this works well for differential evolution, it does not work well for most other solvers.
• It is certainly better practice to use a solver that is designed to work with binary variables.
• Such solvers are accessible through the AMPL interface. However, the free version of AMPL is limited to 300 variables (equities), and the limit is even lower for most commercial solvers. An alternative is to access them through NEOS, but the computation time there is limited to 8 hours. Hence the dilution, in order to make a local calculation possible.
• The AMPL framework works as follows:
© OpenMetrics Solutions LLC, Zurich (2019) 20
Data File: Holds all the data necessary for the computations.
Model File: Defines the optimization problem.
Run File: Defines the execution: solver and parameters.
• Once these files are generated, the problem can be solved with any suitable solver; changing the solver involves changing just one word within the run file (see the sketch below). NOTE: A similar framework, implemented completely in R, is ROI.
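For illustration, a generic run file could look as follows (a hypothetical sketch with made-up file names; the actual file is written by the amplRunFile() helper shown on a later slide). Changing the solver means changing the single word after "option solver".
run <- c(
  "model model.mod;",
  "data model.dat;",
  "option solver bonmin;",
  "option bonmin_options 'algorithm=B-BB tolerance=1e-08';",
  "solve;",
  "display x;"
)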
Selection: Discrete Weights (Branch and Bound)
model <- c(
  "param nVar;",
  "param nSmp;",
  "param eps;",
  "param returnsUniverse {1..nSmp,1..nVar};",
  "param returnsBenchmark {1..nSmp};",
  "param mu {1..nVar};",
  "param targetReturn;",
  "var xb {j in 1..nVar} binary;",
  "var xS = sum{j in 1..nVar} xb[j];",
  "var x {j in 1..nVar} = xb[j]/(xS+eps);",
  "var portfolioReturns {i in 1..nSmp}
     = sum{j in 1..nVar} returnsUniverse[i,j]*x[j];",
  "var diffReturns {i in 1..nSmp}
     = returnsBenchmark[i]-portfolioReturns[i];",
  "var mean = (sum {i in 1..nSmp} diffReturns[i])/nSmp;",
  "var variance = (sum{i in 1..nSmp} (diffReturns[i]-mean)^2)/nSmp;",
  "var portfolioReturn = sum{j in 1..nVar} mu[j]*x[j];",
  "minimize trackError: variance;",
  "subject to targetLB: portfolioReturn >= targetReturn*0.90;",
  "subject to targetUB: portfolioReturn <= targetReturn*1.00;"
)
© OpenMetrics Solutions LLC, Zurich (2019) 21
Selection: Discrete Weights (Branch and Bound)
# model: see previous slide
amplModelFile(model)                        # write the AMPL model file
c <- 1
nVar <- ncol(returnsUniverse)
nSmp <- nrow(returnsUniverse)
mu <- colMeans(returnsUniverse)
targetReturn <- c*mean(returnsBenchmark)
eps <- .Machine$double.eps
data <- list("nVar"=nVar, "nSmp"=nSmp,
  "returnsUniverse"=returnsUniverse,
  "returnsBenchmark"=returnsBenchmark,
  "mu"=mu, "targetReturn"=targetReturn,
  "eps"=eps)
amplDataFile(data)                          # write the AMPL data file
interface <- c("local","neos")[1]           # solve locally or on NEOS
solver <- c("bonmin","cplex","couenne")[1]
options <- "option bonmin_options 'algorithm=B-BB tolerance=1e-08';"
amplRunFile(interface, solver, options)     # write the AMPL run file
amplOutRun(external=FALSE)                  # execute the run and collect the output
© OpenMetrics Solutions LLC, Zurich (2019) 22
© OpenMetrics Solutions LLC, Zurich (2019) 23
Setup:
• Solver: AMPL/Bonmin, c = 1
• Returns to calculate the weights: discrete (weekly)
• Returns to calculate the backtest: discrete (daily)
• Number of assets chosen: 37
33 of the chosen assets are identical with the selection of DEoptimR.
NOTE: Results are preliminary.
Selection: Discrete Weights (Branch and Bound)
Outlook
Note that this project is still work in progress. The current results still show a remaining bias, which could be further reduced by:
• Using the return factor c < 1.
• Switching back to daily returns for the calculation of the weights, although the impact is rather limited.
• Checking whether there is a way to linearize the problem.
• Exploring more solvers.
• Switching back to the full universe (no dilution). This needs an unrestricted solver environment and might be possible with ROI in the future.
© OpenMetrics Solutions LLC, Zurich (2019) 24
Conclusion
• Blindly using equity data can lead to quite dramatic survivorship or selection biases.
• This leads to greatly flawed backtests, which might be a severe problem within the financial industry. The premiums expected based on the backtests might completely collapse in production. NOTE: Biases are not the only reason for this (e.g. in-sample testing, over-fitting, timing and slippage, costs, …), but they might be quite dominant.
• The representative universe is an alternative to the complete-universe approach; its main benefit is that it is a very efficient way to get a first, largely unbiased impression of a trading or risk management strategy applied to equities.
© OpenMetrics Solutions LLC, Zurich (2019) 25
Contact
OpenMetrics Solutions LLC
Dufoursstrasse 47
CH-8008 Zurich
+41 44 552 4909
contact@openmetrics.ch
www.openmetrics.ch
OpenMetrics Solutions LLC is an approved ETH Zurich Spin-off
OpenMetrics© is a registered trademark at IPI (Swiss Federal Institute of Intellectual Property), Berne
© OpenMetrics Solutions LLC, Zurich (2019) 26
swiss made software is the nationally and
internationally recognized symbol for
Swiss quality in software development.