Using Optimization to find Synthetic Equity Universes that minimize
Survivorship and Selection Biases
Tobias Setz
DSF-R Conference, Vienna, 2019-09-20
Disclaimer
This document is copyrighted, and its content may not be reproduced without the permission of the
authors.
This material has been prepared solely for informational purposes and it is not intended to be and
should not be considered as an offer, or a solicitation of an offer, or an invitation or a personal
recommendation to buy or sell any stocks and bonds, or any other fund, security, or financial instrument,
or to participate in any investment strategy, directly or indirectly. It is intended for use in research only by
those recipients to whom it was made available by the authors of the document.
All information for an investment strategy prior to its launch date is back-tested, based on the
methodology that was in effect on the launch date. Back-tested performance, which is hypothetical and
not actual performance, is subject to inherent limitations because it reflects application of a methodology
and selection of constituents in hindsight. No theoretical approach can take into account all of the factors
in the markets in general and the impact of decisions that might have been made during the actual
operation of an investment strategy. Actual returns may differ from, and be lower than, back-tested
returns.
© OpenMetrics Solutions LLC, Zurich (2019) 2
“It is better to be roughly right than precisely wrong.”
John Maynard Keynes, (1883-1946)
3
Introduction
There were two main factors leading to this presentation:
• The desire / customer request to apply our risk management environment to equities, which so far had been applied exclusively to broad market indices, commodities and currencies.
• The confrontation with various backtests that claim to generate huge premiums (10% and more above the benchmark index) through equity selection.
© OpenMetrics Solutions LLC, Zurich (2019) 4
© OpenMetrics Solutions LLC, Zurich (2019) 5
MSCI World Index
Consists of around 1600 equities.
Colored lines show all equities existing since 1998-12-31. These are 1109 companies.
Black line shows the Equal Weights Portfolio (discrete) of these 1109 equities.
Equity Universe
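A minimal sketch (not from the slides) of how such an equal-weight portfolio can be computed, using one common convention (rebalancing back to equal weights every period). The objects prices (a matrix of daily closes with the 1109 surviving companies as columns) and returnsSurvivors are hypothetical names introduced here for illustration.
# prices: hypothetical matrix of daily closes, columns = the 1109 survivors
returnsSurvivors <- apply(prices, 2, function(p) diff(p) / head(p, -1))  # discrete returns
# equal weights, rebalanced back to 1/N every period
wealthEqualWeights <- cumprod(1 + rowMeans(returnsSurvivors))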
© OpenMetrics Solutions LLC, Zurich (2019) 6
Not considering companies that fall out of the index (e.g. through bankruptcy) leads to a huge survivorship bias.
Backtests based on this universe would be massively flawed.
In practice this might be the driving force behind sensational promises based on backtests.
Idea:
Use an optimizer on these 1109 equities to find a subset that tracks the index.
Problems:
• Very time-consuming
• AMPL restricted to 300 variables
Equity Universe
© OpenMetrics Solutions LLC, Zurich (2019) 7
Procedure (see the sketch below):
• Sort the 1109 equities based on their performance.
• Create a sequence from 1 to 1109 with length 300, i.e. take roughly only every 4th equity.
• In total the diluted universe consists of 300 equities.
Diluted Equity Universe
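A minimal sketch of this dilution step, assuming returnsSurvivors holds the returns of the 1109 surviving equities in its columns (a hypothetical object, as in the sketch above). The result returnsUniverse is the 300-equity matrix used in the optimization code later in the deck.
# total performance of each equity over the sample
performance <- apply(1 + returnsSurvivors, 2, prod)
# sort the equities by performance and take a length-300 sequence over the ranks,
# i.e. roughly every 4th equity
rankOrder <- order(performance)
pick <- round(seq(1, ncol(returnsSurvivors), length.out = 300))
returnsUniverse <- returnsSurvivors[, rankOrder[pick]]   # the diluted 300-equity universe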
© OpenMetrics Solutions LLC, Zurich (2019) 8
Very similar behavior to the full universe.
Still a huge survivorship bias.
“Party in backtest, hangover in production.” (Anonymous)
Diluted Equity Universe
The Representative Universe
• The goal is to find a sub-universe of the 300 equities that tracks the index as well as possible (the representative universe), thus trying to eliminate any biases.
• As an alternative one could of course also take into consideration, at any point in time, the companies that later went bankrupt or fell out of the index for other reasons (the complete universe).
• We are mainly interested in the representative universe because it allows testing trading or risk management algorithms in a fast and efficient way without being greatly biased.
• Should the tests be positive for the representative universe, it would of course make sense to run the final tests on the complete universe.
© OpenMetrics Solutions LLC, Zurich (2019) 9
The Complete Universe
• Exact
• Backtest implementation is quite complex (e.g. dynamic universe over time)
• Computationally expensive
• Data might be hard to get
The Representative Universe
• Approximative
• Backtest implementation is straightforward (e.g. constant universe over time)
• Computationally cheap
• Data is not an issue
Selection: Continuous Weights
• To gauge the potential of a sub-universe of the 300 equities that tracks the index (the representative universe), the optimization was first tested assuming that the resulting weights can be continuous (in the range between 0 and 1).
• The following problem was solved:
min var(r_b − R w)
s.t.
mean(R w) = c · mean(r_b)
sum(w) = 1
0 ≤ w_i ≤ 1
• where r_b are the benchmark returns, R is a matrix holding the returns of the 300 equities, w are the unknown weights and c is a constant to lower or increase the target return. For the moment c = 1 was chosen. Weights: continuous. Objective: quadratic. Constraints: linear.
• Weekly returns were used to calculate the weights; this keeps the calculations less time-consuming, and the results are qualitatively the same as with daily returns.
© OpenMetrics Solutions LLC, Zurich (2019) 10
Selection: Continuous Weights
require(Rsolnp)
nVar <- ncol(returnsUniverse)
pars <- rep(1/nVar, nVar)                 # start from equal weights
# objective: variance of the tracking difference to the benchmark
fun <- function(x) {var(returnsBenchmark - returnsUniverse %*% x)}
c <- 1
# equality constraints: target return and full investment
eqfun <- function(x) {c(mean(returnsUniverse %*% x), sum(x))}
eqB <- c(c*mean(returnsBenchmark), 1)
fit <- solnp(pars, fun,
             eqfun = eqfun, eqB = eqB,
             LB = rep(0, nVar), UB = rep(1, nVar))
# fit$pars holds the optimized continuous weights
© OpenMetrics Solutions LLC, Zurich (2019) 11
© OpenMetrics Solutions LLC, Zurich (2019) 12
Setup:
• Solver: solnp, c = 1
• Returns to calculate the weights: logarithmic (weekly)
• Returns to calculate the backtest: logarithmic (daily)
At first sight the survivorship bias is greatly reduced.
NOTE: Using logarithmic returns for the backtest would imply that there is a constant rebalancing scheme in place.
Selection: Continuous Weights
© OpenMetrics Solutions LLC, Zurich (2019) 13
Setup:
• Solver: solnp, c = 1
• Returns to calculate the weights: logarithmic (weekly)
• Returns to calculate the backtest: discrete (daily)
While the logarithmic backtest assumes a constant rebalancing scheme, the discrete backtest does not; in this case the rebalancing frequency is daily.
This immediately leads to a selection bias.
The assumed premium can almost certainly not be exploited in production.
Selection: Continuous Weights
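A minimal sketch of the two backtest conventions contrasted above, assuming w holds the optimized weights (e.g. fit$pars from the solnp call) and that returnsDailyLog / returnsDailyDiscrete are hypothetical matrices with the daily logarithmic / discrete returns of the 300 equities.
# logarithmic aggregation: implies a constant rebalancing scheme
wealthLog <- exp(cumsum(returnsDailyLog %*% w))
# discrete aggregation: here the portfolio is rebalanced back to w every day
wealthDiscrete <- cumprod(1 + returnsDailyDiscrete %*% w)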
© OpenMetrics Solutions LLC, Zurich (2019) 14
Setup:
• Solver: solnp, c = 1
• Returns to calculate the weights: discrete (weekly)
• Returns to calculate the backtest: discrete (daily)
This shows by how much the biases could potentially be reduced.
Selection: Continuous Weights
© OpenMetrics Solutions LLC, Zurich (2019) 15
NOTE:
• To further reduce the bias, the parameter c could be lowered.
• For the weights based on the discrete returns it would be better practice to use the geometric mean. However, the results are basically the same.
• We are not looking for a perfect track record (which is probably impossible for the discrete case), but for a basket of equities that does not promise too much more return and too much less risk than the index itself.
Selection: Continuous Weights
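As a small illustration of the geometric-mean remark above, for a hypothetical vector r of discrete returns:
arithmeticMean <- mean(r)
geometricMean  <- prod(1 + r)^(1/length(r)) - 1   # compounding-consistent per-period mean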
Selection: Discrete Weights
• The continuous-weights optimization does not yield a discrete subset of the 300 equities; it was only done to get a feeling for how much the biases could potentially be reduced.
• In order to get a discrete subset it is necessary to introduce binary variables b_i. It comes down to solving the following problem:
min var(r_b − R w)
s.t.
mean(R w) = c · mean(r_b)
w_i = b_i / (sum(b) + ε)
b_i ∈ {0, 1}
• where ε is chosen to be the machine precision. This makes sure that there are no divisions by zero, while the impact stays negligible: as soon as sum(b) > 0, sum(b) + ε ≈ sum(b).
• Weights: f(binary). Objective: non-linear. Constraints: linear. If the normalization of the weights cannot be linearized, this problem can only be solved with Mixed-Integer Non-Linear optimizers.
© OpenMetrics Solutions LLC, Zurich (2019) 16
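A tiny illustration (not from the slides) of the ε trick used in the weight normalization above:
eps <- .Machine$double.eps
b <- c(1, 0, 1, 1)                   # example selection vector
b / (sum(b) + eps)                   # ~ (1/3, 0, 1/3, 1/3): eps has a negligible impact
rep(0, 4) / (sum(rep(0, 4)) + eps)   # all zeros instead of a division by zero (NaN)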
Selection: Discrete Weights
• In order to get a discrete subset it is necessary to constrain the weights to be binary. There are various
methods to achieve this:
© OpenMetrics Solutions LLC, Zurich (2019) 17
Simulated Annealing • Might have to be implemented from scratch
• Results might vary greatly based on initial parameters
• Could be further investigated
Genetic Algorithms • Publicly available
• Did not perform well in tests
• Could be further investigated
Differential Evolution • Publicly available
• Performed well in tests
Branch and Bound • Publicly available
• Performed well in tests
Selection: Discrete Weights (Differential Evolution)
require(DEoptimR)
nVar <- ncol(returnsUniverse)
eps <- .Machine$double.eps
# map the continuous solver variables to normalized binary weights
xFun <- function(x) {x <- round(x); x/(sum(x)+eps)}
# objective: variance of the tracking difference to the benchmark
fn <- function(x) {x <- xFun(x); var(returnsBenchmark - returnsUniverse %*% x)}
c <- 1
# equality constraints (meq = 2): target return and full investment
constr <- function(x) {
  x <- xFun(x)
  c(mean(returnsUniverse %*% x) - c*mean(returnsBenchmark), sum(x) - 1)
}
fit <- JDEoptim(fn = fn, constr = constr, meq = 2,
                lower = rep(0, nVar), upper = rep(1, nVar))
© OpenMetrics Solutions LLC, Zurich (2019) 18
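Reading the selected subset off the result, as a usage sketch: fit is the object returned by JDEoptim above, and the return matrix is assumed to carry the equity names as column names.
b <- round(fit$par)                              # binary selection vector
selected <- colnames(returnsUniverse)[b == 1]    # names of the chosen equities
w <- b / (sum(b) + eps)                          # equal weights on the chosen equities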
© OpenMetrics Solutions LLC, Zurich (2019) 19
Setup:
• Solver: DEoptimR, c = 1
• Returns to calculate the weights: discrete (daily)
• Returns to calculate the backtest: discrete (daily)
• Number of assets chosen: 42
In this case daily returns are computationally less expensive than weekly returns.
Selection: Discrete Weights (Differential Evolution)
Selection: Discrete Weights (Branch and Bound)
• For the differential evolution solver the continuous variables were transformed into discrete weights (by rounding). While this works well for differential evolution, it does not work well for most other solvers.
• It is certainly better practice to use a solver that is designed to work with binary variables.
• Such solvers are accessible through the AMPL interface. However, the free version of AMPL is limited to 300 variables (equities), and the limit is even lower for most commercial solvers. An alternative is to access them through NEOS, but the computation time there is limited to 8 hours. Hence the dilution, in order to make a local calculation possible.
• The AMPL framework works as follows:
© OpenMetrics Solutions LLC, Zurich (2019) 20
Data File: Holds all the data necessary for the computations.
Model File: Defines the optimization problem.
Run File: Defines the execution: solver and parameters.
• Once these files are generated, the problem can be solved with any suitable solver; changing the solver involves changing just one word within the run file (see the sketch below). NOTE: A similar framework, implemented completely in R, is ROI.
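For illustration, a generic run file could look as follows (a hypothetical sketch with made-up file names; the actual file is written by the amplRunFile() helper shown on a later slide). Changing the solver means changing the single word after "option solver".
run <- c(
  "model model.mod;",
  "data model.dat;",
  "option solver bonmin;",
  "option bonmin_options 'algorithm=B-BB tolerance=1e-08';",
  "solve;",
  "display x;"
)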
Selection: Discrete Weights (Branch and Bound)
model <- c(
  "param nVar;",
  "param nSmp;",
  "param eps;",
  "param returnsUniverse {1..nSmp,1..nVar};",
  "param returnsBenchmark {1..nSmp};",
  "param mu {1..nVar};",
  "param targetReturn;",
  "var xb {j in 1..nVar} binary;",
  "var xS = sum{j in 1..nVar} xb[j];",
  "var x {j in 1..nVar} = xb[j]/(xS+eps);",
  "var portfolioReturns {i in 1..nSmp}
     = sum{j in 1..nVar} returnsUniverse[i,j]*x[j];",
  "var diffReturns {i in 1..nSmp}
     = returnsBenchmark[i]-portfolioReturns[i];",
  "var mean = (sum {i in 1..nSmp} diffReturns[i])/nSmp;",
  "var variance = (sum{i in 1..nSmp} (diffReturns[i]-mean)^2)/nSmp;",
  "var portfolioReturn = sum{j in 1..nVar} mu[j]*x[j];",
  "minimize trackError: variance;",
  "subject to targetLB: portfolioReturn >= targetReturn*0.90;",
  "subject to targetUB: portfolioReturn <= targetReturn*1.00;"
)
© OpenMetrics Solutions LLC, Zurich (2019) 21
Selection: Discrete Weights (Branch and Bound)
# model: see previous slide
amplModelFile(model)                        # write the AMPL model file
c <- 1
nVar <- ncol(returnsUniverse)
nSmp <- nrow(returnsUniverse)
mu <- colMeans(returnsUniverse)
targetReturn <- c*mean(returnsBenchmark)
eps <- .Machine$double.eps
data <- list("nVar"=nVar, "nSmp"=nSmp,
  "returnsUniverse"=returnsUniverse,
  "returnsBenchmark"=returnsBenchmark,
  "mu"=mu, "targetReturn"=targetReturn,
  "eps"=eps)
amplDataFile(data)                          # write the AMPL data file
interface <- c("local","neos")[1]           # solve locally or on NEOS
solver <- c("bonmin","cplex","couenne")[1]
options <- "option bonmin_options 'algorithm=B-BB tolerance=1e-08';"
amplRunFile(interface, solver, options)     # write the AMPL run file
amplOutRun(external=FALSE)                  # execute the run and collect the output
© OpenMetrics Solutions LLC, Zurich (2019) 22
© OpenMetrics Solutions LLC, Zurich (2019) 23
Setup:
• Solver: AMPL/Bonmin, c = 1
• Returns to calculate the weights: discrete (weekly)
• Returns to calculate the backtest: discrete (daily)
• Number of assets chosen: 37
33 of the chosen assets are identical with the selection of DEoptimR.
NOTE: Results are preliminary.
Selection: Discrete Weights (Branch and Bound)
Outlook
Note that this project is still work in progress. The current results still show a remaining bias, which could be further reduced by:
• Using the return factor c < 1.
• Switching back to daily returns for the calculation of the weights, although the impact is rather limited.
• Checking whether there is a way to linearize the problem.
• Exploring more solvers.
• Switching back to the full universe (no dilution). This needs an unrestricted solver environment and might be possible with ROI in the future.
© OpenMetrics Solutions LLC, Zurich (2019) 24
Conclusion
• Blindly using equity data can lead to quite dramatic survivorship or selection biases.
• This leads to greatly flawed backtests, which might be a severe problem within the financial industry. The premiums expected based on the backtests might completely collapse in production. NOTE: Biases are not the only reason for this (e.g. in-sample testing, over-fitting, timing and slippage, costs, …), but they might be quite dominant.
• The representative universe is an alternative to the complete-universe approach; its main benefit is that it is a very efficient way to get a first, largely unbiased impression of a trading or risk management strategy applied to equities.
© OpenMetrics Solutions LLC, Zurich (2019) 25
Contact
OpenMetrics Solutions LLC
Dufoursstrasse 47
CH-8008 Zurich
+41 44 552 4909
contact@openmetrics.ch
www.openmetrics.ch
OpenMetrics Solutions LLC is an approved ETH Zurich Spin-off
OpenMetrics© is a registered trademark at IPI (Swiss Federal Institute of Intellectual Property), Berne
© OpenMetrics Solutions LLC, Zurich (2019) 26
swiss made software is the nationally and
internationally recognized symbol for
Swiss quality in software development.