Statistical Clustering, Hierarchical PCA and Portfolio Management
Marco Avellaneda & Juan A. Serur
Courant Institute of Mathematical Sciences, New York University
9th Annual Big Data Finance Conference
Dec 2021
9th Annual Big Data Finance Conference Dec 2021
Introduction
Quantitative factor analysis has become increasingly important in portfolio management.
Some prominent works
Sharpe → CAPM
$\tilde{r}_i - r_F = \beta_{i,M}(\tilde{r}_M - r_F)$ (1)
Ross → APT
$\tilde{r}_i - r_F = \sum_{j=1}^{N} \beta_{i,j}(\tilde{F}_j - r_F)$ (2)
where $\tilde{F}_j$ is the return of the j-th factor
Fama-French (explicit factors: value, size, profitability, etc.)
Types of factor models
Models based on explicit factors such as momentum, value, size, quality, etc.
CAPM
Fama-French Three-Factor Model
Fama-French Five-Factor Model
Models based on mathematical factors, such as statistical features extracted from assets’ returns using
Principal Component Analysis
Maximum Likelihood
Among others...
Some advantages of mathematical factor models:
Make no assumptions about the drivers of price movements
Rely on market data alone, without additional information
Some challenges remain, such as choosing the number K of factors. Much of the work in this area is related to Random Matrix Theory.
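One common way to bring Random Matrix Theory into the choice of K, shown here as an illustration rather than as the authors' procedure, is to count the eigenvalues of the empirical correlation matrix that exceed the Marchenko-Pastur upper edge $\lambda_+ = (1 + \sqrt{N/T})^2$, the limiting largest eigenvalue of a pure-noise correlation matrix. The sketch assumes NumPy and a hypothetical three-factor synthetic data set.

```python
import numpy as np


def marchenko_pastur_k(returns: np.ndarray) -> int:
    """Number of correlation eigenvalues above the Marchenko-Pastur upper edge.

    returns: T x N array (rows are observations, columns are stocks).
    """
    T, N = returns.shape
    # Standardize each column so that X^T X / T is the empirical correlation matrix.
    X = (returns - returns.mean(axis=0)) / returns.std(axis=0)
    C = X.T @ X / T
    eigvals = np.linalg.eigvalsh(C)
    # Limiting largest eigenvalue of a pure-noise correlation matrix with ratio N/T.
    lambda_plus = (1.0 + np.sqrt(N / T)) ** 2
    return int(np.sum(eigvals > lambda_plus))


# Hypothetical synthetic data: 3 common factors plus idiosyncratic noise.
rng = np.random.default_rng(0)
T, N = 1000, 100
factors = rng.standard_normal((T, 3))
loadings = rng.standard_normal((N, 3))
returns = factors @ loadings.T + rng.standard_normal((T, N))
K = marchenko_pastur_k(returns)
```

In finite samples the largest noise eigenvalue fluctuates around $\lambda_+$, so this cutoff is a heuristic rather than a formal test.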
Principal Component Analysis (PCA)
In a universe consisting of N stocks and T observations, we consider the N × N empirical
correlation matrix,
$C = \frac{1}{T} R^{\top} R$ (3)
where R is the T × N matrix of standardized returns.
PCA computes the eigenvalues and eigenvectors of the correlation matrix, ranked in decreasing order of eigenvalue. Accordingly, the first eigenvector solves the variational problem
$V^{(1)} = \arg\max \{ V^{\top} C V : \|V\|_2 = 1 \}$ (4)
where $\|\cdot\|_2$ denotes the Euclidean norm.
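A minimal NumPy sketch of equations (3) and (4), assuming returns arrive as a T × N array. The synthetic one-factor data and the sign convention (each eigenvector's entries sum to a non-negative number) are illustrative choices, since eigenvectors are only defined up to sign.

```python
import numpy as np


def pca_correlation(returns: np.ndarray):
    """Empirical correlation matrix and its eigenpairs sorted by decreasing eigenvalue."""
    T, N = returns.shape
    R = (returns - returns.mean(axis=0)) / returns.std(axis=0)  # standardized returns
    C = R.T @ R / T                                              # eq. (3), N x N
    eigvals, eigvecs = np.linalg.eigh(C)                         # ascending order
    order = np.argsort(eigvals)[::-1]                            # decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Eigenvectors are defined up to sign; make each column sum to a non-negative number.
    signs = np.where(eigvecs.sum(axis=0) < 0, -1.0, 1.0)
    return C, eigvals, eigvecs * signs


rng = np.random.default_rng(1)
returns = rng.standard_normal((500, 20)) + rng.standard_normal((500, 1))  # one common factor
C, eigvals, V = pca_correlation(returns)
v1 = V[:, 0]
# Any other unit vector gives a Rayleigh quotient u^T C u no larger than eq. (4)'s maximum.
u = rng.standard_normal(20)
u /= np.linalg.norm(u)
```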
First Eigenportfolio ≈ Market Portfolio
The first eigenvector is key to describing the statistics of the system
It can be interpreted both statistically (as the direction of maximum variance) and financially
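Define the principal eigenportfolio as the portfolio with weights
$\theta_i = c \, \frac{V_i^{(1)}}{\sigma_i}, \quad \text{where } c = \frac{1}{\sum_{j=1}^{N} V_j^{(1)} / \sigma_j}$ (5)
and $\sigma_i$ is the volatility of stock i.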
If F is the return of the principal eigenportfolio, then the first-order optimality condition for a portfolio that maximizes the Sharpe ratio over all competing portfolios investing in the same N stocks can be represented as
$r - E(r) = \beta_r (F - E(F)) + \epsilon_r$ (6)
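Equation (5) and the coefficients in (6) can be computed directly. The sketch below is an illustration on synthetic data (the one-factor generator and volatility levels are hypothetical); it estimates $\sigma_i$ as sample standard deviations and $\beta_r$ by ordinary least squares.

```python
import numpy as np


def principal_eigenportfolio(returns: np.ndarray) -> np.ndarray:
    """Weights theta_i = c * V_i^(1) / sigma_i, with c making the weights sum to one (eq. 5)."""
    T, N = returns.shape
    sigma = returns.std(axis=0)
    X = (returns - returns.mean(axis=0)) / sigma
    C = X.T @ X / T
    eigvals, eigvecs = np.linalg.eigh(C)
    v1 = eigvecs[:, -1]            # eigenvector of the largest eigenvalue
    if v1.sum() < 0:               # fix the arbitrary sign
        v1 = -v1
    raw = v1 / sigma
    return raw / raw.sum()         # c = 1 / sum_j V_j^(1) / sigma_j


def ols_betas(returns: np.ndarray, F: np.ndarray) -> np.ndarray:
    """Least-squares slope of each stock's return on F, i.e. beta_r in eq. (6)."""
    Fc = F - F.mean()
    rc = returns - returns.mean(axis=0)
    return rc.T @ Fc / (Fc @ Fc)


rng = np.random.default_rng(2)
T, N = 750, 30
market = rng.standard_normal(T)
vols = rng.uniform(0.01, 0.03, N)   # hypothetical daily volatilities
returns = (market[:, None] + rng.standard_normal((T, N))) * vols
theta = principal_eigenportfolio(returns)
F = returns @ theta                  # principal eigenportfolio returns
betas = ols_betas(returns, F)
```

A quick consistency check: the weighted average of the betas, $\sum_i \theta_i \beta_i$, equals 1, because it is the regression of F on itself.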
The principal eigenportfolio is connected to the concept of the market portfolio (Modern Portfolio Theory). If the entries $V_i^{(1)}$ of the principal eigenvector are positive for all i, the principal eigenportfolio has positive weights. By the Perron-Frobenius theorem, this holds whenever all pairwise correlations are positive.
Perron-Frobenius Theorem
Any square matrix with positive entries, $A_{i,j} > 0$, has a unique eigenvector with positive entries (up to multiplication by a positive scalar), and the corresponding eigenvalue has multiplicity one and is strictly greater than the absolute value of any other eigenvalue.
That is,
$\exists\, \lambda^* > 0,\; V^* > 0,\; \|V^*\|_2 = 1 \;\text{s.t.}\; A V^* = \lambda^* V^*$ ($V^*$ is a right eigenvector)
$|\lambda| < \lambda^*$ for every other eigenvalue $\lambda$ of A
$V^*$ is unique up to scaling
The Perron-Frobenius eigenvalue is given by the Collatz-Wielandt formula
$\lambda^* = \max_{x \ge 0,\, x \ne 0} \, \min_{i:\, x_i \ne 0} \frac{[Ax]_i}{x_i} = \min_{x > 0} \, \max_{i} \frac{[Ax]_i}{x_i}$
This result can be extended to non-negative matrices
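The theorem and the Collatz-Wielandt bounds can be checked numerically. The sketch below assumes a randomly generated positive matrix, finds the Perron-Frobenius pair by power iteration (a standard method, not one prescribed by the slides), and evaluates the min and max of $[Ax]_i / x_i$ for an arbitrary positive x, which must bracket $\lambda^*$.

```python
import numpy as np


def collatz_wielandt_bounds(A: np.ndarray, x: np.ndarray):
    """For positive A and positive x: min_i [Ax]_i/x_i <= lambda* <= max_i [Ax]_i/x_i."""
    ratios = (A @ x) / x
    return ratios.min(), ratios.max()


def perron_pair(A: np.ndarray, tol: float = 1e-13, max_iter: int = 10_000):
    """Power iteration for the Perron-Frobenius eigenpair of a positive matrix."""
    n = A.shape[0]
    x = np.full(n, 1.0 / np.sqrt(n))      # positive starting vector
    for _ in range(max_iter):
        y = A @ x
        y /= np.linalg.norm(y)            # keep ||x||_2 = 1
        converged = np.linalg.norm(y - x) < tol
        x = y
        if converged:
            break
    lam = x @ A @ x                       # Rayleigh quotient; equals lambda* at the eigenvector
    return lam, x


rng = np.random.default_rng(3)
A = rng.uniform(0.1, 1.0, size=(6, 6))    # positive, not necessarily symmetric
lam, v = perron_pair(A)
lo, hi = collatz_wielandt_bounds(A, rng.uniform(0.5, 2.0, size=6))
```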
PCA in Stock Markets
Higher-order eigenportfolios are difficult to explain financially
The first eigenvector has all entries positive (a good proxy for the market portfolio)
Hierarchical PCA
Partition the stock universe into clusters
Strong beliefs on the “intra-cluster data”
Weak beliefs on the “inter-cluster data”
Define b “benchmark portfolios” associated with clusters
Set I(i) = K if stock i is in cluster K
A correlation matrix Ĉ which incorporates the modeler’s beliefs in a parsimonious fashion
is given by:
$\hat{C}_{i,j} = \begin{cases} C_{i,j} & \text{if } I(i) = I(j) \\ \beta_i \beta_j \hat{\rho}_{I(i),I(j)} & \text{otherwise} \end{cases}$ (7)
From the orthogonality of the eigenportfolios in the same sector, we can derive a simple formula for the regression coefficients:
$\beta_i = \mathrm{Corr}(X_i, F_{I(i)}) = \sqrt{\lambda_{1,I(i)}}\, V_i^{I(i)}$ (8)
If the benchmark portfolio for a cluster is its first eigenportfolio, then
$F_k = \frac{1}{\sqrt{\lambda_{1,k}}} \sum_{i:\, I(i)=k} V_i^k X_i$ (9)
where $X_i$ are the standardized returns, $V_i^k$ is the first eigenvector of the correlation matrix of cluster k, and $\lambda_{1,k}$ is the corresponding (first) eigenvalue.
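A sketch of the construction in equations (7) to (9), assuming standardized returns and a given cluster assignment. The slides do not say how the inter-cluster correlations $\hat{\rho}$ are estimated; here they are taken, as an assumption, to be the sample correlations of the benchmark returns $F_k$. The synthetic three-cluster data are hypothetical.

```python
import numpy as np


def hpca_correlation(X: np.ndarray, labels: np.ndarray):
    """Modified correlation matrix of eq. (7) with first-eigenportfolio benchmarks.

    X: T x N standardized returns (zero mean, unit variance per column).
    labels: length-N array with the cluster I(i) of each stock.
    """
    T, N = X.shape
    C = X.T @ X / T
    clusters = np.unique(labels)
    beta = np.zeros(N)
    F = np.zeros((T, clusters.size))
    for c, k in enumerate(clusters):
        idx = np.flatnonzero(labels == k)
        w, V = np.linalg.eigh(C[np.ix_(idx, idx)])  # PCA of the intra-cluster block
        lam1, v1 = w[-1], V[:, -1]
        if v1.sum() < 0:                             # fix the arbitrary sign
            v1 = -v1
        F[:, c] = X[:, idx] @ v1 / np.sqrt(lam1)     # eq. (9): unit-variance benchmark
        beta[idx] = np.sqrt(lam1) * v1               # eq. (8)
    rho = F.T @ F / T                                # assumed: sample correlations of benchmarks
    pos = np.searchsorted(clusters, labels)          # column of F for each stock
    C_hat = np.outer(beta, beta) * rho[np.ix_(pos, pos)]
    same = labels[:, None] == labels[None, :]
    C_hat[same] = C[same]                            # keep intra-cluster correlations
    return C_hat, beta, F


rng = np.random.default_rng(4)
T = 600
labels = np.repeat([0, 1, 2], [8, 10, 12])           # hypothetical cluster assignment
sector = rng.standard_normal((T, 3))[:, labels]      # one common driver per cluster
raw = rng.standard_normal((T, 1)) + sector + rng.standard_normal((T, labels.size))
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)
C_hat, beta, F = hpca_correlation(X, labels)
```

Since each $F_k$ has unit variance, the regression coefficient of $X_i$ on its own benchmark coincides with the correlation, which is why (8) needs no extra normalization.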
Figure: Clusters and the modified matrix (diagonal blocks: $C_{i,j}$; off-diagonal blocks: $\beta_i \beta_j \hat{\rho}_{I(i),I(j)}$).
The HPCA matrix presents a clearer, more distinct block structure
Darker areas → lower correlation (inter-cluster)
Lighter areas → higher correlation (intra-cluster)
Figure: Original (left) and modified (right) correlation matrices estimated from the returns of the S&P 500 constituents with GICS clusters, from 2010 to 2019.
Examples: Global Stocks
Analysis of four major global equity markets: United States, Europe, Emerging markets
and China.
Sector (GICS) USA Europe Emerging Mkts. China
Communication 24 42 59 10
Consumer Discretionary 64 66 114 84
Consumer Staples 32 43 92 23
Energy 28 22 56 5
Financials 64 109 277 19
Health Care 60 54 54 50
Industrials 69 115 89 97
Information Technology 69 33 121 88
Materials 28 49 121 81
Real Estate 31 33 44 22
Utilities 28 30 46 19
Total 497 596 1049 498
Table: Numbers of companies considered in the study by GICS sectors and regions.
US Stocks
The cumulative explained-variance curve of PCA rises faster than that of HPCA
For a given number of components, HPCA has a lower concentration of explained variance
HPCA is a less greedy algorithm
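The sense in which PCA is greedy can be made precise: by Ky Fan's maximum principle, the first k principal eigenvectors capture the largest possible share of total variance among all sets of k orthonormal directions. The sketch below assumes the HPCA curve is obtained by evaluating the empirical matrix C along HPCA's orthonormal eigenportfolio directions; the slides do not state the exact definition, so a random orthonormal basis stands in for those directions here.

```python
import numpy as np


def cumulative_explained_variance(C: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Cumulative share of trace(C) captured by the orthonormal columns of W, in column order."""
    captured = np.sum(W * (C @ W), axis=0)   # w_j^T C w_j for each column j
    return np.cumsum(captured) / np.trace(C)


rng = np.random.default_rng(5)
X = rng.standard_normal((400, 25)) + rng.standard_normal((400, 1))
X = (X - X.mean(axis=0)) / X.std(axis=0)
C = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(C)
pca_dirs = eigvecs[:, ::-1]                                   # decreasing eigenvalue order
other_dirs, _ = np.linalg.qr(rng.standard_normal((25, 25)))  # stand-in orthonormal basis
pca_curve = cumulative_explained_variance(C, pca_dirs)
other_curve = cumulative_explained_variance(C, other_dirs)
```

Both curves reach 1 with a full basis; the PCA curve lies on or above any other orthonormal basis at every k.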
US Stocks
Clearly structured higher-order eigenportfolios
Single-cluster portfolios
Multi-cluster portfolios (i.e., portfolios of portfolios)
China Stocks
Compared to the other markets, the spread between the PCA and HPCA curves is wider
Lower level of diversity
China Stocks
Clearly structured higher-order eigenportfolios
Almost all are multi-cluster portfolios (i.e., portfolios of portfolios)
European Stocks
Behavior similar to the US case
Clustering by European country (instead of GICS sector) delivered similar behavior
European Stocks
Mix between single- and multi-cluster portfolios
Emerging Markets
Behavior is closer to that of China than to that of the US
Clustering by emerging-market country (instead of GICS sector) delivered similar behavior
Emerging Markets
Most of the portfolios are multi-cluster
Statistically Generated Clusters
Stocks belonging to the same GICS sector or country share common factors that capture, to some extent, their joint dynamics
Such clusters are easy to interpret
However, they have some shortcomings
Stock markets and their components change almost continuously
For risk and portfolio management, practitioners seek a trade-off between stable and adaptive clusters
Static classifications can undermine the diversification of a seemingly diversified strategy
Trading strategies such as sector/country rotation may be affected
Description of the Algorithm
Given M eigenvectors of PCA, we construct $2^M$ clusters
For each stock i, form a {+1, −1} M-vector whose m-th entry is the sign of the i-th entry of the m-th eigenvector
The M-dimensional space of eigenvector entries is thus divided into $2^M$ quadrants (orthants), each defining one cluster
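A sketch of the sign-based assignment, assuming the M leading eigenvectors of the empirical correlation matrix are used and that a zero entry counts as positive (both assumptions). Each stock's sign pattern is encoded as a binary number, giving labels 0 to $2^M - 1$; flipping the sign of an eigenvector only relabels the clusters, not the partition itself.

```python
import numpy as np


def sign_clusters(C: np.ndarray, M: int) -> np.ndarray:
    """Assign each stock to one of 2**M clusters from the signs of its entries
    in the M leading eigenvectors of the correlation matrix C."""
    eigvals, eigvecs = np.linalg.eigh(C)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:M]]  # N x M, leading eigenvectors
    bits = (top >= 0).astype(int)                     # +1 -> 1, -1 -> 0 (zero counts as +1)
    return bits @ (2 ** np.arange(M))                 # binary code in 0 .. 2**M - 1


rng = np.random.default_rng(6)
X = rng.standard_normal((500, 40))
X = (X - X.mean(axis=0)) / X.std(axis=0)
labels = sign_clusters(X.T @ X / len(X), M=4)
```

If the first (market) eigenvector has all entries positive, its sign is the same for every stock, so at most $2^{M-1}$ clusters are non-empty; the slides do not state which eigenvectors enter the construction.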
Results
As in the US case, the cumulative explained-variance curve of PCA rises faster than that of HPCA
The gap is even larger here, for a universe of more than 2600 stocks
Figure: eigenportfolios by statistical cluster (panels labeled Clusters 1, 3, 5, 6, 9, 10, 12, 13, 15 and 16).
Eigenportfolios are concentrated in one or a few clusters
Clusters account for common factors, such as GICS sectors and countries, and for hidden (latent) statistical factors affecting stock returns
Application to Portfolio Management
First eigenportfolio and the market portfolio (SPY)
The first eigenportfolio closely tracks SPY
Statistical Clustering Factor Model
The model correlation matrix using K components of HPCA reads
$C = \hat{O} \hat{\Lambda} \hat{O}^{\top} + \operatorname{diag}(\zeta^2)$ (10)
$\zeta_j^2 = \sum_{i=K+1}^{N} \lambda^{(i)} \big(O_j^{(i)}\big)^2$ (11)
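where $\hat{O}$ holds the first K eigenvectors, $\hat{\Lambda}$ the corresponding eigenvalues, and $\zeta^2$ is the (uncorrelated) idiosyncratic risk. The corresponding factor model for returns is
$r_j - E(r_j) = \sum_{i=1}^{K} \beta_j^{(i)} F^{(i)} + \epsilon_j$ (12)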
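A sketch of equations (10) and (11) from the eigendecomposition of a given correlation matrix. Passing the HPCA matrix $\hat{C}$ rather than the empirical C corresponds to the "K components of HPCA" setting; that the components are taken in decreasing eigenvalue order is an assumption. Because $C_{jj} = \sum_i \lambda^{(i)} (O_j^{(i)})^2$, the residual term restores the diagonal exactly.

```python
import numpy as np


def factor_model_correlation(C: np.ndarray, K: int):
    """K-factor model matrix of eqs. (10)-(11) built from the eigendecomposition of C."""
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    lam, O = eigvals[order], eigvecs[:, order]
    zeta2 = (O[:, K:] ** 2) @ lam[K:]                      # eq. (11): residual variance per stock
    C_model = (O[:, :K] * lam[:K]) @ O[:, :K].T + np.diag(zeta2)  # eq. (10)
    return C_model, zeta2


rng = np.random.default_rng(8)
X = rng.standard_normal((300, 20)) + rng.standard_normal((300, 1))
X = (X - X.mean(axis=0)) / X.std(axis=0)
C = X.T @ X / len(X)
C_model, zeta2 = factor_model_correlation(C, K=3)
```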
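The number K of factors is chosen with the effective rank (eRank). Let
$R = U D V^{\top}$ (13)
be the SVD of the T × N matrix R of standardized log-returns, with singular values $\sigma_1, \dots, \sigma_Q$ on the diagonal of D. The associated probability distribution is
$P_j = \frac{\sigma_j}{\|\sigma\|_1}, \quad j = 1, \dots, Q, \quad Q = N \wedge T$ (14)
where $\|\cdot\|_1$ is the $L^1$ norm. Then the effective rank is defined as
$\mathrm{eRank}(R) = \exp\{H(P_1, P_2, \dots, P_Q)\}$ (15)
where $H = -\sum_{j=1}^{Q} P_j \log P_j$ is the Shannon entropy.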
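A direct transcription of equations (13) to (15) with NumPy. The slides do not say how eRank is turned into an integer K; rounding is used below as an assumption.

```python
import numpy as np


def effective_rank(R: np.ndarray) -> float:
    """eRank of eqs. (13)-(15): exp of the Shannon entropy of normalized singular values."""
    s = np.linalg.svd(R, compute_uv=False)   # Q = min(T, N) singular values, all >= 0
    p = s / s.sum()                          # eq. (14): divide by the L1 norm
    p = p[p > 0]                             # convention 0 * log 0 = 0
    H = -np.sum(p * np.log(p))               # Shannon entropy (natural log)
    return float(np.exp(H))                  # eq. (15)


rng = np.random.default_rng(9)
R = rng.standard_normal((250, 30)) + rng.standard_normal((250, 1))
R = (R - R.mean(axis=0)) / R.std(axis=0)     # standardized returns
K = int(round(effective_rank(R)))            # assumed rounding rule
```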
Mean-variance optimization:
Monthly rebalancing using a 6-month estimation window
The HPCA factor model with statistical clusters (blue) outperforms the HPCA factor model with GICS clusters (red) and the shrinkage estimator (salmon)
See [Fabozzi, Focardi and Kolm, 2010], [Fabozzi, Kolm and Focardi, 2006], [?].
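A sketch of the backtest loop, assuming 21 trading days per month and a 126-day (about 6-month) window, unconstrained mean-variance weights $w \propto \Sigma^{-1}\mu$ normalized to sum to one, and a pluggable correlation estimator (sample correlation here; an HPCA or factor-model estimator could be passed instead). None of these details are taken from the slides, and the returns are synthetic.

```python
import numpy as np


def max_sharpe_weights(mu: np.ndarray, cov: np.ndarray) -> np.ndarray:
    """Unconstrained mean-variance weights, proportional to cov^{-1} mu, scaled to sum to one."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()   # unstable if raw.sum() is close to zero


def rolling_backtest(returns, corr_estimator, window=126, step=21):
    """Rebalance every `step` days using the previous `window` days of returns."""
    T, N = returns.shape
    pnl = []
    for start in range(window, T - step + 1, step):
        hist = returns[start - window:start]
        mu, sigma = hist.mean(axis=0), hist.std(axis=0)
        X = (hist - mu) / sigma                              # standardized returns
        cov = corr_estimator(X) * np.outer(sigma, sigma)     # correlation -> covariance
        w = max_sharpe_weights(mu, cov)
        pnl.append(returns[start:start + step] @ w)          # out-of-sample returns
    return np.concatenate(pnl)


def sample_corr(X: np.ndarray) -> np.ndarray:
    return X.T @ X / len(X)


rng = np.random.default_rng(7)
returns = 0.0005 + 0.01 * rng.standard_normal((504, 10))   # hypothetical daily returns
pnl = rolling_backtest(returns, sample_corr)
```

Swapping `sample_corr` for a function that returns an HPCA-based model matrix is the only change needed to compare estimators within the same loop.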
References
Avellaneda, M. (2019) Hierarchical PCA and Applications to Portfolio Management.
NYU Courant Working Paper.
Fabozzi, F.J., Focardi, S.M. and Kolm, P.N. (2010) Quantitative Equity Investing:
Techniques and Strategies. Hoboken, NJ: John Wiley & Sons, Inc.
Fabozzi, F.J., Kolm, P.N. and Focardi, S.M. (2006) Robust Financial Modeling of the Equity Market: From CAPM to Cointegration. Hoboken, NJ: John Wiley & Sons, Inc.
Kakushadze, Z. (2015) Heterotic Risk Models. Wilmott Magazine 2015(80): 40-55.
Kakushadze, Z. and Yu, W. (2017) Statistical Risk Models. Journal of Investment
Strategies 6(2): 1-40.
Lopez de Prado, M. (2016) Building Diversified Portfolios that Outperform
Out-of-Sample. Journal of Portfolio Management 42(4): 59-69.
Lopez de Prado, M. (2019) Ten Applications of Financial Machine Learning.
Available at SSRN.