Introduction
Clustering Financial Time Series:
How Long is Enough?
25th International Joint Conference on Artificial Intelligence
IJCAI-16
S. Andler, G. Marti, F. Nielsen, P. Donnat
July 14, 2016
Gautier Marti Clustering Financial Time Series: How Long is Enough?
Clustering of Financial Time Series
Goal: Build Risk & Trading AI agents...
[Figure omitted; source: www.datagrapple.com]
...which can thrive on this kind of data.
Clustering of Financial Time Series
Stylized fact I: Financial time series correlations have a strong
hierarchical block diagonal structure (Econophysics [4])
Stylized fact II: Most correlations are spurious (RMT [2])
Motivation for clustering financial time series using correlation as a
similarity measure:
dimensionality reduction ≡ filtering noisy correlations
Challenge for the statistical practitioner
The dilemma:
the longer the time interval, the more precise the correlation
estimates, but also
the longer the time interval, the more unrealistic the
stationarity hypothesis for these time series.
Question: How does the clustering behave with statistical errors
of the correlation estimates?
How long is enough? 30 days? 120 days? 10 years?
A first theoretical approach - simplified setting
We consider the following framework:
financial time series ≡ random walks
they follow a joint elliptical distribution (e.g. Gaussian,
Student) parameterized by a correlation matrix
the correlation matrix has a hierarchical block structure:
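The simplified setting above can be sketched in a few lines. This is only an illustration under assumed values: the two-level block structure, the cluster sizes, the correlation levels (0.7 within clusters, 0.2 between) and the helper names are hypothetical and not taken from the slides.

```python
import numpy as np

def hierarchical_block_correlation(n_clusters=4, cluster_size=5,
                                   rho_intra=0.7, rho_inter=0.2):
    """Two-level block correlation matrix (sizes and levels are illustrative)."""
    labels = np.repeat(np.arange(n_clusters), cluster_size)
    same_cluster = labels[:, None] == labels[None, :]
    corr = np.where(same_cluster, rho_intra, rho_inter).astype(float)
    np.fill_diagonal(corr, 1.0)
    return corr, labels

def sample_increments(corr, n_obs, rng, student_df=None):
    """Draw a (T x N) array of increments with the given correlation matrix.

    Gaussian by default; multivariate Student when student_df is set.
    """
    chol = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_obs, corr.shape[0])) @ chol.T
    if student_df is None:
        return z
    # Multivariate Student: each row is divided by sqrt(chi2 / df),
    # with one chi2 draw shared by all variables of that row.
    w = rng.chisquare(student_df, size=(n_obs, 1)) / student_df
    return z / np.sqrt(w)

rng = np.random.default_rng(0)
corr, labels = hierarchical_block_correlation()
increments = sample_increments(corr, n_obs=250, rng=rng, student_df=3)
walks = increments.cumsum(axis=0)  # random walks built from the increments
```

Both the Gaussian and the Student case are elliptical distributions parameterized by the same correlation matrix, which is why one Cholesky factor serves both.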
Simulations in the simplified setting
Some influential parameters:
clustering algorithm
number of observations T
number of variables N relative to T
contrast between the correlations, and their values
correlation estimator (e.g. Pearson, Spearman)
[Figure: empirical rates of convergence, one panel each for Single Linkage, Average Linkage and Ward. x-axis: sample size (100 to 500); y-axis: score (0.0 to 1.0); curves: Gaussian - Pearson, Gaussian - Spearman, Student - Pearson, Student - Spearman.]
Ratio of the number of correct clusterings obtained over the number of trials, as a function of T
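A score of this kind could be estimated along the following lines. This is a rough sketch, not the authors' experimental protocol: it assumes Gaussian increments only, three clusters of five variables, correlations of 0.7 and 0.2, the distance sqrt(2(1 - rho)), and a cut of the dendrogram at the known number of clusters. The function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata

def block_correlation(n_clusters, cluster_size, rho_intra, rho_inter):
    labels = np.repeat(np.arange(n_clusters), cluster_size)
    same = labels[:, None] == labels[None, :]
    corr = np.where(same, rho_intra, rho_inter).astype(float)
    np.fill_diagonal(corr, 1.0)
    return corr, labels

def same_partition(a, b):
    """True if two label vectors define the same partition, up to relabeling."""
    return np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :])

def success_ratio(n_obs, method="average", estimator="pearson",
                  n_trials=50, seed=0):
    """Fraction of trials in which the true clusters are recovered exactly."""
    rng = np.random.default_rng(seed)
    corr, truth = block_correlation(3, 5, rho_intra=0.7, rho_inter=0.2)
    chol = np.linalg.cholesky(corr)
    hits = 0
    for _ in range(n_trials):
        x = rng.standard_normal((n_obs, len(truth))) @ chol.T
        if estimator == "spearman":
            x = rankdata(x, axis=0)  # Spearman = Pearson on column-wise ranks
        est = np.corrcoef(x, rowvar=False)
        dist = np.sqrt(np.clip(2.0 * (1.0 - est), 0.0, None))
        np.fill_diagonal(dist, 0.0)
        tree = linkage(squareform(dist, checks=False), method=method)
        found = fcluster(tree, t=3, criterion="maxclust")
        hits += same_partition(found, truth)
    return hits / n_trials

for n_obs in (20, 50, 100, 500):
    print(n_obs, success_ratio(n_obs, method="single", n_trials=20))
```

Swapping `method` for `"average"` or `"ward"` and `estimator` for `"spearman"` gives the other combinations named in the figure; the Student case would need a different sampler.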
A consistency proof & first convergence bounds
A 2-step proof. First step:
We consider Hierarchical Agglomerative Clustering algorithms
Space contracting vs. Space conserving vs. Space dilating [1]
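Space contracting:
$$D^{(t+1)}\left(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}\right) \le \min\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right)$$
Space conserving:
$$D^{(t+1)}\left(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}\right) \in \left[\min\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right), \max\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right)\right]$$
Space dilating:
$$D^{(t+1)}\left(C_i^{(t)} \cup C_j^{(t)}, C_k^{(t)}\right) \ge \max\left(D_{ik}^{(t)}, D_{jk}^{(t)}\right)$$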
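To make the space-conserving property concrete, here is a small sketch of the distance updates for single, complete and average linkage, with a check that the updated distance stays between the two old ones. The function names and the tolerance parameter are illustrative assumptions.

```python
def merged_distance(method, d_ik, d_jk, n_i=1, n_j=1):
    """Distance from the merged cluster C_i U C_j to C_k.

    n_i and n_j are the sizes of C_i and C_j (only used by average linkage).
    """
    if method == "single":
        return min(d_ik, d_jk)
    if method == "complete":
        return max(d_ik, d_jk)
    if method == "average":
        return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
    raise ValueError(f"unknown method: {method}")

def is_space_conserving_update(method, d_ik, d_jk, n_i=1, n_j=1, tol=1e-12):
    """Check min(D_ik, D_jk) <= D_new <= max(D_ik, D_jk), up to rounding."""
    d_new = merged_distance(method, d_ik, d_jk, n_i, n_j)
    return min(d_ik, d_jk) - tol <= d_new <= max(d_ik, d_jk) + tol

for method in ("single", "complete", "average"):
    print(method, is_space_conserving_update(method, 0.4, 1.0, n_i=2, n_j=3))
```

Single linkage sits at the lower end of the interval, complete linkage at the upper end, and average linkage is a convex combination of the two distances, so all three satisfy the middle condition.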
A consistency proof & first convergence bounds
A 2-step proof. First step:
Which geometrical configurations lead to the true clustering?
For space-conserving algorithms (e.g. Single, Complete, Average
Linkage), a sufficient separability condition reads
$$\max D_{\mathrm{intra}} := \max_{\substack{1 \le i,j \le N \\ C(i) = C(j)}} d(X_i, X_j) < \min_{\substack{1 \le i,j \le N \\ C(i) \ne C(j)}} d(X_i, X_j) =: \min D_{\mathrm{inter}}$$
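The separability condition is easy to check numerically for a given distance matrix and a candidate labeling. A minimal sketch, with a hypothetical function name and toy distances chosen only for illustration:

```python
import numpy as np

def is_separable(dist, labels):
    """True if the largest intra-cluster distance is strictly below
    the smallest inter-cluster distance."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    intra = dist[same & off_diagonal]
    inter = dist[~same]
    if intra.size == 0 or inter.size == 0:
        return True  # nothing to compare: the condition holds vacuously
    return bool(intra.max() < inter.min())

labels = [0, 0, 1, 1]
good = [[0.0, 0.3, 0.9, 0.8],
        [0.3, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.2],
        [0.8, 0.9, 0.2, 0.0]]
bad = [[0.0, 0.85, 0.9, 0.8],
       [0.85, 0.0, 0.9, 0.9],
       [0.9, 0.9, 0.0, 0.2],
       [0.8, 0.9, 0.2, 0.0]]
print(is_separable(good, labels), is_separable(bad, labels))
```

In `bad`, one intra-cluster distance (0.85) exceeds the smallest inter-cluster distance (0.8), so the sufficient condition fails even though a clustering algorithm might still recover the clusters.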
A consistency proof & first convergence bounds
A 2-step proof. Second step:
How long does it take for the estimates of the correlation
coefficients to be precise enough that, with high probability, we
are in a good configuration for the clustering algorithm?
Answer: Concentration inequalities for correlation coefficients.
Convergence bounds
Combining both steps, we get the following convergence rate:
Convergence rate
The probability of the clustering algorithm making an error is $O\left(\frac{\log N}{T}\right)$.
Proof. Step 1 - A bit more details
By induction.
Let’s assume the separability condition is satisfied at step t,
then
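$$\min D^{(t)}_{\mathrm{intra}} \le \max D^{(t)}_{\mathrm{intra}} < \min D^{(t)}_{\mathrm{inter}} \le \max D^{(t)}_{\mathrm{inter}}$$
From the space-conserving property, we get:
$$D^{(t+1)}_{\mathrm{intra}} \in \left[\min D^{(t)}_{\mathrm{intra}}, \max D^{(t)}_{\mathrm{intra}}\right] \quad \text{and} \quad D^{(t+1)}_{\mathrm{inter}} \in \left[\min D^{(t)}_{\mathrm{inter}}, \max D^{(t)}_{\mathrm{inter}}\right].$$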
Therefore:
separability condition is satisfied at t+1,
the clustering algorithm has not linked points from two
different clusters between step t and step t + 1.
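The conclusion of this induction can be checked empirically on a toy example. The sketch below builds a dissimilarity matrix that satisfies the separability condition by construction (intra-cluster values in [0.3, 0.5], inter-cluster values in [0.8, 1.0], both ranges chosen arbitrarily) and runs SciPy's single, complete and average linkage on it. It illustrates the statement and is not part of the proof.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
truth = np.repeat(np.arange(3), 4)           # 3 clusters of 4 points
same = truth[:, None] == truth[None, :]

noise = rng.uniform(0.0, 0.2, size=same.shape)
noise = (noise + noise.T) / 2.0              # keep the matrix symmetric
dist = np.where(same, 0.3, 0.8) + noise      # intra < 0.5 < 0.8 <= inter
np.fill_diagonal(dist, 0.0)

def recovers_truth(method):
    """Cut the dendrogram into 3 clusters and compare with the truth."""
    tree = linkage(squareform(dist, checks=False), method=method)
    found = fcluster(tree, t=3, criterion="maxclust")
    return bool(np.array_equal(found[:, None] == found[None, :], same))

results = {m: recovers_truth(m) for m in ("single", "complete", "average")}
print(results)
```

Because every intra-cluster merge happens at a height below every inter-cluster merge, cutting the tree at three clusters returns the true partition for all three space-conserving linkages.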
Proof. Step 2 - A bit more details
Maximum statistical error
For space-conserving algorithms, the separability condition is met if
$$\|\hat{\Sigma} - \Sigma\|_\infty < \frac{\min_{\rho_i, \rho_j} |\rho_i - \rho_j|}{2},$$
where $C(i) \ne C(j)$.
This means that the statistical error has to be below half the minimum
correlation ‘contrast’ between the clusters.
The weaker the ‘contrast’, the more precise the correlation estimates have to be.
N.B. From the Cramér–Rao lower bound, we get for the Pearson correlation
estimator:
$$\operatorname{var}(\hat{\rho}) \ge \frac{(1 - \rho^2)^2}{1 + \rho^2}.$$
When the correlation is high, it is easier to estimate.
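The remark that high correlations are easier to estimate can be seen by evaluating the lower bound quoted above for a few values of rho. A small sketch; the function name is illustrative and the expression is used exactly as written on the slide:

```python
def cramer_rao_factor(rho):
    """Lower bound (1 - rho^2)^2 / (1 + rho^2) on var(rho_hat), as quoted above."""
    return (1.0 - rho**2) ** 2 / (1.0 + rho**2)

for rho in (0.0, 0.3, 0.6, 0.9):
    print(f"rho = {rho:.1f}   bound = {cramer_rao_factor(rho):.4f}")
```

The bound equals 1 at rho = 0 and decreases towards 0 as |rho| approaches 1, so strongly correlated pairs admit much tighter estimates than weakly correlated ones.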
Correlation estimates concentration bounds
number of variables N, observations T, minimum separation d
Concentration bounds [3]
If $\Sigma$ and $\hat{\Sigma}$ are the population and empirical Spearman correlation
matrices respectively, then for $N \ge \frac{24}{\log T} + 2$, we have with
probability at least $1 - \frac{1}{T^2}$,
$$\|\hat{\Sigma} - \Sigma\|_\infty \le 24 \sqrt{\frac{\log N}{T}}.$$
$$P(\text{“correct clustering”}) \ge 1 - 2N^2 e^{-Td^2/24}$$
Not sharp enough! (for reasonable values of N, T, d)
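Plugging numbers into the last bound shows why it is not sharp enough. The values below (N = 100 variables, T = 250 observations, minimum separation d = 0.1, target probability 0.95) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def correct_clustering_lower_bound(n_vars, n_obs, d):
    """Evaluate 1 - 2 N^2 exp(-T d^2 / 24)."""
    return 1.0 - 2.0 * n_vars**2 * math.exp(-n_obs * d**2 / 24.0)

def min_observations(n_vars, d, target=0.95):
    """Smallest T for which the lower bound reaches `target`.

    Solving 1 - 2 N^2 exp(-T d^2 / 24) >= target for T gives
    T >= (24 / d^2) * log(2 N^2 / (1 - target)).
    """
    return math.ceil(24.0 / d**2 * math.log(2.0 * n_vars**2 / (1.0 - target)))

bound = correct_clustering_lower_bound(n_vars=100, n_obs=250, d=0.1)
needed = min_observations(n_vars=100, d=0.1, target=0.95)
print(f"lower bound at T = 250: {bound:.1f}")
print(f"T needed for a 0.95 guarantee: {needed}")
```

With these values the lower bound is negative at T = 250, hence vacuous, and the guarantee only becomes informative for T in the tens of thousands of observations.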
Future developments
Bounds are not sharp enough. We can try to refine them using:
(theoretical) Intrinsic dimension of the HCBM model [5];
(empirical) A distance between dendrograms (instead of
correct/incorrect) for a finer analysis;
(empirical) A study of ‘correctness’ isoquants:
Precise convergence rates of clustering methodologies can provide
a useful model selection criterion for practitioners!
[1] Zhenmin Chen and John W. Van Ness. Space-conserving agglomerative algorithms. Journal of Classification, 13(1):157–168, 1996.
[2] Laurent Laloux, Pierre Cizeau, Marc Potters, and Jean-Philippe Bouchaud. Random matrix theory and financial correlations. International Journal of Theoretical and Applied Finance, 3(03):391–397, 2000.
[3] Han Liu, Fang Han, Ming Yuan, John Lafferty, Larry Wasserman, et al. High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 40(4):2293–2326, 2012.
[4] Rosario N. Mantegna. Hierarchical structure in financial markets. The European Physical Journal B - Condensed Matter and Complex Systems, 11(1):193–197, 1999.
[5] Joel A. Tropp. An introduction to matrix concentration inequalities. arXiv preprint arXiv:1501.01571, 2015.