On clustering financial time series - A need for distances between dependent random variables

Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat
24 September 2015

1 Introduction
2 Dependence and Distribution
3 Toward an extension to the multivariate case

Introduction
Motivations: Why clustering?
Mathematical finance: use of variance-covariance matrices (e.g., Markowitz, Value-at-Risk)
Stylized fact: empirical variance-covariance matrices estimated on financial time series are very noisy (Random Matrix Theory; Noise Dressing of Financial Correlation Matrices, Laloux et al., 1999)
Figure: Marchenko-Pastur distribution vs. eigenvalues of the empirical correlation matrix
How to filter these variance-covariance matrices?
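As a rough illustration of the comparison shown in the figure, the sketch below computes the edges of the Marchenko-Pastur support and splits a list of eigenvalues into those inside the noise band and those above it. The function names are hypothetical, and the setting is an assumption: N independent unit-variance series observed T times, with N <= T. It is not the authors' filtering procedure.

```python
import math


def marchenko_pastur_bounds(n_assets, n_obs):
    """Edges of the Marchenko-Pastur support for q = n_assets / n_obs <= 1.

    Assumes independent, unit-variance series (pure-noise benchmark).
    """
    if not 0 < n_assets <= n_obs:
        raise ValueError("need 0 < n_assets <= n_obs")
    q = n_assets / n_obs
    return (1 - math.sqrt(q)) ** 2, (1 + math.sqrt(q)) ** 2


def split_eigenvalues(eigenvalues, n_assets, n_obs):
    """Separate eigenvalues within the noise band from those above it."""
    _, lam_plus = marchenko_pastur_bounds(n_assets, n_obs)
    noise = [lam for lam in eigenvalues if lam <= lam_plus]
    signal = [lam for lam in eigenvalues if lam > lam_plus]
    return noise, signal
```

For example, with 100 assets and 400 observations the noise band is [0.25, 2.25], so an eigenvalue of 30 would be flagged as carrying information while eigenvalues below 2.25 would be indistinguishable from noise under this benchmark.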

Information filtering? Clustering!
Mantegna et al.’s work (1999):
Limits: focus on $\rho_{ij}$ (Pearson correlation), which is not robust to outliers or heavy tails and could therefore lead to spurious clusters
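The lack of robustness can be seen on a toy sample. In the sketch below, ten points are perfectly anti-correlated and a single extreme point is added; the Pearson coefficient flips to a strongly positive value while a rank-based coefficient stays negative. The data and the choice of Spearman's rank correlation as the robust comparison are assumptions made for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def ranks(xs):
    """Ranks 1..T of the observations (assumes no ties)."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    r = [0] * len(xs)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Ten perfectly anti-correlated points plus one joint outlier.
x = [float(k) for k in range(10)] + [100.0]
y = [float(9 - k) for k in range(10)] + [100.0]
```

On this sample the Pearson coefficient is above 0.9 because the outlier dominates both sums of squares, whereas the rank-based coefficient remains negative.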

Modelling
Asset i variations or returns follow a random variable $X_i$
Asset variations or returns are "correlated"
i.i.d. observations:
\[
\begin{array}{ll}
X_1: & X_1^1, X_1^2, \ldots, X_1^T \\
X_2: & X_2^1, X_2^2, \ldots, X_2^T \\
\vdots & \vdots \\
X_N: & X_N^1, X_N^2, \ldots, X_N^T
\end{array}
\]
Which distances $d(X_i, X_j)$ between dependent random variables?

Pitfalls of a basic distance
Let $(X, Y)$ be a bivariate Gaussian vector, with $X \sim \mathcal{N}(\mu_X, \sigma_X^2)$, $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ and whose correlation is $\rho(X, Y) \in [-1, 1]$.
\[ \mathbb{E}[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2 + 2 \sigma_X \sigma_Y (1 - \rho(X, Y)) \]
Now, consider the following values for correlation:
$\rho(X, Y) = 0$, so $\mathbb{E}[(X - Y)^2] = (\mu_X - \mu_Y)^2 + \sigma_X^2 + \sigma_Y^2$. Assume $\mu_X = \mu_Y$ and $\sigma_X = \sigma_Y$. For $\sigma_X = \sigma_Y \gg 1$, we obtain $\mathbb{E}[(X - Y)^2] \gg 1$ instead of the distance 0, expected from comparing two equal Gaussians.
$\rho(X, Y) = 1$, so $\mathbb{E}[(X - Y)^2] = (\mu_X - \mu_Y)^2 + (\sigma_X - \sigma_Y)^2$.
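The two cases can be checked numerically with a direct transcription of the closed form for the bivariate Gaussian case. The function name is hypothetical; the formula is the one stated on this slide.

```python
def expected_squared_difference(mu_x, sigma_x, mu_y, sigma_y, rho):
    """E[(X - Y)^2] for a bivariate Gaussian (X, Y) with correlation rho."""
    return ((mu_x - mu_y) ** 2
            + (sigma_x - sigma_y) ** 2
            + 2.0 * sigma_x * sigma_y * (1.0 - rho))


# Two identically distributed but independent Gaussians with sigma = 10:
# the "distance" is 2 * sigma^2 = 200, far from the 0 one would expect
# when comparing two equal distributions.
independent_equal = expected_squared_difference(0.0, 10.0, 0.0, 10.0, 0.0)

# Perfectly correlated: only the differences in mean and scale remain.
comonotone_equal = expected_squared_difference(0.0, 10.0, 0.0, 10.0, 1.0)
```

With rho = 0 the value grows like the variance even though the two marginals coincide, which is the pitfall named in the slide title.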

Pitfalls of a basic distance
(Marti, Nielsen, Very, Donnat, ICMLA 2015)

The Financial Engineer Bias: Correlation
Correlation patterns are blatant
Mantegna et al. aim at filtering information from the correlation matrix using clustering
$O(N^2)$ (correlation) vs. $O(N)$ (distribution) parameters

Information Geometry and its statistical distances
Original poster: http://www.sonycsl.co.jp/person/nielsen/FrankNielsen-distances-figs.pdf

Sklar’s Theorem and the Copula Transform
Theorem (Sklar’s Theorem, 1959)
For any random vector $X = (X_1, \ldots, X_N)$ having continuous marginal cdfs $P_i$, $1 \le i \le N$, its joint cumulative distribution $P$ is uniquely expressed as
\[ P(X_1, \ldots, X_N) = C(P_1(X_1), \ldots, P_N(X_N)), \]
where $C$, the multivariate distribution of uniform marginals, is known as the copula of $X$.

Sklar’s Theorem and the Copula Transform
Definition (The Copula Transform)
Let $X = (X_1, \ldots, X_N)$ be a random vector with continuous marginal cumulative distribution functions (cdfs) $P_i$, $1 \le i \le N$. The random vector
\[ U = (U_1, \ldots, U_N) := P(X) = (P_1(X_1), \ldots, P_N(X_N)) \]
is known as the copula transform.
The $U_i$, $1 \le i \le N$, are uniformly distributed on $[0, 1]$ (the probability integral transform): for $P_i$ the cdf of $X_i$, we have
\[ x = P_i(P_i^{-1}(x)) = \Pr(X_i \le P_i^{-1}(x)) = \Pr(P_i(X_i) \le x), \]
thus $P_i(X_i) \sim \mathcal{U}[0, 1]$.
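In practice the cdfs are unknown, so one common approach is to replace each $P_i$ by the empirical cdf, which amounts to mapping each observation to its rank divided by T. The sketch below is a minimal version of that empirical transform; the function name is hypothetical and ties are assumed absent (continuous marginals).

```python
import math


def empirical_copula_transform(xs):
    """Map each observation to rank / T, an estimate of P_i(X_i).

    Assumes no ties, consistent with continuous marginals.
    """
    t = len(xs)
    order = sorted(range(t), key=lambda k: xs[k])
    u = [0.0] * t
    for rank, k in enumerate(order, start=1):
        u[k] = rank / t
    return u


sample = [3.0, 1.0, 2.0]
u = empirical_copula_transform(sample)

# The transform is invariant under strictly increasing maps of the data.
u_exp = empirical_copula_transform([math.exp(v) for v in sample])
```

The invariance under increasing transformations is what separates the dependence information (kept in the ranks) from the marginal distribution (discarded by the transform).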

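Distance Design
\[ d_\theta^2(X_i, X_j) = \theta \, 3 \, \mathbb{E}\left[ |P_i(X_i) - P_j(X_j)|^2 \right] + (1 - \theta) \, \frac{1}{2} \int_{\mathbb{R}} \left( \sqrt{\frac{dP_i}{d\lambda}} - \sqrt{\frac{dP_j}{d\lambda}} \right)^2 d\lambda \]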
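A minimal empirical sketch of this distance, under stated assumptions: the dependence term uses ranks divided by T in place of the cdfs, and the distribution term is taken to be a squared Hellinger distance estimated from histograms on a shared grid. The function names, the number of bins, and the histogram estimator are all choices made here for illustration, not the authors' implementation.

```python
import math


def _ranks_over_t(xs):
    """Empirical cdf values rank / T (assumes no ties)."""
    t = len(xs)
    order = sorted(range(t), key=lambda k: xs[k])
    u = [0.0] * t
    for rank, k in enumerate(order, start=1):
        u[k] = rank / t
    return u


def _histogram(xs, lo, hi, n_bins):
    """Probability mass of xs in n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in xs:
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    return [c / len(xs) for c in counts]


def gnpr_squared_distance(xs, ys, theta, n_bins=10):
    """theta * dependence term + (1 - theta) * distribution term."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal length")
    u, v = _ranks_over_t(xs), _ranks_over_t(ys)
    dependence = 3.0 * sum((a - b) ** 2 for a, b in zip(u, v)) / len(xs)
    lo, hi = min(min(xs), min(ys)), max(max(xs), max(ys))
    if hi == lo:
        distribution = 0.0
    else:
        p = _histogram(xs, lo, hi, n_bins)
        q = _histogram(ys, lo, hi, n_bins)
        distribution = 0.5 * sum(
            (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return theta * dependence + (1.0 - theta) * distribution
```

With theta = 1 only the ranks matter, so a series and a rescaled copy of it are at distance 0; with theta = 0 only the marginals matter, so a series and its time-reversed copy are at distance 0 even though their ranks are opposite.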

Results: Data from Hierarchical Block Model
Adjusted Rand Index
Algo.   Distance        A            B            C
HC-AL   (1 − ρ)/2       0.00 ±0.01   0.99 ±0.01   0.56 ±0.01
HC-AL   E[(X − Y)^2]    0.00 ±0.00   0.09 ±0.12   0.55 ±0.05
HC-AL   GPR θ = 0       0.34 ±0.01   0.01 ±0.01   0.06 ±0.02
HC-AL   GPR θ = 1       0.00 ±0.01   0.99 ±0.01   0.56 ±0.01
HC-AL   GPR θ = 0.5     0.34 ±0.01   0.59 ±0.12   0.57 ±0.01
HC-AL   GNPR θ = 0      1            0.00 ±0.00   0.17 ±0.00
HC-AL   GNPR θ = 1      0.00 ±0.00   1            0.57 ±0.00
HC-AL   GNPR θ = 0.5    0.99 ±0.01   0.25 ±0.20   0.95 ±0.08
AP      (1 − ρ)/2       0.00 ±0.00   0.99 ±0.07   0.48 ±0.02
AP      E[(X − Y)^2]    0.14 ±0.03   0.94 ±0.02   0.59 ±0.00
AP      GPR θ = 0       0.25 ±0.08   0.01 ±0.01   0.05 ±0.02
AP      GPR θ = 1       0.00 ±0.01   0.99 ±0.01   0.48 ±0.02
AP      GPR θ = 0.5     0.06 ±0.00   0.80 ±0.10   0.52 ±0.02
AP      GNPR θ = 0      1            0.00 ±0.00   0.18 ±0.01
AP      GNPR θ = 1      0.00 ±0.01   1            0.59 ±0.00
AP      GNPR θ = 0.5    0.39 ±0.02   0.39 ±0.11   1
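For readers unfamiliar with the score reported in the table, the sketch below computes the Adjusted Rand Index between a reference partition and a clustering using the standard pair-counting formula. The function name is hypothetical, and the code does not reproduce the experiment or its data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two labelings of the same length >= 2")
    # Pairs of items placed together in both partitions.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    # Pairs placed together in each partition separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. both partitions put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

A score of 1 means the clustering recovers the reference partition up to a relabeling, and a score near 0 means agreement no better than chance, which is how the 1 and 0.00 entries in the table can be read.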

Results: Data from CDS market
(Marti, Nielsen, Very, Donnat, ICMLA 2015)

Limits and questions
Why a convex combination? No a priori support from geometry
In practice:
no real control on the weight of correlation and on the weight of distribution
stability methods are still prone to overfitting for selecting parameters
θ actually depends on the convergence rate of the estimators: correlation measures converge faster than distribution estimation

Overview

Multivariate dependence
What is the state of the art on multivariate dependence?
Multivariate mutual information: In information theory there have been various attempts over the years to extend the definition of mutual information to more than two random variables. These attempts have met with a great deal of confusion and a realization that interactions among many random variables are poorly understood.

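Optimal Copula Transport for intra-dependence
\[ D_{\mathrm{intra}}(X_1, X_2) := \mathrm{EMD}(s_1, s_2), \]
\[
\begin{aligned}
\mathrm{EMD}(s_1, s_2) := \min_f \ & \sum_{1 \le i, j \le n} \| p_i - q_j \| \, f_{ij} \\
\text{subject to } & f_{ij} \ge 0, \quad 1 \le i, j \le n, \\
& \sum_{j=1}^{n} f_{ij} \le w_{p_i}, \quad 1 \le i \le n, \\
& \sum_{i=1}^{n} f_{ij} \le w_{q_j}, \quad 1 \le j \le n, \\
& \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} = 1.
\end{aligned}
\]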
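To make the linear program concrete, here is a small sketch for one special case only: both signatures have n points with uniform weights 1/n. Under that assumption the constraints hold with equality and an optimal flow can be chosen as a permutation matrix scaled by 1/n, so for tiny n the minimum can be found by enumerating permutations. The function name and the uniform-weight restriction are assumptions; a general solver would handle arbitrary weights.

```python
import itertools
import math


def emd_uniform(points_p, points_q):
    """EMD between two signatures of n points each, all weights 1/n.

    Brute force over permutations: only suitable for very small n.
    """
    n = len(points_p)
    if n == 0 or n != len(points_q):
        raise ValueError("signatures must be non-empty and of equal size")
    best_cost = min(
        sum(math.dist(points_p[i], points_q[j]) for i, j in enumerate(perm))
        for perm in itertools.permutations(range(n))
    )
    # Each matched pair carries a flow of 1/n.
    return best_cost / n


same = emd_uniform([(0.0, 0.0), (1.0, 1.0)], [(1.0, 1.0), (0.0, 0.0)])
shifted = emd_uniform([(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)])
```

In the second example each point moves straight up by one unit, which costs less than the crossed matching, so the value is 1. The factorial cost of the enumeration is also a reminder of the computing-cost concern when signatures grow.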

Optimal Copula Transport for inter-dependence

Limits and questions
does not scale well with even moderate dimensionality:
density estimation
computing cost
full parametric approach?
how to connect with the (copula,margins) representation?
information geometry?
(approximate) optimal transport?
kernel embedding of distributions?
contact: gautier.marti@helleborecapital.com

References
Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas
Popat.
NP-hardness of Euclidean sum-of-squares clustering.
Machine Learning, 75(2):245–248, 2009.
Luigi Ambrosio and Nicola Gigli.
A user’s guide to optimal transport.
In Modelling and optimisation of flows on networks, pages
1–155. Springer, 2013.
David Applegate, Tamraparni Dasu, Shankar Krishnan, and
Simon Urbanek.
Unsupervised clustering of multidimensional distributions using
earth mover distance.
In Proceedings of the 17th ACM SIGKDD international
conference on Knowledge discovery and data mining, pages
636–644. ACM, 2011.
Shai Ben-David, Ulrike Von Luxburg, and Dávid Pál.
A sober look at clustering stability.
In Learning theory, pages 5–19. Springer, 2006.
Petro Borysov, Jan Hannig, and JS Marron.
Asymptotics of hierarchical clustering for growing dimension.
Journal of Multivariate Analysis, 124:465–479, 2014.
Leo Breiman and Jerome H Friedman.
Estimating optimal transformations for multiple regression and
correlation.
Journal of the American statistical Association, 80(391):
580–598, 1985.
Joël Bun, Romain Allez, Jean-Philippe Bouchaud, and Marc
Potters.
Rotational invariant estimator for general noisy matrices.
arXiv preprint arXiv:1502.06736, 2015.
Gunnar Carlsson and Facundo Mémoli.
Characterization, stability and convergence of hierarchical
clustering methods.
The Journal of Machine Learning Research, 11:1425–1470,
2010.
Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum,
Anthony Bagnall, Abdullah Mueen, and Gustavo Batista.
The UCR time series classification archive, July 2015.
www.cs.ucr.edu/~eamonn/time_series_data/.
Tamraparni Dasu, Deborah F Swayne, and David Poole.
Grouping multivariate time series: A case study.
In Proceedings of the IEEE Workshop on Temporal Data
Mining: Algorithms, Theory and Applications, in conjunction
with the Conference on Data Mining, Houston, pages 25–32,
2005.
Paul Deheuvels.
La fonction de dépendance empirique et ses propriétés. un test
non paramétrique d’indépendance.
Acad. Roy. Belg. Bull. Cl. Sci.(5), 65(6):274–292, 1979.
Paul Deheuvels.
An asymptotic decomposition for multivariate distribution-free
tests of independence.
Journal of Multivariate Analysis, 11(1):102–113, 1981.
T Di Matteo, T Aste, ST Hyde, and S Ramsden.
Interest rates hierarchical structure.
Physica A: Statistical Mechanics and its Applications, 355(1):
21–33, 2005.
T Di Matteo, Francesca Pozzi, and Tomaso Aste.
The use of dynamical networks to detect the hierarchical
organization of financial market sectors.
The European Physical Journal B-Condensed Matter and
Complex Systems, 73(1):3–11, 2010.
Francis X Diebold and Canlin Li.
Forecasting the term structure of government bond yields.
Journal of econometrics, 130(2):337–364, 2006.
A Adam Ding and Yi Li.
Copula correlation: An equitable dependence measure and
extension of Pearson’s correlation.
Bradley Efron.
Bootstrap methods: another look at the jackknife.
The annals of Statistics, pages 1–26, 1979.
Gal Elidan.
Copulas in machine learning.
In Copulae in Mathematical and Quantitative Finance, pages
39–60. Springer, 2013.
Sira Ferradans, Nicolas Papadakis, Julien Rabin, Gabriel Peyré,
and Jean-François Aujol.
Regularized discrete optimal transport.
Springer, 2013.
Hans Gebelein.
Das statistische problem der korrelation als variations-und
eigenwertproblem und sein zusammenhang mit der
ausgleichsrechnung.
ZAMM-Journal of Applied Mathematics and
Mechanics/Zeitschrift für Angewandte Mathematik und
Mechanik, 21(6):364–379, 1941.
Cyril Goutte, Peter Toft, Egill Rostrup, Finn Å Nielsen, and
Lars Kai Hansen.
On clustering fMRI time series.
NeuroImage, 9(3):298–310, 1999.
Clive WJ Granger and Paul Newbold.
Spurious regressions in econometrics.
Journal of econometrics, 2(2):111–120, 1974.
Isabelle Guyon, Ulrike Von Luxburg, and Robert C Williamson.
Clustering: Science or art.
In NIPS 2009 Workshop on Clustering Theory, 2009.
Jiang Hangjin and Ding Yiming.
Equitability of dependence measure.
stat, 1050:9, 2015.
Keith Henderson, Brian Gallagher, and Tina Eliassi-Rad.
EP-MEANS: An efficient nonparametric clustering of empirical
probability distributions.
2015.
Weiming Hu, Tieniu Tan, Liang Wang, and Steve Maybank.
A survey on visual surveillance of object motion and behaviors.
Systems, Man, and Cybernetics, Part C: Applications and
Reviews, IEEE Transactions on, 34(3):334–352, 2004.
John C Hull.
Options, futures, and other derivatives.
Pearson Education, 2006.
Anil K Jain.
Data clustering: 50 years beyond k-means.
Pattern recognition letters, 31(8):651–666, 2010.
Konstantinos Kalpakis, Dhiral Gada, and Vasundhara
Puttagunta.
Distance measures for effective clustering of ARIMA
time-series.
In Data Mining, 2001. ICDM 2001, Proceedings IEEE
International Conference on, pages 273–280. IEEE, 2001.
M Kanevski, V Timonin, A Pozdnoukhov, and M Maignan.
Evolution of interest rate curve: empirical analysis of patterns
using nonlinear clustering tools.
In European Symposium on Time Series Prediction, 2008.
Leonid Vitalievich Kantorovich.
On the translocation of masses.
In Dokl. Akad. Nauk SSSR, volume 37, pages 199–201, 1942.
Justin B Kinney and Gurinder S Atwal.
Equitability, mutual information, and the maximal information
coefficient.
Proceedings of the National Academy of Sciences, 111(9):
3354–3359, 2014.
Jon M. Kleinberg.
An impossibility theorem for clustering.
In S. Thrun and K. Obermayer, editors, Advances in Neural
Information Processing Systems 15, pages 446–453. MIT
Press, Cambridge, MA, 2002.
URL http://books.nips.cc/papers/files/nips15/LT17.pdf.
Laurent Laloux, Pierre Cizeau, Marc Potters, and
Jean-Philippe Bouchaud.
Random matrix theory and financial correlations.
International Journal of Theoretical and Applied Finance, 3(03):391–397, 2000.
Victoria Lemieux, Payam S Rahmdel, Rick Walker, BL Wong,
and Mark Flood.
Clustering techniques and their effect on portfolio formation
and risk analysis.
In Proceedings of the International Workshop on Data Science
for Macro-Modeling, pages 1–6. ACM, 2014.
Erel Levine and Eytan Domany.
Resampling method for unsupervised estimation of cluster
validity.
Neural computation, 13(11):2573–2593, 2001.
T Warren Liao.
Clustering of time series data—a survey.
Pattern recognition, 38(11):1857–1874, 2005.
Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu.
A symbolic representation of time series, with implications for
streaming algorithms.
In Proceedings of the 8th ACM SIGMOD workshop on
Research issues in data mining and knowledge discovery, pages
2–11. ACM, 2003.
Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios
Gunopulos.
Iterative incremental clustering of time series.
In Advances in Database Technology-EDBT 2004, pages
106–122. Springer, 2004.
Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi.
Experiencing SAX: a novel symbolic representation of time
series.
Data Mining and knowledge discovery, 15(2):107–144, 2007.
David Lopez-Paz, Philipp Hennig, and Bernhard Schölkopf.
The randomized dependence coefficient.
Rosario N Mantegna.
Hierarchical structure in financial markets.
The European Physical Journal B-Condensed Matter and
Complex Systems, 11(1):193–197, 1999.
Martin Martens and Ser-Huang Poon.
Returns synchronization and daily correlation dynamics
between international stock markets.
Journal of Banking & Finance, 25(10):1805–1827, 2001.
Gautier Marti, Philippe Donnat, Frank Nielsen, and Philippe
Very.
HCMapper: An interactive visualization tool to compare
partition-based flat clustering extracted from pairs of
dendrograms.
arXiv preprint arXiv:1507.08137, 2015a.
Gautier Marti, Philippe Very, and Philippe Donnat.
Toward a generic representation of random variables for
machine learning.
arXiv preprint arXiv:1506.00976, 2015b.
Sergio Mayordomo, Juan Ignacio Peña, and Eduardo S
Schwartz.
Are all credit default swap databases equal?
Technical report, National Bureau of Economic Research,
2010.
Sergio Mayordomo, Juan Ignacio Peña, and Eduardo S
Schwartz.
Are all credit default swap databases equal?
European Financial Management, 20(4):677–713, 2014.
Gaspard Monge.
Mémoire sur la théorie des déblais et des remblais.
De l’Imprimerie Royale, 1781.
James Munkres.
Algorithms for the assignment and transportation problems.
Journal of the Society for Industrial and Applied Mathematics,
5(1):32–38, 1957.
Nicolò Musmeci, Tomaso Aste, and Tiziana Di Matteo.
Relation between financial market structure and the real
economy: Comparison between clustering methods.
Available at SSRN 2525291, 2014.
Nicolò Musmeci, Tomaso Aste, and Tiziana Di Matteo.
Relation between financial market structure and the real
economy: comparison between clustering methods.
2015.
Roger B Nelsen.
An introduction to copulas, volume 139.
Springer Science & Business Media, 2013.
Dominic O’Kane.
Modelling single-name and multi-name credit derivatives,
volume 573.
John Wiley & Sons, 2011.
Barnabás Póczos, Zoubin Ghahramani, and Jeff Schneider.
Copula-based kernel dependency measures.
David N Reshef, Yakir A Reshef, Hilary K Finucane, Sharon R
Grossman, Gilean McVean, Peter J Turnbaugh, Eric S Lander,
Michael Mitzenmacher, and Pardis C Sabeti.
Detecting novel associations in large data sets.
Science, 334(6062):1518–1524, 2011.
David N Reshef, Yakir A Reshef, Pardis C Sabeti, and
Michael M Mitzenmacher.
An empirical study of leading measures of dependence.
arXiv preprint arXiv:1505.02214, 2015a.
Yakir A Reshef, David N Reshef, Hilary K Finucane, Pardis C
Sabeti, and Michael M Mitzenmacher.
Measuring dependence powerfully and equitably.
arXiv preprint arXiv:1505.02213, 2015b.
Yakir A Reshef, David N Reshef, Pardis C Sabeti, and
Michael M Mitzenmacher.
Equitability, interval estimation, and statistical power.
arXiv preprint arXiv:1505.02212, 2015c.
Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas.
The earth mover’s distance as a metric for image retrieval.
International journal of computer vision, 40(2):99–121, 2000.
Daniil Ryabko.
Clustering processes.
Ohad Shamir and Naftali Tishby.
Cluster stability for finite samples.
In NIPS, 2007.
Robert H Shumway.
Time-frequency clustering and discriminant analysis.
Statistics & probability letters, 63(3):307–314, 2003.
Noah Simon and Robert Tibshirani.
Comment on "Detecting novel associations in large data sets"
by Reshef et al., Science Dec 16, 2011.
Ashish Singhal and Dale E Seborg.
Clustering of multivariate time-series data.
Journal of Chemometrics, 19:427–438, 2005.
A Sklar.
Fonctions de répartition à n dimensions et leurs marges.
Université Paris 8, 1959.
Won-Min Song, T Di Matteo, and Tomaso Aste.
Hierarchical information clustering by means of topologically
embedded graphs.
PLoS One, 7(3):e31929, 2012.
Jimeng Sun, Christos Faloutsos, Spiros Papadimitriou, and
Philip S Yu.
Graphscope: parameter-free mining of large time-evolving
graphs.
In Proceedings of the 13th ACM SIGKDD international
conference on Knowledge discovery and data mining, pages
687–696. ACM, 2007.
Gábor J Székely, Maria L Rizzo, Nail K Bakirov, et al.
Measuring and testing dependence by correlation of distances.
The Annals of Statistics, 35(6):2769–2794, 2007.
Chayant Tantipathananandh and Tanya Y Berger-Wolf.
Finding communities in dynamic social networks.
In Data Mining (ICDM), 2011 IEEE 11th International
Conference on, pages 1236–1241. IEEE, 2011.
Vincenzo Tola, Fabrizio Lillo, Mauro Gallegati, and Rosario N
Mantegna.
Cluster analysis for portfolio optimization.
Journal of Economic Dynamics and Control, 32(1):235–258,
2008.
Michele Tumminello, Tomaso Aste, Tiziana Di Matteo, and
Rosario N Mantegna.
A tool for filtering information in complex systems.
Proceedings of the National Academy of Sciences of the
United States of America, 102(30):10421–10426, 2005.
Michele Tumminello, Fabrizio Lillo, and Rosario N Mantegna.
Correlation, hierarchies, and networks in financial markets.
Journal of Economic Behavior & Organization, 75(1):40–58,
2010.
Cédric Villani.
Optimal transport: old and new, volume 338.
Springer Science & Business Media, 2008.
Kiyoung Yang and Cyrus Shahabi.
A PCA-based similarity measure for multivariate time series.
In Proceedings of the 2nd ACM international workshop on
Multimedia databases, pages 65–74. ACM, 2004.
Kiyoung Yang and Cyrus Shahabi.
On the stationarity of multivariate time series for
correlation-based data analysis.
In Data Mining, Fifth IEEE International Conference on, pages
4–pp. IEEE, 2005.
