Advanced Strategies for
Metabolomic Data Analysis
Dmitry Grapov, PhD
MetabolomicDataAnalysis
Analysis at the Metabolomic Scale
Multivariate Analysis
[Figure labels: samples, variables]
Multivariate Analysis
• Visualization
• Clustering
• Projection
• Modeling
• Networks
Simultaneous analysis of many variables
Clustering
Identify
•patterns
•group structure
•relationships
•Evaluate/refine hypothesis
•Reduce complexity
Artist: Chuck Close
Cluster Analysis
Use the concept of similarity/dissimilarity
to group a collection of samples or
variables
Approaches
•hierarchical (HCA)
•non-hierarchical (k-NN, k-means)
•distribution (mixture models)
•density (DBSCAN)
•self-organizing maps (SOM)
[Figure panels: linkage, k-means, distribution, density]
Hierarchical Cluster Analysis
• similarity/dissimilarity
defines “nearness” or
distance
[Figure panels, each on X and Y axes: euclidean, manhattan, Mahalanobis; * non-euclidean]
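The three distance measures named on this slide can be sketched as follows. This is a minimal illustration, not code from the original deck; the function names are made up here, and `mahalanobis` assumes the caller supplies an inverse covariance matrix estimated elsewhere.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def mahalanobis(x, y, inv_cov):
    """Distance weighted by an inverse covariance matrix (assumed given)."""
    d = [a - b for a, b in zip(x, y)]
    # Quadratic form d^T * inv_cov * d
    q = sum(d[i] * inv_cov[i][j] * d[j]
            for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(q)

x, y = (0.0, 0.0), (3.0, 4.0)
identity = [[1.0, 0.0], [0.0, 1.0]]
print(euclidean(x, y))              # 5.0
print(manhattan(x, y))              # 7.0
print(mahalanobis(x, y, identity))  # 5.0
```

With an identity matrix the Mahalanobis distance reduces to the euclidean one; a real inverse covariance stretches or shrinks each direction according to how the variables vary together.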
Hierarchical Cluster Analysis
Agglomerative/linkage algorithm
defines how points are grouped
[Figure panels: single, complete, centroid, average]
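To show how the linkage rule changes the grouping, here is a small sketch of agglomerative clustering with single, complete, and average linkage (centroid linkage is left out for brevity). It assumes euclidean distance, and all names are illustrative.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters given as lists of point indices."""
    pairwise = [dist(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":      # closest pair of members
        return min(pairwise)
    if linkage == "complete":    # farthest pair of members
        return max(pairwise)
    if linkage == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerate(points, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
result = agglomerate(points, 2, linkage="single")
print(result)  # [[0, 1], [2, 3]]
```

On well-separated data like this, all three rules agree; they diverge when clusters are elongated or touch each other.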
Hierarchical Cluster Analysis (cont.)
[Figure; axis label: Similarity]
Hierarchical Cluster Analysis (cont.)
Overview, confirmation
How does my metadata
match my data structure?
Multidimensional Scaling
PLoS ONE 7(11): e48852. doi:10.1371/journal.pone.0048852
Projection of Data
The algorithm defines the position of the light source
Principal Components Analysis (PCA)
• unsupervised
• maximize variance (X)
Partial Least Squares Projection to
Latent Structures (PLS)
• supervised
• maximize covariance (Y ~ X)
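The "maximize covariance" idea can be made concrete for a single response: the weights of the first PLS latent variable are proportional to the covariance of each centered X variable with centered y. The sketch below assumes that single-response case; the function name and data are hypothetical.

```python
import math

def center(column):
    """Subtract the mean from every value."""
    m = sum(column) / len(column)
    return [v - m for v in column]

def first_pls_component(X, y):
    """Weights and sample scores for the first latent variable (one response)."""
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    yc = center(y)
    # Each weight is proportional to the covariance of that variable with y
    w = [sum(c[i] * yc[i] for i in range(n)) for c in cols]
    norm = math.sqrt(sum(v * v for v in w))  # assumes some covariance exists
    w = [v / norm for v in w]
    scores = [sum(cols[j][i] * w[j] for j in range(p)) for i in range(n)]
    return w, scores

# Hypothetical data: variable 1 tracks y, variable 2 does not
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 2.0], [4.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
weights, scores = first_pls_component(X, y)
print(weights)
```

The variable that covaries with y receives the larger weight, whereas PCA would weight variables by their variance alone, without reference to y.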
PCA: Goals
Principal Components (PCs)
•non-supervised
•projection of the data that
maximizes variance explained
Results
1. eigenvalues = variance explained
2. scores = new coordinates for
samples (rows)
3. loadings = coefficients of the linear
combination of original variables
James X. Li, 2009, VisuMap Tech.
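The three PCA results listed above can be computed for the first component with a short power-iteration sketch. This is an illustration under simplifying assumptions (centered but unscaled data, first component only, hypothetical values), not the procedure used for the figures in the deck.

```python
import math

def pca_first_component(X, n_iter=200):
    """First principal component of X (rows = samples) by power iteration."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Sample covariance matrix (p x p)
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Eigenvalue = variance along the component
    eigenvalue = sum(v[a] * cov[a][b] * v[b]
                     for a in range(p) for b in range(p))
    # Scores = coordinates of each sample on the component
    scores = [sum(r[j] * v[j] for j in range(p)) for r in Xc]
    total_variance = sum(cov[j][j] for j in range(p))
    return eigenvalue, v, scores, eigenvalue / total_variance

X = [[2.0, 1.9], [4.0, 4.2], [6.0, 5.8], [8.0, 8.1]]
eigenvalue, loadings, scores, explained = pca_first_component(X)
print(loadings, explained)
```

Here `loadings` holds one coefficient per original variable, `scores` one value per sample, and `explained` is the eigenvalue divided by the total variance.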
Interpreting PCA Results
Variance explained (eigenvalues)
Row (sample) scores and column (variable) loadings
PCA Example
*no scaling or centering
[Figure labels: glucose, 219021]
How are scores and
loadings related?
Centering and Scaling
van den Berg RA, Hoefsloot HC, Westerhuis JA, Smilde AK, van der Werf MJ (2006) Centering, scaling, and transformations:
improving the biological information content of metabolomics data. BMC Genomics 7: 142.
Data scaling is very important!
*autoscaling (unit variance and centered)
[Figure labels: glucose (GC/TOF), glucose (clinical), 219021]
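A small sketch of why scaling matters: without it, a high-abundance variable accounts for nearly all of the total variance, so it dominates any variance-maximizing projection. The intensities below are hypothetical, and `autoscale` is an illustrative name.

```python
import math

def variance(column):
    """Sample variance (n - 1 denominator)."""
    n = len(column)
    mean = sum(column) / n
    return sum((v - mean) ** 2 for v in column) / (n - 1)

def autoscale(column):
    """Center to mean 0 and scale to unit standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(variance(column))  # assumes the column is not constant
    return [(v - mean) / sd for v in column]

# Hypothetical intensities: one abundant and one low-abundance metabolite
abundant = [100000.0, 120000.0, 90000.0, 110000.0]
trace = [1.0, 1.4, 0.8, 1.2]

raw_share = variance(abundant) / (variance(abundant) + variance(trace))
scaled_share = variance(autoscale(abundant)) / (
    variance(autoscale(abundant)) + variance(autoscale(trace)))
print(raw_share, scaled_share)
```

Before scaling the abundant metabolite carries essentially all of the variance; after autoscaling each variable contributes an equal share.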
Use PLS to test a hypothesis
Loadings on the first latent variable (x-axis) can be used to
interpret the multivariate changes in metabolites that are
correlated with time
time = 0 120 min.
Modeling multifactorial relationships
dynamic changes among groups (analogous to a two-way ANOVA)
“Goodness” of the model is all about the
perspective
Determine in-sample (Q²) and out-of-sample
error (RMSEP) and compare to a random model
•permutation tests
•training/testing
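Comparison to a random model through permutation can be sketched as below. As a simplification, the absolute Pearson correlation stands in for the model statistic (a real analysis would refit the PLS model and recompute Q² or RMSEP for every permutation); the names and data are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, statistic, n_perm=199, seed=0):
    """Share of label permutations that score at least as well as the real data."""
    rng = random.Random(seed)
    observed = abs(statistic(x, y))
    hits = 0
    shuffled = list(y)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between x and y
        if abs(statistic(x, shuffled)) >= observed:
            hits += 1
    # Add 1 to both counts so the observed data counts as one permutation
    return (hits + 1) / (n_perm + 1)

x = [float(i) for i in range(12)]
y = [2.0 * v + 1.0 for v in x]
p = permutation_p_value(x, y, pearson)
print(p)
```

A small p-value means that randomly relabeled data rarely performs as well as the real labels, which is the sense in which the model beats a random one.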
Biological Interpretation
• Visualization
• Enrichment
• Networks
– biochemical
– structural
– spectral
– empirical
Projection or mapping of analysis results
into a biological context.
Ingredients for Network Analysis
1. Determine connections
• biochemical (substrate/product)
• chemical similarity
• spectral similarity
• empirical dependency (correlation)
2. Determine vertex properties
• magnitude
• importance
• direction
• relationships
Making Connections Based on Biochemistry
Organism-specific biochemical relationships and information
•Multiple organism DBs
– KEGG
– BioCyc
– Reactome
•Human
– HMDB
– SMPDB
Biochemical Networks
Making Connections Based on structural similarity
•Use structure to generate molecular fingerprint
•Calculate similarities between metabolites based on fingerprint
•PubChem service for similarity calculations
•http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
•online tools
•http://uranus.fiehnlab.ucdavis.edu:8080/MetaMapp/homePage
BMC Bioinformatics 2012, 13:99 doi:10.1186/1471-2105-13-99
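Fingerprint-based similarity can be sketched as follows. The slide does not name the similarity measure; the Tanimoto coefficient used here is one common choice for binary fingerprints. The fingerprints, metabolite names, and the 0.6 cutoff are all hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Hypothetical fingerprints, NOT real PubChem bit positions
fingerprints = {
    "metabolite_A": {1, 4, 9, 12, 20},
    "metabolite_B": {1, 4, 9, 12, 21},
    "metabolite_C": {2, 30, 31},
}
threshold = 0.6  # assumed cutoff for drawing an edge

names = sorted(fingerprints)
edges = [(m, n) for i, m in enumerate(names) for n in names[i + 1:]
         if tanimoto(fingerprints[m], fingerprints[n]) >= threshold]
print(edges)  # [('metabolite_A', 'metabolite_B')]
```

Each retained pair becomes an edge in the structural similarity network; raising the threshold gives a sparser network.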
Structural Similarity Network
Making Connections Based on spectral
similarity
Watrous J et al. PNAS 2012;109:E1743-E1752
•Connect molecules
based on EI or MS/MS
spectral similarity
•Useful for linking
annotated analytes
(known) to unknown
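One simple way to score spectral similarity is the cosine between two spectra. The sketch below assumes spectra stored as dictionaries of nominal m/z to intensity and uses a plain cosine; it is not the scoring used in the cited paper, and the spectra are hypothetical.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two spectra given as {m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical spectra
known = {73: 100.0, 147: 40.0, 217: 25.0}
unknown = {73: 50.0, 147: 20.0, 217: 12.5}   # same pattern, half the intensity
unrelated = {91: 100.0, 105: 30.0}            # no shared fragments

print(cosine_similarity(known, unknown))
print(cosine_similarity(known, unrelated))
```

Because the cosine ignores overall intensity, an unknown with the same fragment pattern as an annotated compound scores close to 1 and can be linked to it in the network.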
Spectral Similarity Network
Watrous J et al. PNAS 2012;109:E1743-E1752
Making connections based on empirical
relationships
•Connect molecules
based on strength of
correlation or partial correlation
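An empirical network of this kind can be sketched by connecting metabolites whose profiles correlate strongly. The example uses plain Pearson correlation (partial correlation would additionally adjust for the other metabolites); the profiles and the 0.9 threshold are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_edges(profiles, threshold):
    """Edges (a, b, r) for every pair with |r| at or above the threshold."""
    names = sorted(profiles)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if abs(r) >= threshold:
                edges.append((a, b, round(r, 3)))
    return edges

# Hypothetical metabolite profiles across five samples
profiles = {
    "m1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "m2": [2.1, 3.9, 6.2, 8.1, 9.8],
    "m3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = correlation_edges(profiles, threshold=0.9)
print(edges)
```

Only the m1 and m2 pair passes the threshold here; the sign of `r` can then be mapped to edge color, as in a treatment effects network.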
Treatment Effects Network
Metabolites
Shape = increase/decrease
Size = importance (loading)
Color = correlation
Connections
red = Biochemical relationships
violet = Structural similarity
Summary
Multivariate analysis is useful for:
•Visualization
•Exploration and overview
•Complexity reduction
•Identification of multidimensional
relationships and trends
•Mapping to networks
•Generating holistic summaries of
findings
Resources
•Mapping tools (review)
• Brief Bioinform (2012) doi: 10.1093/bib/bbs055
•Tutorials and Examples
• http://imdevsoftware.wordpress.com/category/uncategorized/
• https://github.com/dgrapov/TeachingDemos
•Chemical Translation Services
•CTS: http://cts.fiehnlab.ucdavis.edu/
•R-interface: https://github.com/dgrapov/CTSgetR
•CIR: http://cactus.nci.nih.gov/chemical/structure
•R-interface: https://github.com/dgrapov/CIRgetR
