Welcome to network mapping 101
In the following course you will learn how to integrate statistical, multivariate, and machine learning results within a publication-quality biochemical network.
Tutorials
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/#topics
•Preparing raw data for analysis
•Statistical analysis
•Multivariate data exploration
•Supervised clustering
•Machine learning
  •classification
  •model validation
  •feature selection
•Network analysis
  •biochemical
  •structural similarity
  •correlation
•Network mapping – putting it all together
Analysis at the metabolomic scale
Integrate high-dimensional data
[Figure: data matrix of samples × variables]
Identify what matters
• Univariate: ANOVA (Group 1 vs. Group 2)
• Multivariate: PCA
• Predictive modeling: PLS
Topics
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/#topics
•Data preparation
•Differential expression
•Hierarchical Clustering
•Principal Components Analysis (PCA)
•Statistical analysis
•Machine learning
•Network analysis
•Network mapping
How to think about data complexity
• Data: n samples × m variables
• Variable # = dimensionality (1-D, 2-D, … m-D)
• Meta data = experimental design
Data preprocessing
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/preprocess/
define
• data
• row meta data
• column meta data
remove and/or impute
• missing values
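The preprocessing steps above could be sketched as follows. This is a minimal illustration, not the tutorial's own code: the toy values, the 50% missingness cutoff, the median imputation, and the names `preprocess`, `row_meta`, and `col_meta` are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 samples (rows) x 3 variables (columns).
data = pd.DataFrame(
    {
        "glucose": [5.1, np.nan, 4.8, 5.5],
        "lactate": [1.2, 1.0, np.nan, np.nan],
        "alanine": [np.nan, np.nan, np.nan, 0.4],
    },
    index=["s1", "s2", "s3", "s4"],
)
# Row meta data describes samples; column meta data describes variables.
row_meta = pd.DataFrame(
    {"group": ["control", "control", "treated", "treated"]}, index=data.index
)
col_meta = pd.DataFrame(
    {"pathway": ["glycolysis", "glycolysis", "amino acid"]}, index=data.columns
)


def preprocess(df, max_missing=0.5):
    """Drop variables with too many missing values, then median-impute the rest."""
    keep = df.columns[df.isna().mean() <= max_missing]
    kept = df[keep]
    return kept.fillna(kept.median())


clean = preprocess(data)
col_meta = col_meta.loc[clean.columns]  # keep column meta data aligned with the data
```

Keeping the meta data in separate tables that share the data's row and column labels makes it easy to realign them after variables are removed.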
Your turn
Follow along with this tutorial:
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/preprocess/
Differential expression
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/statistics/
compare
• class means
identify
• significant differences
visualize
• volcano plots
• violin plots
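As a rough sketch of the compare and identify steps above, the following runs a per-variable Welch's t-test, adjusts the p-values, and computes volcano plot coordinates. The simulated data, the choice of Welch's test, the Benjamini-Hochberg adjustment, and the 0.05 cutoff are assumptions for illustration; the tutorial may use different methods.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical data: 10 samples per group x 5 variables; only variable 0 differs.
group1 = rng.normal(loc=10.0, scale=1.0, size=(10, 5))
group2 = rng.normal(loc=10.0, scale=1.0, size=(10, 5))
group2[:, 0] += 5.0

# Compare class means: Welch's t-test for each variable (column).
t_stat, p_values = stats.ttest_ind(group1, group2, axis=0, equal_var=False)


def benjamini_hochberg(p):
    """Return Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out


q_values = benjamini_hochberg(p_values)
significant = q_values < 0.05

# Volcano plot coordinates: x = log2 fold change, y = -log10(p-value).
log2_fc = np.log2(group2.mean(axis=0) / group1.mean(axis=0))
neg_log10_p = -np.log10(p_values)
```

Plotting `log2_fc` against `neg_log10_p` gives the volcano plot; variables that are both significant and strongly changed appear in its upper corners.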
Your turn
Follow along with this tutorial:
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/statistics/
Hierarchical clustering (HCA)
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/clustering/#heirarchical-clustering
group
• samples and/or variables
define similarity
• correlation
• distance
• linkage
visualize
• heatmaps
• dendrograms
Clustering basics
Use the concept of similarity/dissimilarity to group a collection of samples or variables
approaches
•hierarchical (linkage)
•non-hierarchical (k-NN, k-means)
•distribution (mixture models)
•density (DBSCAN)
•self-organizing maps (SOM)
[Figure panels: Linkage, k-means, Distribution, Density]
HCA goals
identify
•patterns
•group structure
•relationships
evaluate and refine hypothesis
reduce complexity
Artist: Chuck Close
HCA methods
distance
• defines “nearness” or similarity
[Figure: distance between X and Y; panels: Euclidean, Manhattan, Mahalanobis; footnote: * non-Euclidean]
https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa
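To make the distance measures named on the slide concrete, here is a small sketch using `scipy.spatial.distance`. The two points and the covariance matrix are made-up values chosen so the results are easy to check by hand.

```python
import numpy as np
from scipy.spatial import distance

x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])

euclidean = distance.euclidean(x, y)  # straight-line distance: sqrt(3^2 + 4^2)
manhattan = distance.cityblock(x, y)  # sum of absolute differences: 3 + 4

# Mahalanobis distance rescales each direction by the data covariance.
# Hypothetical covariance: variances of 9 and 16, no correlation.
cov = np.array([[9.0, 0.0], [0.0, 16.0]])
mahalanobis = distance.mahalanobis(x, y, np.linalg.inv(cov))
```

With this covariance each coordinate difference equals exactly one standard deviation, so the Mahalanobis distance is sqrt(2) even though the Euclidean distance is 5.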
linkage or agglomeration
• how samples or variables are connected or grouped
[Figure panels: single, complete, centroid, average]
https://scikit-learn.org/stable/auto_examples/cluster/plot_linkage_comparison.html
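The combination of a distance measure and a linkage rule can be sketched with SciPy's hierarchical clustering functions. The six toy samples below are hypothetical and deliberately well separated, so every linkage method recovers the same two groups; on real data the methods often disagree.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical data: two well-separated groups of three samples each.
X = np.array(
    [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
)

# Step 1: pairwise distances between samples (condensed form).
d = pdist(X, metric="euclidean")

# Step 2: agglomerate with several linkage rules and cut each tree into 2 clusters.
# Centroid linkage is only well defined for Euclidean distances, as used here.
labels = {}
for method in ("single", "complete", "average", "centroid"):
    Z = linkage(d, method=method)
    labels[method] = fcluster(Z, t=2, criterion="maxclust")
```

The linkage matrix `Z` is also what `scipy.cluster.hierarchy.dendrogram` takes as input to draw the tree.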
HCA process
[Figure: points (x) grouped by similarity]
HCA interpretation
overview, confirmation
How does my metadata* match my data structure?
[Figure: * marks metadata annotations]
Your turn
Follow along with this tutorial:
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/clustering/#heirarchical-clustering
Principal Components Analysis (PCA)
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/multivariate/#pca
reduce
• dimensionality
maximize
• variance explained
visualize
• variance explained
• outliers
• sample scores
• variable loadings
PCA goals
Principal Components (PCs)
•unsupervised
•projections of the data that maximize variance explained
results
1. eigenvalues = variance explained
2. scores = new coordinates for samples (rows)
3. loadings = weights of the original variables in each linear combination
James X. Li, 2009, VisuMap Tech.
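The three PCA results listed above can be computed directly from a singular value decomposition. This is a minimal NumPy sketch on simulated data; the autoscaling step (centering and dividing by the standard deviation) is an assumption about preprocessing, not something the slides prescribe.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 20 samples x 4 variables, with variable 1 tracking variable 0.
X = rng.normal(size=(20, 4))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=20)

# Center and scale each variable (autoscaling).
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# SVD of the scaled data: rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)

eigenvalues = S**2 / (Xs.shape[0] - 1)                # variance along each PC
variance_explained = eigenvalues / eigenvalues.sum()  # fraction per PC
loadings = Vt.T                                       # rows = variables, columns = PCs
scores = Xs @ loadings                                # new coordinates for samples
```

Each column of `scores` has variance equal to the matching eigenvalue, and multiplying the scores by the transposed loadings recovers the scaled data, which is why the scores and loadings plots are read together.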
PCA interpretation
Variance explained (eigenvalues)
Row (sample) scores and column (variable) loadings
PCA example
[Figure: PCA with no scaling or centering; labeled point: glucose, 219021]
Relationship between scores and loadings
[Figure panels: loadings, scores, top loading variable’s scatterplot]
Your turn
Follow along with this tutorial:
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/multivariate/#pca
Machine learning
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/model/
predict
• sample classification
optimize
• model performance
select
• important features
visualize
• model performance
• feature importance
model validation
https://scikit-learn.org/stable/modules/cross_validation.html
model training and validation
https://en.wikipedia.org/wiki/Learning_curve_(machine_learning)
bias vs. variance
https://www.researchgate.net/publication/359896672_A_Review_of_Machine_Learning_Methods_Applied_to_Structural_Dynamics_and_Vibroacoustic/figures?lo=1
classification performance
https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers
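Classification performance for a two-class problem is usually summarized from the confusion matrix. The sketch below computes a few common metrics in plain Python; the function name `binary_metrics` and the example labels are hypothetical, and it assumes both classes occur in the true and predicted labels so that no denominator is zero.

```python
def binary_metrics(y_true, y_pred):
    """Compute common binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # recall, true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

Here accuracy is 0.7 while precision is only 0.6, a reminder that a single number rarely describes a classifier when the classes are unbalanced.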
cross-validation
https://scikit-learn.org/stable/modules/cross_validation.html
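A minimal cross-validation sketch with scikit-learn, the library the slide links to. The simulated data set, the logistic regression model, and the choice of 5 stratified folds are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical data: 100 samples, 10 variables, 3 of them informative.
X, y = make_classification(
    n_samples=100, n_features=10, n_informative=3, random_state=0
)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each fold is held out once; the model is trained on the remaining folds.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="accuracy"
)
mean_accuracy = scores.mean()
```

The spread of `scores` across folds is as informative as the mean: a large spread suggests the performance estimate depends heavily on which samples are held out.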
feature selection
https://scikit-learn.org/stable/modules/feature_selection.html#feature-selection
Random Forest
• cross-validation
• feature selection
https://mlu-explain.github.io/random-forest/
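One way a Random Forest supports feature selection is through its impurity-based feature importances. The sketch below uses simulated data in which only the first two variables carry signal; the data set parameters and the number of trees are assumptions, and the course tutorial may rank features differently (for example with permutation importance).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical data: 8 variables; with shuffle=False the 2 informative ones come first.
X, y = make_classification(
    n_samples=200,
    n_features=8,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=2.0,
    flip_y=0.0,
    shuffle=False,
    random_state=0,
)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; larger means more useful for splitting.
importances = model.feature_importances_
ranked = np.argsort(importances)[::-1]  # variable indices, most important first
```

In practice the ranking would be computed inside a cross-validation loop so that selected features are not evaluated on the same samples used to choose them.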
Random Forest
• decision path
https://github.com/parrt/dtreeviz
autoML
https://pycaret.gitbook.io/docs/get-started/quickstart?q=leaderboard
Your turn
Follow along with this tutorial:
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/model/
Network analysis
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/network/
network mapping
• transform variables
network calculation
• regularized correlation
• biochemical
• structural similarity
• model performance
visualize
• interactive networks
Components for network mapping
connections (edges)
• empirical dependency (correlation)
• biochemical (substrate/product)
• chemical similarity
• …
nodes (vertices)
• magnitude
• importance
• direction
• relationships
• …
Network data structures
adjacency matrix
https://www.steveclarkapps.com/graphs/
adjacency list
https://www.steveclarkapps.com/graphs/
edge list and vertices list (node attributes)
• edge list columns: source, target, …
• vertices list columns: index, …
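The three data structures describe the same network and can be converted into one another. The plain-Python sketch below starts from a small undirected edge list with made-up node names and builds the adjacency list and adjacency matrix from it.

```python
# Hypothetical undirected network given as an edge list of (source, target) pairs.
edge_list = [("A", "B"), ("A", "C"), ("C", "D")]

# Vertices list: one entry per node, with an index used for matrix rows and columns.
nodes = sorted({node for edge in edge_list for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# Adjacency list: each node maps to the list of its neighbors.
adjacency_list = {node: [] for node in nodes}
for source, target in edge_list:
    adjacency_list[source].append(target)
    adjacency_list[target].append(source)

# Adjacency matrix: entry [i][j] is 1 when nodes i and j are connected.
adjacency_matrix = [[0] * len(nodes) for _ in nodes]
for source, target in edge_list:
    adjacency_matrix[index[source]][index[target]] = 1
    adjacency_matrix[index[target]][index[source]] = 1
```

The edge list plus a vertices table is the form most convenient for attaching attributes, since each edge and each node gets its own row.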
Correlation networks
Connect molecules based on strength of their correlation or partial-correlation
[Figure panels: bivariate, multivariate, network]
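The difference between correlation and partial correlation edges can be sketched with NumPy. In the simulated chain below, a drives b and b drives c, so a and c are strongly correlated but have little direct association once b is accounted for. The data, the 0.5 threshold, and the unregularized matrix inverse are assumptions; regularized estimators are typically preferred when variables outnumber samples.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# Hypothetical chain of dependencies: a -> b -> c.
a = rng.normal(size=n)
b = a + rng.normal(scale=0.3, size=n)
c = b + rng.normal(scale=0.3, size=n)
X = np.column_stack([a, b, c])
names = ["a", "b", "c"]

# Pairwise (marginal) correlations.
corr = np.corrcoef(X, rowvar=False)

# Partial correlations from the inverse correlation (precision) matrix.
precision = np.linalg.inv(corr)
d = np.sqrt(np.diag(precision))
partial = -precision / np.outer(d, d)
np.fill_diagonal(partial, 1.0)


def edges(matrix, threshold=0.5):
    """Return (source, target, weight) for pairs at or above the threshold."""
    k = len(names)
    return [
        (names[i], names[j], round(float(matrix[i, j]), 2))
        for i in range(k)
        for j in range(i + 1, k)
        if abs(matrix[i, j]) >= threshold
    ]


correlation_edges = edges(corr)
partial_edges = edges(partial)
```

The correlation network connects a and c, while the partial correlation network drops that edge and keeps only the direct links.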
Correlation example
Regularized correlation network showing relationships in metabolic time-series measurements for two classes of samples
Biochemical networks
[Figure: network with labeled nodes and edges]
Multi-Omic networks
https://www.ebi.ac.uk/training/online/courses/network-analysis-of-protein-interaction-data-an-introduction/types-of-biological-networks/
Structural similarity networks
•Use structure to generate a molecular fingerprint
•Calculate similarities between metabolites based on their fingerprints
BMC Bioinformatics 2012, 13:99 doi:10.1186/1471-2105-13-99
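The two steps above can be illustrated with the Tanimoto coefficient, one common fingerprint similarity measure; the slide does not say which measure is used, so this choice is an assumption. The fingerprints below are made-up sets of "on" bit positions, whereas real fingerprints would be generated from chemical structures by a cheminformatics tool.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared bits divided by total distinct bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical fingerprints: each set lists the positions of the bits that are on.
fingerprints = {
    "m1": {1, 4, 7, 9},
    "m2": {1, 4, 7, 12},
    "m3": {2, 3},
}

# Connect metabolites whose similarity meets a (hypothetical) threshold.
threshold = 0.5
similarity_edges = [
    (a, b, tanimoto(fingerprints[a], fingerprints[b]))
    for a, b in combinations(sorted(fingerprints), 2)
    if tanimoto(fingerprints[a], fingerprints[b]) >= threshold
]
```

Only m1 and m2 share enough bits to be connected; raising or lowering the threshold controls how dense the resulting structural similarity network is.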
Structural similarity example
Combining networks
Treatment Effects Network
Metabolites
• Shape = increase/decrease
• Size = importance (loading)
• Color = correlation
Connections
• violet = Biochemical relationships
• green = Structural similarity
https://dgrapov.github.io/MetaMapR/gallery.html
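The mapping in the legend above could be prepared as node and edge attribute tables before import into a network viewer. This sketch is illustrative only: the function names, the shape names, the size formula, and the example values are all hypothetical, and only the violet and green edge colors come from the slide.

```python
def node_attributes(name, fold_change, loading, correlation):
    """Map per-metabolite results to visual node attributes (hypothetical encodings)."""
    return {
        "id": name,
        "shape": "triangle" if fold_change >= 1 else "vee",  # increase vs. decrease
        "size": 10 + 40 * abs(loading),                      # importance (loading)
        "color_value": correlation,                          # mapped to a color scale later
    }


# Edge colors follow the legend: violet = biochemical, green = structural similarity.
EDGE_COLORS = {"biochemical": "violet", "structural": "green"}


def edge_attributes(source, target, kind):
    """Describe one connection and the color implied by its type."""
    return {"source": source, "target": target, "type": kind, "color": EDGE_COLORS[kind]}


# Hypothetical results for two metabolites and the connections between them.
nodes = [
    node_attributes("glucose", fold_change=1.8, loading=0.9, correlation=0.7),
    node_attributes("lactate", fold_change=0.6, loading=-0.4, correlation=-0.5),
]
edges = [
    edge_attributes("glucose", "lactate", "biochemical"),
    edge_attributes("glucose", "lactate", "structural"),
]
```

Keeping the statistics as plain columns and applying the visual mapping last makes it easy to change the encoding without recomputing the network.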
Your turn
Follow along with this tutorial:
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/network/
Network refinement and visualization
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/cytoscape/
learn
• Cytoscape basics
map variables to
• node attributes
• edge attributes
optimize
• layout
• legend
• publication quality figure
Your turn
Follow along with this tutorial:
https://creativedatasolutions.github.io/CDS.courses/courses/network_mapping_101/docs/partial/cytoscape/




Network mapping 101 course