Protein-Protein Interactions Prediction
Sergey Knyazev
November 21, 2012
Outline
1) Introduction
2) Protein-Protein interaction
3) Protein-Protein interaction databases
4) Protein-Protein interaction prediction
Introduction
There Is a Huge Number of
Interactions in a Cell
Example: Possible molecular interactions
in a spreading cell.
There Are Many Ways Biomolecules
Interact in a Cell
Protein-Protein Interaction
●
Physical contacts with molecular docking between
proteins that occur in a cell or in a living organism.
●
Not just a “functional contact”: the many other types of
functional links between biomolecular entities (genes,
proteins, metabolites, etc.) in living organisms should
not be confused with physical protein interactions.
●
A “specific contact”, not just any proteins that bump
into each other by chance.
●
Interactions that a protein experiences while it is being
made, folded, quality checked, or degraded should be
excluded.
Protein-Protein Interaction (PPI)
Detection
PPI Detection Methods
Protein-Protein Interaction
Databases
●
BIND - Biomolecular Interaction Network Database;
●
BioGRID - Biological General Repository for
Interaction Datasets;
●
DIP - Database of Interacting Proteins;
●
IntAct - IntAct Molecular Interaction Database;
●
HPRD - Human Protein Reference Database
●
MINT - Molecular INTeraction database;
●
PIPs - Human PPI Prediction database;
●
STRING - Known and Predicted Protein-Protein
Interactions.
PPI Network Derived from
Databases
PIPs human PPIs database
●
Contains predictions of 37 000 high probability
interactions of which 34 000 are not reported in the
interaction databases HPRD, BIND, DIP or OPHID.
●
Interactions predicted by a naive Bayesian model.
The method combines information from gene co-
expression, orthology, co-occurrence of domains,
post-translational modifications, co-localization of
the proteins within the cell and analysis of the local
topology of the predicted PPI network.
●
Based on the prediction algorithm described below.
Protein-Protein Interaction
Prediction
●
The prediction of human protein-protein
interactions was investigated in a Bayesian
framework by considering combinations of
individual protein features known to be indicative
of interaction.
●
Seven individual features are used.
●
The features are grouped into five distinct
modules: Expression (E), Orthology (O),
Combined (C), Disorder (D), Transitive (T).
Expression Module
●
Data Source:
–
GDS596 from the Gene Expression Omnibus
●
Description:
–
Gene co-expression profiles from 79 physiologically normal
tissues obtained from various sources
●
Scoring function:
–
Pearson correlation of coexpression over all conditions
●
Bins:
–
20 of equal size covering the correlation value
range (-1 to +1)
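The scoring and binning step of the Expression module can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example profiles are hypothetical, and "equal size" is assumed to mean equal-width bins over the range -1 to +1.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / math.sqrt(var_x * var_y)

def correlation_bin(r, n_bins=20):
    """Map r in [-1, 1] to one of n_bins equal-width bins (0 .. n_bins - 1)."""
    r = max(-1.0, min(1.0, r))  # guard against floating-point overshoot
    # r = +1 would land in bin n_bins, so clamp it into the last bin.
    return min(int((r + 1.0) * n_bins / 2.0), n_bins - 1)

# Hypothetical expression profiles of two genes across five tissues.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson(gene_a, gene_b)
print(round(r, 3), correlation_bin(r))
```

Each bin then gets its own likelihood ratio, estimated from how often known interacting and non-interacting pairs fall into it.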
Orthology Module
●
Data Source:
–
InParanoid, BIND, DIP and GRID databases
●
Description:
–
Interactions of homologous protein pairs from yeast, fly, worm and human
●
Scoring function:
–
Organism-based using InParanoid score
●
13 Bins:
–
High, medium and low confidence bins were defined for human protein pairs that have
interacting orthologs in either yeast, fly or worm (for a total of 9 bins)
–
two bins for human pairs that have interacting paralogs in human (medium and low
confidence)
–
one bin for human pairs that have interacting homologs in more than one organism
–
one bin for human pairs that have only noninteracting orthologs
Combined Module
●
This module incorporates three distinct features in a nonnaïve
Bayesian framework: subcellular localization, domain co-
occurrence and post-translational modification co-occurrence.
●
Localization:
–
Data source:
●
PSLT predictions
–
Description:
●
PSLT is a human subcellular localization predictor that considers nine different
compartments (ER, Golgi, cytosol, nucleus, peroxisome, plasma membrane,
lysosome, mitochondria and extracellular)
–
Scoring function:
●
Qualitative score: proximity of compartments
–
4 bins:
●
same, neighboring, different compartments, or not localized
Combined module
●
Domain co-occurrence
–
Data source:
●
InterPro and Pfam
–
Description:
●
Protein domains and motifs
–
Scoring function:
●
The chi-square test was used as a measure of the likelihood of co-
occurrence of specific InterPro domains and motifs in protein pairs
●
Chi-square scores were calculated for all pairs of domains/motifs
that occurred in the training data
–
Bins:
●
5 covering range of Chi-square scores
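As an illustration of the scoring idea, the sketch below computes the standard Pearson chi-square statistic for a 2x2 contingency table built for one domain pair. The slide does not state how the table was constructed, so the layout (domain pair present or absent versus interacting or non-interacting protein pairs), the function name and the counts are assumptions.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
      a: interacting protein pairs that contain the domain pair
      b: non-interacting protein pairs that contain the domain pair
      c: interacting protein pairs that lack it
      d: non-interacting protein pairs that lack it
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        return 0.0  # degenerate table: no association can be measured
    return n * (a * d - b * c) ** 2 / denom

# Hypothetical counts for one domain pair in a training set of 1000 protein pairs.
score = chi_square_2x2(30, 10, 70, 890)
print(round(score, 1))
```

A larger statistic means the domain pair is distributed less evenly between interacting and non-interacting pairs; the scores are then grouped into the five bins.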
Combined module
●
PTM co-occurrence
–
Data source:
●
HPRD and UniProt
–
Description:
●
Post-translational modifications
–
Scoring function:
–
Bins:
●
4 covering range of PTM scores
Disorder Module
●
Data source:
–
VSL2 predictions
●
Description:
–
Prediction of protein intrinsic disorder
●
Scoring function:
–
Sum of the percent disorder for each protein in a pair
●
Bins:
–
6 covering range of scoring function (0 to 200%)
Transitive Module
●
Description:
–
Module that considers local
topology of underlying network
predicted using combinations of
above features
●
Scoring function:
●
Bins:
–
5 covering range of scoring
function
Independence of the Modules
●
The final likelihood ratio output by the predictor is only
representative of the true likelihood of interaction of a protein pair if
the modules considered are independent. If the modules were not
independent, some likelihood ratios would likely be overestimated.
●
Previous studies have demonstrated that some of the features
considered here are indeed independent.
●
Independence of all modules used in our predictor was verified by
calculating Pearson correlation coefficients for all pairs of modules.
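The pairwise check described above could look roughly like this. It is a sketch under assumptions: the module scores are made up, and the names are hypothetical. Each module is represented by the scores it assigns to the same list of protein pairs.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def pairwise_module_correlations(scores):
    """Return {(module_1, module_2): r} for every unordered pair of modules.

    scores maps a module name to its scores for the same protein pairs,
    listed in the same order for every module.
    """
    return {
        (m1, m2): pearson(scores[m1], scores[m2])
        for m1, m2 in combinations(sorted(scores), 2)
    }

# Hypothetical module scores for four protein pairs.
scores = {
    "E": [1.0, 2.0, 3.0, 4.0],
    "O": [2.0, 4.0, 6.0, 8.0],
    "D": [4.0, 3.0, 2.0, 1.0],
}
for pair, r in pairwise_module_correlations(scores).items():
    print(pair, round(r, 2))
```

Pearson correlation only detects linear dependence, so coefficients near zero support the independence assumption but do not prove it.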
Architecture of the Predictor and
Likelihoods of the Modules
Posterior Odds Ratio Estimation
●
f1, … , fn — features
●
I — interaction
●
~I — non-interaction
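The formula from this slide did not survive extraction. Under the independence assumption, the standard naive Bayes form is: posterior odds = prior odds × the product over i of P(fi | I) / P(fi | ~I). The sketch below estimates a likelihood ratio for one bin from training counts and combines several ratios. All numbers, including the prior odds, are hypothetical, and the function names are not from the source.

```python
from math import prod

def likelihood_ratio(pos_in_bin, pos_total, neg_in_bin, neg_total):
    """Estimate P(bin | I) / P(bin | ~I) from training-set counts."""
    if neg_in_bin == 0:
        raise ValueError("no negatives in bin: ratio is undefined without smoothing")
    return (pos_in_bin / pos_total) / (neg_in_bin / neg_total)

def posterior_odds(prior_odds, likelihood_ratios):
    """Naive Bayes combination: prior odds times the product of the ratios."""
    return prior_odds * prod(likelihood_ratios)

def odds_to_probability(odds):
    """Convert odds O = P / (1 - P) back to a probability."""
    return odds / (1.0 + odds)

# Hypothetical example: 50 of 100 positives and 5 of 1000 negatives fall in a bin.
lr_expression = likelihood_ratio(50, 100, 5, 1000)

# Hypothetical prior odds and likelihood ratios from two further modules.
odds = posterior_odds(0.001, [lr_expression, 10.0, 2.0])
print(odds, odds_to_probability(odds))
```

Posterior odds above 1 mean that an interaction is more likely than not under the model.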
Accuracy of the Predictors
●
In order to analyze the predictions, five-fold cross validation
experiments were performed and the area under partial ROC
(receiver operator characteristic) curves (partial AUCs) measured.
●
T is the total number of positives in the test set
●
Ti is the number of positives that score higher than the ith highest
scoring negative
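The formula relating T and Ti is missing from the extracted slide. One commonly used definition of a partial AUC that matches these symbols is the ROC-n score, (1 / (n · T)) × the sum of Ti for i = 1 to n, where n is the number of top-scoring negatives considered. The sketch below assumes this definition; the scores are hypothetical.

```python
def roc_n(pos_scores, neg_scores, n):
    """Partial AUC up to the n-th highest-scoring negative (ROC-n).

    Computes (1 / (n * T)) * sum over i = 1..n of T_i, where T is the number
    of positives and T_i is the number of positives scoring higher than the
    i-th highest-scoring negative.
    """
    total_pos = len(pos_scores)
    top_negs = sorted(neg_scores, reverse=True)[:n]
    if total_pos == 0 or n <= 0 or len(top_negs) < n:
        raise ValueError("need at least one positive and n negatives")
    summed = 0
    for neg in top_negs:
        summed += sum(1 for p in pos_scores if p > neg)
    return summed / (n * total_pos)

# Hypothetical predictor scores for a small test set.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.1]
print(roc_n(positives, negatives, n=2))
```

A score of 1 means every positive outranks the n highest-scoring negatives, so the measure focuses on the top of the ranked list, where predictions are most likely to be used.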
Prediction Accuracy of Different Combinations of Modules
PPI Prediction by Single Module
PPI Prediction by Combination of
Modules
Receiver Operator Characteristic
(ROC)
Comparison with Other Interaction
Datasets
●
Estimated datasets:
–
Rhodes probabilistic dataset
–
LR400 (derived from our predictors)
–
Lehner orthology-derived dataset
●
The false positive rates:
●
Reference datasets:
–
Literature-mined Ramani dataset
–
Human Protein Reference Database
(HPRD)
Comparison with Other Interaction
Datasets
Independent Validation
Conclusion
●
Predicted over 37000 human protein
interactions
●
Explored a subspace of the human
interactome that has not been
investigated by previous large
interaction datasets.
References
●
Protein–Protein Interactions Essentials: Key Concepts to
Building and Analyzing Interactome Networks 2010
Javier De Las Rivas, Celia Fontanillo
●
PIPs: human protein–protein interaction prediction
database 2008
Mark D. McDowall, Michelle S. Scott and Geoffrey J. Barton
●
Probabilistic prediction and ranking of human protein-
protein interactions 2007
Michelle S Scott and Geoffrey J Barton
Thank you!
