Open Source Bioinformatics for Data Scientists
Amanda Schierz
Recent Projects
• Druggability prediction
- 3D structure
- Protein sequence
- Predict a protein’s druggability based on its position in the protein-protein interaction network
• Drug resistance
• Therapeutic opportunities
- Identification of new gene targets for cancer
- Are they druggable?
• Candidate compounds
- Compounds more likely to be a hit for a bioassay
Drug Discovery Process
Early-stage: Discovery
• Target Evaluation
• Compound Screening
Optimisation
• Computational Chemistry
• Structure-based Drug Design
ADMET
• Absorption, Distribution, Metabolism, Excretion, Toxicity
Clinical Trials
• Patient Stratification
• Protocol
Paperwork
• Drug Approval
Biology 101
• There is a many-to-many relationship between Gene and Protein
• A Protein is a large molecule; a Drug is a small molecule
• Gene Expression data
- The amount of gene product (RNA transcript) produced from a gene. Epigenetics.
- highly / lowly / over / under expressed – fold change
- Warning: Platforms and preprocessing
• Gene Copy Number
- Loss / Gain of a gene
- On one strand or 2?
• There are only approx. 400 genetic targets of approved pharmaceuticals
- Only from a handful of Protein Families
- Desperate need for diversity
• TCGGTCAGGCTAGCCGTTACAGGG
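The "over / under – fold change" bullet can be made concrete with a small sketch. The function names, the pseudocount and the two-fold cut-off (a log2 fold change of 1) are assumptions chosen for illustration, not values from the talk, and real expression data would first need the platform-specific preprocessing that the slide warns about.

```python
import math

def log2_fold_change(tumour, normal, pseudocount=1.0):
    """log2 ratio of tumour to normal expression.

    The pseudocount (an assumption) avoids division by zero for
    genes with no measured expression in the normal sample.
    """
    return math.log2((tumour + pseudocount) / (normal + pseudocount))

def call_expression(lfc, threshold=1.0):
    """Label a gene as over / under expressed; the threshold is illustrative."""
    if lfc >= threshold:
        return "over"
    if lfc <= -threshold:
        return "under"
    return "unchanged"

# Hypothetical expression values: gene -> (tumour, normal)
samples = {"GENE_A": (31.0, 7.0), "GENE_B": (1.0, 7.0), "GENE_C": (10.0, 9.0)}
calls = {gene: call_expression(log2_fold_change(t, n))
         for gene, (t, n) in samples.items()}
print(calls)
```

Working on the log2 scale keeps over- and under-expression symmetric: a doubling is +1 and a halving is -1.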
Target Identification
• Prediction of disease-associated genes
- patient level
- gene / protein level
- network
• Prediction of mechanisms of disease
• Epigenetic targets – meta-targets
• Prediction of protein function – from sequence / structure / network
- multi-class; multi-label
• Prediction of 3D structure
• Prediction of protein binding
• New immune targets
Druggability Prediction
• Drugs – FDA approved, ~350. Very strict – known therapeutic benefit
• DrugBank – loose – binds but no therapeutic benefit
• Tractable or Druggable
• Rule of 5 compliant
• Precedence-based
- Druggable families / Homology
- Ligand-based scoring
- UniProt, bioassays – EBI and PubChem BioAssay
- Statistical analysis
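For readers unfamiliar with the "Rule of 5" bullet, the sketch below checks Lipinski's rule of five on precomputed compound descriptors. The thresholds are the standard ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The class name, the descriptor values and the tolerance of one violation are assumptions for illustration; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int

def rule_of_five_violations(d):
    """Count how many of Lipinski's four criteria a compound breaks."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)

def is_rule_of_five_compliant(d, max_violations=1):
    """Treat a compound as compliant if it breaks at most `max_violations`
    criteria (tolerating one violation is a common convention, assumed here)."""
    return rule_of_five_violations(d) <= max_violations

# Hypothetical descriptor values, not real compounds
small = Descriptors(mol_weight=350.4, logp=2.1, h_bond_donors=2, h_bond_acceptors=5)
large = Descriptors(mol_weight=812.0, logp=6.3, h_bond_donors=7, h_bond_acceptors=14)
print(is_rule_of_five_compliant(small), is_rule_of_five_compliant(large))
```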
Druggability Prediction
• Sequence Analysis
- Amino Acid motifs and composition
- Physicochemical descriptors
- near-infinite number of possible descriptors – very wide data set
- Supervised classification
• FASTA – can download all human sequences from UniProt
>seq0
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD
• R protr; R Bioconductor
• species,mhc,peptide_length,A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V,scl1.lag1,scl2.lag1,scl1.lag2,scl2.lag2,scl1.2.lag1,scl2.1.lag1,scl1.2.lag2,scl2.1.lag2,AA,RA,NA,DA,CA,EA,QA,GA,HA,IA,LA,KA,MA,FA,PA,SA,TA,WA,YA,VA,AR,RR,NR,DR,CR ..... ,Schneider.Xr.K,Schneider.Xr.M,Schneider.Xr.F, Grantham.Xr.A,Grantham.Xr.R,
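The first descriptor columns in that header (A, R, N, ... and the two-letter pairs) correspond to amino acid and dipeptide composition. The talk uses R packages for this; the following is only a minimal Python sketch of the same idea, with function names of my own choosing, applied to the FASTA record shown on the slide.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in the column order of the header above
AMINO_ACIDS = "ARNDCEQGHILKMFPSTWYV"

def parse_fasta(text):
    """Return {record_id: sequence} from FASTA-formatted text."""
    records, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records

def amino_acid_composition(seq):
    """Fraction of each amino acid in the sequence (20 features)."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in AMINO_ACIDS}

def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair (400 features)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return {a + b: pairs[a + b] / total
            for a, b in product(AMINO_ACIDS, repeat=2)}

fasta = ">seq0\nFQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTD\n"
seq = parse_fasta(fasta)["seq0"]
aac = amino_acid_composition(seq)
dpc = dipeptide_composition(seq)
print(len(seq), len(aac), len(dpc))
```

Even these two descriptor families already give 420 columns per protein, which is how the data set becomes very wide so quickly.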
Druggability Prediction
• 3D structure
- Pockets, surface area
- Ligand interaction fingerprints
- Supervised classification
3D Structure
• PDB, ProtDCal, PockDrug
Druggability Prediction
• Interaction Network
• Many use cases
• Data from EBI and Y2H (yeast two-hybrid) screens
• List of binary interactions
• Be careful 1: Data is inherently biased
• Be careful 2: Complex interactions
• R igraph; Gephi for visualisation
• Topological properties
• Community analysis
• Subgraph analysis
• Statistical analysis, network analysis and supervised classification
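As an illustration of turning a list of binary interactions into topological features, here is a minimal pure-Python sketch that computes each protein's degree and local clustering coefficient. The talk names R igraph for this step; the protein identifiers and helper names below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(edges):
    """Undirected adjacency sets from (protein_a, protein_b) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:              # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj

def degree(adj):
    """Number of interaction partners per protein."""
    return {node: len(neighbours) for node, neighbours in adj.items()}

def clustering(adj):
    """Local clustering coefficient: fraction of a protein's
    neighbour pairs that also interact with each other."""
    result = {}
    for node, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            result[node] = 0.0
            continue
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
        result[node] = 2 * links / (k * (k - 1))
    return result

# Hypothetical binary interactions
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
adj = build_graph(edges)
deg = degree(adj)
cc = clustering(adj)
print(deg, cc)
```

Per-protein values like these can be used as feature columns for supervised classification, keeping in mind the bias warning above: well-studied proteins tend to have more recorded interactions.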
Drug Resistance
Compound Bioactivity
• Brute force mass screening
• 1000s of compounds screened in batches
• Primary assays; secondary / confirmatory assays
• Can be binary classification or regression
• The IC50 is a measure of potency: the concentration of a compound needed to inhibit a biological process by 50% (lower means more potent).
• Active / inactive: IC50 threshold
• Goal is also to identify diverse compound structures
• Scaffold Hopping
• Same kind of method as Protein Sequence conversion
• Pharmacophore fingerprints
• https://www.chemaxon.com/free-software/
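The choice between regression and binary classification can be sketched as follows: pIC50 (the negative log10 of the IC50 in molar units) is a common regression target, while an IC50 cut-off gives active / inactive labels. The 1000 nM threshold, the compound names and the measurements are assumptions for illustration; the appropriate cut-off depends on the assay.

```python
import math

def pic50_from_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)

def label_active(ic50_nm, threshold_nm=1000.0):
    """Binary label: active if the IC50 is at or below the threshold.
    The default threshold (1 micromolar) is illustrative only."""
    return ic50_nm <= threshold_nm

# Hypothetical assay measurements in nM
measurements = {"cmpd_A": 25.0, "cmpd_B": 1000.0, "cmpd_C": 48000.0}
labels = {name: label_active(value) for name, value in measurements.items()}
targets = {name: pic50_from_nm(value) for name, value in measurements.items()}
print(labels)
print(targets)
```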
Compound ADMET
• Many use cases
• ADMET of hits
- Absorption
- Distribution
- Metabolism
- Excretion
- Toxicity
• Mutagenicity
• Protein binding
General Resources
• EBI (European Bioinformatics Institute) / PubChem
- API
- Integrates several downloadable Data Sources (expression, Copy Number, Bioassays, network, disease-specific)
- Baseline data (Normal, not diseased)
• Protein Data Bank – 3D Structures
• DrugBank
• Cancer – The Cancer Genome Atlas (TCGA) and International Cancer Genome Consortium (ICGC)
• Coding Tools – R Bioconductor, BioPerl, Biopython
• https://docs.chemaxon.com/display/docs/Documentation
General Resources
• canSAR database
- Integration of biological, pharmacological, chemical, structural biology and protein network data
Beware 101
• Non-standard Gene names
• Some experiments report Genes, some report Proteins
• We need new Drug Targets, different from established ones.
- Keep in mind when analysing results
• Cancer is difficult
- Drug resistance
- Data is not keeping up with the science
- Tumour Heterogeneity
• Wide data = random patterns
• Different expression / sequencing platforms
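The "Wide data = random patterns" warning can be demonstrated with pure noise. In the sketch below every feature and the target are independent random numbers, yet with enough features at least one of them correlates strongly with the target by chance. The sample and feature counts are arbitrary choices for illustration.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_abs_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a random target and random features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best

narrow = max_abs_correlation(n_samples=20, n_features=5)
wide = max_abs_correlation(n_samples=20, n_features=2000)
print(f"5 features: {narrow:.2f}   2000 features: {wide:.2f}")
```

Neither result reflects any real signal, which is why apparent patterns in wide data sets need held-out validation or a multiple-testing correction before they are trusted.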
Therapeutic Opportunities
• Only approximately 350–400 protein targets
• DNA damage response (DDR) is essential for maintaining the genomic integrity of the cell
• Currently targeted by chemotherapy and radiation. Goal is for small molecule targeting
• TCGA Patient Analysis: Expression, Copy Number Variation and Mutation data.
- 15 cancer disease types
• Telegraph, March 2015
- New drugs to tackle cancer cell weak spots could end 'scattergun' chemotherapy
Laurence H. Pearl, Amanda C. Schierz, Simon E. Ward, Bissan Al-Lazikani, Frances M. G. Pearl. Therapeutic opportunities within the DNA Damage Response. Nature Reviews Cancer
Therapeutic Opportunities
• Statistical analysis of DDR deregulation in patients compared to a random set of genes
• Druggability prediction of deregulated DDR genes
• Synthetic Lethality analysis of Yeast DDR orthologues
- Two genes are synthetic lethal if mutation of either alone is tolerated but mutation of both leads to cell death. Targeting a gene that is synthetic lethal to a cancer-relevant mutation should, in theory, kill only cancer cells.
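One way to read "compared to a random set of genes" is as an empirical (resampling) test: measure how often a randomly drawn gene set of the same size is at least as deregulated as the DDR set. The sketch below is written under that assumption and is not the published analysis; the gene identifiers, set sizes and number of draws are hypothetical.

```python
import random

def deregulated_fraction(genes, deregulated):
    """Fraction of the given genes that are flagged as deregulated."""
    return sum(g in deregulated for g in genes) / len(genes)

def empirical_p_value(gene_set, all_genes, deregulated, n_draws=2000, seed=0):
    """Compare a gene set with random gene sets of the same size.

    Returns the observed deregulated fraction and the proportion of
    random draws that reach or exceed it (with +1 smoothing so the
    estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = deregulated_fraction(gene_set, deregulated)
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(all_genes, len(gene_set))
        if deregulated_fraction(sample, deregulated) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_draws + 1)

# Hypothetical data: 1000 genes, 10% deregulated overall,
# and a 20-gene pathway in which 12 genes are deregulated.
all_genes = [f"G{i}" for i in range(1000)]
deregulated = set(all_genes[:100])
pathway = all_genes[:12] + all_genes[500:508]
observed, p_value = empirical_p_value(pathway, all_genes, deregulated)
print(observed, p_value)
```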
Therapeutic Opportunities
DDR Pathway Signatures
