1
2
Elia Brodsky
Co-founder and CEO
Pine Biotech (pine-biotech.com)
Survey: bit.ly/2KG7iEj
Georgetown
3
Overview
Education & Training
Research Support
4
5
Figure labels: Patients; Most important clinical features; Normalized clinical features; Missing values; Group-to-group associations (bipartite network); Omics features; Prediction
Enrichment of Clinical Data
The clinical characteristics of greatest interest can be predicted
from other clinical features. A group-to-group association network
of this kind can be built from a normalized set of clinical features
across patients, which helps identify patients at risk or select
patients for whom specific data should be generated. Validated
associations can then become part of a clinical application used
routinely to select patients or match them to requirements.
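A minimal sketch of a group-to-group association between two sets of clinical features. The patient values, feature names, and the 0.8 threshold are hypothetical and chosen only for illustration. Pearson correlation is used as the association measure because it is insensitive to the scale of each feature; missing values (None) are skipped pairwise.

```python
from math import sqrt

# Hypothetical toy data: one dict per patient, None marks a missing value.
patients = [
    {"age": 61, "bmi": 31.0, "glucose": 7.9, "hba1c": 7.4},
    {"age": 45, "bmi": 24.5, "glucose": 5.1, "hba1c": 5.3},
    {"age": 70, "bmi": None, "glucose": 8.4, "hba1c": 7.9},
    {"age": 38, "bmi": 22.0, "glucose": 4.8, "hba1c": None},
    {"age": 55, "bmi": 28.5, "glucose": 6.6, "hba1c": 6.5},
]
predictors = ["age", "bmi"]       # group 1: routinely available features
targets = ["glucose", "hba1c"]    # group 2: features of most interest


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def association(a, b):
    """Correlate features a and b over patients where both are present."""
    pairs = [(p[a], p[b]) for p in patients
             if p[a] is not None and p[b] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Bipartite network: every edge links a predictor to a target feature.
edges = {(a, b): association(a, b) for a in predictors for b in targets}
strong = {edge: r for edge, r in edges.items() if abs(r) >= 0.8}

for (a, b), r in sorted(strong.items()):
    print(f"{a} -> {b}: r = {r:.2f}")
```

With real data, the retained edges would still need validation on independent patients before being used to select or match patients.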
Integration of Omics Data
Omics data represents a feature-rich resource that can be
used for discovery of underlying biology linked to a
particular condition, disease stage, co-morbidities and
population stratification. Integrating the omics data into a
usable, normalized resource and linking it to clinical
outcomes can enable precision targeting of patients and
more precise treatment for complex diseases.
Machine Learning Applications
6
Exploring Ideas | Experiment Design | Research and Development
Data is playing a key role in all stages of R&D:
The National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) repository alone contains 80,985 public datasets, spanning hundreds of tissue types in thousands of organisms.
Omics Data: What’s out there?
7
8
Education & Training
Research Support
Dataset: bit.ly/2vNovGC
9
10
Online resources | Curated Datasets | User-friendly Tools
11
The T-BioInfo Platform is at the core of this experience
Server.t-bio.info
12
13
14
INDEPENDENT PROJECTS (ONCOLOGY SPECIALIZATION EXAMPLE)
Projects are prepared from high-impact publications relevant to the specialization. Public-domain datasets are curated into focused
assignments that illustrate how the data is processed and used to achieve results similar to those in the publication. Other approaches that can perform a
similar function are discussed, providing a review of the methods section of the paper. Finally, the full dataset is organized into a project format that
can be analyzed for discovery.
16
https://edu.t-bio.info/workshops/
17
FALL 2018
USA:
Louisiana Biomedical Research Network – September 2018
Georgetown University – October 2018
INDIA:
August 25 – Workshop in Kolkata (APT Software at Salt Lake)
September 6 – AIIMS (Theatre Room)
ONLINE:
September 15 – Short Term Training in Next Generation Sequencing (Transcriptomics and Genomics)
October 15 – Short Term Training in Machine Learning for Biomedical Data
18
server.t-bio.info
Login: test@pine-biotech.com
Password: WF(9iobE
Files: bit.ly/2vNovGC
T-BioInfo is designed for processing, analysis, and
integration of multi-omics data. The platform is used by
multiple research groups to extract meaningful insights
from large multi-omics datasets. Our current effort
extends to education, enabling more people to
extract meaningful, data-driven insights from omics
datasets with biomedical applications. To learn more
about the platform and its research and educational
features, follow the highlighted links.
T-bio.info | edu.t-bio.info | server.t-bio.info
19
Modeling Precision Medicine
Machine Learning for Transcriptomics Data: Extracting
meaningful insights from high-throughput biomedical data.
20
Clinical Subtypes | Molecular Subtypes
21
Diagnosis, Prognosis, Response to Treatment
22
Survival prediction
Treatment Selection
Oncotype DX | PAM50
Daemen et al., 2013, “Modeling precision treatment of breast cancer”: an analysis of more than 70 breast cancer cell lines and more than 90
therapeutic agents. https://genomebiology.biomedcentral.com/articles/10.1186/gb-2013-14-10-r110
         Sample 1  Sample 2  Sample 3  Sample 4
gene 1          4         3         3         7
gene 2          6         5         5         8
gene 3          6         6         6         6
gene 4          1         2         1         2
gene 5          9        10         1         5
gene 6         12         4         0         5
gene 7          1         7         9         8
gene 8          4         8         3        10
23
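A small sketch of how a table like the one above might be held in code and screened before analysis. The values are copied from the table; ranking genes by variance and dropping constant genes is one common filtering choice, used here as an assumption rather than as the platform's own method.

```python
from statistics import pvariance

samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4"]
expression = {
    "gene 1": [4, 3, 3, 7],
    "gene 2": [6, 5, 5, 8],
    "gene 3": [6, 6, 6, 6],
    "gene 4": [1, 2, 1, 2],
    "gene 5": [9, 10, 1, 5],
    "gene 6": [12, 4, 0, 5],
    "gene 7": [1, 7, 9, 8],
    "gene 8": [4, 8, 3, 10],
}

# Population variance of each gene across the four samples.
variances = {gene: pvariance(values) for gene, values in expression.items()}

# Genes ordered from most to least variable.
ranked = sorted(variances, key=variances.get, reverse=True)

# A gene that never changes cannot separate samples, so it is dropped.
informative = [gene for gene in ranked if variances[gene] > 0]

for gene in ranked:
    print(f"{gene}: variance = {variances[gene]:.2f}")
```

Here gene 3 has the same value in every sample, so it carries no information for grouping samples, while gene 6 varies the most.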
24
server.t-bio.info
Login: test@pine-biotech.com
Password: WF(9iobE
Files: https://bit.ly/2n54U0A
Files we will use in this session
25
/export-data/sciservice/data/pipelines/5629cc88e71b7bf5/upload/SRR925687_1.fq
/export-data/sciservice/data/pipelines/5629cc88e71b7bf5/upload/SRR925687_2.fq
/export-data/sciservice/data/pipelines/5629cc88e71b7bf5/upload/SRR925697_1.fq
/export-data/sciservice/data/pipelines/5629cc88e71b7bf5/upload/SRR925697_2.fq
Part 1:
RNA-Seq Processing
from raw reads to a table of expression
26
RNA-Seq: overview
Figure: genome sequence .…TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA…. with Gene A, Gene B, and Gene C marked, transcript labels (Transcript A, Transcript C), and reads
RNA-Seq: overview
Preprocessing:
• Adapter removal plus additional trimming
• Removal of PCR duplicates
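The two preprocessing steps above can be sketched as follows. This is a simplified illustration, not the pipeline used on the platform: the adapter string is the commonly used Illumina adapter prefix and is an assumption here, the reads are made up, and duplicates are detected by identical sequence, whereas production tools usually mark duplicates after mapping or with molecular barcodes.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumption: common Illumina adapter prefix


def trim_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Cut the read at a full adapter match, or at a partial match at its 3' end."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Look for the longest adapter prefix that the read ends with.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def remove_duplicates(reads):
    """Keep the first copy of each distinct sequence, preserving order."""
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            unique.append(read)
    return unique


# Hypothetical raw reads: two identical reads with a full adapter,
# one read with a partial adapter at its end, one clean read.
raw_reads = [
    "TCTGAAACAATGCTTC" + ADAPTER,
    "TCTGAAACAATGCTTC" + ADAPTER,
    "CTAACTTATCATTCATTG" + ADAPTER[:7],
    "AATCTAACTTATCATT",
]

cleaned = remove_duplicates([trim_adapter(read) for read in raw_reads])
print(cleaned)
```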
29
Quantification of expression levels
Mapping:
• Mapping to the set of known transcripts
• Mapping to the genome (and potential identification of novel transcripts)
• Combined strategy
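A toy sketch of the first strategy, mapping reads to a set of known transcripts and counting hits. The transcript boundaries and the reads are hypothetical, and exact string matching stands in for a real aligner, which would tolerate mismatches and use an index.

```python
from collections import Counter

# Sequence fragment shown on the overview slide; the split into two
# transcripts is an assumption made only for this example.
GENOME = "TCTGAAACAATGCTTCAATCTAACTTATCATTCATTGGGA"
transcripts = {"Transcript A": GENOME[:20], "Transcript C": GENOME[20:]}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def map_read(read, references):
    """Return (reference name, 0-based position, strand) of the first exact hit, else None."""
    for name, ref in references.items():
        for strand, query in (("+", read), ("-", reverse_complement(read))):
            pos = ref.find(query)
            if pos != -1:
                return name, pos, strand
    return None


reads = ["GAAACAATGC", "TTATCATTCA", "TGAATGATAA", "CCCCCCCCCC"]
hits = [map_read(read, transcripts) for read in reads]

# Quantification in its simplest form: number of reads per transcript.
counts = Counter(hit[0] for hit in hits if hit is not None)
print(hits)
print(dict(counts))
```

A read that does not match any known transcript stays unmapped in this strategy; mapping to the genome is what allows such reads to point at novel transcripts.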
RNA-Seq: overview
30
RNA-Seq: basic pipeline
31
RNA-Seq: extended pipeline
32
Expression Table
Sample Name
Gene ID
What is this number?
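What the number in an expression table means depends on the pipeline: it may be a raw read count or a normalized value. The sketch below shows two common normalizations, counts per million (CPM) and transcripts per million (TPM). The counts and gene lengths are hypothetical, and the slides do not state which unit the platform reports.

```python
# Hypothetical raw read counts and gene lengths (base pairs) for one sample.
counts = {"gene 1": 400, "gene 2": 600, "gene 3": 1000}
lengths_bp = {"gene 1": 1000, "gene 2": 3000, "gene 3": 2000}


def cpm(counts):
    """Counts per million: corrects for sequencing depth only."""
    total = sum(counts.values())
    return {gene: count / total * 1e6 for gene, count in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: corrects for gene length, then for depth."""
    per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    scale = sum(per_kb.values())
    return {gene: rate / scale * 1e6 for gene, rate in per_kb.items()}


cpm_values = cpm(counts)
tpm_values = tpm(counts, lengths_bp)

for gene in counts:
    print(f"{gene}: count={counts[gene]} "
          f"CPM={cpm_values[gene]:.0f} TPM={tpm_values[gene]:.0f}")
```

In this example gene 2 has more reads than gene 1, but because it is three times longer its TPM is lower, which is why the unit matters when comparing genes.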
33
Part 2:
Machine Learning
Data exploration and classification
34
35
Group 1
Group 2
Outlier
Unsupervised analysis: PCA
36
Why use Principal Component Analysis?
• Explore data
• Visualize
Considerations:
• Data Filtering
• Outliers
• Interpretation
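A minimal PCA sketch on the four-sample, eight-gene example table from earlier in the deck. It uses only the standard library, with power iteration standing in for the eigen-decomposition that a numerical library would normally provide; function names are illustrative. Each sample ends up with two coordinates that can be plotted to look for groups and outliers.

```python
import random
from math import sqrt

samples = ["Sample 1", "Sample 2", "Sample 3", "Sample 4"]
# One row per sample, one column per gene (gene 1 .. gene 8).
rows = [
    [4, 6, 6, 1, 9, 12, 1, 4],
    [3, 5, 6, 2, 10, 4, 7, 8],
    [3, 5, 6, 1, 1, 0, 9, 3],
    [7, 8, 6, 2, 5, 5, 8, 10],
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centered(rows):
    """Subtract each gene's mean so every column has mean zero."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Gene-by-gene sample covariance matrix of centered data."""
    cols = list(zip(*rows))
    return [[dot(a, b) / (len(rows) - 1) for b in cols] for a in cols]


def top_component(matrix, iterations=500, seed=0):
    """Dominant eigenvalue and unit eigenvector by power iteration."""
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in matrix]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, [dot(row, v) for row in matrix]), v


def deflate(matrix, eigenvalue, v):
    """Remove the found component so the next one becomes dominant."""
    return [[m - eigenvalue * vi * vj for m, vj in zip(row, v)]
            for row, vi in zip(matrix, v)]


X = centered(rows)
C = covariance(X)
total_variance = sum(C[i][i] for i in range(len(C)))
lam1, pc1 = top_component(C)
lam2, pc2 = top_component(deflate(C, lam1, pc1))
explained = (lam1 / total_variance, lam2 / total_variance)
scores = [(dot(row, pc1), dot(row, pc2)) for row in X]

for name, (x, y) in zip(samples, scores):
    print(f"{name}: PC1={x:.2f} PC2={y:.2f}")
print(f"explained variance: PC1={explained[0]:.0%} PC2={explained[1]:.0%}")
```

The sign of each component is arbitrary, so only relative positions of samples are meaningful. Filtering genes beforehand changes the covariance matrix and therefore the picture, which is why data filtering is listed as a consideration.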
37
Unsupervised analysis: PCA
38
Unsupervised analysis: PCA
PCA 7,000 genes | PCA PAM50 (35) genes
Normal-like
Basal
Claudin-low
Luminal
39
Unsupervised analysis: Hierarchical Clustering
Why use clustering?
• Identify groups
• Associate sample to group
Considerations:
• Various methods
• Random selection in some methods
• Interpretation
40
Unsupervised analysis: Hierarchical Clustering
Dendrogram
41
2 clusters
4 clusters
8 clusters
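A small sketch of how one dendrogram can yield 2, 4, or 8 clusters depending on where it is cut. It implements agglomerative clustering with average linkage in plain Python; the eight two-dimensional points are hypothetical, and average linkage with Euclidean distance is only one of the various possible method choices.

```python
from itertools import combinations
from math import dist

# Hypothetical samples: two distant super-groups, each made of two tight pairs.
points = [(0, 0), (0, 1), (4, 0), (4, 1),
          (20, 0), (20, 1), (24, 0), (24, 1)]


def average_linkage(c1, c2, points):
    """Mean Euclidean distance over all point pairs between two clusters."""
    total = sum(dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cut_tree(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: average_linkage(clusters[ab[0]], clusters[ab[1]], points),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return sorted(sorted(cluster) for cluster in clusters)


for k in (2, 4, 8):
    print(k, "clusters:", cut_tree(points, k))
```

This procedure is deterministic; the "random selection" consideration applies to methods such as k-means, where the starting centers are chosen at random.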
BREAK
42
Q&A
43
Dogs | Cats
?????
Training Set | Test Set
Supervised Machine Learning
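The training set and test set idea can be sketched with a deliberately simple classifier. The labeled examples (weight and shoulder height) are hypothetical, and a nearest-centroid rule is used here only as a stand-in; it is not one of the methods named later in the deck.

```python
import random
from math import dist

# Hypothetical labeled examples: (weight in kg, shoulder height in cm).
data = [
    ((30.0, 60.0), "dog"), ((25.0, 55.0), "dog"),
    ((35.0, 65.0), "dog"), ((28.0, 58.0), "dog"),
    ((4.0, 25.0), "cat"), ((5.0, 23.0), "cat"),
    ((3.5, 24.0), "cat"), ((4.5, 26.0), "cat"),
]


def split(data, test_fraction=0.25, seed=1):
    """Shuffle the examples and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train):
    """Learn one mean feature vector per label from the training set."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    return {label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in groups.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


train, test = split(data)
centroids = fit_centroids(train)
accuracy = sum(predict(centroids, x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.0%}")
```

The test examples are never seen during fitting, so the accuracy estimates how the model behaves on the unlabeled "?????" cases.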
44
Step-wise Linear Discriminant Analysis (swLDA)
45
Support Vector Machine (SVM) with Linear Kernel
d
d
46
Support Vector Machine (SVM) with Linear Kernel
?
?
47
Support Vector Machine (SVM) with Linear Kernel
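A rough sketch of a linear-kernel SVM trained by subgradient descent on the regularized hinge loss. The data points, learning rate, regularization strength, and number of epochs are all assumptions for illustration; a library implementation would normally be used in practice. The margin printed at the end corresponds to the distance d between the separating line and the closest points.

```python
from math import sqrt

# Hypothetical, linearly separable toy data: (feature vector, label -1 or +1).
data = [
    ((2, 2), 1), ((3, 3), 1), ((3, 2), 1), ((2, 3), 1),
    ((0, 0), -1), ((1, 0), -1), ((0, 1), -1), ((-1, -1), -1),
]


def train_linear_svm(data, lam=0.01, lr=0.01, epochs=5000):
    """Minimize lam/2 * |w|^2 + mean hinge loss with full-batch subgradient steps."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:  # point is inside the margin or misclassified
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify by the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


w, b = train_linear_svm(data)
margin = 1 / sqrt(sum(wi * wi for wi in w))
print("weights:", [round(wi, 2) for wi in w], "bias:", round(b, 2))
print("margin d:", round(margin, 2))
```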
BREAK
48
Q&A
Part 3:
Interpretation
Annotating and Interpreting Gene Expression
49
Gene annotation: ENSG to Gene Symbols plus GO
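A minimal sketch of the annotation step: converting Ensembl gene IDs (ENSG) to gene symbols and attaching Gene Ontology (GO) terms. The lookup tables are small hand-written examples and the GO terms are an illustrative, non-exhaustive subset; in practice the mappings would come from an annotation resource such as Ensembl BioMart. The function name and table names are hypothetical.

```python
# Hand-written example mappings; real tables would be downloaded from an
# annotation resource rather than typed in.
ENSEMBL_TO_SYMBOL = {
    "ENSG00000141510": "TP53",
    "ENSG00000012048": "BRCA1",
    "ENSG00000139618": "BRCA2",
}
SYMBOL_TO_GO = {
    "TP53": ["GO:0006915 apoptotic process"],
    "BRCA1": ["GO:0006281 DNA repair"],
    "BRCA2": ["GO:0006281 DNA repair"],
}


def annotate(gene_id):
    """Return (stable ID, symbol or None, GO terms) for an Ensembl gene ID."""
    stable_id = gene_id.split(".")[0]  # drop a version suffix such as ".17"
    symbol = ENSEMBL_TO_SYMBOL.get(stable_id)
    go_terms = SYMBOL_TO_GO.get(symbol, [])
    return stable_id, symbol, go_terms


for gene_id in ["ENSG00000141510.17", "ENSG00000012048", "ENSG00000000000"]:
    print(annotate(gene_id))
```

IDs that are missing from the table are returned with no symbol rather than dropped, so unannotated genes remain visible in the results.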
50
BREAK
51
Q&A
52
Download and Modify R Scripts
53
Upcoming Programs and Registration:
Elia Brodsky
Co-founder and CEO
Pine Biotech (pine-biotech.com)
bit.ly/2KG7iEj
https://edu.t-bio.info/workshops/
elia@pine.bio; sahil@pine.bio


User-friendly bioinformatics (Monthly Informational workshop)