Novel Semi-Supervised Probabilistic ML Approach to SNP Variant Calling
Nan Newton
Data Scientist, Global IT Analytics
StampedeCon 2017, St. Louis MO
Today's Digital Plant Breeding Is Powered by Our Knowledge of Genetics & Advanced Analytics
DNA Analysis Through Advanced ML
Automated Seed Chipping
Performance Evaluation
Superior Seeds Selected
(Lab / Field)
Single Nucleotide Polymorphism (SNP) variants help detect seeds with desired traits
Example: at one DNA position, some seeds carry C and others carry T (a C/T SNP).
SNPs, or molecular markers, serve as signposts.
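To make the C/T example concrete, here is a minimal sketch that lists single-base differences between two sequences. It assumes the sequences are already aligned to the same length; the function name `find_snps` and the example sequences are hypothetical, chosen only to reproduce a single C/T SNP.

```python
from typing import List, Tuple


def find_snps(ref: str, alt: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, ref_base, alt_base) for every mismatch.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(ref.upper(), alt.upper()))
        if a != b
    ]


if __name__ == "__main__":
    # Hypothetical sequences differing only by a C/T substitution.
    print(find_snps("AATCATGT", "AATTATGT"))  # [(3, 'C', 'T')]
```

Real variant discovery has to handle alignment, insertions, and sequencing errors; this only shows what "a single-nucleotide difference" means.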
Monsanto Company Confidential
Genotype-Phenotype Association
helps breeders reduce spending on field resources by selecting only seeds with the desired phenotypes.
Parent generation: P1 (CC) and P2 (TT)
F1 generation: CT, CT, CT, CT
F2 generation: CC, CT, CT, TT
Genotypes: CC = homozygous C, CT = heterozygous, TT = homozygous T
Goal: predict genotypes for any seed
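The cross above follows Mendelian segregation at a single locus. A minimal sketch, assuming one biallelic locus and equally likely gametes, enumerates the Punnett square and reproduces the F1 and F2 genotypes on the slide (the helper name `cross` is hypothetical):

```python
from collections import Counter
from itertools import product


def cross(parent1: str, parent2: str) -> Counter:
    """Enumerate a single-locus Punnett square.

    Each parent is a two-letter genotype such as "CT"; every gamete carries
    one allele and each pairing is equally likely. Alleles are sorted so
    "TC" and "CT" count as the same heterozygous genotype.
    """
    return Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))


if __name__ == "__main__":
    print("F1:", dict(cross("CC", "TT")))  # F1: {'CT': 4}
    print("F2:", dict(cross("CT", "CT")))  # F2: {'CC': 1, 'CT': 2, 'TT': 1}
```

The 1:2:1 ratio in F2 is why the genotype of each individual seed has to be measured rather than assumed.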
Genotype Detection
through molecular biology in high-throughput genotyping labs
1. Seeds are sent to the lab, and part of each seed is chipped off.
2. A DNA sample is obtained for each seed. Each seed carries two alleles (allele1 and allele2), e.g. AATCATGT / AATCATGT for one seed and AACTACGA / AACTACGA for another.
3. The double-helix DNA is uncoiled.
4. Fluorophores are added: FAM labels one allele and VIC the other.
5. Many copies of the DNA are made to generate a stronger fluorophore signal.
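Step 5 is an amplification step. Assuming PCR-style exponential copying (the slide does not name the chemistry), the copy count after n cycles is N = N0 * (1 + E)^n, where E is the per-cycle efficiency. A minimal sketch, with the hypothetical helper `pcr_copies`:

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized amplification yield: N = N0 * (1 + E) ** n.

    efficiency E is the fraction of templates duplicated per cycle
    (1.0 = perfect doubling). Real reactions plateau, so this only
    describes the exponential phase.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(pcr_copies(1, 30))  # 1073741824.0 (2**30 with perfect doubling)
```

The exponential growth is what turns a tiny DNA sample into a fluorescence signal strong enough to read.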
Genotype Calls
through fluorescence signals and control information
Controls
Call categories: HOM_FAM, HET, HOM_VIC, MISSING
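For orientation, here is a simplified rule-based caller that maps FAM/VIC intensities to the four categories using the polar angle of the signal. This is a baseline sketch, not the talk's ML method, and the thresholds `MIN_SIGNAL`, `FAM_MAX_ANGLE`, and `VIC_MIN_ANGLE` are hypothetical values; fixed cut-offs like these are exactly what plate-to-plate variation undermines.

```python
import math

# Hypothetical thresholds for illustration only; real cut-offs would be
# tuned per assay and per plate.
MIN_SIGNAL = 0.2
FAM_MAX_ANGLE = 30.0
VIC_MIN_ANGLE = 60.0


def call_genotype(fam: float, vic: float) -> str:
    """Assign a call from (non-negative) FAM/VIC intensities.

    The angle runs from 0 degrees (pure FAM signal) to 90 degrees
    (pure VIC signal); HET sits in between.
    """
    if math.hypot(fam, vic) < MIN_SIGNAL:
        return "MISSING"  # too little signal to make a call
    angle = math.degrees(math.atan2(vic, fam))
    if angle < FAM_MAX_ANGLE:
        return "HOM_FAM"
    if angle > VIC_MIN_ANGLE:
        return "HOM_VIC"
    return "HET"
```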
Plate-to-Plate Variations
in cluster behavior, control performance, and intensity distributions
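One common way to reduce plate-to-plate differences in intensity distributions is to rescale each channel within each plate. A minimal sketch, assuming min-max scaling per plate (the talk does not say which normalization it used; `normalize_per_plate` is a hypothetical name):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Reading = Tuple[str, float, float]  # (plate_id, fam, vic)


def normalize_per_plate(readings: List[Reading]) -> List[Reading]:
    """Min-max scale FAM and VIC separately within each plate to [0, 1]."""
    by_plate: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for plate, fam, vic in readings:
        by_plate[plate].append((fam, vic))

    # Per-plate (fam_min, fam_max, vic_min, vic_max).
    ranges = {}
    for plate, values in by_plate.items():
        fams = [f for f, _ in values]
        vics = [v for _, v in values]
        ranges[plate] = (min(fams), max(fams), min(vics), max(vics))

    def scale(x: float, lo: float, hi: float) -> float:
        # A constant channel carries no spread information; map it to 0.
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    out = []
    for plate, fam, vic in readings:
        f_lo, f_hi, v_lo, v_hi = ranges[plate]
        out.append((plate, scale(fam, f_lo, f_hi), scale(vic, v_lo, v_hi)))
    return out
```

Min-max scaling is sensitive to single outlier wells; percentile- or control-based scaling is a more robust alternative.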
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - StampedeCon AI Summit 2017
Impute MISSING labels using the k-Nearest Neighbor (k-NN) algorithm; these imputed labels are less confident.
Predict FAIL samples using a training model from another lab.
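A minimal sketch of k-NN label imputation on (FAM, VIC) points, assuming intensities are already normalized and MISSING labels are represented as `None`. The vote fraction stands in for "less confident"; the talk does not say how confidence was quantified, and `impute_missing` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (fam, vic), assumed already normalized


def impute_missing(
    points: Sequence[Point],
    labels: Sequence[Optional[str]],
    k: int = 3,
) -> List[Tuple[str, float, bool]]:
    """Fill None labels by majority vote of the k nearest labeled points.

    Returns (label, vote_fraction, imputed) per sample. Samples that already
    have a label keep it with vote_fraction 1.0 and imputed=False.
    """
    labeled = [(p, lab) for p, lab in zip(points, labels) if lab is not None]
    if not labeled:
        raise ValueError("need at least one labeled sample")
    result = []
    for p, lab in zip(points, labels):
        if lab is not None:
            result.append((lab, 1.0, False))
            continue
        # Euclidean distance in intensity space; keep the k closest labeled points.
        nearest = sorted(labeled, key=lambda item: math.dist(p, item[0]))[:k]
        votes = Counter(neighbor_label for _, neighbor_label in nearest)
        best, count = votes.most_common(1)[0]
        result.append((best, count / len(nearest), True))
    return result
```

Keeping the `imputed` flag lets a downstream model down-weight these labels, in the spirit of the slide's "less confident" note.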
Semi-supervised Machine Learning
Normalized fluorescence intensities
→ Create fluorescence-based features
→ Create controls-based features
→ Create position-based features
→ Create unsupervised clustering-based features
→ Random Forest
→ Predict probabilistic genotypes
→ Predict FAIL samples
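The feature families listed above could look roughly like this for one well. This is a sketch under stated assumptions: well IDs follow the usual letter-row/number-column plate naming (e.g. "B07"), control centroids are given per plate, and every feature name here is hypothetical. Unsupervised clustering features (for example, k-means cluster assignments) are left out to keep it short.

```python
import math
from typing import Dict, Tuple

Centroids = Dict[str, Tuple[float, float]]  # control class -> mean (fam, vic)


def well_position(well: str) -> Tuple[int, int]:
    """Parse a well ID like 'B07' into 0-based (row, column)."""
    row = ord(well[0].upper()) - ord("A")
    col = int(well[1:]) - 1
    return row, col


def make_features(fam: float, vic: float, well: str, controls: Centroids) -> Dict[str, float]:
    total = fam + vic
    row, col = well_position(well)
    features = {
        # Fluorescence-based features.
        "total": total,
        # 0.5 is an arbitrary neutral value for wells with no signal (assumption).
        "fam_fraction": fam / total if total > 0 else 0.5,
        "angle_deg": math.degrees(math.atan2(vic, fam)),
        # Position-based features (edge wells can behave differently).
        "row": float(row),
        "col": float(col),
    }
    # Controls-based features: distance to each control cluster's centroid.
    for name, center in controls.items():
        features[f"dist_{name}"] = math.dist((fam, vic), center)
    return features
```

A random forest trained on vectors like these can output a class probability per genotype, which is what makes the calls probabilistic rather than hard.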
Probabilistic Genotype Prediction (figure: predicted genotypes, with some calls marked LOWER CONFIDENCE)
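Given per-class probabilities for a sample, a final call and a lower-confidence flag can be derived as in this sketch. The threshold of 0.9 is a hypothetical value; the talk does not state which cut-off marks a call as LOWER CONFIDENCE, and `finalize_call` is an illustrative name.

```python
from typing import Dict, Tuple

# Hypothetical cut-off; not taken from the talk.
CONFIDENCE_THRESHOLD = 0.9


def finalize_call(
    probs: Dict[str, float], threshold: float = CONFIDENCE_THRESHOLD
) -> Tuple[str, float, bool]:
    """Turn class probabilities into (call, probability, lower_confidence)."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    call, p = max(probs.items(), key=lambda kv: kv[1])
    p /= total  # renormalize in case inputs do not sum exactly to 1
    return call, p, p < threshold
```

Keeping the probability alongside the call, instead of discarding it, is what allows the uncertainty to flow into downstream analyses.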
Model Scalability and Extensibility: AWS Cloud
Integration with Enterprise Digital Architecture
Input data from any database
Breeding, Biotech, Supply Chain
Customized training models
Predictive model execution
Better data… Better Decisions
Probabilistic impact on downstream genetic analytics:
Linkage Disequilibrium
Genotype-Phenotype Association
Haplotype Mapping
Genetic Mapping
(figure: genotypes aa, Aa, AA against phenotype, from Short to Tall)
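One common way probabilistic genotypes feed downstream association analysis is through expected allele dosage, dosage = P(Aa) + 2 * P(AA), used in place of a hard 0/1/2 call. The sketch below pairs that with a simple least-squares slope of phenotype on dosage. It is an assumption about how such data could be used, not the talk's stated pipeline; both function names are hypothetical.

```python
from typing import Dict, Sequence


def expected_dosage(probs: Dict[str, float]) -> float:
    """Expected count of allele 'A': 0*P(aa) + 1*P(Aa) + 2*P(AA)."""
    return probs.get("Aa", 0.0) + 2.0 * probs.get("AA", 0.0)


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (single predictor with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx
```

A seed called Aa with 0.5 probability and AA with 0.5 probability contributes a dosage of 1.5 instead of being forced into one class, so call uncertainty is carried into the estimated genotype-phenotype effect.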
Further Improvement
Acknowledgement: Jeff Pobst, Bryan Dannowitz, Chris Schlosberg, Shane Ryerson
Acknowledgement
Molecular Breeding Technology
Global Breeding
Cloud Analytics
Global IT Analytics
Products & Engineering Lab Platform
Product 360 Data Asset
Data Science Center of Excellence
