A Lifeboat to the Gene Pool
Predictive association between trait data and eco-geographic data for identification of trait properties useful for improvement of food crops
Vavilov Seminar at IPK Gatersleben
May 12, 2010 - Dag Endresen, NordGen

Topics:
* Utilization of genetic diversity
* Core collection subset
* Trait mining selection (FIGS)
* Computer modeling
* Some examples (FIGS)

Domestication bottleneck: unlocking genetic potential from the wild
[Images: wild tomato and tomato; teosinte and corn (maize)]

Crop Genetic Diversity
[Figure labels: Crop Wild Relatives, Traditional landraces, Modern cultivars]
Genetic bottlenecks occur during crop domestication and during modern plant breeding. The circles represent allelic variation. The funnels represent allelic variation of genes found in the crop wild relatives but gradually lost during domestication, traditional cultivation and modern plant breeding.
Illustration based on: Tanksley, Steven D. and Susan R. McCouch 1997. Seed Banks and Molecular Maps: Unlocking Genetic Potential from the Wild. Science 277 (5329), 1063 (22 August 1997). doi:10.1126/science.277.5329.1063

Plant Genetic Resources for Crop Improvement
* Primitive crops and traditional landraces are an important source of novel traits for the improvement of modern crops.
* Landraces are often not well described for the economically valuable traits.
* Identification of novel crop traits will often be the result of a larger field trial screening project (thousands of individual plants).
* Large-scale field trials are very costly, both in area and in human working hours.

Challenges for improved utilization of genetic resources for crop improvement:
* Large gene bank collections
* Limited screening capacity

A needle in a haystack
* Scientists and plant breeders want a few hundred germplasm accessions to evaluate for a particular trait.
* How does the scientist select a small subset likely to have the useful trait?
* Example: More than 560 000 wheat accessions in genebanks worldwide.
Slide adapted from a slide by Ken Street, ICARDA (FIGS team)

Core collection subset
* The scientist or the breeder needs a smaller subset to cope with the field screening experiments.
* A common approach is to create a so-called core collection.
* Sir Otto H. Frankel (1900-1998) proposed (1984) a limited set established from an existing collection with minimum similarity between its entries.
* The core collection is of limited size and chosen to represent the genetic diversity of a large collection.

Core subset selection
Given that the trait property you are looking for is relatively rare, perhaps as rare as a unique allele for one single landrace cultivar, getting what you want is largely a question of LUCK!
Slide adapted from a slide by Ken Street, ICARDA (FIGS team)

FIGS analysis method
Focused Identification of Germplasm Strategy

Focused Identification of Germplasm Strategy
* Objective of this method: explore climate data as a prediction model for “computer pre-screening” of crop traits BEFORE full-scale field trials.
* Identification of landraces with a higher probability of holding an interesting trait property.

Climate effect during the cultivation process
* Wild relatives are shaped by the environment.
* Primitive cultivated crops are shaped by local climate and humans.
* Traditional cultivated crops (landraces) are shaped by climate and humans.
* Modern cultivated crops are mostly shaped by humans (plant breeders).
* Perhaps future crops are shaped in the molecular laboratory…?

Predictive pattern between eco-geography and trait
* The predictive pattern between the eco-geography and the traits can of course also have sources other than adaptation.
* During traditional cultivation the farmer will also select for and introduce germplasm that improves the suitability of the landrace to the local conditions.

FIGS selection method
* Assumption: the climate at the original source location, where the landrace was developed during long-term traditional cultivation, is correlated with the trait score.
* Aim: to build a computer model explaining the crop trait score (dependent variables) from the climate data (independent variables).

We combine three datasets
* Landrace samples (genebank seed accessions)
* Trait observations (experimental design) - High cost data
* Climate data (for the landrace location of origin) - Low cost data
The accession identifier (accession number) provides the bridge to the crop trait observations.
The longitude and latitude coordinates for the original collecting site of the accessions (landraces) provide the bridge to the environmental data.

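The two bridges described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the accession numbers, coordinates, field names and values are invented, and the climate lookup is a plain dictionary keyed by coordinates rather than a real raster extraction.

```python
# Hypothetical example records (all identifiers and values are made up).
accessions = [
    {"accession": "ACC001", "lat": 55.7, "lon": 13.2},
    {"accession": "ACC002", "lat": 60.8, "lon": 10.7},
    {"accession": "ACC003", "lat": None, "lon": None},  # not georeferenced
]

# Trait observations are keyed by accession number (the first bridge).
traits = {
    "ACC001": {"heading_days": 62.0},
    "ACC002": {"heading_days": 58.5},
    "ACC003": {"heading_days": 60.0},
}

# Climate values are keyed by collecting-site coordinates (the second bridge).
climate = {
    (55.7, 13.2): {"tmin_jan": -2.1},
    (60.8, 10.7): {"tmin_jan": -9.4},
}


def combine(accessions, traits, climate):
    """Join the three datasets; skip accessions missing either bridge."""
    rows = []
    for acc in accessions:
        trait_row = traits.get(acc["accession"])
        climate_row = climate.get((acc["lat"], acc["lon"]))
        if trait_row is None or climate_row is None:
            continue
        rows.append({"accession": acc["accession"], **climate_row, **trait_row})
    return rows


combined = combine(accessions, traits, climate)
for row in combined:
    print(row)
```

Accessions without usable coordinates drop out of the combined table, which is why georeferencing matters before any modeling can start.
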
1. Genetic resources, genebank collections
[Images: Lima, Peru; Alnarp, Sweden; Svalbard; Benin]
More than 7.4 million genebank accessions in more than 1 400 genebanks worldwide.

2. Trait data, descriptive crop data
[Images: Field trials, Gatersleben, Germany; Potato, Priekuli, Latvia; Faba bean, Finland; Linnés äpple; Forage crops, Dotnuva, Lithuania; Radish (S. Jeppson)]
[Images: Powdery mildew, Blumeria graminis; Leaf spots, Ascochyta sp.; Yellow rust, Puccinia striiformis; Black stem rust, Puccinia graminis]
http://barley.ipk-gatersleben.de

3. Climate data: WorldClim
* The climate data can be extracted from the WorldClim dataset. http://www.worldclim.org/
* Data from weather stations worldwide are combined into a continuous surface layer.
* Climate data for each landrace is extracted from this surface layer.
Precipitation: 20 590 stations
Temperature: 7 280 stations

FIGS: Focused Identification of Germplasm Strategy
FIGS selection is a new method to predict crop traits of primitive cultivated material from climate variables by using multivariate statistical methods.

What is FIGS
http://www.figstraitmine.org/
Focused Identification of Germplasm Strategy
Origin of Concept (1980s): Wheat and barley landraces from marine soils in the Mediterranean region provided genetic variation for boron toxicity.
[Map labels: Mediterranean region; South Australia]
Slide made by Michael Mackay, 1995

FIGS
The FIGS technology takes much of the guesswork out of choosing which accessions are most likely to contain the specific characteristics being sought by plant breeders to improve plant productivity across numerous challenging environments.
http://www.figstraitmine.org/
[Figure: FIGS salinity set]

Slide made by Michael Mackay, 1995

Ecological Niche Modeling (Species Distribution Models)
* The fundamental ecological niche of an organism was formalized by G. E. Hutchinson in 1957 as a multidimensional hypervolume defining the ecological conditions that allow a species to exist.
* A computer model of the occurrence localities together with associated environmental conditions such as rainfall, temperature and day length provides an approximation of the fundamental niche.
* Popular software implementations for modeling the ecological niche include openModeller, MaxEnt, BioCLIM and DesktopGARP.
[Portrait: George Evelyn Hutchinson (1903-1991)]

Computer modeling

Data for the simulation model
* Training set: for the initial calibration or training step.
* Calibration set: for further calibration (the tuning step). Often cross-validation on the training set is used to reduce the consumption of raw data.
* Test set: for the model validation or goodness-of-fit testing. New external data, not used in the model calibration.

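As a rough illustration of the data roles above, the sketch below holds out a test set and then builds cross-validation folds on the remaining training samples. The function names, the 25 % test fraction and the five folds are assumptions made for the example, not settings taken from the slides.

```python
import random


def split_samples(sample_ids, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold out a test set never used in calibration."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]  # (training set, test set)


def cross_validation_folds(train_ids, n_folds=5):
    """Return (calibration, validation) pairs, one pair per fold."""
    folds = [train_ids[i::n_folds] for i in range(n_folds)]
    pairs = []
    for i, validation in enumerate(folds):
        calibration = [s for j, fold in enumerate(folds) if j != i for s in fold]
        pairs.append((calibration, validation))
    return pairs


train_ids, test_ids = split_samples(range(20))
cv_pairs = cross_validation_folds(train_ids)
print(len(train_ids), "training samples,", len(test_ids), "test samples")
```

Cross-validation reuses the training samples for tuning, so no separate calibration set has to be carved out of a small dataset.
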
A model of the real world
Validation step:
* No model can ever be absolutely correct.
* A simulation model can only be an approximation.
* A model is always created for a specific purpose.
Apply the model:
* The simulation model is applied to make predictions based on new data.
* Take care to avoid extrapolation problems.

Model validation
* Residual analysis (RMSE)
* Pearson Product-Moment Correlation Coefficient (r)

Residuals (validate model fit)
* The residuals are the distances between the model predictions and the reference values used for validation.
* Cross-validation indicates the appropriate model complexity.
* Be aware of over-fitting! NB! Model validation!
[Figure: Example of a bad model calibration (calibration step)]

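The two validation measures named above, RMSE and Pearson's r, can be computed directly from paired predictions and reference values. The sketch below uses only the Python standard library; the numbers are invented for illustration.

```python
import math


def rmse(predicted, reference):
    """Root mean square error of the residuals (prediction minus reference)."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented trait scores: model predictions against observed reference values.
predicted = [61.0, 58.0, 64.5, 70.0]
reference = [62.0, 57.5, 63.0, 71.0]
print("RMSE:", rmse(predicted, reference))
print("r:", pearson_r(predicted, reference))
```

RMSE is expressed in the units of the trait, while r is unitless, so the two measures complement each other when judging model fit.
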
Some examples

Morphological traits in Nordic Barley landraces
* Field observations by Agnese Kolodinska Brantestam (2005)
* Multi-way N-PLS data analysis, Dag Endresen (2009)
[Map labels: Priekuli (L), Bjorke (N), Landskrona (S)]

Landrace origin locations (georeferencing)
Of the 19 landrace accessions included in the dataset, only 4 had geo-referenced coordinates in the NordGen SESTO database. Another 10 accessions were geo-referenced from the reported place name and the descriptions of the original gathering site included in SESTO and other sources. For 5 accessions there was not enough information available to locate the original gathering location.
[Right side illustration: example of georeferencing for NGB9529, a landrace reported as originating from Lyderupgaard, using KRAK.dk and maps.google.com]

Landrace origin locations (gathering sites)

Multi-way analysis with PLS Toolbox and MATLAB

3-way cube model (climate data, X)
3-way cube: 14 landraces (location of origin) by 12 monthly means by 3 climate variables.
Climate data (mode 3):
* Minimum temperature
* Maximum temperature
* Precipitation
* … (many more layers can be added)
2-way array (bi-linear): 14 samples by 36 variables (Jan, Feb, Mar, … for each of min. temperature, max. temperature and precipitation).

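The relation between the 3-way cube and the unfolded 2-way array can be sketched as follows. The cube is filled with placeholder numbers that encode their own indices, not real climate values, and the column order (all months of one variable, then the next variable) follows the layout listed on the slide.

```python
N_LANDRACES, N_MONTHS, N_VARIABLES = 14, 12, 3

# Placeholder cube: cube[i][j][k] is landrace i, month j, climate variable k.
# The value encodes its indices so the unfolding is easy to verify.
cube = [
    [[1000 * i + 10 * j + k for k in range(N_VARIABLES)] for j in range(N_MONTHS)]
    for i in range(N_LANDRACES)
]


def unfold(cube):
    """Unfold to one row per landrace: 12 months of variable 0, then 1, then 2."""
    return [
        [sample[j][k] for k in range(N_VARIABLES) for j in range(N_MONTHS)]
        for sample in cube
    ]


X = unfold(cube)
print(len(X), "samples by", len(X[0]), "variables")
```

Unfolding keeps every value but discards the information that, for example, column 1 and column 13 describe the same month, which is what multi-way methods retain.
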
3-way cube model (trait data, Y)
3-way cube: 14 samples, landraces (x2) by 6 crop traits by 6 year + location combinations.
Mode 2 (traits):
* Heading days
* Ripening days
* Length of plant
* Harvest index
* Volumetric weight
* Grain weight (tgw)
Mode 3 (experiment site):
* Latvia, 2002
* Latvia, 2003
* Norway, 2002
* Norway, 2003
* Sweden, 2002
* Sweden, 2003
2-way array (bi-linear): 14 samples (x2) by 36 variables (6 traits for each of Bjørke (N) 2002, Bjørke (N) 2003, Landskrona (S) 2002, Landskrona (S) 2003, Priekuli (Lv) 2002 and Priekuli (Lv) 2003).

Trait scores (pre-processing)
Here: across mode 2 (traits).
* Auto-scaling is a combination of mean centering and variance scaling.
* Mean centering removes the absolute intensity, so that the model does not focus on the variables with the highest numerical values.
* Scaling makes the relative distribution of values (range spread) more equal between variables.
* After auto-scaling all variables have a mean of zero and a standard deviation of one.
* The objective is to help the model separate the relevant information from the noise.

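Auto-scaling of a single trait variable can be sketched as below. The trait values are invented, and the sample standard deviation (n - 1 in the denominator) is an assumption; the slides do not say which form was used.

```python
import statistics


def autoscale(column):
    """Mean-center a variable, then scale it to unit standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (assumed)
    return [(value - mean) / sd for value in column]


# Invented scores for two traits on very different numerical scales.
heading_days = [60.0, 62.0, 64.0, 70.0]
harvest_index = [0.41, 0.45, 0.39, 0.47]

scaled_heading = autoscale(heading_days)
scaled_harvest = autoscale(harvest_index)
print(scaled_heading)
print(scaled_harvest)
```

After scaling, heading days (tens of days) and harvest index (fractions below one) contribute on an equal footing to the model.
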
Trait dataset - outlier
Outlier: NGB6300, replicate 2 from Priekuli 2003 (LYR122)
The influence plot (residuals against leverage) shows sample NGB6300 (FRO) observed at Priekuli in 2003 (replicate 2) with a very high leverage, well separated from the “data cloud”. After looking into the raw data (shown in a table on the slide), this observation point was removed as an outlier (set to NaN).

PARAFAC split-half, trait data (3-way)
PARAFAC split-half (mode 1) analysis:
* The two PARAFAC models, each calibrated from one of two independent split-half subsets, converge to the same solution.
* The PARAFAC 3-way method thus produces a stable model for this dataset.

Predictive association between trait data and eco-geographic data for Nordic barley landraces (Gatersleben, 2010-05-12)

  • 1. A Lifeboat to the Gene PoolPredictive association between trait data and eco-geographic data for identification of trait properties useful for improvement of food cropsVavilov Seminar at IPK GaterslebenMay 12, 2010 - Dag Endresen, NordGen
  • 7. Domestication bottleneckunlocking genetic potential from the wildwild tomatotomatoteosintecorn, maize3
  • 8. Crop Genetic DiversityTraditional landracesCrop Wild RelativesModern cultivarsGenetic bottlenecks during crop domestication and during modern plant breeding. The circles represent allelic variation. The funnels represents allelic variation of genes found in the crop wild relatives, but gradually lost during domestication, traditional cultivation and modern plant breeding.Illustration based on: Tanksley, Steven D. and Susan R. McCouch 1997. Seed Banks and Molecular Maps: Unlocking Genetic Potential from the Wild Science 277 (5329), 1063. (22 August 1997). doi:10.1126/science.277.5329.10634
  • 9. Plant Genetic Resources for Crop ImprovementPrimitive crops and traditional landraces are an important source for novel traits for improvement of modern crops.Landraces are often not well described for the economically valuable traits.Identification of novel crop traits will often be the result of a larger field trial screening project (thousands of individual plants).Large scale field trials are very costly, area and human working hours.5
  • 10. Challenges for improved utilization of genetic resources for crop improvement :* Large gene bank collections* Limited screening capacity6
  • 11. A needle in a hay stackScientists and plant breeders want a few hundred germplasm accessions to evaluate for a particular trait.How does the scientist select a small subset likely to have the useful trait?Example: More than 560 000 wheat accessions in genebanks worldwide.Slide adopted from a slide by Ken Street, ICARDA (FIGS team)7
  • 12. Core collection subsetThe scientist or the breeder need a smaller subset to cope with the field screening experiments.A common approach is to create a so-called core collection.Sir Otto H. Frankel (1900-1998) proposed a limited set established from an existing collection with minimum similarity between its entries.The core collection is of limited size and chosen to represent the genetic diversityof a large collection (1984) .8
  • 13. Core subset selectionGiven that the trait property you are looking for is relatively rare:Perhaps as rare as a unique allele for one single landrace cultivar...Getting what you want is largely a question of LUCK!9Slide adopted from a slide by Ken Street, ICARDA (FIGS team)
  • 14. FIGS analysis methodFocused Identification of Germplasm Strategy10
  • 15. Focused Identification of Germplasm StrategyObjective of this method: Explore climate data as a prediction model for “computer pre-screening” of crop traits BEFORE full scale field trials.Identification of landraces with a higher probability of holding an interesting trait property.11
  • 16. Climate effect during the cultivation processPrimitive cultivated crops are shaped by local climate and humansWild relatives are shaped by the environmentTraditional cultivated crops (landraces) are shaped by climate and humansModern cultivated crops are mostly shaped by humans (plant breeders)Perhaps future crops are shaped in the molecular laboratory…?12
  • 17. Predictive pattern between eco-geography and traitThe predictive pattern between the eco-geography and the traits can of course also have other sources than adaption.During traditional cultivation the farmer will also select for and introduce germplasm for improved suitability of the landrace to the local conditions.13
  • 18. FIGS selection methodAssumption: the climate at the original source location, where the landrace was developed during long-term traditional cultivation, is correlated to the trait score. Aim: to build a computer model explaining the crop trait score (dependent variables) from the climate data (independent variables).14
  • 19. We combine three datasetsLandrace samples (genebank seed accessions)Trait observations (experimental design) - High cost dataClimate data (for the landrace location of origin) - Low cost dataThe accession identifier (accession number) provides the bridge to the crop trait observations.
  • 20. The longitude, latitude coordinates for the original collecting site of the accessions (landraces) provide the bridge to the environmental data. 15
  • 21. 1. Genetic resources, genebank collectionsLima, PeruAlnarp, SwedenSvalbardBenin16More than 7.4 million genebank accessions, more than 1 400 genebanks, worldwide.
  • 22. 2. Trait data, descriptive crop dataField trials, Gatersleben, GermanyPotato Priekuli LatviaFaba bean, FinlandLinnés äppleForage crops, Dotnuva, LithuaniaRadish (S. Jeppson)17Powdery Mildew, Blumeria graminisLeaf spotsAscochyta sp.Yellow rustPuccinia strilformisBlack stem rustPuccinia graminishttp://barley.ipk-gatersleben.de
  • 23. 3. Climate data – WorldClim The climate data can be extracted from the WorldClim dataset.http://www.worldclim.org/ Data from weather stations worldwide are combined to a continuous surface layer. Climate data for each landrace is extracted from this surface layer.Precipitation: 20 590 stationsTemperature: 7 280 stations18
  • 24. FIGS – Focused Identification of Germplasm StrategyFIGS selection is a new method to predict crop traits of primitive cultivated material from climate variables by using multivariate statistical methods.19
  • 25. What is FIGShttp://www.figstraitmine.org/FocusedIdentification of GermplasmStrategyMediterranean regionOrigin of Concept (1980s):Wheat and barley landraces from marine soils in the Mediterranean region provided genetic variation for boron toxicity.South AustraliaSlide made byMichael Mackay 199520
  • 26. 21FIGS The FIGS technology takes much of the guess work out of choosing which accessions are most likely to contain the specific characteristics being sought by plant breeders to improve plant productivity across numerous challenging environments.http://www.figstraitmine.org/FIGS salinity set21
  • 27. Slide made byMichael Mackay 199522
  • 28. Ecological Niche ModelingSpecies Distribution ModelsThe fundamental ecological niche of an organism was formalized by G. E. Hutchinson[1] in 1957 as a multidimensional hypercube defining the ecological conditions that allow a species to exist.A computer model of the occurrence localities together with associated environmental conditions such as rainfall, temperature, day length etc., provides an approximation of the fundamental niche.Popular software implementations for modeling the ecological niche include openModeller, MaxEnt, BioCLIM, DesktopGARP, etc.23George Evelyn Hutchinson (1903 – 1991)
  • 30. Data for the simulation modelTraining setFor the initial calibration or training step.Calibration setFurther calibration, tuning stepOften cross-validation on the training set is used to reduce the consumption of raw data.Test setFor the model validation or goodness of fit testing.New external data, not used in the model calibration.25
  • 31. A model of the real worldValidation stepNo model can ever be absolutely correctA simulation model can only be an approximationA model is always created for a specific purposeApply the modelThe simulation model is applied to make predictions based on new fresh dataBe aware to avoid extrapolation problems26
  • 34. Residuals (validate model fit)The distance between the model (predictions) and the reference values (validation) is the residuals.Example of a bad model calibrationCalibration stepCross-validation indicates the appropriate model complexity.28Be aware of over-fitting! NB! Model validation!
  • 36. Morphological traits in Nordic Barley landracesField observations by AgneseKolodinskaBrantestam (2005)Multi-way N-PLS data analysis, Dag Endresen (2009)30Priekuli (L)Bjorke (N)Landskrona (S)
  • 37. Landrace origin locations (georeferencing)From a total of 19 landrace accessions included in the dataset, only 4 of the landrace accessions included geo-referenced coordinates in the NordGen SESTO database. 10 accessions were geo-referenced from the reported place name and descriptions of the original gathering site included in SESTO and other sources. For 5 accessions there were not enough information available to locate the original gathering location.Right side illustrationExample of georeferencing for NGB9529, landrace reported as originating from Lyderupgaard using KRAK.dk and maps.google.com31
  • 38. Landrace origin locations (gathering sites)32
  • 39. Multi-way analysis with PLS Toolbox and MATLAB33
  • 40. 3-way cube model (climate data, X)3-way cube:Climate data (mode 3): Minimum temperature
  • 43. … (many more layers can be added)3 climate variablesX14 landraces(location of origin)12 monthly means2-way array (bi-linear):36 variablesMin. temperatureMax. temperaturePrecipitationJan, Feb, Mar, …Jan, Feb, Mar, …Jan, Feb, Mar, …14 samples34
  • 44. 3-way cube model (trait data, Y). Mode 2 (traits): heading days, ripening days, length of plant, harvest index, volumetric weight, grain weight (tgw). Mode 3 (experiment site): Latvia 2002, Latvia 2003, Norway 2002, Norway 2003, Sweden 2002, Sweden 2003. Y as a 3-way cube: 14 samples (landraces, ×2) × 6 crop traits × 6 year + location combinations. Y as a 2-way array (bi-linear): 14 samples (×2) × 36 variables, with 6 traits for each of Bjørke (N) 2002, Bjørke (N) 2003, Landskrona (S) 2003, Landskrona (S) 2002, Priekuli (Lv) 2002 and Priekuli (Lv) 2003.
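The relation between the 3-way cube and the 2-way (bi-linear) array on these slides is an unfolding: the third mode is laid out side by side so that each sample becomes one long row. The sketch below uses plain Python lists for the climate cube (14 landraces × 12 months × 3 climate variables). The function name `unfold` and the placeholder cell values are assumptions for illustration; the original analysis was done with PLS Toolbox and MATLAB, not with this code.

```python
def unfold(cube):
    """Unfold cube[sample][month][variable] into one row per sample.

    Columns are ordered variable by variable, each with its months in order,
    e.g. min. temperature Jan..Dec, then max. temperature Jan..Dec, and so on.
    """
    rows = []
    for sample in cube:
        n_months = len(sample)
        n_vars = len(sample[0])
        rows.append([sample[m][v] for v in range(n_vars) for m in range(n_months)])
    return rows

N_SAMPLES, N_MONTHS, N_VARS = 14, 12, 3
# Placeholder cube: each cell just records its own (sample, month, variable) index.
cube = [[[(s, m, v) for v in range(N_VARS)] for m in range(N_MONTHS)]
        for s in range(N_SAMPLES)]
X_unfolded = unfold(cube)  # 14 rows, each with 12 * 3 = 36 columns
```

The same unfolding applied to the trait cube gives 6 traits × 6 site-year combinations = 36 columns. Multi-way methods such as PARAFAC and N-PLS work on the cube directly rather than on this unfolded form.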
  • 45. Trait scores (pre-processing). Here: across mode 2 (traits). Auto-scaling is a combination of mean centering and variance scaling.
  • 46. Mean centering removes the absolute intensity, so that the model does not focus on the variables with the highest numerical values (intensity).
  • 47. Scaling makes the relative distribution of values (range spread) more equal between variables.
  • 48. After auto-scaling all variables have a mean of zero and a standard deviation of one.
  • 49. The objective is to help the model to separate the relevant information from the noise.
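Auto-scaling as described on the slides above (mean centering followed by scaling to unit standard deviation) can be sketched for a single variable as follows. This is an illustration only: the function name `autoscale` and the heading-day values are hypothetical, the sample standard deviation (n - 1) is assumed, and missing values (NaN) are not handled.

```python
from statistics import mean, stdev

def autoscale(column):
    """Mean-center one variable and scale it to unit (sample) standard deviation."""
    m = mean(column)
    s = stdev(column)  # sample standard deviation; assumes the values are not all equal
    return [(x - m) / s for x in column]

# Hypothetical heading-day observations for five landraces.
heading_days = [60.0, 62.0, 64.0, 66.0, 68.0]
scaled = autoscale(heading_days)
```

After this step the variable has a mean of zero and a standard deviation of one, so a trait measured in large numbers (such as days) no longer dominates a trait measured in small numbers (such as an index).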
  • 50. Trait dataset - outlier. Outlier: NGB6300, replicate 2 from Priekuli 2003 (LYR122). The influence plot (residuals against leverage) shows sample NGB6300 (FRO), observed at Priekuli in 2003 (replicate 2), with a very high leverage, well separated from the “data cloud”. After inspection of the raw data (see the table above), this observation was removed as an outlier (set to NaN).
  • 51. PARAFAC split-half, trait data (3-way). PARAFAC split-half (mode 1) analysis: two PARAFAC models, each calibrated from an independent split-half subset, both converge to the same solution. The PARAFAC 3-way method thus produces a stable model for this dataset.
  • 52. PARAFAC split-half, climate data (3-way). 10 different PARAFAC split-half alternatives resulted in 2 good splits.
  • 54. Significance levels. The critical levels (α) for p-value significance are often set at 0.05, 0.01 and 0.001 (5 %, 1 %, 0.1 %). Modeling 14 samples (landraces) gives 12 degrees of freedom for the correlation tests (n − 2, for the means of x and y). The test is one-tailed (looking only at positive correlation of predictions versus the reference values). A coefficient of determination (r²) larger than 0.56 is significant at the 0.001 (0.1 %) level for 14 values/samples. Many introductory textbooks on statistics include a table of critical values for Pearson’s r.
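The threshold quoted on this slide can be checked from a Student's t table using the standard relation r = t / sqrt(t² + df), where df = n − 2. The sketch below is an illustration, not part of the original analysis. The one-tailed critical t values for 12 degrees of freedom are copied from a standard t table (an assumption of this example, not from the slides), and the names are hypothetical.

```python
import math

# One-tailed critical values of Student's t for 12 degrees of freedom,
# copied from a standard t table (assumption: not taken from the slides).
T_CRITICAL_DF12 = {0.05: 1.782, 0.01: 2.681, 0.001: 3.930}

def critical_r(t_crit, df):
    """Convert a critical t value into the critical Pearson r for df = n - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

n_samples = 14
df = n_samples - 2
critical_r2 = {alpha: critical_r(t, df) ** 2 for alpha, t in T_CRITICAL_DF12.items()}

for alpha, r2 in critical_r2.items():
    print(f"alpha = {alpha}: r2 must exceed {r2:.2f}")
```

For the 0.001 level this gives an r² threshold of about 0.56, which agrees with the value stated on the slide.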
  • 55. N-PLS regression results. Panels: Heading, Ripening, Length, H-Index, Volwgt, TGW. Sites: Priekuli (L), Bjorke (N), Landskrona (S). The 5 % and 1 % significance levels are indicated by the horizontal green lines.
  • 56. Experiment observation site. Latvia 2002 (LY11): May 2002 was extremely dry in Priekuli. June 2002 was extremely wet in Priekuli. The wet June caused germination on the spikes for many of the early varieties. Landskrona 2003 (LY32): June 2003 was extremely dry in Landskrona. June was the time for grain filling here. Were the conditions too extreme for the genotype to be “normally” expressed? Was the effect of the “G by E” (genotype by environment) interaction too large?
  • 57. Experiment observation site (field stations): Priekuli (L), Bjorke (N), Landskrona (S)
  • 58. N-PLS results diagnostics. Evaluation of model performance: prediction, calibration, explained variance.
  • 59. Net Blotch in Barley Landraces (FIGS dataset)
  • 60. Net Blotch susceptibility in Barley landraces. Michael Mackay, FIGS coordinator. Barley (Hordeum vulgare ssp. vulgare) collected from different countries worldwide was screened for susceptibility to net blotch infection (1676 greenhouse + 2975 field observations).
  • 61. Net blotch is a common disease of barley caused by the fungus Pyrenophora teres.
  • 62. Screened at four USDA research stations: North Dakota (Langdon, Fargo), Minnesota (Stephen), Georgia (Athens).
  • 63. 1-3 are basically resistant → group 1
  • 64. 4-6 are intermediate → group 2
  • 65. 7-9 are susceptible → group 3
  • 67. Correctly classified groups: 45.9 % in the training set and 44.4 % in the test set (random for 3 groups: 33 %).
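The grouping of the 1-9 net blotch scores into three classes, and the "correctly classified" percentage compared with a random baseline, can be sketched as below. The function names and the example labels are hypothetical. The classification models mentioned in the deck (SIMCA, D-PLS, multi-way) are not reproduced here.

```python
def score_to_group(score):
    """Map a 1-9 susceptibility score to group 1 (resistant), 2 (intermediate) or 3 (susceptible)."""
    if not 1 <= score <= 9:
        raise ValueError(f"score out of range: {score}")
    if score <= 3:
        return 1
    if score <= 6:
        return 2
    return 3

def fraction_correct(predicted, observed):
    """Share of samples whose predicted group equals the observed group."""
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)

# Hypothetical observed scores and predicted groups, for illustration only.
observed_groups = [score_to_group(s) for s in [2, 5, 4, 8]]
predicted_groups = [1, 2, 3, 1]
accuracy = fraction_correct(predicted_groups, observed_groups)
random_baseline = 1 / 3  # expected accuracy when guessing among 3 equally likely groups
```

Reporting the accuracy next to the random baseline, as the slide does, shows how much better than chance the model performs.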
  • 68. Work in progress! (SIMCA, D-PLS, Multi-way). Ken Street: FIGS project leader. Harold Bockelman: net blotch data. Eddy De Pauw: climate data. Dag Endresen: data analysis.
  • 69. Trait observation locations, USDA Research Stations. Dr Harold Bockelman extracted the trait data (C&E) from the GRIN database, USDA-ARS, National Plant Germplasm System, Germplasm Resources Information Network, online http://www.ars-grin.gov/npgs
  • 70. Climate data (Eddy De Pauw, ICARDA, 2008): agro-climatic zone (UNESCO classification); soil classification (FAO Soil map); aridity (dryness); precipitation; potential evapotranspiration (water loss); temperature, with maximum temperatures and minimum temperatures (mean values for month and year).
  • 71. Sunn Pest in wheat (ICARDA). No sources of Sunn pest resistance had previously been found in hexaploid wheat.
  • 72. 2 000 accessions were screened at ICARDA without result (during the last 7 years).
  • 73. A FIGS set of 534 accessions was developed and screened (2007, 2008).
  • 75. The FIGS selection started from 16 000 landraces from VIR, ICARDA and AWCC
  • 76. Exclude accessions originating from CHN, PAK and IND, where Sunn pest was only recently reported (6 328 acc).
  • 77. Only one accession per collecting site (2 830 acc).
  • 78. Excluding dry environments below 280 mm/year
  • 79. Excluding sites of low winter temperature below 10 degrees Celsius (1 502 acc). http://dx.doi.org/10.1007/s10722-009-9427-1 Slide adapted from Ken Street, ICARDA (FIGS team)
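The stepwise FIGS selection on the slides above works as a funnel of filters applied one after another. The sketch below shows that idea on a handful of made-up records. The field names (`country`, `site`, `precip_mm`, `winter_temp_c`), the function name `figs_filter` and all data are hypothetical; the thresholds are the ones quoted on the slides and are exposed as parameters.

```python
def figs_filter(accessions,
                excluded_countries=("CHN", "PAK", "IND"),
                min_precip_mm=280.0,
                min_winter_temp_c=10.0):
    """Apply the four exclusion steps in sequence and return the remaining accessions."""
    # Step 1: drop origins where the pest was only recently reported.
    kept = [a for a in accessions if a["country"] not in excluded_countries]
    # Step 2: keep only the first accession seen for each collecting site.
    seen_sites, unique = set(), []
    for a in kept:
        if a["site"] not in seen_sites:
            seen_sites.add(a["site"])
            unique.append(a)
    # Step 3: drop dry environments (annual precipitation below the threshold).
    wet_enough = [a for a in unique if a["precip_mm"] >= min_precip_mm]
    # Step 4: drop sites whose winter temperature is below the threshold.
    return [a for a in wet_enough if a["winter_temp_c"] >= min_winter_temp_c]

# Hypothetical accession records, for illustration only.
records = [
    {"id": "A1", "country": "SYR", "site": "S1", "precip_mm": 350, "winter_temp_c": 12},
    {"id": "A2", "country": "SYR", "site": "S1", "precip_mm": 350, "winter_temp_c": 12},
    {"id": "A3", "country": "CHN", "site": "S2", "precip_mm": 500, "winter_temp_c": 11},
    {"id": "A4", "country": "TUR", "site": "S3", "precip_mm": 200, "winter_temp_c": 12},
    {"id": "A5", "country": "AFG", "site": "S4", "precip_mm": 400, "winter_temp_c": 2},
    {"id": "A6", "country": "TJK", "site": "S5", "precip_mm": 300, "winter_temp_c": 10},
]
selected_ids = [a["id"] for a in figs_filter(records)]
```

Each step narrows the candidate set using only passport and eco-geographic data, so no field screening is needed until the final, much smaller subset has been chosen.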
  • 80. FIGS selection - a lifeboat to the gene pool

Editor's Notes

  • #2: Photo by Dag Endresen. Barley (Hordeum vulgare L.) at Gatersleben (June, 2007). URL: http://www.flickr.com/photos/dag_endresen/4189818373/
  • #3: Photo: Dag Endresen. Barley seeds (Hordeum vulgare L.), genebank accession NGB11242, at the Nordic Gene Bank, Alnarp (July 2004). URL: http://www.flickr.com/photos/dag_endresen/4262545194/
  • #5: Illustration based on: Tanksley, Steven D. and Susan R. McCouch 1997. Seed Banks and Molecular Maps: Unlocking Genetic Potential from the Wild Science 277 (5329), 1063. (22 August 1997). doi:10.1126/science.277.5329.1063
  • #6: Modern agriculture uses advanced plant varieties based on the most productive genetics. The original land races and wild forms produce lower yields, but their greater genetic variation contains a higher diversity in e.g. resistance to disease. High-yielding modern crops are therefore vulnerable when a new disease arises.
  • #7: Photo: Dag Endresen. Field of sugar beet (Beta vulgaris L.) at Alnarp (June 2005). URL: http://www.flickr.com/photos/dag_endresen/4189812241/
  • #9: Some selected literature on core collections: --- 1995 :: Core Collections of Plant Genetic Resources. Author: Hodgkin, T.; Brown, A.H.D.; van Hintum, Th.J.L.; Morales, E.A.V. (eds.). ISBN-10: 0-471-95545-0. http://www.bioversityinternational.org/index.php?id=19&user_bioversitypublications_pi1[showUid]=2365 --- 1999 :: Core collections for today and tomorrow. Author: Johnson, R.C.; Hodgkin, T. (eds.). ISBN-10: 92-9043-424-4. http://www.bioversityinternational.org/index.php?id=19&user_bioversitypublications_pi1[showUid]=2153 --- 2000 :: Core Collections of plant genetic resources. IPGRI Technical Bulletin No. 3. Author: van Hintum, Th.J.L.; Brown, A.H.D.; Spillane, C.; Hodgkin, T. ISBN-10: 92-9043-454-6. http://www.bioversityinternational.org/index.php?id=19&user_bioversitypublications_pi1[showUid]=2540 --- 2002 :: Accession management trials of genetic resources collections. IPGRI Technical Bulletin No. 5. Author: Sackville Hamilton, N.R.; Engels, J.M.M.; van Hintum, Th.J.L.; Koo, B.; Smale, B. ISBN-10: 92-9043-516-X. http://www.bioversityinternational.org/index.php?id=19&user_bioversitypublications_pi1[showUid]=2703
  • #10: TODO: lookup reference for rare alleles... (Erling Fimland mentioned alleles unique for a single landrace in husbandry domesticated animals) http://commons.wikimedia.org/wiki/File:P_game.svg
  • #13: Modern agriculture uses advanced plant varieties based on the most productive genetics. The original land races and wild forms produce lower yields, but their greater genetic variation contains a higher diversity in e.g. resistance to disease. High-yielding modern crops are therefore vulnerable when a new disease arises.
  • #14: Illustration traditional cattle farming: http://commons.wikimedia.org/wiki/File:Traditional_farming_Guinea.jpg (USAID, Public Domain)
  • #17: Estimates from FAO State of the World report 2009 (SoW 2009). http://www.fao.org/nr/cgrfa/cgrfa-meetings/cgrfa-comm/twelfth-reg/en/
  • #18: Raphanus Radish: Photographer Simon Jeppson (NordGen Picture Archive, image 004247). Linnés äpple: Photographer Mattias Iwarsson (NordGen Picture Archive, image 004942)
  • #19: The WorldClim dataset is described in: Hijmans, R.J., S.E. Cameron, J.L. Parra, P.G. Jones and A. Jarvis, 2005. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978. NOAA GHCN-Monthly version 2: http://www.ncdc.noaa.gov/oa/climate/ghcn-monthly/index.php Weather stations, precipitation: 20590; temperature: 7280
  • #26: We often divide the data for a simulation model project into three equal parts: one set for initial model calibration or training; one set for further calibration or fine tuning; and one test set for validation of the model.
  • #28: http://en.wikipedia.org/wiki/Correlation Formula (1): dividing the sample covariance between the two variables by the product of their sample standard deviations. Formula (2): (Xi − X̄)/sx, X̄ and sx are the standard score, sample mean, and sample standard deviation (equal result as above). --- http://en.wikipedia.org/wiki/Anscombe%27s_quartet Anscombe, Francis J. (1973) Graphs in statistical analysis. American Statistician, 27, 17–21. A set of four different pairs of variables created by Francis Anscombe. All four y variables have the same mean (7.5) and variance (about 4.12). All pairs have the same correlation (0.816) and regression line (y = 3 + 0.5x). However, as can be seen on the plots, the distribution of the variables is very different.
  • #29: Residuals
  • #32: KRAK: http://www.krak.dk/query?mop=aq&mapstate=7%3B9.305588071850734%3B56.61105751259899%3Bh%3B9.282591620463698%3B56.61775781407488%3B9.328584523237769%3B56.60435721112311%3B853%3B469&what=map_adr# Google Maps: http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=107144586665622662057.00045ff98921bd0418037&ll=56.606941,9.297695&spn=0.055554,0.150204&t=h&z=13
  • #35: Illustration of the 3-way cube model compared to the more common data array (2 variable dimensions, 2-way, bi-linear)
  • #36: Illustration of the 3-way cube model compared to the more common data array (2 variable dimensions, 2-way, bi-linear)
  • #37: Box-plot of the trait scores to illustrate the effect of the preprocessing. First row is no preprocessing; row 2 is mean-centering (centering across mode 1, samples); last row is auto-scale (centering across mode 1 and scaling across mode 2, traits). Mean centering removes the absolute intensity information (the mean for each variable is subtracted from the individual data values). This pre-processing strategy is applied so that the model does not focus on the variables with the highest numerical values (intensity). Scaling: In general, scaling a variable in the data can be viewed as a multiplication of the corresponding column vector entries with some number. If the significances of the variables to the model are known prior to modeling, then it might be a good idea to upscale the highly relevant variables. In contrast, if a variable is supposed to bear merely noise, then its significance must be downscaled. However this is a rare case in reality. Therefore, unit-variance scaling (UV-scaling) is most often used. Moreover scaling itself is sometimes associated with UV-scaling. (Johann Gasteiger, and Dr. Thomas Engel (editors). 2003. Chemoinformatics: a textbook. Wiley-VCH, Weinheim. ISBN 9783527306817. Page 214)
  • #38: NGB6300 (accide 9039, FRO) observed at Priekuli, Latvia in 2003, replicate 2 is highlighted.NGB776 (accide 8510, SWE) observed at Landskrona, Sweden in 2002 (both replicates) are highlighted. Replicate 2 (LYR312) largest residual, replicate 1 (LYR311) below.
  • #40: Map illustrating the first successful split-half subsets. Set 1: NGB6300, NGB27, NGB469, NGB776, NGB4701, NGB2072, NGB4641 are indicated with blue placemarks. Set 2: NGB792, NGB13458, NGB9529, NGB468, NGB775, NGB456, NGB2565 are indicated with red placemarks. Map of the second good split-half. Set 1: NGB456, NGB9529, NGB469, NGB2072, NGB468, NGB4641, NGB776 are indicated with blue placemarks. Set 2: NGB4701, NGB27, NGB2565, NGB792, NGB13458, NGB6300, NGB775 are indicated with red placemarks.
  • #41: http://www.vias.org/science_cartoons/regression.html
  • #42: http://en.wikipedia.org/wiki/Correlation, http://en.wikipedia.org/wiki/Coefficient_of_determination, http://en.wikipedia.org/wiki/Statistical_model_validation Table of critical values for r: http://www.runet.edu/~jaspelme/statsbook/Chapter%20files/Table_of_Critical_Values_for_r.pdf Table of critical values for r: http://www.gifted.uconn.edu/siegle/research/Correlation/corrchrt.htm Table of critical values for r: http://www.jeremymiles.co.uk/misc/tables/pearson.html
  • #45: http://www.fallingrain.com/world/NO/1/Bjorke.html http://www.fallingrain.com/world/LG/5/Priekuli.html http://commons.wikimedia.org/wiki/File:Rain_on_grass2.jpg
  • #49: Dr Harold Bockelman extracted the trait data (C&E) from the GRIN database (USDA-ARS, National Plant Germplasm System, Germplasm Resources Information Network, online http://www.ars-grin.gov/npgs) USDA GRIN, trait data online: http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?1041
  • #51: * Bouhssini, M., Street, K., Joubi, A., Ibrahim, Z., Rihawi, F. (2009). Sources of wheat resistance to Sunn pest, Eurygaster integriceps Puton, in Syria. Genetic Resources and Crop Evolution. URL http://dx.doi.org/10.1007/s10722-009-9427-1, http://www.springerlink.com/content/587250g7qr073636/ (Recent FIGS study at ICARDA, Syria.)
  • #52: Closing slide with lifeboat. Source image (Google images): http://www.nut.no/html/body_hyperbaric_lifeboat.html