Searching for traits in PGR collections using Focused Identification of Germplasm Strategy (FIGS)

Abdallah Bari, Kenneth Street, Michael Mackay, Eddy De Pauw, Dag Endresen, Ahmed Amri, Kumarse Nazari and Ammor Yahiaoui

CIAT, Palmira, Colombia
14 March 2012



Grain
Research &
Development
Corporation
Content
              • Background
                 – PGR - traits
                 – FIGS - traits
              • Objective
                 – Develop a priori information
                 – Develop best-bet subset of accessions with traits
              • Datasets
                 – Trait data
                 – Environmental data
              • Methodologies
                 – Data preparation
                 – Modeling techniques
              • Results/Discussion
                 – Sub-setting (accessions/variables)
                 – “Hot spots”
              • Conclusion
ICARDA

              ICARDA’s worldwide presence
              International Center for Agricultural Research in the Dry Areas (ICARDA)
ICARDA

              PGR centers of origin and diversity
PGR contribution
              Traits of importance to agriculture

                  – phenological adaptation (short growth
                    duration),
                  – efficient use of water,
                  – resistance to biotic stresses (diseases
                    and insects),
                  – tolerance to abiotic stresses (such as
                    drought and salinity), and
                  – superior grain quality




              [Image: plant pre-evaluation]
PGR Challenges


              • 50,000 - 60,000 traits (loci)
              • 7 million accessions
              • 1,400 genebanks

              [Image: seed samples]
PGR Challenges

              A needle in a haystack

               PGR users want variation for
               specific traits and a hundred
               germplasm accessions to evaluate.
PGR Challenges and Concerns


                • Size of collections
                  – Addressed by Brown et al. 1999


                • Cost of evaluating accessions
                  lacking the desired trait
                  – Addressed by Gollin et al. 2000
Objective
               FIGS searches genetic resources data (germplasm collections) to detect
               trait-environment patterns or relationships, which serve as a priori
               information.

               This a priori information is then used to develop predictive models to find
               novel genetic variation for the traits of interest and where it is most likely
               to occur.

               Quantification of trait-environment relationship → A priori information → Develop trait subsets → Utilization of genetic resources
Origin of FIGS approach

              Boron toxicity of wheat and barley – early FIGS examples




              [Map: Mediterranean Sea]

              Wheat landraces from marine-origin soils in the Mediterranean region provided
              all the genetic variation needed to produce boron-tolerant varieties.
                                                                          M.C. Mackay, 1995
FIGS approach
               “FIGS applies to plant genetic resources (stored collections)
                the same selection pressure exerted on plants by evolution.”




               PGR (biodiversity) → collection → sampling → core

               PGR (biodiversity) → sampling → trait → user
FIGS approach
               FIGS has helped breeders identify
                 long sought-after plant traits such
                 as resistance to:

                  –   Net blotch (barley),
                  –   Powdery mildew,
                  –   Russian wheat aphid (RWA) and
                  –   Sunn pest.




                                  Braidotti, G. 2009. Keys to the gene bank, Biotechnology.
                                  Partners in Research for Development 16-17.
Sunn pest resistance trait
              8 landrace accessions from Afghanistan and 2 from Tajikistan
              were identified as resistant at the juvenile stage.

              Mapping populations are now being developed.
FIGS approach to Pm
                  16,000 wheat landraces
                      → FIGS applied
                  1,300 selected
                      → Phenotyping
                  211 accessions between R and IR
                      → Genotyping
                  7 new alleles

                  40% yielded accessions that were resistant to the isolates used.

                  At least 2 have a new race specificity.
                  100 years of classical genetics = 7 alleles

              Kaur K; Street K; Mackay M; Yahiaoui N; Keller B (2008). Allele mining and sequence
              diversity at the wheat powdery mildew resistance locus Pm3. 11th IWGS, 24-29 Aug.,
              Brisbane.
Locating new Pm3 alleles
              Distribution of the seven new functional alleles of Pm3
              (out of 96.2% of the total set screened):

               Turkey
               Afghanistan
               Iran
               Pakistan and
               Armenia
The FIGS picture
              Genotypes x Environments x Time¹ = Genetic Variation

              Can we use the same evolutionary principles
              in reverse to identify the environments that
              ‘engender’ trait-specific genetic variation?

              Environments x Traits x Time = Trait variation
              (ExT)?

              ¹ plus some selection
Examples of eco-geographic variation of traits linked to environmental influences
   Environment influence | Trait | Species | Reference
   Low altitudes, high winter temp., low summer rain, spring cloudiness | Cyanogenesis | Trifolium repens | Pederson, Fairbrother et al. 1996
   Aridity | Seed dormancy, early flowering, high seed to pod ratio | Annual legumes | Ehrman and Cocks 1996
   Soil type | Tolerance to boron toxicity | Bread wheat | Mackay (1990)
   Altitude, winter temp., RWA distribution | Russian wheat aphid (RWA) resistance | Bread wheat | Bohssini et al., accepted for publication 2008
   Temperature, aridity | Drought resistance | Triticum dicoccoides | Peleg, Fahima et al. 2005
   Altitude | Glume colour and beak length | Durum wheat | Bechere, Belay et al. 1996
   Climate, soil and water availability | Heading date, culm length, biomass, grain yield and its components | Triticum dicoccoides | Beharav and Nevo 2004
   Precipitation, minimum January temperature, altitude | Glutenin diversity | Durum wheat | Vanhintum and Elings 1991
   Temperature, aridity | More efficient RUBISCO activity | Woody perennials | Galmes et al. 2005
   Water relations, temperature | Hordatine accumulation (disease defence) | Barley | Batchu, Zimmermann et al. 2006

   After M. C. Mackay
FIGS system
    [Diagram: user-defined needs enter through an interface; the database of PGR collections is passed through filters (type of material, evaluation data, collection site, other information, size limit) to produce a new subset. Subset sizes shown: 250, 500, 750, 1500.]

    See www.figstraitmine.com
    After M. C. Mackay 1995
Mining natural variation
              By linking traits and environments (and associated selection pressures)
              with genebank accessions (e.g. landraces and crop relatives) we can
              ‘focus’ in on those accessions most likely to possess trait-specific
              genetic variation.
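
              The focusing step can be pictured as a filter over accession records. The sketch below is only an illustration under stated assumptions: the record fields, the `material` labels, the aridity range and the function name `figs_subset` are hypothetical and are not taken from the FIGS software. It is written in Python for convenience.

```python
from dataclasses import dataclass

@dataclass
class Accession:
    site_code: str
    material: str        # hypothetical label, e.g. "landrace" or "cultivar"
    aridity_feb: float   # illustrative environmental value at the collecting site

def figs_subset(accessions, material, aridity_range, size_limit):
    """Keep accessions of one material type whose site falls in an aridity range."""
    low, high = aridity_range
    hits = [a for a in accessions
            if a.material == material and low <= a.aridity_feb <= high]
    # Rank by the environmental value (lowest first), then cut to the size limit.
    hits.sort(key=lambda a: a.aridity_feb)
    return hits[:size_limit]

# Illustrative records; the material labels are invented for this example.
collection = [
    Accession("ETH-S893", "landrace", 0.246),
    Accession("NS_339", "landrace", 0.552),
    Accession("ETH-S1153", "cultivar", 0.390),
    Accession("NS_415", "landrace", 0.419),
    Accession("NS_525", "landrace", 0.352),
]
subset = figs_subset(collection, "landrace", (0.2, 0.45), size_limit=2)
print([a.site_code for a in subset])
```

              In practice the environmental filter would be driven by a fitted trait-environment model rather than a hand-picked range.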




              [Figure: plot with axes Longitude (0 to 150) and Latitude (0 to 60); panels labelled Environment, Trait and FIGS subset]
FIGS approach – summarized
              Focused Identification of Germplasm Strategy

              [Diagram: each accession (G) links an environment (E), obtained by geo-referencing of collecting places, to a trait (T), obtained by evaluation (phenotyping).]
Eco-climate data (X)
               ICARDA eco-climatic database, average:
               annual temperature (front), annual
               precipitation (middle), and winter
               precipitation (back) (De Pauw 2008)



              Climate data (X as independent variables)
              site_code1 prec01    prec02    prec03    prec04    prec05    …..   ari01      ari02      ari03      ari04      ari05
              ETH-S893          25        36        72    154.22    148.88            0.167      0.246      0.439      1.098      1.169
              ETH-S1222         29        44        92    167.46       168            0.223      0.344      0.646      1.354      1.612
              NS_339            44        67    130.43    177.96    185.74            0.351      0.552      0.949      1.457      1.751
              ETH-S1153         36        48        86    140.92    131.94             0.28       0.39      0.609      1.108      1.078
              NS_415            32     46.61     95.42     150.3       157            0.271      0.419      0.732      1.289      1.437
              NS_424         31.94        45        90    143.62       150            0.257       0.38      0.641      1.146      1.272
              ETH64:55          28     38.26        57     97.57        81            0.247      0.344       0.45      0.834      0.662
              NS_525            28        39        57     97.13     80.78            0.248      0.352      0.452      0.836      0.669
              NS_526            27        39        57     97.01     80.77            0.241      0.354      0.455      0.842       0.68
              NS_559            23        40     61.89    129.04       102            0.226      0.397      0.511      1.206      0.998
              ...
              Source: International Center for Agricultural Research in the Dry Areas (ICARDA)
Eco-climate data (X)
              Layers used in the stem rust studies:
              •   Precipitation (rainfall)
              •   Maximum temperatures
              •   Minimum temperatures
              + Derived GIS layers such as:
              •   Potential evapotranspiration (water-loss)
              •   Agro-climatic Zone (UNESCO classification)
              •   Moisture/Aridity index
                  (mean values for month and year)




Trait data set (Y)
              Trait data (Y as dependent variable)

              http://www.news.cornell.edu/

              site_code1    R_state0  R_state1  R_state2  R_state3  R_state4  R_state5  R_state6  R_state7  R_state8  R_state9
              ETH-S893             0         0         0         0         0         0         0         0         1         0
              ETH-S1222            0         0         0         0         0         0         0         0         0         1
              NS_339               0         0         0         0         0         0         1         0         1         0
              ETH-S1153            0         0         0         0         2         1         3         0         0         0
              NS_415               0         0         0         0         0         0         1         0         0         0
              NS_424               0         0         0         1         0         0         0         0         0         0
              ETH64:55             0         0         1         0         0         0         0         0         0         0
              NS_525               0         0         0         0         0         0         1         0         0         0
              NS_526               0         1         2         1         2         0         3         0         0         0
              NS_559               2         5         1         0         0         2         0         0         0         0
              ETH64:53             0         0         1         0         0         0         0         0         0         0
              ...
              Source: USDA National Genetic Resources Program (NGRP) GRIN database
Searching for stem rust resistance trait - concerns
               Stem rust is spreading to wheat production areas.

               http://www.news.cornell.edu/
Stem rust on wheat landraces – trait data




              Green dots indicate collecting sites for resistant wheat
              landraces and red dots collecting sites for susceptible
              landraces.

              USDA GRIN, trait data online:
              http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?65049

              Field experiments conducted in Minnesota by Don McVey.
Data preparation
              Power relationship   ~   2(p) (spread)

              Climate data (X as independent variables)
              site_code     …..    ari02    …..
              ETH-S893             0.246
              ETH-S1222            0.344
              NS_339               0.552
              ETH-S1153            0.390
              NS_415               0.419
              NS_424               0.380
              ETH64:55             0.344
              NS_525               0.352
              NS_526               0.354
              NS_559               0.397

              [Figure: two histograms of the Aridity or Moisture Index during February (y-axis: Frequency); the x-axis runs from 0 to 15 in the left panel and from -4 to 4 in the right panel]
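
              As a rough illustration of this kind of data preparation, the sketch below applies a power transformation to the February aridity values listed above and then standardizes them. The Box-Cox form, the choice p = 0 (natural log) and the function names are assumptions made for the example; the slides do not state which power the authors used. Python is used for convenience.

```python
import math
import statistics

def power_transform(values, p):
    """Box-Cox style power transform; p = 0 is taken as the natural log."""
    if p == 0:
        return [math.log(v) for v in values]
    return [(v ** p - 1.0) / p for v in values]

def standardize(values):
    """Centre on the mean and scale by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# February aridity index (ari02) for the ten sites listed above.
ari02 = [0.246, 0.344, 0.552, 0.390, 0.419, 0.380, 0.344, 0.352, 0.354, 0.397]

# Assumed power p = 0; a different p may suit the full dataset better.
z = standardize(power_transform(ari02, p=0))
print([round(v, 2) for v in z])
```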




Platform

              R language (development of algorithms), used for modeling:
              >   Data transformation ( )
              >   Model <- model(trait ~ climate)
              >   Measuring accuracy metrics
              >   ….

              Geographical Information System (GIS), used for generation of environmental data:
                  ArcGIS
                  Environmental data/layers (surfaces)
Modeling framework

                                  Trait data (Y)                Environmental data (X)

                                                     Y ~ f(X)

              First, a linear approach, irrespective of the underlying distributions describing the data

              Yi ~

              X is the set of explanatory variables or predictors (climate data), where X ∈ R^m.
              Y is either a categorical (label) or a numerical response (trait descriptor states).

              Yi ~
Modeling framework

              •   Principal Component Analysis (PCA)
              •   Partial Least Squares (PLS)
              •   Random Forest (RF)
              •   Support Vector Machines (SVM)
              •   Neural Networks (NN)


                  Bari A., Street K., Mackay M., Endresen D.T.F., De Pauw E. & Amri A.
                  (2011) Focused identification of germplasm strategy (FIGS) detects wheat
                  stem rust resistance linked to environmental variables.
                  Genetic Resources and Crop Evolution.
                  http://www.springerlink.com/content/m7140x68v2065113/fulltext.pdf
Principal Component Analysis (PCA)
              B is a matrix of coefficients.

              The prediction was initially carried out using the number of principal
              components (PCs) that account for 95% of the explained variance,
              followed by adding one component at a time until the error reached a
              minimum.
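
              The two-step choice of the number of components can be sketched as follows. This is a minimal illustration, not the authors' R code: the eigenvalues, the validation errors and both function names are hypothetical, and the search stops at the first point where the error no longer decreases.

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative share of
    variance reaches the threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(eigenvalues)

def tune_components(start, errors):
    """From `start`, add one component at a time while the error keeps decreasing."""
    k = start
    # errors[k - 1] is the validation error of a model with k components.
    while k < len(errors) and errors[k] < errors[k - 1]:
        k += 1
    return k

# Hypothetical eigenvalues and validation errors, for illustration only.
eigenvalues = [5.0, 2.5, 1.5, 0.6, 0.2, 0.1, 0.1]
errors = [0.40, 0.33, 0.30, 0.28, 0.25, 0.27, 0.29]

k95 = components_for_variance(eigenvalues)
best_k = tune_components(k95, errors)
print(k95, best_k)
```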




Partial Least Squares (PLS)
              PLS:
              A product of factors and their loadings (regression coefficients), where
              both the environmental dataset and the trait dataset are taken into account
              simultaneously.

              The prediction was initially carried out using the number of components
              (PCs) that account for 95% of the explained variance,
              followed by adding one component at a time until the error reached a
              minimum.




Random Forest (RF)
              [Diagram: Data → bootstrapping (with replacement) → training set and out-of-bag (OOB) set → ntree 1, ntree 2, ..., ntree 1000]
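
              The bootstrapping step behind each tree can be sketched as below. This shows only how the training (in-bag) and out-of-bag sets arise for 1000 trees; it does not grow any trees. The sample size, the seed and the function name are assumptions chosen for the example.

```python
import random

def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; unused indices form the OOB set."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return in_bag, out_of_bag

rng = random.Random(42)          # fixed seed so the sketch is repeatable
n_samples, n_trees = 200, 1000   # hypothetical sample size; ntree = 1000 as on the slide

oob_fractions = []
for _ in range(n_trees):
    in_bag, out_of_bag = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(out_of_bag) / n_samples)

# The average OOB share is expected to be close to (1 - 1/n)^n, roughly 0.37.
mean_oob = sum(oob_fractions) / n_trees
print(round(mean_oob, 3))
```

              Each tree is fitted on its in-bag sample and can be checked against its own OOB sample, which gives an error estimate without a separate test set.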

Support Vector Machines (SVM)
              SVM is a learning-based technique that maps input data to a
              high-dimensional space and optimally separates the mapped input
              into the respective classes.

              [Figure: mapped points separated into two classes]

              The mapping goes from an l-dimensional space (the input variable space)
              into a k-dimensional space, where k is higher than l.
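
              The mapping idea can be illustrated with an explicit feature map from l = 1 to k = 2 dimensions. This is not SVM training: the weights `w` and offset `b` are picked by hand, whereas an SVM would learn a separating boundary from data, and the data points are hypothetical.

```python
def feature_map(x):
    """Map a 1-dimensional input to 2 dimensions: x -> (x, x^2)."""
    return (x, x * x)

# Hypothetical 1-D data: class +1 lies on both sides of class -1,
# so no single threshold on x separates the two classes.
points = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [+1, +1, -1, -1, -1, +1, +1]

def classify(x, w=(0.0, 1.0), b=-2.0):
    """Linear decision rule in the mapped space: sign(w . phi(x) + b)."""
    phi = feature_map(x)
    score = w[0] * phi[0] + w[1] * phi[1] + b
    return +1 if score > 0 else -1

predicted = [classify(x) for x in points]
print(predicted == labels)
```

              In the mapped space a straight line (here x^2 = 2) separates the classes, which is the effect the higher-dimensional mapping is meant to achieve.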
Neural Networks (NN)
              Neural Networks (RBF)

              [Figure: network diagram with inputs x1, x2, ..., xp and output F(x)]
              [Figure: error versus number of epochs, with curves for the training set and the test set]
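
A radial basis function (RBF) network of the kind named on this slide can be sketched as a weighted sum of Gaussian units applied to the inputs x1 ... xp. The function `rbf_network`, its `width` and `bias` parameters, and the centres and weights below are hypothetical values for illustration only; the slides do not give the network configuration.

```python
import math

def rbf_network(x, centers, weights, width=1.0, bias=0.0):
    """F(x) = bias + sum_j w_j * exp(-||x - c_j||^2 / (2 * width^2))."""
    total = bias
    for c, w in zip(centers, weights):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        total += w * math.exp(-sq_dist / (2.0 * width ** 2))
    return total

centers = [(0.0, 0.0), (1.0, 1.0)]  # hypothetical hidden-unit centres
weights = [1.0, -1.0]               # hypothetical output weights

score = rbf_network((0.0, 0.0), centers, weights)
print(score)
```

In training, the weights (and often the centres) are adjusted over successive epochs, which is where the error-versus-epochs curve comes from.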

Optimization/tuning
              [Figure: error versus number of PCs, LVs or epochs, with curves for the training set and the test set]

              Trend of output error versus the number of components (PCs/LVs) or epochs (NN)
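
One common way to use such a curve is to pick the number of components (or epochs) at which the test-set error is lowest, since the training error alone keeps falling. The sketch below assumes that rule; the helper `best_complexity` and the error values are illustrative and are not read from the slides.

```python
def best_complexity(test_errors):
    """Return (n, error): the 1-based component/epoch count with the lowest test error."""
    best_index = min(range(len(test_errors)), key=lambda i: test_errors[i])
    return best_index + 1, test_errors[best_index]

# Illustrative values only: training error keeps decreasing,
# test error reaches a minimum and then rises again.
train_errors = [0.50, 0.45, 0.42, 0.40, 0.38, 0.36]
test_errors = [0.52, 0.47, 0.44, 0.43, 0.45, 0.48]

n_best, err_best = best_complexity(test_errors)
print(n_best, err_best)
```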
Accuracy metrics
              Parameters that provide information on the specificity
              (“trait agro-climate”)

              Confusion matrix (2-by-2 contingency table)

                                           Observed
                                           Resistant    Susceptible
              Predicted     Resistant      a            b
                            Susceptible    c            d

              Sensitivity = a / (a + c)
              Specificity = d / (b + d)

              Sensitivity and specificity are indicators of the model's ability to correctly classify observations.
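
The two ratios can be computed directly from the four cells of the confusion matrix. The sketch below uses the cell names a, b, c and d from the table; the counts are hypothetical and serve only as a worked example.

```python
def sensitivity_specificity(a, b, c, d):
    """Cells follow the 2-by-2 table (rows = predicted, columns = observed):
    a = predicted resistant / observed resistant
    b = predicted resistant / observed susceptible
    c = predicted susceptible / observed resistant
    d = predicted susceptible / observed susceptible
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return sensitivity, specificity

# Hypothetical counts, not taken from the slides.
sens, spec = sensitivity_specificity(a=30, b=10, c=20, d=40)
print(sens, spec)
```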




Accuracy metrics
              Parameters that provide information on the specificity
              (“trait agro-climate”)

              High AUC (area under the ROC curve) values indicate a potential trait-environment relationship.

              [Figure: ROC curve and pdfs of trait distribution]
              The ROC curve and the resulting pdfs of trait distribution (trait states)
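
The AUC can be read as the probability that a randomly chosen resistant accession receives a higher model score than a randomly chosen susceptible one. The sketch below computes it by pairwise comparison under that interpretation; the function `auc` and the score lists are hypothetical and not taken from the slides.

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model scores for the two trait states.
resistant = [0.9, 0.8, 0.6, 0.4]
susceptible = [0.7, 0.3, 0.2, 0.1]

auc_value = auc(resistant, susceptible)
print(auc_value)
```

A value near 1 means the two score distributions barely overlap, while a value near 0.5 means they overlap almost completely.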
Accuracy metrics
              Randomness

              [Figure: ROC curve and pdfs of trait distribution]
Content
              • Background
                 – PGR traits
                 – FIGS
              • Objective
                 – Develop a priori information
                 – Develop best bet subset of accs with traits
              • Datasets
                 – Trait data
                 – Environmental data
              • Methodologies
                 – Data preparation
                 – Modeling techniques
              • Results/Discussion
                 – Sub-setting (accessions/variables)
                 – “Hot spots”
              • Conclusion
Data preparation - Raw data
              [Figure: RMSE versus number of components; ROC curve (true positive rate versus false positive rate); density distribution by trait]

              PCs = 42
              AUC = 0.67
              Kappa = 0.40
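
The Kappa statistic reported with these results is not defined on the slides. Assuming it is Cohen's kappa computed from a 2-by-2 confusion matrix, it measures agreement between predicted and observed classes beyond what chance alone would produce. The function `cohen_kappa` and the counts below are hypothetical.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2-by-2 table (rows = predicted, columns = observed)."""
    n = a + b + c + d
    observed = (a + d) / n  # observed agreement
    # Agreement expected by chance from the row and column totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1.0 - expected)

# Hypothetical counts, not taken from the slides.
kappa = cohen_kappa(a=30, b=10, c=20, d=40)
print(round(kappa, 2))
```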
Data preparation – Transformed data
              [Figure: RMSE versus number of components; ROC curve (true positive rate versus false positive rate); density distribution by trait]

              PCs = 42
              AUC = 0.71
              Kappa = 0.45
Data preparation - Raw data (PLS)
              [Figure: RMSE versus number of components; ROC curve (true positive rate versus false positive rate); density distribution by trait]

              LVs = 30
              AUC = 0.70
              Kappa = 0.43
Data preparation – Transformed data
              [Figure: RMSE versus number of components; ROC curve (true positive rate versus false positive rate); density distribution by trait]

              LVs = 22
              AUC = 0.71
              Kappa = 0.44
Optimization process
              [Figure: two panels titled R_CALC showing RMSEP versus number of components]

              Root mean square error of prediction (RMSEP) for PCA (left) and PLS (right) models. Arrows indicate
              the minimum errors, where the number of components (PCs and LVs) was selected for
              prediction (red/discontinuous line = test data, continuous line = training set).
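
For reference, RMSEP is the square root of the mean squared difference between observed and predicted values on the data used for evaluation. The sketch below assumes a numerically coded trait; the function `rmsep` and the example values are hypothetical and not taken from the slides.

```python
import math

def rmsep(observed, predicted):
    """Root mean square error of prediction over paired observed/predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical trait coding (1 and 0) and model predictions.
observed = [1, 0, 1, 0]
predicted = [0.8, 0.4, 0.6, 0.0]

error = rmsep(observed, predicted)
print(round(error, 3))
```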

PCA
              PC2
              Few components ~ random

              [Figure: ROC curve (true positive rate versus false positive rate); density distribution per R_CALC (legend: Resistant, Susceptible)]
PCA
              PC5

              [Figure: ROC curve (true positive rate versus false positive rate); density distribution per R_CALC (legend: Resistant, Susceptible)]
PLS
              LV2
              2 latent variables of PLS perform better than 2 PCs of PCA

              [Figure: ROC curve (true positive rate versus false positive rate); density distribution per R_CALC (legend: Resistant, Susceptible)]
PLS
              LV10

              [Figure: ROC curve (true positive rate versus false positive rate); density distribution per R_CALC (legend: Resistant, Susceptible)]
PCA (optimized)
              [Figure: ROC curve (true positive rate versus false positive rate); density versus prediction]
PLS (optimized)
              [Figure: ROC curve (true positive rate versus false positive rate); density versus prediction]
RF
              [Figure: ROC curve (true positive rate versus false positive rate); density versus prediction]
SVM
              [Figure: ROC curve (true positive rate versus false positive rate); density versus prediction]
NN
              [Figure: ROC curve (true positive rate versus false positive rate); density versus prediction]
Random (PCA)
              Completely random distribution of the stem rust resistance trait

              AUC ~ 0.5

              [Figure: ROC curve (true positive rate versus false positive rate); RMSEP versus number of components for R_CALC]
                                   1.0




                                                                                                             0.1




                                                                                                                                  0.465
                                                                                               0.2

                                                                                                                                                                                      Partially

                                                                                                                                  0.460
                                   0.8




                                                                               0.3
                                                                                                                                                                                      random
                                                                                                                                  0.455
              True positive rate

                                   0.6




                                                             0.4
                                                                                                                                  0.450                                               distribution
                                                                                                                          RMSE




                                                                                                                                                                                      of trait of
                                   0.4




                                                                                                                                  0.445




                                                       0.5
                                                                                                                                                                                      stem rust
                                   0.2




                                                                                                                                  0.440




                                                                                                                                                                                      resistance
                                                0.6
                                                                                                                                  0.435
                                   0.0




                                               0.8
                                               0.7


                                         0.0           0.2           0.4                 0.6         0.8         1.0

Grain                                                                False positive rate
                                                                                                                                          0   10   20       30         40   50   60

Research &                                                                                                                                          Number of components
Development                                                                                                                                                                                          57
Corporation
Stem rust hot spots
               [Figure: map of stem rust hot spots. Axes: longitude (0 to 150), latitude (0 to 60).]
Stem rust hot spots
               Areas where resistance is likely to occur (longitude-wise)
               [Figure: two map panels. Axes: longitude (0 to 150), latitude (0 to 60).]
PLS (optimized)
              Areas where resistance is likely to occur (dark red)
               [Figure: contour map of predicted resistance (color scale -0.2 to 0.8). Axes: longitude (X, 0 to 120), latitude (Y, 10 to 60). Inset: semivariance vs. distance.]
Random Forest (RF)
              Areas where resistance is likely to occur (dark red)
               [Figure: contour map of predicted resistance (color scale 0.0 to 0.8). Axes: longitude (X, 0 to 120), latitude (Y, 10 to 60). Inset: semivariance vs. distance.]
Support Vector Machines (SVM)
               Areas where resistance is likely to occur (dark red)
               [Figure: contour map of predicted resistance (color scale 0.0 to 1.0). Axes: longitude (X, 0 to 120), latitude (Y, 10 to 60).]
Results – stem rust on wheat
               Dataset (unit)            PPV                 LR+                 Estimated gain

               Stem rust (accession)     0.54 (0.50-0.59)    3.07 (2.66-3.54)    1.95 (1.79-2.09)
               Random                    0.29 (0.26-0.33)    1.04 (0.90-1.20)    1.03 (0.91-1.16)
               (28 % resistant samples)

               Stem rust (site)          0.50 (0.40-0.60)    4.00 (2.85-5.66)    2.51 (2.02-2.98)
               Random                    0.19 (0.13-0.26)    0.94 (0.63-1.39)    0.95 (0.66-1.33)
               (20 % resistant samples)

               PPV = Positive Predictive Value; LR+ = Positive Diagnostic Likelihood Ratio

               Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw (2011). Predictive
               association between biotic stress traits and ecogeographic data for wheat and barley
               landraces. Crop Science 51: 2036-2055. DOI: 10.2135/cropsci2010.12.0717
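               The three columns of the table above can be derived from a 2x2 table of predicted vs. observed
               resistance. The sketch below is illustrative only: the counts are hypothetical (chosen to give
               a PPV of 0.54 at 28 % prevalence), and the definition of estimated gain as PPV divided by
               prevalence is an assumption that agrees with the reported values (0.54 / 0.28 and 0.50 / 0.20),
               not a formula stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing predicted with observed resistance."""
    tp: int  # predicted resistant, observed resistant
    fp: int  # predicted resistant, observed susceptible
    fn: int  # predicted susceptible, observed resistant
    tn: int  # predicted susceptible, observed susceptible


def ppv(c: Confusion) -> float:
    """Positive predictive value: share of predicted-resistant that are resistant."""
    return c.tp / (c.tp + c.fp)


def lr_plus(c: Confusion) -> float:
    """Positive diagnostic likelihood ratio: sensitivity / (1 - specificity)."""
    sensitivity = c.tp / (c.tp + c.fn)
    false_positive_rate = c.fp / (c.fp + c.tn)
    return sensitivity / false_positive_rate


def prevalence(c: Confusion) -> float:
    """Share of resistant samples in the whole set (the chance hit rate)."""
    return (c.tp + c.fn) / (c.tp + c.fp + c.fn + c.tn)


def estimated_gain(c: Confusion) -> float:
    """ASSUMPTION: gain = PPV / prevalence, i.e. hit rate relative to chance."""
    return ppv(c) / prevalence(c)


# Hypothetical counts for 1000 accessions, 28 % of them resistant.
example = Confusion(tp=162, fp=138, fn=118, tn=582)

if __name__ == "__main__":
    print(f"PPV  = {ppv(example):.2f}")
    print(f"LR+  = {lr_plus(example):.2f}")
    print(f"gain = {estimated_gain(example):.2f}")
```

               A gain near 1 and an LR+ near 1, as in the "Random" rows, mean the selection does no better
               than drawing accessions by chance.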
Results – stem rust on wheat
               AUC = Area Under the ROC Curve (ROC = Receiver Operating Characteristic)

               Classifier method                         AUC                 Cohen’s Kappa

               Principal Component Regression (PCR)      0.69 (0.68-0.70)    0.40 (0.37-0.42)
               Partial Least Squares (PLS)               0.69 (0.68-0.70)    0.41 (0.39-0.43)
               Random Forest (RF)                        0.70 (0.69-0.71)    0.42 (0.40-0.44)
               Support Vector Machines (SVM)             0.71 (0.70-0.72)    0.44 (0.42-0.45)
               Artificial Neural Networks (ANN)          0.71 (0.70-0.72)    0.44 (0.42-0.46)

               Bari, A., K. Street, M. Mackay, D.T.F. Endresen, E. De Pauw, and A. Amri (2011). Focused
               Identification of Germplasm Strategy (FIGS) detects wheat stem rust resistance linked to
               environment variables. Genetic Resources and Crop Evolution [online first].
               doi:10.1007/s10722-011-9775-5; Published online 3 Dec 2011.
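               For readers unfamiliar with the two metrics in the table above, the following is a minimal
               sketch of how AUC and Cohen's Kappa can be computed from classifier scores. The scores, labels
               and the 0.5 threshold are made-up illustration values, and the function names are hypothetical;
               this is not the authors' evaluation code.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cohens_kappa(predicted, observed):
    """Agreement between predicted and observed classes, corrected for chance."""
    n = len(observed)
    p_observed = sum(p == o for p, o in zip(predicted, observed)) / n
    p_expected = 0.0
    for cls in set(predicted) | set(observed):
        p_expected += (predicted.count(cls) / n) * (observed.count(cls) / n)
    return (p_observed - p_expected) / (1 - p_expected)


# Illustrative data: 1 = resistant, 0 = susceptible.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold

if __name__ == "__main__":
    print("AUC   =", auc(scores, labels))
    print("kappa =", cohens_kappa(predicted, labels))
```

               An AUC of 0.5 and a Kappa of 0 correspond to chance-level prediction, which is the baseline
               against which the reported values of about 0.7 and 0.4 should be read.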
Results – stem rust on wheat
               Classifier method         PPV                 LR+                  Estimated gain

               kNN (pre-study)           0.29 (0.13-0.53)    5.61 (2.21-14.28)    4.14 (1.86-7.57)
               SIMCA                     0.28 (0.14-0.48)    5.26 (2.51-11.01)    4.00 (2.00-6.86)
               Ensemble classifier       0.33 (0.12-0.65)    8.09 (2.23-29.42)    6.47 (2.05-11.06)
               Random                    0.06 (0.01-0.27)    0.95 (0.13-6.73)     0.97 (0.16-4.35)
               (pre-study, 550 + 275 accessions)

               Ensemble                  0.26 (0.22-0.30)    2.78 (2.34-3.31)     2.32 (2.00-2.68)
               Random                    0.11 (0.09-0.15)    1.02 (0.77-1.36)     0.95 (0.77-1.32)
               (blind study, 825 + 3738 accessions)

               PPV = Positive Predictive Value; LR+ = Positive Diagnostic Likelihood Ratio

               Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw, K. Nazari, and A. Yahyaoui
               (2012). Sources of Resistance to Stem Rust (Ug99) in Bread Wheat and Durum Wheat
               Identified Using Focused Identification of Germplasm Strategy (FIGS). Crop Science
               [online first]. doi: 10.2135/cropsci2011.08.0427; Published online 8 Dec 2011.
Results of stem rust (Ug99) on wheat

              4563 wheat landraces were screened for Ug99.

              10.2 % of the accessions were resistant.

              The true trait scores for 20% of the accessions (825 samples) were made available
              for model calibration.

              The 500 accessions most likely to be resistant were selected from the
              3738 accessions whose true scores were hidden.

              The selection contained 25.8 % resistant samples, 2.3 times
              more than expected by chance.
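              The bookkeeping behind these figures can be checked with a few lines of arithmetic. This is
              a sketch under one assumption: the chance baseline is taken to be the random-selection PPV of
              0.11 reported for the blind study in the classifier results table, which is what reproduces
              the stated factor of 2.3. The variable names are illustrative.

```python
total_screened = 4563      # wheat landraces screened for Ug99
calibration_set = 825      # accessions with true scores revealed
hidden_set = total_screened - calibration_set  # accessions with scores hidden

selected = 500                     # accessions picked as most likely resistant
resistant_share_selected = 0.258   # 25.8 % of the selection was resistant
random_share = 0.11                # ASSUMPTION: blind-study random PPV as baseline

# Number of resistant accessions in the selected subset.
resistant_in_selection = round(selected * resistant_share_selected)

# Hit rate of the selection relative to the chance baseline.
gain = resistant_share_selected / random_share

if __name__ == "__main__":
    print("hidden set size        :", hidden_set)
    print("resistant in selection :", resistant_in_selection)
    print(f"gain over chance       : {gain:.1f}")
```

              Dividing by the collection-wide 10.2 % instead would give a factor of about 2.5, so the 2.3 on
              the slide appears to refer to the baseline within the hidden set rather than the whole collection.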
Conclusion ...

              Results
               –   Raw data vs Transformed data
               –   PLS vs PCA
               –   Non-linear vs linear
               –   FIGS vs random (selection)


              Issues
               – Extent of variables (trait/agro-climate)
               – Phenology (adaptation)
               – Fuzzy approach (trait variation capture)




Grain
Research &
Development                                                 69
Corporation
Grain
Research &
Development
Corporation

More Related Content

PPTX
Transgene-free CRISPR/Cas9 genome-editing methods in plants
PDF
Crop plants with improved culture and quality traits for food, feed and othe...
PPTX
Domestication syndrome in crop plants
PPTX
Reverse Breeding: a tool to create homozygous plants from the heterozygous po...
PPTX
Speed breeding _Manoj CA
PPTX
AB QTL's
PPT
Fas-Track Breeding Approaches in Fruit Crops
PPTX
Breeding for quality traits in minor millets
Transgene-free CRISPR/Cas9 genome-editing methods in plants
Crop plants with improved culture and quality traits for food, feed and othe...
Domestication syndrome in crop plants
Reverse Breeding: a tool to create homozygous plants from the heterozygous po...
Speed breeding _Manoj CA
AB QTL's
Fas-Track Breeding Approaches in Fruit Crops
Breeding for quality traits in minor millets

What's hot (20)

PPTX
Allele mining in crop improvement
PPTX
Allele mining
PPTX
Marker assisted backcross breeding
PPTX
Role of molecular markers in vegetable crops
PPTX
Biotechnological applications in Male Sterility and Hybrid Breeding
PPTX
ADVANCES IN BIOTECHNOLOGY OF VEGETABLE CROPS
PPT
Diversity Array technology
PPTX
Genomic selection, prediction models, GEBV values, genomic selection in plant...
PPTX
Double Haploids in crop improvement.
PPTX
Marker-assisted Selection (MAS) in fruit crops
PPTX
Plant genetic resources in fruit science ankit
PPTX
Speed breeding presentation
PDF
Genomic Selection & Precision Phenotyping
PPTX
SPEED BREEDING AND ITS IMPLICATIONS IN CROP IMPROVEMENT
DOCX
Report- Genome wide association studies.
PPTX
development of hybrids in commercially important cole crops
PDF
S4.4 Doubled Haploid Technology in Maize breeding: Status and prospects
PDF
03 crop descriptors
PPTX
Molecular markers and Functional molecular markers
PPTX
High throughput phenotyping
Allele mining in crop improvement
Allele mining
Marker assisted backcross breeding
Role of molecular markers in vegetable crops
Biotechnological applications in Male Sterility and Hybrid Breeding
ADVANCES IN BIOTECHNOLOGY OF VEGETABLE CROPS
Diversity Array technology
Genomic selection, prediction models, GEBV values, genomic selection in plant...
Double Haploids in crop improvement.
Marker-assisted Selection (MAS) in fruit crops
Plant genetic resources in fruit science ankit
Speed breeding presentation
Genomic Selection & Precision Phenotyping
SPEED BREEDING AND ITS IMPLICATIONS IN CROP IMPROVEMENT
Report- Genome wide association studies.
development of hybrids in commercially important cole crops
S4.4 Doubled Haploid Technology in Maize breeding: Status and prospects
03 crop descriptors
Molecular markers and Functional molecular markers
High throughput phenotyping
Ad

Searching for traits in PGR collections using Focused Identification of Germplasm Strategy

  • 1. Searching for traits in PGR collections using Focused Identification of Germplasm Strategy (FIGS) Abdallah Bari, Kenneth Street, Michael Mackay, Eddy De Pauw, Dag Endresen, Ahmed Amri, Kumarse Nazari and Ammor Yahiaoui CIAT Palmira, Colombia 14 March 2012 Grain Research & Development Corporation
  • 2. Content • Background – PGR - traits – FIGS - traits • Objective – Develop a priori information – Develop best bet subset of accessions with traits • Datasets – Trait data – Environmental data • Methodologies – Data preparation – Modeling techniques • Results/Discussion – Sub-setting (accessions/variables) – “Hot spots” • Conclusion
  • 3. ICARDA’s worldwide presence: International Center for Agricultural Research in the Dry Areas (ICARDA)
  • 4. ICARDA: PGR centers of origin and diversity
  • 5. PGR contribution: traits of importance to agriculture – phenological adaptation (short growth duration), – efficient use of water, – resistance to biotic stresses (diseases and insects), – tolerance to abiotic stresses (such as drought and salinity), and – superior grain quality. Plant pre-evaluation
  • 6. PGR Challenges • 50,000 to 60,000 traits (loci) • 7 million accessions • 1,400 genebanks. Seed samples
  • 7. PGR Challenges: a needle in a haystack. PGR users want variation for specific traits and a hundred germplasm accessions to evaluate.
  • 8. PGR Challenges and Concerns • Size of collections – Addressed by Brown et al. 1999 • Cost in evaluating accessions lacking the desired trait – Addressed by Gollin et al. 2000
  • 10. Objective: FIGS searches genetic resources data (germplasm collections) to detect trait-environment patterns or relationships (as a priori information). This a priori information is then used to develop predictive models to find novel genetic variation for the traits of interest and where it is most likely to occur. Workflow: quantification of trait-environment relationship → a priori information → develop trait subsets → utilization of genetic resources.
  • 11. Origin of FIGS approach: boron toxicity of wheat and barley, early FIGS examples. Wheat landraces from marine-origin soils in the Mediterranean region provided all the genetic variation needed to produce boron-tolerant varieties (M.C. Mackay, 1995). Map label: Mediterranean Sea.
  • 12. FIGS approach: “FIGS applies to plant genetic resources (stored collections) the same selection pressure exerted on plants by evolution.” Diagram labels: PGR (biodiversity), collection, sampling, core; PGR (biodiversity), sampling, trait, user.
  • 13. FIGS approach: FIGS has helped breeders identify long sought-after plant traits such as resistance to: – Net blotch (barley), – Powdery mildew, – Russian wheat aphid (RWA) and – Sunn pest. Braidotti, G. 2009. Keys to the gene bank, Biotechnology. Partners in Research for Development 16-17.
  • 14. Sunn pest resistance trait: 8 landrace accessions from Afghanistan and 2 from Tajikistan identified as resistant at the juvenile stage. Now developing mapping populations.
  • 15. FIGS approach to Pm: 16,000 wheat landraces; FIGS applied; 1,300 selected. Phenotyping: 211 accessions between R and IR; 40% yielded accessions that were resistant to the isolates used. Genotyping: 7 new alleles, at least 2 of which have a new race specificity. 100 years of classical genetics = 7 alleles. Kaur K; Street K; Mackay M; Yahiaoui N; Keller B (2008). Allele mining and sequence diversity at the wheat powdery mildew resistance locus Pm3. 11th IWGS, 24-29 Aug., Brisbane.
  • 16. Locating new Pm3 alleles: the distribution of the seven new functional alleles of Pm3, out of 96.2% of the total set screened. Map labels: Turkey, Afghanistan, Iran, Pakistan and Armenia.
  • 17. The FIGS picture: Genotypes × Environments × Time¹ = Genetic variation. Can we use the same evolutionary principles in reverse to identify the environments that ‘engender’ trait-specific genetic variation? Environments × Traits × Time = Trait variation (E×T)? ¹ plus some selection.
  • 18. Examples of eco-geographic variation of traits linked to environmental influences (after M. C. Mackay)
      Environment influence | Trait | Species | Reference
      Low altitudes, high winter temp., low summer rain, spring cloudiness | Cyanogenesis | Trifolium repens | Pederson, Fairbrother et al. 1996
      Aridity | Seed dormancy, early flowering, high seed to pod ratio | Annual legumes | Ehrman and Cocks 1996
      Soil type | Tolerance to boron toxicity | Bread wheat | Mackay (1990)
      Altitude, winter temp, RWA distribution | Russian Wheat Aphid (RWA) resistance | Bread wheat | Bohssini et al., accepted for publication 2008
      Temperature, aridity | Drought resistance | Triticum dicoccoides | Peleg, Fahima et al. 2005
      Altitude | Glume colour and beak length | Durum wheat | Bechere, Belay et al. 1996
      Climate, soil and water availability | Heading date, culm length, biomass, grain yield and its components | Triticum dicoccoides | Beharav and Nevo 2004
      Precipitation, minimum January temperature, altitude | Glutenin diversity | Durum wheat | Vanhintum and Elings 1991
      Temperature, aridity | More efficient RUBISCO activity | Woody perennials | Galmes et al. 2005
      Water relations, temperature | Hordatine accumulation (disease defence) | Barley | Batchu, Zimmermann et al. 2006
  • 19. FIGS system (diagram labels): PGR collections; user-defined needs; database; filters (type of material, evaluation data, collection site, other information); interface; size limit (250, 500, 750, 1500); new subset. See www.figstraitmine.com. After M. C. Mackay 1995.
  • 20. Mining natural variation: by linking traits, environments (and associated selection pressures) with genebank accessions (e.g. landraces and crop relatives) we can ‘focus’ in on those accessions most likely to possess trait-specific genetic variation. [Map: latitude versus longitude, with layers for environment, trait and FIGS subset]
  • 21. FIGS approach – summarized: Focused Identification of Germplasm Strategy. Environment (E): geo-referencing of collecting places. Trait (T): evaluation (phenotyping). Accession (G).
  • 23. Eco-climate data (X). ICARDA eco-climatic database, average: annual temperature (front), annual precipitation (middle), and winter precipitation (back) (De Pauw 2008). Climate data (X as independent variables):
      site_code1 | prec01 | prec02 | prec03 | prec04 | prec05 | ….. | ari01 | ari02 | ari03 | ari04 | ari05
      ETH-S893 | 25 | 36 | 72 | 154.22 | 148.88 | ….. | 0.167 | 0.246 | 0.439 | 1.098 | 1.169
      ETH-S1222 | 29 | 44 | 92 | 167.46 | 168 | ….. | 0.223 | 0.344 | 0.646 | 1.354 | 1.612
      NS_339 | 44 | 67 | 130.43 | 177.96 | 185.74 | ….. | 0.351 | 0.552 | 0.949 | 1.457 | 1.751
      ETH-S1153 | 36 | 48 | 86 | 140.92 | 131.94 | ….. | 0.28 | 0.39 | 0.609 | 1.108 | 1.078
      NS_415 | 32 | 46.61 | 95.42 | 150.3 | 157 | ….. | 0.271 | 0.419 | 0.732 | 1.289 | 1.437
      NS_424 | 31.94 | 45 | 90 | 143.62 | 150 | ….. | 0.257 | 0.38 | 0.641 | 1.146 | 1.272
      ETH64:55 | 28 | 38.26 | 57 | 97.57 | 81 | ….. | 0.247 | 0.344 | 0.45 | 0.834 | 0.662
      NS_525 | 28 | 39 | 57 | 97.13 | 80.78 | ….. | 0.248 | 0.352 | 0.452 | 0.836 | 0.669
      NS_526 | 27 | 39 | 57 | 97.01 | 80.77 | ….. | 0.241 | 0.354 | 0.455 | 0.842 | 0.68
      NS_559 | 23 | 40 | 61.89 | 129.04 | 102 | ….. | 0.226 | 0.397 | 0.511 | 1.206 | 0.998
      Source: International Center for Agricultural Research in the Dry Areas (ICARDA)
  • 24. Eco-climate data (X). Layers used in the stem rust studies: • Precipitation (rainfall) • Maximum temperatures • Minimum temperatures + Derived GIS layers such as: • Potential evapotranspiration (water-loss) • Agro-climatic Zone (UNESCO classification) • Moisture/Aridity index (mean values for month and year)
  • 25. Trait data set (Y). Trait data (Y as dependent variable). Image credit: http://www.news.cornell.edu/
      site_code1 | R_state0 | R_state1 | R_state2 | R_state3 | R_state4 | R_state5 | R_state6 | R_state7 | R_state8 | R_state9
      ETH-S893 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
      ETH-S1222 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1
      NS_339 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0
      ETH-S1153 | 0 | 0 | 0 | 0 | 2 | 1 | 3 | 0 | 0 | 0
      NS_415 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
      NS_424 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
      ETH64:55 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
      NS_525 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
      NS_526 | 0 | 1 | 2 | 1 | 2 | 0 | 3 | 0 | 0 | 0
      NS_559 | 2 | 5 | 1 | 0 | 0 | 2 | 0 | 0 | 0 | 0
      ETH64:53 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
      Source: (USDA) National Genetic Resources Program (NGRP) GRIN database
  • 26. Searching for stem rust resistance trait – concerns: stem rust spreading to wheat production areas. Image credit: http://www.news.cornell.edu/
  • 27. Stem rust on wheat landraces – trait data: green dots indicate collecting sites for resistant wheat landraces and red dots collecting sites for susceptible landraces. USDA GRIN, trait data online: http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?65049. Field experiments made in Minnesota by Don McVey.
  • 29. Data preparation: climate data (X as independent variables). Power relationship ~ 2(p) (spread).
      site_code | ….. | ari02 | …..
      ETH-S893 | ….. | 0.246 | …..
      ETH-S1222 | ….. | 0.344 | …..
      NS_339 | ….. | 0.552 | …..
      ETH-S1153 | ….. | 0.390 | …..
      NS_415 | ….. | 0.419 | …..
      NS_424 | ….. | 0.380 | …..
      ETH64:55 | ….. | 0.344 | …..
      NS_525 | ….. | 0.352 | …..
      NS_526 | ….. | 0.354 | …..
      NS_559 | ….. | 0.397 | …..
      [Figure: two histograms of frequency against the aridity or moisture index during February; the x-axes run from 0 to 15 and from -4 to 4]
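The slide mentions a power relationship and shows the February aridity index before and after some transformation, but the transcript does not say which transform was used. As a rough sketch only, assuming a Box-Cox power transform followed by standardization (both the transform family and `lam = 0` are illustrative assumptions, not the authors' settings), the ten `ari02` values listed on the slide could be processed like this:

```python
import math
from statistics import mean, pstdev


def box_cox(values, lam):
    """Box-Cox power transform; lam = 0 reduces to the natural log."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def standardize(values):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# February aridity index (ari02) for the ten sites listed on the slide.
ari02 = [0.246, 0.344, 0.552, 0.390, 0.419, 0.380, 0.344, 0.352, 0.354, 0.397]

# lam = 0 (log) is an illustrative choice, not the value used in the study.
transformed = standardize(box_cox(ari02, lam=0))
```

The transform is monotonic, so the ranking of sites by aridity is unchanged; only the spread and skew of the distribution are altered.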
  • 30. Platform. Geographical Information System (GIS): ArcGIS; environmental data/layers (surfaces); purpose: generation of environmental data. R language (development of algorithms): > Data transformation ( ); > Model <- model(trait ~ climate); > Measuring accuracy metrics; > ….; purpose: modeling.
  • 31. Modeling framework: trait data (Y), environmental data (X), $Y \sim f(X)$. First, a linear approach is used, irrespective of the underlying distributions describing the data. $X$ is the set of explanatory variables or predictors (climate data), where $X \in \mathbb{R}^m$, and $Y$ is either a categorical (label) or a numerical response (trait descriptor states).
  • 32. Modeling framework • Principal component analysis (PCA) • Partial Least Squares (PLS) • Random Forest (RF) • Support Vector Machines (SVM) • Neural Networks (NN). Bari A., Street K., Mackay M., Endresen D.T.F., De Pauw E. & Amri A. (2011) Focused identification of germplasm strategy (FIGS) detects wheat stem rust resistance linked to environmental variables. Genetic Resources and Crop Evolution http://www.springerlink.com/content/m7140x68v2065113/fulltext.pdf
  • 33. Principal Component Analysis (PCA): B is a matrix of coefficients. The prediction was initially carried out using the number of components (PCs) that account for 95% of explained variance, followed by adding one component at a time until the error reached a minimum.
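The two-step choice of component count described on this slide can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names, the eigenvalues and the test-error values are all hypothetical, and "minimum" is interpreted here as the first point where adding a component stops lowering the test error.

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative share of
    variance reaches `threshold`. Eigenvalues must be sorted descending."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


def extend_until_minimum(start, test_errors):
    """test_errors[k - 1] is the test error with k components. Starting from
    `start`, keep adding one component while the error still decreases."""
    k = start
    while k < len(test_errors) and test_errors[k] < test_errors[k - 1]:
        k += 1
    return k


# Hypothetical eigenvalues and test errors, for illustration only.
eigenvalues = [5.0, 3.0, 1.0, 0.6, 0.3, 0.1]
test_errors = [0.47, 0.46, 0.45, 0.44, 0.42, 0.43]

k_start = components_for_variance(eigenvalues)        # 4 components reach 95%
k_final = extend_until_minimum(k_start, test_errors)  # error bottoms out at 5
```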
  • 34. Partial Least Squares (PLS): a product of factors and their loadings (regression coefficients), where both the environmental dataset and the trait dataset are used simultaneously. The prediction was initially carried out using the number of components (PCs) that account for 95% of explained variance, followed by adding one component at a time until the error reached a minimum.
  • 35. Random Forest (RF): data → bootstrapping (with replacement) → training set and out-of-bag (OOB) set, repeated for tree 1, tree 2, ..., tree 1000.
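The bootstrapping step on this slide can be illustrated with a short sketch. It only shows how each tree's training and out-of-bag (OOB) sets arise from sampling with replacement; it does not grow any trees. The sample size of 200 and the seed are arbitrary assumptions, while the 1000 trees follow the slide.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; indices never drawn
    form the out-of-bag (OOB) set for that tree."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(0)          # fixed seed so the sketch is repeatable
n_samples, n_trees = 200, 1000  # 1000 trees as on the slide; 200 is arbitrary

oob_fractions = []
for _ in range(n_trees):
    in_bag, out_of_bag = bootstrap_split(n_samples, rng)
    oob_fractions.append(len(out_of_bag) / n_samples)

mean_oob = sum(oob_fractions) / n_trees
```

On average a little over a third of the samples end up out of bag for any one tree, since the chance that a given sample is never drawn is (1 - 1/n)^n, which approaches 1/e. Those samples give each tree a built-in test set.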
  • 36. Support Vector Machines (SVM): a learning-based technique that maps input data to a high-dimensional space and optimally separates the mapped input into the respective classes. The mapping goes from an l-dimensional space (input variable space) into a k-dimensional space, where k is much higher than l.
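The idea of mapping inputs into a higher-dimensional space can be shown with a toy case. The sketch below does not train an SVM; it only demonstrates, on hypothetical one-dimensional data (l = 1), that classes which no single cut can separate become separable after an assumed feature map φ(x) = (x, x²) into two dimensions (k = 2).

```python
def feature_map(x):
    """Map a 1-D input to 2-D: phi(x) = (x, x^2)."""
    return (x, x * x)


def separable_by_threshold(values, labels):
    """True if one cut on `values` puts each class entirely on its own side."""
    ordered = [label for _, label in sorted(zip(values, labels))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes == 1


# Hypothetical data: class 1 lies on both outer ends, class 0 in the middle.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [1, 1, 0, 0, 0, 1, 1]

in_input_space = separable_by_threshold(xs, labels)

# After mapping, the second coordinate (x^2) alone separates the classes.
mapped = [feature_map(x) for x in xs]
in_mapped_space = separable_by_threshold([m[1] for m in mapped], labels)
```

A real SVM would additionally pick the separating boundary with the widest margin, and usually applies the mapping implicitly through a kernel function.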
  • 37. Neural Networks (NN): radial basis function (RBF) network. [Figure: network with inputs x1, x2, ..., xp and output F(x); plot of error versus number of epochs for the training set and the test set]
  • 38. Optimization/tuning: trend of output error versus the number of components (PCs/LVs) or epochs (NN). [Figure: error versus number of PCs, LVs or epochs for the training set and the test set]
  • 39. Accuracy metrics: parameters that provide information on the specificity (“trait agro-climate”). Confusion matrix (2-by-2 contingency table):
      | Observed resistant | Observed susceptible
      Predicted resistant | a | b
      Predicted susceptible | c | d
      Sensitivity $= a/(a + c)$ and specificity $= d/(b + d)$ are indicators of the model's ability to correctly classify observations.
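As a small worked example of the two formulas on this slide, the sketch below computes sensitivity and specificity from the four cells of the confusion matrix. The cell counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def sensitivity(a, c):
    """Share of observed-resistant cases predicted resistant: a / (a + c)."""
    return a / (a + c)


def specificity(b, d):
    """Share of observed-susceptible cases predicted susceptible: d / (b + d)."""
    return d / (b + d)


# Hypothetical counts, laid out as on the slide:
#                        observed resistant   observed susceptible
# predicted resistant            a = 30               b = 10
# predicted susceptible          c = 20               d = 40
a, b, c, d = 30, 10, 20, 40

sens = sensitivity(a, c)  # 30 / 50
spec = specificity(b, d)  # 40 / 50
```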
  • 40. Accuracy metrics: parameters that provide information on the specificity (“trait agro-climate”). High AUC (area under the curve) values are an indication of a potential trait-environment relationship. [Figure: the ROC curve and the resulting pdf’s of trait distribution (trait states)]
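One way to read the AUC is as the probability that a randomly chosen resistant accession receives a higher predicted score than a randomly chosen susceptible one. The sketch below computes it that way by comparing all pairs; the scores are hypothetical, and this pairwise form is an illustration rather than the routine used in the study.

```python
def auc(scores_resistant, scores_susceptible):
    """Fraction of (resistant, susceptible) pairs ranked correctly,
    counting ties as half. Equals the area under the ROC curve."""
    wins = 0.0
    for r in scores_resistant:
        for s in scores_susceptible:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(scores_resistant) * len(scores_susceptible))


# Hypothetical model scores (higher = more likely resistant).
resistant = [0.9, 0.8, 0.4]
susceptible = [0.7, 0.3, 0.2]

area = auc(resistant, susceptible)       # 8 of 9 pairs are ordered correctly
no_signal = auc([0.5, 0.5], [0.5, 0.5])  # all ties: no better than chance
```

A value near 0.5 corresponds to the random case, while values approaching 1 indicate that the two trait states have well-separated score distributions.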
  • 41. Accuracy metrics: randomness. [Figure: the ROC curve and the pdf’s of trait distribution in the random case]
  • 43. Data preparation – Raw data: PCs = 42; AUC = 0.67; Kappa = 0.40. [Figure panels: RMSE versus number of components; ROC curve (true positive rate versus false positive rate); density distribution by trait]
  • 44. Data preparation – Transformed data: PCs = 42; AUC = 0.71; Kappa = 0.45. [Figure panels: RMSE versus number of components; ROC curve (true positive rate versus false positive rate); density distribution by trait]
  • 45. Data preparation – Raw data (PLS): LVs = 30; AUC = 0.70; Kappa = 0.43. [Figure panels: RMSE versus number of components; ROC curve (true positive rate versus false positive rate); density distribution by trait]
  • 46. Data preparation – Transformed data: LVs = 22; AUC = 0.71; Kappa = 0.44. [Figure panels: RMSE versus number of components; ROC curve (true positive rate versus false positive rate); density distribution by trait]
  • 47. Optimization process: root mean square error of prediction (RMSEP) for PCA (left) and PLS (right) models. Arrows indicate the minimum errors, where the number of components (PCs and LVs) was selected for prediction (red discontinuous line = test data, continuous line = training set). [Figure panels: RMSEP versus number of components for R_CALC]
  • 48. PCA PC2: few components, ~ random. [Figure: ROC curve (true positive rate versus false positive rate); density distribution per R_CALC for resistant and susceptible]
  • 49. PCA PC5. [Figure: ROC curve (true positive rate versus false positive rate); density distribution per R_CALC for resistant and susceptible]
  • 50. PLS LV2: 2 latent variables of PLS are better than 2 PCs of PCA. [Figure: ROC curve (true positive rate versus false positive rate); density distribution per R_CALC for resistant and susceptible]
  • 51. PLS LV10. [Figure: ROC curve (true positive rate versus false positive rate); density distribution per R_CALC for resistant and susceptible]
  • 52. PCA (optimized). [Figure: ROC curve (true positive rate versus false positive rate); density of predictions]
  • 53. PLS (optimized). [Figure: ROC curve (true positive rate versus false positive rate); density of predictions]
  • 54. RF. [Figure: ROC curve (true positive rate versus false positive rate); density of predictions]
  • 55. SVM. [Figure: ROC curve (true positive rate versus false positive rate); density of predictions]
  • 56. NN. [Figure: ROC curve (true positive rate versus false positive rate); density of predictions]
  • 57. Random (PCA), R_CALC: with a completely random distribution of the stem rust resistance trait, AUC ~ 0.5; a partially random distribution of the trait is also shown. [Figure panels: ROC curves (true positive rate versus false positive rate); RMSEP versus number of components]
  • 58. Stem rust hot spots. [Map: latitude versus longitude]
  • 59. Stem rust hot spots: areas where resistance is likely to occur (longitude wise). [Maps: latitude versus longitude]
  • 60. PLS (optimized): areas where resistance is likely to occur (dark red). [Figure: contour map of latitude (Y) versus longitude (X); semivariance versus distance]
  • 61. Random Forest (RF): areas where resistance is likely to occur (dark red). [Figure: contour map of latitude (Y) versus longitude (X); semivariance versus distance]
  • 62. SVM: areas where resistance is likely to occur (dark red). [Figure: contour map of latitude (Y) versus longitude (X)]
  • 64. Results – stem rust on wheat
      Dataset (unit) | PPV | LR+ | Estimated gain
      Stem rust (accession) | 0.54 (0.50-0.59) | 3.07 (2.66-3.54) | 1.95 (1.79-2.09)
      Random (28 % resistant samples) | 0.29 (0.26-0.33) | 1.04 (0.90-1.20) | 1.03 (0.91-1.16)
      Stem rust (site) | 0.50 (0.40-0.60) | 4.00 (2.85-5.66) | 2.51 (2.02-2.98)
      Random (20 % resistant samples) | 0.19 (0.13-0.26) | 0.94 (0.63-1.39) | 0.95 (0.66-1.33)
      PPV = Positive Predictive Value; LR+ = Positive Diagnostic Likelihood Ratio. Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw (2011). Predictive association between biotic stress traits and ecogeographic data for wheat and barley landraces. Crop Science 51: 2036-2055. DOI: 10.2135/cropsci2010.12.0717
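The three columns of this table are linked by standard relations, which the sketch below checks against the rounded values on the slide. The likelihood ratio converts prior odds (from the share of resistant samples) into posterior odds, which gives the PPV. Treating the estimated gain as PPV divided by the share of resistant samples is an assumption made here; the slide does not define it, though the numbers agree to within rounding.

```python
def ppv_from_lr(lr_plus, prevalence):
    """Posterior probability from LR+ and prevalence:
    posterior odds = LR+ * prior odds."""
    prior_odds = prevalence / (1.0 - prevalence)
    posterior_odds = lr_plus * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def gain(ppv, prevalence):
    """Assumed definition: hit rate of the subset relative to random picking."""
    return ppv / prevalence


# Point estimates from the slide (accession level: LR+ 3.07, 28 % resistant;
# site level: LR+ 4.00, 20 % resistant).
ppv_accession = ppv_from_lr(3.07, 0.28)   # slide reports PPV 0.54
ppv_site = ppv_from_lr(4.00, 0.20)        # slide reports PPV 0.50

gain_accession = gain(ppv_accession, 0.28)  # slide reports 1.95
gain_site = gain(ppv_site, 0.20)            # slide reports 2.51
```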
  • 65. Results – stem rust on wheat. AUC = Area Under the ROC Curve (ROC, Receiver Operating Characteristic)
      Classifier method | AUC | Cohen’s Kappa
      Principal Component Regression (PCR) | 0.69 (0.68-0.70) | 0.40 (0.37-0.42)
      Partial Least Squares (PLS) | 0.69 (0.68-0.70) | 0.41 (0.39-0.43)
      Random Forest (RF) | 0.70 (0.69-0.71) | 0.42 (0.40-0.44)
      Support Vector Machines (SVM) | 0.71 (0.70-0.72) | 0.44 (0.42-0.45)
      Artificial Neural Networks (ANN) | 0.71 (0.70-0.72) | 0.44 (0.42-0.46)
      Bari, A., K. Street, M. Mackay, D.T.F. Endresen, E. De Pauw, and A. Amri (2011). Focused Identification of Germplasm Strategy (FIGS) detects wheat stem rust resistance linked to environment variables. Genetic Resources and Crop Evolution [online first]. doi:10.1007/s10722-011-9775-5; Published online 3 Dec 2011.
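Cohen's Kappa, reported alongside AUC in this table, measures agreement between predicted and observed classes beyond what chance alone would produce. A minimal sketch for the 2-by-2 case follows, using the cell names a, b, c, d from the confusion matrix on the accuracy-metrics slide; the counts are hypothetical.

```python
def cohens_kappa(a, b, c, d):
    """Kappa for a 2-by-2 table: (p_observed - p_expected) / (1 - p_expected).
    a, d are agreements; b, c are disagreements."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from the row and column totals of each class.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts: predicted rows, observed columns.
kappa = cohens_kappa(a=30, b=10, c=20, d=40)  # p_obs = 0.7, p_exp = 0.5
```

Kappa is 0 when agreement is no better than chance and 1 for perfect agreement, so the values of 0.40 to 0.44 in the table sit well above chance but short of perfect agreement.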
  • 66. Results – stem rust on wheat
      Classifier method                              PPV                LR+                 Estimated gain
      kNN (pre-study)                                0.29 (0.13-0.53)   5.61 (2.21-14.28)   4.14 (1.86-7.57)
      SIMCA                                          0.28 (0.14-0.48)   5.26 (2.51-11.01)   4.00 (2.00-6.86)
      Ensemble classifier                            0.33 (0.12-0.65)   8.09 (2.23-29.42)   6.47 (2.05-11.06)
      Random (pre-study, 550 + 275 accessions)       0.06 (0.01-0.27)   0.95 (0.13-6.73)    0.97 (0.16-4.35)
      Ensemble                                       0.26 (0.22-0.30)   2.78 (2.34-3.31)    2.32 (2.00-2.68)
      Random (blind study, 825 + 3738 accessions)    0.11 (0.09-0.15)   1.02 (0.77-1.36)    0.95 (0.77-1.32)
      PPV = Positive Predictive Value; LR+ = Positive Diagnostic Likelihood Ratio
      Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw, K. Nazari, and A. Yahyaoui (2012). Sources of Resistance to Stem Rust (Ug99) in Bread Wheat and Durum Wheat Identified Using Focused Identification of Germplasm Strategy (FIGS). Crop Science [online first]. doi: 10.2135/cropsci2011.08.0427; Published online 8 Dec 2011.
  • 67. Results of stem rust (Ug99) on wheat
      4563 wheat landraces were screened for Ug99; 10.2 % of the accessions were resistant.
      The true trait scores were provided for 20 % of the accessions (825 samples).
      From the remaining 3738 accessions, whose true scores were hidden, the 500 accessions most likely to be resistant were selected.
      The selected subset contained 25.8 % resistant samples, 2.3 times more than expected by chance.
  • 68. Content • Background – PGR traits – FIGS • Objective – Develop a priori information – Develop best bet subset of accs with traits • Datasets – Trait data – Environmental data • Methodologies – Data preparation – Modeling techniques • Results/Discussion – Sub-setting (accessions/variables) – “Hot spots” • Conclusion
  • 69. Conclusion ... • Results – Raw data vs Transformed data – PLS vs PCA – Non-linear vs linear – FIGS vs random (selection) • Issues – Extent of variables (trait/agro-climate) – Phenology (adaptation) – Fuzzy approach (trait variation capture)
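
The metrics reported on slides 64 to 67 can all be derived from a 2x2 confusion matrix of predicted versus observed resistance. The sketch below is illustrative only: the counts are made up, the function names are hypothetical, and the "estimated gain" is assumed to be PPV divided by the prevalence of resistance, which is consistent with the tabulated figures but is not defined on the slides.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute PPV, LR+, prevalence and gain from confusion-matrix counts.

    tp: predicted resistant, observed resistant
    fp: predicted resistant, observed susceptible
    fn: predicted susceptible, observed resistant
    tn: predicted susceptible, observed susceptible
    """
    total = tp + fp + fn + tn
    ppv = tp / (tp + fp)                       # Positive Predictive Value
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_plus = sensitivity / (1 - specificity)  # Positive Diagnostic Likelihood Ratio
    prevalence = (tp + fn) / total             # share of resistant samples overall
    gain = ppv / prevalence                    # ASSUMPTION: gain = PPV / prevalence
    return {"ppv": ppv, "lr_plus": lr_plus, "prevalence": prevalence, "gain": gain}


def cohens_kappa(tp, fp, fn, tn):
    """Cohen's Kappa: agreement between prediction and observation beyond chance."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    return (observed - expected) / (1 - expected)


# Made-up counts for illustration only (not taken from the studies cited above).
metrics = diagnostic_metrics(tp=30, fp=70, fn=20, tn=380)
kappa = cohens_kappa(tp=30, fp=70, fn=20, tn=380)
print(metrics, kappa)
```

A random selection has a PPV close to the prevalence, so its gain and LR+ are both close to 1, as in the "Random" rows of the tables. The sketch does not guard against empty cells (for example, no false positives), which would cause a division by zero.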

Editor's Notes

  • #22: Landrace samples (genebank seed accessions). Trait observations (experimental design): high-cost data. Climate data (for the landrace location of origin): low-cost data. The accession identifier (accession number) provides the bridge to the crop trait observations. The longitude and latitude coordinates of the original collecting site of the accessions (landraces) provide the bridge to the environmental data.
  • #28: GRIN database (USDA-ARS, National Plant Germplasm System, Germplasm Resources Information Network, online http://www.ars-grin.gov/npgs). USDA GRIN, trait data online: http://www.ars-grin.gov/cgi-bin/npgs/html/desc.pl?65049
  • #65: Photo: USDA ARS Image k1192-1, http://www.ars.usda.gov/is/graphics/photos/mar09/k11192-1.htm
  • #66: USDA ARS Image Archive, http://www.ars.usda.gov/is/graphics/photos/
  • #67: Photo: Wheat infected by stem rust (Ug99) at the Kenya Agricultural Research Station in Njoro northwest of Nairobi.
  • #68: Endresen, D.T.F., K. Street, M. Mackay, A. Bari, E. De Pauw, K. Nazari, and A. Yahyaoui (2012). Sources of Resistance to Stem Rust (Ug99) in Bread Wheat and Durum Wheat Identified Using Focused Identification of Germplasm Strategy (FIGS). Crop Science [online first]. doi: 10.2135/cropsci2011.08.0427; Published online 8 Dec 2011.
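
The two bridges described in note #22 (accession number to trait observations, collecting-site coordinates to environmental data) can be sketched as a simple lookup join. Everything in this example is hypothetical: the accession numbers, coordinates, climate variables and the `link_records` helper are invented for illustration and do not come from the datasets used in the presentation.

```python
# Hypothetical trait observations, keyed by accession number (high-cost data).
trait_scores = {"ACC001": "resistant", "ACC002": "susceptible", "ACC003": "resistant"}

# Hypothetical passport data: accession number -> (latitude, longitude) of collecting site.
passport = {"ACC001": (36.2, 37.1), "ACC002": (33.5, 36.3), "ACC003": (9.0, 38.7)}

# Hypothetical climate data keyed by site coordinates (low-cost data).
climate_by_site = {
    (36.2, 37.1): {"annual_rain_mm": 340, "mean_temp_c": 17.5},
    (33.5, 36.3): {"annual_rain_mm": 210, "mean_temp_c": 18.1},
}


def link_records(trait_scores, passport, climate_by_site):
    """Join trait and climate data via accession number and site coordinates."""
    rows = []
    for accession, score in trait_scores.items():
        coords = passport.get(accession)       # bridge 1: accession number
        climate = climate_by_site.get(coords)  # bridge 2: collecting-site coordinates
        if coords is None or climate is None:
            continue                           # skip accessions that cannot be linked
        latitude, longitude = coords
        rows.append({"accession": accession, "latitude": latitude,
                     "longitude": longitude, **climate, "trait": score})
    return rows


rows = link_records(trait_scores, passport, climate_by_site)
for row in rows:
    print(row)
```

The exact-match lookup on coordinates is a simplification chosen for brevity; climate values would more typically be extracted from gridded layers at each collecting site.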