Embedded Feature Selection
of Hyperspectral Bands with
Boosted Decision Trees
Sildomar Monteiro and Richard Murphy

The University of Sydney
Rio Tinto Centre for Mine Automation
    • Totally Autonomous Mine in 10 years:
        – Brings together all elements of systems, perception,
          machine learning, data fusion and more
        – A grand challenge for Field Robotics
    • Driven by safety, predictability and efficiency




 Dr Sildomar Monteiro, IGARSS 2011
Goal: Mine Picture Compilation
• Provide a complete and accurate model of the mine
    – Mine planning and better prediction outcomes
• Maintain and update a multi-scale probabilistic
  representation
    –   Geology
    –   Geometry
    –   Equipment
    –   And other properties of interest for the mining process




Today
[Figure: current practice. Labels: "Geology Feedback to Batch", "Cone logging", "Floor mapping using ripped trench sections"]
Geology (ground-truth)




Mine Face Scanning




[Nieto, Viejo and Monteiro, 2010]
Hyperspectral sensing for mining
• Geology classification (material identification) still has
  many challenges
• Environmental conditions
    – Illumination, temperature, dust
• Timely data acquisition and processing are needed
    – Algorithms and calibration
• High spectral similarity between (ore-bearing) rock
  types
    – Few, if any, distinctive spectral features




Outline

• Hyperspectral classification using Boosting

• Embedded band selection

• Experiments using iron ore data




Hyperspectral Sensors
[Figure: multispectral vs. hyperspectral sensing, shown as a stack of spectral bands (Band 1 to Band n). Sensor ranges: VisNIR 400-970 nm, SWIR 970-2500 nm]
Example of Classification and Spectra
[Figure: example classification with reflectance spectra in panels a to d. Axes: Wavelength (nm), 500 to 2250, vs. Reflectance]
Hyperspectral Band Selection
• Feature Selection (vs Dimensionality Reduction)
    – Remove correlated inputs
    – Physical interpretation (band wavelengths)
• Faster data processing
• Possibly faster data acquisition
• Can be tailored to the application
• Can indicate suitable multispectral bands




Boosting
• Sound theoretical foundation
    – Additive Logistic Regression [Friedman, 2000]
• Empirical studies show that boosting
    – Yields small classification error rates
    – Is very resilient to overfitting
• State-of-the-art results in many applications, e.g. face
  recognition in computer vision
• The idea of Boosting is to train many “weak” learners
  on various distributions (or sets of weights) of the input
  data and then combine the resulting classifiers into a
  single “committee”


Decision Trees
• Advantages:
    – Robustness and interpretability
• Disadvantages:
    – Low accuracy and high variance
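• Binary decision trees
       $f(x; \theta, \mu, a, b) = a \, \delta(x_\theta > \mu) + b$
     [Diagram: a single split node $\delta(x_\theta > \mu)$ with leaves $b$ and $a + b$]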
• Boosted trees
    – Accurate, robust and interpretable
       $G(x) = \operatorname{sign}\left( \sum_{m=1}^{M} \alpha_m f_m(x) \right)$
Embedded Feature Selection
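• Relative importance of input variables
       $I_j = \left( E_x\left[ \frac{\partial \hat{F}(x)}{\partial x_j} \right]^2 \cdot \mathrm{var}_x[x_j] \right)^{1/2}$

• Approximation for decision trees (heuristic)
  [Friedman, 1999]
       $\hat{I}_j^2(T) = \sum_{t=1}^{J-1} \hat{\imath}_t^2 \, \mathbb{1}(v(t) = j)$

• Least-squares improvement criterion
       $i^2(R_l, R_r) = \frac{w_l w_r}{w_l + w_r} \left( \bar{y}_l - \bar{y}_r \right)^2$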
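A small worked example of the least-squares improvement criterion and the per-tree importance sum may help. It is a sketch under assumptions: a fitted tree is represented by a hypothetical list of internal-node records `(band, w_l, w_r, ybar_l, ybar_r)`, and all numbers are made up for illustration.

```python
def split_improvement(w_l, w_r, ybar_l, ybar_r):
    """Least-squares improvement of a split into regions R_l and R_r.

    w_l, w_r are the summed sample weights in each region and
    ybar_l, ybar_r the weighted mean responses.
    """
    return (w_l * w_r) / (w_l + w_r) * (ybar_l - ybar_r) ** 2

def tree_importance(splits, n_bands):
    """Squared importance of each band for one tree.

    Sums the improvement over the J - 1 internal nodes, crediting each
    node to the band it splits on.
    """
    importance = [0.0] * n_bands
    for band, w_l, w_r, ybar_l, ybar_r in splits:
        importance[band] += split_improvement(w_l, w_r, ybar_l, ybar_r)
    return importance

# Hypothetical tree with J = 3 terminal nodes, hence 2 internal nodes.
splits = [
    (2, 4.0, 4.0, 1.0, -1.0),   # root splits on band 2
    (0, 2.0, 2.0, 0.5, -0.5),   # child splits on band 0
]
importance = tree_importance(splits, n_bands=4)
print(importance)  # [1.0, 0.0, 8.0, 0.0]
```

Bands that are never used in a split keep an importance of zero, which is what makes the measure usable for band selection.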

Embedded Feature Selection (cont.)
• Boosted Decision Trees
       $\hat{I}_j^2 = \frac{1}{M} \sum_{m=1}^{M} \hat{I}_j^2(T_m)$

• The Multi-class case
       $\hat{I}_j = \frac{1}{K} \sum_{k=1}^{K} \hat{I}_{jk}$
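The two averages can be illustrated numerically. The sketch below assumes one boosted model per class (a one-vs-rest arrangement, which the slides do not confirm) and rescales the result so the most important band scores 100%. The function names and the numbers are hypothetical.

```python
import math

def ensemble_importance(per_tree_squared):
    """Average the squared per-tree importances over the M trees of one model."""
    n_trees = len(per_tree_squared)
    n_bands = len(per_tree_squared[0])
    return [sum(tree[j] for tree in per_tree_squared) / n_trees
            for j in range(n_bands)]

def multiclass_importance(per_class):
    """Average the (unsquared) per-class importances over the K classes."""
    n_classes = len(per_class)
    n_bands = len(per_class[0])
    return [sum(cls[j] for cls in per_class) / n_classes
            for j in range(n_bands)]

def to_percent(importance):
    """Rescale so that the most important band is 100%."""
    top = max(importance)
    return [100.0 * value / top for value in importance]

# Made-up squared importances: 2 classes, 2 trees each, 3 bands.
class_a_trees = [[1.0, 0.0, 9.0], [1.0, 0.0, 9.0]]
class_b_trees = [[0.0, 4.0, 1.0], [0.0, 4.0, 1.0]]

per_class = []
for trees in (class_a_trees, class_b_trees):
    squared = ensemble_importance(trees)
    per_class.append([math.sqrt(value) for value in squared])

overall = multiclass_importance(per_class)
relative = to_percent(overall)
print(overall)   # [0.5, 1.0, 2.0]
print(relative)  # [25.0, 50.0, 100.0]
```

Taking the square root before averaging over classes follows the formulas as written: the tree average is over squared importances, while the class average is not.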




Experiments
• Hyperspectral data acquired using a field
  spectrometer (ASD)
    – 429 bands (same as hyperspectral camera)
    – Wavelengths from 350 nm to 2500 nm
• Samples of ore-bearing rocks
    – Martite, goethite, kaolinite, etc. (9 classes in total)
    – Different illumination and physical conditions (direct sunlight,
      shadow and viewing angles)
• Methodology of experiments
    – Metrics: accuracy, precision, recall, F-score, Kappa, AUC
    – 4-fold cross-validation
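As a rough illustration of the evaluation protocol, the sketch below builds a 4-fold split and computes accuracy and Cohen's Kappa by hand. It assumes a plain shuffled split (the slides do not say whether folds were stratified) and uses made-up labels. `kfold_indices` and `cohen_kappa` are hypothetical helper names.

```python
import random
from collections import Counter

def kfold_indices(n_samples, k=4, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)

folds = kfold_indices(10, k=4)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(10) if i not in held_out]
    # A classifier would be trained on train_idx and scored on test_idx here.

# Made-up labels for one held-out fold.
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 1, 0]
acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(acc, kappa)  # 0.75 0.5
```

Kappa is lower than accuracy here because half of the observed agreement would be expected by chance given the label frequencies.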




Hyperspectral data set




Information in spectra
[Figure: reflectance spectra of samples_644-17_1_00000.asd.ref and samples_644-17_1_00035.asd.ref, with the VisNIR and SWIR regions marked. Axes: Wavelength, 500 to 2500, vs. Reflectance, 0.0 to 0.8]
Experimental Results: 9 rock types
• Relative importance of features
    [Plot: Relative importance (%) vs. Wavelength (nm), 400 to 2400]
• Normalized count of features
    [Plot: Normalized feature count (%) vs. Wavelength (nm), 400 to 2400]
Experimental Results: 9 rock types
• Classification performance of selected features
    [Bar chart: Accuracy, F-score, Kappa and AUC (scale 0.1 to 0.9) for bands selected by Relative Importance vs. Normalized Count]
Experimental Results
• All 9 classes




• Martite




Summary
• Boosting increases the performance of decision trees
  while keeping model interpretability
• We presented two approaches to perform feature
  selection using boosted decision trees
• Calculating the relative importance of features was
  more efficient than counting features
• The reduced set is able to predict the classes
  accurately, and more efficiently than using all features
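The counting approach and the final selection step can be sketched as follows. This is an illustration only: the slides do not define how the count is normalised (scaling by the maximum count is assumed here) or how many bands to keep (listed as future work), and the split lists are made up.

```python
from collections import Counter

def normalized_feature_count(split_bands_per_tree, n_bands):
    """Count how often each band is used as a split, scaled so the max is 100%."""
    counts = Counter(band for tree in split_bands_per_tree for band in tree)
    top = max(counts.values())
    return [100.0 * counts.get(j, 0) / top for j in range(n_bands)]

def select_top_bands(scores, k):
    """Indices of the k highest-scoring bands, returned in band order."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(order[:k])

# Hypothetical committee of 3 trees; each list holds the bands used in splits.
split_bands_per_tree = [[2, 0], [2, 3], [2, 0]]
count_scores = normalized_feature_count(split_bands_per_tree, n_bands=4)
selected = select_top_bands(count_scores, k=2)
print(selected)  # [0, 2]
```

The same `select_top_bands` helper could be applied to relative-importance scores instead of counts, which is how the two approaches can be compared on equal terms.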




Conclusions
• The standard learning procedure of boosted decision
  trees can perform feature selection automatically
• The feature selection is embedded in the internal
  structure of the model, so no extra parameters or
  separate selection algorithms are needed
• Instability of the models can be an issue
• Future work: how to determine the optimal number of
  features (using statistical tests)




When Things Don’t Work...




