Leveraging Feature Selection Within TreeNet
Overview
Introduction
The Case For Feature Selection
Methodologies
Case Study – DMA Analytics Challenge 2007
Comparison of Approaches
Advanced Algorithms
Conclusion – Questions & Answers

The DataLab Environment
DataLab USA
Industries Served
The Data Environment
Analytical Framework

When more is not necessarily better
TreeNet models are naturally more robust than more traditional algorithms.
Without any limitations, a TreeNet model in a typical DM environment can incorporate hundreds of independent variables.
How many of these variables actually provide true informational gain?

Not all variables are created equal
Certain types of variables can degrade TN model performance:
High-order categorical variables (e.g., state, cluster)
Composite variables (e.g., risk score, cluster, family composition)

Why Not Specialize?
A lower number of variables can allow for tighter parameters:
Increased number of terminal nodes
Decreased number of observations in minchild
Allowance for more variable interactions (ICL)

You want me to build how many models?
Brute Force = 2^N - 1
60 Variables = 1,152,921,504,606,846,975 Models
Processing Time = 730,693,161,740 years
Age of the Universe ≈ 13,730,000,000 years
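
The brute-force figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the slide does not state a per-model run time, so the value of roughly 20 seconds is inferred from the quoted totals, and the Julian year length and all function names are assumptions.

```python
# Arithmetic check of the brute-force slide. Nothing here calls TreeNet.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year


def brute_force_models(n_variables: int) -> int:
    """Number of non-empty variable subsets, i.e. 2^N - 1."""
    return 2 ** n_variables - 1


def years_to_run(n_models: int, seconds_per_model: float) -> float:
    """Total run time in years if the models are built one after another."""
    return n_models * seconds_per_model / SECONDS_PER_YEAR


models = brute_force_models(60)

# Per-model run time implied by the slide's 730,693,161,740 years
# (an inference, not a figure given in the deck).
implied_seconds_per_model = 730_693_161_740 * SECONDS_PER_YEAR / models

# How many times the quoted age of the universe the run would take.
universe_ages = 730_693_161_740 / 13_730_000_000

print(f"{models:,} models")
print(f"about {implied_seconds_per_model:.1f} seconds per model")
print(f"about {universe_ages:.0f} times the age of the universe")
```

Under these assumptions the quoted processing time corresponds to building the models sequentially at about 20 seconds each.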
1/2 of these models will include the top variable
1/4 will include the top two variables
1/1024 will include the top ten variables

Feature Selection
Feature Selection Goal – Efficiently identify the subset of independent variables that maximizes model discrimination.
Basic Feature Selection = N × (N + 1) / 2
60 Variables = 60 + 59 + 58 + … + 1 = 1,830 Models

Feature Selection - Framework
The programmatic development and evaluation of TN batches is a necessity.
The performance of initial models dictates the composition of later models.
Too many decision points to require human interaction.
SAS/C#

Variable Shaving
Stepwise removal of variables from the model based on variable importance.
Typically starts with an unrestricted model and removes variables until a stop condition is met or there are no more variables to remove.
At each step, the variable with the lowest importance is removed.
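
The shaving loop described above can be sketched as follows. This is a minimal illustration, not the deck's SAS/C# implementation: the `fit` callable is a hypothetical stand-in for a TreeNet run that returns a holdout score and per-variable importances, `toy_fit` and its numbers are invented for the demonstration, and the stop condition (a minimum variable count) is an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: fit a model on the given variables and return
# (holdout_score, {variable: importance}). A real version would wrap TreeNet.
FitFn = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def shave_variables(variables: Sequence[str], fit: FitFn,
                    min_vars: int = 1) -> List[Tuple[List[str], float]]:
    """Start unrestricted, then drop the least important variable each step."""
    current = list(variables)
    history = []
    while True:
        score, importance = fit(current)
        history.append((list(current), score))
        if len(current) <= min_vars:  # assumed stop condition
            break
        weakest = min(current, key=lambda v: importance[v])
        current.remove(weakest)
    return history


def best_subset(history: List[Tuple[List[str], float]]) -> Tuple[List[str], float]:
    """Return the step with the highest holdout score."""
    return max(history, key=lambda step: step[1])


# Toy stand-in for a model run (invented numbers): three informative
# variables and two noise variables that slightly hurt the score.
TOY_SIGNAL = {"age": 0.15, "income": 0.10, "tenure": 0.05,
              "noise_a": 0.002, "noise_b": 0.001}
NOISE = {"noise_a", "noise_b"}


def toy_fit(variables: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    signal = sum(TOY_SIGNAL[v] for v in variables if v not in NOISE)
    penalty = 0.02 * sum(1 for v in variables if v in NOISE)
    return 0.5 + signal - penalty, {v: TOY_SIGNAL[v] for v in variables}


history = shave_variables(list(TOY_SIGNAL), toy_fit)
best_vars, best_score = best_subset(history)
print(best_vars, round(best_score, 2))
```

Because the sketch fits one model per step, shaving N variables down to one needs at most N fits, so 60 variables would take at most 60 model runs under these assumptions.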
