Genetic Programming-based Evolutionary
Feature Construction for Heterogeneous
Ensemble Learning (IEEE TEVC)
Hengzhe Zhang
Supervisors: Mengjie Zhang, Bing Xue, Qi Chen, Aimin Zhou (ECNU)
Victoria University of Wellington
18/07/2023
Table of Contents
1 Background
2 Algorithm Framework
3 Experimental Results
4 Conclusions
Background
Evolutionary Feature Construction
The general idea of feature construction is to construct a set of new features $\{\phi_1, \ldots, \phi_m\}$ that improve the learning performance on a given dataset $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}$ compared to learning on the original features $\{x_1, \ldots, x_p\}$.
Genetic programming (GP) has been widely used for automatic feature
construction due to its flexible representation and gradient-free search
mechanism.
(a) Feature Construction on Linear Regression (b) New Feature Space
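As a toy illustration (not the paper's implementation), the sketch below scores a single hand-written constructed feature by the cross-validated R2 of a linear model trained on the original features plus the new feature; in GP-based feature construction, the candidate phi would be a GP tree and this score would serve as its fitness. All names here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def feature_construction_fitness(phi, X, y):
    """Cross-validated R2 of a linear model on [original features, phi(X)]."""
    new_feature = phi(X).reshape(-1, 1)          # phi stands in for a GP tree
    X_aug = np.hstack([X, new_feature])          # original + constructed feature
    return cross_val_score(LinearRegression(), X_aug, y, scoring="r2", cv=5).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)   # interaction a linear model misses

baseline = cross_val_score(LinearRegression(), X, y, scoring="r2", cv=5).mean()
augmented = feature_construction_fitness(lambda X: X[:, 0] * X[:, 1], X, y)
print(f"R2 with original features: {baseline:.3f}, with constructed feature: {augmented:.3f}")
```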
GP for Ensemble Learning
Motivation: An ensemble of multiple simple/weak GP trees is better than a
single complex/strong GP tree.
GP is naturally suited for ensemble learning because it can generate a diverse set of candidate solutions (models) through genetic operations in a single run.¹
Multi-modal Landscape on Training Data
¹ H. Zhang, A. Zhou, et al., "An Evolutionary Forest for Regression," IEEE TEVC, 2022.
Research Objectives
Key Questions in Heterogeneous Ensemble Learning:
How to define base learners? (Decision Tree or Linear Model?)
How to select base learners and features of the final model? (Top-N?)
How to search features efficiently? (Guided Mutation?)
Algorithm Framework
Base Learner
Motivation:
Decision trees are good at fitting piecewise-constant patterns.
Linear regression is good at fitting continuous trends.
Genetic programming is good at constructing features.
How to combine them?
Combine decision tree and linear regression using gradient boosting.
Use GP for feature construction.
Feature construction on a mixed base learner.
Base Learner
Gradient boosting:
Train a linear regression model first.
Learn the residual using a decision tree.
Illustration of the heterogeneous base learner.
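A minimal sketch of this heterogeneous base learner, assuming squared loss and a single boosting step (the class name and defaults are illustrative, not the paper's exact SR-Tree implementation):

```python
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

class LinearPlusTreeRegressor(BaseEstimator, RegressorMixin):
    """Linear model first, then a decision tree on its residual (one boosting step)."""

    def __init__(self, max_depth=3):
        self.max_depth = max_depth

    def fit(self, X, y):
        self.linear_ = RidgeCV().fit(X, y)                 # stage 1: continuous trend
        residual = y - self.linear_.predict(X)             # what the linear model misses
        self.tree_ = DecisionTreeRegressor(max_depth=self.max_depth).fit(X, residual)
        return self                                        # stage 2: piecewise residual fit

    def predict(self, X):
        return self.linear_.predict(X) + self.tree_.predict(X)
```

In the framework above, such a learner would be trained on the GP-constructed features of each individual, so GP handles feature construction while the base learner handles fitting.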
Greedy Ensemble Selection
Why select a subset?
Not all learners in an ensemble model are accurate and diverse.
How to select a subset:
1 Select the Top-5 models.
2 Select the model that minimizes training error on top of the already selected models.
3 Repeat step 2 until a termination criterion is reached (see the sketch below).
Ensemble Selection
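A minimal sketch of this greedy selection, assuming the ensemble prediction is the simple average of the selected models' training predictions; `predictions` is an (n_models, n_samples) array, and all names and defaults are illustrative:

```python
import numpy as np

def greedy_ensemble_selection(predictions, y, n_init=5, max_size=30):
    """Greedily grow an ensemble that minimizes training MSE of the averaged prediction."""
    per_model_error = ((predictions - y) ** 2).mean(axis=1)
    selected = list(np.argsort(per_model_error)[:n_init])       # step 1: Top-5 models
    while len(selected) < max_size:
        ensemble_sum = predictions[selected].sum(axis=0)
        current = ((ensemble_sum / len(selected) - y) ** 2).mean()
        # step 2: pick the model (with replacement) whose addition lowers training error most
        errors = [(((ensemble_sum + p) / (len(selected) + 1) - y) ** 2).mean()
                  for p in predictions]
        best = int(np.argmin(errors))
        if errors[best] >= current:                             # step 3: stop when no gain
            break
        selected.append(best)
    return selected
```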
Variable Selection
Why is terminal variable selection needed?
Not all variables in training data are useful!
How to select?
Calculate the importance of each constructed feature.
Calculate the relative frequency of all variables.
Weight frequency by feature importance values.
Variable Selection
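The sketch below shows one way to turn these three steps into a sampling distribution over terminal variables for guided mutation. The input format (each constructed feature given as the variable indices it uses plus an importance value from the base learner) and all names are assumptions for illustration:

```python
import numpy as np
from collections import Counter

def terminal_probabilities(constructed_features, n_variables):
    """Importance-weighted frequency of each terminal variable, normalized to sum to 1."""
    weighted_freq = np.zeros(n_variables)
    for terminals, importance in constructed_features:
        for var, count in Counter(terminals).items():     # frequency of each variable
            weighted_freq[var] += importance * count       # weight frequency by importance
    if weighted_freq.sum() == 0:
        return np.full(n_variables, 1.0 / n_variables)     # fall back to uniform sampling
    return weighted_freq / weighted_freq.sum()

# Example: an important feature uses variables 0 and 1, a less important one uses variable 2.
probs = terminal_probabilities([([0, 1, 0], 0.7), ([2], 0.3)], n_variables=4)
print(probs)   # variables used by important features get higher sampling probability
```

During guided mutation, new terminals would then be sampled from this distribution instead of uniformly at random.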
Experimental Results
R2 Scores
SR-Forest is the best on average in terms of test R2 scores.
[Figure: distributions of test R2 scores for SR-Forest, EF, SBP-GP, Operon, GP-GOMEA (and its b/poc variants), KTBoost, FEAT, XGB, EPLEX, AdaBoost, Random Forest, AFP, AFP_FE, ITEA, LGBM, Kernel Ridge, DSR, gplearn, MLP, Linear, MRGP, BSR, FFX, and AIFeynman, grouped by real-world, synthetic, and all datasets.]
Average R2 scores on 120 regression datasets.
Running Time
The training time of SR-Forest ranks in the middle among the compared algorithms.
[Figure: training times of Linear, LGBM, MLP, Kernel Ridge, AdaBoost, Random Forest, XGB, FFX, KTBoost, Operon, AFP, EF, AFP_FE, SR-Forest, FEAT, GP-GOMEA, ITEA, EPLEX, GP-GOMEA-b, GP-GOMEA-poc, BSR, gplearn, DSR, AIFeynman, SBP-GP, and MRGP, grouped by real-world, synthetic, and all datasets.]
Average training time on 120 regression datasets.
Heterogeneous Base Learner
Feature construction on SR-Tree (DT+LR) is better than construction on Ridge
(LR) in 22 out of 106 datasets.
Feature construction on SR-Tree (DT+LR) is better than construction on RDT in
82 out of 106 datasets.
Statistical Comparison on 106 regression datasets.
Ensemble Selection
Ensemble selection can reduce ensemble size from 100 to 30 on average.
Ensemble selection delivers better performance in 37 datasets.
[Figure: (a) Ensemble size distribution after selection. (b) Statistical comparison of R2 on 106 datasets.]
Variable Selection
Importance-based terminal selection (GM) delivers better performance in 30 datasets.
Selecting only terminal variables is sufficient.
[Figure: (a) Importance-based GM outperforms frequency-based GM in 15 datasets. (b) GM outperforms random selection in 30 datasets in terms of R2.]
Constructed Features
Feature construction on a heterogeneous ensemble enables it to outperform other machine learning algorithms.
Because the base learners provide feature importance values, we can visualize which constructed features are important.
[Figure: (a) Performance comparison: mean test R2 of SR-Tree, SR-Bagging, and SR-Forest against GP and standard regressors such as KNN, Ridge, DT, RF, ET, AdaBoost, GBDT, XGBoost, and LightGBM. (b) Feature importance of constructed features, e.g., max(MCW, RA), sqrt(SS^2 + 1), MCW - SS, max(MCW, RA, SS), log(|max(SS, RA - 1)| + 1), 2·MCW^3, RA·(MCW + 1), CBL·(RL - 1).]
Conclusions
Feature construction on a heterogeneous ensemble outperforms that on a homogeneous ensemble.
Ensemble selection effectively reduces ensemble size while enhancing prediction performance.
Utilizing feature importance-based variable selection (guided mutation) improves search effectiveness.
Open Source Project: Evolutionary Forest (90 GitHub Stars)
Thanks for listening!
Email: Hengzhe.zhang@ecs.vuw.ac.nz
GitHub Project: https://github.com/hengzhe-zhang/EvolutionaryForest/