Genetic Programming-based Evolutionary
Feature Construction for Heterogeneous
Ensemble Learning (IEEE TEVC)
Hengzhe Zhang
Supervisors: Mengjie Zhang, Bing Xue, Qi Chen, Aimin Zhou (ECNU)
Victoria University of Wellington
18/07/2023
Table of Contents
1 Background
2 Algorithm Framework
3 Experimental Results
4 Conclusions
Background
Evolutionary Feature Construction
The general idea of feature construction is to construct a set of new features $\{\phi_1, \ldots, \phi_m\}$ that improve the learning performance on a given dataset $\{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}$ compared to learning on the original features $\{x_1, \ldots, x_p\}$.
Genetic programming (GP) has been widely used for automatic feature
construction due to its flexible representation and gradient-free search
mechanism.
(a) Feature Construction on Linear Regression (b) New Feature Space
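As a toy illustration (not the paper's implementation), the sketch below scores a single hand-written constructed feature by the cross-validated R2 of a linear model trained on the original features plus the new feature; in GP-based feature construction, the candidate phi would be a GP tree and this score would serve as its fitness. All names here are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def feature_construction_fitness(phi, X, y):
    """Cross-validated R2 of a linear model on [original features, phi(X)]."""
    new_feature = phi(X).reshape(-1, 1)          # phi stands in for a GP tree
    X_aug = np.hstack([X, new_feature])          # original + constructed feature
    return cross_val_score(LinearRegression(), X_aug, y, scoring="r2", cv=5).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=200)   # interaction a linear model misses

baseline = cross_val_score(LinearRegression(), X, y, scoring="r2", cv=5).mean()
augmented = feature_construction_fitness(lambda X: X[:, 0] * X[:, 1], X, y)
print(f"R2 with original features: {baseline:.3f}, with constructed feature: {augmented:.3f}")
```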
GP for Ensemble Learning
Motivation: An ensemble of multiple simple/weak GP trees is better than a
single complex/strong GP tree.
GP is naturally suited for ensemble learning because it can generate a diverse set of candidate solutions (models) through genetic operations in a single run.¹
Multi-modal Landscape on Training Data
¹ H. Zhang, A. Zhou, et al., "An Evolutionary Forest for Regression," IEEE TEVC, 2022.
Research Objectives
Key Questions in Heterogeneous Ensemble Learning:
How to define base learners? (Decision Tree or Linear Model?)
How to select base learners and features of the final model? (Top-N?)
How to search features efficiently? (Guided Mutation?)
Algorithm Framework
Base Learner
Motivation:
Decision trees are good at fitting piecewise-constant patterns.
Linear regression is good at fitting continuous trends.
Genetic programming is good at constructing features.
How to combine them?
Combine decision tree and linear regression using gradient boosting.
Use GP for feature construction.
Feature construction on a mixed base learner.
Base Learner
Gradient boosting:
Train a linear regression model first.
Learn the residual using a decision tree.
Illustration of the heterogeneous base learner.
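A minimal sketch of this heterogeneous base learner, assuming squared loss and a single boosting step (the class name and defaults are illustrative, not the paper's exact SR-Tree implementation):

```python
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor

class LinearPlusTreeRegressor(BaseEstimator, RegressorMixin):
    """Linear model first, then a decision tree on its residual (one boosting step)."""

    def __init__(self, max_depth=3):
        self.max_depth = max_depth

    def fit(self, X, y):
        self.linear_ = RidgeCV().fit(X, y)                 # stage 1: continuous trend
        residual = y - self.linear_.predict(X)             # what the linear model misses
        self.tree_ = DecisionTreeRegressor(max_depth=self.max_depth).fit(X, residual)
        return self                                        # stage 2: piecewise residual fit

    def predict(self, X):
        return self.linear_.predict(X) + self.tree_.predict(X)
```

In the framework above, such a learner would be trained on the GP-constructed features of each individual, so GP handles feature construction while the base learner handles fitting.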
Greedy Ensemble Selection
Why select a subset?
Not all learners in an ensemble model are accurate and diverse.
How to select a subset:
1 Select the Top-5 models.
2 Select the model that minimizes training error on top of the already selected models.
3 Repeat step 2 until a termination criterion is reached (see the sketch below).
Ensemble Selection
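A minimal sketch of this greedy selection, assuming the ensemble prediction is the simple average of the selected models' training predictions; `predictions` is an (n_models, n_samples) array, and all names and defaults are illustrative:

```python
import numpy as np

def greedy_ensemble_selection(predictions, y, n_init=5, max_size=30):
    """Greedily grow an ensemble that minimizes training MSE of the averaged prediction."""
    per_model_error = ((predictions - y) ** 2).mean(axis=1)
    selected = list(np.argsort(per_model_error)[:n_init])       # step 1: Top-5 models
    while len(selected) < max_size:
        ensemble_sum = predictions[selected].sum(axis=0)
        current = ((ensemble_sum / len(selected) - y) ** 2).mean()
        # step 2: pick the model (with replacement) whose addition lowers training error most
        errors = [(((ensemble_sum + p) / (len(selected) + 1) - y) ** 2).mean()
                  for p in predictions]
        best = int(np.argmin(errors))
        if errors[best] >= current:                             # step 3: stop when no gain
            break
        selected.append(best)
    return selected
```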
Variable Selection
Why is terminal variable selection needed?
Not all variables in training data are useful!
How to select?
Calculate the importance of each constructed feature.
Calculate the relative frequency of all variables.
Weight frequency by feature importance values.
Variable Selection
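The sketch below shows one way to turn these three steps into a sampling distribution over terminal variables for guided mutation. The input format (each constructed feature given as the variable indices it uses plus an importance value from the base learner) and all names are assumptions for illustration:

```python
import numpy as np
from collections import Counter

def terminal_probabilities(constructed_features, n_variables):
    """Importance-weighted frequency of each terminal variable, normalized to sum to 1."""
    weighted_freq = np.zeros(n_variables)
    for terminals, importance in constructed_features:
        for var, count in Counter(terminals).items():     # frequency of each variable
            weighted_freq[var] += importance * count       # weight frequency by importance
    if weighted_freq.sum() == 0:
        return np.full(n_variables, 1.0 / n_variables)     # fall back to uniform sampling
    return weighted_freq / weighted_freq.sum()

# Example: an important feature uses variables 0 and 1, a less important one uses variable 2.
probs = terminal_probabilities([([0, 1, 0], 0.7), ([2], 0.3)], n_variables=4)
print(probs)   # variables used by important features get higher sampling probability
```

During guided mutation, new terminals would then be sampled from this distribution instead of uniformly at random.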
Experimental Results
R2 Scores
SR-Forest is the best on average in terms of test R2 scores.
[Figure: distributions of test R2 scores for SR-Forest, EF, SBP-GP, Operon, GP-GOMEA (and its b/poc variants), KTBoost, FEAT, XGB, EPLEX, AdaBoost, Random Forest, AFP, AFP_FE, ITEA, LGBM, Kernel Ridge, DSR, gplearn, MLP, Linear, MRGP, BSR, FFX, and AIFeynman, grouped by real-world, synthetic, and all datasets.]
Average R2 scores on 120 regression datasets.
Running Time
The training time of SR-Forest ranks in the middle among the compared algorithms.
[Figure: training times of Linear, LGBM, MLP, Kernel Ridge, AdaBoost, Random Forest, XGB, FFX, KTBoost, Operon, AFP, EF, AFP_FE, SR-Forest, FEAT, GP-GOMEA, ITEA, EPLEX, GP-GOMEA-b, GP-GOMEA-poc, BSR, gplearn, DSR, AIFeynman, SBP-GP, and MRGP, grouped by real-world, synthetic, and all datasets.]
Average training time on 120 regression datasets.
Heterogeneous Base Learner
Feature construction on SR-Tree (DT+LR) is better than construction on Ridge
(LR) in 22 out of 106 datasets.
Feature construction on SR-Tree (DT+LR) is better than construction on RDT in
82 out of 106 datasets.
Statistical Comparison on 106 regression datasets.
Ensemble Selection
Ensemble selection can reduce ensemble size from 100 to 30 on average.
Ensemble selection delivers better performance in 37 datasets.
[Figure: (a) Ensemble size distribution after selection. (b) Statistical comparison of R2 on 106 datasets.]
Variable Selection
Importance-based terminal selection (GM) delivers better performance in 30 datasets.
Selecting only terminal variables is sufficient.
[Figure: (a) Importance-based GM outperforms frequency-based GM in 15 datasets. (b) GM outperforms random selection in 30 datasets in terms of R2.]
Constructed Features
Feature construction on a heterogeneous ensemble enables it to outperform other machine learning algorithms.
Because the base learners provide feature importance values, we can visualize which constructed features are important.
[Figure: (a) Performance comparison: mean test R2 of SR-Tree, SR-Bagging, and SR-Forest against GP and standard regressors such as KNN, Ridge, DT, RF, ET, AdaBoost, GBDT, XGBoost, and LightGBM. (b) Feature importance of constructed features, e.g., max(MCW, RA), sqrt(SS^2 + 1), MCW - SS, max(MCW, RA, SS), log(|max(SS, RA - 1)| + 1), 2·MCW^3, RA·(MCW + 1), CBL·(RL - 1).]
Conclusions
Feature construction on a heterogeneous ensemble outperforms that on a homogeneous ensemble.
Ensemble selection effectively reduces ensemble size while enhancing prediction performance.
Utilizing feature importance-based variable selection (guided mutation) improves search effectiveness.
Open Source Project: Evolutionary Forest (90 GitHub Stars)
Thanks for listening!
Email: Hengzhe.zhang@ecs.vuw.ac.nz
GitHub Project: https://github.com/hengzhe-zhang/EvolutionaryForest/