Voting-Based Learning Classifier System for Multi-label Classification
Kaveh Ahmadi-Abhari (Presenter), Ali Hamzeh, Sattar Hashemi

IWLCS 2011 – Dublin, Ireland, 13 July 2011
Multi-label Classification

 Single-label classification
     Exclusive classes: each example belongs to exactly one class

 Multi-label classification
     Each instance can belong to more than one class
Multi-label Classification (example)

 [Figure: a single photograph labeled with several classes at once (Sky, People, Sand), illustrating that one instance can carry multiple labels, in contrast to single-label classification.]
Current Methods

 Problem Transformation
    • Transform the problem into one or more single-label classification problems

 Algorithm Adaptation
    • Adapt single-label classifiers to solve the problem directly

                                                          [Tsoumakas & Katakis, 2007]
Problem Transformation Approaches

 Copy transformation: each multi-label example is copied once per label it carries.

    Original data                      After copy transformation
    Ex.   Label set                    Ex.   Label
    1     {λ1, λ4}                     1a    λ1
    2     {λ3, λ4}                     1b    λ4
    3     {λ1}                         2a    λ3
    4     {λ2, λ3, λ4}                 2b    λ4
                                       3     λ1
                                       4a    λ2
                                       4b    λ3
                                       4c    λ4

                                                          [Tsoumakas et al., 2009]
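For illustration, a minimal Python sketch of the copy transformation shown in the table above; the function name and the data encoding are our own, not part of the cited work.

```python
def copy_transform(dataset):
    """Copy transformation: duplicate each multi-label example once per label,
    yielding a single-label dataset (sub-examples are suffixed a, b, c, ...)."""
    single_label = []
    for ex_id, labels in dataset:
        for i, lam in enumerate(sorted(labels)):
            suffix = chr(ord('a') + i) if len(labels) > 1 else ''
            single_label.append((f"{ex_id}{suffix}", lam))
    return single_label

data = [(1, {"λ1", "λ4"}), (2, {"λ3", "λ4"}), (3, {"λ1"}), (4, {"λ2", "λ3", "λ4"})]
for ex, label in copy_transform(data):
    print(ex, label)    # 1a λ1, 1b λ4, 2a λ3, 2b λ4, 3 λ1, 4a λ2, 4b λ3, 4c λ4
```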
Algorithm Adaptation Approaches
 Multi-label lazy algorithms
     ML-kNN [Zhang & Zhou, PRJ07]
 Multi-label decision trees
     ADTBoost.MH [DeComité et al., MLDM03]
     Multi-Label C4.5 [Clare & King, LNCS2168]
 Multi-label kernel methods
     Rank-SVM [Elisseeff & Weston, NIPS02]
     ML-SVM [Boutell et al., PR04]
 Multi-label text categorization algorithms
     BoosTexter [Schapire & Singer, MLJ00]
     Maximal Margin Labeling [Kazawa et al., NIPS04]
     Probabilistic generative models [McCallum, AAAI99] [Ueda & Saito, NIPS03]
     BP-MLL [Zhang & Zhou, TKDE06]
Motivation

 A great deal of work has been done on classification with LCSs.

 Most of these studies address single-label classification problems.

 Multi-label classification with LCSs is still in its inception [Vallim et al., IWLCS 08].
Voting Based Learning Classifier System

 How can we guide the discovery mechanism (e.g. the evolutionary operators) in LCSs?

     By using the prior knowledge gained from past experiences.

 Training instances vote for the rules they match, according to how correct each rule is.

 These votes serve as the fitness measure.
Voting: Defining Rule Types

 How can the given votes describe the quality of the rules accurately?

 Define different types for the rules, such that each type describes a quality status the rule might have.
Rule Types

 Example: in a single-label classification problem, the rule types might be "correct" and "wrong".

 Each rule receives a "correct" or "wrong" vote from each matched training instance, so over all of its matched instances a rule accumulates a combination of "correct" and "wrong" votes.
Votes as Fitness Measure

 The received votes
    • describe the quality of the rules, and
    • are used as a fitness measure for guiding the discovery mechanism.

 For example, a rule with many "wrong" votes should be selected for discovery (mutation) with a high probability, so that it can evolve into a meaningful rule.
Rule Definition

     Antecedent / Consequent
           ###1 / 110
           0011 / 001

 The antecedent part matches against the feature vector.
 The consequent part encodes the classes predicted by the rule: one bit per class, where a value of 1 indicates that the corresponding class is predicted.
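To make the encoding concrete, here is a minimal sketch of rule matching and prediction under this representation, assuming the usual ternary-alphabet matching of LCSs; the helper names are ours.

```python
def matches(antecedent, instance):
    """A ternary antecedent (over 0, 1, #) matches a binary feature vector
    if every non-# position agrees with the corresponding feature bit."""
    return all(a == '#' or a == x for a, x in zip(antecedent, instance))

def predicted_classes(consequent):
    """Set of 1-based class indices whose consequent bit is 1."""
    return {i + 1 for i, bit in enumerate(consequent) if bit == '1'}

antecedent, consequent = "###1", "110"       # first example rule above
print(matches(antecedent, "0101"))           # True: only the last feature bit is constrained
print(predicted_classes(consequent))         # {1, 2}
```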
VLCS Vote Types for the Multi-label Problem

 Five vote types are defined for VLCS: Correct, Subset, Superset, Partial-set, and Wrong.
Multi-label Simple Dataset

 [Figure: a toy multi-label dataset laid out on the 3-bit input cube (instances 000 through 111), with label sets such as {1, 4}, {1, 3}, {2, 4}, and {1, 2} attached to instances. Expanded from [Vallim et al., GECCO'08].]
VLCS Voting Options for the Multi-label Problem

 Correct rules (C)

     Rule 00# / 1001
     • is correct when it matches instance 000 or 001.
VLCS Voting Options for the Multi-label Problem

 Wrong rules (W)

     Rule 0#0 / 0010
     • is wrong when it matches instance 000 or 010.
VLCS Voting Options for the Multi-label Problem

 Subset rules

     Rule #01 / 1000
     • is a subset rule when it matches instance 001 or 101.
     • Expected classes of the matched instance: 1, 4
VLCS Voting Options for the Multi-label Problem

 Superset rules

     Rule #00 / 1101
     • is a superset rule when it matches instance 001 or 101.
     • Expected classes of the matched instance: 1, 4
VLCS Voting Options for the Multi-label Problem

 Partial-set rules

     Rule #1# / 0110
     • is a partial-set rule when it matches instance 010 or 111.
     • Expected classes of the matched instance: 2, 4
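Putting the five vote types together, the following is a minimal sketch of how a matched training instance might cast its vote, based on the set relations illustrated in the preceding slides; the function name and the comparison order are our own reading of those definitions.

```python
def vote(predicted, expected):
    """Vote cast by a matched training instance, comparing the rule's predicted
    label set with the instance's expected label set."""
    if predicted == expected:
        return "correct"
    if not predicted & expected:
        return "wrong"        # no overlap at all
    if predicted < expected:
        return "subset"       # predicts only some of the expected classes
    if predicted > expected:
        return "superset"     # predicts every expected class plus extra ones
    return "partial"          # some overlap, but neither set contains the other

# The examples from the preceding slides (label sets as sets of class indices):
print(vote({1, 4}, {1, 4}))     # correct   (rule 00# / 1001 on instance 000)
print(vote({3}, {1, 4}))        # wrong     (rule 0#0 / 0010 on instance 000)
print(vote({1}, {1, 4}))        # subset    (rule #01 / 1000 on instance 001)
print(vote({1, 2, 4}, {1, 4}))  # superset  (rule #00 / 1101, expected classes 1, 4)
print(vote({2, 3}, {2, 4}))     # partial   (rule #1# / 0110 on instance 010)
```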
VLCS Voting Options for the Multi-label Problem

 Rules might receive different votes over time.

     Rule #0# / 1001
     • is correct for instance 000, and
     • is partial-set for instance 101.
Using Stored Prior Knowledge

 Information: consider a rule whose received votes are all "superset".

 Inference:
     • The rule is covering an appropriate area of the problem space.
     • The rule is predicting more classes than the matched input instances require.
     • Therefore, the number of classes the rule predicts should be reduced.
Discovery Operators

 In the discovery mechanism, an evolutionary algorithm with four mutation operators is defined:
Discovery Operators
 Mutation operators on the rule's antecedent part

     MA-G   Generalizes the rule by flipping 0 or 1 bits to #

     MA-S   Specializes the rule by flipping # bits to 1 or 0
Discovery Operators
 Mutation operators on the rule's consequent part

     MC-S   Reduces the number of predicted classes by flipping 1 bits to 0

     MC-A   Adds more classes to the predicted set by flipping 0 bits to 1
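A minimal sketch of the four mutation operators on the string encoding, assuming each eligible bit is flipped independently with the mutation rate; the function names and the independence assumption are ours.

```python
import random

def ma_g(antecedent, rate):
    """MA-G: generalizes by flipping each 0/1 bit to # with probability `rate`."""
    return ''.join('#' if b in '01' and random.random() < rate else b for b in antecedent)

def ma_s(antecedent, rate):
    """MA-S: specializes by flipping each # bit to a random 0 or 1 with probability `rate`."""
    return ''.join(random.choice('01') if b == '#' and random.random() < rate else b
                   for b in antecedent)

def mc_s(consequent, rate):
    """MC-S: removes predicted classes by flipping each 1 bit to 0 with probability `rate`."""
    return ''.join('0' if b == '1' and random.random() < rate else b for b in consequent)

def mc_a(consequent, rate):
    """MC-A: adds predicted classes by flipping each 0 bit to 1 with probability `rate`."""
    return ''.join('1' if b == '0' and random.random() < rate else b for b in consequent)

random.seed(0)
print(ma_g("0011", 0.5), ma_s("###1", 0.5), mc_s("110", 0.5), mc_a("001", 0.5))
# one random outcome of each operator
```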
Which Discovery Operator?

 The votes each rule has received determine which mutation operator should act.

 Example: a superset rule has wrongly assigned some non-expected classes, so the number of predicted classes should be reduced (MC-S).
Which Discovery Operator?

     Received Votes               Activated Mutation Operator(s)
     Correct                      MA-G
     Subset                       MC-A
     Superset                     MC-S
     Partial-Set                  MC-A, MC-S
     Wrong                        MC-A, MC-S
     Correct, Subset              MA-S
     Correct, Superset            MA-G
     Correct, Partial-Set         MA-S
     Correct, Wrong               MA-S
     Wrong, Subset                MA-S, MC-A
     Wrong, Partial               MA-S
     Correct, Subset, Wrong       MA-S, MA-G
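The table above reads as a lookup from the combination of votes a rule has received to the operators that act on it; a minimal sketch of that lookup (the encoding as a Python dict is ours).

```python
# Map the combination of received vote types to the mutation operators to activate.
OPERATOR_TABLE = {
    frozenset({"correct"}):                    ["MA-G"],
    frozenset({"subset"}):                     ["MC-A"],
    frozenset({"superset"}):                   ["MC-S"],
    frozenset({"partial"}):                    ["MC-A", "MC-S"],
    frozenset({"wrong"}):                      ["MC-A", "MC-S"],
    frozenset({"correct", "subset"}):          ["MA-S"],
    frozenset({"correct", "superset"}):        ["MA-G"],
    frozenset({"correct", "partial"}):         ["MA-S"],
    frozenset({"correct", "wrong"}):           ["MA-S"],
    frozenset({"wrong", "subset"}):            ["MA-S", "MC-A"],
    frozenset({"wrong", "partial"}):           ["MA-S"],
    frozenset({"correct", "subset", "wrong"}): ["MA-S", "MA-G"],
}

def operators_for(received_votes):
    """Operators to activate for the set of vote types a rule has received."""
    return OPERATOR_TABLE.get(frozenset(received_votes), [])

print(operators_for({"correct", "superset"}))   # ['MA-G']
```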
Mutation Rate

 • A mutation operator performs its bit flipping with a certain probability, the mutation rate.

 • The strength of a rule is the amount of reward we predict the system will receive if the rule acts.

 • The higher the strength, the lower the mutation rate.
Strength of a Rule
 The strength of a rule is the mean of the rewards it receives over time.

 Reward function:

     R = 1 − |C_rule Δ C_expected| / |C_rule ∪ C_expected|

 where Δ denotes the symmetric difference:

     A Δ B = {x : (x ∈ A) ⊕ (x ∈ B)}

                                               Alteration of [Vallim et al., GECCO'08]
Rule Rewards

     Input Instance   Expected Output   Selected Rule   Received Vote   Reward
     0001             1, 2              ###1 / 110      Correct         1
     0101             1, 2, 3           ###1 / 110      Subset          0.66
     0111             1                 ###1 / 110      Superset        0.50
     1111             1, 3              ###1 / 110      Partial-set     0.33
     0011             3                 ###1 / 110      Wrong           0
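A minimal sketch of the reward function applied to the rule above, reproducing the table values (the slide truncates 2/3 to 0.66, while rounding gives 0.67).

```python
def reward(c_rule, c_expected):
    """R = 1 - |C_rule Δ C_expected| / |C_rule ∪ C_expected|."""
    return 1 - len(c_rule ^ c_expected) / len(c_rule | c_expected)

predicted = {1, 2}                                  # rule ###1 / 110 predicts classes 1 and 2
for expected in [{1, 2}, {1, 2, 3}, {1}, {1, 3}, {3}]:
    print(sorted(expected), round(reward(predicted, expected), 2))
# [1, 2] 1.0   [1, 2, 3] 0.67   [1] 0.5   [1, 3] 0.33   [3] 0.0
```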
Experimental Results
 Data sets: two binary datasets in the bioinformatics domain
     [Chan and Freitas, GECCO'06]
     Extracted from [Alves et al., 2009]
Experimental Results
 Quality metrics:

     Accuracy
     • Proportion of correctly predicted classes among the union of predicted and true classes

     Precision
     • Proportion of correctly predicted classes among all predicted classes

     Recall
     • Proportion of correctly predicted classes among all true classes

                                                          [Tsoumakas & Katakis, 2007]
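For reference, a minimal sketch of the example-based forms of these metrics (averaging the per-instance ratios, in the style of [Tsoumakas & Katakis, 2007]); the function name and the toy data are ours.

```python
def example_based_metrics(true_sets, pred_sets):
    """Example-based accuracy, precision and recall, averaged over instances."""
    acc = prec = rec = 0.0
    for y, z in zip(true_sets, pred_sets):
        inter = len(y & z)
        acc += inter / len(y | z) if (y | z) else 1.0
        prec += inter / len(z) if z else 0.0
        rec += inter / len(y) if y else 0.0
    n = len(true_sets)
    return acc / n, prec / n, rec / n

# Toy check: two instances with true label sets {1, 4} and {2, 4}.
print(example_based_metrics([{1, 4}, {2, 4}], [{1, 4}, {2}]))   # (0.75, 1.0, 0.75)
```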
Experimental Results
 For VLCS, we use 5-fold cross-validation; the training part is used to evaluate the rules with the voting mechanism described above.
 Fixed-size population, initialized with the most general possible rules.
 In each generation, each rule is voted on by its matched instances and is assigned a reward.
 The defined mutation operators are applied to discover new rules.
 The best rules among the parents and the offspring form the next generation.
 Training stops if the mean strength of the rules decreases over a number of consecutive generations.
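Tying the steps above together, here is a self-contained sketch of the generational loop on a toy version of the cube dataset: evaluate each rule against its matched instances, mutate, keep the best of parents and offspring, and stop when the mean strength keeps decreasing. The vote-guided choice of mutation operator from the earlier slides is elided here (a generic random mutation stands in), and all parameter values are our own, not taken from the slides.

```python
import random
from statistics import mean

def matches(ant, x):
    """Ternary antecedent matches a binary feature string if every non-# position agrees."""
    return all(a in ('#', b) for a, b in zip(ant, x))

def classes(bits):
    """Set of 1-based class indices predicted by a consequent bit string."""
    return {i + 1 for i, b in enumerate(bits) if b == '1'}

def reward(pred, expected):
    """R = 1 - |pred Δ expected| / |pred ∪ expected|, as on the earlier slide."""
    union = pred | expected
    return 1 - len(pred ^ expected) / len(union) if union else 1.0

def mutate(ant, cons, rate=0.3):
    """Simplified stand-in for the MA-*/MC-* operators: random flips at the mutation rate."""
    ant = ''.join(random.choice('01#') if random.random() < rate else a for a in ant)
    cons = ''.join(('1' if c == '0' else '0') if random.random() < rate else c for c in cons)
    return ant, cons

def train(data, n_bits, n_classes, pop_size=20, gens=50, patience=3):
    # Fixed-size population; antecedents start maximally general (all #).
    pop = [('#' * n_bits, ''.join(random.choice('01') for _ in range(n_classes)))
           for _ in range(pop_size)]
    best, stalled = float('-inf'), 0
    for _ in range(gens):
        def strength(rule):
            rs = [reward(classes(rule[1]), exp) for x, exp in data if matches(rule[0], x)]
            return mean(rs) if rs else 0.0
        offspring = [mutate(*r) for r in pop]
        # Best rules among parents and offspring form the next generation.
        pop = sorted(pop + offspring, key=strength, reverse=True)[:pop_size]
        m = mean(strength(r) for r in pop)
        stalled = stalled + 1 if m < best else 0   # stop on consecutive decreases in mean strength
        best = max(best, m)
        if stalled >= patience:
            break
    return pop

random.seed(1)
toy = [("000", {1, 4}), ("001", {1, 4}), ("010", {2, 4}), ("101", {1, 2}), ("110", {1, 3})]
print(train(toy, n_bits=3, n_classes=4)[0])   # best evolved rule (antecedent, consequent)
```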
Experimental Results
 [Chan and Freitas, GECCO'06]
     135 instances
     152 attributes
     Two classes
     • Each instance could have one or both of the available class labels.

     Method    Accuracy   Precision   Recall
     BR        0.89       0.89        0.87
     ML-KNN    0.91       0.93        0.91
     VLCS      0.89       0.89        0.89
Experimental Results
 Extracted from [Alves et al., 2009]
     7877 proteins
     40 attributes
     Six classes
     • Each instance could have several of the available class labels.

     Method    Accuracy   Precision   Recall
     BR        0.78       0.77        0.78
     ML-KNN    0.80       0.81        0.80
     VLCS      0.81       0.83        0.82
Conclusion

 Guiding the discovery mechanism with prior knowledge, as is done in VLCS, can help us solve practical problems.
Future Work
 A representation for dealing with numeric and nominal datasets.
 Further studies on the scalability and stability of the system are necessary.
 Additional studies on system performance on imbalanced and noisy data are also required.
 Improving the evolutionary operators, the guiding mechanism, and rule refinement.
Any Questions?

 "The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I found it!) but 'That's funny...'"
                                               - Isaac Asimov
