A general frame for building optimal multiple SVM kernels

Dana Simian, Florin Stoica
University ”Lucian Blaga” of Sibiu, Faculty of Sciences
5-7 dr. I. Rațiu str, 550012 Sibiu, Romania

Abstract. The aim of this paper is to define a general scheme for building optimal multiple SVM kernels. We implement and compare many hybrid methods derived from this scheme. We tested our multiple kernels for classification tasks, but they can also be used for other types of tasks.
1 Introduction 
Classification tasks can be found in many fields of activity (medicine, biology, bibliomining, webmining, etc.). The classifier is strongly dependent on the data type. A good classifier for a specific class of problems or data might behave badly for other types of data sets. It is therefore necessary to train the classifier on specific sets of data. Support Vector Machines (SVMs) represent an important and popular tool for machine learning tasks, especially for classification and regression. SVMs are supervised learning methods introduced by Vapnik ([18]). Given a set of training and testing data, defined by their label (target value) and a set of features (attributes), the goal of SVMs is to produce, using the training set, a model which predicts the target values of data instances from a testing set for which only the features are given. The accuracy of the model for a specific test set is defined as the percentage of test set items that are correctly classified by the model. If the accuracy is acceptable, the model can be used to classify data for which the class label is unknown. If the data are linearly separable, an optimal separating hyperplane with maximal margin is obtained ([18], [3]). In the case of non-separable data, the kernel method maps the original data into a higher dimensional feature space where they become separable, despite being non-separable by a hyperplane in the original input space. The kernel substitution method, known as the ”kernel trick”, was first published by Aizerman et al. in [1]. A kernel function must satisfy Mercer's theorem (see [8]), which states that for any continuous, symmetric, positive semi-definite function K : X × X → R there exists a function φ, defined on an inner product space of possibly high dimension, such that K(xi, xj) = ⟨φ(xi), φ(xj)⟩. The form of the feature map φ does not need to be known; it is implicitly defined by the choice of the kernel function K. Different kinds of kernels can be found in the literature ([3], [5], [6]). The most important are:
Polynomial: K_pol^{d,r}(x1, x2) = (x1 · x2 + r)^d,  r, d ∈ Z+   (1)

RBF: K_RBF^γ(x1, x2) = exp(−|x1 − x2|² / (2γ²))   (2)

Sigmoidal: K_sig^γ(x1, x2) = tanh(γ · x1 · x2 + 1)   (3)
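As a quick illustration (not the authors' implementation), the three single kernels (1)-(3) translate directly into code; the parameter names d, r and gamma follow the formulas above:

```python
import numpy as np

def k_pol(x1, x2, d=2, r=1):
    """Polynomial kernel (1): (x1 . x2 + r)^d."""
    return (np.dot(x1, x2) + r) ** d

def k_rbf(x1, x2, gamma=1.0):
    """RBF kernel (2): exp(-|x1 - x2|^2 / (2 gamma^2))."""
    diff = np.asarray(x1) - np.asarray(x2)
    return np.exp(-np.dot(diff, diff) / (2.0 * gamma ** 2))

def k_sig(x1, x2, gamma=1.0):
    """Sigmoidal kernel (3): tanh(gamma * x1 . x2 + 1)."""
    return np.tanh(gamma * np.dot(x1, x2) + 1)
```

Any of these functions can later serve as a terminal node of a multiple kernel.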
In the following we consider only binary classification tasks and SVMs.

Standard SVMs use one kernel from (1)-(3), and prediction requires choosing the kernel parameters. Usually the choice of kernel is made empirically. Real problems require more complex kernels. The aim of this paper is to obtain optimal multiple kernels for given sets of data. We present a general hybrid scheme for obtaining optimal multiple kernels for a given type of data. Many particular methods derived from this frame are implemented. We evaluate the performances of these multiple SVM kernels using cross-validation.

The paper is organized as follows. In section 2 we present our main theoretical result, the frame of a hybrid method for building SVM kernels. Different solutions for constructing SVM kernels using evolutionary and hybrid approaches ([7], [12], [13], [15], [16], [17]) can be integrated in our frame. Implementation details are presented in section 3. Results of testing and validation, using several data sets, are presented in section 4. In section 5 we compare the different particular methods and present conclusions and further directions of study.
2 The frame of the hybrid method

2.1 General presentation

We started from the idea presented in [7]. We construct the multiple kernels using a hybrid method structured on two levels. The macro level is represented by a genetic algorithm which builds the multiple kernel. Multiple kernels are coded into chromosomes. In the micro level the quality of chromosomes is computed using an SVM algorithm. The fitness function is represented by the classification accuracy on a validation set of data. The optimal multiple kernel is computed in the genetic algorithm. The chromosome containing this kernel is decoded and then the cross-validation technique is applied for obtaining the ”Cross Validation Accuracy”. A multiple kernel has, in this frame, a formal representation and a structural representation.
2.2 Formal representation of multiple kernels

The multiple kernel can be formally represented using a tree, whose terminal nodes contain a single kernel and whose intermediate nodes contain operations. It is proved in kernel theory that multiple kernels can be obtained from the single kernels (1)-(3) using the set of operations (+, ∗, exp), which preserve Mercer's conditions. If a node contains the operation exp, only its left descendant is considered. The number of terminal nodes is an input of the formal representation. Linear kernels are obtained when only the additive operation is used in the formal representation of multiple kernels. A formal representation of the multiple kernel K = (K1 op2 K2) op1 (K3 op3 K4) is given in figure 1.
Fig.1 Formal representation of multiple kernel 
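A minimal sketch of how such a formal tree can be evaluated recursively; the tuple encoding of nodes and the placeholder single kernels below are illustrative assumptions, not the paper's data structures:

```python
import numpy as np

def eval_tree(node, x1, x2):
    """Recursively evaluate a multiple-kernel tree on a pair of points.
    A leaf is a kernel function; an inner node is (op, left, right),
    where op is '+', '*' or 'exp' (exp uses only the left child)."""
    if callable(node):
        return node(x1, x2)
    op, left, right = node
    if op == 'exp':
        return np.exp(eval_tree(left, x1, x2))
    a, b = eval_tree(left, x1, x2), eval_tree(right, x1, x2)
    return a + b if op == '+' else a * b

# K = (K1 + K2) * exp(K3), with simple placeholder single kernels
k1 = lambda u, v: (np.dot(u, v) + 1) ** 2                                    # polynomial
k2 = lambda u, v: np.exp(-np.sum((np.asarray(u) - np.asarray(v)) ** 2) / 2)  # RBF
k3 = lambda u, v: np.tanh(np.dot(u, v) + 1)                                  # sigmoidal
tree = ('*', ('+', k1, k2), ('exp', k3, None))
```

Since +, * and exp preserve Mercer's conditions, any tree built this way is again a valid kernel.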
2.3 Structural representation of multiple kernels

The structural representation of multiple kernels represents the structure of the chromosome in the genetic algorithm from the macro level of the frame method. We propose two types of structural representation:

1. Tree representation
2. Linear representation

The details of the structural representation differentiate the methods derived from the general frame. Other kinds of representations could be added in order to enlarge our frame.

In [7], [12] the chromosome is coded using a tree structure identical with the formal representation. In [13] we used at most 4 polynomial kernels (Ki^{di,r}, i = 1, . . . , 4) and 3 operations. The chromosome has a linear structure composed of 34 genes. Each operation opj, j = 1, 2, 3 is represented using two genes, 4 genes are allocated for each degree di, and the variable r, which is the same for all the simple kernels, is represented using 12 genes. If one of the operations is exp, only the first 12 of the 16 genes allocated for the degrees dj are representative. In [15], [16] we used at most 4 single kernels, which can be polynomial, RBF or sigmoidal, and 3 operations. The chromosome has a linear structure composed of 78 genes: 2 genes for each operation (opi, i = 1, 2, 3) and 2 genes for each kernel's type ti, i = 1, 2, 3. If the single kernel Ki is polynomial, we use 4 genes for the degree parameter di and 12 genes for ri. If the associated kernel is not polynomial, the last 16 genes are used to represent the real value of the parameter γi. The linear structures of the chromosome, presented above, are represented in fig. 2.
[34-gene model]  op1 | op2 | op3 | d1 | d2 | d3 | d4 | r
[78-gene model]  op1 | op2 | op3 | t1 | d1 r1 (or γ1) | t2 | . . .

Fig.2 Linear representations of the multiple-kernel model
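Under the 34-gene layout described above, decoding a chromosome could look as follows; the concrete bit-to-value mappings (the operation table and the binary-to-integer conversion) are illustrative assumptions, since the paper does not fix them:

```python
def decode_34(bits):
    """Decode the 34-gene linear chromosome of the polynomial model:
    3 operations x 2 bits, 4 degrees x 4 bits, one shared r on 12 bits.
    The integer encodings below (op table, binary-to-int) are assumed
    for illustration only."""
    assert len(bits) == 34
    to_int = lambda b: int(''.join(map(str, b)), 2)
    ops_table = ['+', '*', 'exp', '+']  # 2-bit code -> operation (assumed)
    ops = [ops_table[to_int(bits[2 * i:2 * i + 2])] for i in range(3)]
    degrees = [to_int(bits[6 + 4 * i:6 + 4 * i + 4]) for i in range(4)]
    r = to_int(bits[22:34])
    return ops, degrees, r
```

The 78-gene model would be decoded analogously, reading a kernel-type field before each parameter group.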
2.4 Genetic algorithm

One of the elements that influence the behavior of the hybrid algorithm which builds the SVM kernel is the genetic algorithm used in the macro level. We implemented and tested many kinds of genetic algorithms. The difference between them is made by the mutation operators. Three types of mutation operators were used:

1. Classical mutation operators (like the algorithms implemented in [5]).
2. Co-mutation operators.
3. Improved co-mutation operators using a wasp-based computational scheme.

We used the co-mutation operator Mijn defined in [9], instead of the classical mutation and cross-over operators. The Mijn operator mutates a number of adjacent bits at the same time. It does not perform only a flip-mutation; it mutates substrings in order to preserve the implicit information contained in the adjacency of these bits.

We also used the co-mutation operator LR−Mijn, which we defined in [14]. This co-mutation operator finds the longest sequence of bits situated to the left or to the right of a randomly chosen position p. If the longest sequence is to the left of p, LR−Mijn operates identically to Mijn; otherwise it operates on the set of bits starting from p and going to the right. Both co-mutation operators we considered have about the same local-search capabilities as ordinary mutation, but allow long jumps, reaching far regions of the search space which cannot be reached by the classical bit-flip mutation.
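The flavor of such operators can be sketched as follows; this is a simplified substring mutation in the spirit of co-mutation, not the exact Mijn or LR−Mijn definitions from [9] and [14]:

```python
import random

def substring_mutation(bits, max_len=8, rng=random):
    """Simplified co-mutation-style operator (NOT the exact Mijn of [9]):
    pick a random position and flip a whole run of adjacent bits at once,
    so a single application can make a long jump in the search space."""
    n = len(bits)
    p = rng.randrange(n)
    length = rng.randint(1, max_len)
    child = list(bits)
    for i in range(p, min(p + length, n)):
        child[i] ^= 1  # flip the whole adjacent run, not a single bit
    return child
```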
The single kernels chosen and the operations between them are coded inside a chromosome. In order to achieve an equilibrium between changing the operations and changing the single kernels' parameters, we improved the mutation and co-mutation operators using a wasp-based computational scheme.

Wasp-based computational models are used for solving dynamic repartition of tasks. The natural behavior of wasps is governed by a stimulus-response mechanism. The response threshold of an individual wasp for each zone of the nest, together with the stimulus from brood located in this zone, determines the engagement of the wasp in the task of foraging for this zone. A wasp-based computational model uses a stimulus-response pair for computing the probability of some actions. One or more rules for updating the response threshold may be established in order to adapt the model to particular requirements ([2], [4]).

Let us consider in our case a mutation or co-mutation operator, denoted by M. The number of genes representing operations is much smaller than the number of genes representing kernel parameters. To increase the probability of changing the operations we use a wasp-based computational model. We associate a wasp to each chromosome C. Each wasp has a response threshold θC. The set of operations coded within the chromosome broadcasts a stimulus SC, equal to the difference between the maximum classification accuracy (100) and the actual classification accuracy obtained using the multiple kernel coded in the chromosome, SC = 100 − CAC. The probability that the operator M performs a mutation that changes the operations coded within the chromosome is

P(θC, SC) = SC^γ / (SC^γ + θC^γ),

where γ is a system parameter. Good results were obtained for γ = 2.
The threshold update rule is defined as follows: θC = θC − δ, δ > 0, if the classification accuracy of the new chromosome C is lower than in the previous step, and θC = θC + δ, δ > 0, if the classification accuracy of the new chromosome C is greater than in the previous step.
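The stimulus-response rule above can be sketched directly; the step δ = 1 is an illustrative choice, not a value fixed by the paper:

```python
def wasp_probability(accuracy, theta, gamma=2.0):
    """Probability that operator M touches the operation genes:
    P(theta, S) = S^gamma / (S^gamma + theta^gamma), with the stimulus
    S = 100 - classification accuracy of the chromosome."""
    s = 100.0 - accuracy
    return s ** gamma / (s ** gamma + theta ** gamma)

def update_threshold(theta, new_accuracy, old_accuracy, delta=1.0):
    """Threshold rule: lower accuracy -> lower threshold (more pressure
    to change the operations); higher accuracy -> higher threshold."""
    if new_accuracy < old_accuracy:
        return theta - delta
    if new_accuracy > old_accuracy:
        return theta + delta
    return theta
```

Note how a poorly performing chromosome (large stimulus S) drives the probability of rewriting its operations toward 1.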
A similar model can be used to increase the probability of changing the type of the single kernels which compose the multiple kernel. New methods for building multiple kernels can be obtained using other modifications of classical genetic algorithms.
2.5 SVM algorithm

The fitness function for the evaluation of chromosomes is the classification accuracy given by an SVM algorithm acting on a particular set of data. The data are divided into two subsets: the training subset, used for problem modeling, and the test subset, used for evaluation. The training subset is further randomly divided into a subset for learning and a subset for validation. The data from the learning subset are used for training, and the data from the validation subset for computing the classification accuracy.
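The data partition described above can be sketched as follows; the split fractions are illustrative assumptions, since the paper does not specify them:

```python
import random

def split_data(samples, test_frac=0.2, val_frac=0.25, rng=random):
    """Split data as in the frame: a test subset for final evaluation,
    then the remaining training subset into learning and validation parts.
    The fractions are illustrative; the paper does not fix them."""
    items = list(samples)
    rng.shuffle(items)
    n_test = int(len(items) * test_frac)
    test, train = items[:n_test], items[n_test:]
    n_val = int(len(train) * val_frac)
    validation, learning = train[:n_val], train[n_val:]
    return learning, validation, test
```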
2.6 Model evaluation

After the multiple kernel is found by the genetic algorithm, we apply the cross-validation method for estimating the performance of our predictive model. In our frame we use K-fold cross-validation.

One idea could be to use cross-validation as the fitness function in the genetic algorithm instead of the classification accuracy. Our implementations and practical results proved that this is not a viable solution, due to the huge time required.
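A minimal sketch of building the K folds (index partition only; training the SVM on each fold is omitted):

```python
def k_fold_indices(n, k=5):
    """Partition indices 0..n-1 into k folds; each fold serves once as the
    validation part while the remaining folds form the training part."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val
```

The cross-validation accuracy is then the average accuracy over the k validation parts.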
3 Implementation details

In order to implement a particular method derived from the general frame presented in section 2, we start from the classes implemented in libsvm ([5]) and modify them according to the chosen structural representation of the multiple kernel. The classes svm_parameter, svm_predict and Kernel must be adapted to our particular model. The class svm_predict was extended with the predict method. The Kernel class is modified to accomplish the kernel substitution. A method for computing the hybrid multiple kernels is necessary. We construct a new method, named k_function, for the computation of our simple kernels. The simple kernels are then combined using the operations given in the linear structural model of the chromosome. In the genetic algorithm, the operations and all parameters assigned to a simple kernel (the type of the simple kernel and all other parameters) are obtained from a chromosome, which is then evaluated using the result of the modified predict method. After the end of the genetic algorithm, the best chromosome gives the multiple kernel, which can be evaluated on the test subset of data.

The cross-validation method is applied to the ”optimal” multiple kernel obtained by the genetic algorithm. In order to obtain the cross-validation accuracy, which characterizes our predictive model given by the multiple kernel, we modify the class svm_train, introducing the method do_cross_validation, which takes into account the structural model of the multiple kernel.
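The overall macro/micro loop can be outlined schematically; `decode` and `svm_accuracy` are hypothetical stand-ins for the chromosome decoding and the modified libsvm evaluation described above, and the selection/mutation scheme here is deliberately simplistic:

```python
import random

def ga_optimize_kernel(decode, svm_accuracy, pop_size=40, generations=30,
                       chrom_len=34, rng=random):
    """Schematic macro/micro loop of the frame. `decode` maps a bit
    chromosome to a multiple kernel; `svm_accuracy` trains an SVM with
    that kernel and returns the validation accuracy (the fitness)."""
    pop = [[rng.randrange(2) for _ in range(chrom_len)] for _ in range(pop_size)]
    best, best_fit = None, -1.0
    for _ in range(generations):
        scored = [(svm_accuracy(decode(c)), c) for c in pop]
        scored.sort(key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best = scored[0][0], scored[0][1]
        # keep the better half, refill with single-bit-mutated copies
        survivors = [c for _, c in scored[:pop_size // 2]]
        children = []
        for c in survivors:
            child = list(c)
            child[rng.randrange(chrom_len)] ^= 1
            children.append(child)
        pop = survivors + children
    return best, best_fit
```

In the actual frame, the mutation step would be one of the (co-)mutation operators of section 2.4, optionally gated by the wasp probabilities.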
4 Experimental results

We used the ”Leukemia” and ”Vowel” data sets from the standard libsvm package ([5]) and different kinds of genetic algorithms: the classical genetic approach (from [5]), an approach using the Mijn co-mutation operator, an approach using the LR−Mijn co-mutation operator, and approaches derived from the previous ones combined with a wasp-based computational model. For each execution, the population size was 40 and the number of generations was 30.
The experimental results for the Leukemia data set are presented in Table 1. 
Approach | Single kernels      | Operations | Parameters of optimal kernel                                  | Classif. accuracy | Cross validation accuracy
1        | POL, RBF, SYG       | +, exp, +  | d = 1, r = 1887, γ1 = 1.399, γ2 = 1.890                       | 88.23%            | 86.73%
2        | POL, SYG, SYG       | +, exp, +  | d = 3, r = 2383, γ1 = 0.622, γ2 = 0.256                       | 90.03%            | 81.57%
3        | RBF, SYG, POL       | exp, *, +  | d = 2, r = 1503, γ1 = 0.294, γ2 = 0.065                       | 91.17%            | 92.10%
1*       | SYG, SYG, POL, POL  | *, +, +    | γ1 = 1.456, γ2 = 1.245, d1 = 1, r1 = 1373, d2 = 3, r2 = 2165  | 91.17%            | 81.57%
2*       | RBF, POL, RBF       | +, exp, +  | γ1 = 1.596, d = 3, r = 2198, γ2 = 1.309                       | 91.17%            | 81.57%
3*       | SYG, RBF, POL, SYG  | +, *, *    | γ1 = 0.016, γ2 = 1.997, d = 1, r = 1543, γ3 = 0.068           | 94.11%            | 89.47%

Table 1. Leukemia data set
We denoted the approaches as follows, taking into account the genetic algorithm used in the hybrid approach: 1 - classical genetic approach, 2 - Mijn co-mutation genetic operator, 3 - LR−Mijn co-mutation genetic operator. The methods denoted with an additional * are improved methods using a wasp computational scheme. The terminal nodes of the structural representation of the multiple kernel (simple kernels) are given from left to right. The operations are given beginning with the last intermediate level. For example, the first multiple kernel in Table 1 is (K_POL + K_RBF) + exp(K_SYG).
The experimental results for the Vowel data set are presented in Table 2.
Approach | Single kernels      | Operations | Parameters of optimal kernel                  | Classif. accuracy | Cross validation accuracy
1        | RBF, RBF, RBF       | *, exp, +  | γ1 = 0.824, γ2 = 0.943, γ3 = 0.048            | 61.47%            | 97.34%
2        | RBF, SYG, RBF, SYG  | *, *, +    | γ1 = 0.654, γ2 = 0.445, γ3 = 0.298, γ4 = 0.017 | 61.68%           | 98.86%
3        | SYG, SYG, RBF, RBF  | *, +, *    | γ1 = 0.190, γ2 = 0.014, γ3 = 0.372, γ4 = 0.760 | 62.33%           | 99.62%
1*       | RBF, SYG, RBF       | *, exp, *  | γ1 = 1.064, γ2 = 0.094, γ3 = 0.273            | 61.53%            | 98.24%
2*       | SYG, RBF, RBF       | *, exp, *  | γ1 = 0.243, γ2 = 0.700, γ3 = 0.249            | 61.73%            | 99.24%
3*       | RBF, SYG, RBF, SYG  | *, +, +    | γ1 = 0.694, γ2 = 0.097, γ3 = 0.257, γ4 = 0.014 | 62.81%           | 99.62%

Table 2. Vowel data set
5 Conclusions and further directions of study

In this article we introduce a general frame which allows us to obtain hybrid methods for building optimal multiple SVM kernels. Our scheme follows 5 steps: formal representation of the multiple kernels, structural representation, choice of the genetic algorithm, SVM algorithm, and model evaluation. The structural representation and the genetic algorithm are the most important in differentiating particular methods derived from this frame. We implemented many particular methods for the data sets Leukemia and Vowel. Analyzing the results presented in Table 1 and Table 2, we can conclude that using more performant co-mutation operators in the genetic algorithm generally improves both the classification and cross-validation accuracy. The results are strongly dependent on the data sets. The existence of a frame from which many methods for building optimal multiple kernels can be easily obtained is very important: it makes possible a quick comparison of the performances of multiple kernels and allows choosing the best method for a given data set. The performances of the particular methods implemented are promising. Our frame is open; it may be enlarged and offers possibilities for further development.
References

1. M. Aizerman, E. Braverman, L. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control 25, 1964, 821-837.
2. E. Bonabeau, G. Theraulaz, J.-L. Deneubourg, Fixed response thresholds and the regulation of division of labor in insect societies, Bull. Math. Biol., vol. 60, 1998, pp. 753-807.
3. C. Campbell, An Introduction to Kernel Methods, Radial Basis Function Networks: Design and Applications, Springer Verlag, Berlin, 2000, 1-31.
4. V. A. Cicirello, S. F. Smith, Wasp-like Agents for Distributed Factory Coordination, Autonomous Agents and Multi-Agent Systems, Vol. 8, No. 3, 2004, pp. 237-267.
5. C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
6. L. Diosan, M. Oltean, A. Rogozan, J. P. Pecuchet, Improving SVM performance using a linear combination of kernels, Adaptive and Natural Computing Algorithms, ICANNGA'07, volume 4432 of LNCS, 2007, 218-227.
7. L. Diosan, M. Oltean, A. Rogozan, J. P. Pecuchet, Genetically Designed Multiple-Kernels for Improving the SVM Performance, portal VODEL, http://vodel.insa-rouen.fr/publications/rfia, 2008.
8. Ha Quang Minh, Partha Niyogi, Yuan Yao, Mercer's Theorem, Feature Maps, and Smoothing, http://people.cs.uchicago.edu/~niyogi/papersps/MinNiyYao06.pdf.
9. I. De Falco, A. Iazzetta, A. Della Cioppa, E. Tarantino, A new mutation operator for evolutionary airfoil design, Soft Computing - A Fusion of Foundations, Methodologies and Applications, Springer Berlin / Heidelberg, Volume 3, Number 1, June 1999, 44-51.
10. H. N. Nguyen, S. Y. Ohn, W. J. Choi, Combined kernel function for support vector machine and learning method based on evolutionary algorithm, in Nikhil R. Pal, Nikola Kasabov, Rajani K. Mudi, Srimanta Pal, Swapan K. Parui, editors, Neural Information Processing, 11th International Conference, ICONIP 2004, volume 3316 of LNCS, Springer, 2004, 1273-1278.
11. R. Stahlbock, S. Lessmann, S. Crone, Genetically constructed kernels for support vector machines, Proc. of German Operations Research, Springer, 2005, 257-262.
12. D. Simian, A Model For a Complex Polynomial SVM Kernel, Proceedings of the 8th WSEAS Int. Conf. on Simulation, Modelling and Optimization, Santander, Spain, 2008, within Mathematics and Computers in Science and Engineering, 2008, pp. 164-170.
13. D. Simian, F. Stoica, An evolutionary method for constructing complex SVM kernels, Recent Advances in Mathematics and Computers in Biology and Chemistry, Proceedings of the 10th International Conference on Mathematics and Computers in Biology and Chemistry, MCBC'09, Prague, Czech Republic, 2009, WSEAS Press, pp. 172-178.
14. F. Stoica, D. Simian, C. Simian, A new co-mutation genetic operator, Advanced Topics on Evolutionary Computing, Proceedings of the 9th Conference on Evolutionary Computing, Sofia, 2008, pp. 76-82.
15. D. Simian, F. Stoica, Evaluation of a hybrid method for constructing multiple SVM kernels, Recent Advances in Computers, Proceedings of the 13th WSEAS International Conference on Computers, 2009, Recent Advances in Computer Engineering Series, WSEAS Press, pp. 619-623.
16. Dana Simian, Florin Stoica, Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Behaviour, Lecture Notes in Computer Science, LNCS 5910 (2010), I. Lirkov, S. Margenov, J. Wasniewski (Eds.), Springer-Verlag Berlin Heidelberg, pp. 361-368.
17. S. Sonnenburg, G. Rätsch, C. Schafer, B. Scholkopf, Large scale multiple kernel learning, Journal of Machine Learning Research, 7, 2006, 1531-1565.
18. V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, 1995.
Data sets: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets

More Related Content

PDF
Evaluation of a hybrid method for constructing multiple SVM kernels
PDF
H046014853
PDF
A comparative study of clustering and biclustering of microarray data
PDF
Modeling of neural image compression using gradient decent technology
PDF
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
PDF
O18020393104
PDF
EVOLVING CONNECTION WEIGHTS FOR PATTERN STORAGE AND RECALL IN HOPFIELD MODEL ...
Evaluation of a hybrid method for constructing multiple SVM kernels
H046014853
A comparative study of clustering and biclustering of microarray data
Modeling of neural image compression using gradient decent technology
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
O18020393104
EVOLVING CONNECTION WEIGHTS FOR PATTERN STORAGE AND RECALL IN HOPFIELD MODEL ...

What's hot (19)

PDF
Artificial bee colony with fcm for data clustering
PDF
Feed forward neural network for sine
PDF
Clustering and Classification Algorithms Ankita Dubey
PDF
An efficient technique for color image classification based on lower feature ...
PDF
A new dna based approach of generating keydependentmixcolumns
PDF
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
PDF
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
PDF
D0931621
PDF
Implementation of miml framework using annotated
DOCX
Dynamic clustering algorithm using fuzzy c means
PDF
Data clustering using kernel based
PDF
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
PDF
AN IMPROVED MULTI-SOM ALGORITHM
PDF
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
PPT
PDF
A Mixed Binary-Real NSGA II Algorithm Ensuring Both Accuracy and Interpretabi...
PPTX
Membrane computing
PDF
Fuzzy c-Means Clustering Algorithms
Artificial bee colony with fcm for data clustering
Feed forward neural network for sine
Clustering and Classification Algorithms Ankita Dubey
An efficient technique for color image classification based on lower feature ...
A new dna based approach of generating keydependentmixcolumns
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
OPTIMIZATION AS A MODEL FOR FEW-SHOT LEARNING
D0931621
Implementation of miml framework using annotated
Dynamic clustering algorithm using fuzzy c means
Data clustering using kernel based
Adapted Branch-and-Bound Algorithm Using SVM With Model Selection
AN IMPROVED MULTI-SOM ALGORITHM
A Comparative Analysis of Feature Selection Methods for Clustering DNA Sequences
A Mixed Binary-Real NSGA II Algorithm Ensuring Both Accuracy and Interpretabi...
Membrane computing
Fuzzy c-Means Clustering Algorithms
Ad

Viewers also liked (18)

PDF
Using genetic algorithms and simulation as decision support in marketing stra...
PDF
A Distributed CTL Model Checker
PDF
A new Reinforcement Scheme for Stochastic Learning Automata
PDF
A new Evolutionary Reinforcement Scheme for Stochastic Learning Automata
PDF
Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...
PDF
An Executable Actor Model in Abstract State Machine Language
PDF
A new co-mutation genetic operator
PDF
Building a Web-bridge for JADE agents
PDF
Modeling the Broker Behavior Using a BDI Agent
PDF
Generic Reinforcement Schemes and Their Optimization
PDF
Implementing an ATL Model Checker tool using Relational Algebra concepts
PDF
Using the Breeder GA to Optimize a Multiple Regression Analysis Model
PDF
Intelligent agents in ontology-based applications
PDF
An AsmL model for an Intelligent Vehicle Control System
PDF
Building a new CTL model checker using Web Services
PDF
Deliver Dynamic and Interactive Web Content in J2EE Applications
PDF
Algebraic Approach to Implementing an ATL Model Checker
PDF
Optimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithm
Using genetic algorithms and simulation as decision support in marketing stra...
A Distributed CTL Model Checker
A new Reinforcement Scheme for Stochastic Learning Automata
A new Evolutionary Reinforcement Scheme for Stochastic Learning Automata
Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Be...
An Executable Actor Model in Abstract State Machine Language
A new co-mutation genetic operator
Building a Web-bridge for JADE agents
Modeling the Broker Behavior Using a BDI Agent
Generic Reinforcement Schemes and Their Optimization
Implementing an ATL Model Checker tool using Relational Algebra concepts
Using the Breeder GA to Optimize a Multiple Regression Analysis Model
Intelligent agents in ontology-based applications
An AsmL model for an Intelligent Vehicle Control System
Building a new CTL model checker using Web Services
Deliver Dynamic and Interactive Web Content in J2EE Applications
Algebraic Approach to Implementing an ATL Model Checker
Optimizing a New Nonlinear Reinforcement Scheme with Breeder genetic algorithm
Ad

Similar to A general frame for building optimal multiple SVM kernels (20)

PDF
An evolutionary method for constructing complex SVM kernels
PDF
Kq2418061809
PDF
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
PDF
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
PDF
Applying genetic algorithms to information retrieval using vector space model
PDF
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL
PDF
The International Journal of Engineering and Science (The IJES)
PDF
Kernel methods in machine learning
PPTX
Monadic genetic kernels in Scala
PDF
Huong dan cu the svm
PDF
PDF
Kernels and Support Vector Machines
PPT
Genetic algorithms
PDF
IRJET- Intrusion Detection using IP Binding in Real Network
PPTX
GA Presentation_ver2.pptx
PDF
L018147377
PPTX
Evolutionary computing - soft computing
PDF
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
PDF
Adaptation of parametric uniform crossover in genetic algorithm
PDF
ADAPTATION OF PARAMETRIC UNIFORM CROSSOVER IN GENETIC ALGORITHM
An evolutionary method for constructing complex SVM kernels
Kq2418061809
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
Applying Genetic Algorithms to Information Retrieval Using Vector Space Model
Applying genetic algorithms to information retrieval using vector space model
APPLYING GENETIC ALGORITHMS TO INFORMATION RETRIEVAL USING VECTOR SPACE MODEL
The International Journal of Engineering and Science (The IJES)
Kernel methods in machine learning
Monadic genetic kernels in Scala
Huong dan cu the svm
Kernels and Support Vector Machines
Genetic algorithms
IRJET- Intrusion Detection using IP Binding in Real Network
GA Presentation_ver2.pptx
L018147377
Evolutionary computing - soft computing
GENERALIZED LEGENDRE POLYNOMIALS FOR SUPPORT VECTOR MACHINES (SVMS) CLASSIFIC...
Adaptation of parametric uniform crossover in genetic algorithm
ADAPTATION OF PARAMETRIC UNIFORM CROSSOVER IN GENETIC ALGORITHM

More from infopapers (9)

PDF
A New Model Checking Tool
PDF
CTL Model Update Implementation Using ANTLR Tools
PDF
Generating JADE agents from SDL specifications
PDF
Interoperability issues in accessing databases through Web Services
PDF
Using Ontology in Electronic Evaluation for Personalization of eLearning Systems
PDF
Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...
PDF
An executable model for an Intelligent Vehicle Control System
PDF
A New Nonlinear Reinforcement Scheme for Stochastic Learning Automata
PDF
Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...
A New Model Checking Tool
CTL Model Update Implementation Using ANTLR Tools
Generating JADE agents from SDL specifications
Interoperability issues in accessing databases through Web Services
Using Ontology in Electronic Evaluation for Personalization of eLearning Systems
Models for a Multi-Agent System Based on Wasp-Like Behaviour for Distributed ...
An executable model for an Intelligent Vehicle Control System
A New Nonlinear Reinforcement Scheme for Stochastic Learning Automata
Automatic control based on Wasp Behavioral Model and Stochastic Learning Auto...

Recently uploaded (20)

PPTX
GEN. BIO 1 - CELL TYPES & CELL MODIFICATIONS
PPTX
Comparative Structure of Integument in Vertebrates.pptx
PPTX
EPIDURAL ANESTHESIA ANATOMY AND PHYSIOLOGY.pptx
DOCX
The kernel substitution method, known as the "kernel trick", was first published by Aizerman et al. in [1]. A kernel function must satisfy Mercer's theorem (see [8]): for any continuous, symmetric, positive semi-definite function K : X × X → R there exists a function φ, defined on an inner product space of possibly high dimension, such that K(xi, xj) = ⟨φ(xi), φ(xj)⟩. The form of the feature map φ does not need to be known; it is implicitly defined by the choice of the kernel function K. Different kinds of kernels can be found in the literature ([3], [5], [6]). The most important are:

Polynomial: K_pol^{d,r}(x1, x2) = (x1 · x2 + r)^d, r, d ∈ Z+ (1)
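For concreteness, the three single kernels (1)-(3) can be written in a few lines of NumPy. This is a minimal sketch of ours; the function names are illustrative and are not taken from libsvm:

```python
import numpy as np

def k_pol(x1, x2, d=2, r=1):
    # Polynomial kernel (1): (x1 . x2 + r)^d
    return (np.dot(x1, x2) + r) ** d

def k_rbf(x1, x2, gamma=1.0):
    # RBF kernel (2): exp(-|x1 - x2|^2 / (2 gamma^2))
    diff = np.asarray(x1) - np.asarray(x2)
    return np.exp(-np.dot(diff, diff) / (2.0 * gamma ** 2))

def k_sig(x1, x2, gamma=1.0):
    # Sigmoidal kernel (3): tanh(gamma * x1 . x2 + 1)
    return np.tanh(gamma * np.dot(x1, x2) + 1.0)
```

Note that k_rbf(x, x) = 1 for any x, and k_pol reduces to the linear kernel for d = 1, r = 0.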
RBF: K_RBF^γ(x1, x2) = exp(−|x1 − x2|² / (2γ²)) (2)

Sigmoidal: K_sig^γ(x1, x2) = tanh(γ · x1 · x2 + 1) (3)

In the following we consider only binary classification tasks and SVMs. Standard SVMs use one kernel from (1)-(3), and prediction requires choosing the kernel parameters. Usually the choice of kernel is made empirically, while real problems require more complex kernels. The aim of this paper is to obtain optimal multiple kernels for given data sets. We present a general hybrid scheme for obtaining optimal multiple kernels for a given type of data. Many particular methods derived from this frame are implemented, and we evaluate the performance of the resulting multiple SVM kernels using cross-validation.

The paper is organized as follows. In Section 2 we present our main theoretical result, the frame of a hybrid method for building SVM kernels. Different solutions for constructing SVM kernels using evolutionary and hybrid approaches ([7], [12], [13], [15], [16], [17]) can be integrated into our frame. Implementation details are presented in Section 3. Results of testing and validation on several data sets are presented in Section 4. In Section 5 we compare the different particular methods and present conclusions and further directions of study.

2 The frame of the hybrid method

2.1 General presentation

We started from the idea presented in [7]. We construct the multiple kernels using a hybrid method structured on two levels. The macro level is a genetic algorithm which builds the multiple kernel; multiple kernels are coded into chromosomes. In the micro level the quality of a chromosome is computed using an SVM algorithm: the fitness function is the classification accuracy on a validation data set. The optimal multiple kernel is computed by the genetic algorithm.
The chromosome containing this kernel is decoded, and the cross-validation technique is then applied to obtain the cross-validation accuracy. A multiple kernel has, in this frame, a formal representation and a structural representation.

2.2 Formal representation of multiple kernels

A multiple kernel can be formally represented by a tree whose terminal nodes contain a single kernel and whose intermediate nodes contain operations. Kernel theory shows that multiple kernels can be obtained from the single kernels (1)-(3) using the set of operations (+, *, exp), which preserve Mercer's conditions. If a node contains the operation exp, only its left descendant is considered. The number of terminal nodes is an input of the formal representation. Linear kernels are obtained when only the additive operation is used in the formal representation of the multiple kernel. A formal representation of the multiple kernel K = (K1 op2 K2) op1 (K3 op3 K4) is given in Figure 1.
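The formal tree can be evaluated recursively. The sketch below is our own illustration (the tuple-based node encoding and the sample kernels are assumptions), following the convention above that an exp node uses only its left descendant:

```python
import math

# A node is either a leaf ("kernel", f) holding a single-kernel function,
# or an internal node ("op", name, left, right); for "exp" only the left
# descendant is used, so the right slot is None.
def eval_kernel(node, x1, x2):
    kind = node[0]
    if kind == "kernel":
        return node[1](x1, x2)
    _, op, left, right = node
    if op == "+":
        return eval_kernel(left, x1, x2) + eval_kernel(right, x1, x2)
    if op == "*":
        return eval_kernel(left, x1, x2) * eval_kernel(right, x1, x2)
    if op == "exp":
        return math.exp(eval_kernel(left, x1, x2))
    raise ValueError("unknown operation: " + op)

# K = (K1 op2 K2) op1 (K3 op3 K4) with op1 = +, op2 = *, op3 = +
dot = lambda x1, x2: sum(a * b for a, b in zip(x1, x2))
K1 = ("kernel", lambda x1, x2: (dot(x1, x2) + 1) ** 2)  # polynomial, d=2, r=1
K2 = ("kernel", dot)                                    # linear kernel
tree = ("op", "+", ("op", "*", K1, K2), ("op", "+", K2, K2))
```

Evaluating `tree` on a pair of points computes the multiple kernel in a single bottom-up pass over the formal representation.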
Fig. 1. Formal representation of a multiple kernel

2.3 Structural representation of multiple kernels

The structural representation of a multiple kernel is the structure of the chromosome in the genetic algorithm from the macro level of the frame method. We propose two types of structural representation:

1. Tree representation
2. Linear representation

The details of the structural representation differentiate the methods derived from the general frame; other kinds of representation could be added in order to enlarge our frame. In [7], [12] the chromosome is coded using a tree structure identical with the formal representation. In [13] we used at most 4 polynomial kernels (K_i^{d_i, r}, i = 1, ..., 4) and 3 operations. The chromosome has a linear structure composed of 34 genes: each operation op_j, j = 1, 2, 3 is represented using two genes, 4 genes are allocated for each degree d_i, and the variable r, which is the same for all the simple kernels, is represented using 12 genes. If one of the operations is exp, only the first 12 of the 16 genes allocated for the degrees d_j are significant. In [15], [16] we used at most 4 single kernels, which can be polynomial, RBF or sigmoidal, and 3 operations. The chromosome has a linear structure composed of 78 genes: 2 genes for each operation op_i, i = 1, 2, 3, and 2 genes for each kernel type t_i, i = 1, ..., 4. If the single kernel K_i is polynomial, we use 4 genes for the degree parameter d_i and 12 genes for r_i; if the kernel is not polynomial, the last 16 genes are used to represent the real value of the parameter γ_i. The linear chromosome structures presented above are sketched in Figure 2:

op1 | op2 | op3 | d1 | d2 | d3 | d4 | r
op1 | op2 | op3 | t1 | d1, r1 (or γ1) | t2 | ...

Fig. 2. Linear representations of the multiple-kernel model

2.4 Genetic algorithm

One of the elements that influence the behavior of the hybrid algorithm which builds the SVM kernel is the genetic algorithm used in the macro level.
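A decoder for a linear chromosome of the second kind (78 genes) might look as follows. This is a sketch under our own assumptions: the bit-to-value mappings (e.g. how 12 genes encode r, or how 16 genes encode γ) are illustrative and are not the exact encodings used in [15], [16]:

```python
def bits_to_int(bits):
    # Interpret a list of 0/1 genes as an unsigned integer.
    out = 0
    for b in bits:
        out = out * 2 + b
    return out

def decode(chrom):
    # chrom: list of 78 genes (0/1). Layout assumed here:
    # 3 operations x 2 genes, then 4 blocks of (2-gene type + 16-gene params).
    assert len(chrom) == 78
    # 2 genes -> one of the 3 operations (code 3 reused for "+")
    ops = [("+", "*", "exp", "+")[bits_to_int(chrom[2*i:2*i+2])] for i in range(3)]
    kernels = []
    for k in range(4):
        base = 6 + 18 * k
        t = bits_to_int(chrom[base:base+2])      # 0: POL, 1: RBF, 2: SYG, 3: POL
        params = chrom[base+2:base+18]
        if t in (0, 3):                          # polynomial: d on 4 genes, r on 12
            kernels.append(("POL", bits_to_int(params[:4]), bits_to_int(params[4:])))
        else:                                    # RBF/sigmoidal: gamma on 16 genes
            gamma = bits_to_int(params) / 2 ** 13  # scale to a small real value
            kernels.append((("RBF", "SYG")[t - 1], gamma))
    return ops, kernels
```

The layout adds up to the 78 genes of the text: 3 × 2 for the operations plus 4 × (2 + 16) for the kernel types and parameters.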
We implemented and tested many kinds of genetic algorithms; the difference between them lies in the mutation operators. Three types of mutation operators were used:
1. Classical mutation operators (as in the algorithms implemented in [5]).
2. Co-mutation operators.
3. Co-mutation operators improved using a wasp-based computational scheme.

We used the co-mutation operator Mijn defined in [9] instead of the classical mutation and cross-over operators. The Mijn operator mutates a number of adjacent bits at the same time: it does not perform only a flip mutation, it mutates substrings in order to preserve the implicit information contained in the adjacency of these bits. We also used the co-mutation operator LR-Mijn that we defined in [14]. This co-mutation operator finds the longest sequence of bits situated to the left or to the right of a randomly chosen position p. If the longest sequence is to the left of p, LR-Mijn operates identically to Mijn; otherwise it operates on the set of bits starting from p and going to the right. Both co-mutation operators we consider have about the same local-search capability as ordinary mutation, but allow long jumps to reach far regions of the search space that cannot be reached by classical bit-flip mutation.

The single kernels chosen and the operations between them are coded inside a chromosome. In order to achieve a balance between changing the operations and changing the single kernels' parameters, we improved the mutation and co-mutation operators using a wasp-based computational scheme. Wasp-based computational models are used for solving dynamic repartition of tasks. The natural behavior of wasps is governed by a stimulus-response mechanism: the response threshold of an individual wasp for each zone of the nest, together with the stimulus from the brood located in that zone, determines the engagement of the wasp in foraging for that zone. A wasp-based computational model uses a stimulus-response pair for computing the probability of some actions.
One or many rules for updating the response threshold may be established in order to adapt the model to particular requirements ([2], [4]). Consider in our case a mutation or co-mutation operator, denoted M. The number of genes representing the operations is much smaller than the number of genes representing the kernel parameters, so to increase the probability of changing the operations we use a wasp-based computational model. We associate a wasp to each chromosome C. Each wasp has a response threshold θ_C. The set of operations coded within the chromosome broadcasts a stimulus S_C equal to the difference between the maximum classification accuracy (100) and the actual classification accuracy obtained using the multiple kernel coded in the chromosome, S_C = 100 − CA_C. The probability that the operator M performs a mutation that changes the operations coded within the chromosome is

P(θ_C, S_C) = S_C^γ / (S_C^γ + θ_C^γ),

where γ is a system parameter; good results were obtained for γ = 2. The threshold update rule is defined as follows: θ_C = θ_C − δ, δ > 0, if the classification accuracy of the new chromosome C is lower than in the previous step, and θ_C = θ_C + δ, δ > 0, if the classification accuracy of the new chromosome C is greater than in the previous step. A similar model can be used to increase the probability of changing the
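The stimulus-response probability and the threshold update rule can be sketched directly; δ = 0.5 below is an arbitrary illustrative choice (the text only requires δ > 0):

```python
def change_probability(theta_c, s_c, gamma=2.0):
    # P(theta_C, S_C) = S_C^gamma / (S_C^gamma + theta_C^gamma);
    # s_c = 100 - CA_C, the gap to perfect classification accuracy.
    s = s_c ** gamma
    return s / (s + theta_c ** gamma)

def update_threshold(theta_c, improved, delta=0.5):
    # Threshold rule from the text: increase theta when the new chromosome
    # improved the accuracy, decrease it when the accuracy dropped
    # (a lower theta makes a change of the operations more likely).
    return theta_c + delta if improved else theta_c - delta
```

With γ = 2, a chromosome far from perfect accuracy (large S_C) sees its operations mutated with probability close to 1, while an accurate chromosome with a high threshold is left mostly untouched.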
type of the single kernels that compose the multiple kernel. New methods for building multiple kernels can be obtained using other modifications of classical genetic algorithms.

2.5 SVM algorithm

The fitness function for the evaluation of chromosomes is the classification accuracy given by an SVM algorithm acting on a particular data set. The data are divided into two subsets: the training subset, used for problem modeling, and the test subset, used for evaluation. The training subset is itself randomly divided into a learning subset and a validation subset. The data from the learning subset are used for training, and the data from the validation subset for computing the classification accuracy.

2.6 Model evaluation

After the multiple kernel is found by the genetic algorithm, we apply cross-validation to estimate the performance of our predictive model; in our frame we use K-fold cross-validation. One idea could be to use cross-validation as the fitness function in the genetic algorithm instead of the classification accuracy, but our implementations and practical results showed that this is not a viable solution because of the huge running time required.

3 Implementation details

To implement a particular method derived from the general frame presented in Section 2, we start from the classes implemented in libsvm ([5]) and modify them according to the chosen structural representation of the multiple kernel. The classes svm_parameter, svm_predict and Kernel must be adapted to our particular model. The class svm_predict was extended with the predict method, and the Kernel class is modified to accomplish the kernel substitution. A method for computing the hybrid multiple kernels is necessary: we construct a new method, named k_function, for the computation of a simple kernel; the simple kernels are then combined using the operations given in the linear structural model of the chromosome.
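The micro level (Section 2.5) can be sketched with scikit-learn instead of the modified libsvm classes described above; SVC accepts a callable kernel that returns the Gram matrix. The multiple kernel shown (K_pol + K_RBF) and all parameter values are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def fitness(kernel, X_train, y_train, seed=0):
    # Split the training subset into learning and validation parts and
    # return the validation accuracy (in %) as the chromosome's fitness.
    X_learn, X_val, y_learn, y_val = train_test_split(
        X_train, y_train, test_size=0.3, random_state=seed)
    clf = SVC(kernel=kernel)          # kernel(A, B) -> Gram matrix
    clf.fit(X_learn, y_learn)
    return 100.0 * clf.score(X_val, y_val)

def multiple_kernel(A, B, d=2, r=1.0, gamma=1.0):
    # Example multiple kernel K = K_pol + K_RBF, computed on whole matrices.
    pol = (A @ B.T + r) ** d
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    rbf = np.exp(-sq / (2.0 * gamma ** 2))
    return pol + rbf
```

In the frame itself, `multiple_kernel` would be rebuilt for every decoded chromosome, so that the genetic algorithm can score each candidate combination of single kernels and operations.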
In the genetic algorithm, the operations and all parameters assigned to the simple kernels (the type of each simple kernel and all other parameters) are obtained from a chromosome, which is then evaluated using the result of the modified predict method. When the genetic algorithm ends, the best chromosome gives the multiple kernel, which can be evaluated on the test subset of the data. The cross-validation method is applied to the "optimal" multiple kernel obtained by the genetic algorithm. To obtain the cross-validation accuracy, which characterizes the predictive model given by the multiple kernel, we modify the class svm_train, introducing the method do_cross_validation, which takes into account the structural model of the multiple kernel.
4 Experimental results

We used the "Leukemia" and "Vowel" data sets from the standard libsvm package ([5]) and different kinds of genetic algorithms: the classical genetic approach (from [5]), an approach using the Mijn co-mutation operator, an approach using the LR-Mijn co-mutation operator, and approaches derived from the previous ones combined with a wasp-based computational model. For each execution, the population size was 40 and the number of generations was 30. The experimental results for the Leukemia data set are presented in Table 1.

Approach | Single kernels | Operations | Parameters of optimal kernel | Classif. accuracy | Cross-validation accuracy
1  | POL, RBF, SYG      | +, exp, + | d1 = 1, r1 = 1887, γ1,1 = 1.399, γ1,2 = 1.890 | 88.23% | 86.73%
2  | POL, SYG, SYG      | +, exp, + | d2 = 3, r2 = 2383, γ2,1 = 0.622, γ2,2 = 0.256 | 90.03% | 81.57%
3  | RBF, SYG, POL      | exp, *, + | d3 = 2, r3 = 1503, γ3,1 = 0.294, γ3,2 = 0.065 | 91.17% | 92.10%
1* | SYG, SYG, POL, POL | *, +, +   | γ4,1 = 1.456, γ4,2 = 1.245, d4,1 = 1, r4,1 = 1373, d4,2 = 3, r4,2 = 2165 | 91.17% | 81.57%
2* | RBF, POL, RBF      | +, exp, + | γ5,1 = 1.596, d5 = 3, r5 = 2198, γ5,2 = 1.309 | 91.17% | 81.57%
3* | SYG, RBF, POL, SYG | +, *, *   | γ6,1 = 0.016, γ6,2 = 1.997, d6 = 1, r6 = 1543, γ6,3 = 0.068 | 94.11% | 89.47%

Table 1. Leukemia data set

We denote the approaches as follows, according to the genetic algorithm used in the hybrid approach: 1 - classical genetic approach; 2 - Mijn co-mutation genetic operator; 3 - LR-Mijn co-mutation genetic operator. The methods denoted with an additional * are the methods improved using the wasp computational scheme. The terminal nodes of the structural representation of a multiple kernel (the simple kernels) are given from left to right; the operations are given beginning with the last intermediate level. For example, the first multiple kernel in Table 1 is (K_POL + K_RBF) + exp(K_SYG). The experimental results for the Vowel data set are presented in Table 2.
Approach | Single kernels | Operations | Parameters of optimal kernel | Classif. accuracy | Cross-validation accuracy
1  | RBF, RBF, RBF      | *, exp, + | γ1,1 = 0.824, γ1,2 = 0.943, γ1,3 = 0.048 | 61.47% | 97.34%
2  | RBF, SYG, RBF, SYG | *, *, +   | γ2,1 = 0.654, γ2,2 = 0.445, γ2,3 = 0.298, γ2,4 = 0.017 | 61.68% | 98.86%
3  | SYG, SYG, RBF, RBF | *, +, *   | γ3,1 = 0.190, γ3,2 = 0.014, γ3,3 = 0.372, γ3,4 = 0.760 | 62.33% | 99.62%
1* | RBF, SYG, RBF      | *, exp, * | γ4,1 = 1.064, γ4,2 = 0.094, γ4,3 = 0.273 | 61.53% | 98.24%
2* | SYG, RBF, RBF      | *, exp, * | γ5,1 = 0.243, γ5,2 = 0.700, γ5,3 = 0.249 | 61.73% | 99.24%
3* | RBF, SYG, RBF, SYG | *, +, +   | γ6,1 = 0.694, γ6,2 = 0.097, γ6,3 = 0.257, γ6,4 = 0.014 | 62.81% | 99.62%

Table 2. Vowel data set

5 Conclusions and further directions of study

In this article we introduced a general frame which allows us to obtain hybrid methods for building optimal multiple SVM kernels. Our scheme follows 5 steps: formal representation of the multiple kernels, structural representation, choice of the genetic algorithm, SVM algorithm, and model evaluation. The structural representation and the genetic algorithm are the most important in differentiating the particular methods derived from this frame. We implemented many particular methods for the Leukemia and Vowel data sets. Analyzing the results presented in Table 1 and Table 2, we can conclude that the use of better-performing co-mutation operators in the genetic algorithm generally improves both the classification and the cross-validation accuracy. The results are strongly dependent on the data sets. The existence of a frame from which many methods for building optimal multiple kernels can be easily obtained is very important: it makes possible a quick comparison of the performances of multiple kernels and allows choosing the best method for a given data set. The performances of the particular methods implemented are promising. Our frame is an open one; it may be enlarged and offers possibilities for further development.

References

1. M.
Aizerman, E. Braverman, L. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control 25, 1964, pp. 821-837.
2. E. Bonabeau, G. Theraulaz, J.-L. Deneubourg, Fixed response thresholds and the regulation of division of labor in insect societies, Bull. Math. Biol., vol. 60, 1998, pp. 753-807.
3. C. Campbell, An Introduction to Kernel Methods, in: Radial Basis Function Networks: Design and Applications, Springer Verlag, Berlin, 2000, pp. 1-31.
4. V. A. Cicirello, S. F. Smith, Wasp-like Agents for Distributed Factory Coordination, Autonomous Agents and Multi-Agent Systems, Vol. 8, No. 3, 2004, pp. 237-267.
5. C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
6. L. Diosan, M. Oltean, A. Rogozan, J.-P. Pecuchet, Improving SVM performance using a linear combination of kernels, Adaptive and Natural Computing Algorithms, ICANNGA'07, volume 4432 of LNCS, 2007, pp. 218-227.
7. L. Diosan, M. Oltean, A. Rogozan, J.-P. Pecuchet, Genetically Designed Multiple-Kernels for Improving the SVM Performance, portal VODEL, http://vodel.insa-rouen.fr/publications/rfia, 2008.
8. Ha Quang Minh, Partha Niyogi, Yuan Yao, Mercer's Theorem, Feature Maps, and Smoothing, http://people.cs.uchicago.edu/~niyogi/papersps/MinNiyYao06.pdf.
9. I. De Falco, A. Iazzetta, A. Della Cioppa, E. Tarantino, A new mutation operator for evolutionary airfoil design, Soft Computing - A Fusion of Foundations, Methodologies and Applications, Springer Berlin / Heidelberg, Volume 3, Number 1, June 1999, pp. 44-51.
10. H. N. Nguyen, S. Y. Ohn, W. J. Choi, Combined kernel function for support vector machine and learning method based on evolutionary algorithm, in N. R. Pal, N. Kasabov, R. K. Mudi, S. Pal, S. K. Parui (eds.), Neural Information Processing, 11th International Conference, ICONIP 2004, volume 3316 of LNCS, Springer, 2004, pp. 1273-1278.
11. R. Stahlbock, S. Lessmann, S. Crone, Genetically constructed kernels for support vector machines, Proc. of German Operations Research, Springer, 2005, pp. 257-262.
12. D. Simian, A Model For a Complex Polynomial SVM Kernel, Proceedings of the 8th WSEAS Int. Conf. on Simulation, Modelling and Optimization, Santander, Spain, within Mathematics and Computers in Science and Engineering, 2008, pp. 164-170.
13.
D. Simian, F. Stoica, An evolutionary method for constructing complex SVM kernels, Recent Advances in Mathematics and Computers in Biology and Chemistry, Proceedings of the 10th International Conference on Mathematics and Computers in Biology and Chemistry, MCBC'09, Prague, Czech Republic, WSEAS Press, 2009, pp. 172-178.
14. F. Stoica, D. Simian, C. Simian, A new co-mutation genetic operator, Advanced Topics on Evolutionary Computing, Proceedings of the 9th Conference on Evolutionary Computing, Sofia, 2008, pp. 76-82.
15. D. Simian, F. Stoica, Evaluation of a hybrid method for constructing multiple SVM kernels, Recent Advances in Computers, Proceedings of the 13th WSEAS International Conference on Computers, Recent Advances in Computer Engineering Series, WSEAS Press, 2009, pp. 619-623.
16. D. Simian, F. Stoica, Optimization of Complex SVM Kernels Using a Hybrid Algorithm Based on Wasp Behaviour, in I. Lirkov, S. Margenov, J. Wasniewski (eds.), Lecture Notes in Computer Science, LNCS 5910, Springer-Verlag Berlin Heidelberg, 2010, pp. 361-368.
17. S. Sonnenburg, G. Rätsch, C. Schäfer, B. Schölkopf, Large scale multiple kernel learning, Journal of Machine Learning Research, 7, 2006, pp. 1531-1565.
18. V. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, 1995. Data sets available at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets.