This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.


IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION                                                                                                                 1




            An Immune Algorithm for Protein Structure
                  Prediction on Lattice Models
      Vincenzo Cutello, Giuseppe Nicosia, Member, IEEE, Mario Pavone, and Jonathan Timmis, Member, IEEE


   Abstract—We present an immune algorithm (IA) inspired by the clonal selection principle, which has
been designed for the protein structure prediction problem (PSP). The proposed IA employs two special
mutation operators, hypermutation and hypermacromutation, to allow effective searching, and an aging
mechanism, a new immune-inspired operator devised to enforce diversity in the population during
evolution.
   When cast as an optimization problem, the PSP can be seen as discovering a protein conformation
with minimal energy. The proposed IA was tested on well-known PSP lattice models, the HP model in
two-dimensional and three-dimensional square lattices, and the functional model protein, which is a
more realistic biological model.
   Our experimental results demonstrate that the proposed IA is very competitive with the existing
state-of-the-art algorithms for the PSP on lattice models.

   Index Terms—Aging operator, clonal selection algorithms, functional model proteins,
hypermacromutation operator, hypermutation operator, immune algorithms (IAs), protein structure
prediction problem, two-dimensional HP model, three-dimensional HP model.

   Manuscript received August 9, 2005; revised January 3, 2006.
   V. Cutello, G. Nicosia, and M. Pavone are with the Department of Mathematics and Computer Science,
University of Catania, 95125 Catania, Italy (e-mail: vctl@dmi.unict.it; nicosia@dmi.unict.it;
mpavone@dmi.unict.it).
   J. Timmis is with the Department of Computer Science and the Department of Electronics, University
of York, Heslington, York YO10 5DD, U.K. (e-mail: jt517@ohm.york.ac.uk).
   Digital Object Identifier 10.1109/TEVC.2006.880328
   1089-778X/$20.00 © 2006 IEEE

                           I. INTRODUCTION

   ARTIFICIAL Immune Systems (AISs) represent a field of biologically inspired computing that attempts
to exploit theories, principles, and concepts of modern immunology to design immune system-based
applications in science and engineering [1]–[3]. One role of the immune system (IS) is to protect the
host organism against attacks from antigens (i.e., viruses and bacteria) and eliminate those cells that
have been “infected.” The IS provides an excellent example of a bottom-up intelligent strategy [4],
through which adaptation operates at the local level of cells and molecules, and useful behavior
emerges at the global level: this is exemplified by the immune humoral and cellular responses.
   AISs are proving to be a very general and applicable form of bio-inspired computing. A great deal
of work has gone into developing algorithms that extrapolate basic immune processes such as clonal
selection, negative and positive selection, danger theory, and immune networks [2]. To date, AISs have
been applied to areas such as machine learning [5], [6], optimization [7]–[9], bioinformatics [9]–[11],
robotic systems [12]–[14], decision support systems [15], network intrusion detection [16], [17],
combinatorial optimization [18], [19], scheduling [20], anomaly detection [21], fault diagnosis [22],
[23], computer security [24], data analysis [10], [25], [26], virus detection [27], and many other areas
[1]–[3], [28]. The field of AIS appears not only to be a powerful computing paradigm, but potentially a
prominent apparatus for improving the understanding of biological data and systems [9], [29].
   When one designs any computational solution, the nature of the problem space should always be
taken into account. This is especially true in an emerging area such as AIS, and we must avoid the
one-size-fits-all attitude that the authors of [30] warn us against. With this in mind, and in the
context of the framework for AIS presented in [2], we introduce an immune algorithm (IA) based on the
clonal selection principle [9]. We employ a new aging operator and specific mutation operators tailored
for the protein structure prediction problem (PSP) in the HP model for two-dimensional (2-D) and
three-dimensional (3-D) lattices, and in the functional model proteins. Given the primary sequence of a
protein, the protein structure prediction problem requires the identification of its native (tertiary)
conformation with minimum energy, while the protein folding problem requires information about the
possible pathways to folding and unfolding. Since a protein’s structure determines its biological
function, it is very important to be able to predict the final spatial conformation of the proteins.
This paper is concerned only with the static aspect, that is, how to predict the folded tertiary
structure of a protein, given its sequence of amino acids, through the use of lattice models.
   This paper is structured as follows: Section II describes the protein structure prediction problem
in Dill’s model and in the functional model protein; Section III presents the IA, inspired by the clonal
selection theory, for the protein structure prediction problem; Section IV details the characteristic
dynamics of the implemented IA using an aging process; Section V describes the technique used to
partition the landscape of the PSP, and the application of the aging process and memory B cells to
improve the overall performance of the algorithm; Section VI reports the results for the 2-D HP model;
Section VI-A describes previous related works, and draws comparisons between these and the proposed IA
for the 2-D HP model; Section VII presents results for the 3-D HP model; Section VIII presents the
results obtained for the functional model protein; Section IX provides a brief comparison between the
IA and other biologically inspired algorithms; finally, concluding remarks are presented in Section X.

                     II. LATTICE MODELS FOR THE PSP

   There are essentially five approaches to modeling the PSP: molecular dynamics [31], Monte Carlo
methods [32], statistical mechanical models [33], [34], probabilistic road map-based methods [35], [36],
and lattice models [37], [38]. The first two techniques have been used to analyze the number and the
characteristics of folding pathways; the second two techniques are useful tools for studying the folding
landscape, while the final technique, whilst having a fundamental theoretical relevance, cannot be
applied directly to real proteins. In this paper, we focus on lattice models; in particular, we use the
well-known Dill’s lattice model, the HP model [39], and the “shifted” HP model (also called functional
model proteins) [40].
   The HP model takes into account the hydrophobic interaction as the main driving force in protein
folding. The HP model involves attractive interactions only. The functional model proteins, unlike
Dill’s model, have a unique native fold with an energy gap between the native and the first excited
state; the native state is not maximally compact, and thus presents cavities or potential binding sites,
a key biological property required in order to investigate ligand binding. To include these properties,
the functional model has both attractive and repulsive interactions.

A. The Dill’s Model
   Proteins are sequences of amino acids. In the standard Dill’s model, each amino acid is represented
as a bead, and connecting bonds are represented as lines. In this approach, the protein is composed of
a specific sequence of only two types of beads, H (hydrophobic/nonpolar bead) or P (hydrophilic/polar
bead); that is, the 20 amino acids can be divided into two classes: H and P. This is usually called the
HP model (or Dill model) [39], where the label P is used to represent hydrophilic amino acids because
those amino acids are also polar. We reduce the alphabet from 20 characters to 2, where our protein
sequences take the form of strings belonging to the alphabet $\{H, P\}$. Hydrophobic amino acids tend to
come together to form a compact core that excludes water. Because the environment inside cells is
aqueous (primarily water), these hydrophobic amino acids tend to be on the inside of a protein, rather
than on its surface. Hydrophobicity is one of the key factors that determine how the chain of amino
acids will fold up into an active protein.
   The whole conformation is embedded in a 2-D or 3-D lattice, which simply divides space into amino
acid-sized units. Bond angles only have limited discrete values, dictated by the structure of the
lattice (for instance, square, triangular, or cubic [41], [42]). A lattice site may either be empty or
contain one bead. In particular, on a 2-D square lattice, the HP model represents proteins as 2-D
self-avoiding walk chains of beads on the lattice, i.e., two beads cannot occupy the same site of the
lattice, and each bead occupies only one lattice site connected to its chain neighbors.
   1) Protein Energy: For each conformation, one can evaluate the value of the energy function: this
allows for the modeling of free energies of protein folds. The simplest form of energy function counts
the number of H-H contacts. Each H-H topological contact has energy value $-1$, while all other contact
interaction types (H-P, P-H, P-P) have energy value $0$. Two amino acids create an H-H contact if they
are topological neighbors and they are not connected by a bond. The goal is to find a conformation with
the lowest energy. In the HP model in general, the residues’ interactions can be defined as follows:
$e_{HH} = -|\epsilon|$ and $e_{HP} = e_{PH} = e_{PP} = \delta$. When $\epsilon = 1$ and $\delta = 0$, we
have the typical interaction energy matrix for the standard HP model [39]; while for $\epsilon = 2$ and
$\delta = 1$, we have the interaction energy matrix for the shifted HP model [43]. For the Dill’s model,
the native conformation is the one that maximizes the number of contacts H-H, i.e., the one that
minimizes the free energy function.
   Regarding the functional model proteins (described in Section VIII), in order to find binding
pockets and the required energy gap, the native conformation finds a tradeoff between the number of H-H
contacts (i.e., the attractive force) and the non-H-H contacts (i.e., the repulsive forces).
   Fig. 1 shows snapshots of the IA during the convergence process, when applied to the Protein
sequence No. 1 (see Table I) at different energy levels: from poor conformations to the native
conformation with energy $-9$ (the H-H contacts are represented by dotted lines, and the hydrophobic
residues by black circles).

Fig. 1. Convergence process for the protein sequence No. 1, at different energy levels.

   2) 2-D Square Lattice Standard HP Benchmarks: For our experiments, we used the first nine instances
of the Tortilla 2-D HP Benchmarks1 (the first eight sequences are taken from [44], sequence 9 is taken
from [45], the last three instances are taken from [41]) to test the searching capability of the
designed IA. In Table I,      is the optimal or best-known energy value,     ,            indicate
repetitions of the relative symbol or subsequence.

                            TABLE I
  2-D SQUARE LATTICE STANDARD HP BENCHMARKS FROM [41] AND [44]

   1 http://www.cs.sandia.gov/tech_reports/compbio/tortilla-hp-benchmarks.html.

   The 12 chosen HP instances are standard benchmarks used to test the searching ability of heuristic
methods and blind search algorithms. These instances have been tested on more than 20 different
algorithms (see Section VI-A and Table VII). Analyzing the HP model is very interesting and challenging
for computer scientists, but is considered unsatisfactory by many biologists. Whilst proteins fold in
nature in a matter of seconds, computational biologists have found that folding proteins to their
minimum energy conformations is an unsolved optimization problem. The PSP on the HP model has been shown
to be NP-complete for 2-D [46] (NP-hardness is shown by a reduction from an interesting variation of the
planar Hamilton cycle problem), and 3-D lattices [47] (NP-hardness is shown by a reduction from a
variation of the bin packing problem).

B. The Functional Model Proteins
   In the HP model, the interaction between two hydrophobic residues is $-1$, and zero for the other
possible pairs (H-P, P-H, and P-P), that is, the HP model involves one attraction interaction (H with H)
and three neutral interactions (H with P, P with H, P with P), so that the energy matrix of the HP model
may be written

\[
\begin{array}{c|cc}
    & H  & P \\ \hline
  H & -1 & 0 \\
  P & 0  & 0
\end{array}
\tag{1}
\]

   There are many other folding codes, i.e., the number of different types of residues and the matrix
of energies describing the interactions between different kinds of residues. One important folding code
is the “shifted” HP model (or functional model proteins) [40]. This model has native folds that are not
maximally compact, and presents cavities or potential binding sites, which is a key biological property
required in order to investigate ligand binding. To include these properties, the shifted HP model has
two bead types, and both attractive and repulsive interactions. Thus, the shifted HP energy matrix is

\[
\begin{array}{c|cc}
    & H  & P \\ \hline
  H & -2 & 1 \\
  P & 1  & 1
\end{array}
\tag{2}
\]

   The presence of binding sites in the shifted HP model allows us to briefly discuss the biological
relevance of the model. Such sites support a significant number of proteins that can be classified as
functional, with the ground states having lower degeneracies and more cooperative folding than the
regular HP model [48].

C. The Conformational Space Into Lattice
   To embed a hydrophobic pattern                     into a lattice, we have the following three
methods [49].
  1) Cartesian Coordinate: The position of residues is specified independently from other residues.
  2) Internal Coordinate: The position of each residue depends upon its predecessor residues in the
     sequence. There are two types of internal coordinate: absolute directions, where the residue
     directions are relative to the axes defined by the lattice, and relative directions, where the
     residue directions are relative to the direction of the previous move.
  3) Distance Matrix: The location of a given residue is computed by means of its distance matrix.
   Krasnogor et al. [49] performed an exhaustive comparative study using evolutionary algorithms (EAs)
with relative and absolute directions. The experimental results show that relative directions almost
always outperform absolute directions over square and cubic lattices, while absolute directions have
better performances when facing triangular lattices. Experimental evidence suggests internal
coordinates with relative directions should be used. However, in general, it is difficult to assess the
effectiveness of direction encoding on an EA’s performance.

                        III. THE IMMUNE ALGORITHM

A. The Clonal Selection Principle
   The theory of clonal selection [50] suggests that B and T lymphocytes that are able to recognize
the antigen will start to proliferate by cloning upon recognition of such antigen. When a B cell is
activated by binding an antigen (and a second signal is received from T lymphocytes), many clones are
produced in response, via a process called clonal expansion. The resulting cells can undergo somatic
hypermutation, creating offspring B cells with mutated receptors. The higher the affinity of a B cell to
the available antigens, the more likely it will clone. This results in a Darwinian process of variation
and selection, called affinity maturation. The increase in size of these populations couples with the
production of cells with longer than expected lifetimes, assuring the organism a higher specific
responsiveness to that antigenic attack in the future. This gives rise to immunological memory, which is
demonstrated by the fact that, when the host is first exposed to the antigen, a primary response is
initiated; in this phase the antigen is recognized and immune memory is developed. When the same antigen
is encountered in the future, a secondary immune response is initiated. This results from the
stimulation of cells already specialized and present as memory cells: a rapid and more abundant
production of antibodies is observed. The secondary response can be elicited from any antigen that is
similar, although not identical, to the original one that established the memory. This is known as
cross-reactivity.
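   The internal-coordinate encoding with relative directions and the self-avoiding constraint of
Section II can be illustrated with a minimal Python sketch. The move letters F, L, and R (forward,
left, right), the function name `decode_relative`, and the choice of fixing the first bond along the
positive x axis are assumptions made for this illustration; they are not taken from the paper.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]


def decode_relative(moves: str) -> Optional[List[Coord]]:
    """Decode relative moves on a 2-D square lattice into bead coordinates.

    Assumed encoding (hypothetical): the first bond is fixed along +x, and
    each letter in `moves` turns relative to the previous bond:
    F = forward, L = left, R = right.
    Returns None when the walk revisits a site, i.e. is not self-avoiding.
    """
    coords: List[Coord] = [(0, 0), (1, 0)]
    occupied = set(coords)
    dx, dy = 1, 0  # direction of the previous bond
    for move in moves:
        if move == "L":
            dx, dy = -dy, dx   # rotate 90 degrees counter-clockwise
        elif move == "R":
            dx, dy = dy, -dx   # rotate 90 degrees clockwise
        elif move != "F":
            raise ValueError(f"unknown move {move!r}")
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in occupied:
            return None        # two beads would share a lattice site
        occupied.add(nxt)
        coords.append(nxt)
    return coords
```

   For example, `decode_relative("LL")` returns the unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]`,
while `decode_relative("LLL")` returns `None` because the fifth bead would land on the first one.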



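   The energy function of Section II-A and the two energy matrices of Section II-B, (1) for the
standard HP model and (2) for the shifted HP model, can be sketched as follows. This is an illustrative
sketch rather than the authors' implementation: the names `STANDARD_HP`, `SHIFTED_HP`, and `energy` are
hypothetical, and the conformation is assumed to be given as a list of 2-D lattice coordinates, one per
residue.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Matrix = Dict[Tuple[str, str], int]

# Pair energies of matrix (1): only H-H contacts contribute.
STANDARD_HP: Matrix = {("H", "H"): -1, ("H", "P"): 0, ("P", "H"): 0, ("P", "P"): 0}
# Pair energies of matrix (2): H-H attracts, every other contact repels.
SHIFTED_HP: Matrix = {("H", "H"): -2, ("H", "P"): 1, ("P", "H"): 1, ("P", "P"): 1}


def energy(sequence: str, coords: List[Coord], matrix: Matrix) -> int:
    """Sum pair energies over all topological contacts.

    A contact is a pair of residues on adjacent lattice sites that are
    not connected by a bond (not consecutive in the chain).
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per residue is required")
    index_at = {site: i for i, site in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        # Looking only right and up visits each adjacent pair exactly once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and abs(i - j) > 1:
                total += matrix[(sequence[i], sequence[j])]
    return total
```

   On the unit square `[(0, 0), (1, 0), (1, 1), (0, 1)]`, the sequence `HPPH` has a single contact
between its first and last residues, so `energy` gives -1 with `STANDARD_HP` and -2 with `SHIFTED_HP`;
the sequence `HPPP` gives 0 and 1, respectively.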
B. The Clonal Selection Algorithm
   The proposed IA (see Table II) employs two entity types: antigens (Ag) and B cells. The Ag models
the hydrophobic pattern of the given protein, that is a sequence                      , where      is
the protein length, i.e., the number of amino acids in the protein sequence. The B cell population,
            , represents a set of candidate solutions in the current fitness landscape at each
generation      . The B cell, or B cell receptor, is a sequence of directions
(with               ,                and                  ), where each      , with                ,
is a relative direction [49] with respect to the previous direction (i.e., there are      relative
directions) and the nonrelative direction. Hence, we obtain an overall sequence of length          .
The sequence specifies a 2-D conformation which is suitable for computing the energy value of the
hydrophobic pattern of the given protein.

                               TABLE II
                          PSEUDOCODE OF THE IA

   At each generation      , there is a B cell population             of size      . The initial
population, time         , is randomly generated in such a way that each B cell of          is a
self-avoiding conformation. There are two main functions within the algorithm:
   Evaluate(P), which computes the fitness function value of each B cell             ; hence
is the energy of the conformation coded in the B cell receptor      ; and Termination_Condition(),
which returns true if a solution is found, or a maximum number of fitness function evaluations
is reached.
   The implemented IA, like all IAs based on the clonal selection principle outlined above, is
characterized by cloning of B cells with higher antigenic affinity, affinity maturation, and
hypermutation of offspring B cells. Within our approach, we employ three immune operators: cloning,
hypermutation and aging, and a standard evolutionary operator: the $(\mu + \lambda)$-selection operator.
   1) Static Cloning Operator: The cloning operator [4], [18] simply clones each B cell          times
producing an intermediate population           of size                 . Throughout this paper, we will
refer to this as static cloning operator, as opposed to a proportional cloning operator (used in the
pattern recognition version of CLONALG [8]), which clones B cells proportionally to their antigenic
affinities. Experimental results for PSP using such an operator (not shown in this paper) show a
frequent premature convergence during the population evolution. In fact, proportional cloning allows B
cells with high affinity values to survive for many more generations, and the process can easily become
trapped in local minima.
   2) Hypermutation Operators: The hypermutation operators act on the B cell receptor of the clone
population           . The number of mutations         is determined by a specific function, the
mutation potential, with there being several mutation potentials in existence [51]. In the same
research paper, some significant hypermutation operators are discussed and quantitatively compared with
respect to their success rate and computational cost. The authors of the paper investigated the
searching capability of the IAs based on the clonal selection principle using static, proportional and
inversely proportional hypermutation operators and the hypermacromutation operator. Analyzing the
parameter surface for each variation operator and the performance on a complex “toy problem,” the trap
functions, and the 2-D HP model, clarifies that a few different and useful hypermutation operators
exist, namely: inversely proportional hypermutation, static hypermutation, and hypermacromutation
operators. It appears that making use of inversely proportional hypermutation and hypermacromutation can
contribute to finding the best experimental results for the 2-D HP model. As a consequence of these
results, we implemented the IA presented in this paper with an inversely proportional hypermutation
operator and a hypermacromutation operator. The hypermutation and the hypermacromutation operators
mutate the B cell receptors using different mutation potentials.
   If, during the mutation process, a constructive mutation occurs, the mutation procedure will move on
to the next B cell. We call such an event Stop at the first constructive mutation (FCM). We adopted such
a mechanism to slow down (premature) convergence, thus allowing a more detailed search through the
landscape. A different policy would make use of        mutations (   -mut), where the mutation procedure
performs all          mutations determined by the potential for the current B cell. With this policy,
however, and for the problems which are faced in this paper, the implemented IA did not provide good
results [51].
      a) Inversely Proportional Hypermutation: The inversely proportional hypermutation operator makes
mutations inversely proportional to the fitness value. In particular, at each generation      , the
operator will perform at most the following mutations:

                                                          if
                                                                                                    (3)
                                                          if

with              , and     the current best fitness value or the best-known value. In this case,
             has the shape of a hyperbola branch.
   In [51], the hypermutation operators obtained by varying the parameter      were thoroughly tested.
Studying the parameter surfaces of the trap functions and the PSP, the authors discovered that for the
hypermutation operator, inversely proportional to the fitness function value [modeled by (3)], the best
values
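   The static cloning operator of Section III-B1 can be sketched in a few lines of Python. The `BCell`
record, the parameter name `dup` for the number of clones per B cell, and the function name
`static_cloning` are hypothetical names introduced for this illustration. Every B cell is copied the
same number of times regardless of its fitness, which is what distinguishes static from proportional
cloning.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class BCell:
    receptor: str   # sequence of directions encoding a conformation
    energy: int     # fitness value of the encoded conformation
    age: int = 0    # generations spent in the population


def static_cloning(population: List[BCell], dup: int) -> List[BCell]:
    """Clone each B cell `dup` times, independently of its energy.

    `dataclasses.replace` with no changes returns a field-by-field copy,
    so every clone starts with the receptor, energy, and age of its parent.
    """
    return [replace(cell) for cell in population for _ in range(dup)]
```

   With a population of two B cells and `dup = 3`, the intermediate population holds six clones, three
per parent.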



Fig. 2. Example of the hypermacromutation operator applied in the range $[i, j]$ (in bold face the values successfully mutated).

for the parameter     are located in the range     . In particular, for the sequences 1, 2, 3, 4, and 12 the best value for     is 0.4; for the sequences 5, 6, 9, 10, and 11 the best value is     ; for the sequence 7 the best value is     ; and for the sequence 8 the value is     .

b) Hypermacromutation Operator: For the hypermacromutation operator [52] (previously introduced in [53] as “macromutation operator”), the number of mutations is determined by a simple random process which does not use functions depending upon constant parameters. Attempts are made to mutate each B cell receptor $M$ times, whilst maintaining the self-avoiding property. The number of mutations $M$ is at most $M = j - i + 1$, in the range $[i, j]$, with $i$ and $j$ being two random integers such that $(i + 1) \le j \le \ell$ (see Fig. 2). The number of mutations is independent of the fitness function and any other parameter. For each B cell receptor, the hypermacromutation operator randomly selects a perturbation direction, either from left to right ($k = i, \ldots, j$) or from right to left ($k = j, \ldots, i$).

In general, the mutation operators perturb the B cells population $P^{(clo)}$, generating the new populations $P^{(hyp)}$ and $P^{(macro)}$, respectively. Each B cell is a feasible candidate solution of the HP model (for simplicity in 2-D using relative encoding; it is straightforward to extend to 3-D using other encoding schemes), making a self-avoiding walk chain on the lattice,     . Hence, given a protein conformation sequence    , the mutation operator randomly selects a direction    ,     , or a subsequence      (    and    ); then, for each relative direction    , it randomly selects a new direction     . If the new conformation is again self-avoiding, then the operator accepts it; otherwise, the process is repeated using the last direction    ,    .

3) Aging Operator: This operator is designed to generate diversity, in an attempt to avoid getting trapped in local minima. Although it is an operator inspired by the observation in the immune system that there is an expected mean life for the B cell [54], the aging operator can be thought of as a general problem- and algorithm-independent operator.

The aging process attempts to capitalize on the immunological fact that B cells have a limited life span, and that memory B cells have a longer life span. Starting from this basic observation, the aging operator eliminates old B cells from the populations $P^{(t)}$, $P^{(hyp)}$ and $P^{(macro)}$. The parameter $\tau_B$ sets the maximum number of generations allowed for generated B cells to remain in the population. When a B cell is $\tau_B + 1$ old it is erased from the current population, no matter what its fitness value is. We call this strategy static pure aging. During the cloning expansion, a cloned B cell inherits the age of its parent. After the hypermutation phase, a cloned B cell which successfully mutates, i.e., it obtains a better fitness value, will be considered to have age equal to 0. Thus, an equal opportunity is given to each “new genotype” to effectively explore the fitness landscape. We note that for $\tau_B$ greater than the maximum number of allowed generations, the IA works essentially without the aging operator. In such a limit case, the algorithm employs a strong elitist selection strategy.

The aging operator is implemented by extending the B cells data structure with a counter $life$, which is initialized as $life = 0$ at generation $t = 0$ and whenever a cloned B cell is successfully mutated. A B cell is selected for survival only if its $life \le \tau_B$. The age of each of the surviving B cells is incremented by one. If the surviving B cells are fewer than $d$ (the population size), new B cells are randomly created (with $life = 0$) and are added by the Elitist_Merge function into the population.

Within the literature there is a similar mechanism of the aging process using evolution strategies (ES) [55], where the authors allow a life span, $\kappa$, for each parent of a $(\mu, \lambda)$-ES or a $(\mu + \lambda)$-ES. A parent older than $\kappa$ generations is not considered further in the selection process, leaving the new offspring to enter into the population at the next generation. This mechanism allows a more flexible variation of the selection scheme between the two extreme cases $\kappa = 1$, that is $(\mu, \lambda)$-ES, and $\kappa = \infty$, that is $(\mu + \lambda)$-ES. As noted by the authors of the above cited paper, this mechanism has not been properly investigated and appears to be a “standalone” research work.

4) $(\mu + \lambda)$-Selection With Birth Phase and No Redundancy: A new population $P^{(t+1)}$, of $d$ B cells, for the next generation $t + 1$, is obtained by selecting the best B cells which “survived” the aging operator, from the populations $P^{(t)}$, $P^{(hyp)}$, and $P^{(macro)}$. No redundancy is allowed. Thus, each B cell receptor is unique, i.e., each genotype is different from all other genotypes in the current population     . If only $d' < d$ B cells survived, then the Elitist_Merge function creates $d - d'$ new B cells (Birth phase). Hence, the $(\mu + \lambda)$-selection operator (with $\mu = d$ and $\lambda = N_c$, or $\lambda = 2N_c$ if both variation operators are activated) reduces the offspring B cell population (created by cloning and hypermutation operators) of size $\lambda \ge \mu$ to a new parent population of size $\mu = d$. The selection operator identifies the best elements from the offspring set and the old parent B cells, thus guaranteeing monotonicity in the evolution dynamics.

The properties of each immune operator are relatively well understood: the cloning operator explores the attractor basins and valleys of each candidate solution; the hypermutation operators introduce innovations by exploring the current population of B cells; the aging operator creates diversity during the search process. The selection evolutionary operator directs the search process toward promising regions of the fitness landscape and exploits the information coded within the current popula-
Fig. 3. Average fitness function values of P         ,P          ,P     and the best B cell receptor on protein sequence Seq2, with parameter values d = 10, dup = 2, and $\tau_B$ = 5.
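
As a rough illustration of the aging and selection steps of Section III, the following Python sketch performs one survival step: cells whose age counter exceeds the aging parameter are erased regardless of fitness, duplicate receptors are discarded, the best cells are kept, and a birth phase refills the population. The names `BCell`, `age_and_select`, and `new_cell` are hypothetical and do not come from the paper; the sketch assumes that lower energy is better and that `offspring` already holds the hypermutated and hypermacromutated clones.

```python
import random
from dataclasses import dataclass


@dataclass
class BCell:
    receptor: tuple   # relative directions, e.g. ("F", "L", "R")
    energy: float     # fitness value; lower energy is better
    life: int = 0     # age counter; reset to 0 by a successful mutation


def age_and_select(parents, offspring, d, tau_b, new_cell, rng=random):
    """Static pure aging followed by (mu + lambda)-selection with birth phase.

    Assumption: the age counters of the surviving cells are updated in place.
    """
    pool = list(parents) + list(offspring)
    # Aging: a cell survives only if life <= tau_b, whatever its fitness.
    survivors = [cell for cell in pool if cell.life <= tau_b]
    for cell in survivors:
        cell.life += 1
    # No redundancy: keep only the best copy of each distinct receptor.
    survivors.sort(key=lambda cell: cell.energy)
    unique, seen = [], set()
    for cell in survivors:
        if cell.receptor not in seen:
            seen.add(cell.receptor)
            unique.append(cell)
    next_population = unique[:d]
    # Birth phase: refill with fresh cells (this sketch does not check
    # that newborn receptors are unique).
    while len(next_population) < d:
        next_population.append(new_cell(rng))
    return next_population
```

Here `new_cell` stands for any user-supplied function that returns a random self-avoiding conformation with `life = 0`. Setting `tau_b` above the number of generations disables aging, which reduces the step to plain elitist selection.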

tion. While selection is a universal problem- and algorithm-independent operator, hypermutation, and in general mutation and crossover operators, are specific operators that focus on the structure of the given landscape.

Finally, it is worth noting that the representation and mutation operators presented in this section use a discrete coding. They work on an alphabet of three letters      for relative directions in 2-D square lattices, and on an alphabet of six letters      (where    ,    , and    ) for relative directions in the 3-D cubic lattice (see Section VII). Hence, the work described in this paper is applicable only in the context of discrete coding. In Table II, we outline the pseudocode of the proposed IA.

                          IV. IA DYNAMICS

In this section, we discuss the characteristic dynamics of the proposed IA. The population size is set to     and the maximum number of fitness function evaluations allowed is set to     , for minimal values      and      10 , with     ,      and     . All the values reported in this section are averaged over 100 independent runs.

In Fig. 3, we show the average fitness values of populations      and the best fitness value when the IA faces the PSP instance     ,     ,     (     and minimum energy value known     ).

In this figure, we can see how the four curves decrease, almost monotonically, approximately in the first 20–40 generations, whereas in the remaining generations all four curves reach a steady-state dynamics. The small oscillations are due to the random nature of the overall process governing the hypermutation and the hypermacromutation operators. The higher the average fitness of the hypermutated and hypermacromutated clones, the higher is the diversity in the current population [19].

A. Maximum Information Gain Principle

To analyze the learning process, we use an entropy function, the Information Gain. This measures the quantity of information the system discovers during the learning phase [19]. To this end, we define the B cells distribution function $f_m^{(t)}$ as the ratio between the number, $B_m^t$, of B cells at time $t$ with fitness function value $m$, and the total number of B cells

$$f_m^{(t)} = \frac{B_m^t}{\sum_{m} B_m^t} \qquad (4)$$

It follows that the information gain can be defined as:

$$K(t, t_0) = \sum_{m} f_m^{(t)} \log \frac{f_m^{(t)}}{f_m^{(t_0)}} \qquad (5)$$

The gain is the amount of information the system has already learnt from the given problem instance with respect to the randomly generated initial population      (the initial distribution). Once the learning process begins, the information gain increases monotonically until it reaches a final steady state (see Fig. 4). This is consistent with the idea of a maximum information-gain principle of the form     .

Fig. 5 shows the information gain curves for      and     . For      the IA learns a greater amount of information than for     ; in fact, in the inset plot, the standard deviation obtained with      is greater than     .

In the log-axis plot of Fig. 4, it is evident how the information gain is a more informative measure than the mean fitness. The standard deviation, the uncertainty over the population of a given generation (see the inset plot in Fig. 4), decreases quickly in the first ten generations. In fact, the IA converges to the global minimum in this temporal window. After this “threshold” the standard deviation suddenly increases, producing strong oscillations; that is, strong uncertainty regarding the current populations for     .

The mean value is essentially constant during all generations. For example, in the first generation, the IA gains more information than in the second, because it generates more constructive mutations. Thus, the population at generation      ex-
Fig. 4. Information gain, mean fitness versus generations of IA on protein sequence Seq2, with parameter values d = 10, dup = 2, and $\tau_B$ = 5. Inset plot displays standard deviation.

Fig. 5. Information gain and standard deviation versus generations on protein sequence Seq2 varying $\tau_B \in \{1, 5\}$.
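
To make the information gain of Section IV-A concrete, the sketch below computes the B cell distribution over fitness values and a relative-entropy gain with respect to the initial population. It assumes the gain has the usual relative-entropy form (sum over fitness levels of f log(f / f0)), uses the natural logarithm, and skips fitness levels that do not occur in the initial population; all three choices, as well as the function names, are assumptions of this illustration rather than details taken from the paper.

```python
import math
from collections import Counter


def distribution(fitness_values):
    """Fraction of B cells at each fitness value (one entry per level m)."""
    counts = Counter(fitness_values)
    total = len(fitness_values)
    return {m: n / total for m, n in counts.items()}


def information_gain(current, initial):
    """Relative entropy of the current fitness distribution w.r.t. the initial one."""
    f_t = distribution(current)
    f_0 = distribution(initial)
    gain = 0.0
    for m, p in f_t.items():
        q = f_0.get(m)
        if q is None:
            # Assumption: a level unseen at t0 has an undefined ratio and is skipped.
            continue
        gain += p * math.log(p / q)
    return gain
```

With an initial population of fitness values `[0, 0, -1, -2]` and a current one of `[-2, -2, -1, -1]`, the two occupied levels have each doubled their share, so the gain is log 2; an unchanged population gives a gain of zero.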


tracts more informative building blocks than the population at the second generation.

B. Searching Ability of Hypermutation and Hypermacromutation

To understand the searching ability of hypermutation operators across a range of parameter values, we performed a set of experiments on the PSP instance     . The duplication parameter $dup$ varies from 1 to 10 and the aging parameter $\tau_B$ is drawn from the set     . We note that setting the parameter $\tau_B$ at a higher value than the possible number of generations is equivalent to giving the B cell an infinite life; in effect, we turn off the aging operator.

As a function of $dup$ and $\tau_B$, we show the success rate (SR). The 3-D plots obtained are characteristic parameter surfaces for the given operator. We set the population size to a minimal value     , to emphasize the property of each operator when working with few B cells (points) in the conformational space. This strategy provides a good measure of the “real” performance of single hypermutation procedures. In addition, the Termination_Condition() function is allowed at most      10  fitness function evaluations, and we performed 100 independent runs for each value pair of the parameters. Using the SR and AES (average number of evaluations to solution) values, our experimental protocol has the following three objectives:
  1) to plot the characteristic parameter surface of each hypermutation operator;
Fig. 6. SR versus parameter values dup and $\tau_B$ for the sequence 2. The surface parameters of the combination of inversely proportional hypermutation and hypermacromutation operators.
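
The two measures behind these parameter surfaces can be computed from a batch of independent runs as in the sketch below. It assumes that a run counts as successful when it reaches the best-known energy, that SR is expressed as a percentage, and that AES is averaged over successful runs only; the function name and the `(best_energy, evaluations)` record layout are hypothetical.

```python
def sr_and_aes(runs, best_known):
    """Return (SR, AES) for a list of (best_energy, evaluations) run records.

    SR  : percentage of runs that reached the best-known energy.
    AES : mean number of fitness evaluations over the successful runs,
          or None when no run succeeded.
    """
    successes = [evals for energy, evals in runs if energy <= best_known]
    sr = 100.0 * len(successes) / len(runs)
    aes = sum(successes) / len(successes) if successes else None
    return sr, aes
```

For example, four runs of which three reach the best-known energy after 1000, 3000, and 2000 evaluations give SR = 75 and AES = 2000.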



  2) to analyze the joint effects of hypermacromutations and hypermutations;
  3) to find the best settings of the parameter values for each operator and for their combination, such that the best delimited region on the parameter surfaces maximizes the SR value and minimizes the AES value.

Fig. 6 shows the surface parameters of the inversely proportional hypermutation operator and the combination of inversely proportional hypermutation and hypermacromutation operators (surface in bold face): the hypermacromutation extends the region with high SR values, and in particular improves the region where the inversely proportional hypermutation operator alone performed poorly (      and      ). The highest peak,     , is obtained for      and      with     .

     V. PARTITIONING THE FUNNEL BY MEMORY B CELLS

Folding energy landscapes are funnel-like, which means that many conformations have high energy and few have low energy. More formally, protein conformations having high free energy (a single point on an energy landscape) have high conformational entropy and states having low free energy (native state and other deep minima) have low conformational entropy. Discrete models have this characteristic, in particular the HP model, where the energy level is a funnel landscape [44]; for example, for seq. 1 of the benchmarks (see Table I), of the 83,779,155 valid conformations approximately 66 × 10^6 have high “energy” (i.e., 0 or −1), and only four conformations are in the native state with minimal energy      (see [44, Table I]). Starting from this simple topological observation, we can deduce that within the funnel landscape the hardest area to search is the middle region: it is typically rugged with many local minima.

For this reason, we partition the funnel landscape in three regions where in the rugged middle region we allocate B cells with a longer life span, called memory B cells.

If the native fold has energy value     , we have      energy levels, thus the boundary of the first partition and secondary partition are, respectively,      and     . Hence, the B cells with energy in the range      have a life span     , the B cells with energy in the range      have a longer life span     , while all the B cells with energy      have the same life span of B cells in the first partition     .

Fig. 7 shows the partitioning of the funnel landscape of the PSP problem in three regions. The B cells either belong to the top region or to the bottom region and have life span     , while the memory B cells belong to the middle region and have life span     .

Theoretical findings in [56], and experimental results undertaken by ourselves which are not reported in this paper, show that the hardest region to search is the middle. Typically, it is rugged, containing many local minima. Therefore, we only apply
TABLE III
COMPARISONS OF ENERGY EVALUATIONS FOR THE 2-D HP MODEL

Fig. 7. Partitioning of the funnel landscape using memory B cells.

memory B cells to such a region. Conformations whose energy value is in the middle region are allowed to mature.

              VI. RESULTS FOR THE 2-D HP MODEL

In this section, we show the overall performances of the IA for the protein structure prediction problem for the 2-D HP model using the well-known tortilla benchmarks (see Table I). The experiments were performed with population size     , duplication parameter     , and maximum number of fitness function evaluations      10 . All the experimental results reported in this section are averaged over 30 independent runs.

Table III shows the number of energy evaluations required on the best run to achieve the optimum (or the best value found) by a genetic algorithm (GA) and a Monte Carlo approach as reported in [44], the multimeme algorithm [57], the estimation of distribution algorithms (EDA) [58], and the IA for eight instances in the 2-D HP model used in [41] (see [41, Table 6]) and in [59] (see [59, Table 3]) as test bed for the folding algorithms. Gaps in Table III indicate that the particular folding algorithm has not been tested on the respective protein instance. As we can see, the IA obtains the lowest number of energy evaluations, except for the protein instance 4 where the EMC approach reaches a lower number of energy evaluations, and for the protein instance 8, where the EDAs obtain a conformation with 41 topological contacts.

For the sake of completeness, we must note the generality of the MMA algorithm; it has been tested on the HP model and functional model proteins using various lattices, 2-D and 3-D square and triangular lattice, without any modification to the algorithm. It would appear that the algorithm performs robustly across all the models [41], [42], [57].

Tables IV and V show the results obtained by the IA using no memory B cells, and memory B cells. For each protein instance, and for each value of     , the tables report the SR, AES, best found energy value (b.f.), mean and standard deviation values. In bold face, we show the best results reached for each instance of the tortilla benchmarks, sorted first by SR value, then by AES value. For     , we sort them by the following ordered criteria: best found conformation, mean and standard deviation.

By observing the performance of the IA with no memory B cells (Table IV), we can see how the IA using      does substantially better in 8 out of 12 PSP instances, while the IA using a longer life span,     , does better in the remaining four PSP instances. Both IA versions reach the best-known conformations with maximal success rate,     , for the PSP instances 1, 2, 3, 6, 9, 10, 11, 12; for the protein sequences 4 and 5, the IA with      obtains higher success rate values than with     , while for the protein sequences 7 and 8, the IA is not able to reach the best-known minima, and it appears to become trapped in local minima.

In Table V, we present the results obtained by the IA using memory B cells, with the following aging values     ,     . We can see how the IA, when using     , performs better in 8 out of 12 PSP instances, reaches (for the first time)      for the protein sequence No. 4 and increases the success rate value for the protein sequence No. 5, obtaining     .

When comparing the results obtained by the IA with memory B cells (Table V) and without memory B cells (Table IV), we are able to see that the IA with memory B cells with aging values      outperforms the IA without memory B cells. In fact, this version reaches the best-known conformations with maximal success rate,     , for the PSP instances 1, 2, 3, 4, 6, 9, 10, 11, 12 (9 out of 12 PSP instances), and for the protein sequence No. 5 obtains the highest success rate value,     .

Therefore, with the IA using memory B cells,      and      (see Table V), we find it is able to locate the best-known energy values with maximum success rate on 9 protein instances out of 12. For protein sequence 5, the IA obtains a     , while for the instances 7 and 8 (the “hard instances”), the IA reaches only suboptimal energy values, respectively −35 and −39, with high mean and standard deviation values. To improve the results on these protein instances, we include in the IA a special local search procedure known as the Long Range Move (as defined and used in [59]). This procedure tries to escape a local minimum by unfolding the candidate solutions when they are trapped in a local minimum.

In [59], one of the key features of the improved ant colony optimization method is the Long Range Move (LRM). This
                                                              TABLE IV
                                   RESULTS OF THE IA WITH NO MEMORY B CELLS FOR THE 2-D HP MODEL




                                                              TABLE V
                                     RESULTS OF THE IA WITH MEMORY B CELLS FOR THE 2-D HP MODEL




local search, as noted by the authors, mimics the folding process of the real proteins, where a moving residue will typically push its neighbors in the chain to different positions. The first step of the procedure selects a direction in a given conformation uniformly at random. The second step of the procedure randomly changes the direction, and then modifies the directions of the remaining residues probabilistically. The local search starts a “chain reaction” that loops until a self-avoiding path condition holds. In practice, this procedure allows a given conformation to perform fold and unfold moves to escape local minima in the multiple-minima funnel landscape. Since the above cited procedure is obviously time consuming, as in [59], we apply it to the best conformation in the current population of the algorithm.
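
The following Python sketch illustrates the general shape of such a move on the 2-D square lattice with relative encoding. It is a simplified stand-in, not the LRM of [59]: it changes one randomly chosen direction and then, while the walk is not self-avoiding, re-samples each later direction with probability `p_change`. The helper names, the `p_change` and `max_rounds` parameters, the convention that the first bond is fixed along the x axis, and the fallback of returning the input conformation when no valid walk is found are all assumptions of this illustration.

```python
import random

TURN = {"F": 0, "L": 1, "R": -1}               # change of heading per letter
HEADINGS = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # east, north, west, south


def decode(directions):
    """Map relative directions to lattice coordinates (first bond along +x)."""
    heading, pos = 0, (1, 0)
    coords = [(0, 0), pos]
    for d in directions:
        heading = (heading + TURN[d]) % 4
        dx, dy = HEADINGS[heading]
        pos = (pos[0] + dx, pos[1] + dy)
        coords.append(pos)
    return coords


def is_self_avoiding(directions):
    coords = decode(directions)
    return len(set(coords)) == len(coords)


def long_range_move(directions, rng=random, p_change=0.5, max_rounds=100):
    """Perturb one direction, then repair the tail until the walk is valid."""
    conf = list(directions)
    i = rng.randrange(len(conf))
    conf[i] = rng.choice([d for d in "FLR" if d != conf[i]])
    rounds = 0
    while not is_self_avoiding(conf):
        if rounds == max_rounds:
            return list(directions)  # give up and keep the input conformation
        # "Chain reaction": re-sample the directions after position i.
        for k in range(i + 1, len(conf)):
            if rng.random() < p_change:
                conf[k] = rng.choice("FLR")
        rounds += 1
    return conf
```

Because each call may loop many times, applying it only to the best conformation of the current population, as described above, keeps the extra cost bounded.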
                            TABLE VI                                                  Another Monte Carlo method is the EMC [63] that works
       RESULTS OF THE IA WITH LRM FOR THE 2-D HP MODEL,                            with populations of candidate solutions which are optimized
                   PROTEIN INSTANCES 5, 7, AND 8
                                                                                   by Monte Carlo simulation. This hybrid algorithm found the
                                                                                   best-known structure for protein sequence 8, with energy
                                                                                        .
                                                                                      The contact interactions algorithm, CI, may be regarded as
                                                                                   an extension of the standard MC method which is improved by
                                                                                   the strategy of cooperativity. The major innovation of the CI
                                                                                   approach [45] is that criteria for acceptance of new conforma-
                                                                                   tions are not based on the energy of the entire protein, but on
                                                                                   the fact that cooling factors associated with each residue de-
                                                                                   fine regions of the model protein with higher or lower mobility.
                                                                                   Hence, the CI is not a blind general purpose algorithm, it uses a
   In Table VI, we report the results of the IA using the long                     heuristic based on effects of an H-H contact on the mobility of
range move (IA WITH LRM), for the hard instances, protein se-                      the residues in different portions of a protein. The CI algorithms,
quences 5, 7, and 8. By inspecting the table, we note that for                     with a fixed starting temperature of               (CI,          ), or
instance 5 the IA obtains poor results, while for sequence 7,                      with different starting temperature (CI), proved to be very effi-
although the IA does not reach the best-known energy value                         cient to localize energy minima.
(−36), the algorithm with the long range move always reaches                          Among the best folding algorithms there is the tabu search
the suboptimal energy value −35, which is the best result ob-                      strategy [65]. This incorporates problem domain knowledge into
tained for the IA. It is worth noting that, for the longest protein                the algorithm, for instance the conformational motifs, during the
sequence, the IA with LRM reaches the best-known energy value                      search process for finding low energy conformations.
−42 with mean −39.2 and                 .                                             The core-directed chain growth (CG) [64] is a very efficient
                                                                                   algorithm that has found optimal and best-known conformations
A. Comparison With State-of-Art Folding Algorithms                                 for protein instances 1–6 and 8. It is an ad hoc heuristic that
   In this section, we briefly present related works for the 2-D                    approximates the hydrophobic core of discrete proteins.
HP model. We present this here, as it allows us to show compar-                       The EDA [58] is a suitable class of nondeterministic search
ative results of our proposed approach, and current state-of-art                   procedures for the HP model. EDA constructs an explicit prob-
algorithms that are used on this problem.                                          ability model of the candidate solutions selected and captures
   The first application of EAs on the HP model was in [44]:                        relevant interactions among the variables of the given protein
a GA is employed, and conformations are changed by a muta-                         instance. The experimental results have proven the effectiveness
tion operator that follows the conventional Monte Carlo steps                      of the EDA approach to face lattice models for the standard HP
(MCmutation operator), and by a crossover operator. The au-                        benchmarks and the functional model proteins.
thors found that the GA is superior to conventional Monte Carlo                       The state-of-art algorithm for 2-D HP problem is an improved
methods (MC, LONG MC, and MULTIPLE MC).                                            ant colony optimization (ACO) algorithm, IMPROVED (IACO)
   In [60], there is an improved version of the simple GA (SG)                     [59]. The improvements over the previous ACO algorithm [67],
using a new crossover operator (systematic crossover), the                         are the following: long range moves for chain reconfigurations
SGA-S: it couples the best candidate solutions, tests every pos-                   when the protein conformation is very compact (described in
sible crossover point, and selects the two best conformations for                  Section VI); improving ants that take the global best solution
the next generation. In addition, the authors implemented a new                    found so far and apply a randomized greedy local search to it;
search strategy, the SGA with systematic crossover and pioneer                     and selective local search that performs the critical operation
search, SGA-SC-P, which tried to prevent the population from                       of the local search phase only on promising low energy con-
becoming too homogeneous, moving to a new search region                            formations. Moreover, in [59], the authors report poor perfor-
every ten generations.                                                             mance of a local search procedure (ONLY L), modest perfor-
   A famous metaheuristic for combinatorial optimization is the                    mances of an (IMPROVED GA), and good performance of Pruned
Memetic Algorithm. Memetic algorithms are EAs that include a                       Enriched Rosenbluth Method (PERM) [66]. The Grassberger and
local search procedure in the evolutionary cycle [61]. The Mul-                     co-workers’ algorithm is a Monte Carlo method which is among
timeme algorithm (MMA) [41], [61] is a memetic algorithm                           the best-known algorithms for the 2-D HP model, and is a biased
that self-adaptively selects, from a set of local searchers, which                 chain growth algorithm. PERM found the best solution for the
heuristic to use during the search process for different instances.                protein sequence No. 7,                .
The MMA has been used on different protein structure models                           In Table VII, we report the comparisons with the state-of-art
with results competitive with other techniques [41].                               algorithms for the 2-D HP model. The reported energy values
   To improve the performance of the GA, in [62] the authors                       are the lowest obtained by each method. Gaps in the table in-
proposed a hybrid algorithm of GA and tabu search (TS) and a                       dicate that a particular algorithm has not been tested on the re-
novel crossover operator borrowed from the TS. The introduc-                       spective protein sequence. The results shown suggest that the
tion of the TS approach improves the overall performance of                        proposed IA using the aging operator and memory B cells, and
the GA and shows that, in all the instances, the hybrid algorithm                  the IA with LRM are comparable to and, in many protein in-
GTS works better than a GA alone.                                                  stances, outperform the best algorithms.
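
The systematic crossover of [60] summarized above can be illustrated with a minimal sketch: every one-point cut is tried for a pair of parents, and the two lowest-energy offspring are retained. The function name, the string encoding of conformations, and the energy callback (expected to return None for infeasible conformations) are assumptions of this sketch and are not taken from [60].

```python
def systematic_crossover(parent_a, parent_b, energy):
    """Try every one-point cut and return the two best feasible children.

    parent_a, parent_b: move strings of equal length (hypothetical encoding).
    energy: callable returning a number (lower is better) or None if infeasible.
    """
    children = []
    for cut in range(1, len(parent_a)):
        for child in (parent_a[:cut] + parent_b[cut:],
                      parent_b[:cut] + parent_a[cut:]):
            e = energy(child)
            if e is not None:          # discard infeasible conformations
                children.append((e, child))
    children.sort(key=lambda pair: pair[0])   # stable sort, lowest energy first
    return [child for _, child in children[:2]]
```

Testing every cut point costs on the order of twice the sequence length in energy evaluations per pair of parents, which is the price paid for a deterministic choice of the best offspring.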



                              TABLE VII                                                                           TABLE VIII
     IA VERSUS THE STATE-OF-ART ALGORITHMS FOR THE 2-D HP MODEL                                     RESULTS OF THE IA FOR THE 3-D HP MODEL




                                                                                   avoidance constraint such that each set of moves will correspond
                                                                                   to a feasible sequence (feasible conformation).
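
As an illustration of the self-avoidance constraint, the following minimal sketch decodes a sequence of absolute moves on the cubic lattice and rejects any walk that revisits a lattice site. The move alphabet (R, L, U, D, F, B), the function names, and the convention that each H-H topological contact contributes -1 to the energy are assumptions made for this example; they are not details of the implementation used in our experiments.

```python
# Hypothetical absolute-move alphabet for the 3-D cubic lattice (an assumption).
MOVES = {
    "R": (1, 0, 0), "L": (-1, 0, 0),
    "U": (0, 1, 0), "D": (0, -1, 0),
    "F": (0, 0, 1), "B": (0, 0, -1),
}

def decode(moves):
    """Return the lattice coordinates of a move string, or None if the walk self-intersects."""
    pos = (0, 0, 0)
    coords = [pos]
    seen = {pos}
    for m in moves:
        dx, dy, dz = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
        if pos in seen:               # site already occupied: infeasible conformation
            return None
        seen.add(pos)
        coords.append(pos)
    return coords

def hp_energy(sequence, coords):
    """Add -1 for each pair of H residues that are lattice neighbors but not chain neighbors."""
    index = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y, z) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy, dz in MOVES.values():
            j = index.get((x + dx, y + dy, z + dz))
            # j > i + 1 counts each non-consecutive pair exactly once
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy
```

A sequence of n residues is encoded by n - 1 moves, so the sequence HPPH with moves RUL folds into a square whose two H residues are in contact.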
                                                                                      By inspecting the experimental results for all the considered
                                                                                   instances, it is worth noting that the IA (working with feasible
                                                                                   solutions) locates the known minimum value. For all instances,
                                                                                   the located mean value is lower than the results obtained in [68],
                                                                                   where EAs working in feasible space were employed. For several
                                                                                   sequences presented in [69] (as shown in Table VIII), we
                                                                                   have found new best (lowest) energy values for 3-D protein
                                                                                   sequences 5, 7, and 8 (results reported in bold face in Table VIII).
                                                                                      For our experiments, the IA was set with standard parameter
                                                                                   values:          ,         , as described in [51], B cells have the
                                                                                   aging parameter              and memory B cells               . For
                                                                                   the experimental protocol, we adopted the same values used in
                                                                                   [69]: 50 independent runs and a maximum number of evalua-
                                                                                   tions equal to 10 . In [68], the author does not use the SR and
                                                                                   AES values as quality metrics, but the following parameters:
                                                                                   Best found solution (Best), mean and standard deviation          .
                                                                                      In addition, we designed an IA which made use of a penalty
                                                                                   strategy and a repair-based approach as reported in [68], which
                                                                                   obtained similar experimental results to [68] (not shown). Such
             VII. RESULTS FOR THE 3-D HP MODEL                                     an IA proved to be very efficient for both absolute and relative
                                                                                   encoding, and allowed us to find energy minima not found by
   The protein structure prediction problem with            , and                  other EAs working in feasible space and described in literature
making use of a square lattice, captures the protein folding                       [68].
problem in the 2-D HP model [39]. Analogously for             and
using a cubic lattice, we have the 3-D HP model [39].                                   VIII. RESULTS FOR THE FUNCTIONAL MODEL PROTEINS
   In the 3-D cubic lattice, each point has six different neigh-
                                                                                      Table IX shows the benchmarks for the functional model pro-
bors and five available locations. We use two different schemes
                                                                                   teins into 2-D square lattice [41], [43], the instance number,
of moves (absolute and relative directions) to represent and
                                                                                   protein sequence, optimal conformation, and the minimal en-
embed a protein in the lattice. The relative and absolute encodings
                                                                                   ergy values. Each instance of the benchmarks2 has a unique na-
were described in Section III-B: in the relative encoding, the residue directions
                                                                                   tive fold conformation minimal energy value, , and an energy
are relative to the direction of the previous move, whilst in the
                                                                                   gap between      and the first excited state (best suboptimal). In
absolute encoding, the residue directions are relative
                                                                                   Fig. 8, we report the native fold of all the protein sequences,
to the axes defined by the lattice.
                                                                                   each native fold has at least one binding site, or binding pocket
   Both for the absolute and relative coding, not all moves pro-
vide a feasible conformation. In our work, we force the self-                         2http://www.cs.nott.ac.uk/~nxk/HP-PDB/2dfmp.html



                             TABLE IX                                                                             TABLE X
        2-D SQUARE LATTICE FUNCTIONAL MODEL INSTANCES [43],                               ELITIST AGING VERSUS PURE AGING IN THE FUNCTIONAL MODEL
           WHERE EACH PROTEIN SEQUENCE HAS 23 MONOMERS                                     PROTEINS IN TERMS OF (SUCCESS RATE, AVERAGE NUMBER OF
                                                                                                          EVALUATIONS TO SOLUTION)




                                                                                    this section, all reported experimental results were obtained by
                                                                                    the IA with inversely proportional hypermutation and hypermacromutation
                                                                                    operators. All the values are averaged over 30 independent
                                                                                    runs.
                                                                                       Analogously to the HP model instances, in the functional
                                                                                    model proteins the pure aging procedure outperforms the elitist
                                                                                    aging in terms of SR and AES on all functional model instances.
                                                                                    The IA with pure aging obtains a                   for all the instances
                                                                                    excluding the functional protein sequence 3, where the algo-
                                                                                    rithm reaches                  . This confirms the optimal searching
                                                                                    ability and diversity generation of the pure aging strategy.
                                                                                       In Table XI, we report the experimental results obtained when
                                                                                    using memory B cells. As described previously, we partition the
                                                                                    funnel landscape energy levels. For example, sequences 1 and 4
                                                                                    have               , it follows that there are 21 energy levels, thus
                                                                                    all the B cells with energy value in the range
                                                                                    or with energy                will have life span equal to , while
                                                                                    the B cells with energy                          will be considered as
                                                                                    memory B cells with a life span of                 . Table XI presents
                                                                                    the best experimental results obtained using the aging values
                                                                                                                 ,                             , and
                                                                                                         ; the best values of SR and AES are shown
                                                                                    in bold face. From these results, it is clear that                 , and
                                                                                                     appear to be the optimal choice for partitioning
                                                                                    the funnel landscapes of the protein instances.
                                                                                       When comparing the results obtained by the IA, when
                                                                                    adopting a pure aging strategy and                   with the IA using
Fig. 8. Native fold for 2-D square lattice functional model instances.              memory B cells with                , and                , the algorithm
                                                                                    performs slightly better without the memory B cells. We be-
(which are illustrated in the figure by arrows pointing to the binding               lieve this is due to the length of the protein sequences of the
site(s) of each functional model instance).                                         functional model proteins, which have 23 residues only. Hence,
   In the first experiment, we compared the performance of the                       for short protein sequences                 , the IA without memory
IA with and without elitist aging (see Table X) using the stan-                     B cells performs better than the IA with memory B cells,
dard parameter values:           ,         ,        ,         . In                  both for the HP model and for the functional model proteins.



                          TABLE XI                                                   We propose that more effective metrics to assess the overall
IA PERFORMANCES USING MEMORY B CELLS IN TERMS OF (SUCCESS RATE,                   performances are the success rate values and the average
 AVERAGE NUMBER OF EVALUATIONS TO SOLUTION) FOR VARIOUS PAIRS
                       OF ( ,     )                                             number of evaluations to solution. The number of fitness func-
                                                                                  tion evaluations required by the best run for a given instance
                                                                                  is less significant for showing the overall performance of the
                                                                                  randomized algorithms.
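
To make the two metrics concrete, the following minimal sketch computes them from a set of independent runs. It assumes the usual definitions: the success rate (SR) is the percentage of runs that reach the target energy, and the average number of evaluations to solution (AES) is averaged over the successful runs only. The function name and the layout of the run records are assumptions made for illustration.

```python
def sr_and_aes(runs, target_energy):
    """Compute (SR in percent, AES) from independent runs.

    runs: list of (best_energy, evaluations) pairs, one per run, where
          evaluations is the number of fitness evaluations used when the
          best energy was first reached (hypothetical record layout).
    """
    successes = [evals for energy, evals in runs if energy <= target_energy]
    sr = 100.0 * len(successes) / len(runs)
    # AES is undefined when no run succeeds
    aes = sum(successes) / len(successes) if successes else None
    return sr, aes
```

Unlike the evaluation count of the single best run, both quantities aggregate over all runs, so one lucky run cannot dominate the comparison.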


                                                                                                              IX. IA VERSUS EAs
                                                                                     It is worth highlighting the contribution of our
                                                                                  algorithms to the artificial immune systems discipline. The proposed
                                                                                  IA makes use of a new hypermutation operator, the hypermacromutation
                                                                                  operator, that extends the region of the parameter
                                                                                  surface with high SR values, and in particular improves the
                                                                                  region where the inversely proportional hypermutation operator
                                                                                  alone performed poorly. This simple random process, which
                                                                                  does not use functions dependent upon constant parameters,
                                                                                  improves the overall performance of the IA.
                                                                                     The second innovation of the IA is the aging operator, which
                                                                                  is used to generate and maintain diversity in the population.
                                                                                  As shown in the plots and in the tables, the aging operator and
                                                                                  memory B cells with a longer life span are the key features of
                           TABLE XII
COMPARISON OF BEST RUNS FOR MMA [41], EDAS [58], AND IA, WITH AND
                                                                                  the proposed approach that we feel can be inserted in any EA.
  WITHOUT MEMORY B CELLS FOR THE FUNCTIONAL MODEL PROTEINS                        In fact, as a selection operator, the aging operator is a general,
                                                                                  problem- and algorithm-independent operator.
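
Because the aging operator does not depend on the problem, it can be sketched in a few lines. In the minimal sketch below, each B cell carries an age counter and is removed once its age exceeds an expected life span; an optional elitist flag keeps the current best cell regardless of its age. The names BCell, tau_b, and elitist, and the exact removal rule (age strictly greater than tau_b), are assumptions made for illustration rather than a transcription of our implementation.

```python
from dataclasses import dataclass

@dataclass
class BCell:
    energy: float   # lower is better
    age: int = 0

def aging(population, tau_b, elitist=False):
    """Increment every age, then drop cells older than tau_b.

    With elitist=False this is pure aging: even the best cell can die.
    With elitist=True the lowest-energy cell is kept regardless of its age.
    """
    for cell in population:
        cell.age += 1
    survivors = [c for c in population if c.age <= tau_b]
    if elitist and population:
        best = min(population, key=lambda c: c.energy)
        if best.age > tau_b:          # best cell was removed above: restore it
            survivors.append(best)
    return survivors
```

Any EA that stores an age with each individual could call such a function once per generation, after variation and before selection.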
                                                                                     Obviously, the implemented algorithm can be applied to any
                                                                                  other combinatorial and numerical optimization problem apart
                                                                                  from the protein structure prediction problem using suitable rep-
                                                                                  resentations and variation operators [4], [9], [18], [69].

                                                                                  A. IA Versus Other Clonal Selection Algorithms
                                                                                     A well-known clonal selection algorithm in the AIS literature
                                                                                  is CLONALG [8], [69]. This algorithm employs fitness values
                                                                                  for proportional cloning, inversely proportional hypermutation,
                                                                                  and a birth operator to introduce diversity in the current popula-
                                                                                  tion along with a mutation rate to flip a bit of a B cell. Extended
                                                                                  versions of this algorithm use a threshold value to clone the best
                                                                                  cells in the present population.
                                                                                     CLONALG maintains two populations: a population of
                                                                                  antigens        and a population of antibodies        (indicated with
                                                                                       ). The individual antibody,       , and antigen,      , are repre-
                                                                                  sented by string attributes                          , that is, a point
                                                                                  in an L-dimensional real-valued shape space S,                        .
However, for long protein sequences             , partitioning the                The Ab population is the set of current candidate solutions,
funnel landscape with the memory B cells appears to be a good                     and the Ag is the environment to be recognized. The algorithm
strategy for effectively searching the rugged landscape in the                    loops for a predefined maximum number of generations                   .
middle of the funnel.                                                             In the first step, affinity values (fitness function values) are
   Finally, in Table XII, we present the number of energy eval-                   determined for all           in relation to the    . Then, it selects
uations required by the best run to locate the optimum or a sub-                           that are to be cloned independently and proportionally to
optimum energy value. We compare the performances of the                          their antigenic affinities, thus generating the clone population
IA with and without memory B cells, with the state-of-art al-                             . The higher the affinity, the higher the number of clones
gorithms for the functional model proteins: the MMA [41] and                      generated for each of the              with respect to the following
the EDAs [58]. Both versions of the IA, outperform the MMA                        function:
and EDA on all the test bed except for the protein instance 3
where EDA reaches a lower number of energy evaluations. In
particular, the IA without memory B cells obtains the best re-                                                                                              (6)
sults on 9 instances over 11.



where is a multiplying factor. Each term of the sum corre-                         the overall performance of the IA in terms of solution quality
sponds to the clone size of each Ab. The hypermutation operator                    and average number of evaluations to solution (a metric we feel
performs an affinity maturation process inversely proportional                      is more robust than run-time values). We have made use of var-
to the fitness values, generating the matured clone population                      ious discrete protein structure models, and have compared the
        . After having computed the antigenic affinity of the pop-                  results with the present state-of-art algorithms when applied to
ulation          , CLONALG randomly creates new antibodies                         each model.
that will replace the lowest fit          in the current population.                   As future work, we intend to tackle the prediction of 3-D
   Clearly, both CLONALG and the IA are inspired by the                            structures for actual proteins [70] using the designed IA. As
clonal selection principle. Hence, they have many similarities,                    in [70], we plan to consider the parallelization of the folding
but there are some significant differences (for a comparative                       algorithm in order to reduce execution time and resource
study see [69]). We begin with the common features. Both                           expenditure.
algorithms employ a population of Ag’s to represent the input,                        The IA uses a new hypermutation operator, the hyperma-
whilst the population of immune entities (Ab’s or B cells) are                     cromutation operator, which does not use functions depending
the candidate solutions to the given computational problem.                        upon constant parameters, and extends the region of the param-
Both algorithms use hypermutation operators inversely propor-                      eter surface with high success rate values. This operator con-
tional to the fitness values.                                                       tributes to the overall improvement in terms of performance of
   Considering their differences, in the IA the cloning operator                  the IA.
selects all the immune entities for cloning, whereas in CLONALG                       A second innovation of the IA is the aging operator, which
it is possible to use a threshold value to clone only the best                     is used to generate and maintain diversity in the population. As
cells in the current population. In CLONALG, the cloning op-                       demonstrated by our results, the aging operator and memory B
erator is proportional to the fitness values depending on a mul-                    cells with a longer life span are the key features of the proposed
tiplying factor (see (6)). However, in the IA, the cloning oper-                   approach, which we propose could be inserted in any EA.
ator makes use of static cloning: each immune entity will pro-                        With regard to the actual computational results on the PSP
duce       clones. Hence, CLONALG uses proportional cloning,                       problem, we found that for short protein sequences                 ,
and the IA uses static cloning. The underlying notion of em-                       the IA without memory B cells performs better than the IA with
ploying static cloning, is to give each point of the given search                  memory B cells both for the HP model and for the functional
space equal opportunity to explore its neighborhood; propor-                       model proteins. For long protein sequences                  , parti-
tional cloning provides a bias to each point of the search space                   tioning the funnel landscape with the memory B cells is a good
based on its fitness function value. This bias could be useful or                   strategy to search more effectively the rugged landscape in the
not, depending of course on the computational problem being                        middle of the funnel.
addressed.
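
The contrast between the two cloning schemes can be made concrete with a small sketch. The proportional rule below assumes the rank-based form usually given for CLONALG, in which the antibody of rank i among N (rank 1 being the fittest) receives round(beta * N / i) clones. Equation (6) is not legible in this copy, so that form, like the parameter names beta and dup, is an assumption rather than a transcription.

```python
def proportional_clone_sizes(n_antibodies, beta):
    """Clone counts per rank (rank 1 = highest affinity), assumed CLONALG-style rule."""
    return [round(beta * n_antibodies / rank)
            for rank in range(1, n_antibodies + 1)]

def static_clone_sizes(n_cells, dup):
    """Static cloning: every B cell produces the same number of clones."""
    return [dup] * n_cells
```

With four individuals, the proportional rule with beta = 1 concentrates half of the clones on the fittest individual, whereas static cloning with dup = 2 spends the same budget uniformly.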
   To produce diversity in the population, at each generation CLONALG uses a birth operator which introduces new immune entities. The IA, however, uses an aging operator modeled by an expected lifetime parameter. Additionally, CLONALG uses memory B cells as an implicit “memory mechanism,” an archive of the best candidate solutions, while the IA version for the partition of the landscape uses memory B cells with a longer mean life parameter, greater than the mean life of standard B cells, to allow a search of the rugged regions of the landscape.
   The selection scheme used by CLONALG to decide which immune entities will go to the next generation uses elitism, while the IA uses a standard selection operator without an elitist strategy.
   Finally, as a termination condition, CLONALG typically uses a fixed number of generations, while the IA can use three termination conditions: a fixed number of generations, a maximum number of fitness function evaluations, or the maximum information-gain principle [19]. Though the last condition does not avoid the possibility of being trapped in local minima solutions, this termination condition measures the quantity of information the IA discovers during the convergence process.

                           X. CONCLUSION

   This paper has proposed a novel IA for the protein structure prediction problem. Within this paper, we have assessed

                           ACKNOWLEDGMENT

   The authors would like to express their gratitude to the anonymous reviewers for their helpful comments on the manuscript. G. Nicosia would like to thank the Computing Laboratory, University of Kent, Canterbury, U.K., for their kind support.

                           REFERENCES

     [1] D. Dasgupta, Artificial Immune Systems and Their Applications. Berlin, Germany: Springer-Verlag, 1999.
     [2] L. N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Paradigm. London, U.K.: Springer-Verlag, 2002.
     [3] Y. Ishida, Immunity-Based Systems: A Design Perspective. Heidelberg, Germany: Springer-Verlag, 2004.
     [4] V. Cutello and G. Nicosia, “The clonal selection principle for in silico and in vitro computing,” in Recent Developments in Biologically Inspired Computing, L. N. de Castro and F. J. von Zuben, Eds. Hershey, PA: Idea Group Publishing, 2004.
     [5] J. E. Hunt and D. E. Cooke, “Learning using an artificial immune system,” J. Netw. Comput. Appl., vol. 19, pp. 189–212, 1996.
     [6] G. Nicosia, F. Castiglione, and S. Motta, “Pattern recognition by primary and secondary response of an artificial immune system,” Theory in Biosciences, vol. 120, no. 2, pp. 93–106, 2001.
     [7] T. Fukuda and M. T. K. Mori, “Parallel search for multimodal function optimization with diversity and learning of immune algorithm,” in Artif. Immune Syst. Their Appl., D. Dasgupta, Ed. Berlin, Germany: Springer-Verlag, 1999.
     [8] L. N. de Castro and F. J. V. Zuben, “Learning and optimization using the clonal selection principle,” IEEE Trans. Evol. Comput., vol. 6, pp. 239–251, Jun. 2002.
     [9] G. Nicosia, “Immune algorithms for optimization and protein structure prediction,” Ph.D. dissertation, Dept. Math. Comput. Sci., Univ. Catania, Catania, Italy, 2004.
     [10] G. B. Bezerra, L. N. de Castro, and F. J. V. Zuben, “A hierarchical immune network applied to gene expression data,” in Proc. 3rd Int. Conf. Artif. Immune Syst., G. Nicosia, V. Cutello, P. Bentley, and J. Timmis, Eds., Catania, Italy, Sep. 2004, pp. 14–27.
     [11] V. Cutello, G. Narzisi, and G. Nicosia, “A multi-objective evolutionary approach to the protein structure prediction problem,” J. Royal Soc. Interface, vol. 3, no. 6, pp. 139–151, Feb. 2006.
     [12] S. Ichikawa, A. Ishiguro, S. Kuboshiki, and Y. Uchikawa, “A method of gait coordination of hexapod robots using immune networks,” J. Artif. Life Robotics, vol. 2, pp. 19–23, 1998.
     [13] S. Singh and S. Thayer, “Kilorobot search and rescue using an immunologically inspired approach,” in Distributed Autonomous Robotic Systems. Berlin, Germany: Springer-Verlag, Jun. 2002, vol. 5.
     [14] L. Kesheng, Z. Jun, C. Xianbin, and W. Xufa, “An algorithm based on immune principle adopted in controlling behavior of autonomous mobile robots,” Comput. Eng. Appl., vol. 5, pp. 30–32, 2000.
     [15] D. Dasgupta, “An artificial immune system as a multiagent decision support system,” in Proc. IEEE Int. Conf. Syst., Man, Cybern., San Diego, CA, Oct. 1998, pp. 3816–3820.
     [16] J. Kim and P. Bentley, “Towards an artificial immune system for network intrusion detection: An investigation of clonal selection with negative selection operator,” in Proc. IEEE Int. Congr. Evol. Comput., Seoul, Korea, May 2001, pp. 1244–1252.
     [17] D. Dasgupta and F. A. Gonzalez, “An immunity-based technique to characterize intrusions in computer networks,” IEEE Trans. Evol. Comput., vol. 6, pp. 281–291, Jun. 2002.
     [18] V. Cutello and G. Nicosia, “An immunological approach to combinatorial optimization problems,” in Proc. 8th Ibero-American Conf. Artif. Intell., Seville, Spain, Nov. 2002, pp. 361–370.
     [19] V. Cutello, G. Nicosia, and M. Pavone, “A hybrid immune algorithm with information gain for the graph coloring problem,” in Proc. LNCS on Genetic and Evol. Comput. Conf., Chicago, IL, Jul. 2003, vol. 2723, pp. 171–182.
     [20] E. Hart and P. Ross, “The evolution and analysis of a potential antibody library for use in job-shop scheduling,” in New Ideas in Optimization, D. Corne, M. Dorigo, and F. Glover, Eds. London, U.K.: McGraw-Hill, 1999.
     [21] D. Dasgupta and N. S. Majumdar, “Anomaly detection in multidimensional data using negative selection algorithm,” in Proc. IEEE World Congr. Comput. Intell., Congr. Evol. Comput., Honolulu, HI, May 2002, pp. 1039–1044.
     [22] D. Bradley and A. M. Tyrrell, “Hardware fault tolerance: An immunological solution,” in Proc. IEEE Int. Conf. Syst., Man, Cybern., Nashville, TN, Oct. 2000, pp. 107–112.
     [23] P. J. C. Branco, J. A. Dente, and R. V. Mendes, “Using immunology principles for fault detection,” IEEE Trans. Ind. Electron., vol. 50, no. 2, pp. 362–373, 2003.
     [24] S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri, “Self-nonself discrimination in a computer,” in Proc. IEEE Symp. Research in Security and Privacy, Oakland, CA, May 1994, pp. 202–212.
     [25] D. Dasgupta, Y. Cao, and C. Yang, “An immunogenetic approach to spectra recognition,” in Proc. Int. Conf. Genetic and Evol. Comput. Conf., Orlando, FL, Jul. 1999, pp. 149–155.
     [26] J. Timmis and M. Neal, “A resource limited artificial immune system for data analysis,” Knowledge Based Systems, vol. 14, no. 3–4, pp. 121–130, 2001.
     [27] S. Hedberg, “Combating computer viruses: IBM’s new computer immune system,” IEEE Parallel and Distributed Technology: Systems Applications, vol. 4, no. 2, pp. 9–11, 1996.
     [28] G. Nicosia, V. Cutello, P. Bentley, and J. Timmis, in Proc. 3rd Int. Conf. Artif. Immune Syst., 2004.
     [29] S. Stepney, R. Smith, J. Timmis, and A. Tyrrell, “Towards a conceptual framework for artificial immune systems,” in Proc. LNCS on Artif. Immune Syst., G. Nicosia, V. Cutello, P. Bentley, and J. Timmis, Eds., Catania, Italy, Sep. 2004, vol. 3239, pp. 53–64.
     [30] A. Freitas and J. Timmis, “Revisiting the foundations of artificial immune systems: A problem oriented perspective,” in Proc. LNCS on Artificial Immune Syst., J. Timmis, P. Bentley, and E. Hart, Eds., Edinburgh, U.K., Sep. 2003, vol. 2787, pp. 229–241.
     [31] M. Levitt, “Protein folding by restrained energy minimization and molecular dynamics,” J. Mol. Biol., vol. 170, pp. 723–764, 1983.
     [32] D. G. Covell, “Folding protein α-carbon chains into compact forms by Monte Carlo methods,” Proteins: Struct. Funct. Genet., vol. 14, no. 4, pp. 409–420, 1992.
     [33] E. Alm and D. Baker, “Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures,” in Proc. Nat. Acad. Sci. USA, 1999, vol. 96, no. 20, pp. 11305–11310.
     [34] V. Muñoz and W. A. Eaton, “A simple model for calculating the kinetics of protein folding from three dimensional structures,” in Proc. Natl. Acad. Sci. USA, 1999, vol. 96, no. 20, pp. 11311–11316.
     [35] M. A. Apaydin, D. L. Brutlag, C. Guestrin, D. Hsu, and J.-C. Latombe, “Stochastic roadmap simulation: An efficient representation and algorithm for analyzing molecular motion,” in Proc. Annu. Int. Conf. Comput. Molecular Biol., Washington, DC, Apr. 2002, pp. 12–21.
     [36] N. M. Amato, K. A. Dill, and G. Song, “Using motion planning to map protein folding landscapes and analyze folding kinetics of known native structures,” J. Comp. Biol., vol. 10, no. 3, pp. 239–255, 2003.
     [37] K. F. Lau and K. A. Dill, “A lattice statistical mechanics model of the conformational and sequence spaces of proteins,” Macromolecules, vol. 22, pp. 3986–3997, 1989.
     [38] K. A. Dill, S. Bromberg, K. Yue, K. M. Fiebig, D. P. Thomas, and H. S. Chan, “Principles of protein folding: A perspective from simple exact models,” Protein Science, vol. 4, pp. 561–602, 1995.
     [39] K. A. Dill, “Theory for the folding and stability of globular proteins,” Biochemistry, vol. 24, no. 6, pp. 1501–1509, 1985.
     [40] B. P. Blackburne and J. D. Hirst, “Evolution of functional model proteins,” J. Chem. Phys., vol. 115, no. 4, pp. 1935–1942, 2001.
     [41] N. Krasnogor, B. P. Blackburne, E. K. Burke, and J. D. Hirst, “Multimeme algorithms for protein structure prediction,” in Proc. Int. Conf. Parallel Problem Solving from Nature (PPSN VII), Granada, Spain, Sep. 2002, pp. 769–778.
     [42] N. Krasnogor, “Towards robust memetic algorithms,” in Recent Advances in Memetic Algorithms, W. E. Hart, N. Krasnogor, and J. E. Smith, Eds. Berlin, Germany: Springer, 2004.
     [43] J. D. Hirst, “The evolutionary landscape of functional model proteins,” Protein Engineering, vol. 12, no. 9, pp. 721–726, 1999.
     [44] R. Unger and J. Moult, “Genetic algorithms for protein folding simulations,” J. Mol. Biol., vol. 231, no. 1, pp. 75–81, 1993.
     [45] L. Toma and S. Toma, “Contact interactions method: A new algorithm for protein folding simulations,” Protein Science, vol. 5, pp. 147–153, 1996.
     [46] P. Crescenzi, D. Goldman, C. Papadimitriou, A. Piccolboni, and M. Yannakakis, “On the complexity of protein folding,” J. Comp. Biol., vol. 5, no. 3, pp. 423–466, 1998.
     [47] B. Berger and T. Leighton, “Protein folding in the hydrophobic-hydrophilic model is NP complete,” J. Comp. Biol., vol. 5, pp. 27–40, 1998.
     [48] H. S. Chan and K. A. Dill, “Comparing folding codes for proteins and polymers,” Proteins: Struct., Funct., Genet., vol. 24, pp. 335–344, 1996.
     [49] N. Krasnogor, W. E. Hart, J. Smith, and D. A. Pelta, “Protein structure prediction with evolutionary algorithms,” in Proc. Genetic Evol. Comput. Conf., Orlando, FL, Jul. 1999, pp. 1596–1601.
     [50] F. M. Burnet, The Clonal Selection Theory of Acquired Immunity. Cambridge, U.K.: Cambridge Univ. Press, 1959.
     [51] V. Cutello, G. Nicosia, and M. Pavone, “Exploring the capability of immune algorithms: A characterization of hypermutation operators,” in Proc. 3rd Int. Conf. Artif. Immune Syst., G. Nicosia, V. Cutello, P. Bentley, and J. Timmis, Eds., Catania, Italy, Sep. 2004, pp. 263–276.
     [52] V. Cutello, G. Nicosia, and M. Pavone, “An immune algorithm with hyper-macromutations for the 2D hydrophilic-hydrophobic model,” in Proc. Congr. Evol. Comput., Portland, OR, Jun. 2004, pp. 1074–1080.
     [53] N. Krasnogor, “Studies on the theory and design space of memetic algorithms,” Ph.D. dissertation, Univ. West of England, Bristol, U.K., 2002.
     [54] P. E. Seiden and F. Celada, “A model for simulating cognate recognition and response in the immune system,” J. Theor. Biology, vol. 158, pp. 329–357, 1992.
     [55] H. P. Schwefel and G. Rudolph, “Contemporary evolution strategies,” in Proc. 3rd Int. Conf. Advances in Artif. Life, F. Morán, A. Moreno, J. J. Merelo, and P. Chacón, Eds., 1995, pp. 893–907.
     [56] S. S. Plotkin and J. N. Onuchic, “Understanding protein folding with energy landscape theory,” Quart. Rev. Biophys., vol. 35, no. 2, pp. 111–167, 2002.
     [57] D. A. Pelta and N. Krasnogor, “Multimeme algorithms using fuzzy logic based memes for protein structure prediction,” in Recent Advances in Memetic Algorithms, W. E. Hart, N. Krasnogor, and J. E. Smith, Eds. Berlin, Germany: Springer-Verlag, 2004.
     [58] R. Santana, P. Larrañaga, and J. A. Lozano, “Protein folding in 2-dimensional lattices with estimation of distribution algorithms,” in Proc. 5th Int. Symp. Biol. Medical Data Analysis, Barcelona, Spain, Nov. 2004, pp. 388–398.
     [59] A. Shmygelska and H. H. Hoos, “An improved ant colony optimization algorithm for the 2D HP protein folding problem,” in Proc. 16th Canad. Conf. Artif. Intell., Halifax, Canada, Jun. 2003, pp. 400–417.
     [60] R. Konig and T. Dandekar, “Improving genetic algorithms for protein folding simulations by systematic crossover,” Biosystems, vol. 50, no. 1, pp. 17–25, 1999.
     [61] N. Krasnogor and J. Smith, “A tutorial for competent memetic algorithms: Model, taxonomy, and design issues,” IEEE Trans. Evol. Comput., vol. 9, no. 5, pp. 474–488, Oct. 2005.
     [62] T. Jiang, Q. Cui, G. Shi, and S. Ma, “Protein folding simulations of the hydrophobic-hydrophilic model by combining tabu search with genetic algorithms,” J. Chem. Phys., vol. 119, no. 8, pp. 4592–4596, 2003.
     [63] F. Liang and W. H. Wong, “Evolutionary Monte Carlo for protein folding simulations,” J. Chem. Phys., vol. 115, no. 7, pp. 3374–3380, 2001.
     [64] T. C. Beutler and K. A. Dill, “A fast conformational search strategy for finding low energy structures of model proteins,” Protein Science, vol. 5, no. 10, pp. 2037–2043, 1996.
     [65] M. Milostan, P. Lukasiak, K. A. Dill, and J. Błażewicz, “A tabu search strategy for finding low energy structures of proteins in HP-model,” in Proc. Annu. Int. Conf. Comput. Molecular Biol., Berlin, Germany, Apr. 2003, Poster No. 5-108.
     [66] H. Hsu, V. Mehra, W. Nadler, and P. Grassberger, “Growth algorithms for lattice heteropolymers at low temperatures,” J. Chem. Phys., vol. 118, no. 1, pp. 444–451, 2003.
     [67] A. Shmygelska, R. Aguirre-Hernandez, and H. H. Hoos, “An ant colony optimization algorithm for the 2D HP protein folding problem,” in Proc. Int. Workshop Ant Algorithms, Brussels, Belgium, Sep. 2002, pp. 40–52.
     [68] C. Cotta, “Protein structure prediction using evolutionary algorithms hybridized with backtracking,” in Artificial Neural Nets Problem Solving Methods, ser. Lecture Notes in Computer Science, J. Mira and J. Álvarez, Eds. Berlin, Germany: Springer-Verlag, 2003, vol. 2687, pp. 321–328.
     [69] V. Cutello, G. Narzisi, G. Nicosia, and M. Pavone, “Clonal selection algorithms: A comparative case study using effective mutation potentials,” in Lecture Notes in Computer Science, Banff, Canada, Aug. 2005, vol. 3627, Proc. 4th Int. Conf. Artif. Immune Syst., pp. 13–28.
     [70] D. A. V. Veldhuizen, J. B. Zydallis, and G. B. Lamont, “Considerations in engineering parallel multiobjective evolutionary algorithms,” IEEE Trans. Evol. Comput., vol. 7, no. 2, pp. 144–173, Apr. 2003.

Vincenzo Cutello received the Laurea degree and the Ph.D. degree in mathematics from the University of Catania, Catania, Italy, in 1984 and 1989, respectively, and the M.S. and Ph.D. degrees in computer science from the Courant Institute of Mathematical Sciences, New York University, New York, in 1989 and 1991, respectively.
   Since 2001, he has been a Full Professor of Computer Science at the University of Catania. He is currently Chairman of the undergraduate program in Applied Computer Science, and Director of the Interdisciplinary Research Center on Applied Computer Science, University of Catania. He is author and coauthor of more than 100 papers in international journals and conference proceedings. His research activities are currently focused on the design and analysis of evolutionary algorithms, decision procedures, fuzzy logic, biologically inspired computing, and artificial immune systems.

Giuseppe Nicosia (M’01) received the Laurea degree and the Ph.D. degree in computer science from the University of Catania, Catania, Italy, in 2000 and 2005, respectively.
   Since 2001, he has been a Grant-Holder of the Cineca Supercomputing Center, Bologna, Italy, in the area of High-Performance Computing. In 2004, he was a Visiting Research Assistant in the Computing Laboratory, Kent University, Canterbury, Kent, U.K. Since October 2006, he has been an Associate Professor of Computer Science at the University of Catania. He is currently involved in the design and development of optimization algorithms for circuit design problems, a joint research project supported by the University of Catania and STMicroelectronics. He is coauthor of more than 40 papers in international journals and conference proceedings, and three coedited books on artificial immune systems. He has chaired various international conferences and workshops in the field of artificial immune systems. His primary research interests are design and analysis of artificial immune systems and immune algorithms, optimization, protein bioinformatics, artificial life, and various aspects of unconventional models of computation. His current research interest lies in the design of hybrid evolutionary and immunological algorithms for dynamic environments and constrained multiobjective optimization problems.

Mario Pavone received the M.Sc. and Ph.D. degrees in computer science from the University of Catania, Catania, Italy, in 1999 and 2004, respectively.
   He is currently visiting the IBM-KAIST Bio-Computing Research Center, Korea Advanced Institute of Science and Technology (KAIST), Korea, doing research in computer science. His research interests include biologically inspired computing, including artificial immune systems, combinatorial and numerical optimization, and computational biology, with particular reference to protein structure prediction, multiple sequence alignment, gene regulatory networks, and gene expression data.

Jonathan Timmis (M’02) is a Reader at the University of York, York, U.K., in a joint appointment with the Department of Computer Science and Department of Electronics. He has published over 60 papers on artificial immune system related research. He has worked on immune inspired approaches to real-time fault detection in ATM machines, machine learning, optimization, web mining, robot control, software testing, and theoretical aspects of immune inspired systems. His primary research interest is in the computational abilities of the immune, neural and endocrine systems and how they relate to computer science and engineering.
   Dr. Timmis is the cofounder of the International Conference on Artificial Immune Systems (ICARIS). He is principal investigator for the EPSRC academic network on artificial immune systems, ARTIST, and co-investigator on a new EPSRC funded project exploring the use of immunological modeling techniques for the development of novel immune inspired algorithms for bioinformatics (EP/D501377/1). He is heavily involved with the Grand Challenges for Computer Science in the U.K. and now serves on the GC-7 committee.

Supply Chain Operations Speaking Notes -ICLT Program
Computing-Curriculum for Schools in Ghana
102 student loan defaulters named and shamed – Is someone you know on the list?
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...

An Immune Algorithm for Protein Structure Prediction on Lattice Models

  • 1. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION
An Immune Algorithm for Protein Structure Prediction on Lattice Models
Vincenzo Cutello, Giuseppe Nicosia, Member, IEEE, Mario Pavone, and Jonathan Timmis, Member, IEEE
Abstract—We present an immune algorithm (IA) inspired by the clonal selection principle, which has been designed for the protein structure prediction problem (PSP). The proposed IA employs two special mutation operators, hypermutation and hypermacromutation, to allow effective searching, and an aging mechanism, which is a new immune inspired operator that is devised to enforce diversity in the population during evolution. When cast as an optimization problem, the PSP can be seen as discovering a protein conformation with minimal energy. The proposed IA was tested on well-known PSP lattice models, the HP model in two-dimensional and three-dimensional square lattices, and the functional model protein, which is a more realistic biological model. Our experimental results demonstrate that the proposed IA is very competitive with the existing state-of-art algorithms for the PSP on lattice models.
Index Terms—Aging operator, clonal selection algorithms, functional model proteins, hypermacromutation operator, hypermutation operator, immune algorithms (IAs), protein structure prediction problem, two-dimensional HP model, three-dimensional HP model.
I. INTRODUCTION
ARTIFICIAL Immune Systems (AISs) represent a field of biologically inspired computing that attempts to exploit theories, principles, and concepts of modern immunology to design immune system-based applications in science and engineering [1]–[3]. One role of the immune system (IS) is to protect the host organism against attacks from antigens (i.e., viruses and bacteria) and eliminate those cells that have been “infected.” The IS provides an excellent example of a bottom up intelligent strategy [4], through which adaptation operates at the local level of cells and molecules, and useful behavior emerges at the global level: this is exemplified by the immune humoral and cellular responses.
AISs are proving to be a very general and applicable form of bio-inspired computing. A great deal of work has gone into developing algorithms that extrapolate basic immune processes such as clonal selection, negative and positive selection, danger theory, and immune networks [2]. To date, AIS have been applied to areas such as machine learning [5], [6], optimization [7]–[9], bioinformatics [9]–[11], robotic systems [12]–[14], decision support systems [15], network intrusion detection [16], [17], combinatorial optimization [18], [19], scheduling [20], anomaly detection [21], fault diagnosis [22], [23], computer security [24], data analysis [10], [25], [26], virus detection [27], and many other areas [1]–[3], [28]. The field of AIS appears not only to be a powerful computing paradigm, but potentially a prominent apparatus for improving the understanding of biological data and systems [9], [29].
When one designs any computational solution, the nature of the problem space should always be taken into account. This is especially true in an emerging area such as AIS, and we must avoid the one size fits all attitude that the authors of [30] warn us against. With this in mind, and in the context of the framework for AIS presented in [2], we introduce an immune algorithm (IA) based on the clonal selection principle [9]. We employ a new aging operator and specific mutation operators tailored for the protein structure prediction problem (PSP) in the HP model for two-dimensional (2-D) and three-dimensional (3-D) lattices, and in the functional model proteins. Given the primary sequence of a protein, the protein structure prediction problem requires the identification of its native (tertiary) conformation with minimum energy; while the protein folding problem requires information about the possible pathways to folding and unfolding. Since a protein’s structure determines its biological function, it is very important to be able to predict the final spatial conformation of the proteins. This paper is concerned only with the static aspect, that is, how to predict the folded tertiary structure of a protein, given its sequence of amino acids through the use of lattice models.
This paper is structured as follows: Section II describes the protein structure prediction problem in Dill’s model and in the functional model protein; Section III presents the IA, inspired by the clonal selection theory, for the protein structure prediction problem; Section IV details the characteristic dynamics of the implemented IA using an aging process; Section V describes the technique used to partition the landscape of the PSP, and the application of the aging process and memory B cells to improve the overall performance of the algorithm; Section VI reports the results for the 2-D HP model; Section VI-A describes previous related works, and draws comparisons between these and the proposed IA for the 2-D HP model; Section VII presents results for the 3-D HP model; Section VIII presents the results obtained for the functional model protein; Section IX provides a brief comparison between the IA and other biologically inspired algorithms; finally, concluding remarks are presented in Section X.
Manuscript received August 9, 2005; revised January 3, 2006. V. Cutello, G. Nicosia, and M. Pavone are with the Department of Mathematics and Computer Science, University of Catania, 95125 Catania, Italy (e-mail: vctl@dmi.unict.it; nicosia@dmi.unict.it; mpavone@dmi.unict.it). J. Timmis is with the Department of Computer Science and the Department of Electronics, University of York, Heslington, York YO10 5DD, U.K. (e-mail: jt517@ohm.york.ac.uk). Digital Object Identifier 10.1109/TEVC.2006.880328. 1089-778X/$20.00 © 2006 IEEE
II. LATTICE MODELS FOR THE PSP
There are essentially five approaches to modeling the PSP: molecular dynamics [31], Monte Carlo methods [32], statistical mechanical models [33], [34], probabilistic road map-based
  • 2. [35], [36], and lattice models [37], [38]. The first two techniques have been used to analyze the number and the characteristics of folding pathways; the second two techniques are useful tools for studying the folding landscape, while the final technique, whilst having a fundamental theoretical relevance, cannot be applied directly to real proteins.
In this paper, we focus on lattice models; in particular, we use the well-known Dill’s lattice model, the HP model [39], and the “shifted” HP model (also called functional model proteins) [40]. The HP model takes into account the hydrophobic interaction as the main driving force in protein folding. The HP model involves attractive interactions only. The functional model proteins, unlike Dill’s model, have a unique native fold with an energy gap between the native and the first excited state; the native state is not maximally compact, and thus presents cavities or potential binding sites, a key biological property required in order to investigate ligand binding. To include these properties, the functional model has both attractive and repulsive interactions.
A. The Dill’s Model
Proteins are sequences of amino acids. In the standard Dill’s model, each amino acid is represented as a bead, and connecting bonds are represented as lines. In this approach, the protein is composed of a specific sequence of only two types of beads, H (hydrophobic/nonpolar) or P (hydrophilic/polar); that is, the 20 amino acids can be divided into two classes: H and P. This is usually called the HP model (or Dill model) [39], where the label P is used to represent hydrophilic amino acids because those amino acids are also polar. We reduce the alphabet from 20 characters to 2, where our protein sequences take the form of strings belonging to the alphabet $\{H, P\}$.
Hydrophobic amino acids tend to come together to form a compact core that excludes water. Because the environment inside cells is aqueous (primarily water), these hydrophobic amino acids tend to be on the inside of a protein, rather than on its surface. Hydrophobicity is one of the key factors that determines how the chain of amino acids will fold up into an active protein.
The whole conformation is embedded in a 2-D or 3-D lattice, which simply divides space into amino acid-sized units. Bond angles only have limited discrete values, dictated by the structure of the lattice (for instance, square, triangular, or cubic [41], [42]). A lattice site may either be empty or contain one bead. In particular, on a 2-D square lattice, the HP model represents proteins as 2-D self-avoiding walk chains of beads on the lattice, i.e., two beads cannot occupy the same site of the lattice, and each bead occupies only one lattice site connected to its chain neighbors.
1) Protein Energy: For each conformation, one can evaluate the value of the energy function: this allows for the modeling of free energies of protein folds. The simplest form of energy function counts the number of H-H contacts. Each H-H topological contact has energy value $\epsilon$, while all other contact interaction types (H-P, P-H, P-P) have energy value $\delta$. Two amino acids create an H-H contact if they are topological neighbors and they are not connected by a bond. The goal is to find a conformation with the lowest energy. In the HP model in general, the residues’ interactions can be defined as follows: $e_{HH} = \epsilon$ and $e_{HP} = e_{PH} = e_{PP} = \delta$. When $\epsilon = -1$ and $\delta = 0$, we have the typical interaction energy matrix for the standard HP model [39]; while for $\epsilon = -2$ and $\delta = 1$, we have the interaction energy matrix for the shifted HP model [43]. For the Dill’s model, the native conformation is the one that maximizes the number of H-H contacts, i.e., the one that minimizes the free energy function.
Regarding the functional model proteins (described in Section VIII), in order to find binding pockets and the required energy gap, the native conformation finds a tradeoff between the number of H-H contacts (i.e., the attractive force) and the non H-H contacts (i.e., the repulsive forces).
Fig. 1 shows snapshots of the IA during the convergence process, when applied to the protein sequence No. 1 (see Table I) at different energy levels: from poor conformations to the native conformation with energy −9 (the H-H contacts are represented by dotted lines, and the hydrophobic residues by black circles).
Fig. 1. Convergence process for the protein sequence No. 1, at different energy levels.
2) 2-D Square Lattice Standard HP Benchmarks: To test the searching capability of the designed IA, we used 12 instances: the first nine instances of the Tortilla 2-D HP Benchmarks¹ (the first eight sequences are taken from [44], and sequence 9 is taken from [45]) and three further instances taken from [41]. In Table I, $E^*$ is the optimal or best-known energy value, and exponents such as $H^i$, $P^i$, and $(\ldots)^i$ indicate $i$ repetitions of the relative symbol or subsequence.
¹ http://www.cs.sandia.gov/tech_reports/compbio/tortilla-hp-benchmarks.html
The 12 chosen HP instances are standard benchmarks used to test the searching ability of heuristic methods and blind search algorithms. These instances have been tested on more than 20 different algorithms (see Section VI-A and Table VII). Analyzing the HP model is very interesting and challenging for
  • 3. computer scientists, but is considered unsatisfactory for many biologists. Whilst proteins fold in nature in a matter of seconds, computational biologists have found that folding proteins to their minimum energy conformations is an unsolved optimization problem. The PSP on the HP model has been shown to be NP-complete for 2-D [46] (NP-hardness is shown by a reduction from an interesting variation of the planar Hamilton cycle problem), and 3-D lattices [47] (NP-hardness is shown by a reduction from a variation of the bin packing problem).
TABLE I. 2-D SQUARE LATTICE STANDARD HP BENCHMARKS FROM [41] AND [44]
B. The Functional Model Proteins
In the HP model, the interaction between two hydrophobic residues is $-1$, and zero for the other possible pairs (H-P, P-H, and P-P); that is, the HP model involves one attraction interaction (H with H) and three neutral interactions (H with P, P with H, and P with P), so that the energy matrix of the HP model, with rows and columns ordered (H, P), may be written
$$E = \begin{pmatrix} -1 & 0 \\ 0 & 0 \end{pmatrix} \tag{1}$$
There are many other folding codes, each defined by the number of different types of residues and the matrix of energies describing the interactions between different kinds of residues. One important folding code is the “shifted” HP model (or functional model proteins) [40]. This model has native folds that are not maximally compact, and presents cavities or potential binding sites, a key biological property required in order to investigate ligand binding. To include these properties, the shifted HP model has two bead types, and both attractive and repulsive interactions. Thus, the shifted HP energy matrix, with the same ordering, is
$$E = \begin{pmatrix} -2 & 1 \\ 1 & 1 \end{pmatrix} \tag{2}$$
The presence of binding sites in the shifted HP model allows us to briefly discuss the biological relevance of the model. Such sites support a significant number of proteins that can be classified as functional, with the ground states having lower degeneracies and more cooperative folding than the regular HP model [48].
C. The Conformational Space Into Lattice
To embed a hydrophobic pattern into a lattice, we have the following three methods [49].
1) Cartesian Coordinate: The position of residues is specified independently from other residues.
2) Internal Coordinate: The position of each residue depends upon its predecessor residues in the sequence. There are two types of internal coordinate: absolute directions, where the residue directions are relative to the axes defined by the lattice, and relative directions, where the residue directions are relative to the direction of the previous move.
3) Distance Matrix: The location of a given residue is computed by means of its distance matrix.
Krasnogor et al. [49] performed an exhaustive comparative study using evolutionary algorithms (EAs) with relative and absolute directions. The experimental results show that relative directions almost always outperform absolute directions over square and cubic lattices, while absolute directions perform better on triangular lattices. Experimental evidence suggests internal coordinates with relative directions should be used. However, in general, it is difficult to assess the effectiveness of direction encoding on an EA’s performance.
III. THE IMMUNE ALGORITHM
A. The Clonal Selection Principle
The theory of clonal selection [50] suggests that B and T lymphocytes that are able to recognize the antigen will start to proliferate by cloning upon recognition of such antigen. When a B cell is activated by binding an antigen (and a second signal is received from T lymphocytes), many clones are produced in response, via a process called clonal expansion. The resulting cells can undergo somatic hypermutation, creating offspring B cells with mutated receptors. The higher the affinity of a B cell to the available antigens, the more likely it will clone. This results in a Darwinian process of variation and selection, called affinity maturation. The increase in size of these populations couples with the production of cells with longer than expected lifetimes, assuring the organism a higher specific responsiveness to that antigenic attack in the future. This gives rise to immunological memory, which is demonstrated by the fact that, when the host is first exposed to the antigen, a primary response is initiated; in this phase the antigen is recognized and immune memory is developed. When the same antigen is encountered in the future, a secondary immune response is initiated. This results from the stimulation of cells already specialized and present as memory cells: a rapid and more abundant production of antibodies is observed. The secondary response can be elicited from any antigen that is similar, although not identical, to the original one that established the memory. This is known as cross-reactivity.
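The relative-direction encoding of Section II-C and the H-H contact energy of the standard HP model can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `decode` and `hp_energy` are hypothetical, the letters F, L, R (forward, left, right) are an assumed labeling of the three relative moves on the 2-D square lattice, and the first bond is assumed to point along the +x axis, so only the relative moves are passed in.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]


def decode(moves: str) -> Optional[List[Coord]]:
    """Map relative moves (F, L, R) to 2-D square-lattice coordinates.

    Assumption: the first bond goes from (0, 0) to (1, 0); each move then
    turns relative to the previous bond. Returns None when the walk revisits
    a site, i.e. when it is not self-avoiding.
    """
    heading = (1, 0)
    coords = [(0, 0), (1, 0)]
    occupied = set(coords)
    for move in moves:
        dx, dy = heading
        if move == "L":      # rotate heading 90 degrees counterclockwise
            heading = (-dy, dx)
        elif move == "R":    # rotate heading 90 degrees clockwise
            heading = (dy, -dx)
        elif move != "F":
            raise ValueError(f"unknown move: {move!r}")
        x, y = coords[-1]
        site = (x + heading[0], y + heading[1])
        if site in occupied:
            return None
        coords.append(site)
        occupied.add(site)
    return coords


def hp_energy(sequence: str, moves: str) -> Optional[int]:
    """Standard HP energy: -1 per H-H contact between non-bonded neighbors."""
    if len(moves) != len(sequence) - 2:
        raise ValueError("expected len(sequence) - 2 relative moves")
    coords = decode(moves)
    if coords is None:
        return None          # infeasible (not self-avoiding) conformation
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):   # skip chain neighbors
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    energy -= 1
    return energy


print(hp_energy("HPPH", "LL"))   # square fold: prints -1
print(hp_energy("HPPH", "FF"))   # straight chain: prints 0
```

Under these assumptions, the four-residue chain HPPH folded with two left turns closes a unit square and forms one H-H contact, while the straight chain forms none. Returning `None` for a walk that revisits a site is one simple way to flag conformations that violate the self-avoiding constraint.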
  • 4. TABLE II. PSEUDOCODE OF THE IA
B. The Clonal Selection Algorithm
The proposed IA (see Table II) employs two entity types: antigens (Ag) and B cells. The Ag models the hydrophobic-pattern of the given protein, that is, a sequence $s \in \{H, P\}^{\ell}$, where $\ell$ is the protein length, i.e., the number of amino acids in the protein sequence. The B cell population, $P^{(t)}$, represents a set of candidate solutions in the current fitness landscape at each generation $t$. The B cell, or B cell receptor, is a sequence of directions $r \in \{F, L, R\}^{\ell-1}$ (with $F$ = Forward, $L$ = Left, and $R$ = Right), where each $r_i$, with $i = 2, \ldots, \ell - 1$, is a relative direction [49] with respect to the previous direction $r_{i-1}$ (i.e., there are $\ell - 2$ relative directions), and $r_1$ is the nonrelative direction. Hence, we obtain an overall sequence $r$ of length $\ell - 1$. The sequence $r$ specifies a 2-D conformation which is suitable for computing the energy value of the hydrophobic-pattern of the given protein.
At each generation $t$, there is a B cell population $P^{(t)}$ of size $d$. The initial population, at time $t = 0$, is randomly generated in such a way that each B cell of $P^{(0)}$ is a self-avoiding conformation. There are two main functions within the algorithm: Evaluate(P), which computes the fitness function value of each B cell in the population, i.e., the energy of the conformation coded in the B cell receptor; and Termination_Condition(), which returns true if a solution is found or a maximum number of fitness function evaluations is reached.
The implemented IA, like all IAs based on the clonal selection principle outlined above, is characterized by cloning of B cells with higher antigenic affinity, affinity maturation, and hypermutation of offspring B cells. Within our approach, we employ three immune operators: cloning, hypermutation, and aging, and a standard evolutionary operator: the $(\mu + \lambda)$-selection operator.
1) Static Cloning Operator: The cloning operator [4], [18] simply clones each B cell $dup$ times, producing an intermediate population $P^{(clo)}$ of size $d \times dup$. Throughout this paper, we will refer to this as the static cloning operator, as opposed to a proportional cloning operator (used in the pattern recognition version of CLONALG [8]), which clones B cells proportionally to their antigenic affinities. Experimental results for PSP using such an operator (not shown in this paper) show frequent premature convergence during the population evolution. In fact, proportional cloning allows B cells with high affinity values to survive for many more generations, and the process can easily become trapped in local minima.
2) Hypermutation Operators: The hypermutation operators act on the B cell receptor of the clone population $P^{(clo)}$. The number of mutations is determined by a specific function, the mutation potential, with there being several mutation potentials in existence [51]. In the same research paper, some significant hypermutation operators are discussed and quantitatively compared with respect to their success rate and computational cost. The authors of the paper investigated the searching capability of the IAs based on the clonal selection principle using static, proportional, and inversely proportional hypermutation operators and the hypermacromutation operator. Analyzing the parameter surface for each variation operator and the performance on a complex “toy problem,” the trap functions, and on the 2-D HP model clarifies that only a few distinct and useful hypermutation operators exist, namely: inversely proportional hypermutation, static hypermutation, and hypermacromutation operators. It appears that making use of inversely proportional hypermutation and hypermacromutation can contribute to finding the best experimental results for the 2-D HP model. As a consequence of these results, we implemented the IA presented in this paper with an inversely proportional hypermutation operator and a hypermacromutation operator. The hypermutation and the hypermacromutation operators mutate the B cell receptors using different mutation potentials.
If, during the mutation process, a constructive mutation occurs, the mutation procedure will move on to the next B cell. We call such an event: Stop at the first constructive mutation (FCM). We adopted such a mechanism to slow down (premature) convergence, thus allowing a more detailed search through the landscape. A different policy would make use of $M$ mutations ($M$-mut), where the mutation procedure performs all $M$ mutations determined by the potential for the current B cell. With this policy, however, and for the problems which are faced in this paper, the implemented IA did not provide good results [51].
a) Inversely Proportional Hypermutation: The inversely proportional hypermutation operator makes a number of mutations inversely proportional to the fitness value. In particular, at each generation $t$, the operator will perform at most the number of mutations given by the two-case expression (3), which involves the fitness value of the B cell and the current best fitness value or the best-known value. In this case, the mutation potential has the shape of a hyperbola branch.
In [51], the hypermutation operators obtained by varying the parameter of (3) were thoroughly tested. Studying the parameter surfaces of the trap functions and the PSP, the authors discovered that for the hypermutation operator inversely proportional to the fitness function value [modeled by (3)], the best values
  • 5. for the parameter are located in the range . In particular, for the sequences 1, 2, 3, 4, and 12 the best value is 0.4; for the sequences 5, 6, 9, 10, and 11 the best value is ; for the sequence 7 the best value is ; and for the sequence 8 the value is .
b) Hypermacromutation Operator: For the hypermacromutation operator [52] (previously introduced in [53] as the “macromutation operator”), the number of mutations is determined by a simple random process which does not use functions depending upon constant parameters. Attempts are made to mutate each B cell receptor $M$ times, whilst maintaining the self-avoiding property. The number of mutations $M$ is at most the number of positions in the range $[i, j]$, with $i$ and $j$ being two random integers delimiting that range (see Fig. 2). The number of mutations is independent of the fitness function and any other parameter. The hypermacromutation operator, for each B cell receptor, randomly selects a perturbation direction, either from left to right or from right to left.
Fig. 2. Example of the hypermacromutation operator applied in the range [i, j] (in bold face the values successfully mutated).
In general, the mutation operators perturb the B cell population $P^{(clo)}$, generating the new populations $P^{(hyp)}$ and $P^{(macro)}$, respectively. Each B cell is a feasible candidate solution of the HP model (for simplicity in 2-D using relative encoding; it is straightforward to extend to 3-D using other encoding schemes), making a self-avoiding walk chain on the lattice. Hence, given a protein conformation sequence, the mutation operator randomly selects a single direction or a subsequence of directions; then, for each selected relative direction, it randomly selects a new, different direction. If the new conformation is again self-avoiding, then the operator accepts it; otherwise, the process is repeated using the last remaining direction.
3) Aging Operator: This operator is designed to generate diversity, in an attempt to avoid getting trapped in local minima. Although it is an operator inspired by the observation in the immune system that there is an expected mean life for the B cell [54], the aging operator can be thought of as a general problem- and algorithm-independent operator.
The aging process attempts to capitalize on the immunological fact that B cells have a limited life span, and that memory B cells have a longer life span. Starting from this basic observation, the aging operator eliminates old B cells from the populations $P^{(t)}$, $P^{(hyp)}$, and $P^{(macro)}$. The parameter $\tau_B$ sets the maximum number of generations allowed for generated B cells to remain in the population. When a B cell is older than $\tau_B$ generations, it is erased from the current population, no matter what its fitness value is. We call this strategy static pure aging. During the cloning expansion, a cloned B cell inherits the age of its parent. After the hypermutation phase, a cloned B cell which successfully mutates, i.e., obtains a better fitness value, will be considered to have age equal to 0. Thus, an equal opportunity is given to each “new genotype” to effectively explore the fitness landscape. We note that for $\tau_B$ greater than the maximum number of allowed generations, the IA works essentially without the aging operator. In such a limit case, the algorithm employs a strong elitist selection strategy.
The aging operator is implemented by extending the B cell data structure with an age counter, which is set to 0 when the B cell is created and whenever a cloned B cell is successfully mutated. A B cell is selected for survival only if its age does not exceed $\tau_B$. The age of each of the surviving B cells is incremented by one. If the surviving B cells are fewer than $d$ (the population size), new B cells are randomly created (with age 0) and are added by the Elitist_Merge function into the population.
Within the literature there is a similar aging mechanism for evolution strategies (ES) [55], where the authors allow a life span, $\kappa$, for each parent of an ES. A parent older than $\kappa$ generations is not considered further in the selection process, leaving the new offspring to enter into the population at the next generation. This mechanism allows a more flexible variation of the selection scheme between the two extreme cases $\kappa = 1$, that is the $(\mu, \lambda)$-ES, and $\kappa = \infty$, that is the $(\mu + \lambda)$-ES. As noted by the authors of the above cited paper, this mechanism has not been properly investigated and appears to be a “standalone” research work.
4) $(\mu + \lambda)$-Selection With Birth Phase and No Redundancy: A new population $P^{(t+1)}$, of $d$ B cells, for the next generation $t + 1$, is obtained by selecting the best B cells which “survived” the aging operator, from the parent population and from the hypermutated and hypermacromutated clone populations. No redundancy is allowed. Thus, each B cell receptor is unique, i.e., each genotype is different from all other genotypes in the current population. If fewer than $d$ B cells survived, then the Elitist_Merge function creates the missing B cells at random (Birth phase). Hence, the $(\mu + \lambda)$-selection operator (with $\mu = d$, and $\lambda$ the number of offspring, which doubles if both variation operators are activated) reduces the offspring B cell population (created by cloning and hypermutation operators) to a new parent population of size $d$. The selection operator identifies the $d$ best elements from the offspring set and the old parent B cells, thus guaranteeing monotonicity in the evolution dynamics.
The properties of each immune operator are relatively well understood: the cloning operator explores the attractor basins and valleys of each candidate solution; the hypermutation operators introduce innovations by exploring the current population of B cells; the aging operator creates diversity during the search process. The selection evolutionary operator directs the search process toward promising regions of the fitness landscape and exploits the information coded within the current popula-
  • 6. tion. While selection is a universal problem- and algorithm-independent operator, hypermutation, and in general mutation and crossover operators, are specific operators that focus on the structure of the given landscape.
Finally, it is worth noting that the representation and the mutation operators presented in this section use a discrete coding. They work on an alphabet of three letters for relative directions in 2-D square lattices, and on an alphabet of six letters for relative directions in the 3-D cubic lattice (see Section VII). Hence, the work described in this paper is applicable only in the context of discrete coding. In Table II, we outline the pseudocode of the proposed IA.
IV. IA DYNAMICS
In this section, we discuss the characteristic dynamics of the proposed IA. The population size is set to and the maximum number of fitness function evaluations allowed is set to , for minimal values and 10 , with , and . All the values reported in this section are averaged on 100 independent runs.
In Fig. 3, we show the average fitness values of the populations and the best fitness value when the IA faces the PSP instance Seq2. In this figure, we can see how the four curves decrease, almost monotonically, approximately in the first 20–40 generations, whereas in the remaining generations all four curves reach a steady-state dynamics. The small oscillations are due to the random nature of the overall process governing the hypermutation and the hypermacromutation operators. The higher the average fitness of the hypermutated and hypermacromutated clones, the higher is the diversity in the current population [19].
Fig. 3. Average fitness function values of the three populations and the best B cell receptor on protein sequence Seq2, with parameter values $d = 10$, $dup = 2$, and $\tau_B = 5$.
A. Maximum Information Gain Principle
To analyze the learning process, we use an entropy function, the Information Gain. This measures the quantity of information the system discovers during the learning phase [19]. To this end, we define the B cells distribution function $f_m^{(t)}$ as the ratio between the number, $B_m^t$, of B cells at time $t$ with fitness function value $m$, and the total number of B cells:
$$f_m^{(t)} = \frac{B_m^t}{\sum_{m} B_m^t} \tag{4}$$
It follows that the information gain can be defined as:
$$K(t, t_0) = \sum_{m} f_m^{(t)} \log \frac{f_m^{(t)}}{f_m^{(t_0)}} \tag{5}$$
The gain is the amount of information the system has already learnt from the given problem instance with respect to the randomly generated initial population at $t_0 = 0$ (the initial distribution). Once the learning process begins, the information gain increases monotonically until it reaches a final steady state (see Fig. 4). This is consistent with the idea of a maximum information-gain principle of the form $dK/dt \geq 0$.
Fig. 5 shows the information gain curves for $\tau_B = 1$ and $\tau_B = 5$. For the IA learns a greater amount of information than for ; in fact, in the inset plot, the standard deviation obtained with is greater than .
In the log-axis plot of Fig. 4, it is evident how the information gain is a more informative measure than the mean fitness. The standard deviation, the uncertainty over the population of a given generation (see the inset plot in Fig. 4), decreases quickly in the first ten generations. In fact, the IA converges to the global minimum in this temporal window. After this “threshold” the standard deviation suddenly increases, producing strong oscillations; that is, strong uncertainty regarding the current populations. The mean value is essentially constant during all generations.
For example, in the first generation, the IA gains more information than in the second, because it generates more constructive mutations. Thus, the population at the first generation ex-
  • 7. This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. CUTELLO et al.: AN IMMUNE ALGORITHM FOR PROTEIN STRUCTURE PREDICTION ON LATTICE MODELS 7 Fig. 4. Information gain, mean fitness versus generations of IA on protein sequence Seq2, with parameter values d = 10, dup = 2, and = 5. Inset plot displays standard deviation. Fig. 5. Information gain and standard deviation versus generations on protein sequence Seq 2 varying 2 f1; 5g. tracts more informative building blocks than the population at As a function of and , we show the success rate (SR). the second generation. The 3-D plots obtained are characteristic parameter surfaces for the given operator. We set the population size to a minimal B. Searching Ability of Hypermutation and value , to emphasize the property of each operator when Hypermacromutation working with few B cells (points) in the conformational space. This strategy provides a good measure of the “real” performance To understand the searching ability of hypermutation opera- of single hypermutation procedures. In addition, the Termina- tors when across a range of parameter values, we performed a tion_Condition() function is allowed at most 10 fit- set of experiments on the PSP instance, . The duplication ness function evaluations and we performed for each value pair parameter varies from 1 to 10 and the aging parameter of the parameters 100 independent runs. Using the and is drawn from the set . (average number of evaluations to solution) values, our experi- We note that setting the parameter at a higher value than mental protocol has the following three objectives: the possible number of generations is equivalent to giving the 1) to plot the characteristic parameter surface of each hyper- B cell an infinite life, in effect, we turn off the aging operator. mutation operator;
2) to analyze the joint effects of hypermacromutations and hypermutations;
3) to find the best settings of the parameter values for each operator and for their combination, such that the best delimited region on the parameter surfaces maximizes the SR value and minimizes the AES value.

Fig. 6. SR versus parameter values dup and τ_B for the sequence 2. The parameter surfaces of the combination of inversely proportional hypermutation and hypermacromutation operators.

Fig. 6 shows the parameter surfaces of the inversely proportional hypermutation operator and of the combination of inversely proportional hypermutation and hypermacromutation operators (surface in bold face): the hypermacromutation extends the region with high SR values, and in particular improves the region where the inversely proportional hypermutation operator alone performed poorly ( and ). The highest peak, , is obtained for and with .

V. PARTITIONING THE FUNNEL BY MEMORY B CELLS

Folding energy landscapes are funnel-like, which means that many conformations have high energy and few have low energy. More formally, protein conformations having high free energy (a single point on an energy landscape) have high conformational entropy, and states having low free energy (native state and other deep minima) have low conformational entropy. Discrete models have this characteristic, in particular the HP model, where the energy landscape is a funnel [44]; for example, for sequence 1 of the benchmarks (see Table I), of the 83,779,155 valid conformations approximately 66 × 10^6 have high “energy” (i.e., 0 or -1), and only four conformations are in the native state with minimal energy (see [44, Table I]).

Starting from this simple topological observation, we can deduce that within the funnel landscape the hardest area to search is the middle region: it is typically rugged with many local minima.

For this reason, we partition the funnel landscape in three regions, where in the rugged middle region we allocate B cells with a longer life span, called memory B cells.

If the native fold has energy value , we have energy levels, thus the boundaries of the first and second partitions are, respectively

and

Hence, the B cells with energy in the range have a life span τ_B, the B cells with energy in the range have a longer life span τ_Bm, while all the B cells with energy have the same life span τ_B as the B cells in the first partition.

Fig. 7. Partitioning of the funnel landscape using memory B cells.

Fig. 7 shows the partitioning of the funnel landscape of the PSP problem in three regions. The B cells either belong to the top region or to the bottom region and have life span τ_B, while the memory B cells belong to the middle region and have life span τ_Bm.

Theoretical findings in [56], and experimental results of our own which are not reported in this paper, show that the hardest region to search is the middle. Typically, it is rugged and contains many local minima. Therefore, we only apply memory B cells to such a region. Conformations whose energy value is in the middle region are allowed to mature.

VI. RESULTS FOR THE 2-D HP MODEL

In this section, we show the overall performance of the IA for the protein structure prediction problem for the 2-D HP model using the well-known tortilla benchmarks (see Table I). The experiments were performed with population size , duplication parameter , and maximum number of fitness function evaluations 10 . All the experimental results reported in this section are averaged over 30 independent runs.

TABLE III
COMPARISONS OF ENERGY EVALUATIONS FOR THE 2-D HP MODEL

Table III shows the number of energy evaluations required on the best run to achieve the optimum (or the best value found) by a genetic algorithm (GA) and a Monte Carlo approach as reported in [44], the multimeme algorithm [57], the estimation of distribution algorithms (EDA) [58], and the IA for eight instances in the 2-D HP model used in [41] (see [41, Table 6]) and in [59] (see [59, Table 3]) as test bed for the folding algorithms. Gaps in Table III indicate that the particular folding algorithm has not been tested on the respective protein instance. As we can see, the IA obtains the lowest number of energy evaluations, except for protein instance 4, where the EMC approach reaches a lower number of energy evaluations, and for protein instance 8, where the EDAs obtain a conformation with 41 topological contacts.

For the sake of completeness, we must note the generality of the MMA algorithm; it has been tested on the HP model and functional model proteins using various lattices (2-D and 3-D square and triangular lattices) without any modification to the algorithm. It would appear that the algorithm performs robustly across all the models [41], [42], [57].

Tables IV and V show the results obtained by the IA using no memory B cells, and memory B cells. For each protein instance, and for each value of τ_B, the tables report the SR, AES, best found energy value (b.f.), mean and standard deviation values. In bold face, we show the best results reached for each instance of the tortilla benchmarks, sorted first by SR value, then by AES value. For SR = 0, we sort them by the following ordered criteria: best found conformation, mean and standard deviation.

TABLE IV
RESULTS OF THE IA WITH NO MEMORY B CELLS FOR THE 2-D HP MODEL

TABLE V
RESULTS OF THE IA WITH MEMORY B CELLS FOR THE 2-D HP MODEL

By observing the performance of the IA with no memory B cells (Table IV), we can see how the IA using does substantially better in 8 out of 12 PSP instances, while the IA using a longer life span, , does better in the remaining four PSP instances. Both IA versions reach the best-known conformations with maximal success rate for the PSP instances 1, 2, 3, 6, 9, 10, 11, 12; for the protein sequences 4 and 5, the IA with obtains higher success rate values than with , while for the protein sequences 7 and 8, the IA is not able to reach the best-known minima, and it appears to become trapped in local minima.

In Table V, we present the results obtained by the IA using memory B cells, with the following aging values , . We can see how the IA, when using , performs better in 8 out of 12 PSP instances, reaches the maximal success rate (for the first time) for the protein sequence No. 4, and increases the success rate value for the protein sequence No. 5, obtaining .

When comparing the results obtained by the IA with memory B cells (Table V) and without memory B cells (Table IV), we are able to see that the IA with memory B cells with aging values , outperforms the IA without memory B cells. In fact, this version reaches the best-known conformations with maximal success rate for the PSP instances 1, 2, 3, 4, 6, 9, 10, 11, 12 (9 out of 12 PSP instances), and for the protein sequence No. 5 obtains the highest success rate value, .

Therefore, with the IA using memory B cells, and (see Table V), we find it is able to locate the best-known energy values with maximum success rate on 9 protein instances out of 12. For protein sequence 5, the IA obtains a , while for the instances 7 and 8 (the “hard instances”), the IA reaches only suboptimal energy values, respectively -35 and -39, with high mean and standard deviation values. To improve the results on these protein instances, we include in the IA a special local search procedure known as the Long Range Move (as defined and used in [59]). This procedure tries to escape a local minimum by unfolding the candidate solutions when they are trapped in a local minimum.

In [59], one of the key features of the improved ant colony optimization method is the Long Range Move (LRM). This local search, as noted by the authors, mimics the folding process of real proteins, where a moving residue will typically push its neighbors in the chain to different positions. The first step of the procedure selects a direction in a given conformation uniformly at random. The second step of the procedure randomly changes the direction, and then modifies the directions of the remaining residues probabilistically. The local search starts a “chain reaction” that loops until a self-avoiding path condition holds. In practice, this procedure allows a given conformation to make fold and unfold moves to escape local minima in the multiple-minima funnel landscape. Since the above cited procedure is obviously time consuming, as in [59], we apply it only to the best conformation in the current population of the algorithm.
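As a rough illustration of how the SR, AES, b.f., mean, and standard deviation columns of Tables IV and V could be derived from raw run records, the sketch below summarizes a batch of independent runs. It is not taken from the paper: the function name `summarize_runs` and the record layout are hypothetical, SR is assumed to be a percentage, AES is assumed to be averaged over successful runs only, and the population (not sample) standard deviation is assumed.

```python
from statistics import mean, pstdev


def summarize_runs(runs, best_known):
    """Summarize independent runs of a randomized folding algorithm.

    runs: list of (best_energy, evaluations_used) pairs, one per run
          (hypothetical record layout, assumed for this sketch).
    best_known: best-known energy of the instance (energies are <= 0).
    """
    # A run counts as a success if it reaches the best-known energy.
    successful_evals = [evals for energy, evals in runs if energy <= best_known]
    energies = [energy for energy, _ in runs]
    return {
        # Success rate, assumed to be expressed as a percentage of runs.
        "SR": 100.0 * len(successful_evals) / len(runs),
        # Average number of evaluations to solution, over successful runs only;
        # undefined (None) when no run succeeds.
        "AES": mean(successful_evals) if successful_evals else None,
        "best_found": min(energies),
        "mean": mean(energies),
        "sigma": pstdev(energies),
    }


if __name__ == "__main__":
    # Made-up run records, for illustration only.
    example = [(-9, 1000), (-9, 3000), (-8, 5000), (-9, 2000)]
    print(summarize_runs(example, best_known=-9))
```

When no run reaches the best-known energy, AES is left undefined, which matches the need for a fallback ordering (best found conformation, mean, standard deviation) in that case.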
TABLE VI
RESULTS OF THE IA WITH LRM FOR THE 2-D HP MODEL, PROTEIN INSTANCES 5, 7, AND 8

In Table VI, we report the results of the IA using the long range move (IA WITH LRM) for the hard instances, protein sequences 5, 7, and 8. By inspecting the table, we note that for instance 5 the IA obtains poor results, while for sequence 7, although the IA does not reach the best-known energy value (-36), the algorithm with the long range move always reaches the suboptimal energy value -35, which is the best result obtained for the IA. It is worth noting that, for the longest protein sequence, the IA with LRM reaches the best-known energy value -42 with mean -39.2 and .

A. Comparison With State-of-Art Folding Algorithms

In this section, we briefly present related works for the 2-D HP model. We present this here, as it allows us to show comparative results of our proposed approach and the current state-of-art algorithms that are used on this problem.

The first application of EAs on the HP model was in [44]: a GA is employed, and conformations are changed by a mutation operator that follows the conventional Monte Carlo steps (MC mutation operator), and by a crossover operator. The authors found that the GA is superior to conventional Monte Carlo methods (MC, LONG MC, and MULTIPLE MC).

In [60], there is an improved version of the simple GA (SGA) using a new crossover operator (systematic crossover), the SGA-S: it couples the best candidate solutions, tests every possible crossover point, and selects the two best conformations for the next generation. In addition, the authors implemented a new search strategy, the SGA with systematic crossover and pioneer search, SGA-SC-P, which tries to prevent the population from becoming too homogeneous by moving to a new search region every ten generations.

A famous metaheuristic for combinatorial optimization is the Memetic Algorithm. Memetic algorithms are EAs that include a local search procedure in the evolutionary cycle [61]. The Multimeme algorithm (MMA) [41], [61] is a memetic algorithm that self-adaptively selects, from a set of local searchers, which heuristic to use during the search process for different instances. The MMA has been used on different protein structure models with results competitive with other techniques [41].

To improve the performance of the GA, in [62] the authors proposed a hybrid algorithm of GA and tabu search (TS) and a novel crossover operator borrowed from the TS. The introduction of the TS approach improves the overall performance of the GA and shows that, in all the instances, the hybrid algorithm GTS works better than a GA alone.

Another Monte Carlo method is the EMC [63], which works with populations of candidate solutions that are optimized by Monte Carlo simulation. This hybrid algorithm found the best-known structure for protein sequence 8, with energy -42.

The contact interactions algorithm, CI, may be regarded as an extension of the standard MC method which is improved by the strategy of cooperativity. The major innovation of the CI approach [45] is that the criteria for acceptance of new conformations are not based on the energy of the entire protein, but on the fact that cooling factors associated with each residue define regions of the model protein with higher or lower mobility. Hence, the CI is not a blind general purpose algorithm; it uses a heuristic based on the effects of an H-H contact on the mobility of the residues in different portions of a protein. The CI algorithms, with a fixed starting temperature of (CI, ), or with different starting temperatures (CI), proved to be very efficient at localizing energy minima.

Among the best folding algorithms there is the tabu search strategy [65]. This incorporates problem domain knowledge into the algorithm, for instance the conformational motifs, during the search process for finding low energy conformations.

The core-directed chain growth (CG) [64] is a very efficient algorithm that has found optimal and best-known conformations for protein instances 1–6 and 8. It is an ad hoc heuristic that approximates the hydrophobic core of discrete proteins.

The EDA [58] is a suitable class of nondeterministic search procedures for the HP model. EDA constructs an explicit probability model of the selected candidate solutions and captures relevant interactions among the variables of the given protein instance. The experimental results have proven the effectiveness of the EDA approach on lattice models, for the standard HP benchmarks and the functional model proteins.

The state-of-art algorithm for the 2-D HP problem is an improved ant colony optimization (ACO) algorithm, the improved ACO (IACO) [59]. The improvements over the previous ACO algorithm [67] are the following: long range moves for chain reconfigurations when the protein conformation is very compact (described in Section VI); improving ants that take the global best solution found so far and apply a randomized greedy local search to it; and selective local search that performs the critical operation of the local search phase only on promising low energy conformations. Moreover, in [59], the authors report poor performance of a local search procedure (ONLY L), modest performance of an improved GA (IMPROVED GA), and good performance of the Pruned Enriched Rosenbluth Method (PERM) [66]. The algorithm of Grassberger and co-workers is a Monte Carlo method which is among the best-known algorithms for the 2-D HP model, and is a biased chain growth algorithm. PERM found the best solution for the protein sequence No. 7, with energy -36.

In Table VII, we report the comparisons with the state-of-art algorithms for the 2-D HP model. The reported energy values are the lowest obtained by each method. Gaps in the table indicate that a particular algorithm has not been tested on the respective protein sequence. The results shown suggest that the proposed IA using the aging operator and memory B cells, and the IA with LRM, are comparable to and, in many protein instances, outperform the best algorithms.
TABLE VII
IA VERSUS THE STATE-OF-ART ALGORITHMS FOR THE 2-D HP MODEL

VII. RESULTS FOR THE 3-D HP MODEL

The protein structure prediction problem with , and making use of a square lattice, captures the protein folding problem in the 2-D HP model [39]. Analogously, for and using a cubic lattice, we have the 3-D HP model [39].

In the 3-D cubic lattice, each point has six different neighbors and five available locations. We use two different schemes of moves (absolute and relative directions) to represent and embed a protein in the lattice. The relative and absolute encodings were described in Section III-B: in the relative encoding, the residue directions are relative to the direction of the previous move, whilst in the absolute encoding, the residue directions are relative to the axes defined by the lattice.

Both for the absolute and relative coding, not all moves provide a feasible conformation. In our work, we force the self-avoidance constraint such that each set of moves will correspond to a feasible sequence (feasible conformation).

TABLE VIII
RESULTS OF THE IA FOR THE 3-D HP MODEL

By inspecting the experimental results for all the considered instances, it is worth noting that the IA (working with feasible solutions) locates the known minimum value. For all instances, the located mean value is lower than the results obtained in [68], where EAs working on feasible space were employed. For several sequences presented in [69] (as shown in Table VIII), we have found new best (lowest) energy values, namely for the 3-D protein sequences 5, 7, and 8 (results reported in bold face in Table VIII).

For our experiments, the IA was set with standard parameter values: , , as described in [51], B cells have the aging parameter and memory B cells . For the experimental protocol, we adopted the same values used in [69]: 50 independent runs and a maximum number of evaluations equal to 10 . In [68], the author does not use the SR and AES values as quality metrics, but the following parameters: best found solution (Best), mean and standard deviation .

In addition, we designed an IA which made use of a penalty strategy and a repair-based approach as reported in [68], which obtained similar experimental results to [68] (not shown). Such an IA proved to be very efficient for both absolute and relative encoding, and allowed us to find energy minima not found by other EAs working in feasible space and described in the literature [68].

VIII. RESULTS FOR THE FUNCTIONAL MODEL PROTEINS

TABLE IX
2-D SQUARE LATTICE FUNCTIONAL MODEL INSTANCES [43], WHERE EACH PROTEIN SEQUENCE HAS 23 MONOMERS

Table IX shows the benchmarks for the functional model proteins on the 2-D square lattice [41], [43]: the instance number, protein sequence, optimal conformation, and the minimal energy values. Each instance of the benchmarks (see footnote 2) has a unique native-fold conformation with minimal energy value, and an energy gap between this value and the first excited state (best suboptimal). In Fig. 8, we report the native fold of all the protein sequences; each native fold has at least one binding site, or binding pocket (illustrated in the figure by arrows pointing to the binding site(s) of each functional model instance).

Footnote 2: http://www.cs.nott.ac.uk/~nxk/HP-PDB/2dfmp.html

Fig. 8. Native fold for 2-D square lattice functional model instances.

TABLE X
ELITIST AGING VERSUS PURE AGING IN THE FUNCTIONAL MODEL PROTEINS IN TERMS OF (SUCCESS RATE, AVERAGE NUMBER OF EVALUATIONS TO SOLUTION)

In the first experiment, we compared the performance of the IA with and without elitist aging (see Table X) using the standard parameter values: , , , . In this section, all experimental results reported were obtained by the IA with inversely proportional hypermutation and hypermacromutation operators. All the values are averaged over 30 independent runs.

Analogously to the HP model instances, in the functional model proteins the pure aging procedure outperforms the elitist aging in terms of SR and AES on all functional model instances. The IA with pure aging obtains a for all the instances excluding the functional protein sequence 3, where the algorithm reaches . This confirms the optimal searching ability and diversity generation of the pure aging strategy.

In Table XI, we report the experimental results obtained when using memory B cells. As described previously, we partition the energy levels of the funnel landscape. For example, sequences 1 and 4 have minimal energy -20; it follows that there are 21 energy levels, thus all the B cells with energy value in the range or with energy will have life span equal to , while the B cells with energy will be considered as memory B cells with a life span of .

Table XI presents the best experimental results obtained using the aging values , , and ; the best values of SR and AES are shown in bold face. From these results, it is clear that , and appear to be the optimal choice for partitioning the funnel landscapes of the protein instances.

When comparing the results obtained by the IA adopting a pure aging strategy with those of the IA using memory B cells with , and , the algorithm performs slightly better without the memory B cells. We believe this is due to the length of the protein sequences of the functional model proteins, which have only 23 residues. Hence, for short protein sequences, the IA without memory B cells performs better than the IA with memory B cells, both for the HP model and for the functional model proteins.
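The partitioning of the energy levels into ordinary and memory B cells can be sketched as follows. The exact partition boundaries are not recoverable from the text here, so this Python example simply assumes an equal-thirds split of the energy levels 0, -1, ..., E* (for E* = -20, 21 levels in three bands of seven). The function names, the dictionary layout of a B cell, the rule that a cell is removed once its age exceeds its life span, and the values 5 and 10 used in the usage lines are likewise assumptions made for illustration only.

```python
def life_span(energy, native_energy, tau_b, tau_bm):
    """Return the life span of a B cell from its energy.

    Assumes energy levels 0, -1, ..., native_energy split into equal thirds:
    top and bottom thirds get tau_b, the rugged middle third gets tau_bm.
    """
    levels = -native_energy + 1        # e.g. 21 levels when native_energy = -20
    level = -energy                    # 0 is the top of the funnel
    first_boundary = levels // 3       # assumed boundary of the first partition
    second_boundary = 2 * levels // 3  # assumed boundary of the second partition
    if first_boundary <= level < second_boundary:
        return tau_bm                  # memory B cell (middle region)
    return tau_b                       # ordinary B cell (top or bottom region)


def age_population(population, native_energy, tau_b, tau_bm):
    """Apply one step of (pure) aging to a population.

    population: list of dicts with keys 'energy' and 'age' (hypothetical layout).
    Every cell gets one generation older; cells older than their life span
    are removed (assumed removal rule).
    """
    survivors = []
    for cell in population:
        aged = dict(cell, age=cell["age"] + 1)
        if aged["age"] <= life_span(aged["energy"], native_energy, tau_b, tau_bm):
            survivors.append(aged)
    return survivors


if __name__ == "__main__":
    cells = [{"energy": -3, "age": 5}, {"energy": -10, "age": 5}, {"energy": -18, "age": 2}]
    # Illustrative life spans only: tau_b = 5, tau_bm = 10.
    print(age_population(cells, native_energy=-20, tau_b=5, tau_bm=10))
```

In this sketch the middle-region cell survives an aging step that removes an equally old top-region cell, which is the intended effect of letting conformations in the rugged middle region mature for longer.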
TABLE XI
IA PERFORMANCES USING MEMORY B CELLS IN TERMS OF (SUCCESS RATE, AVERAGE NUMBER OF EVALUATIONS TO SOLUTION) FOR VARIOUS PAIRS OF (τ_B, τ_Bm)

TABLE XII
COMPARISON OF BEST RUNS FOR MMA [41], EDAS [58], AND IA, WITH AND WITHOUT MEMORY B CELLS FOR THE FUNCTIONAL MODEL PROTEINS

However, for long protein sequences, partitioning the funnel landscape with the memory B cells appears to be a good strategy for effectively searching the rugged landscape in the middle of the funnel.

Finally, in Table XII, we present the number of energy evaluations required by the best run to locate the optimum or a suboptimum energy value. We compare the performance of the IA with and without memory B cells with the state-of-art algorithms for the functional model proteins: the MMA [41] and the EDAs [58]. Both versions of the IA outperform the MMA and EDA on the whole test bed except for protein instance 3, where EDA reaches a lower number of energy evaluations. In particular, the IA without memory B cells obtains the best results on 9 instances out of 11.

We propose that more effective metrics to assess the overall performance are the success rate values and the average number of evaluations to solution. The number of fitness function evaluations required by the best run for a given instance is less significant for showing the overall performance of randomized algorithms.

IX. IA VERSUS EAs

It is worth taking a little time to highlight the contribution of our algorithms to the artificial immune systems discipline. The proposed IA makes use of a new hypermutation operator, the hypermacromutation operator, that extends the region of the parameter surface with high SR values, and in particular improves the region where the inversely proportional hypermutation operator alone performed poorly. This simple random process, which does not use functions dependent upon constant parameters, improves the overall performance of the IA.

The second innovation of the IA is the aging operator, which is used to generate and maintain diversity in the population. As shown in the plots and in the tables, the aging operator and memory B cells with a longer life span are the key features of the proposed approach that we feel can be inserted in any EA. In fact, as a selection operator, the aging operator is a general, problem- and algorithm-independent operator.

Obviously, the implemented algorithm can be applied to any other combinatorial and numerical optimization problem apart from the protein structure prediction problem, using suitable representations and variation operators [4], [9], [18], [69].

A. IA Versus Other Clonal Selection Algorithms

A well-known clonal selection algorithm in the AIS literature is CLONALG [8], [69]. This algorithm employs fitness values for proportional cloning, inversely proportional hypermutation, and a birth operator to introduce diversity in the current population, along with a mutation rate to flip a bit of a B cell. Extended versions of this algorithm use a threshold value to clone the best cells in the present population.

CLONALG maintains two populations: a population of antigens and a population of antibodies (indicated with Ag and Ab). The individual antibody, Ab, and antigen, Ag, are represented by string attributes , that is, a point in an L-dimensional real-valued shape space S, .

The Ab population is the set of current candidate solutions, and the Ag is the environment to be recognized. The algorithm loops for a predefined maximum number of generations .

In the first step, affinity values (fitness function values) are determined for all Ab in relation to the Ag. Then, it selects Ab that are to be cloned independently and proportionally to their antigenic affinities, thus generating the clone population . The higher the affinity, the higher the number of clones generated for each of the Ab, with respect to the following function:

N_c = \sum_{i=1}^{n} \mathrm{round}\left( \frac{\beta N}{i} \right) \qquad (6)
where β is a multiplying factor. Each term of the sum corresponds to the clone size of each Ab. The hypermutation operator performs an affinity maturation process inversely proportional to the fitness values, generating the matured clone population . After having computed the antigenic affinity of the population , CLONALG randomly creates new antibodies that will replace the lowest fit Ab in the current population.

Clearly, both CLONALG and the IA are inspired by the clonal selection principle. Hence, they have many similarities, but there are some significant differences (for a comparative study see [69]). We begin with the common features. Both algorithms employ a population of Ag’s to represent the input, whilst the population of immune entities (Ab’s or B cells) are the candidate solutions to the given computational problem. Both algorithms use hypermutation operators inversely proportional to the fitness values.

Considering their differences, in the IA the cloning operator selects all the immune entities for cloning, while in CLONALG it is possible to use a threshold value to clone only the best cells in the current population. In CLONALG, the cloning operator is proportional to the fitness values, depending on a multiplying factor β (see (6)). However, in the IA, the cloning operator makes use of static cloning: each immune entity will produce dup clones. Hence, CLONALG uses proportional cloning, and the IA uses static cloning. The underlying notion of employing static cloning is to give each point of the given search space equal opportunity to explore its neighborhood; proportional cloning provides a bias to each point of the search space based on its fitness function value. This bias could be useful or not, depending of course on the computational problem being addressed.

To produce diversity in the population, at each generation CLONALG uses a birth operator which introduces new immune entities. The IA, however, uses an aging operator modeled by an expected life time parameter τ_B. Additionally, CLONALG uses memory B cells as an implicit “memory mechanism,” an archive of the best candidate solutions, while the IA version for the partition of the landscape uses memory B cells with a longer mean life parameter τ_Bm, with τ_Bm greater than the mean life τ_B of standard B cells, to allow a search of the rugged regions of the landscape.

The selection scheme used by CLONALG to decide which immune entities will go to the next generation uses elitism, while the IA uses a standard (μ + λ)-selection operator without an elitist strategy.

Finally, as termination condition, CLONALG typically uses a fixed number of generations, while the IA can use three termination conditions: a fixed number of generations, a maximum number of fitness function evaluations, or the maximum information-gain principle [19]. Though the last condition does not avoid the possibility of being trapped in local minima, this termination condition measures the quantity of information the IA discovers during the convergence process.

X. CONCLUSION

This paper has proposed a novel IA for the protein structure prediction problem. Within this paper, we have assessed the overall performance of the IA in terms of solution quality and average number of evaluations to solution (a metric that we feel is more robust than run-time values). We have made use of various discrete protein structure models, and have compared the results with the present state-of-art algorithms when applied to each model.

As future work, we intend to tackle the prediction of 3-D structures for actual proteins [70] using the designed IA. As in [70], we plan to consider the parallelization of the folding algorithm in order to reduce execution time and resource expenditure.

The IA uses a new hypermutation operator, the hypermacromutation operator, which does not use functions depending upon constant parameters, and extends the region of the parameter surface with high success rate values. This operator contributes to the overall improvement in terms of performance of the IA.

A second innovation of the IA is the aging operator, which is used to generate and maintain diversity in the population. As demonstrated by our results, the aging operator and memory B cells with a longer life span are the key features of the proposed approach, which we propose could be inserted in any EA.

With regard to the actual computational results on the PSP problem, we found that for short protein sequences the IA without memory B cells performs better than the IA with memory B cells, both for the HP model and for the functional model proteins. For long protein sequences, partitioning the funnel landscape with the memory B cells is a good strategy to search more effectively the rugged landscape in the middle of the funnel.

ACKNOWLEDGMENT

The authors would like to express their gratitude to the anonymous reviewers for their helpful comments on the manuscript. G. Nicosia would like to thank the Computing Laboratory, University of Kent, Canterbury, U.K., for their kind support.

REFERENCES

[1] D. Dasgupta, Artificial Immune Systems and Their Applications. Berlin, Germany: Springer-Verlag, 1999.
[2] L. N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Paradigm. London, U.K.: Springer-Verlag, 2002.
[3] Y. Ishida, Immunity-Based Systems: A Design Perspective. Heidelberg, Germany: Springer-Verlag, 2004.
[4] V. Cutello and G. Nicosia, “The clonal selection principle for in silico and in vitro computing,” in Recent Developments in Biologically Inspired Computing, L. N. de Castro and F. J. von Zuben, Eds. Hershey, PA: Idea Group Publishing, 2004.
[5] J. E. Hunt and D. E. Cooke, “Learning using an artificial immune system,” J. Netw. Comput. Appl., vol. 19, pp. 189–212, 1996.
[6] G. Nicosia, F. Castiglione, and S. Motta, “Pattern recognition by primary and secondary response of an artificial immune system,” Theory in Biosciences, vol. 120, no. 2, pp. 93–106, 2001.
[7] T. Fukuda and M. T. K. Mori, “Parallel search for multimodal function optimization with diversity and learning of immune algorithm,” in Artif. Immune Syst. Their Appl., D. Dasgupta, Ed. Berlin, Germany: Springer-Verlag, 1999.
[8] L. N. de Castro and F. J. V. Zuben, “Learning and optimization using the clonal selection principle,” IEEE Trans. Evol. Comput., vol. 6, pp. 239–251, Jun. 2002.
[9] G. Nicosia, “Immune algorithms for optimization and protein structure prediction,” Ph.D. dissertation, Dept. Math. Comput. Sci., Univ. Catania, Catania, Italy, 2004.
[10] G. B. Bezerra, L. N. de Castro, and F. J. V. Zuben, “A hierarchical immune network applied to gene expression data,” in Proc. 3rd Int. Conf. Artif. Immune Syst., G. Nicosia, V. Cutello, P. Bentley, and J. Timmis, Eds., Catania, Italy, Sep. 2004, pp. 14–27.
[11] V. Cutello, G. Narzisi, and G. Nicosia, “A multi-objective evolutionary approach to the protein structure prediction problem,” J. Royal Soc. Interface, vol. 3, no. 6, pp. 139–151, Feb. 2006.
[12] S. Ichikawa, A. Ishiguro, S. Kuboshiki, and Y. Uchikawa, “A method of gait coordination of hexapod robots using immune networks,” J. Artif. Life Robotics, vol. 2, pp. 19–23, 1998.
[13] S. Singh and S. Thayer, “Kilorobot search and rescue using an immunologically inspired approach,” in Distributed Autonomous Robotic Systems. Berlin, Germany: Springer-Verlag, June 2002, vol. 5.
[14] L. Kesheng, Z. Jun, C. Xianbin, and W. Xufa, “An algorithm based on immune principle adopted in controlling behavior of autonomous mobile robots,” Comput. Eng. Appl., vol. 5, pp. 30–32, 2000.
[15] D. Dasgupta, “An artificial immune system as a multiagent decision support system,” in Proc. IEEE Int. Conf. Syst., Man, Cybern., San Diego, CA, Oct. 1998, pp. 3816–3820.
[16] J. Kim and P. Bentley, “Towards an artificial immune system for network intrusion detection: An investigation of clonal selection with negative selection operator,” in Proc. IEEE Int. Congr. Evol. Comput., Seoul, Korea, May 2001, pp. 1244–1252.
[17] D. Dasgupta and F. A. Gonzalez, “An immunity-based technique to characterize intrusions in computer networks,” IEEE Trans. Evol. Comput., vol. 6, pp. 281–291, Jun. 2002.
[18] V. Cutello and G. Nicosia, “An immunological approach to combinatorial optimization problems,” in Proc. 8th Ibero-American Conf. Artif. Intell., Seville, Spain, Nov. 2002, pp. 361–370.
[19] V. Cutello, G. Nicosia, and M. Pavone, “A hybrid immune algorithm with information gain for the graph coloring problem,” in Proc. LNCS on Genetic and Evol. Comput. Conf., Chicago, IL, Jul. 2003, vol. 2723, pp. 171–182.
[20] E. Hart and P. Ross, “The evolution and analysis of a potential antibody library for use in job-shop scheduling,” in New Ideas in Optimization, D. Corne, M. Dorigo, and F. Glover, Eds. London, U.K.: McGraw-Hill, 1999.
[21] D. Dasgupta and N. S. Majumdar, “Anomaly detection in multidimensional data using negative selection algorithm,” in Proc. IEEE World Congr. Comput. Intell., Congr. Evol. Comput., Honolulu, HI, May 2002, pp. 1039–1044.
[22] D. Bradley and A. M. Tyrrell, “Hardware fault tolerance: An immunological solution,” in Proc. IEEE Int. Conf. Syst., Man, Cybern., Nashville, TN, Oct. 2000, pp. 107–112.
[23] P. J. C. Branco, J. A. Dente, and R. V. Mendes, “Using immunology principles for fault detection,” IEEE Trans. Ind. Electron., vol. 50, no. 2, pp. 362–373, 2003.
[24] S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri, “Self-nonself discrimination in a computer,” in Proc. IEEE Symp. Research in Security and Privacy, Oakland, CA, May 1994, pp. 202–212.
[25] D. Dasgupta, Y. Cao, and C. Yang, “An immunogenetic approach to spectra recognition,” in Proc. Int. Conf. Genetic and Evol. Comput. Conf., Orlando, FL, Jul. 1999, pp. 149–155.
[26] J. Timmis and M. Neal, “A resource limited artificial immune system for data analysis,” Knowledge Based Systems, vol. 14, no. 3–4, pp. 121–130, 2001.
[27] S. Hedberg, “Combating computer viruses: IBM’s new computer immune system,” IEEE Parallel and Distributed Technology: Systems Applications, vol. 4, no. 2, pp. 9–11, 1996.
[28] G. Nicosia, V. Cutello, P. Bentley, and J. Timmis, in Proc. 3rd Int. Conf. Artif. Immune Syst., 2004.
[29] S. Stepney, R. Smith, J. Timmis, and A. Tyrrell, “Towards a conceptual framework for artificial immune systems,” in Proc. LNCS on Artif. Immune Syst., G. Nicosia, V. Cutello, P. Bentley, and J. Timmis, Eds., Catania, Italy, Sep. 2004, vol. 3239, pp. 53–64.
[30] A. Freitas and J. Timmis, “Revisiting the foundations of artificial immune systems: A problem oriented perspective,” in Proc. LNCS on Artificial Immune Syst., J. Timmis, P. Bentley, and E. Hart, Eds., Edinburgh, U.K., Sep. 2003, vol. 2787, pp. 229–241.
[31] M. Levitt, “Protein folding by restrained energy minimization and molecular dynamics,” J. Mol. Biol., vol. 170, pp. 723–764, 1983.
[32] D. G. Covell, “Folding protein α-carbon chains into compact forms by Monte Carlo methods,” Proteins: Struct. Funct. Genet., vol. 14, no. 4, pp. 409–420, 1992.
[33] E. Alm and D. Baker, “Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures,” in Proc. Nat. Acad. Sci. USA, 1999, vol. 96, no. 20, pp. 11305–11310.
[34] V. Muñoz and W. A. Eaton, “A simple model for calculating the kinetics of protein folding from three dimensional structures,” in Proc. Natl. Acad. Sci. USA, 1999, vol. 96, no. 20, pp. 11311–11316.
[35] M. A. Apaydin, D. L. Brutlag, C. Guestrin, D. Hsu, and J.-C. Latombe, “Stochastic roadmap simulation: An efficient representation and algorithm for analyzing molecular motion,” in Proc. Annu. Int. Conf. Comput. Molecular Biol., Washington, DC, Apr. 2002, pp. 12–21.
[36] N. M. Amato, K. A. Dill, and G. Song, “Using motion planning to map protein folding landscapes and analyze folding kinetics of known native structures,” J. Comp. Biol., vol. 10, no. 3, pp. 239–255, 2003.
[37] K. F. Lau and K. A. Dill, “A lattice statistical mechanics model of the conformational and sequence spaces of proteins,” Macromolecules, vol. 22, pp. 3986–3997, 1989.
[38] K. A. Dill, S. Bromberg, K. Yue, K. M. Fiebig, D. P. Thomas, and H. S. Chan, “Principles of protein folding: A perspective from simple exact models,” Protein Science, vol. 4, pp. 561–602, 1995.
[39] K. A. Dill, “Theory for the folding and stability of globular proteins,” Biochemistry, vol. 24, no. 6, pp. 1501–1509, 1985.
[40] B. P. Blackburne and J. D. Hirst, “Evolution of functional model proteins,” J. Chem. Phys., vol. 115, no. 4, pp. 1935–1942, 2001.
[41] N. Krasnogor, B. P. Blackburne, E. K. Burke, and J. D. Hirst, “Multimeme algorithms for protein structure prediction,” in Proc. Int. Conf. Parallel Problem Solving from Nature (PPSN VII), Granada, Spain, Sep. 2002, pp. 769–778.
[42] N. Krasnogor, “Towards robust memetic algorithms,” in Recent Advances in Memetic Algorithms, W. E. Hart, N. Krasnogor, and J. E. Smith, Eds. Berlin, Germany: Springer, 2004.
[43] J. D. Hirst, “The evolutionary landscape of functional model proteins,” Protein Engineering, vol. 12, no. 9, pp. 721–726, 1999.
[44] R. Unger and J. Moult, “Genetic algorithms for protein folding simulations,” J. Mol. Biol., vol. 231, no. 1, pp. 75–81, 1993.
[45] L. Toma and S. Toma, “Contact interactions method: A new algorithm for protein folding simulations,” Protein Science, vol. 5, pp. 147–153, 1996.
[46] P. Crescenzi, D. Goldman, C. Papadimitriou, A. Piccolboni, and M. Yannakakis, “On the complexity of protein folding,” J. Comp. Biol., vol. 5, no. 3, pp. 423–466, 1998.
[47] B. Berger and T. Leighton, “Protein folding in the hydrophobic-hydrophilic model is NP complete,” J. Comp. Biol., vol. 5, pp. 27–40, 1998.
[48] H. S. Chan and K. A. Dill, “Comparing folding codes for proteins and polymers,” Proteins: Struct., Funct., Genet., vol. 24, pp. 335–344, 1996.
[49] N. Krasnogor, W. E. Hart, J. Smith, and D. A. Pelta, “Protein structure prediction with evolutionary algorithms,” in Proc. Genetic Evol. Comput. Conf., Orlando, FL, Jul. 1999, pp. 1596–1601.
[50] F. M. Burnet, The Clonal Selection Theory of Acquired Immunity. Cambridge, U.K.: Cambridge Univ. Press, 1959.
[51] V. Cutello, G. Nicosia, and M. Pavone, “Exploring the capability of immune algorithms: A characterization of hypermutation operators,” in Proc. 3rd Int. Conf. Artif. Immune Syst., G. Nicosia, V. Cutello, P. Bentley, and J. Timmis, Eds., Catania, Italy, Sep. 2004, pp. 263–276.
[52] V. Cutello, G. Nicosia, and M. Pavone, “An immune algorithm with hyper-macromutations for the 2D hydrophilic-hydrophobic model,” in Proc. Congr. Evol. Comput., Portland, OR, Jun. 2004, pp. 1074–1080.
[53] N. Krasnogor, “Studies on the theory and design space of memetic algorithms,” Ph.D. dissertation, Univ. West of England, Bristol, U.K., 2002.
[54] P. E. Seiden and F. Celada, “A model for simulating cognate recognition and response in the immune system,” J. Theor. Biology, vol. 158, pp. 329–357, 1992.
[55] H. P. Schwefel and G. Rudolph, “Contemporary evolution strategies,” in Proc. 3rd Int. Conf. Advances in Artif. Life, F. Morán, A. Moreno, J. J. Merelo, and P. Chacón, Eds., 1995, pp. 893–907.
[56] S. S. Plotkin and J. N. Onuchic, “Understanding protein folding with energy landscape theory,” Quart. Rev. Biophy., vol. 35, no. 2, pp. 111–167, 2002.
[57] D. A. Pelta and N. Krasnogor, “Multimeme algorithms using fuzzy logic based memes for protein structure prediction,” in Recent Advances in Memetic Algorithms, W. E. Hart, N. Krasnogor, and J. E. Smith, Eds. Berlin, Germany: Springer-Verlag, 2004.
[58] R. Santana, P. Larrañaga, and J. A. Lozano, “Protein folding in 2-dimensional lattices with estimation of distribution algorithms,” in Proc. 5th Int. Symp. Biol. Medical Data Analysis, Barcelona, Spain, Nov. 2004, pp. 388–398.
[59] A. Shmygelska and H. H. Hoos, “An improved ant colony optimization algorithm for the 2D HP protein folding problem,” in Proc. 16th Canad. Conf. Artif. Intell., Halifax, Canada, Jun. 2003, pp. 400–417.
[60] R. Konig and T. Dandekar, “Improving genetic algorithms for protein folding simulations by systematic crossover,” Biosystems, vol. 50, no. 1, pp. 17–25, 1999.
[61] N. Krasnogor and J. Smith, “A tutorial for competent memetic algorithms: Model, taxonomy, and design issues,” IEEE Trans. Evol. Comput., vol. 9, no. 5, pp. 474–488, Oct. 2005.
[62] T. Jiang, Q. Cui, G. Shi, and S. Ma, “Protein folding simulations of the hydrophobic-hydrophilic model by combining tabu search with genetic algorithms,” J. Chem. Phys., vol. 119, no. 8, pp. 4592–4596, 2003.
[63] F. Liang and W. H. Wong, “Evolutionary Monte Carlo for protein folding simulations,” J. Chem. Phys., vol. 115, no. 7, pp. 3374–3380, 2001.
[64] T. C. Beutler and K. A. Dill, “A fast conformational search strategy for finding low energy structures of model proteins,” Protein Science, vol. 5, no. 10, pp. 2037–2043, 1996.
[65] M. Milostan, P. Lukasiak, K. A. Dill, and J. Błażewicz, “A tabu search strategy for finding low energy structures of proteins in HP-model,” in Proc. Annu. Int. Conf. Comput. Molecular Biol., Berlin, Germany, Apr. 2003, Poster No. 5-108.
[66] H. Hsu, V. Mehra, W. Nadler, and P. Grassberger, “Growth algorithms for lattice heteropolymers at low temperatures,” J. Chem. Phys., vol. 118, no. 1, pp. 444–451, 2003.
[67] A. Shmygelska, R. Aguirre-Hernandez, and H. H. Hoos, “An ant colony optimization algorithm for the 2D HP protein folding problem,” in Proc. Int. Workshop Ant Algorithms, Brussels, Belgium, Sep. 2002, pp. 40–52.
[68] C. Cotta, “Protein structure prediction using evolutionary algorithms hybridized with backtracking,” in Artificial Neural Nets Problem Solving Methods, ser. Lecture Notes in Computer Science, J. Mira and J. Álvarez, Eds. Berlin, Germany: Springer-Verlag, 2003, vol. 2687, pp. 321–328.
[69] V. Cutello, G. Narzisi, G. Nicosia, and M. Pavone, “Clonal selection algorithms: A comparative case study using effective mutation potentials,” in Lecture Notes in Computer Science, Banff, Canada, Aug. 2005, vol. 3627, Proc. 4th Int. Conf. Artif. Immune Syst., pp. 13–28.
[70] D. A. V. Veldhuizen, J. B. Zydallis, and G. B. Lamont, “Considerations in engineering parallel multiobjective evolutionary algorithms,” IEEE Trans. Evol. Comput., vol. 7, no. 2, pp. 144–173, Apr. 2003.

Vincenzo Cutello received the Laurea degree and the Ph.D. degree in mathematics from the University of Catania, Catania, Italy, in 1984 and 1989, respectively, and the M.S. and Ph.D. degrees in computer science from the Courant Institute of Mathematical Sciences, New York University, New York, in 1989 and 1991, respectively.
Since 2001, he has been a Full Professor of Computer Science at the University of Catania. He is currently Chairman of the undergraduate program in Applied Computer Science, and Director of the Interdisciplinary Research Center on Applied Computer Science, University of Catania. He is author and coauthor of more than 100 papers in international journals and conference proceedings. His research activities are currently focused on the design and analysis of evolutionary algorithms, decision procedures, fuzzy logic, biologically inspired computing, and artificial immune systems.

Giuseppe Nicosia (M’01) received the Laurea degree and the Ph.D. degree in computer science from the University of Catania, Catania, Italy, in 2000 and 2005, respectively.
Since 2001, he has been a grant holder at the Cineca Supercomputing Center, Bologna, Italy, in the area of high-performance computing. In 2004, he was a Visiting Research Assistant in the Computing Laboratory, Kent University, Canterbury, Kent, U.K. Since October 2006, he has been an Associate Professor of Computer Science at the University of Catania. He is currently involved in the design and development of optimization algorithms for circuit design problems, a joint research project supported by the University of Catania and STMicroelectronics. He is coauthor of more than 40 papers in international journals and conference proceedings, and coeditor of three books on artificial immune systems. He has chaired various international conferences and workshops in the field of artificial immune systems. His primary research interests are the design and analysis of artificial immune systems and immune algorithms, optimization, protein bioinformatics, artificial life, and various aspects of unconventional models of computation. His current research interest lies in the design of hybrid evolutionary and immunological algorithms for dynamic environments and constrained multiobjective optimization problems.

Mario Pavone received the M.Sc. and Ph.D. degrees in computer science from the University of Catania, Catania, Italy, in 1999 and 2004, respectively.
He is currently visiting the IBM-KAIST Bio-Computing Research Center, Korea Advanced Institute of Science and Technology (KAIST), Korea, doing research in computer science. His research interests include biologically inspired computing, including artificial immune systems, combinatorial and numerical optimization, and computational biology, with particular reference to protein structure prediction, multiple sequence alignment, gene regulatory networks, and gene expression data.

Jonathan Timmis (M’02) is a Reader at the University of York, York, U.K., in a joint appointment with the Department of Computer Science and Department of Electronics. He has published over 60 papers on artificial immune system related research. He has worked on immune inspired approaches to real-time fault detection in ATM machines, machine learning, optimization, web mining, robot control, software testing, and theoretical aspects of immune inspired systems. His primary research interest is in the computational abilities of the immune, neural, and endocrine systems and how they relate to computer science and engineering.
Dr. Timmis is the cofounder of the International Conference on Artificial Immune Systems (ICARIS). He is principal investigator for the EPSRC academic network on artificial immune systems, ARTIST, and co-investigator on a new EPSRC funded project exploring the use of immunological modeling techniques for the development of novel immune inspired algorithms for bioinformatics (EP/D501377/1). He is heavily involved with the Grand Challenges for Computer Science in the U.K. and now serves on the GC-7 committee.