Markov Models
• A Markov model is a finite state machine with N distinct states that begins (at time t = 1) in some initial state.
• It moves from the current state to the next state according to the transition probabilities associated with the current state.
• Such a system is called a finite or discrete Markov model.
Markov Property
• Markov property: the current state of the system depends only on the previous state of the system.
• The state of the system at time t + 1 depends only on the state of the system at time t.
• X(t=1) → X(t=2) → X(t=3) → X(t=4) → X(t=5)
• Set of states: $\{s_1, s_2, \dots, s_N\}$
• The process moves from one state to another, generating a sequence of states $s_{i_1}, s_{i_2}, \dots, s_{i_k}, \dots$
• Markov chain property: the probability of each subsequent state depends only on the previous state:
$P(s_{i_k} \mid s_{i_1}, s_{i_2}, \dots, s_{i_{k-1}}) = P(s_{i_k} \mid s_{i_{k-1}})$
• To define a Markov model, the following probabilities have to be specified: transition probabilities $a_{ij} = P(s_i \mid s_j)$ and initial probabilities $\pi_i = P(s_i)$
Markov Models
[State diagram: two states, 'Rain' and 'Dry', with self-loops and cross-transitions labelled 0.3, 0.7, 0.2, 0.8]
• Two states : ‘Rain’ and ‘Dry’.
• Transition probabilities: P(‘Rain’|‘Rain’)=0.3 , P(‘Dry’|‘Rain’)=0.7 ,
P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8
• Initial probabilities: say P(‘Rain’)=0.4 , P(‘Dry’)=0.6 .
Example of Markov Model
• By the Markov chain property, the probability of a state sequence can be found by the following formula.
• Suppose we want to calculate the probability of a sequence of states in our example, {'Dry', 'Dry', 'Rain', 'Rain'}:
P({'Dry', 'Dry', 'Rain', 'Rain'}) =
P('Rain'|'Rain') P('Rain'|'Dry') P('Dry'|'Dry') P('Dry') =
= 0.3 * 0.2 * 0.8 * 0.6 = 0.0288
Calculation of sequence probability
$P(s_{i_1}, s_{i_2}, \dots, s_{i_k}) = P(s_{i_k} \mid s_{i_1}, s_{i_2}, \dots, s_{i_{k-1}}) \, P(s_{i_1}, s_{i_2}, \dots, s_{i_{k-1}})$
$= P(s_{i_k} \mid s_{i_{k-1}}) \, P(s_{i_1}, s_{i_2}, \dots, s_{i_{k-1}}) = \dots$
$= P(s_{i_k} \mid s_{i_{k-1}}) \, P(s_{i_{k-1}} \mid s_{i_{k-2}}) \cdots P(s_{i_2} \mid s_{i_1}) \, P(s_{i_1})$
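For concreteness, here is a minimal Python sketch of this chain-rule calculation for the Rain/Dry example above (the dictionary layout and function name are our own):

```python
# Probability of a state sequence in the Rain/Dry Markov model.
initial = {"Rain": 0.4, "Dry": 0.6}
transition = {  # transition[prev][curr] = P(curr | prev)
    "Rain": {"Rain": 0.3, "Dry": 0.7},
    "Dry":  {"Rain": 0.2, "Dry": 0.8},
}

def sequence_probability(states):
    """P(s1, ..., sk) = P(s1) * product of P(s_t | s_{t-1})."""
    p = initial[states[0]]
    for prev, curr in zip(states, states[1:]):
        p *= transition[prev][curr]
    return p

print(sequence_probability(["Dry", "Dry", "Rain", "Rain"]))  # 0.6*0.8*0.2*0.3 = 0.0288
```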
Discrete Markov processes
• Consider an example through which the various elements that constitute a discrete homogeneous
Markov process can be introduced.
• System and states
• Let us consider a highly simplified model of the different states a stock-market is in, in a given week. We assume that
there are only three possible states:
 S1 : Bull market trend
 S2 : Bear market trend
 S3 : Stagnant market trend
• Transition probabilities
 Week after week, the stock-market moves from one state to another state.
 From previous data, it has been estimated that there are certain probabilities associated with these movements.
 These probabilities are called transition probabilities.
• Markov assumption
 We assume that the following statement (called Markov assumption or Markov property) regarding transition
probabilities is true:
 Let the weeks be counted as 1, 2, . . . and let an arbitrary week be the t-th week.
 Then, the state in week t + 1 depends only on the state in week t, regardless of the states in the previous weeks.
 This corresponds to saying that, given the present state, the future is independent of the past.
• Homogeneity assumption
 To simplify the computations, we assume that the following property, called the homogeneity assumption, is also true.
 The probability that the stock market is in a particular state in a particular week t + 1 given that it is in a particular state in week t, is independent of
t.
• Representation of transition probabilities
 Let the probability that a bull week is followed by another bull week be 90%, by a bear week be 7.5%, and by a stagnant week be 2.5%.
 Similarly, let the probability that a bear week is followed by a bull week be 15%, by a bear week be 80%, and by a stagnant week be 5%.
 Finally, let the probability that a stagnant week is followed by a bull week be 25%, by a bear week be 25%, and by a stagnant week be 50%.
 The transition probabilities can be represented in two ways:
• (a) The states and the state transition probabilities can be represented diagrammatically as
• The state transition probabilities can also be represented by a matrix called the state transition matrix. Let us label the states as “1 = bull”, “2 = bear”
and “3 = stagnant” and consider the matrix. In this matrix, the element in the i-th row, j-th column represents the probability that the market in state
i is followed by market in state j.
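As a concrete rendering, here is that state transition matrix in NumPy (the variable name P follows the slides; the row-sum assertion is our own sanity check):

```python
import numpy as np

# State transition matrix for the stock-market example,
# with states ordered as 1 = bull, 2 = bear, 3 = stagnant.
# Entry (i, j) is the probability that state i is followed by state j.
P = np.array([
    [0.90, 0.075, 0.025],  # bull     -> bull, bear, stagnant
    [0.15, 0.80,  0.05 ],  # bear     -> bull, bear, stagnant
    [0.25, 0.25,  0.50 ],  # stagnant -> bull, bear, stagnant
])
assert np.allclose(P.sum(axis=1), 1.0)  # every row sums to 1
```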
Cont..
• Initial probabilities
• The initial probabilities are the probabilities that the stock-market is in a particular state
initially.
• These are denoted by π1, π2, π3: π1 is the probability that the stock-market is in bull state
initially; similarly, π2 and π3.
• The values of these probabilities can be presented as a vector:
• The discrete Markov process
• The functioning of the stock-market over the three states S1, S2, S3, with the assumption
that the Markov property is true, the transition probabilities given by the matrix P, and the
initial probabilities given by the vector Π, constitutes a discrete Markov process.
• Since we also assume the homogeneity property for the transition probabilities is true, it is a
homogeneous discrete Markov process.
Probabilities for future states
• The elements of the row vector Π^T P represent the probabilities that the
stock-market is in the bull state, the bear state and the stagnant state
respectively in the second week.
• In general, the elements of the row vector Π^T P^n represent the
probabilities that the stock-market is in the bull state, the bear state
and the stagnant state respectively in the (n + 1)th week.
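A short sketch of this computation, reusing the matrix P from the sketch above (the initial vector values here are illustrative, since the slides give Π only in a figure):

```python
import numpy as np
from numpy.linalg import matrix_power

pi = np.array([0.5, 0.25, 0.25])   # illustrative initial probabilities (pi1, pi2, pi3)
n = 4
probs = pi @ matrix_power(P, n)    # row vector Pi^T P^n
print(probs)                       # P(bull), P(bear), P(stagnant) in week n + 1
```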
Example-2
Discrete Markov processes: General case
• A Markov process is a random process indexed by time, and with the
property that the future is independent of the past, given the present.
• The time space may be discrete taking the values 1, 2, . . . or
continuous taking any nonnegative real number as a value.
• Here, we consider only discrete time Markov processes.
In this matrix, the element in the i-th row, j-th column
represents the probability that the system in state Si
moves to state Sj . Here, in the state transition matrix A,
the sum of the elements in every row is 1.
Cont..
Probability for an observation sequence
• Observable Markov model
 The discrete Markov process is also called an observable Markov model or observable discrete Markov process.
 It is so called because the state of the system at any time t can be directly observed.
 If the state of the system cannot be directly observed, the system is called a hidden Markov model.
• Probability for an observation sequence
 In an observable Markov model, the states are observable.
 At any time t we know qt, and as the system moves from one state to another, we get an observation sequence that
is a sequence of states.
 The output of the process is the set of states at each instant of time where each state corresponds to a physical
observable event.
 Let O be an arbitrary observation sequence of length T. Let us consider a particular observation sequence
Learning the parameters (A and Π)
• Consider a homogeneous discrete Markov process with transition
matrix A and initial probability vector Π, where A and Π are the
parameters of the process.
• The following procedure may be applied to learn these parameters (a maximum-likelihood sketch follows).
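The slides give the procedure in a figure; a standard maximum-likelihood sketch (our own, assuming fully observed state sequences) estimates Π from starting states and A from transition counts:

```python
from collections import Counter

def estimate_parameters(sequences, states):
    """pi_i = fraction of sequences that start in state i;
    a_ij = (# transitions i -> j) / (# transitions out of i)."""
    starts = Counter(seq[0] for seq in sequences)
    trans = Counter()
    for seq in sequences:
        trans.update(zip(seq, seq[1:]))           # count consecutive state pairs
    pi = {s: starts[s] / len(sequences) for s in states}
    A = {}
    for i in states:
        out = sum(trans[(i, j)] for j in states)  # total transitions leaving i
        A[i] = {j: trans[(i, j)] / out if out else 0.0 for j in states}
    return A, pi

A, pi = estimate_parameters([["Dry", "Dry", "Rain"], ["Rain", "Dry", "Dry"]],
                            states=["Rain", "Dry"])
```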
EXAMPLE
Hidden Markov models
• Hidden Markov Models (HMMs) are probabilistic models in which the Markov model underlying the data is hidden or unknown.
• More specifically, we only know the observational data, not information about the states.
• An HMM is determined by three model parameters:
• The start probabilities: a vector containing, for each state, the probability of it being the first state of the sequence.
• The state transition probabilities: a matrix consisting of the probabilities of transitioning from state Si to state Sj.
• The observation probabilities: the likelihood of a certain observation, y, if the model is in state Si.
Hidden Markov models
(Probabilistic finite state automata)
• There are scenarios where the states cannot be directly observed.
• For these we need an extension, i.e., Hidden Markov Models.
[Diagram: four states 1–4 with self-transition probabilities a11, a22, a33, a44, forward transitions a12, a23, a34, and observation probabilities b11, b12, b13, b14 from state 1]
• aij are state transition probabilities.
• bik are observation (output) probabilities.
• b11 + b12 + b13 + b14 = 1,
• b21 + b22 + b23 + b24 = 1.
Hidden Markov model recognition
• For a given model M = {A, B, π} and a given state sequence Q1 Q2 Q3 … QT, the probability of an observation sequence O1 O2 O3 … OT is
P(O|Q,M) = bQ1O1 bQ2O2 bQ3O3 … bQTOT
• For a given hidden Markov model M = {A, B, π}, the probability of a state sequence Q1 Q2 Q3 … QT (the initial probability of Q1 is taken to be πQ1) is
P(Q|M) = πQ1 aQ1Q2 aQ2Q3 aQ3Q4 … aQT-1QT
Hidden Markov model recognition
• So for a given HMM M, the probability of an observed sequence O1 O2 O3 … OT is obtained by summing over all possible state sequences Q:
P(O|M) = Σ_Q P(O|Q,M) P(Q|M), where
P(Q|M) = πQ1 aQ1Q2 aQ2Q3 aQ3Q4 … aQT-1QT
P(O|Q,M) = bQ1O1 bQ2O2 bQ3O3 … bQTOT
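A minimal brute-force sketch of this summation (dict-based model layout as in the earlier examples; exponential in the sequence length, so for illustration only):

```python
from itertools import product

def observation_probability(A, B, pi, states, obs):
    """P(O|M) = sum over all state sequences Q of P(Q|M) * P(O|Q,M)."""
    total = 0.0
    for Q in product(states, repeat=len(obs)):
        p = pi[Q[0]] * B[Q[0]][obs[0]]                # pi_Q1 * b_Q1O1
        for t in range(1, len(obs)):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][obs[t]]  # a_Q(t-1)Qt * b_QtOt
        total += p
    return total
```

In practice the forward algorithm computes the same quantity efficiently, without enumerating every state sequence.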
[Diagram: hidden states 'Low' and 'High' with transition probabilities 0.3, 0.7, 0.2, 0.8, each emitting the observations 'Rain' and 'Dry' with probabilities 0.6 and 0.4]
Example of Hidden Markov Model
Generalized Hidden Markov model (HMM):
Main issues using HMMs:
Solutions to basic problems
• Problem 1 is solved using the Forwards-Backwards algorithms.
• Problem 2 is solved by the Viterbi algorithm and posterior decoding.
• Finally, Problem 3 is solved by the Baum-Welch algorithm.
Solution to the decoding problem
• Decoding problem: Viterbi algorithm
• In this algorithm we go through the observations from start to end, inferring a state of the hidden machine for each observation.
• We also record the overall probability, the Viterbi path (the sequence of states), and the Viterbi probability (the probability of the observed state sequence along the Viterbi path).
• The probability of a possible step, given its corresponding observation, is the transition probability times the emission probability.
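A compact sketch of the Viterbi algorithm under the same dict-based model layout as above (function and variable names are our own):

```python
def viterbi(A, B, pi, states, obs):
    """Most likely hidden state path for an observation sequence."""
    # delta[s] = probability of the best path ending in state s so far
    delta = {s: pi[s] * B[s][obs[0]] for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            r = max(prev, key=lambda q: prev[q] * A[q][s])  # best predecessor
            ptr[s] = r
            delta[s] = prev[r] * A[r][s] * B[s][o]          # transition * emission
        back.append(ptr)
    last = max(delta, key=delta.get)       # best final state
    path = [last]
    for ptr in reversed(back):             # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return list(reversed(path)), delta[last]
```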
Applications of HMM
• Cryptanalysis
• Speech Recognition
• Pattern Recognition
• Activity Recognition
• Machine Translation
Support Vector Machine
• Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms, used for classification as well as regression problems.
• However, it is primarily used for classification problems in machine learning.
• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put a new data point in the correct category in the future.
• This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane.
• These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
SVM Example
• Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm.
• We first train our model with lots of images of cats and dogs so that it can learn the different features of cats and dogs, and then we test it with this strange creature.
• The SVM creates a decision boundary between these two classes of data (cat and dog) and chooses the extreme cases (support vectors), so it will see the extreme cases of cat and dog.
• On the basis of the support vectors, it will classify the creature as a cat.
• The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Example
Types of SVM
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called the Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called the Non-linear SVM classifier.
Hyperplane and Support Vectors in the
SVM algorithm
• Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-
dimensional space, but we need to find out the best decision boundary that helps to classify
the data points. This best boundary is known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features, the hyperplane will be a straight line, and if there are 3 features, the hyperplane will be a 2-dimensional plane.
• We always create the hyperplane that has the maximum margin, i.e., the maximum distance between the hyperplane and the nearest data points of either class.
• Support Vectors:
• The data points or vectors that are closest to the hyperplane and which affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
How does SVM work?
• Linear SVM:
• The working of the SVM algorithm can be
understood by using an example.
• Suppose we have a dataset that has two tags
(green and blue), and the dataset has two
features x1 and x2.
• We want a classifier that can classify the
pair(x1, x2) of coordinates in either green or
blue.
• Since it is a 2-D space, we can easily separate these two classes with just a straight line.
• But there can be multiple lines that can separate these classes. Consider image 2.
• Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane.
• The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors.
• The distance between the support vectors and the hyperplane is called the margin.
• The goal of SVM is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.
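A minimal scikit-learn sketch of this (assuming scikit-learn is installed; the toy data points are our own):

```python
import numpy as np
from sklearn import svm

# Toy dataset: two tags, features (x1, x2)
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = [0, 0, 0, 1, 1, 1]           # 0 = green, 1 = blue

clf = svm.SVC(kernel="linear")   # maximal-margin linear classifier
clf.fit(X, y)
print(clf.support_vectors_)      # the closest points that fix the hyperplane
print(clf.predict([[4, 4]]))     # classify a new (x1, x2) pair
```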
Cont..
• Non-Linear SVM:
• If data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line. Consider image 1.
• So to separate these data points, we need to add one more dimension. For linear data we have used the two dimensions x and y, so for non-linear data we will add a third dimension z, calculated as:
• z = x² + y²
• SVM will then divide the datasets into classes in the following way. Consider image 2.
• Since we are in 3-D space, it looks like a plane parallel to the x-axis. If we convert it to 2-D space with z = 1, it will become as shown in image 3.
• Hence we get a circumference of radius 1 in the case of non-linear data.
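A small sketch of this z = x² + y² trick (our own toy data; points inside the unit circle form one class):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(int)  # circle: not linearly separable in 2-D

Z = np.c_[X, X[:, 0]**2 + X[:, 1]**2]          # add the third dimension z = x^2 + y^2
clf = SVC(kernel="linear").fit(Z, y)           # now a plane separates the classes
print(clf.score(Z, y))                         # accuracy close to 1.0
```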
Formulation of the problem
The SVM classifier
• The solution of the SVM problem gives us a classifier for
classifying unclassified data instances. This is known as the
SVM classifier for a given dataset.
Midpoint = ½ [(2 + 4), (1 + 3)] = (3, 2)
Genetic algorithm
Introduction to Optimization
• Optimization is the process of making something better, based on a set of inputs and a set of outputs.
• Optimization refers to finding the values of inputs in such a way that we get the “best”
output values.
• The definition of “best” varies from problem to problem, but in mathematical terms, it
refers to maximizing or minimizing one or more objective functions, by varying the input
parameters.
• The set of all possible solutions or values which the inputs can take make up the search
space.
• In this search space, a point or a set of points lies which gives the optimal solution.
• The aim of optimization is to find that point or set of points in the search space.
Genetic Algorithms
• It is a Search-based optimization technique based on the principles of Genetics and Natural
Selection.
• Frequently used to find optimal or near-optimal solutions to difficult problems which otherwise
would take a lifetime to solve.
• Based on the evolutionary idea of natural selection and genetics.
• Designed to embody the principle of "survival of the fittest".
• Perform a randomized search to solve optimization problems.
• GAs use historical information from previous generations to direct their search toward better regions of the search space.
• GAs are adaptive heuristic search algorithms, i.e., the algorithms follow an iterative pattern that changes with time.
• They resemble reinforcement learning in that feedback is given without specifying the correct path to follow.
• The feedback can either be positive or negative.
What are Genetic Algorithms?
• "Genetics" is derived from the Greek word "genesis", which means to grow.
• Genetics decides the heredity factors, resemblances, and differences between offspring in the process of evolution.
• Genetic Algorithms are also derived from natural evolution.
• GAs are a subset of a much larger branch of computation known as Evolutionary Computation.
• Developed by John Holland and his students and colleagues at the University of Michigan.
• In GAs, we have a pool or a population of possible solutions to the given problem.
• These solutions then undergo recombination and mutation (like in natural genetics), producing new
children, and the process is repeated over various generations.
• Each individual (or candidate solution) is assigned a fitness value (based on its objective function
value) and the fitter individuals are given a higher chance to mate and yield more “fitter” individuals.
• This is in line with the Darwinian Theory of “Survival of the Fittest”.
• In this way we keep “evolving” better individuals or solutions over generations, till we reach a
stopping criterion.
Why Use Genetic Algorithms
• GAs are more robust algorithms that can be used for various
optimization problems.
• These algorithms do not deviate easily in the presence of noise,
unlike other AI algorithms.
• GAs can be used to search large or multimodal spaces.
Advantages of GAs
• Does not require any derivative information (which may not be
available for many real-world problems).
• Is faster and more efficient as compared to the traditional methods.
• Has very good parallel capabilities.
• Optimizes both continuous and discrete functions and also multi-
objective problems.
• Provides a list of “good” solutions and not just a single solution.
• Always gets an answer to the problem, which gets better over time.
• Useful when the search space is very large and there are a large
number of parameters involved.
Limitations of GAs
• GAs are not suited for all problems, especially problems which
are simple and for which derivative information is available.
• Fitness value is calculated repeatedly which might be
computationally expensive for some problems.
• Being stochastic, there are no guarantees on the optimality or
the quality of the solution.
• If not implemented properly, the GA may not converge to the
optimal solution.
Terminology In GA
• Population: It is a group of individuals. The population includes the number of individuals being tested, search space
information, and the phenotype parameters. Generally, the population is randomly initialized.
• Individuals: An individual is a single solution in the population. An individual has a set of parameters called genes. Genes combine to form chromosomes.
• Fitness: The fitness gives the quality of an individual's phenotype with respect to the problem.
• The fitness function tells how close the solution is to the optimal solution. Fitness function is determined in many ways such as the sum of all parameters
related to the problem – Euclidean distance, etc. There is no rule to evaluate fitness function.
• Genotype − A full combination of genes in an individual is called the genotype.
• Genotype is the population in the computation space.
• In the computation space, the solutions are represented in a way which can be easily understood and manipulated
using a computing system.
• Phenotype − A set of genotypes in a decoded form is called the phenotype. Phenotype is the population in the
actual real world solution space in which solutions are represented in a way they are represented in real world situations.
Terminology(cont..)
• Chromosomes − A chromosome is one such solution to the given problem.
• Gene − A gene is one element position of a chromosome. They are represented by a bit (0 or 1) string of
random length.
• Allele − It is the value a gene takes for a particular chromosome.
• Gene Pool: The set of all possible combinations of genes (i.e., all alleles) in a population is called the gene pool.
• Genome: The set of genes of a species is called a genome.
• Locus: Each gene has a position in a genome that is called locus.
• Decoding and Encoding −
• For simple problems, the phenotype and genotype spaces are the same.
• However, in most of the cases, the phenotype and genotype spaces are different.
• Decoding is a process of transforming a solution from the genotype to the phenotype space, while encoding is a
process of transforming from the phenotype to genotype space.
• Decoding should be fast as it is carried out repeatedly in a GA during the fitness value calculation.
• Genetic Operators − These alter the genetic composition of the offspring. These include crossover,
mutation, selection, etc.
Correlation Of A Chromosome With GA
• The human body has chromosomes that are made of genes.
• A set of all genes of a specific species is called the genome.
• In living beings, the genomes are stored in various
chromosomes while in GA all genes are stored in the same
chromosome.
Basic Structure and working of GA
• We start with an initial population (which
may be generated at random or seeded by
other heuristics), select parents from this
population for mating.
• Apply crossover and mutation operators on the parents to generate new offspring.
• And finally these offspring replace the existing individuals in the population and the process repeats.
• In this way genetic algorithms actually try to mimic natural evolution to some extent.
A simple genetic algorithm is:
• Start with the population created randomly.
• Calculate the fitness function of each chromosome.
• Repeat the following steps until n offspring are created (see the sketch after this list):
• Selection: select a pair of chromosomes from the population.
• Crossover: cross over the pair with probability pc to form offspring.
• Mutation: mutate the offspring with probability pm.
• Replace the original population with the new population and go to step 2.
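A minimal sketch of the simple GA above (tournament-of-two selection is our own choice where the slides leave the selection method open; names and defaults are illustrative):

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, pc=0.9, pm=0.01, generations=100):
    """Simple GA: selection, one-point crossover (prob. pc), bit-flip mutation (prob. pm)."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            # Selection: the fitter of two random individuals, twice
            p1 = max(random.sample(pop, 2), key=fitness)
            p2 = max(random.sample(pop, 2), key=fitness)
            c1, c2 = p1[:], p2[:]
            if random.random() < pc:                   # one-point crossover
                pt = random.randint(1, n_bits - 1)
                c1, c2 = p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]
            for child in (c1, c2):                     # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < pm:
                        child[i] = 1 - child[i]
                new_pop.append(child)
        pop = new_pop[:pop_size]                       # generational replacement
    return max(pop, key=fitness)

best = genetic_algorithm(fitness=sum, n_bits=16)       # OneMax: maximize the number of 1s
print(best, sum(best))
```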
Genotype Representation(1/3)
• Binary Representation
 Simplest and most widely used representation in GAs.
 The genotype consists of bit strings.
 For some problems when the solution space consists of Boolean decision variables
– yes or no, the binary representation is natural.
 Take, for example, the 0/1 Knapsack Problem. If there are n items, we can represent a solution by a binary string of n elements, where the x-th element tells whether item x is picked (1) or not (0).
 Other problems, such as those dealing with numbers, can also be represented in binary.
 The problem with this kind of encoding is that
 different bits have different significance, and therefore mutation and crossover operators can have undesired consequences.
 This can be resolved to some extent by using Gray coding, as a change in one bit does not have a massive effect on the solution (see the sketch below).
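A short sketch of the standard binary/Gray conversion (function names our own):

```python
def to_gray(n):
    """Binary -> Gray: adjacent integers differ in exactly one bit."""
    return n ^ (n >> 1)

def from_gray(g):
    """Gray -> binary (inverse of to_gray)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

print([format(to_gray(i), "03b") for i in range(8)])
# ['000', '001', '011', '010', '110', '111', '101', '100']
```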
Genotype Representation(2/3)
• Real Valued Representation
• For problems where we want to define the genes using continuous rather than discrete
variables, the real valued representation is the most natural.
• The precision of these real valued or floating point numbers is, however, limited by the computer's floating-point representation.
• Integer Representation
• For discrete valued genes, we cannot always limit the solution space to binary ‘yes’ or
‘no’.
• For example, if we want to encode the four directions – North, South, East and West – we can encode them as {0, 1, 2, 3}. In such cases, integer representation is desirable.
Genotype Representation(3/3)
• Permutation Representation
• In many problems, the solution is represented by an order of elements.
In such cases permutation representation is the most suited.
• A classic example of this representation is the travelling salesman
problem (TSP). In this the salesman has to take a tour of all the cities,
visiting each city exactly once and come back to the starting city.
Phases of GA
• Initial population
• Fitness function
• Selection
• Crossover
• Mutation
Population
• The process begins with a set of individuals which is called a Population.
• Each individual is a solution to the problem you want to solve.
• An individual is characterized by a set of parameters (variables) known as Genes.
• Genes are joined into a string to form a Chromosome (solution).
• It can also be defined as a set of chromosomes.
• In a genetic algorithm, the set of genes of an individual is represented using a string, in terms of an alphabet.
Usually, binary values are used (string of 1s and 0s). We say that we encode the genes in a chromosome.
• There are several things to be kept in mind when dealing with GA population −
• The diversity of the population should be maintained otherwise it might lead to premature convergence.
• The population size should not be kept very large as it can cause a GA to slow down, while a smaller
population might not be enough for a good mating pool.
• Therefore, an optimal population size needs to be decided by trial and error.
• The population is usually defined as a two-dimensional array (population size × chromosome length).
Population Models
• Steady State
• In steady state GA, we generate one or two offspring in each iteration and they replace one or two individuals from the population.
• A steady state GA is also known as Incremental GA.
• Generational
• In a generational model, we generate ‘n’ offspring, where n is the population size, and the entire population is replaced by the new one at the end of the iteration.
Fitness function
• The fitness function, simply defined, is a function which takes a candidate solution to the problem as input and produces as output how “fit” or how “good” the solution is with respect to the problem in consideration.
• Calculation of fitness value is done repeatedly in a GA and therefore it should be sufficiently fast.
• A slow computation of the fitness value can adversely affect a GA and make it exceptionally slow.
• In most cases the fitness function and the objective function are the same as the objective is to
either maximize or minimize the given objective function.
• Characteristics of fitness function −
• The fitness function should be sufficiently fast to compute.
• It must quantitatively measure how fit a given solution is or how fit individuals can be produced from the
given solution.
Fitness calculation
• Each chromosome from the population is passed to the
objective function one by one and its fitness is calculated.
• For example, the fitness of each of the randomly generated solutions in the previous step is calculated as:
• f(x1, x2) = x1² + x2²
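As a one-line illustration of this step (the sample population values are our own):

```python
population = [(0.5, 1.0), (2.0, -1.0), (1.5, 0.5)]   # randomly generated (x1, x2) pairs
fitness = [x1**2 + x2**2 for x1, x2 in population]   # f(x1, x2) = x1^2 + x2^2
print(fitness)  # [1.25, 5.0, 2.5]
```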
Parent Selection
• Parent selection is the process of selecting parents which mate and recombine to create offspring for the next generation.
• Fitness Proportionate Selection
• One of the most popular ways of parent selection.
• In this every individual can become a parent with a probability which is proportional to its fitness.
• Therefore, fitter individuals have a higher chance of mating and propagating their features to the next generation.
• Consider a circular wheel. The wheel is divided into n pies, where n is the number of individuals in the population.
• Each individual gets a portion of the circle which is proportional to its fitness value.
• Two implementations of fitness proportionate selection are possible −
• Roulette Wheel Selection
• Stochastic Universal Sampling (SUS)
• Other selection methods
• Rank selection
• Tournament selection
Roulette Wheel Selection
• A fixed point is chosen on the wheel circumference as
shown and the wheel is rotated.
• The region of the wheel which comes in front of the fixed
point is chosen as the parent.
• For the second parent, the same process is repeated.
• It is clear that a fitter individual has a greater pie on the
wheel and therefore a greater chance of landing in front of
the fixed point when the wheel is rotated.
• Therefore, the probability of choosing an individual depends
directly on its fitness.
• Implementation-wise, we use the following steps (see the sketch after this list):
• Calculate S = the sum of all fitnesses.
• Generate a random number r between 0 and S.
• Starting from the top of the population, keep adding the fitnesses to the partial sum P, till P ≥ r.
• The individual for which P first reaches r is the chosen individual.
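A direct sketch of those steps in Python (function name our own):

```python
import random

def roulette_select(population, fitnesses):
    """Pick one parent with probability proportional to its fitness."""
    S = sum(fitnesses)                 # step 1: total fitness
    r = random.uniform(0, S)           # step 2: random point on the wheel
    partial = 0.0
    for individual, f in zip(population, fitnesses):
        partial += f                   # step 3: accumulate fitnesses
        if partial >= r:               # step 4: first individual to reach r
            return individual
    return population[-1]              # guard against floating-point rounding
```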
Stochastic Universal Sampling (SUS)
• Similar to Roulette wheel selection, however instead of having just
one fixed point, we have multiple fixed points.
• All the parents are chosen in just one spin of the wheel.
• It encourages the highly fit individuals to be chosen at least once.
• Fitness proportionate selection methods don’t work for cases where
the fitness can take a negative value.
Tournament Selection
• In K-Way tournament selection, we select K individuals from the population at random and
select the best out of these to become a parent.
• The same process is repeated for selecting the next parent.
• Tournament Selection is also extremely popular as it can even work with negative fitness
values.
Rank Selection
• Rank Selection also works with negative fitness values and is mostly used when the
individuals in the population have very close fitness values (this happens usually at the
end of the run).
• This leads to each individual having an almost equal share of the pie (like in case of
fitness proportionate selection) as shown in the figure and hence each individual no
matter how fit relative to each other has an approximately same probability of getting
selected as a parent.
• This in turn leads to a loss in the selection pressure towards fitter individuals, making the GA make poor parent selections in such situations.
• In this, we remove the concept of a fitness value while selecting a parent.
• However, every individual in the population is ranked according to their fitness.
• The selection of the parents depends on the rank of each individual and not the fitness.
The higher ranked individuals are preferred more than the lower ranked ones.
Chromosome | Fitness Value | Rank
A | 8.1 | 1
B | 8.0 | 4
C | 8.05 | 2
D | 7.95 | 6
E | 8.02 | 3
F | 7.99 | 5
Random Selection
• In this strategy we randomly select parents from the existing
population.
• There is no selection pressure towards fitter individuals and
therefore this strategy is usually avoided.
Crossover
• The crossover operator is analogous to reproduction and biological crossover.
• A crossover point is chosen at random from within the genes.
• Offspring are created by exchanging the genes of the parents among themselves until the crossover point is reached.
• In crossover, more than one parent is selected and one or more offspring are produced using the genetic material of the parents.
• Crossover is usually applied in a GA with a high probability – pc.
• Crossover Operators
• One Point Crossover
• In one-point crossover, a random crossover point is selected and the tails of the two parents are swapped to get new offspring (see the sketch below).
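A two-line sketch of one-point crossover (list-based chromosomes; names our own):

```python
import random

def one_point_crossover(p1, p2):
    """Swap the tails of two parents at a random crossover point."""
    pt = random.randint(1, len(p1) - 1)          # point strictly inside the string
    return p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]  # exchange tails

c1, c2 = one_point_crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1])
```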
Crossover operator(cont..)
• Multi Point Crossover
• Multi-point crossover is a generalization of one-point crossover wherein alternating segments are swapped to get new offspring.
• Uniform Crossover
• In a uniform crossover, we don’t divide the chromosome into segments; rather, we treat each gene separately.
• In this, we essentially flip a coin for each gene to decide whether or not it will be included in the offspring (a sketch follows).
• We can also bias the coin towards one parent, to have more genetic material in the child from that parent.
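A sketch of uniform crossover with an optional bias (our own naming):

```python
import random

def uniform_crossover(p1, p2, bias=0.5):
    """Per-gene coin flip; bias > 0.5 keeps more genes from p1 in child 1."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < bias:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return c1, c2
```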
Mutation
• In certain new offspring formed, some of their genes can be subjected to a mutation with a low
random probability. This implies that some of the bits in the bit string can be flipped.
• Mutation occurs to maintain diversity within the population and prevent premature convergence.
• Some of the ways of mutation are:
• Flipping: Changing from 0 to 1 or 1 to 0.
• Interchanging: Two random positions are chosen, and the values are interchanged.
• Reversing: Random position is chosen and the bits next to it are reversed.
Types of mutation
• Flip mutation: This type of mutation is performed when we use a binary encoding. A randomly selected bit of a chromosome is flipped, as shown in the following diagram.
• Swap mutation: This kind of mutation is performed when we encode chromosomes as permutations of a given set of elements. In this type of mutation, the alleles of two randomly selected genes are swapped, as shown in the following diagram.
• Random initialization: This is very similar to flip mutation but is used when the chromosome is encoded using discrete values/integers. For example, if a gene's value can be any integer between -5 and 5, we choose a gene at random and reinitialize its value with any integer from the given range. It is also depicted in the following diagram. (Sketches of all three follow.)
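Minimal sketches of the three mutation types (list-based chromosomes; names our own):

```python
import random

def flip_mutation(chrom):
    """Flip one randomly selected bit of a binary chromosome."""
    chrom = chrom[:]
    i = random.randrange(len(chrom))
    chrom[i] = 1 - chrom[i]
    return chrom

def swap_mutation(chrom):
    """Swap the alleles of two randomly selected genes (for permutations)."""
    chrom = chrom[:]
    i, j = random.sample(range(len(chrom)), 2)
    chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

def random_initialization(chrom, low=-5, high=5):
    """Reinitialize a randomly chosen gene with an integer from [low, high]."""
    chrom = chrom[:]
    chrom[random.randrange(len(chrom))] = random.randint(low, high)
    return chrom
```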
When To Stop Genetic Algorithm
• Best Individual Convergence: When the minimum fitness level drops below the
convergence value, the algorithm is stopped. It leads to faster convergence.
• Worst Individual Convergence: When the least fit individuals in the population attain
minimum fitness value below the convergence, then the algorithm is stopped. In this
method, the minimum fitness standard is maintained in the population. It means that the
best individual is not guaranteed but minimum fitness value individuals will be present.
• Sum of fitness: In this method, if the sum of fitness is less than or equal to convergence
value then the search is stopped. It guarantees that all the population is within the fitness
range.
• Median Fitness: In this method, at least half of the individuals in the population will be
better than or equal to convergence value.
• Some convergence criterion or stopping condition can be:
• When a specified number of generations have evolved.
• When the specified time to run the algorithm has been met.
• When the fitness value of the population does not change further with iterations.
Application areas
• Transport: Genetic algorithms are used in the traveling salesman problem to develop transport
plans that reduce the cost of travel and the time taken. They are also used to develop an efficient
way of delivering products.
• DNA Analysis: They are used in DNA analysis to establish the DNA structure using spectrometric
information.
• Multimodal Optimization: They are used to provide multiple optimum solutions in multimodal
optimization problems.
• Aircraft Design: They are used to develop parametric aircraft designs. The parameters of the
aircraft are modified and upgraded to provide better designs.
• Economics: They are used in economics to describe various models such as game theory, the cobweb model, asset pricing, and schedule optimization.

More Related Content

PPTX
Markov Chains.pptx
PPTX
NLP_KASHK:Markov Models
PPT
Markov Chains
PPT
Markov chains1
PPTX
Hidden Markov Models
PDF
Lecture4 SIQ3003.pdf
PDF
Fundamentos de la cadena de markov - Libro
PPT
Hidden Markov Models with applications to speech recognition
Markov Chains.pptx
NLP_KASHK:Markov Models
Markov Chains
Markov chains1
Hidden Markov Models
Lecture4 SIQ3003.pdf
Fundamentos de la cadena de markov - Libro
Hidden Markov Models with applications to speech recognition

Similar to Machine learning fundamental concepts in detail (20)

PPT
Hidden Markov Models with applications to speech recognition
PDF
17-markov-chains.pdf
PPTX
Eigenstates of 2D Random Walk with Multiple Absorbing States
PPTX
State space analysis.pptx
PDF
12 Machine Learning Supervised Hidden Markov Chains
PPT
markov chain.ppt
PPTX
Lecture 6 - Marcov Chain introduction.pptx
PPTX
Hidden Markov Model
PPT
MAchin learning graphoalmodesland bayesian netorls
PPTX
Stat 2153 Stochastic Process and Markov chain
PPT
tommy shelby operation on Men United.ppt
PDF
Preparatory_questions_final_exam_DigitalElectronics1 (1).pdf
PPTX
Markov Model chains
PPTX
Stock Market Prediction using Hidden Markov Models and Investor sentiment
PPTX
Presentationhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh1.pptx
PPTX
Discrete state space model 9th &10th lecture
PPT
Markov analysis
PPTX
Hidden markov model
PPT
lecture1 (9).ppt
Hidden Markov Models with applications to speech recognition
17-markov-chains.pdf
Eigenstates of 2D Random Walk with Multiple Absorbing States
State space analysis.pptx
12 Machine Learning Supervised Hidden Markov Chains
markov chain.ppt
Lecture 6 - Marcov Chain introduction.pptx
Hidden Markov Model
MAchin learning graphoalmodesland bayesian netorls
Stat 2153 Stochastic Process and Markov chain
tommy shelby operation on Men United.ppt
Preparatory_questions_final_exam_DigitalElectronics1 (1).pdf
Markov Model chains
Stock Market Prediction using Hidden Markov Models and Investor sentiment
Presentationhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh1.pptx
Discrete state space model 9th &10th lecture
Markov analysis
Hidden markov model
lecture1 (9).ppt
Ad

Recently uploaded (20)

PDF
LDMMIA Reiki Yoga Finals Review Spring Summer
PDF
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
PPTX
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
PDF
HVAC Specification 2024 according to central public works department
PDF
Chinmaya Tiranga quiz Grand Finale.pdf
PDF
Empowerment Technology for Senior High School Guide
PPTX
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
PDF
Trump Administration's workforce development strategy
PDF
1_English_Language_Set_2.pdf probationary
PPTX
Computer Architecture Input Output Memory.pptx
DOC
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
PDF
Paper A Mock Exam 9_ Attempt review.pdf.
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PPTX
A powerpoint presentation on the Revised K-10 Science Shaping Paper
PPTX
B.Sc. DS Unit 2 Software Engineering.pptx
PPTX
Share_Module_2_Power_conflict_and_negotiation.pptx
PPTX
Introduction to pro and eukaryotes and differences.pptx
PDF
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
PDF
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
PDF
MBA _Common_ 2nd year Syllabus _2021-22_.pdf
LDMMIA Reiki Yoga Finals Review Spring Summer
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
HVAC Specification 2024 according to central public works department
Chinmaya Tiranga quiz Grand Finale.pdf
Empowerment Technology for Senior High School Guide
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
Trump Administration's workforce development strategy
1_English_Language_Set_2.pdf probationary
Computer Architecture Input Output Memory.pptx
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
Paper A Mock Exam 9_ Attempt review.pdf.
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
A powerpoint presentation on the Revised K-10 Science Shaping Paper
B.Sc. DS Unit 2 Software Engineering.pptx
Share_Module_2_Power_conflict_and_negotiation.pptx
Introduction to pro and eukaryotes and differences.pptx
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
MBA _Common_ 2nd year Syllabus _2021-22_.pdf
Ad

Machine learning fundamental concepts in detail

  • 1. Markov Models • A Markov model is a finite state machine with N distinct states begins at (Time t = 1) in initial state . • It moves from current state to Next state according to the transition probabilities associated with the Current state • This kind of system is called Finite or Discrete Markov model.
  • 2. Markov Property • Markov Property : The Current state of the system depends only on the previous state of the system • The State of the system at Time [ T+1 ] depends on the state of the system at time T. • Xt=1 Xt=2 Xt=3 Xt=4 Xt=5
  • 3. • Set of states: • Process moves from one state to another generating a sequence of states •Markov chain property: probability of each subsequent state depends only on what was the previous state: •To define Markov model, the following probabilities have to be specified: transition probabilities and initial probabilities Markov Models } , , , { 2 1 N s s s    , , , , 2 1 ik i i s s s ) | ( ) , , , | ( 1 1 2 1    ik ik ik i i ik s s P s s s s P  ) | ( j i ij s s P a  ) ( i i s P  
  • 4. Rain Dry 0.7 0.3 0.2 0.8 • Two states : ‘Rain’ and ‘Dry’. • Transition probabilities: P(‘Rain’|‘Rain’)=0.3 , P(‘Dry’|‘Rain’)=0.7 , P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8 • Initial probabilities: say P(‘Rain’)=0.4 , P(‘Dry’)=0.6 . Example of Markov Model
  • 5. • By Markov chain property, probability of state sequence can be found by the formula: • Suppose we want to calculate a probability of a sequence of states in our example, {‘Dry’,’Dry’,’Rain’,Rain’}. P({‘Dry’,’Dry’,’Rain’,Rain’} ) = P(‘Rain’|’Rain’) P(‘Rain’|’Dry’) P(‘Dry’|’Dry’) P(‘Dry’)= = 0.3*0.2*0.8*0.6 Calculation of sequence probability ) ( ) | ( ) | ( ) | ( ) , , , ( ) | ( ) , , , ( ) , , , | ( ) , , , ( 1 1 2 2 1 1 1 2 1 1 1 2 1 1 2 1 2 1 i i i ik ik ik ik ik i i ik ik ik i i ik i i ik ik i i s P s s P s s P s s P s s s P s s P s s s P s s s s P s s s P                 
  • 6. Discrete Markov processes • Consider the example through which the various elements that constitute a discrete homogeneous Markov process can be introduced. • System and states • Let us consider a highly simplified model of the different states a stock-market is in, in a given week. We assume that there are only three possible states:  S1 : Bull market trend  S2 : Bear market trend  S3 : Stagnant market trend • Transition probabilities  Week after week, the stock-market moves from one state to another state.  From previous data, it has been estimated that there are certain probabilities associated with these movements.  These probabilities are called transition probabilities. • Markov assumption  We assume that the following statement (called Markov assumption or Markov property) regarding transition probabilities is true:  Let the weeks be counted as 1, 2, . . . and let an arbitrary week be the t-th week.  Then, the state in week t + 1 depends only on the state in week t, regardless of the states in the previous weeks.  This corresponds to saying that, given the present state, the future is independent of the past.
  • 7. • Homogeneity assumption  To simplify the computations, we assume that the following property, called the homogeneity assumption, is also true.  The probability that the stock market is in a particular state in a particular week t + 1 given that it is in a particular state in week t, is independent of t. • Representation of transition probabilities  Let the probability that a bull week is followed by another bull week be 90%, a bear week be 7.5%, and a stagnant week be 2.5%.  Similarly, let the probability that a bear week is followed by another bull week be 15%, bear week be 80% and a stagnant week be 5%.  Finally, let the probability that a stagnant week be followed by a bull week is 25%, a bear week be 25% and a stagnant week be 50%.  The transition probabilities can be represented in two ways: • (a) The states and the state transition probabilities can be represented diagrammatically as • The state transition probabilities can also be represented by a matrix called the state transition matrix. Let us label the states as “1 = bull”, “2 = bear” and “3 = stagnant” and consider the matrix. In this matrix, the element in the i-th row, j-th column represents the probability that the market in state i is followed by market in state j.
  • 8. Cont.. • Initial probabilities • The initial probabilities are the probabilities that the stock-market is in a particular state initially. • These are denoted by π1, π2, π3: π1 is the probability that the stock-market is in bull state initially; similarly, π2 and π3. • The values of these probabilities can be presented as a vector: • The discrete Markov process • The functioning of the stock-markets with the three states S1, S2, S3 with the assumption that the Markov property is true, the transition probabilities given by the matrix P and the initial probabilities given by the vector Π constitute a discrete Markov process. • Since we also assume the homogeneity property for the transition probabilities is true, it is a homogeneous discrete Markov process.
  • 9. Probabilities for future states • The elements in this row vector represent the probabilities that the stock-market is in the bull state, the bear state and the stagnant state respectively in the second week. • In general, the elements of the row vector Π T P represent the probabilities that the stock-market is in the bull state, the bear state and the stagnant state respectively in the (n + 1)th week.
  • 11. Discrete Markov processes: General case • A Markov process is a random process indexed by time, and with the property that the future is independent of the past, given the present. • The time space may be discrete taking the values 1, 2, . . . or continuous taking any nonnegative real number as a value. • Here, we consider only discrete time Markov processes.
  • 12. In this matrix, the element in the i-th row, j-th column represents the probability that the system in state Si moves to state Sj . Here, in the state transition matrix A, the sum of the elements in every row is 1.
  • 14. Probability for an observation sequence • Observable Markov model  The discrete Markov process is also called an observable Markov model or observable discrete Markov process.  It is so called because the state of the system at any time t can be directly observed.  If the state of the system cannot be directly observed, the system is called a hidden Markov model. • Probability for an observation sequence  In an observable Markov model, the states are observable.  At any time t we know qt, and as the system moves from one state to another, we get an observation sequence that is a sequence of states.  The output of the process is the set of states at each instant of time where each state corresponds to a physical observable event.  Let O be an arbitrary observation sequence of length T. Let us consider a particular observation sequence
  • 16. Learning the parameters (A and Π) • Consider a homogeneous discrete Markov process with transition matrix A and initial probability vector Π, where A and Π are the parameters of the process. • The following procedure may be applied to learn these parameters
  • 18. Hidden Markov models • Hidden Markov Models (HMMs) are probabilistic models, it implies that the Markov Model underlying the data is hidden or unknown. • More specifically, we only know observational data and not information about the states. • HMM is determined by three model parameters; • The start probability; a vector containing the probability for the state of being the first state of the sequence. • The state transition probabilities; a matrix consisting of the probabilities of transitioning from state Si to state Sj. • The observation probability; the likelihood of a certain observation, y, if the model is in state Si.
  • 20. Hidden markov models ( Probabilistic finite state automata ) • The Scenarios where states cannot be directly observed. • We need an extension i.e, Hidden Markov Models a11 a22 a33 a44 a12 a23 a34 b11 b14 b12 b13 1 2 3 4
  • 21. • aij are state transition probabilities. • bik are observation (output) probabilities. • b11 + b12 + b13 + b14 = 1, • b21 + b22 + b23 + b24 = 1.
  • 22. Hidden markov model recognition • For a given model M = { A, B, pi } and a given state sequence Q1 Q2 Q3 … QL , the probability of an observation sequence O1 O2 O3 … OL is P(O|Q,M) = bQ1O1 bQ2O2 bQ3O3 … bQTOT • For a given hidden Markov model M = { A, B, pi} the probability of state sequence Q1 Q2 Q3 QL is (the initial probability of Q1 is taken to be pQ1) P(Q|M) = pQ1 aQ1Q2 aQ2Q3 aQ3Q4 … aQL-1QL
  • 23. Hidden markov model recognition • So for a given HMM, M the probability of an observed sequence O1O2O3 … OT is obtained by summing over all possible state sequences. P(Q|M) = pQ1 aQ1Q2 aQ2Q3 aQ3Q4 … aQT-1QT P(O|Q) = bQ1O1 bQ2O2 bQ3O3 … bQTOT
  • 24. Low High 0.7 0.3 0.2 0.8 Dry Rain 0.6 0.6 0.4 0.4 Example of Hidden Markov Model
  • 29. Solutions to basic problems • Problem 1 is solved using the Forwards-Backwards algorithms. • Problem 2 is solved by the Viterbi algorithm and posterior decoding. • Finally, Problem 3 is solved by the Baum-Welch algorithm.
  • 31. Solution to decoding problem ? • Decoding problem: Viterbi Algorithm • In this algorithm we go through the observations from start to end referring a state of hidden machine for each observation. • We also record the values of Overall Probability, Viterbi path (sequence of states) and the viterbi probability( Probability of observed state sequences in viterbi path ) • The probability of possible step given its corresponding observation is probability of transmission times the emission probability.
  • 36. Applications of HMM • Cryptanalysis • Speech Recognition • Pattern Recognition • Activity Recognition • Machine Translation
  • 37. Support Vector Machine • Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for Classification as well as Regression problems. • However, primarily, it is used for Classification problems in Machine Learning. • The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put the new data point in the correct category in the future. • This best decision boundary is called a hyperplane. • SVM chooses the extreme points/vectors that help in creating the hyperplane. • These extreme cases are called as support vectors, and hence algorithm is termed as Support Vector Machine.
  • 39. SVM Example • Suppose we see a strange cat that also has some features of dogs, so if we want a model that can accurately identify whether it is a cat or dog, so such a model can be created by using the SVM algorithm. • We will first train our model with lots of images of cats and dogs so that it can learn about different features of cats and dogs, and then we test it with this strange creature. • So as support vector creates a decision boundary between these two data (cat and dog) and choose extreme cases (support vectors), it will see the extreme case of cat and dog. • On the basis of the support vectors, it will classify it as a cat. • SVM algorithm can be used for Face detection, image classification, text categorization, etc.
  • 41. Types of SVM • Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset can be classified into two classes by using a single straight line, then such data is termed as linearly separable data, and classifier is used called as Linear SVM classifier. • Non-linear SVM: Non-Linear SVM is used for non-linearly separated data, which means if a dataset cannot be classified by using a straight line, then such data is termed as non-linear data and classifier used is called as Non-linear SVM classifier.
  • 43. Hyperplane and Support Vectors in the SVM algorithm • Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n- dimensional space, but we need to find out the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM. • The dimensions of the hyperplane depend on the features present in the dataset, which means if there are 2 features , then hyperplane will be a straight line. And if there are 3 features, then hyperplane will be a 2-dimension plane. • We always create a hyperplane that has a maximum margin, which means the maximum distance between the data points. • Support Vectors: • The data points or vectors that are the closest to the hyperplane and which affect the position of the hyperplane are termed as Support Vector. Since these vectors support the hyperplane, hence called a Support vector.
  • 44. How does SVM works? • Linear SVM: • The working of the SVM algorithm can be understood by using an example. • Suppose we have a dataset that has two tags (green and blue), and the dataset has two features x1 and x2. • We want a classifier that can classify the pair(x1, x2) of coordinates in either green or blue. • So as it is 2-d space so by just using a straight line, we can easily separate these two classes. • But there can be multiple lines that can separate these classes. Consider the image2.
  • 45. • Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called as a hyperplane. • SVM algorithm finds the closest point of the lines from both the classes. • These points are called support vectors. The distance between the vectors and the hyperplane is called as margin. • And the goal of SVM is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.
  • 46. Cont.. • Non-Linear SVM: • If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we cannot draw a single straight line. Consider the image1: • So to separate these data points, we need to add one more dimension. For linear data, we have used two dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated as: • z=x2 +y2 • So now, SVM will divide the datasets into classes in the following way. Consider the image2: • Since we are in 3-d Space, hence it is looking like a plane parallel to the x-axis. If we convert it in 2d space with z=1, then it will become as shown in image 3: • Hence we get a circumference of radius 1 in case of non-linear data.
  • 48. The SVM classifier • The solution of the SVM problem gives us a classifier for classifying unclassified data instances. This is known as the SVM classifier for a given dataset.
  • 54. Introduction to Optimization • The process of making something better Based on a set of inputs and a set of outputs. • Optimization refers to finding the values of inputs in such a way that we get the “best” output values. • The definition of “best” varies from problem to problem, but in mathematical terms, it refers to maximizing or minimizing one or more objective functions, by varying the input parameters. • The set of all possible solutions or values which the inputs can take make up the search space. • In this search space, a point or a set of points lies which gives the optimal solution. • The aim of optimization is to find that point or set of points in the search space.
  • 55. Genetic Algorithms • It is a Search-based optimization technique based on the principles of Genetics and Natural Selection. • Frequently used to find optimal or near-optimal solutions to difficult problems which otherwise would take a lifetime to solve. • Based on the evolutionary idea of natural selection and genetics. • Designed to encourage the theory of “survival of the fittest”. • Perform a random search to solve optimization problems. • GA uses techniques that use the previous historical information to direct their search towards optimization in the new search space. • GAs are adaptive heuristic search algorithms i.e. the algorithms follow an iterative pattern that changes with time. • It is a type of reinforcement learning where the feedback is necessary without telling the correct path to follow. • The feedback can either be positive or negative.
  • 56. What are Genetic Algorithms? • Genetics is derived from the Greek word, “genesis” that means to grow. • The genetics decides the heredity factors, resemblances, and differences between the offsprings in the process of evolution. • Genetic Algorithms are also derived from natural evolution. • GAs are a subset of a much larger branch of computation known as Evolutionary Computation. • Developed by John Holland and his students and colleagues at the University of Michigan. • In GAs, we have a pool or a population of possible solutions to the given problem. • These solutions then undergo recombination and mutation (like in natural genetics), producing new children, and the process is repeated over various generations. • Each individual (or candidate solution) is assigned a fitness value (based on its objective function value) and the fitter individuals are given a higher chance to mate and yield more “fitter” individuals. • This is in line with the Darwinian Theory of “Survival of the Fittest”. • In this way we keep “evolving” better individuals or solutions over generations, till we reach a stopping criterion.
• 57. Why Use Genetic Algorithms • GAs are robust algorithms that can be applied to a wide variety of optimization problems. • They do not deviate easily in the presence of noise, unlike many other AI algorithms. • GAs can effectively search very large or multimodal search spaces.
• 58. Advantages of GAs • Does not require any derivative information (which may not be available for many real-world problems). • Can be faster and more efficient than traditional exhaustive methods. • Has very good parallelization capabilities. • Optimizes both continuous and discrete functions, as well as multi-objective problems. • Provides a list of “good” solutions, not just a single solution. • Always returns an answer to the problem, which gets better over time. • Useful when the search space is very large and a large number of parameters are involved.
  • 59. Limitations of GAs • GAs are not suited for all problems, especially problems which are simple and for which derivative information is available. • Fitness value is calculated repeatedly which might be computationally expensive for some problems. • Being stochastic, there are no guarantees on the optimality or the quality of the solution. • If not implemented properly, the GA may not converge to the optimal solution.
• 60. Terminology In GA • Population: A group of individuals. The population comprises the individuals being tested, search-space information, and the phenotype parameters. The population is generally initialized at random. • Individual: An individual is a single solution in the population. An individual has a set of parameters called genes; genes combine to form chromosomes. • Fitness: Fitness measures the quality of an individual’s phenotype with respect to the problem. • The fitness function tells how close a solution is to the optimal solution. It can be defined in many ways, e.g. as a sum of problem-related parameters or a Euclidean distance; there is no single rule for defining a fitness function. • Genotype − The full combination of genes in an individual is called the genotype. The genotype is the population in the computation space, where solutions are represented in a form that can be easily understood and manipulated by a computing system. • Phenotype − A genotype in its decoded form is called the phenotype. The phenotype is the population in the actual real-world solution space, where solutions are represented as they appear in real-world situations.
• 61. Terminology(cont..) • Chromosome − A chromosome is one candidate solution to the given problem. • Gene − A gene is one element position of a chromosome, often represented by a bit (0 or 1) or a short string. • Allele − The value a gene takes in a particular chromosome. • Gene Pool − The set of all alleles (gene values) present in a population is called the gene pool. • Genome − The set of genes of a species is called a genome. • Locus − The position of a gene within the genome is called its locus. • Decoding and Encoding − For simple problems, the phenotype and genotype spaces are the same; in most cases, however, they differ. Decoding transforms a solution from the genotype space to the phenotype space, while encoding transforms from the phenotype space to the genotype space. Decoding should be fast, as it is carried out repeatedly during the fitness calculation (a small sketch follows). • Genetic Operators − These alter the genetic composition of the offspring; they include crossover, mutation, selection, etc.
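A minimal sketch of decoding and encoding (my own illustration), assuming the phenotype is a real value in a range and the genotype is an n-bit string; the names and ranges are made up.

```python
def decode(bits, lo=-5.0, hi=5.0):
    """Genotype (bit string) -> phenotype (real value in [lo, hi])."""
    as_int = int(bits, 2)
    return lo + (hi - lo) * as_int / (2 ** len(bits) - 1)

def encode(x, n_bits=16, lo=-5.0, hi=5.0):
    """Phenotype -> genotype: the inverse mapping, up to quantization."""
    as_int = round((x - lo) / (hi - lo) * (2 ** n_bits - 1))
    return format(as_int, f"0{n_bits}b")

bits = encode(1.25)
print(bits, decode(bits))  # a 16-bit string and a value close to 1.25
```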
  • 63. Correlation Of A Chromosome With GA • The human body has chromosomes that are made of genes. • A set of all genes of a specific species is called the genome. • In living beings, the genomes are stored in various chromosomes while in GA all genes are stored in the same chromosome.
• 65. Basic Structure and working of GA • We start with an initial population (which may be generated at random or seeded by other heuristics) and select parents from this population for mating. • We apply crossover and mutation operators on the parents to generate new offspring. • Finally, these offspring replace existing individuals in the population, and the process repeats. • In this way, genetic algorithms try to mimic natural evolution to some extent.
• 66. A simple genetic algorithm is: • Start with a randomly created population. • Calculate the fitness of each chromosome. • Repeat the following steps until n offspring have been created: • Selection − Select a pair of chromosomes from the population. • Crossover − Cross the pair over with probability pc to form offspring. • Mutation − Mutate the offspring with probability pm. • Replace the original population with the new population and go to step 2. A runnable sketch of this loop follows.
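Below is a runnable sketch of this loop (my own illustration, not the slides' code), using the toy "OneMax" fitness of counting 1-bits; all parameter values are made up.

```python
import random

POP_SIZE, N_BITS, PC, PM, GENERATIONS = 20, 8, 0.7, 0.01, 50

def fitness(chrom):
    return sum(chrom)  # OneMax: count the 1-bits

def select(pop):
    # Fitness-proportionate (roulette-wheel) selection.
    total = sum(fitness(c) for c in pop)
    r, acc = random.uniform(0, total), 0
    for c in pop:
        acc += fitness(c)
        if acc >= r:
            return c
    return pop[-1]

# Step 1: a randomly created population.
pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    new_pop = []
    while len(new_pop) < POP_SIZE:
        p1, p2 = select(pop), select(pop)       # selection
        if random.random() < PC:                # crossover with prob. pc
            point = random.randint(1, N_BITS - 1)
            p1, p2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
        for child in (p1, p2):                  # mutation with prob. pm
            new_pop.append([b ^ 1 if random.random() < PM else b
                            for b in child])
    pop = new_pop[:POP_SIZE]                    # replace the population

print(max(pop, key=fitness))  # typically all (or nearly all) 1s
```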
• 67. Genotype Representation(1/3) • Binary Representation  The simplest and most widely used representation in GAs.  The genotype consists of bit strings.  For problems whose solution space consists of Boolean decision variables – yes or no – the binary representation is natural.  Take for example the 0/1 Knapsack Problem: if there are n items, a solution can be represented by a binary string of n elements, where the xth element tells whether item x is picked (1) or not (0) – see the sketch below.  Other problems, such as those dealing with numbers, can also be represented in binary.  The problem with this kind of encoding is that different bits have different significance, so mutation and crossover operators can have undesired consequences.  This can be resolved to some extent by using Gray coding, in which a change of one bit does not have a massive effect on the decoded value.
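A small sketch of the 0/1 knapsack encoding just described (weights, values, and capacity are made up); each bit of the chromosome says whether the corresponding item is picked.

```python
weights = [2, 3, 4, 5]
values = [3, 4, 5, 6]
capacity = 8

chromosome = "1011"  # pick items 0, 2 and 3

picked = [i for i, bit in enumerate(chromosome) if bit == "1"]
total_w = sum(weights[i] for i in picked)
total_v = sum(values[i] for i in picked)
# An over-capacity solution like this one would get a penalized fitness.
print(total_w, total_v, total_w <= capacity)  # 11 14 False
```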
• 68. Genotype Representation(2/3) • Real Valued Representation • For problems where we want to define the genes using continuous rather than discrete variables, the real valued representation is the most natural. • The precision of these real valued (floating point) numbers is, however, limited by the computer. • Integer Representation • For discrete valued genes, we cannot always limit the solution space to a binary ‘yes’ or ‘no’. • For example, to encode the four directions – North, South, East and West – we can encode them as {0,1,2,3}. In such cases, integer representation is desirable.
• 69. Genotype Representation(3/3) • Permutation Representation • In many problems, the solution is represented by an ordering of elements. In such cases, permutation representation is the most suitable. • A classic example is the travelling salesman problem (TSP), in which the salesman has to take a tour of all the cities, visiting each city exactly once and returning to the starting city (a small sketch follows).
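A small sketch of permutation representation for the TSP (made-up city coordinates): a chromosome is an ordering of city indices, and its quality is the length of the resulting tour.

```python
import random

cities = [(0, 0), (1, 5), (4, 3), (6, 1), (3, 0)]  # made-up 2-D points

def tour_length(perm):
    # Sum of straight-line distances around the closed tour.
    total = 0.0
    for i in range(len(perm)):
        (x1, y1) = cities[perm[i]]
        (x2, y2) = cities[perm[(i + 1) % len(perm)]]
        total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total

perm = list(range(len(cities)))
random.shuffle(perm)  # a random candidate tour
print(perm, tour_length(perm))
```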
  • 70. Phases of GA • Initial population • Fitness function • Selection • Crossover • Mutation
• 71. Population • The process begins with a set of individuals called a Population. • Each individual is a candidate solution to the problem you want to solve. • An individual is characterized by a set of parameters (variables) known as Genes. • Genes are joined into a string to form a Chromosome (solution), so a population can also be seen as a set of chromosomes. • In a genetic algorithm, the set of genes of an individual is represented as a string over some alphabet; usually, binary values are used (a string of 1s and 0s). We say that we encode the genes in a chromosome. • There are several things to keep in mind when dealing with a GA population − • The diversity of the population should be maintained, otherwise it might lead to premature convergence. • The population size should not be kept very large, as this can slow the GA down, while a population that is too small might not provide a good mating pool; an appropriate population size is therefore usually decided by trial and error. • The population is usually stored as a two-dimensional array (one row per chromosome, one column per gene), as sketched below.
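A sketch of the two-dimensional array view of a population, with random binary genes (the sizes are made up):

```python
import numpy as np

POP_SIZE, CHROM_LEN = 6, 10
# One row per chromosome, one column per gene.
population = np.random.randint(0, 2, size=(POP_SIZE, CHROM_LEN))
print(population)
# population[i] is individual i; population[i, j] is gene j of individual i.
```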
• 72. Population Models • Steady State • In a steady state GA, we generate one or two offspring in each iteration, and they replace one or two individuals from the population. • A steady state GA is also known as an Incremental GA. • Generational • In a generational model, we generate ‘n’ offspring, where n is the population size, and the entire population is replaced by the new one at the end of the iteration.
• 73. Fitness function • Simply defined, the fitness function takes a candidate solution to the problem as input and produces as output how “fit” or how “good” the solution is with respect to the problem under consideration. • The fitness value is calculated repeatedly in a GA, so the fitness function should be sufficiently fast; a slow fitness computation can adversely affect a GA and make it exceptionally slow. • In most cases the fitness function and the objective function are the same, since the objective is to maximize or minimize the given objective function. • Characteristics of a fitness function − • It should be sufficiently fast to compute. • It must quantitatively measure how fit a given solution is, or how fit the individuals produced from it can be.
• 74. Fitness calculation • Each chromosome from the population is passed to the objective function one by one and its fitness is calculated. • For example, the fitness of each of the randomly generated solutions in the previous step is calculated as f(x1, x2) = x1² + x2² (a small sketch follows).
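A small sketch of this fitness calculation (the decoded chromosome values are made up):

```python
def fitness(chrom):
    # f(x1, x2) = x1^2 + x2^2
    x1, x2 = chrom
    return x1 ** 2 + x2 ** 2

population = [(1.0, 2.0), (0.5, -1.5), (3.0, 0.0)]  # decoded chromosomes
for chrom in population:
    print(chrom, fitness(chrom))  # 5.0, 2.5, 9.0
```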
  • 75. Parent Selection • Parent Selection is the process of selecting parents which mate and recombine to create off-springs for the next generation. • Fitness Proportionate Selection • One of the most popular ways of parent selection. • In this every individual can become a parent with a probability which is proportional to its fitness. • Therefore, fitter individuals have a higher chance of mating and propagating their features to the next generation. • Consider a circular wheel. The wheel is divided into n pies, where n is the number of individuals in the population. • Each individual gets a portion of the circle which is proportional to its fitness value. • Two implementations of fitness proportionate selection are possible − • Roulette Wheel Selection • Stochastic Universal Sampling (SUS) • Other selection methods • Rank selection • Tournament selection
• 76. Roulette Wheel Selection • A fixed point is chosen on the wheel circumference and the wheel is rotated. • The region of the wheel that comes in front of the fixed point is chosen as the parent. • For the second parent, the same process is repeated. • Clearly, a fitter individual has a larger slice of the wheel and therefore a greater chance of landing in front of the fixed point when the wheel is rotated, so the probability of choosing an individual depends directly on its fitness. • Implementation-wise, we use the following steps (sketched below) − • Calculate S, the sum of all fitnesses. • Generate a random number r between 0 and S. • Starting from the top of the population, keep adding fitnesses to the partial sum P until P ≥ r. • The individual whose fitness makes P reach r is the chosen individual.
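A Python sketch of these roulette-wheel steps (function and variable names are my own):

```python
import random

def roulette_select(population, fitnesses):
    S = sum(fitnesses)            # the sum of all fitnesses
    r = random.uniform(0, S)      # a random number between 0 and S
    P = 0.0                       # the running partial sum
    for individual, f in zip(population, fitnesses):
        P += f
        if P >= r:                # P has just reached r
            return individual
    return population[-1]         # guard against rounding error

pop = ["A", "B", "C"]
fit = [5.0, 3.0, 2.0]             # "A" is selected about half the time
print(roulette_select(pop, fit))
```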
  • 77. Stochastic Universal Sampling (SUS) • Similar to Roulette wheel selection, however instead of having just one fixed point, we have multiple fixed points. • All the parents are chosen in just one spin of the wheel. • It encourages the highly fit individuals to be chosen at least once. • Fitness proportionate selection methods don’t work for cases where the fitness can take a negative value.
  • 78. Tournament Selection • In K-Way tournament selection, we select K individuals from the population at random and select the best out of these to become a parent. • The same process is repeated for selecting the next parent. • Tournament Selection is also extremely popular as it can even work with negative fitness values.
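A sketch of K-way tournament selection (toy individuals whose fitness is the value itself; note that negative fitnesses are fine):

```python
import random

def tournament_select(population, fitness, k=3):
    # Sample k individuals at random and keep the fittest.
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

pop = [-4, -1, -7, -2, -9]
print(tournament_select(pop, fitness=lambda x: x))
```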
• 79. Rank Selection • Rank selection also works with negative fitness values and is mostly used when the individuals in the population have very close fitness values (this usually happens near the end of the run). • With close fitness values, fitness proportionate selection gives each individual an almost equal share of the pie, so every individual, no matter how fit relative to the others, has approximately the same probability of being selected as a parent; this loss of selection pressure towards fitter individuals makes the GA prone to poor parent selections. • Rank selection therefore removes the raw fitness value from the selection step: every individual in the population is ranked according to its fitness, and selection depends on each individual’s rank rather than its fitness, with higher-ranked individuals preferred over lower-ranked ones.
• 80. Rank Selection example:
Chromosome   Fitness Value   Rank
A            8.1             1
B            8.0             4
C            8.05            2
D            7.95            6
E            8.02            3
F            7.99            5
  • 81. Random Selection • In this strategy we randomly select parents from the existing population. • There is no selection pressure towards fitter individuals and therefore this strategy is usually avoided.
• 82. Crossover • The crossover operator is analogous to reproduction and biological crossover. • In crossover, more than one parent is selected and one or more offspring are produced from the parents’ genetic material. • A crossover point is chosen at random from within the genes, and offspring are created by exchanging the genes of the parents up to the crossover point. • Crossover is usually applied in a GA with a high probability, pc. • Crossover Operators • One Point Crossover • In one-point crossover, a random crossover point is selected and the tails of the two parents are swapped to get new offspring.
• 83. Crossover operator(cont..) • Multi Point Crossover • Multi-point crossover is a generalization of one-point crossover, wherein alternating segments are swapped to get new offspring. • Uniform Crossover • In uniform crossover, we don’t divide the chromosome into segments; rather, we treat each gene separately. • In this, we essentially flip a coin for each gene to decide whether it will come from the first or the second parent. • We can also bias the coin towards one parent, so that the child has more genetic material from that parent. Sketches of one-point and uniform crossover follow.
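Sketches of one-point and uniform crossover on bit-list parents (my own illustration):

```python
import random

def one_point(p1, p2):
    point = random.randint(1, len(p1) - 1)  # random crossover point
    # Swap the tails of the two parents.
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform(p1, p2, bias=0.5):
    # Flip a (possibly biased) coin for each gene independently.
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < bias:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return c1, c2

a, b = [0] * 8, [1] * 8
print(one_point(a, b))
print(uniform(a, b))
```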
• 84. Mutation • In some of the newly formed offspring, genes are subjected to mutation with a low random probability; for binary strings, this means some of the bits can be flipped. • Mutation maintains diversity within the population and prevents premature convergence. • Some of the ways of mutation are: • Flipping: changing a 0 to 1 or a 1 to 0. • Interchanging: two random positions are chosen and their values are interchanged. • Reversing: a random position is chosen and the bits next to it are reversed.
• 85. Types of mutation • Flip mutation − Performed when we use a binary encoding: a randomly selected bit of a chromosome is flipped. • Swap mutation − Performed when chromosomes are encoded as permutations of a given set of elements: the alleles of two randomly selected genes are swapped. • Random reinitialization − Very similar to flip mutation, but used when the chromosome is encoded using discrete values/integers. For example, if a gene’s value can be any integer between -5 and 5, we choose a gene at random and reinitialize it with any integer from that range. Sketches of all three follow.
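Sketches of the three mutation types (my own illustration; the probabilities and ranges are made up):

```python
import random

def flip_mutation(chrom, pm=0.05):
    # Binary encoding: flip each bit independently with probability pm.
    return [b ^ 1 if random.random() < pm else b for b in chrom]

def swap_mutation(chrom):
    # Permutation encoding: swap the alleles of two random genes.
    c = chrom[:]
    i, j = random.sample(range(len(c)), 2)
    c[i], c[j] = c[j], c[i]
    return c

def random_reset_mutation(chrom, lo=-5, hi=5, pm=0.1):
    # Integer encoding: reinitialize a gene with a random value in [lo, hi].
    return [random.randint(lo, hi) if random.random() < pm else g
            for g in chrom]

print(flip_mutation([0, 1, 1, 0, 1]))
print(swap_mutation([3, 1, 4, 1, 5]))
print(random_reset_mutation([2, -3, 0, 4]))
```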
• 86. When To Stop a Genetic Algorithm • Best Individual Convergence: the algorithm is stopped when the best (minimum) fitness level drops below the convergence value; this leads to faster convergence. • Worst Individual Convergence: the algorithm is stopped when even the least fit individuals in the population attain a fitness below the convergence value. This maintains a minimum fitness standard in the population: the best individual is not guaranteed, but every individual present meets the minimum fitness. • Sum of Fitness: the search is stopped when the sum of fitnesses is less than or equal to the convergence value; this guarantees that the whole population is within the fitness range. • Median Fitness: the search is stopped when at least half of the individuals in the population are better than or equal to the convergence value. • Other common stopping conditions are: • a specified number of generations have evolved; • the specified running time for the algorithm has been reached; • the fitness of the population no longer changes with further iterations.
• 87. Application areas • Transport: genetic algorithms are used in the travelling salesman problem to develop transport plans that reduce travel cost and time; they are also used to develop efficient ways of delivering products. • DNA Analysis: used to establish DNA structure from spectrometric information. • Multimodal Optimization: used to provide multiple optimal solutions in multimodal optimization problems. • Aircraft Design: used to develop parametric aircraft designs, where the parameters of the aircraft are modified and upgraded to provide better designs. • Economics: used to describe various economic models, such as game theory, the cobweb model, asset pricing, and schedule optimization.