Markov Models
• A Markov model is a finite state machine with N distinct states that begins (at time t = 1) in some initial state.
• It moves from the current state to the next state according to the transition probabilities associated with the current state.
• Such a system is called a finite or discrete Markov model.
Markov Property
• Markov property: the current state of the system depends only on the previous state of the system.
• The state of the system at time t + 1 depends only on the state of the system at time t.
• X(t=1) → X(t=2) → X(t=3) → X(t=4) → X(t=5)
• Set of states: $\{s_1, s_2, \dots, s_N\}$
• The process moves from one state to another, generating a sequence of states $s_{i_1}, s_{i_2}, \dots, s_{i_k}, \dots$
• Markov chain property: the probability of each subsequent state depends only on the previous state:
$P(s_{i_k} \mid s_{i_1}, s_{i_2}, \dots, s_{i_{k-1}}) = P(s_{i_k} \mid s_{i_{k-1}})$
• To define a Markov model, the following probabilities have to be specified: transition probabilities $a_{ij} = P(s_i \mid s_j)$ and initial probabilities $\pi_i = P(s_i)$
Markov Models
[State diagram: two states, 'Rain' and 'Dry', with self-loops and cross-transitions labelled 0.3, 0.7, 0.2, 0.8]
• Two states : ‘Rain’ and ‘Dry’.
• Transition probabilities: P(‘Rain’|‘Rain’)=0.3 , P(‘Dry’|‘Rain’)=0.7 ,
P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8
• Initial probabilities: say P(‘Rain’)=0.4 , P(‘Dry’)=0.6 .
Example of Markov Model
• By the Markov chain property, the probability of a state sequence can be found by the following formula.
• Suppose we want to calculate the probability of a sequence of states in our example, {'Dry', 'Dry', 'Rain', 'Rain'}:
P({'Dry', 'Dry', 'Rain', 'Rain'}) =
P('Rain'|'Rain') P('Rain'|'Dry') P('Dry'|'Dry') P('Dry') =
= 0.3 * 0.2 * 0.8 * 0.6 = 0.0288
Calculation of sequence probability
$P(s_{i_1}, s_{i_2}, \dots, s_{i_k}) = P(s_{i_k} \mid s_{i_1}, s_{i_2}, \dots, s_{i_{k-1}}) \, P(s_{i_1}, s_{i_2}, \dots, s_{i_{k-1}})$
$= P(s_{i_k} \mid s_{i_{k-1}}) \, P(s_{i_1}, s_{i_2}, \dots, s_{i_{k-1}}) = \dots$
$= P(s_{i_k} \mid s_{i_{k-1}}) \, P(s_{i_{k-1}} \mid s_{i_{k-2}}) \cdots P(s_{i_2} \mid s_{i_1}) \, P(s_{i_1})$
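For concreteness, here is a minimal Python sketch of this chain-rule calculation for the Rain/Dry example above (the dictionary layout and function name are our own):

```python
# Probability of a state sequence in the Rain/Dry Markov model.
initial = {"Rain": 0.4, "Dry": 0.6}
transition = {  # transition[prev][curr] = P(curr | prev)
    "Rain": {"Rain": 0.3, "Dry": 0.7},
    "Dry":  {"Rain": 0.2, "Dry": 0.8},
}

def sequence_probability(states):
    """P(s1, ..., sk) = P(s1) * product of P(s_t | s_{t-1})."""
    p = initial[states[0]]
    for prev, curr in zip(states, states[1:]):
        p *= transition[prev][curr]
    return p

print(sequence_probability(["Dry", "Dry", "Rain", "Rain"]))  # 0.6*0.8*0.2*0.3 = 0.0288
```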
Discrete Markov processes
• Consider an example through which the various elements that constitute a discrete homogeneous
Markov process can be introduced.
• System and states
• Let us consider a highly simplified model of the different states a stock-market is in, in a given week. We assume that
there are only three possible states:
 S1 : Bull market trend
 S2 : Bear market trend
 S3 : Stagnant market trend
• Transition probabilities
 Week after week, the stock-market moves from one state to another state.
 From previous data, it has been estimated that there are certain probabilities associated with these movements.
 These probabilities are called transition probabilities.
• Markov assumption
 We assume that the following statement (called Markov assumption or Markov property) regarding transition
probabilities is true:
 Let the weeks be counted as 1, 2, . . . and let an arbitrary week be the t-th week.
 Then, the state in week t + 1 depends only on the state in week t, regardless of the states in the previous weeks.
 This corresponds to saying that, given the present state, the future is independent of the past.
• Homogeneity assumption
 To simplify the computations, we assume that the following property, called the homogeneity assumption, is also true.
 The probability that the stock market is in a particular state in a particular week t + 1 given that it is in a particular state in week t, is independent of
t.
• Representation of transition probabilities
 Let the probability that a bull week is followed by another bull week be 90%, by a bear week be 7.5%, and by a stagnant week be 2.5%.
 Similarly, let the probability that a bear week is followed by a bull week be 15%, by a bear week be 80%, and by a stagnant week be 5%.
 Finally, let the probability that a stagnant week is followed by a bull week be 25%, by a bear week be 25%, and by a stagnant week be 50%.
 The transition probabilities can be represented in two ways:
• (a) The states and the state transition probabilities can be represented diagrammatically as
• The state transition probabilities can also be represented by a matrix called the state transition matrix. Let us label the states as “1 = bull”, “2 = bear”
and “3 = stagnant” and consider the matrix. In this matrix, the element in the i-th row, j-th column represents the probability that the market in state
i is followed by market in state j.
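As a concrete rendering, here is that state transition matrix in NumPy (the variable name P follows the slides; the row-sum assertion is our own sanity check):

```python
import numpy as np

# State transition matrix for the stock-market example,
# with states ordered as 1 = bull, 2 = bear, 3 = stagnant.
# Entry (i, j) is the probability that state i is followed by state j.
P = np.array([
    [0.90, 0.075, 0.025],  # bull     -> bull, bear, stagnant
    [0.15, 0.80,  0.05 ],  # bear     -> bull, bear, stagnant
    [0.25, 0.25,  0.50 ],  # stagnant -> bull, bear, stagnant
])
assert np.allclose(P.sum(axis=1), 1.0)  # every row sums to 1
```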
Cont..
• Initial probabilities
• The initial probabilities are the probabilities that the stock-market is in a particular state
initially.
• These are denoted by π1, π2, π3: π1 is the probability that the stock-market is in bull state
initially; similarly, π2 and π3.
• The values of these probabilities can be presented as a vector:
• The discrete Markov process
• The functioning of the stock-market over the three states S1, S2, S3, with the assumption
that the Markov property is true, the transition probabilities given by the matrix P, and the
initial probabilities given by the vector Π, constitutes a discrete Markov process.
• Since we also assume the homogeneity property for the transition probabilities is true, it is a
homogeneous discrete Markov process.
Probabilities for future states
• The elements of the row vector Π^T P represent the probabilities that the
stock-market is in the bull state, the bear state and the stagnant state
respectively in the second week.
• In general, the elements of the row vector Π^T P^n represent the
probabilities that the stock-market is in the bull state, the bear state
and the stagnant state respectively in the (n + 1)th week.
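A short sketch of this computation, reusing the matrix P from the sketch above (the initial vector values here are illustrative, since the slides give Π only in a figure):

```python
import numpy as np
from numpy.linalg import matrix_power

pi = np.array([0.5, 0.25, 0.25])   # illustrative initial probabilities (pi1, pi2, pi3)
n = 4
probs = pi @ matrix_power(P, n)    # row vector Pi^T P^n
print(probs)                       # P(bull), P(bear), P(stagnant) in week n + 1
```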
Example-2
Discrete Markov processes: General case
• A Markov process is a random process indexed by time, and with the
property that the future is independent of the past, given the present.
• The time space may be discrete taking the values 1, 2, . . . or
continuous taking any nonnegative real number as a value.
• Here, we consider only discrete time Markov processes.
In this matrix, the element in the i-th row, j-th column
represents the probability that the system in state Si
moves to state Sj . Here, in the state transition matrix A,
the sum of the elements in every row is 1.
Cont..
Probability for an observation sequence
• Observable Markov model
 The discrete Markov process is also called an observable Markov model or observable discrete Markov process.
 It is so called because the state of the system at any time t can be directly observed.
 If the state of the system cannot be directly observed, the system is called a hidden Markov model.
• Probability for an observation sequence
 In an observable Markov model, the states are observable.
 At any time t we know qt, and as the system moves from one state to another, we get an observation sequence that
is a sequence of states.
 The output of the process is the set of states at each instant of time where each state corresponds to a physical
observable event.
 Let O be an arbitrary observation sequence of length T. Let us consider a particular observation sequence
Learning the parameters (A and Π)
• Consider a homogeneous discrete Markov process with transition
matrix A and initial probability vector Π, where A and Π are the
parameters of the process.
• The following procedure may be applied to learn these parameters (a maximum-likelihood sketch follows).
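The slides give the procedure in a figure; a standard maximum-likelihood sketch (our own, assuming fully observed state sequences) estimates Π from starting states and A from transition counts:

```python
from collections import Counter

def estimate_parameters(sequences, states):
    """pi_i = fraction of sequences that start in state i;
    a_ij = (# transitions i -> j) / (# transitions out of i)."""
    starts = Counter(seq[0] for seq in sequences)
    trans = Counter()
    for seq in sequences:
        trans.update(zip(seq, seq[1:]))           # count consecutive state pairs
    pi = {s: starts[s] / len(sequences) for s in states}
    A = {}
    for i in states:
        out = sum(trans[(i, j)] for j in states)  # total transitions leaving i
        A[i] = {j: trans[(i, j)] / out if out else 0.0 for j in states}
    return A, pi

A, pi = estimate_parameters([["Dry", "Dry", "Rain"], ["Rain", "Dry", "Dry"]],
                            states=["Rain", "Dry"])
```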
EXAMPLE
Hidden Markov models
• Hidden Markov Models (HMMs) are probabilistic models in which the Markov model underlying the data is hidden or unknown.
• More specifically, we only know the observational data, not information about the states.
• An HMM is determined by three model parameters:
• The start probabilities: a vector containing, for each state, the probability of it being the first state of the sequence.
• The state transition probabilities: a matrix consisting of the probabilities of transitioning from state Si to state Sj.
• The observation probabilities: the likelihood of a certain observation, y, if the model is in state Si.
Hidden Markov models
(Probabilistic finite state automata)
• There are scenarios where the states cannot be directly observed.
• For these we need an extension, i.e., Hidden Markov Models.
[Diagram: four states 1–4 with self-transition probabilities a11, a22, a33, a44, forward transitions a12, a23, a34, and observation probabilities b11, b12, b13, b14 from state 1]
• aij are state transition probabilities.
• bik are observation (output) probabilities.
• b11 + b12 + b13 + b14 = 1,
• b21 + b22 + b23 + b24 = 1.
Hidden Markov model recognition
• For a given model M = {A, B, π} and a given state sequence Q1 Q2 Q3 … QT, the probability of an observation sequence O1 O2 O3 … OT is
P(O|Q,M) = bQ1O1 bQ2O2 bQ3O3 … bQTOT
• For a given hidden Markov model M = {A, B, π}, the probability of a state sequence Q1 Q2 Q3 … QT (the initial probability of Q1 is taken to be πQ1) is
P(Q|M) = πQ1 aQ1Q2 aQ2Q3 aQ3Q4 … aQT-1QT
Hidden Markov model recognition
• So for a given HMM M, the probability of an observed sequence O1 O2 O3 … OT is obtained by summing over all possible state sequences Q:
P(O|M) = Σ_Q P(O|Q,M) P(Q|M), where
P(Q|M) = πQ1 aQ1Q2 aQ2Q3 aQ3Q4 … aQT-1QT
P(O|Q,M) = bQ1O1 bQ2O2 bQ3O3 … bQTOT
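A minimal brute-force sketch of this summation (dict-based model layout as in the earlier examples; exponential in the sequence length, so for illustration only):

```python
from itertools import product

def observation_probability(A, B, pi, states, obs):
    """P(O|M) = sum over all state sequences Q of P(Q|M) * P(O|Q,M)."""
    total = 0.0
    for Q in product(states, repeat=len(obs)):
        p = pi[Q[0]] * B[Q[0]][obs[0]]                # pi_Q1 * b_Q1O1
        for t in range(1, len(obs)):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][obs[t]]  # a_Q(t-1)Qt * b_QtOt
        total += p
    return total
```

In practice the forward algorithm computes the same quantity efficiently, without enumerating every state sequence.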
[Diagram: hidden states 'Low' and 'High' with transition probabilities 0.3, 0.7, 0.2, 0.8, each emitting the observations 'Rain' and 'Dry' with probabilities 0.6 and 0.4]
Example of Hidden Markov Model
Generalized Hidden Markov model (HMM):
Main issues using HMMs:
Solutions to basic problems
• Problem 1 is solved using the Forwards-Backwards algorithms.
• Problem 2 is solved by the Viterbi algorithm and posterior decoding.
• Finally, Problem 3 is solved by the Baum-Welch algorithm.
Solution to the decoding problem
• Decoding problem: Viterbi algorithm
• In this algorithm we go through the observations from start to end, inferring a state of the hidden machine for each observation.
• We also record the overall probability, the Viterbi path (the sequence of states), and the Viterbi probability (the probability of the observed state sequence along the Viterbi path).
• The probability of a possible step, given its corresponding observation, is the transition probability times the emission probability.
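A compact sketch of the Viterbi algorithm under the same dict-based model layout as above (function and variable names are our own):

```python
def viterbi(A, B, pi, states, obs):
    """Most likely hidden state path for an observation sequence."""
    # delta[s] = probability of the best path ending in state s so far
    delta = {s: pi[s] * B[s][obs[0]] for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            r = max(prev, key=lambda q: prev[q] * A[q][s])  # best predecessor
            ptr[s] = r
            delta[s] = prev[r] * A[r][s] * B[s][o]          # transition * emission
        back.append(ptr)
    last = max(delta, key=delta.get)       # best final state
    path = [last]
    for ptr in reversed(back):             # follow back-pointers to the start
        path.append(ptr[path[-1]])
    return list(reversed(path)), delta[last]
```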
Applications of HMM
• Cryptanalysis
• Speech Recognition
• Pattern Recognition
• Activity Recognition
• Machine Translation
Support Vector Machine
• Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms, used for classification as well as regression problems.
• However, it is primarily used for classification problems in machine learning.
• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put a new data point in the correct category in the future.
• This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane.
• These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.
SVM Example
• Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm.
• We first train our model with lots of images of cats and dogs so that it can learn the different features of cats and dogs, and then we test it with this strange creature.
• The SVM creates a decision boundary between these two classes of data (cat and dog) and chooses the extreme cases (support vectors), so it will see the extreme cases of cat and dog.
• On the basis of the support vectors, it will classify the creature as a cat.
• The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Example
Types of SVM
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called the Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called the Non-linear SVM classifier.
Hyperplane and Support Vectors in the
SVM algorithm
• Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-
dimensional space, but we need to find out the best decision boundary that helps to classify
the data points. This best boundary is known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the number of features present in the dataset: if there are 2 features, the hyperplane will be a straight line, and if there are 3 features, the hyperplane will be a 2-dimensional plane.
• We always create the hyperplane that has the maximum margin, i.e., the maximum distance between the hyperplane and the nearest data points of either class.
• Support Vectors:
• The data points or vectors that are closest to the hyperplane and which affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.
How does SVM work?
• Linear SVM:
• The working of the SVM algorithm can be
understood by using an example.
• Suppose we have a dataset that has two tags
(green and blue), and the dataset has two
features x1 and x2.
• We want a classifier that can classify the
pair(x1, x2) of coordinates in either green or
blue.
• Since it is a 2-D space, we can easily separate these two classes with just a straight line.
• But there can be multiple lines that can separate these classes. Consider image 2.
• Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane.
• The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors.
• The distance between the support vectors and the hyperplane is called the margin.
• The goal of SVM is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.
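A minimal scikit-learn sketch of this (assuming scikit-learn is installed; the toy data points are our own):

```python
import numpy as np
from sklearn import svm

# Toy dataset: two tags, features (x1, x2)
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = [0, 0, 0, 1, 1, 1]           # 0 = green, 1 = blue

clf = svm.SVC(kernel="linear")   # maximal-margin linear classifier
clf.fit(X, y)
print(clf.support_vectors_)      # the closest points that fix the hyperplane
print(clf.predict([[4, 4]]))     # classify a new (x1, x2) pair
```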
Cont..
• Non-Linear SVM:
• If data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line. Consider image 1.
• So to separate these data points, we need to add one more dimension. For linear data we have used the two dimensions x and y, so for non-linear data we will add a third dimension z, calculated as:
• z = x² + y²
• SVM will then divide the datasets into classes in the following way. Consider image 2.
• Since we are in 3-D space, it looks like a plane parallel to the x-axis. If we convert it to 2-D space with z = 1, it will become as shown in image 3.
• Hence we get a circumference of radius 1 in the case of non-linear data.
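A small sketch of this z = x² + y² trick (our own toy data; points inside the unit circle form one class):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(int)  # circle: not linearly separable in 2-D

Z = np.c_[X, X[:, 0]**2 + X[:, 1]**2]          # add the third dimension z = x^2 + y^2
clf = SVC(kernel="linear").fit(Z, y)           # now a plane separates the classes
print(clf.score(Z, y))                         # accuracy close to 1.0
```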
Formulation of the problem
The SVM classifier
• The solution of the SVM problem gives us a classifier for
classifying unclassified data instances. This is known as the
SVM classifier for a given dataset.
Midpoint = ½ [(2 + 4), (1 + 3)] = (3, 2)
Genetic algorithm
Introduction to Optimization
• Optimization is the process of making something better, based on a set of inputs and a set of outputs.
• Optimization refers to finding the values of inputs in such a way that we get the “best”
output values.
• The definition of “best” varies from problem to problem, but in mathematical terms, it
refers to maximizing or minimizing one or more objective functions, by varying the input
parameters.
• The set of all possible solutions or values which the inputs can take make up the search
space.
• In this search space, a point or a set of points lies which gives the optimal solution.
• The aim of optimization is to find that point or set of points in the search space.
Genetic Algorithms
• It is a Search-based optimization technique based on the principles of Genetics and Natural
Selection.
• Frequently used to find optimal or near-optimal solutions to difficult problems which otherwise
would take a lifetime to solve.
• Based on the evolutionary idea of natural selection and genetics.
• Designed to embody the principle of "survival of the fittest".
• Perform a randomized search to solve optimization problems.
• GAs use historical information from previous generations to direct their search toward better regions of the search space.
• GAs are adaptive heuristic search algorithms, i.e., the algorithms follow an iterative pattern that changes with time.
• They resemble reinforcement learning in that feedback is given without specifying the correct path to follow.
• The feedback can either be positive or negative.
What are Genetic Algorithms?
• "Genetics" is derived from the Greek word "genesis", which means to grow.
• Genetics decides the heredity factors, resemblances, and differences between offspring in the process of evolution.
• Genetic Algorithms are also derived from natural evolution.
• GAs are a subset of a much larger branch of computation known as Evolutionary Computation.
• Developed by John Holland and his students and colleagues at the University of Michigan.
• In GAs, we have a pool or a population of possible solutions to the given problem.
• These solutions then undergo recombination and mutation (like in natural genetics), producing new
children, and the process is repeated over various generations.
• Each individual (or candidate solution) is assigned a fitness value (based on its objective function
value) and the fitter individuals are given a higher chance to mate and yield more “fitter” individuals.
• This is in line with the Darwinian Theory of “Survival of the Fittest”.
• In this way we keep “evolving” better individuals or solutions over generations, till we reach a
stopping criterion.
Why Use Genetic Algorithms
• GAs are more robust algorithms that can be used for various
optimization problems.
• These algorithms do not deviate easily in the presence of noise,
unlike other AI algorithms.
• GAs can be used to search large or multimodal spaces.
Advantages of GAs
• Does not require any derivative information (which may not be
available for many real-world problems).
• Is faster and more efficient as compared to the traditional methods.
• Has very good parallel capabilities.
• Optimizes both continuous and discrete functions and also multi-
objective problems.
• Provides a list of “good” solutions and not just a single solution.
• Always gets an answer to the problem, which gets better over time.
• Useful when the search space is very large and there are a large
number of parameters involved.
Limitations of GAs
• GAs are not suited for all problems, especially problems which
are simple and for which derivative information is available.
• Fitness value is calculated repeatedly which might be
computationally expensive for some problems.
• Being stochastic, there are no guarantees on the optimality or
the quality of the solution.
• If not implemented properly, the GA may not converge to the
optimal solution.
Terminology In GA
• Population: It is a group of individuals. The population includes the number of individuals being tested, search space
information, and the phenotype parameters. Generally, the population is randomly initialized.
• Individuals: An individual is a single solution in the population. An individual has a set of parameters called genes. Genes combine to form chromosomes.
• Fitness: The fitness gives the quality of an individual's phenotype with respect to the problem.
• The fitness function tells how close the solution is to the optimal solution. Fitness function is determined in many ways such as the sum of all parameters
related to the problem – Euclidean distance, etc. There is no rule to evaluate fitness function.
• Genotype − A full combination of genes in an individual is called the genotype.
• Genotype is the population in the computation space.
• In the computation space, the solutions are represented in a way which can be easily understood and manipulated
using a computing system.
• Phenotype − A set of genotypes in a decoded form is called the phenotype. Phenotype is the population in the
actual real world solution space in which solutions are represented in a way they are represented in real world situations.
Terminology(cont..)
• Chromosomes − A chromosome is one such solution to the given problem.
• Gene − A gene is one element position of a chromosome. They are represented by a bit (0 or 1) string of
random length.
• Allele − It is the value a gene takes for a particular chromosome.
• Gene Pool: The set of all possible combinations of genes (i.e., all alleles) in a population is called the gene pool.
• Genome: The set of genes of a species is called a genome.
• Locus: Each gene has a position in a genome that is called locus.
• Decoding and Encoding −
• For simple problems, the phenotype and genotype spaces are the same.
• However, in most of the cases, the phenotype and genotype spaces are different.
• Decoding is a process of transforming a solution from the genotype to the phenotype space, while encoding is a
process of transforming from the phenotype to genotype space.
• Decoding should be fast as it is carried out repeatedly in a GA during the fitness value calculation.
• Genetic Operators − These alter the genetic composition of the offspring. These include crossover,
mutation, selection, etc.
Correlation Of A Chromosome With GA
• The human body has chromosomes that are made of genes.
• A set of all genes of a specific species is called the genome.
• In living beings, the genomes are stored in various
chromosomes while in GA all genes are stored in the same
chromosome.
Basic Structure and working of GA
• We start with an initial population (which
may be generated at random or seeded by
other heuristics), select parents from this
population for mating.
• Apply crossover and mutation operators on the parents to generate new offspring.
• And finally these offspring replace the existing individuals in the population and the process repeats.
• In this way genetic algorithms actually try to mimic natural evolution to some extent.
A simple genetic algorithm is:
• Start with the population created randomly.
• Calculate the fitness function of each chromosome.
• Repeat the following steps until n offspring are created (see the sketch after this list):
• Selection: select a pair of chromosomes from the population.
• Crossover: cross over the pair with probability pc to form offspring.
• Mutation: mutate the offspring with probability pm.
• Replace the original population with the new population and go to step 2.
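A minimal sketch of the simple GA above (tournament-of-two selection is our own choice where the slides leave the selection method open; names and defaults are illustrative):

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, pc=0.9, pm=0.01, generations=100):
    """Simple GA: selection, one-point crossover (prob. pc), bit-flip mutation (prob. pm)."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            # Selection: the fitter of two random individuals, twice
            p1 = max(random.sample(pop, 2), key=fitness)
            p2 = max(random.sample(pop, 2), key=fitness)
            c1, c2 = p1[:], p2[:]
            if random.random() < pc:                   # one-point crossover
                pt = random.randint(1, n_bits - 1)
                c1, c2 = p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]
            for child in (c1, c2):                     # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < pm:
                        child[i] = 1 - child[i]
                new_pop.append(child)
        pop = new_pop[:pop_size]                       # generational replacement
    return max(pop, key=fitness)

best = genetic_algorithm(fitness=sum, n_bits=16)       # OneMax: maximize the number of 1s
print(best, sum(best))
```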
Genotype Representation(1/3)
• Binary Representation
 Simplest and most widely used representation in GAs.
 The genotype consists of bit strings.
 For some problems when the solution space consists of Boolean decision variables
– yes or no, the binary representation is natural.
 Take, for example, the 0/1 Knapsack Problem. If there are n items, we can represent a solution by a binary string of n elements, where the x-th element tells whether item x is picked (1) or not (0).
 Other problems, such as those dealing with numbers, can also be represented in binary.
 The problem with this kind of encoding is that
 different bits have different significance, and therefore mutation and crossover operators can have undesired consequences.
 This can be resolved to some extent by using Gray coding, as a change in one bit does not have a massive effect on the solution (see the sketch below).
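A short sketch of the standard binary/Gray conversion (function names our own):

```python
def to_gray(n):
    """Binary -> Gray: adjacent integers differ in exactly one bit."""
    return n ^ (n >> 1)

def from_gray(g):
    """Gray -> binary (inverse of to_gray)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

print([format(to_gray(i), "03b") for i in range(8)])
# ['000', '001', '011', '010', '110', '111', '101', '100']
```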
Genotype Representation(2/3)
• Real Valued Representation
• For problems where we want to define the genes using continuous rather than discrete
variables, the real valued representation is the most natural.
• The precision of these real valued or floating point numbers is, however, limited by the computer's floating-point representation.
• Integer Representation
• For discrete valued genes, we cannot always limit the solution space to binary ‘yes’ or
‘no’.
• For example, if we want to encode the four directions – North, South, East and West – we can encode them as {0, 1, 2, 3}. In such cases, integer representation is desirable.
Genotype Representation(3/3)
• Permutation Representation
• In many problems, the solution is represented by an order of elements.
In such cases permutation representation is the most suited.
• A classic example of this representation is the travelling salesman
problem (TSP). In this the salesman has to take a tour of all the cities,
visiting each city exactly once and come back to the starting city.
Phases of GA
• Initial population
• Fitness function
• Selection
• Crossover
• Mutation
Population
• The process begins with a set of individuals which is called a Population.
• Each individual is a solution to the problem you want to solve.
• An individual is characterized by a set of parameters (variables) known as Genes.
• Genes are joined into a string to form a Chromosome (solution).
• It can also be defined as a set of chromosomes.
• In a genetic algorithm, the set of genes of an individual is represented using a string, in terms of an alphabet.
Usually, binary values are used (string of 1s and 0s). We say that we encode the genes in a chromosome.
• There are several things to be kept in mind when dealing with GA population −
• The diversity of the population should be maintained otherwise it might lead to premature convergence.
• The population size should not be kept very large as it can cause a GA to slow down, while a smaller
population might not be enough for a good mating pool.
• Therefore, an optimal population size needs to be decided by trial and error.
• The population is usually defined as a two-dimensional array (population size × chromosome length).
Population Models
• Steady State
• In steady state GA, we generate one or two offspring in each iteration and they replace one or two individuals from the population.
• A steady state GA is also known as Incremental GA.
• Generational
• In a generational model, we generate ‘n’ offspring, where n is the population size, and the entire population is replaced by the new one at the end of the iteration.
Fitness function
• The fitness function, simply defined, is a function which takes a candidate solution to the problem as input and produces as output how “fit” or how “good” the solution is with respect to the problem in consideration.
• Calculation of fitness value is done repeatedly in a GA and therefore it should be sufficiently fast.
• A slow computation of the fitness value can adversely affect a GA and make it exceptionally slow.
• In most cases the fitness function and the objective function are the same as the objective is to
either maximize or minimize the given objective function.
• Characteristics of fitness function −
• The fitness function should be sufficiently fast to compute.
• It must quantitatively measure how fit a given solution is or how fit individuals can be produced from the
given solution.
Fitness calculation
• Each chromosome from the population is passed to the
objective function one by one and its fitness is calculated.
• For example, the fitness of each of the randomly generated solutions in the previous step is calculated as:
• f(x1, x2) = x1² + x2²
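As a one-line illustration of this step (the sample population values are our own):

```python
population = [(0.5, 1.0), (2.0, -1.0), (1.5, 0.5)]   # randomly generated (x1, x2) pairs
fitness = [x1**2 + x2**2 for x1, x2 in population]   # f(x1, x2) = x1^2 + x2^2
print(fitness)  # [1.25, 5.0, 2.5]
```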
Parent Selection
• Parent selection is the process of selecting parents which mate and recombine to create offspring for the next generation.
• Fitness Proportionate Selection
• One of the most popular ways of parent selection.
• In this every individual can become a parent with a probability which is proportional to its fitness.
• Therefore, fitter individuals have a higher chance of mating and propagating their features to the next generation.
• Consider a circular wheel. The wheel is divided into n pies, where n is the number of individuals in the population.
• Each individual gets a portion of the circle which is proportional to its fitness value.
• Two implementations of fitness proportionate selection are possible −
• Roulette Wheel Selection
• Stochastic Universal Sampling (SUS)
• Other selection methods
• Rank selection
• Tournament selection
Roulette Wheel Selection
• A fixed point is chosen on the wheel circumference as
shown and the wheel is rotated.
• The region of the wheel which comes in front of the fixed
point is chosen as the parent.
• For the second parent, the same process is repeated.
• It is clear that a fitter individual has a greater pie on the
wheel and therefore a greater chance of landing in front of
the fixed point when the wheel is rotated.
• Therefore, the probability of choosing an individual depends
directly on its fitness.
• Implementation-wise, we use the following steps (see the sketch after this list):
• Calculate S = the sum of all fitnesses.
• Generate a random number r between 0 and S.
• Starting from the top of the population, keep adding the fitnesses to the partial sum P, till P ≥ r.
• The individual for which P first reaches r is the chosen individual.
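A direct sketch of those steps in Python (function name our own):

```python
import random

def roulette_select(population, fitnesses):
    """Pick one parent with probability proportional to its fitness."""
    S = sum(fitnesses)                 # step 1: total fitness
    r = random.uniform(0, S)           # step 2: random point on the wheel
    partial = 0.0
    for individual, f in zip(population, fitnesses):
        partial += f                   # step 3: accumulate fitnesses
        if partial >= r:               # step 4: first individual to reach r
            return individual
    return population[-1]              # guard against floating-point rounding
```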
Stochastic Universal Sampling (SUS)
• Similar to Roulette wheel selection, however instead of having just
one fixed point, we have multiple fixed points.
• All the parents are chosen in just one spin of the wheel.
• It encourages the highly fit individuals to be chosen at least once.
• Fitness proportionate selection methods don’t work for cases where
the fitness can take a negative value.
Tournament Selection
• In K-Way tournament selection, we select K individuals from the population at random and
select the best out of these to become a parent.
• The same process is repeated for selecting the next parent.
• Tournament Selection is also extremely popular as it can even work with negative fitness
values.
Rank Selection
• Rank Selection also works with negative fitness values and is mostly used when the
individuals in the population have very close fitness values (this happens usually at the
end of the run).
• This leads to each individual having an almost equal share of the pie (like in case of
fitness proportionate selection) as shown in the figure and hence each individual no
matter how fit relative to each other has an approximately same probability of getting
selected as a parent.
• This in turn leads to a loss in the selection pressure towards fitter individuals, making the GA make poor parent selections in such situations.
• In this, we remove the concept of a fitness value while selecting a parent.
• However, every individual in the population is ranked according to their fitness.
• The selection of the parents depends on the rank of each individual and not the fitness.
The higher ranked individuals are preferred more than the lower ranked ones.
Chromosome | Fitness Value | Rank
A | 8.1 | 1
B | 8.0 | 4
C | 8.05 | 2
D | 7.95 | 6
E | 8.02 | 3
F | 7.99 | 5
Random Selection
• In this strategy we randomly select parents from the existing
population.
• There is no selection pressure towards fitter individuals and
therefore this strategy is usually avoided.
Crossover
• The crossover operator is analogous to reproduction and biological crossover.
• A crossover point is chosen at random from within the genes.
• Offspring are created by exchanging the genes of the parents among themselves until the crossover point is reached.
• In crossover, more than one parent is selected and one or more offspring are produced using the genetic material of the parents.
• Crossover is usually applied in a GA with a high probability – pc.
• Crossover Operators
• One Point Crossover
• In one-point crossover, a random crossover point is selected and the tails of the two parents are swapped to get new offspring (see the sketch below).
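A two-line sketch of one-point crossover (list-based chromosomes; names our own):

```python
import random

def one_point_crossover(p1, p2):
    """Swap the tails of two parents at a random crossover point."""
    pt = random.randint(1, len(p1) - 1)          # point strictly inside the string
    return p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]  # exchange tails

c1, c2 = one_point_crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1])
```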
Crossover operator(cont..)
• Multi Point Crossover
• Multi-point crossover is a generalization of one-point crossover wherein alternating segments are swapped to get new offspring.
• Uniform Crossover
• In a uniform crossover, we don’t divide the chromosome into segments; rather, we treat each gene separately.
• In this, we essentially flip a coin for each gene to decide whether or not it will be included in the offspring (a sketch follows).
• We can also bias the coin towards one parent, to have more genetic material in the child from that parent.
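A sketch of uniform crossover with an optional bias (our own naming):

```python
import random

def uniform_crossover(p1, p2, bias=0.5):
    """Per-gene coin flip; bias > 0.5 keeps more genes from p1 in child 1."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < bias:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return c1, c2
```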
Mutation
• In certain new offspring formed, some of their genes can be subjected to a mutation with a low
random probability. This implies that some of the bits in the bit string can be flipped.
• Mutation occurs to maintain diversity within the population and prevent premature convergence.
• Some of the ways of mutation are:
• Flipping: Changing from 0 to 1 or 1 to 0.
• Interchanging: Two random positions are chosen, and the values are interchanged.
• Reversing: Random position is chosen and the bits next to it are reversed.
Types of mutation
• Flip mutation: This type of mutation is performed when we use a binary encoding. A randomly selected bit of a chromosome is flipped, as shown in the following diagram.
• Swap mutation: This kind of mutation is performed when we encode chromosomes as permutations of a given set of elements. In this type of mutation, the alleles of two randomly selected genes are swapped, as shown in the following diagram.
• Random initialization: This is very similar to flip mutation but is used when the chromosome is encoded using discrete values/integers. For example, if a gene's value can be any integer between -5 and 5, we choose a gene at random and reinitialize its value with any integer from the given range. It is also depicted in the following diagram. (Sketches of all three follow.)
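Minimal sketches of the three mutation types (list-based chromosomes; names our own):

```python
import random

def flip_mutation(chrom):
    """Flip one randomly selected bit of a binary chromosome."""
    chrom = chrom[:]
    i = random.randrange(len(chrom))
    chrom[i] = 1 - chrom[i]
    return chrom

def swap_mutation(chrom):
    """Swap the alleles of two randomly selected genes (for permutations)."""
    chrom = chrom[:]
    i, j = random.sample(range(len(chrom)), 2)
    chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

def random_initialization(chrom, low=-5, high=5):
    """Reinitialize a randomly chosen gene with an integer from [low, high]."""
    chrom = chrom[:]
    chrom[random.randrange(len(chrom))] = random.randint(low, high)
    return chrom
```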
When To Stop Genetic Algorithm
• Best Individual Convergence: When the minimum fitness level drops below the
convergence value, the algorithm is stopped. It leads to faster convergence.
• Worst Individual Convergence: When the least fit individuals in the population attain
minimum fitness value below the convergence, then the algorithm is stopped. In this
method, the minimum fitness standard is maintained in the population. It means that the
best individual is not guaranteed but minimum fitness value individuals will be present.
• Sum of fitness: In this method, if the sum of fitness is less than or equal to convergence
value then the search is stopped. It guarantees that all the population is within the fitness
range.
• Median Fitness: In this method, at least half of the individuals in the population will be
better than or equal to convergence value.
• Some convergence criterion or stopping condition can be:
• When a specified number of generations have evolved.
• When the specified time to run the algorithm has been met.
• When the fitness value of the population does not change further with iterations.
Application areas
• Transport: Genetic algorithms are used in the traveling salesman problem to develop transport
plans that reduce the cost of travel and the time taken. They are also used to develop an efficient
way of delivering products.
• DNA Analysis: They are used in DNA analysis to establish the DNA structure using spectrometric
information.
• Multimodal Optimization: They are used to provide multiple optimum solutions in multimodal
optimization problems.
• Aircraft Design: They are used to develop parametric aircraft designs. The parameters of the
aircraft are modified and upgraded to provide better designs.
• Economics: They are used in economics to describe various models such as game theory, the cobweb model, asset pricing, and schedule optimization.

More Related Content

PPTX
Markov Chains.pptx
PPTX
NLP_KASHK:Markov Models
PPT
Markov Chains
PPT
Markov chains1
PPTX
Hidden Markov Models
PDF
Lecture4 SIQ3003.pdf
PDF
Fundamentos de la cadena de markov - Libro
PPT
Hidden Markov Models with applications to speech recognition
Markov Chains.pptx
NLP_KASHK:Markov Models
Markov Chains
Markov chains1
Hidden Markov Models
Lecture4 SIQ3003.pdf
Fundamentos de la cadena de markov - Libro
Hidden Markov Models with applications to speech recognition

Similar to Machine learning fundamental concepts in detail (20)

PPT
Hidden Markov Models with applications to speech recognition
PDF
17-markov-chains.pdf
PPTX
Eigenstates of 2D Random Walk with Multiple Absorbing States
PPTX
State space analysis.pptx
PDF
12 Machine Learning Supervised Hidden Markov Chains
PPT
markov chain.ppt
PPTX
Lecture 6 - Marcov Chain introduction.pptx
PPTX
Hidden Markov Model
PPT
MAchin learning graphoalmodesland bayesian netorls
PPTX
Stat 2153 Stochastic Process and Markov chain
PPT
tommy shelby operation on Men United.ppt
PDF
Preparatory_questions_final_exam_DigitalElectronics1 (1).pdf
PPTX
Markov Model chains
PPTX
Stock Market Prediction using Hidden Markov Models and Investor sentiment
PPTX
Presentationhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh1.pptx
PPTX
Discrete state space model 9th &10th lecture
PPT
Markov analysis
PPTX
Hidden markov model
PPT
lecture1 (9).ppt
Hidden Markov Models with applications to speech recognition
17-markov-chains.pdf
Eigenstates of 2D Random Walk with Multiple Absorbing States
State space analysis.pptx
12 Machine Learning Supervised Hidden Markov Chains
markov chain.ppt
Lecture 6 - Marcov Chain introduction.pptx
Hidden Markov Model
MAchin learning graphoalmodesland bayesian netorls
Stat 2153 Stochastic Process and Markov chain
tommy shelby operation on Men United.ppt
Preparatory_questions_final_exam_DigitalElectronics1 (1).pdf
Markov Model chains
Stock Market Prediction using Hidden Markov Models and Investor sentiment
Presentationhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh1.pptx
Discrete state space model 9th &10th lecture
Markov analysis
Hidden markov model
lecture1 (9).ppt
Ad

Recently uploaded (20)

PDF
LDMMIA Reiki Yoga Finals Review Spring Summer
PDF
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
PPTX
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
PDF
HVAC Specification 2024 according to central public works department
PDF
Chinmaya Tiranga quiz Grand Finale.pdf
PDF
Empowerment Technology for Senior High School Guide
PPTX
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
PDF
Trump Administration's workforce development strategy
PDF
1_English_Language_Set_2.pdf probationary
PPTX
Computer Architecture Input Output Memory.pptx
DOC
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
PDF
Paper A Mock Exam 9_ Attempt review.pdf.
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PPTX
A powerpoint presentation on the Revised K-10 Science Shaping Paper
PPTX
B.Sc. DS Unit 2 Software Engineering.pptx
PPTX
Share_Module_2_Power_conflict_and_negotiation.pptx
PPTX
Introduction to pro and eukaryotes and differences.pptx
PDF
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
PDF
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
PDF
MBA _Common_ 2nd year Syllabus _2021-22_.pdf
LDMMIA Reiki Yoga Finals Review Spring Summer
David L Page_DCI Research Study Journey_how Methodology can inform one's prac...
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
HVAC Specification 2024 according to central public works department
Chinmaya Tiranga quiz Grand Finale.pdf
Empowerment Technology for Senior High School Guide
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
Trump Administration's workforce development strategy
1_English_Language_Set_2.pdf probationary
Computer Architecture Input Output Memory.pptx
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
Paper A Mock Exam 9_ Attempt review.pdf.
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
A powerpoint presentation on the Revised K-10 Science Shaping Paper
B.Sc. DS Unit 2 Software Engineering.pptx
Share_Module_2_Power_conflict_and_negotiation.pptx
Introduction to pro and eukaryotes and differences.pptx
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
MBA _Common_ 2nd year Syllabus _2021-22_.pdf
Ad

Machine learning fundamental concepts in detail

  • 1. Markov Models • A Markov model is a finite state machine with N distinct states begins at (Time t = 1) in initial state . • It moves from current state to Next state according to the transition probabilities associated with the Current state • This kind of system is called Finite or Discrete Markov model.
  • 2. Markov Property • Markov Property : The Current state of the system depends only on the previous state of the system • The State of the system at Time [ T+1 ] depends on the state of the system at time T. • Xt=1 Xt=2 Xt=3 Xt=4 Xt=5
  • 3. • Set of states: • Process moves from one state to another generating a sequence of states •Markov chain property: probability of each subsequent state depends only on what was the previous state: •To define Markov model, the following probabilities have to be specified: transition probabilities and initial probabilities Markov Models } , , , { 2 1 N s s s    , , , , 2 1 ik i i s s s ) | ( ) , , , | ( 1 1 2 1    ik ik ik i i ik s s P s s s s P  ) | ( j i ij s s P a  ) ( i i s P  
  • 4. Rain Dry 0.7 0.3 0.2 0.8 • Two states : ‘Rain’ and ‘Dry’. • Transition probabilities: P(‘Rain’|‘Rain’)=0.3 , P(‘Dry’|‘Rain’)=0.7 , P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8 • Initial probabilities: say P(‘Rain’)=0.4 , P(‘Dry’)=0.6 . Example of Markov Model
  • 5. • By Markov chain property, probability of state sequence can be found by the formula: • Suppose we want to calculate a probability of a sequence of states in our example, {‘Dry’,’Dry’,’Rain’,Rain’}. P({‘Dry’,’Dry’,’Rain’,Rain’} ) = P(‘Rain’|’Rain’) P(‘Rain’|’Dry’) P(‘Dry’|’Dry’) P(‘Dry’)= = 0.3*0.2*0.8*0.6 Calculation of sequence probability ) ( ) | ( ) | ( ) | ( ) , , , ( ) | ( ) , , , ( ) , , , | ( ) , , , ( 1 1 2 2 1 1 1 2 1 1 1 2 1 1 2 1 2 1 i i i ik ik ik ik ik i i ik ik ik i i ik i i ik ik i i s P s s P s s P s s P s s s P s s P s s s P s s s s P s s s P                 
  • 6. Discrete Markov processes • Consider the example through which the various elements that constitute a discrete homogeneous Markov process can be introduced. • System and states • Let us consider a highly simplified model of the different states a stock-market is in, in a given week. We assume that there are only three possible states:  S1 : Bull market trend  S2 : Bear market trend  S3 : Stagnant market trend • Transition probabilities  Week after week, the stock-market moves from one state to another state.  From previous data, it has been estimated that there are certain probabilities associated with these movements.  These probabilities are called transition probabilities. • Markov assumption  We assume that the following statement (called Markov assumption or Markov property) regarding transition probabilities is true:  Let the weeks be counted as 1, 2, . . . and let an arbitrary week be the t-th week.  Then, the state in week t + 1 depends only on the state in week t, regardless of the states in the previous weeks.  This corresponds to saying that, given the present state, the future is independent of the past.
  • 7. • Homogeneity assumption  To simplify the computations, we assume that the following property, called the homogeneity assumption, is also true.  The probability that the stock market is in a particular state in a particular week t + 1 given that it is in a particular state in week t, is independent of t. • Representation of transition probabilities  Let the probability that a bull week is followed by another bull week be 90%, a bear week be 7.5%, and a stagnant week be 2.5%.  Similarly, let the probability that a bear week is followed by another bull week be 15%, bear week be 80% and a stagnant week be 5%.  Finally, let the probability that a stagnant week be followed by a bull week is 25%, a bear week be 25% and a stagnant week be 50%.  The transition probabilities can be represented in two ways: • (a) The states and the state transition probabilities can be represented diagrammatically as • The state transition probabilities can also be represented by a matrix called the state transition matrix. Let us label the states as “1 = bull”, “2 = bear” and “3 = stagnant” and consider the matrix. In this matrix, the element in the i-th row, j-th column represents the probability that the market in state i is followed by market in state j.
  • 8. Cont.. • Initial probabilities • The initial probabilities are the probabilities that the stock-market is in a particular state initially. • These are denoted by π1, π2, π3: π1 is the probability that the stock-market is in bull state initially; similarly, π2 and π3. • The values of these probabilities can be presented as a vector: • The discrete Markov process • The functioning of the stock-markets with the three states S1, S2, S3 with the assumption that the Markov property is true, the transition probabilities given by the matrix P and the initial probabilities given by the vector Π constitute a discrete Markov process. • Since we also assume the homogeneity property for the transition probabilities is true, it is a homogeneous discrete Markov process.
  • 9. Probabilities for future states • The elements in this row vector represent the probabilities that the stock-market is in the bull state, the bear state and the stagnant state respectively in the second week. • In general, the elements of the row vector Π T P represent the probabilities that the stock-market is in the bull state, the bear state and the stagnant state respectively in the (n + 1)th week.
  • 11. Discrete Markov processes: General case • A Markov process is a random process indexed by time, and with the property that the future is independent of the past, given the present. • The time space may be discrete taking the values 1, 2, . . . or continuous taking any nonnegative real number as a value. • Here, we consider only discrete time Markov processes.
  • 12. In this matrix, the element in the i-th row, j-th column represents the probability that the system in state Si moves to state Sj . Here, in the state transition matrix A, the sum of the elements in every row is 1.
  • 14. Probability for an observation sequence • Observable Markov model  The discrete Markov process is also called an observable Markov model or observable discrete Markov process.  It is so called because the state of the system at any time t can be directly observed.  If the state of the system cannot be directly observed, the system is called a hidden Markov model. • Probability for an observation sequence  In an observable Markov model, the states are observable.  At any time t we know qt, and as the system moves from one state to another, we get an observation sequence that is a sequence of states.  The output of the process is the set of states at each instant of time where each state corresponds to a physical observable event.  Let O be an arbitrary observation sequence of length T. Let us consider a particular observation sequence
  • 16. Learning the parameters (A and Π) • Consider a homogeneous discrete Markov process with transition matrix A and initial probability vector Π, where A and Π are the parameters of the process. • The following procedure may be applied to learn these parameters
  • 18. Hidden Markov models • Hidden Markov Models (HMMs) are probabilistic models, it implies that the Markov Model underlying the data is hidden or unknown. • More specifically, we only know observational data and not information about the states. • HMM is determined by three model parameters; • The start probability; a vector containing the probability for the state of being the first state of the sequence. • The state transition probabilities; a matrix consisting of the probabilities of transitioning from state Si to state Sj. • The observation probability; the likelihood of a certain observation, y, if the model is in state Si.
  • 20. Hidden markov models ( Probabilistic finite state automata ) • The Scenarios where states cannot be directly observed. • We need an extension i.e, Hidden Markov Models a11 a22 a33 a44 a12 a23 a34 b11 b14 b12 b13 1 2 3 4
  • 21. • aij are state transition probabilities. • bik are observation (output) probabilities. • b11 + b12 + b13 + b14 = 1, • b21 + b22 + b23 + b24 = 1.
  • 22. Hidden markov model recognition • For a given model M = { A, B, pi } and a given state sequence Q1 Q2 Q3 … QL , the probability of an observation sequence O1 O2 O3 … OL is P(O|Q,M) = bQ1O1 bQ2O2 bQ3O3 … bQTOT • For a given hidden Markov model M = { A, B, pi} the probability of state sequence Q1 Q2 Q3 QL is (the initial probability of Q1 is taken to be pQ1) P(Q|M) = pQ1 aQ1Q2 aQ2Q3 aQ3Q4 … aQL-1QL
  • 23. Hidden markov model recognition • So for a given HMM, M the probability of an observed sequence O1O2O3 … OT is obtained by summing over all possible state sequences. P(Q|M) = pQ1 aQ1Q2 aQ2Q3 aQ3Q4 … aQT-1QT P(O|Q) = bQ1O1 bQ2O2 bQ3O3 … bQTOT
  • 24. Low High 0.7 0.3 0.2 0.8 Dry Rain 0.6 0.6 0.4 0.4 Example of Hidden Markov Model
  • 29. Solutions to basic problems • Problem 1 is solved using the Forwards-Backwards algorithms. • Problem 2 is solved by the Viterbi algorithm and posterior decoding. • Finally, Problem 3 is solved by the Baum-Welch algorithm.
  • 31. Solution to decoding problem ? • Decoding problem: Viterbi Algorithm • In this algorithm we go through the observations from start to end referring a state of hidden machine for each observation. • We also record the values of Overall Probability, Viterbi path (sequence of states) and the viterbi probability( Probability of observed state sequences in viterbi path ) • The probability of possible step given its corresponding observation is probability of transmission times the emission probability.
  • 36. Applications of HMM • Cryptanalysis • Speech Recognition • Pattern Recognition • Activity Recognition • Machine Translation
  • 37. Support Vector Machine • Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for Classification as well as Regression problems. • However, primarily, it is used for Classification problems in Machine Learning. • The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put the new data point in the correct category in the future. • This best decision boundary is called a hyperplane. • SVM chooses the extreme points/vectors that help in creating the hyperplane. • These extreme cases are called as support vectors, and hence algorithm is termed as Support Vector Machine.
  • 39. SVM Example • Suppose we see a strange cat that also has some features of dogs, so if we want a model that can accurately identify whether it is a cat or dog, so such a model can be created by using the SVM algorithm. • We will first train our model with lots of images of cats and dogs so that it can learn about different features of cats and dogs, and then we test it with this strange creature. • So as support vector creates a decision boundary between these two data (cat and dog) and choose extreme cases (support vectors), it will see the extreme case of cat and dog. • On the basis of the support vectors, it will classify it as a cat. • SVM algorithm can be used for Face detection, image classification, text categorization, etc.
  • 41. Types of SVM • Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset can be classified into two classes by using a single straight line, then such data is termed as linearly separable data, and classifier is used called as Linear SVM classifier. • Non-linear SVM: Non-Linear SVM is used for non-linearly separated data, which means if a dataset cannot be classified by using a straight line, then such data is termed as non-linear data and classifier used is called as Non-linear SVM classifier.
  • 43. Hyperplane and Support Vectors in the SVM algorithm • Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n- dimensional space, but we need to find out the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM. • The dimensions of the hyperplane depend on the features present in the dataset, which means if there are 2 features , then hyperplane will be a straight line. And if there are 3 features, then hyperplane will be a 2-dimension plane. • We always create a hyperplane that has a maximum margin, which means the maximum distance between the data points. • Support Vectors: • The data points or vectors that are the closest to the hyperplane and which affect the position of the hyperplane are termed as Support Vector. Since these vectors support the hyperplane, hence called a Support vector.
  • 44. How does SVM works? • Linear SVM: • The working of the SVM algorithm can be understood by using an example. • Suppose we have a dataset that has two tags (green and blue), and the dataset has two features x1 and x2. • We want a classifier that can classify the pair(x1, x2) of coordinates in either green or blue. • So as it is 2-d space so by just using a straight line, we can easily separate these two classes. • But there can be multiple lines that can separate these classes. Consider the image2.
  • 45. • Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called as a hyperplane. • SVM algorithm finds the closest point of the lines from both the classes. • These points are called support vectors. The distance between the vectors and the hyperplane is called as margin. • And the goal of SVM is to maximize this margin. The hyperplane with maximum margin is called the optimal hyperplane.
  • 46. Cont.. • Non-Linear SVM: • If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we cannot draw a single straight line. Consider the image1: • So to separate these data points, we need to add one more dimension. For linear data, we have used two dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated as: • z=x2 +y2 • So now, SVM will divide the datasets into classes in the following way. Consider the image2: • Since we are in 3-d Space, hence it is looking like a plane parallel to the x-axis. If we convert it in 2d space with z=1, then it will become as shown in image 3: • Hence we get a circumference of radius 1 in case of non-linear data.
  • 48. The SVM classifier • The solution of the SVM problem gives us a classifier for classifying unclassified data instances. This is known as the SVM classifier for a given dataset.
  • 54. Introduction to Optimization • The process of making something better Based on a set of inputs and a set of outputs. • Optimization refers to finding the values of inputs in such a way that we get the “best” output values. • The definition of “best” varies from problem to problem, but in mathematical terms, it refers to maximizing or minimizing one or more objective functions, by varying the input parameters. • The set of all possible solutions or values which the inputs can take make up the search space. • In this search space, a point or a set of points lies which gives the optimal solution. • The aim of optimization is to find that point or set of points in the search space.
  • 55. Genetic Algorithms • It is a Search-based optimization technique based on the principles of Genetics and Natural Selection. • Frequently used to find optimal or near-optimal solutions to difficult problems which otherwise would take a lifetime to solve. • Based on the evolutionary idea of natural selection and genetics. • Designed to encourage the theory of “survival of the fittest”. • Perform a random search to solve optimization problems. • GA uses techniques that use the previous historical information to direct their search towards optimization in the new search space. • GAs are adaptive heuristic search algorithms i.e. the algorithms follow an iterative pattern that changes with time. • It is a type of reinforcement learning where the feedback is necessary without telling the correct path to follow. • The feedback can either be positive or negative.
  • 56. What are Genetic Algorithms? • Genetics is derived from the Greek word, “genesis” that means to grow. • The genetics decides the heredity factors, resemblances, and differences between the offsprings in the process of evolution. • Genetic Algorithms are also derived from natural evolution. • GAs are a subset of a much larger branch of computation known as Evolutionary Computation. • Developed by John Holland and his students and colleagues at the University of Michigan. • In GAs, we have a pool or a population of possible solutions to the given problem. • These solutions then undergo recombination and mutation (like in natural genetics), producing new children, and the process is repeated over various generations. • Each individual (or candidate solution) is assigned a fitness value (based on its objective function value) and the fitter individuals are given a higher chance to mate and yield more “fitter” individuals. • This is in line with the Darwinian Theory of “Survival of the Fittest”. • In this way we keep “evolving” better individuals or solutions over generations, till we reach a stopping criterion.
• 57. Why Use Genetic Algorithms • GAs are robust algorithms that can be applied to a wide variety of optimization problems. • They do not deviate easily in the presence of noise, unlike many other AI algorithms. • GAs can effectively search very large or multimodal search spaces.
• 58. Advantages of GAs • Does not require any derivative information (which may not be available for many real-world problems). • Can be faster and more efficient than traditional exhaustive methods. • Has very good parallelization capabilities. • Optimizes both continuous and discrete functions, as well as multi-objective problems. • Provides a list of “good” solutions, not just a single solution. • Always returns an answer to the problem, which gets better over time. • Useful when the search space is very large and a large number of parameters are involved.
  • 59. Limitations of GAs • GAs are not suited for all problems, especially problems which are simple and for which derivative information is available. • Fitness value is calculated repeatedly which might be computationally expensive for some problems. • Being stochastic, there are no guarantees on the optimality or the quality of the solution. • If not implemented properly, the GA may not converge to the optimal solution.
• 60. Terminology In GA • Population: A group of individuals. The population comprises the individuals being tested, search-space information, and the phenotype parameters. The population is generally initialized at random. • Individual: An individual is a single solution in the population. An individual has a set of parameters called genes; genes combine to form chromosomes. • Fitness: Fitness measures the quality of an individual’s phenotype with respect to the problem. • The fitness function tells how close a solution is to the optimal solution. It can be defined in many ways, e.g. as a sum of problem-related parameters or a Euclidean distance; there is no single rule for defining a fitness function. • Genotype − The full combination of genes in an individual is called the genotype. The genotype is the population in the computation space, where solutions are represented in a form that can be easily understood and manipulated by a computing system. • Phenotype − A genotype in its decoded form is called the phenotype. The phenotype is the population in the actual real-world solution space, where solutions are represented as they appear in real-world situations.
• 61. Terminology(cont..) • Chromosome − A chromosome is one candidate solution to the given problem. • Gene − A gene is one element position of a chromosome, often represented by a bit (0 or 1) or a short string. • Allele − The value a gene takes in a particular chromosome. • Gene Pool − The set of all alleles (gene values) present in a population is called the gene pool. • Genome − The set of genes of a species is called a genome. • Locus − The position of a gene within the genome is called its locus. • Decoding and Encoding − For simple problems, the phenotype and genotype spaces are the same; in most cases, however, they differ. Decoding transforms a solution from the genotype space to the phenotype space, while encoding transforms from the phenotype space to the genotype space. Decoding should be fast, as it is carried out repeatedly during the fitness calculation (a small sketch follows). • Genetic Operators − These alter the genetic composition of the offspring; they include crossover, mutation, selection, etc.
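A minimal sketch of decoding and encoding (my own illustration), assuming the phenotype is a real value in a range and the genotype is an n-bit string; the names and ranges are made up.

```python
def decode(bits, lo=-5.0, hi=5.0):
    """Genotype (bit string) -> phenotype (real value in [lo, hi])."""
    as_int = int(bits, 2)
    return lo + (hi - lo) * as_int / (2 ** len(bits) - 1)

def encode(x, n_bits=16, lo=-5.0, hi=5.0):
    """Phenotype -> genotype: the inverse mapping, up to quantization."""
    as_int = round((x - lo) / (hi - lo) * (2 ** n_bits - 1))
    return format(as_int, f"0{n_bits}b")

bits = encode(1.25)
print(bits, decode(bits))  # a 16-bit string and a value close to 1.25
```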
  • 63. Correlation Of A Chromosome With GA • The human body has chromosomes that are made of genes. • A set of all genes of a specific species is called the genome. • In living beings, the genomes are stored in various chromosomes while in GA all genes are stored in the same chromosome.
• 65. Basic Structure and working of GA • We start with an initial population (which may be generated at random or seeded by other heuristics) and select parents from this population for mating. • We apply crossover and mutation operators on the parents to generate new offspring. • Finally, these offspring replace existing individuals in the population, and the process repeats. • In this way, genetic algorithms try to mimic natural evolution to some extent.
• 66. A simple genetic algorithm is: • Start with a randomly created population. • Calculate the fitness of each chromosome. • Repeat the following steps until n offspring have been created: • Selection − Select a pair of chromosomes from the population. • Crossover − Cross the pair over with probability pc to form offspring. • Mutation − Mutate the offspring with probability pm. • Replace the original population with the new population and go to step 2. A runnable sketch of this loop follows.
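Below is a runnable sketch of this loop (my own illustration, not the slides' code), using the toy "OneMax" fitness of counting 1-bits; all parameter values are made up.

```python
import random

POP_SIZE, N_BITS, PC, PM, GENERATIONS = 20, 8, 0.7, 0.01, 50

def fitness(chrom):
    return sum(chrom)  # OneMax: count the 1-bits

def select(pop):
    # Fitness-proportionate (roulette-wheel) selection.
    total = sum(fitness(c) for c in pop)
    r, acc = random.uniform(0, total), 0
    for c in pop:
        acc += fitness(c)
        if acc >= r:
            return c
    return pop[-1]

# Step 1: a randomly created population.
pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    new_pop = []
    while len(new_pop) < POP_SIZE:
        p1, p2 = select(pop), select(pop)       # selection
        if random.random() < PC:                # crossover with prob. pc
            point = random.randint(1, N_BITS - 1)
            p1, p2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
        for child in (p1, p2):                  # mutation with prob. pm
            new_pop.append([b ^ 1 if random.random() < PM else b
                            for b in child])
    pop = new_pop[:POP_SIZE]                    # replace the population

print(max(pop, key=fitness))  # typically all (or nearly all) 1s
```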
• 67. Genotype Representation(1/3) • Binary Representation  The simplest and most widely used representation in GAs.  The genotype consists of bit strings.  For problems whose solution space consists of Boolean decision variables – yes or no – the binary representation is natural.  Take for example the 0/1 Knapsack Problem: if there are n items, a solution can be represented by a binary string of n elements, where the xth element tells whether item x is picked (1) or not (0) – see the sketch below.  Other problems, such as those dealing with numbers, can also be represented in binary.  The problem with this kind of encoding is that different bits have different significance, so mutation and crossover operators can have undesired consequences.  This can be resolved to some extent by using Gray coding, in which a change of one bit does not have a massive effect on the decoded value.
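A small sketch of the 0/1 knapsack encoding just described (weights, values, and capacity are made up); each bit of the chromosome says whether the corresponding item is picked.

```python
weights = [2, 3, 4, 5]
values = [3, 4, 5, 6]
capacity = 8

chromosome = "1011"  # pick items 0, 2 and 3

picked = [i for i, bit in enumerate(chromosome) if bit == "1"]
total_w = sum(weights[i] for i in picked)
total_v = sum(values[i] for i in picked)
# An over-capacity solution like this one would get a penalized fitness.
print(total_w, total_v, total_w <= capacity)  # 11 14 False
```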
• 68. Genotype Representation(2/3) • Real Valued Representation • For problems where we want to define the genes using continuous rather than discrete variables, the real valued representation is the most natural. • The precision of these real valued (floating point) numbers is, however, limited by the computer. • Integer Representation • For discrete valued genes, we cannot always limit the solution space to a binary ‘yes’ or ‘no’. • For example, to encode the four directions – North, South, East and West – we can encode them as {0,1,2,3}. In such cases, integer representation is desirable.
• 69. Genotype Representation(3/3) • Permutation Representation • In many problems, the solution is represented by an ordering of elements. In such cases, permutation representation is the most suitable. • A classic example is the travelling salesman problem (TSP), in which the salesman has to take a tour of all the cities, visiting each city exactly once and returning to the starting city (a small sketch follows).
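A small sketch of permutation representation for the TSP (made-up city coordinates): a chromosome is an ordering of city indices, and its quality is the length of the resulting tour.

```python
import random

cities = [(0, 0), (1, 5), (4, 3), (6, 1), (3, 0)]  # made-up 2-D points

def tour_length(perm):
    # Sum of straight-line distances around the closed tour.
    total = 0.0
    for i in range(len(perm)):
        (x1, y1) = cities[perm[i]]
        (x2, y2) = cities[perm[(i + 1) % len(perm)]]
        total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total

perm = list(range(len(cities)))
random.shuffle(perm)  # a random candidate tour
print(perm, tour_length(perm))
```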
  • 70. Phases of GA • Initial population • Fitness function • Selection • Crossover • Mutation
• 71. Population • The process begins with a set of individuals called a Population. • Each individual is a candidate solution to the problem you want to solve. • An individual is characterized by a set of parameters (variables) known as Genes. • Genes are joined into a string to form a Chromosome (solution), so a population can also be seen as a set of chromosomes. • In a genetic algorithm, the set of genes of an individual is represented as a string over some alphabet; usually, binary values are used (a string of 1s and 0s). We say that we encode the genes in a chromosome. • There are several things to keep in mind when dealing with a GA population − • The diversity of the population should be maintained, otherwise it might lead to premature convergence. • The population size should not be kept very large, as this can slow the GA down, while a population that is too small might not provide a good mating pool; an appropriate population size is therefore usually decided by trial and error. • The population is usually stored as a two-dimensional array (one row per chromosome, one column per gene), as sketched below.
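A sketch of the two-dimensional array view of a population, with random binary genes (the sizes are made up):

```python
import numpy as np

POP_SIZE, CHROM_LEN = 6, 10
# One row per chromosome, one column per gene.
population = np.random.randint(0, 2, size=(POP_SIZE, CHROM_LEN))
print(population)
# population[i] is individual i; population[i, j] is gene j of individual i.
```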
• 72. Population Models • Steady State • In a steady state GA, we generate one or two offspring in each iteration, and they replace one or two individuals from the population. • A steady state GA is also known as an Incremental GA. • Generational • In a generational model, we generate ‘n’ offspring, where n is the population size, and the entire population is replaced by the new one at the end of the iteration.
• 73. Fitness function • Simply defined, the fitness function takes a candidate solution to the problem as input and produces as output how “fit” or how “good” the solution is with respect to the problem under consideration. • The fitness value is calculated repeatedly in a GA, so the fitness function should be sufficiently fast; a slow fitness computation can adversely affect a GA and make it exceptionally slow. • In most cases the fitness function and the objective function are the same, since the objective is to maximize or minimize the given objective function. • Characteristics of a fitness function − • It should be sufficiently fast to compute. • It must quantitatively measure how fit a given solution is, or how fit the individuals produced from it can be.
• 74. Fitness calculation • Each chromosome from the population is passed to the objective function one by one and its fitness is calculated. • For example, the fitness of each of the randomly generated solutions in the previous step is calculated as f(x1, x2) = x1² + x2² (a small sketch follows).
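A small sketch of this fitness calculation (the decoded chromosome values are made up):

```python
def fitness(chrom):
    # f(x1, x2) = x1^2 + x2^2
    x1, x2 = chrom
    return x1 ** 2 + x2 ** 2

population = [(1.0, 2.0), (0.5, -1.5), (3.0, 0.0)]  # decoded chromosomes
for chrom in population:
    print(chrom, fitness(chrom))  # 5.0, 2.5, 9.0
```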
  • 75. Parent Selection • Parent Selection is the process of selecting parents which mate and recombine to create off-springs for the next generation. • Fitness Proportionate Selection • One of the most popular ways of parent selection. • In this every individual can become a parent with a probability which is proportional to its fitness. • Therefore, fitter individuals have a higher chance of mating and propagating their features to the next generation. • Consider a circular wheel. The wheel is divided into n pies, where n is the number of individuals in the population. • Each individual gets a portion of the circle which is proportional to its fitness value. • Two implementations of fitness proportionate selection are possible − • Roulette Wheel Selection • Stochastic Universal Sampling (SUS) • Other selection methods • Rank selection • Tournament selection
• 76. Roulette Wheel Selection • A fixed point is chosen on the wheel circumference and the wheel is rotated. • The region of the wheel that comes in front of the fixed point is chosen as the parent. • For the second parent, the same process is repeated. • Clearly, a fitter individual has a larger slice of the wheel and therefore a greater chance of landing in front of the fixed point when the wheel is rotated, so the probability of choosing an individual depends directly on its fitness. • Implementation-wise, we use the following steps (sketched below) − • Calculate S, the sum of all fitnesses. • Generate a random number r between 0 and S. • Starting from the top of the population, keep adding fitnesses to the partial sum P until P ≥ r. • The individual whose fitness makes P reach r is the chosen individual.
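A Python sketch of these roulette-wheel steps (function and variable names are my own):

```python
import random

def roulette_select(population, fitnesses):
    S = sum(fitnesses)            # the sum of all fitnesses
    r = random.uniform(0, S)      # a random number between 0 and S
    P = 0.0                       # the running partial sum
    for individual, f in zip(population, fitnesses):
        P += f
        if P >= r:                # P has just reached r
            return individual
    return population[-1]         # guard against rounding error

pop = ["A", "B", "C"]
fit = [5.0, 3.0, 2.0]             # "A" is selected about half the time
print(roulette_select(pop, fit))
```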
  • 77. Stochastic Universal Sampling (SUS) • Similar to Roulette wheel selection, however instead of having just one fixed point, we have multiple fixed points. • All the parents are chosen in just one spin of the wheel. • It encourages the highly fit individuals to be chosen at least once. • Fitness proportionate selection methods don’t work for cases where the fitness can take a negative value.
  • 78. Tournament Selection • In K-Way tournament selection, we select K individuals from the population at random and select the best out of these to become a parent. • The same process is repeated for selecting the next parent. • Tournament Selection is also extremely popular as it can even work with negative fitness values.
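A sketch of K-way tournament selection (toy individuals whose fitness is the value itself; note that negative fitnesses are fine):

```python
import random

def tournament_select(population, fitness, k=3):
    # Sample k individuals at random and keep the fittest.
    contenders = random.sample(population, k)
    return max(contenders, key=fitness)

pop = [-4, -1, -7, -2, -9]
print(tournament_select(pop, fitness=lambda x: x))
```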
• 79. Rank Selection • Rank selection also works with negative fitness values and is mostly used when the individuals in the population have very close fitness values (this usually happens near the end of the run). • With close fitness values, fitness proportionate selection gives each individual an almost equal share of the pie, so every individual, no matter how fit relative to the others, has approximately the same probability of being selected as a parent; this loss of selection pressure towards fitter individuals makes the GA prone to poor parent selections. • Rank selection therefore removes the raw fitness value from the selection step: every individual in the population is ranked according to its fitness, and selection depends on each individual’s rank rather than its fitness, with higher-ranked individuals preferred over lower-ranked ones.
• 80. Rank Selection example:
Chromosome   Fitness Value   Rank
A            8.1             1
B            8.0             4
C            8.05            2
D            7.95            6
E            8.02            3
F            7.99            5
  • 81. Random Selection • In this strategy we randomly select parents from the existing population. • There is no selection pressure towards fitter individuals and therefore this strategy is usually avoided.
• 82. Crossover • The crossover operator is analogous to reproduction and biological crossover. • In crossover, more than one parent is selected and one or more offspring are produced from the parents’ genetic material. • A crossover point is chosen at random from within the genes, and offspring are created by exchanging the genes of the parents up to the crossover point. • Crossover is usually applied in a GA with a high probability, pc. • Crossover Operators • One Point Crossover • In one-point crossover, a random crossover point is selected and the tails of the two parents are swapped to get new offspring.
• 83. Crossover operator(cont..) • Multi Point Crossover • Multi-point crossover is a generalization of one-point crossover, wherein alternating segments are swapped to get new offspring. • Uniform Crossover • In uniform crossover, we don’t divide the chromosome into segments; rather, we treat each gene separately. • In this, we essentially flip a coin for each gene to decide whether it will come from the first or the second parent. • We can also bias the coin towards one parent, so that the child has more genetic material from that parent. Sketches of one-point and uniform crossover follow.
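Sketches of one-point and uniform crossover on bit-list parents (my own illustration):

```python
import random

def one_point(p1, p2):
    point = random.randint(1, len(p1) - 1)  # random crossover point
    # Swap the tails of the two parents.
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def uniform(p1, p2, bias=0.5):
    # Flip a (possibly biased) coin for each gene independently.
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if random.random() < bias:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return c1, c2

a, b = [0] * 8, [1] * 8
print(one_point(a, b))
print(uniform(a, b))
```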
• 84. Mutation • In some of the newly formed offspring, genes are subjected to mutation with a low random probability; for binary strings, this means some of the bits can be flipped. • Mutation maintains diversity within the population and prevents premature convergence. • Some of the ways of mutation are: • Flipping: changing a 0 to 1 or a 1 to 0. • Interchanging: two random positions are chosen and their values are interchanged. • Reversing: a random position is chosen and the bits next to it are reversed.
• 85. Types of mutation • Flip mutation − Performed when we use a binary encoding: a randomly selected bit of a chromosome is flipped. • Swap mutation − Performed when chromosomes are encoded as permutations of a given set of elements: the alleles of two randomly selected genes are swapped. • Random reinitialization − Very similar to flip mutation, but used when the chromosome is encoded using discrete values/integers. For example, if a gene’s value can be any integer between -5 and 5, we choose a gene at random and reinitialize it with any integer from that range. Sketches of all three follow.
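Sketches of the three mutation types (my own illustration; the probabilities and ranges are made up):

```python
import random

def flip_mutation(chrom, pm=0.05):
    # Binary encoding: flip each bit independently with probability pm.
    return [b ^ 1 if random.random() < pm else b for b in chrom]

def swap_mutation(chrom):
    # Permutation encoding: swap the alleles of two random genes.
    c = chrom[:]
    i, j = random.sample(range(len(c)), 2)
    c[i], c[j] = c[j], c[i]
    return c

def random_reset_mutation(chrom, lo=-5, hi=5, pm=0.1):
    # Integer encoding: reinitialize a gene with a random value in [lo, hi].
    return [random.randint(lo, hi) if random.random() < pm else g
            for g in chrom]

print(flip_mutation([0, 1, 1, 0, 1]))
print(swap_mutation([3, 1, 4, 1, 5]))
print(random_reset_mutation([2, -3, 0, 4]))
```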
• 86. When To Stop a Genetic Algorithm • Best Individual Convergence: the algorithm is stopped when the best (minimum) fitness level drops below the convergence value; this leads to faster convergence. • Worst Individual Convergence: the algorithm is stopped when even the least fit individuals in the population attain a fitness below the convergence value. This maintains a minimum fitness standard in the population: the best individual is not guaranteed, but every individual present meets the minimum fitness. • Sum of Fitness: the search is stopped when the sum of fitnesses is less than or equal to the convergence value; this guarantees that the whole population is within the fitness range. • Median Fitness: the search is stopped when at least half of the individuals in the population are better than or equal to the convergence value. • Other common stopping conditions are: • a specified number of generations have evolved; • the specified running time for the algorithm has been reached; • the fitness of the population no longer changes with further iterations.
• 87. Application areas • Transport: genetic algorithms are used in the travelling salesman problem to develop transport plans that reduce travel cost and time; they are also used to develop efficient ways of delivering products. • DNA Analysis: used to establish DNA structure from spectrometric information. • Multimodal Optimization: used to provide multiple optimal solutions in multimodal optimization problems. • Aircraft Design: used to develop parametric aircraft designs, where the parameters of the aircraft are modified and upgraded to provide better designs. • Economics: used to describe various economic models, such as game theory, the cobweb model, asset pricing, and schedule optimization.