Evolutionary Machine Learning (EML)
In recent years, many researchers have integrated EC approaches into different phases of the ML
processes (i.e., preprocessing, learning, and postprocessing) to address the limitations of
traditional approaches. These new and hybrid methods are known as Evolutionary Machine
Learning (EML). EC in the learning phase of ML also refers to evolutionary AutoML concepts,
in which different expert-designed components of ML models, such as architecture and
hyperparameters, are automatically determined using EC approaches. Optimization
algorithms, such as gradient-based training algorithms, can also be replaced by EC algorithms
or even invented by an EC approach.
An EA contains four overall steps: initialization, selection, genetic operators, and termination.
These steps each correspond, roughly, to a particular facet of natural selection, and provide easy
ways to modularize implementations of this algorithm category. Simply put, in an EA, fitter
members will survive and proliferate, while unfit members will die off and not contribute to the
gene pool of further generations, much like in natural selection.
Context
In the scope of this article, we will generally define the problem as such: we wish to find the best
combination of elements that maximizes some fitness function, and we will accept a final solution
once we have either run the algorithm for some maximum number of iterations, or we have
reached some fitness threshold. This scenario is clearly not the only way to use an EA, but it does
encompass many common applications in the discrete case.
Initialization
In order to begin our algorithm, we must first create an initial population of solutions. The
population will contain an arbitrary number of possible solutions to the problem, oftentimes
called members. It will often be created randomly (within the constraints of the problem) or, if
some prior knowledge of the task is known, roughly centered around what is believed to be ideal.
It is important that the population encompasses a wide range of solutions, because it essentially
represents a gene pool; ergo, if we wish to explore many different possibilities over the course of
the algorithm, we should aim to have many different genes present.
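To make this concrete, here is a minimal sketch of random initialization for a simple bit-string encoding; the function name, population size, and encoding below are illustrative assumptions, not something prescribed by the text above.

```python
import random

def init_population(pop_size, n_genes, seed=None):
    """Create pop_size random members, each encoded as a bit string of n_genes genes."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

population = init_population(pop_size=20, n_genes=10, seed=42)
print(population[0])   # one randomly initialized member
```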
Selection
Once a population is created, members of the population must now be evaluated according to
a fitness function. A fitness function is a function that takes in the characteristics of a member,
and outputs a numerical representation of how viable a solution it is. Creating the fitness
function can often be very difficult, and it is important to find a good function that accurately
represents the problem; it is very problem-specific. Now, we calculate the fitness of all
members, and
select a portion of the top-scoring members.
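As an illustration, the sketch below scores members with a toy fitness function (OneMax, the number of 1s in a bit string) and keeps the top-scoring ones; both the fitness function and this truncation-style selection are assumptions chosen only for demonstration.

```python
def fitness(member):
    """Toy fitness function: the number of 1s in a bit string (the OneMax problem)."""
    return sum(member)

def select_top(population, k):
    """Evaluate every member and keep the k top-scoring ones."""
    return sorted(population, key=fitness, reverse=True)[:k]

population = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 0], [0, 1, 0, 0]]
parents = select_top(population, k=2)   # the two fittest members become parents
print(parents)
```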
Multiple objective functions
EAs can also be extended to use multiple fitness functions. This complicates the process
somewhat, because instead of being able to identify a single optimal point, we instead end up with
a set of optimal points when using multiple fitness functions. The set of optimal solutions is called
the Pareto frontier, and contains elements that are equally optimal in the sense that no solution
dominates any other solution in the frontier. A decider is then used to narrow the set down to a
single solution, based on the context of the problem or some other metric.
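A hedged sketch of how the Pareto frontier can be extracted, assuming each member has already been scored on several objectives that are all to be maximized; the helper names are illustrative.

```python
def dominates(a, b):
    """a dominates b if a is at least as good on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(scores):
    """Return the indices of non-dominated score vectors (all objectives maximized)."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]

scores = [(3, 5), (4, 4), (2, 6), (3, 3)]   # (objective 1, objective 2) for each member
print(pareto_frontier(scores))              # the last member is dominated; the rest form the frontier
```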
Genetic Operators
This step really includes two sub-steps: crossover and mutation. After selecting the top members
(typically top 2, but this number can vary), these members are now used to create the next
generation in the algorithm. Using the characteristics of the selected parents, new children are
created that are a mixture of the parents’ qualities. Doing this can often be difficult depending on
the type of data, but typically in combinatorial problems, it is possible to mix combinations and
output valid combinations from these inputs. Now, we must introduce new genetic material into
the generation. If we do not do this crucial step, we will become stuck in local extrema very
quickly, and will not obtain optimal results. This step is mutation, and we do this, quite simply, by
changing a small portion of the children such that they no longer perfectly mirror subsets of the
parents’ genes. Mutation typically occurs probabilistically, in that the chance of a child receiving
a mutation as well as the severity of the mutation are governed by a probability distribution.
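The sketch below shows one possible pairing of these two sub-steps for a bit-string encoding: a one-point crossover followed by probabilistic bit-flip mutation. The specific operators and the mutation rate are assumptions, not the only valid choices.

```python
import random

def crossover(parent_a, parent_b, rng):
    """One-point crossover: splice a prefix of one parent onto a suffix of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(child, rate, rng):
    """Flip each gene independently with probability `rate` to inject new genetic material."""
    return [1 - gene if rng.random() < rate else gene for gene in child]

rng = random.Random(0)
child = mutate(crossover([1, 1, 1, 1], [0, 0, 0, 0], rng), rate=0.05, rng=rng)
print(child)
```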
Termination
Eventually, the algorithm must end. There are two cases in which this usually occurs: either the
algorithm has reached some maximum runtime, or the algorithm has reached some threshold of
performance. At this point a final solution is selected and returned.
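A small sketch of such a termination check, assuming the two stopping conditions described above (an iteration budget and an optional fitness threshold); the default values are placeholders.

```python
def should_terminate(generation, best_fitness, max_generations=200, fitness_threshold=None):
    """Stop when the generation budget is exhausted or a target fitness has been reached."""
    if generation >= max_generations:
        return True
    if fitness_threshold is not None and best_fitness >= fitness_threshold:
        return True
    return False

print(should_terminate(generation=200, best_fitness=0.7))                          # True: budget used up
print(should_terminate(generation=10, best_fitness=0.99, fitness_threshold=0.95))  # True: threshold met
print(should_terminate(generation=10, best_fitness=0.40, fitness_threshold=0.95))  # False: keep evolving
```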
Genetic Algorithms (GAs)
o Population: The population is the subset of all possible or probable solutions that can
solve the given problem.
o Chromosome: A chromosome is one of the solutions in the population for the given
problem; a collection of genes makes up a chromosome.
o Gene: A gene is an element of a chromosome; a chromosome is divided into genes.
o Allele: Allele is the value provided to the gene within a particular chromosome.
o Fitness Function: The fitness function is used to determine the individual's fitness level
in the population. It means the ability of an individual to compete with other individuals.
In every iteration, individuals are evaluated based on their fitness function.
o Genetic Operators: In a genetic algorithm, the best individuals mate to produce
offspring better than their parents. Genetic operators change the genetic composition
of the next generation.
o Selection
After calculating the fitness of every individual in the population, a selection process is used to
determine which individuals in the population will get to reproduce and produce the offspring
that will form the next generation.
Types of selection methods available:
o Roulette wheel selection
o Tournament selection
o Rank-based selection
So, now we can define a genetic algorithm as a heuristic search algorithm to solve optimization
problems. It is a subset of evolutionary algorithms, which is used in computing. A genetic
algorithm uses genetic and natural selection concepts to solve optimization problems.
How Does a Genetic Algorithm Work?
The genetic algorithm works on the evolutionary generational cycle to generate high-quality
solutions. These algorithms use different operations that either enhance or replace the
population to produce better-fit solutions.
It basically involves five phases to solve complex optimization problems, which are given
below:
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
The process of a genetic algorithm starts by generating a set of individuals, which is called the
population. Each individual is a candidate solution for the given problem. An individual is
characterized by a set of parameters called genes. Genes are joined into a string to form a
chromosome, which encodes a solution to the problem. One of the most popular techniques
for initialization is the use of random binary strings.
2. Fitness Assignment
The fitness function is used to determine how fit an individual is, that is, its ability to
compete with other individuals. In every iteration, individuals are evaluated based on their
fitness function. The fitness function provides a fitness score to each individual. This score
determines the probability of being selected for reproduction: the higher the fitness score, the
greater the chance of being selected for reproduction.
3. Selection
The selection phase involves the selection of individuals for the reproduction of offspring. The
selected individuals are then arranged in pairs of two for reproduction. These individuals then
pass their genes on to the next generation.
There are three types of selection methods available (each is sketched in code after the list):
o Roulette wheel selection
o Tournament selection
o Rank-based selection
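As referenced above, here is a hedged sketch of the three selection methods for a population with precomputed fitness scores; the roulette-wheel version assumes non-negative fitness values, and the tournament size k is an arbitrary choice.

```python
import random

def roulette_wheel(population, fitnesses, rng):
    """Fitness-proportionate selection: pick one member with probability proportional
    to its (non-negative) fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def tournament(population, fitnesses, k, rng):
    """Tournament selection: sample k members at random and return the fittest."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]

def rank_based(population, fitnesses, rng):
    """Rank-based selection: like roulette wheel, but the weights are fitness ranks
    rather than raw fitness values."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    ranks = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return rng.choices(population, weights=ranks, k=1)[0]

rng = random.Random(1)
pop = [[1, 0, 1], [0, 0, 0], [1, 1, 1]]
fits = [2, 0, 3]
print(tournament(pop, fits, k=2, rng=rng))
```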
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In this step,
the genetic algorithm uses two variation operators that are applied to the parent population. The
two operators involved in the reproduction phase are given below:
o Crossover: Crossover plays the most significant role in the reproduction phase of the
genetic algorithm. In this process, a crossover point is selected at random within the
genes. Then the crossover operator swaps the genetic information of the two parents from
the current generation to produce a new individual representing the offspring.
The genes of the parents are exchanged among themselves until the crossover point is
reached. The newly generated offspring are added to the population. This process is also
known as recombination or crossover. Types of crossover styles available (sketched in
code after the list):
o One point crossover
o Two-point crossover
o Uniform crossover
o Arithmetic crossover
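As referenced above, the sketch below shows one plausible implementation of the listed crossover styles for list-encoded chromosomes; these are standard textbook variants, not necessarily the exact forms intended by the original slides.

```python
import random

def one_point(a, b, rng):
    """One-point crossover: prefix from one parent, suffix from the other."""
    p = rng.randint(1, len(a) - 1)
    return a[:p] + b[p:]

def two_point(a, b, rng):
    """Two-point crossover: the segment between the two cut points comes from the other parent."""
    p, q = sorted(rng.sample(range(1, len(a)), 2))
    return a[:p] + b[p:q] + a[q:]

def uniform(a, b, rng):
    """Uniform crossover: each gene is taken from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

rng = random.Random(0)
print(two_point([1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0], rng))
```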
o Mutation
The mutation operator inserts random genes into the offspring (new child) to maintain
the diversity of the population. It can be done by flipping some bits in the chromosome.
Mutation helps in solving the issue of premature convergence and enhances
diversification.
Types of mutation styles available (sketched in code after the list):
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
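As referenced above, here is a hedged sketch of the three mutation styles; note that each targets a different encoding (binary, real-valued, and permutation respectively), and the rate and sigma values are illustrative.

```python
import random

def flip_bit(chromosome, rate, rng):
    """Flip-bit mutation for binary encodings: flip each bit with a small probability."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def gaussian(chromosome, sigma, rng):
    """Gaussian mutation for real-valued encodings: add zero-mean noise to each gene."""
    return [g + rng.gauss(0.0, sigma) for g in chromosome]

def swap(chromosome, rng):
    """Exchange/swap mutation for permutation encodings: swap two randomly chosen genes."""
    i, j = rng.sample(range(len(chromosome)), 2)
    out = list(chromosome)
    out[i], out[j] = out[j], out[i]
    return out

rng = random.Random(0)
print(flip_bit([1, 0, 1, 1], rate=0.25, rng=rng))
print(swap([3, 1, 4, 1, 5], rng))
```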
5. Termination
After the reproduction phase, a stopping criterion is applied to decide when to terminate. The
algorithm terminates once a threshold fitness is reached (or a maximum number of generations
has elapsed), and the best solution in the final population is returned as the final solution.
General Workflow of a Simple Genetic Algorithm
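Since the workflow figure is not reproduced here, the following sketch strings the five phases (initialization, fitness assignment, selection, reproduction, termination) together on the toy OneMax problem; every parameter value and the tournament/one-point/bit-flip operator choices are assumptions made for illustration only.

```python
import random

def run_ga(n_genes=20, pop_size=30, generations=100, cx_rate=0.9, mut_rate=0.02, seed=0):
    """Minimal generational GA on the toy OneMax problem (maximize the number of 1s)."""
    rng = random.Random(seed)

    def fitness(ind):
        return sum(ind)

    # 1. Initialization: random binary strings
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    for _ in range(generations):
        # 2. Fitness assignment
        fits = [fitness(ind) for ind in pop]
        # 5. Termination: stop early once the threshold fitness (all ones) is reached
        if max(fits) == n_genes:
            break
        next_pop = []
        while len(next_pop) < pop_size:
            # 3. Selection: tournament of size 3 for each parent
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            # 4. Reproduction: one-point crossover ...
            if rng.random() < cx_rate:
                point = rng.randint(1, n_genes - 1)
                child = p1[:point] + p2[point:]
            else:
                child = list(p1)
            # ... followed by bit-flip mutation
            child = [1 - g if rng.random() < mut_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop

    return max(pop, key=fitness)

best = run_ga()
print(sum(best), "ones out of", len(best))
```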
Advantages of Genetic Algorithm
o Genetic algorithms have strong parallel capabilities, since fitness evaluations can be
performed independently.
o It helps in optimizing various problems such as discrete functions, multi-objective
problems, and continuous functions.
o It provides a solution for a problem that improves over time.
o A genetic algorithm does not need derivative information.
Limitations of Genetic Algorithms
o Genetic algorithms are not efficient algorithms for solving simple problems.
o It does not guarantee the quality of the final solution to a problem.
o Repeated calculation of fitness values can be computationally expensive.
Difference between Genetic Algorithms and Traditional Algorithms
o A search space is the set of all possible solutions to the problem. A traditional
algorithm maintains only a single candidate solution, whereas a genetic algorithm
maintains a whole population of candidate solutions in the search space.
o Traditional algorithms need more information in order to perform a search, whereas
genetic algorithms need only one objective function to calculate the fitness of an
individual.
o Traditional algorithms cannot work in parallel, whereas genetic algorithms can work
in parallel (the fitness evaluations of individuals are independent of one another).
o One big difference is that genetic algorithms do not operate directly on candidate
solutions; instead, they operate on representations (or encodings) of the solutions,
frequently referred to as chromosomes.
o Traditional Algorithms can only generate one result in the end, whereas Genetic
Algorithms can generate multiple optimal results from different generations.
o Neither approach guarantees a globally optimal result, but genetic algorithms have a
good chance of finding a near-optimal solution, because genetic operators such as
crossover and mutation keep exploring the search space.
o Traditional algorithms are deterministic in nature, whereas genetic algorithms are
probabilistic and stochastic in nature.
Genetic operator
A genetic operator is an operator used in genetic algorithms to guide the algorithm towards a
solution to a given problem. There are three main types of operators (mutation, crossover and
selection), which must work in conjunction with one another in order for the algorithm to be
successful.
Genetic operators are used to create and maintain genetic diversity (mutation), combine
existing solutions (also known as chromosomes) into new solutions (crossover), and select
between solutions (selection).
Selection
Selection operators give preference to better solutions (chromosomes), allowing them to pass on
their 'genes' to the next generation of the algorithm. The best solutions are determined using
some form of objective function (also known as a 'fitness function' in genetic algorithms), before
being passed to the crossover operator. Different methods for choosing the best solutions exist,
for example, fitness proportionate selection and tournament selection; different methods may
choose different solutions as being 'best'. The selection operator may also simply pass the best
solutions from the current generation directly to the next generation without being mutated; this
is known as elitism or elitist selection.
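A minimal sketch of elitism as just described: the best few members of the current generation are copied unchanged into the next one. The function name and elite count are illustrative assumptions.

```python
def next_generation_with_elitism(population, fitness, offspring, n_elite=2):
    """Copy the n_elite best current members, unchanged, into the next generation and
    fill the remaining slots with newly produced offspring."""
    elites = sorted(population, key=fitness, reverse=True)[:n_elite]
    return elites + offspring[:len(population) - n_elite]

parents = [[1, 1, 1], [1, 1, 0], [0, 0, 0]]
children = [[0, 1, 1], [1, 0, 0], [0, 0, 1]]
print(next_generation_with_elitism(parents, fitness=sum, offspring=children, n_elite=1))
```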
Crossover
Crossover is the process of taking more than one parent solutions (chromosomes) and producing
a child solution from them. By recombining portions of good solutions, the genetic algorithm is
more likely to create a better solution. As with selection, there are a number of different methods
for combining the parent solutions, including the edge recombination operator (ERO) and the
'cut and splice crossover' and 'uniform crossover' methods. The crossover method is often chosen
to closely match the chromosome's representation of the solution; this may become particularly
important when variables are grouped together as building blocks, which might be disrupted by a
non-respectful crossover operator. Similarly, crossover methods may be particularly suited to
certain problems; the ERO is generally considered a good option for solving the travelling
salesman problem.
Mutation
The mutation operator encourages genetic diversity amongst solutions and attempts to prevent
the genetic algorithm converging to a local minimum by stopping the solutions becoming too
close to one another. In mutating the current pool of solutions, a given solution may change
entirely from the previous solution. By mutating the solutions, a genetic algorithm can reach an
improved solution solely through the mutation operator.[1]
Again, different methods of mutation
may be used; these range from a simple bit mutation (flipping random bits in a binary string
chromosome with some low probability) to more complex mutation methods, which may replace
genes in the solution with random values chosen from the uniform distribution or the Gaussian
distribution. As with the crossover operator, the mutation method is usually chosen to match the
representation of the solution within the chromosome.
Genetic Offspring
Offspring selection (OS) [1] is a generic extension to the general concept of a genetic algorithm
[2, 3] which includes an additional selection step after reproduction: The fitness of an offspring
is compared to the fitness values of its own parents in order to decide whether or not a the
offspring solution candidate
Reinforcement learning
Reinforcement learning is an area of Machine Learning. It is about taking suitable action to
maximize reward in a particular situation. It is employed by various software and machines to
find the best possible behavior or path it should take in a specific situation. Reinforcement
learning differs from supervised learning in that, in supervised learning, the training data
comes with the answer key, so the model is trained with the correct answer itself, whereas in
reinforcement learning there is no answer key; the reinforcement agent decides what to do to
perform the given task. In the absence of a training dataset, it is bound to learn from its own
experience.
Example: The problem is as follows: We have an agent and a reward, with many hurdles in
between. The agent is supposed to find the best possible path to reach the reward. The
following example illustrates the problem.
The example involves a robot, a diamond, and fire. The goal of the robot is to get the reward,
that is, the diamond, and to avoid the hurdles, which are fire. The robot learns by trying all the
possible paths and then choosing the path which gives it the reward with the fewest hurdles.
Each right step gives the robot a reward, and each wrong step subtracts from the robot's
reward. The total reward is calculated when it reaches the final reward, that is, the diamond.
Main points in Reinforcement learning –
 Input: The input should be an initial state from which the model will start
 Output: There are many possible outputs as there are a variety of solutions to a particular
problem
 Training: The training is based upon the input. The model will return a state, and the user
will decide to reward or punish the model based on its output.
 The model continues to learn.
 The best solution is decided based on the maximum reward (a minimal Q-learning sketch
follows after this list).
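To make the reward-driven training loop concrete, here is a minimal tabular Q-learning sketch; the environment interface step(state, action) -> (next_state, reward, done), the toy corridor environment, and all hyperparameters are assumptions for illustration, not part of the notes above.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning: learn action values from reward feedback alone.

    `step(state, action)` must return (next_state, reward, done) and stands in for
    whatever environment the agent interacts with.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: usually exploit the best known action, sometimes explore
            if rng.random() < eps:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])
            nxt, reward, done = step(state, action)
            # move Q(state, action) toward the reward plus the discounted value of the next state
            Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
            state = nxt
    return Q

# Toy environment: a 5-state corridor; action 1 moves right, action 0 moves left.
# Reaching the last state yields reward 1 and ends the episode.
def corridor_step(state, action):
    nxt = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4

Q = q_learning(n_states=5, n_actions=2, step=corridor_step)
print([round(max(q), 2) for q in Q])   # learned state values along the corridor
```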
Difference between Reinforcement learning and Supervised learning:
o Reinforcement learning is all about making decisions sequentially: the output depends on
the state of the current input, and the next input depends on the output of the previous
input. In supervised learning, the decision is made on the initial input, or the input given
at the start.
o In reinforcement learning decisions are dependent, so we give labels to sequences of
dependent decisions. In supervised learning the decisions are independent of each other,
so labels are given to each decision.
o Example: a chess game (reinforcement learning) versus object recognition (supervised
learning).
Types of Reinforcement: There are two types of Reinforcement:
1. Positive –
Positive reinforcement is defined as when an event, occurring because of a particular behavior,
increases the strength and the frequency of that behavior. In other words, it has a positive
effect on behavior.
Advantages of positive reinforcement:
 Maximizes performance
 Sustains change for a long period of time
A drawback is that too much reinforcement can lead to an overload of states, which can
diminish the results.
2. Negative –
Negative reinforcement is defined as the strengthening of a behavior because a negative
condition is stopped or avoided.
Advantages of negative reinforcement:
 Increases behavior
 Helps enforce a minimum standard of performance
A drawback is that it only provides enough to meet the minimum required behavior.
Various Practical applications of Reinforcement Learning –
 RL can be used in robotics for industrial automation.
 RL can be used in machine learning and data processing
 RL can be used to create training systems that provide custom instruction and materials
according to the requirement of students.
RL can be used in large environments in the following situations:
1. A model of the environment is known, but an analytic solution is not available;
2. Only a simulation model of the environment is given (the subject of simulation-based
optimization);
3. The only way to collect information about the environment is to interact with it.
Getting Lost Example
Playing games like Go: Google has reinforcement learning agents that learn to solve
problems by playing games such as Go, which is a game of strategy. Playing this game
requires reasoning and intelligence. Google’s reinforcement learning agent had no prior
knowledge of the rules of the game or how to play it. It simply tried different moves
randomly at first; then it “learned” which moves were the most likely to get the best results.
It continuously learned until it was able to beat human players consistently. Using
reinforcement learning, researchers at DeepMind created an algorithm called Deep Q-Network
(DQN) that learns to play Atari games. As it moves through an
environment, the reinforcement learning agent collects data. It uses this data to evaluate
possible actions and their consequences in order to determine which action will likely
maximize its expected return of rewards.
Self-driving cars: Reinforcement learning is used in self-driving cars for various purposes
such as the following. Amazon cloud service such as DeepRacer can be used to test RL on
physical tracks.
 Trajectory optimization: Reinforcement learning can be used to train an agent for
optimizing trajectories. In reinforcement learning, the software agents could get reward
from their environment after every time step by executing an action in the state. Reward
is typically normalized to [0, 1].
 Motion planning, including lane changing, parking, etc.
 Dynamic pathing: Reinforcement learning can be used for dynamically planning the
most efficient path in a grid of potential paths.
 Controller optimisation
 Scenario-based learning policies for highways
 Data centre automated cooling using Deep RL: Use deep RL to automate the data center
cooling. At regular time intervals, the snapshot of the data centre cooling system, being
fetched from thousands of sensors, is fed into the deep neural networks. The deep NN
predicts how different combinations of potential actions will impact the future energy
consumption. The AI system, then, identifies the actions that will minimise the energy
consumption. The most appropriate action is sent to the data centre. The recommended
action is verified and implemented.
 Personalised product recommendation system: Personalise / customize which products
are shown to individual users to realise maximum sales. This is something ecommerce
portals would love to implement to realise maximum click-through rates on any given
product, and related sales, on any given day.
 Ad recommendation system: Customise / personalise what Ads need to be shown to the
end user to have higher click-through rate. Reinforcement learning is used in large-scale ad
recommendation system due to its dynamic adaptation of the Ad according to reinforcement
signals and its success in real-life applications. For example, retargeting a user who has
already seen the product before, or showing the product to a user who has not yet seen it. A
user clicks on an ad and gets directed to a landing page. The reinforcement signal is defined as
the total click-through rate (CTR) of the ad. The reinforcement learning model calculates
weights at each time step, and then updates them in real-time according to reinforcement
signals. Thus, it learns how to best respond to reinforcement signals at each time step.
 Personalised video recommendations based on different factors related to every
individual.
 Customised action in video games based on reinforcement learning; AI agents use
reinforcement learning to coordinate actions and react appropriately to new situations
through a series of rewards.
 Personalised chatbot response using reinforcement learning based on the behavior of the
end user in order to achieve desired business outcome and great user experience
 AI-powered stock buying/selling: While supervised learning algorithms can be used to
predict stock prices, it is reinforcement learning that can be used to decide whether to
buy, sell or hold the stock at a given predicted price.
 RL can be used for NLP use cases such as text summarization, question answering, and
machine translation.
 RL in healthcare can be used to recommend different treatment options. While
supervised learning models can be used to predict whether a person is suffering from a
disease or not, RL can be used to predict treatment options given a person is suffering from
a particular disease.
There are several cloud-based AI / ML services such as Azure Personalizer that can be used to
train reinforcement learning models to deliver personalized solutions such as some of those
mentioned above.
Markov Decision Process.
Reinforcement Learning is a type of Machine Learning. It allows machines and software
agents to automatically determine the ideal behavior within a specific context, in order to
maximize its performance. Simple reward feedback is required for the agent to learn its
behavior; this is known as the reinforcement signal.
There are many different algorithms that tackle this issue. As a matter of fact, Reinforcement
Learning is defined by a specific type of problem, and all its solutions are classed as
Reinforcement Learning algorithms. In the problem, an agent is supposed to decide the best
action to select based on its current state. When this step is repeated, the problem is known as
a Markov Decision Process.
A Markov Decision Process (MDP) model contains:
 A set of possible world states S.
 A set of Models.
 A set of possible actions A.
 A real-valued reward function R(s,a).
 A policy, which is the solution of the Markov Decision Process.
What is a State?
A State is a set of tokens that represent every state that the agent can be in.
What is a Model?
A Model (sometimes called Transition Model) gives an action’s effect in a state. In particular,
T(S, a, S’) defines a transition T where being in state S and taking an action ‘a’ takes us to
state S’ (S and S’ may be the same). For stochastic actions (noisy, non-deterministic) we also
define a probability P(S’|S,a) which represents the probability of reaching a state S’ if action
‘a’ is taken in state S. Note Markov property states that the effects of an action taken in a state
depend only on that state and not on the prior history.
What are Actions?
An Action A is a set of all possible actions. A(s) defines the set of actions that can be taken
being in state S.
What is a Reward?
A Reward is a real-valued reward function. R(s) indicates the reward for simply being in the
state S. R(S,a) indicates the reward for being in a state S and taking an action ‘a’. R(S,a,S’)
indicates the reward for being in a state S, taking an action ‘a’ and ending up in a state S’.
What is a Policy?
A Policy is a solution to the Markov Decision Process. A policy is a mapping from states to
actions; it indicates the action ‘a’ to be taken while in state S.
Let us take the example of a grid world:
An agent lives in the grid. The example is a 3x4 grid. The grid has a START state (grid
no 1,1). The purpose of the agent is to wander around the grid and finally reach the Blue
Diamond (grid no 4,3). Under all circumstances, the agent should avoid the Fire grid (orange
color, grid no 4,2). Also, grid no 2,2 is a blocked grid; it acts as a wall, hence the agent
cannot enter it.
The agent can take any one of these actions: UP, DOWN, LEFT, RIGHT
Walls block the agent path, i.e., if there is a wall in the direction the agent would have taken,
the agent stays in the same place. So for example, if the agent says LEFT in the START grid
he would stay put in the START grid.
First Aim: To find the shortest sequence getting from START to the Diamond. Two such
sequences can be found:
 RIGHT RIGHT UP UP RIGHT
 UP UP RIGHT RIGHT RIGHT
Let us take the second one (UP UP RIGHT RIGHT RIGHT) for the subsequent discussion.
The move is now noisy. 80% of the time the intended action works correctly. 20% of the time
the action agent takes causes it to move at right angles. For example, if the agent says UP the
probability of going UP is 0.8 whereas the probability of going LEFT is 0.1, and the
probability of going RIGHT is 0.1 (since LEFT and RIGHT are right angles to UP).
The agent receives rewards at each time step:
 Small reward each step (it can be negative, in which case it can also be termed a
punishment; in the above example, entering the Fire can have a reward of -1).
 Big rewards come at the end (good or bad).
 The goal is to maximize the sum of rewards (a value-iteration sketch for this grid world
follows after this list).
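As referenced above, here is a hedged value-iteration sketch for this grid world under the noisy 0.8/0.1/0.1 dynamics; the per-step reward of -0.04 and the discount factor are assumptions (the notes only say the step reward is small and may be negative), and all names are illustrative.

```python
# States are (column, row) with column 1..4 and row 1..3, matching the text above;
# (2,2) is the wall, (4,3) is the diamond (+1) and (4,2) is the fire (-1).
ACTIONS = {"UP": (0, 1), "DOWN": (0, -1), "LEFT": (-1, 0), "RIGHT": (1, 0)}
PERP = {"UP": ("LEFT", "RIGHT"), "DOWN": ("LEFT", "RIGHT"),
        "LEFT": ("UP", "DOWN"), "RIGHT": ("UP", "DOWN")}
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != (2, 2)]
TERMINAL = {(4, 3): 1.0, (4, 2): -1.0}

def move(state, action):
    """Deterministic effect of an action: walls and the grid edge leave the agent in place."""
    c, r = state[0] + ACTIONS[action][0], state[1] + ACTIONS[action][1]
    return (c, r) if (c, r) in STATES else state

def transitions(state, action):
    """Noisy dynamics: 0.8 intended direction, 0.1 for each direction at right angles."""
    side1, side2 = PERP[action]
    return [(0.8, move(state, action)), (0.1, move(state, side1)), (0.1, move(state, side2))]

def value_iteration(step_reward=-0.04, gamma=1.0, sweeps=100):
    """Compute state values V(s) by repeatedly applying the Bellman optimality backup."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s in TERMINAL:
                V[s] = TERMINAL[s]
                continue
            V[s] = step_reward + gamma * max(
                sum(p * V[s2] for p, s2 in transitions(s, a)) for a in ACTIONS)
    return V

V = value_iteration()
print(round(V[(1, 1)], 3))   # value of the START state
```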