Machine Learning.pptx

MACHINE LEARNING
INTRODUCTION AND CONCEPT
Data science is the study of analyzing data and making predictions
from it. It finds patterns in the data and draws out meaningful
insights for business-critical decisions and expansion.
Data science is a blend of mathematics, statistics and
programming used to analyze data.
Data science is all about the present and the future. That is,
it finds trends in historical data that are useful for making
present decisions, and patterns that can be modelled and used
to predict what things may look like in the future.
Rapidly evolving technologies that affect data science:
automation, text analysis, platform growth and in-database
analytics.

1) Business Understanding
Even though access to data and computing power have both
increased tremendously in the last decade, the success of an
organization still largely depends on the quality of the
questions it asks of its data.
Here are a few good questions that successful businesses have
asked of their data science teams:
Uber: What percentage of time do drivers actually drive?
How steady is their income?
Oyo Hotels: What is the average occupancy of mediocre
hotels?
Alibaba: What are the per-square-foot profits of our
warehouses?

2) Data Collection
If asking the right questions is the recipe, then data is your ingredient.
Once the business understanding is clear, data collection becomes a
matter of breaking the problem down into smaller components.
The data scientist needs to know which ingredients are required, how
to source and collect them, and how to prepare the data to meet the
desired outcome.
Typical data sources include online surveys, sensors, online-generated
data and data extracted from databases.

3) Data Preparation
In this step we understand more about the data and prepare it for
further analysis. The data understanding section of the data
science methodology answers the question: Is the data that you
collected representative of the problem to be solved?
Preparation usually involves the following steps.
1. Handling missing data
2. Correcting invalid values
3. Removing duplicates
4. Structuring the data to be fed into an algorithm
(Normalization) [Optional]
5. Feature Engineering
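The cleaning steps listed above can be sketched in plain Python. This is a minimal illustration rather than a prescribed pipeline: the (age, income) records, the valid age range of 0 to 120, mean imputation and min-max scaling are all assumptions chosen for the example, and duplicates are removed first so they do not skew the imputed mean.

```python
from statistics import mean

# Hypothetical raw records: (age, income). None marks a missing value.
raw = [
    (25, 30000.0),
    (32, None),       # missing income
    (25, 30000.0),    # exact duplicate of the first row
    (-4, 52000.0),    # invalid age
    (47, 81000.0),
]

def prepare(rows):
    # Removing duplicates (order-preserving).
    rows = list(dict.fromkeys(rows))
    # Correcting invalid values: here we simply drop rows with an
    # age outside the assumed valid range of 0 to 120.
    rows = [(age, inc) for age, inc in rows if 0 <= age <= 120]
    # Handling missing data: fill missing incomes with the mean
    # of the observed incomes.
    fill = mean(inc for _, inc in rows if inc is not None)
    rows = [(age, fill if inc is None else inc) for age, inc in rows]
    # Normalization: min-max scale every column to the range [0, 1].
    scaled_columns = []
    for column in zip(*rows):
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
        scaled_columns.append([(value - lo) / span for value in column])
    return [tuple(values) for values in zip(*scaled_columns)]

clean = prepare(raw)
```

Three rows survive the cleaning, and every column of `clean` lies between 0 and 1.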

4) Data Modelling
Modelling is the stage in the data science methodology where
the data scientist has the chance to sample the sauce and
determine if it’s bang on or in need of more seasoning.
Modelling is used to find patterns or behaviors in data. These
patterns help us in one of two ways:
1. Descriptive Analysis
2. Predictive Analysis

5) Model Evaluation
In the machine learning world, modelling is divided into 2 distinct
stages: training and testing. A common split uses 70% of the original
data for training and the remaining 30% for testing.
Once we have modelled the data we can derive insights from it. This
is the stage where we can finally start evaluating our complete data
science system.
The end of modelling is characterized by model evaluation, where
you measure:
1. Accuracy: How well does the model perform, i.e. does it describe
the data accurately?
2. Relevance: Does it answer the original question that you set
out to answer?
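A 70/30 split and an accuracy measurement might look like the sketch below. The dataset, the function names and the one-threshold "model" are hypothetical stand-ins used only to make the example self-contained; any real model could take their place.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical dataset: (feature, label) pairs; label is 1 when feature >= 50.
data = [(x, int(x >= 50)) for x in range(100)]
train, test = train_test_split(data)

# Stand-in "training": place a threshold midway between the largest
# class-0 feature and the smallest class-1 feature seen in training.
largest_zero = max(x for x, label in train if label == 0)
smallest_one = min(x for x, label in train if label == 1)
threshold = (largest_zero + smallest_one) / 2

# Testing: evaluate only on rows the model has not seen.
predictions = [int(x >= threshold) for x, _ in test]
score = accuracy([label for _, label in test], predictions)
```

The key point is that `score` is computed on the 30% of rows held back from training, so it estimates how the model behaves on unseen data.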

MACHINE LEARNING
Machine learning is a subset of artificial intelligence that
focuses on machines learning from experience and making
predictions based on that experience.
It enables computers to make data-driven decisions rather than
being explicitly programmed to carry out a certain task. These
programs or algorithms are designed so that they learn and
improve over time as they are exposed to new data.

Supervised Learning
In supervised learning, the learning is guided by a teacher. A labelled
dataset acts as the teacher, and its role is to train the model or the
machine. Once the model is trained, it can start making predictions or
decisions when new data is given to it.
Unsupervised Learning
The model learns through observation and finds structures in the data. Once
the model is given a dataset, it automatically finds patterns and relationships
in the dataset by creating clusters in it. What it cannot do is add labels to the
clusters: it cannot say that this is a group of apples or mangoes, but it will
separate all the apples from the mangoes.
Suppose we present images of apples, bananas and mangoes to the model.
Based on patterns and relationships in the images, it creates clusters
and divides the dataset into those clusters. When new data is fed to the
model, it adds it to one of the created clusters.
Reinforcement Learning
It is the ability of an agent to interact with the environment and find out which
action gives the best outcome. It follows the trial and error method. The agent
is rewarded or penalized with a point for a correct or a wrong answer, and the
model trains itself on the basis of the reward points gained. Once trained, it is
ready to make predictions on new data presented to it.
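The reward-and-penalty loop can be illustrated with a very small agent. This sketch assumes a toy environment with three possible actions, one of which is "correct" (+1 point) while the others are "wrong" (-1 point), and an epsilon-greedy strategy; none of these details come from the slides.

```python
import random

def train_agent(correct_action, n_actions=3, steps=200, epsilon=0.1, seed=0):
    """Learn the average reward of each action by trial and error."""
    rng = random.Random(seed)
    q = [0.0] * n_actions    # estimated reward per action
    counts = [0] * n_actions  # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: try a random action
        else:
            action = max(range(n_actions), key=lambda a: q[a])  # exploit
        # The environment rewards the correct action and penalizes the rest.
        reward = 1.0 if action == correct_action else -1.0
        counts[action] += 1
        # Update the running mean of the rewards seen for this action.
        q[action] += (reward - q[action]) / counts[action]
    return q

q_values = train_agent(correct_action=2)
best_action = max(range(len(q_values)), key=lambda a: q_values[a])
```

After training, the action with the highest estimated reward is the one the environment has been rewarding.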

Linear Regression
Linear regression is used to find a linear relationship between a
target and one or more predictors. There are two types of linear
regression: simple (one predictor) and multiple (two or more
predictors).
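For the simple case, the line can be fitted with the ordinary least squares formulas. The sketch below assumes made-up data points that lie exactly on y = 2x + 1, so the fitted slope and intercept are easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 6.0 + intercept  # predict the target for a new x
```

Multiple linear regression follows the same idea with one coefficient per predictor.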

Logistic Regression
Logistic regression is one of the most basic and popular algorithms
for solving a classification problem. It is named ‘Logistic
Regression’ because its underlying technique is quite similar to
that of linear regression. The term “Logistic” is taken from the logit
function that is used in this method of classification.
Suppose we have data on tumor size versus malignancy. As it is a
classification problem, if we plot the data we can see that all the
target values lie at either 0 or 1. If we fit the best regression
line and assume a threshold of 0.5, the line can do a pretty
reasonable job.
The sigmoid function is the main part of logistic regression: the
sigmoid of $\theta^T X$, that is $1 / (1 + e^{-\theta^T X})$, gives us
the probability of a point belonging to a class, instead of the value
of y directly.
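The sigmoid and the 0.5 threshold can be sketched as follows. The parameter vector `theta` and the tumor sizes are invented for illustration; in practice `theta` would be learned from training data.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(theta, x):
    """Probability of class 1: the sigmoid of the dot product theta . x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

def predict(theta, x, threshold=0.5):
    """Class label obtained by thresholding the probability."""
    return int(predict_proba(theta, x) >= threshold)

# Hypothetical parameters: [bias, weight for tumor size in cm].
theta = [-6.0, 2.0]
small_tumor = [1.0, 1.5]  # the leading 1.0 multiplies the bias term
large_tumor = [1.0, 4.5]

labels = [predict(theta, small_tumor), predict(theta, large_tumor)]
```

With these assumed parameters the decision boundary sits at a tumor size of 3 cm, where the probability is exactly 0.5.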

K-Nearest Neighbors (KNN)
The KNN algorithm assumes that similar things exist in close
proximity. In other words, similar things are near to each other.
The KNN algorithm hinges on this assumption being true enough
for the algorithm to be useful. KNN captures the idea of similarity
(sometimes called distance, proximity, or closeness) with some
mathematics.
The distance between two points on the graph is calculated with
a technique called Euclidean distance:
$d(X, Y) = \sqrt{(X_1 - Y_1)^2 + (X_2 - Y_2)^2 + (X_3 - Y_3)^2 + \dots + (X_N - Y_N)^2}$
Choosing the right value of K
To select the K that’s right for your data, we run the KNN
algorithm several times with different values of K and choose the
K that reduces the number of errors we encounter while
maintaining the algorithm’s ability to accurately make predictions
when it’s given data it hasn’t seen before.
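A compact sketch of KNN classification and of choosing K is shown below. The two-class toy points, the held-out validation set and the candidate values of K are assumptions for the example; ties between candidates are resolved in favor of the first K listed.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, query, k):
    """Majority label among the k training points closest to query."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def choose_k(train, validation, candidates):
    """Pick the candidate k with the fewest errors on held-out data."""
    def errors(k):
        return sum(knn_predict(train, x, k) != y for x, y in validation)
    return min(candidates, key=errors)

# Hypothetical labelled points forming two well-separated groups.
train = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.5), "a"),
         ((6.0, 6.0), "b"), ((6.5, 7.0), "b"), ((7.0, 6.5), "b")]
validation = [((1.2, 1.4), "a"), ((6.8, 6.2), "b")]

best_k = choose_k(train, validation, candidates=[1, 3, 5])
label = knn_predict(train, (2.0, 2.0), best_k)
```

Evaluating each K on data that was not used for training is what keeps the choice honest about performance on unseen points.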

K-Means and Hierarchical Clustering
K-means clustering is one of the simplest and most popular
unsupervised machine learning algorithms.
Typically, unsupervised algorithms make inferences from datasets
using only input vectors without referring to known, or labelled,
outcomes.
The objective of K-means clustering is simple: group similar data
points together and discover underlying patterns. To achieve this
objective, K-means looks for a fixed number (k) of clusters in a
dataset.
A cluster refers to a collection of data points aggregated together
because of certain similarities.
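The usual K-means loop alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below assumes a naive initialization (the first k points) and toy 2-D data; production implementations use smarter initialization.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(points, k, max_iterations=100):
    """Return (centroids, clusters) after running the K-means loop."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two visually obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

On this data the loop converges in three passes, with one centroid in the middle of each group.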
The bottom-up version of hierarchical clustering is known as
agglomerative clustering.
In agglomerative clustering, you need to compute a distance/proximity
matrix, which is an n by n table of the distances between every pair
of data points in your dataset (each point starts as its own cluster).
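The proximity matrix and the first merge decision can be sketched like this. The four points are made up, Euclidean distance is assumed as the proximity measure, and only the first step of the algorithm is shown; a full implementation repeats the merge until one cluster remains.

```python
import math

def distance_matrix(points):
    """n by n table where entry [i][j] is the distance between points i and j."""
    return [[math.dist(p, q) for q in points] for p in points]

def closest_pair(matrix):
    """Indices (i, j), with i < j, of the two closest distinct points."""
    n = len(matrix)
    candidates = ((matrix[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    _, i, j = min(candidates)
    return i, j

# Hypothetical points; each one starts as its own cluster.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (7.0, 5.0)]
matrix = distance_matrix(points)
first_merge = closest_pair(matrix)  # the two clusters merged first
```

The matrix is symmetric with zeros on the diagonal, so only the entries above the diagonal need to be searched.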

DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN)
is a data clustering algorithm.
1) DBSCAN can be used when examining spatial data.
2) DBSCAN can be applied to tasks with arbitrary shaped clusters, or
clusters within clusters.
3) DBSCAN can find any arbitrary shaped cluster without getting
affected by noise.
Advantages
1) DBSCAN does not require one to specify the number of clusters
in the data a priori, as opposed to k-means.
2) DBSCAN can find arbitrarily-shaped clusters. It can even find a
cluster completely surrounded by (but not connected to) a different
cluster.
3) DBSCAN has a notion of noise, and is robust to outliers.
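A minimal DBSCAN sketch shows how clusters grow from dense regions and how isolated points end up as noise. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors, counting the point itself) follow common usage, and the sample points are invented; this brute-force version is meant for reading, not for large datasets.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Note that the number of clusters is never passed in: it falls out of the density parameters, and the isolated point is labelled as noise.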

Decision Tree
A decision tree is a decision support tool that uses a tree-like
graph or model of decisions and their possible consequences,
including chance event outcomes, resource costs, and utility. It
is one way to display an algorithm that only contains conditional
control statements.
Entropy: the amount of disorder (impurity) in the data.
When building a decision tree, we want to split the nodes in a
way that decreases entropy and increases information gain.
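Entropy and information gain can be computed directly from label counts. The sketch below uses Shannon entropy in bits and made-up "yes"/"no" labels; information gain is the parent's entropy minus the size-weighted entropy of the child nodes.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(count / total) * math.log2(count / total)
               for count in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of its children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with an even mix of two classes.
parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children as mixed as the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A tree-building algorithm would prefer the first split, because it removes all the disorder, over the second, which removes none.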

Missing Values
Missing data in the training data set can reduce the power / fit of
a model or can lead to a biased model because we have not
analyzed the behavior and relationship with other variables
correctly. It can lead to wrong prediction or classification.

Outliers
Outlier is a term commonly used by analysts and data scientists.
Outliers need close attention, or else they can result in wildly
wrong estimations. Simply speaking, an outlier is an observation that
appears far away and diverges from the overall pattern in a sample.
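One common way to flag such observations is the interquartile range (IQR) rule, which is an assumption here since the slides do not name a method. Values more than 1.5 IQRs below the first quartile or above the third quartile are reported; the sample values are invented.

```python
from statistics import mean, quantiles

def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR outside the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one suspicious reading.
values = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = iqr_outliers(values)

# The single extreme value drags the mean far from the typical reading.
mean_with = mean(values)
mean_without = mean(v for v in values if v not in outliers)
```

Comparing `mean_with` and `mean_without` shows how a single outlier can distort an estimate.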

Feature Engineering
Feature engineering is the science (and art) of extracting more
information from existing data. You are not adding any new data
here, but you are actually making the data you already have more
useful.
For example, let’s say you are trying to predict footfall in a shopping
mall based on dates. If you try to use the dates directly, you may
not be able to extract meaningful insights from the data. This is
because footfall is less affected by the day of the month than it is
by the day of the week. This information about the day of the week is
implicit in your data. You need to bring it out to make your model
better.
Feature engineering includes missing value treatment, outlier
treatment, dummy variables and variable creation.
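The footfall example can be sketched with the standard library alone. The feature names (`is_weekend`, `dow_Mon` and so on) are hypothetical; the code creates a new variable from each date and encodes the day of the week as dummy (one-hot) variables.

```python
from datetime import date

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def add_day_of_week_features(dates):
    """Turn raw dates into day-of-week features a model can use."""
    rows = []
    for d in dates:
        weekday = d.weekday()  # Monday is 0, Sunday is 6
        # Variable creation: a weekend flag derived from the date.
        row = {"date": d, "is_weekend": int(weekday >= 5)}
        # Dummy variables: one 0/1 column per day of the week.
        for index, name in enumerate(DAY_NAMES):
            row[f"dow_{name}"] = int(index == weekday)
        rows.append(row)
    return rows

# 5 January 2024 was a Friday and 6 January 2024 a Saturday.
rows = add_day_of_week_features([date(2024, 1, 5), date(2024, 1, 6)])
```

No new data was collected here: the day of the week was already implicit in each date and has only been made explicit.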
