MACHINE LEARNING
INTRODUCTION AND CONCEPT
Data Science is the study of analyzing data and making predictions
from it. It is a way to find patterns in the data and bring out
meaningful insights for business-critical decisions and
expansion.
Data Science is a blend of Mathematics, Statistics and
Programming to analyze the data.
Data Science is about both the present and the future: it
finds trends in historical data that can inform present
decisions, and it finds patterns that can be modelled and
used to predict what things may look like in the future.
Rapidly evolving technologies that affect data science:
automation, text analysis, platform growth, and in-database
analytics.
1) Business Understanding
Even though access to data and computing power have both
increased tremendously in the last decade, the success of an
organization still largely depends on the quality of the
questions it asks of its data.
Here are a few of the right questions that successful businesses
have asked of their data science teams in the past.
Uber: What percentage of time do drivers actually drive?
How steady is their income?
Oyo Hotels: What is the average occupancy of mediocre
hotels?
Alibaba: What are the per-square-foot profits of our
warehouses?
2) Data Collection
If asking the right questions is the recipe, then data is your ingredient.
Once you have clarity on the business understanding, data
collection becomes a matter of breaking the problem down into
smaller components.
The data scientist needs to know which ingredients are required, how
to source and collect them, and how to prepare the data to meet the
desired outcome.
Typical data sources include online surveys, sensors, data
generated online, and extraction from databases.
3) Data Preparation
In this step we understand more about the data and prepare it for
further analysis. The data understanding section of the data
science methodology answers the question: Is the data that you
collected representative of the problem to be solved?
Preparation usually involves the following steps.
1. Handling missing data
2. Correcting invalid values
3. Removing duplicates
4. Structuring the data to be fed into an algorithm (normalization) [optional]
5. Feature Engineering
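The first four steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the (age, income) records are hypothetical, missing and invalid rows are simply dropped rather than repaired, and min-max scaling stands in for "normalization".

```python
def prepare(rows):
    """Clean a list of (age, income) records; the field layout is hypothetical."""
    # 1. Handle missing data: this sketch drops records that contain None.
    complete = [r for r in rows if None not in r]
    # 2. Treat invalid values: here a negative age is considered invalid and dropped.
    valid = [r for r in complete if r[0] >= 0]
    # 3. Remove duplicates while preserving the original order.
    unique = list(dict.fromkeys(valid))
    # 4. Normalize every column to the range [0, 1] with min-max scaling.
    scaled_columns = []
    for col in zip(*unique):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1  # avoid division by zero for constant columns
        scaled_columns.append([(v - lo) / span for v in col])
    return [tuple(r) for r in zip(*scaled_columns)]


raw = [(20, 1000), (40, 3000), (40, 3000), (None, 2000), (-5, 1500), (30, 2000)]
cleaned = prepare(raw)
```

In practice, whether to drop, correct, or impute a bad record depends on the dataset; dropping is just the simplest choice for a sketch.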
4) Data Modelling
Modelling is the stage in the data science methodology where
the data scientist has the chance to sample the sauce and
determine if it’s bang on or in need of more seasoning.
Modelling is used to find patterns or behaviors in data. These
patterns help us in one of two ways:
1. Descriptive Analysis
2. Predictive Analysis
5) Model Evaluation
In the machine learning world, modelling is divided into two distinct
stages: training and testing. A common split uses 70% of the original
data for training and the remaining 30% for testing.
Once we have modelled the data we can derive insights from it. This
is the stage where we can finally start evaluating our complete data
science system.
The end of modelling is characterized by model evaluation, where
you measure:
1. Accuracy: How well does the model perform, i.e. does it describe
the data accurately?
2. Relevance: Does it answer the original question that you set
out to answer?
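A 70/30 split and an accuracy measurement could look like the following minimal sketch. The function names `train_test_split` and `accuracy` are chosen for illustration and are not taken from any particular library; accuracy here means the fraction of predictions that match the true labels.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


data = list(range(10))
train, test = train_test_split(data)
```

Shuffling before splitting avoids a test set that only contains the last rows of an ordered dataset.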
MACHINE LEARNING
Machine Learning is a subset of artificial intelligence that
focuses on machines learning from experience and making
predictions based on that experience.
It enables computers or machines to make data-driven
decisions rather than being explicitly programmed to
carry out a certain task. These programs or algorithms are
designed so that they learn and improve over time as they
are exposed to new data.
Supervised Learning
In supervised learning, you can think of the learning as being guided by a
teacher. A labelled dataset acts as the teacher, and its role is to train the
model or the machine. Once the model is trained, it can start making
predictions or decisions when new data is given to it.
Unsupervised Learning
The model learns through observation and finds structures in the data. Once
the model is given a dataset, it automatically finds patterns and relationships
in the dataset by creating clusters in it. What it cannot do is add labels to the
clusters: it cannot say that this is a group of apples or mangoes, but it will
separate all the apples from the mangoes.
Suppose we present images of apples, bananas and mangoes to the model.
Based on some patterns and relationships, it creates clusters and divides the
dataset into those clusters. If new data is then fed to the model, it adds
it to one of the created clusters.
Reinforcement Learning
It is the ability of an agent to interact with its environment and find out which
action leads to the best outcome. It follows a trial-and-error approach. The agent
is rewarded or penalized with a point for a correct or a wrong answer, and the
model trains itself on the basis of the reward points it gains. Once trained,
it is ready to act on new data presented to it.
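To make the reward-and-penalty loop concrete, here is a small sketch of an epsilon-greedy agent on a multi-armed bandit. Everything in it is an assumption made for illustration: the three actions, their fixed reward values, and the `train_agent` name do not come from the slides, and real environments usually return noisy rewards.

```python
import random


def train_agent(rewards, episodes=200, epsilon=0.1, seed=0):
    """Estimate the value of each action by trial and error (epsilon-greedy)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)
    counts = [0] * len(rewards)
    for step in range(episodes):
        if step < len(rewards):
            action = step  # try every action once before choosing freely
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))  # explore a random action
        else:
            # Exploit: pick the action with the highest current estimate.
            action = max(range(len(rewards)), key=estimates.__getitem__)
        reward = rewards[action]  # feedback from the (deterministic) environment
        counts[action] += 1
        # Incremental mean: move the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates


# Hypothetical environment: action 0 is penalized, actions 1 and 2 are rewarded.
estimates = train_agent(rewards=[-1.0, 1.0, 0.5])
best_action = max(range(len(estimates)), key=estimates.__getitem__)
```

The small exploration rate `epsilon` keeps the agent occasionally trying other actions instead of settling on the first one that paid off.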
Linear Regression
Linear regression is used to find a linear relationship between a
target and one or more predictors. There are two types of linear
regression: simple (one predictor) and multiple (several predictors).
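For the simple case (one predictor), the least-squares line can be computed directly. The sketch below assumes ordinary least squares, which the slides do not name explicitly, and uses made-up data that lies exactly on y = 2x + 1.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept  # predicted target for a new predictor value
```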
Logistic Regression
Logistic Regression is one of the basic and popular algorithms for
solving a classification problem. It is named ‘Logistic
Regression’ because its underlying technique is quite similar
to Linear Regression. The term “logistic” comes from the logistic
(sigmoid) function used in this method of classification; the
logit function is its inverse.
Suppose we have data on tumor size versus malignancy. Because it is a
classification problem, if we plot the data, all the target values
lie at either 0 or 1. If we fit the best regression line and
assume a threshold of 0.5, the line can do a pretty reasonable
job.
The sigmoid function is the main part of logistic regression:
the sigmoid of $\theta^T X$ gives us the probability of a point
belonging to a class, instead of the value of y directly.
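The sigmoid is defined as $\sigma(z) = 1 / (1 + e^{-z})$. The sketch below applies it to a weighted sum and thresholds the result at 0.5; the parameter values in `theta` are hypothetical and chosen by hand for the tumor-size example, not learned from data.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, x):
    """Probability of class 1: sigmoid of the dot product theta . x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)


def predict(theta, x, threshold=0.5):
    """Turn the probability into a 0/1 class label."""
    return 1 if predict_proba(theta, x) >= threshold else 0


# Hypothetical parameters: [bias, weight for tumor size].
theta = [-4.0, 2.0]
large_tumor = [1.0, 3.0]  # leading 1.0 multiplies the bias term
small_tumor = [1.0, 1.0]
```

This version is not hardened against overflow for very large negative inputs; it is meant only to show the shape of the computation.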
K-Nearest Neighbors (KNN)
The KNN algorithm assumes that similar things exist in close
proximity. In other words, similar things are near to each other.
The KNN algorithm hinges on this assumption being true enough
for the algorithm to be useful. KNN captures the idea of similarity
(sometimes called distance, proximity, or closeness) with some
mathematics.
The distance between two points on the graph is calculated with
a technique called Euclidean distance:
$d(X, Y) = \sqrt{(X_1 - Y_1)^2 + (X_2 - Y_2)^2 + (X_3 - Y_3)^2 + \dots + (X_N - Y_N)^2}$
Choosing the right value of K
To select the K that’s right for your data, we run the KNN
algorithm several times with different values of K and choose the
K that reduces the number of errors we encounter while
maintaining the algorithm’s ability to accurately make predictions
when it’s given data it hasn’t seen before.
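A minimal sketch of KNN with Euclidean distance, plus the "try several values of K" loop, might look like this. The tiny labelled dataset and the held-out validation points are invented for illustration, and majority voting is assumed as the way neighbors decide the class.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k):
    """Label the query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def error_count(train, validation, k):
    """Number of validation points that are misclassified for a given k."""
    return sum(knn_predict(train, x, k) != y for x, y in validation)


train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((2, 2), "a"),
         ((6, 6), "b"), ((6, 7), "b")]
validation = [((1.5, 1.5), "a"), ((5, 6), "b")]

# Run KNN with several values of K and keep the one with the fewest errors.
errors_by_k = {k: error_count(train, validation, k) for k in (1, 3, 5)}
best_k = min(errors_by_k, key=errors_by_k.get)
```

With only two "b" points in the training set, K = 5 lets the larger class outvote them, which is why a K that is too large for the data starts producing errors.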
K-Means Clustering
K-means clustering is one of the simplest and most popular unsupervised
machine learning algorithms.
Typically, unsupervised algorithms make inferences from datasets
using only input vectors without referring to known, or labelled,
outcomes.
The simple objective of K-means clustering is to group similar data
points together and discover underlying patterns. To achieve this
objective, K-means looks for a fixed number (k) of clusters in a
dataset.
A cluster refers to a collection of data points aggregated together
because of certain similarities.
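The standard iterative procedure for K-means (often called Lloyd's algorithm) alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The sketch below assumes the caller supplies the initial centroids, which sidesteps random initialization; the sample points are made up.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Cluster points around k centroids; returns (centroids, clusters)."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for centroid, cluster in zip(centroids, clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroid)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
final_centroids, final_clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

Note that k is fixed up front (here k = 2, the number of initial centroids), which matches the "fixed number of clusters" described above.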
The bottom-up version of hierarchical clustering is known as
agglomerative clustering.
For agglomerative clustering, you need to compute a distance/proximity
matrix, which is an n-by-n table of the distances between every
pair of data points in your dataset.
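As a rough sketch, the n-by-n distance matrix and the first merge candidate (the closest pair of points) can be computed as follows. Euclidean distance is assumed here, and the helper names are illustrative; a full agglomerative algorithm would repeat the merge step until one cluster remains.

```python
import math


def distance_matrix(points):
    """Build the symmetric n-by-n table of pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in points] for p in points]


def closest_pair(matrix):
    """Indices (i, j), i < j, of the two closest distinct points."""
    n = len(matrix)
    pairs = [(matrix[i][j], i, j) for i in range(n) for j in range(i + 1, n)]
    _, i, j = min(pairs)
    return i, j


points = [(0, 0), (0, 1), (5, 5), (9, 9)]
matrix = distance_matrix(points)
first_merge = closest_pair(matrix)  # the two points merged first
```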
DBSCAN
Density-based spatial clustering of applications with noise (DBSCAN)
is a data clustering algorithm.
1) DBSCAN can be used when examining spatial data.
2) DBSCAN can be applied to tasks with arbitrarily shaped clusters, or
clusters within clusters.
3) DBSCAN can find any arbitrarily shaped cluster without being
affected by noise.
Advantages
1) DBSCAN does not require one to specify the number of clusters
in the data a priori, as opposed to k-means.
2) DBSCAN can find arbitrarily-shaped clusters. It can even find a
cluster completely surrounded by (but not connected to) a different
cluster.
3) DBSCAN has a notion of noise, and is robust to outliers.
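The following is a compact, unoptimized sketch of the DBSCAN idea, written to show how clusters grow from dense "core" points and how isolated points end up labelled as noise. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors for a core point) follow common usage, and the sample points are invented.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Nothing in the call says how many clusters to expect: the two groups are discovered from density alone, and the isolated point at (50, 50) is reported as noise.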
Decision Trees
A decision tree is a decision support tool that uses a tree-like
graph or model of decisions and their possible consequences,
including chance event outcomes, resource costs, and utility. It
is one way to display an algorithm that only contains conditional
control statements.
Entropy: The amount of disorder (impurity) in the data.
When building a decision tree, we want to split the nodes in a
way that decreases entropy and increases information gain.
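Assuming the usual Shannon definition, the entropy of a set of labels is $H = -\sum_i p_i \log_2 p_i$, and the information gain of a split is the parent's entropy minus the size-weighted entropy of the children. A small sketch with made-up labels:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of its child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each child is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # children as mixed as the parent
```

A split that produces pure children removes all the disorder and therefore has the highest gain, while a split that leaves the children as mixed as the parent gains nothing.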
Missing Values
Missing data in the training data set can reduce the power or fit of
a model, or can lead to a biased model, because the behavior of a
variable and its relationship with other variables cannot be analyzed
correctly. This can lead to wrong predictions or classifications.
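One common treatment, shown here only as an example, is to replace each missing value with the mean of the observed values in the same column. The slides do not prescribe a specific method, so mean imputation and the `ages` column below are assumptions.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


ages = [20, None, 40, 30, None]
filled = impute_mean(ages)
```

Mean imputation keeps every row but shrinks the column's variance, so other strategies (median, model-based, or dropping rows) may suit a given dataset better.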
Outliers
“Outlier” is a term commonly used by analysts and data
scientists. Outliers need close attention, or they can result in wildly
wrong estimates. Simply put, an outlier is an observation that
lies far away from, and diverges from, the overall pattern in a sample.
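One widely used rule of thumb flags values that fall more than 1.5 interquartile ranges below the first quartile or above the third. The slides do not name a detection method, so the IQR rule and the sample readings below are assumptions made for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


readings = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = iqr_outliers(readings)
```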
Feature Engineering
Feature engineering is the science (and art) of extracting more
information from existing data. You are not adding any new data
here, but you are actually making the data you already have more
useful.
For example, let’s say you are trying to predict footfall in a shopping
mall based on dates. If you try to use the dates directly, you may
not be able to extract meaningful insights from the data. This is
because footfall is less affected by the day of the month than it is
by the day of the week. This information about the day of the week is
implicit in your data. You need to bring it out to make your model
better.
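Bringing out the day of the week takes only a few lines. In this sketch the record layout (a date plus a footfall count), the visitor numbers, and the `is_weekend` flag are hypothetical additions for illustration.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def add_day_of_week(records):
    """Derive day-of-week features from the date of each record."""
    enriched = []
    for day, footfall in records:
        index = day.weekday()  # Monday is 0, Sunday is 6
        enriched.append({
            "date": day,
            "footfall": footfall,
            "day_of_week": DAY_NAMES[index],
            "is_weekend": index >= 5,
        })
    return enriched


visits = [(date(2024, 1, 5), 1200), (date(2024, 1, 6), 3100)]
features = add_day_of_week(visits)
```

No new data was collected here: the two new columns were already implied by the date.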
Feature engineering includes missing value treatment, outlier
treatment, dummy variables, and variable creation.
