Machine Learning with Python
Compiled by : Dr. Kumud Kundu
Outline
● The general concepts of machine learning
● The three types of learning and basic terminology
● The building blocks for successfully designing machine learning systems
● Introduction to Pandas, Matplotlib and the sklearn framework
○ For basics of Python refer to (https://www.python.org/) and
○ For basics of NumPy refer to (http://www.numpy.org/).
● Simple Program of Plotting Graphs with Matplotlib.pyplot
● Coding Template of Analyzing and Visualizing Dataframe with Pandas
● Simple Program for supervised learning (prediction modelling) with Linear Regression
● Simple Program for unsupervised learning (clustering) with K-Means
Machine Learning
Machine learning is the application and science of algorithms that make sense of data.
Or
Machine learning uses algorithms that take input data, learn from that data and make
informed decisions.
Or
Machine learning means designing and implementing programs that improve with experience.
ML: Giving Computers the Ability to Learn from Data
Machine Learning is…
Automating automation
Getting computers to program themselves
Let the data do the work instead!
[Figure: Training Data (past) and Testing Data (future), each shown with a model/predictor]
JOURNEY FROM DATA TO PREDICTIONS
“Machine learning is the next Internet”
Traditional Programming vs. Machine Learning Programming
● Traditional Programming: Data + Program → Computer → Output
● Machine Learning: Data + Output → Computer → Program
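The contrast above can be sketched in a few lines. This is only an illustration: the temperature-conversion rule and the names `fahrenheit_rule`, `celsius` and `fahrenheit` are assumptions chosen for the example, not part of the slides. In the traditional style a person writes the rule; in the machine learning style the computer is given data and outputs and recovers the rule.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Traditional programming: a human writes the rule (the "program").
def fahrenheit_rule(celsius):
    return 1.8 * celsius + 32.0

# Machine learning: we supply data and the matching outputs,
# and the computer recovers the rule from them.
celsius = np.array([[-10.0], [0.0], [15.0], [30.0], [100.0]])  # data (inputs)
fahrenheit = fahrenheit_rule(celsius).ravel()                    # outputs

model = LinearRegression().fit(celsius, fahrenheit)
learned_slope = model.coef_[0]
learned_intercept = model.intercept_
print(learned_slope, learned_intercept)  # close to 1.8 and 32.0
```

Because the example data is noise-free, the learned slope and intercept match the hand-written rule almost exactly; real data would only approximate it.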
Machine learning is inherently a multi-disciplinary field
It draws on results from :
Artificial intelligence,
Probability
Statistics
Computational complexity theory
Information theory
Philosophy
Psychology
Neurobiology
and other fields.
Most machine learning methods work well because of human-designed representations and input
features
ML becomes just optimizing weights to best make a final prediction
Machine Learning
How Do Machines Learn?
Learning is all about discovering the best parameter values (a, b, c, …) that map
input to output.
Or
The main goal of learning is to find out how the output values are calculated from the input
(the relationship between output and input), i.e.
machine learning algorithms are described as learning a target function (f) that
best maps input variables (X) to an output variable (Y): Y = f(X)
The relationship can be linear or nonlinear.
These parameter values enable the learned model to output results for new instances based on
previously learned ones.
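As a minimal sketch of "discovering the best parameter values (a, b, c)", the snippet below fits a quadratic to noisy data with `numpy.polyfit`. The true values `true_a`, `true_b`, `true_c` and the noise level are assumptions made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data generated from y = a*x^2 + b*x + c plus a little noise.
true_a, true_b, true_c = 2.0, -1.0, 0.5
x = np.linspace(-3, 3, 50)
y = true_a * x**2 + true_b * x + true_c + rng.normal(0, 0.1, size=x.size)

# "Learning": find the a, b, c that best map x to y (least squares).
# polyfit returns coefficients with the highest power first.
a, b, c = np.polyfit(x, y, deg=2)
print(a, b, c)

# The learned parameters can now produce an output for a new instance.
x_new = 4.0
y_new = a * x_new**2 + b * x_new + c
print(y_new)
```

The learned values land near the true ones, and the same three numbers are all that is needed to predict for inputs the model has never seen.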
Learning a function from data is a difficult problem,
and this is the reason why the field of machine learning and machine
learning algorithms exist.
● Error creeps in when predicting output from real-life input data instances (X),
i.e. Y = f(X) + e
● This error may arise, for example, from not having enough attributes to sufficiently characterize the best
mapping from X to Y.
As an example, a face identification program may judge Subject 1 to be similar to Subject 2 on the basis
of the intensity profile, although the expected output is Subject 1 with pose.
[Figure: face images labeled Subject 1, Subject 2, and Subject 1 with pose]
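The effect of missing attributes on the error term e can be sketched with synthetic data. Everything here is an assumption for illustration: the output depends on two hypothetical attributes `x1` and `x2`, and we compare a model that sees only `x1` with one that sees both. The comparison uses the training error for simplicity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)

# Hypothetical data: y really depends on two attributes, x1 and x2.
x1 = rng.uniform(0, 10, 200)
x2 = rng.uniform(0, 10, 200)
y = 3.0 * x1 + 2.0 * x2 + rng.normal(0, 0.5, 200)

X_partial = x1.reshape(-1, 1)        # x2 is missing from the data
X_full = np.column_stack([x1, x2])   # both attributes available

pred_partial = LinearRegression().fit(X_partial, y).predict(X_partial)
pred_full = LinearRegression().fit(X_full, y).predict(X_full)

mse_partial = mean_squared_error(y, pred_partial)
mse_full = mean_squared_error(y, pred_full)
print(mse_partial, mse_full)
```

With only `x1`, everything contributed by `x2` ends up in the error; once `x2` is included, the remaining error is roughly the noise alone.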
The following diagram shows a typical workflow for
using machine learning in predictive modeling:
ML Program
● A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
Python for Machine Learning Program
Why Python??
Python is one of the most popular programming languages for data science, and thanks to its very active developer
and open-source community, a large number of useful libraries for scientific computing and machine learning,
such as NumPy and SciPy, have been developed.
For machine learning programming tasks, we will use the scikit-learn library, one of the most popular and accessible
open-source machine learning libraries.
Python on Jupyter Notebook
The Jupyter Notebook is an open-source web application that allows you
to create and share documents that contain live code, equations,
visualizations and narrative text.
The core programming languages supported by Jupyter are Julia, Python
and R.
Use it on Google Colab colab.research.google.com
or Use Jupyter notebook on Anaconda
● Using the Anaconda Python distribution and package manager
● The Anaconda installer can be downloaded at https://docs.anaconda.com/anaconda/install/, and an
Anaconda quick start guide is available at https://docs.anaconda.com/anaconda/user-guide/getting-started/.
Key Terms in a Machine Learning Program
● Training example: A row in a table representing the dataset; synonymous with an observation, record,
instance, or sample (in most contexts, sample refers to a collection of training examples).
● Training: Model fitting; for parametric models, similar to parameter estimation.
● Feature (x): A column in a data table or data (design) matrix. Synonymous with predictor, variable, input,
attribute, or covariate.
● Target (y): Synonymous with outcome, output, response variable, dependent variable, (class) label, and ground truth.
● Loss function / cost function / error function: A function that measures the deviation of the predicted output from
the expected output.
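The terms above can be mapped onto a small table. The column names (`hours_studied`, `classes_attended`, `exam_score`) and the values are hypothetical, and mean squared error is used only as one common choice of loss function.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset: each row is one training example.
data = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 3.0, 4.0],
    "classes_attended": [2, 4, 5, 8],
    "exam_score": [52.0, 58.0, 65.0, 71.0],
})

X = data[["hours_studied", "classes_attended"]]  # features (columns)
y = data["exam_score"]                            # target
first_example = X.iloc[0]                         # a single training example

def mse_loss(y_true, y_pred):
    """Mean squared error: average squared deviation of prediction from target."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Loss of a trivial baseline that always predicts the mean score.
baseline_pred = np.full(len(y), y.mean())
loss = mse_loss(y, baseline_pred)
print(loss)
```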
Import the Libraries into the Jupyter Notebook
● import numpy as np
● import pandas as pd
● import matplotlib.pyplot as plt
Matplotlib: A Plotting Library for Python
● It makes heavy use of NumPy
● Importing matplotlib:
from matplotlib import pyplot as plt
or
import matplotlib.pyplot as plt
● Examples:
# plotting a bar graph
x = [1, 23, 4, 5, 6, 7]
y = [23, 45, 67, 89, 90, 100]
plt.bar(x, y)
plt.title('Bar Graph')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
# plotting a scatter plot of the same data
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
For subplots (simultaneous plotting)
● matplotlib.pyplot.subplot
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10, 0.01)
plt.subplot(1, 3, 1)  # 1 row, 3 columns, first panel
plt.plot(x, np.sin(x))
plt.subplot(1, 3, 2)  # second panel
plt.plot(x, np.cos(x))
plt.subplot(1, 3, 3)  # third panel
plt.plot(x, np.sin(2 * x))
plt.show()
Pandas is a fast, powerful, flexible and easy to use open source data analysis and
manipulation tool.
Pandas in data analysis:
Importing Data
Writing to different formats
Pandas Data Structures
Data Exploration
Data Manipulation
Aggregating Data
Merging Data
DataFrame
● A DataFrame is a two-dimensional, labeled data structure whose columns can hold different data types (heterogeneous data).
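A small sketch of what "heterogeneous" means in practice. The column names echo those used later in the slides (`name`, `nations`, `rank`), but the values and the `rating` numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical records: text, integer and floating-point columns side by side.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "nations": ["IND", "AUS", "IND"],
    "rank": [1, 2, 3],
    "rating": [890.5, 870.0, 860.25],
})

print(df.shape)   # (rows, columns)
print(df.dtypes)  # each column has its own data type
```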
Reading and Writing DataFrames
● import pandas as pd
● Reading data into a DataFrame using Pandas
○ df = pd.read_csv('File Name')  # from a comma-separated values (CSV) file
○ df = pd.read_csv('C:fdpbatsmen_ratings_all091217.csv')
○ df = pd.read_excel('File Name')
● Writing data from DataFrames to files on the system
○ df.to_csv('File Name')  # or a destination path that includes the file name
○ df.to_excel('File Name')  # or a destination path that includes the file name
● To display all the records of the file: display(df)
● To list the data type of each column:
types = df.dtypes
print(types)
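A minimal round trip, writing a DataFrame to CSV and reading it back. The file name `ratings_demo.csv` and the data are hypothetical; a temporary directory is used so the sketch does not depend on any file on your system.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "rank": [1, 2]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ratings_demo.csv")
    # index=False keeps the row index out of the file.
    df.to_csv(path, index=False)
    df_back = pd.read_csv(path)

print(df_back)
```

Without `index=False`, the saved file would carry an extra unnamed index column when read back.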
Getting a Preview of a DataFrame
● To view the top n records of a DataFrame
○ df.head(5)
● To view the bottom n records of a DataFrame
○ df.tail(5)
● To view the column names
○ df.columns
● To get a sub-DataFrame from a DataFrame
○ df['name'] , df[['name', 'nations']]
SubDataFrame as per Query
To display the records of India with ranking < 50:
display(df[(df['nations'] == "IND") & (df['rank'] < 50)])
Selecting data columns from the dataset by column name:
df[['col1', 'col2']]
With iloc (integer-location based indexing) for selection by position:
df.iloc[:, :-1]  # select all rows and all columns except the last one
df.iloc[:, 3:6]  # select all rows of the fourth, fifth and sixth columns (positions 3, 4, 5)
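The query and selection patterns above can be tried on a small table. The player names and numbers below are hypothetical stand-ins for the ratings file used in the slides.

```python
import pandas as pd

# Hypothetical stand-in for the ratings dataset.
df = pd.DataFrame({
    "name": ["P1", "P2", "P3", "P4"],
    "nations": ["IND", "AUS", "IND", "ENG"],
    "rank": [3, 10, 75, 40],
    "rating": [880, 850, 600, 700],
})

# Boolean query: each condition is wrapped in parentheses and joined with &.
top_india = df[(df["nations"] == "IND") & (df["rank"] < 50)]

# Selection by column name (note the double brackets for several columns).
two_cols = df[["name", "nations"]]

# Selection by position with iloc; the end of a slice is excluded.
all_but_last = df.iloc[:, :-1]
middle = df.iloc[:, 1:3]   # columns at positions 1 and 2

print(top_india)
```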
Drop Columns from a DataFrame using the drop() method
Remove a specific single column:
k.drop(['rate_date'], axis=1)  # axis=1 denotes dropping a column of the dataset
Remove specific multiple columns:
k.drop(['rate_date', 'rating'], axis=1)
Remove columns based on column index:
k.drop(k.columns[[0, 1]], axis=1, inplace=True)
Remove all columns between one specific column and another:
k.drop(k.columns[3:5], axis=1)  # drops the columns at positions 3 and 4
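A short sketch of the different ways to drop columns, on a hypothetical table `k` that reuses the column names from the slides. Each call here returns a new DataFrame and leaves `k` itself unchanged, which is the default behaviour of `drop` when `inplace` is not set.

```python
import pandas as pd

# Hypothetical table with the column names used in the slides.
k = pd.DataFrame({
    "name": ["P1", "P2"],
    "rate_date": ["2017-12-09", "2017-12-09"],
    "rating": [880, 850],
    "nations": ["IND", "AUS"],
    "rank": [3, 10],
})

single = k.drop(["rate_date"], axis=1)              # one column by name
multiple = k.drop(["rate_date", "rating"], axis=1)  # several columns by name
by_position = k.drop(k.columns[[0, 1]], axis=1)     # columns at positions 0 and 1
by_range = k.drop(k.columns[1:3], axis=1)           # a run of adjacent columns

print(list(by_range.columns))
```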
Code for Data Reading and Data Manipulation using Pandas
# importing the data reading and data manipulation library of Python
import pandas as pd
# import files because the data files are not present on Google Colab
from google.colab import files
upload = files.upload()
# reading the dataset using the read_csv function
df = pd.read_csv('rating.csv')
# to display the column headers in the dataset
df.columns
# to get the number of instances and associated features
df.shape
# to get insights into the data by grouping on one column
df.groupby('nations').size()
# to get a smaller dataset as per the query or subqueries
k = df[(df['nations'] == "IND") & (df['rank'] < 50)]
# to display the smaller subset of data
display(k)
# to drop the desired columns from the smaller set of data
k = k.drop(['name', 'rate_date', 'nations'], axis=1)
scikit-learn (sklearn): Free Machine Learning Library for Python
● It works with Python's numerical and scientific libraries such as NumPy and SciPy.
● Model selection is the process of selecting one final machine learning model from among a collection of candidate
machine learning models for a training dataset. It can be applied across different
types of models (e.g. logistic regression, SVM, KNN, etc.).
● The sklearn.model_selection module provides tools for this process, e.g. from sklearn.model_selection import train_test_split
● Model parameters are the values learned as a result of fitting the model to data.
Challenge of ML Program
The challenge of applied machine learning is in choosing
a model among a range of different models for your
problem.
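One common way to choose among candidate models is to compare their cross-validated scores. The sketch below is an illustration, not the method from the slides: it uses the iris dataset bundled with scikit-learn, the three model types named above with mostly default settings, and 5-fold cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate models to choose from.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Mean accuracy over 5 cross-validation folds for each candidate.
mean_scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(mean_scores, key=mean_scores.get)

for name, score in mean_scores.items():
    print(name, round(score, 3))
print("selected:", best_name)
```

On a dataset this easy the candidates score very close to each other, so the "best" model may differ only by chance; on harder problems the gap is usually more informative.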
Simple Predictive ML Program using Linear Regression Model
● SIMPLE_REGRESSION.ipynb on Google Colab
# importing the data reading and data manipulation library of Python
import pandas as pd
# import files because the data files are not present on Google Colab
from google.colab import files
upload = files.upload()
# reading the dataset using the read_csv function
df = pd.read_csv('rating.csv.csv')
# for plotting graphs
import matplotlib.pyplot as plt
# dividing the dataset into features (X) and target (y)
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
# from the machine learning library of Python (sklearn) import the train_test_split function
from sklearn.model_selection import train_test_split
# X is the feature matrix, y is the target
# train_test_split divides X into X_train and X_test, and y into y_train and y_test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
X_test.shape
# import the linear regression model
from sklearn.linear_model import LinearRegression
# create an instance of the linear regression model
model = LinearRegression()
# find the relationship between input and output with the help of the fit function
model.fit(X_train, y_train)
# apply the trained model to the unseen test data, i.e. X_test
y_pred = model.predict(X_test)
Visualization and Evaluation of Results
# visualization of results
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, model.predict(X_train), color='blue')
plt.title('PCM Marks vs Placement_Package (Training set)')
plt.xlabel('PCM Marks')
plt.ylabel('Placement_Package')
plt.show()
# importing metrics from sklearn to evaluate the predicted result
from sklearn import metrics
# include the numerical calculation library NumPy
import numpy as np
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
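The program above needs the uploaded CSV file. As a self-contained sketch of the same workflow, the snippet below generates hypothetical data instead: the `marks` and `package` values and their linear relationship are assumptions made up so the example runs anywhere, and plotting is left out.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: package grows roughly linearly with marks, plus noise.
marks = rng.uniform(50, 100, size=60)
package = 0.1 * marks + 1.0 + rng.normal(0, 0.2, size=60)

X = marks.reshape(-1, 1)   # scikit-learn expects a 2-D feature matrix
y = package

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("MAE:", mae, "MSE:", mse, "RMSE:", rmse)
```

The errors are measured on the held-out test split, so they estimate how the model behaves on data it was not trained on.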
CLUSTERING : Grouping things together
UNSUPERVISED LEARNING
Cluster Analysis : A method of Unsupervised Learning
● Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are
more similar to each other than to those in other groups.
● Cluster analysis lets us gain valuable insights from our data by seeing what groups the data points fall into when
we apply a clustering algorithm.
● For example, to survey the academic performance of high school students, the entire student population of a particular board can be divided into
different clusters (Excellent Learner, Good Learner, Average Learner and Slow Learner).
K-Means Clustering
● Aims to partition ‘n’ observations into k clusters in which each observation belongs to the
cluster with the nearest mean, serving as a prototype of the cluster.
● K-Means falls under the category of centroid-based clustering.
•n = number of instances
•k = number of clusters
•t = number of iterations
The K-Means clustering algorithm involves the following steps:
● Choose the number of clusters K.
● Randomly select any K data points as cluster centers, preferably so that they are as far as possible from each
other.
○ Calculate the distance between each data point and each cluster center using the given distance function.
○ Assign each data point to the cluster whose center is nearest to that data point.
○ Re-compute the centers of the newly formed clusters.
○ The center of a cluster is computed by taking the mean of all the data points contained in that cluster.
● Keep repeating the above four steps until any of the following stopping criteria is met:
○ No change in the centers of the newly formed clusters
○ No change in the data points assigned to each cluster
○ The maximum number of iterations is reached
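The steps above can be sketched directly in NumPy. This is a minimal illustration under stated assumptions: the function name `kmeans`, the Euclidean distance, and the six sample points are all chosen for the example, and the initial centers are picked purely at random rather than spread as far apart as possible.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-Means on the rows of X; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Pick K distinct data points as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Euclidean distance from every point to every center.
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Assign each point to its nearest center.
        labels = distances.argmin(axis=1)
        # Re-compute each center as the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop when the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Two small, well-separated groups of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(X, k=2)
print(labels)
print(centers)
```

Which group receives label 0 and which receives label 1 depends on the random initial centers; only the grouping itself is meaningful.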
Metric to Evaluate the Quality of Clusters
● Inertia: the sum of the squared distances of all the points within a cluster from the
centroid of that cluster, added up over all clusters.
● It tells us how far the points within a cluster are from their centroid.
● These distances should be as low as possible.
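As a sketch, inertia can be computed by hand and compared with the `inertia_` attribute that scikit-learn's `KMeans` reports after fitting. The six data points are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical points forming two loose groups.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# For each point, look up the centroid of the cluster it was assigned to,
# then sum the squared distances to those centroids.
assigned_centers = kmeans.cluster_centers_[kmeans.labels_]
manual_inertia = float(np.sum((X - assigned_centers) ** 2))

print(manual_inertia, kmeans.inertia_)
```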
from sklearn.cluster import KMeans
● Using the K-Means++ algorithm, we optimize the step where we randomly pick the initial cluster
centroids.
● kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)  # i is the number of clusters being tried
● Using the elbow method to find the optimal number of clusters
The Elbow Method
● The basic idea of the elbow method is to compute, for a series of K values, the sum of the squared distances
between the sample points in each cluster and the centroid of that cluster. This sum of squared
errors (SSE) is used as the performance indicator: iterate over the K values, calculate the SSE for each,
and choose the K at the "elbow", where the SSE stops dropping sharply.
● Smaller SSE values indicate that each cluster is more compact.
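A minimal sketch of the elbow loop with scikit-learn. The data comes from `make_blobs` with three centers, which is an assumption made so that the expected elbow is known in advance; the range of K values tried is likewise arbitrary.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three groups, so the elbow is expected near K = 3.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

k_values = range(1, 7)
sse = []
for i in k_values:
    kmeans = KMeans(n_clusters=i, init="k-means++", n_init=10, random_state=42)
    kmeans.fit(X)
    sse.append(kmeans.inertia_)  # SSE for this number of clusters

for k, value in zip(k_values, sse):
    print(k, round(value, 1))
```

Plotting `sse` against `k_values` (for example with `plt.plot`) shows a steep drop up to the elbow and only small gains afterwards.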
Clustering Example with K-Means
Agglomerative Clustering
● An agglomerative algorithm is a type of hierarchical clustering algorithm in which
each individual element to be clustered starts in its own cluster. These clusters are then merged
iteratively until all the elements belong to one cluster.
● Hierarchical clustering is a powerful technique that allows us to build tree structures from
data similarities.
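A small sketch of agglomerative clustering with scikit-learn's `AgglomerativeClustering`. The six points, the choice of two clusters and Ward linkage are assumptions for illustration; the fitted `children_` attribute records the sequence of merges that forms the tree.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical points forming two tight groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Every point starts as its own cluster; the closest clusters are merged
# step by step, and the tree is cut where two clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)

print(labels)
print(model.children_)  # one row per merge: the two clusters that were joined
```

With n points there are n - 1 merges in total, which is why `children_` has five rows here.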
Hierarchical Clustering Example
Applications of Clustering
● Search Engines.
● Spam Detection
● Customer Segmentation

Ml programming with python

  • 1. Machine Learning with Python Compiled by : Dr. Kumud Kundu
  • 2. Outline ● The general concepts of machine learning ● The three types of learning and basic terminology ● The building blocks for successfully designing machine learning systems ● Introduction to Pandas, Matlplotlib and sklearn framework ○ For basics of Python refer to (https://www.python.org/) and ○ For basics of NumPy refer to (http://www.numpy.org/). ● Simple Program of Plotting Graphs with Matplotlib.pyplot ● Coding Template of Analyzing and Visualizing Dataframe with Pandas ● Simple Program for supervised learning (prediction modelling) with Linear Regression ● Simple Program for unsupervised learning (clustering) with Kmeans
  • 3. Machine Learning Machine learning, the application and science of algorithms that make sense of data Or Machine Learning uses algorithms that takes input data, learns from data and make informed decisions. Or To design and implement programs that improve with experience
  • 4. ML: Giving Computers the Ability to Learn from Data
  • 5. Machine Learning is… Automating automation Getting computers to program themselves Let the data do the work instead! Training Data model/ predictor past model/ predictor future Testing Data
  • 6. JOURNEY FROM DATA TO PREDICTIONS “Machine learning is the next Internet”
  • 8. Machine learning is inherently a multi-disciplinary field It draws on results from : Artificial intelligence, Probability Statistics Computational complexity theory Information theory Philosophy Psychology Neurobiology and other fields.
  • 9. Most machine learning methods work well because of human-designed representations and input features ML becomes just optimizing weights to best make a final prediction Machine Learning
  • 10. How Machines Learn??? Learning is all about discovering the best parameter values (a, b, c …) that maps input to output. Or The main goal behind learning, we want to learn how the values are calculated (relationships between output and input) i.e. Machine learning algorithms are described as learning a target function (f) that best maps input variables (X) to an output variable (Y), Y = f(X) The relationships can be linear or non linear. These values enable the learned model to output results for new instances based on previous learned ones.
  • 11. The problem of learning a function from data is a difficult problem and this is the reason why the field of machine learning and machine learning algorithms exist. ● Error creeps in predicting output from real life input data instances (X). i.e. Y = f(X) + e ● This error might be error such as not having enough attributes to sufficiently characterize the best mapping from X to Y. Subject 1 Subject 2 As an example, Face Identification program will recognize subject1 similar to subject 2 on the basis of intensity profile, though expected output is Subject1 with pose Subject 1 with pose
  • 14. The following diagram shows a typical workflow for using machine learning in predictive modeling:
  • 15. ML Program ● A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
  • 16. Python for Machine Learning Program
  • 17. Why Python?? Python is one of the most popular programming languages for data science and thanks to its very active developer and open source community, a large number of useful libraries LIKE as NumPy and SciPy for scientific computing and machine learning have been developed. For machine learning programming tasks, the scikit-learn library, one of the most popular and accessible open source machine learning libraries will be used.
  • 18. Python on Jupyter Notebook The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. The core programming languages supported by Jupyter are Julia, Python and R. Use it on Google Colab colab.research.google.com or Use Jupyter notebook on Anaconda ● Using the Anaconda Python distribution and package manager ● The Anaconda installer can be downloaded at https://docs.anaconda.com/anaconda/install/, and an Anaconda quick start guide is available at https://docs.anaconda.com/anaconda/user-guide/getting-started/.
  • 19. Key Terms in Machine Language Program ● Training example: A row in a table representing the dataset and synonymous with an observation, record, instance, or sample (in most contexts, sample refers to a collection of training examples). ● Training: Model fitting, for parametric models similar to parameter estimation. ● Feature Set : A column in a data table or data (design) matrix. Synonymous with predictor, variable, input, attribute, or covariate. ● Target or Test Set y: Outcome, output, response variable, dependent variable, (class) label, and ground truth. ● Loss function / Cost Function / Error Function: Function that measure the deviation of predicted output from the expected output.
  • 20. Import the Libraries into the Jupyter Notebook ● Import Numpy as np ● Import Pandas as pd ● Import Matplotlib.pyplot as plt
  • 21. Matplotlib: A Plotting Library for Python ● it makes heavy use of NumPy ● Importing matplotlib : ● from matplotlib import pyplot as plt or ● import matplotlib.pyplot as plt ● Examples: ● # for plotting bar graph ● x=[1,23,4,5,6,7] ● y=[23,45,67,89,90,100] ● plt.bar(x,y) ● plt.title('bar graph') ● plt.xlabel('fff') ● plt.ylabel('Y') ● plt.show()
  • 22. ● plt.scatter(x,y) ● plt.title('Scatter Plot') ● plt.xlabel('fff') ● plt.ylabel('Y') ● plt.show()
  • 23. For subplots (Simultaneous plotting) ● Matplotlib.pyplot.subplot ● import numpy as np ● x=np.arange(0,10,0.01) ● plt.subplot(1,3,1) ● plt.plot(x,np.sin(x)) ● plt.subplot(1,3,2) ● plt.plot(x,np.cos(x)) ● plt.subplot(1,3,3) ● plt.plot(x,np.sin(2*x)) ● plt.show()
  • 24. Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool. Pandas in data analysis: Importing Data Writing to different formats Pandas Data Structures Data Exploration Data Manipulation Aggregating Data Merging Data
  • 25. DataFrame ● DataFrame is a two-dimensional array with heterogeneous data.
  • 26. Reading and Writing into DataFrames ● Import pandas as pd ● Reading Data into Dataframe using Pandas ○ df=pd.read_csv(‘File Name’) # From Comma Seperated Values (CSV) file ○ df=pd.read_csv('C:fdpbatsmen_ratings_all091217.csv') ○ df=pd.read_excel(‘File Name’) ● Writing Data from dataframes to Files on System df.to_csv(‘File Name’ or ‘Destination Path along with path file’) df.to_excel(‘File Name’ or ‘Destination Path along with path file’ To display all the records of the file : display(df) ● types = df.dtypes ● print(types)
  • 27. Getting preview of Dataframe ● To view top n records of dataframe ○ df.head(5) ● To view bottom n records of dataframe ○ df.tail(5) ● View column name ○ df.columns ○ Getting subdataframe from dataframe ○ df['name’] , df[['name','nations']]
  • 28. SubDataFrame as per Query To display the records of India with ranking <50 display(df[(df['nations'] == "IND") & (df['rank’] < 50)]) Selecting data columns from dataset with column names: df[[‘col1’ ‘col2’]] With iloc (integer-location) based indexing for selection by position df.iloc[:,:-1] // select all columns but not the last one df.iloc [:, [4:6]] // select all rows of fourth, fifth and sixth column
  • 29. Drop Columns from a Dataframe using drop() method. Drop Columns from a Dataframe using and drop() method. Method #1: Drop Columns from a Dataframe using drop() method. Remove specific single column. k.drop(['rate_date'],axis=1) // Axis =1 denotes dropping column of dataset Removing specific multiple columns. k.drop(['rate_date', 'rating'], axis=1) Remove columns as based on column index. k.drop[k.columns[[0,1]],axis=1, inplace= True) Remove all columns between a specific column to another columns K.iloc(:,[3,4])
  • 30. Code for Data Reading, Data Manipulation using Pandas ● # Importing Data Reading, Data Manipulation Library of python import pandas as pd # import files because the files are not present on google colab from google.colab import files upload=files.upload() # reading dataset using read_csv function ● df=pd.read_csv('rating.csv') # to display column headers in dataset df.columns ● # to get the number of instances and associated features df.shape # to get insights to data by grouping the data of one column ● df.groupby('nations').size() # to get smaller dataset as per the query or subqueries ● k=(df[(df['nations'] =="IND") & (df['rank']<50)]) # to display smaller subset of data display(k) # to drop desired column from the smaller set of data ● k=dataset.drop(['name','rate_date','nations'],axis=1)
  • 31. Scikit /sklearn: Free Machine Learning Library for Python ● It supports Python numerical and scientific libraries like NumPy and SciPy . ● Model selection is the process of selecting one final machine learning model from among a collection of candidate machine learning models for a training dataset. Model selection is a process that can be applied both across different types of models (e.g. logistic regression, SVM, KNN, etc.) ● from sklearn.model_selection ● model_selection is the process of selecting one final machine learning model among a collection of machine learning models for training set. ● model parameters are parameters which arise as a result of the fit
  • 32. Challenge of ML Program The challenge of applied machine learning is in choosing a model among a range of different models for your problem.
  • 33. Simple Predictive ML Program using Linear Regression Model ● SIMPLE_REGRESSION.ipynb On Google Colab # Important Data Reading, Data Manipulation Library of python import pandas as pd # import files because the files are not present on google colab from google.colab import files upload=files.upload() # reading dataset using read_csv function df=pd.read_csv('rating.csv.csv') # For plotting graphs import matplotlib.pyplot as plt # Dividing Dataset into Train Set (X) and Target Set (y) X = df.iloc[:, :-1].values y = df.iloc[:, -1].values
  • 34. # from machine learning library of python (sklearn) import train_test_split function from sklearn.model_selection import train_test_split # X is training set # y is the target set X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0) # split with the help of train_test_split function # X part is divided in two parts Train and Test # Y part is divided into two parts Train and Test X_test.shape # import Linear Regression Model from sklearn.linear_model import LinearRegression # created instance of linear regression model model = LinearRegression() # Finding the relationship between input AND OUTPUT with the help of fit function model.fit(X_train, y_train) # using the same trained model over the unknown test data i.e. x_test y_pred = model.predict(X_test)
  • 35. Visualization and Evaluation of Results
    # Numerical calculation library of Python
    import numpy as np
    # Metrics from sklearn to evaluate the predicted result
    from sklearn import metrics
    # Visualization of results: training points in red, fitted line in blue
    plt.scatter(X_train, y_train, color = 'red')
    plt.plot(X_train, model.predict(X_train), color = 'blue')
    plt.title('PCM Marks vs Placement_Package (Training set)')
    plt.xlabel('PCM Marks')
    plt.ylabel('Placement_Package')
    plt.show()
    # Evaluation of the predictions on the test set
    print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
    print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
    print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
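The same split, fit, predict and evaluate pipeline can be tried without the Colab upload step. The sketch below replaces the slides' CSV file with synthetic data (an assumption: one "marks" feature and a noisy linear "package" target, with made-up slope and noise level), so the numbers it prints say nothing about the real dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (assumed, not the slides' CSV file)
rng = np.random.default_rng(0)
X = rng.uniform(40, 100, size=(60, 1))                  # one feature column
y = 0.1 * X[:, 0] + 1.0 + rng.normal(0, 0.3, size=60)   # noisy linear target

# Hold out one third of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)

# Fit on the training part, predict on the held-out part
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Error metrics on the test set
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("MAE:", mae, "MSE:", mse, "RMSE:", rmse)
```

Because the synthetic target is built from a known slope of 0.1, the fitted `coef_` gives a quick sanity check that the model recovered the relationship.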
  • 36. CLUSTERING : Grouping things together UNSUPERVISED LEARNING
  • 37. Cluster Analysis : A method of Unsupervised Learning ● Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. ● Clustering lets us gain valuable insights from our data by seeing which groups the data points fall into when we apply a clustering algorithm. ● Example: to survey the academic performance of high school students, the entire population of a particular board can be divided into different clusters (Excellent Learner, Good Learner, Average Learner and Slow Learner).
  • 38. K-Means Clustering ● Aims to partition n observations into k clusters, in which each observation belongs to the cluster with the nearest mean; that mean serves as the prototype of the cluster. ● K-Means falls under the category of centroid-based clustering. ● Notation: n = number of instances, k = number of clusters, t = number of iterations
  • 39. The K-Means Clustering Algorithm involves the following steps: ● Choose the number of clusters K. ● Randomly select K data points as the initial cluster centers, preferably as far as possible from each other. ○ Calculate the distance between each data point and each cluster center using the given distance function. ○ Assign each data point to the cluster whose center is nearest to it. ○ Re-compute the centers of the newly formed clusters. ○ The center of a cluster is computed by taking the mean of all the data points contained in that cluster. ● Keep repeating the above four steps until any of the following stopping criteria is met: ○ No change in the centers of the newly formed clusters ○ No change in the data points assigned to each cluster ○ The maximum number of iterations is reached
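The steps above can be sketched in plain NumPy. This is an illustrative implementation, not the one scikit-learn uses: it assumes Euclidean distance, picks the initial centers uniformly at random from the data, and stops when the assignments no longer change. The function name `kmeans` and the synthetic two-blob data are assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-Means on an (n, d) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the initial cluster centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Euclidean distance from every point to every center -> shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        # Assign each point to its nearest center
        new_labels = dists.argmin(axis=1)
        # Stopping criterion: no point changed cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Re-compute each center as the mean of its points
        # (an empty cluster keeps its previous center)
        centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return centers, labels

# Usage on synthetic data (assumed): two well-separated 2-D blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(10, 0.5, (50, 2))])
centers, labels = kmeans(X, k=2)
print(centers)
```

The `max_iter` argument covers the third stopping criterion; checking for unchanged assignments is equivalent to checking for unchanged centers, since the centers are a function of the assignments.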
  • 40. Metric to evaluate the quality of Clusters ● Inertia: the sum of squared distances of all the points within a cluster from the centroid of that cluster, added up over all clusters. ● It tells us how far the points within a cluster are from their centroid. ● These distances should be as low as possible, so a lower inertia means tighter clusters.
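As a small worked example, inertia can be computed by hand. The sketch assumes the squared-distance definition that scikit-learn documents for its `inertia_` attribute; the helper name `inertia` and the four points are made up for illustration.

```python
import numpy as np

def inertia(X, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    diffs = X - centers[labels]   # per-point offset from its own centroid
    return float(np.sum(diffs ** 2))

# Two clusters of two points each; centroids are the cluster means
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [6.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 1.0], [5.0, 0.0]])

# Every point lies at distance 1 from its centroid: 1 + 1 + 1 + 1
print(inertia(X, labels, centers))  # 4.0
```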
  • 41. from sklearn.cluster import KMeans ● The K-Means++ algorithm improves the step where the initial cluster centroids are picked at random, by favouring centroids that are spread out from each other. ● kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42), where i is the number of clusters being tried ● The elbow method is used to find the optimal number of clusters
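  • 42. The Elbow Method ● The basic idea of the elbow method is to run the clustering for a series of K values and, for each one, compute the sum of squared distances between the sample points in each cluster and the centroid of that cluster. This sum of squared errors (SSE) is used as the performance indicator: iterate over the K values and calculate the SSE for each. ● Smaller values indicate that each cluster is more compact. Since the SSE tends to keep decreasing as K grows, K is chosen at the "elbow", the point after which increasing K yields only small improvements.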
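A minimal sketch of the elbow loop with scikit-learn's KMeans, reading the SSE from the fitted model's `inertia_` attribute. The data is synthetic (an assumption: three well-separated 2-D blobs), and the range of K values and `n_init=10` are illustrative choices:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (assumed): three blobs centred at (0, 0), (5, 5) and (10, 10)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in (0, 5, 10)])

# Fit K-Means for K = 1..6 and record the SSE (inertia) of each fit
k_values = range(1, 7)
sse = []
for k in k_values:
    kmeans = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42)
    kmeans.fit(X)
    sse.append(kmeans.inertia_)

for k, value in zip(k_values, sse):
    print(k, round(value, 1))
```

Plotting `sse` against `k_values` (for instance with `plt.plot`) should show a sharp drop up to K = 3 and a nearly flat curve afterwards for data like this; that bend is the elbow.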
  • 46. Agglomerative Clustering ● An agglomerative algorithm is a type of hierarchical clustering algorithm in which each individual element to be clustered starts in its own cluster. These clusters are then merged iteratively until all the elements belong to one cluster. ● Hierarchical clustering is a powerful technique that allows us to build tree structures from data similarities.
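A short sketch using scikit-learn's `AgglomerativeClustering`. Instead of merging all the way down to a single cluster, it stops when `n_clusters` clusters remain; the six points and the choice of Ward linkage are assumptions made for the example:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data (assumed): two tight groups of three points each
X = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],
])

# Start with every point in its own cluster and merge until two remain
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)
```

The first three points should share one label and the last three the other; which group receives label 0 is an implementation detail.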
  • 50. Applications of Clustering ● Search Engines. ● Spam Detection ● Customer Segmentation