ICT 3202 - INTRODUCTION
TO DATA SCIENCE
BY
ENGR. JOHNSON C. UBAH
B.ENG, M.ENG, HCNA, ASM
Machine Learning and Statistics
Machine learning is the practice of programming computers so that they can learn
from data.
Machine learning is a subfield of artificial intelligence (AI). Its goal is
generally to understand the structure of data and fit that data into models
that people can understand and use.
In machine learning, the data a system learns from is called the training set,
and each item in it is called a training example.
Intro. To Machine Learning
Machine learning differs from traditional computational approaches in the following way:
Traditional computing algorithms are sets of explicitly programmed steps that computers
follow to solve problems.
Machine learning algorithms allow computers to train on data inputs and use
statistical analysis to generate output values that fall within a specific
range.
Why Machine Learning?
Let’s assume you’d like to write a spam filter program without using machine learning
methods. The steps would be:
You’d take a look at what spam e-mails look like.
You’d write an algorithm to detect the patterns that you’ve seen, and the
software would then flag matching e-mails as spam.
Finally, you’d test the program and repeat the first two steps until the results
are good enough.
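The hand-written approach above can be sketched as follows. This is a minimal illustration rather than a real filter: the phrase list and the sample e-mails are invented for the example.

```python
# Hypothetical hand-written rules: every phrase here is an illustrative assumption.
SPAM_PHRASES = ("free money", "act now", "winner")


def is_spam(email_text):
    """Flag an e-mail as spam if it contains any phrase from the rule list."""
    text = email_text.lower()
    return any(phrase in text for phrase in SPAM_PHRASES)


emails = [
    "You are a WINNER, claim your free money",
    "Meeting moved to 3 pm tomorrow",
    "Fr3e m0ney, act n0w",  # altered spelling: none of the rules match
]
flags = [is_spam(e) for e in emails]
print(flags)
```

The third e-mail slips through because the rules only know the exact phrases they were given, so each new spelling requires another manual rule.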
Why Machine Learning?
A rule-based program like this contains a very long list of rules and is
therefore difficult to maintain. If the filter is built with machine learning
instead, it is much easier to maintain.
Programs that use ML techniques can automatically detect
changes made by users and update their definitions accordingly.
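For comparison, a learned filter can be sketched in plain Python. The technique used here (naive Bayes with add-one smoothing) and all example e-mails are assumptions chosen for illustration; the slides do not name a specific algorithm.

```python
import math
from collections import Counter


def train(examples):
    """Count how often each word appears in spam and in non-spam (ham) e-mails."""
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for text, label in examples:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals


def predict(model, text):
    """Return the label with the higher naive Bayes score (add-one smoothing)."""
    counts, totals = model
    vocab = set(counts["spam"]) | set(counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(counts[label].values())
        score = math.log(totals[label] / sum(totals.values()))  # log prior
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


examples = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
before = predict(train(examples), "fr3e m0ney")

# A user flags one e-mail with the new spelling; retraining updates the filter.
examples.append(("fr3e m0ney now", "spam"))
after = predict(train(examples), "fr3e m0ney")
print(before, after)
```

No rule was edited by hand: the change in behavior comes only from retraining on one newly labeled example.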
Why Machine Learning?
[Figure: a machine learning algorithm that updates automatically when users change their preferences]
When to use machine learning
When you have a problem that requires many hand-written rules to find the
solution.
When you have a very complex problem for which there is no good solution with
a traditional approach.
In non-stable environments: machine learning software can adapt to
new data.
Classification of ML
There are several types of machine learning systems. We can divide them into
categories, depending on:
1. Whether or not they are trained with human supervision
◦ Supervised
◦ Unsupervised
◦ Semi-supervised
◦ Reinforcement learning
2. Whether or not they can learn incrementally
3. Whether they work simply by comparing new data points to known data points, or
instead detect patterns in the training data and build a model.
Supervised and unsupervised learning
We can classify machine learning systems according to the type
and amount of human supervision they get during training. The categories are:
◦ Supervised learning
◦ Unsupervised learning
◦ Semi-supervised learning
◦ Reinforcement learning
Supervised learning
Supervised learning is when an algorithm learns from example data and associated
target responses, which can be numeric values or string labels such as
classes or tags, in order to later predict the correct response when
posed with new examples.
This approach is similar to human learning under the
supervision of a teacher.
Tasks carried out by supervised learning
A typical supervised learning task is
classification. The spam filter is a good example of this,
because it is trained with many e-mails together with
their class (spam or not spam).
Another typical task is to predict a numeric value, such as the
price of a flat, given a set of features (location, number
of rooms, facilities) called predictors; this task is called
regression.
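As a worked illustration of regression, the sketch below fits a straight line to flat prices using a single predictor (number of rooms) with ordinary least squares. The data values are invented, and using only one predictor is a simplifying assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training set: number of rooms and price (in thousands).
rooms = [1, 2, 3, 4]
prices = [100, 150, 200, 250]

slope, intercept = fit_line(rooms, prices)
predicted = slope * 5 + intercept  # predicted price of a 5-room flat
print(slope, intercept, predicted)
```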
Supervised learning algorithms
Keep in mind that some regression algorithms can be
used for classification as well, and vice versa.
Some important supervised algorithms:
◦ K-nearest neighbors
◦ Linear regression
◦ Neural networks
◦ Support vector machines
◦ Logistic regression
◦ Decision trees and random forests
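To make the first algorithm in the list concrete, here is a minimal k-nearest neighbors classifier in plain Python. The points, labels, and the choice k=3 are assumptions for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points closest to query."""
    order = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled training set with two well-separated classes.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (7, 7)))
```

Note that there is no separate training step: the algorithm simply stores the labeled examples and compares each new point against them.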
Unsupervised learning
Unsupervised learning occurs when an algorithm learns from plain examples
without any associated response, leaving the algorithm to determine the data
patterns on its own.
This type of algorithm tends to restructure the data into something else, such as
new features that may represent a class or a new series of uncorrelated values.
Such algorithms are quite useful for giving humans insight into the meaning of data
and for producing useful new inputs to supervised machine learning algorithms.
Unsupervised learning
As a kind of learning, it resembles the methods humans use to figure
out that certain objects or events are from the same class, such as by
observing the degree of similarity between objects. Some
recommendation systems that you find on the web in the form of
marketing automation are based on this type of learning.
In this type of learning the data is unlabeled.
Unsupervised learning algorithms
Some unsupervised learning algorithms include:
◦ Clustering: k-means, hierarchical cluster analysis
◦ Association rule learning: Eclat, Apriori
◦ Visualization and dimensionality reduction: PCA, kernel PCA,
t-distributed stochastic neighbor embedding (t-SNE)
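As an illustration of clustering, the sketch below implements a bare-bones k-means loop in plain Python. The points and the two starting centroids are assumptions; passing the starting centroids explicitly keeps the example deterministic, whereas practical implementations usually choose them at random.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of the points assigned to it."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return assignments, centroids


# Hypothetical unlabeled data: two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
assignments, centroids = k_means(points, [points[0], points[3]])
print(assignments)
print(centroids)
```

No labels are supplied anywhere: the groups emerge from the distances between the points alone.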
Examples of unsupervised learning
Suppose you have a lot of data about your visitors. You can use a
clustering algorithm to detect groups of similar visitors. For example, 65% of
your visitors might be males who love watching movies in the
evening, while 30% watch plays in the evening. Using a
clustering algorithm, we can also divide these groups into smaller groups.
Secondly, visualization algorithms take a large amount of
unlabeled data as input and produce a 2D or 3D
representation as output. Feature
extraction takes place here.
Reinforcement learning
An agent (the AI system) observes the
environment, performs actions, and
then receives rewards in return.
Here, the agent must learn by itself which actions yield the most reward.
You can find this type of learning in many
robotics applications, such as robots that learn how to
walk.
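The observe, act, and receive-reward loop can be sketched with tabular Q-learning on a toy problem. Everything here is an assumption made for illustration: the five-cell corridor environment, the reward of 1 at the right end, and the learning parameters. The slides do not prescribe a particular algorithm.

```python
import random

N_STATES = 5  # corridor cells 0..4; cell 4 is the goal
ACTIONS = {"left": -1, "right": +1}


def step(state, action):
    """Environment: move one cell, stay inside the corridor, reward 1 at the goal."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (epsilon-greedy Q-learning)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon:  # explore
                action = rng.choice(list(ACTIONS))
            else:  # exploit, breaking ties at random
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward = step(state, action)
            target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that moving right is correct; it discovers this from the rewards it receives.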
Semi-supervised learning
In semi-supervised learning, an incomplete training signal is given: a training set
with some (often many) of the target outputs missing.
A special case of this principle is known as
transduction, where the entire set of problem instances is
known at learning time, except that part of the targets are
missing.
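One simple way to use a partly labeled training set is to let labels spread from labeled examples to their closest unlabeled neighbors. The sketch below does this for one-dimensional data; the nearest-neighbor propagation rule and the data values are assumptions for illustration, not a method taken from the slides.

```python
def propagate_labels(labeled, unlabeled):
    """Repeatedly give the unlabeled value closest to any labeled value
    the label of that nearest labeled value."""
    labeled = dict(labeled)
    remaining = list(unlabeled)
    while remaining:
        value, nearest = min(
            ((u, l) for u in remaining for l in labeled),
            key=lambda pair: abs(pair[0] - pair[1]),
        )
        labeled[value] = labeled[nearest]
        remaining.remove(value)
    return labeled


# Hypothetical data: only two of eight examples come with a target.
known = {1.0: "low", 10.0: "high"}
missing = [2.0, 3.0, 4.0, 7.0, 8.0, 9.0]

result = propagate_labels(known, missing)
print(result)
```

Because all instances are available up front and only their targets are filled in, this toy setup is also an example of the transductive case described above.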
Bad and Insufficient Quantity of Training Data
Machine learning systems are not like children,
who can learn to distinguish apples and oranges in all
sorts of colors and shapes. Instead, they require a lot of
data to work effectively, whether you’re working
with very simple programs and problems or with
complex applications like image processing and
speech recognition.
Poor Quality Data
If your training data is full of errors and
outliers, it will be very hard for the system to detect the underlying
patterns, so it won’t work properly.
So, if you want your program to work well, you must spend
time cleaning up your training data.
Irrelevant features
The system will only be able to learn if the training data contains enough relevant
features and not too many irrelevant ones. The most important part of any ML project is to
develop good features; this is called “feature engineering”.
Feature engineering involves:
◦ Feature selection: selecting the most useful features
◦ Feature extraction: combining existing features to produce more useful ones
◦ Creation of new features: creating new features based on the data
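Two of the steps above, feature extraction and feature selection, can be sketched as follows. The flat data, the feature names, and the correlation threshold of 0.5 are all assumptions for illustration; selecting by Pearson correlation is only one of many possible selection criteria.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical flats: two useful measurements and one irrelevant feature.
features = {
    "length_m": [4, 5, 6, 7, 8],
    "width_m": [3, 4, 4, 5, 5],
    "door_number": [7, 3, 9, 1, 5],
}
price = [120, 200, 250, 340, 400]

# Feature extraction: combine two existing features into a more useful one.
features["area_m2"] = [
    length * width
    for length, width in zip(features["length_m"], features["width_m"])
]

# Feature selection: keep features that correlate clearly with the target.
selected = [
    name for name, values in features.items()
    if abs(pearson(values, price)) > 0.5
]
print(selected)
```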
Testing
To make sure your model works well and generalizes
to new cases, you can try it out on new cases by putting the
model in its environment and then monitoring how it performs.
This is good practice.
You should also divide your data into two sets, one for training and the
second for testing.
Testing
The generalization error is the error rate on new cases; you estimate it by evaluating
your model on the test set. The value you get tells you whether your model is good
enough and how well it is likely to perform on cases it has never seen.
If the error rate is low, the model is good and will likely perform properly, and vice
versa.
It is advisable to use 80% of your data for training and 20% for testing.
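The 80/20 split and the error-rate estimate can be sketched in plain Python. The toy data set, the midpoint-threshold model, and the helper names are assumptions for illustration; the data is deliberately chosen to be cleanly separable so the result does not depend on the shuffle.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]


def fit_threshold(rows):
    """Toy model: the midpoint between the mean values of the two classes."""
    mean0 = sum(x for x, y in rows if y == 0) / sum(1 for _, y in rows if y == 0)
    mean1 = sum(x for x, y in rows if y == 1) / sum(1 for _, y in rows if y == 1)
    return (mean0 + mean1) / 2


def error_rate(threshold, rows):
    """Fraction of rows whose predicted class differs from the true class."""
    return sum(int(x > threshold) != y for x, y in rows) / len(rows)


# Hypothetical data: class 0 has small values, class 1 has large values.
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(100, 110)]

train_set, test_set = split_train_test(data)
threshold = fit_threshold(train_set)     # the model only ever sees the training set
test_error = error_rate(threshold, test_set)
print(len(train_set), len(test_set), test_error)
```

The key design point is that the test rows are never used for fitting, so the error measured on them is an honest estimate of performance on new cases.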
Overfitting the data
Overgeneralization in machine learning is called “overfitting”: the model performs
well on the training data but does not generalize to new cases.
Overfitting occurs when the model is too complex relative to the amount of
training data given.
Solutions:
◦ Gather more training data
◦ Reduce the noise level in the training data
◦ Select a simpler model with fewer parameters
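The effect can be demonstrated on a tiny data set with two noisy labels. In this sketch, a model that memorizes every training example (1-nearest neighbor) is compared with a one-parameter threshold model. Both models and all data values are assumptions chosen for illustration.

```python
def error_rate(predict, rows):
    """Fraction of rows for which the prediction differs from the label."""
    return sum(predict(x) != y for x, y in rows) / len(rows)


# Hypothetical training set: the true rule is "x >= 5", but two labels are noisy.
train_set = [(x, int(x >= 5)) for x in range(10)]
train_set[2] = (2, 1)  # noisy label
train_set[7] = (7, 0)  # noisy label

# Clean test set with slightly shifted inputs.
test_set = [(x + 0.4, int(x >= 5)) for x in range(10)]


def memorizer(x):
    """Complex model: copy the label of the nearest training example."""
    _, label = min(train_set, key=lambda row: abs(row[0] - x))
    return label


# Simple model: one parameter, the midpoint between the two class means.
zeros = [x for x, y in train_set if y == 0]
ones = [x for x, y in train_set if y == 1]
threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def simple(x):
    return int(x > threshold)


results = {
    "memorizer": (error_rate(memorizer, train_set), error_rate(memorizer, test_set)),
    "simple": (error_rate(simple, train_set), error_rate(simple, test_set)),
}
print(results)  # (training error, test error) per model
```

The memorizing model is perfect on the training set because it also reproduces the noisy labels, which is exactly what makes it worse on the test set than the simpler model.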
Under-fitting the data
This is the opposite of overfitting. You will encounter it when the model is too
simple to learn the underlying structure of the data.
For example, in a model of quality of life, real life is more complex than
your model, so the predictions will be inaccurate even on the training
examples.
Solutions:
◦ Select a more powerful model with more parameters
◦ Feed better features into your algorithm (feature
engineering)
◦ Reduce the constraints on your model
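The first remedy can be illustrated by comparing a model with no slope (it always predicts the mean) against a straight-line model on data that follows a linear trend. The data and both models are assumptions for illustration.

```python
def mean_squared_error(predict, xs, ys):
    """Average squared difference between predictions and targets."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data following y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)


def constant_model(x):
    """Too simple: ignores x and always predicts the mean target."""
    return mean_y


# More powerful model: a line fitted by ordinary least squares.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x


def linear_model(x):
    return slope * x + intercept


constant_error = mean_squared_error(constant_model, xs, ys)
linear_error = mean_squared_error(linear_model, xs, ys)
print(constant_error, linear_error)
```

The constant model has a large error even on the data it was fitted to, which is the signature of underfitting.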
Software for this course
Python’s popularity may be due to the increased development of deep learning
frameworks available for this language recently, including TensorFlow, PyTorch,
and Keras. As a language that has readable syntax and the ability to be used as a
scripting language, Python proves to be powerful and straightforward both for
preprocessing data and working with data directly. The scikit-learn machine
learning library is built on top of several existing Python packages that Python
developers may already be familiar with, namely NumPy, SciPy, and Matplotlib.
Software for this course
MATLAB makes machine learning easy. With tools and functions for
handling big data, as well as apps to make machine learning accessible,
MATLAB is an ideal environment for applying machine learning to your
data analytics.
With MATLAB, engineers and data scientists have immediate access to
prebuilt functions, extensive toolboxes, and specialized apps
for classification, regression, and clustering.
