Machine Learning Models
and Linear Regression
Department of Computer Science
What is machine learning?
• "Field of study that gives computers the ability to learn without
being explicitly programmed“ -- Arthur Samuel (1959)
• "A computer program is said to learn from experience E with respect
to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E.“ -- Tom Michel (1999)
Examples
• Database mining:
• Machine learning has recently become so big partly because of the huge amount of
data being generated
• Large datasets from the growth of automation and the web
• Sources of data include
• Web data (click-stream or click-through data)
• Mine to understand users better
• A huge segment of Silicon Valley works on this
• Medical records
• Electronic records -> turn records into knowledge
• Biological data
• Gene sequences; ML algorithms give a better understanding of the human genome
• Engineering info
• Data from sensors, log reports, photos, etc.
• Applications that we cannot program by hand
• Autonomous helicopter
• Handwriting recognition
• This makes mail routing very inexpensive: when you address an envelope, algorithms can
read the handwriting and automatically route it through the postal system
• Natural language processing (NLP)
• AI pertaining to language
• Computer vision
• AI pertaining to vision
• Self-customizing programs
• Netflix
• Amazon
• iTunes genius
• Take users' info
• Learn based on your behavior
• Understand human learning and the brain
• If we can build systems that mimic (or try to mimic) how the brain works, this
may push our own understanding of the associated neurobiology
Types of learning algorithms
• Supervised learning
• Teach the computer how to do something, then let it use its new-found
knowledge to do it
• Unsupervised learning
• Let the computer learn how to do something, and use this to determine
structure and patterns in data
• Reinforcement learning
Supervised learning - introduction
• Probably the most common problem type in machine learning
• Starting with an example
• How do we predict housing prices
• Collect data regarding housing prices and how they relate to size in square feet
• Example problem: "Given this data, a friend has a house 750 square feet -
how much can they be expected to get?"
Unsupervised learning - introduction
• In unsupervised learning, we get unlabeled data
• We're just told: here is a data set, can you find structure in it?
• One way of doing this would be to cluster the data into groups (a k-means sketch follows after this list)
• This is a clustering algorithm
• Examples of clustering algorithms:
• Google news
• Groups news stories into cohesive groups
• Used in many other problems as well
• Genomics
• Microarray data
• Have a group of individuals
• For each, measure the expression of a gene
• Run the algorithm to cluster individuals into types of people
• Organize computer clusters
• Identify potential weak spots or distribute workload effectively
• Social network analysis
• Customer data
• Astronomical data analysis
• Algorithms give amazing results
• Basically
• Can you automatically generate structure
• Because we don't give it the answer, it's unsupervised learning
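To make "cluster the data into groups" concrete, here is a minimal sketch of one clustering algorithm (k-means) on made-up 2D points; the data set and the choice of k-means are illustrative assumptions, not something from the slides:

```python
import random

# Minimal k-means sketch on made-up 2D points, purely to illustrate grouping
# unlabeled data; the data set and k = 2 are illustrative assumptions.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),    # one loose group of points
          (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]    # another loose group

k = 2
centroids = random.sample(points, k)             # initial guesses for the group centres

for _ in range(10):                              # a few refinement passes
    # assignment step: each point joins the cluster of its nearest centroid
    clusters = [[] for _ in range(k)]
    for p in points:
        nearest = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                              + (p[1] - centroids[c][1]) ** 2)
        clusters[nearest].append(p)
    # update step: move each centroid to the mean of the points assigned to it
    for c in range(k):
        if clusters[c]:
            centroids[c] = (sum(p[0] for p in clusters[c]) / len(clusters[c]),
                            sum(p[1] for p in clusters[c]) / len(clusters[c]))

print(centroids)   # ends up near the centres of the two groups
```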
Reinforcement learning (RL) - introduction
• An RL agent learns by interacting with its environment and observing the
results of these interactions. This mimics the fundamental way in which
humans (and animals alike) learn.
• The idea is commonly known as "cause and effect", and this undoubtedly is
the key to building up knowledge of our environment throughout our
lifetime.
• The "cause and effect" idea can be translated into the following steps for
an RL agent:
• The agent observes an input state
• An action is determined by a decision making function (policy)
• The action is performed
• The agent receives a scalar reward or reinforcement from the environment
• Information about the reward given for that state / action pair is recorded
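These steps can be sketched as a toy agent-environment loop. The LineWorld environment, the policy, and the value table below are made-up placeholders chosen only to show the observe → act → reward → record cycle, not part of the slides:

```python
import random

class LineWorld:
    """Toy environment: walk along positions 0..4; reaching position 4 gives reward +1."""
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos = max(0, min(4, self.pos + (1 if action == "right" else -1)))
        reward = 1.0 if self.pos == 4 else 0.0
        done = self.pos == 4
        return self.pos, reward, done

def policy(state, values, epsilon=0.2):
    """Decision-making function: usually greedy on recorded values, sometimes random."""
    actions = ["left", "right"]
    if random.random() < epsilon:
        return random.choice(actions)
    random.shuffle(actions)  # break ties randomly so both actions get tried
    return max(actions, key=lambda a: values.get((state, a), 0.0))

values = {}                                         # recorded reward info per state/action pair
env = LineWorld()
for episode in range(50):
    state = env.reset()                             # agent observes an input state
    done = False
    while not done:
        action = policy(state, values)              # action determined by the policy
        next_state, reward, done = env.step(action) # action performed; scalar reward received
        # record information about the reward given for this state/action pair
        key = (state, action)
        values[key] = values.get(key, 0.0) + 0.1 * (reward - values.get(key, 0.0))
        state = next_state

print(values)   # the agent has built up knowledge of which actions pay off where
```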
Uses for Reinforcement Learning
• Because RL agents can learn without expert supervision, the problems best
suited to RL are complex problems where there appears to be no obvious or
easily programmable solution.
• Game playing - determining the best move to make in a game often depends
on a number of different factors, hence the number of possible states that
can exist in a particular game is usually very large.
• Control problems - such as elevator scheduling. Again, it is not obvious what
strategies would provide the best, most timely elevator service. For control
problems such as this, RL agents can be left to learn in a simulated
environment and eventually they will come up with good controlling policies.
Linear Regression with one variable
• Housing price data example used earlier
• Supervised learning regression problem
Linear regression
• (x, y) - a single training example
• (x(i), y(i)) - a specific example (the ith training example)
• i is an index into the training set
• With our training set defined - how do we use it?
• Take the training set
• Pass it into a learning algorithm
• The algorithm outputs a function (denoted h, for hypothesis)
• This function takes an input (e.g. size of a new house)
• Tries to output the estimated value of y
How do we represent hypothesis h ?
• Going to represent h as:
• hθ(x) = θ0 + θ1x
• h(x) (shorthand)
• What does this mean?
• Means y is a linear function of x!
• θi are the parameters
• θ0 is the intercept (zero condition)
• θ1 is the slope (gradient)
• This kind of function is linear regression with one variable
• Also called univariate linear regression
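As a quick illustration (the parameter values are made up, not taken from the housing data), the hypothesis is just a straight line in code:

```python
# Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1 * x
def h(x, theta0, theta1):
    return theta0 + theta1 * x

# Made-up parameters: intercept 50, slope 0.1 (price per square foot)
print(h(750, theta0=50.0, theta1=0.1))   # predicted price for a 750 sq ft house -> 125.0
```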
Linear regression - implementation (cost function)
• A cost function lets us figure out how to fit the best straight line to our data
• Choosing values for θi (parameters)
• Different values give you different functions
• If θ0 is 1.5 and θ1 is 0 then we get a horizontal straight line at y = 1.5
• If θ1 is > 0 then we get a positive slope
• Based on our training set we want to generate parameters which
make the best-fitting straight line
• Choose these parameters so hθ(x) is close to y for our training examples
• Basically, use the x values in the training set with hθ(x) to give an output which is as close to
the actual y value as possible
• Think of hθ(x) as a "y imitator" - it tries to convert the x into y, and since we
already have y we can evaluate how well hθ(x) does this
• To formalize this:
• We want to solve a minimization problem
• Minimize (hθ(x) - y)²
• i.e. minimize the squared difference between h(x) and y for each/any/every example
• Sum this over the training set
J(θ0, θ1) = (1/2m) · Σ (i = 1 to m) (hθ(x(i)) − y(i))²
• Minimize the squared difference between predicted house price and actual
house price
• 1/m - means we determine the average
• 1/2m - the extra 2 makes the math a bit easier, and doesn't change the minimizing values we
determine at all (i.e. half the smallest value is still the smallest value!)
• Minimizing over θ0/θ1 means we get the values of θ0 and θ1 which give on
average the minimal deviation of hθ(x) from y when we use those parameters
in our hypothesis function
• And we want to minimize this cost function
• Our cost function is (because of the summation term) inherently looking at ALL the
data in the training set at any time
• Hypothesis - is like your prediction machine, throw in an x value, get a
putative y value
• Cost - is a way to, using your training data, determine values for your θ
values which make the hypothesis as accurate as possible
• This cost function is also called the squared error cost function
• This cost function is a reasonable choice for most regression problems
• Probably the most commonly used cost function
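A direct translation of the squared error cost function into code might look like this; the tiny training set below is made up for illustration:

```python
# Squared error cost: J(theta0, theta1) = (1/2m) * sum_i (h_theta(x(i)) - y(i))^2
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Made-up training set: house sizes (square feet) and prices
xs = [750, 1000, 1250, 1500]
ys = [130, 190, 240, 300]

print(cost(0.0, 0.2, xs, ys))      # a rough fit -> larger J
print(cost(-20.0, 0.21, xs, ys))   # parameters closer to the data -> smaller J
```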
Linear Regression Problem
A deeper insight into the cost function
J(θ0, θ1) = (1/2m) · Σ (i = 1 to m) (hθ(x(i)) − y(i))²
• Plotting J against the parameters generates a 3D surface plot where the axes are
• X = θ1
• Z = θ0
• Y = J(θ0, θ1)
We can see that the height (y) indicates
the value of the cost function, so find where
y is at a minimum
• Instead of a surface plot we can use contour figures/plots
• A set of ellipses in different colors
• Each colour is the same value of J(θ0, θ1), but they obviously plot to different
locations because θ1 and θ0 vary
• Imagine a bowl-shaped function coming out of the screen, so the middle is the
centre of the concentric circles
• Each point on the contour plot represents a pair of parameter
values for θ1 and θ0
• Our example here puts the values at roughly
• θ0 = ~800
• θ1 = ~-0.15
• What we really want is an efficient algorithm for finding
the minimum for θ0 and θ1
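To see why an efficient algorithm matters, one could brute-force the bowl by evaluating J over a grid of (θ0, θ1) candidates. A sketch, reusing the hypothetical cost function and data from the earlier snippet; the grid ranges are arbitrary assumptions:

```python
# Brute-force scan of J over a grid of (theta0, theta1) values.
# Fine for two parameters on a toy problem, but the cost explodes as the grid
# gets finer or more parameters are added - hence the need for gradient descent.
best = None
for theta0 in [i * 10.0 for i in range(-50, 51)]:       # theta0 from -500 to 500, step 10
    for theta1 in [j * 0.01 for j in range(-20, 41)]:   # theta1 from -0.20 to 0.40, step 0.01
        j_val = cost(theta0, theta1, xs, ys)            # cost, xs, ys from the earlier sketch
        if best is None or j_val < best[0]:
            best = (j_val, theta0, theta1)

print(best)   # (smallest J found on the grid, and the corresponding theta0, theta1)
```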
Gradient descent algorithm
• Minimize cost function J
• Gradient descent
• Used all over machine learning for minimization
• Start by looking at a general J() function
• Problem
• We have J(θ0, θ1)
• We want to get min J(θ0, θ1)
• Gradient descent applies to more general functions
• J(θ0, θ1, θ2 .... θn)
• min J(θ0, θ1, θ2 .... θn)
How does Gradient Descent work?
• Start with initial guesses
• Start at 0,0 (or any other value)
• Keep changing θ0 and θ1 a little bit to try and reduce J(θ0, θ1)
• Each time you change the parameters, you select the gradient (direction) which reduces J(θ0, θ1) the
most possible
• Repeat
• Do so until you converge to a local minimum
• Has an interesting property
• Where you start can determine which minimum you end up in
• Here we can see one initialization point led to one local minimum
• The other led to a different one
Formal Definition
• Do the following until convergence:
• θj := θj − α · ∂/∂θj J(θ0, θ1)   (simultaneously for j = 0 and j = 1)
• What does this all mean?
• Update θj by subtracting α times the partial derivative of the
cost function with respect to θj
• Here α is the learning rate
• If α is big we have an aggressive gradient descent
• If α is small we take tiny steps
How is this gradient descent algorithm implemented?
• Do this for θ0 and θ1
• For j = 0 and j = 1 means we simultaneously update both
• How do we do this?
• Compute the right hand side for both θ0 and θ1
• So we need a temp value
• Then, update θ0 and θ1 at the same time
• If you implement the non-simultaneous update it's not gradient descent,
and will behave weirdly
• But it might look sort of right - so it's important to remember this!
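A sketch of the simultaneous update with temporary values; dJ_dtheta0 and dJ_dtheta1 are hypothetical functions standing in for the partial derivatives worked out in the next section:

```python
# One gradient descent step with a *simultaneous* update.
# dJ_dtheta0 and dJ_dtheta1 are hypothetical functions that compute the
# partial derivatives of J with respect to theta0 and theta1.
def gradient_step(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
    temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)  # uses the OLD theta0, theta1
    temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)  # also uses the OLD values
    return temp0, temp1        # both parameters change only after both temps are computed

# Incorrect (non-simultaneous) version, for contrast:
#   theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
#   theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)  # <-- already sees the NEW theta0

# Example usage with a toy J(t0, t1) = t0^2 + t1^2:
print(gradient_step(1.0, 1.0, 0.1, lambda t0, t1: 2 * t0, lambda t0, t1: 2 * t1))  # (0.8, 0.8)
```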
Linear regression with gradient descent
• Apply gradient descent to minimize the squared error cost function J(θ0, θ1)
• Taking the partial derivatives of J gives
• ∂/∂θ0 J(θ0, θ1) = (1/m) · Σ (i = 1 to m) (hθ(x(i)) − y(i))
• ∂/∂θ1 J(θ0, θ1) = (1/m) · Σ (i = 1 to m) (hθ(x(i)) − y(i)) · x(i)
• The linear regression cost function is always a convex function -
it always has a single minimum
• Bowl shaped
• One global optimum
• So gradient descent will always converge to the global optimum
• Initialize values to
• θ0 = 900
• θ1 = -0.1
• End up at the global minimum
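Putting it all together, a minimal gradient descent loop for univariate linear regression; the dataset, feature scaling, learning rate, and iteration count are illustrative assumptions rather than values from the slides:

```python
# Gradient descent for univariate linear regression on a made-up dataset.
xs = [750, 1000, 1250, 1500]           # house sizes in square feet
ys = [130, 190, 240, 300]              # prices

xs = [x / 1000.0 for x in xs]          # crude feature scaling so alpha = 0.1 is stable

theta0, theta1 = 0.0, 0.0              # any starting point works: J is convex
alpha, m = 0.1, len(xs)

for _ in range(5000):
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m                              # dJ/dtheta0
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
    # simultaneous update of both parameters
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)   # settles near the single global minimum of the convex cost
```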
Editor's Notes
• The loss function captures the difference between the actual and predicted values for a single record, whereas the cost function aggregates that difference over the entire training dataset.