DA 5330 – Advanced Machine Learning
Applications
Lecture 11 – Advanced Learning Techniques
Maninda Edirisooriya
manindaw@uom.lk
End-to-End Learning
• Traditionally, intermediate features were extracted first and then used to train another ML model
• But when you have more data, it is usually more accurate to train end to end, directly from the original data to the result we expect
Source: https://www.youtube.com/watch?v=bkVCAk9Nsss
Multi-Task Learning
• Different tasks (e.g., News Summarization, News Sentiment Analysis) need different labeled datasets, which are rare
• The available datasets may be too small to train a model to a sufficient level of accuracy
• When the business need changes, new ML tasks emerge for which there are no labeled datasets to train on
• To address these problems we need a way to learn more than one task at a time, so that a new task can be trained on the same model with little data and at a higher speed; this is known as Multi-Task Learning
Examples for Multi-Task Learning
Source: https://www.youtube.com/watch?v=bkVCAk9Nsss
Assumption of Multi-Task Learning
• For multi-task learning to work, each task should share some structure with the others
• Otherwise, single-task learning is the better choice
• Fortunately, most tasks do share common structure. E.g.:
• Physical tasks share the same laws of physics
• Languages like English and French share common patterns due to historical reasons
• The psychology and physiology of humans are very similar
Source: https://www.youtube.com/watch?v=bkVCAk9Nsss
Notations of Multi-Task Learning
• In multi-task learning, a new variable zi, known as the Task Descriptor, is added to the approximation function; it is generally a one-hot encoded vector
• The task descriptor encodes which task the model should perform (a common formulation is sketched below)
Source: https://www.youtube.com/watch?v=vI46tzt4O7Y
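A hedged sketch of the usual formulation, using the θ notation that appears later in these slides and assuming T tasks with a dataset 𝒟i and loss ℒi per task: the model is conditioned on the task descriptor, and the sum of the per-task losses is minimized.

f_\theta(y \mid x, z_i), \qquad \min_{\theta} \; \sum_{i=1}^{T} \mathcal{L}_i\!\left(\theta, \mathcal{D}_i\right)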
Encoding the Task Descriptor in NN
Source: https://www.youtube.com/watch?v=vI46tzt4O7Y
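A minimal PyTorch-style sketch of one encoding choice (an assumption, since the slide itself is a diagram): the one-hot task descriptor zi is concatenated to the input of a shared network.

import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared network conditioned on a one-hot task descriptor z by input concatenation."""
    def __init__(self, input_dim, num_tasks, hidden_dim=64, output_dim=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(input_dim + num_tasks, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, x, z):
        # x: (batch, input_dim) features, z: (batch, num_tasks) one-hot task descriptor
        return self.body(torch.cat([x, z], dim=-1))

Other common choices are concatenating zi at a hidden layer, or using it to select a task-specific output head.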
Weighted Multi-Task Learning
• Instead of giving an equal weight to each task during training, different weights can be assigned based on criteria such as,
• Manually setting a priority-based weight
• Dynamically adjusting the weights during the training process
• These weights multiply the per-task terms of the loss function during optimization (see the sketch below)
Source: https://www.youtube.com/watch?v=vI46tzt4O7Y
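A minimal sketch of the weighted objective, assuming the per-task losses have already been computed; the weights may be fixed priorities or values adjusted during training.

def weighted_multi_task_loss(task_losses, task_weights):
    """task_losses: one scalar loss tensor per task; task_weights: one float per task."""
    # The optimizer minimizes this weighted sum instead of an equal-weight sum
    return sum(w * loss for w, loss in zip(task_weights, task_losses))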
Training With Vanilla Multi-Task Learning
Source: https://www.youtube.com/watch?v=vI46tzt4O7Y
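A hedged sketch of the vanilla multi-task training loop (the sample_batch interface and variable names are illustrative assumptions): sample a mini-batch of tasks, sum their losses with equal weights, and update the shared parameters.

import random

def train_vanilla_mtl(model, optimizer, task_datasets, loss_fn, steps=1000, tasks_per_step=4):
    for _ in range(steps):
        # 1. Sample a mini-batch of tasks
        task_ids = random.sample(range(len(task_datasets)), tasks_per_step)
        total_loss = 0.0
        for i in task_ids:
            # 2. Sample a data batch (inputs x, task descriptor z, labels y) for each task
            x, z, y = task_datasets[i].sample_batch()  # assumed dataset interface
            # 3. Accumulate the per-task losses (equal weights in the vanilla version)
            total_loss = total_loss + loss_fn(model(x, z), y)
        # 4. Backpropagate the summed loss and update the shared parameters
        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()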
Introduction to Transfer Learning
• Transfer Learning refers to the process of leveraging knowledge
gained from solving one problem and applying it to a different, but
related, problem
• Unlike in traditional ML, where models are trained to perform a specific task on a specific dataset, Transfer Learning allows knowledge to be transferred from one task/domain to another. This improves the performance on the target task, especially when labeled data for the target task is limited or expensive to obtain
• E.g.: To train a cat image classifier, you can start from a CNN pre-trained on the huge ImageNet dataset of miscellaneous images and then train only the last few layers of the CNN with the available, much smaller, cat image dataset (see the sketch below)
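A hedged sketch of that example, assuming a recent torchvision: an ImageNet pre-trained ResNet is loaded, its backbone is frozen, and only a new final layer is trained on the small cat dataset.

import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # CNN pre-trained on ImageNet
for p in model.parameters():
    p.requires_grad = False                       # freeze the earlier layers
model.fc = nn.Linear(model.fc.in_features, 2)     # new, randomly initialized head: cat vs. not-cat
# ...then train only model.fc on the available cat image dataset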
Motivation of Transfer Learning
• Scarcity of Labeled Data: Annotated datasets required for training
machine learning models are often scarce and expensive to acquire.
Transfer learning mitigates this issue by utilizing knowledge from
related tasks or domains
• Model Generalization: By transferring knowledge from a pre-trained
model, the model can generalize better to new tasks or domains,
even with limited data
• Efficiency: Transfer learning can significantly reduce the
computational resources and time required for training models from
scratch, making it a practical approach in various real-world scenarios
Types of Transfer Learning
1. Inductive Transfer Learning: Involves transferring knowledge from a source
domain to a target domain by learning a new task in the target domain using
the knowledge gained from solving a related task in the source domain
Example: Suppose you have a model trained to classify different types of
fruits based on images in one dataset (source domain). You can then use the
knowledge gained from this task to classify different types of vegetables
based on images in a separate dataset (target domain)
2. Transductive Transfer Learning: Focuses on adapting a model to a new
domain where the target data distribution may differ from the source domain.
Instead of learning a new task, transductive transfer learning aims to adapt
the model to perform well on the target domain.
Example: Let's say you have a model trained on data from one country
(source domain) to predict housing prices. However, when you try to apply
this model to a different country (target domain), you encounter differences
in housing market dynamics. Transductive transfer learning involves
adapting the model to the target domain's characteristics without explicitly
learning a new task
Pre-Trained Models
• Task-specific models can be developed by training on the available small labeled data with supervised learning, on top of commonly available pre-trained models
• Pre-trained models are typically built from large generic datasets; ImageNet is an example of such a large labeled dataset, while GPT models are pre-trained on large generic text corpora
• There are also many pre-trained models built with unsupervised learning available as open source, such as large language models like the GPT and BERT model families (see the sketch below)
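For example, a hedged sketch using the Hugging Face transformers library (one possible choice, not prescribed by the slide): a pre-trained BERT encoder is loaded and a small, newly added classification head is then fine-tuned on the available labeled data.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # new classification head on top of pre-trained BERT
# fine-tune `model` with supervised learning on the small labeled dataset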
Transfer Learning via Fine Tuning
• The model pre-trained on the source data is trained again on the target domain data
• Sometimes all the layers of the NN are trained,
• Either a small Learning Rate is used for all the layers
• Or smaller Learning Rates are used for the earlier layers
• Sometimes only the last layers are trained at first, with the earlier layers frozen, and the earlier layers are then gradually unfrozen
• Sometimes only the last one or few layers are trained while the other layers are kept frozen
• When the target task is simpler than the source task there is no need to update the earlier layers
• The best techniques/hyperparameters are selected with cross-validation (two of these options are sketched below)
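Two of these options sketched in PyTorch, assuming the ResNet-like model from the earlier cat-classifier sketch (the layer names are illustrative).

import torch

# (a) Train all layers, but with smaller learning rates for the earlier layers
optimizer = torch.optim.Adam([
    {"params": model.layer1.parameters(), "lr": 1e-5},  # earliest block: smallest LR
    {"params": model.layer4.parameters(), "lr": 1e-4},  # (middle blocks omitted for brevity)
    {"params": model.fc.parameters(),     "lr": 1e-3},  # new last layer: largest LR
])

# (b) Train only the last layer while keeping every other layer frozen
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")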
Transfer Learning via Fine Tuning
• Overfitting can be mitigated with the Early Stopping technique (a minimal sketch follows)
• New layers can be added and initialized with Random Initialization while keeping the earlier layers as they are
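A minimal early-stopping sketch for the fine-tuning loop; train_one_epoch, evaluate and the data loaders are assumed helper names, not part of the lecture.

import torch

best_val, patience, bad_epochs, max_epochs = float("inf"), 3, 0, 50
for epoch in range(max_epochs):
    train_one_epoch(model, train_loader)           # assumed training helper
    val_loss = evaluate(model, val_loader)         # assumed validation helper
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")  # keep the best checkpoint so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                 # stop once validation stops improving
            break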
Unintuitive Facts about Transfer Learning
• When pre-training is done with unsupervised ML and fine-tuning with supervised ML (e.g. Transformer models), the pre-training data does not need to be very diverse
• You can even pre-train on the same target dataset without sacrificing much accuracy!
• This may change when both pre-training and fine-tuning are done with supervised ML
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Unintuitive Facts about Transfer Learning
• The last layer of a NN is not always the best layer to fine-tune
• In some scenarios, fine-tuning selected middle layers performs better than a full fine-tuning
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Rule of Thumb for Transfer Learning
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Meta Learning
• “Given a set of training tasks, can we optimize for the ability to learn these tasks quickly, so that we can learn new tasks quickly too?”
• This is what is achieved by Meta Learning
• In other words, optimizing for transferability is known as Meta Learning (one optimization-based instance is sketched below)
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
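One concrete optimization-based instance is MAML-style training, shown as a hedged sketch (an illustrative assumption, not necessarily the algorithm in the referenced lecture, and it assumes torch >= 2.0 for functional_call): θ is optimized so that a single gradient step on a task's support set already gives low loss on that task's query set.

import torch
from torch.func import functional_call  # torch >= 2.0

def maml_step(model, meta_optimizer, tasks, loss_fn, inner_lr=0.01):
    names = [n for n, _ in model.named_parameters()]
    params = [p for _, p in model.named_parameters()]
    meta_loss = 0.0
    for (xs, ys), (xq, yq) in tasks:  # each task: (support set, query set)
        # Inner loop: adapt theta to the task's support set -> task parameters phi_i
        support_loss = loss_fn(functional_call(model, dict(zip(names, params)), (xs,)), ys)
        grads = torch.autograd.grad(support_loss, params, create_graph=True)
        phi = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}
        # Outer loss: how well the adapted parameters phi_i do on the query set
        meta_loss = meta_loss + loss_fn(functional_call(model, phi, (xq,)), yq)
    meta_optimizer.zero_grad()
    meta_loss.backward()   # backpropagates through the inner update into theta
    meta_optimizer.step()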
Two Views of Meta Learning Algorithms
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Bayes View of Meta Learning
• The label probabilities yi,j depend on the parameters 𝜙𝑖 of the model for task i
• The task parameters 𝜙𝑖 of all the tasks depend on the meta-level parameters 𝜃
• If the 𝜙𝑖 were independent across tasks i, then 𝜃 would carry no information, and vice versa
• Learning 𝜃 is the idea of Meta Learning (the implied hierarchical model is written below)
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
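One standard way to write the hierarchical factorization these bullets describe (a sketch, not copied from the slide):

p(\theta) \; \prod_{i} \Big[ \, p(\phi_i \mid \theta) \; \prod_{j} p\big(y_{i,j} \mid x_{i,j}, \phi_i\big) \Big]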
Mechanistic View of Meta Learning
Source: https://www.youtube.com/watch?v=bVjCjdq06R4
Questions?