Structured Prediction With Reinforcement
Learning
Guruprasad Zapate
University of Paderborn
1
Outline
● Introduction
○ Structured Prediction (SP)
○ Reinforcement Learning (RL)
● SP-MDP framework
● Approximated RL Algorithms
● Conclusion
2
Structured Prediction (SP)
● In “normal” machine learning, the goal is to learn an unknown function f such that f(x) = y
➔ Here the input x can be any kind of complex object
➔ And the output y is a real-valued number
◆ e.g., classification, regression, density estimation
● The goal of Structured Prediction is to learn f : X → Y such that f(x) = y
➔ Where the outputs y are structured objects (a small illustration follows below)
3
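For instance (my own example, not from the slides): in part-of-speech tagging the input is a whole sentence and the output is a whole tag sequence, whose parts depend on one another.

```latex
f_{\text{classic}} : \mathcal{X} \to \mathbb{R}
\qquad\text{vs.}\qquad
f_{\text{SP}} : \mathcal{X} \to \mathcal{Y}
% e.g. POS tagging:  x = "the cat sat"  ->  y = (DET, NOUN, VERB)
```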
Structured Prediction (SP)
● Structured data :
○ “Data that consists of several parts, and not only the parts themselves contain information, but also the way in which the parts belong together.”
● Texts, images, documents, etc. are types of structured data
Figure: text as an example of structured data
4
Image Source
Structured Prediction (SP)
● SP has wide applications in various AI fields, such as:
○ Natural Language Processing
○ Images and video processing
○ Speech Processing
○ Bioinformatics
5
Image Source
Figures: semantic image segmentation; spam filtering
Structured Prediction (SP)
● Models used to solve SP problems
○ Global Models
■ Focus on the global characteristics of the data
■ For simplicity, the problem is sometimes reduced to plain classification
■ Linear models such as the Perceptron and SVM are adapted for structured outputs
■ Conditional Random Fields (CRFs) are also widely used
○ Incremental Models
■ Turn learning into a sequential process
■ After each step, a partial output is predicted and the model is updated
■ Incremental Parsing and Incremental Prediction are among the models that use this incremental approach
6
Reinforcement Learning (RL)
● A type of Machine Learning with the core idea:
○ To make machines as intelligent as humans, we need to make machines learn like humans
● The learning model contains the following entities:
○ Agent (Decision Maker/Learner)
○ Environment (External conditions)
○ States
○ Actions
○ Rewards - the consequences of actions
● The agent learns from interaction with the environment by means of rewards
7
Reinforcement Learning (RL)
● Learning procedure in RL (a minimal loop sketch follows below)
○ The agent observes a state
○ An action is determined by a decision-making function (the policy)
○ The action is performed and the state changes
○ A reward function assigns a reward from the environment
● Two distinguishing characteristics
○ Trial & Error Search
○ Delayed Rewards
8
Image Source
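The procedure above can be written as a short loop. This is a generic sketch with a hypothetical Gym-like environment interface (reset/step are assumptions, not anything from the slides), only to show how state, policy, action and reward fit together:

```python
# Minimal sketch of the RL interaction loop (hypothetical env/policy interfaces).
def run_episode(env, policy, max_steps=100):
    state = env.reset()                      # agent observes the initial state
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)               # decision-making function (policy) picks an action
        state, reward, done = env.step(action)  # action is performed, state changes
        total_reward += reward               # reward comes from the environment
        if done:
            break
    return total_reward
```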
MDP Framework
● A Markov Decision Process (MDP) formally describes an environment for RL
● The process satisfies the Markov property, which states:
○ “The future is independent of the past given the present”
● The MDP framework contains the following components (a small interface sketch follows below):
○ Set of possible states S
○ Set of possible actions A
○ A real-valued reward function R(s,a)
○ A state transition function T measuring each action’s effect in each state
● The next state depends only on the current state; past states are irrelevant
● The goal is to find an optimal policy that maximizes the total reward
9
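As a rough illustration (the names and types are my own, not from the slides), the four components can be captured in a small interface; a deterministic transition is assumed for simplicity, which matches the SP-MDP setting described next:

```python
from dataclasses import dataclass
from typing import Callable, Hashable, List

State = Hashable
Action = Hashable

@dataclass
class MDP:
    # The four components listed above, as plain collections/callables.
    states: List[State]                           # set of possible states S
    actions: Callable[[State], List[Action]]      # available actions A(s)
    reward: Callable[[State, Action], float]      # real-valued reward R(s, a)
    transition: Callable[[State, Action], State]  # (deterministic) transition T(s, a)
```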
SP-MDP Framework
● Structured Prediction Markov Decision Process (SP-MDP)
○ Instantiation of MDP for Structured Prediction
● The SP-MDP framework consists of the following components (a toy rendering follows below):
○ States
■ A state contains both the input and a partial output
■ Final states contain the complete output
○ Actions
■ Each state has a set of available actions
○ Transitions
■ A transition replaces the partial output with the transformed partial output
○ Rewards
■ A loss function is used to assign rewards to state-action pairs
10
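A possible concrete rendering of these components for sequence labeling, where an action appends one label to the partial output (all names are illustrative, not taken from the original paper):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class SPState:
    x: Tuple[str, ...]          # the input, e.g. a sequence of tokens
    partial_y: Tuple[str, ...]  # the partial output built so far

    @property
    def is_final(self) -> bool:
        # Final states contain the complete output
        return len(self.partial_y) == len(self.x)

LABELS = ["A", "B", "C"]        # toy label set

def actions(state: SPState) -> List[str]:
    # Each state has a set of available actions
    return [] if state.is_final else LABELS

def transition(state: SPState, label: str) -> SPState:
    # Replace the partial output with the transformed partial output
    return SPState(state.x, state.partial_y + (label,))
```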
SP-MDP Framework
11
Image Source
Figure: handwritten digit recognition on sequential data
SP-MDP Framework
● Rewards Integration (written out below)
○ Per-episode rewards :
■ Only final states receive a reward
■ The final reward is the negative loss of the predicted output with respect to the correct output
○ Per-decision rewards :
■ A reward is given after each step, for all states
■ This provides a much richer reward signal
12
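Written out, with Δ the task loss, ŷ_t the partial output at step t and y* the correct output (a standard formulation consistent with the bullets; the per-decision form below is one common choice, the per-step decrease of the loss, stated here as an assumption):

```latex
% Per-episode: only the final state is rewarded, with the negative loss
r_t = 0 \;\; (t < T), \qquad r_T = -\Delta(\hat{y}_T, y^\ast)

% Per-decision (one common choice): reward each step by how much the loss decreases
r_t = \Delta(\hat{y}_t, y^\ast) - \Delta(\hat{y}_{t+1}, y^\ast)
```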
Optimal Policy in SP-MDP
● Per-episode Rewards
○ The total reward of one trajectory (the state-action sequence from start to finish) under a policy is its final reward, the negative loss -Δ(ŷ, y*)
○ The optimal policy is the one that maximizes this reward, i.e. minimizes the loss (since the reward is the negative loss)
13
Optimal Policy in SP-MDP
● Per-decision Rewards
○ The total reward of one trajectory under a policy is the sum of the per-step rewards
○ The optimal policy is again the one that maximizes the total reward, i.e. minimizes the loss
14
Equivalence between SP and SP-MDP
● The learning goal in SP is to minimize the empirical risk
○ The empirical risk measures how much, on average, it hurts to use the inferred function as our prediction rule
○ Given model parameters θ and a loss function Δ, the goal is to minimize the average loss over the training set (in symbols below)
● Minimizing the empirical risk in SP is equivalent to finding the optimal policy in SP-MDP
● This equivalence allows RL algorithms to be applied to SP through the SP-MDP framework
15
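In symbols (standard empirical-risk notation; θ and Δ are the parameters and loss mentioned above, and (x_i, y_i) are the N training pairs):

```latex
\hat{R}(\theta) \;=\; \frac{1}{N} \sum_{i=1}^{N} \Delta\bigl(f_\theta(x_i),\, y_i\bigr)
\;\;\longrightarrow\;\; \min_\theta
```

Since the total reward of an episode is the negative loss, maximizing the expected reward over training inputs is the same objective.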
Approximated Value based RL
● A quite straightforward approach
● Uses an action-value function Q(s,a) that assigns a score to each action taken in a state
● Employs a greedy policy that maximizes this score: π(s) = argmax_a Q(s,a)
● Discounted rewards are also used, with a discount factor determining the current value of future rewards
● For large MDPs, linear functions are used to approximate Q (see the sketch below)
16
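A minimal sketch of the idea, assuming a hand-rolled feature map φ(s, a) returning a NumPy vector and a linear scorer (all names are hypothetical):

```python
import numpy as np

def q_value(theta: np.ndarray, phi, state, action) -> float:
    # Linear action-value approximation: Q(s, a) ≈ theta · phi(s, a)
    return float(theta @ phi(state, action))

def greedy_action(theta, phi, state, actions):
    # Greedy policy: pick the action with the highest approximated score
    return max(actions, key=lambda a: q_value(theta, phi, state, a))
```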
Approximated Value based RL
● SARSA, Q-Learning, etc. are widely used value-based RL algorithms (update sketches below)
● They work well in many applications
● Limitations :
○ Focused on deterministic policies
○ A small change in the estimated values can strongly affect the resulting policy
○ Convergence to a policy can be difficult
○ Very complex for large MDPs
17
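For completeness, the temporal-difference updates behind SARSA and Q-learning with the linear approximation above look roughly like this (a textbook sketch reusing the q_value scorer from the previous snippet, not code from the talk):

```python
import numpy as np

def q_value(theta, phi, s, a) -> float:
    # Same linear scorer as above: Q(s, a) ≈ theta · phi(s, a)
    return float(theta @ phi(s, a))

def sarsa_update(theta, phi, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy TD target: uses the action actually taken in the next state
    target = r + gamma * q_value(theta, phi, s_next, a_next)
    td_error = target - q_value(theta, phi, s, a)
    return theta + alpha * td_error * phi(s, a)

def q_learning_update(theta, phi, s, a, r, s_next, next_actions, alpha=0.1, gamma=0.9):
    # Off-policy TD target: uses the best next action under the current estimate
    best = max(q_value(theta, phi, s_next, b) for b in next_actions)
    td_error = (r + gamma * best) - q_value(theta, phi, s, a)
    return theta + alpha * td_error * phi(s, a)
```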
Policy gradient RL
● Modifies the policy parameters directly instead of estimating action values
● Searches directly for the optimal policy
● The gradient is estimated through simulation
● After each step, the gradient estimate is updated and stochastic gradient ascent is performed on the parameters
● Maximizes the expected reward per step (a REINFORCE-style sketch follows below)
18
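A minimal REINFORCE-style sketch of this idea, with a softmax policy over linear scores θ · φ(s, a); entirely illustrative, assuming φ returns NumPy vectors:

```python
import numpy as np

def action_probs(theta, phi, state, actions):
    # Stochastic policy: softmax over linear scores theta · phi(s, a)
    scores = np.array([theta @ phi(state, a) for a in actions])
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

def reinforce_update(theta, phi, trajectory, actions_fn, alpha=0.01):
    # trajectory: list of (state, chosen_action, return_observed_from_that_step)
    for state, action, G in trajectory:
        acts = actions_fn(state)
        probs = action_probs(theta, phi, state, acts)
        expected_phi = sum(p * phi(state, a) for p, a in zip(probs, acts))
        grad_log_pi = phi(state, action) - expected_phi   # ∇ log π(a|s) for a softmax-linear policy
        theta = theta + alpha * G * grad_log_pi           # stochastic gradient ascent on the parameters
    return theta
```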
Policy gradient RL
● Advantages:
○ Better convergence properties
○ Effective in high-dimensional or continuous action spaces
○ Policy subspace can be chosen according to the task
○ Can learn stochastic policies
● Limitations:
○ Evaluating a policy is typically inefficient and has high variance
○ Tends to converge to a local optimum rather than the global optimum
19
Using RL algorithms in SP-MDP
● A policy is learned in the SP-MDP with any approximated RL algorithm
● The learned policy is then used for inference (a rollout sketch follows below)
● Start with the input and a default initial output
● Actions are chosen using the linear approximation
● Final states provide the complete predicted output
● The complexity of prediction is roughly O(T · Ā), where:
○ T : maximum number of steps to predict the complete output
○ Ā : mean number of available actions
20
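Putting the pieces together, inference is a greedy rollout in the SP-MDP. The sketch below reuses the illustrative SPState, actions, transition and greedy_action helpers from the earlier snippets (all hypothetical names):

```python
def predict(x, theta, phi, max_steps):
    # Start with the input and a default (empty) initial output
    state = SPState(x=tuple(x), partial_y=())
    for _ in range(max_steps):                       # at most T steps
        acts = actions(state)
        if not acts:                                 # final state: complete output
            break
        a = greedy_action(theta, phi, state, acts)   # action chosen by the linear approximation
        state = transition(state, a)
    return state.partial_y                           # the complete predicted output
```

Each step evaluates the scorer once per available action, which is where the roughly T × Ā prediction cost comes from.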
Conclusion
● We covered and discussed the topics:
○ Structured Prediction
○ Reinforcement Learning
● Introduced the SP-MDP framework, which links SP & RL
● Introduced widely used RL algorithms
● Discussed how RL algorithms can be used in SP-MDP
21
Q&A
● ??!
22
