This document introduces structured prediction Markov decision processes (SP-MDPs), a framework that links structured prediction (SP) and reinforcement learning. It shows how an SP problem can be formulated as an SP-MDP by defining its states, actions, transitions, and rewards. Approximate reinforcement learning algorithms such as Q-learning, SARSA, and policy gradient methods can then be used to learn prediction policies within this framework.
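As a rough illustration of that formulation, the sketch below casts a toy sequence-labeling task as an SP-MDP and trains it with Q-learning. Everything specific here is an assumption made for the example rather than the document's own definition: the tag set `LABELS`, the state as a pair (input, partial output), the per-step reward (0 for a correct label, -1 for a wrong one, i.e. a negative Hamming loss), and the crude state aggregation in `key`, which stands in for a real function approximator.

```python
import random
from collections import defaultdict

LABELS = ("D", "N", "V")  # hypothetical tag set for the toy task


def initial_state(x):
    # State = (input sequence, labels predicted so far).
    return (tuple(x), ())


def is_terminal(state):
    x, y = state
    return len(y) == len(x)


def transition(state, action):
    # Deterministic transition: append the chosen label to the partial output.
    x, y = state
    return (x, y + (action,))


def reward(state, action, gold):
    # Assumed reward: negative per-step Hamming loss against the gold labels.
    _, y = state
    return 0.0 if action == gold[len(y)] else -1.0


def key(state, action):
    # Crude state aggregation: current word, previous label, candidate label.
    x, y = state
    prev = y[-1] if y else "<s>"
    return (x[len(y)], prev, action)


def greedy(q, state):
    return max(LABELS, key=lambda a: q[key(state, a)])


def q_learning(data, episodes=200, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)
    for i in range(episodes):
        x, gold = data[i % len(data)]  # cycle through the training pairs
        s = initial_state(x)
        while not is_terminal(s):
            # Epsilon-greedy action selection.
            a = rng.choice(LABELS) if rng.random() < epsilon else greedy(q, s)
            r = reward(s, a, gold)
            s2 = transition(s, a)
            target = r
            if not is_terminal(s2):
                target += gamma * max(q[key(s2, b)] for b in LABELS)
            q[key(s, a)] += alpha * (target - q[key(s, a)])
            s = s2
    return q


def predict(q, x):
    # Structured prediction = running the greedy policy to a terminal state.
    s = initial_state(x)
    while not is_terminal(s):
        s = transition(s, greedy(q, s))
    return s[1]


data = [
    (("the", "dog", "barks"), ("D", "N", "V")),
    (("the", "cat", "sleeps"), ("D", "N", "V")),
]
q = q_learning(data)
print(predict(q, ("the", "dog", "barks")))
```

Note that the reward is only needed during training, where gold outputs are available; at prediction time the learned policy is simply rolled out from the initial state. Swapping the update rule for SARSA would change only the `target` line, which would then use the action actually taken in `s2` instead of the maximum.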