This document discusses using Q-learning to find optimal control policies for nonlinear systems with continuous state spaces. It outlines a five-step approach:

1) Recognize the fixed-point (Bellman) equation that characterizes the Q-function.
2) Find a stabilizing policy whose closed-loop dynamics are ergodic.
3) Choose an optimality criterion that minimizes the Bellman error.
4) Apply an adjoint operation.
5) Interpret and simulate the results.

As an example, it applies these steps to the linear quadratic regulator (LQR) problem and approximates the Q-function. The goal throughout is to find the best Q-function approximation within a parameterized function class, as the sketch below illustrates.
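To make the steps concrete, here is a minimal sketch of least-squares Q-learning applied to a discrete-time LQR problem, in the spirit of steps 2-5: the Q-function is parameterized as Q(x, u) = zᵀMz with z = [x; u] (the exact function class for LQR), data are collected under a stabilizing policy with exploration noise as a stand-in for ergodicity, the empirical Bellman error is driven to zero by least squares, and the policy is improved greedily. All specifics (the matrices `A`, `B`, cost weights `Qc`, `R`, noise scale, and iteration counts) are illustrative assumptions, not values from the document.

```python
# Least-squares Q-learning sketch for discrete-time LQR (illustrative values).
import numpy as np

rng = np.random.default_rng(0)

# Assumed stable linear system x_{t+1} = A x + B u with quadratic stage cost.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Qc = np.eye(2)   # state cost weight (assumption)
R = np.eye(1)    # input cost weight (assumption)
n, m = 2, 1

def features(x, u):
    """Quadratic monomials of z = [x; u], so Q(x, u) = z' M z is linear in
    theta, the upper triangle of the symmetric matrix M."""
    z = np.concatenate([x, u])
    i, j = np.triu_indices(n + m)
    # Off-diagonal monomials are doubled so phi' theta reproduces z' M z.
    return np.where(i == j, 1.0, 2.0) * np.outer(z, z)[i, j]

def theta_to_M(theta):
    """Rebuild the symmetric matrix M from its upper-triangular parameters."""
    M = np.zeros((n + m, n + m))
    M[np.triu_indices(n + m)] = theta
    return M + M.T - np.diag(np.diag(M))

def evaluate_policy(K, steps=2000):
    """Steps 2-4 in miniature: run the stabilizing policy u = -K x with
    exploration noise, and zero out the empirical Bellman error
    Q(x, u) - c(x, u) - Q(x', -K x') by least squares."""
    rows, costs = [], []
    x = rng.standard_normal(n)
    for _ in range(steps):
        u = -K @ x + 0.5 * rng.standard_normal(m)
        costs.append(x @ Qc @ x + u @ R @ u)
        x_next = A @ x + B @ u
        rows.append(features(x, u) - features(x_next, -K @ x_next))
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
    return theta_to_M(theta)

# Step 5: alternate evaluation and greedy improvement, then sanity-check the
# learned gain against the Riccati solution from fixed-point iteration.
K = np.zeros((m, n))  # stabilizing here because A is stable (an assumption)
for it in range(6):
    M = evaluate_policy(K)
    K = np.linalg.solve(M[n:, n:], M[n:, :n])  # argmin_u z' M z gives u = -K x
    print(f"iteration {it}: K = {K.ravel()}")

P = np.eye(n)
for _ in range(500):  # value iteration on the discrete-time Riccati equation
    S = R + B.T @ P @ B
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
print("Riccati gain:", np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A).ravel())
```

Because the quadratic parameterization contains the true LQR Q-function, the learned gain should converge to the Riccati gain printed at the end; with a richer nonlinear system, the same loop would only recover the best approximation within the chosen function class, which is the point of the parameterized approach described above.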