This document introduces reinforcement learning (RL). It covers key concepts in RL, including the difference between supervised and reinforcement learning, the multi-armed bandit problem, the trade-off between exploration and exploitation, and the challenges posed by stationary versus non-stationary problems. Action values, which approximate the true (optimal) action values, are estimated using sample averages and updated incrementally using a weighted average. Preliminary results show an RL strategy outperforming other strategies on a non-stationary problem.
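The sample-average and weighted-average updates mentioned above can be illustrated with a minimal Python sketch of an epsilon-greedy agent on a non-stationary multi-armed bandit. Everything specific here is an assumption for illustration and is not taken from the original document: the function names `update_estimate` and `run_bandit`, the Gaussian random-walk drift of the true action values, the unit-variance reward noise, and the defaults ε = 0.1 and α = 0.1. The sketch does not reproduce the document's results.

```python
import random


def update_estimate(q: float, reward: float, step_size: float) -> float:
    """Incremental update: Q <- Q + step_size * (reward - Q).

    With step_size = 1/n this reproduces the sample average of the n rewards
    seen so far; with a constant step_size it becomes an exponential,
    recency-weighted average that can track drifting (non-stationary) values.
    """
    return q + step_size * (reward - q)


def run_bandit(steps=10_000, k=10, epsilon=0.1, alpha=None, seed=0):
    """Epsilon-greedy agent on a hypothetical non-stationary k-armed bandit.

    Assumed setup (illustrative only): true action values start at 0 and take
    an independent Gaussian random walk (std 0.01) after every step; a reward
    is the chosen action's true value plus unit-variance Gaussian noise.
    alpha=None uses sample averages (step size 1/n); otherwise alpha is a
    constant step size.
    Returns (average reward, fraction of steps the optimal action was chosen).
    """
    rng = random.Random(seed)
    q_true = [0.0] * k   # true action values q*(a), drifting over time
    q_est = [0.0] * k    # estimated action values Q(a)
    counts = [0] * k     # N(a): how many times each action was chosen
    total_reward = 0.0
    optimal_picks = 0

    for _ in range(steps):
        # Exploration vs exploitation: random action with probability epsilon,
        # otherwise a greedy action (ties broken at random).
        if rng.random() < epsilon:
            action = rng.randrange(k)
        else:
            best = max(q_est)
            action = rng.choice([a for a in range(k) if q_est[a] == best])

        if q_true[action] == max(q_true):
            optimal_picks += 1

        reward = rng.gauss(q_true[action], 1.0)
        total_reward += reward

        counts[action] += 1
        step_size = 1.0 / counts[action] if alpha is None else alpha
        q_est[action] = update_estimate(q_est[action], reward, step_size)

        # Non-stationarity: every true value drifts slightly each step.
        for a in range(k):
            q_true[a] += rng.gauss(0.0, 0.01)

    return total_reward / steps, optimal_picks / steps


if __name__ == "__main__":
    for label, alpha in [("sample average", None), ("constant alpha=0.1", 0.1)]:
        avg_reward, frac_optimal = run_bandit(alpha=alpha)
        print(f"{label}: avg reward={avg_reward:.3f}, optimal={frac_optimal:.1%}")
```

With `alpha=None`, every past reward for an action counts equally, so the estimate reacts more and more slowly as data accumulates. A constant `alpha` weights recent rewards more heavily than old ones, which is why constant-step-size estimates are generally preferred when the true action values drift.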