This document discusses reinforcement learning techniques, with a focus on Thompson sampling. It first reviews the underlying concepts: the multi-armed bandit problem, the exploration-exploitation trade-off, and Bayesian probability. It then describes the Thompson sampling algorithm in detail, explaining how it works and covering variations such as multi-play Thompson sampling, which have been shown to improve performance. The document is presented by Shindong Kang of Intelligent City Ltd.