This document is a comprehensive survey of contextual bandits, covering key concepts, problem settings, and algorithms such as UCB and Thompson sampling. It treats both the stochastic and adversarial settings, comparing the main algorithmic approaches and their regret bounds. The document also discusses how supervised learning techniques can be adapted to contextual bandit problems.
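To make the flavor of the UCB-style methods mentioned above concrete, here is a minimal sketch of the disjoint LinUCB variant for linear contextual bandits. It is not taken from the survey: the class name `LinUCB`, the `alpha` exploration parameter, and the simulated linear-reward environment are illustrative assumptions, and the standard linear-reward model is assumed throughout.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB sketch: one ridge-regression model per arm plus a UCB exploration bonus."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha                                   # width of the confidence bonus (assumed tuning knob)
        self.A = [np.eye(dim) for _ in range(n_arms)]        # per-arm Gram matrix (identity acts as ridge prior)
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # per-arm reward-weighted feature sums

    def select(self, x):
        """Pick the arm whose upper confidence bound is largest for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                                # ridge estimate of the arm's reward weights
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)      # optimism term from the confidence ellipsoid
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Fold the observed reward for the chosen arm back into that arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


# Tiny simulated run (hypothetical environment): 3 arms, 5-dimensional contexts, linear rewards plus noise.
rng = np.random.default_rng(0)
true_theta = rng.normal(size=(3, 5))
agent = LinUCB(n_arms=3, dim=5, alpha=1.0)
for t in range(1000):
    x = rng.normal(size=5)
    arm = agent.select(x)
    reward = true_theta[arm] @ x + rng.normal(scale=0.1)
    agent.update(arm, x, reward)
```

The same select/update loop structure carries over to Thompson sampling by replacing the confidence bonus with a draw from a posterior over each arm's weights; the survey covers the corresponding regret analyses in detail.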