Reinforcement learning algorithms such as Q-learning can model human behavior in games better than models that assume full rationality. The document describes experiments using Q-learning agents in repeated 2x2 games. In prisoner's dilemma games, Q-learning agents learned mutual cooperation more often when it was the Pareto-optimal outcome than when it was not. In asymmetric games, Q-learning agents likewise learned the Pareto-optimal outcome when it provided sufficient incentive. In games such as Chicken, Q-learning agents sometimes learned alternating equilibrium strategies similar to those observed in human subjects.
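To make the setup concrete, below is a minimal sketch of two independent tabular Q-learning agents playing a repeated prisoner's dilemma. The payoff values, learning rate, discount factor, exploration rate, and the choice of conditioning each agent's state on the previous joint action are illustrative assumptions, not details taken from the document's experiments.

```python
import random
from collections import defaultdict

# Hypothetical prisoner's dilemma payoffs (row reward, column reward);
# the specific numbers are an assumption for illustration only.
ACTIONS = ["C", "D"]
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


class QLearner:
    """Tabular Q-learning agent whose state is the previous joint action."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # maps (state, action) -> estimated value

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup.
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def play(rounds=50_000):
    p1, p2 = QLearner(), QLearner()
    state = ("C", "C")  # arbitrary initial "previous joint action"
    for _ in range(rounds):
        a1, a2 = p1.act(state), p2.act(state)
        r1, r2 = PAYOFFS[(a1, a2)]
        next_state = (a1, a2)
        p1.update(state, a1, r1, next_state)
        p2.update(state, a2, r2, next_state)
        state = next_state
    return p1, p2


if __name__ == "__main__":
    play()
```

Conditioning the state on the previous joint action gives each agent a way to respond to the other's behavior; with memoryless agents, convergence to mutual defection is more typical. Changing the PAYOFFS table turns the same loop into an asymmetric game or Chicken.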