This three-sentence summary provides the key details about the document:
The document proposes multi-agent Q-learning for controlling traffic lights in a large, non-stationary traffic network. Each intersection is modeled as an agent that uses reinforcement learning to choose light timings based on local queue lengths. Because Q-learning is model-free, it needs no pre-specified model of the traffic dynamics and can adapt as conditions change, which makes it better suited to dynamic, non-stationary environments than approaches that rely on a fixed model.
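
To make the per-intersection learning step concrete, here is a minimal sketch of a tabular Q-learning agent in Python. It is not the document's implementation: the class name `IntersectionAgent`, the binning of queue lengths into states, the negative-total-queue reward, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration. Only the standard Q-learning update rule itself is taken as given.

```python
import random
from collections import defaultdict


class IntersectionAgent:
    """Tabular Q-learning agent for one intersection (hypothetical sketch)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions  # e.g. number of candidate signal plans (assumed)
        self.alpha = alpha          # learning rate (assumed value)
        self.gamma = gamma          # discount factor (assumed value)
        self.epsilon = epsilon      # exploration probability (assumed value)
        self.rng = random.Random(seed)
        # Q-table: state -> list of action values, created lazily as zeros.
        self.q = defaultdict(lambda: [0.0] * n_actions)

    @staticmethod
    def encode_state(queue_lengths, bin_size=5):
        """Discretize local queue lengths into a hashable state (assumed encoding)."""
        return tuple(q // bin_size for q in queue_lengths)

    def choose_action(self, state):
        """Epsilon-greedy selection over the current Q-values."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update; no model of the traffic dynamics is used."""
        best_next = max(self.q[next_state])
        td_target = reward + self.gamma * best_next
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])
```

In a simulation loop, each intersection would own one such agent, call `encode_state` on its observed queue lengths, pick an action with `choose_action`, and then call `update` with a reward such as the negative total queue length observed afterwards (again an assumed reward). Because every agent keeps learning, the environment each one sees keeps shifting, which is the non-stationarity the summary refers to.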