The document describes using deep reinforcement learning to automatically scale an Apache Spark cluster on an OpenStack platform. Key points:
- Features such as CPU and memory usage and network traffic are selected to represent the cluster state.
- A deep Q-network (DQN) agent chooses scaling actions (adding or removing nodes) based on the observed state, aiming to satisfy time and resource constraints.
- Compared with a linear-regression baseline, the approach reduces job failures when tested under varying data sizes and constraint limits.
- An open-source implementation including Docker images and source code is provided.
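The control loop summarized above (observe cluster metrics, score each scaling action with a Q-network, then act) can be sketched as follows. This is a minimal illustration only: the feature names, action set, network sizes, and epsilon-greedy policy are assumptions for the sketch, not details taken from the paper's actual implementation.

```python
import numpy as np

# Hypothetical feature and action sets; the paper's exact state and
# action definitions may differ.
STATE_FEATURES = ["cpu_usage", "mem_usage", "net_in", "net_out"]
ACTIONS = ["remove_node", "no_op", "add_node"]

rng = np.random.default_rng(0)

# A tiny two-layer Q-network: state vector -> one Q-value per action.
W1 = rng.normal(scale=0.1, size=(len(STATE_FEATURES), 16))
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, len(ACTIONS)))
b2 = np.zeros(len(ACTIONS))

def q_values(state: np.ndarray) -> np.ndarray:
    """Forward pass: ReLU hidden layer, then linear output per action."""
    hidden = np.maximum(0.0, state @ W1 + b1)
    return hidden @ W2 + b2

def select_action(state: np.ndarray, epsilon: float = 0.1) -> str:
    """Epsilon-greedy selection over the Q-network's action values."""
    if rng.random() < epsilon:
        return ACTIONS[int(rng.integers(len(ACTIONS)))]
    return ACTIONS[int(np.argmax(q_values(state)))]

# Example: normalized cluster metrics (cpu, mem, net_in, net_out).
state = np.array([0.92, 0.80, 0.45, 0.30])
action = select_action(state, epsilon=0.0)
print(action)  # one of: remove_node, no_op, add_node
```

In a full DQN the weights would be trained from experience replay with a reward reflecting the time and resource constraints (e.g. penalizing job failures and idle nodes); the sketch shows only the inference side of the agent.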