The document proposes a control method for multi-legged robots that combines improved simulation modeling with deep reinforcement learning. A policy is trained with reinforcement learning in a stochastic simulator and then deployed on a real robot. Experimental results show that the method enables command-conditioned locomotion, high-speed locomotion above 1.6 m/s, and recovery from falls, outperforming prior model-based approaches. Key techniques include an actuator network that bridges the gap between simulation and reality, a dichotomy (bisection) method that speeds up contact simulation, and randomization of simulator conditions so that the learned policies are robust.
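
The dichotomy method named above is more commonly called bisection: an interval that brackets a sign change is halved repeatedly until it is smaller than a tolerance. The summary does not say which quantity the contact solver applies it to, so the following is only a generic sketch of the technique. The function name `bisect_root`, its parameters, and the toy function `x*x - 2` are illustrative assumptions, not part of the described method.

```python
from typing import Callable


def bisect_root(f: Callable[[float], float], lo: float, hi: float,
                tol: float = 1e-9, max_iter: int = 200) -> float:
    """Find x in [lo, hi] with f(x) close to 0.

    Assumes f is continuous and f(lo), f(hi) have opposite signs.
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        # Stop on an exact root or once the bracket is narrow enough.
        if f_mid == 0.0 or (hi - lo) < tol:
            return mid
        # Keep the half-interval whose endpoints still bracket the sign change.
        if f_lo * f_mid < 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Toy usage (hypothetical function, unrelated to any real contact model):
root = bisect_root(lambda x: x * x - 2.0, 0.0, 2.0)
```

Each iteration halves the bracket, so the number of function evaluations grows only logarithmically with the requested precision, which is one plausible reason such a method is attractive inside a simulation loop.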