DIAYN (Diversity Is All You Need) is an unsupervised reinforcement learning method that learns diverse skills without a reward function. Its objective has three parts. It maximizes the mutual information between skills and the states they visit, so that each skill determines which states the agent reaches. It minimizes the mutual information between skills and actions given the state, so that skills are distinguished by the states they visit rather than by the actions they take. And it maximizes the entropy of the mixture of all skill policies, which encourages exploration and pushes each skill to act as randomly as possible while remaining distinguishable. Experiments show that DIAYN discovers locomotion skills in complex environments and sometimes learns skills that solve benchmark tasks without ever seeing the task reward. The learned skills can then be fine-tuned to maximize a task reward, used as primitives for hierarchical RL, or used to imitate experts.
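The summary above does not say how the objective is optimized in practice. One common formulation, and to the best of our reading the one used in the DIAYN paper, trains a discriminator q(z|s) to guess the skill z from the state s, and rewards the policy with log q(z|s) - log p(z), where p(z) is the prior over skills. The sketch below computes that pseudo-reward for a single state, assuming a uniform prior and a discriminator that outputs unnormalized logits. The names `diayn_pseudo_reward`, `disc_logits`, `skill`, and `num_skills` are hypothetical and chosen only for illustration.

```python
import numpy as np


def diayn_pseudo_reward(disc_logits, skill, num_skills):
    """Return log q(z|s) - log p(z) for one state (illustrative sketch).

    disc_logits: unnormalized discriminator scores, one per skill (assumed input).
    skill:       index of the skill z that generated the state.
    num_skills:  number of skills; the prior p(z) is assumed uniform.
    """
    disc_logits = np.asarray(disc_logits, dtype=np.float64)
    # Numerically stable log-softmax gives log q(z|s) for every skill.
    shifted = disc_logits - np.max(disc_logits)
    log_q = shifted - np.log(np.sum(np.exp(shifted)))
    # Uniform prior: log p(z) = -log(num_skills).
    log_p_z = -np.log(num_skills)
    return float(log_q[skill] - log_p_z)


# Usage: a discriminator that favors skill 0 for this state.
logits = np.array([5.0, 0.0, 0.0, 0.0])
reward_for_skill_0 = diayn_pseudo_reward(logits, skill=0, num_skills=4)
reward_for_skill_1 = diayn_pseudo_reward(logits, skill=1, num_skills=4)
```

Under these assumptions the reward is positive when the discriminator identifies the active skill better than chance, negative when it does worse than chance, and zero when the discriminator is uniform. It is bounded above by log(num_skills), which is reached only when the state identifies the skill with certainty.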