The document applies a policy gradient method to learning coordination between a kicker and a receiver during direct free kicks in the RoboCup 2D soccer simulation. It proposes a heuristic function that evaluates candidate target points based on the interaction between the two agents, which lets each agent predict the other's action and improves their teamwork. The experiments show that agents sharing the same heuristics cooperate effectively, while agents with diverging heuristics settle into a master-servant relationship.
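As a rough illustration of the general idea, the following sketch scores candidate target points with a linear heuristic, turns the scores into a Boltzmann (softmax) policy, and adjusts the heuristic weights with a REINFORCE-style policy gradient step. Everything here is an assumption made for illustration: the linear form of the heuristic, the two-dimensional feature vectors, the function names, the learning rate, and the reward are hypothetical and are not taken from the document.

```python
import math

def heuristic(weights, features):
    # Hypothetical linear heuristic: weighted sum of a target point's features.
    return sum(w * f for w, f in zip(weights, features))

def policy(weights, candidates, temperature=1.0):
    # Boltzmann (softmax) distribution over candidate target points.
    scores = [heuristic(weights, f) / temperature for f in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_update(weights, candidates, chosen, reward, lr=0.1, temperature=1.0):
    # One REINFORCE step. For a softmax over linear scores,
    # grad log pi(chosen) = (f_chosen - sum_k p_k * f_k) / temperature.
    probs = policy(weights, candidates, temperature)
    expected = [
        sum(p * f[i] for p, f in zip(probs, candidates))
        for i in range(len(weights))
    ]
    return [
        w + lr * reward * (candidates[chosen][i] - expected[i]) / temperature
        for i, w in enumerate(weights)
    ]

# Three hypothetical target points, each described by two assumed features.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = [0.0, 0.0]

before = policy(weights, candidates)
# Assume the pass to target 0 succeeded and earned a reward of 1.
weights = reinforce_update(weights, candidates, chosen=0, reward=1.0)
after = policy(weights, candidates)
```

In this sketch, a rewarded choice becomes more probable after the update. If the kicker and the receiver each evaluated the same candidates with the same weights, each could compute the other's most likely target, which is one way to read the shared-heuristics case described above.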