Overview
Consider a game in which a single agent plays against the environment, such as solitaire; the agent can play “against” the environment as many times as needed, and we can alter the environment (change the difficulty, the randomization, etc.) to target specific training for the agent. But what about competitive games with more than one player, such as AI Arena?
Why Do We Need Self-Play?
It’s possible to code an opponent into the environment and train your agent against it; for instance, you could write a rules-based fighter and pit your RL agent against it (a minimal sketch of this setup follows the list below). This method, however, has limited utility, for three main reasons:
- Training against only one opponent encourages overfitting to a single strategy. If your fighter trains against one fighting strategy, it may perform adequately against that strategy but will probably perform poorly against all others.
- The strategy employed by the rules-based fighter also matters greatly for training. If you train your fighter against a difficult rules-based fighter, it may not be able to learn effectively unless it already possesses a decent strategy; the rules-based fighter will likely just dominate your agent. Conversely, if you train your fighter against an easy rules-based fighter, it might not learn much at all.
- Creating multiple rules-based fighters is impractical: it takes considerable skill (and time) to hand-craft each one, which defeats the whole point of using RL.
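To make this concrete, here is a minimal, self-contained sketch of what baking a rules-based opponent into the environment might look like. All names here (`FightingEnv`, `RulesBasedFighter`, the toy distance/HP mechanics) are hypothetical stand-ins; a real fighting environment would have far richer state and actions.

```python
import random

ACTIONS = ["move_forward", "move_back", "attack"]


class RulesBasedFighter:
    """A fixed, hand-coded strategy: attack when in range, otherwise close in."""

    def act(self, distance):
        return "attack" if distance <= 1 else "move_forward"


class FightingEnv:
    """Single-agent view of a two-player game: the opponent lives inside the env."""

    def __init__(self, opponent):
        self.opponent = opponent

    def reset(self):
        self.distance = 5                      # gap between the two fighters
        self.agent_hp, self.opp_hp = 3, 3
        return self.distance

    def step(self, agent_action):
        opp_action = self.opponent.act(self.distance)
        # Movement: "move_forward" shrinks the gap, "move_back" widens it.
        for action in (agent_action, opp_action):
            if action == "move_forward":
                self.distance -= 1
            elif action == "move_back":
                self.distance += 1
        self.distance = max(self.distance, 0)
        # Attacks only land at close range.
        if agent_action == "attack" and self.distance <= 1:
            self.opp_hp -= 1
        if opp_action == "attack" and self.distance <= 1:
            self.agent_hp -= 1
        done = self.agent_hp <= 0 or self.opp_hp <= 0
        reward = 1 if self.opp_hp <= 0 else -1 if self.agent_hp <= 0 else 0
        return self.distance, reward, done


env = FightingEnv(opponent=RulesBasedFighter())
state, done = env.reset(), False
while not done:
    # A trained RL policy would replace this random action choice.
    state, reward, done = env.step(random.choice(ACTIONS))
```

Notice that the opponent’s strategy is frozen at whatever you hard-coded: no matter how long you train, the agent only ever sees this one behavior.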
Is there a way to create a single, dynamic opponent that adapts automatically as our agent trains?
Yes, there is! Why not train an agent against itself?
What is Self-Play?
Self-play is exactly what the name suggests: pitting your agent against itself during RL training. In self-play, your agent is duplicated and the two copies train against one another, with the agents updated alternately, one per simulation.
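To make the mechanics concrete, here is a toy, self-contained sketch of an alternating self-play loop. Everything here (`Agent`, `play_match`, the `strength` scalar) is a made-up stand-in: in real training, `play_match` would be an environment rollout and `update` an actual RL weight update.

```python
import copy
import random


class Agent:
    """Toy stand-in for an RL policy; `strength` is a crude proxy for skill."""

    def __init__(self):
        self.strength = 0.0

    def update(self, won):
        # Stand-in for a real RL update (policy gradient, Q-learning, etc.).
        self.strength += 0.1 if won else 0.01


def play_match(a, b):
    """Toy match: the stronger agent wins more often. Returns True if `a` wins."""
    p_a_wins = 0.5 + 0.1 * (a.strength - b.strength)
    return random.random() < min(max(p_a_wins, 0.05), 0.95)


agent = Agent()
opponent = copy.deepcopy(agent)             # duplicate the agent to create its rival
learner = agent
for simulation in range(1000):
    learner_won = play_match(learner, opponent)
    learner.update(learner_won)             # only the learner is updated...
    learner, opponent = opponent, learner   # ...and the roles swap each simulation
```

The swap on the last line is the key design point: because the two copies take turns learning, the opponent’s skill tracks the agent’s own as training progresses.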
This way, your agent is always matched against a dynamic opponent whose strategy evolves and whose skill stays close to the agent’s own. Over time, your agent may even uncover new, perhaps “superhuman” strategies!