StarCraft AI Benchmarks

The goal of this page is to collect benchmark problems that can be broadly used and referenced. We present a benchmark composed of a series of scenarios, each of them capturing a different aspect of RTS games. Each scenario is defined by a starting situation in StarCraft where the agent needs to either defeat the opponent or survive for as long as possible. More information can be found in the original paper: A Benchmark for StarCraft Intelligent Agents.

These benchmarks can be cited as:

@inproceedings{uriarte15b,
  author    = {Uriarte, Alberto and Onta\~{n}\'{o}n, Santiago},
  title     = {A Benchmark for StarCraft Intelligent Agents},
  booktitle = {AIIDE},
  year      = {2015}
}

We are preparing an automatic service to test your bot against all the scenarios. For now, you can use the launcher provided in the repository (https://bitbucket.org/auriarte/starcraftbenchmarkai) to test your bot locally. If you want your bot to appear on the leaderboard, send it to us (admin[at]starcraftai.com). Keep in mind that for micromanagement maps the goal for your bot is to reach the opponent's starting position.

Metrics

All metrics are normalized to either the interval [0,1] or [-1,1], with higher values representing better agent performance.

  • Survivor’s life: The sum of the square roots of the remaining hit points of each unit, divided by the time it took to complete the scenario (win/defeat/timeout), measured in frames. The result is normalized by a lower and an upper bound. The lower bound is when player A is defeated in the minimum time and without dealing any damage to player B, while the upper bound is the opposite.
  • Time survived: The time the agent survived normalized by a predefined timeout.
  • Time needed: We start a timer when a certain event happens (e.g., a building is destroyed) and we stop it after a timeout or after a condition is triggered (e.g., the destroyed building is replaced).
  • Units lost: The difference in units lost by players A and B. Each player's losses are normalized to [0, 1] by dividing the number of units lost by that player's maximum number of units, so the difference lies in [-1, 1].
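
The survivor's life and time survived metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's actual implementation: the function names are hypothetical, the lower and upper bounds are taken as inputs rather than derived from the scenario, and the linear mapping onto [-1, 1] is an assumption about how the normalization is applied.

```python
import math


def survivor_life(hit_points, frames):
    """Sum of sqrt(remaining HP) over surviving units, divided by elapsed frames."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    return sum(math.sqrt(hp) for hp in hit_points) / frames


def rescale(value, lower, upper):
    """Map [lower, upper] linearly onto [-1, 1] (assumed normalization), clamped."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    scaled = 2.0 * (value - lower) / (upper - lower) - 1.0
    return max(-1.0, min(1.0, scaled))


def time_survived(frames_alive, timeout):
    """Fraction of the predefined timeout (in frames) that the agent survived."""
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    return min(frames_alive, timeout) / timeout
```

For example, two surviving units with 16 and 9 hit points after 100 frames give a raw value of (4 + 3) / 100, which would then be passed to `rescale` together with the scenario's bounds.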

Benchmark Scenarios

All scenarios can be found in the repository.

Scenario | Description | Evaluation
Reactive Control
RC1: Perfect Kiting | The purpose of this scenario is to test whether the intelligent agent is able to reason about the possibility of exploiting its mobility and ranged attack against a stronger but slower unit in order to win. In this scenario, a direct frontal attack will result in losing the combat, but via careful maneuvering, it is possible to win without taking any damage. | Survivor’s life
RC2: Kiting | In this scenario the intelligent agent is at a disadvantage, but using a hit-and-run behavior might suffice to win. The main difference from the previous case is that here, some damage is unavoidable. | Survivor’s life
RC3: Sustained Kiting | In this case there is no chance to win, so the agent should try to stay alive for as long as possible. A typical example of this behavior occurs while scouting the enemy base. | Time survived in frames since a Zealot starts chasing the SCV, normalized by the timeout.
RC4: Symmetric Armies | In equal conditions (symmetric armies), positioning and target selection are key aspects that can determine a player’s success in a battle. This scenario presents a test with several configurations as a baseline to experiment against basic AI opponents. | Survivor’s life
Tactics
T1: Dynamic obstacles | This scenario measures how well an agent can navigate when chokepoints are blocked by dynamic obstacles (e.g., neutral buildings). Notice that we are not aiming to benchmark pathfinding, but high-level navigation. | Time needed
Strategy
S1: Building placement | This scenario simulates a Zealot rush and is designed to test whether the agent will be able to stop it (intuitively, it seems the only option is to build a wall). | Units lost: (units player B lost / 4) - (units player A lost / 25).
S2: Plan Recovery | An agent should adapt to plan failures. This scenario tests whether the AI is able to recover from the opponent disrupting its build order. | Time spent to replace a building, normalized by the timeout.
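
As a worked illustration of the Units lost formula given for S1, the sketch below plugs in the divisors 4 and 25 from the table. The function and constant names are hypothetical, and reading 4 and 25 as the unit counts of players B and A is an assumption based on the formula.

```python
# Divisors taken from the S1 formula; interpreting them as each player's
# unit count is an assumption.
S1_UNITS_A = 25
S1_UNITS_B = 4


def units_lost_score(lost_a, total_a, lost_b, total_b):
    """Normalized losses of player B minus normalized losses of player A."""
    if not (0 <= lost_a <= total_a and 0 <= lost_b <= total_b):
        raise ValueError("losses must lie between 0 and the player's unit count")
    return lost_b / total_b - lost_a / total_a


def s1_score(lost_a, lost_b):
    """Units lost score for scenario S1: (B lost / 4) - (A lost / 25)."""
    return units_lost_score(lost_a, S1_UNITS_A, lost_b, S1_UNITS_B)
```

Under this formula, a score of -1 means player A lost all 25 units while player B lost none, and a score of 1 means the reverse.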

Research papers using these scenarios

Leaderboard

Bot          RC1      RC2      RC3     RC4      T1      S1       S2
FreScBot     -0.0879  -0.1153  N/A     -0.0022  N/A     N/A      N/A
UAlbertaBot  -0.0933  0.0422   N/A     0.0369   N/A     -1       0.0000
Skynet       -0.1087  0.1696   N/A     0.0706   N/A     N/A      N/A
Nova         0.1111   N/A      0.0335  N/A      0.0000  -0.7420  0.0000