StarCraft Brood War Data Mining

Revision as of 14:47, 5 November 2015 by Warwolf30

If you want to do some data mining or machine learning on StarCraft Brood War, don't reinvent the wheel. This is a compendium of the work done so far. If you produce something new, please notify me (albertouri[at]drexel[dot]edu) so that this page can be kept up to date.

Replay websites

First of all, you need some replays (game log files) to mine. Several websites host replays from professional gamers (a.k.a. gosu). Note that if you want to analyze the replays using BWAPI, you need replays played with the latest version of StarCraft Brood War (1.16.1).

  • [W1] GosuGamers. No new replays have been added since 04-14-2013. 7,930 replays.
  • [W2] ICCup. Still active professional game server with regular tournaments.
  • [W3] TeamLiquid. Professional community with regular tournaments.
  • [W4] RepDepot. Replay repository with more than 56,600 replays (gosu and user). Currently OFFLINE
  • [W5] Reps.Ru
  • [W6]
  • [W7]
  • [W8]
  • [W9] Compilation of all the previous sites

Replay crawler

Downloading all the replays by hand is a really time-consuming task, so you should use a crawler to automate the process. Keep in mind that the same replay can be stored on different websites, so it is good practice to check the hash of each replay to detect duplicates. Some replays may also be corrupted.
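The hash check mentioned above can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the function names are hypothetical, the replays are assumed to sit in a local folder with the .rep extension, and SHA-256 is just one reasonable choice of hash. It only detects byte-identical copies, and the empty-file check is a very cheap filter that does not replace a real corruption test (for that you would need to open the replay with one of the analyzers listed further down).

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_duplicates(replay_dir, pattern="*.rep"):
    """Map each digest to the sorted list of replay files that share it."""
    groups = defaultdict(list)
    for path in sorted(Path(replay_dir).rglob(pattern)):
        # Skip empty files, which are usually failed downloads.
        if path.is_file() and path.stat().st_size > 0:
            groups[file_digest(path)].append(path)
    return dict(groups)


def unique_replays(replay_dir, pattern="*.rep"):
    """Keep one file (the first in sorted order) per distinct digest."""
    return [paths[0] for paths in group_duplicates(replay_dir, pattern).values()]
```

Calling unique_replays("replays/") on the crawler's output folder (a hypothetical path) would return one path per distinct file, which can then be copied into the final replay package.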

Replay packages

Once you have gathered a good number of replays, it is good practice to offer your replay package to the research community, so that others can reproduce your results and compare them with new approaches. The map is included inside each replay file.

ID | Link | Size | # Replays | Sources | Author | Notes
[R1] | Download | 1.2 GB | 5493 | [W1][W2][W3] | Ben Weber | Contains replays from previous versions of StarCraft that are not compatible with BWAPI
[R2] | Download | 359 MB | 6000 | ? | Fobbah | Only Zerg replays (versus Zerg, Protoss and Terran)
[R3] | Download | 644 MB | 7649 | [W1][W2][W3] | Gabriel Synnaeve | No duplicates
[R4] | Download | 54 MB | 1029 | [W2] | Gabriel Synnaeve | User replays (not professional players)
[R5] | Download | 63 MB | 509 | [W2] | Tom Dietterich | Only Protoss vs Terran

Replay analyzers

Now it is time to parse the replays. There are two options: parse the replay file directly, or parse the BWAPI events/states. With the first option we don't need to play the replay in StarCraft, but we only get the click commands of the players and we need to decode the binary replay files. Using BWAPI we can record all game events and all unit states, and even simulate the fog of war, but we need to play each replay in StarCraft.

ID | Name | Language | Type | Based on | Notes
[A1] | LordMartin Replay Browser | - | File Parser | - | Source code not available
[A2] | BWChart | C++ | File Parser | - | -
[A3] | RepASsiMilator | C++ (PHP) | File Parser | [A2] | PHP extension
[A4] | bwhf | Java | File Parser | - | -
[A5] | bwrepanalysis | Java, C++ | File Parser + BWAPI events | [A4] | BWAPI events recorded: economy 25 frames, tech 1 frame, vision 12 frames, orders 1 frame, unit location 100 frames or new order
[A6] | bwrepdump | C++ | BWAPI events | [A5] | BWAPI events recorded: economy 25 frames, tech 1 frame, vision 12 frames, orders 1 frame, unit location 100 frames or new order, 1 frame during attacks
[A7] | ScExtractor | Java | File Parser + BWAPI events | [A4] | BWAPI events recorded: 24 frames OR user action frames OR 24 frames, 7 frames during attacks
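To get a feeling for what the recording periods in the Notes column mean for dataset size and granularity, here is a small Python sketch. It is not code from any of the analyzers: the function names are hypothetical, the periods are simply copied from the [A5] row, and the sketch assumes that every category is sampled on frames that are multiples of its period, starting at frame 0. Event-triggered records (a new order, or the per-frame recording during attacks in [A6]) are ignored.

```python
# Recording periods in frames, copied from the Notes column of [A5].
PERIODS = {
    "economy": 25,
    "tech": 1,
    "vision": 12,
    "orders": 1,
    "unit_location": 100,
}


def categories_due(frame, periods=PERIODS):
    """Return (sorted by name) the categories sampled on this frame."""
    return sorted(name for name, period in periods.items() if frame % period == 0)


def samples_per_category(total_frames, periods=PERIODS):
    """Count the periodic samples of each category over frames 0..total_frames-1."""
    # Ceiling division: frame 0 is always sampled, then every `period` frames.
    return {
        name: (total_frames + period - 1) // period
        for name, period in periods.items()
    }
```

For example, samples_per_category(300) gives 12 economy samples but only 3 unit location samples, so a study that needs fine-grained unit positions would have to lower that period in its own analyzer.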


After parsing the replays you will have a clean dataset ready for machine learning techniques. Each dataset was created with a specific data mining task in mind, so it may not capture the information you want, or not at the granularity you need. If so, feel free to create your own dataset using a replay analyzer.

ID | Link | Size | Sources | Notes
[D1] | Download | 1.7 MB | [R1] | ARFF files to use with Weka. A script was used to label each player's actions with a build order (early game strategy), i.e. supervised learning.
[D2] | Download | 870 MB | [R2][A5] | For each replay it provides 3 plain text files: RGD (Replay Game Data), RLD (Replay Location Data) and ROD (Replay Order Data).
[D3] | Download | 2.1 GB | [R3][A6] | For each replay it provides 3 plain text files: RGD (Replay Game Data), RLD (Replay Location Data) and ROD (Replay Order Data). Warning: not up to date with the latest version of A6.
[D4] | Download | 19.6 GB | [R3][A7] | Provides SQL files to populate a database with the following structure. Recorded state changes (all unit attributes) every 24 frames.
[D5] | Download | 63.3 MB | [R5] | Contains the opening build choices of the Protoss player in the first 7 minutes of the game, as well as information about what the Terran player could see during this time. The objective is to predict the Protoss player's choices using the information available to the Terran player.


Some research publications that have used a dataset (or a replay analyzer) from the previous sections:

  • [A1] (2008) Building a Player Strategy Model by Analyzing Replays of Real-Time Strategy Games. Ji-Lung Hsieh and Chuen-Tsai Sun
  • [D1] (2009) A Data Mining Approach to Strategy Prediction. Ben Weber and Michael Mateas
  • [A2] (2010) Cooperative Learning by Replay Files in Real-Time Strategy Game. Jaekwang Kim, Kwang Ho Yoon, Taebok Yoon, and Jee-Hyong Lee
  • [W2,A4] (2011) A Corpus Analysis of Strategy Video Game Play in Starcraft: Brood War. Joshua M. Lewis, Patrick Trinh and David Kirsh
  • [D3] (2012) A Dataset for StarCraft AI & an Example of Armies Clustering. Gabriel Synnaeve and Pierre Bessière
  • [D5] (2012) Inferring Strategies from Limited Reconnaissance in Real-time Strategy Games. Jesse Hostetler, Ethan W. Dereszynski, Thomas G. Dietterich, Alan Fern
  • [D4] (2014) An Improved Dataset and Extraction Process for Starcraft AI. Glen Robertson and Ian Watson