StarCraft Brood War Data Mining
If you want to do data mining or machine learning on StarCraft Brood War, don't reinvent the wheel: this page compiles the existing work on the topic. If you build something new, please notify me (albertouri[at]drexel[dot]edu) so I can keep this page up to date.
First, you need replays (game log files) to mine. Several websites host replays from professional players (a.k.a. gosu). Note that if you want to analyze replays with BWAPI, they must have been played on the latest version of StarCraft Brood War (1.16.1).
- [W1] GosuGamers. 7,930 replays; no new replays have been added since 04-14-2013.
- [W2] ICCup. Professional game server, still active, with regular tournaments.
- [W3] TeamLiquid. Professional community with regular tournaments.
- [W4] RepDepot. Replay repository with more than 56,600 replays (gosu and user). Currently offline.
- [W5] Reps.Ru
- [W6] yaoyuan.com
- [W7] ygosu.com
- [W8] defiler.ru
- [W9] bwreplays.com (http://bwreplays.com/). Compilation of all the previous sites.
Downloading all the replays by hand is very time-consuming, so use a crawler to automate the process. Keep in mind that the same replay can be hosted on several websites, so it is good practice to compare the hashes of the replay files to detect duplicates. Also, some replays may be corrupted.
- Broodwar Replay Scrappers. Developed by Gabriel Synnaeve in Python. It downloads replays from [W1], [W2] and [W3].
- Tutorial on how to create your own crawler.
Once you have gathered a good number of replays, consider sharing your replay package with the research community so that others can reproduce your results and compare new approaches against them. Each replay file includes the map it was played on.
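A minimal Python sketch of the duplicate check, assuming the replays have already been downloaded into one directory tree with the usual `.rep` extension. The helper names are hypothetical and not part of the scrapers above, and SHA-256 is just one reasonable choice of hash.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_replays(replay_dir: str, pattern: str = "*.rep") -> dict[str, list[Path]]:
    """Group replay files under replay_dir (recursively) by content hash.

    Only hashes shared by two or more files are returned, so every value is a
    group of byte-identical replays, e.g. the same game downloaded from two sites.
    """
    groups: dict[str, list[Path]] = {}
    for path in sorted(Path(replay_dir).rglob(pattern)):
        if path.is_file():
            groups.setdefault(file_sha256(path), []).append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def empty_replays(replay_dir: str, pattern: str = "*.rep") -> list[Path]:
    """Flag zero-byte files: an obvious (but not the only) sign of a broken download."""
    return [
        p for p in sorted(Path(replay_dir).rglob(pattern))
        if p.is_file() and p.stat().st_size == 0
    ]
```

For example, `find_duplicate_replays("replays")` returns one list of paths per duplicated game; keep one file from each list and discard the rest. Hashing only finds byte-identical copies: detecting corrupted replays generally requires trying to parse them or play them back.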
|ID||Link||Size||Replays||Sources||Author||Notes|
|[R1]||Download||1.2 GB||5493||[W1][W2][W3]||Ben Weber||Contains replays from previous versions of StarCraft that are not compatible with BWAPI|
|[R2]||Download||359 MB||6000||?||Fobbah||Only Zerg replays (versus Zerg, Protoss and Terran)|
|[R3]||Download||644 MB||7649||[W1][W2][W3]||Gabriel Synnaeve||No duplicates|
|[R4]||Download||54 MB||1029||[W2]||Gabriel Synnaeve||User replays (not professional players)|
|[R5]||Download||63 MB||509||[W2]||Tom Dietterich||Only Protoss vs Terran|
Now it is time to parse the replays. There are two options: parse the replay file itself, or record BWAPI events/states. With the first option we do not need to play the replay in StarCraft, but we have to decode the binary replay format, and we only get the players' click commands. With BWAPI we can record every game event and the state of every unit, and even simulate the fog of war, but each replay has to be played back in StarCraft.
|[A1]||LordMartin Replay Browser||-||File Parser||Source code not available|
|[A3]||RepASsiMilator||C++ (PHP)||File Parser||[A2]||PHP extension|
|[A5]||bwrepanalysis||Java, C++||File Parser + BWAPI events||[A4]||BWAPI events recorded: economy 25 frames, tech 1 frame, vision 12 frames, orders 1 frame, unit location 100 frames or new order|
|[A6]||bwrepdump||C++||BWAPI events||[A5]||BWAPI events recorded: economy 25 frames, tech 1 frame, vision 12 frames, orders 1 frame, unit location 100 frames or new order, 1 frame during attacks|
|[A7]||ScExtractor||Java||File Parser + BWAPI events||[A4]||BWAPI events recorded: every 24 frames OR on user action frames, 7 frames during attacks|
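The recording periods in the bwrepanalysis [A5] and bwrepdump [A6] rows can be read as a simple per-frame schedule. The following is a minimal sketch of that idea, not code taken from either tool: `channels_due`, `location_due` and `frames_to_seconds` are hypothetical helpers, reading "100 frames or new order" as "100 frames since that unit's last location record" is an interpretation, and the 42 ms per frame figure assumes the replay runs at Fastest game speed.

```python
from __future__ import annotations

# Periods (in game frames) listed for bwrepanalysis [A5]; bwrepdump [A6]
# additionally records unit locations every frame during attacks.
PERIODS = {"economy": 25, "tech": 1, "vision": 12, "orders": 1}
LOCATION_PERIOD = 100

# Assumption: replays are played back at "Fastest" speed, about 42 ms per frame.
MS_PER_FRAME = 42


def frames_to_seconds(frames: int) -> float:
    """Convert a frame count to game seconds at Fastest speed (assumed 42 ms/frame)."""
    return frames * MS_PER_FRAME / 1000


def channels_due(frame: int) -> list[str]:
    """Names of the periodic channels to record on this frame, sorted alphabetically."""
    return sorted(name for name, period in PERIODS.items() if frame % period == 0)


def location_due(frame: int, last_recorded: int | None, new_order: bool,
                 attacking: bool = False, record_attacks: bool = False) -> bool:
    """Decide whether to record one unit's location on this frame.

    Interpretation (assumption): record if the unit was never recorded, got a new
    order, is attacking (bwrepdump-style, only when record_attacks is set), or at
    least LOCATION_PERIOD frames have passed since its last location record.
    """
    if last_recorded is None or new_order:
        return True
    if record_attacks and attacking:
        return True
    return frame - last_recorded >= LOCATION_PERIOD
```

For example, a 7-minute window like the one used for [D5] spans 420 s / 0.042 s = 10,000 frames at this speed, so a 25-frame economy period yields 400 economy samples per game.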
After parsing the replays you will have a clean dataset ready for machine learning. Each dataset below was created with a specific data-mining task in mind, so it may not capture the information you want, or at the granularity you need. In that case, feel free to create your own dataset with one of the replay analyzers above.
|ID||Link||Size||Sources||Description|
|[D1]||Download||1.7 MB||[R1]||ARFF files to use with Weka. A script was used to label each player's actions with a build order (early-game strategy), i.e. supervised learning.|
|[D2]||Download||870 MB||[R2][A5]||For each replay it provides 3 plain text files: RGD (Replay Game Data), RLD (Replay Location Data) and ROD (Replay Order Data).|
|[D3]||Download||2.1 GB||[R3][A6]||For each replay it provides 3 plain text files: RGD (Replay Game Data), RLD (Replay Location Data) and ROD (Replay Order Data). Warning: not up to date with the latest version of A6.|
|[D4]||Download||19.6 GB||[R3][A7]||Provides SQL files to populate a database. State changes (all unit attributes) are recorded every 24 frames.|
|[D5]||Download||63.3 MB||[R5]||Contains the opening build choices of the Protoss player in the first 7 minutes of the game, as well as information about what the Terran player could see during this time. The objective is to predict the Protoss player's choices using the information available to the Terran player.|
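[D1] is distributed as ARFF files, which Weka loads directly. If you prefer Python, `scipy.io.arff.loadarff` can read them; the stdlib-only sketch below instead shows the structure of a dense ARFF file. `load_arff` is a hypothetical helper, not part of [D1], and it assumes no sparse rows and no quoted names or values containing spaces or commas.

```python
from __future__ import annotations


def load_arff(text: str) -> tuple[list[tuple[str, str]], list[dict[str, object]]]:
    """Minimal dense-ARFF reader returning (attributes, rows).

    NUMERIC/REAL/INTEGER values become floats, '?' (missing) becomes None,
    and every other value (e.g. a nominal class label) stays a string.
    """
    attributes: list[tuple[str, str]] = []
    rows: list[dict[str, object]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                # "@attribute <name> <type>"; nominal types look like {a,b,c}
                _, name, kind = line.split(None, 2)
                attributes.append((name, kind.strip()))
            elif lower.startswith("@data"):
                in_data = True
            continue  # @relation and other header lines are ignored
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(values)}: {line}")
        row: dict[str, object] = {}
        for (name, kind), value in zip(attributes, values):
            if value == "?":
                row[name] = None
            elif kind.lower() in ("numeric", "real", "integer"):
                row[name] = float(value)
            else:
                row[name] = value
        rows.append(row)
    return attributes, rows
```

Each returned row is a dict keyed by attribute name, which is easy to turn into feature vectors for any learner; for anything beyond this simple dense case, prefer Weka or SciPy's reader.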
Some research publications that used a dataset (or a replay analyzer) from the previous sections:
- [A1] (2008) Building a Player Strategy Model by Analyzing Replays of Real-Time Strategy Games. Ji-Lung Hsieh and Chuen-Tsai Sun
- [D1] (2009) A Data Mining Approach to Strategy Prediction. Ben Weber and Michael Mateas
- [A2] (2010) Cooperative Learning by Replay Files in Real-Time Strategy Game. Jaekwang Kim, Kwang Ho Yoon, Taebok Yoon and Jee-Hyong Lee
- [W2,A4] (2011) A Corpus Analysis of Strategy Video Game Play in Starcraft: Brood War. Joshua M. Lewis, Patrick Trinh and David Kirsh
- [D3] (2012) A Dataset for StarCraft AI & an Example of Armies Clustering. Gabriel Synnaeve and Pierre Bessière
- [D5] (2012) Inferring Strategies from Limited Reconnaissance in Real-time Strategy Games. Jesse Hostetler, Ethan W. Dereszynski, Thomas G. Dietterich and Alan Fern
- [D4] (2014) An Improved Dataset and Extraction Process for Starcraft AI. Glen Robertson and Ian Watson