April 07, 2006

The Dreaded Level 60 Scraper Cap

When Eric and Nic first implemented the WoW census scraper early last year, one potential problem they foresaw was the cap on /who results: the command returns at most 49 characters per query. Because the scraper polls each race, class, and level combination separately, any combination with more than 49 characters online at once (e.g., level 60 Night Elf Rogues during peak hours) cannot be scraped in full, and this gap was bound to grow as more characters reached those combinations.

As the WoW servers have matured, this issue has become a growing threat to data quality and analytic validity, and many comments here on the blog have raised it. To gauge how serious the problem is, we set out to estimate the percentage of characters that the scraper misses because of this cap.

To do this, we took one Saturday per month, from July 2005 to January 2006, from our logs of a high-population server. Because it was a high-population server, we treated it as our worst-case scenario. We then parsed the number of level 60s logged for each race/class combination in each snapshot: 20 combinations for the Horde and 20 for the Alliance.
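As a rough illustration of this tallying step, here is a minimal Python sketch. It assumes each snapshot has already been reduced to (race, class, level) tuples; the function names and the "overloaded" rule (a pair whose level-60 count reaches the 49-result cap) are assumptions made for illustration, not the scraper's actual code. The tables list the 20 race/class combinations per faction available in the game at the time.

```python
from collections import Counter

WHO_CAP = 49  # /who returns at most 49 entries per query

# Race/class combinations available in the original game, per faction.
HORDE = {
    "Orc": ["Warrior", "Hunter", "Rogue", "Shaman", "Warlock"],
    "Undead": ["Warrior", "Rogue", "Priest", "Mage", "Warlock"],
    "Tauren": ["Warrior", "Hunter", "Shaman", "Druid"],
    "Troll": ["Warrior", "Hunter", "Rogue", "Priest", "Shaman", "Mage"],
}
ALLIANCE = {
    "Human": ["Warrior", "Paladin", "Rogue", "Priest", "Mage", "Warlock"],
    "Dwarf": ["Warrior", "Paladin", "Hunter", "Rogue", "Priest"],
    "Night Elf": ["Warrior", "Hunter", "Rogue", "Priest", "Druid"],
    "Gnome": ["Warrior", "Rogue", "Mage", "Warlock"],
}


def combos(faction):
    """Flatten a faction table into a list of (race, class) pairs."""
    return [(race, cls) for race, classes in faction.items() for cls in classes]


def level60_counts(records, faction):
    """Count level-60 characters per race/class pair in one snapshot.

    records: iterable of (race, class, level) tuples (assumed input format).
    Pairs with no level 60s are reported as 0.
    """
    seen = Counter((race, cls) for race, cls, level in records if level == 60)
    return {pair: seen[pair] for pair in combos(faction)}


def overloaded(counts, cap=WHO_CAP):
    """Pairs whose count reached the cap, so some players may be missing."""
    return sorted(pair for pair, n in counts.items() if n >= cap)


# Hypothetical snapshot: a full reply of 49 level-60 Night Elf Rogues,
# a dozen level-60 Gnome Mages, and one level-59 Human Priest.
snapshot = (
    [("Night Elf", "Rogue", 60)] * 49
    + [("Gnome", "Mage", 60)] * 12
    + [("Human", "Priest", 59)]
)
counts = level60_counts(snapshot, ALLIANCE)
print(len(counts))         # 20
print(overloaded(counts))  # [('Night Elf', 'Rogue')]
```

Note that a reply of exactly 49 cannot be told apart from a truncated one, which is why the sketch flags such pairs as possibly incomplete rather than definitely overloaded.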

One problem was that Blizzard began having authentication problems after November 2005, which interfered with our census scrapers logging on during peak hours. To avoid analyzing data from that period, we used the November 2005 data, from before these problems began, for this analysis.

Next, we scrolled through the parsed logs for that November Saturday to find the snapshot with the most overloaded combinations (the worst case) and counted the overloads in it. To estimate how many characters that snapshot was missing, we took the observed count for a race/class combination that was not overloaded and, using the overall ratios between race/class combinations from the WarcraftRealms census, inferred how many characters the overloaded combinations should have contained. We did this separately for the Alliance and the Horde and then calculated the percentage of level 60s missing due to the overloads.
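The ratio-based estimate can be sketched as follows. This is a hypothetical Python illustration, not the actual analysis: `shares` stands in for relative race/class population figures such as those a census site publishes, and every number in the example is made up. The expected size of an overloaded combination is the anchor's observed count scaled by the ratio of the two combinations' shares; anything above what was actually seen counts as missing.

```python
WHO_CAP = 49  # /who returns at most 49 entries per query


def estimate_missing(observed, shares, anchor, cap=WHO_CAP):
    """Estimate characters missed in overloaded race/class pairs.

    observed: pair -> level-60 count seen in the snapshot
    shares:   pair -> relative population weight from an outside census
              (hypothetical input; only the ratios between pairs matter)
    anchor:   a pair that was NOT overloaded, used to scale the weights
    Returns (missing_by_pair, percent_missing_overall).
    """
    if observed[anchor] >= cap:
        raise ValueError("the anchor pair must not be overloaded")
    missing = {}
    for pair, seen in observed.items():
        if seen >= cap:
            # Expected size = anchor count * (pair weight / anchor weight).
            expected = observed[anchor] * shares[pair] / shares[anchor]
            missing[pair] = max(0.0, expected - seen)
    total_missing = sum(missing.values())
    total_expected = sum(observed.values()) + total_missing
    return missing, 100.0 * total_missing / total_expected


# Made-up numbers: the Rogue pair hit the cap, the other two did not.
observed = {("Night Elf", "Rogue"): 49, ("Gnome", "Mage"): 20, ("Dwarf", "Priest"): 30}
shares = {("Night Elf", "Rogue"): 1500, ("Gnome", "Mage"): 500, ("Dwarf", "Priest"): 800}
missing, pct = estimate_missing(observed, shares, anchor=("Gnome", "Mage"))
print(missing)  # {('Night Elf', 'Rogue'): 11.0}
print(pct)      # 10.0
```

In this toy example the Rogue pair should hold 20 × 1500 / 500 = 60 characters, so 11 are missing out of an estimated 110 level 60s in total, or 10%.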

For the Horde, at the worst peak time in November, we missed 3% of level 60s. For the Alliance, at the worst peak time in November, we missed 13% of level 60s.

So even in the worst-case scenario, a high-population server at peak time in November 2005, we were missing at most 13% of level 60s. For most of the day, no race/class combination was overloaded.

Posted by nickyee at 12:26 PM

May 17, 2005

Collecting Data from World of Warcraft

In constructing World of Warcraft, Blizzard made the interesting design decision to implement the client-side UI in a way that is open to extension and modification by the user community via an API. Combine this with a way to query the population in-game (the /who command), and it becomes possible for us to issue a decree to take a census of the known world.

Others (WoW Census, for example) have done this before, but for one reason or another, the data was not quite in the format we wanted, so we rolled our own. In essence, we have a way to collect a census snapshot of one faction of one server in about 5 to 15 minutes, depending on server load.

Fine print: Essentially, we loop through all race/class pairs (e.g., "Dwarf Paladin"), emit the appropriate /who command, and wait for the server to respond, at which point we stash away an entry in the SavedVariables.lua file of the form,

Thunderserver,2005/03/24,Crandall,56,Ni,id,y,Felwood,Ant Killers

for a level 56 night elf druid on the server Thunderserver. He's currently in Felwood, grouped ("y"), and is part of the Ant Killers guild. The server is only willing to return 49 entries to us at a time, so if there are more than that, we restrict the levels (e.g. "Dwarf Paladin 1-25") until we're sure that we're seeing everybody. (Caveat: We currently have no way to catch all players if there are more than 49 online gamers with the same race, class, and level. So far, we've only seen that with level 60 dwarf paladins at peak times. And we're ignoring this for the moment.)
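One way to implement the level-restriction step is to halve the level range whenever a query comes back full. The post does not say exactly how the ranges are chosen, so the bisection below is an assumption, and `who` is a hypothetical stand-in for issuing a query such as "Dwarf Paladin 1-25" and reading the reply. A single level that still returns 49 entries cannot be narrowed any further, which is exactly the caveat above.

```python
WHO_CAP = 49      # the server returns at most 49 entries per query
MAX_LEVEL = 60


def scrape(who, race, cls, lo=1, hi=MAX_LEVEL, cap=WHO_CAP):
    """Collect every visible character of one race/class by narrowing levels.

    who(race, cls, lo, hi) is a hypothetical stand-in for a query such as
    "Dwarf Paladin 1-25"; it returns a list of at most `cap` names.
    Returns (names, truncated_levels), where truncated_levels lists single
    levels that still returned a full reply and may be incomplete.
    """
    reply = who(race, cls, lo, hi)
    if len(reply) < cap:
        return list(reply), []          # a short reply is complete
    if lo == hi:
        return list(reply), [lo]        # cannot narrow a single level further
    mid = (lo + hi) // 2                # assumption: halve the level range
    left, left_trunc = scrape(who, race, cls, lo, mid, cap)
    right, right_trunc = scrape(who, race, cls, mid + 1, hi, cap)
    return left + right, left_trunc + right_trunc


def fake_server(population, cap=WHO_CAP):
    """Simulated /who for testing: population maps level -> number online."""
    def who(race, cls, lo, hi):
        names = [f"{race}{cls}{level}_{i}"
                 for level in range(lo, hi + 1)
                 for i in range(population.get(level, 0))]
        return names[:cap]
    return who


# 70 level-60 Dwarf Paladins online: more than a single query can return.
who = fake_server({10: 30, 40: 30, 60: 70})
names, truncated = scrape(who, "Dwarf", "Paladin")
print(len(names), truncated)  # 109 [60]
```

Bisection keeps the number of queries small when most levels are sparse, which matters when each query means waiting on a busy server.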

We are currently collecting snapshots from both factions of three different worlds: the RP realm that we commonly play in, a normal server which was listed as having a moderate load, and a normal server which was considered to be heavily loaded.

After exiting the game, we use some Lua hackery to scavenge the data in SavedVariables.lua and leave it in a more permanent form on disk.
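The actual scavenging is done in Lua; the Python sketch below only illustrates what turning stored entries into a more permanent on-disk form can look like. It assumes each entry has already been pulled out of SavedVariables.lua as a comma-separated string in the format shown earlier, and the CSV column names are an illustrative choice. Race and class are kept as the abbreviated codes stored by the scraper (e.g., "Ni" and "id").

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

COLUMNS = ["server", "date", "name", "level", "race", "class",
           "grouped", "zone", "guild"]


@dataclass
class CensusEntry:
    server: str
    day: date
    name: str
    level: int
    race: str       # abbreviated code as stored, e.g. "Ni"
    cls: str        # abbreviated code as stored, e.g. "id"
    grouped: bool   # "y" in the raw entry means the character is grouped
    zone: str
    guild: str      # assumed to be "" for characters without a guild


def parse_entry(line):
    """Parse one raw entry such as 'Thunderserver,2005/03/24,Crandall,...'."""
    server, day, name, level, race, cls, grouped, zone, guild = next(csv.reader([line]))
    year, month, dom = (int(part) for part in day.split("/"))
    return CensusEntry(server, date(year, month, dom), name, int(level),
                       race, cls, grouped == "y", zone, guild)


def write_csv(entries, out):
    """Write parsed entries as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for e in entries:
        writer.writerow([e.server, e.day.isoformat(), e.name, e.level,
                         e.race, e.cls, int(e.grouped), e.zone, e.guild])


entry = parse_entry("Thunderserver,2005/03/24,Crandall,56,Ni,id,y,Felwood,Ant Killers")
buf = io.StringIO()
write_csv([entry], buf)
print(buf.getvalue().splitlines()[1])
# Thunderserver,2005-03-24,Crandall,56,Ni,id,1,Felwood,Ant Killers
```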

We've been harvesting data on and off since late March. (We have a manual upgrade process after each patch.) To date, we have taken about 6000 snapshots, or roughly a thousand snapshots per server per faction. We then analyze the collected data with our own Java application and Excel, but that's a story for another post.

Posted by at 04:17 PM