21 July 2006 | PlayOn authors archive
For the last week, I have been trying to change the PlayOn scraperbot software to avoid the problem of not seeing all the level 60 characters that may be logged in at any one time.
And the short answer is: Every attempt I’ve made to improve the collection has only made things worse.
After bashing my head against my keyboard [1], we have decided to recruit the collective wisdom of the web. And to offer a nearly-worthless prize.
Herewith, we offer some as-yet-unknown PARC-labeled thingy (e.g., a coffee mug, a T-shirt, or some such) to anyone who can help us characterize the nature of the results returned by the “/who” command in World of Warcraft.
Here’s what we know so far…
The “/who” command and the “Refresh” button in the Who pane of the Social window both use the SendWho API call. Use whichever you like for testing; they all behave the same.
All of these take a filter, which can be used to select for certain names, zones, classes, races, or levels.
So
/who z-"q"
will try to find players with a “q” (in either upper or lower case) as a substring of their zone name, while
/who c-"d"
will return both druids and paladins.
The existing scrapers query every race+class combination for one of the factions, for example
/who r-"Troll" c-"Rogue" 1-60
The server will return at most 50 results. Like others, we have assumed that if the server returns 50 results, there may be more than 50 matches, and we need to refine our query. Conversely, we have assumed that if the server returns fewer than 50 results, it has told us about every logged-in character that satisfies the filter. Apparently, not so.
So, if the server returned 50 results for the above, we would split the query in the obvious way to
/who r-"Troll" c-"Rogue" 1-30
/who r-"Troll" c-"Rogue" 31-60
and then split each of those in turn until we are searching for a single level of a certain race and class.
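The splitting strategy just described can be sketched as a recursive bisection of the level range. This is a minimal sketch, not the PlayOn scraper code: `scrape_range`, `make_fake_server`, and the player list are hypothetical, and the fake server simply filters by level and truncates to 50, i.e. it behaves exactly the way we had assumed the real one did.

```python
from typing import Callable, List

CAP = 50  # the server returns at most 50 results per /who query

# Hypothetical query signature: (min_level, max_level) -> character names.
QueryFn = Callable[[int, int], List[str]]


def scrape_range(query: QueryFn, lo: int, hi: int) -> List[str]:
    """Bisect [lo, hi] whenever a query comes back saturated at CAP.

    Fewer than CAP results is treated as complete (the assumption questioned
    in this post); a saturated single-level query is kept as-is because it
    cannot be split further by level.
    """
    results = query(lo, hi)
    if len(results) < CAP or lo == hi:
        return results
    mid = (lo + hi) // 2  # 1-60 splits into 1-30 and 31-60
    return scrape_range(query, lo, mid) + scrape_range(query, mid + 1, hi)


def make_fake_server(levels: List[int]) -> QueryFn:
    """Idealized stand-in server: filter by level, then truncate to CAP."""
    def query(lo: int, hi: int) -> List[str]:
        matches = [f"rogue{i}" for i, lvl in enumerate(levels) if lo <= lvl <= hi]
        return matches[:CAP]
    return query


levels = [60] * 30 + [45] * 40 + [10] * 20  # 90 hypothetical Troll Rogues
found = scrape_range(make_fake_server(levels), 1, 60)
print(len(found))  # 90
```

Against this idealized server the bisection recovers all 90 characters; the rest of the post is about why the real server does not behave this way.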
From early on, however, if the server was busy, and we queried
/who r-"Dwarf" c-"Paladin" 60-60
we were likely to get 50 results. So, we simply recorded those 50, and subdivided no further.
As the servers we’re monitoring have aged, however, we find that more and more race-class combinations are saturating the query results at peak times. This is frustrating, as it calls into question the validity of our data, especially for the level 60s, who have reached or are transitioning into the endgame, an important part of the WoW landscape.
So, as I mentioned, about a week ago I began restructuring the scraping code so that, rather than splitting by race and class first, it would query first by zone, then by level, and then by race and class. Let’s set aside the peril that Blizzard can create new zones we would have to discover, and the fact that the filter will not let me query for people only in the zone “Ahn’Qiraj” without also telling me who is in “Gates of Ahn’Qiraj” and “Ruins of Ahn’Qiraj” as well.
In fact, I decided that in order to get the scrape times back down to where they used to be, I would play some shenanigans to do queries like
/who z-"org" 1-60
to check for toons in both “Orgrimmar” and “Searing Gorge” simultaneously, and be able to break it into zone-specific queries only if there was overflow.
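A minimal sketch of the zone-grouping trick, assuming the z- filter is a case-insensitive substring match (which is what makes z-"org" hit both zones). The helper names and the fake server are hypothetical, and the fake server is the idealized one we had assumed: it only drops results once a query saturates at 50.

```python
from typing import Callable, Dict, List, Tuple

CAP = 50

Row = Tuple[str, str]               # (character name, zone)
ZoneQuery = Callable[[str], List[Row]]


def zone_matches(filter_text: str, zone: str) -> bool:
    # Assumed semantics: case-insensitive substring match on the zone name.
    return filter_text.lower() in zone.lower()


def scrape_zone_group(query: ZoneQuery, shared: str, zones: List[str]) -> Dict[str, List[str]]:
    """Query a shared substring once; re-query each full zone name on overflow."""
    rows = query(shared)
    if len(rows) >= CAP:
        # Saturated: fall back to one query per zone, keeping exact zone hits only.
        rows = [row for zone in zones for row in query(zone) if row[1] == zone]
    by_zone: Dict[str, List[str]] = {zone: [] for zone in zones}
    for name, zone in rows:
        if zone in by_zone:
            by_zone[zone].append(name)
    return by_zone


def make_fake_server(players: List[Row]) -> ZoneQuery:
    """Idealized stand-in: substring-filter on zone, then truncate to CAP."""
    def query(text: str) -> List[Row]:
        return [p for p in players if zone_matches(text, p[1])][:CAP]
    return query


zones = ["Orgrimmar", "Searing Gorge"]
players = [(f"troll{i}", "Orgrimmar") for i in range(40)]
players += [(f"orc{i}", "Searing Gorge") for i in range(30)]
result = scrape_zone_group(make_fake_server(players), "org", zones)
print({zone: len(names) for zone, names in result.items()})
# {'Orgrimmar': 40, 'Searing Gorge': 30}
```

A per-zone query can itself saturate; the sketch leaves out the further split by level that would follow.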
When I did this, a surprising thing happened: we started seeing about 50% fewer inhabitants in the world than with the old race-class scrapers. Hmm. Bug in the code? So I thought, until I ran some experiments in-game to flush it out.
Within a few seconds, I ran the following queries mid-afternoon on a medium-population server, Horde side:
/who z-"or" 60-60
/who z-"org" 60-60
/who z-"or" 60-60
Guess what: The first and third returned 24 results, none of whom were in Orgrimmar, while the second returned 50 results from Orgrimmar and Searing Gorge.
After further checking, we discovered that even with full zone names we see 5-10% fewer characters than the old race-class scrapers do, whether we split by level->race->class after the zone or by race->class->level.
We’ve tested and discarded a number of explanations. One that remains is that WoW applies the z-, c-, r-, and level filters in stages internally, with a cutoff on how many results each internal stage can hold.
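To make the staged-filter hypothesis concrete, here is a toy model in which the zone filter runs first, its output is truncated to an internal cap, and only then is the level filter applied. The `stage_cap` value and the population are invented to mimic the “or”/“org” pattern; they are not measurements, and nothing here reflects known details of Blizzard's server.

```python
from typing import List, NamedTuple

CAP = 50


class Player(NamedTuple):
    name: str
    zone: str
    level: int


def staged_who(players: List[Player], zone_text: str, lo: int, hi: int,
               stage_cap: int) -> List[Player]:
    """Toy staged /who: zone filter, truncate to stage_cap, then level filter.

    stage_cap is an invented stand-in for the unknown internal cutoff.
    """
    stage1 = [p for p in players if zone_text.lower() in p.zone.lower()][:stage_cap]
    stage2 = [p for p in stage1 if lo <= p.level <= hi]
    return stage2[:CAP]  # the visible 50-result limit


# Fabricated population, in server order: lots of non-60s in a zone that
# contains "or" but not "org", then a crowd of level 60s in Orgrimmar.
population = (
    [Player(f"a{i}", "Stranglethorn Vale", 40) for i in range(76)]
    + [Player(f"b{i}", "Stranglethorn Vale", 60) for i in range(24)]
    + [Player(f"c{i}", "Orgrimmar", 60) for i in range(60)]
)
short = staged_who(population, "or", 60, 60, stage_cap=100)
longer = staged_who(population, "org", 60, 60, stage_cap=100)
print(len(short), len(longer))  # 24 50
```

In this model the “or” query under-reports while returning fewer than 50 results, which is exactly the situation that breaks the “fewer than 50 means complete” assumption.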
In any case, we’re wondering if one of you out in the webosphere can explain just what’s going on. If you’ve got an idea, jump into the game, and poke around until you’re pretty sure you’ve grokked what’s happening.
Let us know, and we’ll send you something virtually worthless.
First, on Unity’s data point: are you sure that wasn’t because you were running into the limit on the number of who queries per minute? If you query too often, the server starts to ignore you.
Now on to the larger problem: I spent some time fiddling with this tonight and reached the following conclusions:
1) The who list is generated by traversing a server-side list of players in a relatively stable order. I suspect it’s login order.
The evidence I have to back this up is as follows:
/who 1-60
/who 5-60
/who 1-60
(With appropriate pauses)
Note the changes between the results: the second list will contain all of the members of the first, less those below level 5, plus a corresponding number of new ones to fill up the list. Going back to 1-60 returns you to the first list.
Next, repeat /who 1-60 periodically for a while: the list occasionally changes, and if you /who a player who dropped off the list, they are no longer logged in.
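The login-order theory can be expressed as a toy model: walk a fixed list of players and stop after 50 matches. Under that assumption, the /who 1-60, /who 5-60, /who 1-60 sequence behaves as described. The player list and levels are fabricated for illustration, and login order itself is only a guess.

```python
from typing import Dict, List

CAP = 50


def who_by_level(login_order: List[str], levels: Dict[str, int], lo: int, hi: int) -> List[str]:
    """Walk players in a fixed (assumed login) order, stopping at CAP matches."""
    out: List[str] = []
    for name in login_order:
        if lo <= levels[name] <= hi:
            out.append(name)
            if len(out) == CAP:
                break
    return out


# Fabricated server state: 120 players with levels spread over 1-60.
login_order = [f"p{i}" for i in range(120)]
levels = {name: 1 + (i * 7) % 60 for i, name in enumerate(login_order)}

first = who_by_level(login_order, levels, 1, 60)
second = who_by_level(login_order, levels, 5, 60)
again = who_by_level(login_order, levels, 1, 60)

# Everyone level 5+ from the first list reappears in the second; the gaps
# left by the sub-5 players are filled by characters further down the order.
kept = [n for n in first if levels[n] >= 5]
print(set(kept) <= set(second), first == again)  # True True
```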
2) There’s an internal cutoff when doing a compound search involving zone, before applying the level filter.
To test this, try the following
/who z-e 60
/who z-e 59
/who z-e 59-60
Pick various ranges and values, and note how, for a range with fewer than 50 results, the individual counts add up to the count for the range that contains them.
It should in theory be possible to measure the internal cap by summing a full set of sub-ranges, but this is somewhat more difficult than it sounds due to the constant flow of players in and out.
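A small helper for that bookkeeping, assuming a plain filter-then-truncate server under which an unsaturated range should equal the sum of its sub-ranges. The function name is hypothetical and the counts in the example are made up; real numbers will drift because players log in and out between queries.

```python
from typing import List

CAP = 50


def apparent_shortfall(range_count: int, part_counts: List[int], cap: int = CAP) -> int:
    """Players a containing range seems to have dropped, versus its sub-ranges.

    Assumes a plain filter-then-truncate server and no logins/logouts between
    queries. Saturated ranges (== cap) are expected to drop players, so they
    are not flagged; a positive value for an unsaturated range hints at an
    internal cutoff.
    """
    if range_count >= cap:
        return 0
    return max(0, sum(part_counts) - range_count)


# Made-up counts for /who z-e 59, /who z-e 60, and /who z-e 59-60.
print(apparent_shortfall(32, [12, 20]))  # 0: consistent
print(apparent_shortfall(25, [12, 20]))  # 7: the wider query lost players
```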
Finally, an observation: if you do
/who z-orgrimmar z-searing 60
the zone filters combine with an ‘or’ effect.
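The OR behavior of repeated z- filters can be sketched as a predicate. This assumes each z- term is a case-insensitive substring match, as the earlier examples suggest; the function name and the zone list are illustrative, and the level filter is not modeled.

```python
from typing import List


def matches_zone_filters(zone: str, zone_filters: List[str]) -> bool:
    """Observed: several z- terms combine with OR (assumed substring matches)."""
    return any(f.lower() in zone.lower() for f in zone_filters)


filters = ["orgrimmar", "searing"]  # /who z-orgrimmar z-searing 60
for zone in ["Orgrimmar", "Searing Gorge", "Undercity"]:
    print(zone, matches_zone_filters(zone, filters))
```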
/who [string[ string2...]] [num[-num2]] [r-"race"] [c-"class"] [z-"zone"] [g-"guild"]
Would it be possible to just search for character names?
For instance do:
/who aa
/who ab
/who ac…
And then, if you hit the 50 cap, you could split it further. I realize it would take 676 queries to get everyone, but I’m not sure how fast you can query.
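For reference, the two-letter scan is every ordered pair of the 26 letters a to z, which is where 26 × 26 = 676 comes from. A quick sketch that generates the command strings (the function name is hypothetical); it covers only the unaccented letters.

```python
from itertools import product
from string import ascii_lowercase
from typing import List


def two_letter_queries() -> List[str]:
    """Every /who query from "aa" to "zz": 26 * 26 = 676 commands."""
    return [f"/who {a}{b}" for a, b in product(ascii_lowercase, repeat=2)]


queries = two_letter_queries()
print(len(queries), queries[0], queries[-1])  # 676 /who aa /who zz
```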
You could just try:
/who 60
Should that return more than 49 results, try:
/who 60 Orgrimmar
That should return all level 60s who have “Orgrimmar” anywhere in their guild name, character name, or zone name.
If there are still too many, you can do something like:
/who Orgrimmar A
That will return all characters with Orgrimmar and/or the letter A in two separate fields, so you slowly narrow the population down. You could also simply look for a population census addon that does this automatically over the course of half an hour. Cosmos used to have one. If Cosmos still exists, you can certainly find it on Curse-Gaming.com.
Jaguar: Yes, your proposal would work, but it would require us to generate about 1000 /who queries to ensure a complete census (more than 30 letters may be used in a name). Currently we can complete a census scan in around 100 queries, taking about 10 minutes, so we would take a substantial performance hit, which we are trying to avoid.
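The cost comparison, as back-of-the-envelope arithmetic. The 6 seconds per query is derived from the quoted figures (about 100 queries in about 10 minutes); 30 letters is used as a lower bound on the name alphabet, so the real two-letter scan would be somewhat larger. The function name is hypothetical.

```python
def scan_minutes(num_queries: int, seconds_per_query: float) -> float:
    """Estimated wall-clock minutes for a census scan."""
    return num_queries * seconds_per_query / 60


SECONDS_PER_QUERY = 10 * 60 / 100  # about 100 queries in about 10 minutes -> 6.0 s

print(scan_minutes(100, SECONDS_PER_QUERY))      # 10.0, today's scan
print(scan_minutes(30 ** 2, SECONDS_PER_QUERY))  # 90.0, two-letter scan over 30 letters
```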
Tom2: We are aware of the basic technique of splitting the population by narrowing the query. Our problem is that some of the queries we generate return *fewer* than 49 results even though *more* than 49 people currently logged in actually satisfy the query. Sorry if that wasn’t clear in the OP. What we are asking for is help pinning down exactly when WoW under-reports these results.
As to the census mods, we are quite aware of Census and CensusPlus, and believe that they suffer from essentially the same limitations as our scrapers. Because they are interested in building up a picture of a server over time, while we are interested in the behavior of individual characters, full coverage on each and every scan matters more to us. CensusPlus can afford to miss a character in one census and just pick them up in another.
No idea, but it _looks_ like their search has a conditional on string length that affects either the sort order or the WHERE clause. Searching with too open a wildcard in LIKE conditionals can be a big performance hit on the server, and they might not allow that.
So “or” isn’t long enough: you get a list populated with a huge bunch of people, the top of that list has 24 level 60s, and those are the results you get. When you search “org”, it satisfies the conditional, applies the level-range requirement in the search (or simply has less “filler” data at the top of the list), and can return the full 50 results.
A little more detail, to share with your coder friends:
The SQL on the server side is probably something like
SELECT *stuff* FROM logged_in_users AS User
WHERE
  (if ZONE is provided) User.Zone LIKE '%myZone%' AND
  (if LVL is provided) User.Level BETWEEN myLVL.Min AND myLVL.Max AND …
  …
ORDER BY … *whateva*
LIMIT 0, 49
If myZone is too short (e.g., “or”), they could drop that clause … but since that’s an entirely different query, capable of very different and possibly misleading results, they might drop all of those clauses and optionally let the application layer handle the conditionals. (The limit may or may not change.)
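The guess can be played out against an in-memory SQLite table. Everything here is hypothetical: the `logged_in_users` schema, the `login_seq` ordering column, the minimum zone-text length of 3, and the fallback that fetches an unfiltered page and filters it in the application layer. The limit of 49 follows the guessed SQL. It shows one way the observed behavior could arise, not how Blizzard's servers actually work.

```python
import sqlite3
from typing import List


def who_sql(conn: sqlite3.Connection, zone: str, lvl_min: int, lvl_max: int,
            limit: int = 49, min_zone_len: int = 3) -> List[str]:
    """Hypothetical /who backend following the guessed SQL."""
    if len(zone) >= min_zone_len:
        # Zone text is long enough: every condition goes into the SQL.
        rows = conn.execute(
            "SELECT name FROM logged_in_users"
            " WHERE zone LIKE ? AND level BETWEEN ? AND ?"
            " ORDER BY login_seq LIMIT ?",
            (f"%{zone}%", lvl_min, lvl_max, limit),
        ).fetchall()
        return [name for (name,) in rows]
    # Zone text too short: fetch an unfiltered page and filter it in the
    # application layer, so matches beyond the first page are silently lost.
    rows = conn.execute(
        "SELECT name, zone, level FROM logged_in_users ORDER BY login_seq LIMIT ?",
        (limit,),
    ).fetchall()
    return [name for (name, z, lvl) in rows
            if zone.lower() in z.lower() and lvl_min <= lvl <= lvl_max]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logged_in_users (name TEXT, zone TEXT, level INTEGER, login_seq INTEGER)")
players = (
    [(f"a{i}", "Stranglethorn Vale", 40) for i in range(25)]
    + [(f"b{i}", "Stranglethorn Vale", 60) for i in range(24)]
    + [(f"c{i}", "Orgrimmar", 60) for i in range(60)]
)
conn.executemany(
    "INSERT INTO logged_in_users VALUES (?, ?, ?, ?)",
    [(name, zone, level, seq) for seq, (name, zone, level) in enumerate(players)],
)
print(len(who_sql(conn, "or", 60, 60)), len(who_sql(conn, "org", 60, 60)))  # 24 49
```

SQLite's LIKE is case-insensitive for ASCII by default, which is why '%org%' also matches “Orgrimmar”.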
N for Name… 676 queries with overlap…
/who n-"aa"
July 23rd, 2006 at 2:26pm
Posted by Unity
I can’t be much help, but I do have a data point: searching for players can return zero results even when you look for a specific player name that you know is active. It’s not unusual for me to see an LFG post, shift-click on the name to check class and level, and get no response.