Tempted by location apps? - PARC blog
posted 5 November 2009 | Richard Chow
Google has just announced a free GPS navigation service with the latest version of Android. It’s a classic Google bargain: they provide the desired content — in this case, phone-based maps and turn-by-turn directions — and the user will (eventually) see ads.
The lingering question, of course, is what happens to your data.
Should you be concerned?
Why should you be concerned about your “location trace”? The EFF has an overview, “On Locational Privacy, and How to Avoid Losing it Forever.” You can deduce a lot about people from their location traces: where they sleep, work, and play; what stores and restaurants they like; who they spend time with; and more.
Privacy should be the biggest concern for users of location-based social networking apps like Foursquare, Google Latitude, Loopt, and others. For example, will these companies store and analyze your location traces to figure out what ads to show you as part of their business model? [Maybe not; Google Latitude claims to overwrite historical log data whenever new data comes in.]
Location traces are only the beginning of the story. People create many types of contextual data, such as phone logs, web history, e-mail, search engine queries, and more. Let’s consider a different model — one which radically shifts the balance of power from the corporations to the consumer.
Faking contextual data
The idea: enhance privacy by taking advantage of the insecurity of contextual data.
For example:
- You can send the service provider a bunch of fake location traces to obscure the genuine one. The service provider has no way to check the authenticity of the traces, as long as the traces themselves are convincing. [This may not be strictly true if the service provider is a cellular carrier, as traces may be belied through E911 technology, but often the carrier is not part of the application data flow].
- You can do something similar with search engine queries. The insecurity of search engines (their inability to determine valid queries) can, paradoxically, be leveraged for privacy. One example is TrackMeNot, a browser plug-in that obscures actual search queries by automatically generating a multitude of other queries.
I don’t necessarily recommend these approaches as described; it may not be practical to multiply operating expenditures by an order of magnitude or more to support privacy. But this approach might be appropriate at certain times or in specialized domains like the military.
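To make the cost concrete, here is a minimal sketch of the decoy idea in Python. It is only an illustration under stated assumptions, not a description of any deployed system: the function name `mix_with_decoys` is hypothetical, and the decoy traces are assumed to come from some separate generator. Sending k decoys alongside one genuine trace means the provider handles k + 1 traces, which is where the multiplied operating cost comes from.

```python
import random


def mix_with_decoys(real_trace, decoy_traces, rng=None):
    """Shuffle one genuine trace in among decoys.

    Returns (batch, real_index). The batch is what would be sent to the
    service provider; real_index stays on the client so it can pick out
    the response that corresponds to the genuine trace.
    (Hypothetical helper, for illustration only.)
    """
    rng = rng or random.Random()
    candidates = [real_trace] + list(decoy_traces)
    order = list(range(len(candidates)))
    rng.shuffle(order)
    batch = [candidates[i] for i in order]
    real_index = order.index(0)  # position where the genuine trace landed
    return batch, real_index
```

With 9 decoys the provider sees 10 traces and, if the decoys are convincing, cannot tell which one is genuine. The privacy gained is only as good as the decoys themselves.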
The ability to generate convincing fake contextual traces might also be useful in social applications. Suppose you want to conceal your Vegas trip, yet don’t want to go off the location-app grid, which might arouse suspicion. You would just need to generate one convincing fake trace and substitute that for your actual trace.
We predict fraudsters will be first adopters of technology to fake contextual data, and that this will drive better techniques for detecting fake contextual data (which, in turn, improve the ability to create fake contextual data: a classic arms race). Click fraudsters have already been engaging in a primitive form of faking search engine queries. Consider the version of click fraud where publishers target ads to certain locations, and fraudsters need to seem to “appear” in those locations to get the ads — their trace to the location must appear realistic to foil fraud detection algorithms.
Fake it till you secure it
At PARC, we’ve been experimenting with creating convincing fake location traces. It’s not a trivial task. For example, you can’t just splice in something from a database of past traces if you assume that the parties trying to detect the fakes have access to the same data and are at least as knowledgeable as the parties attempting the forgery.
We have developed an algorithm that fakes a driving trace from a Google Maps route, by:
- extracting the polyline (this is the vector of latitude-longitude pairs that constitute the route);
- filling in more points; and
- adding simulated stops and noise.
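The "filling in more points" and "simulated stops" steps above can be sketched roughly as follows. This is not the algorithm from the paper: it assumes the polyline has already been extracted as a list of (latitude, longitude) pairs, it assumes a single constant speed (a real trace would vary), and the names `densify` and `add_stop` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def densify(polyline, speed_mps, interval_s):
    """Resample a sparse route so points are speed * interval apart along it.

    Uses linear interpolation within each segment, which is adequate for
    the short segments of a road polyline.
    """
    step = speed_mps * interval_s
    out = [polyline[0]]
    carry = 0.0  # distance travelled since the last emitted point
    for p, q in zip(polyline, polyline[1:]):
        seg = haversine_m(p, q)
        if seg == 0:
            continue
        d = step - carry  # distance into this segment of the next point
        while d <= seg:
            f = d / seg
            out.append((p[0] + f * (q[0] - p[0]), p[1] + f * (q[1] - p[1])))
            d += step
        carry = seg - (d - step)
    return out


def add_stop(points, index, n_samples):
    """Simulate a stop by repeating the reading at `index` n_samples more times."""
    return points[:index + 1] + [points[index]] * n_samples + points[index + 1:]
```

For example, `densify(route, speed_mps=10.0, interval_s=5.0)` yields one point every 50 meters, matching readings taken 5 seconds apart at a steady 10 m/s. A perfectly constant spacing like this is itself a giveaway, which is why noise and varying speed matter.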
One tricky challenge is simulating the noise in the GPS signal. Errors in an actual GPS signal seem to drift: for example, see this plot of an actual trace. When the signal is off a bit, it tends to stay off in the same direction for a little while.
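One simple way to get errors that "stay off in the same direction for a little while" is a first-order autoregressive process, in which each error is mostly the previous error plus a small random change. The sketch below is an assumption made for illustration, not the noise model from our paper; the parameters `sigma_m` (typical error size in meters) and `rho` (how strongly each error persists) are hypothetical, and the meters-to-degrees conversion is a rough approximation.

```python
import math
import random

METERS_PER_DEG_LAT = 111320.0  # rough approximation


def drifting_noise(n, sigma_m, rho, rng=None):
    """Generate n correlated errors (in meters) from an AR(1) process.

    rho close to 1 gives slowly drifting errors; rho = 0 gives independent
    (white) noise. The innovation is scaled so that the long-run standard
    deviation stays at sigma_m.
    """
    rng = rng or random.Random()
    innovation_sd = sigma_m * math.sqrt(1.0 - rho ** 2)
    error = rng.gauss(0.0, sigma_m)
    out = []
    for _ in range(n):
        out.append(error)
        error = rho * error + rng.gauss(0.0, innovation_sd)
    return out


def offset_point(lat, lon, north_m, east_m):
    """Shift a (lat, lon) point in degrees by small offsets given in meters."""
    new_lat = lat + north_m / METERS_PER_DEG_LAT
    new_lon = lon + east_m / (METERS_PER_DEG_LAT * math.cos(math.radians(lat)))
    return new_lat, new_lon
```

To perturb a trace, one would draw two independent series (north and east) and apply `offset_point` to each reading. Whether a real receiver's error follows anything like this process is exactly the kind of question a fake detector would probe.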
By the way, faking contextual data only works if you convincingly fake all the data in concert (e.g., a GPS trace showing you at work would not match accelerometer data consistent with driving).
See our full paper for more details of what we did. Others have also developed methods for faking location traces of a car trip, such as John Krumm of Microsoft Research and Pravin Shankar et al. from Rutgers.
PARC is also exploring the idea that contextual data implicitly authenticates you. It’s a privacy problem because you have to worry not only about what the data tells others, but also that the data itself fingerprints you.
Tell us: which one is fake?
We don’t know how to make guarantees about how convincing a fake is. Perhaps the best option is peer review (as with cipher evaluation). If you’d like to try distinguishing our fake traces from real ones, download this zip file. The file contains 6 traces of the same route with 2 fakes; the readings in each trace are 5 seconds apart. Let us know which two you think are fake, and why.
tags: mobile devices & interfaces, mobile security
posted in: security & privacy, social & enterprise computing, ubiquitous computing

Comments
November 6th, 2009 at 9:59pm Posted by Senthil
Nice one, Richard! Interesting read.
November 9th, 2009 at 9:36am Posted by John Krumm
Nice post. Thanks for pointing to my work. I like how you’ve thought beyond just the technical aspects of this solution for privacy.
I also like your answer to why we can’t just use previously gathered GPS traces as the fake traces. I’m sometimes asked this question (e.g. by reviewers), and I never had a very good answer until reading your post. You’re right that attackers could have access to the same historical traces as anyone else, rendering the traces useless for this purpose. Good thinking!
November 12th, 2009 at 7:39am Posted by Pravin Shankar
Very interesting post, thanks for pointing to my work.
I agree that evaluation of “fake locations” or SybilQueries is an important problem. This would have to be done as a combination of user study/peer review, as well as machine learning/clustering techniques, and is a great avenue for future research.
I took a look at your traces, and my guesses for the fake traces are 08042008.kml and 08062008.kml. The rather naive reasoning is that the CDF of the distance between consecutive points in these two traces looks different from that of the other 4 (see this figure).
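[Editor's sketch: the comparison described in this comment could be reproduced along the following lines. This is not the commenter's actual code. It assumes each trace has already been parsed from its KML file into a list of (latitude, longitude) pairs, and the function names are hypothetical. The two-sample Kolmogorov-Smirnov distance used to quantify "looks different" is a choice made here, not one stated in the comment.]

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def step_distances(trace):
    """Distances between consecutive readings of one trace."""
    return [haversine_m(p, q) for p, q in zip(trace, trace[1:])]


def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) pairs, ready for plotting."""
    xs = sorted(values)
    return [(x, (i + 1) / len(xs)) for i, x in enumerate(xs)]


def ks_distance(a, b):
    """Largest vertical gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap
```

Computing `ks_distance` between the step distances of every pair of traces gives a small matrix; traces whose distances to most of the others are consistently large are the natural suspects.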
November 12th, 2009 at 10:33am Posted by Richard Chow
Thanks for the comments, everybody!
Pravin, your guess is right. You’ve pointed out that some adjustments are needed in the distributions used by the coordinate-generation algorithm.
What’s tricky is that ideally we don’t want the algorithm to depend on data from actual trips along the route. The advantage of the algorithm is not needing this data (otherwise, we might have used John Krumm’s algorithm). One thought is to divide up roads into some categories like “freeway” and “residential”, and have distributions for each category. Of course, there might need to be a time-component to the category also, e.g. “freeway-during-rushhour”…