Google has just announced a free GPS navigation service with the latest version of Android. It’s a classic Google bargain: they provide the desired content — in this case, phone-based maps and turn-by-turn directions — and the user will (eventually) see ads.
The lingering question, of course, is what happens to your data.
Should you be concerned?
Why should you be concerned about your “location trace”? The EFF has an overview, On Locational Privacy, and How to Avoid Losing it Forever. You can deduce a lot about people from their location traces: where they sleep, work, and play; what stores and restaurants they like; who they spend time with; and more.
Privacy should be the biggest concern for users of location-based social networking apps like Foursquare, Google Latitude, Loopt, and others. For example, will these companies store and analyze your location traces to figure out what ads to show you as part of their business model? [Maybe not; Google Latitude claims to overwrite historical log data whenever new data comes in.]
Location traces are only the beginning of the story. People create many types of contextual data, such as phone logs, web history, e-mail, search engine queries, and more. Let’s consider a different model — one which radically shifts the balance of power from the corporations to the consumer.
Faking contextual data
The idea: enhance privacy by taking advantage of the insecurity of contextual data.
- You can send the service provider a bunch of fake location traces to obscure the genuine one. As long as the traces themselves are convincing, the service provider has no way to check their authenticity. [This may not be strictly true if the service provider is a cellular carrier, since the carrier could contradict a fake trace using E911 technology, but often the carrier is not part of the application data flow.]
- You can do something similar with search engine queries. The insecurity of search engines (their inability to tell which queries are genuine) can, paradoxically, be leveraged for privacy. One example is TrackMeNot, a browser plug-in that obscures actual search queries by automatically generating a multitude of other queries.
I don’t necessarily recommend these approaches as described; it may not be practical to multiply operating expenditures by an order of magnitude or more to support privacy. But this approach might be appropriate at certain times or in specialized domains like the military.
The ability to generate convincing fake contextual traces might also be useful in social applications. Suppose you want to conceal your Vegas trip, yet don’t want to go off the location-app grid, which might arouse suspicion. You would just need to generate one convincing fake trace and substitute that for your actual trace.
We predict that fraudsters will be the first adopters of technology to fake contextual data, and that this will drive better techniques for detecting fake contextual data (which, in turn, will improve the ability to create fake contextual data: a classic arms race). Click fraudsters have already been engaging in a primitive form of faking search engine queries. Consider the version of click fraud where publishers target ads to certain locations, and fraudsters need to appear to be in those locations to get the ads; their trace to the location must look realistic to foil fraud detection algorithms.
Fake it till you secure it
At PARC, we’ve been experimenting with creating convincing fake location traces. It’s not a trivial task. For example, you can’t just splice in something from a database of past traces if you assume that the parties trying to detect the fakes have access to the same data and are at least as knowledgeable as the parties trying to commit the forgery.
We have developed an algorithm that fakes a driving trace using Google Maps, by:
- extracting the polyline (this is the vector of latitude-longitude pairs that constitute the route);
- filling in more points; and
- adding simulated stops and noise.
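As a rough illustration of the "filling in more points" and "adding simulated stops" steps, here is a minimal Python sketch. It is not the actual PARC implementation: the function names, the constant driving speed, the straight-line interpolation between polyline vertices, and the hand-picked stop positions are all assumptions made for the example.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in metres
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))

def densify(polyline, speed_mps=12.0, interval_s=5.0):
    """Fill in points so readings are roughly speed * interval metres apart.

    Assumes a constant speed and interpolates linearly in lat/lon, which is
    only reasonable over the short segments of a road polyline.
    """
    step = speed_mps * interval_s
    out = [polyline[0]]
    for a, b in zip(polyline, polyline[1:]):
        n = max(1, math.ceil(haversine_m(a, b) / step))
        for i in range(1, n + 1):
            t = i / n
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out

def add_stops(points, stop_indices, stop_readings=3):
    """Repeat the reading at each chosen index to simulate a stationary car."""
    out = []
    for i, p in enumerate(points):
        out.append(p)
        if i in stop_indices:
            out.extend([p] * stop_readings)
    return out

# Hypothetical three-vertex route; a real one would come from a directions service.
route = [(37.0, -122.0), (37.0, -121.99), (37.01, -121.99)]
trace = add_stops(densify(route), stop_indices={3}, stop_readings=3)
```

A more convincing fake would vary the speed along the route and place stops where a real driver would stop, such as at intersections, rather than at arbitrary indices.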
One tricky challenge is simulating the errors in the GPS signal. Errors in an actual GPS signal seem to drift; for example, see this plot of an actual trace. When the signal is off a bit, it tends to stay off in the same direction for a little while.
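One simple way to get errors that drift instead of jumping around independently is a first-order autoregressive process, where each error is mostly the previous error plus a small random nudge. The sketch below is an assumption for illustration, not necessarily the noise model we used; the names and the parameters `sigma_m` (typical error size in metres) and `rho` (how strongly each error follows the previous one) are hypothetical.

```python
import math
import random

def drifting_noise(n, sigma_m=5.0, rho=0.95, rng=None):
    """Return n correlated errors in metres from an AR(1) process.

    rho close to 1 makes the error stay off in the same direction for a
    while; the innovation scale keeps the long-run spread near sigma_m.
    """
    if rng is None:
        rng = random.Random()
    innovation = sigma_m * math.sqrt(1.0 - rho * rho)
    e = rng.gauss(0.0, sigma_m)
    errors = []
    for _ in range(n):
        errors.append(e)
        e = rho * e + rng.gauss(0.0, innovation)
    return errors

def add_gps_noise(points, sigma_m=5.0, rho=0.95, seed=None):
    """Perturb (lat, lon) points with independent drifting north/east errors."""
    rng = random.Random(seed)
    north = drifting_noise(len(points), sigma_m, rho, rng)
    east = drifting_noise(len(points), sigma_m, rho, rng)
    m_per_deg = 111_320.0  # approximate metres per degree of latitude
    noisy = []
    for (lat, lon), dn, de in zip(points, north, east):
        dlat = dn / m_per_deg
        dlon = de / (m_per_deg * math.cos(math.radians(lat)))
        noisy.append((lat + dlat, lon + dlon))
    return noisy
```

Setting `rho=0` reduces this to independent noise at every reading, which is the kind of jitter that tends to look unlike a real GPS signal.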
By the way, faking contextual data only works if you convincingly fake all the data in concert (e.g., a GPS trace showing you at work would not match accelerometer data consistent with driving).
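To make the point concrete, here is a minimal sketch of the kind of cross-check a detector could run: derive speed from the GPS readings and compare it with a motion label inferred from another sensor. The label values ("driving", "still"), the speed threshold, and the function names are hypothetical; a real detector would presumably work from raw accelerometer data and be far more sophisticated.

```python
import math

def gps_speeds(points, interval_s=5.0):
    """Approximate speed in m/s between consecutive (lat, lon) readings."""
    m_per_deg = 111_320.0  # approximate metres per degree of latitude
    speeds = []
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        dy = (lat2 - lat1) * m_per_deg
        dx = (lon2 - lon1) * m_per_deg * math.cos(math.radians((lat1 + lat2) / 2))
        speeds.append(math.hypot(dx, dy) / interval_s)
    return speeds

def inconsistent_intervals(points, motion_labels, interval_s=5.0, moving_mps=2.0):
    """Indices of intervals where GPS motion and the sensor label disagree.

    motion_labels holds one hypothetical label per interval between
    readings, e.g. "driving" or "still".
    """
    flagged = []
    speeds = gps_speeds(points, interval_s)
    for i, (v, label) in enumerate(zip(speeds, motion_labels)):
        gps_moving = v >= moving_mps
        if gps_moving != (label == "driving"):
            flagged.append(i)
    return flagged
```

A fake trace that sits at the office while the accompanying labels say "driving" would be flagged on every interval, which is why all the contextual data has to be faked together.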
See our full paper for more details of what we did. Others have also developed methods for faking location traces of a car trip, such as John Krumm of Microsoft Research and Pravin Shankar et al. from Rutgers.
PARC is also exploring the idea that contextual data implicitly authenticates you. It’s a privacy problem because you have to worry not only about what the data is telling others, but also about the fact that the data itself fingerprints you.
Tell us: which one is fake?
We don’t know how to make guarantees about how convincing a fake is. Perhaps the best option is peer review (as with cipher evaluation). If you’d like to try distinguishing our fake traces from real ones, download this zip file. The file contains 6 traces of the same route, 2 of which are fakes; the readings in each trace are 5 seconds apart. Let us know which two you think are fake, and why.
Editor: Sonal Chokshi