August 09, 2006

When Your Past Comes to Haunt You

Could you be identified from a history of your online searches? It would seem so...

In August, 2006, AOL deliberately released search records on the web amounting to 20 million queries from 658,000 users over a 3 month period from March to May, 2006, but soon retracted them.
Such is the way of the web that copies were made in the meantime (of course), and the Google cache of the download page also lingered for a while. And then the hunt for who was searching for what began...

Sometimes you may deliberately opt in to keep a trace of your search history, as when you turn on Google's personalised search, or if you opt into an attention recording service. But more often than not, we tend to assume that what we search for is our own business. This is not a good assumption to make, because your searches are far from private.

In Spring, 2006, a widely reported news story broke in which Google resisted attempts by the US Government's Justice Department to make it hand over a random sample of data about searches made using Google, albeit in aggregated and anonymised form (for example: Google data request fuels fears and BBC News: Google defies US over search data and then ultimately Judge tells DoJ "No" on search queries).

Like all major search engines, Google keeps track of users' search behaviour using cookies in the user's browser, as well as reconciling searches against the IP addresses from which the requests are made. Typically this will be an IP address registered by your ISP; so although Google may not be able to work out who you are from your IP address if you have cookies disabled, your ISP probably will be able to. The concern with handing over data to the DoJ was, then, in part a concern about privacy.
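To make the cookie and IP address idea a little more concrete, here is a minimal sketch in Python of how a search log might file each query under a visitor. Everything in it is an assumption made up for illustration: the `visitor_key` function, the field names, the cookie values and the (reserved, documentation-only) IP addresses. It is not a description of how Google or any other search engine actually stores its logs.

```python
def visitor_key(request):
    """Return the key a hypothetical log would file this search under.

    Assumption: prefer the cookie ID when the browser sends one,
    otherwise fall back to the IP address the request came from.
    """
    cookie_id = request.get("cookie_id")
    if cookie_id:
        return "cookie:" + cookie_id
    return "ip:" + request["ip"]


# Invented requests; the IP addresses come from ranges reserved for examples.
requests = [
    {"ip": "192.0.2.10", "cookie_id": "abc123", "query": "folk song translation"},
    {"ip": "192.0.2.10", "cookie_id": None, "query": "privacy podcast"},
    {"ip": "198.51.100.7", "cookie_id": "abc123", "query": "open university courses"},
]

# Build a search history per visitor key.
history = {}
for request in requests:
    history.setdefault(visitor_key(request), []).append(request["query"])

for key, queries in sorted(history.items()):
    print(key, queries)
```

Notice that in this sketch the cookie follows the browser from one IP address to another, while the search made with cookies switched off is still filed against the IP address it came from.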

So when AOL released the block of search data (the search service itself is actually provided by Google for AOL), albeit for research purposes, the outcry against the release was swift and an apology from AOL was quickly posted.

Interfaces to the data soon followed, however, and people began to explore it, as in 10 Things I learned from the AOL Search Data, a post that also provides links to some online AOL data web interfaces.

Although the data was anonymised - individual users were given random numerical identifiers - it was only a matter of days before the first user was identified from their search records. (Update: see also this article on the tales the AOL data tells.)
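Why is a random number such weak protection? A minimal sketch in Python may help. The log rows below, the ID numbers and the queries are all invented for illustration, and the layout is an assumption rather than the format of the AOL file. The point is only that once every query carries the same identifier, collecting one person's searches together is trivial, and it is the searches themselves (a town, a street, a surname) that give the person away.

```python
from collections import defaultdict

# Hypothetical rows of (anonymous_id, query); every value here is made up.
SAMPLE_LOG = [
    (1001, "landscapers in springfield"),
    (2002, "cheap flights"),
    (1001, "homes sold in maple street springfield"),
    (1001, "jane doe springfield"),
    (2002, "weather tomorrow"),
]


def queries_by_user(log):
    """Group query strings under the anonymous ID that issued them."""
    grouped = defaultdict(list)
    for user_id, query in log:
        grouped[user_id].append(query)
    return dict(grouped)


profiles = queries_by_user(SAMPLE_LOG)
for user_id, queries in sorted(profiles.items()):
    print(user_id, queries)
```

Read together, the three searches filed under the made-up ID 1001 say far more about who is typing than any one of them does alone.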

One can only expect a version of 'Through the Keyhole Search Box' to air next!

But search history is not the only way your online past can catch up with you. A couple of weeks ago, I was telling a colleague about a podcast I'd heard earlier that day: Future Proofing Your Privacy. At the start of the talk, the speaker, Mark Hedland, tells of how he posted a message to an online group that said...

Hey, why don't you read it, and why don't you listen to what Mark Hedland has to say first hand (the first 7 or 8 minutes particularly)?

For those of you who haven't followed the links, here's a recap. Something that was posted over 10 years ago to a part of the web that wasn't supposed to be archived, was - and now Mark Hedland can show how foolish he was in thinking that what he was saying then would disappear.

As we talked, my colleague mentioned how 5 or so years ago they had posted a request to a newsgroup asking for a translation of a traditional Canadian French folk song, a translation they have since lost, along with the name of the song. (Actually, it wasn't a song, French or Canadian, but it was to do with translation; I have changed the specific details to protect my colleague's privacy!)

Two minutes after leaving their office (or maybe it was three, certainly no longer than that) I mailed my colleague a link to a Google Groups search page containing their long lost post. The query used the equivalent of these search terms: translation song "sam smith". The post being searched for was the third item in the list of search results.

If you would like to know more about the world of search, why not check out Beyond Google: Working with Information Online, a new short course from the Open University.

Posted by ajh59 at August 9, 2006 10:16 PM