Cookie tracking versus big data sources

This WSJ article, as well as an article on Ars Technica about the increase in behavioral tracking, has me thinking about the future of using people’s data footprints on the web.

By “data footprint,” I mean that puff of residue left by user activities, whether they be cookies on a website, requests for quotes in financial markets, ‘likes’ on FB, or ratings (and views) on your favorite streaming site. This footprint is the subject of a lot of anxiety, since there is a big difference between tracking you across your web activities and tracking you on any individual website.

The aim, from marketers’ points of view, is to ‘know’ you better, so as to serve you better advertisements. Given that free-content-for-advertising is the backbone of the web, this is hardly surprising. It’s often what we mean (or at least one face of it) when we talk about the erosion of privacy in the internet age.

It’s kind of a funny trend, though, especially for those of us who are somewhat skeptical of individual agency to begin with. It’s also the trend that is most likely to end, in my humble opinion; at some point, someone in Congress is going to have their daughter tracked, find it creepy, and we’ll have some federal privacy protection in place (there’s a rumination to be had on this catalyst, but I will leave it for now).

More interesting to me is the confluence of signals that don’t track individuals, but that instead track organizations, markets, industries. To give two examples:

1) What we know about innovation and competition suggests that nascent industries are more likely to favor innovation, while more mature industries are likely to respond to innovation as entrenched interests. To wit, Disney didn’t care about cultural borrowing and (re)mixing when it began making Mickey Mouse films (and, if you believe Lawrence Lessig – and you should! – it was a big beneficiary of both). Contentions over intellectual property are the province of the entrenched.

But in the data 2.0 age, we could actually use something like data from the US Patent and Trademark Office to see how this might work. You could track types of patents, the number of copyright takedown claims made through Google, and the volume of intellectual property suits in federal courts to build a picture of industries (or areas within industries) with high and low densities of claims. This would create a kind of map of the industry, triangulating data without tracking individual users.
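As a toy sketch of what that triangulation might look like – every industry name and count below is invented for illustration, not a real USPTO, Google, or court figure – one could normalize each signal and average them into a single “claim density” score per industry:

```python
# Hypothetical counts per industry: (patents filed, takedown claims, IP suits).
# All numbers are made up for illustration.
SIGNALS = {
    "streaming_media": (120, 950, 40),
    "3d_printing":     (300,  15,  3),
    "pharma":          (800,   5, 220),
}

def claim_density(counts):
    """Average the three signals after scaling each by its maximum
    across industries, so no single data source dominates the score."""
    maxima = [max(row[i] for row in counts.values()) for i in range(3)]
    return {
        industry: sum(v / m for v, m in zip(row, maxima)) / 3
        for industry, row in counts.items()
    }

scores = claim_density(SIGNALS)
# Industries ranked from most to least claim-dense, i.e. most "entrenched"
ranking = sorted(scores, key=scores.get, reverse=True)
```

A real version would pull from USPTO bulk data, takedown transparency reports, and federal court filings; the point is only that the map is built from aggregate counts, never from tracking any individual.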

2) Interestingly, this company produces data from traffic flow patterns in major metropolitan areas. It would be interesting to see how the traffic flow changes around Baxter, Inc (a medical products company based in Deerfield, IL, near where I grew up) between 4pm and 8pm, combined with LinkedIn data on the number of Baxter employees posting updates to their resumes. If people are leaving before 5pm, with lots of changes to their resumes, does this mean a downturn for their business? Does an increase in traffic at 7pm, combined with few updates to resumes, mean that it is a thriving company? We could see if there are correlations with time-staggered stock prices, six months later. Again, the aim is not to track individuals, but to use collections of sometimes disparate data to shape understandings about organizations.
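A minimal sketch of that six-month-lag test, stdlib-only and using invented monthly numbers (the toy returns here are constructed directly from the lagged signal, so the correlation comes out at exactly -1 by design; real data would be far noisier):

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def lagged_correlation(signal, returns, lag):
    """Correlate signal[t] against returns[t + lag]."""
    return pearson(signal[:-lag], returns[lag:])

# Invented composite "distress" signal (early departures plus resume
# updates), one value per month, and monthly stock returns for one firm.
signal  = [0.1, 0.2, 0.2, 0.5, 0.6, 0.8, 0.7, 0.9, 0.8, 0.9, 0.7, 0.6]
returns = [0.02, 0.01, 0.00, 0.01, 0.02, 0.00,    # first six months: filler
           0.04, 0.03, 0.03, 0.00, -0.01, -0.03]  # = 0.05 - 0.1 * signal[t-6]

corr = lagged_correlation(signal, returns, lag=6)  # -1.0 by construction
```

In practice one would compute this over many firms with something like pandas, but the shape of the question is the same: does a composite organizational signal today predict the stock price six months out?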

Two points for now. First, these are the kinds of questions that are actually answerable with data, that would have been impossible to examine even 5 years ago. And second, it seems that it may be time to revisit the idea of weak signals as a way to make sense of big data.
