Before I attended Tuesday’s VLAB event on data exhaust, I had never set foot on the Stanford campus. To a proud Berkeley grad like me, its manicured lawns, clean sidewalks, and intact buildings had always just seemed wrong: learning should be dirty and difficult!
Taking the stage first was Roger Magoulas, who introduced the topic. For Roger, data exhaust is the byproduct of transactions or exchanges, and its applications include price analysis (Terapeak), advertising (Peerset), security (Mark Breier of In-Q-Tel, the totally not creepy venture arm of the CIA, was a panelist), personnel (the salary of a potential job is inversely proportional to the amount of Facebook untagging that you attempt prior to the interview), and content aggregation (which is not the same as stealing if you rebrand the content under the shadow of a vaguely accented, post-nationalist intellectual).
JB, our CEO, followed Roger with a presentation of Peerset’s business plan. It was the first time I had heard how that plan has developed since the company was founded in 2005. It all began as an exploration into the structure of human creativity, building on research that suggests such creativity results not necessarily from new ideas, but from new connections between existing ideas. The founders used this science to develop a gift recommendation engine before evolving the company into the ad-targeting technology company it is today.
After the conference, a local investor pointed me towards Eugene Webb’s pioneering 1966 study Unobtrusive Measures, a manual for mining the traces, erosions, and accretions of human interaction without intruding on those interactions themselves. We’ve known since physicists stopped looking at apples falling and started looking at electrons spinning that we change a phenomenon by observing it, and this was an operational stumbling block for quantitatively biased social science. Webb’s suggestion was to look at what was left when the action was complete: the fingerprints on the page of a magazine, for example, could give a very close approximation of how many people saw a particular advertisement; the cars in a retailer’s parking lot could serve as a proxy for sales figures.
Using data exhaust to inform advertising is a little different. We are bound by the principles of the scientific method only after the fact, not before. Simply put, we need well-conceived and well-executed control groups to prove that our methods work (CTR lift? Lift against what?), but before a campaign begins we actually want to change the action being observed. Say, for example, that Chrysler wanted to run an awareness campaign using social data to target only the drivers they think would be most receptive to their message. Our input in this case would be data exhaust from the social web, and our intention would be a shift in that same data: we would want to see an increase in explicit traces of the Chrysler brand in the social web, more PT Cruisers in profile pictures, more Town & Country tweets, more Sebring blogging. This is more reflexive than the simple mining and analysis of digital flotsam. For us, data exhaust is not about merely collecting the refuse of the social web; it is not just about recycling that refuse into valuable products; it is about the careful and measurable manufacture of influence in social spaces.
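For the curious, the "lift against what?" question can be made concrete with a little arithmetic. What follows is only a rough sketch, not a description of how Peerset actually measures campaigns: the function names and the click and impression counts are invented for illustration, and it assumes a control group that was served the same ad without social targeting.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def ctr_lift(treated_clicks: int, treated_impressions: int,
             control_clicks: int, control_impressions: int) -> float:
    """Relative CTR lift of the targeted group over the control group.

    A result of 0.5 means the targeted group clicked 50% more often
    than the control group did.
    """
    control = ctr(control_clicks, control_impressions)
    if control == 0:
        # Relative lift is undefined when the baseline never clicked.
        raise ValueError("control CTR is zero; relative lift is undefined")
    treated = ctr(treated_clicks, treated_impressions)
    return (treated - control) / control


if __name__ == "__main__":
    # Hypothetical numbers, made up purely for this example.
    lift = ctr_lift(treated_clicks=120, treated_impressions=100_000,
                    control_clicks=80, control_impressions=100_000)
    print(f"Relative CTR lift over control: {lift:.1%}")
```

Without the control group there is no denominator, which is the whole point: a lift figure only means something relative to a baseline that was not targeted.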