Of course, their efforts will also help companies hawk products with supercomputer precision and potentially allow government agencies to develop uncanny profiles of their citizens. Mining data could do all of these things and more, but the question remains whether privacy concerns should give pause to the research itself.
Since the 1930s, data mining has been practiced with the idea of extrapolating different attributes of information—or dimensions—and compiling different attributes into something meaningful to those requesting the data. Facebook recommending pages based on the pages that your friends like, or Amazon recommending books based on books bought by other customers with similar tastes, are examples of today’s data-mining applications.
Venkatasubramanian’s new algorithm, which he presented last week at the Conference on Knowledge Discovery and Data Mining in Washington, D.C., squashes complex dimensions and makes them easier to interpret, a tool that could help all researchers. “It’s like a screwdriver,” Venkatasubramanian says. “It will tighten any screws in any object, be it a bicycle, car or a dishwasher.”
Not only does it tighten up inefficiencies, but it also handles larger data sets. Where previous computer programs would struggle to boil down dimensions or attributes for 5,000 people, Venkatasubramanian’s method crunches the data for 50,000 individuals with ease. For data mining, it is what dynamite was to traditional mining in the days of the pick and shovel.
But strip mining of data may have its own detractors within the field of ethics and science. “Technology is a fast-moving object,” says Robert Gehl, a professor in the University of Utah's Communication Department, who has studied science and new media. He recommends that, in general, “researchers inject ethics into their equations to address ‘what will this be used for?’, and not necessarily leaving that to policymakers, because technology outpaces policy, sometimes.”
Since the researchers who develop tools know best how they work, it’s logical for them to be stakeholders in their use, Gehl says. This is a point not lost on Venkatasubramanian, who notes the academic data-mining community’s efforts to educate the public about the field. Venkatasubramanian also notes that, when used by corporations, efficient data mining can make marketing less of a nuisance and more useful: individuals receive e-mails recommending the exact book they were looking for, rather than mass spam for cheap Mexican pharmaceuticals. “More effective data mining is less invasive because it is more useful to people,” Venkatasubramanian says.
Scholars of the data-mining field believe in responsible use, but also say that the information is already out there. For Leo Irakliotis, professor and dean of the school of computer and information science at Nova Southeastern in Florida, the implications of data mining are serious. He says it isn’t hard to imagine health-insurance companies jacking up clients’ premiums by monitoring their Costco purchases, or government agencies tracking the reading habits of bookstore patrons to see if they might need a wiretap on their phones. “It’s easy to see the dark side,” Irakliotis says, adding, however, that fear of the worst can’t hold back hope for positive research.
“Every discovery has the potential to do something good for society, and we must not withhold research on the fear that some bozo out there will do something stupid with it,” he says.
While some may worry about Venkatasubramanian’s method being embraced by government and corporations, it also offers gains for researchers of all kinds, including earth scientists, biologists and others with complex data problems such as mapping the human genome.
“This exact thing is happening right now,” Venkatasubramanian says. “Large data sets we’ve collected will be researched and will lead to definite benefits in more ways than just money for corporations.”
Eric S. Peterson: