Big Data Research
by Chris Sapardanis
Intrigued by the emerging market of Big Data, CSC World spoke with Paul Gustafson about the Leading Edge Forum’s (LEF) ongoing research on the topic. Gustafson oversees the planning and execution of the LEF’s technology programs, which include grants, papers, technology research reports, briefing series, Chief Technology Officer Exchange, and centers of excellence.
In addition to his big data research, Gustafson consults with large companies on how to rethink business based on emerging technologies and addresses audiences frequently on these topics.
Judging from your research, how is Big Data changing the marketplace?
Gustafson: We were intrigued when The Economist published an article talking about the economics of data. It claimed data was becoming the new raw material of business, with an economic input almost on par with capital and labor. That’s a pretty strong statement. And yet it’s true. We are seeing a number of businesses grow, develop, and justify their value on the basis that their data is an asset.
But the biggest buzz in data is what is happening in social networks. Practically overnight, we’re seeing businesses looking to make connections with data like never before and capitalize on its power. Mostly, the desire is to create some sort of connection that brings deeper economic value. This desire to connect data in new ways holds true whether we are talking about social networks or the power grid.
On the technology front, there is a revolution underway in how we store, process, and manage data to get new insights. This is the move to armies of commodity computers doing parallel processing à la Google, distributed processing frameworks to manage the data across these computers, and new, more flexible databases. The days of the “one size fits all” database — the relational database approach — are over. Add to this the cloud: on-demand provisioning of processing power and storage so you can scale up or down easily with business needs, and reduce costs as a result of “renting” rather than owning your IT infrastructure, paying only for what you use. The business, and more specifically the data, drives IT consumption.
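The parallel-processing pattern described above can be sketched in a few lines. This is a minimal, illustrative map-and-reduce example in Python, using worker processes to stand in for the "armies of commodity computers"; the function names and the word-count task are my own, not from any specific framework mentioned in the interview.

```python
# Map step runs on each "machine" (here, a process); reduce merges results.
from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    """Map: count word frequencies in one chunk of text."""
    return Counter(chunk.split())

def merge_counts(partials):
    """Reduce: merge the per-chunk counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    chunks = ["big data big insight", "data drives it consumption"]
    with Pool(2) as pool:                     # two parallel workers
        partials = pool.map(count_words, chunks)
    totals = merge_counts(partials)
    print(totals["data"])
```

Frameworks like Hadoop apply the same split-process-merge idea at the scale of thousands of machines, with the framework handling data distribution and failure recovery.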
Is this the same data that’s been around for years, or are we talking about actual new data?
Gustafson: It’s a combination. The whole business intelligence movement is based on trying to get more out of the data you have. Typically a business intelligence environment of the past was based on historical data that a business owned. It was their intellectual property; it was stuff they produced and generated. This was all well and good as people built their analytics and made predictions off the past.
The shift taking place now is the blending of their data and someone else’s data. This often means blending structured and unstructured data. We call this “completing the context.” Gleaning new insight this way is not possible with just one’s own view of the data. It requires blending internal data with data from other sources, like time and place data or social network data.
For example, one company is combining location, terrain, weather, time, and sensor data to provide near real-time situational awareness for utility companies. If a utility company knows a heat wave is expected in a specific area, and when, it can notify heavy users to set their thermostats higher to avoid a power outage. By connecting the dots in a timely way, the company can manage resources more effectively and go for a controlled “dial down” rather than an uncontrolled shutdown.
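The "connecting the dots" step in this utility example can be sketched as a simple join of a weather forecast against usage records. All field names, thresholds, and data here are invented for illustration; the real system described would draw on live sensor and terrain feeds.

```python
# Hypothetical demand-response triage: find heavy users in areas forecast
# to hit heat-wave temperatures, so they can be notified to dial down.
HEAT_WAVE_F = 100    # assumed forecast high (deg F) that triggers an alert
HEAVY_USE_KWH = 50   # assumed daily usage marking a "heavy user"

def customers_to_notify(forecast_by_area, usage_by_customer):
    """Join forecast data with usage data; return heavy users in hot areas."""
    hot_areas = {area for area, high in forecast_by_area.items()
                 if high >= HEAT_WAVE_F}
    return sorted(
        cust for cust, (area, kwh) in usage_by_customer.items()
        if area in hot_areas and kwh >= HEAVY_USE_KWH
    )

forecast = {"north": 104, "south": 88}
usage = {"acme": ("north", 72.0),
         "smith": ("north", 12.0),
         "jones": ("south", 95.0)}
print(customers_to_notify(forecast, usage))  # only 'acme' is in a hot area and heavy
```

The value comes from the join itself: neither the forecast nor the usage data alone identifies whom to call, which is the "completing the context" idea in miniature.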
What roles within organizations will be changed by the infusion of this new connected data?
Gustafson: The social space is typically a channel serviced by marketing communications people. The professionals who manage brands are also vested stakeholders, and one of the examples we cite in our report, “Data rEvolution,” is a major consumer products company. They created a new organization inside their business to deeply tune in to brand chatter online and defend the company in the social network if needed. This group is using counterintelligence techniques originally designed for federal security agencies to find bad people. And it all started with a group of angry consumers posting their experiences with a product their kids use.
Do you think this type of group might become common within organizations?
Gustafson: If they care about their brand, and they care about noise and chatter around their brand, more organizations may have to launch their own counterintelligence techniques to defuse what could be false stories. One of the challenges of the social network is also its greatest benefit — it’s widely connected.
So if you’ve got a bad story out there that’s untrue — or worse, a bad story that’s true — you will need to launch a counterintelligence tactic to counter that story. That means being much more tuned in to sentiment, and being able to translate that sentiment into information you can act on.
Beyond the brand and marketing aspect, what other observations have you made during your research?
Gustafson: We are moving toward global sharing of data beyond what has been done in the past. Probably the best example is happening with CERN, the European Organization for Nuclear Research, in Geneva. The Large Hadron Collider there is the world’s largest and highest-energy particle accelerator. This system and the volumes of data it generates are designed for collaboration because not even the CERN installation is big enough to analyze and store all the data the collider is generating.
Setting yourself up for a much more distributed set of data and analysis is a leading-edge trend, but not only with the collider. It’s also happening between the worlds of environmental science and healthcare. We know there are environmentally induced implications around health. We are now seeing the need for data across disciplines in order to truly understand and manage the complexity of the world we live in.
Will this new collaborative world be full of new privacy and security challenges?
Gustafson: I don’t think any of that is worked out yet, but there is a lot going on around de-identification. For example, you could still capture the general trend of what’s going on in health, in a specific demographic or location, just without including real identities.
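The de-identification approach described here amounts to stripping direct identifiers and reporting only aggregate counts per demographic or location bucket, so the trend survives while real identities do not. The record layout below is invented purely for illustration.

```python
# Sketch of de-identification by aggregation: names and IDs are discarded,
# and only counts per (age band, region) bucket are reported.
from collections import Counter

def deidentified_trend(records):
    """Aggregate case counts by (age_band, region), dropping identifiers."""
    return Counter((r["age_band"], r["region"]) for r in records)

records = [
    {"name": "A. Patient", "age_band": "30-39", "region": "NE", "dx": "flu"},
    {"name": "B. Patient", "age_band": "30-39", "region": "NE", "dx": "flu"},
    {"name": "C. Patient", "age_band": "60-69", "region": "SW", "dx": "flu"},
]
print(deidentified_trend(records)[("30-39", "NE")])  # 2 cases in that bucket
```

In practice this simple bucketing is only a starting point; real systems add safeguards (such as suppressing very small buckets) because rare combinations of attributes can still re-identify individuals.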
Google launched an analytics platform without even using official medical data. They simply correlated search data and produced projections that, when compared with actual health data, were spot on. This started with the H1N1 virus in 2008, when people were doing searches about the flu. Google was able to predict outbreaks of flu in the United States quickly and accurately. A lot of people are interested in search data as a new form of data — data they don’t own — to combine with their own data for deeper insight.
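The mechanics behind this kind of projection can be shown in miniature: check whether a search-volume series tracks a reported-cases series. The weekly numbers below are made up; the point is only the correlation step, not Google's actual method.

```python
# Toy correlation check: does flu-related search volume track reported cases?
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

searches = [120, 150, 310, 480, 460, 300]   # hypothetical weekly flu searches
cases    = [ 10,  14,  33,  51,  47,  30]   # hypothetical weekly flu cases
print(round(pearson(searches, cases), 2))   # close to 1.0: the series track
```

A strong correlation like this is what makes search volume useful as a leading indicator: search data arrives in near real time, while official case counts lag by days or weeks.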
Are we heading toward a completely accessible, information-for-all future?
Gustafson: The architecture of information in a business is certainly changing. Until now, the stewards of information have been in the IT department. The broader expectation of the economics of data, in which we need to do more with more data and exploit it like never before, is going to stretch what used to be the core disciplines of data management.
Just like the cloud world empowered non-IT folks, saying, “I don’t have time to wait for IT to stand up that kind of equipment,” you are seeing business areas take charge of data analytics in a completely different way, drawing from new forms of data that are not IT oriented, and making judgments and decisions on behalf of the business. As a result, we are seeing an inflection point in the world of IT, and the IT response is similar to that around cloud: Give people the tools and data they need to get their jobs done.
CHRIS SAPARDANIS is editor of CSC World magazine.