Big Data, Bigger Potential
For a deep dive into this brave new world, see the “Data rEvolution” report, published by CSC’s Leading Edge Forum.
Read the full Summer 2012 issue.
Once you get your mind around just how big our data has become, the next trick is to open your mind to the possibilities created by having all that data.
The industry term is Big Data, but it really ought to be called Huge Data or perhaps Ginormous Data. In 2011, the total amount of data created was 1.8 zettabytes. A zettabyte is a trillion gigabytes, or a sextillion bytes. By 2020, IDC expects annual data generation to increase 4,300% from 2009, with total data produced reaching around 35 zettabytes.
More than 70% of the data is generated by individuals, and herein lies the opportunity for enterprises: to access this data, combine it with their own data and analyze it all for new insights. EMC says the number of customers storing a petabyte (1 million gigabytes) or more was 1,000 in 2010, and estimates that the number will reach 100,000 before the end of the decade.
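To make those units concrete, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope check: the constant and variable names are illustrative, the units are assumed to be decimal (SI), and the sketch assumes that the 4,300% increase applies to the roughly 35-zettabyte figure for 2020.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
PETABYTE = 10**15
EXABYTE = 10**18
ZETTABYTE = 10**21


def to_gigabytes(n_bytes: int) -> int:
    """Convert a byte count to whole gigabytes."""
    return n_bytes // GIGABYTE


# A zettabyte is a trillion gigabytes; a petabyte is a million gigabytes.
print(f"{to_gigabytes(ZETTABYTE):,} GB per zettabyte")
print(f"{to_gigabytes(PETABYTE):,} GB per petabyte")

# An increase of 4,300% means the new value is 44 times the old one
# (the original 100% plus the 4,300% increase).
growth_factor = 1 + 4300 / 100

# Assumption: the 35 ZB figure is the 2020 value the percentage refers to.
projected_2020_zb = 35
implied_2009_zb = projected_2020_zb / growth_factor

print(f"Growth factor: {growth_factor:.0f}x")
print(f"Implied 2009 baseline: about {implied_2009_zb:.1f} ZB")
```

Under those assumptions, the 2009 starting point works out to a little under one zettabyte.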
Think about how much data you generate every day. You take a picture and store it in the cloud. Maybe you post a video on YouTube. Certainly you make a few purchases on your credit card, send email, browse more than a few websites or post a tweet. Now multiply those actions by billions of people. Add in data generated by sensors, from weather satellites in space, to manufacturing sensors on a factory floor, to sensors on the ocean floor, to sensors in your car.
An exabyte here, an exabyte there, and pretty soon you’re talking about really massive data.
Getting to information
They say that information is power, but information shouldn’t be confused with data. Power comes from mining the data to get at the information it contains. That’s where things get interesting.
Most of the data we are generating, and will continue to generate, is “unstructured” data, meaning it doesn’t fit neatly into the fields of a relational database.
Unstructured data is less rigid, less ordered and more interrelated than traditional data. All those photos, videos and passages of text fall into this category.
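The difference is easier to see side by side. The sketch below is purely illustrative: the table layout, the sample purchase and the sample post are all made up for the example. A credit card purchase drops straight into the fixed fields of a relational table, while a free-text post has no ready-made field for what it means.

```python
import sqlite3

# Structured data: a purchase fits a fixed set of typed fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "card_id TEXT, merchant TEXT, amount_cents INTEGER, purchased_on TEXT)"
)
conn.execute(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    ("card-001", "Example Books", 1999, "2012-06-01"),  # hypothetical record
)

# Because the fields are known in advance, questions are simple queries.
total_cents = conn.execute(
    "SELECT SUM(amount_cents) FROM purchases WHERE card_id = ?",
    ("card-001",),
).fetchone()[0]
print(total_cents)

# Unstructured data: a post is just text. There is no "opinion" column,
# so any meaning has to be derived by analyzing the content itself.
post = "Loved the new cafe downtown, but the wait was too long."  # made-up text
mentions_wait = "wait" in post.lower()  # crude stand-in for real text analysis
print(mentions_wait)

conn.close()
```

The keyword check at the end is deliberately crude. Real analysis of text, photos or video is far more involved, which is why new tools are needed.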
Given the volume, variety and velocity of data, the tools for analysis are changing. More people can peer into the data in more ways for more meaning. Analytics is no longer confined to a few people working with a subset of data on a small analytics server off in a corner.
Today analytics is moving to center stage as more people can access the tools and analyze all the data. They can draw on data from multiple sources and use a distributed grid of computing resources to do the heavy lifting involved in the analysis.
Analytics is now moving to the “predictive edge,” where the analysis is more time-sensitive and closes in on real-time results, as CSC’s William Koff and Paul Gustafson write in their research report, “Data rEvolution.”
One example they highlight is insurance fraud analysis. In the past, companies may have run fraud analysis every two months — which may have meant that fraudulent claims had already been paid. With Big Data tools, insurance companies can run those analyses twice a day, catching fraudulent claims in hours.
Another example is in financial services. On one level, banks can make connections to figure out which of their customers are the most profitable for them. On another level, they can use Big Data techniques to perform capital adequacy modeling — that is, to ensure they have enough cash in reserve compared to their outstanding loans.
In life sciences, one biotech company was able to analyze genomic data, cutting through billions of pieces of genetic information to identify 23 optimal genes for predicting heart disease. It then created the first gender-specific diagnostic tests for heart disease.
Big Data is about big data sets, and “data as a service” is emerging so that enterprises can access the data without having to own it (think climate data or consumer financial data, for example). Data as a service, and ultimately analytics as a service, makes sense because the data must be available to multiple parties, who need to come together and collaborate on it.
In many industries, the data just keeps coming. We keep gathering it, storing it. At this point, it is coming in faster than the questions we ask of it, and surely it has more stories to tell.
JEFF CARUSO is a writer for CSC’s digital marketing team.