Features
The ‘Data rEvolution’ Has Begun
Regardless of where you are or what industry you work in, how you find and use data is dramatically changing. A new data-driven economy is emerging and organizations across the globe are just beginning to understand it and explore the possibilities.
“Your own data is not enough,” say the authors of CSC’s latest report, Data rEvolution, which explores the new reality of IT where data is derived from both internal and external sources.
The 65-page report, based on research directed by Paul Gustafson, CSC’s Leading Edge Forum (LEF) Technology Programs director, examines the world’s rapidly escalating appetite for data, how organizations are creating new ways to manage and see that data, and what new discoveries, services and opportunities will arise from it.
“We began this report because we were seeing some pretty interesting and unconventional movements around data,” says Gustafson. “Data growth was starting to drive what we’ve dubbed as our own sort of ‘bitonomics,’ where the data side of business is beginning to hold a new secret to success.”
Yours, mine and our data
“Today, no industry is exempt from the challenges – or opportunities – of the Data rEvolution,” say the report’s authors, Gustafson and Sidney Shek, solution architect for CSC’s operations in Australia.
In the first chapter, Great Expectations: Do More with More (Data), the authors discuss challenges such as evaluating climate change, managing energy and running stock exchanges, all of which have data as a primary component. The ability to analyze and act upon data, particularly external data, will be key both to solving these challenges and to gaining competitive advantage.
Today, for example, climatologists aren’t the only ones interested in weather data. Insurance companies, environmental organizations and energy suppliers are just three groups interested in gaining access to satellite data generated by federal agencies such as the U.S. National Oceanic and Atmospheric Administration (NOAA).
“Climate data is not going to live in a stovepipe any longer,” says Gustafson. “Other businesses are creating value from a deeper understanding of climate trends and the symbiosis associated with it.”
Flexing muscles to manage big data
It takes technology to unlock, manage and gain value from the volume of data the world is generating. Forget megabytes and gigabytes; get ready to handle petabytes, exabytes, zettabytes and yottabytes. CSC’s own High Performance Computing Center manages more than 110 petabytes, or 110 million gigabytes, of data for clients. The Large Hadron Collider, a particle accelerator run by CERN, the European Organization for Nuclear Research, will produce roughly 15 petabytes of data annually.
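For a sense of scale, each prefix in that ladder is a thousandfold jump over the last. A quick back-of-the-envelope sketch in Python, assuming decimal (SI) prefixes rather than binary ones, reproduces the report’s 110-petabyte figure in gigabytes:

```python
# Decimal (SI) byte prefixes climb in steps of 1,000.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def to_bytes(value, prefix):
    """Convert a value with an SI prefix to raw bytes."""
    return value * 1000 ** (PREFIXES.index(prefix) + 1)

# CSC's High Performance Computing Center: 110 PB expressed in GB.
pb_110 = to_bytes(110, "peta")
print(pb_110 / to_bytes(1, "giga"))  # 110000000.0, i.e. 110 million gigabytes
```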
“To handle the complexity and flexibility inherent in the new world of diverse data, the database is shifting from relational to non-relational. This is a revolutionary change,” says the report.
In the second chapter, A Changing Foundation: New Methods to Manage Big Data, the authors look at the emerging approaches and technologies organizations are using to manage big data and to generate new, unique views of it.
“We looked at the technologies and concepts coming into place that will enable us to perform the heavy lifting, help us cull through all that data and make the sort of correlations that we think will bring insight,” says Gustafson.
These include:
- “Shared nothing” architectures, where tasks and data are distributed, potentially through a cloud, to separate systems and worked on independently
- “Unstructured” data, which is more chaotic and messy than traditional structured data, but potentially rich with contextual insight, and “jagged data,” which is data in a variety of formats that was never meant to be combined
- Google’s distributed processing framework, MapReduce, and related developments, such as Hadoop, the open-source technology stack used by companies such as LinkedIn, Facebook, Rackspace and Yahoo, the largest contributor to Hadoop (a minimal sketch of the MapReduce pattern follows this list)
- A “new breed of databases,” made popular by the “NoSQL” (Not Only SQL) movement, being adopted by organizations to store and process data whose attributes rarely “fit” traditional, highly structured relational database models
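To make the MapReduce idea concrete, here is a minimal, single-process sketch in Python of the map-shuffle-reduce pattern behind Hadoop, using the canonical word-count example. It is purely illustrative: a real Hadoop job distributes the map and reduce phases across a shared-nothing cluster rather than running them in one process.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse each key's values into a single result."""
    return (key, sum(values))

documents = ["big data needs big tools", "data wants to be combined"]
pairs = (pair for doc in documents for pair in map_phase(doc))
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["data"])  # 2
```

Because each mapper sees only its own slice of the input and each reducer only its own keys, the same pattern scales out simply by adding machines, which is precisely the “shared nothing” idea in the first bullet above.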
Connecting dots to make magic
New technology solutions will provide the muscle to manage and work with data; however, to make real gains, organizations will need to forge new connections, as explained in the chapter, The New Alchemy: Connecting the Dots. For example, to make energy infrastructure “smart,” one key will be situational and predictive capabilities that can draw on weather data.
“Energy isn’t going to get smarter just doing the same thing of rendering power; energy is going to get smarter when it recognizes consumption, and one of the correlations with consumption is weather,” says Gustafson. “That’s a new dot we’ve never connected before because of the systems and their maturity.”
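As a toy illustration of connecting that dot (the figures below are invented, not drawn from the report), a few lines of Python using the standard library’s statistics.correlation (Python 3.10+) show how a utility might quantify the relationship between temperature and electricity demand:

```python
import statistics

# Hypothetical daily readings: summer temperatures (deg C) and grid load (MW).
temperature = [24, 27, 31, 35, 29, 22, 33]
load_mw = [410, 450, 530, 610, 480, 390, 560]

# Pearson correlation: how tightly demand tracks temperature.
r = statistics.correlation(temperature, load_mw)
print(f"temperature-load correlation: {r:.2f}")
# Close to 1.0 in this invented sample, suggesting cooling demand
# rises with heat; a real smart grid would feed such a signal into
# forecasting rather than a single summary statistic.
```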
Aware of the potential competitive advantages data offers, the U.S. government launched data.gov in 2009 to give the public access to its wealth of data and to promote global collaboration. Since then, the U.K. has established data.gov.uk, and the European Union is discussing a data.gov.eu initiative of its own.
Predicting the future
In any industry, the ability to solve or anticipate a problem can be enhanced by asking the right questions, as discussed in the chapter, Enabling the Predictive Enterprise: Strategies to Understand, Anticipate and Plan. The report cites Endeca’s MDEX Engine technology, a hybrid search-analytical database developed to help users find the information they need and understand what they found.
For example, Toyota used Endeca’s solution to help resolve a crisis involving its gas pedals. The database let the automaker’s quality engineers sift through years of product and quality data from numerous systems in ways that were not previously possible, and identify patterns they would not have known to look for.
“If we look at big domains, like research and development, production, operations, and sales and service, and treat them as big process areas, Toyota had never combined the data across the different domains, so they weren’t able to answer the questions that needed to be answered to get to the bottom of what was going on with the pedal assemblies,” says Gustafson.
Gustafson notes that Toyota also had to draw on external data, such as from transportation agencies and other manufacturers, to determine what was happening with its gas pedals. This is another example of an organization’s data not being enough to solve today’s challenges.
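Endeca’s engine itself is proprietary, but the underlying pattern it supports, faceted search, meaning filtering records on several attributes at once and then aggregating what remains, can be sketched in a few lines of Python. The records and field names here are invented for illustration:

```python
from collections import Counter

# Invented warranty records spanning several internal systems.
records = [
    {"domain": "production", "part": "pedal assembly", "region": "US", "year": 2008},
    {"domain": "service",    "part": "pedal assembly", "region": "US", "year": 2009},
    {"domain": "service",    "part": "floor mat",      "region": "EU", "year": 2009},
    {"domain": "sales",      "part": "pedal assembly", "region": "US", "year": 2009},
]

def facet_search(records, **filters):
    """Keep only records matching every requested attribute."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]

# Slice across domains at once, something siloed systems cannot do,
# then count which parts dominate the matching records.
hits = facet_search(records, region="US", year=2009)
print(Counter(r["part"] for r in hits))  # Counter({'pedal assembly': 2})
```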
Mining the social space
Just as understanding new data patterns and the relationships between different data sources is crucial to capturing new opportunities and solving challenges, organizations are beginning to discover both the opportunities in mining and understanding social media data and the pitfalls of ignoring that domain.
One example the authors cite involves a major consumer products company whose stock price reportedly fell after a group of angry parents began claiming on Facebook that one of its products caused problems. The company, known for protecting its brand, quickly went on the offensive, both traditionally and, notably, online. This included constant social media monitoring.
“The social network has become the feedback loop; you can ignore it or leverage it,” says Gustafson. “It’s significant that companies are using the very same types of technologies that we would consider counter-terrorism technologies to manage their brand. You need to monitor the new data channels and learn how to respond.”
Much as those working at the frontier of space exploration have generated new products and developments, so have those working on the Web’s frontier created a wealth of innovation and opportunity.
“The social media and search companies have done the real pioneering of the technologies that we are now seeing commercialized because they were ingesting a lot of data and needed to put it into context,” says Gustafson. “They needed a better method than a supercomputer; they needed massive scale and massive storage. Yahoo and Google really are the pioneers of a lot of the under-the-cover tools that many commercial organizations are now monetizing on.”
A picture’s worth a billion petabytes … or more
With billions of searches and petabytes of data circling the globe, being able to visualize data has become increasingly important. The report’s last chapter, Seeing is Believing: Visualization and Visual Analytics, looks at emerging ways to visually represent information using a “multitude of dimensions across time and space,” calling it a new form of 21st century digital cubism.
In this new era of information visualization, visual analytics combines analytical reasoning with highly interactive visual interfaces. Solving the big problems, however, such as unlocking the mysteries of the human genome, or making truly great discoveries, such as super-fast, super-safe space travel, will take multiple disciplines, people and data. Some organizations are already taking note of the potential gains.
Besides federal initiatives, such as the U.S. National Visualization and Analytics Center, which provides strategic leadership and coordination for visual analytics tools, companies like IBM and GE are also sponsoring sites to encourage outside participation in visualization experiments and contests. Also, visualization conferences, like IEEE’s VisWeek, challenge participants and members to solve large-scale data analysis and visualization problems.
“Visualizing today’s data is less about viewing flat passive displays and more about participating with and in the data,” says the report. “Data is being shared across boundaries and organizations, be it Toyota (quality and product data), CERN (physics data) or NOAA (ocean data), so that more people can explore and understand the data. Organizations need to prepare for the data-centered economy.”
