How to Avoid Big Mistakes in Big Data
The past few years have seen an explosion of new technologies for storing, analyzing and displaying the enormous amount of data available to businesses today. Repeated media coverage and highly publicized success stories have led more than one CIO to feel the pressure to "do something" related to big data. But do what, exactly?
Businesses are eager to acquire the latest and greatest technology, both to gain a competitive advantage and to demonstrate to their stockholders that they operate on the cutting edge of their industry. Beneath the excitement of publicized successes, however, lies the reality that success is neither automatic nor assured.
The rush to keep up with other companies can lead to fundamental mistakes that cause big data projects to go awry. In a recent survey of IT professionals, 42 percent reported having experienced a big data project failure.
Many of those failures can be traced to basic misconceptions about what big data can and can't accomplish, or to attempts to apply old methodologies to a technology that has many new dimensions. What follows are ideas and observations from our work with clients that may help your company avoid a similar fate.
Don't rely on technology alone
What's often lost in the hype about big data is a very fundamental idea: big data is a collection of powerful tools, not solutions. How successfully those tools are used depends on the people who wield them.
Big data and advanced analytics offer companies the ability to ask fundamentally new questions. Those questions are laced with nuances that can cause the naïve application of traditional business intelligence (BI) techniques to fail.
Much is made of the person — often called a "data scientist" — who knows how to work with big data. The data scientist possesses an unusual mix of skills that is hard to find in one person. These include a willingness to spend large amounts of time preparing data, a wide knowledge of statistics, an in-depth understanding of the business problem and exceptional communication skills, among others.
Rather than searching for one person who has it all, many companies have achieved impressive results using a heterogeneous team of innovative experts, each of whom brings a specialization to the table. Common roles include data scientist, data engineer, solution architect, subject matter expert (SME) and BI expert.
Don’t use old models
Humans find it hard to draw conclusions from raw data, but the insights derived from proper analytics can produce significant business value. That's why big data analytics is gaining momentum: it is a proven way to extract tangible value from a combination of structured, semistructured and unstructured data sources.
A common failing of big data projects is that far too little thought goes into how big data problems should be approached from a modeling perspective. Many analysts simply take the modeling techniques they know best and apply them, unmodified, to big data problems.
Applying a "tried and true" modeling technique that is a poor fit dramatically raises the probability of finding false correlations that arise from chance combinations of data: when thousands of variables are tested against an outcome, some will appear related purely by coincidence. This phenomenon is often referred to as “the curse of big data.” Some correlations, such as “people who wear blue shirts are more likely to be basketball fans than those who wear darker colors,” can be ruled out by SMEs as mere coincidence. Others, however, are harder to classify as either errors or valuable insights.
Whoever performs the data science analysis needs the expertise to test candidate correlations and determine which ones are genuine.
In many modeling approaches, the data scientist must make a trade-off between the complexity of a model and the transparency of the logic it uses to reach a conclusion. More complex models, such as ensembles that combine many simpler models, can produce more accurate results and, when built carefully, reduce the effects of overfitting. However, the logic and calculations behind their answers can be nearly impossible for a human analyst to follow.
For many applications, this can be a significant problem. Humans must decide whether to act on the results of big data analytics, and complex models can make it hard to judge whether those results are valid.
Simple models are easier for human analysts and decision makers to understand, but their simplicity may mean that they work well only in narrow applications, or only inconsistently. In that situation, business decision makers might put more faith in a model that is actually less accurate!
To know which modeling approach will result in a data product that actually gets used, data scientists must work closely with stakeholders to understand the business strategy, tactics and culture; the organization's risk aversion; the preferences of end users; their trust in automated decision aids; and their willingness to commit to data-driven decision making.
Making your big data analytics program a success
Technological breakthroughs have made processing big data possible, but companies are still adjusting to the differences between traditional BI applications and the new requirements that big data brings to the table.
One of the most important lessons they are learning, some of them the hard way, is that the difference big data makes doesn't hinge solely on the infrastructure you buy or the size of the Hadoop clusters you build.
Realizing the promise of big data requires the correct technology as well as relevant, high-quality data. Most importantly, it requires highly trained individuals who can extract true value from an ocean of data and communicate it clearly to those who can act on it.
Putting as much emphasis on recruiting the best data scientists as on acquiring the latest technology will give you a powerful capability that returns significant dividends on your investment. It will also create a competitive advantage your rivals will find hard to match.
CHENWEI LIU and MARK MELOON are data scientists in CSC’s Big Data & Analytics group.