Trends in Big Data: A Forecast for 2014
Author: Andy Walker, VP and General Manager, Big Data & Analytics
Big data infrastructure was so last year. In 2013, many companies reported success in bringing together their legacy mainframe infrastructure with new big data infrastructure. In 2014, we'll see companies shift their attention to putting that infrastructure investment to use.
Big data in cloud really means private cloud.
In 2013, most of the big data projects we saw were deployed on bare-metal infrastructure in the enterprise. We expect an evolution toward virtualized infrastructure in 2014. We’re seeing a lot of investment in products that make this happen, such as Serengeti for vSphere, Savanna for OpenStack and Ironfan for Amazon Web Services (AWS). These tools automate the deployment of big data platforms to a virtualized infrastructure.
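Conceptually, tools in this category turn a declarative cluster description into provisioning requests against a virtualized infrastructure. The sketch below is tool-agnostic and hypothetical: the spec fields and the `plan_nodes` helper are invented for illustration, not the API of Serengeti, Savanna or Ironfan.

```python
# Hypothetical, tool-agnostic sketch of cluster deployment automation:
# a declarative spec is expanded into one provisioning request per VM.
cluster_spec = {
    "name": "analytics-cluster",
    "node_groups": [
        {"role": "master", "count": 1, "ram_gb": 16},
        {"role": "worker", "count": 8, "ram_gb": 8},
    ],
}

def plan_nodes(spec):
    """Expand the spec into one provisioning request per virtual machine."""
    return [
        {"cluster": spec["name"], "role": group["role"], "ram_gb": group["ram_gb"]}
        for group in spec["node_groups"]
        for _ in range(group["count"])
    ]
```

A real tool would hand each request in `plan_nodes(cluster_spec)` to the underlying hypervisor or cloud API; the value of the approach is that the cluster shape lives in one declarative document rather than in manual runbooks.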
The era of analytic applications begins.
In 2013, enterprises learned a lot about how to use the big data infrastructure that was new to the market. This coming year, those lessons will be applied toward analytic applications. In 2014, we will see some great use cases happen on that big data infrastructure. This will be the year of ‘What can I do with big data?’ rather than ‘What is big data?’ Given this refocus on analytic applications, 2014 will create an even greater demand for people with skills in data science.
The Hadoop clone wars end.
We feel confident that the big data industry will consolidate down to a couple of Hadoop distributions. Currently, many distributions of Hadoop exist, some proprietary and some open source. In 2014, the industry will consolidate to two of these. The rest will become less relevant, either because they are absorbed by acquisition into one of the survivors or because they exit the market.
Speaking of exits, serial extract, transform, load (ETL) processes will largely go away in 2014. As the velocity of data increases, especially social data, there’s more need to analyze data in real time as a stream. Currently, Hadoop is being pressed into service for this — something it’s not well suited for. In-memory analytics and complex event processing give us the capability to analyze these streams in real time and extract intelligence on the fly. That eliminates the need to perform the traditional ETL steps.
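The shift from serial ETL to streaming can be sketched in a few lines: instead of landing a batch of data and transforming it later, each event updates an in-memory aggregate the moment it arrives. The event format and topic counts below are invented for illustration.

```python
from collections import defaultdict

def process_stream(events):
    """Maintain running per-topic counts as events arrive, rather than
    batch-loading the data through a serial ETL pipeline first.
    Yields (topic, running_total) for every incoming event."""
    counts = defaultdict(int)
    for event in events:  # in production this would be a live feed
        counts[event["topic"]] += 1
        yield event["topic"], counts[event["topic"]]

# Hypothetical social-data events
stream = [{"topic": "big data"}, {"topic": "cloud"}, {"topic": "big data"}]
for topic, running_total in process_stream(stream):
    print(topic, running_total)
```

Because the aggregate is maintained in memory on the fly, there is no separate extract-transform-load pass to schedule; the "load" happens continuously as intelligence is extracted from the stream.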
MDM will provide the dimensions for big data facts.
Master data management (MDM) is used to create a single, authoritative definition of data from an internal standpoint. As people realize that external data sources add new dimensions to their internal problems, they’ll want the same discipline applied there: a single definition of each new dimension, even though it comes from the outside world. If you realize that external data sources help solve a problem, you’ll want an external MDM focus as well.
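In practice, this means enriching an internal master record with dimensions from an external source via a shared match key. The sketch below is a minimal illustration; the record layouts, the customer IDs, and the choice of a DUNS-style match key are all hypothetical, not taken from any particular MDM product.

```python
# Internal master data: the single, governed definition of a customer.
internal_master = {
    "C-1001": {"name": "Acme Corp", "duns_number": "150483782"},
}

# Hypothetical external source (e.g. purchased firmographic data),
# keyed on the shared identifier.
external_source = {
    "150483782": {"industry": "Manufacturing", "employees": 1200},
}

def enrich(customer_id):
    """Return one merged definition: internal master record plus
    the external dimensions matched on the shared key."""
    record = dict(internal_master[customer_id])
    record.update(external_source.get(record["duns_number"], {}))
    return record
```

The point of an external MDM focus is exactly this merge step: without a governed match key and a single agreed definition, every team joins the outside data differently.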
The consolidation of NoSQL will continue.
NoSQL means “not only SQL” rather than “the absence of SQL,” which makes it more inclusive than exclusive. It reflects the fact that there are many ways to look at data other than the structured, ordered approach that SQL requires, and it was created to offer a way to work with data without forcing it into a concrete schema. That approach has been extremely successful, and we’re seeing massive growth in NoSQL. There will be no slowdown in adoption, but just as with Hadoop distributions, the industry is beginning to settle on a few major players. 2014 will bring a similar consolidation of NoSQL database distributions.
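The schema flexibility at the heart of NoSQL can be shown in a few lines: each "document" carries whatever fields it needs, with no table definition declared up front. This is a minimal, store-agnostic sketch with invented data, not the API of any particular NoSQL database.

```python
# Schema-flexible "documents": each record can carry different fields,
# with no fixed schema declared in advance.
documents = [
    {"user": "alice", "email": "alice@example.com"},
    {"user": "bob", "followers": 4200, "verified": True},
]

def find(collection, **criteria):
    """Match documents on whatever fields they happen to have."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]
```

A relational table would force both records into one column set (with NULLs for the gaps); here the query logic adapts to the data instead of the data adapting to a schema.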
ANDY WALKER is vice president and general manager of Big Data & Analytics at CSC.