Big Data and Business Intelligence: A Match Made in Hadoop
Lately, it’s getting hard to put enough zeros on numbers that quantify the volume of data our wired world generates. Current research estimates that our Facebook “likes,” Instagram photos, YouTube videos and blog entries contribute to some 2.5 billion gigabytes of data generated every 24 hours.
Much of the daily torrent of newly minted information is unseen. In addition to tweets, pics and status updates, a deluge of data generated by RFID readers, sensor networks, logs and countless other auto-reporting systems fills vast data pools.
“Sensors on a single commercial aircraft produce 10 terabytes of data every 30 minutes it is operating,” says Alex Black, senior partner in Analytic Insights and Information Management (AIIM) at CSC. “Multiply that by all the aircraft in service around the world at any moment. That’s a big number.”
That’s Big Data. How that data is collected, curated and harnessed is a topic that once interested only computer science researchers. Black describes three characteristics that define Big Data: volume, velocity and variety. With equipment sensors pumping out petabytes of data every hour, the volume and velocity of Big Data are easy to appreciate.
Big Data collection
Companies in a growing number of industries generate information with a volume and velocity that require an exceptional architecture. For example, mining companies collect terabytes of data per hour from sensors on equipment and personnel, which requires automated systems that can absorb the data and quickly identify potential performance problems or safety issues.
“This is not a place for the typical database system, happily churning away on its columns and rows,” Black says. “Search engines were the first to encounter issues with volume and velocity, and today we apply the solutions they developed, such as [Apache] Hadoop.”
Instead of relying on a small number of high-powered supercomputers, the open source Hadoop software framework (named after a toy elephant owned by the originator’s son) takes the opposite approach: it spreads storage and processing tasks across a large cluster of hundreds or thousands of low-cost computers.
“Systems built on Hadoop play a growing role in our engagements for clients who need to store these large streams of unceasing data,” Black says. “That also makes it possible to add a layer of business analytics tools on top, so you can ask questions and extract value from that data in the same way you would handle structured information.”
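The idea of spreading processing across many machines can be sketched in a few lines. The example below is a minimal, single-machine imitation of the map, shuffle and reduce steps used by Hadoop's MapReduce processing model. It does not call any Hadoop API; the function names, the sensor IDs and the readings are all hypothetical, and the "workers" are simulated as plain lists rather than real cluster nodes.

```python
from collections import defaultdict


def split_into_chunks(records, num_workers):
    """Deal records round-robin into one chunk per simulated worker node."""
    chunks = [[] for _ in range(num_workers)]
    for i, record in enumerate(records):
        chunks[i % num_workers].append(record)
    return chunks


def map_phase(chunk):
    """Emit a (sensor_id, reading) pair for every record in one chunk."""
    return [(sensor_id, reading) for sensor_id, reading in chunk]


def shuffle(mapped_chunks):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's values to one result, here the maximum reading."""
    return {key: max(values) for key, values in grouped.items()}


def max_reading_per_sensor(records, num_workers=3):
    chunks = split_into_chunks(records, num_workers)
    # On a real cluster each chunk would be mapped on a different machine.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Hypothetical sensor readings: (sensor_id, temperature)
readings = [
    ("engine-1", 612.0),
    ("engine-2", 598.5),
    ("engine-1", 640.2),
    ("engine-2", 601.1),
    ("engine-1", 633.7),
]
result = max_reading_per_sensor(readings)
print(result)
```

Because each chunk is mapped independently, adding more low-cost machines lets the same job handle more data. Only the grouping step needs to bring values for the same key together.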
A variety of data
Clients wishing to maximize the value of Big Data for their business will need a combination of technologies in addition to Hadoop. Understanding the variety of data is just as crucial.
Structured information is only 20% of the picture. Carl Kinson, Enterprise Intelligence and Data global product manager at CSC, says that unstructured sources such as Facebook posts, Twitter feeds, emails, blogs, websites and more are examples of the “variety” of Big Data.
“Big Data has existed for some time,” Kinson says. “But organizations hadn’t realized its potential simply because we lacked the means to extract real value from it. That has changed. An onslaught of new tools and techniques is revealing previously hidden patterns in structured and unstructured data that aid in everything from brand management to improving healthcare outcomes to crime prevention.”
Big Data storage
After assessing a business’s needs and data sources, CSC may recommend leaving the data where it is. “Most publicly available data is not only too massive to move and ingest into your own system, but it simply may not need to be stored in your own organization,” Kinson says. “CSC solutions can analyze the data at the source or work with data providers to filter the quality and volume of data into the business. Your need to store data depends on whether it needs to be reused or [if it needs to be stored] for traceability and/or regulatory compliance.”
Tools are now available to interrogate data at the source. For example, the real value of user-generated data lies in sampling postings to understand what people are talking about. Kinson says some companies start by using social media analytics, which can identify consumer attitudes about specific brands and products.
“They’re becoming very adept at evaluating customer statements even when a customer writes ‘This toothpaste is the BOMB.’ Social analytics can tell you that’s a good thing,” he says.
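As a rough illustration of how a tool might read “the BOMB” as praise, the sketch below scores a post against a small word list and lets known slang phrases override the literal meaning of their words. This is a toy under stated assumptions, not how any CSC or commercial social analytics product works: the lexicon, the slang table, the scores and the function names are all hypothetical.

```python
import re

# Hypothetical word scores: positive numbers are favorable, negative unfavorable.
LEXICON = {"great": 1, "love": 1, "awful": -1, "hate": -1, "bomb": -1}

# Hypothetical slang phrases whose meaning differs from their literal words.
SLANG_PHRASES = {"the bomb": 2}


def score_post(text):
    """Return a sentiment score, checking slang phrases before single words."""
    lowered = text.lower()
    score = 0
    for phrase, value in SLANG_PHRASES.items():
        score += value * lowered.count(phrase)
        # Remove the phrase so its words are not scored a second time.
        lowered = lowered.replace(phrase, " ")
    for word in re.findall(r"[a-z']+", lowered):
        score += LEXICON.get(word, 0)
    return score


def label(text):
    """Map a numeric score to a coarse sentiment label."""
    score = score_post(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("This toothpaste is the BOMB"))
print(label("The product launch was a bomb"))
```

Matching phrases before single words is the design choice that matters here: without it, “bomb” on its own would drag the toothpaste review toward a negative score.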