Open Source

The Elephant in the Room: What's the Scoop on Hadoop

By Paula Bernier, Executive Editor, TMC  |  October 15, 2012

This article originally appeared in the October 2012 issue of INTERNET TELEPHONY.

Big data is one of the hottest catch phrases in information technology today. The term expresses the existence of the large, and growing, collection of information amassed by many organizations. That may involve both structured and unstructured data.

Conventional wisdom would lead you to believe that the more data and data types are involved, the more expensive and complex it is to analyze data. But that’s not necessarily so, says Jack Norris, vice president of marketing at MapR.

In fact, he says, Google is leading the way on new thinking that simple algorithms work better than more complex ones when you have lots of data.

To enable sophisticated but uncomplicated and relatively affordable data handling, and allow organizations to address unstructured data, which is growing faster than Moore’s Law, such leading companies as Amazon and Google are embracing Hadoop.

Hadoop is an open source technology that runs on commodity hardware and lets users run computation on the disks where the data resides rather than on network storage, which avoids unnecessary network traffic, Norris explains. Both aspects, he adds, make Hadoop more affordable than alternative solutions.

MapR offers an open, enterprise-grade version of the Hadoop distribution. The company has made Hadoop easier to develop, providing full dependability and performance, and also making it more open so you can use standard database files. Thousands of companies from the Internet, retail, financial services, oil and gas, telecom, media and other verticals use the company’s solutions to analyze data from a wide array of sources.

Hadoop and various tools designed to work with it can allow for sophisticated clustering, data mirroring and more. Such functionality can be used in a variety of applications. For example, some companies are using Hadoop to spot anomalies in credit card transactions that may indicate fraud.

Norris goes as far as to say that Hadoop is changing the face of analytics in organizations today.

Yahoo! and Google were among the early champions of Hadoop, a technology around which a variety of companies including Cloudera, Hortonworks and MapR have since built businesses.

Doug Cutting, former Yahoo! engineer and current Cloudera executive, led the Yahoo! work on Hadoop. In fact, Hadoop is named after Cutting’s son’s stuffed elephant.

Google pioneered the use of MapReduce, which enabled the fledgling search engine to go from nineteenth place in the market to No. 1 in just two years. Norris says Google understood that given the sheer size of content on the web, the company needed to take a different approach. So Google took the cheapest hardware it could find and put small pieces of information on various machines, giving each machine a share of the processing job.

MapReduce, which works on anywhere from 50 to thousands of nodes, allows users to do massive analysis in parallel and hides that complexity from the developer, says Norris.
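To make the idea concrete, here is a minimal sketch of the MapReduce pattern as a word count in plain Python. It is an illustration only, not Hadoop's or MapR's actual API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and worker threads on a single machine stand in for the nodes of a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Assumption: each worker thread stands in for one machine in a cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    # Two small "pieces of information", one per simulated machine.
    print(word_count(["big data is big", "data about data"]))
```

The developer writes only the map and reduce steps. Splitting the input, running the pieces in parallel and grouping the intermediate results are what a framework such as Hadoop handles at cluster scale.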

All of that makes Hadoop an attractive candidate for cloud deployments. In fact, Amazon has extended its Elastic MapReduce service to include MapR’s Hadoop distribution, which is being offered, sold and supported as a service by Amazon to its customers. MapR will also make its distribution available on Google Compute Engine.

“Hadoop is now central to the big data strategies of enterprises, service providers and other organizations,” according to a first-quarter 2012 report from Forrester Research.

“Forrester regards Hadoop as the nucleus of the next-generation EDW in the cloud,” the firm added, noting that EMC Greenplum, IBM, Microsoft and Oracle are all evolving their enterprise data warehousing solutions to Hadoop.

Moves to expand the use cases for Hadoop, let people use standard tools like file browsers, offer standard database access and treat Hadoop as network storage are expected to spur even more widespread uptake of the technology, says MapR’s Norris.

IDC forecasts that the Hadoop ecosystem will be worth $800 million by 2016. Market Research Media Ltd. says the market for Hadoop-MapReduce will reach $2.2 billion in 2018, with the technology becoming the de facto standard for big data management and business intelligence.

Edited by Braden Becker