Big-data exuberance has surged with the recent news about how much money was raised by Cloudera, the frontrunner among start-ups distributing Hadoop, open-source software used for storing and parsing huge volumes of data. The total, mainly from Intel but also from venture capital firms, was $900 million, putting a value of $4.1 billion on the young company.
In a joint white paper written by Cloudera and Teradata in 2011, the conclusion was:
“Having both Hadoop and a data warehouse onsite greatly helps everyone learn when to use which.”
But this week Cloudera announced Cloudera Enterprise 5, and the borderline between Hadoop and data warehousing became fuzzier.
Teradata partnered with Hortonworks, one of Cloudera's competitors, to release Teradata QueryGrid.
Another big name among the smaller players is Pivotal. Its CEO is Paul Maritz, formerly the No. 3 executive at Microsoft and a native of Zimbabwe, a country I lived in during the seventies (when it was called Rhodesia). They offer “an easy way to buy not only Hadoop, but all the important layers on top of it.”
According to Nick Rouda, a big data warehousing consultant, Pivotal is making its bid with the bundling strategy. “It is offering a broader platform than the Hadoop players like Cloudera and Hortonworks, positioning itself as a one-stop shop,” he said.
SAS is also changing its data mining products to include Hadoop.
From now on the game is big business, not technology. Gartner has previously estimated about 1,000 accounts across all Hadoop vendors. Oracle has 400,000 accounts. IBM probably has a similar number.
Will the traditional data warehousing vendors, like Teradata Labs, IBM and Oracle, be under threat from the Hadoop newcomers (Cloudera, Hortonworks, MapR Technologies, Pivotal, etc.)? No, but the newcomers will erode data warehousing profit margins.
Big Data is organically part of cloud technology, which Oracle first dismissed and then put all its resources into. As I wrote in my blog:
“Oracle will not be able to build a cloud business without eroding its own mighty enterprise software revenues.”
Looking at Oracle's big data pages, we read:
“learn how Oracle engineered systems -- powered by Intel® Xeon® processor E5 and E7 families -- are designed to access, analyze, and store data more efficiently and cost-effectively to uncover new business insights.”
Sure, all is beautiful, except that Intel now owns 18% of Cloudera, and I expect that Cloudera will soon run best on Intel platforms, because the devil is in the details.
This "must have Hadoop" craziness is tempered by reality. Big Data does not predict the future by itself. Even Google Flu Trends, the darling of Big-Data-as-Messiah speech writers, turned out to have predicted 30% to 50% more cases of flu than actually occurred.
Hadoop actually represents the "Distributed Computing" paradigm, and you could think of it as a subset of Grid Computing. And Grid Computing is the originator of cloud computing.
Grid Computing experts can run a grid, but they are the only ones who can. See a sample message from HTCondor users. Oh My God! Grid computing was too complex and too geeky to reach wide adoption.
Hadoop is less difficult to use, but not by much; see the Cloudera Hadoop Hackathon pictures. These photos seem to be taken from the first episode of HBO's Silicon Valley, which you can watch on YouTube.
Thomas Middleditch, left, and Josh Brener in “Silicon Valley,” a new HBO comedy.
She who offers the best user experience wins.