Saturday, July 24, 2010

Remote Revolution, HPC in the Cloud & my guest blog

My latest blog post is featured in HPC in the Cloud:

Many thanks to Nicole Hemsoth, the editor of the publication. Nicole originated the idea of the Remote Revolution:
The Remote Revolution is defined and perpetuated by the contracting economy—a process that in itself is revolutionary because it represents a necessary downshift in unsustainable levels of consumption and waste....
The Remote Revolution is defined as a movement away from traditional, stagnant modes of work that emphasize and value the judgments of vast hierarchies of management as they observe and monitor perceived productivity.
The Remote Revolution is defined by its emphasis on humanity and family, thus it is devaluing the Protestant work ethic that propelled this country forward (or so grandpa always said) with long hours and a corporate culture that actively eschewed personal and family time.
The Remote Revolution encompasses concepts of worker autonomy, proactive environmental change, social and community restructuring, family and flexibility, sustainability in all areas…
It is going to change the world.

HPC on-demand is another proof, a continuation, of the Remote Revolution Nicole envisages. HPC on-demand is a non-traditional, non-stagnant mode of work that offers outright value to end users. It disintermediates the layers of bureaucracy and inefficiency involved in creating HPC facilities. We can have more wealth, more innovation, and more democratic access to resources that until now were out of reach for ordinary computer users.



Nicole's thoughts will be the subject of a forthcoming book, and HPC on-demand is more evidence that our lives can become richer and more meaningful.

Wednesday, July 14, 2010

Amazon's New HPC Cluster Compute Instances: How Much Do They Cost?



I don't think many people realize, from just reading the announcement, the high costs involved in placing an HPC data center on AWS.

See the blog post from Jeff Barr at
http://bit.ly/99zipE:

"Each Cluster Compute Instance consists of a pair of quad-core Intel "Nehalem" X5570 processors with a total of 33.5 ECU (EC2 Compute Units), 23 GB of RAM, and 1690 GB of local instance storage, all for $1.60 per hour."

The price list for Cluster Compute Instances (CCI) is at
http://bit.ly/dbdyQ2. Note that a Reserved Cluster Compute Instance (not on-demand) is priced at $4,290 upfront for a 1-year term plus 56 cents per hour (an hourly rate about one third of the $1.60 on-demand rate), or $6,590 upfront for a 3-year term plus the same 56 cents per hour.

Armed with this information, let's price, on demand and without any time commitment, the configuration that ranks #146 on the TOP500 list, described below:

" We ran the gold-standard High Performance Linpack benchmark on 880 Cluster Compute instances (7040 cores) and measured the overall performance at 41.82 TeraFLOPS using Intel's MPI (Message Passing Interface) and MKL (Math Kernel Library) libraries, along with their compiler suite. This result places us at position 146 on the Top500 list of supercomputers.

The on-demand price for 880 CCIs is then 1.60 × 24 × 30 × 880 ≈ $1 million per month, or about $12 million per year at 100% utilization. This is quite mind-boggling.

If we want reserved instances instead, the hourly cost is 0.56 × 24 × 30 × 880 ≈ $0.35M per month, or about $4.3 million per year, PLUS the upfront fee for each instance. For 880 instances the upfront payment is 880 × 4,290 ≈ $3.8 million. This gives 4.3 + 3.8 = $8.1 million for the first year of reserved CCIs.

Sure, $8.1M is lower than $12M, a saving of about one third. Note the "savings", if we can call them that, are even higher if the commitment is for 3 years.
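For readers who want to replay the arithmetic, here is a minimal sketch of the cost model in Python. The rates are the 2010 list prices quoted above; 30-day months and 100% utilization are assumed, as in the text:

    # Back-of-the-envelope cost model for the 880-instance Linpack cluster.
    HOURS_PER_MONTH = 24 * 30
    INSTANCES = 880

    on_demand_rate = 1.60    # $/hour per instance, on demand
    reserved_rate = 0.56     # $/hour per instance, 1-year reserved term
    reserved_upfront = 4290  # $ upfront per instance, 1-year term

    on_demand_year = on_demand_rate * HOURS_PER_MONTH * 12 * INSTANCES
    reserved_year = (reserved_rate * HOURS_PER_MONTH * 12 * INSTANCES
                     + reserved_upfront * INSTANCES)

    print("On-demand, 1 year: $%.1fM" % (on_demand_year / 1e6))  # ~$12.2M
    print("Reserved,  1 year: $%.1fM" % (reserved_year / 1e6))   # ~$8.0M
    print("Savings: %.0f%%" % (100 * (1 - reserved_year / on_demand_year)))
    # At 41.82 TFLOPS, the on-demand cluster costs roughly $34 per TFLOPS-hour:
    print("$/TFLOPS-hour: %.2f" % (on_demand_rate * INSTANCES / 41.82))

At lower utilization the on-demand bill shrinks proportionally, while the reserved upfront fee does not, so the break-even point depends on how many hours per year the cluster actually runs.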

What does this mean?

o Prices are easy to understand and use
o Prices are designed to encourage long-term commitments
o Prices can easily evolve into high, even out-of-control, fees
o Ease of use will democratize HPC user access
o Many champion users will try to negotiate discounts
o Profits are so attractive that other players will jump in

Overall, good news. However, to be able to use optimal scheduling policies in hybrid HPC clouds, billing awareness is NOW a sine qua non. The comparison between on-site HPC costs and the fees paid for Cluster Compute Instances should be part of the cloud management software.

2 cents

Miha

Thursday, July 08, 2010

Making big money with Hadoop

Hadoop Summit 2010 in Santa Clara was like a shot of adrenaline to a lethargic Silicon Valley. Sold out, booming with developers, and with would-be investors in sessions that filled the rooms to the brim. Imagine you wake up with a second chance. This was the Hadoop Summit, Santa Clara, June 29, 2010.

Who has the greatest opportunity with Hadoop? Yahoo? Google? Facebook? I believe they already cash in on this technology. The biggest revenue opportunity is for database companies, particularly for the market leader, Oracle.

There are a few bloggers covering what actually happened, but none of them foresee the huge monetization future of this technology. As Shevek Mankin, the CTO of Karmasphere, says: it is easy to set up a Hadoop cluster and put the data in, but how do you take the data out? This is the crux of MapReduce technology: it is not, in spite of claims to the contrary, mature enough to conquer the Enterprise.

Basically, all Hadoop applications collect huge streams of data, classify them via MapReduce, and place them in a structured database using some form of intelligence.
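As a concrete, if simplified, illustration of that pattern, here is a sketch in the Hadoop Streaming style, written in Python. The one-event-per-line log format and the categories are my own hypothetical examples, not anything shown at the Summit:

    #!/usr/bin/env python
    # mapper.py -- reads raw events ("timestamp<TAB>url", one per line),
    # classifies each one, and emits "category<TAB>1".
    import sys

    def classify(url):
        # Stand-in for "some form of intelligence": a trivial rule set.
        if "/video/" in url:
            return "video"
        if "/search" in url:
            return "search"
        return "other"

    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed records
        print("%s\t1" % classify(parts[1]))

    #!/usr/bin/env python
    # reducer.py -- sums the counts per category. Hadoop Streaming sorts by
    # key, so identical keys arrive contiguously; the aggregated rows can
    # then be bulk-loaded into a structured database.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print("%s\t%d" % (current_key, count))
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print("%s\t%d" % (current_key, count))

The two scripts would be launched with something like hadoop jar hadoop-streaming.jar -input events/ -output counts/ -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py. The hard part, as Shevek points out, is everything after this step: getting the reduced data out and into a form the business can actually query.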


Open-source tools like Oozie, a workflow system for managing Hadoop jobs (including HDFS), are nice, but what is the business model? This is all the more frustrating as one can see the billions and billions of dollars in revenue and valuation at social-site companies like Facebook (recently valued at $24 billion), Netflix, Yahoo, and Google. What about the Enterprise, where most of the wealth in our society is created?

Cloudera and Karmasphere want to sell supported Hadoop distributions and developer tools. Their market is limited by the minimal sales coverage these companies have in enterprise settings.

So here is the other way around: IBM plans its own supported Hadoop distribution and presented at the Summit a do-it-yourself analytics tool based on Hadoop. It has “an insight engine, for allowing ad-hoc business insights for business users – at web scale. It allows access to embedded unstructured data, previously un-available to analyze”.

The most puzzling and conspicuous thing was the de facto absence of Oracle at the Hadoop Summit 2010. If anyone from Oracle attended, it was probably in stealth mode :-)

Assuming Oracle can productize Hadoop-based analytics at web scale, they can sell the add-on to all their enterprise database users. Oracle, according to Gartner's 2009 figures:

  o  Is #1 in worldwide RDBMS software market share by top five vendors
  o  Holds more market share than its four closest competitors combined
  o  Is #1 in total software revenue for Linux and Unix with 74.3 per cent and 60.7 per cent market share respectively


Assuming $24B per year in total Oracle revenue, can you imagine having a Hadoop product to complement the existing $10B-a-year database income alone? Note this is a yearly amount; the installed base built over the last five years should be at least $40-50B. Assuming a 1% attach ratio, they could sell Hadoop web-scale analytics tools for roughly $500 million per year, growing to $5 billion if the attach rate is 10%. What if the attach rate is 20%?
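The attach-rate arithmetic, as a tiny sketch; the $50B installed base is this post's rough assumption (five years at ~$10B/year of database sales), not a Gartner figure:

    # Hypothetical attach-rate model for a Hadoop analytics add-on.
    installed_base = 50e9  # dollars -- assumed: 5 years x ~$10B/year

    for attach_rate in (0.01, 0.10, 0.20):
        revenue = installed_base * attach_rate
        print("attach %3.0f%% -> $%.1fB/year" % (attach_rate * 100, revenue / 1e9))
    # attach   1% -> $0.5B/year
    # attach  10% -> $5.0B/year
    # attach  20% -> $10.0B/year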

At that level, it would be the biggest money-making product built on Hadoop technology outside the social-networking industry.

There is simply no other product, IMO, in Oracle's portfolio that can provide this growth. Oracle has a Grid Engine team, acquired recently with Sun; Sun Grid Engine was integrated with Hadoop in December 2009. A significant chunk of Oracle's Hadoop know-how comes from the Sun merger.

The first step is not engineering, but customer research among their corporate database customers, to determine the minimum set of features customers need and are enchanted with. Making a product wanted through astute customer research is not, so far, the focus of the Hadoop Summit developers.

References:

1. Hadoop Tutorial: http://thecloudtutorial.com/hadoop-tutorial.html
2. IBM BigSheets: http://www.slideshare.net/ydn/1-ibm-disruptiveapplicationwithhadoophadoopsummit2010
3. Oracle Grid Engine: http://www.oracle.com/us/products/tools/oracle-grid-engine-075549.html
4. Ahrono Associates: http://ahrono.com

Miha Ahronovitz



