Tuesday, August 27, 2013

How to sell performance computing in 2013

I would rather gamble on our vision than make a ‘me, too’ product. 
Steve Jobs, as portrayed in the movie Jobs


  1. Offer Open-Ended Performance Solutions, not products
  2. Listen
  3. Know who you want to please
  4. Be different: resist the hype temptations
  5. Offer the best there is in technology, wherever you find it
  6. Make User Experience a priority
  7. Generate and collect "Aha" testimonials
  8. Seed for follow up business

Until now, the consumers of HPC (High Performance Computing) have been in the realm of government (mainly defense, intelligence agencies, and science such as energy, agriculture, and weather), ambitious political leaders, and academia. Large system builders with deep pockets make up the supply side, often bearing a substantial part of the costs and hardly making any profit.

High Performance Computing is just one of the terms used to describe problems so big that ordinary enterprise computing cannot handle them. The machines are usually the supercomputers ranked on the TOP500 list. The current number 1 is Tianhe-2. This is not a computer normal HPC customers can handle: over three million cores (about the same as the population of Los Angeles), no public figure for what it cost to build, and no known details about its software or applications.

Fig. 1: The Demographics - Behavior - Goals for classic HPC Super Computing market
Besides HPC, we have HTC (High Throughput Computing), a technology used extensively in the search for the Higgs particle at CERN.

Then we have Big Data, a buzzword in the industry, but very real in high energy physics. IDC's latest HPC sites report "reveals that two-thirds of high-performance computing sites are conducting big data analysis as part of their HPC workloads".

How do we recognize the people interested in performance computing?

In 2013, the Demographics - Behavior - Goals of performance computing might look like this:
Fig. 2: Sample Demographics - Behavior - Goals for high performance computing
Each organization selling high performance computing must create a similar chart. This chart is dynamic and must be reviewed regularly. It is not so much marketing as it is intuition.

Offer Open-Ended Performance Solutions, not products

No organization or company has a complete portfolio of products and services to satisfy every high performance need. I look at HP.com and the way they present their grid computing. The link to "What is Grid Computing" takes one to a page of enterprise marketing fluff. The link to Grid innovation for cloud computing leads to this page:

Fig. 3: HP grid innovation page: "oops, we can not find this page for you"
The model in the picture is still waiting for it

Listen

You may hear from someone who absolutely needs next-generation sequencing (NGS). NGS is computation intensive and requires cluster computing. Eventually all genome research will be NGS.

Or you hear of some cancer research that, we are told, would require nearly 40 years of processing at a cost of $44 million. What about reducing that $44 million to about $5,000 and getting the results in a matter of hours, not 40 years?

A large financial corporation uses Monte Carlo simulations to produce a risk report that requires 2.5 million compute hours on a single machine. What about doing it over the weekend?
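
The reason this is plausible is that a Monte Carlo risk run is "pleasantly parallel": every scenario is independent, so the work splits cleanly across as many cores or nodes as you can get. Here is a minimal sketch in R, using only the base parallel package on local cores; the toy loss model and the scenario count are made up for illustration:

library(parallel)

# Toy loss model: one simulated portfolio loss per scenario
# (an illustrative stand-in for a real pricing model)
simulate_loss <- function(seed) {
  set.seed(seed)
  sum(rnorm(1000, mean = 0, sd = 1))
}

cl <- makeCluster(detectCores())       # on a real cluster these would be many more workers
losses <- parSapply(cl, 1:10000, simulate_loss)
stopCluster(cl)

quantile(losses, 0.99)                 # e.g. a 99% Value-at-Risk estimate

Nothing in the script changes, in principle, when 10,000 scenarios become millions and the local cores become cloud nodes; that is the whole pitch of utility HPC.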

Know who you want to please

In performance computing, it is rarely the CEO. It is the strategists and scientists who make things happen and who are hands-on: the ones who will reassure the CEO that you know what you are doing.

Be different: resist the hype temptations

Assume no other solutions are good enough. Deep in your heart, how would you do it? For a moment, let all the letters and acronyms (HTC, HPC, grid, cluster) mean nothing. Here is a quote from the recent paper Perspectives on Grid Computing:
We should not waste our time in redefining terms or key technologies: clusters, Grids, Clouds... What is in a name? Ian Foster recently quoted Miron Livny saying: "I was doing Cloud computing way before people called it Grid computing", referring to the ground breaking Condor technology. It is the Grid scientific paradigm that counts!
Quoting Miron Livny
Solving “real-life” end-to-end problems makes you hype-resistant.

Offer the best there is in technology, wherever you find it

Now, once everything is clear, make the proposal. The best assets in performance computing are your engineers and consultants. People buy software to have your know-how behind it.

Utility HPC is a reality. What is not yet a reality is its widespread usage. For now, the ability to offer utility HPC / HTC / Big Data processing is a big competitive advantage.

Always consider open source software. For example, the open source R statistical software has over 50% market share compared to the commercial offerings (SAS, SPSS).

Keep the bottom line in the black.

Make User Experience a priority

The ISO 9241-210 standard defines user experience as:
"A person's perceptions and responses that result from the use or anticipated use of a product, system or service"
The technology to achieve this result is not new. It is based on the teachings of the Stanford Persuasive Technology Lab. Quote:
The Stanford Persuasive Technology Lab creates insight into how computing products ... can be designed to change what people believe and what they do
There is no other field in computing so ripe for this change. 

Generate and collect "Aha" testimonials

Ask: How do you go from "I'll try this out" to "This is amazing"?

Seed for follow up business

Your body language must say the same invisible words at all times: "We are nice people to do business with."

Monday, August 26, 2013

Is Windows hard to use?

In my blog entry of July 14, 2013 I commented on Microsoft's efforts to reinvent itself. It did not work. Mr. Ballmer, 57, announced he will resign "within a year." He is worth $11 billion. Microsoft made its money by copying the Apple Macintosh interface, calling it Windows 3.0 (which had a general system failure every 30 seconds or so), then 3.1, and made the computer accessible to anyone from kindergarten to an old age home. People will put up with anything, as long as it is easy to use. It made everyone feel good.

The user experience (UX)  is what made Windows, Windows.

We live in 2013, when expanding desktop capabilities to reach clouds and clusters is a must. But in doing so, Microsoft takes a step back. It trades simplicity for complexity. It destroys the very reason Windows became Windows and succeeded.

Personally, I would take the entire board of Microsoft to watch the movie Jobs. It does not matter whether the movie is good or bad. It does not matter whether one likes Ashton Kutcher or not. It is the spirit of this movie that Microsoft must absorb and apply in its own unique way of doing things.


Below is just a sample of the know-how needed to use Windows in Performance Computing.
Oh My God!
I have a cluster with local IP addresses as 192.168.1.1~192.168.1.10 (node name: N01~N10) and every node has the Windows 7 64bit installed. I built the program by VS2010 (C++)+Intel Fortran+Intel MPI. Currently I launch my program by Intel MPI with the following command:
mpiexec -wdir Z:\ -hosts 10 n01 12 n02 12 n03 12 n04 12 n05 12 n06 12 n07 12 n08 12 n09 12 n10 12 -mapall Z:\test

Now the problem is that with the same parameters to program 'test', sometimes the program test is OK but sometimes it has the following error message:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)......................:
MPID_Init(195).............................: channel initialization failed
MPIDI_CH3_Init(106)........................:
MPID_nem_tcp_post_init(344)................:
MPID_nem_newtcp_module_connpoll(3099)......:
recv_id_or_tmpvc_info_success_handler(1328): read from socket failed - No error
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
or the following error message:
*********** Warning ************
Unable to map \\n01\Debug. (error 71)

*********** Warning ************
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N09' failed, error 2 - The system cannot find the file specified.

launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N07' failed, error 2 - The system cannot find the file specified.

launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N02' failed, error 2 - The system cannot find the file specified.

*********** Warning ************
Unable to map \\n01\Debug. (error 71)

I don't know what could lead to these problems. 

Thursday, August 22, 2013

Amazon Web Services: We hate you and We love you.

The 2013 Magic Quadrant for Cloud Infrastructure as a Service from Gartner is out.

In my blog entry Part 2. Getting out of the Trough of Disillusion - Will cloud computing be adopted massively in 2011? I said:
Ability to Execute and Completeness of Vision: these are the two axes of the Magic Quadrant. This is, in a nutshell, how the companies are evaluated:
  •  Market Understanding and Product Strategy are the highest ratings possible in Completeness of Vision
  • Product/Service and Customer Experience are the highest ratings for Ability to Execute
To get into the data centers, we need to understand this market the way it is now, setting aside our belief in the Nirvana that cloud computing will bring. In fact, we need to completely forget any solutions we have in mind when interviewing a significant customer for product management purposes. We need to be humble and respect all the classic data center practitioners.
See below the succession of Magic Quadrants for 2010, 2012 and 2013.
Fig. 1 Gartner Magic Quadrant Cloud December 2010
Note AWS in the middle of the pack, with a below-average ability to execute.

Fig. 2 Gartner Magic Quadrant Cloud October 2012
Note how AWS takes off. The 2010 judgement that AWS had no ability to execute probably motivated Jeff Bezos.

Fig. 3 Gartner Magic Quadrant Cloud August 2013.
AWS distances itself further from the pack, which seemingly moves backwards. Perhaps all the others were unable to update their visions?
There are a few words we can add. Perhaps a quote from the Steve Jobs movie, as played by Ashton Kutcher:
I would rather gamble on our vision than make a ‘me, too’ product. 
Now to Jeff Bezos. From The New York Times - things were not always rosy for Mr. Bezos:
In 1999, Amazon, among the most celebrated of the dot-coms, was burdened with debt and spiraling losses. Jeff Bezos, its founder and chief impresario, had to impress Wall Street that he was serious about cutting costs.
So what did Mr. Bezos cut? He eliminated the free aspirin for employees.

This is what every company on the Magic Quadrant should do. There must be something in cloud computing other than what AWS is doing so well. Corporations are not very good at discovering what it is.

We go back to the Steve Jobs movie. These are the people we need in order to break through:
Here’s to the crazy ones. The misfits. The rebels. The trouble-makers. The round pegs in the square holes. The ones who see things differently…they change things. They push the human race forward. And while some may see them as the crazy ones, we see genius.
Watch the trailer below from the Jobs movie. Skip the ad and get to the actual scene. Corporate employees are trained to take no risks. Most of the people inside the pack, with a few exceptions, work for such organizations. What we learn from the clip is that we must take risks to be somebody, face failure, and get over it. We don't have to catch up with AWS. There are many ways to do cloud differently, add value to existing clouds, or simply discover something new that no one has thought of before.

Friday, August 16, 2013

Pleasantly Parallel Computing.

Jason Stowe, CEO of Cycle Computing, will be the keynote speaker at ISC Cloud'13 this September 23-24. Below are some excerpts from his interview with Nages Sieslack.

Jason calls HTC (high throughput computing) "pleasantly parallel computing."

The terminology is not new. The term "pleasantly parallel computing" was coined by Miron Livny, the CTO of the Open Science Grid, at the SciDAC PI Meeting in Boston, June 2007. You can see Miron's slide presentation here.

Wikipedia also recommends using "pleasantly parallel computing" rather than "embarrassingly parallel computing".

It took six years for this terminology to reach the mainstream.

Maybe we should start a new acronym, PPC. Jason built his company on HTCondor, the open source product for HTC technology, implemented in the Open Science Grid. By offering commercial, "on-demand" HTC and HPC applications, Cycle Computing created a cloud. It is a new name for the same technology. Jason proved the practical benefits of the "pleasantly parallel" technology outside the realm of science and academia.

I can see a similar potential for Bosco, a product developed by the Open Science Grid. We try to tame the complexity and bring all these powerful tools to mainstream scientists. We try to make HTC and HPC Pleasant Computing.
Many use cases for cloud-enabled technical computing seem to be in the life science realm. What do you attribute this to? 
Stowe: Many of life science workloads, such as genome sequencing, or needle-in-a-haystack simulations like drug design are “pleasantly parallel” or high throughput, where computations are independent of each other. In the case of drug design, a cancer target is a protein that, much like a lock, has a pocket where molecules can fit, like keys, to either enhance or inhibit its function. The problem is, rather than the tens of keys on a normal key chain, you have tens of millions of molecules to check. Each one is computationally intensive to simulate, so in this case, a drug designer has approximately 340,000 hours of computation, or nearly 40 compute years, ahead of herself. With utility HPC, what would have taken a year to set up at a price-tag of $44 million, this drug sequencing workload completed in just 11 hours at a cost of $4,372. Without utility HPC, it’s safe to say this science would never happen. Even though life science was a logical proving ground for HPC in the beginning, other industries - financial services/insurance, EDA, manufacturing, and even energy, are now capitalizing on these kinds of benefits.
What other types of HPC applications or industries do you think are most suitable for the utility model of computing at this point?  
Stowe: We think that utility HPC will be the single largest accelerator of human invention in the coming decades. We have many use cases – energy, manufacturing, financial services, and many more - that prove how most modern sciences, especially Monte Carlo or data parallel simulations, work great in the cloud. Researchers, quants, and scientists of all disciplines can now execute computational science and complex finer-grained analysis that was previously unapproachable due to cost or overhead. Consider the impact on Financial Services as an example: a Fortune 100 firm uses HPC in the cloud to launch its monthly risk report – a 2.5 million compute hour Monte Carlo simulation that now completes over a weekend. A Fortune 1000 life insurance firm dynamically hedges risk across its entire portfolio, with nested stochastic on stochastic models and billions of inner paths for each annuity. Even at smaller scales, where scientists can start work in 10 minutes instead of waiting 6 weeks to get five servers, great science can now be done in a wide range of industries and applications.
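
A quick back-of-the-envelope check on the numbers in this interview (my arithmetic, not Cycle Computing's) shows why these workloads only make sense at very large core counts; the 60-hour weekend window is my assumption:

# Drug design example: 340,000 compute hours
340000 / (24 * 365)      # ~38.8 "compute years", matching "nearly 40"
340000 / 11              # ~31,000 concurrent cores to finish in 11 hours

# Monthly risk report: 2.5 million compute hours over a weekend
weekend_hours <- 60      # Friday evening to Monday morning (assumption)
2.5e6 / weekend_hours    # ~42,000 concurrent cores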

Sunday, August 11, 2013

What does a Data Scientist do?

BoscoR is a solution for scientists - users of the open source R software - who want to expand their computing resources beyond a desktop or a single cluster, yet with the best user experience. As of today, many scientists do not know how to move to a cluster and work with it as easily as with a laptop.
Q: Do I need to be a computer genius to set up a cloud cluster?
A: If you want a whole cluster of machines in order to parallelize your jobs, you will probably require a little more technical expertise than for a single machine - certainly for the initial setup. Then again, once the cluster is up and running, there are many tools and packages facilitating the parallelization of jobs for the end-user.
We hope BoscoR, using the GridR package, will become one of the simplest to use. Why do we believe this?

Just read Derek Weitzel's blog on R.
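
To give a feel for what "parallelization of jobs for the end-user" looks like from the R prompt, here is a minimal sketch that uses only base R's parallel package on local cores; GridR and BoscoR aim at the same apply-style experience but ship the work to a remote cluster instead (their exact calls are not reproduced here):

library(parallel)

# A self-contained task: fit a regression on one bootstrap resample of a built-in data set
fit_one <- function(i) {
  d <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  coef(lm(mpg ~ wt + hp, data = d))
}

cl <- makeCluster(4)                    # four local workers here; a cluster in the BoscoR case
fits <- parLapply(cl, 1:200, fit_one)
stopCluster(cl)

# Bootstrap distribution of the weight coefficient
summary(sapply(fits, function(co) co["wt"]))

The point is that the end-user writes an apply call, not job scripts; the hard part is making that same line of R fan out to thousands of remote cores, which is exactly what BoscoR is after.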

But who are the data scientists today, and what is the role of statisticians in the Big Data hoopla?


Big Data [sorry] and Data Science: What Does a Data Scientist Do? from Data Science London

There is a big debate about who the data scientists are. Are they statisticians or computer scientists? For now, only a tiny percentage of statisticians are literate in supercomputing, high throughput computing, and clusters.

Here is a reference to an article by Larry Wasserman which is worth reading:
Generally speaking, statisticians have limited computational skills. I saw a talk a few weeks ago in the machine learning department where the speaker dealt with a dataset of size 10 billion. And each data point had dimension 10,000. It was very impressive. Few statisticians have the skills to do calculations like this.
What do we do about it? Whining won’t help. We can complain that “data scientists” are ignoring biases, not computing standard errors, not stating and checking assumptions and so on. No one is listening.
First of all, we need to make sure our students are competitive. They need to be able to do serious computing, which means they need to understand data structures, distributed computing and multiple programming languages.
Second, we need to hire CS people to be on the faculty in statistics department. This won’t be easy: how do we create incentives for computer scientists to take jobs in statistics departments?
Statistics needs a separate division at NSF. Simply renaming DMS (Division of Mathematical Sciences) as has been debated, isn’t enough. We need our own pot of money. (I realize this isn’t going to happen.)
This is changing fast. Here is a quote from Joseph Rickert of Revolution Analytics:
R is a fundamental tool of modern computational statistics that provides the very bridge to data science... In recent years, survey after survey has singled out R as being one of the top tools for data scientists
See slide 26 of the ten things data scientists do. The last bullet, #10, is "Tell Relevant Business Stories from Data." No one can tell these stories without a creative mind that knows the nuances and finesse of mathematical insight; this is an art statisticians possess.

The wonders will happen as soon as R users no longer stumble for lack of computing resources.

Friday, August 02, 2013

The Kafkian Castle and the Digital Universe

40,000,000,000,000,000,000,000 bytes (40 zettabytes): that’s how much digitally stored data humankind will possess by 2020, according to IDC. That is nearly a 40% increase per year.

Intersect360 Research projects the HPC revenue compound annual growth rate to be 6.5% from 2012 to 2017, reaching $39.9 billion at the end of the forecast period.

Here we are: the Digital Universe grows nearly 40% per year, while the technology required to make sense of this enormous data grows only 6.5% per year.
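
The roughly 40% per year figure is a simple compound-growth calculation; the 2012 starting point of about 2.8 zettabytes is my assumption (IDC's earlier estimate), not a number from this post:

# Compound annual growth needed to go from ~2.8 ZB (2012, assumed) to 40 ZB (2020)
start_zb <- 2.8
end_zb   <- 40
years    <- 8
(end_zb / start_zb)^(1 / years) - 1    # ~0.39, i.e. roughly 39-40% per year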

I discovered that 99% of the world's scientists are supercomputer illiterate, and they do not have access to a large cluster or supercomputer.


Most supercomputers have restricted access and individual allocations. They are shielded by armies of computer scientists. The researchers must go through them. It is not like working with a Mac.

It is like being inside the village with the Castle - from Franz Kafka's novel - in the background. Everyone can see it, but no one knows how to get there.


The TOP500 number 1, Tianhe-2, is such a castle. No one knows how much it cost, no one is sure what software it uses or what applications it runs. But it is the fastest. Fastest for what?

