Friday, March 29, 2013

HPCwire Top Feature: HTC, Big Data and God Particle

Here is an excerpt

It is about time to give HTC (High Throughput Computing) the credit it deserves. Without this technology, the Higgs boson would have remained an elegant theory. Sooner or later, advances in computer science were bound to prove that the God particle exists. It so happened that the first breakthrough technology to get there was HTC.

Great physicists are like great prophets. Professor Higgs and his colleagues are among them. They detect "the unrevealed", those whispers requiring great effort to comprehend. The WLCG (Worldwide Large Hadron Collider Computing Grid) delivered the proof of their thought experiments while they are still alive. Never mind the Nobel prize speculations; this by itself is a huge reward.

Wednesday, March 27, 2013

Hope vs. Motivation: Why Big Data needs empathy and emotion

Because, says Om Malik, one of the most extraordinary thinkers in Silicon Valley and the founder of GIGAOM:
The problem with data is that the way it is used today, it lacks empathy and emotion. Data is used like a blunt instrument, a scythe trying to cut and tailor a cashmere sweater...
He concludes:
The idea of combining data, emotion and empathy as part of a narrative is something every company — old, new, young and mature — has to internalize. If they don’t, they will find themselves on the wrong side of history.
The first reaction of most people trained to follow procedures is disappointment. Derrick Harris writes:
"Some people say big data is wallowing in the trough of disillusionment, but that’s a limited worldview

(but) there are small pockets of technologists who are letting their imaginations lead the way. In a suddenly cliché way of saying it, they’re aiming for 10x improvement rather than 10 percent improvement. They can do that because they now have a base set of analytic technologies and techniques"
The big complaints about the social enterprise are usually about loss of privacy. But this is the surface, not the root cause.

It is the unintelligent way of using data, with no empathy and no deep insights. Absolutely no one will get any business from me by filling my mailbox (the snail mail box) with junk offers. And like me, there are many people sick of telemarketers' phone calls every evening at 8:00 p.m., or of broadcast prerecorded message reminders from schools and doctors.

Scientific data is no different. The amount of data collected at the ATLAS detector of the Large Hadron Collider (LHC) at CERN, Geneva, is described like this:
If all the data from ATLAS would be recorded, this would fill 100,000 CDs per second. This would create a stack of CDs 450 feet high every second, which would reach to the moon and back twice each year. The data rate is also equivalent to 50 billion telephone calls at the same time. ATLAS actually only records a fraction of the data (those that may show signs of new physics) and that rate is equivalent to 27 CDs per minute.
Sure, it is a technological advance to find the Higgs particle inside this massive data. Somebody must have a lot of Hope and Motivation.

Figure 2: The shape of the Hope versus Motivation curve can predict success

The Higgs particle discovery needs to be correlated with the human feelings behind the endeavor. I call them Hope and Motivation. Compare the Higgs curve to the medieval, failed quest for the philosopher's stone.

Paraphrasing Om Malik's insight: "The idea of combining data, emotion and empathy as part of a narrative is something every science research team has to internalize."

Saturday, March 16, 2013

Moving an application to distributed HTC

If you are technically fluent, this is a very useful, fresh-out-of-the-box presentation by Zach Miller.

He is NOT Zach Miller the football player. He is from the Center for High Throughput Computing (CHTC) at the University of Wisconsin–Madison. Zach presented it at OSG All Hands 2013 in Indianapolis last week.

Some problems can be solved with HTC as an alternative to HPC; some cannot. But when possible, HTC may offer surprising compute power at an unbeatable cost.

Saturday, March 09, 2013

Why Open Source Software?

I decided to look at something we all think we know well, Open Source Software (OSS), and see what I discover. The child in me said: "Do it".

The Strange and Not-So-Strange I uncovered

No. This is not a high energy physics paper. This is a study on OSS performance.

Figure 1: A sample from the paper Motivation and Sorting in Open Source Software Innovation, Belenzon and Schankerman, November 2012
I scratch my head at this bizarre equation of performance as motivation in OSS, wondering what these sigmas, alphas and betas have to do with success.

Looking at the Wikipedia entry on the Open Source Movement, we see that OSS is formally 30 years old, since Richard Stallman launched the movement, so to speak, with the GNU Project in 1983 (his Free Software Foundation followed in 1985). This is a venerable institution, beyond a media darling. I found this button on his web page:

He looks like a priest, an atheist priest who wants to impeach God. He never married, and apparently lives on air alone, as his position at MIT carries no salary. He says: "I do free software. Open source is a different movement."

Dirk Riehle from SAP (SAP? Yes.), in his Stakeholder Perspectives, gives us his angle on why Open Source Software (not Free Software) is good for business:
  • [OSS] "increases profits through cost savings and reach more customers due to flexible pricing. This has upset existing ecosystems and shuffled structural relationships, resulting in the emergence of firms providing consulting services to open source projects."
  • "Developers face new career prospects and paths, since their formal position in an open source project, in addition to their experience and capabilities, determines their value to an employer. Developers strive to become committers to high-profile open source projects to further their careers, for more recognition, independence, and job security."

Figure 2: Dirk Riehle showing why an OSS must make money in order to reach a wider user base

Ye and Kishida in their paper Toward an Understanding of the Motivation of Open Source Software Developers, show the solar system of an OSS:
Every developer wants to move into center, where the power and visibility and potential employment are maximized.
It is the desire to learn without being taught formally, says the Ye and Kishida study, that motivates people. They even have a fancy name for it: LPP.
Learning is one of the motivations that attract many users to become active contributors and drive them to contribute more to OSS systems; to understand how this learning takes place in communities of practice, they introduce the theory of Legitimate Peripheral Participation (LPP), developed by Lave and Wenger.
LPP says learners experience learning not as a result of being taught, but through direct engagement in the social, cultural, and technical practice of the community.
The conclusion of the Ye and Kishida study: OSS is a very complicated phenomenon that is related to technology, human behaviors, economics, culture, and society.

This is obvious.

My thoughts on Open Source Software 

For many years, OSS communities have written software meant to please each other within the community. Developers wrote to impress other developers. There was NO concern about who was going to use the software outside the group, their girlfriends, their wives and their friends.

Most Free Software projects fail. The reasons we read about are lack of coding skills and lack of project management skills; in essence, lack of technical skills.

No. The true reason is this:

No one, so far, in a true blue-blood, hardline open source project considers UX, aka User Experience. UX is not Usability, which is a bunch of metrics for making navigation of a UI easy.

UX is how users feel about the software being produced. When someone says: "I hate Windows",  or "I love Mac" or "I don't want command line" this is user experience.

In all those OSS motivational scholarly papers, no one ever mentions a UX contributor being motivated to participate. Nada. Zero. Zilch. Open source software must be wanted by outside users to survive and thrive.

Creating Habit and Desire

Not paying money is not, per se, a reason to use a piece of OSS software. Here is the Fogg Behavior Model, which tells us why.

User behavior is a function of motivation, ability and a trigger. This model works not only with software, but also with the decisions we make as humans.
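A toy sketch of that idea in code (the multiplicative form, the scales and the threshold value here are my own illustrative assumptions, not Fogg's formulation): a behavior occurs when a trigger fires while motivation and ability together clear an activation threshold.

```python
# Toy illustration of the Fogg Behavior Model: behavior as a function of
# motivation, ability and a trigger. The multiplicative combination and
# the threshold of 0.25 are illustrative assumptions, not Fogg's own.

def behavior_occurs(motivation: float, ability: float,
                    trigger: bool, threshold: float = 0.25) -> bool:
    """Return True if the behavior happens: a trigger fires while
    motivation * ability is above the activation threshold."""
    return trigger and (motivation * ability) > threshold

# High motivation but low ability: even a well-timed trigger fails.
print(behavior_occurs(motivation=0.9, ability=0.1, trigger=True))   # False
# Moderate motivation, an easy task, and a trigger: the behavior happens.
print(behavior_occurs(motivation=0.5, ability=0.8, trigger=True))   # True
# Without a trigger, nothing happens regardless of motivation and ability.
print(behavior_occurs(motivation=1.0, ability=1.0, trigger=False))  # False
```

The point of the sketch is the asymmetry: raising motivation alone is futile if ability is low, which is exactly why frictionless software wins users.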

I will let BJ Fogg explain it himself, using Facebook as an example. Based on this model, one could devise a strategy to get even Mr. Stallman to join Facebook, given the right trigger.

Bottom Line

Open Source Software is beneficial. Not only can it make money, it helps Nobel prize candidates actually win a Nobel Prize. Al Gore sits on Apple's board, something Richard Stallman never managed to do. Maybe Richard did not want it, although at age 60 people mellow a bit. But most of us do want to do good. There is no way to accomplish our mission while keeping our freedom if we starve to death, if we don't have children, if we don't get funding for our projects, and if no one uses what we invented and created ex nihilo.

Life is short. Just stop what you do now, and meditate for a minute. You see? Life is very short.

Tuesday, March 05, 2013

Some thoughts on the differences between HTC and HPC

High Throughput Computing (HTC) and High Performance Computing (HPC) represent two very different computational models, both in implementation and in the resources required to run them.

Quoting XSEDE, part of the national cyber-infrastructure for high performance computing: "HPC codes … are tightly coupled MPI, OpenMP, GPGPU, and hybrid programs. These codes require many low latency interconnected nodes." Because of this interconnect, HPC resources tend to be pricey. The Titan - Cray XK7, ranked number one in the TOP500 in November 2012, was a $90 million upgrade.
The top HPC supercomputers utilize codes that run on the entire system. In many cases HPC codes are not very portable, since both MPI and GPU libraries often have library- and/or machine-specific components. For example, Titan uses both CPU and GPU nodes. A light water reactor simulation called Vera will run on Titan, but "... the adaption to Titan's hybrid architecture is of greater difficulty than between previous CPU based supercomputers."
The Open Science Grid (OSG) is part of the national cyber-infrastructure, but is dedicated to HTC computing. The OSG pool is a Virtual Cluster overlaid on top of OSG resources aggregated from many sites using the OSG Glidein Factory. By definition, a glidein allows the temporary addition of a grid resource to a local OSG pool. The OSG Glidein Factory is the glidein-producing infrastructure, tasked to advertise itself and listen for requests from other OSG Virtual Organizations (VOs) or outside HTC users. It provides a way to run programs that utilize the spare capacity of a large number of resources in various locations.
OSG Factory Glidein
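For readers curious what handing work to such a pool looks like in practice, here is a minimal HTCondor submit description, the kind of file an OSG user would feed to condor_submit. The executable and file names (analyze.sh and friends) are hypothetical placeholders, not from any OSG documentation:

```
# Minimal HTCondor submit description (illustrative sketch).
# Queues 1000 independent instances of a hypothetical analysis script.
universe                = vanilla
executable              = analyze.sh
arguments               = $(Process)
output                  = out.$(Process)
error                   = err.$(Process)
log                     = jobs.log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
queue 1000
```

Each of the 1000 jobs is independent, which is precisely the workload shape that lets glideins scatter them across whatever spare capacity the pool finds.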

Any resource under one or limited ownership - be it a car, a laptop, a cluster or a data center, among many examples - inherently cannot be used 100% of the time. There is an enormous dormant capacity to be extracted from all the pools managed by OSG, which adds up to many millions of CPU hours available. The OSG HTC technology brings forth this hidden power, elevating the utilization of its managed resources as close to 100% as possible.
HTC is, by design, a system based on unreliable components. Give work out to every node and the results eventually come back. If some of the nodes fail, the jobs can be restarted on a different system. 
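A minimal sketch of that restart-on-failure idea (the task list, failure rate and "run" function are made up for illustration): hand jobs out from a queue, collect what comes back, and simply re-queue anything that failed.

```python
import random
from collections import deque

# Sketch of HTC fault tolerance: jobs are handed to unreliable nodes,
# and any job whose node fails is re-queued to run elsewhere.
# The 30% failure rate and the fake "node" are illustrative assumptions.

def run_on_some_node(job: int, failure_rate: float = 0.3) -> bool:
    """Pretend to run a job on a remote node; sometimes the node fails."""
    return random.random() > failure_rate

def run_all(jobs, max_attempts: int = 1000) -> list:
    queue = deque(jobs)
    done = []
    attempts = 0
    while queue and attempts < max_attempts:
        job = queue.popleft()
        attempts += 1
        if run_on_some_node(job):
            done.append(job)   # result came back
        else:
            queue.append(job)  # node failed: restart the job elsewhere
    return done

random.seed(0)
finished = run_all(range(10))
print(f"{len(finished)} of 10 jobs completed")
```

No single node needs to be reliable; the system as a whole converges anyway, which is what makes scavenged spare capacity usable.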
Many science problems can be adapted to HTC, often more easily than adapting existing codes for top-ranking HPC machines. HTC supports a new frame of mind that unleashes what I call "guerrilla" science. I am inspired by Greg Thain, an HTC evangelist at the University of Wisconsin:
As a researcher you are under constant pressure to deliver results from limited project funding. What would happen to your scientific project if computation were really cheap? (Because it is.) So try not to think about being constrained by the amount of computation you have locally. What would happen if you could run 100,000 hours, or one million hours? This is research. This is cheap. You can take risks. If you used 100,000 hours and still don't get the expected results, you still have the ability to analyze what happened and try again.
