Higgs Boson: Think HTC instead of HPC.
What is the Higgs Boson?
Scientists at the LHC (Large Hadron Collider) in Europe announced the appearance of a new particle among the debris of smashed protons. It is called the Higgs boson, and it is believed to be the secret force that confers mass on matter.
Most people, including most educated ones, have never heard of the Higgs boson, let alone understand why it was so hard to find.
Why has the Higgs been so hard to find? It is only produced at very high energies, such as those in the Big Bang or generated in a particle collider like the LHC, and it breaks down almost immediately into a shower of other particles. "The probability of making a Higgs is so small that you are looking for one collision out of 10 trillion,
Most people think HPC
"After an enormous effort by LHC experimenters, the CERN laboratory and worldwide Grid computing community we are very excited to observe an excess in our data from a new particle consistent with the production of a Higgs boson," says UW-Madison Bjorn Wiik Professor of Physics Wesley Smith, who plays a lead role in the CMS experiment. "We will need the additional data planned from the running of the LHC until next year to establish if this is indeed the Higgs boson and that we stand at the threshold of a new era of understanding the origins of mass."
Our cherished assumption is wrong
David Ungar, a manycore processor researcher, said during an interview:
The obstacle we shall have to overcome, if we are to successfully program manycore systems, is our cherished assumption that we write programs that always get the exactly right answers. This assumption is deeply embedded in how we think about programming. The folks who build web search engines already understand, but for the rest of us, to quote Firesign Theatre: Everything You Know Is Wrong!
Grid Computing versus Cloud Computing at CERN
When the grid computing infrastructure was created, it handled 15 to 20 petabytes of data annually. This year, CERN is on track to produce up to 30 PB of data. "There was no way CERN could provide all that on our own," says Ian Bird, CERN's computing grid project leader. Grid computing was once a buzz phrase, much as cloud computing is now. "In a certain sense, we've been here already," he says.
The entire grid has a capacity of 200 PB of disk and 300,000 cores, with most of the 150 computing centers connected via 10 Gbps links. "The grid is a way of tying it all together to make it look like a single system." Internally, CERN is running a private cloud based on OpenStack open source code. CERN and two other major European research organizations took steps to create a public cloud resource called Helix Nebula - The Science Cloud.
All is nice and groovy, but there is a small problem. As Ian Bird says politely, "we're just not sure of the costs and how it would impact our funding structure." "From a technical point of view, it could probably work," he says. "I just don't know how you'd fund it."
The French say: "Le bon Dieu est dans le détail" (the good God is in the detail). In English we say "the devil is in the details."
Thinking HTC (High Throughput Computing)
The unprecedented volume of computations for the Higgs Boson discovery was (and still is) carried out using the concept of HTC.
Open Science Grid services knit together researchers, many repositories of LHC data (UW–Madison is home to two research teams, one each for the two biggest experiments at LHC) and more than 100,000 computers at about 80 sites around the country.
“It’s also a huge triumph for mankind,” says Miron Livny, CTO at the Wisconsin Institute for Discovery. “There were more than 40 nations that came together for a long time to do this one thing that, even if it all worked out, wasn’t going to make anyone rich. It’s a powerful demonstration of the spirit of collaboration.”
This colossal computer power came almost for free.
HTC is about sustained, long-term computation. You might think the difference between sustained long-term computation and a short-term sprint is merely quantitative, but the difference really is qualitative. In essence, HTC is sustained throughput over long periods of time.
You would like to measure computational hours per day, per week, or per year. These numbers are so large that what we really care about is sustained hours. For example, OSG (Open Science Grid) delivers about 2,000,000 hours a day, give or take, or roughly 730 million hours per year.
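As a rough check of these figures, the short Python sketch below redoes the arithmetic. It assumes that an "hour" means one hour of computation on one core and that the daily rate holds for all 365 days of the year; both are assumptions made for illustration, not statements from OSG.

```python
# Figure quoted in the text: about 2,000,000 hours delivered per day.
HOURS_PER_DAY = 2_000_000
DAYS_PER_YEAR = 365  # assumption: the daily rate is sustained all year

# Sustained yearly throughput.
hours_per_year = HOURS_PER_DAY * DAYS_PER_YEAR

# Assumption: one "hour" is one core busy for one hour, so dividing by
# 24 gives the average number of cores kept busy around the clock.
equivalent_cores = HOURS_PER_DAY / 24

print(f"hours per year: {hours_per_year:,}")
print(f"average busy cores: {equivalent_cores:,.0f}")
```

Under those assumptions, the daily figure corresponds to tens of thousands of cores running continuously, which is why the sustained rate matters more than any short-term peak.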
OSG is an opportunistic resource, so there are never guarantees about available resources, but on average there is a tremendous amount of capacity there. Each site of OSG is autonomous, locally owned and operated.
Getting people to think in a high throughput way helps a lot. There are still many idle machines that anyone can access for free, but they are not HPC (High Performance Computing) resources. They may be idle for only an hour or two. If we have a single 10,000-hour job, it will never complete on the OSG. But if you are able to deploy the same task as a workflow of 10,000 one-hour jobs, you could finish in one day. Statistical and Monte Carlo techniques are often a very good fit for HTC, and they are similar to the time-consuming stochastic modelling used in the Higgs boson search.
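To make the idea of splitting one long task into many short jobs concrete, here is a minimal Python sketch. It uses a toy Monte Carlo estimate of pi, not the actual Higgs analysis, and the names `run_job` and `merge`, the seed scheme, and the job sizes are all hypothetical choices made for illustration. On a real HTC system each call to `run_job` would be submitted as a separate job; here they simply run one after another.

```python
import random


def run_job(job_id, samples_per_job, base_seed=12345):
    """One short, independent job: count random points that fall
    inside the unit quarter circle."""
    # Each job gets its own seed, so jobs share no state and can run
    # on any machine, in any order.
    rng = random.Random(base_seed + job_id)
    hits = 0
    for _ in range(samples_per_job):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def merge(hit_counts, samples_per_job):
    """Combine the per-job counts into a single estimate of pi."""
    total_samples = samples_per_job * len(hit_counts)
    return 4.0 * sum(hit_counts) / total_samples


# Hypothetical sizes: 100 small jobs instead of one large one.
n_jobs = 100
samples_per_job = 2000

# Stand-in for submitting the jobs to a pool of idle machines.
results = [run_job(job_id, samples_per_job) for job_id in range(n_jobs)]

pi_estimate = merge(results, samples_per_job)
print(f"estimate from {n_jobs} independent jobs: {pi_estimate}")
```

The design point is that no job depends on another, so a job that lands on a machine idle for only an hour can still finish, and a lost job can simply be rerun with the same `job_id`. As a back-of-the-envelope figure, finishing 10,000 one-hour jobs within a day needs roughly 10,000 / 24, or about 417, machines busy on average.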
Greg Thain, HTCondor guru, teaching "Think HTC" at OSG Summer School 2012
By Summer 2013 we will know
On January 26, 2013, the Washington Post wrote:
The timing could also help Scottish physicist Peter Higgs win a Nobel Prize.
The world should know with certainty by the middle of this year whether a subatomic particle discovered by scientists is a long-sought Higgs boson, the head of the world’s largest atom smasher said Saturday.
Rolf Heuer, director of the European Organization for Nuclear Research, or CERN, said he is confident that “towards the middle of the year, we will be there.” By then, he said, reams of data from the $10 billion Large Hadron Collider on the Swiss-French border near Geneva should have been assessed.
Professor Peter Higgs explaining what others call the "God Particle"
Unleashing "guerilla" science
This is what Greg Thain, the "Think HTC" lead evangelist, says:
You, Mr. Researcher, are under constant pressure to deliver results from limited project funding. What would happen to your scientific project if computation were really cheap? Because it is. So try not to think about being constrained by the amount of computation you have locally. What would happen if you could run 100,000 hours, or one million hours? This is research. This is cheap. You can take risks. If you use 100,000 hours and still don't get the expected results, you still have the ability to analyze what happened and try again. No one will cut your funding. Quite the contrary.
Greg Thain, from the HTCondor project, and Derek Weitzel, Bosco architect and free thinker.
The opinions expressed in this blog are personal. Yet I am a member of the Bosco team, the quintessential "Think HTC" open source product one can try and use for free. You can download it from here,