As described on the HTCondor web site:
...most scientists are concerned with how many floating point operations per month or per year they can extract from their computing environment rather than the number of such operations the environment can provide them per second or minute. Floating point operations per second (FLOPS) has been the yardstick used by most High Performance Computing (HPC) efforts to evaluate their systems. Little attention has been devoted by the computing community to environments that can deliver large amounts of processing capacity over long periods of time. We refer to such environments as High Throughput Computing (HTC) environments.

In essence, Miron's research is driven by the following challenge:
How can we accommodate an unbounded need for computing and an unbounded amount of data with an unbounded amount of resources?

This is a daring thought to have, and it inspired a fascinating conversation, triggered by the upcoming release of the Bosco tool by the Open Science Grid (OSG).
|Professor Miron Livny|
Miron: My PhD at the Weizmann Institute of Science in Israel (1978) dealt with load balancing in distributed systems, a new field at the time. I looked into the undesired state of such a system: one or more of the many customers (users) are waiting for a resource, while one or more of the resources capable and willing to serve them are idle, yet cannot do so. Distributed computing, scheduling, and matchmaking absorbed me. I have been doing the same thing for over thirty years now.
Miha: And then?
Miron: Then in 1983 I came to the University of Wisconsin. At the Weizmann Institute, all my research was simulation work. I really started developing the technology when I came to the US. Here we developed the concept of separate, individual resource ownership. This was key. In my doctoral thesis, the system owned everything.
Miron: In principle, I can run a lot of independent tasks on an IBM Blue Gene, for example. It may not be as easy, because Blue Gene does not have a full communication stack and people are running MPI. In fact, we work with IBM to add HTC capabilities to their machines. But is this platform the most cost-effective way of doing it?
A successful company such as Google realized that following the distributed computing principles that underpin HTC is the way to go. They decided to design a system where each component is assumed to be unreliable, and it is a miracle if it actually does what it is intended to do. From the beginning they envisaged a system built from unreliable components: work is handed out to every node, and the results eventually come back even if some of the nodes fail. In HPC, because of the MPI model, everything has to work, otherwise the whole thing falls apart.
Miha: Did Google adopt the HTCondor open source code?
Miron: While they refer to our system in their publications as an example for the resource manager they have developed - no, they did not use our software. One of the reasons for this, I believe, is that HTCondor does more than Google needs in their environment. They wanted something minimal, attuned to their needs and not carrying the extra luggage.
Miha: Unofficial studies estimated that last year Google Compute Engine had 800,000 cores. Today this number may be at least several million cores. It appears the decision to embrace these HTC principles paid off for them.
Miron: Google wrote everything in house. I visited them a few times to share my experiences, and some of my former students joined them. If you look at chapter 13 of the Grid book edited by Ian Foster et al., Rajesh Raman - who joined Google - and I described the principles of HTC, not the HTCondor technology.
We are computer scientists. We create computer science principles, not just develop a software package. Anyone who wants to use those principles is encouraged to create software tailored to their needs, if doing so is cost effective for them. It goes without saying that for Google it was!
Miha: Do you have multiple adaptations of HTCondor for different users?
Miron: We are funded by the National Science Foundation (NSF) and the Department of Energy (DOE) to support scientific HTC. It is therefore natural that our work is driven by the needs of the main HTC science communities and motivated by emerging science disciplines that embrace HTC.
Miha: What about industry?
Miron: While we have a lot of users in industry, we are not developing HTCondor for their needs. HTCondor enables science, and one major discovery can have a long term impact on the future of HTC.
Miha: One of your collaborators, Cycle Computing, has a powerful Amazon Web Services cluster, priced on demand per hour, that uses HTCondor. Are there other examples?
Miron: Red Hat offers support services for MRG, which uses HTCondor. For example, all of DreamWorks Animation's rendering farms are managed by HTCondor; it is a Red Hat account. Fedora and Debian also offer HTCondor.
|An intuitive diagram of Bosco|
Miron: The idea developed naturally. You just have to look at the HTCondor architecture under the hood. The philosophy was always "Submit locally and run globally" – namely, submit on your workstation and run on any workstation in the organization. Now, why don't we go to the researchers and say: "OK, we'll give you just this one piece of software, called Bosco, that is running on your side. You will learn and install Bosco in a few hours, and it will enable you to submit locally and run on machines all over the world!"
You can view Bosco as "My Personal High Throughput Manager".
Miha: This is the empathy Bosco seeks: make the researchers happy. Offer them a simple interface to create a galaxy of HTC resources.
I can share with you that I have a graduate student who is building a MATLAB-based piece of software that we believe will, in its final incarnation, manage more than 1 million jobs. But the user of this tool - yet unnamed - does not think in terms of jobs; he thinks about the points in the calculation of a multi-dimensional function. Each of the points is calculated by another HTC job, originated via MATLAB, but behind the scenes it is Bosco that does the heavy lifting. This combination of MATLAB with Bosco enables a researcher with very basic computing skills to invoke a computation where every point is an independent job running on a different machine in a different part of the world.
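The pattern Miron describes - one independent job per point of a multi-dimensional grid - can be sketched in a few lines. This is an illustrative sketch only, with hypothetical names; in the actual tool MATLAB generates the points and Bosco dispatches the jobs.

```python
import itertools

def point_jobs(axes):
    """Yield one independent job description per grid point.
    Each job carries the point's coordinates as its arguments,
    so every point can run on a different machine."""
    for point in itertools.product(*axes):
        yield {"arguments": " ".join(str(x) for x in point)}

# A small 10 x 10 x 10 sweep; a 100 x 100 x 100 sweep of the same
# kind would produce the million independent jobs mentioned above.
jobs = list(point_jobs([range(10)] * 3))
print(len(jobs))  # 1000
```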
Miha: If this works, it will be mind-boggling.
Miron: Bosco can help us by spreading the practical incarnation of the "Submit Locally, Run Globally" concept in HTC. If you submit, say, to SGE (Sun Grid Engine, in one of its many flavors), PBS, or LSF clusters, you can get in, but you cannot get out to another cluster. You are stuck with SGE or PBS or LSF. When you are submitting to Bosco, you can go everywhere. And that's the concept: Bosco helps the researcher build bridges to different on- and off-campus resources and thus improve throughput. Regardless of the software that manages these remote resources, Bosco should get your jobs there and bring back the results!
Miha: How did a scientist submit and manage jobs on different clusters before Bosco?
Miron: If you want to run two jobs, don't use Bosco, because you don't have a High Throughput Computing Problem. But if you have 1,000 or 100,000 jobs, then you need a tool that will help you manage the entire high throughput computing activity. Without Bosco, when you want to use two clusters, you "ssh" to one cluster and submit jobs, and then "ssh" to the second one, and submit some jobs. This is a manual process that is not effective and does not scale. In Bosco there is only one submission for all the jobs.
Miha: What exactly does this mean?
Miron: All the researcher sees is one Bosco interface where he writes his submission files. Bosco automatically sends the jobs to the remote clusters via SSH. It can even load balance the workload among the remote clusters. It takes care of everything needed to work seamlessly on those remote clusters (passwords, port numbers). The scientist does not need to know the name of the remote machine or its exact location. The researchers – the users – do not have to install ANYTHING on the remote machine. There is no need to talk to the remote system administrator or ask anyone for any "special" treatment on those remote sites.
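As an illustration, a Bosco submit description file can look roughly like the sketch below. The user name, host name, and executable are hypothetical; the `batch pbs` grid resource could equally name another supported batch system such as SGE or LSF.

```
# Hypothetical Bosco submit description file.
universe      = grid
grid_resource = batch pbs alice@login.cluster.example.edu
executable    = analyze
arguments     = $(Process)
output        = out.$(Process)
error         = err.$(Process)
log           = jobs.log
queue 10000
```

One local submission of a file like this queues all 10,000 jobs; Bosco then moves them to the remote cluster over SSH, so the researcher never logs in to the remote machine directly.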
Many cluster administrators put a limit on how many jobs you can have queued. In such a case, you must submit in batches of the maximum allowed number of jobs, say 500, which for 100,000 jobs means 200 manual submissions. Bosco does this automatically; we call it throttling.
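The batch arithmetic above (100,000 jobs at a 500-job queue limit gives 200 submissions) can be sketched directly. This is only an illustration of the throttling idea with a hypothetical helper name, not Bosco's internal implementation.

```python
def throttle(job_ids, limit=500):
    """Split a large list of jobs into batches no larger than the
    cluster's queued-job limit, submitted one batch at a time."""
    return [job_ids[i:i + limit] for i in range(0, len(job_ids), limit)]

batches = throttle(list(range(100_000)))
print(len(batches))  # 200 batches of at most 500 jobs each
```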
|System administration tasks for a Bosco researcher managing multiple clusters from all over the world|
Miha: David Ungar, in an interview about how to program many-core processors, said that the precision of a calculation depends on how much money one needs to spend. How does it work in HTC?
Miron: For many HTC applications throughput is linked to accuracy / understanding / knowledge. Namely, if we can run more jobs (simulations / searches) we can establish more confidence in the results or explore more options / parameters / cases that will give us more information about the problem.
Miha: Is it possible to predict (one day) the minimum number of resources needed to deliver the results with y accuracy within, say, x days?
Miron: Having clouds helps us with a cost model for the computing part of this challenge, something we did not have in the past. Today, if you give me a function that relates product quality to CPU hours, I can translate it to dollars using clouds. It is not a trivial function to create. Time to market can be a factor, too.
Miha: We have defined Bosco in many ways, but we need a single definition of Bosco.
Miron: I totally agree.
Miha: Can you help us spell this message?
Miron: I want the message to come from the Bosco team, and you are part of it. You are new, while I have been doing the same thing for thirty years.
(1) Bosco is another capability provided by the Open Science Grid for high-end, high-throughput supercomputing jobs running anywhere in the world and managed from local campuses.
(2) Bosco beta v1.1 is available for download now.