This somewhat geekier blog post shows how Sun Grid Engine, the world's leading software for managing cluster resources, evolves in the upcoming 6.2 Update 5 release to make better use of the multi-core CPUs now ubiquitous in modern clusters. This step is necessary before the irreversible transition to cloud computing is complete. These complexities will be hidden from end users, but they explain how the software will work behind the scenes to deliver the magic of an HPC cloud.
All features are subject to change.
Features in both Sun Grid Engine 6.2 Update 5 and Open Source Grid Engine
F. Slot-wise preemption
D. More efficient and effective preemption
B. Preemption makes users unhappy, so it should be enforced only when absolutely necessary. The concept of the subordinate queue dates from the time when clusters were built from single-core CPUs. Rather than preempting the entire subordinate queue (queue-wise subordination), slot-wise subordination preempts individual jobs in a subordinate queue, minimizing the disruption to users' work. Enforcing subordination policies at this finer granularity results in more efficient use of resources. It works well in conjunction with Topological Scheduling (see below) and thus contributes to higher throughput.
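To make the difference between the two policies concrete, here is a minimal Python sketch of both on a single host. It rests on simplifying assumptions of our own: every job occupies one slot, there is one superordinate queue ("high") and one subordinate queue ("low"), and the slot-wise policy suspends the most recently started subordinate job first. The Job class and the function names are hypothetical; this is not Grid Engine's configuration syntax or source code.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    queue: str          # "high" (superordinate) or "low" (subordinate); hypothetical names
    start_time: float   # when the job started running
    suspended: bool = False


def queue_wise(jobs, threshold):
    """Queue-wise subordination: once the superordinate queue uses at least
    `threshold` slots, every job in the subordinate queue is suspended."""
    high = sum(1 for j in jobs if j.queue == "high")
    if high >= threshold:
        for j in jobs:
            if j.queue == "low":
                j.suspended = True


def slot_wise(jobs, total_slots):
    """Slot-wise subordination: suspend subordinate jobs one at a time,
    shortest-running first, only until the running jobs fit on the host."""
    running = [j for j in jobs if not j.suspended]
    excess = len(running) - total_slots
    # Most recently started (shortest-running) subordinate jobs come first.
    victims = sorted((j for j in running if j.queue == "low"),
                     key=lambda j: j.start_time, reverse=True)
    for j in victims[:max(excess, 0)]:
        j.suspended = True


def demo():
    """A 4-slot host runs 4 subordinate jobs when 1 high-priority job arrives."""
    def make_jobs():
        low = [Job(i, "low", start_time=float(i)) for i in range(4)]
        return low + [Job(99, "high", start_time=10.0)]

    a, b = make_jobs(), make_jobs()
    queue_wise(a, threshold=1)
    slot_wise(b, total_slots=4)
    return ([j.job_id for j in a if j.suspended],
            [j.job_id for j in b if j.suspended])


if __name__ == "__main__":
    print(demo())
```

In this scenario the queue-wise policy suspends all four subordinate jobs, while the slot-wise policy suspends only the one job needed to free a slot for the high-priority work.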
F. Array job throttling
D. Allows users to prevent large array jobs from monopolizing a cluster
B. An SGE array job is submitted once, with a single command, but runs as many tasks. Every task executes exactly the same script, usually on a different data segment, so the same processing is applied to different parts of a problem. Array job throttling lets users set a self-imposed limit on the maximum number of tasks of an array job that run concurrently. The array job may take somewhat longer to finish, but it ties up fewer resources and allows other jobs, from the same user or from other users, to start sooner.
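The trade-off described above (a longer finish time in exchange for fewer slots held) can be illustrated with a small simulation. This is a minimal Python sketch, not SGE code. It assumes identical free slots, tasks that start in order as soon as a slot is free, and known task durations. The function name run_array_job and its parameters are hypothetical. In the product itself the scheduler enforces the limit; as far as we know it is requested at submission time with qsub's -tc option.

```python
import heapq


def run_array_job(durations, max_concurrent):
    """Simulate array-job tasks with at most `max_concurrent` running at once.

    Returns (finish_time, peak_slots_used).
    """
    running = []   # min-heap of finish times of the currently running tasks
    clock = 0.0
    peak = 0
    for d in durations:
        if len(running) == max_concurrent:
            # Throttled: wait for the earliest running task to free its slot.
            clock = heapq.heappop(running)
        heapq.heappush(running, clock + d)
        peak = max(peak, len(running))
    finish = max(running) if running else clock
    return finish, peak


if __name__ == "__main__":
    tasks = [1.0] * 10
    print(run_array_job(tasks, 10))  # unthrottled
    print(run_array_job(tasks, 2))   # throttled to 2 concurrent tasks
```

With ten one-hour tasks, the unthrottled run finishes after one hour but holds ten slots; throttled to two tasks, it takes five hours while holding only two slots, leaving the other eight free for other jobs.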
F. Topological scheduling
D. Performance optimization for multi-core processors, specifically Intel Nehalem
B. In modern multi-core systems, execution units, caches, memory channels and I/O channels are distributed across CPU sockets and cores. Under NUMA (Non-Uniform Memory Access), a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors. Topological Scheduling allows each job to be scheduled at the core or socket level, according to its particular needs. Topological Scheduling has produced dramatic performance increases in tests at leading EDA customers.
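The following minimal Python sketch shows one way topology-aware core selection could work on a single NUMA host. It is not Grid Engine's algorithm. The best-fit heuristic (keep a job on one socket when possible, so its threads share that socket's caches and local memory) and the function name place_job are our own assumptions, chosen only to illustrate the idea.

```python
def place_job(free, cores_needed):
    """Choose cores for a job on one host (hypothetical best-fit heuristic).

    `free` maps socket id -> list of free core ids on that socket; the
    chosen cores are removed from it. Returns a list of (socket, core)
    pairs, or None if the host does not have enough free cores.
    """
    fits = [s for s, cores in free.items() if len(cores) >= cores_needed]
    if fits:
        # Keep the job on one socket: pick the socket with the fewest free
        # cores that still fits, leaving larger blocks for later jobs.
        s = min(fits, key=lambda s: (len(free[s]), s))
        chosen = [(s, c) for c in free[s][:cores_needed]]
    else:
        if sum(len(cores) for cores in free.values()) < cores_needed:
            return None
        chosen = []
        # No single socket fits: fill the sockets with the most free cores
        # first, so the job spans as few sockets as possible.
        for s in sorted(free, key=lambda s: (-len(free[s]), s)):
            for c in free[s]:
                if len(chosen) == cores_needed:
                    break
                chosen.append((s, c))
    for s, c in chosen:
        free[s].remove(c)
    return chosen


if __name__ == "__main__":
    host = {0: [0, 1, 2, 3], 1: [0, 1, 2, 3]}  # two sockets, four cores each
    print(place_job(host, 2))
    print(place_job(host, 3))
    print(place_job(host, 3))
```

Here the first two jobs each stay on a single socket; only the third job, for which no socket has three free cores left, is spread across both sockets.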
Unique Sun Grid Engine 6.2 Update 5 Features
F. Data-aware job scheduling
D. Send work to the data rather than vice versa, because application performance is tied to data locality. Update 5 allows Hadoop MapReduce jobs to run efficiently in an SGE cluster and lays the groundwork for a future Oracle Coherence integration.
B. Sun Grid Engine will make a best effort to schedule jobs on servers with fast access to their data. It will be made aware of data locality and of each job's data needs, and will place data-dependent jobs accordingly. The rest is handled by the data grid technology in use (Hadoop in Update 5, Oracle Coherence in a future release); these data grid technologies can usually migrate data to where it is needed. Data-aware scheduling minimizes the need for data migration, resulting in dramatic throughput improvements.
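The best-effort placement described above can be sketched in a few lines of Python. This is an illustration under assumptions of our own, not SGE's Hadoop integration. The scheduler is assumed to know which hosts hold a replica of each data block and how many free slots each host has. The function name pick_host and its inputs are hypothetical.

```python
def pick_host(job_blocks, block_locations, free_slots):
    """Best-effort data-aware placement (hypothetical sketch).

    job_blocks:      set of data block ids the job reads
    block_locations: block id -> set of hosts holding a replica
    free_slots:      host -> number of free slots
    Returns the host with a free slot that holds the most of the job's
    blocks locally (ties go to the alphabetically first host), or None.
    """
    candidates = [h for h, n in free_slots.items() if n > 0]
    if not candidates:
        return None

    def local_blocks(host):
        return sum(1 for b in job_blocks
                   if host in block_locations.get(b, set()))

    return min(candidates, key=lambda h: (-local_blocks(h), h))


if __name__ == "__main__":
    locations = {"b1": {"node1", "node2"}, "b2": {"node2"}, "b3": {"node3"}}
    print(pick_host({"b1", "b2"}, locations,
                    {"node1": 1, "node2": 1, "node3": 2}))
```

Because the placement is only best effort, the chosen host may still lack some blocks; in the scheme the post describes, moving those remaining blocks is left to the data grid (Hadoop in Update 5).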
F. Service Domain Manager Cloud Adapter
D. The SDM Cloud adapter will support more use cases, especially when multiple OS versions, and therefore multiple AMIs (Amazon Machine Images), need to be managed.
B. Increases the number of users who can take advantage of SGE's connectivity with AWS EC2.
F. Power Saving
D. A Cloud Service adapter can be configured to save power based on power-saving SLOs (service level objectives). New commands such as "showCloudHosts", "startupCloudHosts" and "shutdownCloudHost" can be used to write new power-saving scripts.
B. Substantial savings on utility bills and the ability to run a green data center.
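A power-saving script built on commands like the ones named above has to decide which cloud hosts to stop. The post does not describe those commands' arguments or output, so this minimal Python sketch models only that decision. The SLO parameter max_idle_hosts, the function name hosts_to_shut_down, and the rule that each pending job keeps one idle host running are all hypothetical.

```python
def hosts_to_shut_down(hosts, pending_jobs, max_idle_hosts):
    """Pick idle cloud hosts that a power-saving script could stop.

    hosts:          host name -> number of jobs currently running on it
    pending_jobs:   number of jobs waiting for a slot
    max_idle_hosts: hypothetical SLO, how many idle hosts may stay up
    Assumes each pending job keeps one idle host running, plus up to
    `max_idle_hosts` spares; every other idle host is a shutdown candidate.
    """
    idle = sorted(h for h, running in hosts.items() if running == 0)
    keep = min(len(idle), pending_jobs + max_idle_hosts)
    return idle[keep:]


if __name__ == "__main__":
    cloud = {"c1": 2, "c2": 0, "c3": 0, "c4": 0}
    print(hosts_to_shut_down(cloud, pending_jobs=0, max_idle_hosts=1))
```

The returned list is what such a script would pass to a shutdown command; busy hosts are never candidates.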
F. SGE Inspect improvements
D. Will support configuration of Sun Grid Engine Parallel Environments
B. Easier management of parallel processing workloads with Sun Grid Engine.