Version 8.6.1 of Univa Grid Engine has been made generally available. This release is a very significant update to Univa’s flagship product, reinforcing its earned position as the de facto standard in workload management across a wide variety of industries – industries that now routinely embrace Deep Learning alongside HPC to meet their computational requirements. In this post we introduce version 8.6.1 of Univa Grid Engine, from improved diagnostics for end users to enhanced scheduling policies that better exploit GPUs (and other resources), all while remaining extremely mindful of stability and performance.
Improved Diagnostics for Pending Jobs
Let’s face it: if you’re an end user, you always want to know why your jobs aren’t running. Even for current-generation workload managers, extracting the reason(s) why jobs are pending is far from trivial. At Univa, this is a requirement we’ve been systematically addressing in recent releases of our implementation of Grid Engine, and version 8.6.1 makes significant progress. Briefly, authoritative information known to the scheduler is digested and exposed concisely to end users, without compromising the overall performance of the workload manager. In technical terms, the detailed option of qstat (qstat -j <jobid>) now supersedes ‘poking’ the scheduler (qalter -w p <jobid>). To support selective use of this enhancement, a new scheduler configuration option (schedd_job_info = if_requested) allows end users to drill into the pending-reason details should they so desire; collection of this information is then activated per job via the qsub/qalter option -rdi.
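As a concrete sketch of how these pieces fit together (the job script, resource request and job ID below are purely illustrative; an installed Univa Grid Engine 8.6.1 cluster is assumed):

```shell
# 1. In the scheduler configuration (qconf -msconf), set:
#      schedd_job_info  if_requested

# 2. Submit a job, requesting collection of pending-reason details (-rdi):
qsub -rdi -l mem_free=64G my_job.sh

# 3. Ask why the job (here, job ID 4242) is still pending; this supersedes
#    the older scheduler-poking approach (qalter -w p 4242):
qstat -j 4242
```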
Next-Generation GPU Support
A significant fraction of our enterprise customers, with representation from almost every vertical market we serve, have been making routine computational use of GPUs for HPC for a number of years. With the more-recent addition of workloads typical of Deep Learning, adoption of GPUs as computational resources has accelerated dramatically. Fittingly, our ongoing commitment to supporting GPU technology as a market-leading computational platform reaches the next level in version 8.6.1 of Univa Grid Engine.
Because GPUs are frequently packaged into dense configurations within a single compute node (running a single instance of an operating system), Univa generalized the Grid Engine notion of a complex to that of a Resource Map (RSMAP) to better abstract GPUs as workload-managed resources. As of this latest release of Univa’s flagship product, the device-level mapping known to Grid Engine via RSMAPs enables classes of use cases in which binding tasks to specific GPUs is critical. A highly compelling example comes from those emerging supercomputing centres that seek primarily to execute Deep Learning workloads on GPUs via Docker or Singularity containers. Even a cursory glance at Tokyo Tech’s node-level architectural schematic for TSUBAME3.0 (shared below and described in greater detail elsewhere) makes evident the challenges and opportunities in managing HPC and Deep Learning workloads on such supercomputers.
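To make this tangible, here is a minimal sketch of modeling GPUs as a Resource Map (the complex name, device count and job script are illustrative; exact column layouts are in the product documentation):

```shell
# Define a GPU complex of type RSMAP in the complex configuration (qconf -mc):
#   #name  shortcut  type   relop  requestable  consumable  default  urgency
#   gpu    gpu       RSMAP  <=     YES          YES         NONE     0

# Attach concrete device IDs to a host (qconf -me <hostname>):
#   complex_values  gpu=4(0 1 2 3)

# A job requesting two GPUs is then bound to specific device IDs, which
# Grid Engine exposes to the job environment (e.g. SGE_HGR_gpu):
qsub -l gpu=2 train_model.sh
```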
In the future-that-is-now, however, such use cases can also benefit from a topological awareness exposed to Univa Grid Engine via NVIDIA’s Data Center GPU Manager (DCGM). Most importantly, this means that combinations of CPU cores and GPUs can be scheduled by Grid Engine according to their mutual affinity – and, in so doing, maximize their topological locality by taking connectivity (including PCIe switches and NVLink) into account.
Along with NVIDIA DCGM, then, it is this differentiating support in Univa Grid Engine 8.6.1 for ‘containerized GPUs’ that enables the HPC and/or Deep Learning use cases envisioned for the TSUBAME3.0, ABCI and RAIDEN supercomputers.
Greedy Resource Reservation
Even in those Grid Engine clusters that make use of the highly popular entitlement-based scheduling policies (e.g., share tree, functional share), the most resource-demanding, and often highest-priority, workloads can experience unacceptable wait times. While they improve the entitlement of a resource-demanding job, even significant numbers of override tickets may not have the desired impact in practice, as the scheduler is rarely able to accumulate the required resources. To avoid ‘starving’ resource-demanding jobs such as large parallel workloads, Grid Engine allows reservations to be made. When used in tandem with backfill scheduling, resource-demanding jobs will eventually be scheduled, while resources are utilized in a close-to-optimal fashion. Of course, for some of our customers’ use cases, this eventuality is quite simply not soon enough; and that is precisely where the ‘greedy’ version of resource reservation comes in. By focusing purely on reserving an appropriate number of compute nodes (rather than slots) for resource-demanding jobs, so-called greedy resource reservation in version 8.6.1 of Univa Grid Engine reduces wait time at the expense of utilization – i.e., nodes reserved through this greedy strategy are likely to have unused slots. Once the resource-demanding job has actually been dispatched and is executing, however, the scheduler can allow other workloads to make use of the slots ‘squandered’ by the greediness of this strategy. Greedy resource reservation therefore offers customers with this requirement a compromise whose balance is shifted towards highly important jobs rather than optimal utilization.
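From the end-user perspective, reservation is requested per job; a sketch (the parallel environment name, slot counts and scripts are illustrative, and the greedy strategy itself is enabled cluster-wide in the scheduler configuration):

```shell
# Allow the scheduler to maintain reservations at all (qconf -msconf):
#   max_reservation  32

# Submit a large parallel job with a reservation request (-R y), so its
# resources are accumulated rather than repeatedly snapped up by smaller jobs:
qsub -R y -pe mpi 512 big_parallel_job.sh

# Jobs with a hard runtime limit (h_rt) can still backfill into reserved nodes:
qsub -l h_rt=0:30:0 short_job.sh
```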
Affinity and Anti-Affinity
If you’ve ever had an interest in scheduling shared- or distributed-memory parallel workloads with Grid Engine, you’re likely familiar with slot-placement strategies – i.e., strategies in which slots span a single compute node only (for threaded applications), fill up nodes sequentially, or are distributed more-or-less evenly (in round-robin fashion) across nodes. Separate and distinct from parallel workloads, however, are additional use cases for persuasively encouraging jobs to ‘gravitate’ towards certain compute nodes (affinity) or, with equal importance, away from specified nodes (anti-affinity). Enabled (in part) through a brand-new “affinity” complex attribute, this capability supports use cases such as access to shared data and a balanced network load, respectively. Because the potential scope for exploiting such affinities is broad, we look forward to hearing of your use cases for this new feature in version 8.6.1 of Univa Grid Engine.
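A rough sketch of the idea (the complex name and values here are purely illustrative assumptions; consult the 8.6.1 documentation for the exact complex-configuration layout):

```shell
# In the complex configuration (qconf -mc), the new affinity attribute lets a
# complex draw further jobs towards hosts already running jobs that requested
# it (positive value), or push them away (negative value, i.e. anti-affinity):
#   #name      shortcut  type  ...  affinity
#   datapool   dp        INT   ...  1.0

# Jobs requesting the complex then gravitate towards (or away from) the
# same hosts – e.g., to co-locate jobs with shared data:
qsub -l datapool=1 analysis.sh
```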
Performance and Scalability
Not content to rest on the laurels of scaling Univa Grid Engine 8.5.5 to 1,000,000 AWS cores in a demonstration shared in various ways at ISC18, version 8.6.1 of our workload-management product continues to emphasize performance that delivers scalability efficiently and effectively. It also continues a tradition established previously – in version 8.5.0 we were able to best the performance of open-source Grid Engine by a factor of two! In this latest release, emphasis has been placed upon memory management, parallelization and scheduling algorithms. Notably, the introduction of compression ensures that data is communicated efficiently between Grid Engine components; this capability will be a welcome addition for the increasing number of our customers interested in the deployment, use and ongoing maintenance of hybrid clouds for Deep Learning and HPC.
Version 8.6.1 of Univa Grid Engine is immediately available to our customers for download and use. If you’re not an existing customer, you can freely evaluate a trial version of our software.
More than 200 enhancements and fixes collectively comprise this latest release of our software. Although in this post we’ve focused on a small fraction of these, you can obtain the details here.
Obviously, we’re quite excited to share such a significant release with our valued customers, and especially look forward to your feedback as you make use of these new capabilities in Univa Grid Engine 8.6.1.