A primer on using Univa Grid Engine with GPU-aware applications
In Part I of this two-part article, I provided a short primer on GPUs, explained how they are used in HPC and AI, and covered some of the challenges users encounter running GPU applications on HPC clusters.
In this article, I’ll cover some of the specific innovations in Univa Grid Engine (UGE) that help make GPU applications easier to deploy and manage at scale.
GPU-aware scheduling facilities in Univa Grid Engine
Univa Grid Engine helps with a variety of tasks that are essential for managing GPU workloads in HPC and AI clusters. Below, we describe some of the unique UGE features that make these tasks possible.
One of the simplest ways to monitor and manage GPU workloads in Grid Engine is by using a load sensor. By default, Grid Engine tracks a variety of load parameters automatically. To track additional parameters (GPU-related parameters in our case), we can add a GPU-specific load sensor. A load sensor is a script or binary that runs on a host and outputs an arbitrary resource name and a current resource value so that additional resources can be factored into scheduling/workload placement decisions.
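As a sketch of the load sensor protocol, the script below reports a single GPU utilization value. The resource name gpu.util and the fixed value are placeholders for illustration; a real sensor would query the device (for example via nvidia-smi) instead of reporting a constant.

```shell
#!/bin/sh
# Minimal load-sensor sketch. Grid Engine writes a newline to the sensor's
# stdin to request a report and the string "quit" to shut it down.
host=$(hostname)

# Emit one report in the load-sensor protocol: begin / host:resource:value / end.
# "gpu.util" and the value 42 are placeholders; a real sensor would query the GPU.
sensor_report() {
    echo "begin"
    echo "$host:gpu.util:42"
    echo "end"
}

while read -r line; do
    [ "$line" = "quit" ] && break
    sensor_report
done
```

The resource reported by the sensor must also be defined as a complex (via qconf -mc) before Grid Engine will schedule against it.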
To simplify support for GPU-enabled cluster hosts, Univa Grid Engine includes default load sensors for various types of accelerators. Load sensors are included for NVIDIA GPUs (cuda_load_sensor.c) and Intel Xeon Phi coprocessors (phi_sensor.c). Users can also build their own load sensors, and the samples are distributed as both binaries and source code so they can be recompiled as libraries change or as new GPU parameters become available.
When in use, UGE GPU load sensors provide values for various GPU device parameters in real time. These values can be used for monitoring GPU status and reporting.
Load sensor parameters can also be used when submitting jobs to specify various resource and runtime requirements for GPU jobs. As examples:
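A couple of hedged sketches of what such submit-time requests might look like (the resource names cuda.devices and cuda.0.mem_free are illustrative; the actual names depend on the load sensor configured on your cluster):

```shell
# Ask for a host that has at least one GPU installed:
qsub -l cuda.devices=1 train.sh

# Ask for a host whose first GPU currently has at least 4 GB of free memory:
qsub -l cuda.0.mem_free=4G train.sh
```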
When an application executing on a CPU core communicates with a PCIe device such as a GPU, it is preferable that the GPU be attached to the PCIe bus local to that core's socket. Otherwise, traffic needs to cross between CPU sockets, resulting in higher latency and lower performance.
It is increasingly common to have multiple GPUs installed on a single host. For example, AWS P3 instances provide up to 8 NVIDIA V100 GPUs per machine instance. To support these and similar NUMA architectures, UGE has supported the notion of Resource Maps (RSMAPs) since version 8.1.3.
When using RSMAPs in UGE, each physical GPU on a host can be mapped to a set of cores that have good affinity to the device as shown below. In this example, we define a resource map complex called “gpu” with two IDs (gpu0 and gpu1), and each ID is mapped to a GPU. The second line provides RSMAP topology information to show what CPU cores have affinity to each GPU device.
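A sketch of such a host configuration is shown below, assuming a hypothetical two-socket host with four cores per socket; the exact complex definition and topology masks will vary with your hardware and UGE release:

```shell
# Define a per-host RSMAP complex named "gpu" (qconf -mc), then map the two
# IDs to physical devices in the host configuration (qconf -me <hostname>):
complex_values gpu=2(gpu0 gpu1)
# Per-device topology masks: gpu0 has affinity to socket 0, gpu1 to socket 1.
complex_values gpu=2(gpu0:SCCCCScccc gpu1:SccccSCCCC)
```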
When expressing RSMAP topologies, “S” refers to sockets and “C” refers to cores, and the topology mask reflects the number of sockets, cores (and threads) available on a CPU. By convention, cores expressed in upper-case (“C”) have good affinity to the device, while cores expressed in lower-case (“c”) have poor affinity.
With RSMAPs, we have precise control over how GPU jobs can bind CPU cores to GPU workloads.
Suppose we want to run a job that needs a single CPU core associated with a GPU connected on a local PCIe bus. In this case, we can use the syntax below, and UGE will pick a host and assign a core and GPU based on the RSMAP host-specific topology mask.
We can also submit a job that requests all available free cores with affinity to a GPU. This prevents other jobs from being assigned to cores with affinity to the same GPU, which could otherwise conflict with our GPU workload.
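Hedged sketches of both submit commands follow, assuming the two-GPU topology described earlier with four cores having affinity to each GPU; the binding syntax shown is illustrative, so consult the UGE documentation for the exact form supported by your release:

```shell
# Request one GPU and bind the job to a single core with affinity to it:
qsub -l gpu=1 -binding linear:1 train.sh

# Request one GPU plus all four cores with affinity to it, keeping other
# jobs off those cores:
qsub -l gpu=1 -binding linear:4 train.sh
```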
For many GPU workloads (like parallel MPI-enabled Deep Learning workloads) we need to schedule GPUs as parallel workloads. This is easily accomplished using parallel environment (PE) features in UGE.
Consider an example where the allocation rule in our PE is 28 slots on each machine. We want to reserve four machines (each with 28 host slots) and four GPUs per host for a parallel job that requires 112 host slots and 16 physical GPUs.
In the example below, we create a reservation for one hour for 112 slots and four GPUs per host, and submit the parallel deep learning workload spanning four hosts and 16 GPUs by specifying the reservation ID to Grid Engine:
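The commands might look like the following sketch, where the PE name mpi and the job script name are placeholders:

```shell
# Create a one-hour advance reservation for 112 slots (4 hosts x 28 slots)
# and 4 GPUs per host:
qrsub -d 01:00:00 -pe mpi 112 -l gpu=4

# qrsub prints a reservation ID; submit the parallel deep learning job
# into the reservation:
qsub -ar <AR_id> -pe mpi 112 -l gpu=4 train_dl.sh
```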
As of Univa Grid Engine 8.6.0, configuring GPUs and scheduling GPU workloads became easier still: UGE integrates directly with NVIDIA’s Data Center GPU Manager (DCGM), providing Grid Engine with detailed information about GPU resources. This avoids the need for administrators to deploy customized load sensors.
If DCGM is running on a cluster host, Univa Grid Engine can automatically retrieve load values and other metrics for installed GPUs and expose them through Grid Engine so that GPU information is available for scheduling and reporting.
DCGM provides mechanisms for discovering GPU topology. Topology information is exposed automatically in Grid Engine via the load value “affinity” as shown in the example below.
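A hedged sketch of what the reported affinity load value might look like on a two-socket host with two GPUs (the masks and output format are illustrative, with thread markers omitted):

```shell
# Query the affinity load value on a host (output format illustrative):
qhost -h host1 -F affinity
#    Host Resource(s): hl:affinity=gpu0=SCCCCccccSCCCCcccc;gpu1=SccccCCCCSccccCCCC
```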
In this example, the first set of four cores (and threads) on each socket have good affinity to gpu0 while the second set of cores on each socket have good affinity to gpu1.
If we want to schedule a GPU enabled workload taking this affinity into account, we can request a single P100 GPU as shown below and require that a job is scheduled only on a host with available cores that have affinity to the required P100 GPU.
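A hedged sketch of such a submission follows; the resource name gpu_model is illustrative (the actual name depends on how the DCGM-reported complexes are configured), and the binding syntax may differ by release:

```shell
# Request one P100 GPU and a core with affinity to it:
qsub -l gpu=1 -l gpu_model=TeslaP100 -binding linear:1 train.sh
```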
As explained above, Resource Maps can be used to identify the RSMAP IDs on a host (for example gpu0, gpu1, gpu2, gpu3, etc.) and associate each RSMAP ID with a physical device. This allows Grid Engine users to request one or more GPUs while Grid Engine keeps track of which physical devices are allocated to each GPU-enabled job. Users can see these associations by running the qstat command to see which jobs are associated with GPU devices on each host.
Normally, the configuration and assignment of devices have no effect on scheduling, but in newer Linux kernels, control groups (cgroups) can be used for fine-grained resource management. A setting called cgroups_params can be set globally or at the host level in Univa Grid Engine (host-level settings override global defaults) to provide granular control over resources and how jobs use them. By listing GPU devices in a cgroup_path under cgroups_params, Univa Grid Engine will limit access to GPUs using cgroups based on how resources are assigned in RSMAPs. This provides administrators with better control over how GPU-enabled jobs use resources, and prevents applications from accidentally accessing devices that have not been allocated to them.
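A sketch of what this might look like in the host configuration; the path and sub-parameters shown are illustrative, so see the UGE administration guide for the full list of cgroups_params settings:

```shell
# qconf -mconf <hostname>
cgroups_params   cgroup_path=/sys/fs/cgroup cpuset=true \
                 devices=/dev/nvidia*
```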
In Part I of this article, we showed an example of how a TensorFlow deep learning workload could be pulled as a container from NGC (NVIDIA’s GPU Cloud) and easily run interactively on a compute host with a GPU and CUDA runtime.
Providing seamless support for similar containerized workloads is critical in cluster environments because many GPU workloads run in containers.
When running with Grid Engine, rather than invoking nvidia-docker from the command line, it is better practice to run docker with the alternate --runtime=nvidia syntax to register the NVIDIA container runtime with the Docker daemon. This allows Grid Engine to recognize the job as a Docker workload and take advantage of the rich Docker integration features in Grid Engine. Using nvidia-docker requires that users be able to pass environment variables (NVIDIA_VISIBLE_DEVICES, for example) into a running container, so Grid Engine needs to provide a mechanism for this as well.
Univa Grid Engine supports the NVIDIA 2.0 Docker Container Runtime (nvidia-docker) allowing transparent access to GPUs on Univa managed compute nodes. For a detailed discussion on running Docker workloads on Univa Grid Engine see our article Using Univa Grid Engine with Docker.
To submit an nvidia-docker workload to UGE, simply use the -xd “--runtime=nvidia” switch on the qsub or qrsh command line. To pass environment variables into the Docker container, add an additional -xd “--env NVIDIA_VISIBLE_DEVICES=0” switch, and Grid Engine will pass the environment variables used by nvidia-docker into the container.
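Putting these switches together, a full submission might look like the sketch below; the container image name, script path, and docker_images resource request are illustrative and depend on your cluster’s Docker integration setup:

```shell
# Submit a containerized TensorFlow job using the NVIDIA container runtime,
# passing NVIDIA_VISIBLE_DEVICES into the container:
qsub -l docker,docker_images="*nvcr.io/nvidia/tensorflow*" \
     -xd "--runtime=nvidia" \
     -xd "--env NVIDIA_VISIBLE_DEVICES=0" \
     -b y python /workspace/train.py
```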
With these enhancements, users can easily run a variety of nvidia-docker enabled containers with different software and library versions across UGE cluster hosts to share GPU resources more effectively while taking advantage of the other rich, topology-aware GPU scheduling features described above.