A few years ago, at one of the SuperComputing shows, I met with several industry analysts and tenured product people at HPC providers. We talked about the software stack and tools deployed in the enterprise to manage and monitor clusters, including the workload management system. Generally, their views on the software stack could be summed up as “it’s all open source so it doesn’t matter” or “that’s an interchangeable part”.
I asked if they knew how an end user ‘got their work done’. Blank stares were followed by ‘they run it on the cluster’. I asked if they were aware that there could be 30,000 to 40,000 lines of integration code written around the scheduler to manage the workflow of pipelines, submission and policy. They were not. Nor did anyone know that the interfaces (command line or API) to schedulers are not standardized, or that the choice an enterprise makes is a very long-lasting one once the business logic, unique to that enterprise, is encapsulated in tens of thousands of lines of code.
Enabling workflow in HPC cloud:
The industry has reached a tipping point. More enterprise users are open to cloud than at any time before, especially when it comes to workflow software. The quality of the offerings from public cloud providers has improved dramatically, encouraging more companies to make validating cloud a priority project in 2018.
When an enterprise looks to move workload to the cloud, it is best served by a seamless extension to its on-premise clusters. Therefore, hybrid cloud computing (the method chosen by 70% of those deploying cloud in our recent survey) has become an essential approach to HPC architectures to ensure enterprise competitiveness.
Enabling workflow in HPC cloud can take on a few meanings. One could, for example, start with a new application class and design the end-to-end execution for a specific cloud using proprietary cloud products and capabilities. Alternatively, one could take an existing application, re-build the runtime dependencies within a cloud architecture and re-create the workflow from the ground up. Both of these methods will work and could prove effective, but let’s consider the typical enterprise that is an established consumer of HPC.
Where to begin?
As mentioned, a typical Univa enterprise customer’s HPC environment could have more than 40,000 lines of code written against Grid Engine that control everything from submission, environment and data setup, and monitoring to passing results and clean-up. Over the years the pipeline and submission methods have become trusted, and changes risk delaying tape-out in EDA or missing deadlines and result validation in Life Sciences. Finally, as we will discuss below, there are no standards across schedulers for submission, management or monitoring, so this controlling code is tied to the scheduler currently in use. Even DRMAA2 (a standard) is neither broadly supported nor a high enough common denominator to be used as an abstraction layer.
What is this ‘integration code’, and what does it look like (or do)?
The short answer is it’s how an end-user interfaces with the cluster; the integration code manages a series of steps that gets the end-user the resulting answer. Those steps could be very simple, like a ‘qsub’ submission of a single task, or very complex, controlling the flow of a series of steps such as environment setup, data movement, watching job status, or a pipeline that sequences results from task to task.
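To make the idea concrete, here is a minimal sketch of what the smallest piece of such integration code might look like: a Python wrapper that turns an ordered list of pipeline steps into Grid Engine `qsub` command lines, chaining each step to the previous one with `-hold_jid`. The step names and script names (`stage_in.sh`, `run_sim.sh`, `collect.sh`) are hypothetical, and real integration code would add environment setup, data movement, error handling and site policy on top of this.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple


def build_qsub_command(script: str, name: str,
                       hold_for: Optional[Sequence[str]] = None) -> List[str]:
    """Build a Grid Engine qsub command line for one pipeline step."""
    # -terse prints only the job id, -cwd runs in the current directory,
    # -N sets the job name.
    cmd = ["qsub", "-terse", "-cwd", "-N", name]
    if hold_for:
        # -hold_jid accepts job ids or job names; the job stays on hold
        # until the listed jobs have finished.
        cmd += ["-hold_jid", ",".join(hold_for)]
    cmd.append(script)
    return cmd


def plan_pipeline(steps: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Turn (job_name, script) pairs into qsub commands, each held on the previous step."""
    commands = []
    previous = None
    for name, script in steps:
        commands.append(build_qsub_command(script, name, [previous] if previous else None))
        previous = name
    return commands


def submit(cmd: List[str]) -> str:
    """Run one qsub command and return the job id it prints (requires a Grid Engine client)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


# Hypothetical three-step pipeline: stage data in, compute, collect results.
PIPELINE = [
    ("stage_in", "stage_in.sh"),
    ("simulate", "run_sim.sh"),
    ("collect", "collect.sh"),
]
```

On a submit host, `[submit(cmd) for cmd in plan_pipeline(PIPELINE)]` would queue all three steps at once and let the scheduler enforce the ordering. Even in this tiny sketch every flag is specific to Grid Engine, which is the point of the discussion that follows.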
First, let’s look at how workflows can interact with Grid Engine
The code can be written as shell scripts, in Perl (as much of it is), or in higher-level languages such as Python. Often the primary interface to a scheduler is the command line interface, used for its practicality and speed.
Example machine learning workload management integration code:
As an example, here is a submission script for integrating distributed TensorFlow with Univa Grid Engine that I borrowed from the blog “Integrating Distributed Tensorflow with Grid Engine”.
(NB: this script is Copyright © Univa Corporation, 2018 and provided under an Apache 2.0 license)
This submission script is a good example of how tightly coupled to Grid Engine a script quickly becomes; the script does two things:
When one looks at the script in detail it is obvious how closely tied to the scheduler the script is as it uses Grid Engine environment variables, commands and paths.
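As a separate illustration of that coupling (this is not the script from the referenced blog), the sketch below shows how a job might discover its own allocation from variables that Grid Engine sets in the job environment: `JOB_ID`, `NSLOTS` and, for parallel jobs, `PE_HOSTFILE`, a file whose lines start with a host name and a slot count. The function names are hypothetical.

```python
import os
from typing import Dict, Mapping


def read_pe_hostfile(path: str) -> Dict[str, int]:
    """Parse a Grid Engine PE_HOSTFILE into {hostname: slots}.

    Each line looks like: "<host> <slots> <queue> <processor range>".
    """
    hosts: Dict[str, int] = {}
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            hosts[fields[0]] = hosts.get(fields[0], 0) + int(fields[1])
    return hosts


def describe_job(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the allocation details Grid Engine exposes through the environment."""
    hostfile = env.get("PE_HOSTFILE")
    return {
        "job_id": env.get("JOB_ID"),
        "slots": int(env.get("NSLOTS", "1")),
        "hosts": read_pe_hostfile(hostfile) if hostfile else {},
    }
```

A distributed framework launcher would typically use the host list returned here to decide where to start its workers. Under a different scheduler the variable names and the host file format differ, so this code would have to be rewritten.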
And then there are APIs
One should take note that despite the great attempts by the Open Grid Forum (the community formed in 2006 after the merger of the Global Grid Forum and the Enterprise Grid Alliance) there is no standardization across different schedulers. The Open Grid Forum sponsored several projects to standardize workload management, such as the Job Submission Description Language (JSDL), Simple API for Grid Applications (SAGA) and Distributed Resource Management Application API (DRMAA), but industry demand (meaning commercial representation) was limited, and therefore product implementations were few and far between.
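The lack of standardization is easy to see at the command line. The sketch below builds the same request, a named job asking for four slots, for three schedulers. Slurm and LSF are brought in here purely as illustrations and are not discussed in this post; the parallel environment name `smp` is an assumption, since parallel environments are defined per site in Grid Engine.

```python
from typing import List


def submit_command(scheduler: str, name: str, slots: int, script: str) -> List[str]:
    """Return the native submission command for the same logical request."""
    if scheduler == "gridengine":
        # "smp" is a site-defined parallel environment name (assumed here).
        return ["qsub", "-N", name, "-pe", "smp", str(slots), script]
    if scheduler == "slurm":
        return ["sbatch", f"--job-name={name}", f"--ntasks={slots}", script]
    if scheduler == "lsf":
        # bsub treats the trailing argument as the command to run.
        return ["bsub", "-J", name, "-n", str(slots), script]
    raise ValueError(f"unknown scheduler: {scheduler}")
```

The three command lines differ in program name, flag spelling and resource model, and the differences widen once dependencies, arrays, queues and accounting are involved. Tens of thousands of lines written against one of these interfaces do not move to another without significant rework.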
The most important of these projects is DRMAA, the “specification for the submission and control of jobs to a Distributed Resource Management (DRM) system, such as a Cluster or Grid computing infrastructure. The scope of the API covers all the high-level functionality required for applications to submit, control, and monitor jobs on execution resources in the DRM system.” In theory, the goal of DRMAA was to abstract the integration code to increase flexibility in the back-end (meaning portability). The problem, however, was that industry had already written code to the native interfaces and DRMAA was a lowest-common-denominator approach. DRMAA had its limitations from the outset, even though DRMAA2 doubled the methods to around 100.
Grid Engine was the first to support DRMAA 1.0 and as of the writing of this blog is the only scheduler to support DRMAA 2.0. There are customers that use this API but not many.
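For readers who have not seen DRMAA in use, here is a minimal sketch of the submit-and-wait cycle written against the open-source `drmaa` Python binding, which implements DRMAA 1.0 (not DRMAA2). The helper name `run_and_wait` is hypothetical; the session is passed in so the function itself does not depend on a scheduler being installed.

```python
from typing import Sequence

# Same value as drmaa.Session.TIMEOUT_WAIT_FOREVER in the drmaa Python binding.
WAIT_FOREVER = -1


def run_and_wait(session, command: str, args: Sequence[str] = ()) -> int:
    """Submit one job through a DRMAA 1.0 session and return its exit status."""
    jt = session.createJobTemplate()
    try:
        jt.remoteCommand = command      # executable to run on the cluster
        jt.args = list(args)            # its command-line arguments
        job_id = session.runJob(jt)
        info = session.wait(job_id, WAIT_FOREVER)
        return int(info.exitStatus)
    finally:
        # Always release the job template, even if submission fails.
        session.deleteJobTemplate(jt)


# Typical use on a submit host with the drmaa package and the scheduler's
# DRMAA library installed:
#
#   import drmaa
#   with drmaa.Session() as session:
#       status = run_and_wait(session, "/bin/sleep", ["10"])
```

The calls are scheduler-neutral, which is the appeal of DRMAA, but anything beyond this basic cycle (site policy, parallel environments, resource requests) tends to be passed through as native scheduler options, which brings the coupling back.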
Another way to integrate with the scheduler is to use the Univa Grid Engine REST Web Service, which is a superset of DRMAA2. Univa Grid Engine REST Web Service is a web-based management, monitoring and submission API using modern web technologies. The server component is internally based on the Univa Grid Engine JGDI API and the Restlet development framework, exposing Univa Grid Engine objects and functionality as REST resources to the application programmer (bindings include Python, Meteor, Node.js and Java).
Univa built this API so that a Grid Engine cluster could be managed and monitored programmatically by HPC admins, and so that workload could be submitted and monitored using modern technologies. In fact, Univa’s Navops Launch product uses the REST API to talk to Grid Engine for reporting, monitoring job data, triggering rules and automatic scaling of cloud nodes.
Leveraging a hybrid cloud architecture means that workload can run on-premise or in the cloud. The need then is to use the same workflow that, as discussed above, is tied to the scheduler. Therefore, Grid Engine enables existing users to seamlessly access hybrid cloud for all of the existing workflow, easing the impact from re-writing or re-tooling the flow to run in the cloud.
The industry has reached a tipping point. Hybrid cloud computing has become an essential approach to HPC architectures as a means to ensure ongoing competitiveness. By using the right strategies, enterprises can maximize local resource usage, reduce total costs and improve productivity and capacity by bursting to the cloud.
For HPC users, the question is no longer whether to embrace hybrid cloud, but how to start. In the “Capitalizing on the opportunity of hybrid cloud in HPC” whitepaper, we offer recommendations and suggest a four-step process for getting started with hybrid cloud.