HPC cloud. It’s been a long time coming, hasn’t it? It seems like yesterday that Bill Bryce and Cameron Brunner of Univa worked with Chris “Dag” Dagdigian of BioTeam to build the first Grid Engine-based HPC cluster in EC2. (The webinar, videos and whitepapers are still available for the curious.) At that time (circa 2008) the process to create a cluster in EC2 was manual, with 13 rather complicated steps.
Wind the clock forward, and not only has building HPC clusters in the cloud been improved and simplified, but the cloud infrastructure has also caught up to the unique needs of the performance-oriented. Take a look at Univa’s own AWS Marketplace offering, which makes it easier than ever to spin up fully functional Univa Grid Engine (UGE) clusters on AWS using a 1-click installation process.
It’s safe to say that HPC cloud provisioning offerings that support Grid Engine are commonplace today: CfnCluster, Google Genomics and UberCloud (plus various other projects and cloud offerings) all support Grid Engine as the default scheduler.
It’s also reasonable to state that Grid Engine is the de facto standard choice for workload management in HPC cloud.
One might ask “why?”
From my point of view there are two principal reasons. First, Grid Engine has the widest adoption of any workload manager and, being open source, is available for free. Second, there are thousands of deployments across numerous industries, which implies a rich set of integration code for toolkits, libraries, internal applications and commercial ISV applications like Ansys, Jeppesen, Mentor Graphics or Synopsys. Someone – somewhere and at some time – has most likely written an integration for Grid Engine and published it.
That integration code is what drives, for example, the pipeline in life sciences or the verification process in electronic design; it encapsulates the business process of an enterprise (or an organization) that runs shared and repeated workflows in a cluster.
Clearly, it does more than simply submit workload; it manages the lifecycle of a job, from environment setup and validation, through dispatch, to system cleanup. More importantly, Grid Engine users will expect to be able to execute their workloads in a public cloud without rewriting or starting over. They expect – and need – to run their existing workflows in the public cloud.
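To make that lifecycle concrete, here is a minimal Python sketch of what such integration code might look like. It is an illustration only, not any particular site's or vendor's integration: the function names (`run_job_lifecycle`, `submit_with_qsub`) and the `scratch_root` parameter are hypothetical, and it assumes Grid Engine's `qsub` is on the PATH and accepts the `-sync y` and `-wd` options.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Callable, Optional, Sequence


def submit_with_qsub(script: str, workdir: str) -> int:
    """Dispatch a job script with qsub and wait for it to finish.

    Assumes `-sync y` makes qsub block until the job completes and
    `-wd` sets the job's working directory.
    """
    result = subprocess.run(["qsub", "-sync", "y", "-wd", workdir, script])
    return result.returncode


def run_job_lifecycle(
    script: str,
    required_inputs: Sequence[str],
    submit: Callable[[str, str], int] = submit_with_qsub,
    scratch_root: Optional[str] = None,
) -> int:
    """Hypothetical wrapper: setup, validation, dispatch, cleanup."""
    # 1. Environment setup: a per-job scratch directory. On a real cluster
    #    scratch_root would need to be on storage the compute nodes can see.
    workdir = tempfile.mkdtemp(prefix="job-", dir=scratch_root)
    try:
        # 2. Validation: refuse to dispatch if any input is missing.
        missing = [p for p in required_inputs if not os.path.exists(p)]
        if missing:
            raise FileNotFoundError("missing inputs: " + ", ".join(missing))
        # 3. Dispatch: hand the job to the scheduler and wait for it.
        return submit(script, workdir)
    finally:
        # 4. Cleanup: always remove the scratch directory, even on failure.
        shutil.rmtree(workdir, ignore_errors=True)
```

Passing the submit step in as a function keeps the scheduler call separate from the business logic around it, which is one reason wrappers of this shape can move between an on-premise cluster and a cloud cluster unchanged.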
In an upcoming blog post, we will look at how Univa enables enterprise HPC workflows to run in the public cloud.