This year we have seen significant acceleration of HPC in the cloud. The industry segment grew 44 percent to $1.1 billion, and the pace is expected to continue with an anticipated $3 billion in 2022, according to Intersect360.
Many organizations have started adopting hybrid strategies, in which on-premise HPC is tightly coupled to the public cloud. In our recent customer survey, 61 percent of enterprise HPC users indicated that they are open to using the cloud or already use it. Here are some significant factors that are enticing enterprises to move their HPC workloads to the cloud:
Simplified setup
Launching an external cloud cluster is straightforward. For instance, with Amazon EC2, users go to the AWS Marketplace, choose HPC from the list, and launch a cluster. RightScale offers a similar feature through its Univa Grid Engine templates.
Once the cluster is built, which can take anywhere from 5 to 20 minutes depending on its size, the next step is to install applications. Application installation works just as it does on a regular local machine and takes a similar amount of time. After EC2 provisions the machines, Univa provides additional provisioning and configuration management, turning “raw” virtual machines into nodes that can operate as an HPC cluster. This process is automatic, and nodes can be added to or removed from the HPC cluster on demand.
Extreme scale automation
Scientists and engineers use High Performance Computing (HPC) to solve complex problems. Running these jobs frequently requires big compute, high throughput, and fast networking. Until recently, the cloud simply wasn’t an option for these workloads because they were too big to move or latency was too high. Cloud providers have advanced to the point that the same type of infrastructure and services found on-premise can be replicated in the cloud. This capability, combined with more cloud-friendly licensing models from software vendors, has nearly eliminated the previous barriers to running in the cloud.
To demonstrate the ability to run very large enterprise HPC clusters and workloads, Univa used AWS to deploy 1,015,022 cores in a single Univa Grid Engine cluster, showcasing the advantages of running large-scale electronic design automation (EDA) workloads in the cloud. The cluster was built in approximately 2.5 hours using Navops Launch automation. It comprised more than 55,000 AWS instances across 3 availability zones and 16 different instance types. It used AWS Spot Fleet technology to maximize the rate at which Amazon EC2 hosts were launched while enabling capacity and costs to be managed according to policy.
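To put that scale in perspective, the following is a rough back-of-the-envelope sketch using only the figures quoted above. The variable names are illustrative, and because the text gives "approximately 2.5 hours" and "more than 55,000" instances, the results are bounds and averages rather than measured rates.

```python
# Figures quoted in the text above; the names are illustrative.
total_cores = 1_015_022
min_instances = 55_000        # "more than 55,000", so this is a lower bound
build_minutes = 2.5 * 60      # "approximately 2.5 hours"

# Because the instance count is a lower bound, this average is an upper bound.
max_avg_cores_per_instance = total_cores / min_instances

# Average provisioning rates over the whole build window.
cores_per_minute = total_cores / build_minutes
min_instances_per_minute = min_instances / build_minutes

print(f"Average cores per instance (at most): {max_avg_cores_per_instance:.1f}")
print(f"Cores brought online per minute:      {cores_per_minute:,.0f}")
print(f"Instances launched per minute (min):  {min_instances_per_minute:,.0f}")
```

Under these assumptions, the build averaged roughly 6,800 cores and at least about 367 instances coming online every minute, with at most about 18.5 cores per instance on average.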
Address peak performance needs
When faced with a spike in demand for capacity to run workloads, tapping cloud computing resources is usually a better solution than more on-premise “racking and stacking.” Depending on the size of the workload, enterprises may need large amounts of computing power, but only periodically or for a short amount of time.
Mellanox Technologies needed a highly stable engineering cluster for their silicon design-related activities – one that could perform exceptionally well in their on-premise high-performance computing environment and burst transparently into Microsoft Azure Cloud during tape-outs (the most critical and peak load periods). Migrating HPC to the cloud allowed Mellanox to gain necessary capacity, reduce time-to-market, and have access to specialized hardware at previously unimaginable scales. [read case study]
Many of our customers have found using cloud computing for HPC workloads to be a smart financial decision. Shifting what was a capital expense to an operational expense has many advantages. Cloud computing offers scalable, instantly available computing resources and virtually unlimited storage at a reasonable metered cost.
Paying only for what is used in the cloud optimizes costs, reduces idle compute capacity, eliminates long-term contracts, and reduces licensing complexity. Novartis ran a project that involved virtually screening 10 million compounds against a common cancer target in less than a week. The company calculated that running the experiment internally would take 50,000 cores and an investment of close to $40 million. Using Amazon Web Services (AWS), the project ran across 10,600 Spot Instances (approximately 87,000 compute cores) and allowed Novartis to apply its 39 years of computational chemistry knowledge to the experiment. The run required only 9 hours and an investment of $4,232. Out of the 10 million compounds screened, three were successfully identified. [read case study]
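The arithmetic behind the Novartis comparison can be sketched as follows. This is a rough illustration using only the figures quoted above. It assumes all 87,000 cores were busy for the full 9 hours, which the text does not state, and it compares a one-off cloud bill with a capital estimate, so the ratio is indicative rather than a like-for-like cost model.

```python
# Figures quoted in the text above; the names are illustrative.
spot_instances = 10_600
compute_cores = 87_000            # "approximately 87,000 compute cores"
runtime_hours = 9
cloud_cost_usd = 4_232
on_prem_estimate_usd = 40_000_000  # "close to a $40 million investment"

# Assumption: every core ran for the full 9 hours.
core_hours = compute_cores * runtime_hours
cost_per_core_hour = cloud_cost_usd / core_hours

# How many times larger the on-premise estimate is than the cloud bill.
cost_ratio = on_prem_estimate_usd / cloud_cost_usd

avg_cores_per_instance = compute_cores / spot_instances

print(f"Core-hours consumed:       {core_hours:,}")
print(f"Cost per core-hour (USD):  {cost_per_core_hour:.4f}")
print(f"On-premise / cloud ratio:  {cost_ratio:,.0f}x")
print(f"Average cores per instance: {avg_cores_per_instance:.1f}")
```

Under these assumptions, the run consumed about 783,000 core-hours at roughly half a cent per core-hour, and the on-premise estimate is on the order of 9,450 times the cloud bill.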
Access to specialized resources
Cloud providers have become HPC-enabled as demand has increased for AI and machine learning resources, as well as for analytics and big data support. To meet specific workload requirements, they now offer specialized resources such as big-memory systems, GPUs, FPGAs, and specialist AI processors that may not be available on-premise.
The Wharton School, University of Pennsylvania, extended its HPC environment to the cloud and avoided infrastructure expenses and steep user training, while transparently tripling its core count. Wharton now offers its user base the option to cloud burst, a hybrid model that augments their local compute infrastructure with access to specialized resources, including GPUs, offered by AWS. [read case study]