The use of cloud computing for high-performance workloads is on the rise. Most cloud providers now offer HPC nodes with state-of-the-art CPUs, GPUs, and high-performance storage and networking. Despite this, deciding whether to take the cloud computing plunge can be tricky. In this short article, we discuss five key considerations to help you decide whether cloud computing is right for your business.
There is a common perception that cloud computing is cheaper than managing local servers. While often true, it’s not always a slam dunk – it’s worth investing some time to compare costs.
The benefits of running in the cloud are compelling. You can add or remove state-of-the-art infrastructure whenever you want, side-step the headaches of installing and troubleshooting hardware, reduce vendor management costs, and avoid paying for idle capacity. Rather than accounting for depreciating capital assets, you can enjoy flexible consumption-based pricing. You can also benefit from a wide variety of cloud-resident services (like speech-to-text converters or image recognition solutions) that would be cost-prohibitive to deploy locally.
Cloud providers usually market server instances based on the instance type, number of virtual CPUs (vCPUs), and available memory and storage. Reserving an HPC-capable c5d.9xlarge instance on Amazon Web Services (AWS) presently costs USD $0.558 per hour. At this price point, a 20-node cluster (comprising 720 vCPUs and 1.44 TB of RAM, with 10 TB of SSD) running around the clock would cost you about $100K per year. Even after factoring in power, cooling, and facilities costs for a local cluster, long-term resource use in the cloud is usually pricier than on-premises deployments.
In cloud bursting scenarios (discussed below), however, or in cases where specialized resources are needed for only a few weeks or months, the economics tilt decisively in favor of the cloud.
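The arithmetic behind the annual figure quoted above can be checked with a few lines of Python. This is a rough sketch rather than a pricing tool: the hourly rate and node count come from the text, while the `utilization` parameter and the 10% bursting case are illustrative assumptions. It also assumes the same hourly rate applies to part-time use, which may not hold when reserved and on-demand prices differ.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored


def annual_cloud_cost(hourly_rate_usd: float, nodes: int,
                      utilization: float = 1.0) -> float:
    """Yearly bill for `nodes` hourly-priced instances.

    `utilization` is the fraction of the year the instances are running
    (1.0 = around the clock). It is an illustrative parameter, not a
    pricing term used by any provider.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return hourly_rate_usd * nodes * HOURS_PER_YEAR * utilization


# Figures from the text: 20 nodes at USD $0.558 per hour, always on.
always_on = annual_cloud_cost(0.558, 20)
print(f"Always on:       ${always_on:,.0f} per year")

# Hypothetical bursting case: the same 20 nodes, used 10% of the year.
bursting = annual_cloud_cost(0.558, 20, utilization=0.10)
print(f"10% of the year: ${bursting:,.0f} per year")
```

The always-on case works out to just under $98K, consistent with the "about $100K" figure. The part-time case shows why short-lived use changes the comparison so sharply.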
When planning capacity in the cloud, make sure you’re comparing apples to apples. A cloud vCPU usually corresponds to a single thread on a hyper-threaded core, so all things being equal, you may need more cloud vCPUs than local cores for equivalent throughput. Also, be careful about assuming that cloud computing will reduce personnel costs. While cloud-based tools can boost efficiency and avoid some costs, unless you’re running a pure software-as-a-service (SaaS) environment, you’ll still need skilled technical people to administer cloud-based systems and applications.
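To make the vCPU caveat above concrete, here is a small sizing sketch. It assumes two hardware threads per physical core, which is common but not universal, and it treats a physical core as worth two vCPUs. That is a conservative planning bound rather than a measured ratio, since real throughput per thread depends heavily on the workload. The function names are illustrative.

```python
def physical_core_equivalent(vcpus: int, threads_per_core: int = 2) -> float:
    """Physical cores behind `vcpus`, assuming each vCPU is one hardware thread."""
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    return vcpus / threads_per_core


def vcpus_to_match_cores(physical_cores: int, threads_per_core: int = 2) -> int:
    """Conservative vCPU count for sizing against a local core count."""
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    return physical_cores * threads_per_core


# The 720-vCPU cluster from the text, expressed in physical cores.
print(physical_core_equivalent(720))

# A hypothetical 256-core local cluster, expressed in cloud vCPUs.
print(vcpus_to_match_cores(256))
```

Benchmarking a representative job on both platforms is a better guide than any fixed ratio, but a calculation like this helps avoid comparing vCPU counts directly with core counts.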
Despite these cautions, in most situations, the pros outweigh the cons. Used appropriately, cloud computing can help most organizations simplify their environments and reduce costs.
In addition to cost, the volume and nature of your data is another consideration when contemplating a move to the cloud.
Most cloud providers offer multiple storage options including block storage, object stores, databases, and in some cases shared file system solutions. Large storage environments can be difficult to manage and back up, so cloud storage can be attractive and help avoid significant complexity. There are a variety of solutions that can synchronize data between local and cloud-resident clusters efficiently.
Block storage costs range from approximately $0.05 to $0.13 per GB-month depending on whether you opt for magnetic or more expensive solid-state storage. At these prices, storing 50 TB of data on block storage in the cloud will cost between $2,500 and $6,500 per month. Object storage is cheaper (in the range of $1,000 monthly for the same amount of data), but if you’re planning to use object storage, you’ll likely need to modify your applications or workflows.
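The storage figures above follow from simple multiplication, sketched below so you can substitute your own capacity and rates. The per-GB rates are the ones quoted in the text. The decimal conversion (1 TB = 1,000 GB) is an assumption chosen to match the article's round numbers, and the object storage rate is back-calculated from the $1,000 figure rather than taken from any price list.

```python
GB_PER_TB = 1000  # decimal terabytes, matching the round figures in the text


def monthly_storage_cost(terabytes: float, usd_per_gb_month: float) -> float:
    """Monthly bill for storing `terabytes` at a flat per-GB-month rate."""
    return terabytes * GB_PER_TB * usd_per_gb_month


capacity_tb = 50
magnetic = monthly_storage_cost(capacity_tb, 0.05)     # low end of the range
solid_state = monthly_storage_cost(capacity_tb, 0.13)  # high end of the range
print(f"Block storage: ${magnetic:,.0f} to ${solid_state:,.0f} per month")

# Rate implied by "about $1,000 monthly" for the same 50 TB of object storage.
implied_object_rate = 1000 / (capacity_tb * GB_PER_TB)
print(f"Implied object storage rate: ${implied_object_rate:.2f} per GB-month")
```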
Don’t underestimate the challenge of moving large datasets to the cloud. Although cloud providers usually don’t charge network fees for importing data, moving large datasets is non-trivial. With a dedicated 1 Gbps connection and a WAN optimization solution, best-case transfer rates are in the range of 700 Mbps, meaning that transferring 1 TB of data will take over 3 hours. Most cloud providers also offer physical data transport solutions, which are useful for the initial transfer of large datasets like video libraries, image repositories, or genomics data, but these come at a price.
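The transfer estimate above is easy to reproduce and to extend to larger datasets. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and a sustained rate with no protocol overhead or retries, so real transfers will usually take longer. The 50 TB case is an illustrative extrapolation, not a figure from the text.

```python
def transfer_hours(terabytes: float, throughput_mbps: float) -> float:
    """Hours needed to move `terabytes` at a sustained `throughput_mbps`."""
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    bits = terabytes * 1e12 * 8                  # decimal TB to bits
    seconds = bits / (throughput_mbps * 1e6)     # Mbps to bits per second
    return seconds / 3600


# Figure from the text: 1 TB at an effective 700 Mbps.
print(f"1 TB:  {transfer_hours(1, 700):.1f} hours")

# Illustrative extrapolation: a 50 TB dataset over the same link.
hours_50tb = transfer_hours(50, 700)
print(f"50 TB: {hours_50tb:.0f} hours ({hours_50tb / 24:.1f} days)")
```

At 50 TB the same link is tied up for most of a week, which is the point at which physical data transport starts to look worthwhile.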
As a rule of thumb, it’s a good idea to keep processing close to where you plan to store your large datasets. While cloud data management solutions can address most applications, data requirements may require that at least some applications stay local.
A common use case in HPC is cloud bursting. Depending on your applications, you may need large amounts of computing capacity, but only periodically or for short durations. Rather than have assets sit idle, it’s often more cost-effective to operate a smaller cluster locally and “burst” to cloud capacity as needed.
As above, the feasibility of cloud bursting will depend on your applications and data. For some workloads, such as modeling the profitability of an insurance product under various scenarios or running a large computational fluid dynamics (CFD) simulation (where intermediate data may be large, but the models themselves are relatively small), cloud bursting can be an excellent solution offering significant savings and productivity benefits.
Software licensing is another consideration if you’re running commercial software. While ISV licenses are increasingly cloud-friendly, it’s a good idea to check that licenses can be used with your chosen cloud provider and that the vendor supports usage-based pricing suitable for bursting.
The key to effective cloud bursting is automation. The process of deploying and tearing down cloud application environments needs to be reliable and transparent to end-users. People costs often dominate infrastructure costs, and if it takes hours or days of manual effort to establish a working environment in the cloud, any financial advantages will quickly disappear. Ideally, cloud bursting should be integrated with your workload manager so you can control which applications are eligible for bursting and make the process seamless for application users.
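As a rough illustration of what such a policy might look like, the sketch below combines the considerations discussed above: an eligibility list, license terms, the amount of data to move, and local queue pressure. All class, field, and function names here are hypothetical, and this is not the interface of any particular workload manager, where such rules would normally be expressed in the scheduler's own configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    app: str                    # application name, e.g. "cfd"
    input_gb: float             # data that must be copied to the cloud first
    license_allows_cloud: bool  # whether the ISV license permits cloud use


@dataclass(frozen=True)
class BurstPolicy:
    eligible_apps: frozenset    # applications approved for bursting
    max_input_gb: float         # skip jobs whose data is too costly to move
    min_queue_depth: int        # burst only when the local cluster is backed up


def should_burst(job: Job, policy: BurstPolicy, local_queue_depth: int) -> bool:
    """Return True if the job may be sent to cloud capacity under the policy."""
    return (
        job.app in policy.eligible_apps
        and job.license_allows_cloud
        and job.input_gb <= policy.max_input_gb
        and local_queue_depth >= policy.min_queue_depth
    )


# Hypothetical policy: CFD and risk-modeling jobs with modest input data.
policy = BurstPolicy(frozenset({"cfd", "risk-model"}),
                     max_input_gb=100.0, min_queue_depth=10)

small_cfd = Job("cfd", input_gb=20.0, license_allows_cloud=True)
large_genomics = Job("genomics", input_gb=5000.0, license_allows_cloud=True)

print(should_burst(small_cfd, policy, local_queue_depth=25))
print(should_burst(large_genomics, policy, local_queue_depth=25))
```

Keeping the decision in one automated rule means users submit jobs as usual, and the choice between local and cloud capacity happens without manual intervention.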
Depending on the business you’re in, you may run dozens of applications. For example, a CAE environment may run a variety of commercial and open-source simulators for finite-element analysis, dynamic simulation, and CFD. With on-premises clusters, all of these applications typically share the same infrastructure, although some host types might be preferred for some workloads.
Hosted application services (software-as-a-service) for a particular vendor’s tool can sound attractive, but users need to be careful. In the quest for simplicity, it’s easy to magnify costs by paying for siloed replicated infrastructure. Ideally, the cloud environment should support all your workloads. Hybrid approaches where some applications run locally and others run in the cloud can be effective, but it’s important to take stock of all the applications and avoid scenarios where workflows are made more complex, less reliable, or slowed down by the need to transfer data back and forth between local and cloud-based servers.
Virtualization and container technologies are helping with this challenge, and as more applications are deployable in containers, cross-cloud portability and managing application diversity are becoming less of an issue.
In an age of growing concerns about a range of cyber-threats, security is top of mind for most organizations. While security is a real issue, this is one area where cloud computing probably gets a bad rap. The packets traversing the internet sent by malicious actors don’t distinguish between on-premises data centers and public clouds – they only see routers and firewalls and how they’re configured.
It’s often said in IT that “security is not something you buy, it’s something you practice.” It’s a good bet most major cloud providers have more sophistication when it comes to securing networks and systems than their corporate IT brethren. That said, it’s still incumbent on cloud users to take proper advantage of the tools available to help them secure their environments. These include firewalls, credential management, appropriate segmentation of servers across VLANs, dedicated instances or dedicated hosts, network and filesystem encryption, etc.
Depending on their business, organizations may be subject to laws and regulations including HIPAA, PCI, GDPR, or various financial requirements. Failing to protect data can result in severe consequences. Emerging high-performance applications in analytics and AI (machine learning model training, for example) increasingly operate on data sets covered by regulation. For some applications, regulation may be less of a concern, but companies are still concerned about protecting their intellectual property.
Organizations are accountable for meeting regulatory requirements regardless of whether data resides in a corporate data center or with a cloud provider. The trick for managing cloud providers is to ensure that you impose all of the legal and regulatory requirements that apply to your business on your supplier as well.
Enterprises need to do due diligence on cloud providers considering issues like their financial stability, physical security of their data centers, disaster recovery plans, and level of technical expertise. While the risks are real and important to consider, they probably exist regardless of who operates the infrastructure.
Univa offers a variety of cloud-ready solutions that can help you deploy and manage a wide variety of high-performance applications locally, or in hybrid environments using your choice of cloud provider. To learn more about Univa solutions for cloud computing or to speak with a Univa representative, contact us or visit http://univa.com.