It’s no secret that cloud computing is on the rise. While we wait for data on 2018, Gartner estimated cloud infrastructure-as-a-service (IaaS) revenue at US $34.6B in 2017, a whopping 36.8% higher than the year before.
For Enterprise HPC, cloud increasingly makes sense
After roughly a decade of caution, HPC users are following their corporate computing brethren to the cloud. At Univa, we’ve seen a 10x increase in customer interest, and recent survey data shows that 61% of HPC users are either open to or already using cloud for high-performance workloads. Enterprise users are moving past “tire-kicking” into more serious pilots and production deployments.
New workloads drive new requirements
Why the shift? Part of the reason is likely steadily improving cloud services, but new workloads are a factor too. HPC workloads increasingly include high-performance analytics and AI workloads like model training and inference. Economic arguments used to favor owning rather than renting for HPC, but diverse processors and an increasing variety of expensive and exotic GPU and FPGA devices have turned this on its head. For many organizations, buying and holding hardware for three years looks like the riskier choice.
A little skepticism can be a healthy thing
HPC presents unique challenges, and the professionals that run clusters tend to be a realistic and pragmatic bunch. While the cloud presents exciting opportunities, it’s good to keep our curmudgeonly spirits alive.
Often it is not technical issues that sink cloud migration efforts, but business and management challenges, and in this spirit, we explore five pitfalls to avoid.
If you’re planning to migrate one or more applications to the cloud, you’ve likely already considered the idea and have spent time crunching costs and ensuring technical feasibility. As with any complex project, it’s important not only to have a clear idea of what you want to achieve but to sell it internally.
An oft-touted Six Sigma principle is the idea that Effectiveness = Quality * Acceptance (E=Q*A). You can have the best idea in the world, and execute it flawlessly, but if other stakeholders aren’t bought in (acceptance) or misunderstand the objectives, the project can still be viewed as a failure.
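To make the arithmetic concrete, here is a minimal sketch of the E=Q*A idea. The 0-to-1 scoring scale, the function name, and the example scores are all assumptions chosen for illustration; the principle itself does not prescribe how quality or acceptance should be measured.

```python
def effectiveness(quality: float, acceptance: float) -> float:
    """Return E = Q * A, with both inputs scored on an assumed 0.0-1.0 scale."""
    for name, value in (("quality", quality), ("acceptance", acceptance)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return quality * acceptance


# Hypothetical scores: a technically excellent migration that few stakeholders back...
well_built_poorly_sold = effectiveness(quality=0.9, acceptance=0.3)

# ...versus a merely solid one with broad stakeholder support.
adequate_well_sold = effectiveness(quality=0.7, acceptance=0.9)

print(f"Well built, poorly sold: {well_built_poorly_sold:.2f}")
print(f"Adequate, well sold:     {adequate_well_sold:.2f}")
```

Because the two factors multiply, weak acceptance drags down even a flawless technical result, which is why the internal selling described next deserves as much attention as the engineering.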
One of the best ways to communicate and sell your cloud migration project is to have a clear, well-articulated business case. Think carefully about the goals of the cloud project and the benefits it will bring to the organization. It will take time to get alignment on business objectives, but internal selling is critical to the business case.
Establish clear metrics for success and make these metrics adhere to the “SMART” principle – Specific, Measurable, Achievable, Realistic and Time-Bound. The metrics don’t necessarily need to show cost savings, but they need to be specific and aligned to your company’s strategic business objectives.
Sometimes you need to go slow to go fast. There will probably be several stakeholders in a cloud migration project, from senior management to finance to line-of-business leaders to IT. Related to the point above, take the time to build consensus with your stakeholders. Be careful not to oversell anticipated benefits. It is better to under-promise and over-deliver than to lose credibility partway through a cloud migration project.
When you enlist stakeholder support, colleagues will feel ownership in the plan and will be in your corner when the inevitable hiccups occur. As the project progresses, make sure you communicate regularly, and continually remind stakeholders why they are doing the project, and why they were smart to support it! Continue to illustrate and communicate how you are collectively progressing against the key metrics in your shared business plan.
Given the nature of on-premise HPC clusters, most cloud migration projects will probably involve cloud infrastructure services. Cloud providers tend to offer a wide variety of services that will look like a virtual “candy store” to tech-savvy HPC users and administrators.
Cost is one of the biggest attractions of cloud models and also one of the bigger downside risks. When it’s easy to spin up infrastructure with the click of a mouse, costs can easily get out of control. Why spin up a cluster in the cloud when I have capacity locally? With so many easy-to-consume services, it’s easy for people to leave idle instances running or over-provision environments accidentally. Stealth costs like bandwidth or “pay-by-the-drink” metered API charges can cause spending to get out of control.
As a cloud administrator, you need to rule your cloud HPC environment with an “iron fist,” controlling credentials and entitlements carefully and monitoring usage to keep a lid on total spending.
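As a rough illustration of what monitoring usage can look like, the sketch below totals spend for a small fleet and flags instances that look idle. Everything in it is hypothetical: the `Instance` record, the 5% CPU idle threshold, the hourly rates, and the budget figure are assumptions for the example, not any cloud provider’s API or pricing. In practice the numbers would come from your provider’s billing and monitoring exports.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    hourly_rate_usd: float      # illustrative price, not a real provider rate
    hours_running: float
    avg_cpu_utilization: float  # 0.0-1.0, taken from whatever monitoring you use


IDLE_THRESHOLD = 0.05  # assumption: below 5% average CPU counts as idle


def spend(instances):
    """Total spend in USD for the given instances."""
    return sum(i.hourly_rate_usd * i.hours_running for i in instances)


def idle_instances(instances, threshold=IDLE_THRESHOLD):
    """Instances whose average CPU utilization falls below the threshold."""
    return [i for i in instances if i.avg_cpu_utilization < threshold]


# Hypothetical fleet for one billing period.
fleet = [
    Instance("sim-node-1", hourly_rate_usd=3.0, hours_running=200, avg_cpu_utilization=0.85),
    Instance("gpu-test-7", hourly_rate_usd=12.0, hours_running=300, avg_cpu_utilization=0.01),
    Instance("scratch-vm", hourly_rate_usd=0.5, hours_running=720, avg_cpu_utilization=0.02),
]

MONTHLY_BUDGET_USD = 4000.0  # assumed budget agreed with finance

total = spend(fleet)
idle = idle_instances(fleet)
print(f"Total spend: ${total:,.2f} (budget ${MONTHLY_BUDGET_USD:,.2f})")
print(f"Over budget: {total > MONTHLY_BUDGET_USD}")
print(f"Idle instances: {[i.name for i in idle]}, costing ${spend(idle):,.2f}")
```

In this made-up fleet, the forgotten GPU test instance and the scratch VM account for most of the bill, which is the kind of surprise that regular reporting is meant to catch early.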
It’s easy to be lulled into a false sense of security when piloting cloud computing initiatives. Everything may go well at the pilot stage with motivated, hand-picked volunteers, but as production workloads run in the cloud, it’s wise to expect surprises.
The ease of use of cloud management interfaces masks the underlying complexity. Remember that at the end of the day you are still managing complex applications and software infrastructure, but are now managing everything remotely, using tools that are new and unfamiliar. Make sure you plan for the learning curve required to master new tools and skills. The business benefits you sold to management will come true, but it may take a little longer than you think.
We’re all familiar with the saying garbage-in, garbage-out and this principle is alive and well in HPC cloud migrations. If you don’t have visibility and control over workloads and resources in your on-premise environment, don’t expect things to get better when you move to the cloud. If anything, moving to the cloud may compound your challenges by exposing additional degrees of freedom and complexities.
Getting your local house in order is important. If you have workload management on your local cluster, established policies for resource use, and solid cluster monitoring and reporting, you’re probably in good shape. If you don’t have operational maturity in these areas, it might be wise to think twice and focus on this before making a move.
Costs are self-limiting on-premise because compute and storage are finite. In the cloud, there is every opportunity for spending to get out of control without strong management controls.
Univa offers a variety of cloud-ready solutions that can help customers deploy and manage high-performance applications locally, or in hybrid cloud environments using their choice of cloud provider. To learn more about Univa solutions for cloud computing or to speak with a Univa representative, contact us or visit http://univa.com.