For HPC users considering a move to the cloud, security is an important topic. In the first article in this series, we began with the perimeter of our cloud security model, covering cloud provider edge protection, VPCs, network security, and firewall rules. In the second article, we explored identity and access management at the level of the cloud provider, deployed services, and individual machine instances. In this final article, we’ll focus on additional measures to secure cloud instances and data, including patching, trusted instances, encrypted storage, and other security-related tools. As with the other articles in this series, we discuss security in the context of Amazon Web Services (AWS), although the same concepts apply to other cloud providers as well.
In cloud environments, protecting the machine instances that comprise the cluster is important. We’ve already discussed several layers of security including:
For administrators, patch management is a challenge that often doesn’t get enough attention. Security-related vulnerabilities in Linux kernels and layered programs are being discovered constantly, and rootkits allow even unsophisticated users to exploit known security holes easily. When applying patches and software updates, administrators are always concerned about breaking things that are working, so they need to balance security risks against operational risks.
Some administrators rely on security bulletins from the Computer Emergency Readiness Team (CERT) or commercial services like Red Hat Network (RHN) to alert them to security vulnerabilities. In Linux environments, patches are normally distributed as operating system updates managed using facilities like yum (Yellowdog Updater, Modified) on RHEL and CentOS, zypper on SUSE, or apt (Advanced Package Tool) on Ubuntu. Users on RHEL or CentOS might run yum check-update to see whether new updates are available and yum update to apply any new packages. Some administrators prefer to apply only security-related patches to minimize changes to the environment, using filtering tools such as yum-plugin-security.
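As a minimal sketch of the check-before-apply step described above, the Python snippet below wraps yum check-update and interprets its exit status. It relies on yum's documented convention of exiting with 100 when updates are available, 0 when there are none, and 1 on error. The function names are illustrative and are not part of yum or any other tool.

```python
import subprocess

# Exit codes documented for `yum check-update`.
UPDATES_AVAILABLE = 100
NO_UPDATES = 0


def interpret_check_update(returncode: int) -> str:
    """Map the exit status of `yum check-update` to a short label."""
    if returncode == UPDATES_AVAILABLE:
        return "updates-available"
    if returncode == NO_UPDATES:
        return "up-to-date"
    return "error"


def check_for_updates(security_only: bool = False) -> str:
    """Run `yum check-update` on a RHEL or CentOS host and report the result."""
    cmd = ["yum", "check-update"]
    if security_only:
        # On older releases this option is provided by yum-plugin-security.
        cmd.append("--security")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return interpret_check_update(result.returncode)
```

Calling check_for_updates(security_only=True) from a cron job or a scheduler pre-flight script is one way to report pending security updates without applying anything.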
To reduce risk when applying patches, administrators often maintain a small test cluster that mirrors their production environment so that updates can be tested before they go into production. While tools like yum and apt are powerful, they are also complex. Administrators need to know how to use them to apply patches safely and to roll back changes if something goes wrong.
To simplify patch management for cloud instances, administrators can use AWS Systems Manager, which includes patch management functionality. The benefit of an automated patch manager is that it provides an easy-to-use interface across multiple OS flavors, so users don’t need to interact with lower-level OS facilities. While a patch manager makes it easy to apply and roll back patches, it doesn’t guarantee that a new update won’t affect application environments, so administrators still need to be concerned with change management.
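For readers who script their environments, a patch scan can be requested through the Systems Manager SendCommand API using the AWS-RunPatchBaseline document. The sketch below assumes boto3 is installed, credentials are configured, and cluster instances carry a tag such as Cluster=hpc-test. The tag name and value and the function names are hypothetical.

```python
def build_patch_command(tag_key: str, tag_value: str, install: bool = False) -> dict:
    """Build keyword arguments for the Systems Manager SendCommand call."""
    return {
        "DocumentName": "AWS-RunPatchBaseline",
        # Target instances by tag rather than listing instance IDs.
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        # "Scan" only reports missing patches; "Install" applies them.
        "Parameters": {"Operation": ["Install" if install else "Scan"]},
    }


def send_patch_command(kwargs: dict) -> str:
    """Submit the command and return its ID (requires AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    ssm = boto3.client("ssm")
    response = ssm.send_command(**kwargs)
    return response["Command"]["CommandId"]
```

Running a scan against the test cluster first, and switching to install=True only after reviewing the results, mirrors the test-then-production practice described earlier.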
When a cloud service launches a machine instance, the instance is based on a machine image. In AWS, the image is referred to as an AMI, or Amazon Machine Image. Other cloud providers have the same concept. For example, Google Cloud Platform has the notion of public images and custom images, and Azure provides facilities to create an image of a virtual machine or VHD. In the AWS Marketplace, public images prepared by third parties can be used for a fee. For security-minded administrators, the question that always arises is “Can I trust this image?”
Because of security-related concerns and the need for particular software, organizations often choose to deploy their own custom machine images rather than use machine images offered by the cloud provider. Instances deployed using these images are trusted because users know what software is in the image.
AWS provides facilities to create custom AMIs, share them among specific accounts, and even make AMIs public. When Univa Grid Engine is deployed via the AWS Marketplace, it uses a custom AMI behind the scenes containing a pre-configured OS and software provided by Univa. For customers automating the deployment of clouds or hybrid clouds on AWS or other providers, a nice feature of Navops Launch is its “bring your own AMI” functionality, which allows users to automatically provision private or shared custom images that they trust in addition to machine images offered by the cloud provider.
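One simple way to act on the “Can I trust this image?” question is to check which account owns each AMI before launching it. The sketch below is an illustration rather than a feature of any product mentioned here. It takes a response shaped like the output of the EC2 DescribeImages API, where each image has ImageId and OwnerId fields, and lists the images whose owner is not on an allowlist. The account IDs and the function name are hypothetical.

```python
def images_from_untrusted_owners(describe_images_response: dict, trusted_owners: set) -> list:
    """Return the IDs of AMIs whose owning account is not on the allowlist."""
    return [
        image["ImageId"]
        for image in describe_images_response.get("Images", [])
        if image.get("OwnerId") not in trusted_owners
    ]


# Hypothetical response fragment and allowlist, for illustration only.
sample_response = {
    "Images": [
        {"ImageId": "ami-aaa", "OwnerId": "111111111111"},
        {"ImageId": "ami-bbb", "OwnerId": "999999999999"},
    ]
}
suspect = images_from_untrusted_owners(sample_response, {"111111111111"})
```

With boto3, a real response could be obtained from the EC2 client's describe_images call. That call also accepts an Owners parameter, so the search can be restricted to trusted accounts in the first place.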
For customers using container environments like Docker or Singularity to deploy and manage HPC applications, the same issues apply. When users pull an image from a public registry, they are trusting that the container is secure, reasonably up to date and that it doesn’t contain any malware, backdoors or known vulnerabilities.
There are a few different ways to increase your confidence in containers pulled from public registries. One technique is to pull containers from a commercial registry operated by a vendor you trust (e.g., Univa, The UberCloud, or a Docker Trusted Registry). Images pulled from a Docker Trusted Registry are digitally signed, so users can be confident that they haven’t been altered.
Another approach to deploying trusted containers is to set up a private container registry that contains only container images vetted by your organization. Cloud users can either deploy their own registries in the cloud or look to cloud services like Amazon’s Elastic Container Registry (ECR) to manage registry services for them.
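A policy of this kind can be enforced with a small check before any image is pulled. The sketch below assumes an allowlist of registry hosts chosen by your organization and, optionally, a requirement that images be pinned by digest so the content cannot change under a tag. The registry host shown follows the usual ECR naming pattern, but the account ID, region, repository name, and function names are all hypothetical.

```python
import re

# A digest-pinned reference ends with @sha256:<64 hex characters>.
DIGEST_PATTERN = re.compile(r"@sha256:[0-9a-f]{64}$")


def registry_of(image_ref: str) -> str:
    """Return the registry host of an image reference, defaulting to Docker Hub."""
    first, sep, _ = image_ref.partition("/")
    # The first path component is a registry only if it looks like a host name.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def is_allowed(image_ref: str, allowed_registries: set, require_digest: bool = True) -> bool:
    """Accept only images from vetted registries, optionally pinned by digest."""
    if registry_of(image_ref) not in allowed_registries:
        return False
    return bool(DIGEST_PATTERN.search(image_ref)) or not require_digest
```

A job submission wrapper could call is_allowed on every image named in a job script and reject the job when the check fails.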
In addition to making sure that containers are trusted, HPC users running container environments should be aware of other security-related considerations. When deploying containerized workloads, the container runtime needs to be installed on each cluster host. In Docker environments, the Docker daemon runs as root by default, leading to concerns about privilege escalation attacks in which containerized workloads find a way to run commands as root, compromising the security of the environment. While this concern about Docker has been addressed, it is still important for administrators to configure Docker appropriately to make sure the environment is secure.
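What “configure Docker appropriately” means varies by site, but a few settings in the daemon configuration file (typically /etc/docker/daemon.json) are commonly reviewed. The sketch below audits a parsed configuration for three of them: userns-remap, no-new-privileges, and icc. The choice of these three checks is our assumption for illustration, not a complete hardening guide, and the function name is hypothetical.

```python
import json


def audit_daemon_config(config: dict) -> list:
    """Return findings for a parsed Docker daemon.json configuration."""
    findings = []
    if not config.get("userns-remap"):
        findings.append("userns-remap is not set: root in a container maps to root on the host")
    if config.get("no-new-privileges") is not True:
        findings.append("no-new-privileges is not enabled: processes may gain privileges via setuid binaries")
    if config.get("icc", True):
        findings.append("icc is enabled: containers on the default bridge can reach each other")
    return findings


# Example: a configuration with all three settings applied produces no findings.
hardened = json.loads('{"userns-remap": "default", "no-new-privileges": true, "icc": false}')
remaining = audit_daemon_config(hardened)
```

On a cluster host, the same function could be applied to the result of json.load on the real configuration file as part of a node health check.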
For an extra measure of security, most cloud providers offer services to encrypt data at rest. Encryption services are supported for most storage types, including block storage, object storage, and various database services. When using encryption services for block storage, data inside the volume, snapshots created from the volume, and all data moving between the volume and the machine instance are encrypted. Encrypting storage provides an extra layer of security, helping ensure that your cloud storage is readable only by you.
Amazon offers EBS encryption and S3 encryption, Microsoft offers storage service encryption across its various storage services, and Google supports encryption of data at rest by default in Google Cloud Platform. Cloud providers make encrypting storage simple, so this is another layer of defense that users can take advantage of. For those concerned about performance, Amazon claims that users can expect the same IOPS performance from encrypted volumes with minimal impact on latency.
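Because encryption is a per-volume setting on EBS, it can be useful to audit which volumes are not yet encrypted. The sketch below assumes boto3 and configured credentials for the part that calls AWS. The filtering function works on any response shaped like the output of the EC2 DescribeVolumes API, where each volume has VolumeId and Encrypted fields. The function names are hypothetical.

```python
def unencrypted_volume_ids(describe_volumes_response: dict) -> list:
    """Return the IDs of EBS volumes whose Encrypted flag is not set."""
    return [
        volume["VolumeId"]
        for volume in describe_volumes_response.get("Volumes", [])
        if not volume.get("Encrypted", False)
    ]


def audit_account_volumes() -> list:
    """Page through every volume in the current region (requires AWS credentials)."""
    import boto3  # imported here so the filter above works without boto3

    ec2 = boto3.client("ec2")
    flagged = []
    for page in ec2.get_paginator("describe_volumes").paginate():
        flagged.extend(unencrypted_volume_ids(page))
    return flagged
```

New volumes can be created encrypted by passing Encrypted=True to the EC2 create_volume call.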
So far, we’ve been focusing on hardening the cloud environment in various ways to protect access to applications and data. As important as this is, it is equally important to monitor the environment so that you know whether people are attempting to breach your defenses and whether someone has been successful.
For AWS users, CloudTrail is a useful resource. CloudTrail records various actions and API calls made in your AWS account. It provides a complete log of events, including sensitive activities like creating users or login profiles, adding users to groups, and modifying IAM roles or permissions. Users can use CloudTrail to automatically log AWS events to “trails” stored in AWS Simple Storage Service (S3) buckets, or pipe event logs to AWS Lambda, where user-defined code can monitor events and take corrective action in case of unusual activity.
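The kind of user-defined monitoring code mentioned above might look like the following sketch. It assumes the CloudTrail records have already been decoded into Python dictionaries, for example from the Records array of a CloudTrail log file, and it flags the sensitive IAM activities listed in the text. The set of event names is our own selection and the function name is hypothetical. How records reach the function (S3 notification, log subscription, and so on) is left out.

```python
# IAM API calls treated as sensitive in this sketch (an illustrative selection).
SENSITIVE_IAM_EVENTS = {
    "CreateUser",
    "CreateLoginProfile",
    "AddUserToGroup",
    "AttachRolePolicy",
    "PutRolePolicy",
}


def flag_sensitive_events(records: list) -> list:
    """Return a short summary for each sensitive IAM event in the records."""
    alerts = []
    for record in records:
        if (
            record.get("eventSource") == "iam.amazonaws.com"
            and record.get("eventName") in SENSITIVE_IAM_EVENTS
        ):
            alerts.append(
                {
                    "event": record["eventName"],
                    "actor": record.get("userIdentity", {}).get("arn", "unknown"),
                    "time": record.get("eventTime"),
                }
            )
    return alerts
```

Inside a Lambda function, the returned alerts could be forwarded to a notification topic or used to trigger whatever corrective action the site has defined.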
CloudTrail can also trigger the AWS Simple Notification Service (SNS) to send proactive push notifications, or pass information to the AWS Simple Queue Service (SQS) as another way of handling CloudTrail events. Users can also monitor CloudTrail events through AWS CloudWatch providing a single dashboard for monitoring AWS resource usage and security-related exceptions.
While there are security-related risks when running applications in the cloud, there are benefits as well. Cloud providers operate at a different scale than corporate data centers and are sophisticated in their understanding of security. Cloud providers offer a wealth of tools, best practices, and security-related expertise that is difficult for most enterprises to replicate.
For example, the AWS Trusted Advisor service is available to all AWS users to validate S3 bucket permissions, check security groups, verify that multi-factor authentication (MFA) is enabled on log-in accounts, and provide security-related recommendations for IAM, EBS, and RDS. Customers with upgraded support plans can take advantage of a variety of additional automated checks and diagnostics, and can receive proactive notification of security-related events. Microsoft’s Azure Security Center and Google’s new Cloud Security Command Center provide similarly comprehensive security services.
In addition to the various facilities we’ve already mentioned, users can take advantage of additional services to help monitor and secure HPC applications in AWS. Among these additional services are:
We hope you’ve enjoyed our discussion of cloud security for HPC applications and picked up a few useful tips. Contact us to share your thoughts or learn more about securing HPC cloud environments.