For Univa Grid Engine customers deploying hybrid clouds, getting the storage environment right is a key challenge. While some applications require very high bandwidth (CFD or crash simulations reading and writing massive files), other applications (think life sciences and machine learning) are much more IOPS-intensive, needing to process millions of reads and writes per second with very low latency.
In an earlier article, I discussed a variety of file systems and data synchronization solutions and their tradeoffs. In this article, I wanted to take a closer look at WekaIO Matrix™, and explain how Matrix works with Univa Grid Engine and Navops Launch to deliver high-performance compute and storage clusters.
About WekaIO Matrix
WekaIO is a Univa partner and the creator of WekaIO Matrix, a distributed high-performance file system. Matrix is exciting to HPC users for several reasons.
Much like Univa Grid Engine, WekaIO Matrix is deployed in many industries requiring high-performance compute and file system services, including EDA, Life Sciences, and Machine Learning/AI. While file system performance depends on the number of cluster nodes and how they're configured, at the time of this writing Matrix is the clear performance champ, beating all comers on the latest Q1 2019 SPEC SFS benchmarks.
Performance and reliability
Unlike distributed file systems that use 3-way block replication, Matrix stripes data across 4 to 16 SSDs and protects it with either 2 or 4 parity drives. N+4 data protection (where N is the number of data drives) delivers much better reliability than triple replication schemes, and N+2 protection delivers similar reliability to 3-way replication while being far more space efficient.
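To make the space-efficiency claim concrete, here is a minimal sketch (not WekaIO code) comparing the usable fraction of raw capacity under parity striping versus 3-way replication:

```python
# Minimal sketch comparing the space efficiency of N+2 / N+4 parity
# striping against 3-way block replication. Illustrative only.

def parity_efficiency(data_drives: int, parity_drives: int) -> float:
    """Fraction of raw capacity available for user data with parity striping."""
    return data_drives / (data_drives + parity_drives)

def replication_efficiency(copies: int) -> float:
    """Fraction of raw capacity available when every block is fully replicated."""
    return 1 / copies

# 3-way replication stores every block three times: ~33% efficient.
print(f"3-way replication:  {replication_efficiency(3):.0%}")

# A 6+2 stripe (six data, two parity drives) is 75% efficient,
# yet tolerates the same two simultaneous drive failures.
print(f"6+2 parity stripe:  {parity_efficiency(6, 2):.0%}")

# A 12+4 stripe tolerates four failures at the same 75% efficiency.
print(f"12+4 parity stripe: {parity_efficiency(12, 4):.0%}")
```

In other words, at the same failure tolerance (two drives), parity striping stores roughly half as much redundant data as replication.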
Part of the reason that Matrix is so fast is that it uses small 4K blocks that match the block size of the underlying SSD devices. Coupled with the low latency of an embedded real-time operating system (RTOS), the small block size helps Matrix read and write data efficiently. It also avoids the "write amplification" problem found in competing file systems, where even small changes to data cause large blocks to be rewritten unnecessarily; avoiding this extends the life of the SSD drives in the cluster.
Deploying WekaIO Matrix
Matrix is designed to run on standard x86 Linux servers using 10GbE or faster network components. It supports off-the-shelf SATA, SAS, or NVMe SSDs found in most computer systems and cloud instances. Customers can deploy Matrix in multiple configurations, described below.
When deploying Matrix on a Univa Grid Engine cluster, administrators can control the resources on the host assigned to the Matrix container (# of cores, memory, network interface cards, and SSDs). Matrix will only use the resources assigned to it, ensuring that it doesn’t interfere with Grid Engine jobs.
The capacity of the file system depends on the number of cluster hosts and the size and number of SSDs on each host. For example, if I need 100TB of shared usable storage, I can deploy a cluster of 12 x i3.16xlarge storage-dense instances on AWS, where each instance has 8 x 1,900GB NVMe SSDs for a total of 15.2 TB of SSD storage per instance. A twelve-node cluster would, in theory, have 182.4 TB of raw capacity, but with a stripe size of eight (6N + 2P), approximately 25% of that capacity is dedicated to parity. WekaIO also recommends holding 10% of the file system in reserve for internal housekeeping and caching, so the usable capacity of the twelve-node cluster is ~123 TB (assuming no backing store):
182.4 TB * (6 / (6 + 2)) * 0.9 = ~123 TB
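The sizing arithmetic above can be sketched as a small helper. The instance specs come from the AWS i3.16xlarge example, and the 10% reserve follows the recommendation mentioned in the text:

```python
# Sketch of the usable-capacity arithmetic for a Matrix cluster.
# Values are from the worked example in the text, not a sizing tool.

def usable_capacity_tb(nodes: int, ssd_per_node_tb: float,
                       data_drives: int, parity_drives: int,
                       reserve: float = 0.10) -> float:
    raw = nodes * ssd_per_node_tb                                 # total raw SSD
    after_parity = raw * data_drives / (data_drives + parity_drives)
    return after_parity * (1 - reserve)                           # hold 10% in reserve

# 12 x i3.16xlarge, each with 8 x 1,900 GB NVMe = 15.2 TB raw per node,
# striped 6N + 2P:
print(f"{usable_capacity_tb(12, 15.2, 6, 2):.1f} TB usable")  # ~123 TB
```

Changing the stripe geometry (e.g. 14N + 2P on a larger cluster) shifts the parity overhead, so the same helper can be used to compare configurations.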
As described above, WekaIO Matrix provides a lot of deployment flexibility. It can be installed on dedicated hosts or in a converged model, with different storage technologies and interconnects. Matrix also supports tiered storage, where the SSD cluster holds warm data backed by a local or remote S3-compatible object store. By augmenting the SSD cluster with object storage, the capacity of the file system can grow arbitrarily with the amount of available object storage, letting users manage tradeoffs among performance, cost, and capacity. Tiering is automatic and transparent to users and applications.
Hybrid Cloud Deployments with Univa Grid Engine and Navops Launch
In a Univa Grid Engine environment, a sample hybrid-cloud deployment with Matrix is illustrated below. In this example, the Matrix software is installed on each on-premise Univa Grid Engine host. Whether this is practical for your environment will depend on the storage available on your cluster hosts. Customers can optionally set up a dedicated storage cluster.
In the cloud (Matrix supports AWS), there are similar deployment options. It's also possible to set up a converged environment in the cloud, but since Univa Grid Engine users often scale compute capacity dynamically based on application demand, it's usually more practical to deploy a dedicated storage cluster on storage-dense AWS i3 instances.
WekaIO provides an online calculator to help select optimal storage configurations and estimate IOPS and bandwidth. If I need 100TB of usable capacity in the cloud for my Univa Grid Engine applications and assume that 80% of that capacity can live on S3, I can deploy ten i3.4xlarge instances backed by S3. According to the Matrix calculator, this configuration would cost $21.22 per hour (a relatively modest ~$15K per month) but would deliver over 1.2M IOPS and 6 GB/sec of bandwidth to my Grid Engine cluster. Depending on the bandwidth, IOPS, and back-end storage capacity needed, users can select cluster sizes and instance types tailored to their workloads.
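The tiering and cost figures above can be sanity-checked with a few lines of arithmetic. The $21.22/hour rate comes from the WekaIO calculator example quoted in the text; the 730 hours/month is a common averaging convention, not an AWS constant:

```python
# Rough sanity check of the tiered-storage sizing and monthly cost
# estimate from the worked example. Illustrative, not a pricing tool.

HOURS_PER_MONTH = 730  # average hours in a month (365 * 24 / 12)

def ssd_tier_tb(total_tb: float, s3_fraction: float) -> float:
    """Capacity that must live on the SSD tier when the rest sits on S3."""
    return total_tb * (1 - s3_fraction)

def monthly_cost(hourly_rate: float) -> float:
    """Monthly cost implied by an hourly cluster rate."""
    return hourly_rate * HOURS_PER_MONTH

# 100 TB usable, with 80% tiered to S3: only ~20 TB of hot SSD capacity.
print(f"SSD (hot) tier: {ssd_tier_tb(100, 0.80):.0f} TB")

# $21.22/hour works out to roughly $15K per month.
print(f"Monthly cost:   ${monthly_cost(21.22):,.0f}")
```

This makes the tradeoff explicit: pushing a larger fraction of capacity to S3 shrinks the (more expensive) SSD tier, at the cost of re-hydrating cold data when it is accessed.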
Using Navops Launch to scale compute and storage
Using the configuration above, I can automatically deploy different cloud instance types depending on the nature of the workload under Grid Engine control. For example, using cloud bursting policies in Univa Grid Engine, when an application workload can't be fulfilled by the on-premise cluster, Grid Engine can signal Navops Launch to auto-provision AWS hosts using a custom Amazon Machine Image (AMI) that has Grid Engine, Docker, and the Matrix client software pre-installed along with other prerequisites. These instances can be up and running in minutes and bound to a cloud or on-premise cluster, expanding its capacity.
The new applet functionality in Navops Launch is a gamechanger for hybrid cloud environments. Rather than simply scheduling workloads to optimize the use of compute and storage, the built-in automation engine and applet facilities provide dynamic marshalling of cloud storage services based on application metrics collected from Univa Grid Engine and on usage and cost metrics extracted from the cloud provider. User-defined applets can make runtime decisions about data locality and data movement, optimizing performance and cost, and can even provision, de-provision, or scale the Matrix storage cluster.