Next Generation Sequencing (NGS) is a fundamental practice in bioinformatics. NGS pipelines are composed of complex, multi-step processes involving many different tools and intermediate data formats. With easy access to cloud infrastructure and containerized applications that are portable across clouds, users are increasingly extending pipelines to the cloud.
In part I of this article, we discussed Nextflow, a leading tool for managing bioinformatics workflows, and showed how it can be used with Univa Grid Engine and Navops Launch to facilitate transparent hybrid cloud bursting to multiple cloud providers.
In this second article, we’ll look at the mechanics of how cloud bursting is enabled in Univa Grid Engine and Navops Launch and explain how bioinformatics users can enable bursting to multiple clouds regardless of their chosen pipeline tools and management frameworks.
Nextflow is a free and open-source software solution for application workflows developed by the Centre for Genomic Regulation (CRG). Seqera Labs was recently incorporated as a spin-off from the CRG to provide enterprise-level support and professional services around the Nextflow platform, as well as to explore new, innovative products to power the next generation of big data analysis applications.
Under the covers, transparent cloud bursting is provided by Navops Launch and its integration with Univa Grid Engine. An easy way to get started with a working environment is to use the Univa Grid Engine offering on the AWS Marketplace. The marketplace deployment provides a ready-to-use cloud-resident Univa Grid Engine master host with Navops Launch and Univa Unisight (for reporting and management) pre-installed.
When you log in to the master host, you’ll be provided with step-by-step instructions on how to add and remove cluster nodes using the Navops Launch add-nodes and delete-node commands.
After following the AWS Marketplace procedure and logging into the Univa Grid Engine master host, installing Nextflow is straightforward. Simply install Java and Nextflow using the commands below as described in the Nextflow quick start guide.
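For reference, the quick start steps can be sketched as a small shell function. This is a minimal sketch rather than the exact listing from the quick start guide: it assumes a yum-based master host image and the java-1.8.0-openjdk package name, neither of which is stated in this article, and the function name install_nextflow is purely illustrative. The get.nextflow.io installer URL is the one given in the Nextflow quick start guide.

```shell
# Sketch: install Java (only if missing) and the Nextflow launcher.
# Assumptions: a yum-based image and the java-1.8.0-openjdk package name.
install_nextflow() {
    # Nextflow needs a Java runtime; install one only if none is on the PATH.
    if ! command -v java >/dev/null 2>&1; then
        sudo yum install -y java-1.8.0-openjdk || return 1
    fi
    java -version || return 1

    # Download the Nextflow launcher into the current directory.
    curl -s https://get.nextflow.io | bash
}
```

After calling install_nextflow, running ./nextflow run hello from the same directory is a quick way to confirm that the launcher works.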
Navops Launch (based on open-source Project Tortuga) employs the notion of kits to install and manage software easily. You can think of a kit as a smarter RPM. Software kits can include multiple software components. Enabling cloud bursting requires that the Univa-supplied simple policy engine kit be installed on the cluster. You can verify that it is present by running the get-kit-list command on the Univa Grid Engine master host.
The awsadapter is pre-installed in the AWS Marketplace deployment as shown above. A full installation of Navops Launch would involve additional cloud adapter kits. A complete list of available cloud adapters is provided on the Project Tortuga overview page on GitHub.
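The kit check can also be scripted. The sketch below assumes that the kit name printed by get-kit-list contains the string simple_policy_engine; that naming is an assumption on my part and should be compared with the output on your own cluster. The function name check_policy_kit is illustrative only.

```shell
# Sketch: confirm that the simple policy engine kit is installed.
# Assumption: the kit name in the get-kit-list output contains
# "simple_policy_engine".
check_policy_kit() {
    if get-kit-list | grep -q 'simple_policy_engine'; then
        echo "simple policy engine kit: installed"
    else
        echo "simple policy engine kit: missing" >&2
        return 1
    fi
}
```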
Detailed instructions for setting up cloud bursting policies and associating these policies with Univa Grid Engine queues are provided in the Navops Launch documentation, but for convenience, a single script is provided to set up a burstable queue (burst.q) with some reasonable default policies.
The script can be found on the cluster master host under $TORTUGA_ROOT. Simply run the provided enable-cloud-bursting.sh script to set up cloud bursting on the cluster as shown.
As shown above, the enable-cloud-bursting.sh script creates a queue, adds a hostgroup for Univa Grid Engine bursting hosts, and creates rules and scripts in $TORTUGA_ROOT/rules that define XML-based policies on how to activate and deactivate cloud hosts.
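One way to confirm the results described above is to query Univa Grid Engine and list the rules directory. This is a minimal sketch: qconf -sql and qconf -shgrpl are standard Grid Engine commands, while the function name verify_bursting_setup is made up for illustration. The name of the host group the script creates is not given in this article, so the sketch simply prints all host groups.

```shell
# Sketch: confirm what enable-cloud-bursting.sh created.
verify_bursting_setup() {
    # burst.q should appear in the list of cluster queues.
    qconf -sql | grep -qxF 'burst.q' || { echo "burst.q not found" >&2; return 1; }
    echo "queue burst.q: present"

    # Print all host groups, including the one added for bursting hosts.
    qconf -shgrpl

    # List the rule templates and helper scripts.
    ls "${TORTUGA_ROOT:?TORTUGA_ROOT is not set}/rules"
}
```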
You can verify the hardware and software profiles that will be used to provision cloud-resident cluster hosts by using the commands below. The name execd-burst is attached to both the hardware and software profiles for the Grid Engine compute hosts that are provisioned under the control of the simple policy engine. Cloud instances are added when a threshold number of jobs are pending in a burstable queue.
Navops Launch is designed to provision on-premise clusters as well as cloud-based clusters, and by default, it asserts its own naming scheme for cluster nodes (compute-#nn). It’s important to use the update-hardware-profile command (last line in the script below) to change the default name format to a wildcard. This prevents the activateNodes.sh script from failing when it attempts to impose a new name on the dynamically created AWS instance during a cloud bursting operation.
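The profile checks and the name format change can be sketched together as follows. The option names (--name and --name-format) reflect the Project Tortuga command-line tools as I understand them, so treat them as assumptions and check them against your installed version. The function name show_and_fix_burst_profiles is hypothetical.

```shell
# Sketch: inspect the bursting profiles, then relax the node name format.
# Assumption: --name and --name-format match your Navops Launch version.
show_and_fix_burst_profiles() {
    profile="${1:-execd-burst}"

    get-hardware-profile --name "$profile" || return 1
    get-software-profile --name "$profile" || return 1

    # A wildcard lets AWS-assigned host names be accepted as they are.
    update-hardware-profile --name "$profile" --name-format '*'
}
```

The wildcard is quoted so that the shell passes it to the command literally instead of expanding it against file names in the current directory.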
In the default configuration, as soon as the pending jobs indicate that ten or more nodes are needed, the bursting policy adds cluster hosts. Administrators can alter these policies by adjusting the configuration files found in $TORTUGA_ROOT/rules. After changing a rule associated with the cloud bursting policy, administrators will need to run delete-rule to remove the rule, followed by add-rule to apply the modified XML template.
You can verify that the cloud bursting queue (burst.q) is working as expected by submitting test jobs to the queue. If there are available local hosts associated with the queue, jobs will begin executing immediately and there may be no need for cloud bursting.
Once the number of pending jobs reaches the pre-configured threshold, Navops Launch begins adding cloud hosts automatically, based on the bursting policy, at the next polling interval. The interval is defined in $TORTUGA_ROOT/rules/post_basic_resource.xml (5 minutes by default).
Behind the scenes, the simple policy engine in Navops Launch will automatically start and add cluster hosts under Navops Launch’s control based on the execd-burst hardware and software profiles shown above. You can monitor new AWS instances coming online using the AWS console or the AWS CLI.
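The delete-then-add cycle can be wrapped in a helper. This is only a sketch under stated assumptions: the --app-name, --rule-name, and --desc-file options are how I understand the Project Tortuga rule commands and should be verified on your cluster. The sketch also assumes that the XML template is named after the rule. The application name is left as a parameter because it is not given in this article, and the function name reload_rule is illustrative.

```shell
# Sketch: re-register a bursting rule after editing its XML template.
# Assumptions: the option names below match your Navops Launch version,
# and the template file is named <rule_name>.xml.
reload_rule() {
    app_name="$1"    # application that owns the rule
    rule_name="$2"   # for example, post_basic_resource
    rule_file="${TORTUGA_ROOT:?TORTUGA_ROOT is not set}/rules/${rule_name}.xml"

    # Refuse to delete the rule if there is no template to re-add.
    [ -f "$rule_file" ] || { echo "missing $rule_file" >&2; return 1; }

    delete-rule --app-name "$app_name" --rule-name "$rule_name" || return 1
    add-rule --desc-file "$rule_file"
}
```

Checking for the template before deleting avoids leaving the cluster without a bursting rule if the file name was mistyped.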
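A simple way to generate enough pending work is to submit a batch of sleep jobs. The sketch below uses standard Grid Engine options (-q, -b y, -N); the job count of 12, the 300-second sleep, the burst-test job names, and the function name submit_test_jobs are all illustrative choices rather than values from the original setup.

```shell
# Sketch: submit a batch of short test jobs to the burstable queue.
submit_test_jobs() {
    count="${1:-12}"   # illustrative default, above the ten-node threshold
    i=1
    while [ "$i" -le "$count" ]; do
        # -b y runs the command as a binary rather than as a job script.
        qsub -q burst.q -b y -N "burst-test-$i" sleep 300 || return 1
        i=$((i + 1))
    done

    # Show this user's pending jobs (state "qw").
    qstat -s p
}
```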
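With the AWS CLI, a query along these lines lists instances as they come online. It is a sketch that assumes the CLI is already configured with credentials and a default region, and it filters on the univa-sg security group described later in this article. The function name list_burst_instances and the chosen output columns are illustrative.

```shell
# Sketch: list instances in the bursting security group that are
# starting or running.
# Assumption: the AWS CLI has credentials and a default region configured.
list_burst_instances() {
    group_name="${1:-univa-sg}"
    aws ec2 describe-instances \
        --filters "Name=instance.group-name,Values=${group_name}" \
                  "Name=instance-state-name,Values=pending,running" \
        --query 'Reservations[].Instances[].[InstanceId,State.Name,PrivateDnsName,LaunchTime]' \
        --output table
}
```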
When configuring Navops Launch, you’ll need to make sure there is a rule in the AWS security group that allows hosts in the group to communicate with one another. An example of a suitable security group configuration is shown below. Nodes are added by default to the univa-sg security group (you can call this group whatever you want), and the All traffic rule in the AWS security group enables cluster nodes to communicate freely with one another. This is important because NFS runs across the cluster nodes and some port numbers are negotiated dynamically at runtime.
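If you prefer the AWS CLI to the console, the All traffic rule can be added as a self-referencing ingress rule. The sketch below assumes the group is named univa-sg, that the CLI is configured for the right account and region, and that the rule does not already exist (the call fails on a duplicate rule). The function name allow_intra_group_traffic is hypothetical.

```shell
# Sketch: allow all traffic between members of one security group.
# Assumption: the AWS CLI has credentials and a default region configured.
allow_intra_group_traffic() {
    group_name="${1:-univa-sg}"

    # Look up the group ID from the group name.
    group_id=$(aws ec2 describe-security-groups \
        --filters "Name=group-name,Values=${group_name}" \
        --query 'SecurityGroups[0].GroupId' --output text) || return 1

    # With --output text, a missing group is reported as "None".
    if [ -z "$group_id" ] || [ "$group_id" = "None" ]; then
        echo "security group ${group_name} not found" >&2
        return 1
    fi

    # Protocol -1 means all traffic; the source is the group itself.
    aws ec2 authorize-security-group-ingress \
        --group-id "$group_id" \
        --ip-permissions "IpProtocol=-1,UserIdGroupPairs=[{GroupId=${group_id}}]"
}
```

Because the source of the rule is the group itself, the cluster nodes can reach one another on any port without opening the group to outside addresses.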
As cloud instances are added to the cluster, the pending jobs will be scheduled to the dynamically added hosts. After all jobs are complete, idle nodes are removed based on the policies defined in $TORTUGA_ROOT/rules/post_basic_unburst.xml.
Nextflow provides native support for containerized Docker and Singularity jobs. Singularity is relatively straightforward. Docker can be a little more complicated, however. Nextflow handles the construction of the qsub command line and the command line that runs the container on the compute host (i.e., docker run). This means that the Univa Grid Engine – Docker integration features are bypassed.
Univa Grid Engine will still automatically define a host-level “docker” resource for cloud hosts with a Docker runtime installed. To ensure that containerized workflow steps in Nextflow are only dispatched to hosts with a Docker runtime, you can add a clusterOptions directive in the nextflow.config file (or inline in a process definition in the workflow) and set the value to “-l docker” (a lower-case “el”, not the digit one) as described in the Nextflow documentation. This passes the docker resource requirement to the sge executor, so that Univa Grid Engine enforces it and dispatches container jobs only to hosts that have Docker installed.
To get the cluster example working with Docker, for expediency I set “sudo = true” in the docker section of the nextflow.config file and adjusted the /etc/sudoers file on each compute host, adding the line “sge ALL = NOPASSWD: /bin/docker” at the end of the file (the colon after NOPASSWD is required by sudoers syntax). This line allows the non-privileged sge user to run the /bin/docker command through sudo without being challenged for a password.
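As a minimal sketch, the configuration described above can be written from the shell as follows. The queue name burst.q comes from earlier in this article. Applying the settings to every process, rather than to selected ones, is a simplification for illustration. Note that the command overwrites any nextflow.config in the current directory.

```shell
# Sketch: write a nextflow.config that targets the burstable queue and
# requests the host-level "docker" resource for every process.
# Warning: this overwrites an existing nextflow.config in this directory.
cat > nextflow.config <<'EOF'
process {
    executor       = 'sge'
    queue          = 'burst.q'
    clusterOptions = '-l docker'   // passed through to qsub
}
EOF
```

To restrict the requirement to containerized steps only, the same clusterOptions value can be set inside individual process definitions instead.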
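The two changes can be staged from the shell as sketched below. Two things differ from what I actually did and are assumptions: the sudoers rule is written to a staging file for installation under /etc/sudoers.d, rather than appended to /etc/sudoers, and this relies on /etc/sudoers including that directory on your compute hosts. The file name sge-docker.sudoers is illustrative. Note the colon after NOPASSWD, which sudoers syntax requires.

```shell
# Sketch: stage the Docker settings for Nextflow and sudo.
# Assumption: /etc/sudoers on the compute hosts includes /etc/sudoers.d.

# 1. Docker scope for nextflow.config (appended if the file already exists).
cat >> nextflow.config <<'EOF'
docker {
    enabled = true
    sudo    = true   // run the docker command through sudo
}
EOF

# 2. Sudoers rule for the sge user, written to a staging file first.
printf 'sge ALL = NOPASSWD: /bin/docker\n' > sge-docker.sudoers

# Next steps on each compute host, as root (not executed here):
#   visudo -cf sge-docker.sudoers     # syntax check before installing
#   install -m 0440 sge-docker.sudoers /etc/sudoers.d/sge-docker
```

Checking the file with visudo before installing it is worthwhile because a syntax error in a sudoers file can disable sudo on the host.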
Using Nextflow with Univa Grid Engine provides users with additional ways to burst to public cloud providers beyond the native AWS Batch integration.
As a recap, some of the benefits of this approach are:
Are you running pipelines using Nextflow and other workflow tools on-premise or in the cloud? We’d love to hear from you and learn from your experiences.
You can learn more about commercially supported Nextflow and the integration with Univa Grid Engine at https://www.seqera.io.