Open Workflow Management Standards and Univa Grid Engine
In many fields of research, workflow management systems automate the execution of multi-step processes. Applications include bioinformatics (for pipeline management), medical imaging, and astronomy. The sheer number of workflow tools is itself a problem: there are hundreds of them, implemented in different languages and optimized for different disciplines. Popular workflow tools in life sciences include Galaxy, BPipe, GATK, Snakemake, and NextFlow. With so many tools for describing and executing workflows, sharing workflows between organizations is difficult, and workflow management is an area crying out for standardization.
In this article I’ll look at two workflow standardization efforts: the Common Workflow Language (CWL), a specification supported by several business and research organizations; and the Workflow Description Language (WDL) led by researchers from the Broad Institute.
The Common Workflow Language (CWL)
The Common Workflow Language is a broadly supported specification aimed at making it easier for organizations to share data analysis workflows. The specification is maintained on GitHub along with cwltool, the open-source reference implementation[i]. Multiple tools implement the CWL specification.
For Univa Grid Engine users, Toil, developed by the UCSC Computational Genomics Lab, is a good way to get started with CWL. Toil is a pipeline management system written in Python, and in addition to its native Python-based syntax, Toil supports version 1.0.1 of the CWL specification. Another tool that supports CWL is Cromwell, but I’ll save Cromwell for when I discuss WDL.
To run Toil workflows with Univa Grid Engine, you’ll need to install Toil on each cluster node, or on a shared file system accessible to all nodes. The toil worker process on each node is responsible for running and monitoring the job steps that comprise a Toil workflow. If you plan to run CWL workflows with Toil, make sure that you install Toil with the “cwl” extra feature ($ pip install 'toil[cwl]'). This installs the toil-cwl-runner executable on each host. The Toil documentation describes a procedure for running a basic CWL workflow called example.cwl.
Running a simple CWL workflow in Toil with Univa Grid Engine looks something like this:
$ export TOIL_GRIDENGINE_ARGS='-q batch.q'
$ toil-cwl-runner --batchSystem=sge --defaultMemory 100000000 --defaultCores 4 \
    example.cwl example-job.yaml
In addition to passing resource requirements on the command line, you can pass Univa Grid Engine command line arguments via the $TOIL_GRIDENGINE_ARGS environment variable. Toil also allows for different workflow steps to have different resource requirements via flexible ResourceRequirement directives [ii].
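If you launch workflows from a script rather than an interactive shell, the same invocation can be assembled programmatically. The following is a minimal Python sketch, not part of Toil itself: the helper name build_toil_cwl_command is hypothetical, and it uses only the flags and the environment variable shown in the example above. It builds the argument list and environment without executing anything.

```python
import os
import shlex


def build_toil_cwl_command(workflow, job_file, queue, memory_bytes, cores):
    """Assemble the toil-cwl-runner argument list and environment.

    Hypothetical helper for illustration; it only mirrors the flags
    used in the shell example and does not run anything.
    """
    # Grid Engine options are passed to Toil through this environment variable.
    env = dict(os.environ, TOIL_GRIDENGINE_ARGS=f"-q {queue}")
    argv = [
        "toil-cwl-runner",
        "--batchSystem=sge",
        "--defaultMemory", str(memory_bytes),
        "--defaultCores", str(cores),
        workflow,
        job_file,
    ]
    return argv, env


argv, env = build_toil_cwl_command(
    "example.cwl", "example-job.yaml", "batch.q", 100_000_000, 4
)

# Show the command as it would be typed in a shell.
print(shlex.join(argv))

# To actually submit the workflow on a host where Toil is installed:
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```

Keeping the queue name in the environment and the resource defaults in the argument list matches the split described above: Grid Engine options travel through $TOIL_GRIDENGINE_ARGS, while Toil's own defaults are ordinary command-line flags.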
You can see how Toil interacts with Univa Grid Engine commands (qstat, qsub, qdel, etc.) by inspecting the Toil gridengine.py module on GitHub. Toil periodically polls Univa Grid Engine to monitor workload execution. As noted above, resource requirements can be set for the whole workflow on the command line or expressed individually for each step.
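To make the per-step resource requirements concrete, here is a minimal sketch of a CWL tool description that carries a ResourceRequirement. CWL documents may be written in JSON as well as YAML, so the sketch builds the description as a Python dictionary and serializes it with the standard library. The tool (echo), the input name message, and the numeric values are illustrative assumptions, not taken from the Toil documentation. In the CWL specification, ramMin is expressed in mebibytes.

```python
import json


def make_echo_tool(cores, ram_mib):
    """Return a minimal CWL v1.0 CommandLineTool with a ResourceRequirement.

    Illustrative only: the tool and its single input are made up.
    """
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "baseCommand": "echo",
        "requirements": [
            {
                "class": "ResourceRequirement",
                "coresMin": cores,   # minimum number of cores for this step
                "ramMin": ram_mib,   # minimum RAM for this step, in mebibytes
            }
        ],
        "inputs": {
            "message": {"type": "string", "inputBinding": {"position": 1}},
        },
        "outputs": [],
    }


tool = make_echo_tool(cores=2, ram_mib=4096)

# Serialize to JSON, which is also valid input for CWL runners.
cwl_text = json.dumps(tool, indent=2)
print(cwl_text)
```

Saved to a file such as echo-tool.cwl (a hypothetical name), a description like this could be handed to toil-cwl-runner in place of example.cwl, together with a job file that supplies message. Each step in a larger workflow can carry its own ResourceRequirement in the same way.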
The Workflow Description Language (WDL)
The Workflow Description Language (pronounced “widdle”) is another open workflow specification developed at the Broad Institute. While CWL and WDL provide similar capabilities, CWL is often described as more generalized. WDL is frequently used with GATK (Genome Analysis Toolkit), also developed at the Broad Institute.
Cromwell, a workflow engine that runs WDL, is written in Scala and is integrated with Univa Grid Engine. Step-by-step examples of configuring Cromwell to use Univa Grid Engine (or Sun Grid Engine) are available. A nice feature of Cromwell is that it provides a “runtime” directive that can be used to specify resources such as CPU, memory and Docker containers for individual job steps. Cromwell also has an active community of Grid Engine users and many Grid Engine examples, so it is easy to get assistance with the tool.
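To give a feel for how per-step runtime attributes end up as Grid Engine requests, here is a small Python sketch that maps CPU and memory values onto qsub arguments. This is not Cromwell's code; Cromwell performs this mapping through a submit template in its backend configuration. The function name qsub_args, the runtime dictionary keys, the parallel environment name smp, and the mem_free resource are all assumptions for illustration, and the last two are site-specific settings that will differ between clusters.

```python
def qsub_args(job_name, script, runtime, queue=None,
              parallel_env="smp", memory_resource="mem_free"):
    """Translate per-step runtime attributes into a qsub argument list.

    Hypothetical helper: parallel_env and memory_resource depend on how
    the local Grid Engine cluster is configured.
    """
    args = ["qsub", "-terse", "-V", "-b", "y", "-N", job_name]

    cpu = runtime.get("cpu")
    if cpu is not None:
        # Request slots in a parallel environment for multi-core steps.
        args += ["-pe", parallel_env, str(cpu)]

    memory_gb = runtime.get("memory_gb")
    if memory_gb is not None:
        # Request memory through a resource complex.
        args += ["-l", f"{memory_resource}={memory_gb}g"]

    if queue:
        args += ["-q", queue]

    # The job itself is a shell script generated for the step.
    args += ["/usr/bin/env", "bash", script]
    return args


runtime = {"cpu": 4, "memory_gb": 8}
args = qsub_args("align", "script.sh", runtime, queue="batch.q")
print(" ".join(args))
```

Steps that declare no CPU or memory simply omit the corresponding options, so the scheduler's defaults apply.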
Cromwell added support for CWL 1.0 in version 32. Similarly, WDL support is being offered in Toil. It seems likely that as these open workflow standards gain traction, workflow tools will evolve to support multiple workflow specification languages.
The Bottom Line
Open workflow standards are coming, but it’s still early days. The standards are new enough that support for CWL or WDL should be one of many considerations in selecting a workflow tool. In an earlier article, we looked at NextFlow (commercially supported by seqera.io) and its integration with Univa Grid Engine. Both NextFlow and CWL provide rich functionality and allow inline coding of workflow steps using a user’s preferred scripting language. It’s easy to see how some users who rely on these features might view standards-based workflow specifications as a step backward. As open workflow language specifications gain traction, leading pipeline tool providers will likely find ways to support the standards.
Your choice of a workflow or pipeline manager will probably come down to your choice of tools, who you are collaborating with, and existing repositories of workflows that you can leverage. The good news for Univa Grid Engine users is that there are plenty of choices since all these popular tools feature a Grid Engine integration.
Are you using a workflow tool with Univa Grid Engine? We’d be interested in learning about your workloads and experience to date.
This article has been updated based on new information. Sincere thanks to Michael Crusoe, Common Workflow Language co-founder and CWL Project Lead, for his comments and corrections.
[i] CWL is owned by the Software Freedom Conservancy, a NYC 501(c)(3) public charity
[ii] This syntax reflects version 1.0.2 of the CWL command line tool standard. At the time of this writing, Toil conforms to CWL v1.0.1.