In the burgeoning field of life sciences, bioinformatics and genomics continue to advance rapidly, producing breakthroughs and discoveries thought impossible even a few decades ago. It is astonishing to consider that the first human whole-genome sequence, completed in 2003, took about 13 years and $2.7 billion. Today, largely thanks to faster and less expensive high-performance computing technologies, a human genome can be sequenced in hours at a cost of just $1,000.
As more accessible HPC generates enormous life-sciences datasets that promise deeper understanding of biological systems, these environments must cope with new complexities. Workload scheduling decisions must weigh a myriad of factors: resource requirements, dynamic loads on servers, thresholds, time constraints, sharing policies, and much more.
Take, for example, assembling the DNA sequence of an organism from the output of a high-throughput sequencer: millions of very short DNA reads must be puzzled together to yield complete sequences, a highly RAM-demanding task. Such enormous volumes of data place a heavy burden on the HPC clusters tasked with their storage, processing, visualization, and integration. Without a reliable and powerful orchestration solution, running many such workloads leads to inefficient resource usage and bottlenecks, and under-utilization of a substantial HPC cluster investment can quickly become costly.
Our recent case study with Germany’s Bielefeld University is a case in point.
Bielefeld University has long been distinguished for its unwavering emphasis on innovative, top-level research and its multi-perspective approach to solving problems. In fact, the University’s official motto is, “Transcending Boundaries – between disciplines and scientific cultures, between research and teaching, and between science and society.”
The Center for Biotechnology, or CeBiTec, at Bielefeld University is one of the University’s largest faculty-spanning central academic institutions. Its purpose is to bundle biotechnological activities and research projects, foster the crosslinking of research approaches and technologies, and develop innovative projects. As part of CeBiTec, the Bioinformatics Resource Facility, or BRF, provides the high-performance compute infrastructure for its 150 members and partner groups in the Faculties of Biology, Technology and Chemistry, and for over 1,000 national and international researchers and affiliates. Comprising over 100 servers with 3,000 CPU cores and about 15.5 TB of RAM, the heterogeneous setup ranges from servers with 8 cores and 32 GB of RAM up to systems with 48 cores and 2 TB of RAM. Several hundred bioinformatics applications and tools support sequencing-data evaluation, assembly, read mapping, gene prediction, functional annotation, and analysis.
When massive volumes of data, coupled with growing demand from its extensive user base of researchers and partners, began creating bottlenecks and workload inefficiencies, BRF faced an immediate need to replace its aging workload orchestration system with a future-ready, enterprise-grade solution that would not only eliminate bottlenecks and underutilization but also provide a more feature-rich, automated experience.
BRF selected Univa Grid Engine for its optimized performance and reliability, and for its deep integration with the tools, applications, and libraries CeBiTec’s researchers need. As Univa VP and GM Rob Lalonde explains, “As the de facto standard for Life Sciences organizations – Univa solutions power 11 of the world’s top 12 Life Sciences companies plus many of all sizes – we ensure seamless integration with the applications, tools, and libraries that Life Sciences researchers utilize.”
For BRF, Univa’s 100% DRMAA2 support is also a key feature. In addition to the many bioinformatics tools maintained on its servers, BRF designs and develops its own applications for the analysis and processing of large-scale -omics datasets, such as QuPE, MeltDB, and Fusion, all of which submit their workloads via the DRMAA2 interface.
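For context on what submitting a workload “via the DRMAA2 interface” involves, here is a minimal sketch in Python. Everything in it is illustrative rather than BRF’s actual code: the helper function, the assembler path, and the commented-out session calls are assumptions modeled on the DRMAA2 JobTemplate specification (which defines attributes such as the remote command, its arguments, and a minimum physical-memory requirement, expressed in bytes).

```python
# Illustrative sketch only: `assembly_job_template` is a hypothetical helper,
# and the attribute names (remote_command, args, min_phys_memory) follow the
# DRMAA2 JobTemplate specification as commonly exposed by Python bindings.
def assembly_job_template(command, args, min_ram_gb):
    """Build a DRMAA2-style job template for a RAM-hungry assembly job."""
    return {
        "remote_command": command,                 # executable to run on the cluster
        "args": list(args),                        # its command-line arguments
        "min_phys_memory": min_ram_gb * 1024**3,   # DRMAA2 expresses memory in bytes
    }

# On a Grid Engine cluster with DRMAA2 Python bindings installed, submission
# would look roughly like this (not runnable without a cluster):
#
#   import drmaa2
#   session = drmaa2.JobSession("assembly")
#   job = session.run_job(drmaa2.JobTemplate(
#       assembly_job_template("/usr/bin/velveth", ["out", "31", "reads.fastq"], 64)))
#   job.wait_terminated()

jt = assembly_job_template("/usr/bin/velveth", ["out", "31", "reads.fastq"], 64)
print(jt["min_phys_memory"])  # 68719476736
```

The appeal of coding against DRMAA2 rather than shelling out to `qsub` is that the same submission logic remains portable across any scheduler implementing the standard.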
Dr. Stefan Albaum at Bielefeld’s CeBiTec noted that, “Univa Grid Engine enables highly efficient usage of our compute resources with a very small footprint…it’s an outstanding workload orchestration solution for the distribution of large numbers of jobs on a compute cluster – even for heterogeneous setups like ours.”
Download the case study here.
Learn how Univa Grid Engine can help your organization realize time and cost savings while providing new capabilities in containers, hybrid cloud, machine learning, and more: contact the Univa team today.