Why does a seismic-processing method originally developed in the 1980s still warrant interest in 2016 at the annual Rice University Oil & Gas HPC Conference? Reverse Time Migration (RTM) remains performance-challenged. Because it's also the black-gold standard for sharpening seismic images, the many organizations that rely on RTM are highly motivated to pursue anything that delivers an improvement. In fact, the stakes are even higher today: the industry demands progressively higher-resolution subsurface imagery, and that resolution carries a compute cost proportional to the fourth power of the wavefield frequency. Double the frequency, and the cost grows by a factor of 16. In other words, RTM is increasingly performance-challenged.
Although nifty algorithms that exploit GPUs have been responsible for recent performance gains, I decided to take a completely different approach based on Apache Spark. With its predisposition for in-memory computing, Spark minimizes the disk I/O that traditional methods require during wavefield correlations involving multi-terabyte data volumes. Although I've been prototyping code on my laptop using Spark via Thunder, I anticipate a smooth transition to a production cluster with large datastores through Univa Universal Resource Broker: because it supports any Mesos-compatible framework, Spark is automagically included.
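To make the correlation step concrete, here is a minimal NumPy sketch of the zero-lag crosscorrelation imaging condition at the heart of RTM. The function name and toy array sizes are my own illustration, not production code; in a Spark deployment the per-shot wavefields would live in RDD partitions spread across cluster memory rather than in local NumPy arrays.

```python
import numpy as np

def zero_lag_image(source_wf, receiver_wf):
    """Zero-lag crosscorrelation imaging condition (illustrative sketch).

    source_wf, receiver_wf: arrays of shape (nt, nz, nx) holding the
    forward-propagated source wavefield and the backward-propagated
    receiver wavefield sampled at the same time steps.
    """
    # Sum the pointwise product over time. This is the correlation RTM
    # performs per shot -- the step whose intermediate wavefields Spark
    # would keep in cluster memory instead of spilling to disk.
    return np.einsum('tzx,tzx->zx', source_wf, receiver_wf)

# Toy-sized wavefields (production volumes are multi-terabyte).
nt, nz, nx = 100, 64, 64
rng = np.random.default_rng(0)
src = rng.standard_normal((nt, nz, nx))
rcv = rng.standard_normal((nt, nz, nx))
image = zero_lag_image(src, rcv)
print(image.shape)  # (64, 64)
```

In practice each shot's correlation is independent, which is exactly the kind of embarrassingly parallel, memory-hungry workload that maps naturally onto an RDD of shots.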
Spark gets its smarts from Resilient Distributed Datasets (RDDs). A compelling example of savvy Computer Science research from Berkeley's AMPLab, RDDs offer primitives for distributed in-memory parallel computing, including built-in fault tolerance, that appeal to HPC as well as to Big Data Analytics. Accordingly, I remain optimistic that Spark can also be applied to the finite-difference modeling kernel that remains the f⁴ bottleneck in RTM. Further research is required.
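To illustrate the kernel in question, here is a deliberately toy-sized sketch of one explicit time step of the 2-D acoustic wave equation, second-order in time and space. This is my own simplified stencil for illustration, not the production kernel; it shows where the work goes, and why finer grids (needed for higher frequencies) drive the cost up so quickly.

```python
import numpy as np

def fd_step(p_prev, p_curr, c, dt, dx):
    """One leapfrog step of p_tt = c^2 * (p_xx + p_zz) (illustrative sketch)."""
    lap = np.zeros_like(p_curr)
    # Five-point Laplacian on the interior grid points.
    lap[1:-1, 1:-1] = (
        p_curr[2:, 1:-1] + p_curr[:-2, 1:-1]
        + p_curr[1:-1, 2:] + p_curr[1:-1, :-2]
        - 4.0 * p_curr[1:-1, 1:-1]
    ) / dx**2
    # Explicit time update. Halving dx quadruples the cells in 2-D and
    # also forces a smaller dt, which is the root of the f^4 scaling.
    return 2.0 * p_curr - p_prev + (c * dt) ** 2 * lap

# Toy example: a point disturbance propagating on a small grid.
n, dx, dt, c = 64, 10.0, 1e-3, 1500.0  # CFL number c*dt/dx = 0.15, stable
p_prev = np.zeros((n, n))
p_curr = np.zeros((n, n))
p_curr[n // 2, n // 2] = 1.0
for _ in range(50):
    p_prev, p_curr = p_curr, fd_step(p_prev, p_curr, c, dt, dx)
print(np.isfinite(p_curr).all())  # True
```

Whether the halo exchanges this stencil implies can be expressed efficiently over RDD partitions is precisely the open question I alluded to above.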
Finally, it's worth pointing out that you don't need to refactor entire HPC applications or workflows for the introduction of Apache Spark to make sense. Instead, you can introduce Spark systematically where it adds value, especially in situations where you want to exploit RDDs to manipulate data. And because Univa Universal Resource Broker is an add-on to Univa Grid Engine, you can rest assured that your refactored workload can be fully managed in clusters on the ground or in the cloud.