Our recent webinar on the viability of High Performance Computing (HPC) in the cloud generated some interesting questions regarding containers. In this second part of two, here are some of those questions and our answers, in no particular order. Click here to download the webinar.
Is the beta-quality integration with Docker included with standard Univa Grid Engine, or is it a separate product?
Univa Grid Engine includes all of the Grid Engine capabilities plus support for Docker containers. Existing Univa customers can request access to the beta-quality release by opening a support ticket with us. For those who aren’t existing customers, please request a trial. When you are contacted by Univa, please make clear your interest in our support for containers and Grid Engine.
Can containers be provided to the cloud infrastructure by the users or do they have to be developed by a cloud provider?
Ultimately, this is a matter of policy rather than technology. In other words, those responsible for providing cloud-based services are free to determine the repositories from which pre-existing images can be retrieved. In the case of Docker, for example, Portus is addressing security-related concerns for registered images through an easy-to-use interface.
Do containers allow checkpointing and migration of jobs to other machines as VMs do?
To quote our CTO, Fritz Ferstl, on this topic: “The mainstream of the container ecosystem views them as ephemeral – i.e., you can just kill them, restart them (whether on the same node or elsewhere), and then they somehow re-establish ‘service’ (i.e., what they are supposed to do … even though this isn’t an intrinsic capability of a Docker container).” Fritz also pointed out that snapshotting an application’s state can be non-trivial. It may require other components (e.g., containers running databases), and can present challenges due to the by-design ephemerality of containers together with their networking and security specifics.
Because Univa Grid Engine has a generic interface for checkpoint, restart and migration, any solution capable of capturing an application’s state while it executes within a container can, in principle, be integrated. Although we haven’t tested it yet, Flocker looks encouraging in this regard, at least for application-level checkpointing. In application-level checkpointing, suitably enabled applications write out their state periodically during execution. If Flocker is used as the --volume-driver, then another Docker 1.8 container can make use of this checkpoint for a migrated restart.
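To make the idea of application-level checkpointing more concrete, here is a minimal Python sketch of an application that periodically writes out its state and resumes from it on restart. It is an illustration only, not part of Univa Grid Engine, Docker or Flocker: the function names, the JSON state format and the checkpoint path are all hypothetical. In a containerized setup, the assumption is that the checkpoint file would sit on a volume that outlives the container, so that a restarted container on another node could pick it up.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state atomically so a kill mid-write cannot corrupt it."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename over the old checkpoint in a single step.
    os.replace(tmp_path, path)


def run(path, n_steps, checkpoint_every=10, stop_after=None):
    """Sum the integers 1..n_steps, checkpointing periodically.

    stop_after is a hypothetical knob that simulates the container being
    killed after that many steps of the current invocation.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            # Simulated kill: work since the last checkpoint is lost.
            return None
        state["step"] += 1
        state["total"] += state["step"]
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]


if __name__ == "__main__":
    # A temporary directory stands in for a volume shared between containers.
    with tempfile.TemporaryDirectory() as volume:
        checkpoint = os.path.join(volume, "state.json")
        run(checkpoint, 100, stop_after=35)   # first "container" is killed
        print(run(checkpoint, 100))           # second one resumes and finishes
```

In this sketch the second call does not start from scratch: it reloads the last state written before the simulated kill and repeats only the steps completed after that checkpoint. The temporary-file-then-rename pattern is one common way to avoid leaving a half-written checkpoint behind if the process dies during a write.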
Of course, application-level checkpointing is in such limited supply that it is almost an exceptional case. Thus for many, a snapshotted VM remains the only option for checkpointing – migration of GB-sized VMs notwithstanding!