E-HPC: A Library for Elastic Resource Management in HPC
Environments
Author/Presenters
Event Type
Workshop
Time
Monday, November 13th, 2pm - 2:25pm
Location
501
Description
Next-generation data-intensive scientific workflows
need to support streaming and real-time applications
with dynamic resource needs on high performance
computing (HPC) platforms. The static resource
allocation model on current HPC systems that was
designed for monolithic MPI applications is insufficient
to support the elastic resource needs of current and
future workflows. In this paper, we discuss the design,
implementation and evaluation of Elastic-HPC (E-HPC), an
elastic framework for managing resources for scientific
workflows on current HPC systems. E-HPC considers a
resource slot for a workflow as an elastic window that
might map to different physical resources over the
duration of a workflow. Our framework uses
checkpoint-restart as the underlying mechanism to
migrate workflow execution across the dynamic window of
resources. E-HPC provides the foundation necessary to
enable the dynamic allocation of HPC resources that
streaming and real-time workflows require. E-HPC
has negligible overhead beyond the cost of
checkpointing. Additionally, E-HPC reduces the
turnaround time of workflows compared to the traditional
model of resource allocation for workflows, where
resources are allocated per stage of the workflow. Our
evaluation shows that E-HPC improves core hour
utilization for common workflow resource use patterns
and provides an effective framework for elastic
expansion of resources for applications with dynamic
resource needs.
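
The core-hour argument above can be illustrated with a small sketch. This is not E-HPC's implementation or API: the Stage type, both function names, and the checkpoint_hours parameter are hypothetical, and the accounting is a deliberately simple model. It assumes a static allocation holds the peak core count for the whole workflow, while an elastic window is resized at stage boundaries and pays one checkpoint-restart (charged at the new stage's core count) per resize.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """One workflow stage (hypothetical model, not an E-HPC type)."""
    name: str
    cores: int    # cores the stage needs
    hours: float  # wall-clock runtime of the stage


def static_core_hours(stages: List[Stage]) -> float:
    """Core hours when the peak core count is held for the whole workflow."""
    peak = max(s.cores for s in stages)
    return peak * sum(s.hours for s in stages)


def elastic_core_hours(stages: List[Stage], checkpoint_hours: float = 0.0) -> float:
    """Core hours when the window is resized at each stage boundary.

    Each change in core count is assumed to cost one checkpoint-restart,
    charged at the new stage's core count for checkpoint_hours.
    """
    total = 0.0
    previous_cores = None
    for stage in stages:
        if previous_cores is not None and stage.cores != previous_cores:
            total += stage.cores * checkpoint_hours  # migration overhead
        total += stage.cores * stage.hours
        previous_cores = stage.cores
    return total


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    workflow = [
        Stage("preprocess", cores=4, hours=1.0),
        Stage("simulate", cores=64, hours=2.0),
        Stage("analyze", cores=8, hours=1.0),
    ]
    print("static :", static_core_hours(workflow))
    print("elastic:", elastic_core_hours(workflow, checkpoint_hours=0.1))
```

In this model the elastic window uses fewer core hours whenever the stages differ in width and the checkpoint cost is small relative to stage runtimes. Queue wait time, which drives the turnaround comparison, is not modeled here.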




