Toward Scalable Parallel Training of Deep Neural Networks
Event Type: Workshop
Tags: Deep Learning, Machine Learning, SIGHPC Workshop
Time: Monday, November 13th, 3:30pm - 3:54pm
Location: 502-503-504
Description: We propose a new framework for parallelizing deep neural network training that maximizes the amount of data ingested by the training algorithm. Our proposed framework, called Livermore Tournament Fast Batch Learning (LTFB), targets large-scale data problems. The LTFB approach creates a set of Deep Neural Network (DNN) models and trains each instance of these models independently and in parallel. Periodically, each model selects another model to pair with, exchanges models, and then runs a local tournament against a held-out tournament dataset. The winning model continues training on the local training dataset. This new approach maximizes computation and minimizes the amount of synchronization required to train deep neural networks, a major bottleneck in existing synchronous deep learning algorithms. We evaluate our proposed algorithm on two HPC machines at Lawrence Livermore National Laboratory, including an early-access IBM Power8+ machine with NVIDIA Tesla P100 GPUs. Experimental evaluations of the LTFB framework on two popular image classification benchmarks, CIFAR10 and ImageNet, show significant speedups compared to the sequential baseline.
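
The description above outlines the LTFB cycle: independent local training, random pairing, model exchange, and a tournament on held-out local data whose winner keeps training. The following is a minimal sketch of that cycle, not the authors' implementation: it assumes a toy logistic-regression "model" in place of a DNN and a serial loop over replicas in place of parallel HPC ranks; make_data, sgd_step, and accuracy are hypothetical helpers introduced only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n, d, w_true):
        # Synthetic linearly separable data; each call makes one shard.
        X = rng.normal(size=(n, d))
        y = (X @ w_true > 0).astype(float)
        return X, y

    def sgd_step(w, X, y, lr=0.1):
        # One logistic-regression SGD step (stand-in for DNN training).
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        return w - lr * X.T @ (p - y) / len(y)

    def accuracy(w, X, y):
        preds = (X @ w > 0).astype(float)
        return float((preds == y).mean())

    d, replicas, rounds, local_steps = 20, 4, 10, 25
    w_true = rng.normal(size=d)
    shards = [make_data(200, d, w_true) for _ in range(replicas)]    # local training data
    tourneys = [make_data(100, d, w_true) for _ in range(replicas)]  # held-out tournament sets
    models = [rng.normal(size=d) * 0.01 for _ in range(replicas)]

    for _ in range(rounds):
        # Phase 1: each replica trains independently on its own shard.
        for i in range(replicas):
            X, y = shards[i]
            for _ in range(local_steps):
                models[i] = sgd_step(models[i], X, y)

        # Phase 2: each replica pairs with a random partner, exchanges
        # models, and keeps whichever scores higher on its own held-out
        # tournament set; the winner continues training locally.
        partners = [int(rng.integers(replicas)) for _ in range(replicas)]
        new_models = []
        for i in range(replicas):
            Xt, yt = tourneys[i]
            local, remote = models[i], models[partners[i]]
            winner = local if accuracy(local, Xt, yt) >= accuracy(remote, Xt, yt) else remote
            new_models.append(winner.copy())
        models = new_models

    for i in range(replicas):
        Xt, yt = tourneys[i]
        print(f"replica {i}: tournament accuracy = {accuracy(models[i], Xt, yt):.3f}")

Note the design point the abstract emphasizes: replicas only exchange model parameters at tournament time, so synchronization cost is decoupled from per-batch gradient traffic, the bottleneck in synchronous data-parallel training.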




