DLMHS-18: The First International Workshop on Large-Scale Deep Learning on Modern Heterogeneous Supercomputers

June 12, 2018, Beijing, China.


Held in conjunction with the 32nd ACM International Conference on Supercomputing (ACM ICS-2018), June 12-15, 2018, Beijing, China.

The abstract submission deadline is May 1, 2018 (AoE).


The last decade has seen a blooming of deep learning. With ever-increasing problem sizes to resolve and data sizes to process, deep learning has become extremely computationally demanding. As a result, migrating deep learning algorithms and applications to modern supercomputers, especially heterogeneous supercomputers that incorporate accelerators such as GPUs, Xeon Phi, FPGAs, TPUs, and other ASICs, has become a trend.

Download the DLMHS-18 CFP


This workshop will emphasize novel, disruptive research ideas over incremental advances. We invite submissions of novel and recent work on topics including, but not limited to, the following:


As a brand-new workshop, we have decided to make this year's edition discussion-oriented. We invite two-page, double-column submissions on novel and recently published work. We also welcome unfinished novel ideas. Please follow the ACM proceedings sigconf template (https://www.acm.org/publications/proceedings-template) using a 10-point font. Kindly note that submissions will not appear in the proceedings, so the work can later be developed further and submitted to a formal conference or journal. Finished or published work will be given 25 minutes to fully describe its contributions, while unfinished work and novel ideas will be given 10 minutes to motivate the audience.

Submissions are accepted through the EasyChair system at this link: (https://easychair.org/conferences/?conf=dlmhs18)

Important Dates

All dates are Anywhere on Earth (AoE).

General Chairs

Program Committee

Workshop Program

13:30 to 14:15

Keynote-1: Parallel and Distributed Deep Learning and HPC

Prof. Torsten Hoefler, ETH Zurich

14:15 to 15:00

Keynote-2: Challenges in Deep Learning from HPC Perspectives

Dr. Jiangming Jin, TuSimple

15:00 to 15:30 Coffee Break

15:30 to 16:00

Invited Talk-1: Efficient Allocation and Heterogeneous Composition of NVM Crossbars for Deep Learning Acceleration

Prof. Lide Duan, University of Texas at San Antonio

15:30 to 16:00

Invited Talk-2: ImageNet Training in Minutes

Dr. Yang You, UC Berkeley

15:30 to 16:00

Invited Talk-3: Automated Systolic Array Architecture Synthesis for High Throughput CNN Inference on FPGAs

Dr. Xuechao Wei, Peking University

15:30 to 16:00

Invited Talk-4: Fflow: An FPGA Extension for TensorFlow with Device Placement Optimization Based on Reinforcement Learning

Dr. Yongbiao Chen, Shanghai Jiao Tong University

16:45 to 16:55

Workshop Closing Comments



Parallel and Distributed Deep Learning and HPC



Prof. Torsten Hoefler, ETH Zurich


Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this overview talk, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. Specifically, we present trends in DNN architectures and the resulting implications on parallelization strategies. We discuss the different types of concurrency in DNNs; synchronous and asynchronous stochastic gradient descent; distributed system architectures; communication schemes; and performance modeling. Based on these approaches, we extrapolate potential directions for parallelism in deep learning.
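As a concrete, hypothetical illustration of the synchronous data-parallel training discussed in the abstract (all names and parameters here are our own, not from the talk), the sketch below simulates several workers each computing a gradient on its shard of a minibatch, followed by an allreduce-style average so every replica applies the same update:

```python
import numpy as np

def grad(w, X, y):
    """Gradient of 0.5 * ||X @ w - y||^2 / n with respect to w."""
    n = len(y)
    return X.T @ (X @ w - y) / n

def sync_sgd_step(w, X, y, n_workers, lr):
    # Shard the minibatch across workers (data parallelism).
    shards_X = np.array_split(X, n_workers)
    shards_y = np.array_split(y, n_workers)
    # Each worker computes a local gradient on its shard.
    local_grads = [grad(w, Xs, ys) for Xs, ys in zip(shards_X, shards_y)]
    # Allreduce: average local gradients, weighted by shard size, so the
    # result equals the full-minibatch gradient.
    sizes = np.array([len(ys) for ys in shards_y])
    g = sum(s * lg for s, lg in zip(sizes, local_grads)) / sizes.sum()
    # Every replica applies the same update, staying in sync.
    return w - lr * g

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
for _ in range(500):
    w = sync_sgd_step(w, X, y, n_workers=4, lr=0.1)
print(np.round(w, 3))  # converges toward w_true
```

Because the weighted average of shard gradients equals the full-minibatch gradient, the simulated parallel run matches single-worker SGD exactly; asynchronous variants trade this consistency for reduced synchronization cost.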

Speaker Bio

Torsten is an Associate Professor of Computer Science at ETH Zurich, Switzerland. Before joining ETH, he led the performance modeling and simulation efforts of parallel petascale applications for the NSF-funded Blue Waters project at NCSA/UIUC. He is also a key member of the Message Passing Interface (MPI) Forum where he chairs the “Collective Operations and Topologies” working group. Torsten won best paper awards at the ACM/IEEE Supercomputing Conference SC10, SC13, SC14, EuroMPI’13, HPDC’15, HPDC’16, IPDPS’15, and other conferences. He published numerous peer-reviewed scientific conference and journal articles and authored chapters of the MPI-2.2 and MPI-3.0 standards. He received the Latsis prize of ETH Zurich as well as an ERC starting grant in 2015. His research interests revolve around the central topic of “Performance-centric System Design” and include scalable networks, parallel programming techniques, and performance modeling. Additional information about Torsten can be found on his homepage at htor.inf.ethz.ch.



Challenges in Deep Learning from HPC Perspectives



Dr. Jiangming Jin, TuSimple


Deep learning has been widely discussed, from academia to industry and from infrastructure to applications. This talk examines the challenges of deep learning from HPC perspectives. Given the growing computing resources required to process deep learning workloads, it is attractive to apply HPC techniques to improve deep learning performance in both training and inference. On the deployment and inference side, with the proliferation of edge devices such as GPUs, FPGAs, and ASICs, it is challenging to obtain performance gains because of varying architectural characteristics in memory organization, compute primitives, and so on. This talk presents novel approaches drawn from kernel optimization, operator optimization, and graph optimization.
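To make the graph-optimization idea concrete, here is a minimal, hypothetical sketch (the function names are ours, not from the talk) of operator fusion: an unfused graph materializes an intermediate array per elementwise operator, while a fusing compiler combines them into a single kernel, cutting the memory traffic that often dominates elementwise op performance on accelerators.

```python
import numpy as np

def unfused(x, scale, bias):
    t1 = x * scale              # op 1: materializes an intermediate array
    t2 = t1 + bias              # op 2: another intermediate array
    return np.maximum(t2, 0.0)  # op 3: ReLU

def fused(x, scale, bias):
    # Written as one expression; a fusing compiler would emit this as a
    # single kernel that reads x once and writes the output once, with
    # no intermediate arrays in memory.
    return np.maximum(x * scale + bias, 0.0)

x = np.linspace(-2.0, 2.0, 8)
out_a = unfused(x, scale=1.5, bias=-0.5)
out_b = fused(x, scale=1.5, bias=-0.5)
print(np.allclose(out_a, out_b))  # fusion preserves the result
```

The correctness check matters: fusion is a pure performance transformation, so the optimized graph must be numerically equivalent (up to floating-point reassociation) to the original.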

Speaker Bio

Dr. Jiangming Jin is the Director of the HPC Department at TuSimple. He oversees the HPC R&D projects across the autonomous driving system and the deep learning framework within TuSimple. Prior to joining TuSimple, he worked as an HPC & Quantitative Research Engineer at JP Morgan (Singapore, Beijing). Jiangming received his bachelor's degree from the University of Electronic Science and Technology of China (UESTC) and his PhD from Nanyang Technological University (NTU) in 2008 and 2013, respectively.