MUG'24

(Preliminary Program)

All Times Are U.S. EDT

Bale Theater at Ohio Supercomputer Center

Monday, August 19

7:30 - 8:30

Registration and Continental Breakfast

Abstract


Bio

Richard

Prof. Sameer Shende has helped develop the TAU Performance System, the Program Database Toolkit (PDT), the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io] and the HPCLinux distro. His research interests include tools and techniques for performance instrumentation, measurement, analysis, runtime systems, software stacks, HPC container runtimes, and compiler optimizations. He serves as a Research Professor and the Director of the Performance Research Laboratory at the University of Oregon, and as the President and Director of ParaTools, Inc. and ParaTools, SAS.

11:00 - 11:30

Morning Coffee Break

Abstract


Bio

Broadcom

Dr. Richard Graham is a Senior Director, HPC Technology at NVIDIA's Networking Business unit. His primary focus is on HPC network software and hardware capabilities for current and future HPC technologies. Prior to moving to Mellanox/NVIDIA, Rich spent thirteen years at Los Alamos National Laboratory and Oak Ridge National Laboratory, in computer science technical and administrative roles, with a technical focus on communication libraries and application analysis tools. He is cofounder of the Open MPI collaboration and was chairman of the MPI 3.0 standardization efforts.

12:30 - 1:30

Lunch Break

Abstract


Bio

Donglai

Dr. Donglai Dai is a Chief Engineer at X-ScaleSolutions and leads the company’s R&D team. His current work focuses on developing scalable, efficient communication libraries, checkpointing and restart libraries, and performance analysis tools for distributed and parallel HPC and deep learning applications on HPC systems and clouds. He has more than 20 years of industry experience in engineering management and development of computer systems, VLSI, IoT, and interconnection networks while working at Intel, Cray, SGI, and startups. He holds more than 10 granted US patents and has published more than 40 technical papers or book chapters. He has a PhD degree in computer science from The Ohio State University.

Abstract

The tutorial will start with an overview of the MVAPICH libraries and their features. Next, we will focus in depth on installation guidelines, runtime optimizations, and tuning flexibility. An overview of configuration and debugging support in MVAPICH2 libraries will be presented. High-performance support for NVIDIA/AMD GPU-enabled clusters in MVAPICH-Plus/MVAPICH2-GDR and many-core systems in MVAPICH-Plus/MVAPICH2-X will be presented. The impact on the performance of the various features and optimization techniques will be discussed in an integrated fashion. Best practices for a set of common applications will be presented. A set of case studies related to example applications to demonstrate how one can effectively take advantage of MVAPICH for High End Computing applications using MPI and CUDA/OpenACC will also be presented.

Bio

Hari Nat

Dr. Hari Subramoni is an assistant professor in the Department of Computer Science and Engineering at the Ohio State University, USA. His current research interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data, deep learning and cloud computing. He has published over 100 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities in academic journals and conferences. Dr. Subramoni is doing research on the design and development of MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X (Hybrid MPI and PGAS (OpenSHMEM, UPC and CAF)) software packages. He is a member of IEEE.

Nat Shineman is a software engineer in the Department of Computer Science and Engineering at the Ohio State University. His current development work includes high performance interconnects, parallel computing, scalable startup mechanisms, and performance analysis and debugging of the MVAPICH2 library.

3:00 - 3:30

Afternoon Coffee Break

Abstract

The OSU Microbenchmark (OMB) suite is a popular set of benchmarks for evaluating the performance of HPC systems. In this tutorial, we will take the attendees through the new set of features that have been added to OMB, such as support for Java-based and Python-based benchmarks. We will also discuss the enhancements made to the C benchmarking suite, such as support for user-defined data types, graphical representations of output, data validation, support for profiling tools like PAPI, and support for newer MPI primitives like persistent operations and MPI sessions.
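As a rough illustration of the kind of measurement a latency microbenchmark makes, the sketch below times a ping-pong exchange with warm-up iterations and checks the echoed data against a known pattern. It is not OMB code and does not use MPI: the two "ranks" are threads joined by a local socket pair, and the names `recv_exact`, `echo_peer`, and `pingpong_latency` are hypothetical, chosen only for this example.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from a stream socket."""
    buf = bytearray()
    while len(buf) < nbytes:
        chunk = sock.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def echo_peer(sock, nbytes, total_iters):
    """Peer side of the ping-pong: send every message straight back."""
    try:
        for _ in range(total_iters):
            sock.sendall(recv_exact(sock, nbytes))
    except (ConnectionError, OSError):
        pass  # initiator went away early; nothing left to echo


def pingpong_latency(nbytes=1024, warmup=10, iters=100):
    """Return (average one-way latency in microseconds, validation errors)."""
    a, b = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(b, nbytes, warmup + iters))
    peer.start()
    payload = bytes(i % 256 for i in range(nbytes))  # known pattern to validate against
    errors = 0
    try:
        start = time.perf_counter()
        for i in range(warmup + iters):
            if i == warmup:
                start = time.perf_counter()  # discard the warm-up iterations
            a.sendall(payload)
            if recv_exact(a, nbytes) != payload:
                errors += 1  # data validation: echoed bytes must match
        elapsed = time.perf_counter() - start
    finally:
        a.close()
        peer.join()
        b.close()
    # Each iteration is a round trip, so halve it for the one-way figure.
    return elapsed / (2 * iters) * 1e6, errors


if __name__ == "__main__":
    latency_us, errors = pingpong_latency(nbytes=64)
    print(f"64 bytes: {latency_us:.2f} us one-way, {errors} validation errors")
```

The warm-up loop and the per-message comparison are the two ideas worth noting here; a real benchmark would typically sweep message sizes and keep validation outside the timed region.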


Bio

Hari Subramoni Aamir Shafi

Dr. Hari Subramoni is an assistant professor in the Department of Computer Science and Engineering at the Ohio State University, USA. His current research interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data, deep learning and cloud computing. He has published over 100 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities in academic journals and conferences. Dr. Subramoni is doing research on the design and development of MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X (Hybrid MPI and PGAS (OpenSHMEM, UPC and CAF)) software packages. He is a member of IEEE.

Aamir Shafi is currently a Research Scientist at the Ohio State University where he is involved in the High Performance Big Data project. Dr. Shafi was a Fulbright Visiting Scholar at MIT where he worked on the award-winning Cilk technology. Dr. Shafi received his PhD in Computer Science from the University of Portsmouth, UK in 2006. Dr. Shafi’s current research interests include architecting robust libraries and tools for Big Data computation with emphasis on Machine and Deep Learning applications. Dr. Shafi co-designed and co-developed a Java-based MPI-like library called MPJ Express.

Akshay Paniraja Guptha is a software engineer in the Department of Computer Science and Engineering at the Ohio State University. His current development work includes high-performance interconnects, parallel computing, scalable startup mechanisms, and performance analysis and debugging of the OSU Microbenchmarks and the MVAPICH2 library.

Abstract

The fields of Machine and Deep Learning (ML/DL) have witnessed remarkable advances in recent years, paving the way for cutting-edge technologies and leading to exciting challenges and opportunities. Modern ML/DL frameworks, including TensorFlow, PyTorch, and cuML, have emerged to offer high-performance training and deployment for various types of ML models and Deep Neural Networks (DNNs). This tutorial provides an overview of recent trends in ML/DL leveraging powerful hardware architectures, interconnects, and distributed frameworks to accelerate the training of ML/DL models, especially as they grow larger and more complicated. We present an overview of different DNN architectures, focusing on parallelization strategies for model training. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU/GPU architectures to support large-scale distributed training efficiently. We also highlight some of our co-design efforts to utilize MPI for large-scale DNN training on cutting-edge CPU/GPU architectures available on modern HPC clusters.
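One of the parallelization strategies commonly covered under this heading is data parallelism, where each worker computes a gradient on its own shard of the batch and the gradients are averaged across workers before the update. The sketch below is a minimal, assumption-laden illustration in plain Python: the model is a one-parameter linear fit, the workers are simulated in a loop, and `allreduce_mean` is a hypothetical stand-in for a collective such as an MPI allreduce. It is not taken from the tutorial material.

```python
def local_gradient(w, shard):
    """Gradient of the mean squared error for y ~ w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for a sum-allreduce followed by division by the worker count."""
    return sum(values) / len(values)


def data_parallel_step(w, shards, lr):
    """One training step: local gradients, global average, identical update."""
    grads = [local_gradient(w, shard) for shard in shards]  # one entry per simulated worker
    return w - lr * allreduce_mean(grads)


def train(shards, steps=200, lr=0.01):
    """Run gradient descent from w = 0 using the data-parallel step."""
    w = 0.0
    for _ in range(steps):
        w = data_parallel_step(w, shards, lr)
    return w


# Synthetic data for y = 3x, split evenly across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]

if __name__ == "__main__":
    print(train(shards))
```

With equally sized shards, the averaged gradient equals the full-batch gradient, which is why every worker can apply the same update and stay in sync.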


Bio

Aamir Shafi Arpan Jain

Aamir Shafi is currently a Research Scientist at the Ohio State University where he is involved in the High Performance Big Data project. Dr. Shafi was a Fulbright Visiting Scholar at MIT where he worked on the award-winning Cilk technology. Dr. Shafi received his PhD in Computer Science from the University of Portsmouth, UK in 2006. Dr. Shafi’s current research interests include architecting robust libraries and tools for Big Data computation with emphasis on Machine and Deep Learning applications. Dr. Shafi co-designed and co-developed a Java-based MPI-like library called MPJ Express.

Nawras Alnaasan is a Graduate Research Associate at the Network-Based Computing Laboratory, Columbus, OH, USA. He is currently pursuing a Ph.D. degree in computer science and engineering at The Ohio State University. His research interests lie at the intersection of deep learning and high-performance computing. He works on advanced parallelization techniques to accelerate the training of Deep Neural Networks and exploit underutilized HPC resources covering a wide range of DL applications including supervised learning, semi-supervised learning, and hyperparameter optimization. He is actively involved in several research projects including HiDL (High-performance Deep Learning) and ICICLE (Intelligent Cyberinfrastructure with Computational Learning in the Environment). Alnaasan received his B.S. degree in computer science and engineering from The Ohio State University. Contact him at alnaasan.1@osu.edu.

4:15 - 5:30

Short Talks

4:30 - 6:30

Visit to the State of Ohio Computer Center, SOCC (Optional)

6:30 - 9:30

Reception and Dinner at Endeavor Brewing and Spirits

909 W 5th Ave,

Columbus, OH 43212

Tuesday, August 20

7:30 - 8:20

Registration and Continental Breakfast

8:20 - 8:30

Opening Remarks

Dhabaleswar K (DK) Panda, The Ohio State University

Abstract


Bio

Sadaf

Dr. Sadaf R. Alam is the University of Bristol's Director of Advanced Computing Strategy. Sadaf joined Bristol University in 2022 from the Swiss National Supercomputing Centre (CSCS) where she was the Chief Technology Officer (CTO). Dr. Alam studied computer science at the University of Edinburgh, UK, where she received her Ph.D. Until March 2009, she was a computer scientist at the Oak Ridge National Laboratory, USA. Sadaf ensures end-to-end integrity of HPC systems and storage solutions and leads strategic projects at the centre. She has held several different roles across her career including group lead of future systems, chief architect and head of operations. She is a member of ACM, ACM-W, SIGHPC and Women in HPC, and was the technical chair of the world Supercomputing conference SC22. Sadaf was the chief architect of multiple generations of the Piz Daint supercomputing platforms, which were among Europe’s fastest and ranked among the top 3 supercomputers in the world for many years, and was also chief architect of the MeteoSwiss innovative, co-designed operational numerical weather forecasting platforms.

Abstract

This talk will provide an overview of the MVAPICH project (past, present, and future). The future roadmap and features for upcoming releases of the MVAPICH software family (including MVAPICH-Plus) will be presented. The current status of and future plans for OMB will also be presented.


Bio

Dhabaleswar K (DK) Panda

DK Panda is a Distinguished Professor of Engineering and University Distinguished Scholar at the Ohio State University. He has published over 500 papers in the area of high-end computing and networking. The MVAPICH (High-Performance MPI and PGAS over InfiniBand, iWARP, RoCE, EFA, Rockport Networks, and Slingshot) libraries, designed and developed by his research group (mvapich.cse.ohio-state.edu), are currently being used by more than 3,325 organizations worldwide (in 90 countries). More than 1.69 million downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 7th, 21st, 36th and 49th ranked ones) in the TOP500 list. High-performance and scalable solutions for deep learning and machine learning from his group are available from hidl.cse.ohio-state.edu. High-performance and scalable libraries for Big Data stacks (Spark, Hadoop, and Memcached) and Data science applications from his group (hibd.cse.ohio-state.edu) are also publicly available. These libraries are currently being used by more than 360 organizations in 39 countries. More than 47,000 downloads of these libraries have taken place. He is an IEEE Fellow and a recipient of the 2022 IEEE Charles Babbage Award. More details about Prof. Panda are available at cse.ohio-state.edu/~panda.

10:15 - 10:45

Morning Coffee Break

Abstract


Bio

Adam Moody

Gilad Shainer serves as senior vice-president of networking at NVIDIA. Mr. Shainer serves as the chairman of the HPC-AI Advisory Council organization, the president of the UCF consortium, a member of IBTA and a contributor to the PCISIG PCI-X and PCIe specifications. Mr. Shainer holds multiple patents in the field of high-speed networking. He is a recipient of the 2015 R&D100 award for his contribution to the CORE-Direct In-Network Computing technology and the 2019 R&D100 award for his contribution to the Unified Communication X (UCX) technology. Gilad Shainer holds an MSc degree and a BSc degree in Electrical Engineering from the Technion Institute of Technology in Israel.

Abstract


Bio

Matthew Anderson

Martin Hilgeman joined Dell Technologies in 2011, after having worked as an HPC application specialist for 12 years at SGI and IBM. In 2019, he joined AMD as a senior manager and worked on porting and optimizing the major HPC applications to the “Rome” microarchitecture. Martin returned to Dell Technologies in May 2020 as the HPC performance lead and Distinguished Member of Technical Staff in Dell ISG. He holds a master’s degree in physical chemistry, obtained at the VU University of Amsterdam.

Abstract


Bio

Gilad Shainer

Douglas Fuller is the director of software development at Cornelis Networks. Doug joined Cornelis from Red Hat, where he served as a software engineering manager leading teams working on the Ceph distributed storage system. Doug’s career in HPC has included stints at various universities and Oak Ridge National Laboratory.

Doug holds bachelor's and master's degrees in computer science from Iowa State University. His master's work at DOE Ames Laboratory involved early one-sided communication models in supercomputers. From his undergraduate days, he remains keenly aware of the critical role of floppy diskettes in Beowulf cluster administration.

12:15 - 12:30

Group Photo

12:30 - 1:30

Lunch Break

Abstract


Bio

Jithin Jose

Matt Anderson is part of the High Performance Computing group at Idaho National Laboratory with specific focus in supporting University and Industry users.

Abstract


Bio

Ashok

Dr. Jithin Jose is a Principal Software Engineer at Microsoft. His work is focused on co-design of software and hardware building blocks for high performance computing platforms, and on performance optimizations. His research interests include high performance interconnects and protocols, parallel programming models, big data and cloud computing. Before joining Microsoft, he worked at Intel and IBM Research. He has published more than 25 papers in major conferences and journals related to these research areas. Dr. Jose received his Ph.D. degree from The Ohio State University in 2014.

Abstract


Bio

Hemal Shah

Dr. Alan Ayala is a Software Engineer at AMD. His work focuses on the design of FFT software for GPUs and high-performance computing systems. His research interests include GPU and parallel programming, FFT applications, performance optimization, profiling tools, and network interconnects. Before joining AMD, he worked at the Innovative Computing Laboratory as a research scientist and developed the heFFTe library for FFT computation on Exascale Systems. Dr. Ayala received his Ph.D. degree in Computational Mathematics in 2018 from Sorbonne University in Paris, France.

3:30 - 4:00

Student Poster Session (In-Person) and Coffee Break

Abstract


Bio

Sameer Shende

Ammar Ahmad Awan is a Researcher at Microsoft. He received his PhD in Computer Science in May 2020 from The Ohio State University. He received his B.S. and M.S. degrees in Computer Science and Engineering from National University of Science and Technology (NUST), Pakistan and Kyung Hee University (KHU), South Korea, respectively. His current research focus lies at the intersection of High Performance Computing (HPC) libraries and Deep Learning (DL) frameworks. He previously worked on a Java-based Message Passing Interface (MPI) and nested parallelism with OpenMP and MPI for scientific applications. He has published 20 papers in conferences and journals related to these research areas. He actively contributes to the DeepSpeed project at Microsoft. Before that, he contributed to various projects like MVAPICH2-GDR (High Performance MPI for GPU clusters), OMB (OSU Micro Benchmarks), and HiDL (High Performance Deep Learning). He is the lead author of the OSU-Caffe framework (part of the HiDL project) that allows efficient distributed training of Deep Neural Networks.

Abstract


Bio

Vipin Chaudhary

Prof. Sameer Shende has helped develop the TAU Performance System, the Program Database Toolkit (PDT), the Extreme-scale Scientific Software Stack (E4S) and the HPCLinux distro. His research interests include tools and techniques for performance instrumentation, measurement, analysis, runtime systems, software stacks, HPC container runtimes, and compiler optimizations. He serves as a Research Professor and the Director of the Performance Research Laboratory at the University of Oregon, and as the President and Director of ParaTools, Inc. and ParaTools, SAS.

5:30 - 5:45

Open Mic Session

4:30 - 6:30

Visit to the State of Ohio Computer Center, SOCC (Optional)

6:30 - 9:30

Banquet Dinner at Bravo Restaurant

1803 Olentangy River RD

Columbus, OH 43212

Wednesday, August 21

7:30 - 8:30

Registration and Continental Breakfast

Abstract


Bio

Taisuke

Dan Stanzione, Associate Vice President for Research at The University of Texas at Austin since 2018 and Executive Director of the Texas Advanced Computing Center (TACC) since 2014, is a nationally recognized leader in high performance computing. He is the principal investigator (PI) for a National Science Foundation (NSF) grant to acquire and deploy Frontera, which will be the fastest supercomputer at any U.S. university. Stanzione is also the PI of TACC's Stampede2 and Wrangler systems, supercomputers for high performance computing and for data-focused applications, respectively. For six years he was co-PI of CyVerse, a large-scale NSF life sciences cyberinfrastructure. Stanzione was also a co-PI for TACC's Ranger and Lonestar supercomputers, large-scale NSF systems previously deployed at UT Austin. Stanzione received his bachelor's degree in electrical engineering and his master's degree and doctorate in computer engineering from Clemson University.

Abstract


Bio

Aamir Shafi

Mahidhar Tatineni received his M.S. & Ph.D. in Aerospace Engineering from UCLA. He currently leads the User Services group at SDSC as a Computational and Data Science Research Specialist Manager. He has led the support of high-performance computing and data applications software on several NSF and UC funded HPC and AI supercomputers including Voyager, Expanse, Comet, and Gordon at SDSC. His research interests are in HPC architecture and systems, performance and scalability, benchmarking and HPC middleware. He has worked on many NSF funded optimization and parallelization research projects such as MPI performance tuning frameworks, hybrid programming models, big data middleware, and application performance evaluation using next generation communication mechanisms for emerging HPC systems. He is co-PI on the NSF funded Expanse HPC system and the National Research Platform projects at SDSC.

10:30 - 11:00

Morning Coffee Break

Abstract


Bio

Chris Edsall

Dr. Yifeng Cui heads the High Performance GeoComputing Lab at SDSC, and helped to establish the Southern California Earthquake Center (SCEC) as a world leader in advancing high performance computing in earthquake system science. Cui’s groundbreaking work includes enabling TeraShake, ShakeOut and M8, some of the worst-case scenarios on the San Andreas fault, revealing order-of-magnitude LA wave-guide amplification. He is a recipient of several HPC awards including the 2015 Nvidia Global Impact Award, the 2013 IDC HPC innovation excellence award, and the 2009/2011 SciDAC OASCR awards. He also directed an Intel Parallel Computing Center on earthquake research. Cui earned his Ph.D. in Hydrology from the University of Freiburg, Germany.

Abstract


Bio

Stan Tomov

Nathan Hanford is a Computer Scientist in the Livermore Computing Division at Lawrence Livermore National Laboratory. His research is currently focused on application and development environment portability for parallel software applications at the application binary interface (ABI). His operational work is focused on development environment design and verification, message passing interface (MPI) support and development, and system-wide accelerator-aware interconnect benchmarking for codesign, system acceptance, and strategic decision-making support.

Towards accomplishing these goals, Nathan collaborates with Groupe EOLEN and the Commissariat à l'énergie atomique et aux énergies alternatives (CEA), leveraging the Wi4MPI project, which dynamically translates ABI-incompatible MPI operations at runtime. He also works closely with multiple vendor partners to increase their middleware portability to a variety of compute clusters, and participates in the MPI Forum.

Nathan came from a high-performance networking background. While earning his PhD at University of California Davis, he was a perennial summer student with ESnet at Lawrence Berkeley Laboratory. During this time, he focused on end-system optimizations and congestion avoidance for high-speed, long-distance networking.

Abstract


Bio

Greg Becker

Ahmad Abdelfattah, research assistant professor at the Innovative Computing Laboratory at the University of Tennessee, received his PhD in computer science from King Abdullah University of Science and Technology (KAUST) in 2015, where he was a member of the Extreme Computing Research Center (ECRC). His research interests span high performance computing, parallel numerical algorithms, and general purpose GPU computing. He currently serves as the principal investigator of the MAGMA library. Abdelfattah has been acknowledged by NVIDIA and AMD for contributing to their numerical BLAS libraries, cuBLAS and rocBLAS, respectively.

12:30 - 1:30

Lunch Break

Abstract


Bio

Dan Stanzione

HooYoung Ahn received the B.S. and M.S. degrees in computer science from Sookmyung Women's University, Seoul, Republic of Korea, in 2007 and 2009, respectively, and the Ph.D. degree from the School of Computing, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea, in 2016. She is currently a Senior Researcher with the Supercomputing Technology Research Center, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea. Her research interests include parallel processing, artificial intelligence, and high performance computing.

2:45 - 3:15

Afternoon Coffee Break

Abstract


Bio

Aamir Shafi

Dr. Donglai Dai is a Chief Engineer at X-ScaleSolutions and leads the company’s R&D team. His current work focuses on developing scalable, efficient communication libraries, checkpointing and restart libraries, and performance analysis tools for distributed and parallel HPC and deep learning applications on HPC systems and clouds. He has more than 20 years of industry experience in engineering management and development of computer systems, VLSI, IoT, and interconnection networks while working at Intel, Cray, SGI, and startups. He holds more than 10 granted US patents and has published more than 40 technical papers or book chapters. He has a PhD degree in computer science from The Ohio State University.

Abstract

The High Performance Computing (HPC) community has widely adopted Message Passing Interface (MPI) libraries to exploit high-speed and low-latency networks like InfiniBand, Omni-Path, Slingshot, and others. This talk provides an overview of MPI4Spark and MPI4Dask, which are enhanced versions of the Spark and Dask frameworks, respectively. These stacks can utilize MPI for communication in a parallel and distributed setting on HPC systems connected via fast interconnects. MPI4Spark can launch the Spark ecosystem using MPI launchers to utilize MPI communication. It also maintains isolation for application execution on worker nodes by forking new processes using Dynamic Process Management (DPM). It bridges the semantic differences between the event-driven communication in Spark and the application-driven communication engine in MPI. MPI4Dask is an MPI-based custom Dask framework that is targeted at modern HPC clusters built with CPUs and NVIDIA GPUs. MPI4Dask provides point-to-point asynchronous I/O communication coroutines, which are non-blocking concurrent operations defined using the async/await keywords from Python's asyncio framework. The talk concludes by evaluating the performance of MPI4Spark and MPI4Dask on state-of-the-art HPC systems.
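To make the idea of point-to-point communication coroutines concrete, here is a minimal sketch of non-blocking send and receive operations written with async/await. It is not the MPI4Dask API: `InMemoryChannel`, `worker`, and `exchange` are hypothetical names, and an in-process `asyncio.Queue` stands in for the MPI transport so that the example runs without any MPI library.

```python
import asyncio


class InMemoryChannel:
    """Hypothetical stand-in for a point-to-point transport between ranks."""

    def __init__(self):
        self._queues = {}

    def _queue(self, src, dst, tag):
        # One FIFO per (source, destination, tag) triple, created on demand.
        return self._queues.setdefault((src, dst, tag), asyncio.Queue())

    async def send(self, src, dst, tag, payload):
        await self._queue(src, dst, tag).put(payload)

    async def recv(self, src, dst, tag):
        return await self._queue(src, dst, tag).get()


async def worker(rank, peer, channel, data):
    """Post a send and a receive concurrently, then wait for both."""
    send_task = asyncio.create_task(channel.send(rank, peer, 0, data))
    recv_task = asyncio.create_task(channel.recv(peer, rank, 0))
    await send_task
    return await recv_task


async def exchange():
    """Two simulated ranks swap one message each on the same event loop."""
    channel = InMemoryChannel()
    return await asyncio.gather(
        worker(0, 1, channel, "from rank 0"),
        worker(1, 0, channel, "from rank 1"),
    )


def run_exchange():
    return asyncio.run(exchange())


if __name__ == "__main__":
    print(run_exchange())
```

Because both operations are awaited rather than called synchronously, neither rank blocks the event loop while its peer is still working, which is the property the coroutine-based design is after.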


Bio

Aamir Shafi

Aamir Shafi is currently a Research Scientist at the Ohio State University where he is involved in the High Performance Big Data project. Dr. Shafi was a Fulbright Visiting Scholar at MIT where he worked on the award-winning Cilk technology. Dr. Shafi received his PhD in Computer Science from the University of Portsmouth, UK in 2006. Dr. Shafi’s current research interests include architecting robust libraries and tools for Big Data computation with emphasis on Machine and Deep Learning applications. Dr. Shafi co-designed and co-developed a Java-based MPI-like library called MPJ Express.

4:15 - 5:00

Short Talks

5:00

Closing Remarks