MUG'24

(Preliminary Program)

All Times Are U.S. EDT

Bale Theater at Ohio Supercomputer Center

Monday, August 19

7:30 - 8:30

Registration and Continental Breakfast

Abstract


Bio

Dennis

Dennis Dalessandro is a kernel engineer for Cornelis Networks, leading the development of Omni-Path Architecture HW drivers. He received a BS in Computer Science from The Ohio State University. Over the past 18 years he has been a researcher at the Ohio Supercomputer Center, a performance engineer at NetApp, and a driver developer at Intel. Dennis is a very active supporter of Open Source software and enjoys working closely with the Kernel.org community and with various Linux distributions.

Abstract


Bio

Sameer Shende

Prof. Sameer Shende has helped develop the TAU Performance System, the Program Database Toolkit (PDT), the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io] and the HPCLinux distro. His research interests include tools and techniques for performance instrumentation, measurement, analysis, runtime systems, software stacks, HPC container runtimes, and compiler optimizations. He serves as a Research Professor and the Director of the Performance Research Laboratory at the University of Oregon, and as the President and Director of ParaTools, Inc. and ParaTools, SAS.

11:00 - 11:30

Morning Coffee Break

Abstract


Bio

Richard Graham

Dr. Richard Graham is a Senior Director, HPC Technology at NVIDIA's Networking Business unit. His primary focus is on HPC network software and hardware capabilities for current and future HPC technologies. Prior to moving to Mellanox/NVIDIA, Rich spent thirteen years at Los Alamos National Laboratory and Oak Ridge National Laboratory, in computer science technical and administrative roles, with a technical focus on communication libraries and application analysis tools. He is cofounder of the Open MPI collaboration and was chairman of the MPI 3.0 standardization efforts.

12:30 - 1:30

Lunch Break

Abstract


Bio

Donglai

Dr. Donglai Dai is a Chief Engineer at X-ScaleSolutions and leads the company's R&D team. His current work focuses on developing scalable, efficient communication libraries, checkpointing and restart libraries, and performance analysis tools for distributed and parallel HPC and deep learning applications on HPC systems and clouds. He has more than 20 years of industry experience in engineering management and the development of computer systems, VLSI, IoT, and interconnection networks while working at Intel, Cray, SGI, and startups. He holds more than 10 granted US patents and has published more than 40 technical papers or book chapters. He has a PhD degree in computer science from The Ohio State University.

Abstract

The tutorial will start with an overview of the MVAPICH libraries and their features. Next, we will focus in depth on installation guidelines, runtime optimizations, and tuning flexibility. An overview of configuration and debugging support in the MVAPICH2 libraries will be presented. High-performance support for NVIDIA/AMD GPU-enabled clusters in MVAPICH-Plus/MVAPICH2-GDR and many-core systems in MVAPICH-Plus/MVAPICH2-X will be presented. The impact on the performance of the various features and optimization techniques will be discussed in an integrated fashion. "Best Practices" for a set of common applications will be presented. A set of case studies related to example applications will also be presented to demonstrate how one can effectively take advantage of MVAPICH for High End Computing applications using MPI and CUDA/OpenACC.
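As a small, hedged illustration of the GPU support covered in this tutorial, the sketch below passes a GPU-resident buffer directly to MPI send/receive calls; a CUDA-aware build such as MVAPICH2-GDR can move such buffers without staging them through host memory. The use of mpi4py and CuPy, the script name, and the message size are illustrative assumptions and are not part of the tutorial material.

# Minimal sketch, assuming mpi4py built against a CUDA-aware MPI (for example
# MVAPICH2-GDR) and CuPy installed. Launch with two ranks, e.g.:
#   mpirun -np 2 python gpu_send.py
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

nbytes = 4 * 1024 * 1024                  # 4 MiB message, chosen arbitrarily
buf = cp.zeros(nbytes, dtype=cp.uint8)    # buffer lives in GPU memory

if rank == 0:
    buf.fill(1)
    comm.Send(buf, dest=1, tag=0)         # CUDA-aware MPI reads device memory directly
elif rank == 1:
    comm.Recv(buf, source=0, tag=0)
    print("rank 1 received, first byte =", int(buf[0]))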

Bio

Hari Subramoni and Nat Shineman

Dr. Hari Subramoni is an assistant professor in the Department of Computer Science and Engineering at the Ohio State University, USA. His current research interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data, deep learning and cloud computing. He has published over 100 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities in academic journals and conferences. Dr. Subramoni is doing research on the design and development of MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X (Hybrid MPI and PGAS (OpenSHMEM, UPC and CAF)) software packages. He is a member of IEEE.

Nat Shineman is a software engineer in the Department of Computer Science and Engineering at the Ohio State University. His current development work includes high performance interconnects, parallel computing, scalable startup mechanisms, and performance analysis and debugging of the MVAPICH2 library.

3:00 - 3:30

Afternoon Coffee Break

Abstract

The OSU Microbenchmark (OMB) suite is a popular set of benchmarks for evaluating the performance of HPC systems. In this tutorial, we will take the attendees through the new set of features that have been added to OMB, such as support for Java-based and Python-based benchmarks. We will also discuss the enhancements made to the C benchmarking suite, such as support for user-defined data types, graphical representations of output, data validation, support for profiling tools like PAPI, and support for newer MPI primitives like persistent operations and MPI Sessions.
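As a hedged illustration of one of the newer MPI primitives mentioned above, the sketch below sets up a persistent point-to-point operation once and then restarts it inside a timing loop, roughly the structure a latency benchmark would use. It relies on mpi4py; the buffer size, iteration count, and tag are arbitrary choices for illustration, and this is not OMB code.

# Minimal sketch of persistent point-to-point operations with mpi4py.
# Run with exactly two ranks, e.g.: mpirun -np 2 python persistent_demo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank                         # assumes exactly two ranks

buf = np.zeros(1024, dtype=np.uint8)    # 1 KiB message
iterations = 1000

# Create the persistent request once; only start/wait is repeated.
if rank == 0:
    req = comm.Send_init(buf, dest=peer, tag=7)
else:
    req = comm.Recv_init(buf, source=peer, tag=7)

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(iterations):
    req.Start()
    req.Wait()
t1 = MPI.Wtime()
req.Free()

if rank == 0:
    print(f"average time per restarted operation: {(t1 - t0) / iterations * 1e6:.2f} us")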


Bio

Hari Subramoni, Aamir Shafi, and Akshay Paniraja Guptha

Dr. Hari Subramoni is an assistant professor in the Department of Computer Science and Engineering at the Ohio State University, USA. His current research interests include high performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network topology aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, big data, deep learning and cloud computing. He has published over 100 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities in academic journals and conferences. Dr. Subramoni is doing research on the design and development of MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) and MVAPICH2-X (Hybrid MPI and PGAS (OpenSHMEM, UPC and CAF)) software packages. He is a member of IEEE.

Aamir Shafi is currently a Research Scientist at the Ohio State University where he is involved in the High Performance Big Data project. Dr. Shafi was a Fulbright Visiting Scholar at MIT where he worked on the award-winning Cilk technology. Dr. Shafi received his PhD in Computer Science from the University of Portsmouth, UK in 2006. Dr. Shafi’s current research interests include architecting robust libraries and tools for Big Data computation with emphasis on Machine and Deep Learning applications. Dr. Shafi co-designed and co-developed a Java-based MPI-like library called MPJ Express.

Akshay Paniraja Guptha is a software engineer in the Department of Computer Science and Engineering at the Ohio State University. His current development work includes high-performance interconnects, parallel computing, scalable startup mechanisms, and performance analysis and debugging of the OSU Microbenchmarks and the MVAPICH2 library.

Abstract

The fields of Machine and Deep Learning (ML/DL) have witnessed remarkable advances in recent years, paving the way for cutting-edge technologies and leading to exciting challenges and opportunities. Modern ML/DL frameworks, including TensorFlow, PyTorch, and cuML, have emerged to offer high-performance training and deployment for various types of ML models and Deep Neural Networks (DNNs). This tutorial provides an overview of recent trends in ML/DL leveraging powerful hardware architectures, interconnects, and distributed frameworks to accelerate the training of ML/DL models, especially as they grow larger and more complicated. We present an overview of different DNN architectures, focusing on parallelization strategies for model training. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU/GPU architectures to support large-scale distributed training efficiently. We also highlight some of our co-design efforts to utilize MPI for large-scale DNN training on cutting-edge CPU/GPU architectures available on modern HPC clusters.
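As a minimal sketch of the data-parallel training pattern discussed above, the fragment below averages a locally computed gradient across all MPI ranks with an Allreduce, the basic communication step behind distributed DNN training. It uses mpi4py and NumPy, and the random array stands in for a real gradient; none of these choices come from the talk itself.

# Minimal sketch: gradient averaging across ranks with MPI Allreduce.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()

# Stand-in for the gradient of one model replica (a real framework supplies this).
local_grad = np.random.rand(1_000_000).astype(np.float32)
global_grad = np.empty_like(local_grad)

# Sum the gradients of all ranks, then divide to obtain the average.
comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
global_grad /= size

# Every rank now holds the same averaged gradient and can update its replica.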


Bio

Aamir Shafi and Nawras Alnaasan

Aamir Shafi is currently a Research Scientist at the Ohio State University where he is involved in the High Performance Big Data project. Dr. Shafi was a Fulbright Visiting Scholar at MIT where he worked on the award-winning Cilk technology. Dr. Shafi received his PhD in Computer Science from the University of Portsmouth, UK in 2006. Dr. Shafi’s current research interests include architecting robust libraries and tools for Big Data computation with emphasis on Machine and Deep Learning applications. Dr. Shafi co-designed and co-developed a Java-based MPI-like library called MPJ Express.

Nawras Alnaasan is a Graduate Research Associate at the Network-Based Computing Laboratory, Columbus, OH, USA. He is currently pursuing a Ph.D. degree in computer science and engineering at The Ohio State University. His research interests lie at the intersection of deep learning and high-performance computing. He works on advanced parallelization techniques to accelerate the training of Deep Neural Networks and exploit underutilized HPC resources covering a wide range of DL applications including supervised learning, semi-supervised learning, and hyperparameter optimization. He is actively involved in several research projects including HiDL (High-performance Deep Learning) and ICICLE (Intelligent Cyberinfrastructure with Computational Learning in the Environment). Alnaasan received his B.S. degree in computer science and engineering from The Ohio State University. Contact him at alnaasan.1@osu.edu.

4:15 - 5:30

Short Talks

4:30 - 6:30

Visit to the State of Ohio Computer Center, SOCC (Optional)

6:30 - 9:30

Reception and Dinner at Endeavor Brewing and Spirits

909 W 5th Ave,

Columbus, OH 43212

Tuesday, August 20

7:30 - 8:20

Registration and Continental Breakfast

8:20 - 8:30

Opening Remarks

Dhabaleswar K (DK) Panda, The Ohio State University

Abstract


Bio

Sadaf

Dr. Sadaf R. Alam is the University of Bristol's Director of Advanced Computing Strategy. Sadaf joined Bristol University in 2022 from the Swiss National Supercomputing Centre (CSCS), where she was the Chief Technology Officer (CTO). Dr. Alam studied computer science at the University of Edinburgh, UK, where she received her Ph.D. Until March 2009, she was a computer scientist at the Oak Ridge National Laboratory, USA. Sadaf ensures end-to-end integrity of HPC systems and storage solutions and leads strategic projects at the centre. She has held several different roles across her career, including group lead of future systems, chief architect, and head of operations. She is a member of ACM, ACM-W, SIGHPC, and Women in HPC, and was the technical chair of the SC22 Supercomputing Conference. Sadaf was the chief architect of multiple generations of the Piz Daint supercomputing platform, which was one of Europe's fastest systems and among the top 3 supercomputers in the world for many years, and also chief architect of MeteoSwiss's innovative, co-designed operational numerical weather forecasting platforms.

Abstract

This talk will provide an overview of the MVAPICH project (past, present, and future). The future roadmap and features for upcoming releases of the MVAPICH software family (including MVAPICH-Plus) will be presented. The current status and future plans for OMB will also be presented.


Bio

Dhabaleswar K (DK) Panda

DK Panda is a Distinguished Professor of Engineering and University Distinguished Scholar at the Ohio State University. He has published over 500 papers in the area of high-end computing and networking. The MVAPICH (High-Performance MPI and PGAS over InfiniBand, iWARP, RoCE, EFA, Rockport Networks, and Slingshot) libraries, designed and developed by his research group (mvapich.cse.ohio-state.edu), are currently being used by more than 3,325 organizations worldwide (in 90 countries). More than 1.69 million downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 7th, 21st, 36th and 49th ranked ones) in the TOP500 list. High-performance and scalable solutions for deep learning and machine learning from his group are available from hidl.cse.ohio-state.edu. High-performance and scalable libraries for Big Data stacks (Spark, Hadoop, and Memcached) and Data science applications from his group (hibd.cse.ohio-state.edu) are also publicly available. These libraries are currently being used by more than 360 organizations in 39 countries. More than 47,000 downloads of these libraries have taken place. He is an IEEE Fellow and a recipient of 2022 IEEE Charles Babbage Award. More details about Prof. Panda are available at cse.ohio-state.edu/~panda.

10:15 - 10:45

Morning Coffee Break

Abstract

NVIDIA networking technologies are designed for training AI at scale. In-network computing, highly effective bandwidth, and noise isolation capabilities have facilitated the creation of larger and more complex foundational models. We'll dive deep into the recent technology announcements and their essential roles in next-generation AI data center designs.


Bio

Gilad Shainer

Gilad Shainer serves as senior vice president of networking at NVIDIA, focusing on high-performance computing and artificial intelligence. He holds multiple patents in the field of high-speed networking. Gilad Shainer holds an M.S. and a B.S. in electrical engineering from the Technion Institute of Technology in Israel.

Abstract


Bio

Martin Hilgeman

Martin Hilgeman joined Dell Technologies in 2011, after having worked as an HPC application specialist for 12 years at SGI and IBM. In 2019, he joined AMD as a senior manager and worked on porting and optimizing the major HPC applications to the "Rome" microarchitecture. Martin returned to Dell Technologies in May 2020 as the HPC performance lead and a Distinguished Member of Technical Staff in Dell ISG. He holds a master's degree in physical chemistry from the VU University of Amsterdam.

Abstract


Bio

Douglas Fuller

Douglas Fuller is the director of software development at Cornelis Networks. Doug joined Cornelis from Red Hat, where he served as a software engineering manager leading teams working on the Ceph distributed storage system. Doug’s career in HPC has included stints at various universities and Oak Ridge National Laboratory.

Doug holds bachelor's and master's degrees in computer science from Iowa State University. His master's work at DOE Ames Laboratory involved early one-sided communication models in supercomputers. From his undergraduate days, he remains keenly aware of the critical role of floppy diskettes in Beowulf cluster administration.

12:15 - 12:30

Group Photo

12:30 - 1:30

Lunch Break

Abstract

Butterfly valve performance factors are crucial to the pressurized water reactor industry, and computational fluid dynamics studies of these valves are critical from both a design and a safety standpoint. This talk presents results using the Navier-Stokes module in the MOOSE framework, built on MVAPICH, for simulating butterfly valve performance factors, and compares those simulations with empirical results from a nuclear reactor butterfly valve in operation. Multiphysics operations using MVAPICH are also discussed.


Bio

Matthew Anderson

Matthew Anderson is the manager of the High Performance Computing group at Idaho National Laboratory, which maintains and operates the principal HPC datacenter supporting nuclear energy research for the Department of Energy. He came to INL in 2019 from Indiana University, where he was an assistant research scientist. He is a co-author of over 40 peer-reviewed publications and one textbook on High Performance Computing and has over 15 years' experience working in the HPC industry. He holds a Ph.D. in physics from the University of Texas at Austin.

Abstract


Bio

Ammar Ahmad Awan

Ammar Ahmad Awan is a Researcher at Microsoft. He received his PhD in Computer Science in May 2020 from The Ohio State University. He received his B.S. and M.S. degrees in Computer Science and Engineering from the National University of Science and Technology (NUST), Pakistan, and Kyung Hee University (KHU), South Korea, respectively. His current research focus lies at the intersection of High Performance Computing (HPC) libraries and Deep Learning (DL) frameworks. He previously worked on a Java-based Message Passing Interface (MPI) and nested parallelism with OpenMP and MPI for scientific applications. He has published 20 papers in conferences and journals related to these research areas. He actively contributes to the DeepSpeed project at Microsoft. Before that, he contributed to various projects like MVAPICH2-GDR (High Performance MPI for GPU clusters), OMB (OSU Micro Benchmarks), and HiDL (High Performance Deep Learning). He is the lead author of the OSU-Caffe framework (part of the HiDL project) that allows efficient distributed training of Deep Neural Networks.

Abstract

In this talk, we will survey recent developments of multi-GPU FFT implementations on large-scale systems, and study communication frameworks for general parallel transposition of multi-dimensional arrays. We will then evaluate asymptotic scalability behavior, the impact of selecting MPI distributions and types of collective routines for accelerating FFT performance. Finally, we will present several experiments on modern supercomputers leveraging different MPI libraries, such as MVAPICH2, and network topologies.
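As a hedged sketch of the parallel transposition step that such FFT frameworks rely on, the fragment below redistributes a square array from a row decomposition to a column decomposition with an MPI Alltoall, the collective most commonly used for this exchange. It uses mpi4py and NumPy, the array size is arbitrary, and it assumes the global dimension divides evenly among ranks; it is not taken from any particular FFT library.

# Minimal sketch: row-block to column-block redistribution via Alltoall,
# the communication pattern behind distributed multi-dimensional FFTs.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 8 * size                       # global N x N array (divisible by rank count)
rows = N // size                   # rows owned by this rank
cols = N // size                   # columns each rank will own afterwards

# This rank's block of rows of the global array.
local = np.arange(rank * rows * N, (rank + 1) * rows * N,
                  dtype=np.float64).reshape(rows, N)

# Reorder the local rows into per-destination column chunks, exchange them,
# then stack the received chunks so this rank owns a full block of columns.
send = np.ascontiguousarray(local.reshape(rows, size, cols).transpose(1, 0, 2))
recv = np.empty_like(send)
comm.Alltoall(send, recv)
col_block = recv.reshape(N, cols)   # all N rows, only this rank's columns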


Bio

Alan Ayala

Dr. Alan Ayala is a member of the technical staff at AMD. His work focuses on the design of FFT software for GPUs and high-performance computing systems. His research interests include GPU and parallel programming, FFT applications, performance optimization, profiling tools, and network interconnects. Before joining AMD, he worked as a research scientist at the Innovative Computing Laboratory, where he developed the heFFTe library for FFT computation on exascale systems. Dr. Ayala received his Ph.D. degree in Computational Mathematics in 2018 from Sorbonne University in Paris, France.

Abstract

The talk will introduce the audience to C-DAC's indigenously developed Trinetra interconnect and will focus on enabling MVAPICH2 over the Trinetra interconnect. Performance numbers with MVAPICH2 over Trinetra will also be discussed during the talk.


Bio

Mr. Yogeshwar Sonawane has been associated with the Centre for Development of Advanced Computing (C-DAC), Pune, for the last 20 years. He works with the HPC Technologies group and holds the position of Scientist F. He leads the team involved in system software development for the Trinetra network and firmware development for the Rudra server platform. His research interests include High Performance Interconnects, Programming Models, and Performance Optimizations. Yogeshwar has a Bachelor of Engineering (B.E.) degree in Electronics & Telecommunications from the Govt. College of Engineering, Pune (COEP), India.

3:30 - 4:00

Student Poster Session (In-Person) and Coffee Break

Abstract

The performance and feature gap between bare-metal and cloud HPC/AI clusters is almost imperceptible on clouds such as Azure. This is quite evident as Azure supercomputers have climbed up the top HPC/AI cluster rankings such as Top500 and MLPerf. Public clouds democratize HPC/AI supercomputers with a focus on performance, scalability, and cost-efficiency. As cloud platform technologies and features continue to evolve, middleware such as MPI libraries and communication runtimes play a key role in enabling applications to make use of these technology advancements with high performance. This talk focuses on how MVAPICH2 efficiently enables the latest technology advancements such as SR-IOV, GPUDirect RDMA, DPUs, etc. in virtualized HPC and AI clusters. The talk will also provide an overview of the latest HPC and AI offerings in Microsoft Azure HPC along with their performance characteristics with MVAPICH2/MVAPICH2-X.


Bio

Jithin Jose

Dr. Jithin Jose is a Principal Software Engineer at Microsoft. His work is focused on the co-design of software and hardware building blocks for high performance computing platforms, and on performance optimizations. His research interests include high performance interconnects and protocols, parallel programming models, big data, and cloud computing. Before joining Microsoft, he worked at Intel and IBM Research. He has published more than 25 papers in major conferences and journals related to these research areas. Dr. Jose received his Ph.D. degree from The Ohio State University in 2014.

Abstract


Bio

Sameer Shende

Prof. Sameer Shende has helped develop the TAU Performance System, the Program Database Toolkit (PDT), the Extreme-scale Scientific Software Stack (E4S) and the HPCLinux distro. His research interests include tools and techniques for performance instrumentation, measurement, analysis, runtime systems, software stacks, HPC container runtimes, and compiler optimizations. He serves as a Research Professor and the Director of the Performance Research Laboratory at the University of Oregon, and as the President and Director of ParaTools, Inc. and ParaTools, SAS.

5:30 - 5:45

Open Mic Session

4:30 - 6:30

Visit to the State of Ohio Computer Center, SOCC (Optional)

6:30 - 9:30

Banquet Dinner at Bravo Restaurant

1803 Olentangy River RD

Columbus, OH 43212

Wednesday, August 21

7:30 - 8:30

Registration and Continental Breakfast

Abstract


Bio

Dan Stanzione

Dan Stanzione is the principal investigator (PI) for a number of National Science Foundation (NSF) supercomputers, including the current Frontera system, which is the fastest supercomputer at a U.S. university, and is leading the upcoming NSF Leadership Class Computing Facility. Stanzione received his bachelor's degree in electrical engineering and his master's degree and doctorate in computer engineering from Clemson University.

Abstract

Supercomputers are not only the core infrastructure that enables simulations in all areas where experiments are difficult, but they are also increasingly being used to develop AI technologies. In this talk, we will introduce the Supreme-K project led by the Korean government. We will introduce the accelerators and the SW and HW platforms for the supercomputer, the first to be designed and developed in-house in Korea, as well as the future development direction.


Bio

Yoomi Park

Yoomi Park received her Ph.D. from Chungnam National University, Republic of Korea, in 2010. She is currently a principal researcher at the Supercomputing System Research Section, Electronics and Telecommunications Research Institute, Daejeon, Korea. Her research interests include high performance computing, artificial intelligence, and distributed and parallel computing.

Abstract


Bio

Mahidhar Tatineni

Mahidhar Tatineni received his M.S. & Ph.D. in Aerospace Engineering from UCLA. He currently leads the User Services group at SDSC as a Computational and Data Science Research Specialist Manager. He has led the support of high-performance computing and data applications software on several NSF and UC funded HPC and AI supercomputers including Voyager, Expanse, Comet, and Gordon at SDSC. His research interests are in HPC architecture and systems, performance and scalability, benchmarking and HPC middleware. He has worked on many NSF funded optimization and parallelization research projects such as MPI performance tuning frameworks, hybrid programming models, big data middleware, and application performance evaluation using next generation communication mechanisms for emerging HPC systems. He is co-PI on the NSF funded Expanse HPC system and the National Research Platform projects at SDSC.

10:30 - 11:00

Morning Coffee Break

Abstract

The Gordon Bell-winning AWP-ODC application continues to push the boundaries of earthquake simulation by leveraging the enhanced performance of MVAPICH on both CPU and GPU-based architectures. This presentation highlights the recent improvements to the code and its application to broadband deterministic 3D wave propagation simulations of earthquake ground motions, incorporating high-resolution surface topography and detailed underground structures. The results of these simulations provide critical insights into the potential impacts of major earthquakes, contributing to more effective disaster preparedness and mitigation strategies. Additionally, the presentation will address the scientific and technical challenges encountered during the process and discuss the implications for future large-scale seismic studies on Exascale computing systems.


Bio

Yifeng Cui and Te-Yang Yeh

Dr. Yifeng Cui heads the High Performance GeoComputing Lab at SDSC and helped to establish the Southern California Earthquake Center (SCEC) as a world leader in advancing high performance computing in earthquake system science. Cui's groundbreaking work includes enabling TeraShake, ShakeOut, and M8, some of the worst-case scenarios on the San Andreas fault, revealing order-of-magnitude LA wave-guide amplification. He is a recipient of several HPC awards, including the 2015 NVIDIA Global Impact Award, the 2013 IDC HPC Innovation Excellence Award, and 2009/2011 SciDAC OASCR awards. He also directed an Intel Parallel Computing Center on earthquake research. Cui earned his Ph.D. in Hydrology from the University of Freiburg, Germany.

Dr. Te-Yang Yeh is a postdoctoral research scholar at San Diego State University, working in collaboration with the San Diego Supercomputer Center (SDSC) on the development of advanced wave propagation simulation codes. Dr. Yeh has been deeply involved in earthquake ground motion simulations using high-performance computing (HPC) to refine the understanding of the Earth's elastic properties and enhance the accuracy of predicted ground motions in Southern California and other regions across the United States. Leveraging the exceptional performance of the Gordon Bell Award-winning AWP-ODC application, Dr. Yeh conducts large-scale deterministic 3D wave propagation simulations at frequencies up to 10Hz, significantly advancing the use of numerical simulations for real-world seismic risk assessments.

Abstract


Bio

Nathan Hanford

Nathan Hanford is a Computer Scientist in the Livermore Computing Division at Lawrence Livermore National Laboratory. His research is currently focused on application and development environment portability for parallel software applications at the application binary interface (ABI) level. His operational work is focused on development environment design and verification, Message Passing Interface (MPI) support and development, and system-wide accelerator-aware interconnect benchmarking for codesign, system acceptance, and strategic decision-making support.

Towards accomplishing these goals, Nathan collaborates with Groupe EOLEN and the Commissariat à l'énergie atomique et aux énergies alternatives (CEA), leveraging the Wi4MPI project, which dynamically translates ABI-incompatible MPI operations at runtime. He also works closely with multiple vendor partners to increase their middleware portability to a variety of compute clusters, and participates in the MPI Forum.

Nathan came from a high-performance networking background. While earning his PhD at University of California Davis, he was a perennial summer student with ESnet at Lawrence Berkeley Laboratory. During this time, he focused on end-system optimizations and congestion avoidance for high-speed, long-distance networking.

Abstract

This talk presents heFFTe as a cross-platform library for scaling up the computation of the three-dimensional Fast Fourier Transform (FFT) on large scale heterogeneous systems with GPUs. With its ability to utilize different communication patterns and different vendor backends, heFFTe also serves as a good benchmark for MPI implementations, such as MVAPICH and OpenMPI, on different system architectures. The talk will present the strong scaling behavior of heFFTe on different systems using both point-to-point and collective communication primitives. We also plan to show preliminary results utilizing the data compression capability in MVAPICH-Plus.
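To make concrete the contrast between point-to-point and collective primitives that the benchmark explores, the hedged sketch below performs the same kind of all-to-all block exchange with non-blocking sends and receives instead of a single collective call. It uses mpi4py, and the block size and tags are illustrative choices only; it is not heFFTe code.

# Minimal sketch: all-to-all exchange expressed with non-blocking
# point-to-point operations, an alternative to a single MPI_Alltoall.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
block = 1024                                   # elements exchanged per rank pair

send_blocks = [np.full(block, rank, dtype=np.float64) for _ in range(size)]
recv_blocks = [np.empty(block, dtype=np.float64) for _ in range(size)]

requests = []
for peer in range(size):
    # Expect one block from every rank; the tag carries the sender's rank.
    requests.append(comm.Irecv(recv_blocks[peer], source=peer, tag=peer))
for peer in range(size):
    requests.append(comm.Isend(send_blocks[peer], dest=peer, tag=rank))
MPI.Request.Waitall(requests)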


Bio

Ahmad Abdelfattah

Ahmad Abdelfattah, research assistant professor at the Innovative Computing Laboratory at the University of Tennessee, received his PhD in computer science from King Abdullah University of Science and Technology (KAUST) in 2015, where he was a member of the Extreme Computing Research Center (ECRC). His research interests span high performance computing, parallel numerical algorithms, and general purpose GPU computing. He currently serves as the principal investigator of the MAGMA library. Abdelfattah has been acknowledged by NVIDIA and AMD for contributing to their numerical BLAS libraries, cuBLAS and rocBLAS, respectively.

12:30 - 1:30

Lunch Break

Abstract

In this presentation, we will discuss research on improving collective communication performance using the CXL interconnect, a joint effort between OSU and ETRI. CXL (Compute Express Link) is a cutting-edge high-speed interconnect technology that enhances system scalability by efficiently supporting communication among computing resources such as CPUs, memory, accelerators, and storage. This technology is gaining significant attention as it enables composable computing architectures, allowing data center and HPC systems to configure computing resources in a pool, utilizing only the necessary amount to maximize resource efficiency. In particular, this presentation will focus on techniques to improve MPI allgather and reduce-scatter performance using iMEX. The proposed technique is expected to significantly enhance communication performance in CXL-enabled multi-node computing environments at the rack scale, compared to conventional methods.
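For readers unfamiliar with the two collectives named above, the hedged sketch below shows an MPI allgather and a reduce-scatter in their simplest form, using mpi4py with arbitrary buffer sizes; it is unrelated to the CXL/iMEX implementation being presented.

# Minimal sketch of the two collectives discussed above.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
chunk = 4                                   # elements contributed per rank

# Allgather: every rank contributes `chunk` elements and gets everyone's data.
send = np.full(chunk, rank, dtype=np.float64)
gathered = np.empty(chunk * size, dtype=np.float64)
comm.Allgather(send, gathered)

# Reduce-scatter: element-wise sum across ranks, with each rank keeping only
# its own chunk-sized slice of the reduced result.
contrib = np.arange(chunk * size, dtype=np.float64)
my_slice = np.empty(chunk, dtype=np.float64)
comm.Reduce_scatter_block(contrib, my_slice, op=MPI.SUM)

if rank == 0:
    print("gathered:", gathered)
    print("rank 0 reduce-scatter slice:", my_slice)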


Bio

HooYoung Ahn

HooYoung Ahn received the Ph.D. degree in the School of Computing from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea, in 2016. She is currently a Senior Researcher with the Supercomputing System Research Section, Electronics and Telecommunications Research Institute, Daejeon, Republic of Korea. Her research interests include distributed and parallel computing, artificial intelligence, and high performance computing.

2:45 - 3:15

Afternoon Coffee Break

Abstract


Bio

Donglai Dai

Dr. Donglai Dai is a Chief Engineer at X-ScaleSolutions and leads the company's R&D team. His current work focuses on developing scalable, efficient communication libraries, checkpointing and restart libraries, and performance analysis tools for distributed and parallel HPC and deep learning applications on HPC systems and clouds. He has more than 20 years of industry experience in engineering management and the development of computer systems, VLSI, IoT, and interconnection networks while working at Intel, Cray, SGI, and startups. He holds more than 10 granted US patents and has published more than 40 technical papers or book chapters. He has a PhD degree in computer science from The Ohio State University.

Abstract

The High Performance Computing (HPC) community has widely adopted Message Passing Interface (MPI) libraries to exploit high-speed and low-latency networks like InfiniBand, Omni-Path, Slingshot, and others. This talk provides an overview of MPI4Spark and MPI4Dask, which are enhanced versions of the Spark and Dask frameworks, respectively. These stacks can utilize MPI for communication in a parallel and distributed setting on HPC systems connected via fast interconnects. MPI4Spark can launch the Spark ecosystem using MPI launchers to utilize MPI communication. It also maintains isolation for application execution on worker nodes by forking new processes using Dynamic Process Management (DPM). It bridges semantic differences between the event-driven communication in Spark and the application-driven communication engine in MPI. MPI4Dask is an MPI-based custom Dask framework that is targeted for modern HPC clusters built with CPUs and NVIDIA GPUs. MPI4Dask provides point-to-point asynchronous I/O communication coroutines, which are non-blocking concurrent operations defined using the async/await keywords from Python's asyncio framework. The talk concludes by evaluating the performance of MPI4Spark and MPI4Dask on state-of-the-art HPC systems.
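To illustrate the async/await communication style the abstract refers to, the hedged sketch below wraps a non-blocking MPI receive in an asyncio coroutine that yields to the event loop until the message arrives. It uses mpi4py and plain asyncio; it is a simplified stand-in, not code from MPI4Dask, and the coroutine name, buffer size, and tag are invented for illustration.

# Minimal sketch: awaiting a non-blocking MPI receive inside an asyncio
# coroutine. Run with two ranks, e.g.: mpirun -np 2 python async_recv.py
import asyncio
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

async def recv_async(buf, source, tag):
    """Complete a non-blocking receive without blocking the event loop."""
    req = comm.Irecv(buf, source=source, tag=tag)
    while not req.Test():
        await asyncio.sleep(0)      # yield so other coroutines can run
    return buf

async def main():
    if rank == 0:
        comm.Send(np.arange(8, dtype=np.float64), dest=1, tag=11)
    else:
        buf = np.empty(8, dtype=np.float64)
        result = await recv_async(buf, source=0, tag=11)
        print("rank 1 received:", result)

asyncio.run(main())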


Bio

Aamir Shafi

Aamir Shafi is currently a Research Scientist at the Ohio State University where he is involved in the High Performance Big Data project. Dr. Shafi was a Fulbright Visiting Scholar at MIT where he worked on the award-winning Cilk technology. Dr. Shafi received his PhD in Computer Science from the University of Portsmouth, UK in 2006. Dr. Shafi’s current research interests include architecting robust libraries and tools for Big Data computation with emphasis on Machine and Deep Learning applications. Dr. Shafi co-designed and co-developed a Java-based MPI-like library called MPJ Express.

4:15 - 5:00

Short Talks

5:00

Closing Remarks