MUG'21

(Preliminary Program)

All Times Are U.S. EDT

Monday, August 23

Abstract

This tutorial will walk through the basics of setting up a scalable Slurm-based cluster on AWS, highlighting features such as shared filesystems and different instance types.
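As a concrete taste of the end result, the sketch below is a minimal MPI "hello world" in C (an illustrative example, not part of the tutorial materials) that could be compiled with mpicc and launched through Slurm, e.g. with srun, to confirm that ranks land on the expected instances.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);

        /* Each rank reports which instance it landed on. */
        printf("rank %d of %d on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }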


Bio

AWS

TBA

Abstract

TBA


Bio

Devendar Bureddy

Devendar Bureddy is a Principal SW Engineer at Mellanox Technologies. At Mellanox, Devendar was instrumental in building several key technologies such as SHARP, UCX, and HCOLL. Previously, he was a software developer at The Ohio State University in the Network-Based Computing Laboratory led by Dr. D. K. Panda, where he was involved in the design and development of MVAPICH. He received his Master’s degree in Computer Science and Engineering from the Indian Institute of Technology, Kanpur. His research interests include high-speed interconnects, parallel programming models, and HPC/DL software.

1:00 - 1:30

Break

Tuesday, August 24

10:00 - 10:15

Opening Remarks

Abstract

High-performance computing and artificial intelligence have evolved to be the primary data processing engines for wide commercial use, hosting a variety of users and applications. While providing the highest performance, supercomputers must also offer multi-tenancy security, and therefore need to be designed as cloud-native platforms. The key element that enables this architecture is the data processing unit (DPU). The DPU is a fully integrated data-center-on-a-chip platform that can manage the data center operating system instead of the host processor, enabling security and orchestration of the supercomputer. This architecture enables supercomputing platforms to deliver bare-metal performance while natively supporting multi-node tenant isolation. We'll introduce the new supercomputing architecture and present first application performance results.


Bio

Gilad Shainer

Gilad Shainer serves as senior vice president of marketing for Mellanox networking at NVIDIA, focusing on high-performance computing, artificial intelligence, and InfiniBand technology. Mr. Shainer joined Mellanox in 2001 as a design engineer and has served in senior marketing management roles since 2005. Mr. Shainer serves as the chairman of the HPC-AI Advisory Council organization, the president of the UCF and CCIX consortia, a member of the IBTA, and a contributor to the PCI-SIG PCI-X and PCIe specifications. Mr. Shainer holds multiple patents in the field of high-speed networking. He is a recipient of the 2015 R&D100 award for his contribution to the CORE-Direct In-Network Computing technology and the 2019 R&D100 award for his contribution to the Unified Communication X (UCX) technology. Gilad Shainer holds MSc and BSc degrees in Electrical Engineering from the Technion - Israel Institute of Technology.

Abstract

This talk will provide an overview of the MVAPICH project (past, present, and future). The future roadmap and features for upcoming releases of the MVAPICH2 software family (including MVAPICH2-X and MVAPICH2-GDR) will be presented. The current status and future plans for OSU INAM and OMB will also be presented.


Bio

Dhabaleswar K (DK) Panda

DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 500 papers in the area of high-end computing and networking. The MVAPICH2 (High-Performance MPI and PGAS over InfiniBand, iWARP, RoCE, EFA, and Rockport Networks) libraries, designed and developed by his research group (mvapich.cse.ohio-state.edu), are currently being used by more than 3,200 organizations worldwide (in 89 countries). More than 1.4 million downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 4th, 10th, 20th and 31st ranked ones) in the TOP500 list. High-performance and scalable solutions for deep learning and machine learning from his group are available from hidl.cse.ohio-state.edu. High-performance and scalable libraries for Big Data stacks (Spark, Hadoop and Memcached) and Data science applications from his group (hibd.cse.ohio-state.edu) are also publicly available. These libraries are currently being used by more than 340 organizations in 38 countries. More than 40,000 downloads of these libraries have taken place. He is an IEEE Fellow. More details about Prof. Panda are available at cse.ohio-state.edu/~panda.

Abstract

TBA


Bio

Adam Moody

Adam is a member of the Development Environment Group within Livermore Computing. His background is in MPI development, collective algorithms, networking, and parallel I/O. He is responsible for supporting MPI on Livermore's Linux clusters. He is a project lead for the Scalable Checkpoint/Restart (SCR) library and mpiFileUtils, two projects that use MPI to help users manage large data sets. He leads the CORAL burst buffer working group for Livermore. In recent work, he has been investigating how to employ MPI and fast storage in deep learning frameworks like LBANN.

Abstract

Idaho National Laboratory maintains nearly 9 petaflops of high-performance computing resources supporting both leadership and engineering simulations using MVAPICH across a wide range of disciplines. This talk will cover the recently improved integration of MVAPICH with PBS Pro and how we use this integration for debugging and job monitoring. Finally, we present MVAPICH benchmarks on our largest systems.


Bio

Matthew Anderson

TBA

1:00 - 1:30

Break

Abstract

TBA


Bio

Hemal Shah and Moshe Voloshin

Hemal Shah is a Distinguished Engineer and Systems/Software/Standards architect in the Compute and Connectivity (CCX) division at Broadcom Inc. He leads and manages a team of architects. Hemal is responsible for the definition of Ethernet NIC product architecture and the software roadmap/architecture of performance NICs/Smart NICs. Hemal led the architecture definition of several generations of NetXtreme® E-Series/NetXtreme I server product lines and NetXtreme I client product lines. Hemal spearheaded the system architecture development of TruFlow™ technology for vSwitch acceleration/packet processing software frameworks, TruManage™ technology for system and network management, device security features, virtualization, and stateless offloads. Hemal has defined the system architecture of RDMA hardware/software solutions for more than two decades.

Moshe Voloshin is a systems architect in the Data Center Solutions Group (DCSG) division at Broadcom Inc. Moshe spearheaded the system architecture development of RoCE and congestion control in Broadcom Ethernet NICs, and is involved in the definition of product architecture, modeling, and system simulations. Previously, Moshe was a director, manager, and ASIC/HW engineer in Cisco's high-end router division, where he developed and managed the development of network processing unit (NPU), QoS, and fabric ASICs for products such as the GSR and CRS.

Abstract

TBA


Bio

Matthew Williams

Matthew Williams is CTO of Rockport Networks. He has 25 years of technical leadership and engineering experience, including 14 years as CTO of successful network technology companies, and holds 21 issued US patents. He is an expert strategist, analyst, and visionary who has delivered on transformational product concepts. Matthew is an insightful and energetic communicator who enjoys product evangelization and inspiring global business and technical audiences. Matthew has a B.Sc. in Electrical Engineering with First Class Honours from Queen's University, Kingston, Canada, and is a registered P.Eng.

Abstract

Recent technology advancements have substantially improved the performance potential of virtualization. As a result, the performance gap between bare-metal and cloud clusters continues to shrink. This is evident as public clouds such as Microsoft Azure have climbed into the top 20 and top 30 rankings of the Graph500 and Top500 lists, respectively. Moreover, public clouds democratize these technology advancements with a focus on performance, scalability, and cost-efficiency. Though platform technologies and features continue to evolve, middleware such as MPI libraries plays a key role in enabling applications to exploit these advancements with high performance. This talk focuses on how MVAPICH2 efficiently enables the latest technology advancements in Azure HPC and AI clusters. It will also provide an overview of the latest HPC and AI offerings in Microsoft Azure along with their performance characteristics, covering the Microsoft Azure HPC marketplace images that include the MVAPICH2-Azure MPI libraries, as well as recommendations and best practices for using MVAPICH2 and MVAPICH2-X on Microsoft Azure. We will also discuss performance and scalability characteristics using microbenchmarks and HPC applications.
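To make the microbenchmark discussion concrete, here is a minimal ping-pong latency sketch in C, written in the spirit of (but not identical to) the OSU microbenchmarks' osu_latency; the message size and iteration counts are illustrative assumptions.

    #include <mpi.h>
    #include <stdio.h>

    #define MSG_SIZE 8      /* message size in bytes (assumption) */
    #define WARMUP   100    /* untimed warm-up round trips */
    #define ITERS    1000   /* timed round trips (assumption) */

    int main(int argc, char **argv)
    {
        int rank;
        char buf[MSG_SIZE] = {0};
        double start = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Run with exactly two ranks, ideally on different nodes/VMs. */
        for (int i = 0; i < WARMUP + ITERS; i++) {
            if (i == WARMUP)
                start = MPI_Wtime();   /* start timing after warm-up */
            if (rank == 0) {
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0) {
            /* One-way latency is half the average round-trip time. */
            double usec = (MPI_Wtime() - start) * 1e6 / (2.0 * ITERS);
            printf("avg one-way latency: %.2f us\n", usec);
        }

        MPI_Finalize();
        return 0;
    }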


Bio

Jithin Jose

Dr. Jithin Jose is a Principal Software Engineer at Microsoft. His work focuses on the co-design of software and hardware building blocks for high-performance computing platforms and on performance optimizations. His research interests include high-performance interconnects and protocols, parallel programming models, big data, and cloud computing. Before joining Microsoft, he worked at Intel and IBM Research. He has published more than 25 papers in major conferences and journals related to these research areas. Dr. Jose received his Ph.D. degree from The Ohio State University in 2014.

Abstract

There are many hardware and software choices for answering your research questions. This talk covers some of the services, hardware choices, and underlying technologies that enable HPC on AWS, with examples of real-life workloads and benchmarks.


Bio

Matthew Koop

TBA

Abstract

A short presentation on the newly funded AI Institute (ICICLE) will take place, followed by an open discussion on potential collaboration opportunities.


Bio

DK Panda

DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 500 papers in the area of high-end computing and networking. The MVAPICH2 (High-Performance MPI and PGAS over InfiniBand, iWARP, RoCE, EFA, and Rockport Networks) libraries, designed and developed by his research group (mvapich.cse.ohio-state.edu), are currently being used by more than 3,200 organizations worldwide (in 89 countries). More than 1.4 million downloads of this software have taken place from the project's site. This software is empowering several InfiniBand clusters (including the 4th, 10th, 20th and 31st ranked ones) in the TOP500 list. High-performance and scalable solutions for deep learning and machine learning from his group are available from hidl.cse.ohio-state.edu. High-performance and scalable libraries for Big Data stacks (Spark, Hadoop and Memcached) and Data science applications from his group (hibd.cse.ohio-state.edu) are also publicly available. These libraries are currently being used by more than 340 organizations in 38 countries. More than 40,000 downloads of these libraries have taken place. He is an IEEE Fellow. More details about Prof. Panda are available at cse.ohio-state.edu/~panda.

Abstract

The TAU Performance System is a powerful and highly versatile profiling and tracing tool ecosystem for performance analysis of parallel programs at all scales. TAU has evolved with each new generation of HPC systems and presently scales efficiently to hundreds of thousands of cores on the largest machines in the world. To meet the needs of computational scientists who want to evaluate and improve the performance of their applications, we present TAU's support for key MVAPICH features, including the MPI Tools (MPI_T) interface and the ability to set MPI_T control variables on a per-communicator basis. TAU's support for GPUs, including CUDA, DPC++/SYCL, OpenCL, OpenACC, Kokkos, and HIP/ROCm, improves performance evaluation of heterogeneous programming models. The talk will also describe TAU's support for the MPI performance and control variables exported by MVAPICH, its support for instrumentation of the OpenMP runtime, and its APIs for instrumentation of Python programs. TAU uses these interfaces on unmodified binaries without the need for recompilation. This talk will describe these new instrumentation techniques to simplify the usage of performance tools, including support for compiler-based instrumentation, rewriting binary files, and preloading shared objects. The talk will also highlight TAU's analysis tools, including its 3D profile browser ParaProf and its cross-experiment analysis tool PerfExplorer, and their usage with MVAPICH2 on Amazon AWS using the Extreme-scale Scientific Software Stack (E4S) AWS image. More information: http://tau.uoregon.edu
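For readers unfamiliar with MPI_T, the sketch below enumerates the control variables that an MPI library such as MVAPICH2 exports through the standard MPI Tools interface; it is a minimal illustration of the interface TAU builds on, not TAU code.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, num_cvars;

        /* The MPI_T interface may be initialized independently of MPI_Init. */
        MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
        MPI_T_cvar_get_num(&num_cvars);
        printf("%d control variables exported\n", num_cvars);

        for (int i = 0; i < num_cvars; i++) {
            char name[256], desc[256];
            int name_len = sizeof(name), desc_len = sizeof(desc);
            int verbosity, bind, scope;
            MPI_Datatype dtype;
            MPI_T_enum enumtype;

            /* Query each variable's name and description. */
            MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &dtype,
                                &enumtype, desc, &desc_len, &bind, &scope);
            printf("cvar %d: %s\n", i, name);
        }

        MPI_T_finalize();
        return 0;
    }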


Bio

Sameer Shende

Dr. Sameer Shende has helped develop the TAU Performance System, the Program Database Toolkit (PDT), the Extreme-scale Scientific Software Stack (E4S) [https://e4s.io] and the HPCLinux distro. His research interests include tools and techniques for performance instrumentation, measurement, analysis, runtime systems, HPC container runtimes, and compiler optimizations. He serves as a Research Associate Professor and the Director of the Performance Research Laboratory at the University of Oregon, and as the President and Director of ParaTools, Inc., ParaTools, SAS, and ParaTools, Ltd.

4:30 - 5:30

Short Presentations

Wednesday, August 25

Abstract

TBA


Bio

Luiz DeRose

Dr. Luiz DeRose is a Director of Cloud Engineering for HPC at Oracle. Before joining Oracle, he was a Sr. Science Manager at AWS, and a Senior Principal Engineer and the Programming Environments Director at Cray. Dr. DeRose has a Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign. He has more than 25 years of high-performance computing experience and deep knowledge of programming and middleware environments for HPC. Dr. DeRose has eight patents and has published more than 50 peer-reviewed articles in scientific journals, conferences, and book chapters, primarily on the topics of compilers and tools for high-performance computing.

Abstract

KISTI's Nurion supercomputer features 8,305 nodes with Intel Xeon Phi KNL (Knights Landing) processors (68 cores) and 132 nodes with Intel Skylake CPUs (2-socket, 40 cores). Nurion consists of compute nodes, CPU-only nodes, Omni-Path interconnect networks, burst-buffer high-speed storage, and a Lustre-based parallel file system. In addition, KISTI's Neuron supercomputer features 78 nodes with NVIDIA GPUs to support GPU computing and the design of KISTI's next supercomputer. We will present microbenchmark and application performance results using MVAPICH2 on Nurion and Neuron.


Bio

Minsik Kim

Minsik Kim is a senior researcher in the Supercomputing Infrastructure Center of the Korea Institute of Science and Technology Information (KISTI). He received his Ph.D. degree in Electrical and Electronic Engineering from Yonsei University in 2019. His research interests include deep learning optimization on GPUs, computer architecture, and high-performance computing. He is a member of IEEE. More details about Dr. Kim are available at minsik-kim.github.io.

Abstract

The progress engine in an MPI library recognizes changes in communication state, such as message arrivals, by polling. Although polling provides low communication latency, it results in low energy efficiency because the progress engine occupies CPU resources while polling. The decrease in energy efficiency induced by polling has become more severe as skew has increased with the advent of exascale systems. In this talk, we describe a progress engine that uses both polling and signaling to perform energy-efficient intra-node communication. There have been studies on energy-efficient MPI; however, existing studies do not significantly consider the intra-node communication channels that use shared-memory buffers. We show that our preliminary implementation of a signaling-based progress engine in MVAPICH2 improves energy efficiency as skew increases on a many-core system.
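As a rough illustration of the polling-versus-signaling trade-off described above, the hedged sketch below (plain POSIX threads with hypothetical names; the actual MVAPICH2 implementation differs) has the receiver block on a condition variable until the sender signals arrival, rather than spinning on a shared flag.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool msg_arrived = false;

    /* Signaling: the receiver sleeps until woken, trading a small
     * wake-up latency for idle CPU time (and energy) under skew. */
    static void wait_signaling(void)
    {
        pthread_mutex_lock(&lock);
        while (!msg_arrived)
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }

    /* Sender side: after writing into the shared-memory buffer,
     * set the flag and wake the blocked receiver. */
    static void *sender(void *arg)
    {
        (void)arg;
        sleep(1);                  /* model skew: the sender arrives late */
        pthread_mutex_lock(&lock);
        msg_arrived = true;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, sender, NULL);

        /* A pure polling engine would spin here instead:
         *     while (!msg_arrived) ;   // lowest latency, highest energy
         */
        wait_signaling();
        printf("message arrived\n");

        pthread_join(t, NULL);
        return 0;
    }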


Bio

Hyun-Wook Jin

Hyun-Wook Jin is a Professor in the Department of Computer Science and Engineering at Konkuk University, Seoul, Korea. He leads the System Software Research Laboratory (SSLab) at Konkuk University. Before joining Konkuk University in 2006, he was a Research Associate in the Department of Computer Science and Engineering at The Ohio State University. He received his Ph.D. degree from Korea University in 2003. His main research focus is on operating systems for high-end computing systems and cyber-physical systems.

Abstract

TBA


Bio

Mahidhar Tatineni

Mahidhar Tatineni received his M.S. and Ph.D. in Aerospace Engineering from UCLA. He currently leads the User Services group at SDSC. He has led the deployment and support of high-performance computing and data applications software on several NSF and UC resources, including Comet and Gordon at SDSC. He has worked on many NSF-funded optimization and parallelization research projects, such as petascale computing for magnetosphere simulations, MPI performance tuning frameworks, hybrid programming models, topology-aware communication and scheduling, big data middleware, and application performance evaluation using next-generation communication mechanisms for emerging HPC systems. He is co-PI on the Comet and Expanse HPC systems projects at SDSC.

1:00 - 1:30

Break

Abstract

TBA


Bio

Alan Sussman

Alan Sussman is currently a program director in the Office of Advanced Cyberinfrastructure at NSF, in charge of learning and workforce development programs, and is also active in software- and data-related cyberinfrastructure programs. He is on leave from his permanent position as a Computer Science professor at the University of Maryland. His research interests have focused on systems software support for large-scale applications that require high-performance parallel and distributed computing. In addition, since 2010 he has been helping coordinate a curriculum initiative in parallel and distributed computing with the premise that every undergraduate student in Computer Science or Computer Engineering must acquire basic parallel computing skills. This curriculum has seen wide adoption, including its direct impact on ACM's CS2013 Curriculum.

Abstract

TBA


Bio

Donglai Dai

Donglai Dai is a Chief Engineer at X-ScaleSolutions and leads the company's R&D team. His current work focuses on developing scalable, efficient communication libraries and performance analysis tools for distributed and parallel HPC and deep learning applications on HPC systems. He has more than 20 years of industry experience in engineering management and the development of computer systems, VLSI, IoT, and interconnection networks, gained while working at Intel, Cray, SGI, and startups. He holds more than 10 granted US patents and has published more than 30 technical papers or book chapters. He has a Ph.D. degree in computer science from The Ohio State University.

Abstract

TBA


Bio

Karen Tomko

Karen Tomko is the Director of Research Software Applications and serves as manager of the Scientific Applications group at the Ohio Supercomputer Center where she oversees deployment of software for data analytics, modeling and simulation. Her research interests are in the field of parallelization and performance improvement for High Performance Computing applications. She has been with OSC since 2007 and has been collaborating with DK Panda and the MVAPICH team for about 10 years.

4:00 - 5:00

Short Presentations