Detecting MPI Usage Anomalies via Partial Program Symbolic Execution

Abstract

MPI is a message passing based programming model for distributed-memory parallelism that is widely used for programming supercomputers. However, debugging and verification of MPI programs is generally recognized to be a deep technical challenge. This challenge is further exacerbated by a recent increase in the use of nonblocking MPI operations for improved scalability, since incorrect use of these operations can introduce new classes of bugs related to data races.

In this paper, we introduce a new debugging approach based on partial symbolic execution to identify anomalies in MPI usage. Our approach avoids the false positives inherent in static analysis, while still scaling to large programs. Further, our approach can be applied to incomplete programs and explore multiple execution paths, thereby leading to increased coverage compared with dynamic approaches. An experimental comparison with state-of-the-art tools for debugging MPI applications show that our approach delivers demonstrably better precision than a state-of-the art static tool (MPI-Checker) and a state-of-the art dynamic tool (MUST), without incurring excessive overheads.

Fangke Ye
Fangke Ye
Research Scientist