B.Schaeli, R.D. Hersch
Proceedings of the 6th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging (PADTAD '08), article 2, ACM, 2008, pp. 1-10
In message-passing parallel applications, messages are not delivered in a strict order. The number of messages, their content and their destination may depend on the ordering of their delivery. Nevertheless, for most applications, the computation results should be the same for all possible orderings. Finding an ordering that produces a different outcome or that prevents the execution from terminating reveals a message race or a deadlock. Starting from the initial application state, we dynamically build an acyclic message-passing state graph such that each path within the graph represents one possible message ordering. All paths lead to the same final state if no deadlock or message race exists. If multiple final states are reached, we reveal message orderings that produce the different outcomes. The corresponding executions may then be replayed for debugging purposes. We reduce the number of states to be explored by using previously acquired knowledge about communication patterns and about how operations read and modify local process variables. We also describe a heuristic that tests a subset of orderings that are likely to reveal existing message races or deadlocks. We applied our approach on several applications developed using the Dynamic Parallel Schedules (DPS) parallelization framework. Compared to the naive execution of all message orderings, the use of a message-passing state graph reduces the cost of testing all orderings by several orders of magnitude. The use of prior information further reduces the number of visited states by a factor of up to fifty in our tests. The heuristic relying on a subset of orderings was able to reveal race conditions in all tested cases. We finally present a first step in generalizing the approach to MPI applications.
Download the full paper: PDF 209.19 KB