In this talk we describe several techniques used in parallel runtime systems Charm++ and AMPI that allow complex irregular and dynamic applications to be developed quickly and perform scalably on large parallel machines. One of our core techniques is based on the idea of processor virtualization — the programmer divides the computation into a large number of entities, which are mapped to the available processors by an intelligent runtime system. This separation of concerns frees the programmers from thinking about the number of processors when writing applications, while allowing an intelligent runtime system to optimize the application load balance and provide fault tolerance in a way that is application independent.
This talk will mainly focus on automatic dynamic load balancing for AMPI applications. We will describe techniques used in migrating MPI processes across processors for achieving global load balance. Various application independent load balancing strategies that calculate optimized work-to-processor mappings to evenly distribute workload on processors are discussed. The same techniques are also used in our fault tolerance schemes based on automatic checkpoint/restarting.