CCF:SHF:EAGER:Collaborative Research: Asynchronous Algorithms for Exascale Computing Systems
- View All
Future computing systems will allow computations to be performed simultaneously with massive parallelization. However, current practices in scientific simulations cannot utilize the maximum potential of these machines. This is fundamentally due to the need for data synchronization across computing cores, which may cause up to 80% idling of the machines, and there is a need for new methods to overcome this inefficiency. This research is to develop a new framework for high performance computing where synchronization across the processing elements is relaxed. This will eliminate the overhead associated with extreme parallelism and potentially lay the foundation for simulations at scale. The framework is based on asynchronous model of computation for high performance computing to better utilize future systems. The price to pay for asynchrony is poor predictability of the code, resulting in uncertainty in the calculations. The central theme of this project is to accurately quantify this induced uncertainty and develop techniques to mitigate it. Specific research thrusts include: i) study of numerical stability, consistency and accuracy of widely used numerical algorithms, in the context of fluid-flow, under asynchronous conditions; ii) development of new schemes which can maintain its accuracy under asynchronous conditions; iii) determination of efficient implementations of the resulting algorithms on current and future systems. These research goals are addressed in a dynamical systems framework. The behavior of asynchronous numerical algorithms is modeled as Markov jump systems. Issues related to consistency, numerical stability and error control is addressed by using tools for analysis and design of Markov jump systems.