Wait-free clock synchronization Conference Paper

abstract

  • Multiprocessor computer systems with many processors are becoming increasingly important as vehicles for solving computationally expensive problems. Synchronization among the processors is achieved with a variety of clock configurations. A new notion of fault-tolerance for clock synchronization algorithms is defined, tailored to the requirements and failure patterns of multiprocessors. Algorithms in this class can tolerate any number of processors that fail by ceasing operation for an arbitrary time interval and then resuming operation, with or without recognizing that a fault has occurred. These algorithms guarantee that, for some fixed k, once a processor P has been working correctly for at least k time units, then as long as it continues to work correctly, (1) P does not adjust its clock, and (2) P's clock agrees with the clock of every other processor that has also been working correctly for at least k time units. Because a working processor must synchronize in a fixed amount of time regardless of the actions of the other processors, these algorithms are called wait-free. Four wait-free clock synchronization algorithms are presented for various system settings. Two of them are both wait-free and self-stabilizing. An algorithm is self-stabilizing if it is resilient to any number and any type of faults in its history in the following sense: starting from an arbitrary state of the system, a self-stabilizing algorithm eventually reaches a point after which it correctly performs its task. The existence of an algorithm that can tolerate any number of faulty processors and work correctly when started in an arbitrary system state is somewhat surprising.
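
    The two guarantees stated in the abstract can be illustrated with a small checker. This is only a sketch under assumptions that are not taken from the paper: time is modeled as discrete steps, a correctly working clock advances by exactly one tick per step, and a run is given in a hypothetical trace format where trace[p][t] is processor p's clock reading at step t, or None while p has ceased operation. The names working_streak and satisfies_wait_free_sync are illustrative. The code checks a recorded run against conditions (1) and (2); it is not one of the four algorithms the paper presents.

```python
from typing import List, Optional

# Hypothetical trace format (an assumption, not from the paper):
# trace[p][t] is processor p's clock at step t, or None while p is stopped.
Trace = List[List[Optional[int]]]


def working_streak(readings: List[Optional[int]], t: int) -> int:
    """Count the consecutive steps ending at step t during which the processor ran."""
    streak = 0
    while t - streak >= 0 and readings[t - streak] is not None:
        streak += 1
    return streak


def satisfies_wait_free_sync(trace: Trace, k: int) -> bool:
    """Check conditions (1) and (2) of the abstract on a finite discrete trace."""
    if k < 1:
        raise ValueError("k must be at least 1")
    steps = len(trace[0])
    for t in range(steps):
        # Processors that have been working for at least k consecutive steps.
        stable = [p for p in trace if working_streak(p, t) >= k]
        # Condition (2): all such processors show the same clock value.
        if len({p[t] for p in stable}) > 1:
            return False
        # Condition (1): a processor that was already stable at step t - 1
        # must not adjust its clock, i.e. it advances by exactly one tick.
        for p in stable:
            if t > 0 and working_streak(p, t - 1) >= k and p[t] != p[t - 1] + 1:
                return False
    return True


# The second processor starts at step 1 with an arbitrary clock value (7)
# and has converged by the time it has been working for k = 2 steps.
good = [[0, 1, 2, 3, 4, 5], [None, 7, 2, 3, 4, 5]]
# Here the second processor never converges.
bad_agreement = [[0, 1, 2, 3, 4, 5], [None, 7, 8, 9, 10, 11]]
# Here both processors agree but jump together after becoming stable.
bad_adjustment = [[0, 1, 2, 5, 6], [0, 1, 2, 5, 6]]
```

    With k = 2, satisfies_wait_free_sync accepts the first trace and rejects the other two. A processor that stops and later resumes simply restarts its streak, which mirrors the abstract's point that the bound k applies regardless of what the other processors do.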

published proceedings

  • Proceedings of the Annual ACM Symposium on Principles of Distributed Computing

author list (cited authors)

  • Dolev, S., & Welch, J. L.

complete list of authors

  • Dolev, S; Welch, JL

publication date

  • December 1993