Crash-Quiescent Failure Detection Conference Paper uri icon

abstract

  • A distributed algorithm is crash quiescent if it eventually stops sending messages to crashed processes. An algorithm can be made crash quiescent by providing it with either a crash notification service or a reliable communication service. Both services can be implemented in practical environments with failure detectors. Therefore, crash-quiescent failure detection is fundamental to system-wide crash quiescence. We establish necessary and sufficient conditions for crash-quiescent failure detection in partially synchronous environments where a bounded, but unknown, number of consecutive messages can be arbitrarily late or lost. Without a correct majority of processes, not even the weakest oracle for fault-tolerant consensus, , can be implemented crash quiescently. With a correct majority, however, the eventually perfect failure detector, , is possible. Our algorithm is correct in all runs, but improves performance via crash quiescence in any run with a correct majority. We also present a refinement of our algorithm to mitigate the overhead of achieving crash quiescence; the resulting bit complexity per utilized link is asymptotically better than or equal to that of non-crash-quiescent counterparts. 2009 Springer Berlin Heidelberg.

published proceedings

  • DISTRIBUTED COMPUTING, PROCEEDINGS

author list (cited authors)

  • Sastry, S., Pike, S. M., & Welch, J. L.

citation count

  • 3

complete list of authors

  • Sastry, Srikanth||Pike, Scott M||Welch, Jennifer L

publication date

  • December 2009