Crash-Quiescent Failure Detection
Conference Paper
Overview
Research
Identity
Additional Document Info
Other
View All
Overview
abstract
A distributed algorithm is crash quiescent if it eventually stops sending messages to crashed processes. An algorithm can be made crash quiescent by providing it with either a crash notification service or a reliable communication service. Both services can be implemented in practical environments with failure detectors. Therefore, crash-quiescent failure detection is fundamental to system-wide crash quiescence. We establish necessary and sufficient conditions for crash-quiescent failure detection in partially synchronous environments where a bounded, but unknown, number of consecutive messages can be arbitrarily late or lost. Without a correct majority of processes, not even the weakest oracle for fault-tolerant consensus, , can be implemented crash quiescently. With a correct majority, however, the eventually perfect failure detector, , is possible. Our algorithm is correct in all runs, but improves performance via crash quiescence in any run with a correct majority. We also present a refinement of our algorithm to mitigate the overhead of achieving crash quiescence; the resulting bit complexity per utilized link is asymptotically better than or equal to that of non-crash-quiescent counterparts. 2009 Springer Berlin Heidelberg.