Abstract
Communication-Induced Checkpointing (CIC) protocols usually assume that any process can be checkpointed at any time. We propose an alternative approach that relaxes this constraint of always-checkpointable processes, without delaying message reception or altering the message ordering enforced by the communication layer or by the application. The protocol has been implemented in ProActive, an open-source Java middleware for asynchronous and distributed objects that implements the ASP (Asynchronous Sequential Processes) model.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baude, F., Caromel, D., Delbé, C., Henrio, L. (2005). A Hybrid Message Logging-CIC Protocol for Constrained Checkpointability. In: Cunha, J.C., Medeiros, P.D. (eds) Euro-Par 2005 Parallel Processing. Euro-Par 2005. Lecture Notes in Computer Science, vol 3648. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11549468_71
DOI: https://doi.org/10.1007/11549468_71
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28700-1
Online ISBN: 978-3-540-31925-2
eBook Packages: Computer Science, Computer Science (R0)