Abstract
Complex software-intensive applications can be built on cluster systems composed of commercially available systems. To improve the availability of (n,k)-way cluster systems, we develop a self-configuring algorithm that not only determines the number of primary and backup nodes needed to meet the availability requirement and the waiting-time deadline, but also uses software rejuvenation to deal with dormant software faults. The availability model of (n,k)-way cluster systems with software rejuvenation represents the fault-tolerance and switchover states as a semi-Markov process. Steady-state probabilities and availability are calculated from the operating parameters and are then used by the self-configuring algorithm.
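The steady-state calculation mentioned in the abstract can be illustrated with a minimal sketch. It is not the model from the paper: the four states (healthy, failure-prone, rejuvenating, failed), the transition probabilities, and the mean sojourn times are all hypothetical values chosen for illustration. It uses the standard semi-Markov result that the steady-state probability of state i is proportional to the stationary probability of the embedded chain times the mean sojourn time in that state.

```python
# Illustrative semi-Markov availability calculation (all numbers are assumptions).

def stationary(P, iters=500):
    """Stationary vector of a row-stochastic matrix P via lazy power iteration."""
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(iters):
        step = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Averaging with the previous vector applies (P + I) / 2, which has the
        # same stationary vector as P but also converges for periodic chains.
        v = [0.5 * (a + b) for a, b in zip(v, step)]
    return v

def semi_markov_steady_state(P, sojourn):
    """pi_i = nu_i * h_i / sum_j(nu_j * h_j), with nu from the embedded chain."""
    nu = stationary(P)
    weights = [nu_i * h_i for nu_i, h_i in zip(nu, sojourn)]
    total = sum(weights)
    return [w / total for w in weights]

def availability(pi, up_states):
    """Long-run fraction of time spent in states that deliver service."""
    return sum(pi[s] for s in up_states)

# Hypothetical states: 0 healthy, 1 failure-prone, 2 rejuvenating, 3 failed.
p_rejuvenate = 0.8  # assumed chance that rejuvenation starts before a failure
P = [
    [0.0, 1.0, 0.0, 0.0],                          # healthy -> failure-prone
    [0.0, 0.0, p_rejuvenate, 1.0 - p_rejuvenate],  # failure-prone -> rejuvenating or failed
    [1.0, 0.0, 0.0, 0.0],                          # rejuvenating -> healthy
    [1.0, 0.0, 0.0, 0.0],                          # failed -> healthy (after repair)
]
sojourn_hours = [240.0, 720.0, 0.2, 0.5]  # assumed mean time spent in each state

pi = semi_markov_steady_state(P, sojourn_hours)
A = availability(pi, up_states=[0, 1])
print("steady-state probabilities:", pi)
print("availability:", A)
```

A self-configuring procedure in the spirit of the abstract could recompute such an availability figure for each candidate node configuration and keep the smallest one that meets the target; that selection loop is omitted here because its details are not given in the abstract.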
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Choi, C., Kim, S. (2003). Self-configuring Algorithm for Software Fault Tolerance in (n,k)-way Cluster Systems. In: Kumar, V., Gavrilova, M.L., Tan, C.J.K., L’Ecuyer, P. (eds) Computational Science and Its Applications — ICCSA 2003. ICCSA 2003. Lecture Notes in Computer Science, vol 2667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44839-X_78
DOI: https://doi.org/10.1007/3-540-44839-X_78
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40155-1
Online ISBN: 978-3-540-44839-6
eBook Packages: Springer Book Archive