Self-configuring Algorithm for Software Fault Tolerance in (n,k)-way Cluster Systems

Changyeol Choi &
Sungsoo Kim

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2667)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Complex software-intensive applications can be built on cluster systems assembled from commercially available components. To improve the availability of (n,k)-way cluster systems, we develop a self-configuring algorithm that not only determines the number of primary and backup nodes needed to meet the availability requirement and the waiting-time deadline, but also uses software rejuvenation to deal with dormant software faults. The availability model of (n,k)-way cluster systems with software rejuvenation represents the fault-tolerance and switchover states as a semi-Markov process. From the operating parameters, the steady-state probabilities and the availability are calculated, and these are used by the self-configuring algorithm.
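
The configuration step described in the abstract can be illustrated with a rough Python sketch. It is not the authors' algorithm: the semi-Markov availability model and software rejuvenation are not reproduced here. As stand-in assumptions, n is taken to be the number of primary nodes and k the number of backup nodes, nodes fail independently with a fixed per-node availability, the system counts as available when at least n of the n+k nodes are up, and waiting time is the mean queueing delay of an M/M/c queue served by the n primaries. All function and parameter names are hypothetical.

```python
from math import comb, factorial


def system_availability(n, k, node_avail):
    """Probability that at least n of n+k independent nodes are up (assumed model)."""
    total = n + k
    return sum(
        comb(total, up) * node_avail**up * (1 - node_avail) ** (total - up)
        for up in range(n, total + 1)
    )


def mean_wait_mmc(arrival_rate, service_rate, servers):
    """Mean queueing delay of an M/M/c queue (Erlang C); infinite if unstable."""
    offered = arrival_rate / service_rate
    if offered >= servers:
        return float("inf")
    rho = offered / servers
    tail = offered**servers / (factorial(servers) * (1 - rho))
    head = sum(offered**i / factorial(i) for i in range(servers))
    p_wait = tail / (head + tail)  # probability an arrival has to queue
    return p_wait / (servers * service_rate - arrival_rate)


def configure(arrival_rate, service_rate, node_avail,
              target_avail, deadline, max_nodes=64):
    """Pick the smallest n meeting the deadline, then the smallest k meeting availability."""
    for n in range(1, max_nodes + 1):
        if mean_wait_mmc(arrival_rate, service_rate, n) > deadline:
            continue  # too few primaries to meet the waiting-time deadline
        for k in range(max_nodes - n + 1):
            if system_availability(n, k, node_avail) >= target_avail:
                return n, k
        return None  # availability target unreachable within max_nodes
    return None  # deadline unreachable within max_nodes


print(configure(arrival_rate=3.0, service_rate=2.0, node_avail=0.99,
                target_avail=0.9999, deadline=0.5))
```

With these illustrative inputs the search settles on three primary nodes and two backup nodes. A faithful implementation would replace `system_availability` with the steady-state availability obtained from the paper's semi-Markov model, including the rejuvenation and switchover states.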

Author information

Authors and Affiliations

Graduate School of Information and Communication, Ajou University, Suwon, Korea
Changyeol Choi & Sungsoo Kim

Editor information

Editors and Affiliations

Army High Performance Computing Research Center, USA
Vipin Kumar
Department of Computer Science, University of Calgary, Calgary, AB, T2N1N4, Canada
Marina L. Gavrilova
Heuchera Technologies Inc., 122 9251-8 Yonge Street, Richmond Hill, ON, Canada, L4C 9T3
Chih Jeng Kenneth Tan
Département d’informatique et de recherche opérationnelle, Université de Montréal, Montréal, Québec, H3C 3J7, Canada
Pierre L’Ecuyer
Department of Computer Science and Engineering, University of Minnesota, MN, 55455, USA
Vipin Kumar
The Queen’s University of Belfast, School of Computer Science, Belfast BT7 1NN, Northern Ireland, UK
Chih Jeng Kenneth Tan

About this paper

Cite this paper

Choi, C., Kim, S. (2003). Self-configuring Algorithm for Software Fault Tolerance in (n,k)-way Cluster Systems. In: Kumar, V., Gavrilova, M.L., Tan, C.J.K., L’Ecuyer, P. (eds) Computational Science and Its Applications — ICCSA 2003. ICCSA 2003. Lecture Notes in Computer Science, vol 2667. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44839-X_78

DOI: https://doi.org/10.1007/3-540-44839-X_78
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40155-1
Online ISBN: 978-3-540-44839-6
eBook Packages: Springer Book Archive
