Abstract
This paper proposes a new methodology for cluster analysis based on a genetic algorithm (GA). First, the GA population is initialized by the k-means algorithm to obtain the best cluster centers. Second, the GA operators are applied; a new mutation operator, which depends on the extreme points of the cluster groups, is proposed to overcome the limitations of the k-means algorithm. Finally, the proposed approach is applied to a collection of data sets consisting of non-overlapping data and large, high-dimensional data sets from the UCI Machine Learning Repository. In addition, an electrical application is used to assess the ability of the approach to solve real-world problems. The results show the superiority of the new methodology.
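The pipeline outlined in the abstract can be illustrated with a small sketch. The abstract does not give the operator definitions, so everything below is an assumption made for illustration only: the function names, the sum-of-squared-errors fitness, truncation selection, uniform crossover, and in particular the form of the mutation, which here nudges one randomly chosen center toward the farthest ("extreme") member of its own cluster. It should not be read as the authors' exact method.

```python
import numpy as np

def assign(data, centers):
    # Index of the nearest center for every data point.
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def sse(data, centers):
    # Fitness (assumed): sum of squared distances to the nearest center.
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return float((d.min(axis=1) ** 2).sum())

def kmeans(data, k, rng, iters=10):
    # Plain Lloyd iterations from randomly chosen data points.
    centers = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(data, centers)
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def extreme_point_mutation(data, centers, rng, step=0.5):
    # Assumed operator: move one random center part of the way toward
    # the farthest point currently assigned to it.
    child = centers.copy()
    labels = assign(data, child)
    j = rng.integers(len(child))
    members = data[labels == j]
    if len(members):
        far = members[np.linalg.norm(members - child[j], axis=1).argmax()]
        child[j] += step * rng.random() * (far - child[j])
    return child

def crossover(a, b, rng):
    # Uniform crossover (assumed): each center is copied whole from one parent.
    mask = rng.random(len(a)) < 0.5
    return np.where(mask[:, None], a, b)

def ga_cluster(data, k, pop_size=10, generations=30, seed=0):
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: every individual is a set of k centers produced by k-means.
    pop = [kmeans(data, k, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: selection, crossover, mutation. The better half survives.
        pop.sort(key=lambda c: sse(data, c))
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            child = crossover(parents[i], parents[j], rng)
            children.append(extreme_point_mutation(data, child, rng))
        pop = parents + children
    return min(pop, key=lambda c: sse(data, c))
```

Calling `ga_cluster(data, k=2)` on an array of shape `(n_points, n_features)` returns a `(2, n_features)` array of centers. Because the better half of the population is carried over unchanged, the best fitness in this sketch never gets worse from one generation to the next.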

























Acknowledgements
The authors are grateful to the anonymous reviewers for their valuable comments and helpful suggestions which greatly improved the paper’s quality.
Cite this article
El-Shorbagy, M.A., Ayoub, A.Y., Mousa, A.A. et al. An enhanced genetic algorithm with new mutation for cluster analysis. Comput Stat 34, 1355–1392 (2019). https://doi.org/10.1007/s00180-019-00871-5
DOI: https://doi.org/10.1007/s00180-019-00871-5