AI-Driven Customer Segmentation
Abstract:- Customer categorization is an essential strategy for companies seeking to get the most out of their advertising spend. Businesses can substantially boost client engagement and sales conversion rates by identifying specific customer segments, tailoring products or services to their preferences, minimizing the hassle of irrelevant advertisements, and increasing customer satisfaction, resulting in improved long-term interactions with clients. This paper presents a classification model in which Keras and support vector machine classifiers are stacked and their outputs passed on to a meta-learner to predict the customer segment; RFM analysis is performed to identify the customer segments. This focused strategy lowers marketing costs and boosts income, increasing the business's efficiency. Temporal mining helps us predict a customer's next purchase using a time series model.

Keywords:- Customer Classification, RFM Analysis, Keras and Support Vector Machine.

I. INTRODUCTION
Customer classification is a vital approach for companies looking to maximize their marketing efforts. Companies can significantly improve customer engagement and sales conversion rates by identifying specific customer segments and tailoring products or services to those customers. Using robust machine learning algorithms for customer classification can improve the accuracy and consistency of results, allowing businesses to better target their marketing campaigns and promotions. This, in turn, reduces the annoyance of irrelevant advertisements while increasing customer satisfaction, resulting in more fruitful long-term customer relationships. Customer classification can be especially useful for insurance companies because it allows them to identify high-value customers who are more likely to purchase insurance products. This targeted approach not only boosts revenue but also reduces marketing costs, making the company more cost-effective.

Understanding consumer behavior is essential for developing individualized marketing campaigns and successful marketing strategies in today's fast-paced business environment. The objective of our project is to analyze customers' historical transactions to create an advanced customer classification system.

We intend to categorize customers into distinct segments based on their historical behavior using the Recency, Frequency, Monetary (RFM) model. This model enables us to delve deeper into customer interactions by taking into account the most recent purchases, the frequency of transactions, and the monetary value spent. We hope to gain a comprehensive understanding of customer preferences and habits by using this nuanced approach.

We incorporate advanced machine learning classifiers into our system to improve the precision of customer segmentation. To predict customer labels within these identified segments, we use Keras and Support Vector Machines (SVM). The combination of these classifiers results in more robust and accurate customer categorization.

The predictive outcomes of both the Keras and SVM classifiers are stacked and used as input for a meta-learner, such as logistic regression. This final refinement step aims to optimize the customer classification, ensuring a nuanced and reliable representation of various customer groups.

The ultimate goal of this project is to provide businesses with a more refined understanding of how consumers interact with them. These insights will not only streamline marketing efforts but will also foster a more profound connection with customers, contributing to the company's overall success and growth.

II. LITERATURE SURVEY
[1] The research modes for customer purchasing characteristics are divided into three distinct groups in this paper: experience-driven mode, theory-driven mode, and data-driven mode. A customer consumption behavior analysis algorithm is proposed, as well as the concept of integrating customer consumption behavior factors such as satisfaction and loyalty. It is demonstrated through comparison that the data-driven model is best suited to assessing aspects of online customer purchasing patterns. To classify customers, a deep neural network structure algorithm is proposed. The study extracts various kinds of essential information, such as customer habits and consumption structure, from massive amounts of consumer behavior data in order to realize the notion of personalization.
[2] The aim of this research is to promote the use of intelligence services to discover prospective consumers by providing retail corporations with timely and pertinent data. The information used is the result of a thorough investigation and serves a scientific purpose in assessing customer transaction history and purchasing habits. The research applies clustering via the K-Means algorithm based on the RFM (Recency, Frequency, and Monetary) model. The Silhouette Coefficient is calculated and used to validate the resulting clusters. The results of the sales transactions are compared against factors such as Purchase Recency, Purchase Frequency, and Purchase Volume.

[3] The purpose of this study is to examine the predictive capability of Keras deep learning models with three robust optimization algorithms (stochastic gradient descent, root mean square propagation, and adaptive moment estimation) and two loss functions for spatial modeling of shallow landslide susceptibility. As a case study, shallow landslides in the HaLong region of Vietnam were selected. To this end, a group of ten influencing factors (slope, aspect, curvature, topographic wetness index, land use, distance to road, distance to river, soil type, distance to fault, and lithology) and 193 landslide polygons were considered in order to establish a Geographic Information System (GIS) database for the research. The evaluation concludes that the Keras deep neural network model is a new tool for shallow landslide susceptibility mapping in landslide-prone areas.

[4] The proposed research employs a novel approach that includes data collection, preprocessing, feature encoding, and classification to tackle sentiment analysis challenges using three long short-term memory (LSTM) variants. When analyzing such information, it is crucial to choose suitable data collection, preprocessing, and classification steps. In the experiments, various text datasets were employed to assess the significance of the proposed models. The proposed sentiment prediction technique achieves better, or at least similar, results while requiring fewer computations. The findings of this research demonstrate the vital role of sentiment analysis of customer feedback and social media data in obtaining significant insights. Deep learning is used for assessing customer reviews.

[5] The article delves into customer segmentation, which is regarded as one of the cornerstones of an effective marketing effort. Marketing professionals place a high value on this pivotal stage in the process of promoting new products. The RFM framework is extended in this paper by including diversity "D" as a fourth parameter, which refers to the variety of products bought by an individual consumer. In a retail setting, segmentation on the basis of RFM-D is employed to identify consumer buying trends. The proposed model improves the accuracy of predicting customer behavior, so that businesses can forecast which customers will respond positively to the offers addressed to them.

[6] This study looks at e-commerce data analysis for customer segmentation using an ensemble technique with various machine learning algorithms. The combined approach, which incorporates random forest, k-Nearest Neighbors, and gradient boosting, achieves 76.83% accuracy in customer classification, demonstrating the potential for improving the strategies of online shopping companies. For customer segmentation in e-commerce, the study's algorithm employs a combination approach including Decision Tree, Logistic Regression, Support Vector Classification (SVC), k-Nearest Neighbors, Random Forest, and Gradient Boosting Classifier. By applying machine learning algorithms, particularly ensemble models, the research aims to efficiently classify customers based on e-commerce data, providing valuable insights for stakeholders to enhance business strategies and decision-making processes.

[7] This research investigates the purchase prediction problem in online tourism, a relatively recent and widespread online shopping application. Although numerous studies on purchase prediction have already been carried out, little work has been done on customer buying habits for travel-related goods. A statistical analysis is then performed on several notable features related to buying habits in order to verify the effectiveness of these factors. The authors present co-EM logistic regression, a framework that incorporates semi-supervised learning and multi-view learning to predict whether or not a purchase will be made during the current browsing session. The framework retains the interpretability of logistic regression while making full use of unlabeled data.

[8] The authors use a graph-based representation to propose a new approach for customer segmentation in the financial and banking industries. The spending habits of consumers are expressed as distribution vectors, referred to as purchase characteristics, that capture how consumers allocate their money across merchant categories and payment methods. Consumers are then assessed and segmented according to their spending allocations, using large-scale debit and credit card transaction datasets. According to the analysis, graph-based methods, especially those that use random walk-based techniques, provide more reliable and insightful results for segmenting clients, with potential uses in banking risk evaluation and electronic payment supervision.

[9] The article addresses the difficulties of employing ethical advertising techniques to attract prospective clients while retaining existing ones. Companies use tools to cater to
both kinds of customers, resulting in a greater return on investment as well as increased revenue. The author then clarifies the concept of "consumer grouping," which is used by companies to classify various segments of consumers and offer them different services. The research investigates four categories of customers, namely engaged, cold, warm, and passive, in greater depth. It finds that such segments are insufficient for describing advertising approaches and require additional research. The article applies RFM analysis to enrich the data and then clusters the results derived from this analysis. This examination generates the necessary rules.

[10] The article presents forecasting of future consumer behavior, which offers vital data for effectively allocating resources to the advertising and marketing divisions. Such data aids in arranging supplies in the storage facility and at the point of sale, and also in managerial decisions made during production. The article discusses the development of advanced analytics tools that forecast future customer behavior in a non-contractual setting and predict whether or not a customer will make a purchase from the company within a particular future period. Using a dataset that includes over 10,000 customers and a total of 200,000 purchases, the gradient tree boosting approach proves to be the most effective technique, achieving a precision of 89 percent and an area under the curve (AUC) of 0.

III. METHODOLOGY

A. Data Collection:
Initially, we collect a dataset of past product purchases without class labels. We then apply RFM analysis, creating six feature columns: the recency, frequency, and monetary values together with their respective scores. From these feature columns we derive customer_segments, the class label for each customer ID, and store the result in a new CSV file.

D. Model Stacking and Meta Learner:
This phase performs stacking on a validation set by combining the predictions of two base models (a neural network model and an SVM classifier). The predictions are stacked horizontally to create a new feature matrix (stacked_X_val). A logistic regression meta-model (meta_model) is then trained on this matrix, and the final stacked model is used to make predictions on the test set.

E. Recommendation System:
Recommendations are based on content-based and collaborative filtering, which uses cosine similarity over the user-item matrix to measure product associations and their relevance for recommendations. This aims to provide personalized and accurate product suggestions, enhancing the overall user experience.

F. Streamlit and Data Analysis Report:
A Streamlit script (e.g., data_entry_app.py) provides input fields for new data entry, along with widgets for uploading data and displaying classification results. Power BI reports and dashboards offer insights into the customer data, with graphs, charts, and tables that visualize key metrics and trends.

G. Validation and Testing:
During the testing and validation phase of customer classification, the model's performance is assessed on unseen data, and its quality is measured on the validation set using suitable metrics such as precision, recall, accuracy, F1-score, and confusion matrices. Furthermore, thoroughly validating and testing the classification algorithm and its hyperparameters helps ensure its validity and suitability for real-world situations, resulting in better decisions based on customer segmentation.

IV. CONCLUSION
development of tailored strategies to enhance customer relationships, improve marketing efforts, and drive business growth.

REFERENCES

[1]. Shijiao Yuan, Miao Chao Chen. The Research Article of Analysis of Consumer Behavior Data Based on Deep Neural Network Model. In: 2022 Hindawi Journal of Function Spaces, Volume 2022, Article ID 4938278. https://doi.org/10.1155/2022/4938278
[2]. Anitha P., & Patil, M. M. (2022). RFM model for customer purchase behavior using K-Means algorithm. Journal of King Saud University-Computer and Information Sciences, 34(5), 1785-1792. https://doi.org/10.1016/j.jksuci.2019.12.011.
[3]. Viet-Ha Nhu, Nhat-Duc Hoang, Hieu Nguyen, Phuong Thao Thi Ngo, Tinh Thanh Bui, Pham Viet Hoa, Pijush Samui, Dieu Tien Bui. Effectiveness assessment of Keras based deep learning with different robust optimization algorithms for shallow landslide susceptibility. Volume 188, May 2020, 104458. https://doi.org/10.1016/j.catena.2020.104458.
[4]. Amjad Iqbal, Rashid Amin, Javed Iqbal, Roobaea Alroobaea, Ahmed Binmahfoudh, Mudassar Hussain. Sentiment Analysis of Consumer Reviews Using Deep Learning. In: 2022 Issue Artificial Intelligence and Digital Transformation. https://doi.org/10.3390/su141710844.
[5]. Moulay Youssef Smaili, Hanaa Hachimi. New RFM-D classification model for improving customer analysis and response prediction. In: 2023 Ain Shams Engineering Journal. https://doi.org/10.1016/j.asej.2023.102254.
[6]. Im K. and Park S. (2021). "A Study on Analyzing Characteristics of Target Customers from Refined Sales Data", APIEMS. In: 2016 Conference: 18th International Conference on Business Intelligence, Analytics, and Knowledge Management. Volume 9, No 4, 2016. https://www.researchgate.net/publication/287815433.
[7]. Christy, A. J., Umamakeswari, A., Priyatharsini, L., & Neyaa, A. (2021). RFM ranking–An effective approach to customer segmentation. Journal of King Saud University-Computer and Information Sciences, 33(10), 1251-1257. https://doi.org/10.1016/j.jksuci.2018.09.004.
[8]. Roggeveen, A. L., & Sethuraman, R. (2020). Customer-interfacing retail technologies in Im K. and Park S., (2021) "A Study on Analyzing Characteristics of Target Customers from Refined Sales Data". In: 2020 Journal of Retailing. https://doi.org/10.1016/j.jretai.2020.08.001.
[9]. Zhu, G., Wu, Z., Wang, Y., Cao, S., & Cao, J. (2019). Online purchase decisions for tourism e-commerce. In: 2019 Electronic Commerce Research and Applications, 38, 100887. https://doi.org/10.1016/j.elerap.2019.100887.
[10]. Martínez, A., Schmuck, C., Pereverzyev Jr, S., Pirker, C., & Haltmeier, M. (2020). A machine learning framework for customer purchase prediction in the non-contractual setting. European Journal of Operational Research, 281(3), 588-596. https://doi.org/10.1016/j.ejor.2018.04.034.
[11]. Jair Cervantes, Xiaoou Li, Wen Yu, Kang Li. Support vector machine classification for large data sets via minimum enclosing ball clustering. In: 2008 Neurocomputing, Volume 71, Issues 4–6, January 2008, Pages 611-619. https://doi.org/10.1016/j.neucom.2007.07.028.
[12]. Moghaddam S Q, Abdolvand N and Harandi S R 2017 A RFMV Model and Customer Segmentation Based on Variety of Products. J. Inf. Syst. Telecommun. 5 155–61. In: 2017, Information Systems, Telecommunication. https://www.researchgate.net/publication/321195899.
[13]. Haghighat Nia S, Abdolvand N and Rajaee Harandi S 2017 Evaluating discounts as a dimension of customer behavior analysis. J. Mark. Communication. In: 2017 Journal of Marketing Communications. https://doi.org/10.1080/13527266.2017.1410210.
[14]. Alessia Galdeman, Cheick Ba, Matteo Zignani, Sabrina Gaito. A Multilayer Network Perspective on Customer Segmentation Through Cashless Payment Data. In: 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA). https://doi.org/10.1109/DSAA53316.2021.9564187.
[15]. Tabianan, K., Velu, S., & Ravi, V. (2022). K-means clustering approach for intelligent customer segmentation using customer purchase behavior data. Sustainability, 14(12), 7243. https://www.mdpi.com/2071-1050/14/12/72.
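
As an illustration of the RFM labeling step in Section III-A (Data Collection), the following is a minimal sketch using only the Python standard library. The sample transactions, the snapshot date, the three-level rank scoring, and the segment thresholds are all assumptions made for this example; the paper does not state its actual scoring rules. Only the column name customer_segments comes from the text.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
TRANSACTIONS = [
    ("C1", date(2023, 12, 20), 120.0),
    ("C1", date(2023, 12, 28), 80.0),
    ("C1", date(2023, 11, 2), 200.0),
    ("C2", date(2023, 6, 15), 40.0),
    ("C3", date(2023, 10, 1), 60.0),
    ("C3", date(2023, 12, 1), 90.0),
    ("C4", date(2023, 1, 10), 15.0),
]
SNAPSHOT = date(2024, 1, 1)  # assumed reference date for measuring recency


def rfm_table(transactions, snapshot):
    """Aggregate raw transactions into recency, frequency, monetary values."""
    last, freq, money = {}, defaultdict(int), defaultdict(float)
    for cid, day, amount in transactions:
        last[cid] = max(last.get(cid, day), day)
        freq[cid] += 1
        money[cid] += amount
    return {cid: {"recency": (snapshot - last[cid]).days,
                  "frequency": freq[cid],
                  "monetary": money[cid]} for cid in last}


def rank_score(values, bins=3, reverse=False):
    """Score customers 1..bins by rank; reverse=True rewards low values."""
    order = sorted(values, key=values.get, reverse=reverse)
    return {cid: 1 + (pos * bins) // len(order) for pos, cid in enumerate(order)}


def label_customers(transactions, snapshot):
    """Build the six RFM feature columns plus a customer_segments label."""
    table = rfm_table(transactions, snapshot)
    r = rank_score({c: v["recency"] for c, v in table.items()}, reverse=True)
    f = rank_score({c: v["frequency"] for c, v in table.items()})
    m = rank_score({c: v["monetary"] for c, v in table.items()})
    rows = []
    for cid, v in table.items():
        total = r[cid] + f[cid] + m[cid]
        # Assumed thresholds, chosen only for illustration.
        segment = ("high_value" if total >= 8
                   else "mid_value" if total >= 5 else "low_value")
        rows.append({"customer_id": cid, **v, "r_score": r[cid],
                     "f_score": f[cid], "m_score": m[cid],
                     "customer_segments": segment})
    return rows


def to_csv(rows):
    """Serialize the labeled rows as CSV text (write it to a file as needed)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = label_customers(TRANSACTIONS, SNAPSHOT)
print(to_csv(rows))
```

Rank-based scoring is used here because it needs no fixed cut-off values; quantile binning of a larger dataset would serve the same purpose.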
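
The horizontal stacking described in Section III-D (Model Stacking and Meta Learner) can be sketched as follows. This is a simplified illustration under several assumptions: the base-model outputs are placeholder probabilities rather than real Keras or SVM predictions, the task is reduced to two classes, and the meta-learner is a small hand-written logistic regression rather than a library implementation. The names stacked_X_val and meta_model follow the text; everything else is hypothetical.

```python
import math

# Placeholder positive-class probabilities from the two base models on a
# validation set (assumed values, not real model outputs).
nn_val_proba = [0.9, 0.8, 0.3, 0.2, 0.6, 0.4]
svm_val_proba = [0.7, 0.9, 0.4, 0.1, 0.7, 0.2]
y_val = [1, 1, 0, 0, 1, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hstack(*columns):
    """Place one prediction column per base model side by side."""
    return [list(row) for row in zip(*columns)]


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Batch gradient descent logistic regression (stand-in meta-learner)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(model, X):
    """Return hard 0/1 labels from the meta-learner."""
    w, b = model
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5)
            for xi in X]


# Stack the base-model predictions horizontally into a new feature matrix.
stacked_X_val = hstack(nn_val_proba, svm_val_proba)
meta_model = fit_logistic(stacked_X_val, y_val)

# Base-model probabilities for two unseen test customers (also placeholders).
stacked_X_test = hstack([0.85, 0.15], [0.8, 0.2])
test_pred = predict(meta_model, stacked_X_test)
print(test_pred)
```

Training the meta-learner on validation-set predictions, rather than on the data the base models were fitted to, keeps it from simply learning the base models' training-set overconfidence.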
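
For the collaborative part of Section III-E (Recommendation System), cosine similarity between the product columns of a user-item matrix can be sketched as below. The product names, the purchase matrix, and the scoring rule (summing similarities to products the user already bought) are assumptions for illustration; the content-based part of the paper's system is not covered.

```python
import math

# Hypothetical user-item matrix: 1 means the user bought the product.
PRODUCTS = ["laptop", "mouse", "keyboard", "desk"]
USER_ITEM = {
    "u1": [1, 1, 1, 0],
    "u2": [1, 1, 0, 0],
    "u3": [0, 0, 1, 1],
    "u4": [0, 1, 1, 0],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def item_similarity(user_item, n_items):
    """Pairwise cosine similarity between product columns."""
    cols = [[row[j] for row in user_item.values()] for j in range(n_items)]
    return [[cosine(cols[i], cols[j]) for j in range(n_items)]
            for i in range(n_items)]


def recommend(user, user_item, products, top_n=2):
    """Rank products the user has not bought by similarity to those they have."""
    sim = item_similarity(user_item, len(products))
    owned = user_item[user]
    scores = {}
    for j, name in enumerate(products):
        if owned[j]:
            continue  # skip products the user already bought
        scores[name] = sum(sim[j][i] * owned[i] for i in range(len(products)))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("u2", USER_ITEM, PRODUCTS, top_n=1))
print(recommend("u3", USER_ITEM, PRODUCTS))
```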
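
The metrics named in Section III-G (Validation and Testing) can be computed from a confusion matrix as in the sketch below. The segment labels and the true and predicted values are made-up examples, not results from the paper.

```python
from collections import Counter

# Hypothetical validation labels and model predictions.
LABELS = ["high", "mid", "low"]
y_true = ["high", "high", "mid", "mid", "low", "low", "low", "mid"]
y_pred = ["high", "mid", "mid", "mid", "low", "low", "mid", "mid"]


def confusion_matrix(true, pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true, pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_metrics(true, pred, labels):
    """Precision, recall, and F1-score for each class."""
    cm = confusion_matrix(true, pred, labels)
    report = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        predicted = sum(row[k] for row in cm)  # column sum: predicted as label
        actual = sum(cm[k])                    # row sum: truly label
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(true, pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


cm = confusion_matrix(y_true, y_pred, LABELS)
report = per_class_metrics(y_true, y_pred, LABELS)
print(cm)
print(accuracy(y_true, y_pred))
print(report)
```

Reporting per-class precision and recall alongside accuracy matters for segmentation, because a small but valuable segment can be misclassified often while overall accuracy stays high.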