Research Article

Mitigating unbalanced and overlapped classes in credit card fraud data with enhanced stacking classifiers system

[version 1; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 21 Jan 2022
This article is included in the Research Synergy Foundation gateway.

Abstract

Background: Credit cards remain the preferred payment method for many people nowadays. If not handled carefully, credit card use can lead to severe consequences such as credit card fraud. Credit card fraud involves the illegal use of a credit card without the owner's knowledge. Credit card fraud was estimated to exceed $35.5 billion in losses globally in 2020, and results in direct or indirect financial loss to cardholders. Hence, a detection system capable of analysing and identifying fraudulent behaviour in credit card activities is highly desirable.
Credit card data are not easy to handle due to two inherent problems: (i) unbalanced class distributions and (ii) overlapping classes. General learning algorithms may not be able to address and handle these problems well.
Methods: This study addresses these problems using an Enhanced Stacking Classifiers System (ESCS) that comprises two sequential levels. The first level is a classifier that excels at detecting normal credit card transactions (the majority class), while the second level contains stacking classifiers that distinguish credit card frauds (the minority class). The ESCS improves fraud detection via the second level, which contains more sensitive classifiers that identify fraudulent transactions misclassified as normal by the first-level classifier. The meta-classifier then combines the decisions of the base classifiers to produce the final detections.
Results: We evaluated the ESCS using the benchmark credit card fraud dataset (CCFD), which exhibits both problems. The highest true positive rate (TPR) for detecting credit card frauds was 0.8841, which outperformed single classifiers, bagging, boosting, and other researchers' works.
Conclusions: This study shows that the ESCS, with an additional level added to the stacking classifiers, can improve fraud detection on credit card data.

Keywords

Enhanced Stacking Classifiers System, Unbalanced Class Distribution, Overlapping Classes, Credit Card Frauds

Introduction

Credit cards were first introduced in the USA in the early 20th century, and in Malaysia in the mid-1970s.1 Their usage has since increased, and they are now widely used in financial transactions around the world. This growth, however, has been accompanied by an increase in the number of fraudulent transactions made with these cards.

Credit card fraud can be defined as the unlawful use of any system or criminal activity involving a physical card or card information, without the cardholder’s knowledge.2 Based on the study by Ref. 3, credit card fraud detection relies on the automatic analysis of recorded transactions to detect fraudulent behaviour. When a credit card is used, transaction data consisting of several attributes (e.g. credit card identifier, transaction date, recipient, transaction amount) are stored in a service provider’s database.

According to the Nilson report,4 between 2015 and 2020, card fraud worldwide was expected to lead to a total loss of $183.29 billion. In 2020, global card fraud was estimated to exceed $35.54 billion. Credit card frauds have thus become a major issue in society.5

Numerous fraud detection studies have proposed approaches to overcome this issue. However, credit card data sets are not easy to handle, as they usually present two challenging characteristics: (i) unbalanced class distributions and (ii) overlapping classes. These characteristics make it difficult for general classification algorithms to learn from the data and detect credit card frauds.

Unbalanced class distributions

According to Refs. 6–8, an unbalanced class distribution occurs when some classes in a data set have a much greater number of samples than the other classes (Figure 1). Classes with more samples are called the majority class, while classes with few samples are called the minority class. In a credit card data set, legitimate transactions are the majority class, whereas frauds are the minority class. Fraudulent transactions happen infrequently compared to legitimate transactions, and the percentage of fraudulent transactions is typically low.


Figure 1. Unbalanced class distribution in a credit card data set.

Having only a few instances of one class means that general learning algorithms are often unable to generalise that class's behaviour. Consequently, the algorithms tend to misclassify a fraudulent transaction as a legitimate transaction.9

Furthermore, most general learning algorithms maximise their effectiveness based on classification accuracy, which is not a good metric for evaluating performance on unbalanced data sets. Learning algorithms usually assume an even distribution of samples between classes.10,11 As a result, general learning algorithms are overwhelmed by the majority class and hence perform poorly on the minority class.
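To illustrate why accuracy is misleading on such data, the short sketch below (an illustration only, not from the study; it uses scikit-learn's DummyClassifier on a synthetic data set with an assumed 0.5% fraud rate) shows that a model which labels every transaction as legitimate still scores 99.5% accuracy while detecting no frauds at all.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Illustrative unbalanced data: 10,000 transactions, only 50 (0.5%) fraudulent.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
y = np.zeros(10_000, dtype=int)
y[:50] = 1

# A trivial classifier that always predicts the majority class ('normal').
clf = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = clf.predict(X)

print("Accuracy:", accuracy_score(y, y_pred))                  # 0.995 - looks excellent
print("Fraud recall (TPR class 1):", recall_score(y, y_pred))  # 0.0   - no fraud detected
```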

Overlapping of classes

Overlapping classes in data sets occur when samples in a minority class overlap with samples in a majority class (Figure 2), as the samples share common regions in feature space. When overlapping occurs, it causes difficulties for general learning algorithms to identify the small class samples.12–14 Overlapping classes also occur when minority class samples are located near the decision boundary of a majority class. Thus, the decision boundaries of a minority class and a majority class may overlap.15–17 A decision boundary is a borderline that separates the regions of different classes in a data set. When the overlapping scenario is combined with the unbalanced class distribution problem, it gives rise to even more difficulties for general learning algorithms in classifying the samples.


Figure 2. Overlapping classes that occur in a credit card data set.
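Both characteristics can be reproduced in a toy setting. The sketch below (illustrative only, not part of the study) uses scikit-learn's make_classification with a 99:1 class weighting and a small class_sep so that the two classes are both unbalanced and overlapping in feature space.

```python
import numpy as np
from sklearn.datasets import make_classification

# Toy data: 1% minority class; low class separation forces the classes to overlap.
X, y = make_classification(n_samples=20_000, n_features=10, n_informative=5,
                           weights=[0.99, 0.01], class_sep=0.5, flip_y=0.01,
                           random_state=42)

print("Class counts (majority, minority):", np.bincount(y))  # roughly 19,800 vs 200
```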

Related works

Husejinović18 performed a study on credit card fraud detection using the single classifiers Naïve Bayes and C4.5, and the ensemble classifier bagging. Bagging combines a group of "weak learners" to form a "strong learner" and uses majority voting to identify the predicted class by selecting the class with the highest vote assigned by the base learners.19–21 The researcher conducted experiments on the credit card fraud dataset (CCFD) and investigated the performance of these classifiers through recall, precision, and precision-recall curve (PRC) area rates; PRC was chosen as the main indicator for the study. PRC measures the overall ability to distinguish between binary classes to predict whether a transaction is normal or fraudulent. A higher PRC indicates that the model performs better. We observed that the fraud detection rates were around 0.8 for both the single classifiers and bagging, leaving room for improvement.

Divakar and Chitharanjan22 also experimented with the CCFD to study the role of boosting classifiers. Boosting is a classification method in which each classifier tries to correct the previous classifiers by adding more weight to previously misclassified samples, and these weighted samples are then given more attention by the next classifiers.23 Three classifiers, namely AdaBoost, Gradient Boosting, and XGBoost, were selected. The researchers achieved fraud detection rates of 0.69 (AdaBoost), 0.72 (Gradient Boosting), and 0.83 (XGBoost), with model accuracies of 99.9%, 99.9%, and 100%, respectively. We can see that the classifiers performed only averagely based on the fraud detection rates. The researchers also used model accuracy as the metric for their performance evaluation. Accuracy is not a suitable performance metric for unbalanced datasets, as classifiers that maximise accuracy are biased towards the majority class.

Kalid et al.24 used a multiple classifiers system (MCS), which utilised a cascading decision combination strategy to detect frauds. The MCS was tested on the credit card fraud dataset. Using this technique, the output of the first classifier was the input of the subsequent classifier, so the samples were classified several times. The classifiers used were C4.5 (which is good at classifying the majority class) for the first level, and Naïve Bayes (for classifying the minority class) for the second level. The fraud detection rate achieved by the researchers was 0.872. This result is good, but there is still room for improvement.

Sailusha et al.25 also classified transactions in the CCFD using Random Forest and AdaBoost. The fraud detection rate achieved by Random Forest was 0.77 and by AdaBoost 0.64. The results were average.

As single classifiers and ensemble classifiers cannot perform well in detecting credit card frauds, we propose the enhanced stacking classifiers system (ESCS) to address the two main characteristics of credit card data mentioned above. ESCS is a multiple classifiers system that consists of two sequential levels. We integrated a single classifier on the first level with stacking classifiers on the second level. Wolpert first proposed stacking,26 a learning technique that combines multiple classifiers through a meta-classifier.27 The meta-classifier combines all base classifiers' decisions to produce a final detection. We evaluated the proposed ESCS using the credit card fraud dataset (CCFD), which exhibits unbalanced class distributions and overlapping classes, as mentioned earlier. We describe the details of the ESCS in the following sections.
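As a brief illustration of the stacking idea (not the full ESCS, which adds a preceding level as described in the Methods section), the sketch below builds a single-level stacking classifier with scikit-learn; the choice of random forest and multi-layer perceptron as base learners and logistic regression as the meta-classifier is ours, for illustration only.

```python
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

# Two base classifiers whose out-of-fold predictions are fed to a meta-classifier.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("mlp", MLPClassifier(max_iter=500, random_state=42)),
]

# The meta-classifier learns how to combine the base classifiers' decisions.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)

# Used like any other estimator: stack.fit(X_train, y_train); stack.predict(X_test)
```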

Ethical considerations

This work has been granted ethical approval (Approval Number: EA0532021) by the Research Ethics Committee (REC) of Multimedia University.

Methods

Credit card fraud dataset (CCFD)

In this study, we used a publicly available CCFD released by Ref. 9, which was collected and analysed during a research collaboration between Worldline and the Machine Learning Group of Université Libre de Bruxelles (ULB) on big data mining and fraud detection. The dataset comprises 31 numerical variables, as shown in Table 1. Variables V1 to V28 are the result of a principal component analysis (PCA) transformation. The original variables and more background information cannot be provided due to confidentiality concerns. The only variables which have not been transformed using PCA are 'Time' and 'Amount'. 'Time' refers to the time elapsed, in seconds, between each transaction and the first transaction in the dataset. 'Amount' is the transaction amount. 'Class' is the target variable, and it indicates whether the case is a fraud, marked as '1', or normal, marked as '0'.

Table 1. Attributes of the credit card fraud dataset (CCFD).

Variables                               Descriptions
V1 to V28 (numerical input variables)   Not being disclosed as they contain sensitive data
Time                                    Time elapsed between each transaction and the first transaction in the dataset, in seconds
Amount                                  Transaction amount
Class                                   Target variable; value '1' refers to a fraud, value '0' to a normal transaction

The CCFD contains credit card transactions made by European credit cardholders over two days in September 2013. It is highly unbalanced: out of 284,807 transactions, 492 were frauds, and the remaining 284,315 were labelled as legitimate transactions. Figures 3 and 4 depict the unbalanced class distributions and overlapped classes of the dataset, which are the main issues to be tackled in this study.
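The imbalance is easy to confirm once the data are loaded; a minimal pandas sketch, assuming the extracted file name CCFD_dataset.xlsx from the data availability section (the original Kaggle CSV works the same way):

```python
import pandas as pd

# File name taken from the underlying-data listing; adjust the path if needed.
ccfd = pd.read_excel("CCFD_dataset.xlsx")

counts = ccfd["Class"].value_counts()
print(counts)  # expected for the full Kaggle data: 284,315 normal (0) vs 492 fraud (1)
print("Fraud ratio: %.4f%%" % (100 * counts[1] / len(ccfd)))  # roughly 0.17%
```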


Figure 3. Visualisation of CCFD variables V10 & V26, showing unbalanced class distributions and overlapping classes.


Figure 4. Visualisation of CCFD features V5 & V17, showing unbalanced class distributions and overlapping classes.

Class '0' in blue represents the majority class (normal transactions), and Class '1' in orange is the minority class (fraudulent transactions). As shown in Figures 3 and 4, the classes overlap in the attributes plotted, and the samples in the majority class greatly outnumber those in the minority class. These characteristics make it difficult for general learning algorithms to detect credit card frauds effectively.
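Plots such as Figures 3 and 4 can be reproduced with a few lines of matplotlib; a sketch for the V10/V26 pair is given below (any pair of PCA features can be substituted, and the file name is the same assumption as above).

```python
import pandas as pd
import matplotlib.pyplot as plt

ccfd = pd.read_excel("CCFD_dataset.xlsx")
normal = ccfd[ccfd["Class"] == 0]
fraud = ccfd[ccfd["Class"] == 1]

# Majority class in blue, minority class in orange, as in Figures 3 and 4.
plt.scatter(normal["V10"], normal["V26"], s=2, c="tab:blue", label="Class 0 (normal)")
plt.scatter(fraud["V10"], fraud["V26"], s=8, c="tab:orange", label="Class 1 (fraud)")
plt.xlabel("V10"); plt.ylabel("V26"); plt.legend()
plt.title("Unbalanced and overlapping classes in the CCFD")
plt.show()
```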

Enhanced stacking classifiers system

An enhanced stacking classifiers system (ESCS) is proposed to address the unbalanced class distribution and class overlapping issues. It was strategically designed by separating the classes and tackling the data individually at different levels to improve fraud detection rates. The ESCS incorporates two sequential levels of a multiple classifiers system. The first level contains a classifier that is excellent at detecting normal credit card transactions (the majority class), while the second level consists of single-level stacking classifiers that are good at distinguishing credit card frauds (the minority class). The fraudulent data that were misclassified as normal data by the first-level classifier are filtered out and passed to the second level for re-classification. The re-classification in the second level is performed using two base classifiers stacked with a meta-classifier. These base classifiers are more sensitive classifiers for identifying the misclassified frauds that passed the first level. The meta-classifier combines the base classifiers' decisions to produce the final detection. The framework of the ESCS is shown in Figure 5.


Figure 5. The enhanced stacking classifiers system (ESCS) framework.

Pseudocode 1. Algorithm for ESCS fraud detection

Input: credit card fraud dataset, ccfd
Output: true positive rate for minority & majority class
1. //create a single level stacking classifier called SSC
2. //SSC ← two base classifiers, C2 and C3, and a meta-classifier MC
3. //create a multiple classifiers system and named it as
  //Enhanced Stacking Classifiers System (ESCS)
4. //ESCS ←classifier C1 + SSC
  //ESCS is a model with the combination of a classifier
  //C1 and a stacking classifier SSC
5. divide ccfd into five partitions with equal distribution of normal and fraud data
6. label the five partitions with K1, K2, … , K5
  //5-fold cross validation
7. for i ← 1 to 5 do
8.  set Ki as Test_ccfd       //test set
9.  set remaining four partitions as Training_ccfd
   //training set
10.  train classifier C1 with Training_ccfd
   //C1 is classifier strong in classifying normal data
   // (majority class)
11.  classify Test_ccfd with trained classifier C1
12.  for each transaction, x in the Test_ccfd do
13.   if class(x) is equal to 1       //fraud data
14.    append x to ccfd(1)
       //ccfd(1) is a dataset of ‘1’ /fraud data
15.   else {class(x) is 0}       //normal data
16.    append x to ccfd(0)
   //ccfd(0) is a dataset of ‘0’ data/predicted normal
17.   end if
18.  end for
19. end for
20. divide ccfd(0) into five partitions with equal distribution of normal and fraud data
21. label five partitions with P1, P2, … , P5
   //5-fold cross validation
22. for j ←1 to 5 do
23.  set Pj as Test_ccfd(0)
24.  set remaining four partitions as Training_ccfd(0)
25.  train SSC with Training_ ccfd(0)
   //C2, C3 are classifiers strong in classifying
   //minority class
26.  classify Test_ccfd(0) with trained SSC
27.  for each transaction, y in the Test_ccfd(0) do
28.   if class(y) is equal to 1   //fraud data
29.    append y to ccfd(1)
   //ccfd(1) is a dataset of ‘1’ /fraud data
30.    remove y from ccfd(0)
31.   end if
32.  end for
33. end for
34. combine ccfd(1), ccfd(0) to ccfdFinal
   //ccfdFinal is a combination of dataset ccfd(1) and
   //ccfd(0)
35. class ← Retrieve only ‘class’ column from ccfdFinal
36. predicted ← Retrieve only ‘predicted’ column from ccfdFinal
37. calculate confusion matrix (class, predicted)
38. calculate TPR(1)       //TPR for minority class
39. calculate TPR(0)       //TPR for majority class
40. return TPR(1), TPR(0)

In this study, five-fold cross-validation was conducted on the CCFD, ccfd. The dataset was divided into five partitions with equal distribution of normal and fraud data (Line 5, Pseudocode 1). A single partition was reserved at each validation step as the test set, Test_ccfd (Line 8, Pseudocode 1), while the remaining four partitions were used as the training data, Training_ccfd (Line 9, Pseudocode 1). This process was then repeated five times until every partition was used for training and testing. On the first level, classifier C1 was trained with Training_ccfd (Line 10, Pseudocode 1) and classified Test_ccfd with it (Line 11, Pseudocode 1). Classifier C1 is a strong classifier of the majority class (normal data).

During classification, if the samples were classified as '1', then they were appended to ccfd(1), which stores all the fraud data (Line 14, Pseudocode 1). If the samples were classified as '0', they were appended to ccfd(0) (Line 16, Pseudocode 1). The ccfd(0) dataset stores all data predicted as normal and is passed to the second level to re-classify the data.

On the second level, we conducted the five-fold cross-validation on ccfd(0) again, dividing it into five partitions with equal distribution of normal and fraud data and labelling them P1 to P5 (Line 21, Pseudocode 1). At each validation step, a single partition was reserved as the test set, Test_ccfd(0) (Line 23, Pseudocode 1), while the remaining four partitions were employed as the training data, Training_ccfd(0) (Line 24, Pseudocode 1). A single-level stacking classifier, which consisted of classifiers C2, C3 and the meta-classifier, was trained with Training_ccfd(0) (Line 25, Pseudocode 1) and used to classify Test_ccfd(0) (Line 26, Pseudocode 1). Classifiers C2 and C3 are strong at classifying the minority class.

During re-classification, if the samples were classified as '1', then they were appended to data set ccfd(1) (Line 29, Pseudocode 1), and the same samples were deleted from ccfd(0) to avoid any redundancy between the two data sets (Line 30, Pseudocode 1). If the samples were classified as '0', they remained in ccfd(0).

ccfd(1) and ccfd(0) were then combined and saved as ccfdFinal (Line 34, Pseudocode 1). From ccfdFinal, only the 'Class' column (Line 35, Pseudocode 1) and the 'Predicted' column (Line 36, Pseudocode 1) were retrieved to form the confusion matrix (Line 37, Pseudocode 1) as in Table 2. Lastly, the final true positive rate (TPR) scores for the minority and majority classes were calculated (Lines 38–39, Pseudocode 1) as in Equation (1) and Equation (2).

$$\text{TPR for minority} = \frac{TP}{\text{Actual positive}} = \frac{TP}{TP + FN} \tag{1}$$

$$\text{TPR for majority} = \frac{TN}{\text{Actual negative}} = \frac{TN}{TN + FP} \tag{2}$$

Table 2. Representation of confusion matrix.

             Predicted: 0          Predicted: 1
Actual: 0    True negative (TN)    False positive (FP)
Actual: 1    False negative (FN)   True positive (TP)
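A condensed Python sketch of the two-level flow in Pseudocode 1 is given below, using scikit-learn building blocks. The concrete classifier choices (an entropy-criterion decision tree standing in for C1, MLP and RF as C2 and C3, and Gaussian naïve Bayes as the meta-classifier) and all hyperparameters are assumptions for illustration; the study selects its classifiers from the single-classifier results reported in the next section.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix

# Illustrative choices for C1, C2, C3 and the meta-classifier MC.
C1 = DecisionTreeClassifier(criterion="entropy", random_state=42)  # ID3-like, strong on the majority class
SSC = StackingClassifier(
    estimators=[("mlp", MLPClassifier(max_iter=500, random_state=42)),
                ("rf", RandomForestClassifier(random_state=42))],
    final_estimator=GaussianNB(), cv=5)

ccfd = pd.read_excel("CCFD_dataset.xlsx")   # file name as in the data availability section
X = ccfd.drop(columns=["Class"]).values
y = ccfd["Class"].values

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Level 1: out-of-fold predictions from C1 over the whole data set (Lines 5-19).
pred = cross_val_predict(C1, X, y, cv=cv)

# Level 2: samples predicted '0' are re-classified by the stacking classifier (Lines 20-33).
idx0 = np.where(pred == 0)[0]
pred[idx0] = cross_val_predict(SSC, X[idx0], y[idx0], cv=cv)  # frauds recovered here overwrite the '0' labels

# Final confusion matrix and per-class TPR (Lines 34-40), as in Equations (1) and (2).
tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
print("TPR(1) =", tp / (tp + fn), "  TPR(0) =", tn / (tn + fp))
```

The out-of-fold predictions from cross_val_predict mirror the five-fold train/test rotation in the pseudocode, so every transaction is classified exactly once at each level.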

Results and discussion

We conducted three experiments on the CCFD: single classifiers; bagging and boosting classifiers; and the proposed ESCS model. Their TPR, area under the receiver operating characteristic curve (ROC AUC) and accuracy were calculated and are presented in the following tables.

For the single classifier experiment, seven classifiers were used, namely naïve Bayes (NB), ID3, logistic regression (LR), random forest (RF), multi-layer perceptron (MLP), K-nearest neighbour (KNN) and CART. Overall, we observed good accuracy: most algorithms achieved scores above 0.99, except KNN (0.4226), CART (0.7995) and RF (0.8030) (Table 3).

Table 3. Result of single classifiers.

Generally, the algorithms could not perform well on the minority class. TPR (1) = true positive rate for minority class; TPR (0) = true positive rate for majority class; ROC AUC = area under the receiver operating characteristic.

Single classifiers          TPR (1)   TPR (0)   ROC AUC   Accuracy
Naïve Bayes                 0.6585    0.9930    0.8257    0.9924
ID3                         0.7520    0.9996    0.8758    0.9992
Logistic Regression         0.6402    0.9964    0.8183    0.9958
Random Forest               0.7947    0.8030    0.8632    0.8030
Multi-Layer Perceptron      0.7927    0.9958    0.7989    0.9954
K-Nearest Neighbour         0.3638    0.4227    0.3933    0.4226
CART                        0.7398    0.7996    0.7697    0.7995

The TPR results were good for the majority class (class 0): most classifiers scored over 0.99, except KNN (0.4227), CART (0.7996) and RF (0.8030). The best achievable TPR for the minority class (class 1) was only 0.7947 (RF), followed by 0.7927 (MLP), a slight difference of 0.002. These were followed by ID3 (0.7520), CART (0.7398), NB (0.6585), LR (0.6402) and KNN (0.3638).
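The single-classifier comparison can be reproduced along the lines of the sketch below. ID3 and CART are approximated here by decision trees with the entropy and Gini criteria respectively, no hyperparameters beyond the library defaults are assumed from the study, and the ROC AUC is computed from the hard class labels.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import recall_score, roc_auc_score, accuracy_score

ccfd = pd.read_excel("CCFD_dataset.xlsx")
X, y = ccfd.drop(columns=["Class"]).values, ccfd["Class"].values

classifiers = {
    "Naive Bayes": GaussianNB(),
    "ID3 (entropy tree)": DecisionTreeClassifier(criterion="entropy"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "Multi-Layer Perceptron": MLPClassifier(max_iter=500),
    "K-Nearest Neighbour": KNeighborsClassifier(),
    "CART (Gini tree)": DecisionTreeClassifier(criterion="gini"),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, clf in classifiers.items():
    pred = cross_val_predict(clf, X, y, cv=cv)   # out-of-fold predictions, 5-fold CV
    print("%-24s TPR(1)=%.4f TPR(0)=%.4f ROC AUC=%.4f Acc=%.4f" % (
        name,
        recall_score(y, pred, pos_label=1),
        recall_score(y, pred, pos_label=0),
        roc_auc_score(y, pred),
        accuracy_score(y, pred)))
```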

We then tried to improve the detection rate using bagging and boosting (ensemble classifiers) since the single classifiers did not perform well in detecting frauds. This experiment involved one bagging classifier and five boosting classifiers: AdaBoost, gradient boosting, extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM) and CatBoost.

As shown in Table 4, we achieved a good overall accuracy rate and TPR for the majority class. The highest accuracy was recorded by CatBoost (0.9993), followed by AdaBoost (0.9979), LightGBM (0.9944), XGBoost (0.9939), bagging (0.8028) and Gradient Boosting (0.8004). Similarly, for the TPR of the majority class (class 0), CatBoost achieved the highest value (0.9996), followed by AdaBoost (0.9984), LightGBM (0.9953), XGBoost (0.9942), bagging (0.8028) and Gradient Boosting (0.8009). However, the overall TPR for the minority class (class 1) was not promising, remaining at average values. The highest fraud detection rate was 0.7846, achieved by both XGBoost and CatBoost. The second best was the bagging classifier, with a value of 0.7744, followed by AdaBoost (0.6931), Gradient Boosting (0.5264) and LightGBM (0.4593).

Table 4. Result of bagging and boosting classifiers.

The fraud detection rates achieved by the classifiers were still not at their best. TPR (1) = true positive rate for minority class; TPR (0) = true positive rate for majority class; ROC AUC = area under the receiver operating characteristic.

Bagging                     TPR (1)   TPR (0)   ROC AUC   Accuracy
Bagging classifier          0.7744    0.8028    0.7886    0.8028

Boosting                    TPR (1)   TPR (0)   ROC AUC   Accuracy
AdaBoost                    0.6931    0.9984    0.8458    0.9979
Gradient Boosting           0.5264    0.8009    0.6636    0.8004
XGBoost                     0.7846    0.9942    0.8894    0.9939
LightGBM                    0.4593    0.9953    0.7273    0.9944
CatBoost                    0.7846    0.9996    0.8921    0.9993
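The bagging and boosting results can be reproduced with the same stratified five-fold loop sketched for the single classifiers by swapping in the ensemble estimators; XGBoost, LightGBM and CatBoost are third-party packages, and all settings below are library defaults, not the study's configuration.

```python
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from xgboost import XGBClassifier         # pip install xgboost
from lightgbm import LGBMClassifier       # pip install lightgbm
from catboost import CatBoostClassifier   # pip install catboost

ensembles = {
    "Bagging": BaggingClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "Gradient Boosting": GradientBoostingClassifier(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "LightGBM": LGBMClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
}
# Evaluate each with the same cross_val_predict loop used for the single classifiers.
```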

We then designed the ESCS, comprising two sequential levels, to alleviate the two inherent problems of the credit card fraud data (Figure 6). On the first level, we used ID3, which is a strong classifier of the majority class (refer to Table 3). The fraud data that were misclassified as normal data were then filtered out and passed to the second level. On the second level, we used MLP and RF (refer to Table 3), which classify the minority class effectively, and stacked them with a meta-classifier. These classifiers are more sensitive and identify the fraudulent transactions misclassified by ID3 at the first level. The meta-classifier was used to combine the decisions of the base classifiers to produce the final detection. We evaluated five different classifiers as the meta-classifier, namely ID3, RF, LR, NB and MLP. All the classifiers were chosen based on their performance on the CCFD. The ESCS can improve the fraud detection rate through the second level as it contains stacking classifiers that are effective at distinguishing credit card frauds. The ESCS framework is shown in Figure 6 below, and its performance is shown in Table 5.
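Because the five ESCS variants in Table 5 differ only in the meta-classifier, they can be generated by looping over the candidates; the sketch below builds the five second-level stacks under the same illustrative assumptions as the earlier pipeline sketch (ID3-like entropy tree at the first level, MLP and RF as base learners).

```python
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

meta_candidates = {
    "ESCS 1 (meta: ID3)": DecisionTreeClassifier(criterion="entropy"),
    "ESCS 2 (meta: RF)": RandomForestClassifier(),
    "ESCS 3 (meta: LR)": LogisticRegression(max_iter=1000),
    "ESCS 4 (meta: NB)": GaussianNB(),
    "ESCS 5 (meta: MLP)": MLPClassifier(max_iter=500),
}

for name, meta in meta_candidates.items():
    ssc = StackingClassifier(
        estimators=[("mlp", MLPClassifier(max_iter=500)),
                    ("rf", RandomForestClassifier())],
        final_estimator=meta, cv=5)
    # Plug ssc into the second level of the two-level pipeline sketched in the
    # Methods section and record TPR(1) and TPR(0) for each variant (Table 5).
```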


Figure 6. The enhanced stacking classifiers system.

Table 5. Results of the enhanced stacking classifiers system.

There were improvements in the fraud detection rate compared to single classifiers, bagging and boosting. TPR (1) = true positive rate for minority class; TPR (0) = true positive rate for majority class; ROC AUC = area under the receiver operating characteristic.

Enhanced stacking classifiers system    TPR (1)   TPR (0)   ROC AUC   Accuracy
ESCS 1; meta-classifier ID3             0.7622    0.9996    0.8809    0.9992
ESCS 2; meta-classifier RF              0.8028    0.9995    0.9012    0.9991
ESCS 3; meta-classifier LR              0.7785    0.9996    0.8890    0.9992
ESCS 4; meta-classifier NB              0.8841    0.9839    0.9340    0.9837
ESCS 5; meta-classifier MLP             0.7520    0.9996    0.8758    0.9992

We observed that NB was the best meta-classifier combining all the base classifiers’ decisions to produce the final decision. It attained a 0.8841 fraud detection rate overall. It showed a good non-fraud detection rate of 0.9839, a ROC AUC score of 0.9340 and an accuracy of 0.9837. We could also achieve comparable accuracy rates and non-fraud detection rates for ESCS 1, ESCS 2, ESCS 3 and ESCS 5 when compared with single classifiers and ensemble classifiers. The second-best result was ESCS 2 with the meta-classifier RF, for which the fraud detection rate was 0.8028, followed by ESCS 3 with the meta-classifier LR at 0.7785, ESCS 1 with the meta-classifier ID3 at 0.7622 and ESCS 5 with the meta-classifier MLP at 0.7520.

In conjunction with this experiment, ESCS was compared to other researchers’ works. The comparisons are shown in Table 6.

Table 6. Comparison between the enhanced stacking classifiers system (ESCS) and other researchers’ work on the credit card fraud dataset (CCFD).

ESCS outperformed the rest. TPR (1)= true positive rate for minority class; TPR (0) = true positive rate for majority class.

Credit card fraud dataset
Researchers' works                   Technique                           Classifiers                           TPR (1)   TPR (0)   Accuracy
ESCS                                 Enhanced stacking classifiers       ID3 + MLP + RF; meta-classifier: NB   0.8841    0.9839    0.9837
Kalid et al. (2020)24                Cascading of multiple classifiers   C4.5 + NB                             0.8720    1.000     0.9990
Husejinović (2020)18                 Bagging                             Bagging                               0.7970    0.9160    -
Divakar and Chitharanjan (2019)22    Boosting                            XGBoost                               0.8300    0.9400    1.000

ESCS was able to achieve the highest TPR (0.8841) for the minority class, and outperformed the other researchers’ models. ESCS also gave a good accuracy of 0.9837 and a TPR of 0.9839 for the majority class.

ESCS with NB as the meta-classifier showed great performance, and proved that ESCS could improve the fraud detection rate as it can effectively identify misclassified fraud transactions.

Conclusions

Nowadays, credit cards are the most common payment method because of the convenience they provide. If credit card usage is not well managed, it may lead to undesirable events such as credit card fraud. Credit card fraud involves the illegal use of a credit card without the owner's consent and causes the owner to suffer a financial loss.

Utilising credit card transaction data is now a necessity for detecting frauds. However, credit card data are challenging to handle because of their (i) unbalanced class distributions and (ii) overlapping classes. These characteristics also make it difficult for general learning algorithms to detect frauds effectively.

This study addresses these two issues using an ESCS, which strategically separates the classes and tackles the data individually at different levels to improve fraud detection rates. We compared the performance of the ESCS with single classifiers and with bagging and boosting classifiers. The highest TPR for the minority class (frauds) was 0.8841, achieved by the ESCS with NB as the meta-classifier, which outperformed the other combinations. We also compared our ESCS with previous research, and the results showed that it outperformed other researchers' works. This study demonstrates that the ESCS can improve the fraud detection rate on credit card data.

Data availability

Underlying data

Figshare: CCFD_dataset, https://doi.org/10.6084/m9.figshare.16695616.v3.28

This project contains the following underlying data:

  • CCFD_dataset.xlsx (extracted credit card dataset from the original Kaggle dataset)

Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).

Extended data

Analysis code available from: https://github.com/nuramirahishak/ESCS/tree/escs

Archived analysis code at the time of publication: https://doi.org/10.5281/zenodo.5647747.29

License: OSI 3.0

How to cite this article
Ishak NA, Ng KH, Tong GK et al. Mitigating unbalanced and overlapped classes in credit card fraud data with enhanced stacking classifiers system [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2022, 11:71 (https://doi.org/10.12688/f1000research.73359.1)

Open Peer Review

Reviewer Report 25 May 2024
Faouzia Benabbou, Department of Mathematics and Computer Sciences, Hassan II University, Casablanca, Morocco
Approved with Reservations
The paper presents an interesting approach, but it needs to be improved and some evidence needs to be provided:
  • The paper needs to be better organized, and the section headings need to be numbered.
  ...

Reviewer Report 22 Jul 2022
Yen Lung Lai, School of Information Technology, Monash University Malaysia, Subang Jaya, Malaysia
Approved
This manuscript studies the classification problem over credit card transactions. In particular, in a credit card transaction data set, legitimate transactions are the majority class, whereas frauds are the minority class.

The authors proposed the enhanced ...
