1. Introduction
Data mining and machine learning have been at the forefront of research, helping to solve analytical problems and overcome business problems [1,2]. The power of data mining in analyzing big data has been proven in various studies [3,4]. The apparel industry is relatively new to the field of data mining and machine learning; however, it has a gamut of application areas in retail, production, and other business operations. Businesses such as Myntra, Zalando, and StitchFix are trying to tap into the potential of data to gain deeper insight into their consumer bases [5,6,7]. They even provide smart recommendations based on customers' past purchases. Some retailers gather data using machine learning models and then use it to make important business decisions [8]. For instance, with the information extracted from data, they can learn what products sell best and which ones need refining. Mined data can be of immense use to marketing teams in designing appealing and targeted promotions to attract more customers.
With the advent of the internet and massive technological developments, there has also been a rise in e-commerce in the apparel industry. The number of retail channels has increased, with customers buying products through mobile commerce, social media commerce, and retail shops [9]. Due to increasing web interactions, there are more ways for customers to leave their digital footprints and for businesses to collect data. These data, available from a multitude of sources and channels, necessitate the adoption of the latest technologies, such as artificial intelligence, big data analytics, and machine learning.
As the contemporary customer relies on online retail channels to make purchases, the need also arises for powerful and intelligent systems that can recommend, personalize, or help the customer in making purchasing decisions. Such models (decision support systems) can help customers in finding the right garments, according to their requirements. The first step towards achieving this is to make the models recognize the different garment categories and corresponding garment attributes. It is important to recommend the right garment to the customer as it directly impacts the customer's shopping experience as well as the perception of the retailer itself [10]. Moreover, classifying products based on attributes can be beneficial for demand forecasting, as well as efficient assortment planning and comparison by retailers and producers [11]. In this context, this study proposes to utilize the big data available in the apparel industry to support the development of a classification framework by applying data mining and machine learning techniques.
Hence, the focus of this article is to build an integrated model capable of identifying garment attributes to predict the garment type. Our approach applies data mining techniques to build an intelligent model, which learns from an existing training dataset containing garment attributes, categories (upper wear, bottom wear, and whole-body wear), and sub-categories (shirt, jeans, dress, blouse, etc.). The classifiers are first individually trained to classify the garment categories (subsystem 1) and sub-categories (subsystem 2). After this, an integrated model is created that consists of both subsystems and provides a soft classification for any new instance the model is provided with. Overall, this article is a preliminary attempt to use data mining and symmetry-based learning concepts, particularly classification, to support decision-makers by evaluating product attribute data to identify the garment type.
The rest of the paper is structured as follows: Section 2 discusses the previous research carried out in the field of data mining and machine learning in the apparel industry; Section 3 describes the data mining and machine learning algorithms used in this research; Section 4 briefly discusses the research framework adopted; Section 5 presents the results and findings; and Section 6 provides the limitations, future scope, and conclusion.
2. Research Background
Even though the application of data mining and machine learning techniques is relatively new in the apparel industry, these techniques have quickly gained popularity in related research. A considerable amount of work has been done on improving various operations in the apparel production supply chain with the help of data mining, as discussed below.
For instance, achieving a good garment fit has been a big issue in the apparel industry [12]. Nonetheless, attempts have been made to address the issue using various data mining techniques. A few sub-areas of research within this topic are highly focused upon, including finding the most relevant body measurements to develop a new sizing system [13,14,15] and evaluating the fit of the garment using virtual try-on [16,17]. N. Zakaria et al. [18] employed principal component analysis, k-means clustering, and regression trees to address issues related to the identification of the most important body measurements. Similarly, Hsu and Wang [19] used Kaiser's eigenvalue criteria along with the Classification and Regression Trees (CART) decision tree algorithm to identify and classify significant patterns in body data.
On the other hand, forecasting is another popular research area, where data mining has been used for sales forecasting [20,21,22] and demand forecasting [23,24]. An application of time series analysis on e-commerce data to forecast sales trends is discussed in the study by S.V. Kumar et al. [25]; with their proposed method, it is possible to achieve both short-term and long-term forecasting. The study by Z. Al-halah et al. [26] used fashion images to predict the popularity of styles in the future; they trained a forecasting model using these style images to represent the trend over time. Yet another application of data mining extensively worked upon is recommender systems [27,28]. An excellent overview of the existing apparel recommendation systems is presented in [29]; it highlights the improvement required in creating a comprehensive apparel and user profile to improve the existing recommendation systems and shows the need for long-term recommendations in design and manufacturing. Along these lines, Z.H.U. Ming et al. [30] considered both user preference and behavioral data to design an online recommendation system aiming to increase the relevance of the recommendations. In the study [31], C. Skiada et al. generated association rules using real Point-of-Sale (POS) data to provide recommendations and to understand the customer's needs and behavior while shopping online or offline.
Furthermore, significant attention has been paid to utilizing image recognition and pattern recognition [32,33], as well as deep learning, for the classification of fashion images [34,35]. W. Surakarin et al. focused on classifying upper-body garments, using a Support Vector Machine (SVM) with a linear kernel to train a machine-learning model to classify clothing into sub-categories, and realized an overall accuracy of 73.57%. On the other hand, C.-I. Cheng et al. [36] used neural networks and fuzzy sets for garment characterization and measurements. More recently, generative adversarial networks were used by K.E.A. et al. [37] to translate target attributes into fashion images. This method has the advantage of working when the number of attributes to be manipulated in an image is large, which is usually the case with data in the fashion and apparel industry [38]. This technique is still at a nascent stage; however, it holds immense potential to advance the task of automatic generation of fashion styles.
Classification techniques have also been used to categorize fabric and sewing defects in the industry using computer vision for different applications (e.g., see [39,40] for fabric defects and [41] for garment defects). It is interesting to note that classification systems have also been employed in image retrieval systems. For example, A. Vuruskan et al. [42] created an intelligent system to select fashion for non-standard female bodies using a genetic algorithm and a neural network. More recently, convolutional neural networks have become popular for the task of classifying clothing images. H. Tuinhof et al. [43] trained a convolutional neural network to classify images of fashion products and proposed a system that takes one image as input from the user and provides a range of similar recommendations. Luca Donati et al. [44] worked on the automatic recognition and classification of various garment features, solely from rendered images of the products, and achieved an accuracy of 75%.
In some other works, Bossard et al. [45] focused on identifying the clothes worn by people in images by first locating the upper body in the image and then extracting the features for garment classification using Support Vector Machine and Random Forest, with accuracies of 35.03% and 38.29%, respectively. An interesting finding of this study was the different training accuracies, between 38% and 71%, for different garment categories. The study in [46] proposed a cross-modal search tool, which can perform both image annotation and image search, by training a neural network with fashion attributes.
When it comes to the classification of garments, most of the studies are associated with image recognition and computer vision. However, when a customer searches for a garment on an online retail channel, they often use certain keywords (garment attributes, categories, styles) on a retailer's website, or 'hashtags' when searching on social media retail channels, such as Instagram. Classifying garments using text instead of images can be useful in this scenario. An efficient classification framework for categorizing garments according to their attributes can benefit customers, who receive correct product suggestions and thus a better user experience, as well as businesses, since it directly influences sales. In this context, Hammar, K. et al. [47] trained a classifier on clothing-attribute data from Instagram and used it to predict the clothing with an F1 score of 0.60. The study in [48] trained a support vector machine on text representing product descriptions to classify fashion styles by brand and achieved an accuracy of 56.25%.
As realized by examining the extant literature on data mining and machine learning in the apparel industry, most of the research related to the classification of apparel products has focused on visual features, while research using attributes as 'words' to train classification models is scant. Consequently, this study uses 'words' to build a classification framework that can predict the category and sub-category of garments, given their product attributes.
3. Machine Learning Algorithms for Garment Classification
Constructing precise and effectual classifiers for big databases is one of the basic tasks of data mining and machine learning algorithms. Typically, classification is one of the initial steps to inspect whether a set of observations can be grouped based on some similarity. A classifier aims to find a predictor $f: \mathcal{X} \rightarrow \mathcal{Y}$, where $\mathcal{X}$ represents the instance space, i.e., a feature vector of length $m$ constituting the feature set of the object to be classified, and $\mathcal{Y}$ represents the object label denoting the classification into $k$ unique classes [49]. The classification predictor $f$ is often trained on a training dataset $D_{train}$, split from the original dataset of $n$ instances $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, where $(x_i, y_i)$ represents the feature-label pair of the $i$-th instance. Here, $x_i \in \mathcal{X}$ is the feature set for the $i$-th object or instance and $y_i \in \mathcal{Y}$ is the label assigned to the $i$-th object or instance. For binary feature sets, i.e., a set of binary variables indicating whether the selected attributes are present, $x_i = (x_{i1}, x_{i2}, \dots, x_{im})$ with $x_{ij} \in \{0, 1\}$; thus, $\mathcal{X} = \{0, 1\}^m$.
Building these kinds of effective classification functions or systems is central to data mining. Given a partial observation, a classification system can statistically identify the unobserved attribute. There are various kinds of techniques used for classification, such as Decision Trees, Gradient Boosting, Naïve Bayes, ensemble learning methods, etc. This study employs four of these techniques: Decision Trees, Naïve Bayes, Random Forest, and Bayesian Forest, discussed in brief below.
3.1. Naïve Bayes (NB) Classification
The Naïve Bayes classifier is a probabilistic machine-learning model from a family of classification algorithms based on Bayes' theorem. It assumes that the predictive features are mutually independent given the class [50]. In this study, the Bernoulli Naïve Bayes algorithm is used, where each feature is assumed to be a binary-valued variable. Assume that we have an object represented by a given feature vector of $m$ dimensions, i.e., $x = (x_1, x_2, \dots, x_m)$, where each $x_j$ is a Boolean expressing the absence or presence of the $j$-th feature. Based on the features, the object can be classified into a class $c_k$ in $\mathcal{Y} = \{c_1, c_2, \dots, c_K\}$. Therefore, according to Bayes' theorem [51],

$$P(c_k \mid x) = \frac{P(x \mid c_k)\, P(c_k)}{P(x)}$$

where $P(c_k \mid x)$ is called the posterior probability, i.e., the probability of class $c_k$ conditioned on a given feature vector $x$, and $P(x \mid c_k)$ is known as the likelihood, defined as the probability of the feature vector $x$ conditioned on class $c_k$. The most common applications of the NB classifier include sentiment analysis, recommendation engines, and spam filtering; it is considered fast, efficient, and easy to implement [52].
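To make the Bernoulli model concrete, the following is a minimal sketch of how such a classifier could be applied to binary attribute vectors, assuming a scikit-learn implementation; the attribute columns and labels are hypothetical stand-ins, not the study's actual data.

```python
# Minimal sketch of Bernoulli Naive Bayes on binary attribute vectors
# (hypothetical toy data; the study uses DeepFashion attribute tags).
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Rows: instances; columns: presence (1) / absence (0) of an attribute,
# e.g., [long-sleeve, A-line, zipper, maxi].
X = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 1, 1]])
y = np.array(["upper", "whole", "upper", "whole"])  # garment categories

clf = BernoulliNB()  # assumes each feature is binary-valued
clf.fit(X, y)

# Posterior probabilities P(class | x) and the hard prediction for a new x.
x_new = [[1, 0, 1, 1]]
print(clf.predict_proba(x_new))
print(clf.predict(x_new))
```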
3.2. Decision Trees (DT)
Decision trees are one of the most widely implemented supervised learning algorithms and are considered a structured approach to multiclass classification [53,54]. They are robust and can achieve high accuracy in various tasks while remaining explainable. The information gained by a decision tree during the training phase is formulated into a hierarchical structure, which is easy to interpret, even by non-experts. The development of a DT usually involves two steps, induction and pruning, in the formation of a tree-like structure. Induction involves tree building, i.e., the formation of the nodes and branches of the decision tree. Each node (excluding the terminal nodes) splits on the assigned attribute based on its magnitude or category and creates branches leading to the nodes of the next attribute. A given node $t$ is divided into $t_L$ and $t_R$ such that the training instances $D_t$ at the node are classified into two subsets, namely $D_{t_L}$ and $D_{t_R}$, based on the division of a particular feature $x_j$ (e.g., into $x_j \leq s$ and $x_j > s$ for a numeric split point $s$), with $D_{t_L} \cup D_{t_R} = D_t$. The splitting of the feature at the node is carried out such that it creates nodes that are purer (i.e., more homogeneous in terms of their class labels) in the divided datasets. Therefore, a feature resulting in better segregation of the training data is placed nearer to the root node (the first node of the tree hierarchy) and, subsequently, the other attributes are divided in an iterative process and placed in the tree hierarchy. In this context, the Gini impurity or Gini index is used to determine the homogeneity or purity of the data split on an attribute, based on the following formulation [55]:

$$G(t) = 1 - \sum_{k=1}^{K} p_k^2$$

where $K$ is the total number of classes and $p_k$ is the fraction of objects at the node labeled with the $k$-th class. If the elements of $D_{t_L}$ or $D_{t_R}$ are all of the same class label, no further splitting is done and that particular node is labeled as a terminal node. On the other hand, a node having a mixed-label dataset is further divided into two nodes based on another feature.
Pruning is the process whereby unnecessary structures are removed from the tree. This reduces the complexity and the chances of overfitting, making the tree easier to interpret. The basic algorithm iterates through the tree in a top-to-bottom approach, where the top node with no incoming branch is the root node, the nodes with outgoing branches are internal nodes, and all others are leaves. The attributes of a model are depicted by the root and internal nodes, while the target class is depicted by the leaves. To decide the target class of a new instance, the decision tree algorithm begins at the root node, advancing towards the bottom through the internal nodes until it reaches a leaf node. At each node, an assessment is made to choose one of the branches. The new instance is labeled with the class of the concluding leaf node [56].
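As a small illustration of the Gini criterion described above, the sketch below computes the impurity of a node's label set and configures a Gini-based tree, assuming scikit-learn; the labels are hypothetical.

```python
# Sketch: Gini impurity G(t) = 1 - sum_k p_k^2, and a Gini-based tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def gini(labels):
    """Gini impurity of a node's label set."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()          # class fractions p_k at the node
    return 1.0 - np.sum(p ** 2)

print(gini(["shirt", "shirt", "jeans"]))   # mixed node -> impurity > 0
print(gini(["shirt", "shirt", "shirt"]))   # pure node  -> impurity = 0

# scikit-learn's decision tree uses the same criterion to choose splits.
tree = DecisionTreeClassifier(criterion="gini")
```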
3.3. Random Forest (RF)
A random forest is an ensemble of multiple decision trees. It is a popular and highly efficient ensemble method for supervised learning and can be used for both regression and classification. Since the decision tree approach described in Section 3.2 involves a single tree, the main issue remains that the formed single decision tree may not be suitable for all data. In RF, the bootstrap aggregating (bagging) technique is applied to a large set of decision tree learners [57]. Bagging is the process of creating sub-training datasets by sampling the existing data with replacement [58]; thus, there can be duplicate values in the sampled datasets. As the name suggests, the random forest algorithm stochastically selects training sets to create decision trees. During the testing phase, the RF receives predictions from each tree and then chooses the best solution with the help of voting [59]. In a classification problem, every tree created provides a unit vote and assigns each input to the most probable target class. This collection of trees is also called the forest. It is a comparatively fast method that can identify non-linear patterns in data and is a good solution to overfitting, a common problem with decision trees. It works well for both numerical and categorical data.
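A minimal sketch of the bagging-and-voting behaviour described above, assuming a scikit-learn implementation and randomly generated stand-in data:

```python
# Sketch: a bagged ensemble of decision trees with voting (random forest).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))                 # binary attributes
y = rng.choice(["upper", "lower", "whole"], size=200)  # garment categories

# Each of the 100 trees is fit on a bootstrap sample of the training data;
# the forest aggregates the per-tree predictions by voting.
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
rf.fit(X, y)
print(rf.predict(X[:3]))
```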
3.4. Bayesian Forest (BF)
A Bayesian Forest is another ensemble learning method, where the decision tree formation relies on Bayesian statistics [60]. In RF, multiple random trees are trained and the tree configuration that results in the best classification is selected. In a Bayesian-based random forest method, Bayesian statistics are used for the selection of random decision trees from a collection of trees. As explained in Section 3.1, the Bayesian approach starts with a prior distribution. Subsequently, it estimates a likelihood function for each set of data in a decision tree. A Bayesian forest draws the weights of the trees from an exponential distribution, and the prediction is an approximate posterior mean. The mathematical formulation of the method and the computational steps followed can be found in [60].
4. Research Methodology
Figure 1 shows an overview of the research framework. The research consists of three steps. The first step explains the dataset and tools used and provides details about the feature and target variables. Second is the data pre-processing step that includes data cleaning, data integration, feature selection, and data reduction. Lastly, the model-building step presents the development of the two subsystems, their integration, and the evaluation methods used.
Following the above-mentioned steps, the aim was to develop a classification model that can predict the garment type based on its attributes. The classification model consists of a two-level hierarchy: the first level classifies the garment category, and the second classifies the garment sub-category. Hence, the classification system first gives an initial decision on whether a garment is for the upper, lower, or whole body and then, based on this, provides a final class decision, i.e., shirt, blouse, trousers, jeans, dress, kimono, or another garment sub-category.
4.1. Tools and Dataset
The dataset used in this study is an open-source dataset named DeepFashion [61]. The original dataset contains 289,222 images of apparel products tagged with 50 garment sub-categories (e.g., shirt, jeans, dress, etc.) and 1000 garment attributes (A-line, long-sleeve, zipper, etc.). The tagged information was extracted from the dataset to build the classification model, while the apparel product images were not used. The garment sub-categories are further grouped into three garment categories: upper wear, bottom wear, and whole-body wear (the list of garment sub-categories within each garment category is available in the Supplementary Materials as Table S1).
The open-source dataset consists of different files, of which four were required to develop the classification model. The following files were used to extract information relevant to this study:
1. List of garment sub-categories tagged in the images, along with the corresponding garment categories.
2. List of 289,222 image names with the corresponding garment category (upper, lower, whole).
3. List of garment attributes containing the attribute name (A-line, long-sleeve, zipper, etc.) and the corresponding attribute type.
4. List of 289,222 image names with 1000 columns, one for each garment attribute, indicating the presence or absence of the attribute in that image by (−1, 0, 1).
4.2. Data Preprocessing
This section briefly discusses the data pre-processing carried out in two steps: (1) data extraction, cleaning, and integration, and (2) feature selection and data reduction. The following sections describe these steps in detail.
4.2.1. Data Extraction, Cleaning, and Integration
As discussed in the previous section, it was necessary to extract information from the different files and then integrate them to create a dataset that could be provided as input to the classification algorithms. The first and second files were used to get a list of image names with the corresponding garment categories and sub-categories tagged in each image. Since the garment attributes in the fourth file were represented by numbers (1 to 1000) and the third file contained the attribute name corresponding to each number, the third file was used to replace these numbers with the actual attribute names. The resulting dataset was then integrated with the combination of the first and second files to obtain the final dataset.
Finally, this dataset was filtered at two levels. At the first level, dataset A was used, which consisted of the three garment categories as the target variable, i.e., upper wear (referred to as Upper or U), bottom wear (referred to as Lower or L), and whole-body wear (referred to as Whole or W). At the second level, there were garment sub-categories for each category mentioned at the first level, represented by datasets U, L, and W, respectively, which included shirts, dresses, jeans, etc.
The resulting dataset was split and transformed to give a dataset for each garment category, as shown in Figure 1. This step was carried out to develop the two subsystems of the classification model, discussed in detail in the sections that follow. After splitting, there were four datasets: the initial dataset A containing all the instances of the three garment categories, a dataset U containing instances of upper wear, a dataset L containing instances of bottom wear, and a dataset W containing instances of whole-body wear. The garment categories and sub-categories in each dataset were considered the target labels and the garment attributes the feature variables.
4.2.2. Feature Selection and Data Reduction
For efficient training of the classifiers, it was necessary to select the features that are most relevant to the target class. Since dataset A has all three garment categories as target classes, keeping all the garment attributes is understandable. However, after splitting the dataset by garment category, not all garment attributes may be relevant. Therefore, this step illustrates feature selection for the datasets U, L, and W. This study uses tree-based feature importance measures. Due to the applicability of random forests to a wide range of problems, their capability to create accurate models, and their built-in variable importance measures, the random forest was chosen as the algorithm with which to implement the feature selection.
In this type of feature selection, the importance of a feature $x_j$ for predicting the target class is calculated by adding the weighted Gini decreases for the nodes $t$ where $x_j$ is used, averaged over all the trees $T$ in the forest. Therefore, the importance of each feature is calculated by [62]:

$$Imp(x_j) = \frac{1}{N_T} \sum_{T} \sum_{t \in T:\ v(s_t) = x_j} p(t)\, \Delta i(s_t, t)$$

where $p(t)$ is the proportion of samples reaching node $t$, $\Delta i(s_t, t)$ is the impurity decrease function, i.e., the Gini importance or mean decrease Gini, $v(s_t)$ is the feature used in the split $s_t$, and $N_T$ is the number of trees in the forest.
This method was chosen as it is straightforward, fast, and accurate for selecting suitable features for machine learning. Once the feature importances were calculated, the features with an importance above the threshold value of $1.25 \times \text{median}$ were selected. The table of the most relevant features can be found in the Supplementary Materials as Table S3. After the selection of the most important features for each dataset, the data reduction step was carried out by removing the rows in all four datasets that did not have any attribute tagged in the corresponding image. This resulted in reduced datasets A, U, L, and W; the final numbers of attributes and observations for these four reduced datasets are summarized in Table 1.
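The selection rule above can be expressed compactly with scikit-learn's SelectFromModel, which accepts exactly this kind of "1.25*median" threshold string; the sketch below uses a randomly generated stand-in for the binary attribute matrix, not the study's data.

```python
# Sketch: random-forest feature importances with a 1.25 * median cut-off.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 50))                  # binary attributes
y = rng.choice(["shorts", "skirt", "jeans"], size=500)  # sub-categories

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Keep features whose mean-decrease-Gini importance exceeds 1.25 * median.
selector = SelectFromModel(rf, threshold="1.25*median", prefit=True)
X_reduced = selector.transform(X)
print(X.shape, "->", X_reduced.shape)
```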
4.3. Model Building
The main objective of the proposed methodology is to build a classification model that predicts the garment type based on its attributes. As depicted in Figure 1, to accomplish this, the model building process was itself split into two phases: the development of the subsystems and the integration of the subsystems. In the first phase, the classifiers were trained individually for each dataset. The classifier trained with dataset A led to the formation of subsystem 1, while the classifiers trained with datasets U, L, and W led to the formation of subsystem 2. As discussed in Section 3, the machine learning techniques chosen for training the classifiers were Decision Trees, Naïve Bayes, Bayesian Forest, and Random Forest. The framework of the integrated system with an explanatory instance is depicted in Figure 2.
4.3.1. Development of Subsystems
Model Development
In general, data classification is a two-step process. The first step indicates the learning or training phase, where a model is developed by providing a predetermined set of classes and the corresponding set of training instances. Each instance is assumed to represent a predefined class. The second step, the testing phase, uses a different set of data instances to estimate the classification accuracy of the model. If the model achieves acceptable accuracy, it can be used to classify future unlabeled data instances. Finally, the model acts as a classifier in the decision-making process. The primary focus of this study is the classification of garment attribute data. The process of testing and training is shown in Figure 3.
In order to create the classification models (i.e., $C_A$, $C_U$, $C_L$, and $C_W$), the four datasets were first split into two parts: 80% used for building the model and the remaining 20% kept as the validation set for computing the performance of the integrated model. The dataset used for model building was further split into a set of features, $X$, and target variables, $Y$. All of the garment attributes constitute the feature space, while the garment categories and sub-categories constitute the target space. Next, the target and feature datasets were split into train and test sets using stratified k-fold (k = 10) cross-validation. The advantage of using stratified k-fold cross-validation is that it rearranges the data to ensure that each fold is a good representation of the entire dataset and, hence, it is generally considered a good strategy for classification problems [63]. Stratified cross-validation is a common preference when dealing with multi-class classification problems, especially in the case of class imbalance [64]. A final evaluation was done using the test set. The following steps were followed to accomplish stratified k-fold cross-validation (a code sketch follows the list):
The dataset was randomly split into k (= 10) equal-sized partitions.
From the k partitions, one was reserved as the test dataset for the final evaluation of the model, while the other k−1 partitions were used for model training.
The process was repeated for each model and machine learning technique k times, with each of the k partitions used exactly once as the test data.
The k results acquired from each of the test partitions were combined by averaging them to produce a single estimation.
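A minimal sketch of this protocol, assuming a scikit-learn implementation with randomly generated stand-in data:

```python
# Sketch: stratified 10-fold cross-validation, one accuracy per fold.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 20))
y = rng.choice(["upper", "lower", "whole"], size=300)

# Each fold preserves the class proportions of the full dataset.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=skf)
print(scores.mean(), scores.std())  # averaged into a single estimate
```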
Following this procedure, all four classifiers were trained separately for each dataset. The classifiers trained using dataset A belonged to subsystem 1, while all the other classifiers belonged to subsystem 2. These classifiers are integrated in the next section to predict the labels of new data instances.
Evaluation
Evaluation is one of the important steps in model building; with it, the accuracy of the classifier can be judged. Many evaluation metrics are available to determine the performance of a classification model. However, for a multiclass classifier, accuracy is the most widely used metric and is calculated as the number of correctly predicted labels divided by the total number of labels [65]. Besides this, the confusion matrix is widely adopted to measure the performance of a supervised machine-learning algorithm: the numbers of correct and incorrect predictions are aggregated as count values and broken down by category [66]. Hence, this study adopts accuracy and the confusion matrix to assess the classification model. Moreover, the precision, recall, and F1-score of all the classifiers are also evaluated. The results from each evaluation metric are discussed in detail in Section 5. A brief sketch of these metrics in code is given below.
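The sketch below shows how these metrics could be computed, assuming scikit-learn; the label vectors are hypothetical.

```python
# Sketch: accuracy, confusion matrix, and per-class precision/recall/F1.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report)

y_true = ["upper", "lower", "whole", "upper", "lower"]  # hypothetical labels
y_pred = ["upper", "lower", "upper", "upper", "lower"]

print(accuracy_score(y_true, y_pred))         # correct predictions / total
print(confusion_matrix(y_true, y_pred))       # counts by true vs. predicted
print(classification_report(y_true, y_pred))  # precision, recall, F1 per class
```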
4.3.2. Integration of Subsystems
Up to this point, the two subsystems were trained independently, i.e., an instance could be classified into either a garment category or a garment sub-category, and each trained classifier worked separately to give a prediction. Moreover, there was no way to handle ambiguous cases, where the classifier could not perform a hard classification, resulting in lower accuracy. To tackle these limitations, the concept of soft classification was adopted, which evaluates the conditional probabilities of each class and then realizes the classification based on the evaluated probabilities [67]. The two subsystems were combined by taking advantage of this characteristic. This section discusses the process of achieving this in detail.
Model Development
Most classification algorithms compute the posterior probability of a class given the learning data. In the case of hard classification, the model directly yields the predicted class, while soft classification yields a list of probabilities of all the classes in the form of an $n \times K$ matrix, where $n$ is the number of data instances and $K$ is the number of classes [68]. Given the complexity of an apparel product, there is a high chance of an ambiguous case occurring in the prediction phase of a classification model. Hence, the concept of soft classification was adopted, which indicates the confidence of a model in its prediction.
Thus, the test dataset from each dataset was used to compute the probabilities of the target classes. For every data instance, the classifier assigned an estimated posterior probability to each class. If the probability mass concentrates in one class, then it is very likely that the instance belongs to that class. However, if the probability mass is highly distributed, then that is considered an ambiguous case, and making the final prediction using a threshold value becomes important. By using a threshold, the classifier considers the classes with probabilities above the given threshold and classifies the instance in question accordingly.
For the mathematical formulation of this model, let us consider that the apparel product dataset is represented by $X = \{I_1, I_2, \dots, I_N\}$, where $N$ is the total number of instances in the dataset. Each instance is of the form $I_i = (a_i, y_i)$, where $a_i$ is a set of product attributes represented by $a_i = (a_{i1}, a_{i2}, \dots, a_{im})$ and $y_i$ is a target class from the set represented by $\{c_1, c_2, \dots, c_K\}$. The set of instances $X$ is divided into two sets, a train set $X_{train}$ and a test set $X_{test}$. The instances in $X_{train}$ are used to train the model $C_A$, i.e., the model to classify garment categories (upper, lower, and whole). Similarly, models $C_U$, $C_L$, and $C_W$ are trained to classify garment sub-categories belonging to upper, lower, and whole-body garments, respectively. The datasets used for training these models are explained in Section 4.2.2.

Following this, the test set $X_{test}$ was used to integrate the functionality of the trained models. In this case, the set of features $a_i$ from $X_{test}$ was used. When an instance from $X_{test}$ is given to the model $C_A$, it makes a decision $d_1$ among the class probabilities $P_A = (p_U, p_L, p_W)$, and the final decision is made using the following formulation:

$$d_1 = \underset{c \in \{U, L, W\}}{\arg\max}\; p_c$$

Depending on the decision $d_1$, the instance passes through one of the classifiers $C_U$, $C_L$, or $C_W$, where $C$ signifies the classifier and the subscript indicates the respective dataset U, L, or W, as described in Section 4.2.1. If $d_1$ is lower (L), then $C_L$ will be utilized for the further classification of the instance and makes a decision $d_2$ from the class probabilities in $P_L = (p_1, p_2, \dots, p_{K_L})$, where $K_L$ is the number of target classes in the lower-body garment category, as explained below:

$$d_2 = \begin{cases} \arg\max_k\, p_k, & \text{if}\ p_{max} - p_{max2} \geq \theta \\ \{p_{max},\, p_{max2}\}, & \text{otherwise} \end{cases}$$

where $p_{max}$ represents the maximum in $P_L$, $p_{max2}$ represents the second-highest in $P_L$, and $\theta$ is the pre-set threshold. The accuracy of the model is calculated by checking whether the final label is the same as the class $y_i$ in the test dataset, i.e., whether $d_2 = y_i$. Hence, the resultant class provided by the model is given by $d_2$ (a comprehensive table of the mathematical symbols used is available in the Supplementary Materials as Table S2).
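A minimal sketch of this two-stage procedure, assuming fitted scikit-learn-style classifiers are passed in (one category classifier plus one sub-category classifier per category); the function name and the default threshold value are illustrative, not the study's implementation.

```python
# Sketch: two-stage soft classification with an ambiguity threshold theta.
import numpy as np

def classify(x, c_A, sub_classifiers, theta=0.2):
    """Route instance x through C_A, then through the matching sub-classifier.

    c_A: fitted category classifier (predicts U/L/W).
    sub_classifiers: dict mapping a category label to a fitted sub-classifier.
    """
    # Stage 1 (subsystem 1): decision d1 = argmax of category probabilities.
    p_cat = c_A.predict_proba([x])[0]
    d1 = c_A.classes_[np.argmax(p_cat)]

    # Stage 2 (subsystem 2): soft classification within the chosen category.
    c_sub = sub_classifiers[d1]
    p_sub = c_sub.predict_proba([x])[0]
    order = np.argsort(p_sub)[::-1]             # classes by descending prob.
    p_max, p_max2 = p_sub[order[0]], p_sub[order[1]]

    if p_max - p_max2 >= theta:                 # unambiguous: one label d2
        return d1, [c_sub.classes_[order[0]]]
    # Ambiguous case: return the two most probable sub-categories.
    return d1, [c_sub.classes_[order[0]], c_sub.classes_[order[1]]]
```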
Evaluation
After integrating the two subsystems into a single model, the validation dataset (not used during the model building process) was used to evaluate the model again, to see whether the accuracy of the classifiers improved, as discussed in detail in the next section.
5. Experimentation and Results
This section summarizes the results of the experiments. First, the results from the classification of the individual subsystems are discussed with a comparison between the performances of the four algorithms—Naïve Bayes, Decision Trees, Bayesian Forest, and Random Forest for each dataset. Further, the confusion matrix for each algorithm and subsystem is presented. Following this, the results from the integration of the two subsystems using soft classification are described. Finally, for better comprehension of the working of the entire system, a brief description is provided.
5.1. Analysis of Subsystems
In this study, four algorithms were used to classify the garment data: Naïve Bayes, Decision Trees, Bayesian Forest, and Random Forest. All the classifiers were provided with the same datasets, and the model parameters of each classifier are presented in Table 2. As described in Section 4.3.2, the dataset was divided into training and testing data using ten-fold cross-validation. Figure 4 shows the accuracy of the four classification models for each dataset (A, U, L, and W), as achieved during the k-fold cross-validation; the box plots represent the overall pattern of accuracies achieved by each classifier for each dataset. Further, the evaluation of the models was carried out with a validation dataset to calculate the accuracy, precision, recall, and F1-score, as shown in Table 3. It should be noted that this validation dataset was not used during the model building process. As is evident in Figure 4 and Table 3, for all the datasets, RF achieved the highest performance in terms of accuracy, precision, and recall. The box plot for RF is comparatively short for dataset A, indicating less variation in accuracy during the different training cycles, while for datasets U and W this variation appears larger. This could correspond to the fact that there is a larger number of target classes for these two datasets. For dataset L, even though the box plot is short, the data is skewed towards quartiles 3 and 4; an outlier is also present, which is also the case for DT and RF. An outlier can be seen in DT for all datasets except dataset W. Apart from this, the box plot for NB is comparatively consistent across all datasets, although the accuracy attained by this classifier is the lowest amongst all the classifiers, as shown by the k-fold cross-validation results in the box plots in Figure 4.
As can be further analyzed from Figure 4, dataset U achieved the lowest accuracy for all the classifiers, while datasets A and W achieved the highest. One of the reasons for the low accuracy on dataset U could be more variation in the product types, i.e., the product attributes used in each upper-body garment sub-category vary widely. This corresponds to the fact that, in general, a higher number of styles is available in the upper wear garment category.
To further validate the proposed method, a confusion matrix for each classifier and dataset was constructed using the validation dataset (data instances unseen by the model). As an example, Figure 5 shows the confusion matrices for the RF classifier (the confusion matrices for all of the other classifiers can be found in the Supplementary Materials, Figures S1–S3). Each row represents the instances of the true label, while each column represents the instances of the predicted label. The diagonal represents the number of correct classifications, and the off-diagonal entries represent the misclassifications by the model. As can be seen in Figure 5a, the numbers of correctly classified upper, lower, and whole-body garment categories are 35,402, 11,710, and 18,524, respectively, out of 39,486, 16,775, and 24,661. As in Figure 5b, the most correctly classified garment sub-categories (in lower) are shorts, skirts, and jeans. Similarly, in Figure 5d, tee, blouse, and tank, and in Figure 5c, dress, romper, and jumpsuit, are the top three most correctly classified garment sub-categories.
5.2. Analysis of the Integrated System
Until this point, the two subsystems worked independently, with an average accuracy of 71%. To integrate the two subsystems, handle ambiguous cases, and improve the accuracy of classification, the concept of soft classification was introduced, as discussed in Section 4.3.2. To do this, the pre-trained classifiers provided the probabilities of the classes instead of yielding the predicted class. Subsystem 1 predicted the probability of the garment categories (upper, lower, or whole-body garment), and subsystem 2 predicted the probability of the garment sub-categories (dress, blouse, tee, capris, trousers, etc.). The integrated model was presented in Section 4.3, with an instance shown in Figure 2.
To present an overview of the working of the whole system, let us consider the following instance. When subsystem 1 receives a string of garment attributes, it first tries to label the data instance as one of the three target classes: upper, lower, or whole-body garment. The class with the highest probability is considered the resultant label from subsystem 1. If the label of the new data is lower-body garment, the string of garment attributes then passes through the second subsystem. Since it has already been determined that it is a lower-body garment, the classifier trained with dataset L is activated and further tries to label the data instance with a specific lower garment sub-category. In this case, the classifier computes the probabilities of all the lower garment sub-category classes and compares these values to a pre-set threshold; based on this, subsystem 2 decides the label of the new data instance according to the highest probability. In another case, two labels at subsystem 2 may have equal or very close probabilities. If the classifier simply returned the class with the highest probability, even when the difference between the two values is as low as 0.1, the classification result could be considered biased: even though the new data instance is close to more than one type of lower garment sub-category, the classifier would not handle this ambiguity well. For this reason, having subsystem 2 provide the probabilities of these two classes, instead of a single predicted class, can help make an intelligent decision, in turn improving the model accuracy for future data instances. In this way, the system becomes equipped to handle ambiguous cases, which can occur frequently in a large dataset, given the complexity of an apparel product.
The change in classification accuracy due to the aforementioned algorithm can be seen in Figure 6. To compute the accuracy of the integrated model, the validation set (not used throughout the model building process) was used. As is visible, the accuracy for all the classifiers at the different thresholds (0.1, 0.2, 0.3, and 0.4) for datasets U, L, and W improved considerably. In Figure 6d, the accuracy for dataset U increased from 75% to around 85%; a similar increment can be observed for this dataset with the other classifiers as well. Dataset W reached an accuracy greater than 95% for the random forest classifier, which is considered good performance for a classification model. For all the datasets, the accuracy remains the greatest with the random forest classifier, in correspondence with the results presented in Figure 4 and Figure 5.
6. Conclusions
The term big data has become extremely prevalent in the business world, leading to an increase in the use of techniques such as data mining, machine learning, and artificial intelligence. Businesses are extensively applying these techniques to collect data on sales trends to better understand everything from marketing and inventory needs to acquiring new leads. Data mining is one of the most used techniques due to its ability to analyze a large amount of data for solving business problems. These problems can be targeted by focusing on the business database already present, consisting of customer choices, past transactions, and product profiles.
This study recognizes the importance of product data and uses open-source product attribute data (namely DeepFashion) from the apparel industry to create a classification model that can identify the garment category (upper, lower, or whole-body garment) and garment sub-category (dress, blouse, capris, trousers, etc.). To do this, four classification algorithms were employed: Decision Trees, Naïve Bayes, Bayesian Forest, and Random Forest. The classification model consists of two individual subsystems: (1) to identify the garment category and (2) to identify the garment sub-category. After this, the two subsystems were integrated using soft classification to handle ambiguous cases and improve the overall accuracy of the classification model. It was observed that the performance of the Random Forest classifier was comparatively better, with accuracies of 86%, 73%, 82%, and 90%, respectively, for the garment category and the sub-categories of upper-body, lower-body, and whole-body garments. The reason behind the comparatively better performance of the random forest classifier lies in the fact that it creates a large number of uncorrelated trees that are averaged to reduce bias and variance, and it handles unbalanced data very well.
Every garment retailer and/or production house collects similar data related to its garments, i.e., the garment categories and attributes in its archive. In addition, these are also the details present on the product pages of e-commerce websites. Hence, the data can be obtained from these sources and used to create a segmentation based on the attributes used in various garments. This segmentation can be used to classify the data based on the methodology described in this article. Such a classification can have various applications, such as improving existing recommendation algorithms by providing words instead of images, enhancing parsing algorithms, etc. In addition, as discussed in [69], living in a digital age, massive datasets are available in various formats, making it essential to design approaches to handle the access and integration of such data. The presented model can be trained with additional data formats and, hence, incorporate accessing and integrating data from multiple resources (especially data from the internet), as it provides a uniform terminology of garment categories, sub-categories, and their attributes.
This study presents a preliminary investigation and, hence, there are several potential avenues for future work, such as an in-depth evaluation of why the upper-body garment dataset exhibits the lowest classification accuracy for all the algorithms and how it can be improved. The threshold of the feature selection process can be varied to observe how it affects the model performance. The accuracy of the model can be further improved with the help of a richer dataset, as the dataset employed in this study suffers from a few limitations, such as data imbalance and the presence of too many negative attributes. Moreover, an application of the proposed model can be realized in a decision support system or a recommendation system that supports the customer in decision making during purchase. Additionally, the proposed framework can be tested with advanced techniques, such as deep learning, to enhance model performance. Further, with data growing at an unprecedented rate, its handling and management incur additional costs (manually labeling data collected through the internet, in particular, is not only expensive but labor-intensive). Hence, the proposed model can be utilized to support the transfer from manual to automatic labeling of internet data. In the future, we would also like to compare the performance of the algorithms based on whether the input is textual or visual.