CHAPTER 7
1. Introduction
Members providing input, raw materials, products or information to companies for the realization of a good or service
are called suppliers. The supplier selection process consists of several steps such as identifying the purpose, determining
the criteria for the purpose, pre-evaluation of the appropriate suppliers found according to the specified criteria and then
making the final selection. In today’s competitive conditions, effective supplier selection, management and development are crucial for companies to achieve their goals: given the effect of purchased material on the product to be produced, the better the quality, cost and on-time delivery of that material, the higher the market value of the resulting goods, which in turn gives the business a competitive advantage. To support the success of the partnership, full cooperation between manufacturers, suppliers and the suppliers of those suppliers is required. Once
businesses have identified appropriate suppliers and gathered information about them, they evaluate potential suppliers
according to the determined criteria. There are three main criteria for purchasing: quality, cost and delivery. However, the points to be considered when choosing suppliers have changed over time with the development of the supply chain concept and especially with the effects of Industry 4.0; whereas only a few criteria such as reasonable price, quality and close distance were once demanded, supplier selection has become a process in itself and the number of criteria has increased. The criteria may vary according to the purpose and the product to be supplied, and should be defined accordingly.
The supplier selection process does not end with finding the desired supplier; rather, it is a continuous process that aims to monitor, develop and, if required, replace existing suppliers with new ones that may offer greater benefit in terms of the criteria. With Industry 4.0, a great deal of information is now available on supply chains. Digital technologies enable flexible decision-making by providing real-time data for all links/members of supply chains (Cavalcante et al. 2019). In addition, rapid developments in information technology make it easier to collect, transmit and store information. It is necessary to identify an effective method for evaluating suppliers in the information society, where everything is shaped according to
1 Department of Computer Engineering, Konya Technical University, Faculty of Engineering and Natural Sciences, Konya, Turkey.
2 Department of Computer Programming, Kulu Vocational School, Selcuk University, Konya, Turkey; Email: eesme@selcuk.edu.tr
3 Department of Industrial Engineering, Konya Technical University, Faculty of Engineering and Natural Sciences, Konya, Turkey. Emails: belkistorgul@gmail.com; tpaksoy@yahoo.com
* Corresponding author: mskiran@ktun.edu.tr
104 Logistics 4.0: Digital Transformation of Supply Chain Management
information. In the age of Industry 4.0, in order to achieve smart results by using all these data effectively, in this chapter we apply machine-learning methods, which can analyze large and varied data sets, to our supplier selection problem.
The chapter is organized as follows: Section 2 presents the current literature on supplier selection methods and on studies that apply machine learning to supplier selection. Section 3 provides fundamental content covering machine learning, learning types and learning tasks. Section 4 describes the use of WEKA. Section 5 illustrates the classification of the supplier chain data on the WEKA platform with four classification algorithms and, finally, Section 6 presents conclusions.
2. Literature
In the current literature, multi-criteria decision making (MCDM) approaches, which support decision makers in evaluating potential alternatives according to several criteria, have frequently been used for the supplier selection problem, such as the Analytic hierarchy process (AHP) (Chan 2003; Liu and Hai 2005) and the Analytic network process (ANP) (Sarkis and Talluri 2002; Gencer and Gurpinar 2007). In particular, fuzzy set theory with MCDM methods has been widely used
to deal with uncertainty in supplier selection decision-making, such as the Fuzzy AHP (Chan and Kumar 2007; Chan et
al. 2008; Lee 2009; Buyukozkan and Cifci 2011), Fuzzy ANP (Razmi et al. 2009; Kang et al. 2012; Zhang et al. 2015;
Chen et al. 2018), Fuzzy technique for order of preference by similarity to ideal solution (TOPSIS) (Chen et al. 2006;
Awasthi et al. 2010; Kilic 2013; Kumar et al. 2018; Yu et al. 2019), Fuzzy multi criteria optimization and compromise
solution (VIKOR) (Awasthi and Kannan 2016), Fuzzy Multi objective optimization by ratio analysis (MOORA) (Dey et
al. 2012), Fuzzy Elimination and choice expressing reality (ELECTRE) (Sevkli 2010), Fuzzy Decision making trial and
evaluation laboratory (DEMATEL) (Keskin 2015) and combinations thereof, such as the Fuzzy AHP‒TOPSIS (Chen and
Yang 2011), Fuzzy ANP‒TOPSIS (Kuo et al. 2015), Fuzzy AHP‒VIKOR (Mohammady and Amid 2011), etc. Another
commonly used methodology is mathematical programming techniques, such as linear programming (Tiwari et al. 2012),
integer programming (Ding et al. 2009), mixed integer programming (Amid et al. 2009), multi objective programming
(Wu et al. 2010) and goal programming (Mirzaee et al. 2018), again especially in the fuzzy environment. In addition,
Stochastic Programming (Talluri and Lee 2010), Non-linear programming (Yang et al. 2007), Artificial Intelligence models
(Heuristic Algorithms, Neural Networks, Gray System Theory, Rough Set Theory, Case Based Reasoning, ...) have also
started to be applied for supplier selection problems (Guo et al. 2009; Guo et al. 2014).
Machine learning has only recently been applied in supply chain management. Despite the remarkable improvements that machine learning techniques have brought to supply chain management, they have attracted researchers’ attention only recently and, therefore, studies on the evaluation and selection of suppliers are still few.
Valluri and Croson (2005) used agent-based modeling for a supplier selection problem. They modeled two techniques for determining exploration reference points, an auction style focusing on probability of success and a newsvendor style focusing on profitability, and studied the dynamics of high-quality and low-quality supplier interactions. Finally, they
showed that it is definitely better for the buyer to take action with a few suppliers. Guo et al. (2009) introduced a new
support vector machine technology combined with decision tree to address feature selection and multiclass classification
on supplier selection and tested the proposed approach on the data from China. Tang (2009) proposed the support vector
machine, a relatively new machine learning technology, for the assessment of logistics suppliers under small-sample conditions. Mori et al. (2012) proposed an AI-based approach to find plausible candidates for business partners and used machine-learning techniques to build a prediction model of customer–supplier relationships for 30,660 manufacturing firms in Tokyo, Japan. Omurca (2013) proposed a new solution hybridizing fuzzy c-means as a machine learning
technique and rough set theory techniques for supplier evaluation, development and selection problem. The proposed
method selects the best supplier(s), clusters all of the suppliers, decides the most important criteria and extracts the
decision rules about data. Guo et al. (2014) suggested a model based on semi-fuzzy support vector domain description
to address the multi-classification problem of supplier selection. They used the semi-fuzzy kernel clustering algorithm to divide the original samples into two subsets, deterministic and fuzzy, and used a cooperative coevolution algorithm for decision making. Finally, they tested the proposed model on the data from China. Mirkouei and Haapala (2014) suggested an
integration of machine learning techniques (Support Vector Machine Method) and a mathematical programming model
to select the most appropriate feedstock suppliers. Allgurin and Karlsson (2018) provided a framework for implementing
the Machine Learning algorithm for a qualitative case study of the supplier selection process in Bufab Sweden AB. They
identified 26 variables that are critical for supplier selection and prepared theory and empirical data and then ranked
identified variables by considering Machine Learning algorithms. Cavalcante et al. (2019) developed a hybrid approach
that combines machine learning and simulation and examined its application to data-driven decision-making support in resilient supplier selection.
Supplier Selection with Machine Learning Algorithms 105
3. Machine Learning
Man has struggled to invent and develop various tools to cope with the challenges of meeting his needs throughout history.
Some of the inventions that were the products of intelligence expressed as creative problem-solving skills had an effect far
beyond meeting the needs, and even influenced our way of life. Is it possible that intelligence is a gift that is given only
to mankind? Is it possible to produce machines that can imitate cognitive skills like comprehension, application, analysis,
and synthesis? The “artificial intelligence” concept, which John McCarthy, who was a pioneering American computer
scientist in his field, described as “science and engineering of making intelligent machines” was used for the first time in
1956 at “The Dartmouth College Artificial Intelligence Conference: The Next Fifty Years”, which was organized by him;
and was born as a discipline (Moor 2006). The first examples of Artificial Intelligence were able to produce problem-
focused, specific solutions with classical programming approaches. In other words, machines that can react by detecting
the situations around them can be said to imitate an intelligence; however, it is very difficult to develop programs in areas
where we do not know exactly how the human brain works, where conditions vary and cannot be defined clearly (Hinton
2013). As an alternative to this difficulty in programming, the data mining approaches, which emerged as computers
accelerated and as the Internet became more widespread, have led to significant developments in machine learning methods.
The term “Machine Learning” was first used in 1959 by Arthur Samuel, a pioneer in the fields of computer gaming and artificial intelligence, and it constitutes a sub-field of Artificial Intelligence. To understand machine learning better, it is worth hearing what the masters of this field have said about it. Arthur Samuel defined machine learning as “Machine Learning is
the field of study that gives computers the ability to learn without being explicitly programmed” (Samuel 1959). Yoshua
Bengio, who is known for his works on artificial neural networks and deep learning, defines machine learning as “Machine
learning research is part of research on artificial intelligence, seeking to provide knowledge to computers through data,
observations and interacting with the world. That acquired knowledge allows computers to correctly generalize to new
settings”. Tom Mitchell, American computer scientist and E. Fredkin University Professor at the Carnegie Mellon University,
explained machine learning in a mathematical form as “A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves
with experience E” (Mitchell 1997). Based on these definitions, the task of machine learning algorithms may be summarized as follows: to discover the patterns in the data at hand, to develop a model for the solution of the problem, and to generalize it, in other words, to produce accurate results for new situations. In this respect, the field is closely related to computational statistics, mathematical optimization, probability theory and data mining in order to carry out tasks like clustering, classification, regression and estimation. Although there is no clarity and consensus
in the literature, in the common sense, machine learning algorithms may be classified according to the learning type as
Supervised, Unsupervised, and Reinforcement Learning. On the other hand, the problem types that are handled may be
categorized as Classification, Regression, Clustering, Association Rules, Dimensional Reduction, and Density Estimation
(Liao et al. 2012; Shalev-Shwartz and Ben-David 2014; Neapolitan and Jiang 2018).
[Figure: machine learning tasks and representative algorithms, including K-Means, Fuzzy C-Means, Soft Clustering, Self-Organizing Maps, PCA, Association Rules and Neural Networks.]
3.2.1 Classification
Each observation or sample in the dataset belongs to a category. The data set may consist of only two categories (binomial, binary) or more than two categories (multinomial, multi-class). The category is often called the class, label, or target. Classification problems are supervised learning tasks: the algorithm trained with the dataset at hand is expected to learn the categories in the dataset and then assign a new observation to the category it belongs to.
3.2.2 Regression
The target values are continuous in regression problems, which are supervised learning tasks. The regression approach predicts the target value by determining the linear or nonlinear relation between two (simple regression) or more (multiple regression) variables.
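The simple (two-variable) case described above can be sketched in a few lines of Python with ordinary least squares. This sketch is only illustrative and is independent of the chapter's WEKA workflow; the function names are our own.

```python
# Minimal simple linear regression by ordinary least squares.
# slope = covariance(x, y) / variance(x); intercept from the means.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Predict the continuous target for a new input value."""
    return slope * x + intercept
```

For points lying exactly on y = 2x + 1, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` recovers slope 2 and intercept 1; multiple regression generalizes the same idea to several input variables.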
3.2.3 Clustering
It divides the observations or instances in the dataset into groups based on the similarity of their features. It is an unsupervised learning task. Similarity is usually expressed through a distance, a norm in geometry, and is obtained by calculation. The most commonly used distance function is the Euclidean distance.
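The Euclidean distance and its use for grouping can be sketched as below. This is an illustrative Python fragment (not WEKA code); `assign_to_clusters` shows only the assignment step that clustering algorithms such as k-means repeat, with our own function names.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_clusters(points, centroids):
    """One assignment step: index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            for p in points]
```

For example, with centroids (1, 1) and (9, 9), the point (0, 0) is assigned to the first group and (10, 10) to the second, purely from the distances between feature vectors.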
4. Introduction to WEKA
WEKA, which stands for “Waikato Environment for Knowledge Analysis”, is a comprehensive collection of machine learning algorithms employed in data mining tasks and is developed at the University of Waikato in New Zealand. WEKA is coded in Java and is open-source software released under the GNU General Public License. It can be run on Windows, Macintosh and Linux operating systems and almost all platforms. By connecting to databases via the Java Database Connectivity (JDBC) driver, it can process a query result and store the results of the transaction in the databases.
WEKA is kept up-to-date with commendable efforts by its developers to include even the latest algorithms in the
field of data mining. The current algorithms are included in the form of plug-in packages, and users can access and install
the packages through the package management system. Thanks to its diversity of algorithms, it paves the way for users
to solve their problems with different and up-to-date methods and to compare solution methods without the need to write code.
The GUI Chooser shown in Figure 1, which welcomes the user when WEKA starts, allows switching between five
interactive interfaces. The Explorer is the basic section that contains the tools for the algorithms used to examine and analyze
a dataset and visualization. The Experimenter provides performance statistics by benchmarking different classifiers or filters
applied to the problem. In addition, advanced users can distribute the computational load to multiple machines by using
a Java remote method invocation. The Knowledge Flow is an interface which serves to establish learning models in the
form of a data stream by combining graphical blocks representing data sources, preprocessing tools, learning algorithms,
evaluation methods and visualization modules. The Workbench binds all the graphical interfaces within a single window in which the appearance of applications and plug-ins can be customized. Even if the interactive interfaces fulfill the needs of many problems, in case they are inadequate for advanced analysis, the SimpleCLI, a text-based coding section, completes the task. The SimpleCLI is also advantageous in terms of memory consumption.
4.1 ARFF
Attribute-Relation File Format (ARFF), developed for WEKA, is a text-based dataset file consisting of two distinct sections called HEADER and DATA, as shown in Figure 2 below. The Header section contains informative lines about the source and content of the data set, written after the “%” character to inform the user; these comment lines are not interpreted by WEKA. In addition to the comment lines, @relation gives the descriptive name of the data set, and this information is displayed in the Current Relation field of the Preprocess perspective when the data set is loaded. The last part of the Header lists the attributes with their types, exposing the data structure. @attribute gives the attribute name
followed by its type definition or nominal values. Type definitions comprise numeric, string, date and relational. Curly brackets, commas and spaces are used when composing nominal values. The data section, which starts with the @data tag, contains one instance per row, where it is mandatory that the value sequence corresponds to the attribute definitions and order declared by @attribute. An unknown attribute value is indicated by a question mark instead of its value.
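The structure just described is simple enough that a minimal reader can be sketched in a few lines. The sketch below is illustrative Python (WEKA itself is Java) and handles only the subset of ARFF used in this chapter: nominal and numeric attributes, “%” comments, and dense comma-separated data rows.

```python
def parse_arff(text):
    """Tiny ARFF reader: returns (relation, attributes, instances).
    attributes is a list of (name, type-or-nominal-values) pairs."""
    relation, attributes, instances = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):   # skip blanks and comments
            continue
        low = line.lower()
        if low.startswith('@relation'):
            relation = line.split(None, 1)[1]
        elif low.startswith('@attribute'):
            name, typ = line.split(None, 2)[1:3]
            attributes.append((name, typ))
        elif low.startswith('@data'):
            in_data = True
        elif in_data:
            instances.append(line.split(','))  # '?' stays as an unknown marker
    return relation, attributes, instances
```

Feeding it the weather file shown later in Figure 2 yields the relation name, the attribute declarations in order, and one value list per data row.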
4.2 Explorer
It is the most commonly used interface, in which the tasks of the data mining process are provided to the user in six perspectives: preprocess, classify, cluster, associate, select attributes and visualize. Initially, discovery of the data to be analyzed is started in the preprocess perspective, because the other perspectives remain invisible unless a data set is properly loaded in the preprocessing section.
4.2.1 Preprocess
The data to be processed can be obtained from a file, a URL address or a database source, or can be created artificially by means of DataGenerators. Furthermore, a built-in editor is also provided for manual manipulation of data. This first-step window performs two important tasks on the data set. The former provides information, such as the number of instances, the number of attributes, and statistical information for each attribute, to grasp the structure of the data set. The latter provides a variety of filters implementing processes such as cleansing, transforming, integrating, reducing and discretizing. In the Filter Panel, filter parameters can be assigned in the TextField as text or by selecting options in the GenericObjectEditor, the visual interface that appears when left-clicking. Changes made to the data can be undone or saved for later use. The Preprocess screen can be viewed in Figure 3.
4.2.2 Classify
Numerous classifiers, accessed by clicking the Classifier button, are organized according to key approaches as shown in Figure 4. Those of the provided algorithms compatible with the dataset loaded in the previous step are visible, while the others are hidden in the list. The parameters of the selected algorithm can be edited with its TextField or GenericObjectEditor, as with the filters in the previous step. Four types of methods are presented to evaluate classifier performance.
1. Use training set: The classifier is tested with the data set used in its training.
2. Supplied test set: The classifier is tested with an exterior data set that is not used in its training.
3. Cross Validation: The data set is subdivided into groups and each one is held for testing, while others are used for
training.
4. Percentage Split: The data set is subdivided into a training set and a test set based on a user-defined percentage.
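The last two options can be made concrete with a short sketch. The Python fragment below is only illustrative (WEKA performs these splits internally); it produces the index sets for a percentage split and for k-fold cross-validation, with our own function names.

```python
import random

def percentage_split(n_instances, train_pct, seed=0):
    """Shuffle instance indices and split them train/test by percentage."""
    idx = list(range(n_instances))
    random.Random(seed).shuffle(idx)
    cut = int(n_instances * train_pct / 100)
    return idx[:cut], idx[cut:]

def cross_validation_folds(n_instances, k):
    """Yield (train_indices, test_indices) for each of the k folds;
    every instance is held out for testing exactly once."""
    idx = list(range(n_instances))
    for f in range(k):
        test = idx[f::k]                      # every k-th instance held out
        train = [i for i in idx if i not in set(test)]
        yield train, test
```

With 10 instances, a 70% split gives 7 training and 3 test indices, and 5-fold cross-validation gives five disjoint test sets of 2 instances each, so each instance is tested exactly once.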
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
Measurement of classifier performance can be elaborated using additional evaluation options and specific evaluation
metrics. These extra options are invoked via the more options button on the test panel. Classifier Output Panel is the area
where the results of the training and test operations are explained. The structure of the data set, the learning scheme and
the test statistics are presented in detail here. As for the Result List Panel, it holds a list of results for each classification
attempt. Through this panel, the user can compare the results of classification experiments, graphically review the results,
and also store them.
4.2.3 Cluster
The Clusterer button brings up the list of clustering schemes. As in the classify perspective, the parameters of the chosen algorithm can be edited with its TextField or GenericObjectEditor. The Ignore Attributes button, which excludes undesired attributes, is located under the Cluster Mode panel. The Store Clusters option, which determines whether the clustering results will be visualized, matters for data sets requiring enormous memory usage. Four methods are present for evaluating the clustering performance:
Fig. 3: Preprocess.
Fig. 4: Classify.
1. Use training set: The training set is classified according to the clusters obtained and the number of instances per
cluster is calculated.
2. Supplied test set: The boundaries of the clusters can be evaluated on separate test data.
3. Percentage split: The data set is split into two parts, a training set and a test set, according to a user-defined percentage. The clusters generated using the training segment are evaluated with the test segment.
4. Classes to clusters evaluation: Clustering is assessed by taking into account predefined classes in the data set and
results are represented in the confusion matrix.
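A confusion matrix of the kind produced by the classes-to-clusters evaluation can be computed as below. This is an illustrative Python sketch (WEKA builds this table itself); rows are the actual classes, columns the predicted classes or clusters.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows = actual class, columns = predicted class/cluster."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m
```

The diagonal holds the correctly assigned instances, so the sum of the diagonal divided by the total number of instances gives the accuracy.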
The Result List and Clusterer Output Panels are no different from those of the classify perspective shown in Figure 5.
Fig. 5: Cluster.
4.2.4 Associate
The algorithms in this perspective, shown in Figure 6, reveal the association rules among the attributes in a data set. The algorithms are selected from the Associator field. The parameters of the algorithms can be edited with the TextField or GenericObjectEditor. Once the Start button is clicked, the rules obtained are listed in the Associator Output field.
Fig. 6: Associate.
4.2.6 Visualize
Two-dimensional graphs displaying the distribution of the attributes in the data set are accessed through the Visualize panel shown in Figure 8. Graphs can be constructed with user-defined attributes or data instances, and arrangements regarding the appearance of the graphs, such as color and size, can be made.
Fig. 8: Visualize.
4.3 Experimenter
Whereas the Explorer can only run schemes individually, more comprehensive experiments can be carried out with the Experimenter facilities. In contrast to the Explorer, a data set can be processed by a number of algorithms and, moreover, more than one data set can be processed with one or more algorithms in the application. Consequently, the user has the opportunity to analyze the performance of algorithms run on the data sets. An experiment consists of three steps entitled Setup, Run, and Analyse.
4.3.1 Setup
Setup, shown in Figure 9, has two views: simple and advanced. Whereas Simple has a plainer display for the user, Advanced offers access to all options. The test results can be stored in several alternative recording formats such as an ARFF file, a CSV file or a JDBC database. Storing the results in a database is more advantageous in terms of time consumption for experiments that are interrupted or to be extended later. Just as in the classification counterpart, the cross-validation or percentage split techniques can be used for the testing and training process. It is important to repeat the training to generate more reliable results. In the Iteration Control field, the number of repetitions is set and, when working with multiple algorithms on multiple data sets, whether the data sets or the algorithms are handled first is chosen. In the Datasets and Algorithms panels, once the data sets and learning schemes to be examined are selected, they become ready to run. It is also possible to load and store the settings of algorithms that require multiple parameter settings.
As for the advanced interface, the Result Generator Panel has been added to allow the user to determine the result
generators, which is the detailed equivalent of the experiment type in the simple view. Apart from the Result Generator,
there is a Distribute Experiment Panel that distributes the processing load to the other nodes in the network. A database
server, computers and properly generated remote engine policy file are required to perform this feature.
Fig. 9: Setup.
4.3.2 Run
The experiment is launched by clicking the Start button in the Run tab shown in Figure 10. After the selected learning schemes have been applied to the data sets, a message stating that the operations were completed without error should be received. The results are stored at the file path specified in the Result Destination field on the Setup tab.
4.3.3 Analyse
If already saved, the experiment is accessed from a file or database source; otherwise, clicking the Experiment button will bind the experiment that has just been finalized on the Run tab. Perform Test is the button which generates detailed statistics, but the test must first be configured by selecting the options in the Configure Test field. Numerous criteria are proposed to assess the performance of the learning schemes. These criteria can be viewed in Figure 11.
1. T-Test: Measures whether there is a significant difference between the averages of the compared groups.
2. Select Rows and Cols: Assigns the criteria to the rows and columns of the result table.
3. Comparison Field: Selects the type of statistic to compare.
4. Significance: Specifies the confidence threshold.
5. Sorting (asc) by: Sets the sorting criterion of the table rows.
4.4 KnowledgeFlow
It is an application where all the data mining methods mentioned in the Explorer section are served in iconic form. The block functions representing the operation processes are connected through link nodes on the edges of their symbols, thereby composing a flowchart that executes the work. Unlike the Explorer, in the KnowledgeFlow shown in Figure 12 learning can be continuous and components can be applied sequentially, e.g., the data can be passed through multiple filters. In addition, each of the components is executed as a separate thread.
To summarize the usage of this perspective, filters, classifiers, clusters, association rules and attribute selectors, as well
as other tools, are located in folders named after them in the Design Field built into the left edge of the perspective. These
components are placed on a layout canvas by drag and drop in the order required by the data mining tasks and thereafter
the parameters of the components can be edited by double-clicking or selecting configure option from the shortcut menu.
As for general handling in a data flow, the DataSources tools are used to obtain the dataset from a data source, or it can be generated through the DataGenerators tools. Evaluation tools are used to determine the column that holds the class information in the dataset, the way the training and test data will be obtained, e.g., split or cross-validation, and the criteria for evaluating the test phase. Using the components in the visualization category, the evaluation results can be represented as text or as a chart. By means of DataSinks components, a data set subdivision, a trained model, a chart or text-based information can be recorded. Extra tools for managing the data flow are also available in the Flow and Tools categories.
4.5 Workbench
The Workbench, added with WEKA version 3.8, brings together the perspectives described so far under the same roof. As viewed in Figure 13, on the Workbench each perspective has its own tab located at the top of the layout. The user can define the settings of the perspectives, such as initial settings, default values and appearance. Apart from these adjustments, there is no difference in the functionality of the perspectives from those previously described. These settings can be accessed with the gear-illustrated button located to the left of the perspectives.
As an example, in the screenshot shown in Figure 14, some initial and default settings belonging to the clustering options are shown. It is also a pleasing alternative that the text and background colors of the output panel are left to the user’s preference.
4.6 SimpleCLI
The Java packages running behind the interactive interfaces can be invoked directly through WEKA’s command-line interface. The help command lists the main commands of the SimpleCLI.
1. capabilities <classname> <args>: Lists the capabilities of the specified class. If the class is a weka.core.OptionHandler
then trailing options after the classname will be set as well.
2. cls: Clears the output area.
3. echo msg: Outputs a message.
package, which is at a higher level, and the classifiers package is contained in the weka package, which is at the top level, i.e., weka > classifiers > lazy > IBk. Java packages are called with the java <classname> <args> command. In the example, k-Nearest Neighbour is run on the iris.arff dataset with default parameters. The parameter “-t” specifies the dataset to be used for training.
As shown in Figure 15, the Tab key functions as a command completer. In the example, the Tab key lists matching packages/commands after “java.weka.a” and “java.weka.c”. If the command is composed up to the classifier name, a description of both the general parameters used by each classifier and the classifier-specific parameters is displayed. Detailed information on the schemes, algorithms and parameters of the packages can be obtained from the WEKA documentation pages (Bouckaert et al. 2018).
5. Application
The classification process has been demonstrated on supply chain data using four different classifiers: Decision Tree, Naive Bayes (NB), K-Nearest Neighbor (k-NN), and Artificial Neural Network (ANN).
H(D) = -\sum_{i=1}^{c} p_i \log_2(p_i)    (1)

Here, D refers to the observations in the dataset, c refers to the number of classes, p_i refers to the probability of class i, and H(D) refers to the entropy of the dataset in Equation 1. Entropy quantifies the uncertainty of an outcome and is inversely proportional to the amount of information acquired. The entropy of an attribute is calculated with Equation 2.
H_A(D) = \sum_{j=1}^{s} \frac{|D_j|}{|D|} H(D_j)    (2)

Information Gain Ratio = \frac{H_A(D)}{H(D)}    (3)
Here, H_A(D) denotes the entropy of the discriminating attribute A, which takes s distinct values and thereby divides the dataset into s subsets. The attribute with the minimum Information Gain Ratio, the ratio of the information acquisition of the tested attribute to the total information acquisition, is determined as the decision node.
P(C|X) = \frac{P(X|C) P(C)}{P(X)}    (4)
Here,
P(C|X) is the posterior probability of Class given Observation.
P(X|C) is the likelihood which is the probability of Observation given Class.
P(C) is the prior probability of Class.
P(X) is the prior probability of Observation.
As seen in Equation 5, the Naive Bayes Classifier takes the likelihood as the product of the conditional probabilities of the
individual attributes, and since P(X) is equal for all classes, the observation X is considered to belong to the class that
maximizes P(X|C)P(C).

P(X|C) = \prod_{k=1}^{n} P(x_k|C) \qquad (5)

Since the Naive Bayes Classifier is not an iterative calculation method, it can quickly classify big data sets
with high accuracy rates.
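A minimal sketch of Equations 4–5 for categorical attributes in Python (an illustrative toy implementation, not WEKA's NaiveBayes; the two-attribute training set below is hypothetical):

```python
from collections import Counter

def naive_bayes_predict(X_train, y_train, x_new):
    # Pick the class C maximizing P(C) * prod_k P(x_k | C) (Equations 4-5);
    # P(X) is omitted because it is the same for every class.
    n = len(y_train)
    best_class, best_score = None, -1.0
    for c, n_c in Counter(y_train).items():
        score = n_c / n  # prior P(C)
        rows = [x for x, y in zip(X_train, y_train) if y == c]
        for k, v in enumerate(x_new):
            # conditional P(x_k = v | C), estimated by counting
            score *= sum(1 for r in rows if r[k] == v) / n_c
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical mini training set: two Likert-style attributes, two classes
X = [(1, 9), (2, 8), (8, 2), (9, 1)]
y = ["low", "low", "high", "high"]
print(naive_bayes_predict(X, y, (2, 9)))  # -> low
```

In practice the counting estimates are usually smoothed (e.g., Laplace correction) so that a single unseen attribute value does not zero out the whole product.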
The distance between X_u, the attribute vector of the observation whose class is sought, and X_i, the attribute
vectors of all the observations in the training set, is calculated as in the distance Equation 6.
The most common metric used in similarity measurement is the Euclidean Distance, an application
of the Pythagorean Theorem, formulated in Equation 7. It is calculated by taking the square root of the sum of the
squared differences between the attribute variables of the new observation and those of the neighboring
previous observations.

d(X_u, X_i) = \sqrt{\sum_{k=1}^{n} (x_{u,k} - x_{i,k})^2} \qquad (7)
The class to which the majority of the k most similar observations belong, found from the similarity vector calculated
with Equation 6, is considered as the class of the new observation.
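The k-NN procedure above can be sketched in Python as follows (an illustrative re-implementation, not WEKA's IBk; the two-attribute supplier vectors and risk classes are hypothetical):

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x_u, k=3):
    # Euclidean distance (Equation 7): square root of summed squared differences
    dist = lambda a, b: math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # Rank all training observations by their distance to the new observation x_u
    neighbors = sorted(zip(X_train, y_train), key=lambda p: dist(p[0], x_u))
    # Majority class among the k most similar observations
    votes = Counter(y for _, y in neighbors[:k])
    return votes.most_common(1)[0][0]

# Hypothetical data: [quality %, on-time %] attribute vectors with risk classes
X = [(95, 90), (92, 88), (70, 60), (65, 55), (90, 85)]
y = [3, 3, 1, 1, 3]
print(knn_predict(X, y, (91, 87), k=3))  # -> 3
```

A new supplier close to the high-class examples is voted into Class 3 by its three nearest neighbours.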
In this exemplary application, suppliers are evaluated according to 10 criteria. Table 1 shows the criteria
along with their related sources and detailed descriptions. The Quality and On-Time Delivery criteria are expressed as
percentages, and the other criteria take discrete values between 1 and 9 on a Likert scale. The data set contains
1000 randomly generated instances, each with 10 attributes. The class distribution is as follows: 144 samples belong
to the low-grade supplier represented by Class 1, 764 samples belong to the middle-class supplier represented by
Class 2, and 92 belong to the high-class supplier represented by Class 3. The class information is in the
last column. The constructed dataset is shown in Figure 18 in the Arff format.

Table 1: Supplier selection criteria with their related sources and definitions.
The dataset opened in WEKA Explorer is shown in Figure 19. The Preprocess screen has several parts that inform
the user about the contents of the dataset: the number of samples, the number of attributes, statistical information on
the attribute values, the list of attributes, and a bar graph showing the class distributions. However, the main function
of Preprocess is that it includes a variety of filters to prepare the data for processing.
Table 1 (excerpt):
Mutual trust: assurance of the quality of the service offered by the supplier and liabilities between the buyer and the supplier.
9. Communication system (Taherdoost and Brard 2019): supplier's easy communication system with information on the order's progress; the level of trust and communication with potential suppliers.
10. Reputation and position in industry (Dickson 1966, Weber et al. 1991, Jain et al. 2009, Deshmukh and Chaudhari 2011, Taherdoost and Brard 2019): factors such as market share, status, image, past performance and reputation of potential suppliers.

When the model is established, if the dataset is in one piece as it is here, there are two approaches to creating the training and
test sets. The first one is the Percentage Split Method, which divides the dataset into two parts, the training set and the
test set, at a user-specified rate. Here, the drawback is that the classification results may be very good or very bad,
because the training and test sets created by the split may not represent the overall characteristics of the data well.
The second is Cross Validation. In this approach, the dataset is divided into a number of subsets, widely 10. Each
subset is used once as the test set while the rest serve as the training set, and the average of the results over all
subsets is taken as the final classification performance.
Figure 20 shows the classification perspective of the Supply Chain Data. The testing was done with the Cross-Validation
Method. The Classifier Output screen provides detailed statistical data for the classification process. The results of the
statistics are given in Table 2 and the confusion matrices of the classifiers are given in Table 3.
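The cross-validation procedure described above can be sketched as follows (an illustrative Python sketch, not WEKA's implementation; the data, the fold assignment by index striding, and the majority-class baseline scorer are all hypothetical):

```python
def cross_validate(X, y, k, train_and_score):
    # Split the data into k subsets; each subset serves once as the test set
    # while the remaining observations form the training set; the mean
    # accuracy over the k folds is the final classification performance.
    folds = [list(range(i, len(X), k)) for i in range(k)]
    scores = []
    for test_idx in folds:
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        scores.append(train_and_score(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(scores) / k

# Hypothetical scorer: a majority-class baseline classifier
def baseline(X_tr, y_tr, X_te, y_te):
    from collections import Counter
    majority = Counter(y_tr).most_common(1)[0][0]
    return sum(1 for t in y_te if t == majority) / len(y_te)

X = list(range(10))
y = [2] * 8 + [1] * 2          # imbalanced classes, like the supplier data
print(cross_validate(X, y, 5, baseline))  # -> 0.8
```

With imbalanced classes such as the supplier data, stratified folds (preserving the class ratio in each subset) are generally preferred; WEKA's default cross-validation is stratified.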
@relation SupplyChain
@data
81,76,3,5,1,2,2,3,5,2,1
75,89,3,4,6,2,2,4,5,3,1
81,71,5,2,5,2,1,4,5,3,1
83,83,2,6,2,4,2,5,2,3,1
87,75,2,1,7,3,1,2,4,2,1
81,71,1,4,2,8,1,6,2,2,1
...
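Each row of the Figure 18 data consists of 10 comma-separated attribute values followed by the class label, so it can be split into attribute vectors and labels as sketched below (a minimal illustration; only the first three rows are reproduced, and the variable names are hypothetical):

```python
# Rows in the style of Figure 18: 10 attribute values, then the class label
raw = """81,76,3,5,1,2,2,3,5,2,1
75,89,3,4,6,2,2,4,5,3,1
81,71,5,2,5,2,1,4,5,3,1"""

X, y = [], []
for line in raw.splitlines():
    *attrs, label = [int(v) for v in line.split(",")]
    X.append(attrs)   # 10 criteria values (Quality %, On-Time %, Likert 1-9 scores)
    y.append(label)   # risk class: 1 = low-grade, 2 = middle, 3 = high

print(len(X[0]), y)  # -> 10 [1, 1, 1]
```

In practice WEKA reads the full ARFF file directly, including the @attribute declarations that precede @data.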
According to Table 2, all four methods correctly classified more than 80% of the suppliers. However, the ANN
method showed the best performance with 92.6%. Looking at Table 3, the ANN method also predicted Classes 1 and 2
better than the other methods, and was outperformed on Class 3 only by the NB method. As a result, we can say that
ANN is the right choice for achieving the best results in supplier evaluation.
The suppliers who are members of Class 1 are the suppliers with a high risk level, and their conditions regarding the critical
criteria should be improved first. The firm is recommended to reduce or eliminate the identified high risks while
guiding suppliers of this class. If no recovery can be achieved, these suppliers are
eliminated. The suppliers who are members of Class 2 are the suppliers with a medium risk level. Their basic characteristics
(primary criteria) are in good condition and they are potentially recommended candidates; however, the findings identified
as risky for the company should be corrected. The evaluation process continues until these suppliers enter Class 3. The
suppliers who are members of Class 3 are the suppliers with low risk. They can be chosen to establish a long-term relationship
and do not require any action from the firm.
Supplier Selection with Machine Learning Algorithms 123
6. Conclusion
With the growth of supply chains, the complex and large amounts of data involved have become difficult to analyze, and supplier
selection has come to be influenced by many attributes whose effects are too complex to be determined by conventional methods.
In this chapter, we introduced a new solution approach to supplier evaluation. First, 10 criteria were selected from the
current literature to evaluate suppliers, and each was assessed for its importance in selecting an appropriate
supplier. Then, a hypothetical data set was created for 1000 supplier profiles evaluated according to these criteria. Artificial
neural networks, decision trees, Bayesian classifiers and k-nearest neighbor were applied to classify these data in the
WEKA machine learning tool. Three classes were determined for the risk profiles of the suppliers, and the attitudes the
enterprises should take toward their suppliers in each class were proposed. The results show that the use of the J48, NB, k-NN
and ANN algorithms with the WEKA machine learning tool can support the supplier selection decision-making process and may
lead to improvements in suppliers' risk reduction decisions and efforts.

In the next step, different algorithms can be run on the data in WEKA and the results can be compared. The proposed
approach is flexible, and so it can be used to find new partners and is easily applicable to other real-case supplier
selection problems regardless of the dataset size.
References
Allgurin, A. and F. Karlsson. 2018. Exploring Machine Learning for Supplier Selection: A case study at Bufab Sweden AB. Independent
thesis Advanced level (degree of Master (One Year)) Student Thesis.
Amid, A., S.H. Ghodsypour and C. O’Brien. 2009. A weighted additive fuzzy multiobjective model for the supplier selection problem under price breaks in a supply chain. International Journal of Production Economics, 121(2): 323–332.
Awasthi, A., S.S. Chauhan and S.K. Goyal. 2010. A fuzzy multicriteria approach for evaluating environmental performance of suppliers. International Journal of Production Economics, 126(2): 370–378.
Awasthi, A. and G. Kannan. 2016. Green supplier development program selection using NGT and VIKOR under fuzzy environment. Computers & Industrial Engineering, 91: 100–108.
Bouckaert, R.R., E. Frank, M. Hall, R. Kirkby, P. Reutemann, A. Seewald and D. Scuse. 2018. WEKA manual for version 3-9-3. The University of Waikato, Hamilton, New Zealand.
Buyukozkan, G. and G. Cifci. 2011. A novel fuzzy multi-criteria decision framework for sustainable supplier selection with incomplete information. Computers in Industry, 62(2): 164–174.
Buyukozkan, G. and F. Gocer. 2017. Application of a new combined intuitionistic fuzzy MCDM approach based on axiomatic design methodology for the supplier selection problem. Applied Soft Computing, 52: 1222–1238.
Cavalcante, I.M., E.M. Frazzon, F.A. Forcellini and D. Ivanov. 2019. A supervised machine learning approach to data-driven simulation of resilient supplier selection in digital manufacturing. International Journal of Information Management, 49: 86–97.
Chan, F.T.S. 2003. Interactive selection model for supplier selection process: an analytical hierarchy process approach. International Journal of Production Research, 41(15): 3549–3579.
Chan, F.T.S. and N. Kumar. 2007. Global supplier development considering risk factors using fuzzy extended AHP-based approach. Omega-International Journal of Management Science, 35(4): 417–431.
Chan, F.T.S., N. Kumar, M.K. Tiwari, H.C.W. Lau and K.L. Choy. 2008. Global supplier selection: a fuzzy-AHP approach. International Journal of Production Research, 46(14): 3825–3857.
Chen, C.T., C.T. Lin and S.F. Huang. 2006. A fuzzy approach for supplier evaluation and selection in supply chain management. International Journal of Production Economics, 102(2): 289–301.
Chen, Y.G., S. Wang, J.R. Yao, Y.X. Li and S.Q. Yang. 2018. Socially responsible supplier selection and sustainable supply chain development: A combined approach of total interpretive structural modeling and fuzzy analytic network process. Business Strategy and the Environment, 27(8): 1708–1719.
Chen, Z.P. and W. Yang. 2011. An MAGDM based on constrained FAHP and FTOPSIS and its application to supplier selection. Mathematical and Computer Modelling, 54(11-12): 2802–2815.
Chiou, C.Y., C.W. Hsu and W.Y. Hwang. 2008. Comparative investigation on green supplier selection of the American, Japanese and Taiwanese electronics industry in China. IEEM 2008: International Conference on Industrial Engineering and Engineering Management, 1–3: 1909–1914.
Deshmukh, A.J. and A.A. Chaudhari. 2011. A review for supplier selection criteria and methods. Technology Systems and Management, 145: 283–291.
Dey, B., B. Bairagi, B. Sarkar and S. Sanyal. 2012. A MOORA based fuzzy multi-criteria decision making approach for supply chain strategy selection. International Journal of Industrial Engineering Computations, 3(4): 649–662.
Dickson, G.W. 1966. An analysis of vendor selection systems and decisions. Journal of Purchasing, 2(1): 5–17.
Ding, R., Y. Chen and L. He. 2009. Fuzzy multi-objective programming model for logistics service supplier selection. CCDC 2009: 21st Chinese Control and Decision Conference, Vols. 1–6, Proceedings: 1527–1531.
Erginel, N. and A. Gecer. 2016. Fuzzy multi-objective decision model for calibration supplier selection problem. Computers & Industrial Engineering, 102: 166–174.
Fix, E. and J.L. Hodges. 1989. Discriminatory analysis—nonparametric discrimination—consistency properties. International Statistical Review, 57(3): 238–247.
Gencer, C. and D. Gurpinar. 2007. Analytic network process in supplier selection: A case study in an electronic firm. Applied Mathematical Modelling, 31(11): 2475–2486.
Guo, X.S., Z.P. Yuan and B.J. Tian. 2009. Supplier selection based on hierarchical potential support vector machine. Expert Systems with Applications, 36(3): 6978–6985.
Guo, X.S., Z.W. Zhu and J. Shi. 2014. Integration of semi-fuzzy SVDD and CC-Rule method for supplier selection. Expert Systems with Applications, 41(4): 2083–2097.
Hinton, G.E. 2007. Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10): 428–434.
Hinton, G.E. 2013. “Lecture 1a: Why do we need machine learning?” http://www.cs.toronto.edu/~tijmen/csc321/lecture_notes.shtml#1a.
Huang, G.B., Q.Y. Zhu and C.K. Siew. 2006. Extreme learning machine: Theory and applications. Neurocomputing, 70(1-3): 489–501.
Jain, V., S. Wadhwa and S.G. Deshmukh. 2009. Select supplier-related issues in modelling a dynamic supply chain: potential, challenges and direction for future research. International Journal of Production Research, 47(11): 3013–3039.
Kang, H.Y., A.H.I. Lee and C.Y. Yang. 2012. A fuzzy ANP model for supplier selection as applied to IC packaging. Journal of Intelligent Manufacturing, 23(5): 1477–1488.
Keskin, G.A. 2015. Using integrated fuzzy DEMATEL and fuzzy C-means algorithm for supplier evaluation and selection. International Journal of Production Research, 53(12): 3586–3602.
Kilic, H.S. 2013. An integrated approach for supplier selection in multi-item/multi-supplier environment. Applied Mathematical Modelling, 37(14-15): 7752–7763.
Kumar, S., S. Kumar and A.G. Barman. 2018. Supplier selection using fuzzy TOPSIS multi criteria model for a small scale steel manufacturing unit. Procedia Computer Science, 133: 905–912.
Kuo, R.J., C.W. Hsu and Y.L. Chen. 2015. Integration of fuzzy ANP and fuzzy TOPSIS for evaluating carbon performance of suppliers. International Journal of Environmental Science and Technology, 12(12): 3863–3876.
Lee, A.H.I. 2009. A fuzzy supplier selection model with the consideration of benefits, opportunities, costs and risks. Expert Systems with Applications, 36(2): 2879–2893.
Liao, S.H., P.H. Chu and P.Y. Hsiao. 2012. Data mining techniques and applications—A decade review from 2000 to 2011. Expert Systems with Applications, 39(12): 11303–11311.
Liu, F.H.F. and H.L. Hai. 2005. The voting analytic hierarchy process method for selecting supplier. International Journal of Production Economics, 97(3): 308–317.
Mirkouei, A. and K.R. Haapala. 2014. Integration of machine learning and mathematical programming methods into the biomass feedstock supplier selection process. Flexible Automation and Intelligent Manufacturing.
Mirzaee, H., B. Naderi and S.H.R. Pasandideh. 2018. A preemptive fuzzy goal programming model for generalized supplier selection and order allocation with incremental discount. Computers & Industrial Engineering, 122: 292–302.
Mitchell, T.M. 1997. Machine Learning. New York, McGraw-Hill.
Mohammady, P. and A. Amid. 2011. Integrated fuzzy AHP and fuzzy VIKOR model for supplier selection in an agile and modular virtual enterprise. Fuzzy Information and Engineering, 3(4): 411–431.
Moor, J. 2006. The Dartmouth College Artificial Intelligence Conference: The next fifty years. AI Magazine, 27(4): 87–91.
Mori, J., Y. Kajikawa, H. Kashima and I. Sakata. 2012. Machine learning approach for finding business partners and building reciprocal relationships. Expert Systems with Applications, 39(12): 10402–10407.
Neapolitan, R.E. and X. Jiang. 2018. Artificial Intelligence: With an Introduction to Machine Learning. CRC Press.
Omurca, S.I. 2013. An intelligent supplier evaluation, selection and development system. Applied Soft Computing, 13(1): 690–697.
Quinlan, J.R. 1993. C4.5: Programs for Machine Learning. San Mateo, Calif., Morgan Kaufmann Publishers.
Razmi, J., H. Rafiei and M. Hashemi. 2009. Designing a decision support system to evaluate and select suppliers using fuzzy analytic network process. Computers & Industrial Engineering, 57(4): 1282–1290.
Samuel, A.L. 1959. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3(3): 210–229.
Sarkar, A. and P.K. Mohapatra. 2006. Evaluation of supplier capability and performance: A method for supply base reduction. Journal of Purchasing and Supply Management, 12(3): 148–163.
Sarkis, J. and S. Talluri. 2002. A model for strategic supplier selection. Journal of Supply Chain Management, 38(4): 18–28.
Sevkli, M. 2010. An application of the fuzzy ELECTRE method for supplier selection. International Journal of Production Research, 48(12): 3393–3405.
Shalev-Shwartz, S. and S. Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
Taherdoost, H. and A. Brard. 2019. Analyzing the process of supplier selection criteria and methods. 12th International Conference Interdisciplinarity in Engineering (Inter-Eng 2018), 32: 1024–1034.
Talluri, S. and J.Y. Lee. 2010. Optimal supply contract selection. International Journal of Production Research, 48(24): 7303–7320.
Tang, X.L. 2009. Study on selection of logistics supplier based on support vector machine. Proceedings of 2009 International Conference on Machine Learning and Cybernetics, 1–6: 1231–1235.
Thanaraksakul, W. and B. Phruksaphanrat. 2009. Supplier evaluation framework based on balanced scorecard with integrated corporate social responsibility perspective. IMECS 2009: International Multi-Conference of Engineers and Computer Scientists, Vols. I and II: 1929–1934.
Tiwari, A.K., C. Samuel and A. Tiwari. 2012. Flexibility in Supplier Selection Using Fuzzy Numbers with Nonlinear Membership Functions. New Delhi, Springer India.
Torğul, B. and T. Paksoy. 2019. Smart and sustainable supplier selection for electric car manufacturers. The International Aluminium-Themed Engineering and Natural Sciences Conference (IATENS’19), Seydişehir/Turkey, pp. 1063–1067.
Valluri, A. and D.C. Croson. 2005. Agent learning in supplier selection models. Decision Support Systems, 39(2): 219–240.
Wang, J.W., C.H. Cheng and K.C. Huang. 2009. Fuzzy hierarchical TOPSIS for supplier selection. Applied Soft Computing, 9(1): 377–386.
Weber, C.A., J.R. Current and W.C. Benton. 1991. Vendor selection criteria and methods. European Journal of Operational Research, 50(1): 2–18.
Wu, D.D., Y.D. Zhang, D.X. Wu and D.L. Olson. 2010. Fuzzy multi-objective programming for supplier selection and risk modeling: A possibility approach. European Journal of Operational Research, 200(3): 774–787.
Yang, S.T., J. Yang and L. Abdel-Malek. 2007. Sourcing with random yields and stochastic demand: A newsvendor approach. Computers & Operations Research, 34(12): 3682–3690.
Yu, C.X., Y.F. Shao, K. Wang and L.P. Zhang. 2019. A group decision making sustainable supplier selection approach using extended TOPSIS under interval-valued Pythagorean fuzzy environment. Expert Systems with Applications, 121: 1–17.
Zhang, X.G., Y. Deng, F.T.S. Chan and S. Mahadevan. 2015. A fuzzy extended analytic network process-based approach for global supplier selection. Applied Intelligence, 43(4): 760–772.