Digital Object Identifier 10.1109/ACCESS.2020.3006788
ABSTRACT The ascent of Industry 4.0 and smart manufacturing has emphasized the use of intelligent manufacturing techniques, tools, and methods such as predictive maintenance. The predictive maintenance function facilitates the early detection of faults and errors in machinery before they reach critical stages. This study suggests the design of an experimental predictive maintenance framework, for conveyor motors, that efficiently detects a conveyor system's impairments and considerably reduces the risk of incorrect fault diagnosis in the plant. We achieve this by developing a machine learning model that classifies whether the abnormalities observed are production-threatening or not. We build a classification model using a combination of time-series imaging and a convolutional neural network (CNN) for better accuracy. In this research, time-series represent different observations recorded from the machine over time. Our framework is designed to accommodate both univariate and multivariate time-series as inputs of the model, offering more flexibility to prepare for an Industry 4.0 environment. Because multivariate time-series are challenging to manipulate and visualize, we apply a feature extraction approach called principal component analysis (PCA) to reduce their dimensions to a maximum of two channels. The time-series are encoded into images via the Gramian Angular Field (GAF) method and used as inputs to a CNN model. We added a parametric rectified linear unit (PReLU) activation function option to the CNN model to improve the performance of more extensive networks. All the features listed above contribute to the creation of a robust, future-proof predictive maintenance framework. The experimental results achieved in this study show the advantages of our predictive maintenance framework over traditional classification approaches.
INDEX TERMS Convolutional neural network (CNN), Gramian angular field (GAF), industry 4.0 (I40),
predictive maintenance, principal component analysis (PCA), smart manufacturing, time-series imaging.
I. INTRODUCTION
The recent explosion of smart manufacturing applications, the Internet of Things (IoT), and big data has considerably increased the amount of data collected and analyzed in different areas such as health care, transportation, power and energy, food and beverage, multimedia, environment, finance, and logistics. Several types of predictions, such as production forecasting, fault detection, and predictive maintenance, result from analyzing various datasets [1], [2]. One of the most common data types collected in this new growing era of Industry 4.0 is time-series data. Time-series data are known as observations sequentially recorded over time [3], [4].
Time-series data are intensively analyzed, as a preventive tool, in the manufacturing industry, where unforeseen failures of machinery can lead to very long production downtime and losses. Studying and analyzing data to detect faults and threats in devices before they occur, and taking appropriate measures to reduce the risk of failures, is called ‘‘predictive maintenance’’ [5]. As per [6], predictive maintenance is an ensemble of activities that detect any abnormal physical condition changes in equipment (signs of failure) to carry out the required maintenance tasks to boost the service life of equipment without increasing the risk of failure.
The associate editor coordinating the review of this manuscript and approving it for publication was Qinfen Lu.
For the past years, predictive maintenance has been the subject of much research aimed at improvement. One of the current innovative trends for this concept is the use of machine learning (ML) techniques in combination with advanced technological concepts to offer better predictive maintenance results.
Machine learning (ML) is a field of Artificial Intelligence (AI) that extracts useful insights from various data (such as time-series data) [7] through some of the following paradigms: supervised learning, semi-supervised learning, unsupervised learning, and reinforcement learning [8]. It is also commonly known as a study that offers machines different means and ways to make correct decisions on their own and execute tasks without explicit assistance from human beings. Deep Learning is a branch of ML that has the capability of extracting data representations. Some popular deep learning methods are Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), Deep Belief Networks, Recurrent Neural Networks, and Stacked Auto-Encoders [9]. In this research, we focus on CNN, which is a deep learning technique that tries to imitate the operations of a human brain, especially its ability to recognize and classify objects based on their appearances. This feature has made CNN the conventional method for image classification and identification [10]. In 2015, [11] initiated an inventive approach that improved classification and imputation by encoding univariate time-series (UTS) data to images and using them as inputs to CNN models. The concept of computer vision introduced the transformation of time-series into images. By learning spatially invariant features from raw time series (the inputs to the model), the CNN method can reduce the risk of losing temporal information and the risk that the learned features are no longer time-invariant, which occur with the traditional multilayer-perceptron approach [12]. The outcome of this study generated better results than traditional machine learning techniques for classification, such as decision tree (DT), random forest (RF), or Support Vector Machine (SVM). Since then, a few more studies were conducted in the same vein, utilizing time-series imaging encoding and deep learning approaches to improve classification modeling in various sectors. Reference [13] developed a similar framework that uses a Relative Position Matrix with CNN. The method was named RPMCNN and performs classification by transforming time-series data received as inputs into 2D images. Their results displayed improved performances. In the manufacturing sector, an approach was introduced by [14] using multivariate time-series (MTS) data as input to a CNN model for tool wear classification. Because of the large volume of MTS data, and in order to ease data processing, this approach divided MTS inputs into three channels before converting them to images and feeding them into the CNN model. Reference [3] conducted another research effort in that direction by converting MTS data to colored images and feeding them as inputs of a CNN model for sensor classification. Their research encodes MTS data into multiple images combined into a single bigger image used as an input to a CNN model.
Our research takes a step further on previous work done in the manufacturing sector on this innovative concept by developing an experimental framework that:
1) Generates accurate predictive maintenance flags for conveyor motors by classifying whether observed system parameter inputs are threats or not.
2) Combines the use of UTS and MTS in one single platform to increase the flexibility of the system, with no need for separate models.
3) Facilitates the input and manipulation of MTS data in CNN by reducing their size to two channels through a feature extraction method called principal component analysis (PCA).
4) Offers an option for a future-proof CNN model by using the parametric rectified linear unit (PReLU) configuration to improve the performance of larger networks.
Our paper is structured as follows: Section 2 presents a literature review of concepts such as time-series, deep learning, predictive maintenance with machine learning, and imaging time-series for classification. Section 3 describes the methodologies, technological approaches, and architecture of our predictive maintenance framework. Section 4 presents the experimental results obtained, and the conclusion and suggestions for future research are provided in Section 5.

II. LITERATURE REVIEW
A. TIME-SERIES DATA
As mentioned previously, a time-series can be defined as a sequence of observations recorded over successive time points [15]. Time-series data can be grouped into two main categories: Univariate Time Series (UTS) and Multivariate Time Series (MTS). A UTS is a time series composed of a single variable observed over a regular period of time. An MTS is made of two or more variables recorded over a successive period of time [16].
Equation (1) is a mathematical representation of a UTS, defined as follows:

B = [b_1, b_2, b_3, \cdots, b_n, \cdots, b_t]    (1)

where b_n ∈ R and t ∈ N represents the size of the time series data.
On the other hand, (2) is an expression for an MTS:

D = [B_1, B_2, B_3, \cdots, B_i, \cdots, B_m]    (2)

where m ∈ N represents the size of the MTS (m is also equal to the number of univariate time series in D) and i is the unique position identification for each UTS in D. As per (2), D contains several UTS similar to those defined in (1). For an MTS D regrouping a number of UTS B, a single UTS object can be defined by (3) as:

B_i = [b_{i(1)}, b_{i(2)}, b_{i(3)}, \cdots, b_{i(n)}, \cdots, b_{i(t)}]    (3)

where t ∈ N is the size of the UTS [17], [18] and i is the unique position identification for each UTS in the MTS.
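To make the distinction concrete, the following short Python sketch (our own illustration, not code from the original study) represents a UTS as a one-dimensional NumPy array and an MTS as a stack of several UTS, matching the notation of (1)-(3); the second and third variables are arbitrary placeholders.

```python
import numpy as np

# Univariate time series (UTS): t observations of a single variable, as in (1).
t = 8
B = np.array([0.2, 0.5, 0.3, 0.7, 0.9, 0.4, 0.6, 0.8])   # shape (t,)

# Multivariate time series (MTS): m univariate series stacked row-wise, as in (2).
D = np.stack([
    B,                      # B_1
    np.sin(np.arange(t)),   # B_2 (illustrative second variable)
    np.linspace(0, 1, t),   # B_3 (illustrative third variable)
])                          # shape (m, t) = (3, 8)

print(B.shape, D.shape)     # (8,) (3, 8)
```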
spatial correlation between each polar point and determine their GASF and GADF. The GASF mathematical representation is presented in (8) and (9). The GADF is defined in (10) and (11).

GASF = \cos(\theta_i + \theta_j)    (8)

GASF = \tilde{b}' \cdot \tilde{b} - \sqrt{1 - \tilde{b}'^2} \cdot \sqrt{1 - \tilde{b}^2}    (9)

GADF = \sin(\theta_i - \theta_j)    (10)

GADF = \sqrt{1 - \tilde{b}'^2} \cdot \tilde{b} - \tilde{b}' \cdot \sqrt{1 - \tilde{b}^2}    (11)

A popular mathematical representation of GAF is done in a matrix format and defined by (12) and (13):

GASF = \begin{bmatrix} \cos(\theta_1 + \theta_1) & \cdots & \cos(\theta_1 + \theta_n) \\ \vdots & \ddots & \vdots \\ \cos(\theta_m + \theta_1) & \cdots & \cos(\theta_m + \theta_n) \end{bmatrix}    (12)

GADF = \begin{bmatrix} \sin(\theta_1 - \theta_1) & \cdots & \sin(\theta_1 - \theta_n) \\ \vdots & \ddots & \vdots \\ \sin(\theta_m - \theta_1) & \cdots & \sin(\theta_m - \theta_n) \end{bmatrix}    (13)

A graphical representation of steps to convert time series to GAF is displayed in Fig. 3.
In this research, we focus on the GAF method for image encoding since it preserves the temporal correlation of time series data inputs, which is needed for our predictive maintenance framework.
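As an illustration of the encoding in (8)-(13) (our own sketch, not code from the original study), the snippet below rescales a series to [-1, 1], maps it to polar angles, and builds the GASF and GADF matrices with NumPy; libraries such as pyts provide a similar GramianAngularField transformer.

```python
import numpy as np

def gramian_angular_fields(series):
    """Encode a univariate series into GASF and GADF matrices, as in (8)-(13)."""
    b = np.asarray(series, dtype=float)
    # Min-max rescale the series to [-1, 1] so that arccos is defined.
    b_tilde = 2 * (b - b.min()) / (b.max() - b.min()) - 1
    theta = np.arccos(b_tilde)                      # polar angular coordinate
    # Pairwise sums/differences of angles via broadcasting.
    gasf = np.cos(theta[:, None] + theta[None, :])  # eq. (12)
    gadf = np.sin(theta[:, None] - theta[None, :])  # eq. (13)
    return gasf, gadf

gasf, gadf = gramian_angular_fields([0.2, 0.5, 0.3, 0.9, 0.7])
print(gasf.shape, gadf.shape)  # (5, 5) (5, 5)
```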
C. CONVOLUTIONAL NEURAL NETWORK (CNN)
A convolutional neural network (CNN) is a deep learning algorithm successfully used for image classification problems. The outstanding performance of CNN in image classification (computer vision) is due to its ability to extract meaningful spatial correlations and create feature information from the input data, which is used to detect patterns. The animal visual cortex was the inspiration behind the CNN concept, following the work of Hubel and Wiesel, two neurophysiologists who conducted extensive research on the visual cortical neurons of monkeys and cats [26]. The first modern CNN framework was called LeNet and was published by [27]. After this first model, several other successful architectures followed, such as ResNet [28], AlexNet [10], VGGNet [29], and Inception v3 [30].
An underlying CNN architecture has the following layers:

1) A CONVOLUTIONAL LAYER
This layer extracts the input image features by using filters (feature detectors) and generating a smaller image containing the original input image features. The result of the convolutional layer is called a feature map. Before going to the next layer, in most CNN architectures, an activation function is applied to the feature maps to increase the non-linearity of the image (useful to avoid linearity since most images have predominantly non-linear features). One of the most popular activation functions used in deep learning over the past few years is the Rectified Linear Unit (ReLU) [31].

2) A POOLING LAYER
The pooling layer's objectives are to generate spatially invariant features for the image (the ability to recognize the image in positions different from the input image) and to reduce the size of the feature maps. One standard pooling method is ‘‘Max Pooling’’ [32].
Many other convolutional and pooling layers can be added before the flattening layer to improve the accuracy of a CNN model.

3) A FLATTENING LAYER
The flattening layer converts the pooled feature map matrix (2D) into a vector (1D) input for the neural network (the next layer) [3].

4) A FULL CONNECTION LAYER
The full connection layer is a neural network composed of several layers of neurons interconnected through synapses and converging to the final outputs. The full connection layer is where all the classification intelligence of the CNN happens. The first neuron layer receives its input from the previous flattening layer, and the data go through several hidden layers before producing the results.
Note: One way to improve a CNN model's accuracy is to pass through the model forward and backward several times, adjusting the weights of the inputs (iterating over the dataset) based on the obtained output results until we achieve the desired accuracy. The number of times the dataset is iterated is called the number of epochs [33].
A graphical representation of the basic structure of a CNN architecture is presented in Fig. 4.
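For concreteness, the following Keras sketch stacks the layers described above (convolution + ReLU, max pooling, flattening, and fully connected layers). It is a minimal illustration under assumed input size (64 x 64 images with two channels) and layer widths, not the exact architecture used in this study.

```python
from tensorflow.keras import layers, models

# Minimal CNN following the layer sequence described above.
# Input shape and layer sizes are illustrative assumptions.
model = models.Sequential([
    layers.Input(shape=(64, 64, 2)),                 # e.g. GAF images as 2 channels
    layers.Conv2D(32, (3, 3), activation="relu"),    # convolutional layer + ReLU
    layers.MaxPooling2D((2, 2)),                     # pooling layer
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                # flattening layer
    layers.Dense(128, activation="relu"),            # full connection (hidden) layer
    layers.Dense(3, activation="softmax"),           # 3 classes: NF, MF, CF
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training would iterate over the dataset for a chosen number of epochs, e.g.:
# model.fit(X_train, y_train, epochs=3, validation_data=(X_val, y_val))
```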
D. PREDICTIVE MAINTENANCE APPROACHES USING MACHINE LEARNING TECHNIQUES
Predictive maintenance is an advantageous approach to ensure smooth and reliable operation of production processes. For the past years, researchers have conducted many studies to improve predictive maintenance techniques. One innovative trend introduced in the research field is the use of machine learning techniques to improve this concept's outcomes. A study was proposed by [19] to apply ML to the predictive maintenance of gas turbines. They focused on using SVM and Regularized Least Squares (RLS) to predict the appropriate maintenance time of the gas turbines based on their speed, compressor decay, and gas turbine decay. The results showed that the SVM outperformed the RLS model. Another method proposed by Leahy et al. [20] uses SVM to perform predictive maintenance on wind turbines. Their approach used ML to create a classification model for six faults in wind turbines: fault/no-fault, feeding faults, air cooling faults, excitation faults, generator heating faults, and mains failure. By using SVM hyperparameter optimization through randomized search, they reached better results on detecting generator heating faults, classifying 100% of the cases correctly. However, on the fault/no-fault dataset, they got only 90% recall and 8% precision. Susto et al. [21] introduced another ensemble approach to detect the best moment for tungsten filament replacement during ion implantation, a step in the process of manufacturing semiconductors. The authors tested ensembles of predictive maintenance with K-Nearest Neighbor (KNN) and predictive maintenance with SVM; the predictive maintenance with SVM gave slightly better results than the KNN approach. In [22], the authors used the random forest (RF) ML technique to generate a predictive maintenance approach for a cutting machine. The RF model used different rotor statuses of the cutting machine to perform classification in the predictive maintenance scheme. Kulkarni et al. [23] worked on a refrigeration and cold storage system by developing an ML-based approach that performs predictive maintenance by detecting early faults on the machinery involved in the refrigeration. They applied a feature extraction step in the pre-processing phase of the model, which consisted of learning the pattern of the dataset and seasonality decomposition by dynamic time warping and clustering. They also built an RF classifier to recognize whether the pattern was abnormal or not.
Following the same path to improve the quality of predictive maintenance approaches implemented for systems, our study applies CNN modeling, which can extract feature representations, to offer better results than traditional ML techniques. Unlike traditional ML techniques such as RF or SVM, CNN has the advantage of accommodating a very high number of features by quickly determining which ones have higher weights (are more influential to the system) than the others, therefore eliminating unnecessary ones. In this era of IoT and big data, where massive amounts of data are available every day, it is convenient to integrate such a feature into modeling techniques.
Having a sound theoretical background on all the useful concepts used in this study, let us have a detailed look at the methodology applied to construct the predictive maintenance framework. This methodology focuses on reducing the dataset dimension (PCA) and using CNN to achieve accurate classification.

III. PREDICTIVE MAINTENANCE FRAMEWORK METHODOLOGY
This experimental predictive maintenance framework aims to classify conveyor motor states as dangerous or not dangerous by encoding time series as images and feeding them into a CNN model that performs the classification task. The framework consists of the following stages (a short illustrative sketch tying the stages together follows this list):

1) FEEDING STAGE
In order to accommodate both time-series types, this stage is responsible for the separation of MTS and UTS; it therefore has two inputs. The feeding stage has a sub-stage for MTS data inputs, called the ‘‘Dimensionality Reduction Stage,’’ which aims to reduce the size of MTS inputs to two channels using an approach called principal component analysis (PCA). By reducing the size of MTS data, the system's complexity decreases and the performance improves (the data processing volume reduces considerably).

2) IMAGING STAGE
At this stage, the time series received from the feeding step, either a UTS or a reduced MTS, are converted into images using the GAF method.

3) CNN CLASSIFICATION MODELING STAGE
This stage receives encoded images from the previous step and performs a classification task using the CNN method. In this research, we add an option in the CNN model that uses the Parametric Rectified Linear Unit (PReLU) activation function to improve the non-linearity feature of the input images and to achieve better accuracy at the output when using extensive input networks. Since we built our predictive model for small manufacturing industries, the performance results obtained using both CNN with the classic Rectified Linear Unit (ReLU) and PReLU are very similar.
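Tying the three stages together, the sketch below is our own illustration of how a single input window could flow through the framework; the helper names (a fitted pca, a gaf_encoder, and a trained model) are assumptions for illustration, not components published with the study.

```python
import numpy as np

def classify_window(window, pca, gaf_encoder, model):
    """Feeding -> Imaging -> CNN classification, following the three stages above.

    `window` is a (t,) UTS or an (m, t) MTS; `pca`, `gaf_encoder`, and `model`
    are assumed to be already fitted/trained components (hypothetical names).
    """
    window = np.atleast_2d(np.asarray(window, dtype=float))  # UTS -> shape (1, t)
    if window.shape[0] > 2:
        # Dimensionality reduction sub-stage: reduce the MTS to two channels.
        window = pca.transform(window.T).T                   # -> shape (2, t)
    # Imaging stage: encode every channel as a (t, t) GAF image.
    images = np.stack([gaf_encoder(channel) for channel in window], axis=-1)
    # CNN classification modeling stage: probabilities for NF / MF / CF.
    return model.predict(images[np.newaxis, ...])
```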
Note: Although the performance improvement of CNN models with PReLU over those using ReLU has been shown by some authors to be very small (about 1% to 2% in accuracy), this could make a massive difference in a manufacturing environment where the availability of systems and machines is essential to production. A 1% gain in accuracy could be the information needed to avoid chaos in the plant.
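As a minimal sketch of this option (assuming the Keras-style stack outlined earlier, not the authors' exact code), PReLU can be used as a separate layer in place of the fixed ReLU activation, so that the slope of the negative part is learned during training:

```python
from tensorflow.keras import layers, models

# Same kind of stack as before, but with learnable PReLU activations (illustrative sizes).
model_prelu = models.Sequential([
    layers.Input(shape=(64, 64, 2)),
    layers.Conv2D(32, (3, 3)),
    layers.PReLU(),                 # slope of the negative side is learned
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128),
    layers.PReLU(),
    layers.Dense(3, activation="softmax"),
])
```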
A. PRINCIPAL COMPONENT ANALYSIS (PCA) TECHNIQUE
PCA is an unsupervised learning method, meaning that it does not make use of the dependent variable to perform its operations. A. Yunusa-Kaltungo et al. [54] describe a similar approach to reduce data dimension for fault diagnosis on rotating machines. To achieve dimensionality reduction using PCA, we go through the following steps:

1) PRE-REDUCTION OF DATASET DIMENSION
In this study, we consider datasets of time-series variables, which are observations of different conveyor motor parameters indicating threats to the system (these could be observations of any other system). Our experimental data are composed of 12 parameters: 11 observed variables (Vibration speed, Motor torque, Acceleration, Motor speed, Air pressure, Product weight, Deceleration, Current, Belt tension, Motor tension, and Temperature) and one outcome, which is the type of fault detected in the system. Each parameter has about 15,000 observations or values recorded during a specific interval. We express the overall dataset dimension as p + 1, with p being the number of independent time-series variables and one being the dependent variable or label (in our case, the type of fault generated). We discard the label and remain with p as the new dimension of our dataset; in this case, p = 11.

2) CALCULATE THE AVERAGE OF EVERY DIMENSION OF THE NEW DATASET
Since the new size of our dataset is p = 11, the dataset is composed of eleven time-series variables, or eleven vectors of observations. In this research, they can be detailed as follows:

P_1 (vibration speed) = [p_{1,1}, p_{1,2}, \cdots, p_{1,n}]
P_2 (motor torque) = [p_{2,1}, p_{2,2}, \cdots, p_{2,n}]
P_3 (acceleration) = [p_{3,1}, p_{3,2}, \cdots, p_{3,n}]
P_4 (motor speed) = [p_{4,1}, p_{4,2}, \cdots, p_{4,n}]
P_5 (air pressure) = [p_{5,1}, p_{5,2}, \cdots, p_{5,n}]
P_6 (product weight) = [p_{6,1}, p_{6,2}, \cdots, p_{6,n}]
P_7 (deceleration) = [p_{7,1}, p_{7,2}, \cdots, p_{7,n}]
P_8 (current) = [p_{8,1}, p_{8,2}, \cdots, p_{8,n}]
P_9 (belt tension) = [p_{9,1}, p_{9,2}, \cdots, p_{9,n}]
P_10 (motor tension) = [p_{10,1}, p_{10,2}, \cdots, p_{10,n}]
P_11 (temperature) = [p_{11,1}, p_{11,2}, \cdots, p_{11,n}]

where n is the length of each time-series variable. In this experiment, let us assume n = 15,000; we can generate a matrix (14) of size p × n (11 × 15,000), representing
TABLE 1. Variance-covariance vector relationship.

4) FIND THE EIGENVALUES AND THEIR EIGENVECTORS
An Eigenvector is, in simple terms, a vector that does not change direction when the linear transformation is applied to it [34]. Let us assume our square variance-covariance matrix to be defined by (19):

CVM_{p=11} = \begin{bmatrix} VC(P_1,P_1) & \cdots & VC(P_1,P_6) & \cdots & VC(P_1,P_{11}) \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ VC(P_6,P_1) & \cdots & VC(P_6,P_6) & \cdots & VC(P_6,P_{11}) \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ VC(P_{11},P_1) & \cdots & VC(P_{11},P_6) & \cdots & VC(P_{11},P_{11}) \end{bmatrix}    (19)

The mathematical expression to find the Eigenvalues of the CVM matrix is its characteristic equation, \det(CVM - \lambda I) = 0, where I is the identity matrix and \lambda denotes an Eigenvalue.

5) REDUCE DATASET DIMENSION BY KEEPING EIGENVECTORS WITH HIGHEST EIGENVALUES
The Eigenvectors with the smallest Eigenvalues carry the least information about our data. To effectively reduce the dimension of the dataset, we focus on the Eigenvectors corresponding to the higher Eigenvalues. Since we would like to reduce the size to a dimension of 2 (a two-channel input), we only select the two highest Eigenvalues and their Eigenvectors. If we consider \lambda_1 and \lambda_2 to be the two Eigenvalues with the highest values, with \lambda_1 > \lambda_2, their corresponding Eigenvectors can be combined into a new matrix (23):

G = \begin{bmatrix} e_{1,1} & e_{1,2} & \cdots & e_{1,11} \\ e_{2,1} & e_{2,2} & \cdots & e_{2,11} \end{bmatrix}    (23)
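The following NumPy sketch (our own illustration with random placeholder data, not the study's dataset) mirrors these steps: center the 11 × n matrix, compute the variance-covariance matrix, take its eigendecomposition, and keep the two eigenvectors with the largest eigenvalues to project the data onto two channels. In practice, sklearn.decomposition.PCA(n_components=2) performs the same reduction.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 11, 15_000
X = rng.normal(size=(p, n))            # placeholder for the 11 x 15,000 matrix (14)

# Subtract the mean of every dimension and build the variance-covariance matrix (19).
X_centered = X - X.mean(axis=1, keepdims=True)
cvm = np.cov(X_centered)               # shape (11, 11)

# Step 4: eigenvalues and eigenvectors of the symmetric CVM matrix.
eigvals, eigvecs = np.linalg.eigh(cvm)

# Step 5: keep the two eigenvectors with the highest eigenvalues (matrix G in (23)).
order = np.argsort(eigvals)[::-1][:2]
G = eigvecs[:, order].T                # shape (2, 11)

# Project the original data onto the two retained components (the 2 channels).
X_reduced = G @ X_centered             # shape (2, 15000)
print(X_reduced.shape)
```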
TABLE 2. Vibration severity criteria based on ISO 2372 [5].
TABLE 3. Time-series dataset variables.
TABLE 4. PCA settings.
... workflow, we apply PCA to reduce the dimension of the independent variables to a maximum of two channels.
FIGURE 13. ‘‘No Fault (NF)’’ motor status sample on GAF images.
FIGURE 14. ‘‘Minor Fault (MF)’’ motor status sample on GAF images.
• Precision: Another evaluation metric is precision, defined as the percentage of correct predictions for each individual class over the total number of instances predicted for that class. In this research, our dataset has three classes for the classification model: no fault (NF), minor fault (MF), and critical fault (CF). Equation (31) is the mathematical expression for the precision:

precision = \frac{CP_m}{P_m} \times 100\%,  m ∈ N    (31)

where m is the number of classes of the dataset, CP_m is the number of correct predictions for a class, and P_m is the total number of instances predicted for that class (correct and incorrect).
• Recall: The recall is known as the percentage of instances of a class that were correctly predicted. In other words, it is the ratio of the number of correct predictions of a class over the sum of correct predictions and missed correct predictions. Its mathematical expression is presented in (32):

recall = \frac{CP_m}{CP_m + IP_m} \times 100\%,  m ∈ N    (32)

where m is the number of classes of the dataset, CP_m is the number of correct predictions for a class, and IP_m is the number of instances of that class that were missed (incorrectly predicted as another class).
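As a small illustration (not code from the paper), per-class precision and recall as defined in (31) and (32) can be read directly off a confusion matrix; the three-class matrix below is hypothetical example data.

```python
import numpy as np

def precision_recall_from_confusion(cm):
    """Per-class precision and recall, as in (31) and (32).

    cm[i, j] = number of instances whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)                           # CP_m for each class
    precision = 100.0 * correct / cm.sum(axis=0)    # divide by predicted totals P_m
    recall = 100.0 * correct / cm.sum(axis=1)       # divide by CP_m + IP_m per true class
    return precision, recall

# Hypothetical 3-class example ordered as (CF, MF, NF).
cm = [[50, 2, 0],
      [1, 47, 2],
      [0, 3, 45]]
prec, rec = precision_recall_from_confusion(cm)
print(prec.round(1), rec.round(1))
```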
While accuracy is an evaluation metric for the overall model (all classes included), precision and recall are useful to gain insights on individual classes and to better interpret the behavior of each class.
A confusion matrix is an essential tool that displays a summary of classification results, mainly the actual labels versus the predicted ones [53]. It also allows computing the accuracy, precision, and recall of a classification model. Tables 7, 8, and 9 are the confusion matrices of the three experimental classification models used in our predictive maintenance framework.

TABLE 7. Confusion matrix result of the SVM model.
TABLE 8. Confusion matrix result of CNN + ReLU.
TABLE 9. Confusion matrix result of CNN + PReLU.

On the above confusion matrices, the green-colored cells are the number of correct predictions made by the model for each class. The remaining uncolored cells contain the number of incorrect predictions. The meaning of the labels is (1) CF: critical fault, (2) MF: minor fault, and (3) NF: no-fault. We reduced the dimension of the input data for all three machine learning models by applying PCA.
Note: The confusion matrix results of both CNN models are quite impressive, with 100% positive predictions. We achieved this outstanding prediction by running three epochs, which is the number of times the CNN algorithm learns the model behavior from the available training set data during the training and testing of these models.
TABLE 10. Classification models results summary.

Training and validation (testing) accuracies obtained at the last epoch for both models are within less than 1% of each other (99.xx% versus 100%), which reflects models with a lower chance of overfitting. Table 10 summarizes the evaluation metrics results of the three classification models used in this research.
As per the results in Table 10, using CNN for predictive maintenance increases the experimental system's accuracy by almost 50% compared with a traditional SVM machine learning model. Although the preparation and modeling steps of a predictive maintenance system using CNN may seem tedious and demanding, the hardest part of the modeling is done once, in the beginning. The remaining operations consist of fine-tuning parameters and loading new observations into the system for the algorithm to improve its performance. Depending on the plant activities, the supervisors can perform this operation once a month or during maintenance and shutdown periods. The results obtained for both CNN+ReLU and CNN+PReLU are identical for relatively small datasets but could make a difference for more extensive networks. This option is, therefore, quite handy for a future-proof model, which will undoubtedly have to deal with a more significant amount of data. The best overall accuracy achieved is 100%. These are outstanding results that bring more effectiveness and reliability to the system and make a big difference when predicting machine conditions (conveyor motors), since an incorrect prediction could either result in a critical breakdown or in unnecessary maintenance expenses. As production/manufacturing systems are different, it is essential to conduct a proper study of each system's behavior before implementing an adequate predictive maintenance approach.

VI. CONCLUSION
This research presents an experimental framework that triggers effective predictive maintenance for conveyor motors in a small manufacturing industry utilizing a classification model built through time-series input data imaging and CNN, with a future-proof option for more extensive networks using the PReLU activation function to improve model performance. Several conveyor system parameters observed sequentially are converted into images using GASF, which has the advantage of preserving temporal features. The experimental results obtained show the relevance of using deep learning algorithms such as CNN to improve the accuracy of classification models. Our predictive maintenance framework architecture accommodates both UTS and MTS data inputs. It classifies the conveyor motor status into three categories based on the input parameters: critical fault, minor fault, and no-fault. The best overall accuracy achieved by our experimental framework is about 100%, which is quite sufficient for initiating predictive maintenance schedules. With these excellent results, our framework reduces the high risk of missing critical faults in the system, which could lead to a more prolonged breakdown, or of unnecessarily initiating maintenance on motors due to incorrect predictions, leading to a waste of resources.
For future work, we would like to expand the framework's ability to deal with diverse data types by adding a feature that includes non-linear time-series inputs. We could reduce the dimensionality of the non-linear data through the Kernel PCA algorithm; we would also fine-tune our CNN models by incorporating additional parameters such as ‘‘Dropout,’’ which can prevent the risk of overfitting. Furthermore, we would like to incorporate the CNN classification results of the predictive maintenance framework in the operational technology (OT) environment, where we would include classified motor statuses in Supervisory Control And Data Acquisition (SCADA) systems, displayed on a Human Machine Interface (HMI) and remotely accessible in cloud-based applications by supervisors and operators.

REFERENCES
[1] R. Wan, S. Mei, J. Wang, M. Liu, and F. Yang, ‘‘Multivariate temporal convolutional network: A deep neural networks approach for multivariate time series forecasting,’’ Electronics, vol. 8, no. 8, p. 876, Aug. 2019.
[2] R. Zhang, F. Zheng, and W. Min, ‘‘Sequential behavioral data processing using deep learning and the Markov transition field in online fraud detection,’’ Math., Comput. Sci., vol. 1808, no. 05329, pp. 1–5, Aug. 2018. [Online]. Available: https://arxiv.org/pdf/1808.05329.pdf
[3] C. L. Yang, Z. X. Chen, and C. Y. Yang, ‘‘Sensor classification using convolutional neural network by encoding multivariate time series as two-dimensional colored image,’’ Sensors, vol. 20, no. 1, 168, pp. 1–15, Dec. 2019, doi: 10.3390/s20010168.
[4] M. Gahirwal and M. Vijayalakshmi, ‘‘Inter time series sales forecasting,’’ Int. Journ. Adv. Stud. Comp. Sci. Eng., vol. 2, no. 1, pp. 55–66, Mar. 2013. [Online]. Available: https://arxiv.org/pdf/1303.0117.pdf
[5] K. S. Kiangala and Z. Wang, ‘‘Initiating predictive maintenance for a conveyor motor in a bottling plant using industry 4.0 concepts,’’ Int. J. Adv. Manuf. Technol., vol. 97, nos. 9–12, pp. 3251–3271, May 2018.
[6] K. Wang, ‘‘Intelligent predictive maintenance (IPdM) system-industry 4.0 scenario,’’ WIT Trans. Eng. Sci., vol. 113, no. 10, pp. 259–268, 2016.
[7] P. Domingos, ‘‘A few useful things to know about machine learning,’’ Commun. ACM, vol. 55, no. 10, pp. 78–87, Oct. 2012.
[8] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). New York, NY, USA: Springer-Verlag, 2006.
[9] Q. Zhang, L. T. Yang, Z. Chen, and P. Li, ‘‘A survey on deep learning for big data,’’ Inf. Fusion, vol. 42, pp. 146–157, Jul. 2018.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. Process. Syst. (NIPS), vol. 25. Red Hook, NY, USA: Curran Associates, Dec. 2012, pp. 1097–1105.
[11] Z. Wang and T. Oates, ‘‘Imaging time-series to improve classification and imputation,’’ 2015, arXiv:1506.00327. [Online]. Available: http://arxiv.org/abs/1506.00327
[12] Z. Wang and T. Oates, ‘‘Encoding time series as images for visual inspection and classification using tiled convolutional neural networks,’’ in Proc. AAAI Workshops, Jan. 2015, pp. 40–46.
[13] W. Chen and K. Shi, ‘‘A deep learning framework for time series classification using relative position matrix and convolutional neural network,’’ Neurocomputing, vol. 359, pp. 384–394, Sep. 2019.
[14] G. Martínez-Arellano, G. Terrazas, and S. Ratchev, ‘‘Tool wear classification using time series imaging and deep learning,’’ Int. J. Adv. Manuf. Technol., vol. 104, nos. 9–12, pp. 3647–3662, Oct. 2019.
[15] R. Adhikari and R. K. Agrawal, ‘‘An introductory study on time series modeling and forecasting,’’ 2013, arXiv:1302.6613. [Online]. Available: http://arxiv.org/abs/1302.6613
[16] L. Sadouk, ‘‘CNN approaches for time series classification,’’ in Time Series Analysis Methods and Applications for Flight Data. London, U.K.: IntechOpen, 2018. [Online]. Available: https://www.intechopen.com/books/time-series-analysis-data-methods-and-applications/cnn-approaches-for-time-series-classification, doi: 10.5772/intechopen.81170.
[17] G. E. P. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung, Time Series Analysis: Forecasting and Control, 5th ed. New York, NY, USA: Wiley, 2015.
[18] T. Górecki and M. Łuczak, ‘‘Multivariate time series classification with parametric derivative dynamic time warping,’’ Expert Syst. Appl., vol. 42, no. 5, pp. 2305–2312, Apr. 2015.
[19] A. Coraddu, L. Oneto, A. Ghio, S. Savio, D. Anguita, and M. Figari, ‘‘Machine learning approaches for improving condition-based maintenance of naval propulsion plants,’’ Proc. Inst. Mech. Eng., Part M, J. Eng. Maritime Environ., vol. 230, no. 1, pp. 136–153, Feb. 2016.
[20] K. Leahy, R. L. Hu, I. C. Konstantakopoulos, C. J. Spanos, and A. M. Agogino, ‘‘Diagnosing wind turbine faults using machine learning techniques applied to operational data,’’ in Proc. IEEE Int. Conf. Prognostics Health Manage. (ICPHM), Jun. 2016, pp. 1–8.
[21] G. A. Susto, A. Schirru, S. Pampuri, S. McLoone, and A. Beghi, ‘‘Machine learning for predictive maintenance: A multiple classifier approach,’’ IEEE Trans. Ind. Informat., vol. 11, no. 3, pp. 812–820, Jun. 2015.
[22] M. Paolanti, L. Romeo, A. Felicetti, A. Mancini, E. Frontoni, and J. Loncarski, ‘‘Machine learning approach for predictive maintenance in industry 4.0,’’ in Proc. 14th IEEE/ASME Int. Conf. Mech. Embedded Syst. Appl. (MESA), Jul. 2018, pp. 1–6.
[23] K. Kulkarni, U. Devi, A. Singhee, J. Hazra, and P. Rao, ‘‘Predictive maintenance for supermarket refrigeration systems using only case temperature data,’’ in Proc. Ann. Amer. Contr. Conf. (ACC), Milwaukee, WI, USA, Jun. 2018, pp. 4640–4645.
[24] J. Tenenbaum, V. de Silva, and J. C. Langford, ‘‘A global geometric framework for nonlinear dimensionality reduction,’’ Science, vol. 290, no. 5500, pp. 2319–2323, Dec. 2000.
[25] S. T. Roweis and L. K. Saul, ‘‘Nonlinear dimensionality reduction by locally linear embedding,’’ Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000.
[26] J. Patterson and A. Gibson, Deep Learning: A Practitioner's Approach. Sebastopol, CA, USA: O'Reilly Media, 2017.
[27] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, ‘‘Handwritten digit recognition with a back-propagation network,’’ in Proc. Adv. Neural Inf. Process. Syst. Conf., 1990, pp. 396–404.
[28] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image recognition,’’ 2015, arXiv:1512.03385. [Online]. Available: http://arxiv.org/abs/1512.03385
[29] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for large-scale image recognition,’’ 2014, arXiv:1409.1556. [Online]. Available: http://arxiv.org/abs/1409.1556
[30] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, ‘‘Rethinking the inception architecture for computer vision,’’ 2015, arXiv:1512.00567. [Online]. Available: http://arxiv.org/abs/1512.00567
[31] X. Glorot, A. Bordes, and Y. Bengio, ‘‘Deep sparse rectifier neural networks,’’ in Proc. Conf. Artif. Intell. Statist., 2011, pp. 315–323.
[32] Y. T. Zhou and R. Chellappa, ‘‘Computation of optical flow using a neural network,’’ in Proc. IEEE Int. Conf. Neural Netw., vol. 2, Jul. 1988, pp. 71–78.
[33] A. Devarakonda, M. Naumov, and M. Garland, ‘‘AdaBatch: Adaptive batch sizes for training deep neural networks,’’ 2017, arXiv:1712.02029. [Online]. Available: http://arxiv.org/abs/1712.02029
[34] M. Munir, ‘‘Eigenvalues-theory and applications,’’ Dept. Math., Govern. Postgrad. College, Karl-Franzens-Universität, Graz, Austria, Tech. Rep., 2015. [Online]. Available: https://www.researchgate.net/publication/309012418_Eigenvalues-Theory_and_Applications, doi: 10.13140/RG.2.2.15926.91201.
[35] M. D. Zeiler and R. Fergus, ‘‘Visualizing and understanding convolutional networks,’’ in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818–833.
[36] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, ‘‘Overfeat: Integrated recognition, localization and detection using convolutional networks,’’ in Proc. Int. Conf. Learn. Represent., Scottsdale, AZ, USA, 2014, pp. 1–16.
[37] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, ‘‘Return of the devil in the details: Delving deep into convolutional nets,’’ 2014, arXiv:1405.3531. [Online]. Available: http://arxiv.org/abs/1405.3531
[38] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, ‘‘Going deeper with convolutions,’’ 2014, arXiv:1409.4842. [Online]. Available: http://arxiv.org/abs/1409.4842
[39] V. Nair and G. E. Hinton, ‘‘Rectified linear units improve restricted Boltzmann machines,’’ in Proc. 27th Int. Conf. Mach. Learn. (ICML), 2010, pp. 807–814.
[40] A. L. Maas, A. Y. Hannun, and A. Y. Ng, ‘‘Rectifier nonlinearities improve neural network acoustic models,’’ in Proc. Int. Conf. Mach. Learn. (ICML), 2013, pp. 1–6.
[41] M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. V. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, and G. E. Hinton, ‘‘On rectified linear units for speech processing,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., Vancouver, BC, Canada, May 2013, pp. 3517–3521.
[42] M. Lin, Q. Chen, and S. Yan, ‘‘Network in network,’’ 2013, arXiv:1312.4400. [Online]. Available: http://arxiv.org/abs/1312.4400
[43] R. K. Srivastava, J. Masci, S. Kazerounian, F. Gomez, and J. Schmidhuber, ‘‘Compete to compute,’’ in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2013, pp. 2310–2318.
[44] I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. C. Courville, and Y. Bengio, ‘‘Maxout networks,’’ in Proc. Int. Conf. Mach. Learn., vol. 28, no. 3, Jun. 2013, pp. 1319–1327.
[45] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Spatial pyramid pooling in deep convolutional networks for visual recognition,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, Sep. 2015.
[46] A. G. Howard, ‘‘Some improvements on deep convolutional neural network based image classification,’’ 2013, arXiv:1312.5402. [Online]. Available: http://arxiv.org/abs/1312.5402
[47] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, ‘‘Improving neural networks by preventing co-adaptation of feature detectors,’’ 2012, arXiv:1207.0580. [Online]. Available: http://arxiv.org/abs/1207.0580
[48] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, ‘‘Dropout: A simple way to prevent neural networks from overfitting,’’ J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[49] L. Wan, M. Zeiler, S. Zhang, Y. L. Cun, and R. Fergus, ‘‘Regularization of neural networks using dropconnect,’’ in Proc. Int. Conf. Mach. Learn., 2013, pp. 1058–1066.
[50] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034.
[51] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, ‘‘Backpropagation applied to handwritten zip code recognition,’’ Neural Comput., vol. 1, no. 4, pp. 541–551, Dec. 1989.
[52] IFM Electronics. (2013). From Process Monitoring to Vibration Analysis. Condition Monitoring Systems. pp. 5–12. [Online]. Available: http://www.ifm.com/download/files/ifm-efector-octavis-brochure-GB-2013/$file/ifm-efector-octavis-brochure-GB-2013.pdf
[53] S. Visa, B. Ramsay, A. L. Ralescu, and E. Van Der Knaap, ‘‘Confusion matrix-based feature selection,’’ in Proc. Midwest Artif. Intel. Cognit. Scienc. Conf., vol. 710, 2011, pp. 120–127.
[54] A. Yunusa-Kaltungo, J. K. Sinha, and A. D. Nembhard, ‘‘A novel fault diagnosis technique for enhancing maintenance and reliability of rotating machines,’’ Struct. Health Monit., Int. J., vol. 14, no. 6, pp. 604–621, Nov. 2015, doi: 10.1177/1475921715604388.
[55] C. Sanders. (2011). A Guide to Vibration Analysis and Associated Techniques in Condition Monitoring. DAK Consulting-Chiltern House. [Online]. Available: http://www.dakacademy.com/newsite/index.php?option=com_k2&Itemid=500&id=94_007cd4b8b347e375bc10dbe5efbccc28&lang=en&task=download&view=item
[56] J. Alsalaet. (Dec. 2012). Vibration Analysis and Diagnostic Guide. [Online]. Available: https://www.researchgate.net/publication/311420765_Vibration_Analysis_and_Diagnostic_Guide
KAHIOMBA SONIA KIANGALA received the B.Tech. degree in electrical engineering from the Tshwane University of Technology (TUT), South Africa, in 2014, and the master's degree in electrical engineering from the University of South Africa (UNISA), in 2019, where she is currently pursuing the Ph.D. degree in science, engineering and technology (SET). Her research interest includes adapting artificial intelligence techniques to improve automation techniques for small to medium scale industries in an Industry 4.0 environment.

ZENGHUI WANG (Member, IEEE) received the B.Eng. degree in automation from the Naval Aviation Engineering Academy, China, in 2002, and the Ph.D. degree in control theory and control engineering from Nankai University, China, in 2007. He is currently a Professor with the Department of Electrical and Mining Engineering, University of South Africa (UNISA), South Africa. His research interests are industry 4.0, control theory and control engineering, engineering optimization, image/video processing, artificial intelligence, and chaos.