Article

Gasoline Engine Misfire Fault Diagnosis Method Based on Improved YOLOv8

Zhichen Li, Zhao Qin, Weiping Luo and Xiujun Ling
1 Mechanical & Electrical Engineering College, Jinling Institute of Technology, Nanjing 211169, China
2 Science and Technology on Microsystem Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 201800, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(14), 2688; https://doi.org/10.3390/electronics13142688
Submission received: 7 June 2024 / Revised: 27 June 2024 / Accepted: 28 June 2024 / Published: 9 July 2024

Abstract

In order to realize the online diagnosis and prediction of gasoline engine misfire faults, this paper proposes an improved misfire fault detection model based on YOLOv8 for gasoline engine sound signals. The improvement substitutes one C2f module in the YOLOv8 backbone network with a BiFormer attention module and another C2f module with a CBAM module that combines channel and spatial attention mechanisms, which enhances the neural network’s capacity to extract complex features. The normal and misfire sound signals of a gasoline engine are processed by wavelet transformation and converted into time–frequency images for the training, validation, and testing of the convolutional neural network. The experimental results show that the precision of the improved YOLOv8 model is 99.71% in gasoline engine misfire fault tests, which is 2 percentage points higher than that of the baseline YOLOv8 network model. The diagnosis time for each sound segment is less than 100 ms, making the method suitable for developing IoT devices for gasoline engine misfire fault diagnosis and for driverless vehicles.

1. Introduction

Misfire failures in gasoline engines, often resulting from issues within the fuel supply, intake, or ignition systems, can significantly impair engine performance. Such failures lead to a reduction in the engine’s output torque and an increase in emission pollution, thereby diminishing combustion efficiency and exacerbating vibration. More critically, misfires can induce severe issues such as crankshaft deformation or fatigue failure, which may lead to catastrophic accidents. Given these implications, many countries mandate the detection of misfire failures in gasoline engines. Accurate prediction and early diagnosis of these failures can prevent disastrous outcomes. This has prompted numerous scholars to dedicate their research to this crucial area of study.
Data-driven deep learning methods have been widely used in fault diagnosis research in recent years. The deep features of fault data can be extracted adaptively by deep learning methods. João L. Firmino designed an artificial neural network to identify features selected after a fast Fourier transform of the vibration and sound signals of a four-stroke spark ignition engine, which could achieve a high identification accuracy for engine misfire faults [1]. S. Naveen Venkatesh converted the engine head vibration signal into images and used the converted images to train the classic networks AlexNet, VGG-16, GoogleNet, and ResNet-50. In the training process, he found the optimal batch size, solver, learning rate, train–test split ratio, and other hyperparameters, and the best network performed very well for internal combustion engine fault detection [2]. Wang, X. collected crankshaft speed data of a six-cylinder inline turbocharged diesel engine under normal conditions, six single-cylinder misfire faults, and 15 double-cylinder misfire faults. Four neural network structures were designed, and five data partitioning strategies were tested. The results showed that the algorithm based on LSTM RNN can overcome the limitations of traditional misfire detection methods under high-speed and low-load conditions, with a misfire fault diagnosis accuracy rate of no less than 99.90% [3].
Convolutional Neural Networks (CNN) possess strong capabilities in data feature extraction, robustness, and generalization. The CNN method has been considered an effective approach for developing automated fault diagnosis systems. Many scholars have conducted research on fault diagnosis based on CNN [4,5,6,7,8,9]. Syed Maaz Shahid designed a CNN with a single convolutional layer and trained it using crankshaft angle signals that represented the complete working cycle of the engine. The test results showed that the accuracy of this method for engine misfire fault detection exceeded 99% [10]. Pan Zhang installed pressure sensors on the cylinder head of a six-cylinder inline diesel engine and an angular velocity sensor on the flywheel housing. The obtained pressure and angular velocity data were then used to train a designed CNN. The final test results indicated that the network could identify single-cylinder and double-cylinder misfire faults in diesel engines, with an accuracy rate of over 96% for single-cylinder misfire identification. However, when the engine operated at high speeds, the accuracy of identifying misfire faults decreased significantly [11]. Terwilliger developed a new multi-task CNN deep learning cascade architecture that explored experimental studies on acoustic features, data enhancement, and data reliability by cascading fuel type, engine configuration, cylinder count, and aspiration type attributes. The research results showed that the cascaded CNN could achieve a test set accuracy of 87.0% for misfire fault detection [12].
YOLO (You Only Look Once) is also used for fault diagnosis due to its excellent performance [13]. Wu Yukun established a new convolutional network model by deepening and pruning the network model YOLO, which is feasible and effective to apply this model to bearing fault diagnosis [14]. Xuan Zhenyuan used the k-means algorithm for cluster analysis and applied the YOLO model to quickly and accurately identify the faults of Electric Multiple Unit skirts [15]. An increasing number of scholars have used the YOLO network for various equipment fault detections and achieved high accuracy [16,17].
Vibration data are utilized by scholars in misfire detection in combination with neural networks. Cheng Gu decomposed the multi-channel vibration signals of diesel engine cylinder heads using the Multivariate Empirical Mode Decomposition (MEMD) method, and reconstructed the Intrinsic Mode Functions (IMF) components with high correlation coefficients of the original signals. Cheng Gu used the dispersion entropy (DE) of the reconstructed signals as fault feature vectors to input into a Support Vector Machine (SVM). Experimental results showed that this method was effective in identifying and diagnosing misfire faults in diesel engines [18]. Anil Kumar performed wavelet synchrosqueezed transform (WSST) on the time-domain vibration signals of a motorcycle’s internal combustion engine to form time–frequency images. By introducing an entropy-based regularization function to improve the cost function, the performance of the CNN was enhanced. Testing experiments proved that the improved deep learning method improved the accuracy of diagnosing defects in motorcycle internal combustion engines [19]. Qin, C. first designed a CNN to remove original vibration noise, then designed a multi-scale CNN to extract features, and finally used LSTM to extract sequence features. By constructing a new loss function, a relatively accurate diagnosis of diesel engine misfire faults was achieved, with an average accuracy rate of over 96% [20]. Xiaowei Xu established a rigid–flexible mixed multibody dynamics model to simulate the instantaneous angular velocity signals of the shaft under various operating conditions. By analyzing the changes in torsional vibration of the shafting, it was found that the 0.5-order harmonic phase and amplitude characteristics could effectively identify engine misfire faults [21]. Zhiwei Liu used the Lagrangian equation method to establish a torsional dynamic system model. The research results found that the torsional vibration and torque of the flexible coupling could effectively identify diesel engine faults, and the vibration signal parameters could identify and locate the misfire faults of the engine [22]. A. Syta et al. utilized vibration data from three directions of a Rotax 912 ULS aircraft engine and proposed a linear index to describe the vibration level based on the spectral values of the power amplitudes of two selected frequencies. He used these indexes to identify the faulty cylinder [23].
Noise is a crucial statistical data reflecting the operating state of gasoline engines. In the last century, experienced gasoline engine maintenance technicians judged and located the types and sources of gasoline engine faults by listening to the engine sounds. The noise of gasoline engines mainly comes from combustion noise and mechanical noise. Combustion noise is caused by the vibration generated by the rapid increase of pressure inside the cylinder of the gasoline engine during the ignition process. Mechanical noise is created by various impacts caused by the reciprocating motion of the piston and other moving parts. When a gasoline engine experiences a misfire fault, the pressure inside the cylinder changes differently from normal combustion, and the mechanical impacts also change. The sound emitted by the faulty gasoline engine is different from the sound emitted during normal operation. Gasoline engine sound signals are easy to collect, and using sound differences for gasoline engine fault diagnosis is a research direction pursued by many researchers [24,25,26]. The use of wavelet features of sound for gasoline engine fault diagnosis has also been applied [27].
Fan Xinhai utilized the noise signals from the exhaust port of a diesel engine to obtain the main frequency components by performing local mean decomposition for diagnosing diesel engine misfire faults [28]. Qin proposed a deep dual convolutional neural network with multi-domain inputs to extract multi-domain information from vibration signals, which is used for misfire fault diagnosis of diesel and gasoline engines under strong environmental noise and different working conditions [29].
The above misfire fault research requires signal pretreatment, and the signal acquisition is complicated. To address this disadvantage, this paper designs a convolutional neural network to automatically extract fault features from gasoline engine sound signals, which achieves an end-to-end misfire fault diagnosis even with unseen noise data. Research on gasoline engine fault diagnosis methods based on sound signals is of great significance for realizing online real-time monitoring of gasoline engines. Sound recognition for engine misfire fault diagnosis does not require a data acquisition system or contact with the engine. Sound signals are easy to collect, and the method accurately predicts and diagnoses misfire faults in gasoline engines.
The main contributions of this paper can be summarized as follows:
  • By directly using the microphone array of a laptop to collect the sound signals of vehicle engines, a relatively extensive dataset of gasoline engine misfire faults has been established.
  • Each frame of the sound signal was wavelet-transformed using Morlet. The transformed wavelet coefficients were converted into colors based on their magnitudes to create a time–frequency image.
  • The structure of the YOLOv8 network has been improved by adding the BiFormer and CBAM attention mechanisms to two C2f modules. Experimental results have demonstrated that the improved network based on YOLOv8 was effective in diagnosing misfire faults in gasoline engines.
  • The improved YOLOv8-CBBF network can be applied to devices such as the Internet of Things and unmanned vehicles.
The rest of this article is arranged as follows. Section 2 introduces how to obtain engine misfire data through the established experimental platform, and the method of converting a frame sound signal into a time–frequency diagram by using wavelet transformation is also presented. Section 3 provides details of designing the improved convolutional YOLOv8-CBBF network model. Then, in Section 4, the process of training and testing YOLOv8-CBBF is detailed, and the test result is analyzed. Finally, Section 5 provides conclusive comments.

2. Experimental Setup

2.1. Data Collection

The recording of normal and faulty sound signals of the gasoline engine was conducted at the Vehicle Laboratory of the Mechanical and Electrical Engineering College, Jinling Institute of Technology. The turbocharged Volkswagen gasoline engine teaching setup used in the laboratory was manufactured by Shanghai Fangchen Science and Education Equipment Manufacturing Co., Ltd. (Shanghai, China). The engine is a second-generation EA888 gasoline engine produced by Volkswagen (Figure 1). The engine displacement is 1798 mL, it runs on No. 92 gasoline, it has a maximum output power of 118 kW, and it has four cylinders with an ignition sequence of 1–3–4–2. A lack of fuel supply in the gasoline engine can lead to misfire faults within a cylinder. Additionally, if the ignition coil of a cylinder is disconnected or has insufficient electrical power, the cylinder may fail to ignite, also resulting in a misfire fault. During the operation of the gasoline engine, a misfire fault within a cylinder can therefore be simulated by unplugging the ignition coil of that cylinder. A Lenovo computer equipped with a microphone array was used to receive the sound signals. The model of the computer is XiaoXinPro-13API, produced and launched in 2019. The computer has a main frequency of 2.1 GHz, and the central processing unit is an AMD Ryzen 5 3550H with Radeon Vega Mobile Gfx. The MO, MTM, and S/N of the computer are PFNXB9C23001, 81XD0001CD, and PF1CX73S, respectively. During the sound signal acquisition process, the distance between the computer and the gasoline engine was 500 mm, with the height of the computer roughly level with the cylinder head of the gasoline engine. The placement positions of the computer and the experimental platform are shown in Figure 1.
Taking into account various factors such as the performance of the computer acquisition equipment and the characteristics of the gasoline engine sound signal, the sampling frequency for the acquisition of gasoline engine sound signals must satisfy Shannon’s sampling theorem, as shown in Formula (1). In the formula, $f_s$ represents the sampling frequency of the sound signal, and $f_{max}$ represents the frequency of the highest-frequency component in the gasoline engine sound signal. The frequency of the highest-frequency component in the gasoline engine sound signal is obtained through experimentation.

$$f_s \geq 2 f_{max} \quad (1)$$
During the experiment, it was found that the sound of the gasoline engine is primarily composed of low-frequency signals, and high-frequency signals exceeding 1 kHz have a minimal impact on fault identification. The sampling frequency and quantization bits of the sound signal determine the fidelity of the digitized sound signal. In order to balance fidelity and data volume during the acquisition of gasoline engine sound signals, the sampling frequency was set to 8 kHz, which means $f_{max}$ = 4 kHz and $f_s$ = 8 kHz, and the quantization depth for each sampling point was set to 16 bits. The recorded sound signals were saved in .wav format. To reduce impurities and noise in the sound, a mono sound recorder was used.
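As an illustration of these settings, a minimal Python sketch for capturing a mono, 16-bit recording at 8 kHz and saving it as a .wav file is given below. The recording software used in the experiment is not specified in this work; the sounddevice and soundfile packages, the recording duration, and the file name in the sketch are assumptions for illustration only.

```python
# Minimal recording sketch (assumed libraries: sounddevice, soundfile).
# Captures a mono, 16-bit clip at 8 kHz and stores it as a .wav file.
import sounddevice as sd
import soundfile as sf

FS = 8_000          # sampling frequency f_s = 8 kHz (f_max assumed to be <= 4 kHz)
DURATION = 600      # seconds; each operating condition was recorded for >= 10 min
CHANNELS = 1        # mono recording, as used in the experiment

audio = sd.rec(int(DURATION * FS), samplerate=FS, channels=CHANNELS, dtype="int16")
sd.wait()                                      # block until the recording finishes
sf.write("engine_2500rpm_normal.wav", audio, FS, subtype="PCM_16")  # hypothetical name
```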
Before the experiment, the gasoline engine was run for ten minutes to allow it to fully warm up. The acquisition channels, sampling frequency, sampling time, and other parameters of the data acquisition system were set. Sound signals were recorded separately under three conditions: normal state, one cylinder misfire, and simultaneous misfire of two cylinders. The gasoline engine was operated under no-load conditions. First, the sound signal at idle speed was recorded, and then the sound signal was recorded every time the engine speed increased by 500 rpm. For safety considerations, the maximum speed of the test gasoline engine was set to 3000 rpm. The recording time for the gasoline engine sound under each different condition was not less than 10 min, and the recording time for the gasoline engine sound under each condition is shown in Table 1. The idle speed of the gasoline engine is 960 rpm. After a misfire fault occurs in the gasoline engine, the actual speed will decrease (see Table 2). When a misfire occurs in a cylinder at idle speed, the engine speed becomes uncertain, varying between 500 rpm and 960 rpm. Taking the engine speed of 2500 rpm as an example, the sound signals of the gasoline engine under normal conditions, one-cylinder misfire, and two-cylinder misfire are shown in Figure 2. From the time-domain analysis of the sound alone, the sound signal of a one-cylinder misfire in the gasoline engine does not differ significantly from the sound signal under normal conditions, and the high-frequency signal is very ambiguous.

2.2. Signal Processing

Inevitably, the collected sound signals of the gasoline engine will be mixed with environmental noise signals. If the environmental noise varies greatly, the collected sound signals of gasoline engine faults need to be preprocessed with filtering [30]. However, in this study, the environmental noise remains basically consistent and has minimal impact on fault diagnosis. Therefore, to achieve real-time monitoring of misfire faults in gasoline engines, filtering to remove noise is not employed in this paper.
It is often difficult to distinguish detailed changes in the time-domain sound signals, whereas the frequency characteristics of the sound differ between fault conditions and normal operating states. The Fourier transform or wavelet transformation is commonly used to process vibration and sound digital signals. However, the Fourier transform has the drawback of being a full-time-domain transformation without temporal locality, meaning that the transformed spectrum lacks time information. To address this limitation, a time window function can be employed, as shown in Formulas (2) and (3). In these formulas, $f(t)$ represents a non-periodic function, $F(j\omega)$ is its frequency-domain representation, $e^{-j\omega t}$ is the complex exponential basis function of the Fourier transform (where $j$ is the imaginary unit and $\omega$ is the angular frequency), $g(t-\mu)$ is a window function, and $\mu$ is a temporal characteristic point. $F(\omega, \mu)$ is the result of the windowed Fourier transform, representing the spectral density at time point $\mu$ and frequency $\omega$. The windowed Fourier transform resolves the issue of temporal locality, but it has a single resolution: a narrow window results in a short signal within the window, leading to inaccurate frequency analysis and poor frequency resolution, while a wide window lacks precision in the time domain, resulting in low temporal resolution.
Wavelet transformation, on the other hand, exhibits excellent localization properties in both the time and frequency domains. It is a local transformation in both time and frequency, making it very effective in extracting different characteristic information from sound signals. Wavelet transformations with different resolutions allow for both an overview of the signal and detailed analysis, addressing issues such as refinement analysis that the Fourier transform cannot handle. Wavelet analysis utilizes fine steps in either the time or frequency domain to analyze signals, overcoming the Fourier transform’s inability to simultaneously achieve good time and frequency resolution, as seen in Formula (4). In this formula, $\psi(t)$ represents the wavelet function, $a$ is the scaling factor (smaller values correspond to higher frequencies; larger values to lower frequencies), and $b$ is the translation parameter. This allows for adaptive adjustment of the time–frequency window. The wavelet transformation coefficient $WT_f(a, b)$ represents the transform result of the signal $f(t)$ at scale $a$ and translation $b$.
$$F(j\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{-j\omega t}\, dt \quad (2)$$

$$F(\omega, \mu) = \int_{-\infty}^{+\infty} f(t)\, g(t-\mu)\, e^{-j\omega t}\, dt \quad (3)$$

$$WT_f(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \psi\!\left(\frac{t-b}{a}\right) dt \quad (4)$$
To adapt to the characteristics of computer signal processing, a short-term analysis within a finite time period is chosen. The recorded gasoline engine sound signal, over 10 min long for each condition, is trimmed and divided into several segments. Each sound segment has a duration of 3.75 s and contains 30,000 data points, with each segment referred to as a “frame”. The signal of each frame is then subjected to wavelet transformation.
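For illustration, this framing step can be sketched in a few lines of Python; only the 30,000-point frame length is taken from this work, while the soundfile package and the file name are assumptions.

```python
# Split a long recording into consecutive, non-overlapping 30,000-point frames
# (3.75 s each at 8 kHz). File name and library choice are illustrative assumptions.
import numpy as np
import soundfile as sf

FRAME_LEN = 30_000
signal, fs = sf.read("engine_2500rpm_normal.wav", dtype="float32")
n_frames = len(signal) // FRAME_LEN
frames = signal[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)  # one row per frame
```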
There are many types of wavelets in the wavelet family. The Morlet wavelet function has a narrow bandwidth in the frequency domain, resulting in high frequency resolution. It is able to accurately capture the amplitude variation characteristics of signals. Due to its widespread application in the field of signal analysis, the Morlet wavelet function was chosen for the research in this paper.
In order to take advantage of deep learning in image classification, some scholars have transformed one-dimensional vibration signals into two-dimensional images through certain transformations, and then used the images as the input of subsequent convolutional diagnostic networks, achieving high accuracy in fault diagnosis [31,32]. In view of this, each frame of the sound signal is transformed using the wavelet. The resulting wavelet coefficients are converted into colors based on their magnitudes and plotted on the time–frequency image. A larger wavelet coefficient indicates higher energy, resulting in a deeper color in the figure, as shown in Figure 3. Using the cwt() function from the pywt library to perform wavelet transformation, it takes 0.06 s to perform wavelet transformation on a frame of the signal when the scale parameter of the function is set to 64. By constructing spectral images and analyzing the frequency distribution and intensity of sound signals at different engine speeds, the sound patterns of engine misfire faults can be revealed. The constructed spectral images can be used to identify changes in the frequency characteristics of engine sound signals under fault conditions.
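A minimal sketch of this frame-to-image conversion is shown below, using pywt.cwt() with the Morlet wavelet and 64 scales as described above; the colormap, figure size, and file names are illustrative assumptions rather than the exact settings used in this work.

```python
# Convert one 30,000-point frame into a time-frequency image via a Morlet CWT
# (pywt.cwt with 64 scales). Plotting details and file names are assumptions.
import numpy as np
import pywt
import soundfile as sf
import matplotlib.pyplot as plt

FS = 8_000
FRAME_LEN = 30_000                                  # one frame = 3.75 s of sound

signal, _ = sf.read("engine_2500rpm_normal.wav", dtype="float32")
frame = signal[:FRAME_LEN]

scales = np.arange(1, 65)                           # 64 scales, matching the setting above
coeffs, freqs = pywt.cwt(frame, scales, "morl", sampling_period=1.0 / FS)

plt.figure(figsize=(4, 4))
plt.pcolormesh(np.arange(FRAME_LEN) / FS, freqs, np.abs(coeffs), shading="auto")
plt.axis("off")                                     # keep only the color pattern as CNN input
plt.savefig("frame_0001.png", bbox_inches="tight", pad_inches=0)
plt.close()
```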

3. Method

3.1. Overview of YOLOv8

YOLOv8 was released by Ultralytics (Los Angeles, CA, USA) on 11 January 2023. It inherits the real-time characteristics of the YOLO series. YOLOv8 uses the C2f module to replace the C3 module, and the Split operation in the C2f module significantly improves the model’s performance. YOLOv8 also significantly improves accuracy through a deeper and more complex network structure and improved training techniques. It employs VFL Loss as the classification loss, along with DFL Loss and CIOU Loss as the regression losses, which perform exceptionally well in object detection and classification tasks. YOLOv8 enhances detection performance through multi-scale prediction techniques. Ultralytics positions the YOLOv8 open-source library as an algorithmic framework, rather than a specific algorithm, enabling YOLOv8 to be widely applied to various tasks such as object detection, image segmentation, pose estimation, classification, and more.

3.2. Improved YOLOv8

$$Q = X^r W^q \quad (5)$$

$$K = X^r W^k \quad (6)$$

$$V = X^r W^v \quad (7)$$
The YOLO algorithm is based on the Convolutional Neural Network (CNN), known for its fast classification and detection speed as well as high accuracy. Typically, the input layer of YOLO for image classification tasks consists of RGB color images. The feature extraction layer is composed of a series of convolutional and pooling layers that employ the ReLU activation function. After passing through the fully connected layer, the number of neurons in the output layer equals the number of classification categories, and the output result represents the probability distribution of the categories. Over the years, the algorithm has evolved, reaching the YOLOv8 version.

3.2.1. Add the BiFormer

The Vision Transformer with Bi-level Routing Attention (BiFormer) model exhibits remarkable characteristics in terms of bidirectional modeling, pre-training and fine-tuning, multi-layer self-attention mechanism, and query-adaptive attention [33]. The BiFormer module has been introduced into the YOLOv8 backbone. This integration is beneficial for improving the recognition and classification accuracy of the convolutional neural network model. The BiFormer can filter out the least relevant key–value pairs over larger image regions and then apply token-to-token attention within the preserved regions. The specific algorithm is outlined below.
  • The image is divided into $S^2$ block regions, with each block containing m pixels and having a height of $H/S$ and a width of $W/S$, as shown in Figure 4 [33].
  • The correlation between every two blocks is calculated based on the query vector (Q) and key vector (K), as detailed in Formulas (5)–(7), where $W^q$, $W^k$, and $W^v$ are three trainable parameter matrices. After the relationship matrix is obtained and sparsified, the value vector (V) is computed, and the block region j with the highest correlation value for block region i is selected.
  • The K and V values for the m pixels in the routed block region j are calculated (lower routing in Figure 4), and the attention of the m pixels in block region i is computed (upper routing in Figure 4). These values are fused with the gathered K and V of the lower routing and, subsequently, the BiFormer attention vector is generated through this fusion.
Although BiFormer exhibits excellent performance, experiments have revealed that it can lead to additional GPU kernel launch overhead and storage transactions. Therefore, it is important to be mindful of preventing memory crashes when using it. The backbone of the YOLOv8 network, which is responsible for feature extraction, has been modified by replacing the C2f module in Layer 2 with BiFormer to create a new convolutional network model, as shown in Table 3 and Figure 6.
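To make the routing procedure above concrete, the following simplified PyTorch sketch implements the two-level idea: region-level queries and keys built from Formulas (5)–(7) select the top-k most relevant regions, and token-to-token attention is then computed only over the gathered regions. It is an illustrative re-implementation under simplifying assumptions (single head, top-k routing, no relative position terms), not the BiFormer reference code or the exact module used in YOLOv8-CBBF.

```python
# Simplified bi-level routing attention sketch (illustrative, single head).
import torch
import torch.nn as nn

class BiLevelRoutingAttentionSketch(nn.Module):
    def __init__(self, dim, num_regions=7, topk=4):
        super().__init__()
        self.S, self.topk, self.scale = num_regions, topk, dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)       # fused W^q, W^k, W^v projections
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, H, W, C); H and W divisible by S
        B, H, W, C = x.shape
        S, m = self.S, (H // self.S) * (W // self.S)
        # partition the feature map into S*S regions of m tokens each
        x = x.view(B, S, H // S, S, W // S, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(B, S * S, m, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)   # token-level Q, K, V (Formulas (5)-(7))
        # coarse routing: region-level queries/keys pick the top-k most relevant regions
        q_r, k_r = q.mean(dim=2), k.mean(dim=2)             # (B, S^2, C)
        affinity = q_r @ k_r.transpose(-1, -2)              # (B, S^2, S^2)
        idx = affinity.topk(self.topk, dim=-1).indices      # (B, S^2, topk)
        # gather the K and V tokens of the routed regions for every query region
        idx_exp = idx[..., None, None].expand(-1, -1, -1, m, C)
        k_g = torch.gather(k.unsqueeze(1).expand(-1, S * S, -1, -1, -1), 2, idx_exp)
        v_g = torch.gather(v.unsqueeze(1).expand(-1, S * S, -1, -1, -1), 2, idx_exp)
        k_g, v_g = k_g.flatten(2, 3), v_g.flatten(2, 3)      # (B, S^2, topk*m, C)
        # fine-grained token-to-token attention restricted to the gathered regions
        attn = torch.softmax((q @ k_g.transpose(-1, -2)) * self.scale, dim=-1)
        out = attn @ v_g                                      # (B, S^2, m, C)
        out = out.view(B, S, S, H // S, W // S, C).permute(0, 1, 3, 2, 4, 5)
        return self.proj(out.reshape(B, H, W, C))
```

For example, an instance built with dim = 128 and num_regions = 7 can be applied to a (1, 28, 28, 128) feature map, since 28 is divisible by 7; the gather-and-expand step also illustrates where the extra memory traffic noted above comes from.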

3.2.2. Add the CBAM

The CBAM attention mechanism, which integrates spatial attention and channel attention mechanisms, helps enhance feature representations across different channels and extract critical information from various spatial locations, thereby strengthening the overall feature representation [34].
The channel attention mechanism adaptively selects and weights useful feature-map information mainly through global average pooling and a fully connected structure, as shown in Figure 5A. The spatial attention mechanism adaptively selects and weights the useful spatial information of the feature map by calculating a weight for the feature at each spatial position, as shown in Figure 5B. The input feature layer is first multiplied by the output weight of the channel attention mechanism, and the result is then multiplied by the output weight of the spatial attention mechanism, yielding the final CBAM-refined output feature, as shown in Figure 5C. The C2f module of Layer 6 in the backbone is replaced with C2f_CBAM, which incorporates CBAM, as shown in Table 3 and Figure 6.
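A compact PyTorch sketch of such a CBAM block is given below: channel attention from global average and max pooling followed by a shared MLP, then spatial attention from a convolution over the channel-wise pooled maps, each multiplied onto the feature map in turn. It is an illustrative stand-alone module; in this work an equivalent block is embedded in the C2f_CBAM module of Layer 6, and the reduction ratio and kernel size below are common defaults, not values taken from this work.

```python
# Illustrative CBAM block: channel attention followed by spatial attention.
import torch
import torch.nn as nn

class CBAMSketch(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):                         # x: (B, C, H, W)
        # channel attention: squeeze spatial dims, weight each feature channel
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # spatial attention: squeeze channels, weight each spatial location
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(pooled))
```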

3.2.3. YOLOv8-CBBF

The improved YOLOv8 network structure that combines these two modifications differs from the original YOLOv8 as outlined in Table 3 and Figure 6, and this enhanced network is named YOLOv8-CBBF. The main difference between the improved network and the baseline network lies in the replacement of the C2f modules in Layers 2 and 6 of the YOLOv8 backbone with BiLevelRoutingAttention (BiFormer) and C2f_CBAM, respectively. In Figure 6, k denotes the kernel size and s the stride. The numbers between two layers denote the output of the upper layer, which is also the input of the next layer. The C2f module adopts a shortcut structure.

4. Experiments and Results

4.1. Experimental Environment and the Network Parameters

We used a JDG Lenovo Legion Blade 9000 K desktop computer running on Windows 10 to train and validate the convolutional neural network YOLOv8-CBBF; the computer was equipped with an Intel Core i7-14700KF CPU, an RTX 4080-16G dedicated graphics card, 192 GB of RAM, a Z790 motherboard, and a clock speed of 5.4 GHz. The programming of the convolutional neural network YOLOv8-CBBF was conducted in Python, utilizing the TensorFlow.keras framework. During the process of importing images for training, validation, and testing, normalization calculations were performed on the data, and label vectors were automatically generated during training.
The selection of appropriate network learning parameters has a significant impact on the diagnosis results of gasoline engine misfire faults. After repeated experiments and hyperparameter tuning, it was found that when the learning rate was set to 0.01 and the momentum was set to 0.937, the overall effect of gasoline engine misfire state recognition was the best. The weight decay was set to 0.0005. The values of these three parameters enabled the model to converge to the optimal solution quickly during the training process. The training epoch of the convolutional neural network was set to 100.
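For illustration, the sketch below shows how these hyperparameters could be passed to the Ultralytics training API for an image-classification task. The authors report a TensorFlow.keras implementation, so this PyTorch-based call is only an assumed equivalent; the model file name “yolov8-cbbf.yaml” (which would have to register the custom BiLevelRoutingAttention and C2f_CBAM modules), the dataset path, and the input image size are hypothetical.

```python
# Hedged sketch: training a classification model with the stated hyperparameters
# (learning rate 0.01, momentum 0.937, weight decay 0.0005, 100 epochs) via the
# Ultralytics API. Model file, dataset path, and image size are hypothetical.
from ultralytics import YOLO

model = YOLO("yolov8-cbbf.yaml", task="classify")   # hypothetical custom model definition
model.train(
    data="engine_misfire_dataset",                  # folder with train/val/test image subfolders
    epochs=100,
    imgsz=224,                                      # input image size (assumption)
    lr0=0.01,                                       # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
)
```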

4.2. Dataset

A total of 3528 time–frequency spectrum images were obtained after wavelet transformation of gasoline engine sounds. All images were divided into three parts, which were used for training, validating, and testing the convolutional neural network, respectively. The distribution of image numbers is shown in Table 4.
To prevent overfitting during the training process and improve the generalization ability of the model, the distribution of data in the dataset in the experiment was kept as balanced as possible. In future research, data augmentation and unsupervised learning methods can be used to improve the adaptability of the network model to data distribution differences.

4.3. Ablation Experiments

Ablation experiments were conducted to understand the impact of various component modules on the overall performance of complex neural network systems. By altering certain parts of the YOLOv8 network system and observing the effects of these changes on the system’s performance, we could gain insights into the roles and importance of the various components and modifications within the YOLOv8 system. The network model that only adds the CBAM attention mechanism is named YOLOv8-CBAM, the model that only incorporates BiFormer is designated as YOLOv8-BF, and the model that incorporates both modules is designated as YOLOv8-CBBF.
Four ablation experiments were conducted under identical experimental conditions, and the results are presented in Table 5. The first experiment used the base model YOLOv8, which serves as a reference for evaluating the performance of the improved network models in identifying misfire faults. For misfire fault identification, the total number of parameters in the YOLOv8 model does not exceed 1.5 M, and the computational cost is relatively low. However, six sound segments of a one-cylinder misfire were missed, and two sound segments of a one-cylinder misfire were falsely identified as a two-cylinder misfire.
The second experiment used the YOLOv8-CBAM model. With minimal changes in the number of parameters, the accuracy improved by 0.76%, and the precision increased by 1.14%. Two sound segments of a one-cylinder misfire were missed. By adding an attention mechanism, the model pays more attention to complex feature learning, avoiding the loss of important information and enhancing the performance of gasoline engine misfire fault identification.
The third experiment used the YOLOv8-BF model. With a slight reduction in the number of parameters, the accuracy improved by 0.76%, and the precision increased by 1.14%. Here also, two sound segments of a one-cylinder misfire were missed. The BiFormer module, with its multi-layer attention mechanism, pays closer attention to subtle features, contributing to improved precision rates for gasoline engine misfire fault identification.
The fourth experiment used the YOLOv8-CBBF model. While the number of parameters remained unchanged, the computational cost during training increased significantly. The accuracy improved by 1.13%, and the precision increased by 2%. Only one sound segment of a one-cylinder misfire was missed. The improved performance of the YOLOv8-CBBF model in gasoline engine misfire fault identification can be attributed to the addition of two attention mechanism modules with different structures to the YOLOv8 base model. This enhancement not only facilitates the extraction of global features but also aids in the extraction of subtle features.

4.4. Result Analysis

The main metrics used to evaluate deep learning network models include four indicators: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). TP represents the number of correct predictions for positive samples, while TN measures the number of correctly classified negative samples. FP is an indicator of misdiagnosis for gasoline engine misfire, reflecting the number of incorrectly predicted positive samples. Another metric for misdiagnosis of gasoline engine misfire is FN, which represents the number of samples incorrectly predicted as negative.
Accuracy reflects the proportion of correctly predicted samples to the total number of samples (see Formula (8)). Although accuracy can be used to determine the overall correct rate, it can become ineffective in cases of unbalanced samples. When it is preferable to miss a detection rather than make a wrong prediction, the precision index is used. Precision, also known as the positive predictive value, reflects the proportion of samples predicted as positive that are actually positive. It serves as an assessment of the diagnostic capability of the convolutional neural network YOLOv8-CBBF for gasoline engine misfire, particularly in evaluating the robustness of YOLOv8-CBBF. The calculation formula for precision is shown in (9). Recall represents the proportion of actual positive samples that are correctly predicted as positive by the model. A high recall indicates a low probability of missing true positive samples, and its calculation formula is shown in (10).
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \quad (8)$$

$$PRE = \frac{TP}{TP + FP} \quad (9)$$

$$Recall = \frac{TP}{TP + FN} \quad (10)$$
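As a small worked example, the sketch below evaluates Formulas (8)–(10) on a 3-class confusion matrix laid out as in Table 7 (rows are true classes Z, Y, E; columns are predicted classes). The matrix values shown are those of the YOLOv8-CBBF column of Table 7, and macro-averaging over the three classes is an assumption about how the multi-class precision and recall are aggregated.

```python
# Accuracy, precision (PRE), and recall from a 3-class confusion matrix
# (rows = true class, columns = predicted class). Macro-averaging is assumed.
import numpy as np

cm = np.array([[116, 0, 0],     # Z: normal
               [1, 117, 0],     # Y: one-cylinder misfire
               [0, 0, 116]])    # E: two-cylinder misfire

tp = np.diag(cm).astype(float)
fp = cm.sum(axis=0) - tp         # column sums minus the diagonal
fn = cm.sum(axis=1) - tp         # row sums minus the diagonal

accuracy = tp.sum() / cm.sum()                  # Formula (8), multi-class form
precision = np.mean(tp / (tp + fp))             # Formula (9), macro-averaged
recall = np.mean(tp / (tp + fn))                # Formula (10), macro-averaged
print(f"accuracy={accuracy:.4f}, precision={precision:.4f}, recall={recall:.4f}")
```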
The YOLO series of algorithms has always attracted widespread attention. By comparing and analyzing the performance of different versions of YOLO algorithms in detecting engine misfire faults, we can understand the performance of typical YOLO algorithms in the application scenario of engine misfire fault identification, providing strong technical support for practical applications. YOLOv3, YOLOv5, and Scnn-M5 (a neural network designed by the author with five feature channel grouped convolutions) were selected for comparative analysis with the improved YOLOv8 algorithm model, YOLOv8-CBBF. The comparison results are summarized in Table 6. The confusion matrices for the four network models are presented in Table 7, where Z represents the normal state of the gasoline engine, Y represents the one-cylinder misfire state, and E represents the two-cylinder misfire state. The changes in training and validation losses, as well as training accuracy, for the YOLOv8-CBBF model are depicted in Figure 7A–C. The green line in these figures represents the actual changes in loss and accuracy during training, while the red dots represent virtual smoothing points.
For the YOLOv3 network, six sound segments of one cylinder misfire were detected as normal, and two sound segments of a one-cylinder misfire were falsely diagnosed as two-cylinder misfires. Both YOLOv5 and Scnn-M5 misidentified two sound segments of a one-cylinder misfire as normal, and each misdiagnosed two sound segments of a one-cylinder misfire as a two-cylinder misfire. Through analysis and careful listening, it was discovered that the same audio segments, which were relatively noisy and interfered with the diagnosis, were misdiagnosed in these four cases.
Compared to the other three networks, the YOLOv8-CBBF model exhibits improved accuracy and precision, resulting in optimal overall performance in identifying misfire faults of gasoline engines. The addition of the CBAM attention mechanism and the BiFormer attention mechanism enhances the ability to extract complex features and improves resistance to background noise interference, making it suitable for gasoline engine misfire fault identification tasks.
During the training process of the convolutional neural network YOLOv8-CBBF model on gasoline engine sound signal images, both training and validation losses converged rapidly to a smaller value. The accuracy of the YOLOv8-CBBF model in detecting misfire faults of gasoline engines rapidly increased to nearly 100%. After 50 epochs, the training and validation losses, as well as the model accuracy, remained stable for the YOLOv8-CBBF model. Although a longer training process could further optimize the performance of the YOLOv8-CBBF model in detecting misfire faults in gasoline engines, 50 epochs were sufficient for the training process.

5. Conclusions

For online misfire fault diagnosis in gasoline engines, the time consumed for detecting each audio segment is important for evaluating the detection performance of convolutional neural network models. Through testing, the time for wavelet transformation on an audio signal with a duration of 3.75 s and 30,000 sampling points is 0.06 s. The transformed wavelet coefficients are then converted into three-dimensional data and input into the YOLOv8-CBBF network model for detection, which takes approximately 0.02 s. The overall time for misfire detection on a frame of audio signal is less than 0.1 s, fully satisfying the real-time requirements for detecting and identifying misfire faults in gasoline engines. This shows that our proposed YOLOv8-CBBF model exhibits high industrial application value for misfire fault diagnosis in gasoline engines.
In this study, we focused on the sound emitted by gasoline engines used in vehicles. We first applied wavelet transformation to analyze the sound data and then created time–frequency images based on the wavelet-transformed results. To enhance the diagnostic capabilities, we designed an improved version of the YOLOv8 neural network by incorporating an attention mechanism for predicting misfire faults in gasoline engines. The entire network architecture encompasses convolutional operations, batch normalization, and ReLU activation functions. By carefully adjusting the network’s hyperparameters, we have optimized the YOLOv8-CBBF convolutional neural network to achieve remarkably high accuracy in diagnosing misfire faults.
The physical storage occupied by the weight coefficients of the convolutional neural network YOLOv8-CBBF does not exceed 3 MB, making it possible to achieve real-time high-precision identification of misfire faults in gasoline engines without high-end computing equipment. The YOLOv8-CBBF can accurately predict misfire faults in gasoline engines without high-quality audio signals, satisfying the requirements of mobile and embedded devices. Furthermore, YOLOv8-CBBF can be utilized to develop low-power IoT devices for misfire fault diagnosis in gasoline engines; YOLOv8-CBBF can also be applied to autonomous vehicles.
Future work will involve further studying the adaptive adjustment of all parameters in convolutional neural networks, exploring the practical engineering applications of the YOLOv8-CBBF model.

Author Contributions

Methodology, Z.L. and W.L.; software, X.L.; data curation, Z.L. and X.L.; writing, Z.L.; Article revision, Z.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Project of National Natural Science Foundation of China, grant number 51775270.

Data Availability Statement

The data presented in this study are available in this article.

Acknowledgments

This work was supported by the Project of National Natural Science Foundation of China.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Firmino, J.L.; Neto, J.M.; Oliveira, A.G.; Silva, J.C.; Mishina, K.V.; Rodrigues, M.C. Misfire detection of an internal combustion engine based on vibration and acoustic analysis. J. Braz. Soc. Mech. Sci. Eng. 2021, 43, 336. [Google Scholar] [CrossRef]
  2. Venkatesh, S.N.; Chakrapani, G.; Senapti, S.B.; Annamalai, K.; Elangovan, M.; Indira, V.; Sugumaran, V.; Mahamuni, V.S. Misfire Detection in Spark Ignition Engine Using Transfer Learning. Comput. Intell. Neurosci. 2022, 2022, 7606896. [Google Scholar] [CrossRef] [PubMed]
  3. Wang, X.; Zhang, P.; Gao, W.; Li, Y.; Wang, Y.; Pang, H. Misfire Detection Using Crank Speed and Long Short-Term Memory Recurrent Neural Network. Energies 2022, 15, 300. [Google Scholar] [CrossRef]
  4. Chen, Z.; Mauricio, A.; Li, W.; Gryllias, K. A deep learning method for bearing fault diagnosis based on cyclic spectral coherence and convolutional neural networks. Mech. Syst. Signal Process. 2020, 140, 106683. [Google Scholar] [CrossRef]
  5. Zhang, Y.; Xing, K.; Bai, R.; Sun, D.; Meng, Z. An enhanced convolutional neural network for bearing fault diagnosis based on time–frequency image. Measurement 2020, 157, 107667. [Google Scholar] [CrossRef]
  6. Janssens, O.; Slavkovikj, V.; Vervisch, B.; Stockman, K.; Loccufier, M.; Verstockt, S.; Van de Walle, R.; Van Hoecke, S. Convolutional neural network based fault detection for rotating machinery. Sound Vib. 2016, 377, 331–345. [Google Scholar] [CrossRef]
  7. Zhao, X.; Guo, H. Fault diagnosis of rolling bearings using multi-feature fusion. Trans. Chin. Soc. Agric. Eng. (Trans. CSAE) 2023, 39, 80–88. [Google Scholar]
  8. Zhang, P.; Gao, W.Z.; Gao, B.; Liu, Z. Misfire Detection of Diesel Engine Based on Artificial Neural Networks. J. Vib. Meas. Diagn. 2022, 40, 702–710. [Google Scholar]
  9. Zhang, H.; Zhang, P.; Jiang, G. Fault Diagnosis of Diesel Engine Misfire Based on Frequency Domain Feature and Neural Network. Mech. Electr. Eng. Technol. 2022, 51, 250–253. [Google Scholar]
  10. Shahid, S.M.; Ko, S.; Kwon, S. Real-time abnormality detection and classification in diesel engine operations with convolutional neural network. Expert Syst. Appl. 2022, 192, 116233. [Google Scholar] [CrossRef]
  11. Zhang, P.; Gao, W.; Li, Y.; Wang, Y. Misfire detection of diesel engine based on convolutional neural networks. J. Automob. Eng. 2021, 235, 2148–2165. [Google Scholar] [CrossRef]
  12. Terwilliger, A.M.; Siegel, J.E. Improving Misfire Fault Diagnosis with Cascading Architectures via Acoustic Vehicle Characterization. Sensors 2022, 22, 7736. [Google Scholar] [CrossRef] [PubMed]
  13. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. Available online: http://pjreddie.com/yolo/ (accessed on 1 January 2024).
  14. Wu, Y.; Ning, S.; Ren, Y.; Wang, Y. Fault Diagnosis Method of Rolling Bearings Based on G-YOLO Network. Noise Vib. Control. 2023, 43, 161–166. [Google Scholar]
  15. Xuan, Z.; Zhang, W.; Song, H.; Qin, Z. Failure Detection of EMU Apron Board Based on YOLO Algorithm. J. Dalian Jiaotong Univ. 2023, 44, 97–102. [Google Scholar]
  16. Sun, I.; Wang, L.; Ma, J.; Gao, W. Photovoltaic Module Fault Detection Based on Improved YOLOv5s Algorithm. Infrared Technol. 2023, 45, 202–208. [Google Scholar]
  17. Zheng, W.; Yang, Y.; Qiao, M.; Lv, J. Thermal defect identification and diagnosis method for substation equipment based on improved YOLO and resnet. J. Chongqing Univ. Technol. (Nat.) 2023, 37, 261–269. [Google Scholar]
  18. Gu, C.; Qiao, X.-Y.; Li, H.; Jin, Y. Misfire Fault Diagnosis Method for Diesel Engine Based on MEMD and Dispersion Entropy. Shock Vib. 2021, 2021, 9213697. [Google Scholar] [CrossRef]
  19. Kumar, A.; Gandhi, C.P.; Zhou, Y.; Vashishtha, G.; Kumar, R.; Xiang, J. Improved CNN for the diagnosis of engine defects of 2-wheeler vehicle using wavelet synchro-squeezed transform (WSST). Knowl.-Based Syst. 2020, 208, 106453. [Google Scholar] [CrossRef]
  20. Qin, C.; Jin, Y.; Zhang, Z.; Yu, H.; Tao, J.; Sun, H.; Liu, C. Anti-noise diesel engine misfire diagnosis using a multi-scale CNN-LSTM neural network with denoising module. CAAI Trans. Intell. Technol. 2023, 8, 963–986. [Google Scholar] [CrossRef]
  21. Xu, X.; Liu, Z.; Wu, J.; Xing, J.; Wang, X. Misfire fault diagnosis of range extender based on harmonic analysis. Int. J. Automot. Technol. 2019, 20, 99–108. [Google Scholar] [CrossRef]
  22. Liu, Z.; Wu, K.; Ding, Q.; Gu, J.X. Engine Misfire Diagnosis Based on the Torsional Vibration of the Flexible Coupling in a Diesel Generator Set: Simulation and Experiment. J. Vib. Eng. Technol. 2020, 8, 163–178. [Google Scholar] [CrossRef]
  23. Syta, A.; Czarnigowski, J.; Jakliński, P. Detection of cylinder misfire in an aircraft engine using linear and non-linear signal analysis. Measurement 2021, 174, 108982. [Google Scholar] [CrossRef]
  24. Zhou, J.; Zhu, Y.; Xiao, M.; Wu, J. Fault diagnosis of tractor diesel engine based on LWD-QPSO-SOMBP neural network. Trans. Chin. Soc. Agric. Eng. (Trans. CSAE) 2021, 37, 39–48. [Google Scholar]
  25. Tra, V.; Kim, J.; Khan, S.A.; Kim, J.M. Bearing Fault Diagnosis under Variable Speed Using Convolutional Neural Networks and the Stochastic Diagonal Levenberg-Marquardt Algorithm. Sensors 2017, 17, 2834. [Google Scholar] [CrossRef] [PubMed]
  26. Nie, H. Research on Machine Sound Fault Diagnosis Models with Deep Neural Networks. Master’s Thesis, Xiangtan University, Xiangtan, China, 2021. [Google Scholar]
  27. Wang, Y.; Liu, N.; Guo, H.; Wang, X. An engine-fault-diagnosis system based on sound intensity analysis and wavelet packet pre-processing neural network. Eng. Appl. Artif. Intell. 2020, 94, 103765. [Google Scholar] [CrossRef]
  28. Fan, X.; Yao, C.; Zeng, X. Misfire Fault Diagnosis Based on Local Mean Decomposition of Exhaust Noise. Chin. Intern. Combust. Engine Eng. 2013, 34, 38–41. [Google Scholar]
  29. Qin, C.; Jin, Y.; Tao, J.; Xiao, D.; Yu, H.; Liu, C.; Shi, G.; Lei, J.; Liu, C. DTCNNMI: A deep twin convolutional neural networks with multi-domain inputs for strongly noisy diesel engine misfire detection. Measurement 2021, 180, 109548. [Google Scholar] [CrossRef]
  30. El-Ghamry, M.H.; Reuben, R.L.; Steel, J.A. The development of automated pattern recognition and statistical feature isolation techniques for the diagnosis of reciprocating machinery faults using acoustic emission. Mech. Syst. Signal Process. 2003, 17, 805–823. [Google Scholar] [CrossRef]
  31. Liu, Y.; Dou, S.; Du, Y.; Wang, Z. Gearbox Fault Diagnosis Based on Gramian Angular Field and CSKD-ResNeXt. Electronics 2023, 12, 2475. [Google Scholar] [CrossRef]
  32. Xie, F.; Wang, G.; Zhu, H.; Sun, E.; Fan, Q.; Wang, Y. Rolling Bearing Fault Diagnosis Based on SVD-GST Combined with Vision Transformer. Electronics 2023, 12, 3515. [Google Scholar] [CrossRef]
  33. Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R.W. BiFormer: Vision Transformer with Bi-Level Routing Attention. Available online: https://arxiv.org/abs/2303.08810 (accessed on 1 January 2024).
  34. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar]
Figure 1. Engine sound signal acquisition.
Figure 2. Engine sound signal. (a) normal; (b) one-cylinder misfire; (c) two-cylinder misfire.
Figure 3. Wavelet transformation time–frequency image. (a) normal; (b) one-cylinder misfire; (c) two-cylinder misfire.
Figure 4. Structure of BiFormer.
Figure 5. The overview of CBAM (Note: Pictures are from the Ref. [34]). (A) the structure of channel attention; (B) the structure of spatial attention; (C) the structure of CBAM attention.
Figure 6. Structural comparison of YOLOv8 and YOLOv8-CBBF. (A) Structure of YOLOv8; (B) Structure of YOLOv8-CBBF.
Figure 7. YOLOv8-CBBF training process. (A) Train Loss; (B) Validation Loss; (C) Train accuracy.
Table 1. Engine sound acquisition time.
Condition | 960 rpm | 1500 rpm | 2000 rpm | 2500 rpm | 3000 rpm
Normal (s) | 920 | 954 | 938 | 934 | 909
1-cylinder misfire (s) | 1028 | 809 | 802 | 909 | 1061
2-cylinder misfire (s) | 913 | 807 | 748 | 902 | 1046
Table 2. Normal speed and post-misfire speed.
Condition | Speed (rpm)
Normal | 960 | 1500 | 2000 | 2500 | 3000
1-cylinder misfire | 960 (uncertainty) | 1200 | 1240 | 1360 | 1800
2-cylinder misfire | 860 | 1000 | 1060 | 1100 | 2280
Table 3. The difference between YOLOv8-CBBF and YOLOv8.
Module Number | YOLOv8-CBBF | YOLOv8
2 | ultralytics.nn.modules.block.BiLevelRoutingAttention | ultralytics.nn.modules.block.C2f
6 | ultralytics.nn.modules.block.C2f_CBAM | ultralytics.nn.modules.block.C2f
Table 4. Distribution of image numbers.
Item | Normal | One-Cylinder Misfire | Two-Cylinder Misfire | Total
train | 825 | 830 | 821 | 2476
val | 234 | 236 | 232 | 702
test | 116 | 118 | 116 | 350
total | 1175 | 1184 | 1169 | 3528
Table 5. Ablation experiment of the improved network.
Network Model | Accuracy (%) | PRE (%) | Parameters | FLOPs (G)
YOLOv8 | 97.71 | 97.71 | 1,442,131 | 3.4
YOLOv8-CBAM | 99.43 | 99.44 | 1,444,277 | 3.4
YOLOv8-BF | 99.43 | 99.44 | 1,439,315 | 60.5
YOLOv8-CBBF | 99.8 | 99.71 | 1,440,535 | 60.7
Table 6. Comparison result of the four-network model.
Network Model | PRE (%) | Recall (%) | Accuracy (%)
YOLOv3 | 97.80 | 97.71 | 97.71
YOLOv5 | 98.87 | 98.87 | 98.86
Scnn-M5 | 98.87 | 98.87 | 98.86
YOLOv8-CBBF | 99.71 | 99.71 | 99.8
Table 7. Confusion matrix of the neural network models.
(Rows: true value; columns within each model: predicted Z / Y / E, where Z = normal, Y = one-cylinder misfire, E = two-cylinder misfire.)
True Value | YOLOv3 (Z / Y / E) | YOLOv5 (Z / Y / E) | Scnn-M5 (Z / Y / E) | YOLOv8-CBBF (Z / Y / E)
Z | 116 / 0 / 0 | 116 / 0 / 0 | 116 / 0 / 0 | 116 / 0 / 0
Y | 6 / 110 / 2 | 2 / 114 / 2 | 2 / 114 / 2 | 1 / 117 / 0
E | 0 / 0 / 116 | 0 / 0 / 116 | 0 / 0 / 116 | 0 / 0 / 116