Article

A Lightweight Cotton Verticillium Wilt Hazard Level Real-Time Assessment System Based on an Improved YOLOv10n Model

by Juan Liao 1,2,3, Xinying He 1, Yexiong Liang 1, Hui Wang 1, Haoqiu Zeng 1, Xiwen Luo 1,2,3, Xiaomin Li 4, Lei Zhang 5, He Xing 6,* and Ying Zang 1,2,3,*

1 College of Engineering, South China Agricultural University, Guangzhou 510642, China
2 Key Laboratory of Key Technology on Agricultural Machine and Equipment (South China Agricultural University), Ministry of Education, Guangzhou 510642, China
3 Guangdong Provincial Key Laboratory of Agricultural Artificial Intelligence (GDKL-AAI), Guangzhou 510642, China
4 College of Mechanical and Electrical Engineering, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, China
5 College of Agriculture, South China Agricultural University, Guangzhou 510642, China
6 School of Information Technology & Engineering, Guangzhou College of Commerce, Guangzhou 511363, China
* Authors to whom correspondence should be addressed.
Agriculture 2024, 14(9), 1617; https://doi.org/10.3390/agriculture14091617
Submission received: 15 August 2024 / Revised: 8 September 2024 / Accepted: 13 September 2024 / Published: 14 September 2024
(This article belongs to the Section Digital Agriculture)
Figure 1. Overall flow of the CVW hazard level assessment.
Figure 2. (a) Study area map; and (b) ground photographs of the cotton plants.
Figure 3. Examples of image samples of CVW: (a) cloudy; (b) sunny; (c) rainy; (d) dusk; and (e) night.
Figure 4. Original image and enhanced images: (a) original image; (b) image after bending; (c) image after adjusting brightness; (d) image after adding Gaussian noise; and (e) image after adding salt-and-pepper noise.
Figure 5. GhostConv module structure.
Figure 6. (a) W-MSA module and SW-MSA module; and (b) the Cyclic Shift operation of the SW-MSA.
Figure 7. (a) Two successive Swin Transformer Blocks; and (b) the STC module structure.
Figure 8. SE module structure.
Figure 9. Data flow diagram of the CVW system.
Figure 10. CVW hazard level assessment on-site detection and system interface.
Figure 11. Examples of CVW segmentation detection in different environments: (a) original images, from top to bottom: cloudy, sunny, rainy, dusk, nighttime without flash, and nighttime with flash; (b) segmentation results of YOLOv10n; and (c) segmentation results of YOLO-VW. Note: black regions represent the background, green regions represent healthy leaves, and red regions represent lesions. Red boxes indicate over-segmented leaf and lesion regions, while yellow boxes highlight under-segmented leaf or lesion regions.

Abstract

Compared to traditional manual methods for assessing the cotton verticillium wilt (CVW) hazard level, utilizing deep learning models for foliage segmentation can significantly improve the evaluation accuracy. However, instance segmentation methods for images with complex backgrounds often suffer from low accuracy and delayed segmentation. To address this issue, an improved model, YOLO-VW, with high accuracy, high efficiency, and a light weight, was proposed for CVW hazard level assessment based on the YOLOv10n model. (1) It replaced conventional convolutions with the lightweight GhostConv, reducing the computational time. (2) The STC module based on the Swin Transformer enhanced the expression of foliage and disease spot boundary features, further reducing the model size. (3) It integrated a squeeze-and-excitation (SE) attention mechanism to suppress irrelevant background information. (4) It employed the stochastic gradient descent (SGD) optimizer to enhance the performance and shorten the detection time. The improved CVW severity assessment model was then deployed on a server, and a real-time detection application (APP) for CVW severity assessment was developed based on this model. The results indicated the following. (1) The YOLO-VW model achieved a mean average precision (mAP) of 89.2% and a frame per second (FPS) rate of 157.98 f/s in assessing CVW, representing improvements of 2.4% and 21.37 f/s over the original model, respectively. (2) The YOLO-VW model’s parameters and floating point operations per second (FLOPs) were 1.59 M and 7.8 G, respectively, compressed by 44% and 33.9% compared to the original YOLOv10n model. (3) After deploying the YOLO-VW model on a smartphone, the processing time for each image was 2.42 s, and the evaluation accuracy under various environmental conditions reached 85.5%, representing a 15% improvement compared to the original YOLOv10n model. Based on these findings, YOLO-VW meets the requirements for real-time detection, offering greater robustness, efficiency, and portability in practical applications. This model provides technical support for controlling CVW and developing cotton varieties resistant to verticillium wilt.

1. Introduction

Cotton is one of the most important cash crops in the world. China's annual cotton planting area and yield are about 3 million hm² and 6 million t, respectively, accounting for more than 22% of the world total [1]. Diseases and insect pests inevitably occur throughout the growth stages of cotton. Cotton verticillium wilt (CVW) is undoubtedly the most harmful disease in cotton production because it is widespread and difficult to control, causing severe damage to cotton yield and quality [2,3]. In China, approximately 30 hm² of cotton crops are affected by CVW, accounting for an annual yield loss of 10–30% [4,5]. To prevent and control the spread of CVW and the economic losses it causes, cultivating CVW-resistant varieties is currently the most economical, environmentally friendly, and effective approach. Understanding the CVW infection status of different varieties is a critical part of the breeding process. However, current CVW hazard level assessment methods depend mainly on manual investigation, which is not only time-consuming and laborious but also influenced by subjective factors, leading to misjudgment. Therefore, exploring a real-time CVW assessment system with high accuracy and efficiency is of great importance.
Traditional image processing methods employed in the detection of diseases and pest identification usually extract features manually based on the color, shape, and texture of images [6,7,8]. These methods depend on manual experience to design features to determine the location of pests and diseases, making them strongly dependent on prior knowledge and exhibiting poor generalization ability. With the development of computer vision, deep learning has been widely applied in the recognition and classification of biotic and abiotic stresses in crops [9]. Chen et al. [10] proposed an improved bi-directional LSTM-based network to predict cotton pests and diseases. Latif et al. [11] used a deep learning architecture based on serially fused features and the best feature selection for cotton disease recognition, obtaining an identification accuracy of 98.8%. Alves et al. [12] designed a novel deep residual learning method for automatically classifying the most harmful cotton pests in field conditions in cotton plant images with an F-score of 0.98. Caldeira et al. [13] employed convolutional neural network models to identify lesions on cotton foliage, obtaining 89.2% accuracy. The results showed that compared with traditional approaches for the processing of images, the convolutional neural networks can be up to 25% more precise. Zhang et al. [14] used an improved deep residual network (C-ResNet50) to realize the damage grade of cotton spider mites, with an average accuracy of 98.10%.
The above DL-based frameworks can perform feature extraction and classification automatically, which provides an excellent advantage for automatic plant stress phenotyping. This advantage is significant for mobile devices in real-time plant stress detection. More and more research has focused on intelligent terminals for deploying DL models and providing smart applications/services. Many researchers have recently employed smartphones to develop automatic crop disease and pest classification. However, the DL model usually contains many convolutional layers and parameters, which results in time consumption and difficulty in realizing real-time detection on mobile terminals.
The rapid response ability is essential for real-time detection systems. The YOLO (You Only Look Once) series of algorithms, as one-stage algorithms, exhibit extremely fast response speeds, enabling efficient real-time detection [15,16]. Numerous studies have demonstrated the excellent performance of YOLO models in various complex real-time detection tasks [17,18,19]. For instance, Zhang et al. [20] proposed a real-time high-performance detection system based on the improved YOLOX, and it achieved 94.60% mAP on 5760 labeled cotton images. Bai et al. [21] presented a lightweight and efficient detection model, T-YOLO, for the accurate detection of tea nutrient buds, which achieved a mean average precision (mAP) of 84.1%. Xu et al. [22] utilized the YOLOv5s model to detect cotton leaves at different infection stages and successfully built a real-time recognition system on smartphones. These research findings indicate that the YOLO algorithm is highly suitable for tasks requiring rapid and efficient image processing in agricultural environments, aiding managers in making timely crop management decisions.
Addressing the limitations of the manual investigation of CVW and the imbalance between the number of parameters and the accuracy of existing models, which hinder real-time detection, this study introduces the YOLO-VW model. This is a real-time instance segmentation model for CVW based on YOLOv10n. The model incorporates GhostConv instead of traditional convolution to reduce the computational consumption. Additionally, the C2f module is replaced with a self-attention mechanism inspired by the Swin Transformer network, and an SE attention mechanism is added to further reduce the number of parameters while improving the accuracy. Lastly, stochastic gradient descent optimization is used to enhance the performance and shorten the detection times.
The main contributions of our work can be summarized as follows:
  • Using multiple improvement strategies to enhance the model performance, a novel CVW segmentation model was proposed to achieve precise segmentation of leaves and disease spots, which can lay the foundation for the subsequent assessment of CVW levels.
  • The performance of different instance segmentation models (YOLOv5s, YOLOv7-tiny, YOLOv8n, YOLOv9t, Mask R-CNN, and SOLOv2) was compared in terms of CVW segmentation accuracy and model compactness.
  • The improved model has been deployed to a server to develop a mobile device-based application for real-time detection of CVW levels, providing technical support for CVW control and breeding in actual cotton production processes.

2. Materials and Methods

This study proposed a high-accuracy and lightweight YOLO-VW model and system to accurately assess the hazard level of CVW. Figure 1 shows the workflow methodology of the assessment system. The framework consists of three main steps: offline processing, model training, and online testing.
The offline processing includes image acquisition, image preprocessing, dataset partitioning, model training, and cloud model deployment. The model training includes modeling and debugging. The online testing phase uses the implemented system to predict CVW hazard levels.

2.1. Test Site Design

The image dataset used in this study was collected at a CVW test site in a village in Hebei Province, China (38.66° N, 115.68° E). The experiment was conducted from 15 August to 15 September 2023, when CVW was at a high incidence. The study area map and ground photographs of the cotton plants are shown in Figure 2.

2.2. Devices and Materials

The cotton variety used in the experiment was Cotton No. 36, which was cultivated by Hebei Agricultural University.
The mobile device used for image capture was a Redmi Note 12 Turbo 5G smartphone, featuring a second-generation Snapdragon 7+ processor, 12 GB of RAM, a camera resolution of 3472 × 4624 pixels, and the Android 13 operating system.
The data processing for this study was conducted using a personal PC and a Linux server provided by AutoDL. The specifications of the server for data processing are detailed in Table 1.

2.3. Cotton Verticillium Wilt Hazard Level Assessment Standard

Five levels of hazard (B0—No damage, B1—Light damage, B2—Moderate Damage, B3—Serious damage, and B4—Extremely serious damage) were established based on the CVW investigation method outlined in the “Research Methods in Plant Pathology” for cotton. Disease hazard level assessment relies on image segmentation, with the accuracy of the assessment depending on the precision of segmenting disease spots on cotton foliage [23,24]. In this study, the hazard level was determined by calculating the ratio of disease spot area pixels to the entire foliage area pixels after segmenting the disease spots on the leaves [25]. The formula for calculating this ratio is shown in (1).
K = A_v / A_l
where A_v is the area of the disease spots, A_l is the area of the entire foliage, and K is the ratio of disease spot pixels to the entire foliage area pixels. According to the grading standards and this formula, the specific CVW hazard level assessment standard is shown in Table 2.
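To make the grading computation concrete, the following minimal Python sketch derives K from binary leaf and lesion masks produced by a segmentation model and maps it to a hazard grade. The numeric cut-offs between B1 and B4 are placeholders only, since the actual boundaries are defined in Table 2.

```python
import numpy as np

def hazard_level(leaf_mask, lesion_mask):
    """Return (K, grade), where K = A_v / A_l is computed from binary masks."""
    a_l = np.count_nonzero(leaf_mask)    # A_l: entire foliage area in pixels
    a_v = np.count_nonzero(lesion_mask)  # A_v: disease spot area in pixels
    if a_l == 0:
        raise ValueError("leaf mask is empty")
    k = a_v / a_l
    if k == 0:
        return k, "B0"                   # no damage
    # Placeholder grade boundaries; the real cut-offs come from Table 2.
    for upper, grade in [(0.25, "B1"), (0.50, "B2"), (0.75, "B3"), (1.01, "B4")]:
        if k <= upper:
            return k, grade
```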

2.4. Image Dataset Collection

We set the shooting distance to 20 cm to ensure that the entire leaf was captured in the image without causing operational issues. This setup allowed us to assess the model’s performance under standardized conditions. In order to diversify the dataset, we captured cotton foliage images on sunny, rainy, and cloudy days, covering various scenarios such as single foliage, multiple foliage, different time periods, and more. In total, 1777 images were captured, out of which 200 were used for CVW hazard level assessment on smartphones. The remaining images were randomly divided into training and validation sets with an 8:2 ratio. The original cotton foliage images are shown in Figure 3.
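As a concrete illustration of the 8:2 split described above, the snippet below randomly partitions an image folder into training and validation sets; the directory names and fixed seed are assumptions for illustration, not part of the original protocol.

```python
import random
import shutil
from pathlib import Path

def split_dataset(src_dir="images", train_dir="train", val_dir="val", ratio=0.8, seed=42):
    """Randomly split an image folder into training and validation sets (8:2 by default)."""
    files = sorted(Path(src_dir).glob("*.jpg"))
    random.Random(seed).shuffle(files)
    cut = int(len(files) * ratio)
    for subset, dest in ((files[:cut], train_dir), (files[cut:], val_dir)):
        Path(dest).mkdir(parents=True, exist_ok=True)
        for f in subset:
            shutil.copy(f, Path(dest) / f.name)

# split_dataset()  # would reproduce the 8:2 partition of the remaining 1577 images
```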

2.5. Image Preprocessing

We utilized a combination of OpenCV's maximum and minimum filtering operations to preprocess the images, along with background subtraction to eliminate shadows and enhance the overall image contrast. This method improved the visibility of the verticillium wilt symptoms, enabling the model to detect these lesions more accurately during training. Subsequently, all the images were adjusted to a resolution of 640 × 640 pixels to ensure uniformity across the dataset. This standardization was important for maintaining consistency and improving the accuracy of the hazard level assessment during the training process. Additionally, the dataset was enhanced using an augmenter tool to facilitate better feature extraction from cotton foliage, thereby improving the model's ability to generalize. The augmentation techniques employed included brightness adjustment, noise injection, and random deformation. Specifically, 10% Gaussian noise and salt-and-pepper noise with a variance of 0.05 were added to the original images to increase the redundancy and reduce the image smoothness. Random bending was applied with control points within the range of (0, 0.1), and the brightness of the original images was reduced by multiplying their pixel values with a random factor between 0.5 and 0.6 to further enhance the robustness of the detection model. The effects of these augmentations are depicted in Figure 4.
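The following OpenCV sketch illustrates the preprocessing and augmentation steps described above under assumed parameter values (filter kernel size and noise strength are illustrative): max/min filtering with background subtraction, resizing to 640 × 640, brightness reduction, Gaussian noise, and salt-and-pepper noise. Random bending is omitted here for brevity.

```python
import cv2
import numpy as np

def remove_shadows(img, ksize=15):
    """Estimate the background with max/min (dilate/erode) filtering and subtract it
    to suppress shadows and lift contrast (a sketch; the kernel size is assumed)."""
    kernel = np.ones((ksize, ksize), np.uint8)
    background = cv2.erode(cv2.dilate(img, kernel), kernel)   # max filter then min filter
    diff = cv2.absdiff(img, background)
    return cv2.normalize(255 - diff, None, 0, 255, cv2.NORM_MINMAX)

def augment(img, rng=np.random.default_rng()):
    """Brightness reduction, Gaussian noise, and salt-and-pepper noise, as described."""
    out = cv2.resize(img, (640, 640))
    # darken: multiply pixels by a random factor in [0.5, 0.6]
    dark = np.clip(out * rng.uniform(0.5, 0.6), 0, 255).astype(np.uint8)
    # additive Gaussian noise (sigma assumed)
    noisy = np.clip(out + rng.normal(0, 25, out.shape), 0, 255).astype(np.uint8)
    # salt-and-pepper noise on roughly 5% of pixels
    sp = out.copy()
    mask = rng.random(out.shape[:2])
    sp[mask < 0.025] = 0
    sp[mask > 0.975] = 255
    return dark, noisy, sp
```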
Finally, cotton foliage and disease spots were manually labeled using an image data labeling software called LabelMe [26] with input from plant protection experts. This process resulted in the generation of JSON files corresponding to the image names. The training and validation sets used in this study consisted of a total of 1577 images, including 367 images of the B0 leaf category, 554 images of the B1 leaf category, 240 images of the B2 leaf category, 218 images of the B3 leaf category, and 198 images of the B4 leaf category. After applying the aforementioned preprocessing steps, the distribution of the dataset is illustrated in Table 3.

2.6. Network Improvement

2.6.1. YOLO-VW Network

YOLOv10 [27] is the latest YOLO algorithm, which offers five different model sizes based on scaling factors to suit various scenarios and task requirements. In terms of the network structure, the YOLOv10n network consists of a backbone layer, neck layer, and head layer. The backbone network layer extracts image features, mainly consisting of CBS, C2f, SCDown, SPPF, and PSA. CBS can extract local spatial information, C2f fuses feature maps at different scales to improve the accuracy and robustness of the target detection, SCDown is a lightweight downsampling method, SPPF applies the max pooling several times during the fusion process to extract the maximum higher-order semantic features, and PSA can increase the receptive field of the network to improve the ability for feature extraction. The neck network is for multi-scale feature fusion of the feature map, mainly consisting of CBS, C2fCIB, UPSample, SCDown, and Concat. UPSample uses the nearest method, and the Concat module is used for splicing to enhance the feature fusion using information extracted from the backbone network. The head network is used to predict the bounding boxes and categories.
To meet the specific requirement for high-speed responsiveness in CVW detection, we selected YOLOv10n, a model with fewer parameters, to ensure real-time performance. However, when using YOLOv10n for CVW segmentation tasks, the inherent degradation of feature map information during image scaling resulted in the loss of critical boundary features of both the leaves and the lesions. This information loss hindered the model’s ability to accurately capture boundary details, leading to under-segmentation and over-segmentation issues. These segmentation errors significantly compromised the model’s segmentation accuracy, ultimately reducing the overall accuracy of the CVW risk assessment. To address these issues while achieving both a lightweight design and high precision, we made the following improvements to YOLOv10n and named the improved model YOLO-VW:
  • Applying the lightweight module GhostConv to replace traditional convolution encoded more information with fewer parameters, which meant that the model parameters and computation consumption could be reduced while maintaining the detection accuracy.
  • The novel STC module, consisting of Swin Transformer Blocks and CBS, replaced the C2f module of YOLOv10n, which improved the processing capability of the model, enhanced the expression of the boundary features for disease spot segmentation, and further reduced the size of the model.
  • The squeeze-and-excitation (SE) attention mechanism was introduced to improve the perception ability of the model for CVW by obtaining more disease characterization information without increasing the model complexity.
  • Improving the performance and shortening the detection times using the stochastic gradient descent optimizer.

2.6.2. GhostConv Module

The convolution can improve the accuracy of recognition, but it is time-consuming to compute. Traditional convolution calculates every channel of the feature map, leading to long computation times and a large model volume. The GhostConv [28] module utilizes redundant information in feature maps to generate new feature maps through a series of low-cost operations, thereby reducing the model size while maintaining model accuracy. Suppose the size of the input feature map is h × w × c, the size of the output feature map is h′ × w′ × n, and the size of the convolution kernel is k × k, where h and w are the height and width of the input feature map, and h′ and w′ are the height and width of the output feature map, respectively. For traditional convolution, the required number of FLOPs is given by (2).
FLOP_1 = n × h′ × w′ × c × k × k
The input feature map was split into two subsets (including the backbone and sub-branch) by the GhostConv module, and a new low-cost feature map was generated in the sub-branch to reduce the computation consumption. More specifically, the backbone of the GhostConv module was computed using the traditional convolution to generate smaller feature maps, and on the basis of the obtained feature maps, a new feature map was further generated in the sub-branch part by using the low-cost linear operations. Finally, the two sets of feature maps were fused to obtain the output features. The redundant information in the feature map can be effectively utilized to spread the computation to the backbone and sub-branches, thus reducing the computation time consumption and the model volume. During the implementation of the GhostConv, the required number of FLOPs was calculated by (3).
FLOP_2 = (n/s) × h′ × w′ × c × k × k + (s − 1) × (n/s) × h′ × w′ × d × d
where d × d is the size of the kernel of the linear operations (of a similar magnitude to k × k), s is the number of cheap transformation operations, and s ≪ c. The theoretical ratio of FLOPs for traditional convolution to GhostConv is calculated by (4).
r_s = FLOP_1 / FLOP_2 ≈ s
That is, the computation time of GhostConv is about 1/s of that of traditional convolution.
In this study, we introduced the GhostConv module to replace the convolution module. This change allowed us to generate more features using fewer parameters without altering the size of the output feature map. As a result, we significantly reduced the computational complexity while maintaining the accuracy of the model. This improvement enhanced the efficiency and accuracy of CVW detection while effectively compressing the model. The structure of the GhostConv module is shown in Figure 5.
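For readers who want to see the mechanism in code, below is a compact PyTorch sketch of a Ghost-style convolution: a cheap primary convolution produces n/s intrinsic feature maps, and depthwise "cheap" operations generate the remainder. The ratio s = 2, the depthwise kernel size, and the SiLU activation are assumptions for illustration, not the exact configuration used in YOLO-VW.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Sketch of GhostConv: primary convolution + low-cost depthwise branch."""
    def __init__(self, c_in, c_out, k=1, s=1, ratio=2, dw_size=3):
        super().__init__()
        c_prim = c_out // ratio                      # intrinsic maps from the backbone branch
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_prim, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_prim), nn.SiLU())
        self.cheap = nn.Sequential(                  # low-cost linear (depthwise) operations
            nn.Conv2d(c_prim, c_out - c_prim, dw_size, 1, dw_size // 2,
                      groups=c_prim, bias=False),
            nn.BatchNorm2d(c_out - c_prim), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)  # fuse the two feature subsets

# Example: map 64 -> 128 channels on a 160 x 160 feature map
# x = torch.randn(1, 64, 160, 160); print(GhostConv(64, 128)(x).shape)
```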

2.6.3. STC Module

The Transformer is composed of an Encoder and a Decoder. The Encoder is mainly for feature extraction, and the Decoder is for generative tasks. However, due to its self-attention mechanism, the Transformer is a time-consuming algorithm because the attentional weights of each area should be calculated. The Swin Transformer [29] is a network model based on the Transformer. It was specifically designed for image processing, and it is mainly composed of the patch partition, linear embedding, Swin Transformer Blocks, and patch merging.
To reduce the computational consumption, the Swin Transformer Blocks introduce a window-based multi-head self-attention (W-MSA), as shown in Layer 1 of Figure 6a. The W-MSA divides the input image into non-overlapping local windows and restricts the attention calculation within each window, with each window consisting of a fixed number of patches (represented by the gray outlines in the figure). This division transforms the entire image into a collection of patches, where each patch can be processed as an independent input unit. Supposing each window contains M × M patches, the computational complexities of the global MSA module and the W-MSA module with an image window of h × w patches are (5) and (6).
Ω(MSA) = 4hwC² + 2(hw)²C
Ω(W-MSA) = 4hwC² + 2M²hwC
where Equation (5) is quadratic in the patch number hw, and Equation (6) is linear in hw when M is fixed (set to 7 by default). Therefore, compared to the global MSA, the W-MSA significantly reduces the computational complexity, since computing self-attention within local windows is much cheaper than the global self-attention mechanism.
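A quick numeric check of Equations (5) and (6) illustrates the gap; the feature map size, channel count, and window size below are illustrative values only, not the dimensions used in YOLO-VW.

```python
def msa_flops(h, w, C):            # Eq. (5): 4hwC^2 + 2(hw)^2 C
    return 4 * h * w * C**2 + 2 * (h * w) ** 2 * C

def wmsa_flops(h, w, C, M=7):      # Eq. (6): 4hwC^2 + 2 M^2 hw C
    return 4 * h * w * C**2 + 2 * M**2 * h * w * C

# Illustrative feature map: 56 x 56 patches, C = 96 channels, window size M = 7
print(msa_flops(56, 56, 96) / 1e9)   # ~2.0 GFLOPs, dominated by the (hw)^2 term
print(wmsa_flops(56, 56, 96) / 1e9)  # ~0.15 GFLOPs with windowed attention
```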
To avoid information loss between non-overlapping windows, the Swin Transformer further introduces the shifted window mechanism (SW-MSA), as shown in Layer L + 1 of Figure 6a. However, during information transmission between different windows, the number of windows doubles. To address this issue, the SW-MSA incorporates the Cyclic Shift operation of the feature map, as shown in Figure 6b. This operation maintains the window count while ensuring that the final computational results remain equivalent.
In this study, the STC module consisted of three standard convolutions and a Swin Transformer Blocks module. It divided the input feature map into two parts, which were processed differently and then concatenated. The STC module structure is shown in Figure 7. The specific steps of STC module building are as follows:
  • First, the feature map obtained through the output of the previous level was input into two branches, respectively.
  • In branch one, the input feature maps were passed sequentially through a convolution operation and two successive Swin Transformer Blocks. The Conv module used a 1 × 1 filter, mainly for dimensionality reduction. The Swin Transformer Blocks used the SW-MSA module to capture connections between distant pixels, improving the model's ability to perceive foliage, disease spots, and their margins, and enhancing information flow and global dependency capture.
  • Branch two applied a Conv module to obtain a semantic representation of local features.
  • Finally, the small and large-sized features obtained from the two branches were combined to obtain a feature combination, which was then output through a 1 × 1 convolution, enhancing the network’s adaptability and nonlinear representation capabilities.
This structure improved the computational efficiency and feature representation capability of the network while guaranteeing the depth of the model so as to achieve better performance in the CVW segmentation task. Specifically, the STC module helped the network effectively utilize feature information at different scales to enhance the model’s perception of the input image. Due to the use of the SW-MSA operation, the computational complexity of the STC module is low, which makes the model lightweight.

2.6.4. SE Attention

The convolution operation improves the receptive field by fusing more spatial features, but it ignores the fusion of inter-channel information. In this study, the YOLOv10n network was enhanced by integrating the SE module [30] to strengthen the relationships between inter-channel features. This approach prioritizes informative channels while downplaying less important ones. By combining the extracted features through weighted channel assignments, feature screening for small CVW regions was enhanced, producing a more expressive feature map and helping to avoid missing small spots caused by pixel imbalance when leaves and spots are extracted simultaneously. The SE attention module structure is shown in Figure 8.
The SE module was realized in three steps, including squeeze, excitation, and scale. Before the squeeze, each layer of the input feature map X was transformed into feature map U through a transformation convolution operation. The conversion equation is shown in (7).
U_c = V_c ∗ X = Σ_{s=1}^{C′} V_c^s ∗ X^s
where X ∈ ℝ^{H′ × W′ × C′}, U ∈ ℝ^{H × W × C}, V = [v_1, v_2, …, v_C] denotes a learned set of filter kernels, V is the filter parameter, and V_c^s is the 2D spatial kernel acting on the s-th channel. This operation changes the number of channels from C′ to C and provides appropriate feature maps for the subsequent operations. The squeeze operation then performs global average pooling on U, compressing each 2D feature channel into a single real number with a global receptive field. The numbers of input and output elements of this step match the channel count, so the global distribution of responses is captured per channel, eventually yielding a compressed channel descriptor. The formula is shown in (8).
z_c = F_sq(u_c) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j)
where z is the statistical magnitude, generated from the average of the channel feature map u , and an output of a vector with the size of 1 × 1 × C was generated after the squeeze operation of feature map U. In the excitation operation, a gate mechanism consisting of two fully connected layers was used to predict the importance of each channel and obtain the weights of the different channels for the feature map. A nonlinear transformation has been performed after the squeeze operation. SERatio is a scaling parameter that reduces the number of channels and thus reduces the computational effort. The formula for the gate control unit S is shown in (9).
S = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))
where δ is the ReLU activation function, σ is the sigmoid activation function, and W_1 and W_2 represent the weight matrices of the two fully connected layers. Finally, S was multiplied with the original feature map to recalibrate the original features in the channel dimension, yielding the final output X̃ of the SE module. The formula is expressed as (10).
x̃_c = F_scale(u_c, s_c) = s_c · u_c
The SE attention enables the network to pay more attention to the foliage characteristics of healthy and verticillium wilt regions in generating feature maps, effectively suppressing the background information of cotton foliage and solving the problem of the poor recognition accuracy of the model.
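A minimal PyTorch sketch of the squeeze, excitation, and scale steps (Equations (8)–(10)) is given below; the reduction ratio of 16 is an assumed value rather than the setting used in YOLO-VW.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global average pooling (squeeze), a two-layer gate
    (excitation), and channel-wise rescaling (scale)."""
    def __init__(self, channels, se_ratio=16):
        super().__init__()
        hidden = max(channels // se_ratio, 1)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),   # W1, delta
            nn.Linear(hidden, channels), nn.Sigmoid())            # W2, sigma

    def forward(self, u):
        b, c, _, _ = u.shape
        z = u.mean(dim=(2, 3))             # squeeze: one descriptor per channel
        s = self.fc(z).view(b, c, 1, 1)    # excitation: per-channel weights
        return u * s                       # scale: recalibrate the channels

# x = torch.randn(2, 64, 80, 80); print(SEBlock(64)(x).shape)  # spatial size unchanged
```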

2.6.5. SGD Optimizer

Backpropagation is the key process when training a convolutional neural network. The loss function estimates the deviation between the predicted and actual values of the model, given its weights and offsets, and should be minimized during training. An optimization algorithm adjusts the model parameters through backpropagation so that the loss function gradually converges to the best fit. SGD is a gradient-based optimization algorithm with strong adaptability that can effectively prevent the model from settling in poor local minima. The idea is to update the weights in the direction opposite to the gradient, continuously adjusting the model weights to minimize the loss function. Additionally, SGD incurs lower memory and computational overhead than Adam and AdamW, as it does not require maintaining an additional squared-gradient history. For scenarios with limited memory and computational resources, SGD is therefore a more economical choice. The update rule is shown in (11).
θ = θ − η ∇_θ J(θ; x_i, y_i)
where θ is a parameter of the model, η is the learning rate, ∇_θ J(θ; x_i, y_i) is the gradient of the loss function with respect to θ, x_i represents the input features of a sample randomly selected from the training dataset, and y_i is the true label or target value corresponding to x_i.
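In PyTorch, the optimizer configuration corresponding to this update rule and the hyperparameters listed in Section 2.8 can be sketched as follows; the model object here is only a stand-in for the actual YOLO-VW network, and the dummy loss is purely illustrative.

```python
import torch

model = torch.nn.Conv2d(3, 16, 3)          # placeholder for the YOLO-VW network
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01,        # initial learning rate (Section 2.8)
                            momentum=0.937,
                            weight_decay=0.0005)

# One illustrative update step: theta <- theta - eta * grad J(theta; x_i, y_i)
x = torch.randn(32, 3, 64, 64)
loss = model(x).pow(2).mean()               # dummy loss standing in for the detection loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```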

2.7. Model Deployment

The CVW hazard level assessment application developed in this study is an end-to-end system designed to provide an efficient and accurate tool for evaluating CVW. The system consists of three core components: terminal device, cloud server, and the YOLO-VW model.
The terminal device refers to the user’s device, primarily an Android smartphone running the application in this study. Its main functions include acting as an interface between the user and the cloud server, handling image capture, uploading, and displaying the assessment results. The cloud server hosts a lightweight web application based on the Python Flask framework, which processes requests from the terminal device and provides the necessary computational resources. The YOLO-VW model is the core model responsible for evaluating the CVW severity, deployed on the cloud server, and performs image inference and processing. The system workflow is as follows:
  • The CVW segmentation algorithm is used to train the dataset, and the trained YOLO-VW model is deployed on the cloud server.
  • The user triggers the smartphone camera via the Android application to capture an image of the cotton leaf or selects an existing image from the gallery. The test image is then uploaded to the cloud server.
  • The cloud server runs the Flask application, processes the request from the Android device, and invokes the YOLO-VW model.
  • The YOLO-VW model performs inference on the received image, first segmenting the cotton leaf and disease areas, then assessing the severity of CVW based on the ratio of diseased area to leaf area. The model outputs the assessment result, including the marked disease areas, confidence scores, predicted bounding boxes, and hazard level.
  • Once the cloud server completes the processing, it returns the results to the terminal device, where the user can view the assessment in real time. The results are presented visually, with annotated images and relevant assessment information.
Figure 9 presents a data flow diagram of the system, clearly illustrating the stages of information processing, from image input to result output, and the data transformation at each stage.
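A minimal sketch of the cloud-side service is shown below, assuming a Flask endpoint wrapping an Ultralytics-format model file. The route name, weight path, and returned fields are illustrative and not the authors' actual implementation; the downstream hazard grading would reuse the K = A_v / A_l logic described in Section 2.3.

```python
# Hypothetical cloud-side Flask service; "yolo_vw.pt" and "/assess" are illustrative names.
from flask import Flask, request, jsonify
from ultralytics import YOLO   # assumes the trained model is exported in Ultralytics format

app = Flask(__name__)
model = YOLO("yolo_vw.pt")     # hypothetical path to the trained YOLO-VW weights

@app.route("/assess", methods=["POST"])
def assess():
    file = request.files["image"]           # image uploaded by the Android client
    file.save("/tmp/upload.jpg")
    result = model("/tmp/upload.jpg")[0]    # instance segmentation inference
    # Downstream code would compute K = A_v / A_l from the leaf/lesion masks
    # and map it to a hazard level (B0-B4) before replying to the client.
    return jsonify({"num_instances": len(result.boxes),
                    "classes": result.boxes.cls.tolist(),
                    "confidences": result.boxes.conf.tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```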

2.8. Parameter Setup and Evaluation Indicators

The initial learning rate was set to 0.01, and the momentum and weight decay parameters were set to 0.937 and 0.0005, respectively. The input image size was 640 × 640 pixels. The epochs were set to 500, and the batch size was set to 32. In addition, 16 threads were utilized to read data to minimize the IO time. Table 4 shows the evaluation indices in this study used to compare the detection effects.
Here, TP is the number of pixels correctly classified as positive, FP is the number of negative-class pixels incorrectly classified as positive, and FN is the number of positive-class pixels misclassified as negative. AP is the area under the precision–recall (P–R) curve, and mAP is the AP averaged over categories. mAP@0.5 is the mAP at an IoU threshold of 0.5. C is the number of categories; this study mainly identifies foliage and disease spots, so C was set to two. C_i and C_o are the numbers of input and output channels, K_w and K_h are the convolution kernel width and height, and W and H are the feature map width and height.
In the model design and training phase, our primary focus was on balancing model weight and accuracy. To achieve this, we employed a comprehensive set of metrics, including the mAP, model size, and F1 score, to assess the strengths and weaknesses of different strategies. These metrics allowed us to make informed and well-rounded decisions during the design stage. However, in the practical application phase, we also considered the needs and expectations of the end-users. Users are generally more concerned with the model’s ability to complete tasks accurately and reliably. The ACC (accuracy), as a simple and intuitive metric, clearly demonstrates the model’s real-world performance in assessing the severity of cotton verticillium wilt, and thus better meets user needs. The calculation formula for the ACC is as in (12).
ACC = N′ / N
where N′ and N represent the number of correctly classified foliage images and the total number of images used for assessment, respectively; two hundred leaves were selected for the effect assessment of the different models.

3. Results and Discussion

3.1. Ablation Experiment

In order to verify the effectiveness of several improvement modules proposed in this paper, we compared and analyzed each module separately in terms of the precision, recall, F1, mAP, number of parameters, floating point operations per second (FLOPs), and weight through ablation experiments in this section.

3.1.1. Impact of Convolutional Mode

To evaluate the performance of different convolution methods, we investigated the impact of using traditional convolution (Conv) versus GhostConv on the performance of YOLOv10n. Furthermore, to explore the impact of different replacement positions of convolutions on the performance, we experimented with three different replacement strategies: A, B and C. In the GhostConv-A strategy, all the traditional convolutions in the model were replaced with GhostConv. In the GhostConv-B strategy, all the convolutions were replaced with GhostConv except for the first convolution layer. In the GhostConv-C strategy, all the convolutions were replaced with GhostConv except for the first and second convolution layers. The experimental results are presented in Table 5.
The results indicate that the model incorporating GhostConv significantly outperforms the YOLOv10n model that uses traditional convolutions. GhostConv operates by dividing the input features into multiple subsets, applying convolution to each subset individually, and then summing the feature maps to generate the output. Since each subset is processed independently, the model can be easily parallelized, significantly reducing the number of parameters and computational complexity. When all the traditional convolutions, except for the first layer, were replaced with GhostConv, the model’s parameters and FLOPs were reduced by 0.63 M and 1.1 G, respectively. The application of GhostConv made the model more lightweight without negatively impacting its performance, with the F1 and mAP(M) scores increasing by 1.85% and 1.2%, respectively.
However, introducing too many or too few GhostConv layers does not necessarily lead to better performance. Both Strategy A and Strategy C somewhat hindered the network performance. Although the model size under Strategy A was 0.01 M smaller than that under Strategy B, indicating a more lightweight model, the other performance metrics of Strategy A were inferior to those of Strategy B. This may be because the first convolutional layer typically has a large receptive field and a high computational load due to its direct interaction with high-resolution raw input images. The sparse feature representation of GhostConv may not fully capture these rich details, resulting in insufficient feature representation. Therefore, maintaining the computational capacity of the first layer is crucial for the subsequent feature extraction and overall model performance. This partly explains why many researchers have retained the traditional convolution layer in the first layer when introducing GhostConv into YOLO [31,32,33].
Moreover, the superior performance of Strategy B over Strategy C could be attributed to the fact that, at higher layers, as the resolution of the feature maps decreases and the information becomes more abstract, GhostConv more effectively reduces the computational load without significantly impacting the performance. Hence, in this scenario, retaining the first convolutional layer while replacing other layers with GhostConv achieves an optimal balance between model efficiency and performance.

3.1.2. Comparative Experiments on Attentional Mechanisms

To verify the impact of different attention mechanisms on the model improvement, we compared the baseline model YOLOv10n with models that incorporate the STC module (which includes shifted window multi-head self-attention) and models that further include hybrid attention mechanisms such as CBAM, CoorAtt, GAM, and SE after the STC module. The results are shown in Table 6.
From the table, it can be observed that with the improvements brought about by the STC module, the F1 score of the baseline model improved by 1.12%. This performance enhancement was mainly attributed to the STC structure’s effective utilization of multi-scale feature information, which enhanced the model’s perceptual ability toward the input image. Additionally, the self-attention mechanism within the STC module strengthened the relationships between cotton leaves, lesions, and the background, achieving better global associations. The parameters and FLOPs were reduced by 1.26 M and 3.8 G, respectively. Building on the STC module, we further introduced the SE, CBAM, CoorAtt, and GAM attention mechanisms to study the impact of hybrid attention mechanisms on the model performance. By incorporating the SE attention mechanism, the F1 and mAP(M) scores increased by 0.85% and 0.8%, respectively, and the model size decreased by 0.44 MB. The CBAM and GAM attention mechanisms showed a decrease in the model segmentation ability, while the CoorAtt attention mechanism provided a slight improvement in the F1 score but was not as effective as the SE mechanism. The SE attention mechanism outperformed the other attention mechanisms. Combining the global self-attention information modeling method of the SW-MSA in the STC with the efficient reweighting transformation method of SENet retained the global feature information brought about by the self-attention mechanism and incorporated the low computational cost characteristics of the SE module’s global context module. This combination improved the model’s instance segmentation capability.

3.1.3. Impact of Different Optimizers on Model Effectiveness

To select the most suitable optimizer, we conducted comparative experiments on the improved model structure, as detailed in Table 7. The AdamW optimizer exhibited the poorest performance, with an F1 score of 86.95. The SGD optimizer achieved the highest F1 and mAP(M) scores, standing at 88.89% and 89.2%, respectively, outperforming the other optimizers. SGD updates the model weights incrementally using only a small batch of samples, which enhances the robustness of the model during the learning process. Its updates involve a fixed learning rate and a simple momentum-based process, enabling the model to find the global optimum more effectively. In contrast, Adam and AdamW use adaptive learning rates and momentum, which can cause the model to become stuck at local optima.

3.1.4. Ablation Experiment Analysis of Improved Module

This section provides a summary of the impact of different improvement modules on the baseline YOLOv10n model. Ablation experiments were conducted on different improved modules with default hyperparameters and configurations on the homemade dataset in this study. The GhostConv module, STC module, SE module, and SGD optimizer were sequentially integrated into the YOLOv10n framework; the mark “✓“ in the table indicates the use of this module. The evaluation was based on seven metrics: precision, recall, F1, mAP(M), weight, parameters, and FLOPs. The results of the experiments are shown in Table 8.
As shown in the table, the various metrics change with the addition of different modules. In Exp 2, replacing the Conv module with the GhostConv module improved the model’s flexibility and expressiveness, resulting in a 1.85% and 1.2% improvement in the F1 and mAP(M) score, respectively. The model size, parameters, and FLOPs were also reduced by 1.16 MB, 0.63 M, and 1.1 G, respectively. In Exp 3, the redundant C2f module was replaced with the STC module constructed from Swin Transformer Blocks to reduce the network size. The results indicated that, with an acceptable loss of accuracy, the numbers of parameters and FLOPs were reduced by 0.66 M and 3 G, respectively. In Exp 4, the SE attention mechanism was introduced, leading to improvements in the F1 and mAP scores of 1.98% and 1.1%, respectively, compensating for the accuracy loss observed in Exp 3. Finally, the application of the SGD optimizer further enhanced the F1 and mAP(M) scores by 1.11% and 0.9%, respectively, thereby improving the evaluation accuracy of CVW.
The SE module assigns different weights for different channels, increasing the weights of features such as disease spot color and foliage edges. This enhances the model’s ability to differentiate small disease spots and healthy foliage, providing more comprehensive disease feature information without increasing the model complexity. This finding aligns with the work of Chen et al. [34] and is significant for improving the model performance in lightweight networks. In conclusion, each module and method proposed in this study effectively enhanced the performance of the YOLO-VW model, ultimately improving the CVW hazard level assessment.

3.2. Comparative Experimental Analysis of Different Models

In order to assess the effectiveness of the improved model, we compared its performance with other classic instance segmentation models such as YOLOv5s, YOLOv7-tiny, YOLOv8n, YOLOv9t, YOLOv10n, Mask R-CNN, and SOLOv2 in the same experimental setting with the same parameters using the image dataset constructed in this study. Indices including the precision, recall, F1, mAP, number of parameters, floating point operations per second (FLOPs), and weight were used to evaluate the model effects. Furthermore, we use the frame per second (FPS) to approximate the real-time capability of the segmentation model. The indices’ results are shown in Table 9.
Analysis of the data in the table reveals that YOLO-VW demonstrates significant advantages in terms of accuracy, lightweight design, and speed. Table 9 shows that YOLO-VW has notable benefits in the following areas:
  • Accuracy: YOLO-VW achieved a segmentation accuracy mAP(M) of 89.2%, which represents improvements of 3.9%, 2.9%, 4.2%, 2.9%, 12.8%, 1.9%, and 2.4% compared to YOLOv5s, YOLOv7-tiny, YOLOv8n, YOLOv9t, SOLOv2, Mask R-CNN, and the baseline model YOLOv10n, respectively.
  • Lightweight Design: In terms of the weight, parameters, and FLOPs, YOLO-VW was compressed to 25.6%, 21.5%, and 30.4% of YOLOv5s; 29.5%, 24.8%, and 33.9% of YOLOv7-tiny; 56.6%, 44.2%, and 65% of YOLOv8n; 2%, 3.4%, and 4% of SOLOv2; and 2.1%, 3.6%, and 3.8% of Mask R-CNN, respectively. Furthermore, compared to the YOLOv10n baseline model, YOLO-VW achieved compression ratios of 64.4%, 56%, and 66.1% of the original size across these three metrics.
  • Detection Speed: YOLO-VW exhibited excellent performance, achieving 157.98 FPS, an improvement of 21.37 FPS over the original model. This reduction in the computation time is attributable to the improvements in the lightweight modules.
As a result, the YOLO-VW model proved to be the most effective in terms of the CVW hazard level assessment and was selected for developing a real-time CVW hazard level detection system for smartphones.

3.3. Performance Evaluation and Demonstration

The CVW hazard level assessment on-site detection and app interface is depicted in Figure 10. The interface includes the result visualization unit, picture selection, and detection buttons. Upon entering the CVW evaluation system interface, the user can click “PICTURE” to select a method for obtaining the leaf image to be evaluated, such as taking a photo with the phone’s camera or selecting an image from the gallery. The user then clicks the “DETECT” button, and the system uploads the image to the cloud server. The cloud server automatically triggers the model for image detection and, upon completion, returns the results to the client. Subsequently, the result visualization unit will display the image segmentation and the classification results of the verticillium wilt hazard level.
Figure 11a–c display the visual results of the segmentation models YOLOv10n and YOLO-VW. These images demonstrate how the models perform in different environments. In the images, the black regions represent the background, the green regions represent healthy leaves, and the red regions represent lesions. The red boxes indicate areas where the leaf and lesion regions are over-segmented, while the yellow boxes highlight areas where the leaf or lesion regions are under-segmented. As shown in the figures, the segmentation results of YOLO-VW closely match the original images, while YOLOv10n exhibits misclassifications. In some areas with lighter lesion colors, YOLOv10n struggles to distinguish the blurred lesion boundaries, resulting in varying degrees of under-segmentation. The application of the SE module focuses more on the inter-channel features, enhancing the recognition of fine-grained lesion areas with indistinct features and effectively mitigating the under-segmentation of lesions. Due to the lack of global modeling capability, some background areas are misclassified as leaves, and other leaf regions are subject to interference-induced mis-segmentations. The application of the STC module enhances the ability to extract both global and local information from the model, thereby reducing the proportion of erroneous segmentation and effectively improving the segmentation accuracy.
To verify the feasibility of the system in practical applications, we conducted a comparative assessment of the CVW foliage image hazard levels in six different environmental conditions: cloudy, sunny, rainy, dusk, nighttime without flash, and nighttime with flash. The accuracy assessment of the model was based on mask images generated from manually annotated images, guided by plant protection experts. The standard used was within ±3% of the percentage of lesions in the leaf area of the actual mask image. Two hundred test images were used, and the evaluation results are shown in Table 10.
In cloudy, sunny, and rainy conditions, the detection accuracy of YOLO-VW exceeded 90%. However, the accuracy of CVW detection was lower at dusk and nighttime. The possible reasons include the following. (1) Insufficient lighting. At dusk and nighttime, the faint lighting conditions make the leaf color, shape, and texture feature less prominent, affecting the accuracy of the leaf and disease spot segmentation. (2) Images taken with a flash may have strong reflections and highlight areas. These overexposed regions lose detail, impacting the model’s detection accuracy. (3) The flash creates noticeable shadows and uneven lighting in images, increasing the complexity and altering the color and texture features. (4) Insufficient training data. There are fewer training images of cotton foliage taken with a flash, resulting in the model’s weaker performance on such images. Overall, the mobile detection ACC of the YOLOv10n model was 70.5%, while that of the YOLO-VW model was 85.5%. The improved model demonstrated significantly higher accuracy and robustness compared to the YOLOv10n model, better adapting to the different environmental conditions.
To further demonstrate the system’s stability and reliability under different conditions, we conducted variance and error analysis on the actual values and the measured results of the test images. These analyses helped identify performance differences across various environments and pinpoint the key factors affecting the system accuracy. The analysis results are presented in Table 11.
The variance analysis revealed no significant differences between the actual values and the YOLO-VW-based detection results across the six environmental conditions, indicating that the system’s inherent error does not significantly impact the measurements. This demonstrates the system’s ability to consistently assess the severity of CVW under different conditions with high stability. Through the analysis of the relative and absolute errors, we found that the mean absolute error under the six different conditions ranged from 0.003 to 0.025, and the mean relative error varied between 1.6% and 13.9%. This indicates that the system demonstrates overall high detection accuracy, although its precision is influenced by environmental variations. According to the analysis, the degree of environmental impact on the performance follows this order: sunny < rainy < night without flash < night with flash < dusk < cloudy. To ensure high accuracy and stability in real-world applications, we recommend users capture images in sunny conditions and avoid photographing at dusk or during cloudy weather. Additionally, users should minimize obstructions during image capture to achieve optimal detection results.
Furthermore, to verify whether the CVW hazard assessment system is capable of real-time detection, we conducted a processing time test on 200 images, recording the total time required for the model inference and network latency for each image. The results indicated an average detection time of 2.42 s per image, confirming the system’s real-time detection capability.
In summary, these findings demonstrate that the CVW hazard assessment application is reliable and suitable for real-time field surveys of CVW severity.

4. Conclusions

Due to the unclear boundary contours of CVW spots and the complex background of leaves, deep learning models are prone to problems such as mis-segmentation, over-segmentation, segmentation boundary errors, and too many parameters, which leads to difficulty in ensuring a lightweight and high accuracy simultaneously. With the aim of solving these problems, in this study, a CVW hazard level assessment system based on improved YOLOv10n was proposed:
  • An improved YOLO-VW model, incorporating improved modules such as STC, GhostConv, SE, and SGD, demonstrated improved detection accuracy while reducing the model parameters and computation.
  • The results showed that the F1 and mAP(M) of the YOLO-VW model were 88.89% and 89.2%, increases of 3.91% and 2.4%, respectively, compared with the YOLOv10n model. The numbers of parameters and FLOPs were also reduced to 1.59 M and 7.8 G, respectively. Compared with the YOLOv5s, YOLOv7-tiny, YOLOv8n, YOLOv9t, SOLOv2, and Mask R-CNN models, the YOLO-VW model achieved the highest accuracy in CVW segmentation with the smallest model size and the fewest parameters.
  • The lightweight CVW hazard level assessment system was deployed in a client-server platform, with an Android smartphone app developed for testing the YOLO-VW and YOLOv10n models; the YOLO-VW model showed a processing time of 2.42 s per image and an accuracy of 85.5%, which was 15% higher than that of the YOLOv10n model.
The above results proved the effectiveness of the developed system in assessing CVW hazard levels; this system reduced the computational costs and enabled efficient and accurate real-time assessment of the CVW hazard levels in practical applications. Above all, this study provides a more effective solution for cotton protection and guarantees the quality and output of cotton production.

Author Contributions

Conceptualization, J.L. and X.H.; methodology, J.L., X.H., X.L. (Xiaomin Li) and L.Z.; software, J.L. and X.H.; validation, J.L., X.H., Y.L. and L.Z.; formal analysis, J.L. and X.H. and H.Z.; investigation, J.L., X.H., Y.L. and H.W.; resources, J.L.; data curation, J.L. and X.H.; writing—original draft preparation, J.L. and X.H.; writing—review and editing, J.L., X.L. (Xiaomin Li), Y.Z. and H.X.; visualization, J.L. and X.H.; supervision, J.L., X.L. (Xiwen Luo), Y.Z. and H.X.; project administration, J.L.; funding acquisition, J.L. and X.L. (Xiwen Luo). All authors have read and agreed to the published version of the manuscript.

Funding

This research was financially supported by the Key Research and Development Project of China (Grant No. 2022YFD2002400), the National Natural Science Foundation of China (Grant No. 31901401), the Key Scientific and Technological Projects in Key Areas of Corps (Grant No. 2023AB014), the Guangdong Provincial Science and Technology Plan Project (2017B090907031), and the Earmarked Fund for CARS (CARS-15-22).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to possible further research.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Overall flow of the CVW hazard level assessment.
Figure 2. (a) Study area map; and (b) the ground photographs of the cotton plants.
Figure 3. Examples of image samples of CVW: (a) cloudy; (b) sunny; (c) rainy; (d) dusk; and (e) night.
Figure 4. Original image and enhanced images: (a) original image; (b) image after bending; (c) image after adjusting brightness; (d) image after adding Gaussian noise; and (e) image after adding salt-and-pepper noise.
Figure 5. GhostConv module structure.
Figure 6. (a) W-MSA module and SW-MSA module; and (b) the Cyclic Shift operation of the SW-MSA.
Figure 7. (a) Two successive Swin Transformer Blocks; and (b) the STC module structure.
Figure 8. SE module structure.
Figure 9. Data flow diagram of the CVW system.
Figure 10. CVW hazard level assessment on-site detection and system interface.
Figure 11. Examples of CVW segmentation in different environments: (a) original images, from top to bottom: cloudy, sunny, rainy, dusk, nighttime without flash, and nighttime with flash; (b) segmentation results of YOLOv10n; and (c) segmentation results of YOLO-VW. Note: the black regions represent the background, the green regions represent healthy leaves, and the red regions represent lesions. The red boxes indicate areas where the leaf and lesion regions are over-segmented, while the yellow boxes highlight areas where the leaf or lesion regions are under-segmented.
Table 1. Specifications of the server for data processing.

Configuration Name | Equipped Model
Operating System | Ubuntu 20.04
Graphic Processing Unit | NVIDIA GeForce RTX 4090
Central Processing Unit | 16 vCPU Intel(R) Xeon(R) Platinum 8352V CPU
Random Access Memory | 24 GB
Development Environment | PyCharm 2022, PyTorch 2.0.0, Python 3.8.10
Table 2. CVW hazard level assessment standard.

Hazard Level | K | Damage Degree
B0 | K = 0 | No damage
B1 | 0 < K ≤ 25% | Light damage
B2 | 25% < K ≤ 50% | Moderate damage
B3 | 50% < K ≤ 75% | Serious damage
B4 | 75% < K | Extremely serious damage
Table 3. Distribution of cotton foliage dataset.

Class Name | Original Images: Training Set | Original Images: Validation Set | Original Images: Total | Augmented Images: Training Set | Augmented Images: Validation Set | Augmented Images: Total
Healthy | 294 | 73 | 367 | 1470 | 365 | 1835
Disease | 968 | 242 | 1210 | 4840 | 1210 | 6050
Total | 1262 | 315 | 1577 | 6310 | 1575 | 7885
Table 4. Formulae of the model evaluation indices for precision, recall, F1, mAP@0.5, FLOPs, and parameters.

Indicator Name | Formula
Precision | $\mathrm{Precision} = \dfrac{TP}{TP + FP}$
Recall | $\mathrm{Recall} = \dfrac{TP}{TP + FN}$
F1 | $F1 = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
mAP@0.5 | $\mathrm{mAP@0.5} = \dfrac{1}{C} \sum_{c=1}^{C} \int_{0}^{1} P(R)\,\mathrm{d}R$
FLOPs | $\mathrm{FLOPs} = (2 C_i K^2 - 1) \times H \times W \times C_o$
Parameters | $\mathrm{Parameters} = C_i \times C_o \times K_w \times K_h$
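For readers who want to reproduce the indices in Table 4, the short sketch below evaluates them from raw detection counts and convolution-layer dimensions. The helper names and the example numbers are illustrative assumptions, not the paper's code.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 exactly as defined in Table 4."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def conv_flops_params(c_in, c_out, k, h, w):
    """FLOPs and parameter count of one convolution layer (Table 4).

    FLOPs = (2 * C_i * K^2 - 1) * H * W * C_o
    Parameters = C_i * C_o * K_w * K_h (square kernel assumed here).
    """
    flops = (2 * c_in * k * k - 1) * h * w * c_out
    params = c_in * c_out * k * k
    return flops, params


# Illustrative counts and layer sizes (not taken from the paper):
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=15)
print(f"P={p:.3f}, R={r:.3f}, F1={f1:.3f}")
print(conv_flops_params(c_in=64, c_out=128, k=3, h=80, w=80))
```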
Table 5. Comparison results of different convolution modes.

Model | P (%) | R (%) | F1 (%) | mAP(M)@0.5 (%) | Weight/MB | Parameters/M | FLOPs/G
YOLOv10n | 86.4 | 83.6 | 84.98 | 86.8 | 5.73 | 2.84 | 11.8
YOLOv10n-GhostConv-A | 89.7 | 84.1 | 86.81 | 87.5 | 4.56 | 2.21 | 10.7
YOLOv10n-GhostConv-B | 88.3 | 85.4 | 86.83 | 88.0 | 4.57 | 2.21 | 10.7
YOLOv10n-GhostConv-C | 88.4 | 85.3 | 86.82 | 87.8 | 4.57 | 2.22 | 10.8
Table 6. Comparison results of different attentional mechanisms.

Model | P (%) | R (%) | F1 (%) | mAP(M)@0.5 (%) | Weight/MB | Parameters/M | FLOPs/G
YOLOv10n | 86.4 | 83.6 | 84.98 | 86.8 | 5.73 | 2.84 | 11.8
YOLOv10n-STC | 90.5 | 82.1 | 86.10 | 86.6 | 4.15 | 1.6 | 8
YOLOv10n-STC-CBAM | 86.9 | 81.6 | 84.17 | 85.9 | 3.79 | 1.81 | 8
YOLOv10n-STC-CoorAtt | 88.6 | 84.8 | 86.66 | 86.9 | 3.76 | 1.64 | 8
YOLOv10n-STC-GAM | 86.8 | 84.8 | 85.79 | 86.6 | 3.72 | 1.61 | 8
YOLOv10n-STC-SE | 87.4 | 86.5 | 86.95 | 87.4 | 3.71 | 1.6 | 8
Table 7. Comparison results of different optimizers.

Model | Optimizer | P (%) | R (%) | F1 (%) | mAP(M)@0.5 (%) | Weight/MB | Parameters/M | FLOPs/G
YOLO-VW | Adam | 86.4 | 83.6 | 84.98 | 86.8 | 5.73 | 2.84 | 11.8
YOLO-VW | AdamW | 90.5 | 82.1 | 86.10 | 86.6 | 4.15 | 1.6 | 8
YOLO-VW | SGD | 87.4 | 86.5 | 86.95 | 87.4 | 3.71 | 1.6 | 8
Table 8. Results of the ablation experiments.

Exp | GConv | STC | SE | SGD | P (%) | R (%) | F1 (%) | mAP(M)@0.5 (%) | Weight/MB | Parameters/M | FLOPs/G
1 | – | – | – | – | 86.4 | 83.6 | 84.98 | 86.8 | 5.73 | 2.84 | 11.8
2 | ✓ | – | – | – | 88.3 | 85.4 | 86.83 | 88.0 | 4.57 | 2.21 | 10.7
3 | ✓ | ✓ | – | – | 88.8 | 83.0 | 85.80 | 87.2 | 3.62 | 1.55 | 7.7
4 | ✓ | ✓ | ✓ | – | 90.4 | 85.3 | 87.78 | 88.3 | 3.7 | 1.59 | 7.8
5 | ✓ | ✓ | ✓ | ✓ | 92.1 | 85.9 | 88.89 | 89.2 | 3.69 | 1.59 | 7.8
Note: The mark "✓" in the table indicates the use of this module; GConv represents GhostConv.
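The lightweight gains claimed for YOLO-VW follow directly from Exp 1 and Exp 5 in Table 8; the quick check below (values copied from the table) reproduces the roughly 44% parameter and 33.9% FLOP reductions reported in the abstract.

```python
# Parameter and FLOP reductions of YOLO-VW (Exp 5) vs. YOLOv10n (Exp 1) in Table 8.
params_base, params_vw = 2.84, 1.59   # millions of parameters
flops_base, flops_vw = 11.8, 7.8      # GFLOPs

param_reduction = (params_base - params_vw) / params_base * 100
flop_reduction = (flops_base - flops_vw) / flops_base * 100
print(f"Parameters reduced by {param_reduction:.1f}%")  # ~44.0%
print(f"FLOPs reduced by {flop_reduction:.1f}%")        # ~33.9%
```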
Table 9. Comparison of the performance of instance segmentation models.

Model | P (%) | R (%) | F1 (%) | mAP(M)@0.5 (%) | Weight/MB | Parameters/M | FLOPs/G | FPS (f/s)
YOLOv5s | 85.2 | 83.8 | 84.49 | 85.3 | 14.40 | 7.41 | 25.7 | 146.2
YOLOv7-tiny | 87.5 | 83.0 | 85.19 | 86.3 | 12.50 | 6.40 | 23.0 | 91.74
YOLOv8n | 86.6 | 83.9 | 85.23 | 85.0 | 6.52 | 3.26 | 12.0 | 150.6
YOLOv9t | 83.3 | 87.4 | 85.30 | 86.3 | 7.12 | 3.41 | 43.9 | 80.65
YOLOv10n | 86.4 | 83.6 | 84.98 | 86.8 | 5.73 | 2.84 | 11.8 | 136.61
SOLOv2 | 61.6 | 66.9 | 64.14 | 76.4 | 184.18 | 46.23 | 196.0 | 31.2
Mask R-CNN | 68.9 | 74.3 | 71.50 | 87.3 | 173.84 | 44.40 | 208.0 | 31.3
YOLO-VW | 92.1 | 85.9 | 88.89 | 89.2 | 3.69 | 1.59 | 7.8 | 157.98
Table 10. Results of the CVW hazard level evaluation in different environments.

Model | Cloudy (N/N′/ACC) | Sunny (N/N′/ACC) | Rainy (N/N′/ACC) | Dusk (N/N′/ACC) | Night, No Flash (N/N′/ACC) | Night, Flash (N/N′/ACC)
YOLOv10n | 59/54/92% | 41/35/85% | 16/13/81% | 12/6/50% | 26/12/46% | 46/21/46%
YOLO-VW | 59/57/97% | 41/37/90% | 16/15/94% | 12/9/75% | 26/21/81% | 46/32/70%
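The ACC values in Table 10 are consistent with ACC = N′/N, assuming N is the number of test images per environment and N′ the number whose hazard level was assessed correctly; under that reading, aggregating the YOLO-VW counts also reproduces the overall 85.5% smartphone accuracy. A small bookkeeping sketch follows, with counts copied from the table.

```python
# (N, N') pairs per environment for YOLO-VW, taken from Table 10.
yolo_vw = {
    "cloudy": (59, 57),
    "sunny": (41, 37),
    "rainy": (16, 15),
    "dusk": (12, 9),
    "night_no_flash": (26, 21),
    "night_flash": (46, 32),
}

for env, (n, n_correct) in yolo_vw.items():
    print(f"{env}: ACC = {n_correct / n:.0%}")

total = sum(n for n, _ in yolo_vw.values())
correct = sum(c for _, c in yolo_vw.values())
print(f"overall: {correct / total:.1%}")  # 171/200 = 85.5%
```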
Table 11. ANOVA and error analysis of the CVW hazard assessment system.

Environment | F-Value | p-Value | F-Crit | Significance | Absolute Error | Relative Error (%)
Cloudy | 0.945 | 0.333 | 3.923 | Non-significant | 0.025 | 13.9
Sunny | 0.005 | 0.942 | 3.960 | Non-significant | 0.003 | 1.6
Rainy | 0.154 | 0.698 | 4.171 | Non-significant | 0.014 | 5.5
Dusk | 0.343 | 0.564 | 4.301 | Non-significant | 0.016 | 11.9
Night (no flash) | 0.142 | 0.708 | 4.034 | Non-significant | 0.013 | 7.2
Night (flash) | 0.280 | 0.598 | 3.947 | Non-significant | 0.020 | 7.7
Note: a p-value > 0.05 indicates no significant effect; a p-value ≤ 0.05 indicates a significant effect.
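As a hedged sketch of how the one-way ANOVA and error figures in Table 11 could be reproduced, the snippet below compares manually measured and system-estimated severity values for one environment with scipy.stats.f_oneway. The arrays, and the exact error definitions used here (mean absolute and mean relative error), are placeholders and assumptions, since the raw per-image measurements are not reported.

```python
import numpy as np
from scipy.stats import f_oneway

# Placeholder data: per-image disease-severity ratios (K) for one environment,
# measured manually and estimated by the system. Replace with the real measurements.
manual_k = np.array([0.12, 0.30, 0.45, 0.22, 0.08, 0.51])
system_k = np.array([0.14, 0.28, 0.47, 0.20, 0.10, 0.49])

# One-way ANOVA: a p-value above 0.05 means the two groups do not differ significantly.
f_value, p_value = f_oneway(manual_k, system_k)
print(f"F = {f_value:.3f}, p = {p_value:.3f}")

# One plausible error analysis, mirroring the Table 11 columns:
abs_err = np.mean(np.abs(system_k - manual_k))
rel_err = np.mean(np.abs(system_k - manual_k) / np.maximum(manual_k, 1e-9)) * 100
print(f"absolute error = {abs_err:.3f}, relative error = {rel_err:.1f}%")
```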
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
