Open Access Article

U-Net Semantic Segmentation-Based Calorific Value Estimation of Straw Multifuels for Combined Heat and Power Generation Processes

Lianming Li ^1,2, Zhiwei Wang ^2,3 and Defeng He ^2,3,*

^1 Jiaxing Newjies Thermal Power Co., Ltd., Jiaxing 314016, China

^2 College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China

^3 Zhejiang Provincial Key Laboratory of Clean Energy Conversion and Utilization, Hangzhou 310023, China

^* Author to whom correspondence should be addressed.

Energies 2024, 17(20), 5143; https://doi.org/10.3390/en17205143

Submission received: 31 August 2024 / Revised: 8 October 2024 / Accepted: 14 October 2024 / Published: 16 October 2024

(This article belongs to the Special Issue Application of New Technologies in Bioenergy and Biofuel Conversion)

Abstract

This paper proposes a system for real-time estimation of the calorific value of mixed straw fuels based on an improved U-Net semantic segmentation model. The system aims to address the uncertainty in heat and power generation per unit time in combined heat and power generation (CHPG) systems caused by fluctuations in the calorific value of straw fuels. It integrates an industrial camera, a moisture detector, and mass sensors, and captures images of the multi-fuel straw. The improved U-Net network then performs semantic segmentation of these images so that the proportion of each type of straw can be calculated. The improved network introduces a self-attention mechanism in the skip connection of the final encoder layer, replaces traditional convolutions with depthwise separable convolutions, and replaces the traditional convolutional bottleneck layers with a Transformer encoder. These changes are intended to give the model high segmentation accuracy and strong generalization capability while maintaining good real-time performance. The semantic segmentation results of the straw images are used to calculate the proportions of different types of straw and, combined with moisture content and mass data, the calorific value of the mixed fuel is estimated in real time based on the elemental composition of each straw type. Validation using images captured from an actual thermal power plant shows that, under the same conditions, the proposed model loses only 0.2% in accuracy compared to the traditional U-Net segmentation network, while the number of parameters is reduced by 74% and inference speed is improved by 23%.

Keywords: combined heat and power generation (CHPG); calorific value prediction; mixed straw fuels; U-Net; semantic segmentation

1. Introduction

In the context of global climate change and environmental protection, fossil fuel power generation faces significant challenges related to resource depletion and environmental degradation. Particularly under the push for carbon neutrality and peak carbon goals, there is a growing global demand for the development and utilization of renewable energy. Biomass fuels, as an important renewable resource (including wood, crop residues, and energy crops), have gained widespread attention due to their renewability and carbon neutrality characteristics [1,2]. Among these, straw is considered a key renewable energy source due to its abundant supply and low cost. As an agricultural powerhouse, China possesses rich crop straw resources. The rational development and utilization of these straw resources not only help reduce fossil fuel consumption and alleviate energy shortages, but also effectively lower environmental pollution and greenhouse gas emissions, thereby contributing to the achievement of peak carbon and carbon neutrality goals. Combined heat and power generation (CHPG) technology serves as an efficient utilization method for straw resources, significantly improving economic benefits while bringing positive ecological and social effects [3,4].

In the combined heat and power generation (CHPG) process, the heat used for power generation is obtained by burning different types of straw fuels, which are fed into the boiler, such as the circulating fluidized bed boiler (CFBB) shown in Figure 1. The straw fuel is mixed with desulfurizing agents and introduced from the bottom of the boiler. At this point, there are many fluidized combustion materials in the boiler, which facilitates rapid fuel combustion. Subsequently, primary and secondary air are introduced into the boiler from the bottom and the sidewalls, respectively. An upward airflow is formed in the boiler, causing the materials to move toward the upper part of the boiler. Most of the fuel continues to burn in the dense phase region, while a smaller portion is carried into the dilute phase region along with the flue gas. In the dilute phase region, the fuel burns in a suspended state, releasing heat to the heating surfaces and water walls inside the boiler, raising their temperature. Under the influence of gravity and external forces, the movement of fuel in the dilute phase changes; as the velocity slows down, the direction of material movement gradually shifts away from the main airflow, forming a particle flow that adheres to the wall and is carried out of the airflow into a separator. These materials are then collected and sent back to the boiler for repeated combustion, ultimately achieving complete combustion through multiple cycles. Some particles enter the flue gas duct, releasing heat to the surfaces at the rear of the boiler. After cooling, the flue gas is treated to meet environmental protection requirements, such as dust removal and desulfurization, before being discharged into the atmosphere, completing the entire combustion process.

However, the heating value of straw fuel is influenced by various factors such as type, moisture content, and combustion state, leading to fluctuations in the heating value and causing uncertainty in the heat and power generated per unit time [5]. This not only affects the stable operation of the CHPG system, but also directly impacts power generation efficiency and economic benefits. Additionally, since straw is often stacked in multiple layers in practical applications, it is difficult to estimate its heating value directly through traditional elemental composition analysis methods. Moreover, the moisture content of straw varies significantly, and the evaporation of moisture consumes some of the heat, further complicating the estimation of fuel heating value. In advanced control strategies for the CHPG process, real-time measurement of the fuel’s moisture content and heating value is crucial for ensuring smooth system operations under long delays and large time constants [6,7]. Therefore, proposing effective solutions for real-time estimation of the heating value in multi-fuel CHPG processes based on straw can not only improve the utilization efficiency of straw resources and promote the development of renewable energy, but also significantly reduce carbon emissions, contributing to the fight against global climate change. Additionally, accurate heating value estimation can enhance the economic benefits of the CHPG system, ensuring the stability and sustainability of energy supply. Furthermore, with the advancement of carbon neutrality goals, the adoption of renewable energy has become a pressing priority, making real-time heating value estimation an important support for sustainable development goals.

Traditional methods for estimating the heating value of multi-fuel straw mainly rely on chemical analysis [8]. These methods typically require manual classification of the straw, collection of water vapor produced during the drying process to determine moisture content, and the use of chemical reagents to analyze the composition of the dried straw. Subsequently, a heating value model is established for the current batch of straw to calculate the heating values of various fuels. Although this offline estimation method is relatively accurate, it cannot respond promptly to fluctuations in heating value within the CHPG system and requires significant human effort, is time-consuming, and incurs high costs. Estiati et al. [9] utilized artificial neural networks to obtain approximate analytical data for estimating heating values based on different types of biomass residues.

With the development of optical sensing technology, methods for estimating heating values based on digital image processing have received increasing attention. These methods perform semantic segmentation on fuel images to obtain the proportions of different types of biomass fuels and combine the result with parameters provided by sensors. Because the estimated heating value depends directly on these proportions, they place high demands on the accuracy of image semantic segmentation. Traditional image segmentation methods mainly rely on gray value thresholds and similarity, such as threshold segmentation [10,11,12], edge detection [13,14], and region-based methods [15,16]. However, due to the complexity of object structures and textures, as well as the influence of environmental lighting and image noise, these methods require parameter adjustments for different images, limiting their applicability. The diversity of straw, variations in lighting, similarities in color and texture, and the complex characteristics of stacking and fragmentation pose challenges for traditional image segmentation methods in accurately distinguishing between different types of straw.

In recent years, deep learning has made significant advancements in image processing. Convolutional Neural Networks (CNNs), in particular, have been widely applied in semantic segmentation of images [17,18,19]. Semantic segmentation involves segmenting different objects at the pixel level from an image. CNN-based semantic segmentation architectures use convolution operations to capture image features. For instance, Evan et al. [20] proposed the Fully Convolutional Network (FCN), which utilizes CNNs as one of its modules to generate hierarchical features, replacing fully connected layers with convolutional layers to restore the feature maps to the same size as the input. Thus, FCNs can be applied to inputs of arbitrary sizes, generating predictions for each pixel and retaining the spatial information of the original input for pixel-wise classification. Yao et al. [21] introduced the Adaptive Deep Convolutional Neural Network (ADCNN) for object detection and semantic segmentation tasks in specific scenarios. This method first employs transfer learning to select effective convolution kernels from a general CNN classifier, and then learns local and global contextual information in monitored scenes through a specific architecture, thereby improving the accuracy of object location predictions. Zheng et al. [22] converted Conditional Random Fields (CRFs) [23] into Recurrent Neural Networks (RNNs) [24] and connected them to the last convolution layer of FCN. This method addresses the issues of large receptive fields and weak edge constraints in FCN, refining the results to achieve more specific and smooth segmentation. Badrinarayanan et al. [25] proposed the SegNet model, which features an encoder–decoder structure similar to FCN. Unlike FCN, SegNet reuses the max-pooling indices stored by the encoder for upsampling in the decoder, thereby reducing the number of training parameters and memory usage. Ronneberger et al. [26] built upon FCN to propose the U-Net model, extending the idea of fusing high-level and low-level features in FCN and adding skip connections to concatenate feature maps from the encoder with those from the decoder, preserving more detailed information and enhancing classification accuracy.

Transformers [27], due to their excellent global context modeling capabilities, have seen widespread development and application in the field of image processing in recent years, and real-time semantic segmentation methods based on Transformers have also begun to attract attention. Zheng et al. [28] proposed SETR, which treats semantic segmentation as a sequence-to-sequence prediction task, using pure Transformers as the encoder to encode images and employing a simple decoder. This approach avoids the convolution operations and resolution reduction found in traditional fully convolutional networks, significantly improving segmentation performance by leveraging the global context modeling capabilities of Transformers. Wang et al. [29] introduced the Pyramid Vision Transformer (PVT), which incorporates Feature Pyramid Networks (FPN) [30] into Transformers to capture features at different scales, enhancing the model’s ability to process multi-scale information.

Based on the above discussion, this paper proposes a real-time system for estimating the calorific value of mixed straw fuels based on an improved U-Net semantic segmentation model. The study uses Python (version 3.8.0) as the primary programming language and employs the PyTorch framework to construct and train the improved U-Net model. Compared to existing semantic segmentation methods, the proposed improved U-Net network makes the following key contributions:

(1): It introduces a self-attention mechanism in the skip connections to enhance the extraction of key information from deep features;
(2): It replaces traditional convolutions with depthwise separable convolutions to reduce the model’s computational complexity and improve inference speed;
(3): It substitutes the bottleneck layer with a Transformer encoder to leverage the global modeling capabilities of Transformers, allowing the model to better understand contextual information within the image.

The remainder of this paper is organized as follows. Section 1 has introduced the background and related work, focusing on real-time heating value estimation methods and existing image segmentation techniques. Section 2 describes the design and implementation of the real-time estimation system for mixed straw fuel heating values based on the improved U-Net semantic segmentation model, as well as the structure of the improved U-Net model. Section 3 presents the experimental setup and the analysis of the results, including comparisons with existing methods and ablation studies that validate the impact of each component. Finally, Section 4 summarizes the research findings and discusses future research directions.

2. Straw Calorific Value Estimation Method

2.1. Real-Time Heating Value Estimation System

In the combined heat and power generation (CHPG) process, the heterogeneity of the multi-fuel straw fed into the Circulating Fluidized Bed Boiler (CFBB) presents significant challenges for real-time estimation of the calorific value of the current straw batch. To address this issue, a system configuration has been designed, as shown in Figure 2, for real-time estimation of the calorific value of multiple fuels. This system comprises several key components, including an industrial camera, image analysis host, server, industrial network bus, moisture detector, and mass sensor. The industrial camera captures high-resolution images of the multi-fuel straw, which provide the input data for the subsequent analysis.

During the image processing phase, an improved U-Net semantic segmentation algorithm is applied to segment the captured straw images. The algorithm identifies the different types of straw and calculates their proportions, thereby providing detailed information about the raw material composition. In this way, the distribution of the various straws in the current batch is obtained, laying the groundwork for calorific value estimation. Simultaneously, the system integrates a moisture detector and a mass sensor, which measure the moisture content and total mass of the mixed fuel in real time. These parameters are crucial for calorific value calculation, as the moisture content of the straw directly affects its combustion characteristics and calorific value. By combining these real-time measurements with the results of image analysis, the system can more comprehensively determine the elemental composition of the straw.

Finally, the server estimates the calorific value based on the measured elemental composition and its relationship with the calorific values of each dry element. This process not only enhances the accuracy and timeliness of calorific value estimation, but also provides robust support for the optimized operation of the CHPG system. Through real-time monitoring and analysis, the system can dynamically adjust fuel usage strategies, achieving a more efficient and environmentally friendly power generation process.

2.2. Improved U-Net Semantic Segmentation Network

We propose an improved U-Net semantic segmentation model, as illustrated in Figure 3. This model consists of encoder layers, a bottleneck layer, and decoder layers. To improve the real-time performance of the model without compromising the accuracy of semantic segmentation, we set the model’s input size to 3 × 256 × 256. This size retains sufficient spatial information to ensure the model can extract key features. At the same time, the smaller input size significantly reduces computational load and memory usage, allowing the model to run efficiently while maintaining strong performance (detailed parameter information is listed in Table 1). To reduce the number of parameters without significantly sacrificing performance, we replace traditional convolution with depthwise separable convolution in both the encoder and decoder. This method, which is derived from MobileNet [31], decomposes traditional convolutions into depthwise and pointwise convolutions, significantly lowering computational complexity while maintaining effective feature extraction capabilities.

Additionally, to enhance the model’s ability to capture critical features, we introduce a self-attention mechanism in the skip connections of the final layer of the encoder. This self-attention mechanism enables the model to focus on the relationships between different regions in the image when processing feature maps, thereby more effectively extracting important information, especially in the segmentation of straw images against complex backgrounds.

Furthermore, we replace the U-Net bottleneck with a Transformer encoder to improve the extraction of key features. The Transformer architecture is better suited for modeling global contextual information, allowing the model to more accurately understand the structure and characteristics of highly heterogeneous straw. In this process, we concatenate the output from the previous layer with the output from the Transformer to preserve the original key features, avoiding information loss and enhancing the model’s expressive capability.

To further optimize the model, we removed the max pooling layers and replaced them with convolutions that have a stride of 2. Downsampling is then performed by learnable filters rather than by a fixed maximum operation, which enhances the model’s ability to capture and retain details during feature extraction and contributes to more accurate segmentation results. This design supports the model’s efficiency and accuracy in complex environments, providing strong support for real-time calorific value estimation.

To verify the impact of various modules on model performance, we designed and conducted ablation experiments. By gradually removing or modifying different components of the model (such as depthwise separable convolutions, skip connections, and the Transformer Encoder-based bottleneck), we analyzed the effects of these changes on the model’s accuracy and computational efficiency in the semantic segmentation task. This allowed us to quantitatively assess the contribution of each module to the model’s performance, further optimizing the model architecture.

2.2.1. Encoder

The encoder structure is shown in the left half of Figure 3, and uses Depthwise Separable Convolution (DSC) to replace traditional convolution. Compared to traditional convolution, DSC features smaller convolution kernels and lower computational complexity. It significantly reduces the number of parameters and improves computational speed by decomposing the convolution operation into Depthwise Convolution (DW) and Pointwise Convolution (PW). Depthwise Convolution applies a convolution kernel independently to each input channel to extract spatial features. Although this process greatly reduces the number of parameters and computational complexity, it processes information only on a single channel, and cannot interact between channels. Pointwise Convolution uses a 1 × 1 convolution kernel to perform a weighted sum of all channels at each spatial location, enabling the fusion of information between channels. This compensates for the lack of channel interaction in Depthwise Convolution, and enhances feature representation capability.
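The parameter saving described above can be checked with a short calculation. The following is a minimal sketch in plain Python; the layer sizes (64 input channels, 128 output channels, 3 × 3 kernel) are hypothetical values chosen only for illustration and are not taken from Table 1.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Number of weights in a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def dsc_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Number of weights in a depthwise (k x k, one filter per input channel)
    convolution followed by a pointwise (1 x 1) convolution."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
c_in, c_out, k = 64, 128, 3
standard = conv_params(c_in, c_out, k)    # 64 * 128 * 9
separable = dsc_params(c_in, c_out, k)    # 64 * 9 + 64 * 128
ratio = separable / standard              # equals 1/c_out + 1/k**2 without bias
print(standard, separable, round(ratio, 4))
```

For these sizes the separable layer needs roughly 12% of the weights of the standard layer. In PyTorch, a depthwise convolution is commonly expressed as `nn.Conv2d` with `groups` set to the number of input channels, followed by a 1 × 1 `nn.Conv2d`; whether the authors implemented it this way is not stated in the text.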

Additionally, the Softplus activation function is introduced after DSC. Its mathematical expression is \mathrm{Softplus}(x) = \ln(1 + e^{x}). Compared to ReLU, f(x) = \max(0, x), the Softplus function is smoother and avoids the “dead zone” problem that can occur with ReLU. When combined with DSC, Softplus further enhances the network’s non-linearity, making the model more robust in handling complex patterns and detailed information, while providing more stable gradient flow during training and accelerating model convergence.
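The difference between the two activations can be made concrete with a small numerical sketch. The code below is illustrative only: it evaluates Softplus in a rearranged but mathematically equivalent form to avoid overflow for large inputs, and the function names are our own.

```python
import math


def relu(x: float) -> float:
    """ReLU: max(0, x)."""
    return max(0.0, x)


def softplus(x: float) -> float:
    """ln(1 + e^x), computed as max(x, 0) + ln(1 + e^(-|x|)) to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_grad(x: float) -> float:
    """The derivative of Softplus is the logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), softplus(x), softplus_grad(x))
```

For a negative input such as x = -2, ReLU outputs zero and has zero gradient, whereas Softplus still returns a small positive value with a non-zero gradient, which is the behaviour the paragraph above refers to.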

We also use convolutions with stride = 2 instead of max pooling operations. This not only allows the model to perform downsampling while maintaining feature learning capabilities, but also enables it to retain more contextual features, thereby improving its feature extraction ability.

2.2.2. Bottleneck

In the U-Net model, the bottleneck section is typically composed of convolutional layers. While convolutional networks excel at handling local features, they have limitations in capturing long-range dependencies and global context. To enhance the model’s ability to process complex backgrounds or detailed images, we replaced the traditional convolutional bottleneck with a Transformer encoder, as shown in Figure 4. This encoder consists of three layers of attention encoding modules. The Transformer encoder effectively utilizes global context information to extract key features, allowing it to capture more complex feature relationships and long-range dependencies. Through the self-attention mechanism, the Transformer encoder focuses attention on the most relevant input areas, improving the understanding of complex backgrounds and detailed images.

The computational process is as follows. First, the Query, Key, and Value tensors are computed by Equation (1), and the corresponding attention scores are then calculated by Equation (2). To capture different representations of the input, a multi-head attention mechanism is used to obtain various relationships and features within the input sequence, and the final self-attention output tensor is obtained from Equation (3).

Q = X W_{q}, K = X W_{k}, V = X W_{v}

(1)

\mathrm{head} = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V

(2)

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{h}) W^{o}

(3)

where X is the input tensor, W_{q}, W_{k}, and W_{v} are learnable weight matrices, and d_{k} is the dimension of the Key. The similarity between the queries and keys is computed using Q K^{T} and scaled by \sqrt{d_{k}}. After normalizing the result with the softmax, it is multiplied by the values V to generate the attention output. W^{o} is the linear projection matrix used to perform a linear transformation on the concatenated output of the multi-head attention, mapping it back to the original dimension. The output of the self-attention is obtained from Equation (3). The output tensor after the residual connection and layer normalization is then obtained from Equation (4).

\mathrm{Output}_{\mathrm{attention}} = \mathrm{LayerNorm}(X + \mathrm{Output}_{\mathrm{multiHead}})

(4)
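Equations (1) and (2) can be traced on a toy input. The sketch below implements a single attention head in plain Python; the input (3 tokens with 2 features) and the identity projection matrices are placeholders standing in for the learned weights, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_weights(q: Matrix, k: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) from Equation (2)."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    return softmax_rows([[s / math.sqrt(d_k) for s in row] for row in scores])


def attention_head(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)  # Equation (1)
    return matmul(attention_weights(q, k), v)                  # Equation (2)


# Toy data: identity matrices are placeholders for the learned projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights(x, x)
head = attention_head(x, identity, identity, identity)
print(weights)
print(head)
```

Each row of `weights` sums to one, so every output token is a weighted average of the Value rows. In the multi-head case of Equation (3), several such heads with separate projections would be concatenated along the feature axis and multiplied by W^{o}.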

The output of the Transformer encoder is obtained from Equation (5), where the FFN passes the attention output through a feed-forward neural network, enabling the model to learn a non-linear mapping from input to output in order to further transform the data and extract more complex features. The formula is \mathrm{FFN}(x) = \mathrm{ReLU}(x W_{1} + b_{1}) W_{2} + b_{2}, where W_{1} and W_{2} are the learnable weight matrices, and b_{1} and b_{2} are the learnable bias vectors.

\mathrm{Output}_{\mathrm{encoder}} = \mathrm{LayerNorm}(\mathrm{Output}_{\mathrm{attention}} + \mathrm{FFN}(\mathrm{Output}_{\mathrm{attention}}))

(5)
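The residual, normalization, and feed-forward steps of Equations (4) and (5) can be sketched for a single token vector as follows. This is a simplified illustration: the layer normalization omits the learnable scale and shift that framework implementations usually include, and the weights are arbitrary toy values rather than anything from the trained model.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(v: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one token vector to zero mean and (approximately) unit variance."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def linear(v: Vector, w: Matrix, b: Vector) -> Vector:
    """Compute v W + b for a row vector v."""
    return [sum(a * w_ij for a, w_ij in zip(v, col)) + b_j
            for col, b_j in zip(zip(*w), b)]


def ffn(v: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """FFN(x) = ReLU(x W1 + b1) W2 + b2."""
    hidden = [max(0.0, h) for h in linear(v, w1, b1)]
    return linear(hidden, w2, b2)


def encoder_tail(x: Vector, multihead_out: Vector,
                 w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    attn = layer_norm([a + m for a, m in zip(x, multihead_out)])   # Equation (4)
    ff = ffn(attn, w1, b1, w2, b2)
    return layer_norm([a + f for a, f in zip(attn, ff)])           # Equation (5)


# Toy weights (model width 2, hidden width 3), for illustration only.
w1 = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b2 = [0.0, 0.0]
out = encoder_tail([1.0, 2.0], [0.5, -0.5], w1, b1, w2, b2)
print(out)
```

The two residual additions keep the original token features available to later layers, which matches the stated goal of preserving key features while adding global context.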

While leveraging the advantages of Transformers and retaining original features, we introduced a feature concatenation strategy into the model. Specifically, we concatenate the output of the Transformer encoder with the output from the previous layer of the U-Net encoder. This concatenation operation not only preserves the local features extracted by traditional convolutional layers, but also integrates the global contextual information captured by the Transformer, resulting in a richer information representation during the feature fusion phase. This combination allows the model to understand image content at a higher level while avoiding the loss of feature information, thus enhancing overall segmentation performance.

Through this design, we have increased the model’s sensitivity to complex and fine-grained features while maintaining the original features, thereby improving the recognition capability of key regions in the image. This approach not only enhances the model’s expressive power but also demonstrates higher accuracy and robustness in practical applications.

2.2.3. Decoder

The decoder structure is shown in the right half of Figure 3. To improve efficiency and reduce computational costs, the first four layers of the decoder use the same skip connection approach as U-Net, which maximizes the preservation of multi-scale feature information passed from the encoder. These skip connections effectively merge local and global feature information by transferring high-resolution feature maps to the corresponding layers in the decoder. To further enhance the model’s ability to capture important information, we introduced a self-attention mechanism in the skip connection of the last encoder layer. This mechanism adaptively adjusts features from different spatial positions, ensuring that the model captures more discriminative features across the entire image, thereby improving the final segmentation accuracy. The last layer of the encoder contains the deepest semantic information and plays a crucial role in semantic feature extraction. By applying the self-attention mechanism, the model can more precisely capture these deep semantic features, reducing the risk of information loss during upsampling. This approach is particularly effective for complex scenes and fine details. In contrast, using self-attention on shallow feature maps with higher resolution would consume a large amount of computational resources, and yield less significant improvements. Therefore, we chose to apply the self-attention mechanism to the last encoder layer to maximize the model’s efficiency and effectiveness. By combining skip connections with the self-attention mechanism, the decoder not only effectively integrates features from the encoder, but also enhances the ability to extract and retain key information. This approach maintains a relatively small number of model parameters while increasing feature richness and segmentation accuracy, achieving a balance between performance and efficiency.

2.2.4. Loss Function

The goal of semantic segmentation is to assign a category label to each pixel, making it a pixel-level multi-class classification task. To measure the difference between the model’s predicted probability distribution and the actual category distribution, we use cross-entropy as the loss function. The cross-entropy loss function is defined in Equation (6) and quantifies the discrepancy between the target probability distribution and the predicted probability distribution. By minimizing this loss function, the model can better fit the target distribution, thereby improving the performance of semantic segmentation.

L = - \sum_{i = 1}^{n} p(x_{i}) \log(q(x_{i}))

(6)

where p(x_{i}) represents the true probability distribution, q(x_{i}) represents the predicted distribution, and n represents the number of classes. Then, the gradients of the loss function with respect to each parameter are calculated through backpropagation, as shown in the following equation:

\nabla_{θ} J (θ) = \frac{1}{m} \sum_{i = 1}^{m} \nabla_{θ} L (O^{(x_{i})}, P^{(x_{i})})

(7)

where J(θ) represents the average loss over all samples, \nabla_{θ} J(θ) denotes the gradient vector with respect to the model parameter vector θ, and m represents the number of samples. Once the gradient of the loss function has been obtained from Equation (7), an optimization algorithm is used to update the model’s parameters to improve recognition accuracy. This process is repeated over all samples until the loss function converges.
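As an illustration of Equation (6) in the segmentation setting, the sketch below computes the cross-entropy for one pixel and averages it over a small label map. It assumes one-hot targets (so only the true-class term of the sum survives) and raw class scores that are turned into probabilities with a softmax; the class ordering implied by the integer labels is hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn raw class scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Equation (6) with a one-hot p: L = -log q(x_target)."""
    return -math.log(softmax(logits)[target])


def mean_cross_entropy(logit_map, label_map) -> float:
    """Average the per-pixel loss over an H x W map of logits and integer labels."""
    losses = [pixel_cross_entropy(logits, label)
              for logit_row, label_row in zip(logit_map, label_map)
              for logits, label in zip(logit_row, label_row)]
    return sum(losses) / len(losses)


# A 1 x 2 "image" with four classes per pixel (toy values).
logit_map = [[[5.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]]
label_map = [[0, 3]]
print(mean_cross_entropy(logit_map, label_map))
```

A pixel with uniform scores over four classes contributes ln 4 to the loss, while a confident and correct prediction contributes a value close to zero. In a PyTorch training loop this role is typically played by `torch.nn.CrossEntropyLoss`; the text does not state which implementation was used.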

2.3. Calorific Value Estimation

The composition of the fuel and the moisture content of each material have a significant impact on the overall calorific value. If the straw fuel contains a large amount of moisture, the water will evaporate first during combustion. The evaporation process requires a substantial amount of energy, which is absorbed from the heat released by the fuel, thus reducing the available heat. Therefore, the moisture content of the straw fuel directly affects its effective calorific value. The calorific value of dry straw can be calculated using the following formula [32,33,34]:

q_{d} = 0.348 \cdot w_{C} + 0.938 \cdot w_{H} + 0.105 \cdot w_{S} + 0.063 \cdot w_{N} - 0.108 \cdot w_{O} [MJ / kg]

(8)

where w_{C} is the carbon content in the straw (%), w_{H} is the hydrogen content (%), w_{S} is the sulfur content (%), w_{N} is the nitrogen content (%), and w_{O} is the oxygen content (%). These components are closely related to the type of straw, and the compositional proportions of different types of straw are shown in Table 2 [35]. In practical applications, straw fuel typically contains a certain amount of moisture and, therefore, the calorific value needs to be adjusted based on the moisture content using the following formula:

q_{w} = q_{d} \cdot (1 - w / 100) - 0.0244 \cdot w [MJ / kg]

(9)

Here, w represents the moisture content of the straw-based fuel (%), q_{d} is the dry calorific value from Equation (8), and q_{w} is the effective calorific value of the moist straw. Before Equation (8) can be applied to the mixed fuel, the proportion of each type of straw must be determined from the semantic segmentation results of the straw images. Suppose there are n types of straw fuels fed into the circulating fluidized bed boiler (CFBB). The proportion of each type of straw can be calculated from the segmentation results using the following formula:

P_{i} = \frac{N_{i}}{\sum_{j = 1}^{n} N_{j}}

(10)

where n is the number of straw types, N_{i} represents the number of pixels corresponding to the i-th type of straw, and P_{i} is the resulting proportion. Subsequently, the total mass G of the mixed straw fuel can be obtained using a mass sensor. Taking the pixel proportion from Equation (10) as the mass proportion, the mass of each type of straw can be determined as

G_{i} = G \cdot P_{i} [kg]

Finally, by combining the mass of each fuel and its effective calorific value, the total calorific value of the entire fuel can be calculated as

Q_{total} = \sum_{i = 1}^{n} G_{i} \cdot q_{w, i} [MJ]

where Q_{total} represents the total calorific value of the straw fuel, q_{w, i} is the effective calorific value of the i-th straw type obtained from Equations (8) and (9), and n is the number of straw types.

Through this method, using the segmentation results of straw images, data from the mass sensor, and the calorific value formulas for each type of straw, the overall calorific value of the mixed straw fuel can be accurately calculated. This method provides a scientific and effective technical approach for real-time estimation of the calorific value of mixed fuels, offering an important basis for optimizing the utilization of straw fuel and improving the efficiency of combined heat and power systems.
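The calculation chain described above (dry calorific value, moisture correction, pixel-based proportions, and mass-weighted sum) can be sketched in Python as follows. The composition values are the means from Table 2. The pixel counts, total mass, and moisture content in the usage lines are hypothetical, and applying a single moisture value to all straw types is an assumption of this sketch rather than something the method specifies.

```python
# Mean elemental composition (mass %) taken from Table 2.
COMPOSITION = {
    "wheat":  {"C": 42.11, "H": 6.53, "S": 0.32, "N": 0.58, "O": 40.51},
    "corn":   {"C": 42.69, "H": 6.16, "S": 0.21, "N": 0.99, "O": 42.69},
    "sesame": {"C": 41.34, "H": 6.57, "S": 0.29, "N": 0.81, "O": 45.16},
}


def dry_calorific_value(comp):
    """Equation (8): calorific value of dry straw in MJ/kg."""
    return (0.348 * comp["C"] + 0.938 * comp["H"] + 0.105 * comp["S"]
            + 0.063 * comp["N"] - 0.108 * comp["O"])


def wet_calorific_value(q_d, moisture_pct):
    """Equation (9): moisture-corrected calorific value in MJ/kg."""
    return q_d * (1.0 - moisture_pct / 100.0) - 0.0244 * moisture_pct


def total_calorific_value(pixel_counts, total_mass_kg, moisture_pct):
    """Pixel shares -> per-type mass -> total heat content in MJ."""
    straw_pixels = sum(pixel_counts.values())
    total = 0.0
    for species, pixels in pixel_counts.items():
        share = pixels / straw_pixels      # proportion of this straw type
        mass = total_mass_kg * share       # mass of this straw type in kg
        q_d = dry_calorific_value(COMPOSITION[species])
        total += mass * wet_calorific_value(q_d, moisture_pct)
    return total


# Hypothetical inputs for illustration only (not measurements from the plant).
counts = {"wheat": 30000, "corn": 20000, "sesame": 10000}
print(total_calorific_value(counts, total_mass_kg=100.0, moisture_pct=15.0))
```

Background pixels are left out of `pixel_counts` here so that the shares sum to one over the straw classes only.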

3. Experiments and Analysis

The experiments in this paper were conducted on an Ubuntu 20.04 operating system using the PyTorch 2.2.0 framework. The hardware configuration of the experimental platform is as follows: Intel Core™ i7-13790F processor with a clock speed of 3.4 GHz, 32 GB of RAM, NVIDIA RTX 4070 Ti Super GPU with 16 GB of VRAM. The image resolution for training and testing was fixed at 256 × 256, and the optimization algorithm used was AdamW [36] with a linear learning rate scheduler that includes a warm-up phase. The maximum learning rate was set to 1 × 10⁻⁵, the batch size was set to 50, and the training was conducted for 100 epochs.
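As a rough illustration of a linear learning rate schedule with warm-up, the sketch below computes the learning rate per optimizer step. The step counts follow from the stated batch size of 50, 100 epochs, and a training set of 9600 images (80% of the 12,000-image dataset described in Section 3.1). The warm-up length of five epochs and the decay to zero are assumptions, since the text does not state them, and `linear_warmup_lr` is a hypothetical helper name.

```python
def linear_warmup_lr(step, total_steps, warmup_steps, max_lr=1e-5):
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < warmup_steps:
        # Warm-up: ramp linearly from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    # Decay: fall linearly from max_lr to zero at total_steps.
    remaining = total_steps - step
    return max_lr * max(0.0, remaining / (total_steps - warmup_steps))


steps_per_epoch = 9600 // 50             # training images / batch size
total_steps = steps_per_epoch * 100      # 100 epochs
warmup_steps = 5 * steps_per_epoch       # assumed warm-up length

for step in (0, warmup_steps - 1, total_steps // 2, total_steps - 1):
    print(step, linear_warmup_lr(step, total_steps, warmup_steps))
```

In PyTorch, a function of this shape (divided by the maximum learning rate) could be passed to `torch.optim.lr_scheduler.LambdaLR` together with a `torch.optim.AdamW` optimizer.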

3.1. Data Augmentation

The straw images were captured by an MV-GE1400C-T industrial camera (MindVision, Shenzhen, China) under natural light conditions. The camera has a maximum resolution of 4384 × 3288 pixels, a frame rate of 8 fps, and is equipped with a 6 mm fixed-focus lens. The straw dataset consists of images with a resolution of 4032 × 3024 pixels, each annotated at the pixel level with four categories: wheat, corn, sesame, and background. Since the original dataset is relatively small and each image is too large to use directly, it is necessary to augment the original dataset.

Data augmentation is a common technique that has been proven beneficial for training deep learning models. Proper augmentation can accelerate model convergence, improve robustness, help avoid overfitting, and enhance generalization capabilities. In this study, we expanded the dataset by performing random cropping, adding noise, and applying affine transformations such as rotation, vertical flipping, and horizontal flipping to the straw images. As a result, we obtained a new dataset consisting of 12,000 images with a size of 256 × 256, which was then divided into training, validation, and test sets in an 8:1:1 ratio.
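A minimal NumPy sketch of such an augmentation step is given below. It is not the authors' pipeline: the function name `augment_patch`, the Gaussian noise model with its standard deviation, the 50% flip probabilities, and the restriction of rotations to multiples of 90 degrees are all assumptions made for illustration. The key point it shows is that geometric transforms must be applied identically to the image and its label mask, while noise is applied to the image only.

```python
import numpy as np


def augment_patch(image, mask, size=256, noise_std=5.0, rng=None):
    """Return one randomly cropped, flipped, rotated, noisy (image, mask) pair."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = mask.shape
    # Random crop: pick the top-left corner so the patch fits in the image.
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    img = image[top:top + size, left:left + size].astype(np.float32)
    msk = mask[top:top + size, left:left + size]

    if rng.random() < 0.5:               # horizontal flip
        img, msk = img[:, ::-1], msk[:, ::-1]
    if rng.random() < 0.5:               # vertical flip
        img, msk = img[::-1], msk[::-1]
    k = int(rng.integers(0, 4))          # rotate by k * 90 degrees
    img, msk = np.rot90(img, k, axes=(0, 1)), np.rot90(msk, k)

    # Noise is added to the image only; the labels stay untouched.
    img = img + rng.normal(0.0, noise_std, img.shape)
    img = np.clip(img, 0, 255).astype(np.uint8)
    return img, np.ascontiguousarray(msk)
```

Calling this function repeatedly on each 4032 × 3024 source image yields as many 256 × 256 patches as needed before the 8:1:1 split.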

3.2. Evaluation Metrics

We use pixel accuracy (PA) and mean intersection over union (mIoU) to evaluate the segmentation performance of the models. PA is the proportion of all pixels in the image that are correctly classified, as defined by Equation (11). mIoU is the ratio of the intersection to the union of the labeled and predicted regions of each class, averaged over all classes, as defined by Equation (12):

PA = \frac{\sum_{i = 1}^{k} p_{i i}}{\sum_{i = 1}^{k} \sum_{j = 1}^{k} p_{i j}}

(11)

mIoU = \frac{1}{k} \sum_{i = 1}^{k} \frac{p_{i i}}{\sum_{j = 1}^{k} p_{i j} + \sum_{j = 1}^{k} p_{j i} - p_{i i}}

(12)

where k is the number of classes, p_{i i} is the number of pixels of class i that are correctly segmented, p_{i j} is the number of pixels belonging to class i but predicted as class j, and p_{j i} is the number of pixels belonging to class j but predicted as class i.
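Both metrics can be computed from a confusion matrix whose entry (i, j) counts the pixels of true class i predicted as class j. The sketch below uses the standard per-class intersection over union; the function names are illustrative, and skipping classes that appear in neither the labels nor the predictions (to avoid a 0/0 term) is a common convention assumed here, not something stated in the text.

```python
import numpy as np


def confusion_matrix(label, pred, k):
    """p[i, j] = number of pixels of true class i predicted as class j."""
    idx = label.astype(np.int64).ravel() * k + pred.astype(np.int64).ravel()
    return np.bincount(idx, minlength=k * k).reshape(k, k)


def pixel_accuracy(p):
    """Correctly classified pixels divided by all pixels."""
    return float(np.trace(p) / p.sum())


def mean_iou(p):
    """Mean over classes of TP / (labeled + predicted - TP)."""
    tp = np.diag(p)
    union = p.sum(axis=1) + p.sum(axis=0) - tp
    valid = union > 0                    # skip classes absent from both maps
    return float(np.mean(tp[valid] / union[valid]))


# Tiny hand-made example with k = 2 classes.
label = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
p = confusion_matrix(label, pred, k=2)
print(pixel_accuracy(p), mean_iou(p))
```

In the example, three of the four pixels are correct, and the per-class IoU values are 1/2 and 2/3.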

3.3. Comparison Experiments

To validate the effectiveness of the proposed model, we conducted experiments on images captured from a real power plant and evaluated our model using a real dataset. We adopted two main evaluation metrics: Mean Intersection over Union (mIoU) and Pixel Accuracy (PA). As shown in Table 3, our proposed model performed excellently in both PA and mIoU. The large version of our model outperformed other models in both metrics, while the small version had a PA slightly lower than U-Net but still surpassed other comparative methods, such as SegNet, DDRNet [37], and SmaAt-UNet [38].

Although the small version of the proposed model did not achieve the highest PA and mIoU, its parameter count was only 26% of U-Net's, and its inference speed was 79% of U-Net's speed. In contrast, the large version, which did not incorporate depthwise separable convolution (DSC), had a parameter count of 109% of U-Net's. Moreover, the PA of the small version was only about 0.2% lower than that of U-Net, and its mIoU was also reduced by just 0.2%. Considering both inference accuracy and computational complexity, the small version of the segmentation model proposed in this paper is more suitable for real-world calorific value estimation in cogeneration.

3.4. Ablation Studies

To further validate the effectiveness and improvements of the proposed model, we designed ablation experiments and used Pixel Accuracy (PA) and Mean Intersection over Union (mIoU) to assess the impact of different model configurations on overall performance. The experimental results are shown in Table 4. We analyzed the following configurations: the baseline U-Net model, U-Net with the self-attention mechanism (UNet-Atten), U-Net combining self-attention and Transformer (UNet-Atten-Trans), and models that incorporate Depthwise Separable Convolutions (DSC).

From the results, it can be observed that after introducing DSC, the model's size was significantly reduced to 19% of the original U-Net, and the inference speed improved by approximately 23%, but the PA decreased by 2.2%. After introducing the self-attention mechanism in the final layer's skip connection, although the model's size increased by about 1 MB, the PA improved by 0.94%, the mIoU increased by 2.1%, and the inference speed remained largely unchanged. When the bottleneck layer was then replaced with a Transformer encoder, the PA further increased by 1.05% and the mIoU improved by 2.4%, but this also led to a significant increase in parameter count and computational complexity.
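The parameter saving from DSC can be illustrated with a short per-layer count. The sketch assumes 3 × 3 kernels and ignores biases and normalization layers, neither of which is specified in this section, so the per-layer ratio is not expected to match the whole-model figure in Table 4. The channel widths are taken from Table 1.

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsc_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


# Channel transitions that appear in the encoder of Table 1.
for c_in, c_out in [(64, 128), (128, 256), (256, 512)]:
    std, dsc = conv_params(c_in, c_out), dsc_params(c_in, c_out)
    print(f"{c_in}->{c_out}: standard={std}, separable={dsc}, ratio={dsc / std:.3f}")
```

The ratio equals 1/c_out + 1/k², so for wide layers with 3 × 3 kernels a separable layer needs a little more than one ninth of the weights of a standard one.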

To validate the effect of introducing the self-attention mechanism in the skip connection and using the Transformer encoder as the bottleneck, we performed tests based on the original U-Net. The results showed that by introducing the self-attention mechanism and replacing the bottleneck layer, both PA and mIoU of the model achieved significant improvements, but the accompanying computational load and parameter count also increased substantially.

Overall, incorporating the self-attention mechanism in the skip connection and replacing the bottleneck layer with a Transformer encoder indeed enhances the model’s segmentation accuracy, but it also increases the computational load and parameter count. By combining DSC, the self-attention mechanism, and the Transformer encoder, the model achieves a good balance between accuracy and computational complexity, making it more suitable for real-time segmentation applications.

3.5. Discussion

The modified U-Net-based model proposed in this paper demonstrates strong performance and the potential for application in the estimation of calorific value in straw fuel. Through a comparative analysis of different models, we can clearly see the differences in performance across models in terms of pixel accuracy (PA) and mean intersection over union (mIoU). The large version of the model, which does not incorporate depthwise separable convolutions (DSC), outperforms the smaller version with DSC in both metrics, and also shows significant advantages compared to other mainstream models such as U-Net, SegNet, DDRNet, and SmaAt-UNet. While the smaller version sacrifices some accuracy compared to the larger version, it has a shorter inference time and lower computational complexity, making it more suitable for real-time deployment.

In the ablation study, introducing DSC significantly reduced the model’s parameter count and computational complexity, while improving inference speed, though it led to a slight decrease in accuracy. This indicates that when aiming for efficiency, a trade-off between accuracy and complexity is necessary. The combination of self-attention mechanisms and Transformer encoders provided strong support for performance improvement, though it also increased computational complexity. Through in-depth analysis of different model configurations, we found that appropriate architecture design can achieve a good balance between accuracy and computational complexity.

Additionally, as shown in Figure 5, part of the visual results demonstrates that the segmentation performance of our model is very close to that of U-Net, indicating that our approach effectively captures details and boundary information while maintaining lower computational complexity and parameter count. The figure also shows the classification results of other models, and while some models perform well under specific conditions, they are overall outperformed by the model proposed in this paper. Overall, our model demonstrates both efficiency and good classification accuracy, while reducing computational complexity and improving real-time performance in practical applications.

Future research could further explore ways to reduce computational complexity while maintaining or improving model accuracy. These investigations will provide more solutions for improving the estimation of calorific value in combined heat and power generation (CHPG) systems using straw combustion, promoting the development and utilization of renewable energy.

4. Conclusions

In this article, we address the issue of heat uncertainty caused by fluctuations in the calorific value of straw combustion in combined heat and power generation (CHPG) systems. We propose a real-time mixed straw fuel calorific value estimation system based on an improved U-Net semantic segmentation model. The improved U-Net model incorporates self-attention mechanisms and depthwise separable convolutions, and replaces the bottleneck section with a Transformer encoder. This approach significantly reduces computational complexity and parameter count while maintaining the model's segmentation performance and generalization ability, enabling real-time estimation of the calorific value of mixed straw fuel. The results on image data captured from an actual thermal power plant validate the system's potential in practical applications. Future research could explore more advanced and lightweight segmentation models to further enhance the accuracy and real-time performance of the model. Additionally, we plan to utilize more advanced image processing techniques to directly obtain the moisture content of the straw, reducing dependence on sensors and further improving the system's real-time performance and accuracy. In summary, this study provides an innovative solution for real-time estimation of straw fuel calorific value, laying a solid foundation for the development and utilization of renewable energy.

Author Contributions

Conceptualization, L.L. and D.H.; Methodology, Z.W. and D.H.; Validation, L.L.; Investigation, L.L. and Z.W.; Writing—original draft, Z.W.; Writing—review & editing, D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Central Guidance Project for Local Scientific and Technological Development under Grant 2023ZY1045.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Lianming Li was employed by the company Jiaxing Newjies Thermal Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Nguyen, T.L.T.; Hermansen, J.E.; Mogensen, L. Environmental performance of crop residues as an energy source for electricity production: The case of wheat straw in Denmark. Appl. Energy 2013, 104, 633–641.
2. Sheng, C.; Azevedo, J. Estimating the higher heating value of biomass fuels from basic analysis data. Biomass Bioenergy 2005, 28, 499–507.
3. Williams, A.; Jones, J.; Ma, L.; Pourkashanian, M. Pollutants from the combustion of solid biomass fuels. Prog. Energy Combust. Sci. 2012, 38, 113–137.
4. Chaney, J.; Liu, H.; Li, J. An overview of CFD modelling of small-scale fixed-bed biomass pellet boilers with preliminary results from a simplified approach. Energy Convers. Manag. 2012, 63, 149–156.
5. Darvell, L.; Jones, J.; Gudka, B.; Baxter, X.; Saddawi, A.; Williams, A.; Malmgren, A. Combustion properties of some power station biomass fuels. Fuel 2010, 89, 2881–2890.
6. Fagan, C.C.; Everard, C.D.; McDonnell, K. Prediction of moisture, calorific value, ash and carbon content of two dedicated bioenergy crops using near-infrared spectroscopy. Bioresour. Technol. 2011, 102, 5200–5206.
7. Striūgas, N.; Vorotinskienė, L.; Paulauskas, R.; Navakas, R.; Džiugys, A.; Narbutas, L. Estimating the fuel moisture content to control the reciprocating grate furnace firing wet woody biomass. Energy Convers. Manag. 2017, 149, 937–949.
8. Sahu, S.; Sarkar, P.; Chakraborty, N.; Adak, A. Thermogravimetric assessment of combustion characteristics of blends of a coal with different biomass chars. Fuel Process. Technol. 2010, 91, 369–378.
9. Estiati, I.; Freire, F.B.; Freire, J.T.; Aguado, R.; Olazar, M. Fitting performance of artificial neural networks and empirical correlations to estimate higher heating values of biomass. Fuel 2016, 180, 377–383.
10. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66.
11. Ghamisi, P.; Couceiro, M.S.; Martins, F.M.L.; Benediktsson, J.A. Multilevel Image Segmentation Based on Fractional-Order Darwinian Particle Swarm Optimization. IEEE Trans. Geosci. Remote Sens. 2014, 52, 2382–2394.
12. Liu, Y.; Li, Q.; Farzaneh, M.; Du, B. Image Characteristic Extraction of Ice-Covered Outdoor Insulator for Monitoring Icing Degree. Energies 2020, 13, 5305.
13. Davis, L.S. A survey of edge detection techniques. Comput. Graph. Image Process. 1975, 4, 248–270.
14. Liu, J.; Yan, S.; Lu, N.; Yang, D.; Lv, H.; Wang, S.; Zhu, X.; Zhao, Y.; Wang, Y.; Ma, Z.; et al. Automated retinal boundary segmentation of optical coherence tomography images using an improved Canny operator. Sci. Rep. 2021, 12, 1412.
15. Adams, R.; Bischof, L. Seeded Region Growing. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 641–647.
16. Vincent, L.M.; Soille, P. Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 583–598.
17. Großkopf, J.; Matthes, J.; Vogelbacher, M.; Waibel, P. Evaluation of Deep Learning-Based Segmentation Methods for Industrial Burner Flames. Energies 2021, 14, 1716.
18. Hensel, S.; Marinov, M.B.; Koch, M.; Arnaudov, D.D. Evaluation of Deep Learning-Based Neural Network Methods for Cloud Detection and Segmentation. Energies 2021, 14, 6156.
19. Gao, R.; Sun, Z.; Li, W.; Pei, L.; Hu, Y.; Xiao, L. Automatic Coal and Gangue Segmentation Using U-Net Based Fully Convolutional Networks. Energies 2020, 13, 829.
20. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
21. Yao, J.; Wang, D.; Hu, H.; Xing, W.; Wang, L. ADCNN: Towards learning adaptive dilation for convolutional neural networks. Pattern Recognit. 2021, 123, 108369.
22. Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H.S. Conditional Random Fields as Recurrent Neural Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1529–1537.
23. Elman, J.L. Finding Structure in Time. Cogn. Sci. 1990, 14, 179–211.
24. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
25. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 39, 2481–2495.
26. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
27. Vaswani, A.; Shazeer, N.M.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. arXiv 2017, arXiv:1706.03762.
28. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.S.; et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6877–6886.
29. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Virtual, 11–17 October 2021; pp. 548–558.
30. Lin, T.Y.; Dollár, P.; Girshick, R.B.; He, K.; Hariharan, B.; Belongie, S.J. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 936–944.
31. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
32. Effenberger, H. Dampferzeugung; Springer: Berlin/Heidelberg, Germany, 2013.
33. Kortela, J.; Jämsä-Jounela, S.L. Modeling and model predictive control of the BioPower combined heat and power (CHP) plant. Int. J. Electr. Power Energy Syst. 2015, 65, 453–462.
34. Kortela, J.; Jämsä-Jounela, S.L. Fuel-quality soft sensor using the dynamic superheater model for control strategy improvement of the Biopower 5 CHP plant. Int. J. Electr. Power Energy Syst. 2012, 42, 38–48.
35. Cuiping, L.; Chuangzhi, W.; Haitao, H. Chemical elemental characteristics of biomass fuels in China. Biomass Bioenergy 2004, 27, 119–130.
36. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
37. Hong, Y.; Pan, H.; Sun, W.; Jia, Y. Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes. arXiv 2021, arXiv:2101.06085.
38. Trebing, K.; Mehrkanoon, S. SmaAt-UNet: Precipitation Nowcasting using a Small Attention-UNet Architecture. arXiv 2020, arXiv:2007.04417.

Figure 1. A schematic of the CHPG process.

Figure 2. The configuration system for calorific value estimation.

Figure 3. The proposed model architecture.

Figure 4. Replacing the bottleneck with a Transformer encoder.

Figure 5. Visualization of segmentation results of different methods, where green represents sesame, purple denotes corn straw, red indicates wheat, and blue signifies the background.

Table 1. The specific parameters of the model in this article.

Layer	Conv Layer	Filter	Output Size
Input	-	-	256 × 256 × 3
Encoder	Encoder1	DSC 1, S = 1	256 × 256 × 64
Encoder	Encoder1	DSC 2, S = 2	128 × 128 × 64
Encoder	Encoder2	DSC 3, S = 1	128 × 128 × 128
Encoder	Encoder2	DSC 4, S = 2	64 × 64 × 128
Encoder	Encoder3	DSC 5, S = 1	64 × 64 × 256
Encoder	Encoder3	DSC 6, S = 2	32 × 32 × 256
Encoder	Encoder4	DSC 7, S = 1	32 × 32 × 512
Encoder	Encoder4	DSC 8, S = 2	16 × 16 × 512
Bottleneck	-	Transformer Encoder	16 × 16 × 512
Decoder	Decoder1	DSC 9, S = 1	16 × 16 × 512
Decoder	Decoder1	DSC 10, S = 1	16 × 16 × 512
Decoder	Decoder1	Deconv	32 × 32 × 512
Decoder	Decoder2	DSC 11, S = 1	32 × 32 × 256
Decoder	Decoder2	DSC 12, S = 1	32 × 32 × 256
Decoder	Decoder2	Deconv	64 × 64 × 256
Decoder	Decoder3	DSC 13, S = 1	64 × 64 × 128
Decoder	Decoder3	DSC 14, S = 1	64 × 64 × 128
Decoder	Decoder3	Deconv	128 × 128 × 128
Decoder	Decoder4	DSC 15, S = 1	128 × 128 × 64
Decoder	Decoder4	DSC 16, S = 1	128 × 128 × 64
Decoder	Decoder4	Deconv	256 × 256 × 64
Output	-	PW	256 × 256 × 4

Table 2. Elemental composition of different species.

Species	wC (%)	wH (%)	wS (%)	wN (%)	wO (%)
Wheat	42.11 ± 2.12	6.53 ± 0.46	0.32 ± 0.10	0.58 ± 0.28	40.51 ± 2.67
Corn	42.69 ± 1.47	6.16 ± 0.81	0.21 ± 0.13	0.99 ± 0.20	42.69 ± 2.11
Sesame	41.34	6.57	0.29	0.81	45.16

Table 3. Performance comparison of various semantic segmentation models. Note: An upward arrow indicates that a higher value is better, while a downward arrow indicates that a lower value is preferable.

Model	PA (%) ↑	mIoU (%) ↑	Speed (ms) ↓	Size (MB)
U-Net	95.56	86.8	5.11	118.76
SegNet	94.554	84.8	4.26	112.325
DDRNet	87.655	71.8	2.62	21.7247
SmaAt-UNet	93.713	83.0	2.66	15.3824
Our model-large	96.309	89.0	5.45	129.69
Our model-small	95.359	86.6	3.44	31.176

Table 4. Comparative analysis of UNet variants through ablation study.

Model	PA (%) ↑	mIoU (%) ↑	Speed (ms) ↓	Size (MB)
UNet	95.56	86.8	5.11	118.76
UNet-Se	93.361	82.1	3.08	23.07
UNet-Se-Atten	94.307	84.2	3.23	24.20
UNet-Atten	95.695	87.4	5.21	119.74
UNet-Atten-Trans	96.309	89.0	5.45	129.69
UNet-Se-Atten-Trans	95.359	86.6	3.44	31.18

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
