CN112215122B - Fire detection method, system, terminal and storage medium based on video image target detection

Info

Publication number: CN112215122B
Application number: CN202011069784.5A
Authority: CN
Inventors: 胡金星; 王传胜
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2023-10-24
Anticipated expiration: 2040-09-30
Also published as: CN112215122A

Abstract

The application relates to a fire detection method, system, terminal and storage medium based on video image target detection. The method comprises: converting original natural images into haze images and sand-dust images using a data augmentation algorithm based on an atmospheric scattering model, and generating a data set for training the model; and constructing a convolutional neural network model LFNet and inputting the data set into the LFNet model for iterative training to obtain optimal model parameters. The LFNet model comprises a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model: the skeleton feature extraction model extracts the main features of an input image through convolutions at three different scales; the main feature extraction model performs further feature extraction on the main features to generate three groups of feature maps; and the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs a detection result. The method can improve the robustness of the model in abnormal weather such as sand-dust and haze, so that the model obtains better detection results.

Description

Fire detection method, system, terminal and storage medium based on video image target detection

Technical field

This application belongs to the field of fire detection technology, and in particular relates to a fire detection method, system, terminal and storage medium based on video image target detection.

Background

Fire detection plays a vital role in safety monitoring. Traditional fire detection methods are based on image priors: they detect fire from the color and shape of image regions. However, the robustness and error rate of color and motion features are often affected by preset parameters, so these methods cannot be applied in complex environments, and their localization accuracy is easily affected by the region.

Monitoring is a tedious and time-consuming task, especially in an uncertain monitoring environment, where there is great uncertainty in time, space and even scale. Sensor-based detectors have limited performance in terms of error rate and sensing range, so they cannot detect distant or small fires. In recent years, with the rapid development of deep learning, convolutional neural networks (CNNs) have been applied to fire detection. However, existing deep-learning-based fire detection methods still have the following shortcomings:

1. Deep-learning-based methods require a large number of remote sensing images as training data. Because real remote sensing images are scarce, training the model is very challenging.

2. Deep-learning-based fire detection models are too large to be suitable for resource-constrained devices.

3. The complexity of existing algorithms is too high for real-time detection.

4. Their anti-interference ability is weak, and they are easily affected by harsh monitoring environments such as haze and dust.

5. Most fire detection algorithms focus on a single environment, so they show higher error rates in uncertain environments.

In summary, existing fire detection methods leave considerable room for improvement in algorithm complexity, range of application scenarios, and model size.

Summary of the invention

This application provides a fire detection method, system, terminal and storage medium based on video image target detection, aiming to solve, at least to a certain extent, one of the above technical problems in the prior art.

To solve the above problems, this application provides the following technical solutions:

A fire detection method based on video image target detection, including:

using a data augmentation algorithm based on the atmospheric scattering model to convert original natural images into haze images and sand-dust images, and generating a data set for training the model;

constructing a convolutional neural network model LFNet, and inputting the data set into the LFNet model for iterative training to obtain optimal model parameters; the skeleton feature extraction model uses 3*3, 5*5 and 7*7 convolutions to extract features of the input image, yielding feature maps of size 13*13, 26*26 and 52*52; the main feature extraction model performs further feature extraction on the main features, generating three groups of feature maps of size 52*52, 26*26 and 13*13; the variable-scale feature fusion model convolves the three groups of feature maps with different convolution kernels and strides, and concatenates all convolution outputs of the same size to obtain three groups of feature maps; a channel-based attention mechanism is applied to these three groups, yielding feature maps of size 13*13, 26*26 and 52*52, which are used to detect small, medium and large objects respectively; inputting the data set into the LFNet model for iterative training further includes: selecting mean square error and cross entropy as loss functions for model optimization;

the convolutional neural network model LFNet includes a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model; the skeleton feature extraction model extracts the main features of the input image through convolutions at three different scales; the main feature extraction model performs further feature extraction on the main features and generates three groups of feature maps; the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs detection results;

inputting the fire image to be detected into the trained LFNet model, which outputs the fire location area and fire type of that image.

The technical solution adopted by the embodiments of this application further includes: before the data augmentation algorithm based on the atmospheric scattering model is used to convert original natural images into haze images and sand-dust images, the method includes:

obtaining original natural images; the original natural images include non-alarm images without fire alarm areas and real fire alarm images.

The technical solution adopted by the embodiments of this application further includes: using the data augmentation algorithm based on the atmospheric scattering model to convert original natural images into haze images includes:

the atmospheric scattering model uses at least two transmission rates to simulate haze images of different concentrations; the haze imaging formula is:

$$I(x) = J(x)\,t(x) + \alpha\,(1 - t(x))$$

In the above formula, I(x) is the simulated haze image, J(x) is the input haze-free image, α is the atmospheric light value, and t(x) is the scene transmission rate (transmittance).

The technical solution adopted by the embodiments of this application further includes: using the data augmentation algorithm based on the atmospheric scattering model to convert original natural images into sand-dust images includes:

the atmospheric scattering model uses a fixed transmittance and atmospheric light value, combined with three colors, to simulate sand-dust images of different concentrations; the sand-dust simulation formula is:

$$D(x) = J(x)\,t(x) + \alpha\,\bigl(C(x)\,(1 - t(x))\bigr)$$

In the above formula, D(x) is the simulated sand-dust image, J(x) is the input haze-free image, C(x) is the color value, and α and t(x) are the atmospheric light value and transmittance as in the haze formula.

The technical solution adopted by the embodiments of this application further includes: the loss function is specifically as follows.

The brightness, dark channel value and R channel data of the path in the fire area are collected as statistics. These statistics are regarded as a combustion histogram prior (CHP) and written as the CHP formula:

In the above formula, R(x) is the R channel of the image, SCP(x) is the difference between the image brightness and the dark channel, w is the width of the histogram, and h is the height of the histogram;

$$\mathrm{SCP}(x) = \lVert v(x) - \mathrm{DCP}(x) \rVert$$

In the above formula, v(x) is the brightness of the image, and DCP(x) is the dark channel value of the image;

$$L_{CHP} = \lVert \mathrm{CHP}(I) - \mathrm{CHP}(R) \rVert^{2}$$

In the above formula, CHP denotes the combustion histogram prior, and CHP(I) and CHP(R) are the CHP values of the area selected by the target detection algorithm and of the annotated area, respectively;

The final loss function is a weighted sum of three different loss functions:

$$L = \beta L_{CE} + \gamma L_{MSE} + \delta L_{CHP}$$

In the above formula, L is the final loss function, L_CE is the cross-entropy loss, L_MSE is the mean-square-error loss, and L_CHP is the combustion histogram prior loss.

Another technical solution adopted by the embodiments of this application is a fire detection system based on video image target detection, including:

a data set construction module, used to convert original natural images into haze images and sand-dust images with a data augmentation algorithm based on the atmospheric scattering model, and to generate a data set for training the model;

an LFNet model training module, used to construct the convolutional neural network model LFNet and to input the data set into the LFNet model for iterative training to obtain optimal model parameters; the skeleton feature extraction model uses 3*3, 5*5 and 7*7 convolutions to extract features of the input image, yielding feature maps of size 13*13, 26*26 and 52*52; the main feature extraction model performs further feature extraction on the main features, generating three groups of feature maps of size 52*52, 26*26 and 13*13; the variable-scale feature fusion model convolves the three groups of feature maps with different convolution kernels and strides, and concatenates all convolution outputs of the same size to obtain three groups of feature maps; a channel-based attention mechanism is applied to these three groups, yielding feature maps of size 13*13, 26*26 and 52*52, which are used to detect small, medium and large objects respectively; inputting the data set into the LFNet model for iterative training further includes: selecting mean square error and cross entropy as loss functions for model optimization;

the convolutional neural network model LFNet includes a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model; the skeleton feature extraction model extracts the main features of the input image through convolutions at three different scales; the main feature extraction model performs further feature extraction on the main features and generates three groups of feature maps; the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs detection results; the detection results include the fire location area and the fire type of the fire image.

Yet another technical solution adopted by the embodiments of this application is a terminal that includes a processor and a memory coupled to the processor, wherein:

the memory stores program instructions for implementing the fire detection method based on video image target detection;

the processor is configured to execute the program instructions stored in the memory to control fire detection based on video image target detection.

Yet another technical solution adopted by the embodiments of this application is a storage medium that stores program instructions executable by a processor, the program instructions being used to execute the fire detection method based on video image target detection.

Compared with the prior art, the embodiments of this application have the following beneficial effects. The fire detection method, system, terminal and storage medium based on video image target detection use a data augmentation algorithm based on the atmospheric scattering model to convert original images into images affected by haze or sand-dust to varying degrees, generate a data set for training the model, and construct a convolutional neural network model, LFNet, suited to fire and smoke detection in uncertain environments. This improves the robustness of the model in abnormal weather such as sand-dust and haze, so that the model obtains better detection results. In addition, because the LFNet model of the embodiments is small, the computational cost is reduced, which makes the model easier to deploy on resource-constrained devices.

Description of the drawings

Figure 1 is a flow chart of the fire detection method based on video image target detection according to an embodiment of this application;

Figure 2 is a schematic diagram of the haze and sand-dust image simulation results based on the atmospheric scattering model according to an embodiment of this application;

Figure 3 is a framework diagram of the convolutional neural network model according to an embodiment of this application;

Figure 4 is a structural diagram of the variable-scale feature fusion model according to an embodiment of this application;

Figure 5 is a structural diagram of the channel-based attention mechanism according to an embodiment of this application;

Figure 6 is a schematic structural diagram of the fire detection system based on video image target detection according to an embodiment of this application;

Figure 7 is a schematic structural diagram of the terminal according to an embodiment of this application;

Figure 8 is a schematic structural diagram of the storage medium according to an embodiment of this application.

Detailed description

To make the purpose, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.

Please refer to Figure 1, which is a flow chart of the fire detection method based on video image target detection according to an embodiment of this application. The method includes the following steps:

S10: Obtain original natural images;

In this step, the original natural images obtained include 293 non-alarm images without fire alarm areas and 5073 real fire alarm images. Using non-alarm images improves the robustness of the trained algorithm to non-alarm targets and reduces the error rate of the detector. Using real fire alarm images improves the detection capability of the target detection model.

S20: Use a data augmentation algorithm based on the atmospheric scattering model to convert the original natural images into new synthetic images affected by different types and degrees of abnormal weather, and generate a data set for training the model;

Existing intelligent monitoring algorithms usually ignore the effect of abnormal weather such as haze or sand-dust on performance, which makes them poorly robust under uncertain weather conditions. To address this shortcoming, the present invention takes the effect of abnormal weather on fire detection algorithms into account. In this step, a data augmentation method based on the atmospheric scattering model simulates haze images and sand-dust images of different degrees, converting the original natural images into new synthetic images affected by haze or sand-dust weather to varying degrees. The result is a large-scale benchmark data set for training and testing fire detection models, which improves the robustness of the target detection model in abnormal weather such as sand-dust and haze.

Further, please refer to Figure 2, which is a schematic diagram of the haze and sand-dust image simulation results based on the atmospheric scattering model according to an embodiment of this application. In the figure, (a) is the original image; (b), (c) and (d) are haze images synthesized by the atmospheric scattering model with different transmission rates; and (e), (f) and (g) are sand-dust images simulated with a fixed transmittance and atmospheric light value, combined with three different colors. The haze imaging formula is:

$$I(x) = J(x)\,t(x) + \alpha\,(1 - t(x)) \tag{1}$$

In formula (1), I(x) is the simulated haze image, J(x) is the input haze-free image, α is the atmospheric light value, and t(x) is the scene transmission rate (transmittance), which describes the portion of light in the view that is not scattered and reaches the camera sensor. To simulate haze of different concentrations, the embodiment sets the atmospheric light value α to 0.8 and the transmittance to 0.8, 0.6 and 0.4 respectively.
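Formula (1) can be sketched in a few lines of NumPy. This is a minimal illustration, not the embodiment's implementation: it assumes images are floating-point arrays scaled to [0, 1] and that t(x) is constant over the image, and the function name `synthesize_haze` and the random stand-in image are hypothetical.

```python
import numpy as np

def synthesize_haze(clear, transmission, airlight=0.8):
    """Apply I(x) = J(x) t(x) + a (1 - t(x)) with a spatially constant t."""
    clear = np.asarray(clear, dtype=np.float64)  # J(x), assumed in [0, 1]
    hazy = clear * transmission + airlight * (1.0 - transmission)
    return np.clip(hazy, 0.0, 1.0)

# Stand-in for a haze-free image J(x); a real pipeline would load a photo.
rng = np.random.default_rng(0)
clear_image = rng.random((4, 4, 3))

# One synthetic haze image per transmittance value used in the embodiment.
hazy_images = {t: synthesize_haze(clear_image, t) for t in (0.8, 0.6, 0.4)}
```

Lower transmittance pulls every pixel toward the atmospheric light value, which is why t = 0.4 gives the densest haze of the three settings.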

Since depth information does not play a major role in image dust-removal tasks, the transmission is assumed not to change with scene depth. Based on prior statistics, the embodiment selects three colors suitable for simulating sand-dust images and simulates each of them. The sand-dust simulation formula is:

$$D(x) = J(x)\,t(x) + \alpha\,\bigl(C(x)\,(1 - t(x))\bigr) \tag{2}$$

In formula (2), D(x) is the simulated sand-dust image, J(x) is the input haze-free image, C(x) is the selected color value, and α and t(x) are the atmospheric light value and transmittance as in formula (1).
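A sketch of formula (2) under the same assumptions as for haze (float images in [0, 1], constant transmittance). The text does not disclose the three dust colors or the fixed transmittance, so the values in `HYPOTHETICAL_DUST_COLORS` and the default `transmission=0.6` are placeholders chosen only for illustration.

```python
import numpy as np

# Placeholder RGB tints; the actual three colors are not given in the text.
HYPOTHETICAL_DUST_COLORS = {
    "yellow": (0.9, 0.8, 0.5),
    "orange": (0.9, 0.6, 0.3),
    "brown": (0.7, 0.5, 0.3),
}

def synthesize_dust(clear, color, transmission=0.6, airlight=0.8):
    """Apply D(x) = J(x) t(x) + a (C(x) (1 - t(x))) with constant t and C."""
    clear = np.asarray(clear, dtype=np.float64)            # J(x) in [0, 1]
    color = np.asarray(color, dtype=np.float64).reshape(1, 1, 3)
    dusty = clear * transmission + airlight * (color * (1.0 - transmission))
    return np.clip(dusty, 0.0, 1.0)

rng = np.random.default_rng(0)
clear_image = rng.random((4, 4, 3))
dust_images = {
    name: synthesize_dust(clear_image, rgb)
    for name, rgb in HYPOTHETICAL_DUST_COLORS.items()
}
```

The only difference from the haze case is that the airlight term is tinted by C(x), so the scattered light takes on the dust color instead of a neutral gray.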

S30: Construct the convolutional neural network model LFNet;

In this embodiment, the framework of the convolutional neural network model is shown in Figure 3. LFNet is built from standard convolution layers, bottleneck building blocks, parametric rectified linear units, group normalization and similar components, and includes a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model. The function of each model is as follows:

Skeleton feature extraction model: used to extract the main features of the input image. To extract richer image features, convolutions of size 3*3, 5*5 and 7*7 are first applied to the input image, which enlarges the receptive field and extracts more image features. After the convolutions at three different scales, feature maps of size 13*13, 26*26 and 52*52 are obtained. Using multi-scale convolution for feature extraction captures feature information from neighborhoods of different sizes around each pixel, which is particularly important for fire images.

Main feature extraction model: used to perform further feature extraction on the main features extracted by the skeleton feature extraction model, generating three groups of feature maps of size 52*52, 26*26 and 13*13. Each smaller feature map is extracted from the larger feature map of the layer above, and each convolution block performs the extraction with one convolution structure and five residual structures.

Variable-scale feature fusion model: uses variable-scale feature fusion (VSFF) to concatenate the features extracted by the main feature extraction model, then extracts features by convolution and fuses them adaptively. The structure of the variable-scale feature fusion model is shown in Figure 4. To fuse the feature maps extracted by convolutions at different scales, the three groups of feature maps are fused, and the 13*13 and 26*26 features are expanded to 52*52. The three inputs are feature maps of size 13*13, 26*26 and 52*52. Each of the three feature maps is convolved with different convolution kernels and strides, so that it is upsampled or downsampled to the other two sizes. Finally, all convolution outputs of the same size are concatenated to obtain three groups of feature maps. Because the concatenated feature maps contain richer image features, the model can localize targets more precisely.
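The resize-and-concatenate pattern described above can be illustrated with plain NumPy. This is only a shape-level sketch: nearest-neighbor repetition and strided slicing stand in for the learned up- and down-sampling convolutions, and the channel count of 8 per input is an assumption, as are the names `resize_nearest` and `fuse_scales`.

```python
import numpy as np

def resize_nearest(fmap, size):
    """Resize a (C, H, H) map to (C, size, size); sizes must divide evenly."""
    _, h, _ = fmap.shape
    if size == h:
        return fmap
    if size > h:
        factor = size // h  # upsample by repeating rows and columns
        return fmap.repeat(factor, axis=1).repeat(factor, axis=2)
    step = h // size        # downsample by keeping every step-th sample
    return fmap[:, ::step, ::step]

def fuse_scales(fmaps, sizes=(13, 26, 52)):
    """Bring every input to each target size and concatenate along channels."""
    return {
        s: np.concatenate([resize_nearest(f, s) for f in fmaps], axis=0)
        for s in sizes
    }

# Three inputs of 8 channels each (hypothetical channel count).
maps = [np.ones((8, s, s)) for s in (13, 26, 52)]
fused = fuse_scales(maps)
```

Each fused output stacks information from all three scales at one resolution; in the described model a convolution would then mix these concatenated channels.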

Furthermore, the embodiment applies a channel-based attention mechanism to the three groups of feature maps extracted in VSFF. Channel-based attention can be viewed as a process of weighting feature maps according to their importance. For example, for a group of feature maps of size 24*13*13, the channel-based attention mechanism determines which feature maps in the group have a more significant effect on the prediction and then increases the weight of those maps. With the help of the attention mechanism, three fusions are performed, yielding feature maps of size 13*13, 26*26 and 52*52, which are used to detect small, medium and large objects respectively. The detailed structure of the channel-based attention mechanism is shown in Figure 5.
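The idea of weighting channels by importance can be sketched as follows. Figure 5 is not reproduced in the text, so the structure below (global average pooling, a two-layer bottleneck, sigmoid gating, in the style of squeeze-and-excitation) is an assumption rather than the embodiment's design; the weights `w1` and `w2` are random stand-ins for learned parameters.

```python
import numpy as np

def channel_attention(fmap, w1, w2):
    """Reweight the channels of a (C, H, W) map with gates in (0, 1)."""
    squeeze = fmap.mean(axis=(1, 2))                   # (C,) per-channel summary
    hidden = np.maximum(w1 @ squeeze, 0.0)             # bottleneck with ReLU
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gate per channel
    return fmap * weights[:, None, None], weights

rng = np.random.default_rng(0)
fmap = rng.random((24, 13, 13))                        # the 24*13*13 example
w1 = rng.standard_normal((6, 24)) * 0.1                # hypothetical reduction to 6
w2 = rng.standard_normal((24, 6)) * 0.1
reweighted, weights = channel_attention(fmap, w1, w2)
```

Channels whose gate is close to 1 pass through almost unchanged, while channels with small gates are suppressed, which is the weighting behavior the paragraph describes.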

With the above structure, the LFNet model of this embodiment is very small (22.5M) yet takes a leading position in both quantitative and qualitative evaluation. This reduces the computational cost and makes LFNet easier to deploy on resource-constrained devices.

S40: Input the data set into the LFNet model for iterative training to obtain optimal model parameters;

In this step, the LFNet model has two tasks during training: first, to accurately locate the alarm area in the image; second, to classify the disaster type of the alarm area. To help the model accomplish both tasks, the embodiment selects mean square error (MSE) and cross entropy (CE) as loss functions to guide network optimization. The loss design is based on extensive statistics over different fire images and videos, and helps LFNet detect fire areas effectively.

Specifically, extensive experiments on a variety of fire images show that in smoke areas the absolute difference between brightness and dark channel value is higher than in other areas, and that the R channel is higher in fire areas than in non-fire areas. In other words, the brightness, dark channel value and R channel of the path vary across fire hazard areas; smoke concentration increases with the absolute difference between brightness and dark channel; and the visual characteristics of fire are closely related to the pixel values of the R channel. Based on these characteristics, the embodiment regards these statistics as a combustion histogram prior (CHP) and writes them as the CHP formula:

In formula (3), R(x) is the R channel of the image, SCP(x) is the difference between the image brightness and the dark channel, w is the width of the histogram, and h is the height of the histogram. SCP(x) can be written as:

$$\mathrm{SCP}(x) = \lVert v(x) - \mathrm{DCP}(x) \rVert \tag{4}$$

In formula (4), v(x) is the brightness of the image, and DCP(x) is the dark channel value of the image.
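Formula (4) can be sketched per pixel as below. The text does not define how v(x) and DCP(x) are computed, so this sketch assumes the common conventions: brightness as the maximum over the RGB channels (the HSV value) and the dark channel as the minimum over RGB within a local patch. The patch size of 3 and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(image, patch=3):
    """Minimum over RGB, then minimum over a patch x patch neighborhood."""
    min_rgb = image.min(axis=2)                          # (H, W)
    padded = np.pad(min_rgb, patch // 2, mode="edge")    # keep output size H x W
    windows = sliding_window_view(padded, (patch, patch))
    return windows.min(axis=(2, 3))

def scp(image, patch=3):
    """SCP(x) = |v(x) - DCP(x)| for an (H, W, 3) image scaled to [0, 1]."""
    brightness = image.max(axis=2)                       # assumed v(x)
    return np.abs(brightness - dark_channel(image, patch))

gray = np.full((5, 5, 3), 0.5)          # uniform gray: brightness equals dark channel
red = np.zeros((5, 5, 3))
red[..., 0] = 1.0                       # saturated red: brightness 1, dark channel 0
scp_gray, scp_red = scp(gray), scp(red)
```

Under these assumptions a colorless, evenly lit region scores 0, while a strongly saturated region scores close to 1.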

$$L_{CHP} = \lVert \mathrm{CHP}(I) - \mathrm{CHP}(R) \rVert^{2} \tag{5}$$

In formula (5), CHP denotes the combustion histogram prior, and CHP(I) and CHP(R) are the CHP values of the area selected by the target detection algorithm and of the area annotated in the ground truth, respectively.

The final loss function is a weighted sum of three different loss functions: the cross-entropy loss, the mean-square-error loss and the combustion histogram prior loss. The formula is:

$$L = \beta L_{CE} + \gamma L_{MSE} + \delta L_{CHP} \tag{6}$$

In formula (6), L is the final loss function, L_CE is the cross-entropy loss, L_MSE is the mean-square-error loss, and L_CHP is the combustion histogram prior loss; β, γ and δ are set to 0.25, 0.25 and 0.5 respectively.
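The weighted sum in formula (6), together with the squared-norm prior loss of formula (5), can be written out as a short sketch. Because the CHP formula itself is not reproduced in the text, `chp_pred` and `chp_true` are treated here as already-computed CHP vectors, which is an assumption; the function names are hypothetical.

```python
import numpy as np

def chp_loss(chp_pred, chp_true):
    """Squared L2 distance between two precomputed CHP vectors (formula 5)."""
    diff = np.asarray(chp_pred, dtype=float) - np.asarray(chp_true, dtype=float)
    return float(np.sum(diff ** 2))

def total_loss(l_ce, l_mse, l_chp, beta=0.25, gamma=0.25, delta=0.5):
    """Weighted sum of the three loss terms with the stated weights (formula 6)."""
    return beta * l_ce + gamma * l_mse + delta * l_chp

# Worked example with made-up loss values.
prior = chp_loss([1.0, 2.0], [0.0, 0.0])   # 1^2 + 2^2 = 5.0
loss = total_loss(1.0, 2.0, 4.0)           # 0.25 + 0.5 + 2.0 = 2.75
```

With these weights the prior term contributes as much as the classification and localization terms combined.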

S50：将待检测火灾图像输入训练好的LFNet模型，通过LFNet模型输出待检测火灾图像的火灾定位区域以及火灾类型。S50: Input the fire image to be detected into the trained LFNet model, and output the fire location area and fire type of the fire image to be detected through the LFNet model.

请参阅图6，是本申请实施例的基于视频图像目标检测的火灾检测系统的结构示意图。本申请实施例的基于视频图像目标检测的火灾检测系统40包括：Please refer to FIG. 6 , which is a schematic structural diagram of a fire detection system based on video image target detection according to an embodiment of the present application. The fire detection system 40 based on video image target detection in the embodiment of the present application includes:

数据集构建模块41：用于采用基于大气散射模型的数据增强算法将原始自然图像转换为灰霾图像及沙尘图像，生成用于训练模型的数据集；Data set construction module 41: used to convert original natural images into haze images and dust images using a data enhancement algorithm based on the atmospheric scattering model, and generate a data set for training the model;

LFNet模型训练模块42：用于构建卷积神经网络模型LFNet，将所述数据集输入LFNet模型进行迭代训练，得到最优模型参数；所述骨架特征提取模型分别采用3＊3、5＊5和7＊7尺度的卷积提取输入图像的特征，得到尺寸分别为13＊13、26＊26和52＊52的特征图；主要特征提取模型对主要特征进行进一步的特征提取，生成大小分别为52＊52、26＊26、13＊13的三组特征图；变尺度特征融合模型将三组特征图映射到不同的卷积核和步长进行卷积，并拼接所有相同大小的卷积，得到三组特征映射，利用基于信道的注意机制操作所述三组特征映射，得到大小分别为13＊13、26＊26和52＊52的特征图，分别用于检测小、中、大型物体；所述将数据集输入LFNet模型进行迭代训练还包括：分别选取均方误差和交叉熵作为损失函数进行模型优化；LFNet model training module 42: used to construct the convolutional neural network model LFNet, input the data set into the LFNet model for iterative training, and obtain optimal model parameters; the skeleton feature extraction models respectively adopt 3*3, 5*5 and The 7*7 scale convolution extracts the features of the input image and obtains feature maps with sizes of 13*13, 26*26 and 52*52 respectively; the main feature extraction model further extracts the main features and generates sizes of 52 Three sets of feature maps of *52, 26*26, and 13*13; the variable scale feature fusion model maps the three sets of feature maps to different convolution kernels and step sizes for convolution, and splices all convolutions of the same size to obtain Three sets of feature maps are used to operate the three sets of feature maps using a channel-based attention mechanism to obtain feature maps with sizes of 13*13, 26*26 and 52*52 respectively, which are used to detect small, medium and large objects respectively; so The above-mentioned input of the data set into the LFNet model for iterative training also includes: selecting the mean square error and cross entropy as the loss function for model optimization;

Model optimization module 43: used to select mean square error and cross entropy respectively as loss functions for model optimization.

Please refer to FIG. 7, which is a schematic structural diagram of a terminal according to an embodiment of the present application. The terminal 50 includes a processor 51 and a memory 52 coupled to the processor 51.

The memory 52 stores program instructions for implementing the above fire detection method based on video image target detection.

The processor 51 is used to execute the program instructions stored in the memory 52 to control fire detection based on video image target detection.

The processor 51 may also be called a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capability. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

Please refer to FIG. 8, which is a schematic structural diagram of a storage medium according to an embodiment of the present application. The storage medium of the embodiment of the present application stores a program file 61 capable of implementing all of the above methods. The program file 61 may be stored in the storage medium in the form of a software product and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, or terminal devices such as computers, servers, mobile phones and tablets.

The fire detection method, system, terminal and storage medium based on video image target detection according to the embodiments of the present application use a data enhancement algorithm based on the atmospheric scattering model to convert original images into images affected by haze or dust to varying degrees, generating a data set for training the model, and construct a convolutional neural network model LFNet suitable for fire and smoke detection in uncertain environments. This can improve the robustness of the model in abnormal weather such as dust and haze and enable the model to obtain better detection results. In addition, because the LFNet model of the embodiments of the present application is small, the computational cost can be reduced, which facilitates deploying the LFNet model on resource-constrained devices.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined in this application may be implemented in other embodiments without departing from the spirit or scope of the application. Therefore, the present application is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A fire detection method based on video image target detection, characterized by including:

A data enhancement algorithm based on the atmospheric scattering model is used to convert the original natural images into haze images and dust images to generate a data set for training the model;

Construct a convolutional neural network model LFNet, input the data set into the LFNet model for iterative training, and obtain the optimal model parameters; the skeleton feature extraction model extracts features of the input image using convolutions at 3*3, 5*5 and 7*7 scales, obtaining feature maps with sizes of 13*13, 26*26 and 52*52 respectively; the main feature extraction model performs further feature extraction on the main features and generates three groups of feature maps with sizes of 52*52, 26*26 and 13*13 respectively; the variable-scale feature fusion model maps the three groups of feature maps to different convolution kernels and strides for convolution, concatenates all convolution outputs of the same size to obtain three groups of feature mappings, and applies a channel-based attention mechanism to the three groups of feature mappings to obtain feature maps with sizes of 13*13, 26*26 and 52*52, which are used to detect small, medium and large objects respectively; inputting the data set into the LFNet model for iterative training further includes: selecting mean square error and cross entropy respectively as loss functions for model optimization;

The convolutional neural network model LFNet includes a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model; the skeleton feature extraction model extracts the main features of the input image through convolution at three different scales; the main feature extraction model is used to perform further feature extraction on the main features and generate three groups of feature maps; the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs detection results;

Input the fire image to be detected into the trained LFNet model, and output the fire location area and fire type of the fire image to be detected through the LFNet model.

2. The fire detection method based on video image target detection according to claim 1, characterized in that the use of a data enhancement algorithm based on the atmospheric scattering model to convert the original natural image into a haze image and a dust image includes:

Obtain original natural images; the original natural images include non-alarm images without fire alarm areas and real fire alarm images.

3. The fire detection method based on video image target detection according to claim 1 or 2, characterized in that the use of a data enhancement algorithm based on the atmospheric scattering model to convert the original natural image into a haze image includes:

The atmospheric scattering model uses at least two transmission rates to simulate and generate haze images of different concentrations; the haze image imaging formula is:

I(x)＝J(x)t(x)+ɑ(1-t(x))

In the above formula, I(x) is the simulated haze image, J(x) is the input haze-free image, ɑ is the atmospheric light value, and t(x) is the scene transmission rate.
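As an illustration of the haze imaging formula above, the following is a minimal NumPy sketch. It assumes images are floating-point arrays scaled to [0, 1]; the function names `add_haze` and `haze_variants`, the default atmospheric light value 0.8 and the two transmission rates 0.8 and 0.5 are hypothetical choices and are not taken from the claims.

```python
import numpy as np

def add_haze(image, transmission, atmospheric_light=0.8):
    """Apply I(x) = J(x) t(x) + a (1 - t(x)) to an image in [0, 1].

    `transmission` may be a scalar or a per-pixel (H, W) map.
    The default atmospheric light of 0.8 is an illustrative assumption.
    """
    j = np.asarray(image, dtype=np.float64)
    t = np.asarray(transmission, dtype=np.float64)
    if t.ndim == 2 and j.ndim == 3:
        t = t[..., None]  # broadcast the per-pixel map over the colour channels
    hazy = j * t + atmospheric_light * (1.0 - t)
    return np.clip(hazy, 0.0, 1.0)

def haze_variants(image, transmissions=(0.8, 0.5)):
    """Generate one hazy image per transmission rate (values are assumptions)."""
    return [add_haze(image, t) for t in transmissions]
```

A lower transmission rate gives a denser simulated haze: at t(x) = 1 the input image is returned unchanged, and at t(x) = 0 every pixel equals the atmospheric light value.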

4. The fire detection method based on video image target detection according to claim 3, characterized in that the use of a data enhancement algorithm based on the atmospheric scattering model to convert the original natural image into a sand and dust image includes:

The atmospheric scattering model uses a fixed transmittance and atmospheric light value, and combines three colors to simulate and generate sand and dust images of different concentrations; the sand and dust image simulation formula is:

D(x)=J(x)t(x)+ɑ(C(x)*(1-t(x)))

In the above formula, D(x) is the simulated dust image, J(x) is the input haze-free image, and C(x) is the color value.
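The dust simulation formula above can be sketched in the same way. The sketch assumes images are floating-point RGB arrays in [0, 1] and that C(x) is a constant RGB tint; the three colour values in `DUST_COLORS`, the fixed transmittance 0.6 and the atmospheric light value 0.9 are hypothetical, since the claims state only that three colours and fixed values are used.

```python
import numpy as np

# Hypothetical sand-like RGB tints; the claims do not give the actual colour values.
DUST_COLORS = ((0.85, 0.65, 0.40), (0.80, 0.55, 0.30), (0.75, 0.60, 0.45))

def add_dust(image, color, transmission=0.6, atmospheric_light=0.9):
    """Apply D(x) = J(x) t(x) + a (C(x) * (1 - t(x))) to an RGB image in [0, 1].

    The fixed transmittance and atmospheric light defaults are assumptions.
    """
    j = np.asarray(image, dtype=np.float64)
    c = np.asarray(color, dtype=np.float64)  # shape (3,), broadcast over all pixels
    dusty = j * transmission + atmospheric_light * (c * (1.0 - transmission))
    return np.clip(dusty, 0.0, 1.0)

def dust_variants(image):
    """Generate one dust image per tint in DUST_COLORS."""
    return [add_dust(image, c) for c in DUST_COLORS]
```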

5. The fire detection method based on video image target detection according to claim 1, characterized in that the loss function is specifically:

The brightness, dark channel value and R channel data of the path in the fire area are counted. The statistical data are regarded as a combustion histogram prior and written as the formula of CHP:

In the above formula, R(x) represents the R channel of the image, SCP(x) is the difference between the brightness and dark channels of the image, w is the width of the histogram, and h is the height of the histogram;

SCP(x)=||v(x)-DCP(x)||

In the above formula, v(x) is the brightness of the image, and DCP(x) is the value of the dark channel of the image;
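A minimal NumPy sketch of SCP(x) follows. Two points are assumptions not stated in the claim: the brightness v(x) is taken as the per-pixel maximum over the RGB channels, and the dark channel is computed as the per-pixel channel minimum followed by a minimum filter over a small square patch. The function names and the default patch size of 3 are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def dark_channel(image, patch=3):
    """Dark channel of an (H, W, 3) image: channel minimum, then a local minimum.

    The patch-based minimum filter and the odd patch size are assumptions.
    """
    j = np.asarray(image, dtype=np.float64)
    per_pixel_min = j.min(axis=2)
    pad = patch // 2
    padded = np.pad(per_pixel_min, pad, mode="edge")
    windows = sliding_window_view(padded, (patch, patch))  # shape (H, W, patch, patch)
    return windows.min(axis=(2, 3))

def scp(image, patch=3):
    """SCP(x) = ||v(x) - DCP(x)||, with v(x) assumed to be the maximum RGB value."""
    j = np.asarray(image, dtype=np.float64)
    brightness = j.max(axis=2)
    return np.abs(brightness - dark_channel(j, patch))
```

Under these assumptions a neutral grey region gives SCP(x) = 0, while a saturated red region, which is bright but has a dark channel near zero, gives SCP(x) close to 1.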

L_CHP = ||CHP(I) - CHP(R)||²

In the above formula, CHP represents the combustion histogram prior, CHP(I) and CHP(R) represent the CHP value of the area selected by the target detection algorithm and the marked area respectively;

The loss function is a weighted sum of three different loss functions:

L = βL_CE + γL_MSE + δL_CHP

In the above formula, L is the final loss function, L_CE is the cross entropy loss function, L_MSE is the mean square error loss function, and L_CHP is the combustion histogram prior loss.
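The combustion histogram prior loss and the weighted sum of the three loss terms can be sketched as follows. The CHP values are treated as precomputed vectors passed in by the caller; the function names and the default weights of 1.0 for β, γ and δ are hypothetical, as the claim does not give their values.

```python
import numpy as np

def chp_loss(chp_detected, chp_labelled):
    """Squared L2 distance between the CHP of the detected and the labelled area."""
    diff = np.asarray(chp_detected, dtype=np.float64) - np.asarray(chp_labelled, dtype=np.float64)
    return float(np.sum(diff ** 2))

def total_loss(l_ce, l_mse, l_chp, beta=1.0, gamma=1.0, delta=1.0):
    """Weighted sum of cross entropy, mean square error and CHP losses.

    The default weights are placeholders, not values from the source.
    """
    return beta * l_ce + gamma * l_mse + delta * l_chp
```

For example, `total_loss(1.0, 2.0, 3.0, beta=0.5, gamma=0.25, delta=2.0)` evaluates to 0.5 + 0.5 + 6.0 = 7.0.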

6. A fire detection system based on video image target detection, characterized by including:

Dataset building module: used to convert original natural images into haze images and dust images using a data enhancement algorithm based on the atmospheric scattering model, and generate a data set for training the model;

LFNet model training module: used to construct the convolutional neural network model LFNet, input the data set into the LFNet model for iterative training, and obtain the optimal model parameters; the skeleton feature extraction model extracts features of the input image using convolutions at 3*3, 5*5 and 7*7 scales, obtaining feature maps with sizes of 13*13, 26*26 and 52*52 respectively; the main feature extraction model performs further feature extraction on the main features and generates three groups of feature maps with sizes of 52*52, 26*26 and 13*13 respectively; the variable-scale feature fusion model maps the three groups of feature maps to different convolution kernels and strides for convolution, concatenates all convolution outputs of the same size to obtain three groups of feature mappings, and applies a channel-based attention mechanism to the three groups of feature mappings to obtain feature maps with sizes of 13*13, 26*26 and 52*52, which are used to detect small, medium and large objects respectively; inputting the data set into the LFNet model for iterative training further includes: selecting mean square error and cross entropy respectively as loss functions for model optimization;

The convolutional neural network model LFNet includes a skeleton feature extraction model, a main feature extraction model and a variable-scale feature fusion model; the skeleton feature extraction model extracts the main features of the input image through convolution at three different scales; the main feature extraction model is used to perform further feature extraction on the main features and generate three groups of feature maps; the variable-scale feature fusion model adaptively fuses the three groups of feature maps and outputs detection results; the detection results include the fire location area and the fire type of the fire image.

7. A terminal, characterized in that the terminal includes a processor and a memory coupled to the processor, wherein,

The memory stores program instructions for implementing the fire detection method based on video image target detection according to any one of claims 1 to 5;

The processor is configured to execute the program instructions stored in the memory to control fire detection based on video image target detection.

8. A storage medium, characterized in that it stores program instructions executable by a processor, and the program instructions are used to execute the fire detection method based on video image target detection according to any one of claims 1 to 5.