CN108961675A

CN108961675A - Fall detection method based on convolutional neural networks

Info

Publication number: CN108961675A
Application number: CN201810614024.4A
Authority: CN
Inventors: 彭力; 王永青
Original assignee: Jiangnan University
Current assignee: Jiangnan University
Priority date: 2018-06-14
Filing date: 2018-06-14
Publication date: 2018-12-07
Also published as: WO2019237567A1

Abstract

The present invention relates to a fall detection method based on a convolutional neural network, comprising: training the convolutional neural network, the training of the convolutional neural network specifically includes: preprocessing each frame of image obtained, the preprocessing work includes Foreground extraction, normalization, and whitening operations are performed in sequence; the ResNet network is pre-trained on the ImageNet dataset to obtain a pre-trained model. The classification method based on convolutional neural network is applied to the fall detection method. At the same time, in order to improve the accuracy of the system and reduce the complexity of the operation, an improved foreground detection method is used to extract the people in the complex background, and then the The processed images are put into the convolutional neural network for model training.

Description

Fall detection method based on convolutional neural network

技术领域technical field

本发明涉及跌倒检测方法，特别是涉及基于卷积神经网络的跌倒检测方法。The invention relates to a fall detection method, in particular to a fall detection method based on a convolutional neural network.

背景技术Background technique

现在社会老龄化趋势日益加重，老年人身体机能的下降以及越来越普遍的独居现象使得跌倒成为老年人致伤的主要原因之一，所以对跌倒行为进行检测具有十分重要的意义。Nowadays, the aging trend of the society is getting worse, the decline of the physical function of the elderly and the more and more common phenomenon of living alone make falls one of the main reasons for the injury of the elderly, so it is of great significance to detect the fall behavior.

传统的基于计算机视觉的跌倒检测方法通常是手动提取特征，工程量巨大，并且泛化能力差，精度不高。与传统的特征提取法不同，卷积神经网络能够自动提取特征，训练后的模型具有几何不变性，能够克服因光照和拍摄角度的变化而产生的问题。Traditional computer vision-based fall detection methods usually extract features manually, which requires a huge amount of engineering, poor generalization ability, and low accuracy. Unlike traditional feature extraction methods, convolutional neural networks can automatically extract features, and the trained model has geometric invariance, which can overcome problems caused by changes in lighting and shooting angles.

传统技术存在以下技术问题：The traditional technology has the following technical problems:

当前跌倒检测系统主要分为两类：第一类是基于传感器的穿戴式检测系统；另一类是基于视频的检测系统。目前，基于三维加速度或者躯干角速度的穿戴式识别系统研究已经较为成熟。然而，穿戴式装置一般需要佩戴于颈部或腰部，长时间佩戴会使用户感到不适。而基于视觉的检测系统则是通过一个或几个摄像机捕捉目标的运动，通过特定的图像处理算法确定发生跌倒时的图像特征，从而将跌倒和日常活动区分开来。目前常用的基于视觉的跌倒检测算法主要是阈值法和智能算法。阈值法通常是对人体的头部位置或者重心进行检测。Diraco通过判断人体中心低于指定高度并维持超过4s时就认为发生跌倒。Rougier等人通过定位头部位置，再通过粒子滤波器估算下一帧图像中头部位置，计算水平方向和垂直方向的速度，并与阈值进行比较的方式确定是否发生跌倒。这些方法实现简单，但是精度容易受到环境等外界因素的影响。而基于机器学习的方法主要是先对图像进行人物提取，然后再手动提取特征，将获得的特征输入模型从而实现对跌倒行为的检测识别。这种方法需要人工提取特征，工程量巨大，且大部分只停留在二分类问题上，考虑到未来对于智能家居的要求变高，对于人体的各种姿势进行识别也成了不可或缺的部分。The current fall detection system is mainly divided into two categories: the first is the sensor-based wearable detection system; the other is the video-based detection system. At present, research on wearable recognition systems based on three-dimensional acceleration or torso angular velocity has been relatively mature. However, wearable devices generally need to be worn on the neck or waist, and wearing them for a long time will make users feel uncomfortable. The vision-based detection system captures the movement of the target through one or several cameras, and uses a specific image processing algorithm to determine the image features when a fall occurs, thereby distinguishing the fall from daily activities. At present, the commonly used vision-based fall detection algorithms are mainly threshold method and intelligent algorithm. The threshold method usually detects the head position or the center of gravity of the human body. Diraco considers a fall when it judges that the center of the human body is lower than the specified height and maintains it for more than 4s. Rougier et al. determine whether a fall has occurred by locating the head position, estimating the head position in the next frame image through a particle filter, calculating the horizontal and vertical speeds, and comparing with the threshold. These methods are simple to implement, but the accuracy is easily affected by external factors such as the environment. The method based on machine learning is mainly to extract people from the image first, then manually extract features, and input the obtained features into the model to realize the detection and recognition of falling behavior. This method requires manual extraction of features, the amount of work is huge, and most of them only stay on the binary classification problem. Considering that the requirements for smart homes will become higher in the future, the recognition of various postures of the human body has become an indispensable part. .

发明内容Contents of the invention

基于此，有必要针对上述技术问题，提供一种基于卷积神经网络的跌倒检测方法，将基于卷积神经网络的分类方法运用到跌倒检测方法中，同时，为了提高系统的精度，减小运算的复杂度，利用一种改进的前景检测方法来提取出复杂背景下的人物，再将处理后的图像放入卷积神经网络中进行模型训练。Based on this, it is necessary to provide a fall detection method based on convolutional neural network for the above technical problems, and apply the classification method based on convolutional neural network to the fall detection method. At the same time, in order to improve the accuracy of the system and reduce the calculation The complexity, using an improved foreground detection method to extract the characters in the complex background, and then put the processed image into the convolutional neural network for model training.

一种基于卷积神经网络的跌倒检测方法，包括：A fall detection method based on convolutional neural network, comprising:

训练卷积神经网络，所述训练卷积神经网络具体包括：Training convolutional neural network, the training convolutional neural network specifically includes:

对获取得到的每一帧图像进行预处理，预处理的工作包括依次为前景提取和归一化、白化操作；Perform preprocessing on each frame of image obtained, and the preprocessing work includes foreground extraction, normalization, and whitening operations in sequence;

在ImageNet数据集上先对ResNet网络进行预训练，从而得到预训练模型；Pre-train the ResNet network on the ImageNet dataset to obtain a pre-trained model;

将步骤“对获取得到的每一帧图像进行预处理，预处理的工作包括依次为前景提取和归一化、白化操作；”处理后的图片放入所述预训练模型中进行模型训练，得到模型的参数；以及The step "preprocess each frame of image obtained, the preprocessing work includes foreground extraction, normalization, and whitening operations in turn;" the processed picture is put into the pre-training model for model training, and the obtained the parameters of the model; and

将测试集输入到经过训练后的模型，用测试集对模型的精度进行检测；Input the test set into the trained model, and use the test set to test the accuracy of the model;

利用训练的后的卷积神经网络检测图片；Use the trained convolutional neural network to detect pictures;

其中，所述前景提取的方法具体包括：Wherein, the method for foreground extraction specifically includes:

利用背景差分法处理图像；Image processing using background subtraction;

利用混合高斯模型处理图像；Image processing using a mixture of Gaussian models;

对利用背景差分法处理图像后的结果和利用混合高斯模型处理后图像的结果求与。Sum the result of processing the image using the background subtraction method and the result of processing the image using the mixture Gaussian model.

上述基于卷积神经网络的跌倒检测方法，将基于卷积神经网络的分类方法运用到跌倒检测方法中，同时，为了提高系统的精度，减小运算的复杂度，利用一种改进的前景检测方法来提取出复杂背景下的人物，再将处理后的图像放入卷积神经网络中进行模型训练。The above fall detection method based on convolutional neural network applies the classification method based on convolutional neural network to the fall detection method. At the same time, in order to improve the accuracy of the system and reduce the complexity of the operation, an improved foreground detection method is used To extract the characters in the complex background, and then put the processed image into the convolutional neural network for model training.

在另外的一个实施例中，在步骤“对获取得到的每一帧图像进行预处理，预处理的工作包括依次为前景提取和归一化、白化操作；”中，获取得到的每一帧图像是通过读取视频文件得到的。In another embodiment, in the step "preprocessing each frame of image obtained, the preprocessing work includes foreground extraction, normalization, and whitening operations in sequence;" in each frame of image obtained It is obtained by reading the video file.

在另外的一个实施例中，在步骤“利用训练的后的卷积神经网络检测图片”之后，所述训练卷积神经网络具体还包括：显示出每一帧图片的检测效果图，并实现模型的卷积核可视化。In another embodiment, after the step "using the trained convolutional neural network to detect pictures", the training convolutional neural network specifically further includes: displaying the detection effect diagram of each frame of pictures, and implementing the model The convolution kernel visualization.

在另外的一个实施例中，步骤“显示出每一帧图片的检测效果图，并实现模型的卷积核可视化”中在matlab平台上显示出每一帧图片的检测效果图，并实现模型的卷积核可视化。In another embodiment, in the step "displaying the detection effect diagram of each frame of pictures, and realizing the visualization of the convolution kernel of the model", the detection effect diagram of each frame of pictures is displayed on the matlab platform, and the model is realized. Convolution kernel visualization.

在另外的一个实施例中，步骤“利用背景差分法处理图像；”具体包括：In another embodiment, the step "processing the image using the background subtraction method;" specifically includes:

取前几帧图像的平均值，将其用作初始背景图像B_t；Take the average value of the previous few frame images and use it as the initial background image B _t ;

前帧图像与背景图像的灰度进行减运算，并取其绝对值为N_t(x,y)公式即为The grayscale of the previous frame image and the background image is subtracted, and its absolute value is N _t (x, y) The formula is

N_t(x,y)＝|I_t(x,y)-B_t(x,y)|N _t (x,y)＝|I _t (x,y)-B _t (x,y)|

对当前帧的像素(x,y)，若有|I_t(x,y)-B_t(x,y)|≥T，则该像素点为前景点，即更新当前的图像帧为；For the pixel (x, y) of the current frame, if there is |I _t (x, y)-B _t (x, y)|≥T, then the pixel point is the foreground point, that is, the current image frame is updated as;

用当前帧图像更新背景图像。Update the background image with the current frame image.

在另外的一个实施例中，步骤“利用混合高斯模型处理图像；”具体包括：In another embodiment, the step "processing the image using a mixture of Gaussian models;" specifically includes:

使用高斯混合模型对背景建立模型时，序列图像的每个像素的像素值都可以用k个高斯模型模拟，因此在时刻t，某个像素值的概率密度函数可以表示为：When the Gaussian mixture model is used to model the background, the pixel value of each pixel of the sequence image can be simulated by k Gaussian models, so at time t, the probability density function of a certain pixel value can be expressed as:

其中，w_i,t表示高斯模型的权重，而高斯模型的概率密度函数表示为：Among them, w _{i, t} represents the weight of the Gaussian model, and the probability density function of the Gaussian model is expressed as:

接着，将K个高斯混合模型按照权重除以标准差的商的大小排序，然后选择先前B个高斯模型用于区分判别，其中B的取值表示为：Next, the K Gaussian mixture models are sorted according to the size of the quotient of the weight divided by the standard deviation, and then the previous B Gaussian models are selected for discrimination, where the value of B is expressed as:

将新的图像帧中的每个像素分别放到排好序的K个高斯模型当中进行判断，判断条件为：Put each pixel in the new image frame into the sorted K Gaussian models for judgment, and the judgment conditions are:

||X_t-μ_t||≤β∑^1/2 ||X _t -μ _t ||≤β∑ ^1/2

在前面的B个高斯模型当中，如果其中有一个高斯模型中满足上述条件，那么这个像素就判定为背景，如果在B个高斯模型中上述条件都不满足时，则判定这个像素属于前景。Among the previous B Gaussian models, if one of the Gaussian models meets the above conditions, then the pixel is judged as the background, and if none of the above conditions is satisfied in the B Gaussian models, the pixel is judged to belong to the foreground.

对于每个高斯模型，假设上述条件不成立，就要减小这一个高斯模型的权重，如果上述的条件成立，就要更新这个高斯混合模型，具体操作的方法如下公式所示：For each Gaussian model, assuming that the above conditions are not true, the weight of this Gaussian model must be reduced. If the above conditions are true, the Gaussian mixture model must be updated. The specific operation method is shown in the following formula:

w_i,t＝(1-λ)w_i,t-1+λBM_t w _i,t ＝(1-λ)w _i,t-1 +λBM _t

μ_i,t＝(1-α)μ_i,t-1+αX_i,t μ _i,t ＝(1-α)μ _i,t-1 +αX _i,t

∑_i,t＝(1-α)∑_i,t-1+α(X_i,t-μ_i,t)(X_i,t-μ_i,t)^T ∑ _i,t ＝(1-α)∑ _i,t-1 +α(X _i,t -μ _i,t )(X _i,t -μ _i,t ) ^T

α＝λ/w_i,t α=λ/w _i,t

其中，像素为前景则BM＝0，否则BM＝1，最后用一个初始的高斯模型来代替权值最小的那个高斯模型，阈值T，学习率λ，参数β都是事先指定的常数。Among them, if the pixel is the foreground, BM=0, otherwise BM=1, and finally an initial Gaussian model is used to replace the Gaussian model with the smallest weight, the threshold T, the learning rate λ, and the parameter β are all constants specified in advance.

在另外的一个实施例中，步骤“将测试集输入到经过训练后的模型，用测试集对模型的精度进行检测；”中，所述测试集来自UR Fall Detection Dataset。In another embodiment, in the step "input the test set into the trained model, and use the test set to test the accuracy of the model;", the test set comes from UR Fall Detection Dataset.

一种计算机设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述程序时实现任一项所述方法的步骤。A computer device includes a memory, a processor, and a computer program stored in the memory and operable on the processor, and the processor implements the steps of any one of the methods when executing the program.

一种计算机可读存储介质，其上存储有计算机程序，该程序被处理器执行时实现任一项所述方法的步骤。A computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the steps of any one of the methods described above are implemented.

一种处理器，所述处理器用于运行程序，其中，所述程序运行时执行任一项所述的方法。A processor, the processor is used to run a program, wherein the program executes any one of the methods when running.

附图说明Description of drawings

图1为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法的流程示意图。FIG. 1 is a schematic flowchart of a fall detection method based on a convolutional neural network provided in an embodiment of the present application.

图2为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中的残差学习构建模块示意图。FIG. 2 is a schematic diagram of a residual learning building block in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

图3为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中的损失值函数曲线示意图。FIG. 3 is a schematic diagram of a loss value function curve in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

图4为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中模型的测试流程图。FIG. 4 is a flow chart of testing a model in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

图5为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中的背景差分法的效果示意图。FIG. 5 is a schematic diagram of the effect of the background subtraction method in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

图6为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中的高斯混合环境模型的效果示意图。FIG. 6 is a schematic diagram of the effect of a Gaussian mixture environment model in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

图7为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中的改进的前景检测方法的效果示意图。FIG. 7 is a schematic diagram of the effect of an improved foreground detection method in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

图8为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中的前景提取的RGB图。FIG. 8 is an RGB image of foreground extraction in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

图9为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中的卷积核可视化。FIG. 9 is a visualization of convolution kernels in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

图10为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中的第一层特征图。FIG. 10 is a feature map of the first layer in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

图11为本申请实施例提供的一种基于卷积神经网络的跌倒检测方法中的第二层特征图。FIG. 11 is a feature map of the second layer in a fall detection method based on a convolutional neural network provided by an embodiment of the present application.

具体实施方式Detailed ways

为了使本发明的目的、技术方案及优点更加清楚明白，以下结合附图及实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本发明，并不用于限定本发明。In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

参阅图1，一种基于卷积神经网络的跌倒检测方法，包括：Referring to Figure 1, a fall detection method based on convolutional neural network, including:

N_t(x,y)＝|I_t(x,y)-B_t(x,y)|N _t (x,y)＝|I _t (x,y)-B _t (x,y)|

||X_t-μ_t||≤β∑^1/2 ||X _t -μ _t ||≤β∑ ^1/2

w_i,t＝(1-λ)w_i,t-1+λBM_t w _i,t ＝(1-λ)w _i,t-1 +λBM _t

μ_i,t＝(1-α)μ_i,t-1+αX_i,t μ _i,t ＝(1-α)μ _i,t-1 +αX _i,t

α＝λ/w_i,t α=λ/w _i,t

在另外的一个实施例中，步骤“将测试集输入到经过训练后的模型，用测试集对模型的精度进行检测；”中，所述测试集来自URFall Detection Dataset。In another embodiment, in the step "input the test set into the trained model, and use the test set to detect the accuracy of the model;", the test set is from URFall Detection Dataset.

下面给出本发明的一个具体应用场景：A specific application scenario of the present invention is given below:

1.图像预处理1. Image preprocessing

在具体的实施例中，使用改进的前景检测方法来提取人物前景。目前前景提取的方法主要有帧间差分法、背景差分法、光流法、混合高斯模型。In a specific embodiment, an improved foreground detection method is used to extract the foreground of a person. At present, the methods of foreground extraction mainly include frame difference method, background difference method, optical flow method, and mixed Gaussian model.

1)背景差分法1) Background difference method

其中，若设I_t，B_t分别为当前的图像帧与背景帧，T为前景检测的灰度阈值，背景差分法的算法步骤如下：Wherein, if it is assumed that I _t and B _t are the current image frame and the background frame respectively, and T is the gray threshold value of the foreground detection, the algorithm steps of the background difference method are as follows:

1)取前几帧图像的平均值，将其用作初始背景图像B_t；1) Take the average value of the previous few frame images and use it as the initial background image B _t ;

2)前帧图像与背景图像的灰度进行减运算，并取其绝对值为N_t(x,y)；公式即为2) Subtract the grayscale of the previous frame image and the background image, and take its absolute value as N _t (x, y); the formula is

N_t(x,y)＝|I_t(x,y)-B_t(x,y)|N _t (x,y)＝|I _t (x,y)-B _t (x,y)|

3)对当前帧的像素(x,y)，若有|I_t(x,y)-B_t(x,y)|≥T，则该像素点为前景点，即更新当前的图像帧为3) For the pixel (x, y) of the current frame, if there is |I _t (x, y)-B _t (x, y)|≥T, then the pixel point is the foreground point, that is, update the current image frame as

4)用当前帧图像更新背景图像。4) Update the background image with the current frame image.

2)混合高斯模型2) Mixed Gaussian model

混合高斯模型是由Stauffer等人提出的一种基于对背景建模的自适应混合高斯背景提取方法。当使用高斯混合模型对背景建立模型时，序列图像的每个像素的像素值都可以用k个高斯模型模拟，因此在时刻t，某个像素值的概率密度函数可以表示为：The mixed Gaussian model is an adaptive mixed Gaussian background extraction method based on background modeling proposed by Stauffer et al. When the Gaussian mixture model is used to model the background, the pixel value of each pixel of the sequence image can be simulated by k Gaussian models, so at time t, the probability density function of a certain pixel value can be expressed as:

其中，w_i,t表示高斯模型的权重，而高斯模型的概率密度函数可以表示为：Among them, w _{i, t} represents the weight of the Gaussian model, and the probability density function of the Gaussian model can be expressed as:

||X_t-μ_t||≤β∑^1/2 ||X _t -μ _t ||≤β∑ ^1/2

w_i,t＝(1-λ)w_i,t-1+λBM_t w _i,t ＝(1-λ)w _i,t-1 +λBM _t

μ_i,t＝(1-α)μ_i,t-1+αX_i,t μ _i,t ＝(1-α)μ _i,t-1 +αX _i,t

α＝λ/w_i,t α=λ/w _i,t

其中，像素为前景则BM＝0，否则BM＝1。最后用一个初始的高斯模型来代替权值最小的那个高斯模型，阈值T，学习率λ，参数β都是事先指定的常数。Wherein, if the pixel is foreground, BM=0, otherwise BM=1. Finally, an initial Gaussian model is used to replace the Gaussian model with the smallest weight. The threshold T, learning rate λ, and parameter β are all constants specified in advance.

3)改进的前景检测方法3) Improved foreground detection method

背景差分法虽然方法简单，计算量小，但是会产生“鬼影”现象。而混合高斯模型混合高斯法不仅对背景建模,同时也对前景建模,因此对全局亮度的突然变化非常敏感。为了解决单纯的使用一种方法所带来的局限性，本发明提出一种改进前景检测方法，即将两者的处理效果进行“与”操作能够很好的解决“鬼影”和光照敏感的问题。对于图片矩阵的像素点(x,y)假设背景差分法的输出为D(x,y)，混合高斯法的输出G(x,y),改进的前景检测方法的输出为R(x,y)，则Although the method of background subtraction is simple and the amount of calculation is small, it will produce "ghosting" phenomenon. The mixed Gaussian model not only models the background, but also models the foreground, so it is very sensitive to sudden changes in global brightness. In order to solve the limitations caused by using only one method, the present invention proposes an improved foreground detection method, that is, performing an "AND" operation on the processing effects of the two can well solve the problems of "ghosting" and light sensitivity . For the pixel point (x, y) of the image matrix, it is assumed that the output of the background difference method is D(x, y), the output of the mixed Gaussian method is G(x, y), and the output of the improved foreground detection method is R(x, y). ),but

再用一些二值化处理图片的方式，就能得到清晰的目标人物的二值化图像。Then use some binarization methods to process the picture, and you can get a clear binarized image of the target person.

经过前景提取后，再对图像进行开闭操作以及最大连通域的选择从而大致确定出人物前景的位置，截取对应的位置得到相应的人物前景的RGB图像。After foreground extraction, open and close the image and select the largest connected domain to roughly determine the position of the person's foreground, and intercept the corresponding position to obtain the corresponding RGB image of the person's foreground.

2.模型的选择2. Model selection

采用2015年的冠军模型ResNet作为本发明的网络模型。ResNet网络能够有效的解决随着网络深度增加，算法的准确度趋于饱和并快速下降的问题。同时，它的参数量比VGGNet还低，效果非常显著。在大大的提升模型的准确率的同时，训练速度也得到了很大的提高。这主要归功于它所使用的残差模块的构建思想，如图2所示.Adopt the champion model ResNet of 2015 as the network model of the present invention. The ResNet network can effectively solve the problem that as the network depth increases, the accuracy of the algorithm tends to saturate and decline rapidly. At the same time, its parameter amount is lower than that of VGGNet, and the effect is very significant. While greatly improving the accuracy of the model, the training speed has also been greatly improved. This is mainly due to the construction idea of the residual module it uses, as shown in Figure 2.

下表为ResNet-34和VGG-16在ImageNet2012的对比The following table shows the comparison of ResNet-34 and VGG-16 in ImageNet2012

网络network RESNET-34RESNET-34 VGG-16VGG-16 计算量Calculations 36亿FLOPs3.6 billion FLOPs 153亿FLOPs15.3 billion FLOPs 精度top-1Accuracy top-1 0.7330.733 0.7150.715

ResNet的网络架构如下表所示：The network architecture of ResNet is shown in the following table:

预训练模型pre-trained model

在正式用预处理后的图像进行模型训练之前，先用ImageNet对卷积神经网络的参数进行预训练。Bengio教授等人指出采用对模型进行随机初始化的方法往往使得模型进入到局部最小值的概率很高，而采用预训练的方式能够使得模型的效果更好。在实际操作中，预训练模型的作用就是将网络的参数初始化为ImageNet在网络上训练出的模型参数。但是，因为ImageNet是分为1000类，而本发明只需要将图像分为两类，所以需要对全连接层进行修改，将num_output的值由原本的1000改成2，并修改全连接层的名字。Before officially using the preprocessed images for model training, ImageNet is used to pre-train the parameters of the convolutional neural network. Professor Bengio and others pointed out that the method of random initialization of the model often makes the probability of the model entering the local minimum is very high, and the use of pre-training can make the model more effective. In actual operation, the function of the pre-training model is to initialize the parameters of the network to the model parameters trained by ImageNet on the network. However, because ImageNet is divided into 1000 categories, and the present invention only needs to divide images into two categories, it is necessary to modify the fully connected layer, change the value of num_output from the original 1000 to 2, and modify the name of the fully connected layer .

模型训练model training

网络的训练过程基于Caffe平台完成，Caffe按照神经网络的一个假设来构建的：其中用层的形式来表示所有的计算，层的效果就是根据输入的数据，从而输出经过计算后的结果。就卷积而言，如果输入的是一幅图像，再与这层的参数进行卷积运算操作，最后再输出卷积的结果。每一层都需要两种运算：1)前向通路，从输入数据来计算输出数据；2)反向通路，依据上面的梯度值从而计算出相对于输入的梯度。当每一层都完成了这两个函数后，就能够将许多层都连接成一个网络，网络的功能就是根据输入数据(图像、语音或者其他信息形式)来计算出期望的输出。在训练的过程中，能够根据已知的标签结果来计算输出模型的损失函数以及梯度，而后再根据梯度值进一步更新网络的参数。The training process of the network is completed based on the Caffe platform. Caffe is built according to a hypothesis of the neural network: all calculations are represented in the form of layers, and the effect of the layers is to output the calculated results according to the input data. As far as convolution is concerned, if the input is an image, then perform a convolution operation with the parameters of this layer, and finally output the result of the convolution. Each layer requires two operations: 1) the forward path, which calculates the output data from the input data; 2) the reverse path, which calculates the gradient relative to the input based on the above gradient value. When each layer has completed these two functions, many layers can be connected into a network. The function of the network is to calculate the desired output based on the input data (image, voice or other information forms). During the training process, the loss function and gradient of the output model can be calculated according to the known label results, and then the parameters of the network can be further updated according to the gradient value.

在对模型进行训练之前，需要先封装图像的数据库即图像和对应的label，即将图像集封装成为数据库的形式：Lmdb与Leveldb。具体来说就是调用convert_imageset命令来转换数据格式。ResNet34模型的搭建是通过定义trian_val.prototxt实现的，它定义了ResNet具体的网络结构。该文件的组织构架是类结构体的形式。每层layer结构体当中都包括许多的参数。比如说，bottom参数表示底层输入，top参数表示输出到下一层的结果，param参数表示这一层的参数，其中包括num_output表示滤波器的个数，kernel_size表示滤波器的大小，stride表示步长。ResNet的训练模型的方式是通过solver.prototxt这个文件进行编写的，这个文件当中的每行都表示一个训练参数。其中，一些常见的训练参数，像net参数用来指定使用的模型，而max_iter参数表示设置的迭达的最大次数，那么snapshot_prefix参数则表示模型保存时它的前缀名称等。Before training the model, it is necessary to encapsulate the image database, that is, the image and the corresponding label, that is, to encapsulate the image set into the form of a database: Lmdb and Leveldb. Specifically, the convert_imageset command is called to convert the data format. The construction of the ResNet34 model is realized by defining trian_val.prototxt, which defines the specific network structure of ResNet. The organizational structure of this file is in the form of a class structure. Each layer structure includes many parameters. For example, the bottom parameter indicates the bottom layer input, the top parameter indicates the result output to the next layer, and the param parameter indicates the parameters of this layer, including num_output indicating the number of filters, kernel_size indicating the size of the filter, and stride indicating the step size . ResNet's training model is written through the file solver.prototxt, and each line in this file represents a training parameter. Among them, some common training parameters, such as the net parameter, are used to specify the model used, while the max_iter parameter indicates the maximum number of iterations set, and the snapshot_prefix parameter indicates its prefix name when the model is saved, etc.

同时，也可以在Matlab平台上绘制模型训练时的损失函数曲线。每迭代一次就绘制一次损失值，每迭代100次就绘制一次精度。如图3所示。At the same time, the loss function curve during model training can also be drawn on the Matlab platform. The loss value is plotted every iteration, and the accuracy is plotted every 100 iterations. As shown in Figure 3.

模型的测试model testing

利用训练好的模型对单张图片进行分类并输出分类的结果，这一步骤同样在Matlab上实现。具体流程参加图4。Use the trained model to classify a single picture and output the classification result, this step is also implemented on Matlab. Refer to Figure 4 for the specific process.

在获取前向传播结果时，只需要调用matlab自带的函数net.forward即可获得图像在每一个标签下的概率。数据在每个卷积层都是以三维形式存在的。可以将其看成由很多个二维图片叠在一起，其中每一个称为一个特征图(feature map)。如果输入层是灰度图片，那么就只有一个feature map；如果输入层是彩色图片，一般就会是3个特征图(红绿蓝)。层与层之间有很多个卷积核(kernel)，上一层的每个feature map跟每个卷积核做卷积运算，都会产生下一层的一个特征图。When obtaining the forward propagation results, you only need to call the function net.forward that comes with matlab to obtain the probability of the image under each label. Data exists in three dimensions at each convolutional layer. It can be seen as a stack of many two-dimensional images, each of which is called a feature map. If the input layer is a grayscale image, then there is only one feature map; if the input layer is a color image, there are generally three feature maps (red, green, and blue). There are many convolution kernels (kernels) between layers, and each feature map of the previous layer is convolved with each convolution kernel to generate a feature map of the next layer.

1.先用改进后的前景检测法来将人物提取出来，其中背景差分法和混合高斯模型提取前景的效果图如图5和图6所示：1. First use the improved foreground detection method to extract the characters, and the effect diagrams of the background difference method and the mixed Gaussian model to extract the foreground are shown in Figure 5 and Figure 6:

运用改进的前景检测法，即将背景差分法和混合高斯模型两者的结果相“与”后的效果图如图7所示。Using the improved foreground detection method, the effect diagram after "ANDing" the results of the background difference method and the mixed Gaussian model is shown in Figure 7.

然后，再进行简单的二值化处理和最大连通域的提取确定人物前景的范围，从而在原图上截取得到人物对应的RGB图像Then, perform simple binarization processing and extraction of the largest connected domain to determine the range of the foreground of the person, so as to intercept the RGB image corresponding to the person on the original image

2.在公开的数据集UR Fall Detection Dataset上进行模型训练和性能的测试。其中训练集有7381张图片，测试集有1326张图片。将数据集中所有的图片都先进行前景提取的操作，获得如图8所示的图片后放入网络中进行模型的训练。2. Perform model training and performance testing on the public dataset UR Fall Detection Dataset. Among them, the training set has 7381 pictures, and the test set has 1326 pictures. All the pictures in the data set are firstly subjected to the foreground extraction operation, and the pictures shown in Figure 8 are obtained and then put into the network for model training.

3.在ImageNet数据集上先对本发明使用的ResNet网络进行预训练，从而得到预训练模型。将训练集进行预处理(前景检测、白化、归一化)后的图片输入到网络中进行训练。3. On the ImageNet data set, the ResNet network used in the present invention is pre-trained to obtain a pre-trained model. The pictures after preprocessing (foreground detection, whitening, normalization) of the training set are input into the network for training.

4.并用测试集对训练好的模型进行精度测试，每迭代一次测试一次损失值，每迭代100次测试一次精度，最终收敛时结果精度为96.7％4. Use the test set to test the accuracy of the trained model, test the loss value every iteration, test the accuracy every 100 iterations, and the final convergence accuracy is 96.7%

5.借助Matlab平台测试单独一张图片的输出，并实现卷积核和特征图的可视化。其中卷积核的可视化图片如图9所示5. Use the Matlab platform to test the output of a single image, and realize the visualization of the convolution kernel and feature map. The visualization picture of the convolution kernel is shown in Figure 9

调用matlab自带函数feature_map实现特征图可视化，如图10和图11所示。Call the function feature_map that comes with matlab to realize the visualization of the feature map, as shown in Figure 10 and Figure 11.

从图10和图11可以看出，底层的卷积核主要用来提取人物的轮廓等基础的特征。It can be seen from Figure 10 and Figure 11 that the underlying convolution kernel is mainly used to extract basic features such as the outline of a person.

本发明的训练集和测试集均来自于UR Fall Detection Dataset，将数据集在caffe平台上进行模型的训练以及精度的测试，并利用cuDNN进行加速，最终显示精度达到96.7％，时间复杂度为49ms。Both the training set and the test set of the present invention come from the UR Fall Detection Dataset. The dataset is used for model training and accuracy testing on the caffe platform, and is accelerated by cuDNN. The final display accuracy reaches 96.7%, and the time complexity is 49ms .

以上所述实施例的各技术特征可以进行任意的组合，为使描述简洁，未对上述实施例中的各个技术特征所有可能的组合都进行描述，然而，只要这些技术特征的组合不存在矛盾，都应当认为是本说明书记载的范围。The various technical features of the above-mentioned embodiments can be combined arbitrarily. To make the description concise, all possible combinations of the various technical features in the above-mentioned embodiments are not described. However, as long as there is no contradiction in the combination of these technical features, should be considered as within the scope of this specification.

以上所述实施例仅表达了本发明的几种实施方式，其描述较为具体和详细，但并不能因此而理解为对发明专利范围的限制。应当指出的是，对于本领域的普通技术人员来说，在不脱离本发明构思的前提下，还可以做出若干变形和改进，这些都属于本发明的保护范围。因此，本发明专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only express several implementation modes of the present invention, and the descriptions thereof are relatively specific and detailed, but should not be construed as limiting the patent scope of the invention. It should be pointed out that those skilled in the art can make several modifications and improvements without departing from the concept of the present invention, and these all belong to the protection scope of the present invention. Therefore, the protection scope of the patent for the present invention should be based on the appended claims.

Claims

1. A fall detection method based on convolutional neural network, characterized in that, comprising:

Training the convolutional neural network, the training convolutional neural network specifically includes:

Perform preprocessing on each frame of image obtained, and the preprocessing work includes foreground extraction, normalization, and whitening operations in sequence;

Pre-train the ResNet network on the ImageNet dataset to obtain a pre-trained model;

The step "preprocess each frame of image obtained, the preprocessing work includes foreground extraction, normalization, and whitening operations in turn;" the processed picture is put into the pre-training model for model training, and the obtained the parameters of the model; and

Input the test set into the trained model, and use the test set to test the accuracy of the model;

Use the trained convolutional neural network to detect pictures;

Wherein, the method for foreground extraction specifically includes:

Image processing using background subtraction;

Image processing using a mixture of Gaussian models;

Sum the result of processing the image using the background subtraction method and the result of processing the image using the mixture Gaussian model.

2. the fall detection method based on convolutional neural network according to claim 1, is characterized in that, in step " each frame image that obtains is carried out preprocessing, the work of preprocessing comprises foreground extraction and normalization successively Whitening and whitening operations; ", each frame of image acquired is obtained by reading the video file.

3. The fall detection method based on convolutional neural network according to claim 1, characterized in that, after the step "use the trained convolutional neural network to detect pictures", the training convolutional neural network specifically also includes : Display the detection effect diagram of each frame of the picture, and realize the visualization of the convolution kernel of the model.

4. the fall detection method based on convolutional neural network according to claim 3, is characterized in that, step " shows the detection effect figure of each frame picture, and realizes the convolution kernel visualization of model " on the matlab platform Display the detection effect diagram of each frame of the picture, and realize the visualization of the convolution kernel of the model.

5. the fall detection method based on convolutional neural network according to claim 1, is characterized in that, step " utilizes background difference method to process image; " specifically comprises:

Take the average value of the previous few frame images and use it as the initial background image B _t ;

The grayscale of the previous frame image and the background image is subtracted, and its absolute value is N _t (x, y) The formula is

N _t (x,y)＝|I _t (x,y)-B _t (x,y)|

For the pixel (x, y) of the current frame, if there is |I _t (x, y)-B _t (x, y)|≥T, then the pixel point is the foreground point, that is, the current image frame is updated as;

Update the background image with the current frame image.

6. the fall detection method based on convolutional neural network according to claim 1, is characterized in that, step " utilizes mixture Gaussian model processing image; " specifically comprises:

When the Gaussian mixture model is used to model the background, the pixel value of each pixel of the sequence image can be simulated by k Gaussian models, so at time t, the probability density function of a certain pixel value can be expressed as:

Among them, w _{i, t} represents the weight of the Gaussian model, and the probability density function of the Gaussian model is expressed as:

Next, the K Gaussian mixture models are sorted according to the size of the quotient of the weight divided by the standard deviation, and then the previous B Gaussian models are selected for discrimination, where the value of B is expressed as:

Put each pixel in the new image frame into the sorted K Gaussian models for judgment, and the judgment conditions are:

||X _t -μ _t ||≤β∑ ^1/2

Among the previous B Gaussian models, if one of the Gaussian models meets the above conditions, then the pixel is judged as the background, and if none of the above conditions is satisfied in the B Gaussian models, the pixel is judged to belong to the foreground.

For each Gaussian model, assuming that the above conditions are not true, the weight of this Gaussian model must be reduced. If the above conditions are true, the Gaussian mixture model must be updated. The specific operation method is shown in the following formula:

w _i,t = (1-λ)w _i,t-1 +λBM _t

μ _i,t ＝(1-α)μ _i,t-1 +αX _i,t

∑ _i,t ＝(1-α)∑ _i,t-1 +α(X _i,t -μ _i,t )(X _i,t -μ _i,t ) ^T

α=λ/w _i,t

Among them, if the pixel is the foreground, BM=0, otherwise BM=1, and finally an initial Gaussian model is used to replace the Gaussian model with the smallest weight, the threshold T, the learning rate λ, and the parameter β are all constants specified in advance.

7. the fall detection method based on convolutional neural network according to claim 1, is characterized in that,

In the step "input the test set into the trained model, and use the test set to test the accuracy of the model;", the test set comes from the UR Fall Detection Dataset.

8. A computer device, comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, characterized in that, when the processor executes the program, any one of claims 1 to 7 is realized The steps of the method.

9. A computer-readable storage medium, on which a computer program is stored, wherein, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are realized.

10. A processor, wherein the processor is used to run a program, wherein the method according to any one of claims 1 to 7 is executed when the program is running.