CN115578792A

CN115578792A - Method and system for early warning and detection of indoor personnel falls based on machine vision

Info

Publication number: CN115578792A
Application number: CN202211281236.8A
Authority: CN
Inventors: 孙慧杰; 陈英伦; 刘万泉; 吴雨瑶
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2023-01-06

Abstract

The invention relates to a machine vision-based indoor personnel falling early warning detection method and system, wherein the method comprises the following steps: the method comprises the steps of obtaining field images shot by machine vision, collecting the field images to form a field data set, inputting the field images into a deep learning network model based on a target detection algorithm, carrying out image preprocessing on the field images, obtaining human body images and posture characteristics of the human body images, storing the human body images and the field images in the field data set after the human body images are associated, carrying out posture characteristic analysis on the human body images through the deep learning network model and outputting a detection result, wherein the deep learning network model is trained by a training set and a verification set which are formed by selecting and collecting behavior and posture characteristics when the human body falls down. The invention can meet certain requirements on real-time performance and accuracy, and can detect the falling of the old people and send an alarm in time according to the behavior and posture characteristics when the old people fall on the premise of hardly hindering the daily life of the old people in practical application.

Description

Method and system for early warning and detection of indoor personnel falls based on machine vision

技术领域technical field

本发明涉及基于机器视觉的室内人员跌倒预警检测方法及系统，属于图像处理和识别领域。The invention relates to a machine vision-based indoor occupant fall warning detection method and system, and belongs to the field of image processing and recognition.

背景技术Background technique

在传统的跌倒检测研究领域，基于环境的跌倒检测系统存在受外部环境影响大的缺陷；可穿戴式跌倒检测系统存在一定程度上干扰老年人正常活动的缺陷；而视觉预警跌倒检测系统的技术相对先进，它的目标识别率和准确率高，对老年人影响较小，成为了广受欢迎的研究领域之一。但在跌倒预警检测研究领域，相较于跌倒后行为检测，其研究难度更大，尤其是利用机器视觉方法实现的跌倒行为预警检测，对目标行为分类和检测内容也更加复杂。In the traditional fall detection research field, the environment-based fall detection system has the defect that it is greatly affected by the external environment; the wearable fall detection system has the defect of interfering with the normal activities of the elderly to a certain extent; and the technology of the visual early warning fall detection system is relatively Advanced, its high target recognition rate and accuracy rate, less impact on the elderly, has become one of the popular research fields. However, in the field of fall warning detection research, compared with post-fall behavior detection, its research is more difficult, especially the use of machine vision methods to achieve fall behavior early warning detection, the target behavior classification and detection content is also more complicated.

发明内容Contents of the invention

本发明提供一种基于机器视觉的室内人员跌倒预警检测方法及系统，旨在至少解决现有技术中存在的技术问题之一。The present invention provides a machine vision-based indoor occupant fall warning detection method and system, aiming at at least solving one of the technical problems existing in the prior art.

本发明的技术方案涉及一种基于机器视觉的室内人员跌倒预警检测方法，根据本发明的方法包括以下步骤：The technical solution of the present invention relates to a machine vision-based indoor personnel fall warning detection method, the method according to the present invention includes the following steps:

S10、获取机械视觉拍摄的现场图像并汇集形成现场数据集，将所述现场图像输入到基于目标检测算法的深度学习网络模型中；S10. Acquire on-site images captured by machine vision and collect them to form an on-site data set, and input the on-site images into a deep learning network model based on a target detection algorithm;

S20、对现场图像进行图像预处理，获取人体图像及其体态特征，并将所述人体图像与现场图像关联后存储在所述现场数据集中；S20. Perform image preprocessing on the on-site image, acquire the human body image and its posture features, and store the human body image in the on-site data set after being associated with the on-site image;

S30、通过所述深度学习网络模型对所述人体图像的进行体态特征分析并输出检测结果；S30. Analyzing the body features of the human body image through the deep learning network model and outputting a detection result;

其中，所述深度学习网络模型由基于跌倒时行为体态特征选取并汇集形成的训练集和验证集来训练。Wherein, the deep learning network model is trained based on a training set and a verification set formed by selecting and collecting the characteristics of behavior and posture during a fall.

进一步，其中，所述深度学习网络模型的结构包括输入端、基准网络、Neck网络和用于输出检测结果的输出端；其中，所述步骤S20包括：Further, wherein, the structure of the deep learning network model includes an input terminal, a reference network, a Neck network and an output terminal for outputting detection results; wherein, the step S20 includes:

在所述输入端对输入的所述现场图像进行图像预处理；performing image preprocessing on the input live image at the input terminal;

所述基准网络包括Focus模块和CSPNet模块，其中，所述Focus模块的操作包括切割已预处理的现场图像并将现场图像的多个特征信息组合通道中；所述CSPNet模块的操作包括通过跨阶段层次结构将经过特征映射分成的基本层的两部分合并；The reference network includes a Focus module and a CSPNet module, wherein the operation of the Focus module includes cutting the preprocessed live image and combining multiple feature information channels of the live image; the operation of the CSPNet module includes passing through the cross-stage The hierarchy merges the two parts of the base layer separated by feature maps;

所述Neck网络连接所述基准网络和所述输出端，并通过FPN和PAN构成完整的特征金字塔网络。The Neck network connects the reference network and the output terminal, and forms a complete feature pyramid network through FPN and PAN.

进一步，其中，所述步骤S20中：Further, wherein, in the step S20:

所述图像预处理操作包括马赛克增强、自适应锚框计算和自适应图像缩放。The image preprocessing operations include mosaic enhancement, adaptive anchor box calculation and adaptive image scaling.

进一步，其中，对于所述深度学习网络模型的训练，所述训练集和所述验证集的形成包括以下步骤：Further, wherein, for the training of the deep learning network model, the formation of the training set and the verification set includes the following steps:

S40、制定基于跌倒时行为体态特征的跌倒预警检测标准，基于所述跌倒预警检测标准对训练数据库进行图像的分类选取并汇集形成数据集，将所述数据集的图像进行标注并按比例分成训练集和验证集；其中，所述训练数据库包括UR Fall Detection Dataset数据库资料、 Le2i数据集的视频资料和所述现场数据集。S40. Formulate a fall warning detection standard based on the behavior and posture characteristics when falling, classify and select images from the training database based on the fall warning detection standard and collect them to form a data set, mark the images of the data set and divide them into training in proportion Set and verification set; Wherein, described training database comprises UR Fall Detection Dataset database material, the video material of Le2i data set and described field data set.

进一步，其中，所述步骤S40中：Further, wherein, in the step S40:

所述预警检测标准依据的跌倒时行为体态特征包括有周遭无可扶持对象时伸出手臂、出现用手撑地肘部微屈的姿态、步伐呈交叉状或者重叠状、人体倾斜角度的变化情况和检测矩形框宽高比的变化情况。The behavior and posture characteristics of the fall on the basis of the early warning detection standard include stretching out the arms when there is no supportable object around, showing the posture of slightly bending the elbows with the hands on the ground, crossing or overlapping steps, and changes in the angle of inclination of the human body. And detect the change of the aspect ratio of the rectangular box.

进一步，其中，所述步骤S40中：Further, wherein, in the step S40:

S41、基于训练数据库，对给定的跌倒视频进行抽帧处理，并保存每一帧的训练图像；S41. Based on the training database, perform frame extraction processing on a given fall video, and save the training image of each frame;

S42、根据所述跌倒预警检测标准，按跌倒前行为和跌倒后结果对所述训练图像进行分类选取，并按二者图像数量为8:1的比例收集并形成数据集；S42. According to the fall warning detection standard, classify and select the training images according to the behavior before the fall and the result after the fall, and collect and form a data set according to the ratio of the number of the two images being 8:1;

S43、对所述数据集的图像进行标注，然后按4:1的比例将所述数据集划分为训练集和验证集。S43. Label the images of the data set, and then divide the data set into a training set and a verification set at a ratio of 4:1.

进一步，其中，对所述深度学习网络模型的训练包括以下步骤：Further, wherein, the training of the deep learning network model comprises the following steps:

S50、通过预测框、置信度和类别概率的误差加权总和表示损失函数，根据所述损失函数计算获得的损失值调整所述跌倒预警模型的参数；S50. Representing a loss function by the error weighted sum of the prediction frame, confidence degree and class probability, and adjusting the parameters of the fall warning model according to the loss value calculated by the loss function;

其中，所述预测框的损失值计算如下：Wherein, the loss value of the prediction frame is calculated as follows:

其中，CIOU表示损失函数，IOU表示回归损失函数；ρ为两矩形框的中心点直线距离，c 为最小包围两矩形框的矩形的对角线长度，v为两矩形框的宽高比相似度，α为v的影响因子。Among them, CIOU represents the loss function, and IOU represents the regression loss function; ρ is the straight-line distance between the center points of the two rectangular frames, c is the diagonal length of the smallest rectangle surrounding the two rectangular frames, and v is the aspect ratio similarity of the two rectangular frames , α is the impact factor of v.

进一步，其中，所述步骤S50中：Further, wherein, in the step S50:

所述置信度的损失值计算如下：The loss value for the confidence level is calculated as follows:

loss_BCE(z，x，y)＝-L(z，x，y)×logp(z，x，y)-(1-L(z，x，y))loss _BCE (z, x, y)=-L(z, x, y)×logp(z, x, y)-(1-L(z, x, y))

其中，loss_BCE表示分类损失函数的置信度数值，矩阵L表示置信度标签，矩阵P表示预测置信度，z，x，y的取值范围如下：Among them, loss _BCE represents the confidence value of the classification loss function, the matrix L represents the confidence label, and the matrix P represents the prediction confidence. The value ranges of z, x, and y are as follows:

则整体网络的置信度损失值计算如下：Then the confidence loss value of the overall network is calculated as follows:

loss_obj80＝a×l_obj+(1-a)×l_noobj loss _obj80 = a×l _obj +(1-a)×l _noobj

其中，z，x，y的取值范围如上；mask表示每一个矩形框的掩码值。Among them, the value ranges of z, x, and y are as above; mask represents the mask value of each rectangular box.

本发明的技术方案还涉及计算机可读存储介质，其上储存有程序指令，所述程序指令被处理器执行时实施上述的方法。The technical solution of the present invention also relates to a computer-readable storage medium on which program instructions are stored, and the above-mentioned method is implemented when the program instructions are executed by a processor.

本发明的技术方案还涉及基于机器视觉的室内人员跌倒预警检测系统，所述系统包括计算机装置，该计算机装置包含上述计算机可读存储介质。The technical solution of the present invention also relates to an indoor occupant fall warning and detection system based on machine vision. The system includes a computer device, and the computer device includes the above-mentioned computer-readable storage medium.

本发明的有益效果如下：The beneficial effects of the present invention are as follows:

本发明基于机器视觉的室内人员跌倒预警检测方法及系统，从视觉预警的跌倒检测系统方向出发，设计并实现一种基于机器视觉的室内老人跌倒预警检测算法，满足一定的实时性和准确性要求，使得该算法在实际运用上，既能做到在几乎不妨碍老人日常生活的前提下，及时检测到老人的跌倒并发出警报，有利于保障老年人的身体健康，并且，在一定程度上解决了当前缺乏与跌倒预警检测相关的数据集的问题；实现了在室内环境下的跌倒预警功能，为目标检测算法的选择和相关数据集的构建，提供了一套可行的方案；为将来具体应用于老人日常室内生活监测和身体安全保障等方面提供了一种可能。The present invention is based on the machine vision-based indoor occupant fall early warning detection method and system. Starting from the direction of the visual early warning fall detection system, a machine vision-based indoor elderly fall early warning detection algorithm is designed and implemented to meet certain real-time and accuracy requirements. , so that in practical application, the algorithm can detect the fall of the elderly in time and issue an alarm under the premise of hardly hindering the daily life of the elderly, which is conducive to protecting the health of the elderly, and to a certain extent solves the problem of It solves the current lack of data sets related to fall warning detection; realizes the fall warning function in indoor environments, and provides a set of feasible solutions for the selection of target detection algorithms and the construction of related data sets; for future specific applications It provides a possibility for the daily indoor life monitoring and physical safety protection of the elderly.

附图说明Description of drawings

图1是根据本发明的基于机器视觉的室内人员跌倒预警检测方法的基本流程图。FIG. 1 is a basic flow chart of a machine vision-based indoor occupant fall warning detection method according to the present invention.

图2是根据本发明方法的算法网络结构图。Fig. 2 is an algorithmic network structure diagram according to the method of the present invention.

图3是根据本发明实施例的YOLOV5黑边填充原理框图。Fig. 3 is a schematic block diagram of YOLOV5 black border filling according to an embodiment of the present invention.

图4是根据本发明实施例的数据集制作流程图。Fig. 4 is a flow chart of creating a data set according to an embodiment of the present invention.

图5是根据本发明实施例的数据集部分图像。Fig. 5 is an image of a portion of a data set according to an embodiment of the present invention.

图6是根据本发明实施例的根据体态特征选取的部分跌倒前预警动作图像。Fig. 6 is a partial pre-fall warning action image selected according to body characteristics according to an embodiment of the present invention.

图7是根据本发明实施例的标注界面和XML数据文件的示意图。Fig. 7 is a schematic diagram of an annotation interface and an XML data file according to an embodiment of the present invention.

图8是根据本发明实施例的模型训练的混淆矩阵的示意图。Fig. 8 is a schematic diagram of a confusion matrix for model training according to an embodiment of the present invention.

图9是根据本发明实施例的模型训练的评估指标变化曲线图。FIG. 9 is a graph showing changes in evaluation indicators for model training according to an embodiment of the present invention.

图10是本发明实施例的部分实验结果的示意图。Fig. 10 is a schematic diagram of some experimental results of the embodiment of the present invention.

具体实施方式detailed description

以下将结合实施例和附图对本发明的构思、具体结构及产生的技术效果进行清楚、完整的描述，以充分地理解本发明的目的、方案和效果。The idea, specific structure and technical effects of the present invention will be clearly and completely described below in conjunction with the embodiments and accompanying drawings, so as to fully understand the purpose, scheme and effect of the present invention.

需要说明的是，如无特殊说明，当某一特征被称为“固定”、“连接”在另一个特征，它可以直接固定、连接在另一个特征上，也可以间接地固定、连接在另一个特征上。本文所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式，除非上下文清楚地表示其他含义。此外，除非另有定义，本文所使用的所有的技术和科学术语与本技术领域的技术人员通常理解的含义相同。本文说明书中所使用的术语只是为了描述具体的实施例，而不是为了限制本发明。本文所使用的术语“和/或”包括一个或多个相关的所列项目的任意的组合。It should be noted that, unless otherwise specified, when a feature is called "fixed" or "connected" to another feature, it can be directly fixed and connected to another feature, or indirectly fixed and connected to another feature. on a feature. As used herein, the singular forms "a", "the" and "the" are also intended to include the plural unless the context clearly dictates otherwise. Also, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The terms used in the specification herein are for describing specific embodiments only, and are not intended to limit the present invention. As used herein, the term "and/or" includes any combination of one or more of the associated listed items.

应当理解，尽管在本公开可能采用术语第一、第二、第三等来描述各种元件，但这些元件不应限于这些术语。这些术语仅用来将同一类型的元件彼此区分开。例如，在不脱离本公开范围的情况下，第一元件也可以被称为第二元件，类似地，第二元件也可以被称为第一元件。本文所提供的任何以及所有实例或示例性语言(“例如”、“如”等)的使用仅意图更好地说明本发明的实施例，并且除非另外要求，否则不会对本发明的范围施加限制。It should be understood that although the terms first, second, third etc. may be used in the present disclosure to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish elements of the same type from one another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. The use of any and all examples, or exemplary language ("such as," "such as," etc.) provided herein is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed .

参照图1和图2，在一些实施例中，根据本发明的基于机器视觉的室内人员跌倒预警检测方法，至少包括以下步骤：Referring to Figures 1 and 2, in some embodiments, the machine vision-based indoor occupant fall warning detection method according to the present invention at least includes the following steps:

S10、获取机械视觉拍摄的现场图像并汇集形成现场数据集，将现场图像输入到基于目标检测算法的深度学习网络模型中；S10. Obtain on-site images captured by machine vision and collect them to form an on-site data set, and input the on-site images into a deep learning network model based on a target detection algorithm;

S20、对现场图像进行图像预处理，获取人体图像及其体态特征，并将人体图像与现场图像关联后存储在现场数据集中；S20. Perform image preprocessing on the on-site image, acquire the human body image and its body features, and store the human body image in the on-site data set after being associated with the on-site image;

S30、通过深度学习网络模型对人体图像的进行体态特征分析并输出检测结果；S30, analyze the posture characteristics of the human body image through the deep learning network model and output the detection result;

其中，深度学习网络模型由基于跌倒时行为体态特征选取并汇集形成的训练集和验证集来训练。Among them, the deep learning network model is trained based on the training set and verification set formed by selecting and collecting the characteristics of the behavior and posture during the fall.

步骤S20的具体实施方式The specific implementation manner of step S20

本发明实施例采用YOLOV5算法对人体跌倒前趋势行为进行目标检测，其中YOLOV5的网络结构主要包括四个主要模块，分别是输入端、主干网络、Neck网络和Head输出端。The embodiment of the present invention uses the YOLOV5 algorithm to detect the trend behavior of the human body before the fall, wherein the network structure of the YOLOV5 mainly includes four main modules, which are the input terminal, the backbone network, the Neck network and the Head output terminal.

通过输入端输入现场图像，其中输入的现场图像的大小要求为640*640。对所述现场图像进行预处理，其预处理操作包括有图像翻转、马赛克增强、自适应锚框计算和自适应图像缩放等。Input the live image through the input terminal, and the size of the input live image is required to be 640*640. The on-site image is preprocessed, and the preprocessing operations include image flipping, mosaic enhancement, adaptive anchor frame calculation, and adaptive image scaling.

其中，马赛克据增强方法是一种图像拼接技术，使用随机缩放、剪裁和布局方法将一组图像拼接在一起，将多张照片集成到一张照片中进行检测和学习，从而大幅提高了网络的训练性能，降低了模型的内存需求，丰富了数据集的具体内容。Among them, the mosaic data enhancement method is an image stitching technology, which uses random scaling, cropping and layout methods to stitch a group of images together, and integrates multiple photos into one photo for detection and learning, thus greatly improving the network quality. The training performance of the model reduces the memory requirements of the model and enriches the specific content of the data set.

在自适应锚点框计算方面，YOLOV5将计算功能加入到程序代码中，并在每次训练中根据所提供的数据集标签自动推断出最合适的锚点框。在自适应图像缩放方面，YOLOV5算法为原始图像的缩放图像自适应地添加最少的黑边(参见图3)，在一定程度上提高了推理速度，减少了计算量和信息冗余，使得目标检测速度会得到一定的提升。In terms of adaptive anchor box calculation, YOLOV5 adds the calculation function to the program code, and automatically infers the most suitable anchor box according to the provided data set labels in each training. In terms of adaptive image scaling, the YOLOV5 algorithm adaptively adds the least black borders to the scaled image of the original image (see Figure 3), which improves the inference speed to a certain extent, reduces the amount of calculation and information redundancy, and makes target detection The speed will be improved to a certain extent.

现场图像完成预处理后，进入基准网络(又称BackBone网络、主干网络)，进行图像切片操作，并进行跨级并行网络操作。基准网络通过降低图像的宽度和高度信息，提高通道数的方式，提高了特征信息提取的层次性。其中，通过YOLOV5算法将Focus、CSPNet等模块进行结合，构成了网络的主体。在进入基准网络之前，Focus模块切割经过输入端预处理的图像，并将图像的高度和宽度等特征信息组合到通道中；CSPNet模块将基本层的特征映射分为两部分，然后使用跨阶段层次结构将其进行合并，从而使得CNN深度学习网络模型的学习能力增强，学习效率提高，计算瓶颈和存储成本降低。After the on-site image is pre-processed, it enters the benchmark network (also known as BackBone network, backbone network) for image slicing operations and cross-level parallel network operations. The baseline network improves the level of feature information extraction by reducing the width and height information of the image and increasing the number of channels. Among them, modules such as Focus and CSPNet are combined through the YOLOV5 algorithm to form the main body of the network. Before entering the benchmark network, the Focus module cuts the preprocessed image at the input end, and combines the feature information such as the height and width of the image into the channel; the CSPNet module divides the feature map of the basic layer into two parts, and then uses the cross-stage layer The structure is combined, so that the learning ability of the CNN deep learning network model is enhanced, the learning efficiency is improved, and the calculation bottleneck and storage cost are reduced.

再有，在所述基准网络和Head输出端之间插入Neck网络，通过FPN和PAN构成完整的特征金字塔网络。其中。Neck网络作为躯干网络和顶层网络的连接枢纽，在主干网络和最终的Head输出端之间插入，作为网络的颈部。具体地，YOLOV5的Neck网络主要由FPN和PAN构成。FPN采用自顶向下的方式，将高层部位的信息传递到下部，输出相关的特征图，主要用于解决多个尺度的检测问题，从而提高了检测微小物品时的性能。PAN通过自底而上的方式，构成一个完整的特征金字塔网络。FPN和PAN两者结合，相互作用，通过顶层传达强语义特征，通过底层传输强定位特征，从而获取到更多的特征信息。Furthermore, a Neck network is inserted between the reference network and the Head output terminal, and a complete feature pyramid network is formed by FPN and PAN. in. The Neck network serves as the connection hub between the trunk network and the top-level network, and is inserted between the backbone network and the final Head output as the neck of the network. Specifically, the Neck network of YOLOV5 is mainly composed of FPN and PAN. FPN uses a top-down approach to transfer information from high-level parts to the lower part and output related feature maps. It is mainly used to solve detection problems at multiple scales, thereby improving the performance of detecting tiny objects. PAN forms a complete feature pyramid network in a bottom-up manner. FPN and PAN combine and interact to convey strong semantic features through the top layer and strong positioning features through the bottom layer, so as to obtain more feature information.

最后，输出端又称Head网络，负责输出目标检测结果。Finally, the output end, also known as the Head network, is responsible for outputting the target detection results.

步骤S40的具体实施方式The specific implementation manner of step S40

参见图4和图5，本发明实施例中的数据集包括有UR跌倒后结果DetectionDataset 数据库资料、Le2i数据集的视频资料和所述现场图像，以提高数据集包含图像的质量和数量，改善模型检测效果。对上述数据库中跌倒视频进行抽帧处理，保存每一帧图像至文件夹中，形成初步的数据集。Referring to Fig. 4 and Fig. 5, the data set in the embodiment of the present invention includes the result DetectionDataset database material after UR falls, the video material of Le2i data set and described scene image, to improve the quality and the quantity that data set contains image, improve model Detection effect. Frame extraction processing is performed on the fall video in the above database, and each frame of image is saved to a folder to form a preliminary data set.

建立初步的数据集后，根据体态特征制定跌倒预警检测标准，以对上述初步数据集的图像进行选取。跌倒行为根据诱因和初始位置的不同，在生活中常见的分类为：滑倒、绊倒、晕倒、失足跌倒、从站着到坐着摔倒、从座位上起来时摔倒、支撑东西时摔倒等等。通过对跌倒行为图像和视频的研究，总结得出符合跌倒时行为体态特征的跌倒预警检测标准包括有周遭无可扶持对象时伸出手臂、出现用手撑地肘部微屈的姿态、步伐呈交叉状或者重叠状、人体倾斜角度的变化情况和检测矩形框宽高比的变化情况(参见图6)。After the preliminary data set is established, the fall warning detection standard is formulated according to the body characteristics to select the images of the above preliminary data set. According to different triggers and initial positions, fall behaviors are commonly classified as: slipping, tripping, fainting, slipping and falling, falling from standing to sitting, falling when getting up from a seat, and when supporting something fall and so on. Through the research on the images and videos of falling behavior, it is concluded that the fall warning and detection standards that conform to the behavior and posture characteristics of the fall include stretching out the arm when there is no object around, showing a posture of slightly bending the elbow with the hand on the ground, and walking in a straight line. Intersecting or overlapping shapes, changes in the angle of inclination of the human body, and changes in the aspect ratio of the detected rectangular frame (see Figure 6).

根据跌倒预警检测标准，按跌倒前行为(Warning)和跌倒后结果(fall)两种类别分别选择图像，其中，跌倒前行为图像类别(Warning)为符合跌倒前行为的图像，而跌倒后结果类别图像(fall)为跌倒后结果的图像，两种类别图像数量按1:8的比例选取并形成最终的数据集。According to the fall warning detection standard, images are selected according to the two categories of pre-fall behavior (Warning) and post-fall result (fall). The image (fall) is the image of the result of the fall, and the number of images of the two categories is selected at a ratio of 1:8 to form the final data set.

完成对初步数据集的选取收集后，对每一张图像进行标注，具体地，采用Labelimg这一开源的数据标注工具，并选择VOC标签格式，保存为XML文件，标注界面图和XML数据文件(参见图7)。After completing the selection and collection of the preliminary data set, label each image. Specifically, use Labelimg, an open source data labeling tool, and select the VOC label format, save it as an XML file, and label the interface diagram and XML data file ( See Figure 7).

对已经标注的图像和标签文件进行分类，划分为训练集和验证集，其中训练集：验证集按4：1的比例进行划分，并生成相应的TXT标签文件。采用训练集作为模型拟合的数据样本，并通过验证集初步评估模型能力和测试模型的泛化功能，其中训练误差通过梯度最小化的方式，在训练过程中进行优化。Classify the marked images and label files, divide them into training set and verification set, where the training set: verification set is divided according to the ratio of 4:1, and generate corresponding TXT label files. The training set is used as the data sample for model fitting, and the model ability and the generalization function of the test model are initially evaluated through the verification set. The training error is optimized during the training process by minimizing the gradient.

步骤S50的具体实施方式The specific implementation manner of step S50

本发明实施列的YOLOV5目标识别算法对结果的评价指标通过CIOU损失函数提现，以度量深度学习网络模型预测信息与期望信息的差距。其损失函数可分为三个部分：预测框误差(lossrect)、置信度误差(lossobj)和类别概率误差(lossclc)。其中矩形框，即预测框，用于表征目标的大小以及精确位置。置信度表征所预测矩形框的可信程度，取值范围为0～1，值越大说明该矩形框中越可能存在预期检测的目标。类别概率表征目标类别的预测程度，因此损失函数可以描述成预测框、置信度和类别概率三个误差的加权总和，用以下公式加以定义。The YOLOV5 target recognition algorithm implemented in the present invention withdraws the evaluation index of the result through the CIOU loss function to measure the gap between the predicted information and the expected information of the deep learning network model. Its loss function can be divided into three parts: prediction frame error (lossrect), confidence error (lossobj) and category probability error (lossclc). Among them, the rectangular frame, that is, the prediction frame, is used to represent the size and precise position of the target. Confidence represents the credibility of the predicted rectangular frame, and the value ranges from 0 to 1. The larger the value, the more likely there is an expected detection target in the rectangular frame. The category probability characterizes the degree of prediction of the target category, so the loss function can be described as the weighted sum of the three errors of the prediction box, confidence degree and category probability, which is defined by the following formula.

loss＝A×loss_bj+b×loss_rect+c×loss_clc loss＝A×loss _bj +b×loss _rect +c×loss _clc

CIOU损失函数依据矩形框的宽高比、中心线距离和最小外接矩形三个因素，通过预测边框和实际边框的对比来评价检测效果。同时，加入了加权Nms非极大值抑制的方法，该方法采用计算局部最大值，即寻找接近预期的输出边界框的方式，从而择选出最大概率的检测结果，并显示其边界框和类别。其中，使用CIOU损失函数计算预测框误差，使用BCE_loss分类损失函数计算置信度误差和类别概率误差。The CIOU loss function evaluates the detection effect by comparing the predicted frame with the actual frame based on three factors: the aspect ratio of the rectangular frame, the centerline distance, and the minimum circumscribed rectangle. At the same time, the weighted Nms non-maximum value suppression method is added. This method uses the method of calculating the local maximum value, that is, looking for a way close to the expected output bounding box, so as to select the detection result with the highest probability and display its bounding box and category. . Among them, the CIOU loss function is used to calculate the prediction frame error, and the BCE_loss classification loss function is used to calculate the confidence error and category probability error.

Complete-IOU(CIOU)的回归损失函数可使训练的稳定程度和收敛性有所提升。它在IOU 主要表示矩形框的重叠部分，DIOU主要表示矩形框的中心点距离、两矩形框的重叠面积和矩形框的宽高比，并将三者以相对比例加入到计算式当中。回归损失函数计算公式如下所示：The regression loss function of Complete-IOU (CIOU) can improve the stability and convergence of training. It mainly represents the overlapping part of the rectangular frame in IOU, and DIOU mainly represents the center point distance of the rectangular frame, the overlapping area of the two rectangular frames and the aspect ratio of the rectangular frame, and the three are added to the calculation formula in relative proportion. The regression loss function calculation formula is as follows:

上式中，ρ为两矩形框的中心点直线距离，c为最小包围两矩形框的矩形的对角线长度，υ为两矩形框的宽高比相似度，α为υ的影响因子。由于arctan函数的取值范围是0～π/2，那么υ的取值范围为0～1，当两矩形框的宽高比相等时，υ取0，当两矩形框的宽高比为无穷时，υ取1。当两矩形框之间的距离为无穷，且宽高比差别无限大时DIOU取-1，υ取1，α取0.5，此时CIOU取-1.5；当两矩形框完全重叠时，DIOU取1，υ取0，α取0，则CIOU取 1。因此CIOU取值范围是-1.5～1。In the above formula, ρ is the straight-line distance between the center points of the two rectangular frames, c is the diagonal length of the smallest rectangle enclosing the two rectangular frames, υ is the similarity of the aspect ratio of the two rectangular frames, and α is the influencing factor of υ. Since the value range of the arctan function is 0～π/2, the value range of υ is 0～1. When the aspect ratio of the two rectangular frames is equal, υ is 0. When the aspect ratio of the two rectangular frames is infinite , υ takes 1. When the distance between the two rectangular frames is infinite and the difference in aspect ratio is infinite, DIOU takes -1, υ takes 1, and α takes 0.5. At this time, CIOU takes -1.5; when the two rectangular boxes completely overlap, DIOU takes 1 , υ takes 0, α takes 0, then CIOU takes 1. Therefore, the value range of CIOU is -1.5~1.

通过上式公式可得，当IOU越大，也即两矩形框的重叠区域越大，则α越大，从而v的影响越大；反之IOU越小也即两矩形框的重叠区域越小，则α越小，从而v的影响越小。因此，可以根据重叠区域的大小来决定矩形框的优化方向，从而实现提高检测进度的目的。Through the above formula, it can be obtained that when the IOU is larger, that is, the overlapping area of the two rectangular frames is larger, then the α is larger, and thus the influence of v is greater; otherwise, the smaller the IOU, that is, the smaller the overlapping area of the two rectangular frames, The smaller α is, the smaller the influence of v is. Therefore, the optimization direction of the rectangular frame can be determined according to the size of the overlapping area, so as to achieve the purpose of improving the detection progress.

对于一张图像而言，深度学习网络模型在处理前会将其分割成固定数目且为固定大小的网格，对每一个网格，深度学习网络模型会预测其邻近的三个矩形框，并给出矩形框的宽、高，分类概率，置信度等信息，因此对每一个矩形框都会输出一个与之对应的置信度数值，它的标签维度应该与深度学习网络模型的输出维度保持一致。置信度标签表示了预测框的可靠程度，其取值范围与CIOU损失函数相对应，为0～1(当CIOU值为负数时，进行截断，即将0作为标签)。置信度标签的值越大，代表检测目标的效果与实际结果更加接近，越符合实验的预期效果。For an image, the deep learning network model will divide it into a fixed number of grids with a fixed size before processing. For each grid, the deep learning network model will predict its adjacent three rectangular boxes, and Given the width, height, classification probability, confidence and other information of the rectangular frame, a corresponding confidence value will be output for each rectangular frame, and its label dimension should be consistent with the output dimension of the deep learning network model. The confidence label indicates the reliability of the prediction frame, and its value range corresponds to the CIOU loss function, which is 0 to 1 (when the CIOU value is negative, it will be truncated, that is, 0 will be used as the label). The larger the value of the confidence label, the closer the effect of the detection target is to the actual result, and the more in line with the expected effect of the experiment.

BCE_loss分类损失函数以置信度标签为依据，设矩阵L代表置信度标签，矩阵P代表预测置信度，那么矩阵中每个数值的BCE loss的计算公式和取值范围如下：The BCE_loss classification loss function is based on the confidence label. Let the matrix L represent the confidence label, and the matrix P represent the prediction confidence. Then the calculation formula and value range of BCE loss for each value in the matrix are as follows:

其中，lossBCE表示分类损失函数的置信度数值，矩阵L表示置信度标签，矩阵P表示预测置信度，z，x，y的取值范围如下：Among them, lossBCE represents the confidence value of the classification loss function, the matrix L represents the confidence label, and the matrix P represents the prediction confidence. The value ranges of z, x, and y are as follows:

loss_obj80＝a×l_obj+(1-a)×l_noobj loss _obj80 = a×l _obj +(1-a)×l _noobj

其中，z，x，y的取值范围如上；mask表示每一个矩形框的掩码值，值，用于判断矩形框内是否存在检测目标，如果存在用true表示，不存在则用false表示。将mask掩码为true的预测框作为正样本，将掩码为false的预测框作为负样本，则总损失值为两者的加权平均值。Among them, the value ranges of z, x, and y are as above; mask represents the mask value of each rectangular frame, and the value is used to judge whether there is a detection target in the rectangular frame. If it exists, it is represented by true, and if it does not exist, it is represented by false. The prediction box whose mask is true is used as a positive sample, and the prediction box whose mask is false is taken as a negative sample, and the total loss value is the weighted average of the two.

将制作并分类完成的数据集进行模型训练，训练时参数设置包括：学习率为0.01，学习率动量为0.943；权重文件选择YOLOV5s，并修改好文件中的网络模型参数；权重衰减系数为 0.0005；训练的batch根据显存大小决定，本次训练所采用的batch为2；数据集训练次数 epoch为100。训练完成后会得到文件名为best.pt和last.pt文件，分别代表最优训练模型文件和最后一次的训练模型文件。The data set that has been produced and classified is used for model training. The parameter settings during training include: the learning rate is 0.01, the learning rate momentum is 0.943; the weight file is YOLOV5s, and the network model parameters in the file are modified; the weight decay coefficient is 0.0005; The training batch is determined according to the size of the video memory. The batch used in this training is 2; the number of data set training epochs is 100. After the training is completed, the files named best.pt and last.pt will be obtained, which respectively represent the optimal training model file and the last training model file.

通过训练后获得的混淆矩阵(参见图8)，对于跌倒前行为类，召回率为1.00，精确率约为0.9524，对于类跌倒后结果的召回率和精确率均为0.95。Through the confusion matrix obtained after training (see Figure 8), the recall rate is 1.00 and the precision rate is about 0.9524 for the pre-fall behavior class, and both the recall rate and the precision rate for the post-fall result are 0.95.

训练时获得的评估指标变化曲线图(参见图9)，从左到右，从上到下分别是：训练集位置损失，训练集置信度损失，训练集分类损失，精确率，召回率，测试集位置损失，测试集置信度损失，测试集分类损失，map(0.5和0.95时)均值平均精度，可见在训练过程中，随着训练次数的增加，精确率和召回率等指标都逐渐达到了较为理想的水平。The change curve of evaluation indicators obtained during training (see Figure 9), from left to right, and from top to bottom are: training set position loss, training set confidence loss, training set classification loss, precision rate, recall rate, test Set position loss, test set confidence loss, test set classification loss, map (0.5 and 0.95) mean average precision, it can be seen that in the training process, with the increase of training times, the precision rate and recall rate and other indicators have gradually reached more ideal level.

采用训练模型文件，随机选取图像、视频和打开实时摄像头进行验证，部分检测结果截图参见图10所示，本实验所设计的跌倒预警系统能够较为准确地识别室内人体产生的各种方向和姿势的跌倒结果，并根据跌倒前一定时间内人体姿势的变化趋势预警检测可能出现的跌倒风险，对明显有跌倒趋势的行为赋予较高的置信度水准，从而实现跌倒预警的效果。Use the training model file, randomly select images and videos, and turn on the real-time camera for verification. Some screenshots of the detection results are shown in Figure 10. The fall warning system designed in this experiment can more accurately identify the various directions and postures of the indoor human body. Fall results, and early warning detection of possible fall risks based on the change trend of human body posture within a certain period of time before the fall, and assign a higher level of confidence to behaviors that clearly have a tendency to fall, so as to achieve the effect of fall warning.

应当认识到，本发明实施例中的方法步骤可以由计算机硬件、硬件和软件的组合、或者通过存储在非暂时性计算机可读存储器中的计算机指令来实现或实施。所述方法可以使用标准编程技术。每个程序可以以高级过程或面向对象的编程语言来实现以与计算机系统通信。然而，若需要，该程序可以以汇编或机器语言实现。在任何情况下，该语言可以是编译或解释的语言。此外，为此目的该程序能够在编程的专用集成电路上运行。It should be recognized that the method steps in the embodiments of the present invention may be implemented or implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer-readable memory. The methods can use standard programming techniques. Each program can be implemented in a high-level procedural or object-oriented programming language to communicate with the computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on an application specific integrated circuit programmed for this purpose.

此外，可按任何合适的顺序来执行本文描述的过程的操作，除非本文另外指示或以其他方式明显地与上下文矛盾。本文描述的过程(或变型和/或其组合)可在配置有可执行指令的一个或多个计算机系统的控制下执行，并且可作为共同地在一个或多个处理器上执行的代码 (例如，可执行指令、一个或多个计算机程序或一个或多个应用)、由硬件或其组合来实现。所述计算机程序包括可由一个或多个处理器执行的多个指令。In addition, operations of processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) can be performed under the control of one or more computer systems configured with executable instructions, and as code that collectively executes on one or more processors (e.g. , executable instructions, one or more computer programs or one or more applications), hardware or a combination thereof. The computer program comprises a plurality of instructions executable by one or more processors.

进一步，所述方法可以在可操作地连接至合适的任何类型的计算平台中实现，包括但不限于个人电脑、迷你计算机、主框架、工作站、网络或分布式计算环境、单独的或集成的计算机平台、或者与带电粒子工具或其它成像装置通信等等。本发明的各方面可以以存储在非暂时性存储介质或设备上的机器可读代码来实现，无论是可移动的还是集成至计算平台，如硬盘、光学读取和/或写入存储介质、RS1M、ROM等，使得其可由可编程计算机读取，当存储介质或设备由计算机读取时可用于配置和操作计算机以执行在此所描述的过程。此外，机器可读代码，或其部分可以通过有线或无线网络传输。当此类媒体包括结合微处理器或其他数据处理器实现上文所述步骤的指令或程序时，本文所述的发明包括这些和其他不同类型的非暂时性计算机可读存储介质。当根据本发明所述的方法和技术编程时，本发明还可以包括计算机本身。Further, the method can be implemented in any type of computing platform operably connected to a suitable one, including but not limited to personal computer, minicomputer, main frame, workstation, network or distributed computing environment, stand-alone or integrated computer platform, or communicate with charged particle tools or other imaging devices, etc. Aspects of the invention can be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or written storage medium, RSIM, ROM, etc., such that they can be read by a programmable computer, when the storage medium or device is read by the computer, can be used to configure and operate the computer to perform the processes described herein. Additionally, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other various types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention may also include the computer itself when programmed according to the methods and techniques described herein.

计算机程序能够应用于输入数据以执行本文所述的功能，从而转换输入数据以生成存储至非易失性存储器的输出数据。输出信息还可以应用于一个或多个输出设备如显示器。在本发明优选的实施例中，转换的数据表示物理和有形的对象，包括显示器上产生的物理和有形对象的特定视觉描绘。Computer programs can be applied to input data to perform the functions described herein, thereby transforming the input data to generate output data stored to non-volatile memory. Output information may also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including specific visual depictions of physical and tangible objects produced on a display.

以上所述，只是本发明的较佳实施例而已，本发明并不局限于上述实施方式，只要其以相同的手段达到本发明的技术效果，凡在本发明的精神和原则之内，所做的任何修改、等同替换、改进等，均应包含在本发明保护的范围之内。在本发明的保护范围内其技术方案和/ 或实施方式可以有各种不同的修改和变化。The above is only a preferred embodiment of the present invention, and the present invention is not limited to the above-mentioned implementation, as long as it achieves the technical effect of the present invention by the same means, within the spirit and principles of the present invention, any Any modification, equivalent replacement, improvement, etc., shall be included within the protection scope of the present invention. Various modifications and changes may be made to its technical solutions and/or implementations within the protection scope of the present invention.

Claims

1. An indoor person falling early warning detection method based on machine vision is characterized by comprising the following steps:

s10, acquiring field images shot by mechanical vision, converging the field images to form a field data set, and inputting the field images into a deep learning network model based on a target detection algorithm;

s20, carrying out image preprocessing on the field image, acquiring a human body image and the posture characteristics thereof, associating the human body image with the field image and storing the human body image and the field image in the field data set;

s30, performing posture characteristic analysis on the human body image through the deep learning network model and outputting a detection result;

wherein the deep learning network model is trained by a training set and a verification set which are selected and collected based on behavior and posture characteristics when falling.

2. The method of claim 1, wherein the structure of the deep learning network model comprises an input, a reference network, a hack network, and an output for outputting a detection result; wherein the step S20 includes:

performing image preprocessing on the field image input at the input end;

the reference network comprises a Focus module and a CSPNet module, wherein the Focus module is operated by cutting the preprocessed live image and combining a plurality of characteristic information of the live image into a channel; the CSPNet module operates to merge two portions of the base layer into which the feature map is divided by a cross-phase hierarchy;

the Neck network is connected with the reference network and the output end, and forms a complete characteristic pyramid network through the FPN and the PAN.

3. The method according to claim 2, wherein in step S20:

the image pre-processing operations include mosaic enhancement, adaptive anchor frame calculation, and adaptive image scaling.

4. The method of claim 1, wherein for training of the deep-learning network model, the forming of the training set and the validation set comprises the steps of:

s40, making a falling early warning detection standard based on behavior and posture characteristics during falling, classifying and selecting images from a training database based on the falling early warning detection standard, collecting the images to form a data set, labeling the images of the data set, and dividing the images into a training set and a verification set according to a proportion; wherein the training database comprises UR FallDetection Dataset database data, video data of Le2i Dataset and the field Dataset.

5. The method of claim 4, wherein in step S40:

the behavior and posture characteristics of the early warning detection standard when falling down comprise the posture that an arm is stretched out when no supportable object exists in the periphery, the slightly bent elbow is supported by a hand, the step is in a cross shape or an overlapping shape, the change condition of the inclination angle of the human body and the change condition of the aspect ratio of the detection rectangular frame.

6. The method of claim 5, wherein in step S40:

s41, based on a training database, performing frame extraction processing on a given falling video, and storing a training image of each frame;

s42, according to the falling early warning detection standard, classifying and selecting the training images according to the results before falling and after falling, and collecting and forming a data set according to the ratio that the number of the two images is 8;

s43, labeling the images of the data set, and dividing the data set into a training set and a verification set according to the proportion of 4.

7. The method of claim 1, wherein the training of the deep learning network model comprises the steps of:

s50, representing a loss function through a prediction frame, a confidence coefficient and an error weighted sum of class probabilities, and adjusting parameters of the fall early warning model according to a loss value obtained through calculation of the loss function;

wherein the loss value of the prediction box is calculated as follows:

wherein, CIOU represents a loss function, IOU represents a regression loss function; rho is the straight-line distance between the center points of the two rectangular frames, c is the length of the diagonal line of the rectangle which minimally surrounds the two rectangular frames, upsilon is the similarity of the width-height ratio of the two rectangular frames, and alpha is the influence factor of upsilon.

8. The method of claim 7, wherein in step S50:

the loss value of confidence is calculated as follows:

loss _BCE (z，x，y)＝-L(z，x，y)×logp(z，x，y)-(1-L(z，x，y))

therein, loss _BCE The confidence value of the classification loss function is represented, the matrix L represents a confidence label, the matrix P represents the prediction confidence, and the value ranges of z, x and y are as follows:

the confidence loss value for the overall network is calculated as follows:

loss _obj80 ＝a×l _obj +(1-a)×l _noobj

wherein the mask represents a mask value for each rectangular box.

9. A computer readable storage medium having stored thereon program instructions which, when executed by a processor, implement the method of any one of claims 1 to 8.

10. The utility model provides an indoor personnel early warning detecting system that tumbles based on machine vision which characterized in that includes:

computer arrangement comprising a computer readable storage medium according to claim 9.