
CN115082825B - A method and device for real-time human fall detection and alarm based on video - Google Patents


Info

Publication number
CN115082825B
CN115082825B
Authority
CN
China
Prior art keywords
human body
falling
video
detection
fall
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210682605.8A
Other languages
Chinese (zh)
Other versions
CN115082825A (en)
Inventor
刘振
陈星如
黄德峰
陈土培
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sino Singapore International Joint Research Institute
Original Assignee
Sino Singapore International Joint Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sino Singapore International Joint Research Institute filed Critical Sino Singapore International Joint Research Institute
Priority to CN202210682605.8A priority Critical patent/CN115082825B/en
Publication of CN115082825A publication Critical patent/CN115082825A/en
Application granted granted Critical
Publication of CN115082825B publication Critical patent/CN115082825B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules, to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G08 SIGNALLING
    • G08B SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/02 Alarms for ensuring the safety of persons
    • G08B21/04 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons
    • G08B21/0407 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons, based on behaviour analysis
    • G08B21/043 Alarms for ensuring the safety of persons responsive to non-activity, e.g. of elderly persons, based on behaviour analysis detecting an emergency event, e.g. a fall
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Gerontology & Geriatric Medicine (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Image Analysis (AREA)
  • Alarm Systems (AREA)

Abstract


The present invention provides a method and device for real-time human fall detection and alarm based on video. The method comprises the following steps: based on a public human detection data set, adding human image information in a fall state to establish a fall detection data set; pruning and transforming the YOLOv2‑Tiny network, building a fall detection model, and training the fall detection model based on a self-established fall data set; assigning different thresholds to the sensitivity of the image aspect ratio α and the center of gravity offset d before and after the human body falls, obtaining new judgment parameters for falls, and realizing the judgment of falls; obtaining real-time video as a video stream for fall detection and inputting it into the fall detection model, the fall detection model first performs human target detection on the frame image obtained by processing the input video, and then performs fall detection and alarm on the identified human target. The present invention can be applied to people and places where falls are prone to occur, and improve the efficiency of rescuing people who fall.

Description

Method and device for detecting and alarming human body falling in real time based on video
Technical Field
The invention relates to the field of real-time object detection, in particular to a method and a device for detecting and alarming human body falling in real time based on video.
Background
According to a report by Focus, the worldwide population over 60 years of age is expected to reach 2 billion by 2050, more than one fifth of the world's population. Because bodily functions decline greatly with age, most elderly people suffer from cardiovascular disease, osteoporosis and similar conditions, and the side effects of these diseases and their medications further increase the likelihood of a fall. A fall can cause sprains, contusions and fractures in the elderly, and can even trigger other illnesses. If an elderly person cannot be treated and rescued in time after a fall, their life is inevitably placed in serious danger.
With the growing elderly population, more advanced home monitoring is needed that still allows individuals to maintain personal autonomy and privacy. According to data from the U.S. Centers for Disease Control and Prevention, nearly one quarter of elderly people fall each year, making falls the leading cause of traumatic hospitalization. Fall detection currently on the market is contact-based sensing of two kinds: wearable sensor schemes, in which a multi-axis acceleration sensor is worn on the elderly person's body, and ambient schemes, in which sound and collision detection sensors are installed in the floor of the home environment. The former judges whether a person has fallen from acceleration parameters, the latter from environmental parameters such as sound and vibration. Both have drawbacks. For example, the wrist-strap touch-pressure fall detection device of publication number CN108041772A, together with the alarm system formed from it and its implementation method, is a wearable sensor scheme with various limitations: the elderly dislike wearing it, forget to wear it, the wearing position is fixed, and battery endurance is limited. The ambient scheme of the fall-detecting floor and method of publication number CN111538264A requires modifying the living environment, and the system is complex in composition and costly. The invention therefore provides a device that detects falls based on video processing, which can detect multiple targets and judge whether each target has fallen.
Disclosure of Invention
Aimed at people prone to falling, and starting from the practical requirement of preserving personal autonomy and privacy, the invention provides a method and a device for real-time video-based human fall detection and alarm. The main idea of the invention is that when a fall-prone person falls, the device sends a fall message to the person's relatives or to community social workers as a means of alarm. Because the fall detection system uses a video-based detection solution, it provides non-contact sensing through accurate image data: no sensor needs to be worn on the body of the fall-prone person or installed in the home environment, and only a camera is required to judge whether the target has fallen and to raise the alarm. Since a camera raises user-privacy concerns, all processing is performed on a local edge computing chip rather than on a cloud server, so falls are detected without disclosing the user's private data.
The object of the invention is achieved by at least one of the following technical solutions.
A method for detecting and alarming human body falling in real time based on video comprises the following steps:
S1, adding human body image information in a falling state on the basis of a public human body detection data set, and establishing a falling detection data set;
S2, pruning the YOLOv2-Tiny network, building a fall detection model, and training the fall detection model on the self-built fall data set;
S3, assigning different thresholds to the sensitivity of the image aspect ratio α and the center-of-gravity offset d before and after a human fall, obtaining new fall judgment parameters, and judging falls;
S4, acquiring real-time video as the fall-detection video stream and feeding it into the fall detection model, which first performs human target detection on the frame images obtained from the input video and then performs fall detection and alarm on the identified human targets.
Further, in step S1, the public human body detection data set is preliminarily screened and useless data is removed; useless data are images containing only partial hands and legs with no human torso features, while images in which 80% of the human body features appear are the useful data;
Multiple segments of human fall video are collected and shot, frames are extracted from the fall videos at several frames per second, and pictures between the standing state and the fully fallen state are screened out manually, yielding a data set of human fall postures; the screened pictures are then annotated, all with the label person;
Data augmentation is applied to the fall-posture data set: several augmented pictures are generated from each picture through image-processing operations including rotation, translation and stretching, giving the augmented fall-posture data set;
The augmented fall-posture data set is combined with the screened public human detection data set to establish the fall detection data set.
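The augmentation step above can be sketched as follows. The patent names rotation, translation and stretching but specifies neither a library nor parameter ranges, so the transforms below (90° rotations, horizontal shifts of up to a tenth of the width, a simple vertical stretch) are illustrative choices implemented with plain NumPy:

```python
import numpy as np

def augment(img: np.ndarray, n_copies: int = 10, seed: int = 0) -> list:
    """Generate augmented copies of one image via rotation, translation,
    and stretching (hypothetical parameter choices; the patent names the
    operations but not their ranges)."""
    rng = np.random.default_rng(seed)
    out = []
    h, w = img.shape[:2]
    for _ in range(n_copies):
        op = rng.integers(0, 3)
        if op == 0:                      # rotation by a multiple of 90 degrees
            aug = np.rot90(img, k=int(rng.integers(1, 4)))
        elif op == 1:                    # translation: circular horizontal shift
            dx = int(rng.integers(-w // 10, w // 10 + 1))
            aug = np.roll(img, dx, axis=1)
        else:                            # stretching: double rows, crop back
            aug = np.repeat(img, 2, axis=0)[:h]
        out.append(aug)
    return out

imgs = augment(np.ones((40, 30, 3), dtype=np.uint8))
print(len(imgs))  # 10 augmented pictures from one source picture
```

In practice a library such as OpenCV would give sub-degree rotations with proper interpolation; the sketch only shows how one source picture fans out into the several augmented copies the data-set construction calls for.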
Further, in step S2, weight-sparsity training is applied to the original weights of the YOLOv2-Tiny network model; that is, the scaling factor γ introduced by the BN layer of each channel is multiplied with that channel's output;
The channel pruning and fine tuning are specifically as follows:
After the L1 regularization term on the scaling factors is introduced, the scaling factors in the resulting model all tend toward 0. The absolute values of the scaling factors are then sorted, the scaling factor at the 80% position of the ascending order is taken as the threshold, and the channels whose scaling factors γ fall below the threshold are cut, which in essence directly removes the convolution kernels corresponding to those channels. This yields a compact network with fewer parameters, a small runtime memory footprint and low computation, namely the Prune-YOLOv2-Tiny network model;
The category count C in the detection layer of the Prune-YOLOv2-Tiny network model is changed to a single category, i.e. 1; the values of the 5 anchors in the detection layer are updated for the fall detection data set using the K-means algorithm, the anchor values being the width and height of the prediction boxes; and the number R of convolution kernels in the last convolution layer of the Prune-YOLOv2-Tiny network model is calculated by formula (1) and modified to the corresponding value of 30:
R=Anchors*(5+C) (1)
Wherein Anchors is the number of prediction boxes of the Prune-YOLOv2-Tiny network model, namely 5; the resulting R is the output channel size used to obtain the prediction output tensor and generate the target boxes, thus yielding the fall detection model.
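The channel-selection rule and the detection-layer resizing described above can be sketched as follows (selection logic only; actually deleting the pruned convolution kernels is framework-specific and not shown, and the γ values below are invented for illustration):

```python
import numpy as np

def prune_mask(gammas: np.ndarray, position: float = 0.8) -> np.ndarray:
    """Sort |gamma| ascending, take the value at the 80% position as the
    threshold, and keep only channels whose |gamma| is not below it."""
    mags = np.sort(np.abs(gammas))
    threshold = mags[int(len(mags) * position)]
    return np.abs(gammas) >= threshold

# Detection-layer output channels after retargeting to a single class:
anchors, C = 5, 1
R = anchors * (5 + C)   # formula (1): R = Anchors * (5 + C)
print(R)                # 30, matching the modified last convolution layer

# Hypothetical BN scaling factors for a 10-channel layer:
gammas = np.array([0.01, 0.5, 0.02, 0.9, 0.03, 0.04, 0.8, 0.05, 0.7, 0.06])
keep = prune_mask(gammas)
print(int(keep.sum()))  # 2 channels survive this illustrative layer
```

The boolean mask is what a pruning pass would use to slice the corresponding convolution weight tensors in whatever framework holds the model.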
Further, in step S2, training is performed on the fall detection model through the data set to obtain weights in the fall detection model, which is specifically as follows:
The data set is divided into a test set and a training set at a ratio of 1:9, and for each training picture in the training set, the IoU loss, classification loss and coordinate loss between the model prediction and the ground-truth label are calculated on the fall detection model loaded with pre-training weights, the pre-training weights being existing weights;
When the fall detection model has fitted the data, its weights are saved, the learning rate is adjusted, and the next training round is started.
Further, on the same test set, the fall detection model weights obtained from different training rounds are compared on their precision and recall scores, and the weight with the highest score is selected as the human detection weight of the final fall detection model.
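The weight-selection step can be sketched as below. The patent compares precision and recall scores without naming the combined metric, so the F1 score is assumed here, and the candidate weight files and their scores are invented for illustration:

```python
def f1(precision: float, recall: float) -> float:
    # Assumed combined score: the patent compares precision and recall
    # but does not name the exact metric; F1 is a common choice.
    return 2 * precision * recall / (precision + recall)

# (precision, recall) of weights from different training rounds, measured
# on the same test set -- illustrative numbers, not from the patent.
candidates = {
    "round_10.weights": (0.90, 0.72),
    "round_20.weights": (0.88, 0.85),
    "round_30.weights": (0.93, 0.78),
}
best = max(candidates, key=lambda k: f1(*candidates[k]))
print(best)  # round_20.weights: balanced precision/recall wins under F1
```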
Further, in step S3, in the fall detection model, the aspect ratio α is calculated using formula (2), specifically as follows:
αt=ht/wt (2)
Wherein αt is the aspect ratio of the human body detected in the image at frame t, ht is the length of the human body detected in the image at frame t, and wt is the width of the human body detected in the image at frame t;
the gravity center offset d is calculated by adopting a formula (3), and is specifically as follows:
dt+1=Pt-Pt+1 (3)
Wherein dt+1 is the center-of-gravity offset between the human body detected in the image at frame t+1 and that detected in the previous frame, and Pt and Pt+1 are the center-of-gravity positions of the human body detected in the images at frames t and t+1 respectively, each comprising ordinate and abscissa information;
Different thresholds are given through combination of the length-width ratio and the offset of the gravity center point, and new judging parameters for falling are obtained, so that whether the detected human body falls is judged, and the method is concretely as follows:
When αt ≥ 1.1 and dt+1 ≥ 0.08*wt, a fall is judged;
In the other case, when the aspect ratio of the human body detected in two consecutive frames exceeds a threshold, i.e. αt ≥ 1.5 and αt+1 ≥ 1.5, a fall is judged; both thresholds were obtained through extensive comparison tests on fall data.
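Combining the aspect ratio and center-of-gravity offset with the two thresholds gives a decision rule that can be sketched as follows. Treating dt+1 as a scalar offset magnitude is an assumption here, since the text defines it as a difference of center-of-gravity points:

```python
def is_fall(alpha_t: float, alpha_t1: float, d_t1: float, w_t: float) -> bool:
    """Fall decision from the two empirically derived threshold rules."""
    # Rule 1: aspect ratio >= 1.1 together with a centre-of-gravity
    # shift of at least 8% of the detected body width.
    if alpha_t >= 1.1 and d_t1 >= 0.08 * w_t:
        return True
    # Rule 2: aspect ratio >= 1.5 in two consecutive frames.
    if alpha_t >= 1.5 and alpha_t1 >= 1.5:
        return True
    return False

print(is_fall(1.2, 1.0, 10.0, 100.0))  # True: ratio and offset both exceed thresholds
print(is_fall(1.6, 1.7, 0.0, 100.0))   # True: sustained high aspect ratio
print(is_fall(0.5, 0.6, 2.0, 100.0))   # False: neither rule fires
```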
Further, in step S4, real-time video is acquired as the fall-detection video stream and input into the fall detection model; the fall detection model first performs human target detection on the frame images obtained from the input video, then performs fall detection on the identified human targets, and raises an alarm if a human fall is judged to have occurred in the real-time video.
A device for detecting and alarming human body falling in real time based on video comprises a camera, a video decoding and encoding device and an edge computing chip;
The video decoding and encoding device is used for decoding and encoding the images acquired by the camera, and the edge computing chip is used for training a falling detection model and judging whether a human body falls in the video in real time.
Further, the video decoding and encoding device is a smart IP camera SoC, the Hi3516DV300, which processes the video acquired by the camera into a 1920x1080@30fps video stream and transmits it to the edge computing chip.
Further, in the edge computing chip, a falling detection model is obtained through training according to the disclosed human body detection data set;
The edge computing chip processes the video stream transmitted by the video decoding and encoding device into frame images and feeds them to the fall detection model; the model computes, with hardware acceleration, whether a human target is present in the frame image and whether that target has fallen, and if a fall is judged, the edge computing chip sends a signal to raise the alarm.
Compared with the prior art, the invention has the advantages that:
The method builds a human detection data set specifically targeted at the fallen human figure, prunes the model to reduce its size and improve detection precision and speed, and realizes deep-learning fall detection on locally processed real-time video without cloud computing, achieving fall detection in multi-person scenes without leaking user privacy. It also overcomes the drawbacks of wearable fall detection devices (resistance to wearing, limited battery endurance) and of ambient fall detection systems (complex composition, high cost), improving usability and reducing detection cost. The device can be applied to fall-prone people and places and improves the efficiency of rescuing people who have fallen.
Drawings
Fig. 1 is a schematic flow chart of a method for detecting and alarming human body falling in real time based on video in an embodiment of the invention;
fig. 2 is a flowchart illustrating an operation of a device for detecting and alarming a fall of a human body in real time based on video according to an embodiment of the present invention;
fig. 3 is a flowchart of a fall algorithm determination in an embodiment of the invention;
fig. 4 is a training flowchart of a fall detection model according to an embodiment of the invention;
fig. 5 is a flow chart of the preparation of a fall data set according to an embodiment of the invention;
FIG. 6 is a block diagram of a fall detection model in an embodiment of the invention;
Fig. 7 is a block diagram showing a device for detecting and alarming human body falling in real time based on video in embodiment 3 of the present invention;
fig. 8 is a flowchart for producing a physical fall data set for the elderly in embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, a detailed description of the specific implementation of the present invention will be given below with reference to the accompanying drawings and examples.
Example 1:
A method for detecting and alarming human body falling in real time based on video, as shown in figure 1, comprises the following steps:
S1, adding human body image information in the fall state to the public human detection data set to establish a fall detection data set, as shown in fig. 5;
The public human detection data set is preliminarily screened and useless data is removed; useless data are images containing only partial hands and legs with no human torso features, while images in which 80% of the human body features appear are the useful data;
Multiple segments of human fall video are collected and shot, frames are extracted from the fall videos at 8 frames per second, and pictures between the standing state and the fully fallen state are screened out manually, yielding a data set of human fall postures; the screened pictures are then annotated, all with the label person;
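The 8-frames-per-second sampling can be sketched as the index computation below; actual frame grabbing would use a video library such as OpenCV, which is deliberately not assumed here:

```python
def frame_indices(total_frames: int, src_fps: float, target_fps: float = 8.0):
    """Indices of source frames to keep so that `target_fps` frames are
    retained per second of video (sampling logic only)."""
    step = src_fps / target_fps
    return [int(i * step) for i in range(int(total_frames / step))]

# A 3-second, 30 fps clip sampled down to 8 frames per second:
idx = frame_indices(total_frames=90, src_fps=30.0)
print(len(idx))  # 24 frames kept: 3 seconds x 8 frames/second
```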
Data augmentation is applied to the fall-posture data set: 10 augmented pictures are generated from each picture through image-processing operations including rotation, translation and stretching, giving the augmented fall-posture data set;
The augmented fall-posture data set is combined with the screened public human detection data set to establish the fall detection data set.
S2, pruning the YOLOv2-Tiny network, building a fall detection model, and training the fall detection model on the self-built fall data set;
Weight-sparsity training is applied to the original weights of the YOLOv2-Tiny network model; that is, the scaling factor γ introduced by the BN layer of each channel is multiplied with that channel's output;
The channel pruning and fine tuning are specifically as follows:
After the L1 regularization term on the scaling factors is introduced, the scaling factors in the resulting model all tend toward 0. The absolute values of the scaling factors are then sorted, the scaling factor at the 80% position of the ascending order is taken as the threshold, and the channels whose scaling factors γ fall below the threshold are cut, which in essence directly removes the convolution kernels corresponding to those channels. This yields a compact network with fewer parameters, a small runtime memory footprint and low computation, namely the Prune-YOLOv2-Tiny network model;
The category count C in the detection layer of the Prune-YOLOv2-Tiny network model is changed to a single category, i.e. 1; the values of the 5 anchors in the detection layer are updated for the fall detection data set using the K-means algorithm, the anchor values being the width and height of the prediction boxes; and the number R of convolution kernels in the last convolution layer of the Prune-YOLOv2-Tiny network model is calculated by formula (1) and modified to the corresponding value of 30:
R=Anchors*(5+C) (1)
Wherein Anchors is the number of prediction boxes of the Prune-YOLOv2-Tiny network model, namely 5; the resulting R is the output channel size used to obtain the prediction output tensor and generate the target boxes, thus yielding the fall detection model.
The fall detection model is essentially the Prune-YOLOv2-Tiny network model. It consists of 16 layers of 3 kinds: convolution layers (9), maximum pooling layers (6) and a final detection layer (1). The convolution layers perform feature extraction, and the pooling layers downsample and reduce the size of the feature map. For an RGB image of arbitrary resolution, each pixel is divided by 255 to map it into the [0,1] interval, the image is scaled to 416×416 preserving the original aspect ratio, and the shortfall is padded with 0.5. The resulting 416×416×3 array is input to the fall detection model, which outputs a 13×13×30 array after detection; the spatial size shrinks from 416×416 to 13×13, and the last dimension is the number of output channels.
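The preprocessing described above can be sketched as follows. Centering the image on the padded canvas and using nearest-neighbour resizing are assumptions; the text only states that pixels are divided by 255, the image is scaled to 416×416 by aspect ratio, and the shortfall is filled with 0.5:

```python
import numpy as np

def preprocess(img: np.ndarray, size: int = 416) -> np.ndarray:
    """Normalise to [0,1], scale to fit size x size keeping aspect ratio,
    pad the shortfall with 0.5 (nearest-neighbour resize for
    self-containment; the interpolation method is not specified)."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(h * scale), int(w * scale)
    # nearest-neighbour resize via index sampling
    ys = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[ys][:, xs] / 255.0
    canvas = np.full((size, size, 3), 0.5, dtype=np.float32)  # pad with 0.5
    top, left = (size - nh) // 2, (size - nw) // 2             # assumed centering
    canvas[top:top + nh, left:left + nw] = resized
    return canvas

out = preprocess(np.zeros((200, 100, 3), dtype=np.uint8))
print(out.shape)  # (416, 416, 3), ready to feed the detection network
```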
As shown in fig. 4, training is performed on the fall detection model through the data set to obtain weights in the fall detection model, which is specifically as follows:
The data set is divided into a test set and a training set at a ratio of 1:9, and for each training picture in the training set, the IoU loss, classification loss and coordinate loss between the model prediction and the ground-truth label are calculated on the fall detection model loaded with the pre-training weights yolov2-tiny.weights provided on the official YOLO website;
When the fall detection model has fitted the data, its weights are saved, the learning rate is adjusted, and the next training round is started.
Further, on the same test set, the fall detection model weights obtained from different training rounds are compared on their precision and recall scores, and the weight with the highest score is selected as the human detection weight of the final fall detection model.
S3, assigning different thresholds to the sensitivity of the image aspect ratio α and the center-of-gravity offset d before and after a human fall, obtaining new fall judgment parameters, and judging falls;
In the fall detection model, the aspect ratio α is calculated using formula (2), specifically as follows:
αt=ht/wt (2)
Wherein αt is the aspect ratio of the human body detected in the image at frame t, ht is the length of the human body detected in the image at frame t, and wt is the width of the human body detected in the image at frame t;
the gravity center offset d is calculated by adopting a formula (3), and is specifically as follows:
dt+1=Pt-Pt+1 (3)
Wherein dt+1 is the center-of-gravity offset between the human body detected in the image at frame t+1 and that detected in the previous frame, and Pt and Pt+1 are the center-of-gravity positions of the human body detected in the images at frames t and t+1 respectively, each comprising ordinate and abscissa information;
Different thresholds are given through combination of the length-width ratio and the offset of the gravity center point, and new judging parameters for falling are obtained, so that whether the detected human body falls is judged, and the method is concretely as follows:
When αt ≥ 1.1 and dt+1 ≥ 0.08*wt, a fall is judged;
In the other case, when the aspect ratio of the human body detected in two consecutive frames exceeds a threshold, i.e. αt ≥ 1.5 and αt+1 ≥ 1.5, a fall is judged; both thresholds were obtained through extensive comparison tests on fall data.
S4, as shown in FIG. 2, acquiring a real-time video as the video stream for fall detection and inputting it into the fall detection model; the fall detection model first performs human target detection on the frame images obtained by processing the input video, then performs fall detection on each identified human target, and raises an alarm if a human body in the real-time video is judged to have fallen.
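The real-time flow of step S4 can be sketched as a small frame-scanning skeleton. This is a pure-Python illustration under stated assumptions: `detect_human` stands in for model inference and `judge_fall` for the threshold rules of step S3; all names are hypothetical.

```python
def fall_alarm_pipeline(frames, detect_human, judge_fall, alarm):
    """Scan consecutive frames; raise an alarm whenever a fall is judged.

    detect_human(frame) -> bounding box or None (no person in frame)
    judge_fall(prev_box, box) -> bool, the per-frame-pair decision
    alarm() is called each time a fall is detected.
    """
    prev = None
    for frame in frames:
        box = detect_human(frame)
        if box is not None and prev is not None and judge_fall(prev, box):
            alarm()
        prev = box

# Toy run: boxes given as (w, h); "fall" when the box turns wider than tall.
events = []
fall_alarm_pipeline(
    frames=[(40, 120), (42, 118), (120, 40), (121, 39)],
    detect_human=lambda f: f,                    # frames already are boxes
    judge_fall=lambda p, b: b[0] / b[1] >= 1.5,  # aspect-ratio rule only
    alarm=lambda: events.append("fall"),
)
print(len(events))  # → 2
```

Note that this skeleton alarms on every frame in which the person is still judged fallen; a deployment would typically debounce repeated alarms.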
In Example 2, compared with Example 1, the public VOC and COCO datasets from the Internet are used, as shown in FIG. 8, and no falling-state images are added. Because these datasets are very large, the learned human features are broad, so a fallen human target can still be identified without adding falling-state images; however, the public datasets contain images in which only hands or feet are labeled as a person, which easily causes false fall judgments.
Example 3:
The device comprises a camera, a video decoding and encoding device, an edge computing chip, a cooling fan and a shell. A 5V1A power output, a network port and a 12V2A power input are reserved on the tail cable: the 12V2A input supplies the device through a unified power supply circuit that includes the FPGA board, the 5V1A output on the tail cable supplies the cooling fan for chip heat dissipation, and the network port is used for network communication and data exchange, as shown in FIG. 7.
The video decoding and encoding device is used for decoding and encoding the images acquired by the camera, and the edge computing chip is used for training a falling detection model and judging whether a human body falls in the video in real time.
The video decoding and encoding device is a professional smart IP camera SoC, the Hi3516DV300, which processes the images acquired by the camera into a 1920x1080@30fps video stream and transmits it to the edge computing chip.
A fall detection model is obtained by training on the public human body detection dataset in the edge computing chip;
the edge computing chip processes the video stream transmitted by the video decoding and encoding device into frame images and inputs them into the fall detection model. The fall detection model performs accelerated computation to determine whether a human target exists in each frame image and whether that target has fallen. If a target is judged to have fallen, the device transmits data over a network cable connected to its network port and notifies relatives or nearby social workers by SMS, WeChat, e-mail, telephone or other communication means, so that fallen persons can be helped in time.
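The notification step can be sketched as a small dispatcher. This is illustrative only: the channel names mirror those mentioned in the text (SMS, WeChat, e-mail, telephone), while every function and structure here is hypothetical; a real deployment would call the corresponding messaging gateways.

```python
def notify_fall(contacts, send):
    """Notify each contact over each of their preferred channels.

    contacts: list of (name, channels) pairs,
              e.g. ("relative", ["sms", "wechat"]).
    send(channel, name): performs the actual delivery.
    Returns the number of notifications dispatched.
    """
    sent = 0
    for name, channels in contacts:
        for ch in channels:
            send(ch, name)
            sent += 1
    return sent

log = []
n = notify_fall(
    [("relative", ["sms", "wechat"]), ("social_worker", ["telephone"])],
    send=lambda ch, who: log.append((ch, who)),  # stub delivery
)
print(n)  # → 3
```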

Claims (9)

1. The method for detecting and alarming human body falling in real time based on video is characterized by comprising the following steps:
S1, adding human body image information in a falling state on the basis of a public human body detection data set, and establishing a falling detection data set;
S2, pruning and restructuring the YOLOv4-Tiny network to build a fall detection model, training the fall detection model on the self-built fall dataset, and performing weight-sparsification training on the original weights of the YOLOv4-Tiny network model, namely introducing for each channel a scaling factor γ into the BN layer and multiplying it with that channel's output;
The channel pruning and fine tuning are specifically as follows:
after the L1 regular term on the scaling factors is introduced, the scaling factors in the resulting model all tend toward 0; the absolute values of the scaling factors are then sorted, the value at the 80% position of the scaling factors sorted from small to large is taken as the threshold, and the channels corresponding to small scaling factors γ below the threshold are pruned, yielding the Prune-YOLOv4-Tiny network model;
the category number C in the detection layer of the Prune-YOLOv4-Tiny network model is changed to a single category, namely C = 1; meanwhile, the values of the 5 anchors in the detection layer are updated for the fall detection dataset using the K-means algorithm, the anchor values being the widths and heights of the prediction boxes; the number R of convolution kernels in the last convolution layer of the Prune-YOLOv4-Tiny network model is calculated by formula (1) and modified to the corresponding value of 30, as follows:
R = Anchors * (5 + C) (1)
wherein Anchors is the number of prediction boxes in the Prune-YOLOv4-Tiny network model, namely 5, and R is the output channel size used to obtain the prediction output tensor and generate target boxes, thereby obtaining the fall detection model;
S3, assigning different thresholds according to the sensitivity of the aspect ratio α and the gravity center offset d of the images before and after a human body falls, so as to obtain new fall judgment parameters and judge whether a fall has occurred;
S4, acquiring a real-time video as the video stream for fall detection and inputting it into the fall detection model; the fall detection model first performs human target detection on the frame images obtained by processing the input video, and then performs fall detection and alarm on the identified human targets.
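The pruning threshold and detection-head sizing described in claim 1 can be sketched numerically as follows. This is an illustrative fragment under stated assumptions: all function names are hypothetical, and plain Euclidean k-means stands in for the anchor update, whereas YOLO implementations typically cluster box sizes with a 1 − IoU distance.

```python
import random

def prune_threshold(gammas, prune_ratio=0.8):
    """Threshold below which BN scaling factors |gamma| are pruned:
    the value at the 80% position of the sorted absolute values."""
    s = sorted(abs(g) for g in gammas)
    return s[int(len(s) * prune_ratio)]

def detection_channels(num_anchors=5, num_classes=1):
    """Formula (1): each anchor predicts 4 box coordinates + 1
    objectness score + C class scores, so R = Anchors * (5 + C)."""
    return num_anchors * (5 + num_classes)

def kmeans_anchors(boxes, k=5, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchor sizes (toy Euclidean k-means)."""
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # Assign each box to the nearest center.
            i = min(range(k), key=lambda j: (w - centers[j][0]) ** 2
                                          + (h - centers[j][1]) ** 2)
            clusters[i].append((w, h))
        # Recompute centers; keep the old one if a cluster went empty.
        centers = [(sum(w for w, _ in c) / len(c),
                    sum(h for _, h in c) / len(c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

print(detection_channels())  # → 30, i.e. R = 5 * (5 + 1)
```

With 5 anchors and the single `person` class, the last convolution layer therefore needs exactly 30 output channels, matching the value stated in the claim.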
2. The method for detecting and alarming human body falling in real time based on video according to claim 1, wherein in step S1, preliminary screening is performed on the public human body detection dataset to remove useless data; useless data are images that contain only partial hands and legs and no human torso features, while images showing 80% or more of the human body features are useful data;
collecting and shooting multiple segments of human body falling video, intercepting multiple frames per second from the videos, and manually screening out the pictures between the standing state and the fully fallen state to obtain a dataset of human body falling postures; the screened pictures are then annotated, all of them labeled with the person label;
data enhancement is performed on the dataset of human body falling postures: multiple augmented pictures are generated from the pictures in the dataset through image processing operations including rotation, translation and stretching, yielding the data-enhanced dataset of human body falling postures;
the data-enhanced dataset of human body falling postures is combined with the screened public human body detection dataset to establish the fall detection dataset.
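The augmentation operations named above can be illustrated on a tiny pixel grid with pure Python. This is a toy sketch only: a real pipeline would use an image library, and the function names are hypothetical.

```python
def rotate90(img):
    """90-degree clockwise rotation of a row-major pixel grid."""
    return [list(row) for row in zip(*img[::-1])]

def translate(img, dx, fill=0):
    """Shift pixels right by dx columns, padding with `fill`."""
    return [[fill] * dx + row[:len(row) - dx] for row in img]

img = [[1, 2],
       [3, 4]]
print(rotate90(img))      # → [[3, 1], [4, 2]]
print(translate(img, 1))  # → [[0, 1], [0, 3]]
```

Each transform yields a new training picture from an existing one, which is how the fall-posture dataset is enlarged before being merged with the public data.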
3. The method for detecting and alarming human body falling in real time based on video according to claim 2, wherein in step S2, the fall detection model is trained on the dataset to obtain the weights of the fall detection model, specifically as follows:
the dataset is divided into a test set and a training set at a ratio of 1:9; for each training picture in the training set, the IOU loss, classification loss and coordinate loss between the model prediction results and the real labels are calculated on the fall detection model loaded with pre-training weights, the pre-training weights being existing weights;
when the fall detection model has converged, its weights are saved, the learning rate is adjusted, and the next training round begins.
4. The method for real-time human body fall detection and alarm based on video according to claim 3, wherein the weights of the fall detection model obtained in different training rounds are compared on the same test set; the precision and recall scores of the different weights on the test set are compared, and the weight with the highest score is selected as the human detection weight in the final fall detection model.
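The weight-selection rule in claim 4 amounts to picking the checkpoint with the best test-set score. The sketch below combines precision and recall with the F1 score — an assumption, since the claim does not name the exact scoring formula — and all names are hypothetical.

```python
def best_checkpoint(results):
    """results: {checkpoint_name: (precision, recall)}.
    Return the checkpoint with the highest F1 score."""
    def f1(pr):
        p, r = pr
        # Harmonic mean of precision and recall; 0 if both are 0.
        return 2 * p * r / (p + r) if (p + r) else 0.0
    return max(results, key=lambda name: f1(results[name]))

print(best_checkpoint({
    "epoch_10": (0.90, 0.70),
    "epoch_20": (0.88, 0.85),
    "epoch_30": (0.95, 0.60),
}))  # → epoch_20
```

The harmonic mean penalizes checkpoints that trade recall for precision (or vice versa), which suits fall detection, where missed falls and false alarms are both costly.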
5. The method for detecting and alarming human body falling in real time based on video according to claim 1, wherein in step S3, in the fall detection model, the aspect ratio α is calculated by formula (2), specifically as follows:
α_t = w_t / h_t (2)
wherein α_t is the aspect ratio of the human body detected in the image at the t-th frame, h_t is the height of the human body detected in the image at the t-th frame, and w_t is the width of the human body detected in the image at the t-th frame;
the gravity center offset d is calculated by formula (3), specifically as follows:
d_{t+1} = P_t - P_{t+1} (3)
wherein d_{t+1} is the gravity center offset between the human body detected in the image at the (t+1)-th frame and the human body detected in the previous frame image, and P_t and P_{t+1} are the gravity center positions of the human body detected in the images at the t-th and (t+1)-th frames respectively, including both abscissa and ordinate information;
by combining the aspect ratio and the gravity center offset and assigning them different thresholds, new fall judgment parameters are obtained to judge whether the detected human body has fallen, specifically as follows:
when α_t ≥ 1.1 and d_{t+1} ≥ 0.08 * w_t, the human body is judged to have fallen;
in the other case, when the aspect ratios of the human body detected in two consecutive frame images are both greater than a threshold, namely α_t ≥ 1.5 and α_{t+1} ≥ 1.5, the human body is judged to have fallen; both thresholds were obtained through extensive comparison tests on fall data.
6. The method for detecting and alarming human body falling in real time based on video according to any one of claims 1 to 5, wherein in step S4, a real-time video is acquired as the video stream for fall detection and input into the fall detection model; the fall detection model performs human target detection on the frame images obtained by processing the input video, then performs fall detection on the identified human targets, and raises an alarm if a human body in the real-time video is judged to have fallen.
7. The device for detecting and alarming human body falling in real time based on video is characterized by comprising a camera, a video decoding and encoding device and an edge computing chip;
wherein the camera is used for image acquisition, the video decoding and encoding device is used for decoding and encoding the images acquired by the camera, and the edge computing chip is used for training the fall detection model of claim 1 and for judging in real time whether a human body in the video has fallen.
8. The device for detecting and alarming human body falling in real time based on video according to claim 7, wherein the video decoding and encoding device is a professional smart IP camera SoC, the Hi3516DV300, which processes the video acquired by the camera into a 1920x1080@30fps video stream and transmits it to the edge computing chip.
9. The device for detecting and alarming human body falling in real time based on video according to claim 7, wherein the edge computing chip trains on the public human body detection dataset to obtain the fall detection model;
the edge computing chip processes the video stream transmitted by the video decoding and encoding device into frame images and inputs them into the fall detection model; the fall detection model performs accelerated computation to determine whether a human target exists in each frame image and whether that target has fallen, and if a target is judged to have fallen, the edge computing chip sends a signal to raise an alarm.
CN202210682605.8A 2022-06-16 2022-06-16 A method and device for real-time human fall detection and alarm based on video Active CN115082825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210682605.8A CN115082825B (en) 2022-06-16 2022-06-16 A method and device for real-time human fall detection and alarm based on video

Publications (2)

Publication Number Publication Date
CN115082825A (en) 2022-09-20
CN115082825B (en) 2025-06-27

Family

ID=83254000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210682605.8A Active CN115082825B (en) 2022-06-16 2022-06-16 A method and device for real-time human fall detection and alarm based on video

Country Status (1)

Country Link
CN (1) CN115082825B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115240015B (en) * 2022-09-23 2023-01-06 中汽数据(天津)有限公司 Training method, device, equipment and storage medium of target detection model
CN115601834B (en) * 2022-10-19 2026-01-09 中原工学院 Fall detection method based on WiFi channel state information
CN116610834B (en) * 2023-05-15 2024-04-12 三峡科技有限责任公司 A surveillance video storage and quick query method based on AI analysis
CN117058573B (en) * 2023-07-25 2025-08-26 齐丰科技股份有限公司 A fan operating status recognition algorithm and device based on video analysis
CN117115862A (en) * 2023-10-23 2023-11-24 四川泓宝润业工程技术有限公司 Fall detection method for multiple human bodies based on deep learning
CN117132949B (en) * 2023-10-27 2024-02-09 长春理工大学 An all-weather fall detection method based on deep learning
CN119445651A (en) * 2023-12-15 2025-02-14 武汉星巡智能科技有限公司 Ground area detection method, device, equipment and medium in video surveillance scene

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408485A (en) * 2021-07-14 2021-09-17 深圳思悦创新有限公司 Method and device for detecting indoor falling of old people based on FPGA and deep learning
WO2021212883A1 (en) * 2020-04-20 2021-10-28 电子科技大学 Fall detection method based on intelligent mobile terminal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936913B2 (en) * 2018-03-20 2021-03-02 The Regents Of The University Of Michigan Automatic filter pruning technique for convolutional neural networks
TWI662514B (en) * 2018-09-13 2019-06-11 緯創資通股份有限公司 Falling detection method and electronic system using the same



Similar Documents

Publication Publication Date Title
CN115082825B (en) A method and device for real-time human fall detection and alarm based on video
CN114283469B (en) Improved YOLOv4-tiny target detection method and system
CN114565882B (en) Abnormal behavior analysis method and device based on intelligent linkage of multiple video cameras
CN110738154A (en) pedestrian falling detection method based on human body posture estimation
CN113947742B (en) Human track tracking method and device based on face recognition
CN111914819A (en) Multi-camera fusion crowd density prediction method and device, storage medium and terminal
CN108875708A (en) Video-based behavior analysis method, device, equipment, system and storage medium
CN113111767A (en) Fall detection method based on deep learning 3D posture assessment
CN116092199B (en) Employee working state identification method and identification system
CN113408435B (en) A security monitoring method, device, equipment and storage medium
CN112488019A (en) Fall detection method and device based on posture recognition, electronic equipment and storage medium
CN114902299B (en) Method, device, equipment and storage medium for detecting associated objects in images
CN115880774B (en) Body-building action recognition method and device based on human body posture estimation and related equipment
CN114241375B (en) Monitoring method for exercise process
CN115909503B (en) Fall detection method and system based on key points of human body
CN117671794A (en) Fall detection model training improvement method and fall detection method
CN116246299A (en) Low-head-group intelligent recognition system combining target detection and gesture recognition technology
CN111178134B (en) A Fall Detection Method Based on Deep Learning and Network Compression
CN118865493A (en) Gymnastics rotation training device and method of use thereof
Yen et al. Adaptive indoor people-counting system based on edge ai computing
CN117994851A (en) A method, device and equipment for detecting falls of the elderly based on multi-task learning
CN113837066A (en) Behavior recognition method and device, electronic equipment and computer storage medium
CN115273243B (en) Fall detection method, device, electronic equipment and computer readable storage medium
CN116543419A (en) Hotel health personnel wearing detection method and system based on embedded platform
CN114913585A (en) Household old man falling detection method integrating facial expressions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant