
CN119445651A - Ground area detection method, device, equipment and medium in video surveillance scene - Google Patents

Ground area detection method, device, equipment and medium in video surveillance scene

Info

Publication number
CN119445651A
Authority
CN
China
Prior art keywords
human body
depth
sequence
ground area
ground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411437328.XA
Other languages
Chinese (zh)
Inventor
陈辉
熊章
张智
胡国湖
张青军
雷奇文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Xingxun Intelligent Technology Co ltd
Original Assignee
Wuhan Xingxun Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Xingxun Intelligent Technology Co ltd filed Critical Wuhan Xingxun Intelligent Technology Co ltd
Priority to CN202411437328.XA
Publication of CN119445651A
Legal status: Pending

Classifications

    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06T 7/55 Depth or shape recovery from multiple images
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30232 Surveillance
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of video surveillance technology, solves the problem that the prior art cannot accurately detect ground areas, and provides a method, device, equipment and medium for detecting ground areas in a video surveillance scene. The method comprises: inputting a real-time image into a pre-trained human body target detection model, and monitoring camera rotation with a visual odometry method; if no human body target is identified in the current monitoring scene and the current monitoring angle of view has not rotated, performing filtering and color space conversion on the real-time image to obtain a color space image; performing feature extraction on the color space image and fusing the extracted local features with global features to obtain multi-scale features; and determining the ground area location information based on the multi-scale features. The present invention improves the accuracy of ground area detection.

Description

Ground area detection method, device, equipment and medium in video monitoring scene
The application relates to a human body fall detection method, device, equipment and medium combining depth measurement, filed on December 15, 2023; this application is a divisional application of the patent application with application number 202311742651.3.
Technical Field
The present invention relates to the field of video monitoring technologies, and in particular, to a method, an apparatus, a device, and a medium for detecting a ground area in a video monitoring scene.
Background
In video monitoring, accurate detection of the ground area is of great significance for key safety applications such as human body fall detection. Fall events usually occur in the ground area, so accurately identifying the ground area lays a foundation for subsequent behavior identification and anomaly detection, and locating the ground area makes it possible to judge the posture change of a human body relative to the ground more effectively. Particularly in scenes such as elderly care and public-place safety, detecting fall events in time and sending out an alarm can significantly reduce the risk of accidental injury. In addition, ground area detection can reduce misjudgment and noise interference and improve the accuracy of fall detection, allowing the system to focus on target activities on the ground while ignoring irrelevant environmental factors, thereby ensuring reliable and efficient detection.
The existing ground detection technology often suffers from insufficient accuracy in complex monitoring scenes, mainly in the following aspects. Firstly, many algorithms rely on a single feature such as color or texture information, which easily leads to erroneous judgment under illumination changes, shadow interference, or varied ground materials. Secondly, some techniques lack adaptability to camera view-angle changes in dynamic scenes, so ground areas are difficult to identify stably under different angles. Furthermore, when processing scenes with rich or complex details, existing ground detection methods are prone to over-segmentation or missed detection and cannot accurately capture the complete contour of the ground area. These shortcomings limit the practicality of ground detection, especially in applications requiring high-precision ground identification, such as fall detection scenarios.
For this reason, how to accurately perform ground area detection is a problem to be solved.
Disclosure of Invention
In view of the above, the present invention provides a method, apparatus, device and medium for detecting a ground area in a video monitoring scene, which are used for solving the problem that the ground area detection cannot be accurately performed in the prior art.
The technical scheme adopted by the invention is as follows:
In a first aspect, the present invention provides a method for detecting a ground area in a video surveillance scene, which is characterized in that the method includes:
Inputting a real-time image obtained by decomposing a real-time video stream in a video monitoring scene into a pre-trained human body target detection model, and identifying whether a human body target appears in the current monitoring scene;
monitoring the rotation of the camera by using a visual odometry method, and identifying whether the current monitoring visual angle rotates or not by comparing the pose changes between adjacent frames;
If the human body target is not identified in the current monitoring scene and the current monitoring visual angle is not rotated, filtering and converting the real-time image into a color space image;
Extracting features of the color space image, and fusing the extracted local features and global features to obtain multi-scale features;
According to the multi-scale characteristics, the probability of each pixel point in the ground area is obtained, a preset probability threshold value is obtained, and a probability map is converted into a binary segmentation map;
And carrying out morphological processing on the binary segmentation map, and taking the pixel point position information in the ground area as the ground area position information.
Preferably, if the human body target is not identified in the current monitoring scene and the current monitoring view angle is not rotated, performing image segmentation on the real-time image to obtain the ground area position information, and further including:
inputting each real-time image into a pre-trained human body detection model, and outputting human body region position information;
performing depth analysis on the human body region position information and the ground region position information to obtain a ground region average depth sequence and a human body region average depth sequence corresponding to a plurality of frames of time sequence images;
And carrying out depth difference calculation on the human body region average depth sequence and the ground region average depth sequence, and detecting human body falling according to the depth difference sequence.
Preferably, the performing depth analysis on the human body region position information and the ground region position information, and obtaining a ground region average depth sequence and a human body region average depth sequence corresponding to the multi-frame time sequence image includes:
Acquiring a training image with a depth label in a video monitoring scene, and inputting the training image into a deep learning network to obtain a training model;
Inputting the real-time image into the training model for forward propagation to obtain depth values corresponding to all pixel points in the real-time image;
According to the human body region position information and the ground region position information, combining depth values corresponding to all pixel points in the real-time image to obtain a human body region depth value sequence and a ground region depth value sequence corresponding to a multi-frame time sequence image;
And obtaining the ground area average depth sequence and the human area average depth sequence according to the human area depth value sequence and the ground area depth value sequence.
Preferably, the step of obtaining the sequence of depth values of the human body region and the sequence of depth values of the ground region corresponding to the multi-frame time sequence image according to the position information of the human body region and the position information of the ground region by combining the depth values corresponding to all the pixel points in the real-time image includes:
according to the human body region position information, outputting depth values corresponding to pixel points in a human body region to the human body region depth value sequence;
and outputting depth values corresponding to the pixel points in the ground area to the ground area depth value sequence according to the ground area position information.
Preferably, the obtaining the ground area average depth sequence and the human area average depth sequence according to the human area depth value sequence and the ground area depth value sequence includes:
acquiring a first pixel point sequence in a human body area and a second pixel point sequence in a ground area according to the human body area position information and the ground area position information;
obtaining a first sliding window and a second sliding window according to the first pixel point sequence and the second pixel point sequence;
obtaining a first filtering depth value sequence and a second filtering depth value sequence according to the first sliding window and the second sliding window;
And carrying out sliding window processing on the human body region depth value sequence and the ground region depth value sequence according to the first filtering depth value sequence and the second filtering depth value sequence to obtain the ground region average depth and the human body region average depth.
Preferably, the calculating the depth difference between the average depth sequence of the human body region and the average depth sequence of the ground region, and detecting the human body fall according to the depth difference sequence includes:
Acquiring a preset depth difference threshold;
Sequentially comparing each depth difference in the depth difference sequence with the depth difference threshold value to obtain a comparison result;
when the comparison results corresponding to the continuous multi-frame images are that the depth difference is smaller than the depth difference threshold value, the human body falling result is obtained that the human body falls on the ground.
Preferably, if the human body target is not identified in the current monitoring scene and the current monitoring view angle is not rotated, performing filtering processing and color space conversion on the real-time image, and before obtaining the color space image, including:
Acquiring a pre-trained human body target detection model based on yolov s architecture;
Inputting the real-time image into the human body target detection model, and identifying whether a human body target appears in the current monitoring scene;
And monitoring the rotation of the camera by using a visual odometry method, and identifying whether the current monitoring visual angle rotates or not.
In a second aspect, the present invention provides a ground area detection device in a video surveillance scene, which is characterized in that the device includes:
the human body target detection module is used for inputting a real-time image obtained by decomposing a real-time video stream in a video monitoring scene into a pre-trained human body target detection model and identifying whether a human body target appears in the current monitoring scene;
the visual angle rotation monitoring module is used for monitoring rotation of the camera by utilizing a visual odometry method and identifying whether the current monitoring visual angle rotates or not by comparing pose changes between adjacent frames;
The color space image acquisition module is used for carrying out filtering processing and color space conversion on the real-time image to obtain a color space image if a human body target is not identified in the current monitoring scene and the current monitoring visual angle is not rotated;
The feature extraction module is used for extracting features of the color space image, and fusing the extracted local features and global features to obtain multi-scale features;
the binary segmentation map acquisition module is used for acquiring the probability of each pixel point in the ground area according to the multi-scale characteristics, acquiring a preset probability threshold value and converting the probability map into a binary segmentation map;
The ground area position information acquisition module is used for carrying out morphological processing on the binary segmentation map and taking the pixel point position information in the ground area as the ground area position information.
In a third aspect, the present embodiment also provides an electronic device comprising at least one processor, at least one memory and computer program instructions stored in the memory, which when executed by the processor, implement the method of the first aspect as in the above embodiments.
In a fourth aspect, embodiments of the present invention also provide a storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as in the first aspect of the embodiments described above.
In summary, the beneficial effects of the invention are as follows:
The method comprises: inputting a real-time image obtained by decomposing a real-time video stream in a video monitoring scene into a pre-trained human body target detection model and identifying whether a human body target appears in the current monitoring scene; monitoring rotation of the camera with a visual odometry method and identifying whether the current monitoring view angle rotates by comparing pose changes between adjacent frames; if no human body target is identified in the current monitoring scene and the current monitoring view angle has not rotated, performing filtering processing and color space conversion on the real-time image to obtain a color space image; performing feature extraction on the color space image and fusing the extracted local features with global features to obtain multi-scale features; obtaining the probability of each pixel point belonging to the ground area according to the multi-scale features, acquiring a preset probability threshold, and converting the probability map into a binary segmentation map; and performing morphological processing on the binary segmentation map and taking the pixel point position information in the ground area as the ground area position information. The invention effectively improves the accuracy of ground area detection by combining human body target detection with visual odometry monitoring. When no human body target is detected and the view angle of the monitoring camera remains stable, filtering and color space conversion are performed on the real-time image, which reduces noise interference and enhances the usable information of the image. Multi-scale features are then extracted by fusing local and global features, capturing the complex details of the ground area; these multi-scale features make the probability estimation of ground area pixels more accurate. Combined with morphological processing, the ground area can be clearly segmented and erroneous judgment is reduced. The method is particularly suitable for complex monitoring scenes, provides stable and efficient ground area identification, and ensures accurate positioning of key areas in the monitoring scene.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required by the embodiments are briefly described below. Other drawings may be obtained from these drawings by a person skilled in the art without inventive effort, and such drawings also fall within the scope of the present invention.
Fig. 1 is a flowchart illustrating the overall operation of the ground area detection method in the video monitoring scenario in embodiment 1 of the present invention;
FIG. 2 is a flow chart of identifying a human target and a rotation situation of a camera in embodiment 1 of the present invention;
fig. 3 is a schematic flow chart of image segmentation of the real-time image in embodiment 1 of the present invention;
FIG. 4 is a schematic flow chart of the depth analysis in the embodiment 1 of the present invention;
FIG. 5 is a flow chart of determining depth value sequence in embodiment 1 of the present invention;
FIG. 6 is a schematic flow chart of determining the average depth in embodiment 1 of the present invention;
fig. 7 is a flowchart of determining a human fall detection result in embodiment 1 of the present invention;
fig. 8 is a block diagram of a ground area detection device in a video monitoring scene in embodiment 2 of the present invention;
fig. 9 is a schematic structural diagram of an electronic device in embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. In the description of the present application, it should be understood that the terms "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present application and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present application. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising" does not exclude the presence of additional identical elements in a process, method, article, or apparatus that comprises the element. If not conflicting, the embodiments of the present application and the features of the embodiments may be combined with each other, which are all within the protection scope of the present application.
Example 1
Referring to fig. 1, embodiment 1 of the invention discloses a ground area detection method in a video monitoring scene, which comprises the following steps:
S1, acquiring a real-time video stream in a video monitoring scene, and decomposing the real-time video stream into multi-frame real-time images;
Specifically, a real-time video stream captured by an obliquely installed monocular camera in the video monitoring scene is obtained, decoded with a common video codec (such as H.264 or H.265), and converted into an image sequence: the decoded stream is decomposed into multi-frame images, each frame representing one moment of the video stream. A monocular camera is a camera with only one optical lens, through which image information of one planar view can be obtained; common monocular cameras include ordinary digital cameras, network cameras, industrial cameras and the like. On the one hand, a vertically installed monocular camera has a limited field of view, and installing it obliquely enlarges the field of view and improves the monitoring effect; on the other hand, a vertically installed camera is easily spotted, while an obliquely installed one is more concealed, which improves safety.
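As an illustration of this decoding step, the following minimal Python sketch uses OpenCV, whose bundled FFmpeg backend handles H.264/H.265; the stream source and frame limit are assumptions, not values from the patent:

```python
import cv2

def decode_stream(source, max_frames=1000):
    """Decode a video stream (file path or RTSP URL) into a frame sequence."""
    cap = cv2.VideoCapture(source)  # codec selection handled internally
    frames = []
    while len(frames) < max_frames:
        ok, frame = cap.read()      # one BGR image per moment of the stream
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames

# Hypothetical usage with an assumed camera URL:
# frames = decode_stream("rtsp://camera.local/stream")
```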
In one embodiment, please refer to fig. 2, the step S2 includes:
S201, acquiring a pre-trained human body target detection model based on yolov s architecture;
S202, inputting the real-time image into the human body target detection model, and identifying whether a human body target appears in a current monitoring scene;
And S203, monitoring the rotation of the camera by using a visual odometry method, and identifying whether the current monitoring visual angle rotates or not.
Specifically, a machine learning or deep learning algorithm, such as the yolov s algorithm, is used to build a human body detection model that learns from a large amount of labeled training data to detect whether a human body exists in the current video monitoring scene, and a visual odometry method judges whether the camera rotates by comparing pose changes between adjacent frames. When the method determines that no human body is detected in the current video monitoring scene and the camera has stopped rotating, the preset starting condition is considered to be reached and segmentation of the ground area in the image begins, as sketched below.
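The patent does not specify the odometry implementation; the following Python sketch is one plausible realization, estimating inter-frame rotation from matched ORB features and a RANSAC homography. The 500-feature budget, the match minimum, and the 1-degree threshold are all assumed values:

```python
import cv2
import numpy as np

def camera_rotated(prev_gray, curr_gray, threshold_deg=1.0):
    """Rough visual-odometry check: estimate the in-plane rotation between
    two adjacent grayscale frames and compare it with a threshold."""
    orb = cv2.ORB_create(500)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return False  # not enough texture to decide; assume static
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    if len(matches) < 8:
        return False
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    if H is None:
        return False
    # Approximate rotation angle from the homography's upper-left 2x2 block.
    angle = np.degrees(np.arctan2(H[1, 0], H[0, 0]))
    return abs(angle) > threshold_deg

# Gating condition of step S2 (human_present would come from the detector):
# if not human_present and not camera_rotated(prev, curr):
#     run ground segmentation on the current frame
```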
S2, if no human body target is identified in the current monitoring scene and the current monitoring visual angle is not rotated, image segmentation is carried out on the real-time image to obtain ground area position information;
Specifically, a starting condition is preset: the current video monitoring scene contains no detected human body and the camera has stopped rotating. The ground area position information in the monocular image is segmented only when this condition is reached. On the one hand, this avoids interference from human body information when segmenting the ground area in the monitoring scene; on the other hand, when the camera does not rotate, the monitoring scene does not change much (whereas a human fall changes the scene drastically), so performing image segmentation only after the camera has stopped rotating avoids unnecessary waste of computing resources.
In one embodiment, referring to fig. 3, the step S2 includes:
S21, if no human body target is identified in the current monitoring scene and the current monitoring visual angle is not rotated, filtering and converting the real-time image to obtain a color space image, wherein the real-time image is a monocular image;
Specifically, the monocular image is obtained and the input color image is preprocessed. Noise possibly present in the image would affect the segmentation result, so Gaussian filtering is used to eliminate it. The preprocessed color image is then converted into the HSV color space. HSV is chosen because it decomposes color information into three components: hue, saturation and brightness. The hue component corresponds to how human eyes perceive color, so elements of different colors can be distinguished more accurately, which helps separate ground from non-ground elements.
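A minimal sketch of this preprocessing step, assuming a 5x5 Gaussian kernel (the patent does not fix the kernel size):

```python
import cv2

def preprocess(frame_bgr, ksize=5, sigma=1.0):
    """Suppress noise with a Gaussian filter, then convert BGR -> HSV."""
    denoised = cv2.GaussianBlur(frame_bgr, (ksize, ksize), sigma)
    return cv2.cvtColor(denoised, cv2.COLOR_BGR2HSV)
```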
S22, extracting features of the color space image, and fusing the extracted local features and global features to obtain multi-scale features;
Specifically, the converted color space image is subjected to multi-scale segmentation to obtain a plurality of sub-regions, and local features are extracted from each sub-region with an LBP operator. The LBP operator is a texture feature extraction algorithm that compares each pixel with its neighboring pixels to obtain a binary code describing the texture around that pixel. For the whole image, global features are extracted with a pre-trained ResNet model, a deep convolutional neural network capable of high-level feature extraction and classification. The extracted local features and global features are then fused through an MLFN feature fusion network to construct a multi-scale feature representation. Multi-scale segmentation combined with the LBP operator extracts sub-regions with distinctive local texture features, enhancing feature discriminability; the ResNet model captures the context information of the whole image, improving classification accuracy; and fusing local and global features further improves the expressive power and classification performance of the features.
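A sketch of the two feature paths in Python, using scikit-image for LBP and torchvision for the ResNet backbone; the choice of resnet18, the uniform-LBP parameters, and plain concatenation standing in for the MLFN fusion network are all assumptions:

```python
import numpy as np
import torch
from skimage.feature import local_binary_pattern
from torchvision import models, transforms

def lbp_histogram(gray_patch, points=8, radius=1):
    """Local texture descriptor: uniform-LBP histogram of one sub-region."""
    codes = local_binary_pattern(gray_patch, points, radius, method="uniform")
    hist, _ = np.histogram(codes, bins=points + 2,
                           range=(0, points + 2), density=True)
    return hist

# Pre-trained ResNet with the classification head removed -> global feature.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()
prep = transforms.Compose([transforms.ToPILImage(),
                           transforms.Resize((224, 224)),
                           transforms.ToTensor()])  # ImageNet normalization omitted for brevity

@torch.no_grad()
def global_feature(frame_rgb):
    x = prep(frame_rgb).unsqueeze(0)       # (1, 3, 224, 224)
    return backbone(x).flatten().numpy()   # 512-d vector for resnet18

def fuse(local_vec, global_vec):
    """Plain concatenation as a stand-in for the MLFN fusion network."""
    return np.concatenate([local_vec, global_vec])
```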
S23, according to the multi-scale characteristics, obtaining the probability of each pixel point in the ground area to obtain a probability map;
S24, acquiring a preset probability threshold value, and converting the probability map into a binary segmentation map;
Specifically, the local features and the global features are concatenated into a feature vector for each pixel point, and these feature vectors are used to train a classifier that assigns each pixel to one of two classes, ground elements and non-ground elements. After training, the classifier is applied to every pixel to obtain a probability map, which represents the probability that each pixel belongs to a ground element, with values between 0 and 1. The Otsu method is then used to determine a suitable probability threshold. The Otsu method is a histogram-based adaptive threshold selection method that automatically determines an optimal threshold dividing an image into two parts: it treats the gray values of the image as a probability distribution and finds the gray threshold that minimizes the weighted intra-class variance (equivalently, maximizes the inter-class variance). Here the probability map is treated as a grayscale image, an optimal probability threshold is determined by the Otsu method, and the probability map is converted into a binary segmentation map that classifies each pixel as ground or non-ground. Using a probability map means the probability of being a ground element is computed from characteristics such as the color and texture of each pixel, avoiding the inaccurate segmentation that traditional methods suffer when those characteristics change; determining the threshold adaptively with the Otsu method makes the segmentation result more accurate.
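A compact sketch of this thresholding step with OpenCV, treating the [0, 1] probability map as an 8-bit grayscale image as described above:

```python
import cv2
import numpy as np

def probability_to_mask(prob_map):
    """Binarize a per-pixel ground-probability map with Otsu's method."""
    gray = (np.clip(prob_map, 0.0, 1.0) * 255).astype(np.uint8)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask  # 255 = ground element, 0 = non-ground element
```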
And S25, carrying out morphological processing on the binary segmentation map, and taking the pixel point position information in the ground area as the ground area position information.
Specifically, morphological operations (opening, closing, etc.) are performed on the binary segmentation map. Opening eliminates small noise and fine connections, giving a more accurate segmentation result, while closing fills holes and cracks inside an object, making it more complete and connected; these operations improve the accuracy of subsequent processing. The resulting ground area pixels are stored in coordinate order to obtain the ground area position information P = (p1, p2, p3, ..., pn), where p1 through pn represent the coordinate information of the first through nth pixel points in the camera coordinate system of the video camera.
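As an illustration (an elliptical 5x5 structuring element is assumed; the patent does not specify one):

```python
import cv2
import numpy as np

def ground_coordinates(mask, kernel_size=5):
    """Open to remove small noise, close to fill holes, then collect the
    (row, col) coordinates of the remaining ground pixels as P = (p1..pn)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    cleaned = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    cleaned = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
    return np.argwhere(cleaned > 0)  # shape (n, 2), in coordinate order
```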
S3, inputting each real-time image into a pre-trained human body detection model, and outputting human body region position information;
S4, carrying out depth analysis on the human body region position information and the ground region position information to obtain a ground region average depth sequence and a human body region average depth sequence corresponding to the multi-frame time sequence image;
Specifically, the human body position is detected with the yolov s target detection algorithm to obtain the pixel set of the human body area, from which the human body area position information is derived; combined with the ground area position information, the image captured by the monocular camera can be related to the three-dimensional scene. Then the average depth of the ground area and the average depth of the human body area are calculated. Because a falling human body produces abnormal depth changes, such as a sudden decrease or increase of the distance between the human body and the ground, using these two average depths for fall detection reduces false detections and missed detections and improves the accuracy and stability of human fall detection.
In one embodiment, referring to fig. 4, the step S4 includes:
S41, acquiring a training image with a depth label in a video monitoring scene, and inputting the training image into a deep learning network to obtain a training model;
specifically, first, a training image with a depth label needs to be acquired and input into a deep learning network for training, so as to obtain a model capable of performing depth estimation on a monocular image. In the training process, the deep learning network learns the corresponding relation between the depth information in the input training image and the pixel value of the training image.
S42, inputting the real-time image into the training model for forward propagation to obtain depth values corresponding to all pixel points in the real-time image;
Specifically, after a trained model is obtained, a required monocular image is input into the model for forward propagation, and depth values corresponding to each pixel point in the input monocular image are obtained, wherein the depth values provide more accurate data support for subsequent falling detection.
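A minimal inference sketch, under the assumption that the trained network maps an image tensor to a one-channel depth map (the patent does not name a specific architecture):

```python
import torch

@torch.no_grad()
def predict_depth(model, frame_tensor):
    """Forward-propagate one image through the trained depth network and
    return a per-pixel depth map as a NumPy array of shape (H, W)."""
    model.eval()
    depth = model(frame_tensor.unsqueeze(0))  # (1, 1, H, W) assumed output
    return depth.squeeze(0).squeeze(0).cpu().numpy()
```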
S43, according to the human body region position information and the ground region position information, combining depth values corresponding to all pixel points in the real-time image to obtain a human body region depth value sequence and a ground region depth value sequence corresponding to a multi-frame time sequence image;
in one embodiment, referring to fig. 5, the step S43 includes:
S431, outputting depth values corresponding to pixel points in the human body region to the human body region depth value sequence according to the human body region position information;
Specifically, for the human body region, the coordinate position of each pixel point in the human body region can be determined according to the segmentation result of the previous human body detection model, then the depth value belonging to the human body region is screened out from the depth values corresponding to all the pixel points in the monocular image, and the depth value is output to the depth value sequence of the human body region.
And S432, outputting depth values corresponding to the pixel points in the ground area to the depth value sequence of the ground area according to the position information of the ground area.
Specifically, the ground area position information is obtained, and the set of depth values corresponding to the pixels in the ground area is used as the ground area depth value sequence D = (d1, d2, d3, ..., dn), where d1 through dn represent the depth values corresponding to the first through nth pixels in the segmented ground area.
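Steps S431 and S432 reduce to the same gather operation; a sketch, assuming `coords` holds the (row, col) pixel coordinates produced by the segmentation stage:

```python
import numpy as np

def region_depth_sequence(depth_map, coords):
    """Collect the depth values at a region's pixel coordinates,
    e.g. D = (d1, d2, ..., dn) for the segmented ground area."""
    rows, cols = coords[:, 0], coords[:, 1]
    return depth_map[rows, cols]
```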
S44, obtaining the ground area average depth sequence and the human body area average depth sequence according to the human body area depth value sequence and the ground area depth value sequence.
Specifically, the ground area average depth and the human body area average depth are obtained from the human body region depth value sequence and the ground region depth value sequence; once both are available, whether the human body has fallen can be judged accurately. When a person falls, the average depth of the human body area changes markedly and comes close to the average depth of the ground area. Therefore, comparing the average depth of the human body area with that of the ground area allows a more reliable judgment of whether the human body has fallen and avoids misjudgment.
In one embodiment, referring to fig. 6, the step S44 includes:
S441, acquiring a first pixel point sequence in a human body area and a second pixel point sequence in a ground area according to the human body area position information and the ground area position information;
S442, obtaining a first sliding window and a second sliding window according to the first pixel point sequence and the second pixel point sequence;
S443, obtaining a first filtering depth value sequence and a second filtering depth value sequence according to the first sliding window and the second sliding window;
Specifically, for each pixel point in the first pixel point sequence, its adjacent points (w points before and w points after) are taken to form a sliding window of size 2w+1, used as the first sliding window. The median of the depth values of all pixel points inside the first sliding window is computed and used as the filtered depth value; computing this for every pixel in the first pixel point sequence yields the first filtered depth value sequence. The second filtered depth value sequence is obtained in the same way, and the window size can be adjusted according to the actual application requirements. Median filtering of the depth values with a sliding window effectively eliminates isolated noise points and false detection points in the depth map, making it smoother and more reliable. In practical applications depth maps often contain noise and false detections which, left unfiltered, may strongly affect subsequent algorithms, so sliding-window median filtering effectively improves the robustness and accuracy of the algorithm; in addition, the window size can be tuned for the best filtering effect.
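A sketch of the window-median filtering together with the averaging of step S444, with the window half-width w=2 as an assumed default and the window clamped at the sequence boundaries:

```python
import numpy as np

def median_filter_1d(depths, w=2):
    """Median-filter a depth value sequence with a 2w+1 sliding window."""
    depths = np.asarray(depths, dtype=float)
    out = np.empty_like(depths)
    for i in range(len(depths)):
        lo, hi = max(0, i - w), min(len(depths), i + w + 1)
        out[i] = np.median(depths[lo:hi])
    return out

def region_average_depth(depth_sequence, w=2):
    """Filtered-then-averaged depth, matching the formulas of step S444."""
    return float(np.mean(median_filter_1d(depth_sequence, w)))
```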
S444, according to the first filtering depth value sequence and the second filtering depth value sequence, sliding window processing is conducted on the human body region depth value sequence and the ground region depth value sequence, and the average depth of the ground region and the average depth of the human body region are obtained.
Specifically, a first filtered depth value sequence d1_filtered and a second filtered depth value sequence d2_filtered are obtained, and the ground area average depth and the human body area average depth are calculated by using the following formula:
D_person_avg = (d11_filtered + d12_filtered + ... + d1n_filtered) / n;
D_ground_avg = (d21_filtered + d22_filtered + ... + d2m_filtered) / m;
Where D_person_avg represents the average depth of the human body region, D_ground_avg represents the average depth of the ground region, d11_filtered through d1n_filtered represent the filtered depth values corresponding to the first through nth pixels in the first filtered depth value sequence, d21_filtered through d2m_filtered represent the filtered depth values corresponding to the first through mth pixels in the second filtered depth value sequence, n is the number of points in the first filtered depth value sequence, and m is the number of points in the second filtered depth value sequence.
And S5, carrying out depth difference calculation on the average depth sequence of the human body region and the average depth sequence of the ground region, and detecting falling of the human body according to the depth difference sequence.
Specifically, by acquiring a human body region average depth sequence corresponding to a multi-frame monocular image, more accurate human body position and posture information is obtained, and then according to a depth difference sequence between the human body region average depth sequence and the ground region average depth, the human body falling behavior is detected more accurately.
In one embodiment, referring to fig. 7, the step S5 includes:
S51, acquiring a preset depth difference threshold;
Specifically, in practical applications the required depth difference threshold varies with the scene and with how different cameras acquire depth information. If the threshold is set too small, the missed detection rate may be high, i.e. genuine falls go undetected; if it is set too large, the false detection rate may be high, i.e. situations where the person has not fallen are misjudged as falls. A suitable threshold T is therefore preset for judging whether the depth difference between the human body and the ground is small enough, and it can be adjusted based on the actual scene and camera parameters.
S52, sequentially comparing each depth difference in the depth difference sequence with the depth difference threshold value to obtain a comparison result;
Specifically, for each input monocular image, the difference between the average depth of the human body and the ground depth is calculated according to the following formula:
delta_D_i = abs(D_person_avg_i - D_ground_avg)
The calculated depth differences are compared with the depth difference threshold in sequence: when a depth difference is smaller than the threshold, the human body is considered to have fallen; when it is not smaller than the threshold, the human body is considered not to have fallen. Judging against the preset depth difference threshold accurately detects whether the human body has fallen, enabling effective video monitoring and safety management.
And S53, when the comparison results corresponding to the continuous multi-frame images are that the depth difference is smaller than the depth difference threshold value, obtaining that the human body falls down on the ground.
In particular, the depth differences delta_D_i of all frames are traversed, and if the depth differences of consecutive multiple frames (e.g., 3 or more consecutive frames) are all smaller than the threshold T, the human body is considered to have fallen on the ground. Judging a fall from the depth differences of consecutive frames improves detection accuracy and reliability: the position and posture of a human body change greatly during a fall, so judging from the depth difference of a single frame may produce misjudgments, while requiring consecutive frames reduces that possibility. Meanwhile, the detection sensitivity and real-time performance can be controlled by adjusting the number of consecutive frames and the threshold, better meeting the requirements of different scenes. A sketch of this decision rule follows.
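A minimal sketch, with T = 0.3 (in the depth map's units) and a 3-frame run as assumed, scene-dependent parameters:

```python
def detect_fall(person_avg_seq, ground_avg, T=0.3, consecutive=3):
    """Flag a fall when delta_D_i = abs(D_person_avg_i - D_ground_avg)
    stays below T for `consecutive` frames in a row."""
    run = 0
    for person_avg in person_avg_seq:
        delta = abs(person_avg - ground_avg)
        run = run + 1 if delta < T else 0
        if run >= consecutive:
            return True
    return False
```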
Example 2
Referring to fig. 8, embodiment 2 of the present invention further provides a ground area detection device in a video surveillance scene, where the device includes:
the image acquisition module is used for acquiring a real-time video stream in a video monitoring scene and decomposing the real-time video stream into multi-frame real-time images;
the image segmentation module is used for carrying out image segmentation on the real-time image to obtain ground area position information if a human body target is not identified in the current monitoring scene and the current monitoring visual angle is not rotated;
In an embodiment, if the human body target is not identified in the current monitoring scene and the current monitoring view angle is not rotated, performing image segmentation on the real-time image to obtain the ground area position information further includes:
The human body target detection model acquisition unit is used for acquiring a pre-trained human body target detection model based on yolov s architecture;
the human body target identification unit is used for inputting the real-time image into the human body target detection model and identifying whether a human body target appears in the current monitoring scene;
And the monitoring visual angle rotation identification unit is used for monitoring the rotation of the camera by using a visual odometry method and identifying whether the current monitoring visual angle rotates or not.
In an embodiment, the image segmentation module comprises:
The filtering and color space converting unit is used for carrying out filtering processing and color space conversion on the real-time image to obtain a color space image if a human body target is not identified in the current monitoring scene and the current monitoring visual angle is not rotated, wherein the real-time image is a monocular image;
The feature extraction and fusion unit is used for extracting features of the color space image, and fusing the extracted local features and global features to obtain multi-scale features;
the probability map acquisition unit is used for acquiring the probability of each pixel point in the ground area according to the multi-scale characteristics to obtain a probability map;
the probability map conversion unit is used for acquiring a preset probability threshold value and converting the probability map into a binary segmentation map;
And the morphology processing unit is used for performing morphology processing on the binary segmentation map and taking the pixel point position information in the ground area as the ground area position information.
The human body detection module is used for inputting each real-time image into a pre-trained human body detection model and outputting human body region position information;
The depth analysis module is used for carrying out depth analysis on the human body region position information and the ground region position information to obtain a ground region average depth sequence and a human body region average depth sequence corresponding to the multi-frame time sequence images;
in an embodiment, the depth analysis module comprises:
The model training unit is used for acquiring training images with depth labels in the video monitoring scene, inputting the training images into a deep learning network and obtaining a training model;
The forward propagation unit is used for inputting the real-time image into the training model for forward propagation to obtain depth values corresponding to all pixel points in the real-time image;
The depth value sequence acquisition unit is used for acquiring a human body region depth value sequence and a ground region depth value sequence corresponding to a multi-frame time sequence image according to the human body region position information and the ground region position information and combining depth values corresponding to all pixel points in the real-time image;
in an embodiment, the depth value sequence acquisition unit includes:
The human body region depth value sequence obtaining subunit is used for outputting depth values corresponding to pixel points in a human body region to the human body region depth value sequence according to the human body region position information;
And the ground area depth value sequence acquisition subunit is used for outputting depth values corresponding to the pixel points in the ground area to the ground area depth value sequence according to the ground area position information.
The average depth sequence unit is used for obtaining the ground area average depth sequence and the human body area average depth sequence according to the human body area depth value sequence and the ground area depth value sequence.
In an embodiment, the average depth sequence acquisition unit includes:
The pixel point sequence acquisition subunit is used for acquiring a first pixel point sequence in the human body region and a second pixel point sequence in the ground region according to the human body region position information and the ground region position information;
the sliding window acquisition subunit is used for obtaining a first sliding window and a second sliding window according to the first pixel point sequence and the second pixel point sequence;
the filtering processing subunit is used for obtaining a first filtering depth value sequence and a second filtering depth value sequence according to the first sliding window and the second sliding window;
And the sliding window processing subunit is used for carrying out sliding window processing on the human body region depth value sequence and the ground region depth value sequence according to the first filtering depth value sequence and the second filtering depth value sequence to obtain the ground region average depth and the human body region average depth.
And the falling detection module is used for calculating the depth difference of the human body region average depth sequence and the ground region average depth sequence and detecting falling of the human body according to the depth difference sequence.
In an embodiment, the fall detection module comprises:
the depth difference threshold value acquisition unit is used for acquiring a preset depth difference threshold value;
the depth difference comparison unit is used for sequentially comparing each depth difference in the depth difference sequence with the depth difference threshold value to obtain a comparison result;
And the falling judgment unit is used for obtaining the falling result of the human body as the falling of the human body on the ground when the comparison results corresponding to the continuous multi-frame images are all that the depth difference is smaller than the depth difference threshold value.
The ground area detection device in the video monitoring scene comprises an image acquisition module, an image segmentation module, a human body detection module, a depth analysis module and a fall detection module. The image acquisition module acquires a real-time video stream in the video monitoring scene and decomposes it into multiple frames of real-time images. The image segmentation module performs image segmentation on the real-time image to obtain ground area position information if no human body target is recognized in the current monitoring scene and the current monitoring visual angle has not rotated. The human body detection module inputs each real-time image into a pre-trained human body detection model and outputs the human body area position information. The depth analysis module performs depth analysis on the human body area position information and the ground area position information to obtain a ground area average depth sequence and a human body area average depth sequence corresponding to multiple frames of time sequence images. The fall detection module performs depth difference calculation on the two average depth sequences and detects a human body fall according to the depth difference sequence. The device relies on the real-time video stream of a video monitoring scene obtained through an ordinary monitoring camera and needs no additional depth sensor, which reduces hardware cost. Pre-training the human body detection model means the model need not be trained from scratch, reducing development cost; many pre-trained models trained on large-scale data are available for general-scene human detection. The ground area position information is obtained through real-time image segmentation, implemented with common computer vision algorithms without complex depth image processing, further reducing cost. Analyzing the average depth sequences of the ground and human body areas over multi-frame time sequence images improves the system's adaptability to dynamic environment changes and reduces the probability of misjudgment, and through depth difference calculation the system attends to changes in relative depth rather than absolute depth values, improving detection stability.
Example 3
In addition, the ground area detection method in the video surveillance scene of embodiment 1 of the present invention described in connection with fig. 1 may be implemented by an electronic device. Fig. 9 shows a schematic hardware structure of an electronic device according to embodiment 3 of the present invention.
The electronic device may include a processor and memory storing computer program instructions.
In particular, the processor may comprise a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement embodiments of the present invention.
The memory may include mass storage for data or instructions. By way of example, and not limitation, the memory may comprise a hard disk drive (HDD), floppy disk drive, flash memory, optical disk, magneto-optical disk, magnetic tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory may include removable or non-removable (or fixed) media, where appropriate. The memory may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory is a non-volatile solid state memory. In a particular embodiment, the memory includes Read Only Memory (ROM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate.
The processor reads and executes the computer program instructions stored in the memory to implement the ground area detection method in a video surveillance scene of any one of the above embodiments.
In one example, the electronic device may also include a communication interface and a bus. The processor, the memory, and the communication interface are connected by a bus and complete communication with each other, as shown in fig. 9.
The communication interface is mainly used for realizing communication among the modules, the devices, the units and/or the equipment in the embodiment of the invention.
The bus includes hardware, software, or both that couple the components of the device to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of the above. The bus may include one or more buses, where appropriate. Although embodiments of the invention have been described and illustrated with respect to a particular bus, the invention contemplates any suitable bus or interconnect.
Embodiment 4
In addition, in combination with the ground area detection method in the video surveillance scene of Embodiment 1, Embodiment 4 of the present invention may also provide a computer-readable storage medium. The computer-readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the ground area detection method in a video surveillance scene of any one of the above embodiments.
In summary, the embodiments of the present invention provide a ground area detection method, device, equipment and medium in a video surveillance scene.
It should be understood that the invention is not limited to the particular arrangements and instrumentalities described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. The method processes of the present invention are not limited to the specific steps described and shown; various changes, modifications and additions, or changes in the order between steps, may be made by those skilled in the art after appreciating the spirit of the present invention.
The functional blocks shown in the above-described structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. The present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
In the foregoing, only the specific embodiments of the present invention are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and they should be included in the scope of the present invention.

Claims (10)

1. A method for detecting a ground area in a video surveillance scene, the method comprising:
Inputting a real-time image obtained by decomposing a real-time video stream in a video monitoring scene into a pre-trained human body target detection model, and identifying whether a human body target appears in the current monitoring scene;
monitoring camera rotation by using a visual odometry method, and identifying whether the current monitoring view angle has rotated by comparing pose changes between adjacent frames;
If no human body target is identified in the current monitoring scene and the current monitoring view angle has not rotated, performing filtering processing and color space conversion on the real-time image to obtain a color space image;
Extracting features of the color space image, and fusing the extracted local features and global features to obtain multi-scale features;
According to the multi-scale features, obtaining the probability that each pixel point belongs to the ground area to form a probability map, acquiring a preset probability threshold, and converting the probability map into a binary segmentation map;
And carrying out morphological processing on the binary segmentation map, and taking the pixel point position information in the ground area as the ground area position information (a sketch of this thresholding and morphology step follows the claims).
2. The method for detecting a ground area in a video surveillance scene according to claim 1, wherein after the image segmentation is performed on the real-time image to obtain the ground area position information if no human body target is identified in the current monitoring scene and the current monitoring view angle has not rotated, the method further comprises:
inputting each real-time image into a pre-trained human body detection model, and outputting human body area position information;
performing depth analysis on the human body area position information and the ground area position information to obtain a ground area average depth sequence and a human body area average depth sequence corresponding to a plurality of frames of time sequence images;
and carrying out depth difference calculation on the human body area average depth sequence and the ground area average depth sequence, and detecting a human body fall according to the depth difference sequence.
3. The method for detecting a ground area in a video surveillance scene according to claim 2, wherein the performing depth analysis on the human body area position information and the ground area position information to obtain a ground area average depth sequence and a human body area average depth sequence corresponding to a plurality of frames of time sequence images comprises:
Acquiring a training image with a depth label in the video surveillance scene, and inputting the training image into a deep learning network to obtain a trained model;
Inputting the real-time image into the trained model for forward propagation to obtain depth values corresponding to all pixel points in the real-time image;
According to the human body area position information and the ground area position information, combining the depth values corresponding to all pixel points in the real-time image to obtain a human body area depth value sequence and a ground area depth value sequence corresponding to the multi-frame time sequence images;
And obtaining the ground area average depth sequence and the human body area average depth sequence according to the human body area depth value sequence and the ground area depth value sequence.
4. The method for detecting a ground area in a video surveillance scene according to claim 3, wherein the obtaining a human body area depth value sequence and a ground area depth value sequence corresponding to the multi-frame time sequence images by combining the depth values corresponding to all pixel points in the real-time image according to the human body area position information and the ground area position information comprises:
according to the human body area position information, outputting the depth values corresponding to the pixel points in the human body area to the human body area depth value sequence;
and according to the ground area position information, outputting the depth values corresponding to the pixel points in the ground area to the ground area depth value sequence.
5. The method for detecting a ground area in a video surveillance scene according to claim 3, wherein the obtaining the ground area average depth sequence and the human body area average depth sequence according to the human body area depth value sequence and the ground area depth value sequence comprises:
acquiring a first pixel point sequence in a human body area and a second pixel point sequence in a ground area according to the human body area position information and the ground area position information;
obtaining a first sliding window and a second sliding window according to the first pixel point sequence and the second pixel point sequence;
obtaining a first filtering depth value sequence and a second filtering depth value sequence according to the first sliding window and the second sliding window;
And carrying out sliding window processing on the human body area depth value sequence and the ground area depth value sequence according to the first filtering depth value sequence and the second filtering depth value sequence to obtain the ground area average depth sequence and the human body area average depth sequence (a sketch of this depth analysis follows the claims).
6. The method for detecting a ground area in a video surveillance scene according to any one of claims 2 to 5, wherein the carrying out depth difference calculation on the human body area average depth sequence and the ground area average depth sequence and detecting a human body fall according to the depth difference sequence comprises:
Acquiring a preset depth difference threshold;
Sequentially comparing each depth difference in the depth difference sequence with the depth difference threshold value to obtain a comparison result;
when the comparison results corresponding to a plurality of consecutive frames of images all indicate that the depth difference is smaller than the depth difference threshold, determining that the human body has fallen to the ground (a sketch of this decision rule follows the claims).
7. The method according to claim 1, wherein before the performing filtering processing and color space conversion on the real-time image to obtain a color space image if no human body target is identified in the current monitoring scene and the current monitoring view angle has not rotated, the method comprises:
Acquiring a pre-trained human body target detection model based on the YOLOv5s architecture;
Inputting the real-time image into the human body target detection model, and identifying whether a human body target appears in the current monitoring scene;
And monitoring camera rotation by using a visual odometry method, and identifying whether the current monitoring view angle has rotated (a sketch of this rotation check follows the claims).
8. A ground area detection device in a video surveillance scene, the device comprising:
the human body target detection module is used for inputting a real-time image obtained by decomposing a real-time video stream in a video monitoring scene into a pre-trained human body target detection model and identifying whether a human body target appears in the current monitoring scene;
the view angle rotation monitoring module is used for monitoring camera rotation by using a visual odometry method and identifying whether the current monitoring view angle has rotated by comparing pose changes between adjacent frames;
The color space image acquisition module is used for carrying out filtering processing and color space conversion on the real-time image to obtain a color space image if no human body target is identified in the current monitoring scene and the current monitoring view angle has not rotated;
The feature extraction module is used for extracting features of the color space image, and fusing the extracted local features and global features to obtain multi-scale features;
the binary segmentation map acquisition module is used for obtaining, according to the multi-scale features, the probability that each pixel point belongs to the ground area to form a probability map, acquiring a preset probability threshold, and converting the probability map into a binary segmentation map;
The ground area position information acquisition module is used for carrying out morphological processing on the binary segmentation map and taking the pixel point position information in the ground area as the ground area position information.
9. An electronic device comprising at least one processor, at least one memory, and computer program instructions stored in the memory, which when executed by the processor, implement the method of any of claims 1-7.
10. A storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1-7.
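The following sketches illustrate several of the claimed steps. They are non-authoritative readings of the claims, and every concrete value in them (thresholds, kernel sizes, window lengths) is an assumption for exposition. First, the probability-map thresholding and morphological processing of claim 1, sketched in Python with OpenCV; the 0.5 threshold and 5x5 elliptical kernel are illustrative choices:

```python
# Sketch of claim 1's final steps: probability map -> binary segmentation map
# -> morphological cleanup -> ground pixel positions. Threshold and kernel
# size are illustrative assumptions, not values from the claims.
import cv2
import numpy as np

def ground_mask_from_probs(prob_map: np.ndarray, threshold: float = 0.5):
    # Convert the per-pixel ground probability map into a binary map.
    binary = (prob_map >= threshold).astype(np.uint8) * 255
    # Opening removes isolated false positives; closing fills small holes
    # inside the ground area.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    # Ground area position information: (row, col) indices of ground pixels.
    ys, xs = np.nonzero(binary)
    return binary, np.stack([ys, xs], axis=1)
```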
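Next, the depth analysis of claims 3 to 5: per-frame forward propagation through a depth model, region averaging over the claimed position information, and sliding-window filtering of the resulting sequences. The depth_model callable, the boolean-mask format and the window length are assumptions:

```python
# Sketch of claims 3-5: region depth averaging per frame plus sliding-window
# filtering over the multi-frame sequences.
from collections import deque
import numpy as np

def region_average_depth(depth_map, region_mask):
    # Mean depth over the pixels named by the region position information.
    vals = depth_map[region_mask]
    return float(vals.mean()) if vals.size else float("nan")

class DepthSequenceAnalyzer:
    def __init__(self, depth_model, window: int = 5):
        self.depth_model = depth_model
        self.human_window = deque(maxlen=window)   # first sliding window
        self.ground_window = deque(maxlen=window)  # second sliding window
        self.human_avg_seq = []   # human body area average depth sequence
        self.ground_avg_seq = []  # ground area average depth sequence

    def update(self, frame, human_mask, ground_mask):
        depth = self.depth_model(frame)  # forward propagation -> HxW depths
        self.human_window.append(region_average_depth(depth, human_mask))
        self.ground_window.append(region_average_depth(depth, ground_mask))
        # The sliding-window mean suppresses single-frame depth noise.
        self.human_avg_seq.append(float(np.mean(self.human_window)))
        self.ground_avg_seq.append(float(np.mean(self.ground_window)))
        return self.human_avg_seq[-1], self.ground_avg_seq[-1]
```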
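The fall decision of claim 6 compares each depth difference with a preset threshold and reports a fall only after several consecutive frames below it; the threshold value and frame count below are illustrative presets:

```python
# Sketch of claim 6's decision rule: a fall is reported only when the depth
# difference stays below the threshold for N consecutive frames.
def detect_fall(human_avg_seq, ground_avg_seq,
                depth_diff_threshold: float = 0.15, n_consecutive: int = 5):
    diffs = [abs(h - g) for h, g in zip(human_avg_seq, ground_avg_seq)]
    run = 0
    for d in diffs:  # compare each depth difference with the threshold
        run = run + 1 if d < depth_diff_threshold else 0
        if run >= n_consecutive:
            return True  # the human body has fallen to the ground
    return False
```

Requiring a consecutive run rather than a single frame below threshold is what makes the method robust to one-off depth estimation errors.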
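Finally, the view-rotation check of claim 7 can be approximated with a lightweight visual odometry step: match features between adjacent frames and inspect the rotation implied by the estimated transform. The ORB-plus-homography approach and the angle threshold below are assumptions, one of several reasonable ways to compare pose changes between frames:

```python
# Sketch of claim 7's rotation check: ORB feature matching between adjacent
# frames, homography estimation, and a rotation-angle test. Feature counts
# and the angle threshold are illustrative assumptions.
import cv2
import numpy as np

def view_rotated(prev_gray, curr_gray, angle_threshold_deg: float = 2.0) -> bool:
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    if des1 is None or des2 is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    if len(matches) < 10:
        return False  # too few matches to estimate pose reliably
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    H, _ = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
    if H is None:
        return False
    # For a panning/tilting camera, the upper-left 2x2 block of H
    # approximates an in-plane rotation; recover its angle.
    angle = np.degrees(np.arctan2(H[1, 0], H[0, 0]))
    return abs(angle) > angle_threshold_deg
```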