
WO2022160591A1 - Crowd behavior detection method and apparatus, and electronic device, storage medium and computer program product

Info

Publication number
WO2022160591A1
Authority
WO
WIPO (PCT)
Prior art keywords
objects
target image
change information
image sequence
image
Application number
PCT/CN2021/103579
Other languages
French (fr)
Chinese (zh)
Inventor
韩志伟
刘诗男
杨昆霖
侯军
伊帅
Original Assignee
北京市商汤科技开发有限公司
Application filed by 北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority to KR1020237016722A (published as KR20230090344A)
Publication of WO2022160591A1

Classifications

    • G06V 40/20 - Recognition of movements or behaviour in image or video data, e.g. gesture recognition
    • G06N 3/084 - Neural network learning methods: backpropagation, e.g. using gradient descent
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/277 - Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/62 - Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 10/761 - Proximity, similarity or dissimilarity measures
    • G06V 10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 20/49 - Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06V 20/53 - Recognition of crowd images, e.g. recognition of crowd congestion
    • G06V 20/70 - Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06T 2207/10016 - Image acquisition modality: video; image sequence
    • G06T 2207/20081 - Special algorithmic details: training; learning
    • G06T 2207/20084 - Special algorithmic details: artificial neural networks [ANN]
    • G06T 2207/30196 - Subject of image: human being; person

Definitions

  • the present application relates to computer technology, and in particular to a crowd behavior detection method and device, electronic equipment, storage medium and computer program product.
  • a target image sequence (video sequence) including pedestrians can be captured by an image capturing device (e.g., a monitoring device). If it is determined that the pedestrian behavior occurring in the target image sequence belongs to abnormal behaviors such as pedestrian gathering or pedestrian staying, crowd evacuation can be arranged immediately to avoid events such as stampedes or group violence. It can be seen that there is an urgent need for a method for detecting crowd behavior in target image sequences.
  • the present application discloses at least one method for detecting crowd behavior.
  • the method includes: performing object tracking on at least one object appearing in a target image sequence including multiple objects, and determining the position change information of each object in the target image sequence; and performing graph convolution processing based on the position change information obtained in the target image sequence, and determining the crowd behaviors corresponding to the multiple objects in the target image sequence based on the extracted features obtained by the graph convolution.
  • performing object tracking on at least one object appearing in a target image sequence including multiple objects, and determining the position change information of each object in the target image sequence, includes: performing image processing on each image included in the target image sequence to determine the position information of each object in the corresponding image; and performing object tracking on each object to determine, based on the tracking result and the position information, the position change information of each object in the target image sequence.
  • the above performing object tracking on each object includes: using a Kalman filtering algorithm or an object detection model to perform object tracking on each object; and determining the position change information of each object based on the tracked position information of the same object in the corresponding images.
  • performing graph convolution processing based on the position change information obtained in the target image sequence to obtain the crowd behaviors corresponding to the multiple objects in the target image sequence includes: performing spatial graph convolution processing on the at least one image respectively, based on the object position information in the at least one image included in the target image sequence represented by the position change information and the connection relationship between the objects in the at least one image, to obtain graph features corresponding to the at least one image; and performing time-domain convolution processing on the graph features corresponding to the at least one image, and determining the crowd behaviors corresponding to the multiple objects in the target image sequence based on the extracted features obtained by the time-domain convolution processing.
  • the above crowd behavior includes at least one of the following: pedestrian gathering; pedestrian dispersion; pedestrian staying; pedestrian reverse flow.
  • performing spatial graph convolution processing on the at least one image respectively, based on the object position information in the at least one image included in the target image sequence represented by the position change information and the connection relationship between objects in the at least one image, to obtain the graph features corresponding to the at least one image, includes: determining an adjacency matrix corresponding to the at least one image based on the connection relationship between objects in the at least one image; determining a feature matrix corresponding to the at least one image based on the object position information; and completing the spatial graph convolution processing based on the adjacency matrix and the feature matrix to obtain the graph feature corresponding to each image.
  • before the step of performing graph convolution processing based on the position change information obtained in the target image sequence to obtain the extracted features corresponding to the target image sequence, the method further includes: determining the connection relationship between any two objects contained in at least one of the images included in the target image sequence.
  • determining the connection relationship between any two objects included in at least one image included in the target image sequence includes: extracting the image feature corresponding to the region in the image where each of the at least one object is located, where the image feature represents the image information of the location of the object; determining, based on the image features corresponding to the at least one object, the similarity between any two objects in the at least one object; and determining the two objects whose similarity reaches a first preset threshold as two objects having a connection relationship.
  • determining the connection relationship between any two objects included in at least one image included in the target image sequence includes: performing image processing on the at least one image respectively, and determining the position information of each object in the at least one image; determining the distance between any two objects in the at least one object based on the position information corresponding to the at least one object; and determining the connection relationship between any two objects included in the at least one image based on the distance.
  • determining the connection relationship between any two objects included in the at least one image based on the distance includes: mapping the determined distance between any two objects into the interval formed by a third preset threshold and a fourth preset threshold; determining the mapped distance between any two objects as the connection weight between the two objects; and indicating the connection relationship between the two objects by the connection weight between them.
  • the graph convolution processing is implemented by a graph convolution classification model; wherein the training method of the graph convolution classification model includes: generating a training sample, where the training sample contains the position change information of multiple objects and annotation information of the crowd behavior represented by the position change information of the multiple objects; and training a preset graph convolution model based on the position change information and the crowd behavior annotation information to obtain the graph convolution classification model.
  • the above generating a training sample includes: setting motion patterns corresponding to multiple objects based on a motion simulation platform; determining the position change information corresponding to at least one object based on the motion patterns; determining the crowd behavior represented by the position change information corresponding to the at least one object; and generating the training sample based on the position change information and the crowd behavior represented by it.
  • the present application further discloses a crowd behavior detection device, the device including: a position change information determination module, configured to determine, based on the object tracking result of at least one object appearing in a target image sequence including multiple objects, the position change information of each object in the target image sequence; and a crowd behavior detection module, configured to perform graph convolution processing based on the position change information obtained in the target image sequence, and to determine the crowd behaviors corresponding to the multiple objects in the target image sequence based on the extracted features obtained by the graph convolution.
  • the present application also discloses an electronic device, the device includes: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the executable instructions stored in the memory to implement the aforementioned crowd behavior Detection method.
  • the present application also discloses a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to execute the foregoing crowd behavior detection method.
  • the present application also discloses a computer program product, which, when the computer program product runs on a computer, enables the computer to execute the aforementioned method for detecting crowd behavior.
  • in the above solution, the position change information of the objects in the target image sequence is determined by performing object tracking on the objects appearing in the target image sequence.
  • graph convolution processing is then performed based on the position change information to obtain the extracted features corresponding to the target image sequence, and the crowd behaviors corresponding to the multiple objects in the target image sequence are determined based on the extracted features.
  • in this way, the principle of graph convolution is used to determine, from the target image sequence, extracted features that are beneficial to the detection of crowd behavior, so as to realize accurate detection of the crowd behavior represented by the target image sequence.
  • FIG. 1 is a flowchart of a target image sequence classification method shown in this application;
  • FIG. 2 is a schematic flowchart of crowd behavior detection shown in this application;
  • FIG. 3 is a flowchart of a method for determining a connection relationship between objects in an image shown in this application;
  • FIG. 4 is a schematic diagram of a graph convolution processing flow shown in this application;
  • FIG. 5 is a schematic diagram of a classification flow shown in this application;
  • FIG. 6 is a schematic diagram of a video sequence classification flow shown in this application;
  • FIG. 7 is a flowchart of a model training method shown in this application;
  • FIG. 8 is a schematic structural diagram of a crowd behavior detection device shown in this application;
  • FIG. 9 is a schematic diagram of a hardware structure of an electronic device shown in this application.
  • This application aims to propose a crowd behavior detection method (hereinafter referred to as detection method).
  • the detection method utilizes the principle of graph convolution to obtain, from the target image sequence and based on the position change information corresponding to each object appearing in it, extracted features useful for determining the crowd behaviors corresponding to the multiple objects in the target image sequence. The method can then perform classification based on the extracted features, so as to determine the crowd behaviors corresponding to the objects in the target image sequence.
  • the above-mentioned target image sequence may be a video sequence collected by monitoring; the above-mentioned object may be a pedestrian appearing in the above-mentioned target image sequence.
  • the above types of crowd behavior may include pedestrian gathering, pedestrian staying, and pedestrian dispersion.
  • the principle of graph convolution can be used to determine the extracted features that can be beneficial for determining crowd behavior based on the position change information of pedestrians in the video.
  • classification is performed based on the above-mentioned extracted features, so as to determine the crowd behavior that is occurring in the video sequence, and make corresponding arrangements according to the determined crowd behavior to reduce the probability of occurrence of dangerous events.
  • FIG. 1 is a flowchart of a method for classifying a target image sequence shown in this application.
  • the above method may include:
  • S102 Perform object tracking on at least one object appearing in a target image sequence including multiple objects, and determine position change information of each object in the target image sequence.
  • S104 Perform graph convolution processing based on the position change information obtained in the target image sequence, and determine crowd behaviors corresponding to the plurality of objects in the target image sequence based on the extracted features obtained by the graph convolution.
  • the above classification method can be applied to electronic equipment.
  • the above-mentioned electronic device may execute the above-mentioned classification method by carrying a software system corresponding to the classification method.
  • the above electronic devices may be notebook computers, desktop computers, servers, mobile phones, tablet (PAD) terminals, etc.
  • the present application does not specifically limit the specific types of the above electronic devices.
  • the above classification method can be executed only by the terminal device or the server device alone, or can be executed by the terminal device and the server device in cooperation.
  • the above classification method can be integrated in the client.
  • after receiving a classification request, the terminal device equipped with the client can provide computing power through its own hardware environment to execute the above classification method.
  • the above classification method can be integrated into the system platform.
  • the server device equipped with the system platform can provide computing power through its own hardware environment to execute the above classification method.
  • the above classification method can be divided into two tasks: acquiring the target image sequence and classifying the target image sequence.
  • the acquisition task can be integrated in the client and carried on the terminal device.
  • the classification task can be integrated on the server and carried on the server device.
  • the terminal device may initiate a classification request to the server device after acquiring the target image sequence.
  • the above-mentioned server device may execute the above-mentioned classification method on the above-mentioned target image sequence in response to the above-mentioned request.
  • the following description takes an electronic device (hereinafter referred to as the device) as the execution subject as an example.
  • FIG. 2 is a schematic diagram of a flow of crowd behavior detection shown in the present application.
  • the target image sequence may be acquired first.
  • the above target image sequence refers to an image sequence containing multiple pedestrian objects and requiring crowd behavior detection.
  • the target image sequence may include multiple frames of images.
  • the target image sequence may include a video sequence or a multi-frame discrete image sequence.
  • the above-mentioned video sequence includes N frames of consecutive images containing multiple objects; the above-mentioned N is a positive integer.
  • when acquiring the target image sequence, the device may interact with the user to complete the input of the target image sequence. For example, the device may provide, through the interface it carries, a window for inputting the target image sequence to be processed, so that the user can complete the input of the target image sequence based on this window.
  • the above-mentioned device may also be connected with an image acquisition device (eg, video surveillance) deployed on site, so as to acquire the target image sequence acquired by the above-mentioned image acquisition device from the above-mentioned image acquisition device.
  • S102 may be continued to perform object tracking on at least one object appearing in the target image sequence including multiple objects, to determine the position change information of each object in the target image sequence.
  • the above-mentioned object tracking specifically refers to tracking the same object appearing in each frame of images.
  • the same object appearing in each frame of images is determined to complete the object tracking.
  • the above object tracking is pedestrian tracking, which can be achieved by determining the same pedestrian appearing in each frame of image.
  • the above position change information may specifically indicate the movement track information of the object in the target image sequence. For example, in a specific scene, pedestrian tracking can be performed on pedestrians, and the position information of the same pedestrian in each frame of image can be determined, thereby determining the movement trajectory of the pedestrian in the image sequence. It can be understood that the above position change information can represent the object position information and time-domain information of the object in each image.
  • the above-mentioned object position information may represent object coordinates.
  • the above-mentioned time domain information can represent the time information corresponding to each position of the object.
  • the acquired target image sequence may be input into the object tracking unit to perform the above S102.
  • the above object tracking unit may execute S1022 through instructions executable by the device: an object position prediction model is used to perform position prediction processing on each of the above images, so as to determine the position information of each object in each image.
  • the above-mentioned object position prediction model includes a model trained based on several training samples marked with object position information.
  • the above-mentioned object position prediction model may be a neural network model constructed based on a deep convolutional network.
  • supervised training of the position prediction model can be performed using training samples marked with object position information until the model converges.
  • the object tracking unit may execute S1024 to perform object tracking on the object based on the location information, and determine the location change information of the object in the target image sequence.
  • the method of object tracking is not particularly limited in this application, and two object tracking methods are schematically given below.
  • Method 1 When performing S1024, a Kalman filter algorithm may be used to perform object tracking on each of the above objects, and to determine the position change information of each of the above objects.
  • in the acquisition order of the above images, starting from the first frame, two adjacent frames of images are successively taken as the current two frames, and the following steps are performed: the Kalman filtering algorithm is used to determine the position information corresponding to each object contained in the current two frames; then, through the Hungarian matching algorithm (a bipartite graph matching algorithm), the position information corresponding to each object contained in the first of the current two frames is matched against the position information corresponding to each object contained in the second of the current two frames.
  • when matching, the distance between the position information corresponding to each object included in the first image and the position information corresponding to each object included in the second image may be calculated. If a calculated distance is less than a preset standard threshold, the two pieces of position information corresponding to that distance can be determined to be a match.
  • the two objects corresponding to the matched position information may then be determined to be the same object appearing in the current two frames of images, thereby implementing object tracking on the objects.
  • the position change information of the object is determined based on the tracked position information of the same object in each image.
  • the same object appearing in each of the above images can be determined, so that the same object can be tracked in each image.
  • the position change information of the object in the above target image sequence can be determined based on the position information of the object in each image.
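  • to make the matching step of Method 1 concrete, the following is a minimal sketch assuming 2-D centroid positions per object; the Kalman prediction of per-object positions is elided, the function name and distance threshold are illustrative, and scipy's linear_sum_assignment stands in for the Hungarian matching algorithm.

```python
# Sketch of Method 1's matching step: associate the objects of two
# adjacent frames by pairwise distance, gated by a preset standard
# threshold. Positions would come from the Kalman filtering step.
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

def match_frames(prev_positions, curr_positions, dist_threshold=50.0):
    """Return pairs (i, j): object i in the first frame and detection j
    in the second frame deemed to be the same object."""
    # Pairwise Euclidean distances between the two frames' positions.
    cost = np.linalg.norm(
        prev_positions[:, None, :] - curr_positions[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    # Keep only matches whose distance is below the preset threshold.
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] < dist_threshold]

prev_positions = np.array([[10.0, 12.0], [100.0, 40.0]])
curr_positions = np.array([[101.0, 42.0], [12.0, 13.0]])
print(match_frames(prev_positions, curr_positions))  # [(0, 1), (1, 0)]
```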
  • Method 2 When performing S1024, the same object appearing in each of the above images may be determined based on the object detection model, so as to implement object tracking for each of the above objects.
  • the above-mentioned object detection models include models constructed based on deep learning networks.
  • the above-mentioned object detection model may specifically be a pre-trained semantic detection model (eg, models such as fast-rcnn, faster-rcnn, mask-rcnn, etc.).
  • the object feature corresponding to the pedestrian object included in the image can be detected through the detection model.
  • the aforementioned object features may be human face features. After the object features included in each image are detected, similarity calculation may be performed on the object features included in two different frames of images, and objects whose similarity reaches a second standard threshold are determined to be the same object.
  • the above object may be a pedestrian.
  • the face contained in each image can be detected by the above-mentioned object detection model.
  • similarity calculation may be performed on the face features included in two different frames of images, and faces whose similarity reaches the second standard threshold are determined to be the same face. After the same face is determined, it can be determined that the same pedestrian appears in the two frames of images.
  • the position change information of each object can be determined based on the tracked position information of the same object in each image.
  • the above position change information corresponding to each object may be stored in the form of a three-dimensional matrix (T*H*W).
  • the number of channels of the three-dimensional matrix may be the number of image frames included in the target image sequence; the elements of the three-dimensional matrix may be the position coordinates of the object in the image corresponding to the channel serial number. It can be understood that, at this time, the above-mentioned three-dimensional matrix can be determined as the feature matrix corresponding to the above-mentioned target image sequence.
  • the above-mentioned position change information has time-domain characteristics, and can indicate the change of the position coordinates during the movement of the object within the time-domain range shown by the above-mentioned target image sequence.
  • the motion characteristics of each object can be determined, that is, whether each object is gradually aggregated or gradually dispersed. Therefore, it is feasible to perform crowd behavior detection based on the location change information.
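  • as an illustration only (the text does not prescribe a concrete layout beyond the T*H*W matrix above), the position change information can be held as a frames-by-objects-by-coordinates array; the shapes and names below are assumptions.

```python
# One plausible in-memory layout for the position change information:
# T frames (the channel axis), N tracked objects, (x, y) per object.
import numpy as np

T, N = 8, 5
trajectories = np.zeros((T, N, 2))   # position of each object in each frame
trajectories[0, 0] = (12.0, 34.0)    # e.g., object 0's coordinates in frame 0
# Per-object displacement between consecutive frames, i.e. the time-domain
# signal that the later graph convolution consumes.
displacements = np.diff(trajectories, axis=0)  # shape (T-1, N, 2)
```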
  • S1042 may be performed first, and graph convolution processing is performed based on the above-mentioned position change information obtained in the above-mentioned target image sequence, so as to obtain the extracted features corresponding to the above-mentioned target image sequence.
  • the above extracted features specifically include a feature matrix or feature vector determined by performing graph convolution processing (including spatial graph convolution and time-domain graph convolution). It can be understood that the extracted features are determined based on the position change information of multiple pedestrian objects in the target image sequence, so the extracted features are beneficial for determining crowd behavior.
  • before the graph convolution processing, the connection relationship between objects in each image included in the target image sequence may be determined.
  • connection relationships determined by using different connection relationship determination rules have different meanings.
  • the connection relationship determined by the similarity between the image features corresponding to the regions where the two objects are located in the image can represent the degree of association between the two objects from the perspective of similarity.
  • the connection relationship determined by the distance between two objects can represent the degree of association between the two objects from the perspective of distance.
  • the image features corresponding to the regions where the objects included in the above images are located in the images may be extracted.
  • the above-mentioned image features represent the image information of the position of each object. If the image features of the two objects are relatively similar, it can be shown that the positions of the two objects are very similar, that is, the distances between the two objects are relatively close and have a connection relationship.
  • the similarity between any two objects in each object can be determined based on the image features corresponding to each object.
  • two objects corresponding to a degree of similarity that reaches a first preset threshold may be determined as two objects having a connection relationship.
  • the above-mentioned first preset threshold includes a threshold set according to experience. The above-mentioned first preset threshold is not particularly limited in this application.
  • the present application also does not specifically limit the method for calculating the similarity.
  • the above-mentioned method for calculating the similarity may be methods such as Euclidean distance, cosine distance, Mahalanobis distance, and the like.
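  • a minimal sketch of this similarity rule follows, assuming each object already has a feature vector extracted from its image region; cosine similarity and the threshold value are illustrative choices among the measures listed above.

```python
# Sketch: link two objects when the similarity of their region features
# reaches the first preset threshold (cosine similarity shown here).
import numpy as np

def connect_by_similarity(features, threshold=0.8):
    """features: (N, D) array, one image-feature vector per object.
    Returns an (N, N) 0/1 matrix of connection relationships."""
    normed = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T              # cosine similarity between objects
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)           # no self-connections
    return adj
```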
  • the connection relationship between the objects may be determined based on the distance between the objects.
  • FIG. 3 is a schematic flowchart of a method for determining a connection relationship shown in the present application.
  • S302 may be executed to perform image processing on each of the above-mentioned images, and to determine the position information of the above-mentioned object in each of the images.
  • S304 may be executed to determine the distance between any two objects in each object based on the position information corresponding to each object.
  • the connection relationship between any two objects included in each image may be determined based on the above distance.
  • two objects corresponding to a distance that does not reach a second preset threshold may be determined as two objects having a connection relationship.
  • the above-mentioned second preset threshold includes a threshold set according to experience. The above-mentioned second preset threshold is not particularly limited in this application.
  • if the distance between two objects does not reach the second preset threshold, the connection weight between the two objects is set to 1; otherwise, the connection weight between the two objects is set to 0.
  • when the connection relationship is determined by the distance between objects, the spatiotemporal graph determined based on the connection relationship can indicate the distance relationship between objects, and the extracted features determined after the graph convolution operation on the spatiotemporal graph can also include the distance information between objects. Therefore, when classifying crowd behaviors in the target image sequence based on the extracted features, the classification accuracy for behaviors such as pedestrian gathering, pedestrian dispersion or pedestrian staying can be improved.
  • connection weight between two objects can be determined according to the true distance between the two objects.
  • the determined distance between any two objects may be mapped into an interval formed by the third preset threshold and the fourth preset threshold.
  • the third preset threshold and the fourth preset threshold are empirical thresholds. In some examples, the third preset threshold is 0, and the fourth preset threshold is 1.
  • the mapped distance between any two objects can be determined as the connection weight between the two objects, and the connection relationship between the two objects is indicated by this connection weight.
  • the above space-time map can indicate the distance information that is closer to the actual, thereby further improving the classification accuracy.
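  • the following sketch illustrates the weighted variant, assuming the third and fourth preset thresholds are 0 and 1 as in the example above; min-max scaling is an assumed form of the mapping, and whether closer pairs should instead receive larger weights is left open by the text.

```python
# Sketch: map pairwise distances into [lo, hi] and use the mapped
# distance as the connection weight between each pair of objects.
import numpy as np

def distance_weights(positions, lo=0.0, hi=1.0):
    """positions: (N, 2) object coordinates in one image."""
    d = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    if d.max() == 0.0:                   # degenerate: all objects coincide
        return np.zeros_like(d)
    # Min-max map the distances into the [lo, hi] interval.
    w = lo + (d - d.min()) / (d.max() - d.min()) * (hi - lo)
    np.fill_diagonal(w, 0.0)
    return w
```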
  • after the connection relationship between any two objects included in each image of the target image sequence is determined, S104 may be continued.
  • the above S1042 can be implemented by a graph convolution model.
  • the above graph convolution model may be a model constructed based on a spatiotemporal graph convolution processing network.
  • the above spatiotemporal graph convolution network at least includes a spatial graph convolution network (GCN) for performing spatial graph convolution processing on each frame of image, and a temporal convolutional network (TCN) for performing time-domain convolution on the graph features corresponding to each frame of image.
  • FIG. 4 is a schematic diagram of a graph convolution processing flow shown in the present application.
  • the above position change information may be input into the GCN included in the graph convolution model to execute S402: based on the object position information in each image included in the target image sequence represented by the position change information, and the connection relationship between the objects in each image, spatial graph convolution processing is performed on each image respectively to obtain the graph features corresponding to each image.
  • an adjacency matrix corresponding to each image may be determined based on the connection relationship between objects in each image.
  • a topology map corresponding to each image may be generated first.
  • each object in each image can be used as the vertex V of the topology graph, and the edge E can be determined according to the connection relationship between the objects to obtain the topological graph corresponding to each image.
  • the adjacency matrix A corresponding to each of the above images can be determined based on the topological graph corresponding to that image, and the feature matrix X_0 corresponding to each image can be determined based on the above object position information.
  • the above-mentioned spatial graph convolution processing may be completed based on the above-mentioned adjacency matrix and the above-mentioned characteristic matrix, so as to obtain the graph features corresponding to each of the above-mentioned images.
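  • a minimal sketch of one spatial graph convolution layer follows; the symmetric normalization D^(-1/2)(A+I)D^(-1/2) and the ReLU are assumptions (the common GCN propagation rule), since the text only specifies that the layer consumes the adjacency matrix A and the feature matrix X_0.

```python
# Sketch of one spatial graph convolution (GCN) layer over a single frame.
import numpy as np

def spatial_gcn_layer(A, X, W):
    """A: (N, N) adjacency matrix, X: (N, D_in) feature matrix
    (e.g., object coordinates), W: (D_in, D_out) learned weights."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    # Normalized propagation followed by ReLU: the per-frame graph feature.
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)
```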
  • the graph features corresponding to the above images can then be input into the TCN included in the graph convolution model to execute S404: time-domain convolution processing is performed on the graph features corresponding to the images to obtain the extracted features corresponding to the target image sequence.
  • the graph features corresponding to each of the above images may be sorted according to the time-domain information represented by the position change information. Then, based on a preset one-dimensional convolution kernel, one-dimensional convolution processing is performed on the sorted graph features to obtain the extracted features corresponding to the target image sequence.
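  • the sketch below shows this time-domain step under the assumption that each frame's graph feature has been pooled to a single D-dimensional vector; the kernel size and the "valid" convolution are illustrative.

```python
# Sketch: slide a preset 1-D kernel over the time-ordered graph features.
import numpy as np

def temporal_conv(graph_feats, kernel):
    """graph_feats: (T, D), one graph feature per frame, sorted by the
    time-domain information; kernel: (K,) shared across feature dims."""
    T, D = graph_feats.shape
    K = len(kernel)
    out = np.empty((T - K + 1, D))
    for t in range(T - K + 1):
        # Weighted sum over a K-frame temporal window.
        out[t] = kernel @ graph_feats[t:t + K]
    return out  # the extracted features for the target image sequence
```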
  • S1044 may be continued to determine crowd behaviors corresponding to the above-mentioned objects in the above-mentioned target image sequence based on the above-mentioned extracted features.
  • the above-mentioned extracted features may be input into a pre-trained multi-classifier for classification, so as to obtain the above-mentioned crowd behavior.
  • FIG. 5 is a schematic diagram of a classification flow shown in this application.
  • the above-mentioned multi-classifier includes a downsampling unit and a fully connected layer.
  • the above-mentioned down-sampling unit may be used to process the extracted features to obtain corresponding feature vectors.
  • the above-mentioned down-sampling unit may be an average pooling unit.
  • the above-mentioned fully-connected layer is used to classify based on the above-mentioned feature vector, and obtain a confidence score corresponding to each preset classification type.
  • the above extracted features may be input into the down-sampling unit to execute S502, where the extracted features are average-pooled to obtain the corresponding feature vectors.
  • the feature vector can be input into the fully connected layer to execute S504, and the feature vector is fully connected to obtain the confidence score corresponding to each preset classification type.
  • the crowd behavior type corresponding to the maximum confidence score can be determined as the crowd behavior corresponding to the plurality of objects in the target image sequence.
  • the above crowd behavior includes at least one of the following: pedestrian gathering; pedestrian dispersion; pedestrian staying; pedestrian reverse flow.
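  • putting S502-S504 together, the following is a minimal sketch of the multi-classifier head, assuming the extracted features arrive as a (T', D) matrix; the weight shapes are illustrative.

```python
# Sketch of the multi-classifier: average pooling, a fully connected
# layer, then the class with the maximum confidence score.
import numpy as np

BEHAVIORS = ["pedestrian gathering", "pedestrian dispersion",
             "pedestrian staying", "pedestrian reverse flow"]

def classify(extracted, W, b):
    """extracted: (T', D) features from the temporal convolution;
    W: (D, 4) fully connected weights, b: (4,) bias."""
    pooled = extracted.mean(axis=0)   # S502: average-pooling / downsampling
    scores = pooled @ W + b           # S504: confidence per preset class
    return BEHAVIORS[int(np.argmax(scores))]
```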
  • the position change information of the above-mentioned object in the above-mentioned target image sequence is determined by performing object tracking on the object appearing in the above-mentioned target image sequence. Then, graph convolution processing is performed based on the above position change information to obtain extracted features corresponding to the above target image sequence, and based on the above extracted features, crowd behaviors corresponding to the plurality of above objects in the above target image sequence are determined. In this way, the principle of graph convolution is used to determine the extraction features that are beneficial to the detection of crowd behavior from the target image sequence, so as to realize the accurate detection of crowd behavior represented by the target image sequence.
  • Embodiments are described below in combination with security scenarios.
  • monitoring equipment is usually deployed in the above security scenarios.
  • the monitoring equipment typically captures video sequences. It can be understood that, in the security scenario, it is actually the video sequences collected by the monitoring equipment that are classified.
  • FIG. 6 is a schematic diagram of a video sequence classification flow shown in the present application.
  • S602 may be performed based on the coordinate determination unit to perform image processing on each image included in the target video sequence, to determine the position information of pedestrians appearing in the video in each image.
  • S604 may be executed based on the pedestrian tracking unit, and based on the location information, object tracking is performed on the pedestrian to determine the location change information of the pedestrian in the target image sequence.
  • S606 may be performed based on the graph convolution model included in the graph convolution classification model: graph convolution processing is performed based on the position change information to obtain the extracted features corresponding to the target image sequence.
  • the above graph convolution classification model may specifically be a classification model constructed based on a graph convolution model and a multi-classification model.
  • a graph convolution operation can be performed on the spatiotemporal graph to determine the extracted features corresponding to the spatiotemporal graph; classification is then performed based on the extracted features to determine the classification type of the video sequence.
  • S608 may be performed based on the multi-classification model included in the graph convolution classification model to determine crowd behaviors corresponding to the objects in the target image sequence based on the extraction features.
  • in this way, the principle of graph convolution is first used to determine, based on the position change information of the pedestrians in the video sequence, extracted features that can reflect the distance change information of each pedestrian in the video sequence. The classification type of the video sequence is then determined based on the extracted features, so as to determine the pedestrian behavior occurring in the video sequence and make corresponding arrangements according to the determined pedestrian behavior, reducing the probability of occurrence of dangerous events.
  • the above graph convolution classification model can be used to implement the above graph convolution processing.
  • the graph convolutional classification models described above may include graph convolutional models as well as multi-classification models.
  • the above-mentioned graph convolution model may use the position change information of each object in the target image sequence as input to perform graph convolution processing, and obtain the extracted features corresponding to the above-mentioned target image sequence.
  • the above-mentioned multi-classification model may take the above-mentioned extracted features as input, and perform classification processing on the above-mentioned extracted features, so as to obtain the crowd behavior represented by the above-mentioned target image sequence.
  • the training of the graph convolution classification model is actually a process of determining the model parameters included in the above graph convolution model and the above multi-classification model.
  • a model training method is proposed in this application.
  • the method trains the graph convolution classification model by constructing virtual training samples, so that model training can also be achieved in the absence of real samples.
  • FIG. 7 is a method flowchart of a model training method shown in this application.
  • the above training method includes: S702, generating a training sample, where the training sample contains the position change information of multiple objects and annotation information of the crowd behavior represented by the position change information of the multiple objects.
  • S7022 may be executed first, and based on the motion simulation platform, the motion mode corresponding to the object appearing in the video is set.
  • the above-mentioned motion simulation platform is specifically any platform that can perform motion simulation.
  • the motion simulation platform described above may be a game development platform.
  • the above-mentioned movement mode may include speed and movement direction.
  • the coordinates of the objects in each frame of images included in the above video can be determined, so as to determine the position change information of each object in the above video.
  • the crowd behavior represented by the above video can thus be obtained. For example, in a security scene, when the motion patterns of the pedestrians all head toward the same location, the crowd behavior represented by the video can be determined to be pedestrian gathering; otherwise, the crowd behavior represented by the video can be determined to be pedestrian dispersion.
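  • as a stand-in for a real motion simulation platform, the sketch below synthesizes labeled trajectories; the convergence-to-a-point dynamics, speeds and scene size are all illustrative assumptions.

```python
# Sketch: generate virtual trajectories whose motion patterns head toward
# a common location ("pedestrian gathering") or away from it
# ("pedestrian dispersion"), together with the crowd behavior label.
import numpy as np

def make_sample(n_objects=10, n_frames=20, gather=True, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    center = np.array([50.0, 50.0])
    pos = rng.uniform(0.0, 100.0, size=(n_objects, 2))
    frames = []
    for _ in range(n_frames):
        step = (center - pos) if gather else (pos - center)
        step /= np.linalg.norm(step, axis=1, keepdims=True) + 1e-8
        pos = pos + step                  # unit-speed motion per frame
        frames.append(pos.copy())
    label = "pedestrian gathering" if gather else "pedestrian dispersion"
    return np.stack(frames), label        # (T, N, 2) positions + label
```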
  • S7024 may be executed to determine the position change information corresponding to each object based on the above motion mode, and determine the crowd behavior represented by the position change information corresponding to each object.
  • the above-mentioned crowd behaviors may include pedestrian gathering, pedestrian dispersion, and pedestrian retention.
  • S7026 may be executed to generate the training sample based on the location change information and the crowd behavior represented by the location change information.
  • the position change information and the above classification types may be encoded by means of one-hot encoding, so as to obtain several training samples.
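  • a one-hot encoding of the crowd behavior label could look like the following sketch, shown here for the label side only; the label set is taken from the behaviors above and the helper itself is illustrative.

```python
# Sketch: one-hot encode a crowd behavior label for training.
import numpy as np

BEHAVIORS = ["pedestrian gathering", "pedestrian dispersion",
             "pedestrian staying"]

def one_hot(label):
    v = np.zeros(len(BEHAVIORS))
    v[BEHAVIORS.index(label)] = 1.0
    return v

print(one_hot("pedestrian dispersion"))  # [0. 1. 0.]
```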
  • the present application does not limit the specific manner of the above encoding.
  • S704 may be continued, and the above-mentioned graph convolution classification model is trained based on the preset loss information and the above-mentioned training samples, until the model converges.
  • the above-mentioned preset loss information may be loss information set according to experience.
  • the above graph convolution classification model (hereinafter referred to as the model) may be supervised training based on the above training samples.
  • forward propagation can be performed to obtain the computational results output by the model.
  • the error between the real classification type and the above calculation result can be evaluated based on the above preset loss information.
  • the stochastic gradient descent method can be used to determine the descending gradient.
  • the model parameters corresponding to the model can be updated based on backpropagation.
  • the above process can then be repeated until the model converges.
  • the condition for the convergence of the above model may be, for example, that the preset number of training times is reached, or the variation of the error obtained after M consecutive forward propagations is less than a certain threshold.
  • the present application does not specifically limit the conditions for model convergence.
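  • the loop below sketches this training procedure end to end; a linear softmax classifier stands in for the graph convolution classification model, full-batch gradient descent stands in for the stochastic variant, and all hyperparameters are illustrative.

```python
# Sketch of the training loop: forward propagation, loss against the
# labeled crowd behavior, gradient descent via backpropagation, repeated
# until a convergence condition (max epochs, or the change in error over
# M consecutive passes falling below a threshold).
import numpy as np

def train(X, y, n_classes, lr=0.1, max_epochs=500, tol=1e-6, M=5):
    """X: (B, D) sample features, y: (B,) integer class labels."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    losses = []
    for _ in range(max_epochs):
        logits = X @ W                                   # forward propagation
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)                # softmax
        loss = -np.log(p[np.arange(len(y)), y]).mean()   # cross-entropy error
        grad = p.copy()
        grad[np.arange(len(y)), y] -= 1.0                # d(loss)/d(logits)
        W -= lr * X.T @ grad / len(y)                    # gradient descent step
        losses.append(loss)
        # Converged: the error barely changed over the last M passes.
        if len(losses) > M and abs(losses[-M] - losses[-1]) < tol:
            break
    return W
```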
  • since constructed virtual training samples are used to train the graph convolution classification model, the training process does not need to rely on real training samples.
  • in some embodiments, the object position prediction model for determining object positions, the object tracking model for object tracking, and the graph convolution classification model for graph convolution processing and classification may also be jointly trained.
  • videos representing pedestrian gathering and pedestrian dispersion can be constructed through a motion simulation platform, and crowd behavior annotations can be performed on the constructed videos to obtain training samples.
  • the training samples can be input into the above-mentioned object position prediction model to obtain the first calculation result.
  • the above-mentioned first calculation result is input into the above-mentioned object tracking model to obtain the second calculation result.
  • the above-mentioned second calculation result is input into the above-mentioned graph convolution classification model to obtain the crowd behavior detection result for the video representation.
  • according to the label information corresponding to the above constructed videos, the parameter update of each model can be completed by using the back-propagation method.
  • joint training of each model can be realized to improve training efficiency.
  • the present application further provides a crowd behavior detection device.
  • FIG. 8 is a schematic structural diagram of a crowd behavior detection apparatus shown in the present application.
  • the above apparatus 80 includes: a position change information determination module 81, configured to determine, based on the object tracking result of at least one object appearing in a target image sequence including multiple objects, the position change information of each object in the target image sequence; and a crowd behavior detection module 82, configured to perform graph convolution processing based on the position change information obtained in the target image sequence, and to determine the crowd behaviors corresponding to the multiple objects in the target image sequence based on the extracted features obtained by the graph convolution.
  • the above position change information determination module 81 is specifically configured to: perform image processing on each image included in the target image sequence respectively, to determine the position information of each object in each image; and perform object tracking on each object, so as to determine, based on the tracking result and the position information, the position change information of each object in the target image sequence.
  • the above position change information determination module 81 is specifically configured to: use a Kalman filtering algorithm or an object detection model to perform object tracking on each object; and determine the position change information of each object based on the tracked position information of the same object in the corresponding images.
  • the above crowd behavior detection module 82 includes: a spatial graph convolution module, configured to perform spatial graph convolution processing on each image respectively, based on the object position information in each image included in the target image sequence represented by the position change information and the connection relationship between the objects in each image, to obtain the graph features corresponding to each image; and a time-domain convolution module, configured to perform time-domain convolution processing on the graph features corresponding to each image, and to determine the crowd behaviors corresponding to the multiple objects in the target image sequence based on the extracted features obtained by the time-domain convolution processing.
  • the crowd behaviors include at least one of the following: pedestrian gathering; pedestrian dispersion; pedestrian staying; pedestrian reverse flow.
  • the above spatial graph convolution module is specifically configured to: determine the adjacency matrix corresponding to each image based on the connection relationship between the objects in each image; determine the feature matrix corresponding to each image based on the object position information; and complete the spatial graph convolution processing based on the adjacency matrix and the feature matrix to obtain the graph feature corresponding to each image.
  • the above-mentioned apparatus 80 further includes: a connection relationship determination module, configured to determine the connection relationship between any two objects included in each image included in the above-mentioned target image sequence.
  • the connection relationship determination module is specifically configured to: extract the image features corresponding to the regions in the image where the objects are located, the image features representing the image information of the locations of the objects; determine, based on the image features corresponding to each object, the similarity between any two objects; and determine the two objects whose similarity reaches the first preset threshold as two objects having a connection relationship.
  • the connection relationship determination module is specifically configured to: perform image processing on each image respectively, to determine the position information of each object in each image; determine the distance between any two objects based on the position information corresponding to each object; and determine the connection relationship between any two objects included in each image based on the distance.
  • the above connection relationship determination module is specifically configured to: map the determined distance between any two objects into the interval formed by the third preset threshold and the fourth preset threshold; determine the mapped distance between any two objects as the connection weight between the two objects; and indicate the connection relationship between the two objects by the connection weight between them.
  • the graph convolution processing is implemented by a graph convolution classification model, and the training apparatus for the graph convolution classification model includes: a generating module, configured to generate training samples, where the training samples contain position change information of multiple objects and annotation information of the crowd behavior represented by that position change information; and a training module, configured to train a preset graph convolution model based on the position change information and the crowd behavior annotation information to obtain the graph convolution classification model.
  • the generating module is specifically configured to: set motion patterns corresponding to multiple objects based on a motion simulation platform; determine the position change information corresponding to each object based on the motion patterns; determine the crowd behavior represented by the position change information; and generate the training samples based on the position change information and the crowd behavior it represents.
  • the embodiments of the crowd behavior detection apparatus shown in this application can be applied to electronic devices. Accordingly, the present application discloses an electronic device, which may include: a processor, and a memory for storing instructions executable by the processor, where the processor is configured to invoke the executable instructions stored in the memory to implement the crowd behavior detection method shown in any of the above embodiments.
  • FIG. 9 is a schematic diagram of a hardware structure of an electronic device shown in this application.
  • the electronic device may include a processor for executing instructions, a network interface for network connection, a memory for storing operating data for the processor, and a non-volatile memory for storing instructions corresponding to the crowd behavior detection apparatus.
  • the embodiments of the foregoing apparatus may be implemented by software, by hardware, or by a combination of software and hardware.
  • taking software implementation as an example, an apparatus in the logical sense is formed by the processor of the electronic device in which it is located reading the corresponding computer program instructions from the non-volatile memory into the memory and running them.
  • in addition to the components described above, the electronic device in which the apparatus is located may also include other hardware according to the actual functions of the electronic device; no further details are given here. It can be understood that, in order to improve the processing speed, the instructions corresponding to the crowd behavior detection apparatus may also be stored directly in the memory, which is not limited herein.
  • This application proposes a computer-readable storage medium, which may be a volatile storage medium or a non-volatile storage medium. The storage medium stores a computer program, and the computer program is used to execute the crowd behavior detection method shown in any of the foregoing embodiments.
  • one or more embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
  • Embodiments of the subject matter and functional operations described in this application can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware (which can include the structures disclosed in this application and their structural equivalents), or in a combination of one or more of these.
  • Embodiments of the subject matter described in this application may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier, for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions may be encoded in an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
  • the processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows described above can also be performed by, and apparatus can also be implemented as, special-purpose logic circuitry, e.g., an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • a computer suitable for the execution of a computer program may include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from read only memory and/or random access memory.
  • the basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • generally, a computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or will be operatively coupled to such devices to receive data from them, transfer data to them, or both. However, a computer need not have such devices.
  • moreover, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.

Abstract

Provided are a crowd behavior detection method and apparatus, and an electronic device, a storage medium and a computer program product. The method may comprise: performing object tracking on at least one object appearing in a target image sequence that includes a plurality of objects, and determining location change information of each object in the target image sequence; and performing graph convolution processing on the basis of the location change information obtained from the target image sequence, and determining, on the basis of the extracted features obtained by means of the graph convolution processing, crowd behaviors corresponding to the plurality of objects in the target image sequence.

Description

Crowd behavior detection method and apparatus, electronic device, storage medium and computer program product
CROSS-REFERENCE TO RELATED APPLICATIONS
This patent application claims priority to the Chinese patent application filed on January 26, 2021, with application number 2021101062857 and invention title "Crowd Behavior Detection Method and Apparatus, Electronic Device and Storage Medium", which is incorporated herein by reference.
Technical Field
The present application relates to computer technology, and in particular to a crowd behavior detection method and apparatus, an electronic device, a storage medium, and a computer program product.
Background
With the advancement of urbanization, crowds are becoming more and more concentrated, so identifying whether abnormal behavior occurs in a crowd, and what that abnormal behavior is, is very important for pedestrian safety. If abnormal crowd behavior can be accurately identified, stopped, and prevented, the probability of dangerous events can be reduced.
For example, in a security scenario, a target image sequence (a video sequence) containing pedestrians can be captured by an image acquisition device (e.g., a monitoring device). If it is determined that the pedestrian behavior occurring in the target image sequence is an abnormal behavior such as pedestrian gathering or pedestrian retention, crowd evacuation can be arranged immediately to avoid events such as stampedes or malicious group incidents. There is thus an urgent need for a method of detecting crowd behavior in a target image sequence.
Summary of the Invention
In view of this, the present application discloses at least a crowd behavior detection method. The method includes: performing object tracking on at least one object appearing in a target image sequence containing multiple objects, and determining position change information of each object in the target image sequence; and performing graph convolution processing based on the position change information obtained from the target image sequence, and determining, based on the extracted features obtained by the graph convolution, crowd behaviors corresponding to the multiple objects in the target image sequence.
In some of the illustrated embodiments, performing object tracking on at least one object appearing in a target image sequence containing multiple objects and determining the position change information of each object in the target image sequence includes: performing image processing on each image included in the target image sequence to determine the position information of each object in the corresponding image; and performing object tracking on each object, so as to determine the position change information of each object in the target image sequence based on the tracking results and the position information.
In some of the illustrated embodiments, performing object tracking on each object, so as to determine the position change information of each object in the target image sequence based on the tracking results and the position information, includes: using a Kalman filter algorithm or an object detection model to perform object tracking on each object; and determining the position change information of each object based on the tracked position information of the same object in the corresponding images.
In some of the illustrated embodiments, performing graph convolution processing based on the position change information obtained from the target image sequence to obtain the crowd behaviors corresponding to the multiple objects in the target image sequence includes: performing spatial graph convolution processing on at least one image included in the target image sequence, based on the object position information in the at least one image represented by the position change information and the connection relationships between objects in the at least one image, to obtain graph features corresponding to the at least one image; and performing temporal convolution processing on the graph features corresponding to the at least one image, and determining, based on the extracted features obtained by the temporal convolution processing, the crowd behaviors corresponding to the multiple objects in the target image sequence; the crowd behaviors include at least one of the following: pedestrian gathering, pedestrian dispersion, pedestrian retention, and pedestrian counterflow.
In some of the illustrated embodiments, performing spatial graph convolution processing on the at least one image, based on the object position information represented by the position change information and the connection relationships between the objects, to obtain the corresponding graph features, includes: determining the adjacency matrix corresponding to the at least one image based on the connection relationships between the objects in the at least one image; determining the feature matrix corresponding to the at least one image based on the object position information; and completing the spatial graph convolution processing based on the adjacency matrix and the feature matrix to obtain the graph feature corresponding to each image.
In some of the illustrated embodiments, before the step of performing graph convolution processing based on the position change information obtained from the target image sequence to obtain the extracted features corresponding to the target image sequence, the method further includes: determining the connection relationship between any two objects included in at least one image of the target image sequence.
In some of the illustrated embodiments, determining the connection relationship between any two objects included in at least one image of the target image sequence includes: extracting the image features corresponding to the regions in which the objects included in the at least one image are located, where the image features represent the image information of the locations of the objects; determining, based on the image features corresponding to the objects, the similarity between any two objects; and determining two objects whose similarity does not reach the first preset threshold as two objects having a connection relationship.
In some of the illustrated embodiments, determining the connection relationship between any two objects included in at least one image of the target image sequence includes: performing image processing on the at least one image to determine the position information of the objects in the at least one image; determining the distance between any two objects based on the position information corresponding to the objects; and determining the connection relationship between any two objects included in the at least one image based on the distance.
In some of the illustrated embodiments, determining the connection relationship between any two objects included in the at least one image based on the distance includes: mapping the determined distance between any two objects into the interval formed by the third preset threshold and the fourth preset threshold; determining the mapped distance between the two objects as the connection weight between them; and indicating the connection relationship between the two objects by this connection weight.
In some of the illustrated embodiments, the graph convolution processing is implemented by a graph convolution classification model, and the training method for the graph convolution classification model includes: generating training samples, where the training samples contain position change information of multiple objects and annotation information of the crowd behavior represented by the position change information; and training a preset graph convolution model based on the position change information and the crowd behavior annotation information to obtain the graph convolution classification model.
In some of the illustrated embodiments, generating the training samples includes: setting motion patterns corresponding to multiple objects based on a motion simulation platform; determining the position change information corresponding to at least one object based on the motion patterns; determining the crowd behavior represented by the position change information; and generating the training samples based on the position change information and the crowd behavior it represents.
The present application further discloses a crowd behavior detection apparatus, including: a position change information determination module, configured to determine the position change information of each object in a target image sequence based on the object tracking results of at least one object appearing in the target image sequence containing multiple objects; and a crowd behavior detection module, configured to perform graph convolution processing based on the position change information obtained from the target image sequence and determine, based on the extracted features obtained by the graph convolution, the crowd behaviors corresponding to the multiple objects in the target image sequence.
The present application further discloses an electronic device, including: a processor, and a memory for storing instructions executable by the processor, where the processor is configured to invoke the executable instructions stored in the memory to implement the aforementioned crowd behavior detection method.
The present application further discloses a computer-readable storage medium, where the storage medium stores a computer program used to execute the aforementioned crowd behavior detection method.
The present application further discloses a computer program product which, when run on a computer, causes the computer to execute the aforementioned crowd behavior detection method.
In the present application, object tracking is performed on the objects appearing in a target image sequence to determine the position change information of the objects in the target image sequence. Graph convolution processing is then performed based on the position change information to obtain extracted features corresponding to the target image sequence, and the crowd behaviors corresponding to the multiple objects in the target image sequence are determined based on the extracted features. In this way, the principle of graph convolution is used to determine, from the target image sequence, extracted features that are useful for detecting crowd behavior, thereby achieving accurate detection of the crowd behavior represented by the target image sequence.
Description of the Drawings
FIG. 1 is a flowchart of a target image sequence classification method shown in this application;
FIG. 2 is a schematic flowchart of crowd behavior detection shown in this application;
FIG. 3 is a flowchart of a method for determining connection relationships between objects in an image shown in this application;
FIG. 4 is a schematic flowchart of graph convolution processing shown in this application;
FIG. 5 is a schematic flowchart of classification shown in this application;
FIG. 6 is a schematic flowchart of video sequence classification shown in this application;
FIG. 7 is a flowchart of a model training method shown in this application;
FIG. 8 is a schematic structural diagram of a crowd behavior detection apparatus shown in this application;
FIG. 9 is a schematic diagram of a hardware structure of an electronic device shown in this application.
Detailed Description
Exemplary embodiments will be described in detail below, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application; rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as recited in the appended claims.
The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "the above", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if", as used herein, can be interpreted as "at the time of", "when", or "in response to determining", depending on the context.
This application aims to propose a crowd behavior detection method (hereinafter referred to as the detection method). The method uses the principle of graph convolution to obtain, from the target image sequence and based on the position change information corresponding to each object appearing in it, extracted features that are useful for determining the crowd behaviors corresponding to the multiple objects in the target image sequence. The method can then classify based on the extracted features, thereby determining the crowd behaviors corresponding to the multiple objects in the target image sequence.
For example, in a particular scenario, the target image sequence may be a video sequence captured by surveillance, and the objects may be pedestrians appearing in the target image sequence. The types of crowd behavior may include pedestrian gathering, pedestrian retention, pedestrian dispersion, and so on. Using the above method, the principle of graph convolution can be used to determine, based on the position change information of pedestrians in the video, extracted features that are useful for determining crowd behavior. Classification is then performed based on the extracted features, so as to determine the crowd behavior occurring in the video sequence and make corresponding arrangements according to the determined behavior, reducing the probability of dangerous events.
Please refer to FIG. 1, which is a flowchart of a target image sequence classification method shown in this application.
As shown in FIG. 1, the method may include:
S102: performing object tracking on at least one object appearing in a target image sequence containing multiple objects, and determining the position change information of each object in the target image sequence.
S104: performing graph convolution processing based on the position change information obtained from the target image sequence, and determining, based on the extracted features obtained by the graph convolution, the crowd behaviors corresponding to the multiple objects in the target image sequence.
The above classification method can be applied to an electronic device, which may execute the method by running a corresponding software system. The electronic device may be a notebook computer, a computer, a server, a mobile phone, a PAD terminal, and so on; the present application does not specifically limit its type.
It can be understood that the classification method may be executed by a terminal device or a server device alone, or by a terminal device and a server device in cooperation.
For example, the classification method may be integrated in a client. After receiving a classification request, the terminal device running the client can execute the method using the computing power of its own hardware environment.
For another example, the classification method may be integrated in a system platform. After receiving a classification request, the server device running the system platform can execute the method using the computing power of its own hardware environment.
As yet another example, the classification method may be divided into two tasks: acquiring the target image sequence, and classifying the target image sequence. The acquisition task may be integrated in a client and run on a terminal device, while the classification task may be integrated on a server and run on a server device. The terminal device may initiate a classification request to the server device after acquiring the target image sequence, and the server device, in response to the request, executes the classification method on the target image sequence. In the following, an electronic device (hereinafter referred to as the device) is taken as the execution subject for description.
Please continue to refer to FIG. 2, which is a schematic flowchart of crowd behavior detection shown in the present application.
Before performing the process shown in FIG. 2, the target image sequence may be acquired. The target image sequence refers to an image sequence that contains multiple pedestrian objects and requires crowd behavior detection; it may include multiple frames of images.
In some examples, the target image sequence may include a video sequence or a sequence of multiple discrete images. The video sequence includes N consecutive frames containing multiple objects, where N is a positive integer.
In some examples, when acquiring the target image sequence, the device may complete the input of the sequence by interacting with the user. For example, the device may provide, through its interface, a window for the user to input the target image sequence to be processed, and the user can complete the input through that window.
In some examples, the device may also be connected to an image acquisition device deployed on site (e.g., video surveillance), so as to obtain from it the target image sequence it has captured.
After acquiring the target image sequence, S102 may be performed: object tracking is performed on at least one object appearing in the target image sequence containing multiple objects, and the position change information of each object in the sequence is determined. Object tracking here specifically refers to tracking the same object appearing across the frames; tracking is completed once the same object appearing in each frame is determined. For example, in a security scenario, object tracking is pedestrian tracking, which can be achieved by determining the same pedestrian appearing in each image.
The position change information specifically indicates the trajectory of an object in the target image sequence. For example, in a particular scenario, pedestrian tracking can determine the position information of the same pedestrian in each frame, and thereby the pedestrian's trajectory in the image sequence. It can be understood that the position change information can represent both the object position information in each image (i.e., the object coordinates) and the temporal information, i.e., the time corresponding to each position of the object.
Please continue to refer to FIG. 2. In this application, the acquired target image sequence may be input to an object tracking unit to perform S102.
The object tracking unit may, through device-executable instructions, perform S1022: using an object position prediction model, position prediction processing is performed on each image to determine the position information of each object in each image. The object position prediction model is a model trained on a number of training samples annotated with object position information.
It can be understood that the object position prediction model may be a neural network model built on a deep convolutional network. Before the model is used for position prediction, it can be trained in a supervised manner with training samples annotated with object position information until the model converges.
After the position information is determined, the object tracking unit may perform S1024: based on the position information, object tracking is performed on the objects to determine their position change information in the target image sequence.
The method of object tracking is not particularly limited in this application; two object tracking methods are given below for illustration.
Method 1: when performing S1024, a Kalman filter algorithm may be used to track each object and determine its position change information.
In some examples, following the capture order of the images and starting from the first frame, every two adjacent frames are in turn taken as the current two frames, and the following steps are performed: the Kalman filter algorithm is used to determine the position information of each object contained in the current two frames; the Hungarian matching algorithm (a bipartite graph matching algorithm) is then used to match the position information of the objects contained in the first of the two frames against the position information of the objects contained in the second.
In the matching operation, the distances between the position information of the objects in the first image and the position information of the objects in the second image can be calculated. If a calculated distance is less than a preset standard threshold, the two pieces of position information corresponding to that distance are determined to be a matched pair.
After the matching operation, the two objects corresponding to a matched pair of position information can be determined to be the same object appearing in the current two frames, thereby achieving object tracking.
After the above steps have been performed for all adjacent frames, the position change information of each object is determined based on the tracked position information of the same object in each image.
In this way, the same object appearing in each image can be determined, so that the same object is tracked across the images. Once the object is tracked, its position change information in the target image sequence can be determined from its position information in each image.
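As an illustrative sketch only (not the claimed implementation), the matching step of Method 1 could look as follows, assuming per-frame detections are given as 2-D coordinates and that the Kalman prediction step has already produced the positions for the first frame; the helper name and the default value of `std_threshold` (standing in for the preset standard threshold) are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_adjacent_frames(prev_positions, curr_positions, std_threshold=50.0):
    """Hungarian (bipartite) matching of objects between two adjacent frames.

    prev_positions: (M, 2) object coordinates in the first frame.
    curr_positions: (N, 2) object coordinates in the second frame.
    Returns (prev_idx, curr_idx) pairs whose distance is below the threshold,
    i.e. detections treated as the same object appearing in both frames.
    """
    # Pairwise Euclidean distances between all objects of the two frames.
    cost = np.linalg.norm(
        prev_positions[:, None, :] - curr_positions[None, :, :], axis=-1)
    row_idx, col_idx = linear_sum_assignment(cost)  # minimum-cost matching
    # Keep only matches whose distance stays below the preset standard threshold.
    return [(r, c) for r, c in zip(row_idx, col_idx) if cost[r, c] < std_threshold]
```

Repeating this over every pair of adjacent frames chains the matched indices into per-object trajectories.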
Method 2: when performing S1024, the same object appearing in each image may be determined based on an object detection model, so as to track each object.
The object detection model is a model built on a deep learning network, for example a pre-trained semantic detection model (e.g., fast-rcnn, faster-rcnn, or mask-rcnn). The detection model can detect the object features corresponding to the pedestrian objects included in an image. In some examples, the object features may be face features. After the object features in each image are detected, similarity can be computed between the object features contained in two different frames, and objects whose similarity reaches a second standard threshold are determined to be the same object.
For example, in a security scenario, the objects may be pedestrians. The object detection model can then detect the faces contained in each image. After the faces are detected, similarity is computed between the face features contained in two different frames, and faces whose similarity reaches the second standard threshold are determined to be the same face; determining the same face in turn determines that the same pedestrian appears in the two frames.
After the same object appearing in each frame is determined, the position change information of each object can be determined based on the tracked position information of the same object in each image.
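A minimal sketch of the similarity test in Method 2, assuming the detection model has already produced per-detection appearance features (e.g., face features); the function name and the default value of `sim_threshold` (standing in for the second standard threshold) are illustrative assumptions:

```python
import numpy as np

def match_by_appearance(feats_a, feats_b, sim_threshold=0.8):
    """Treat two detections from different frames as the same object when the
    cosine similarity of their appearance features reaches the threshold.

    feats_a: (M, D) features from one frame; feats_b: (N, D) from another.
    Returns a list of (i, j) index pairs judged to be the same object.
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T  # (M, N) pairwise cosine similarities
    return [(i, j) for i in range(sim.shape[0])
            for j in range(sim.shape[1]) if sim[i, j] >= sim_threshold]
```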
In some examples, after the position change information corresponding to the objects is determined, it may be stored in the form of a three-dimensional matrix (T*H*W), where the number of channels of the matrix is the number of image frames in the target image sequence, and the elements of the matrix are the position coordinates of the objects in the image with the corresponding channel index. It can be understood that this three-dimensional matrix can then be taken as the feature matrix corresponding to the target image sequence.
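One plausible concrete layout for this storage, given only as a hedged sketch: with N tracked objects carrying 2-D coordinates, the T*H*W matrix described above becomes a (T, N, 2) array whose channel t holds every object's coordinates in frame t; the helper name is hypothetical:

```python
import numpy as np

def build_trajectory_tensor(per_frame_positions):
    """Stack per-frame, per-object coordinates into one 3-D array.

    per_frame_positions: list of T arrays, each (N, 2), where row i of every
    array refers to the same tracked object i (identities already aligned).
    Returns a (T, N, 2) float array holding the position change information.
    """
    return np.stack(per_frame_positions, axis=0).astype(np.float32)
```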
It can be understood that the position change information has a temporal character: it indicates how the position coordinates of an object change as it moves within the time range covered by the target image sequence. Based on the position change information of each object appearing in the sequence, the motion characteristics of the objects can be determined, i.e., whether the objects are gradually gathering or gradually dispersing. It is therefore feasible to perform crowd behavior detection based on this position change information.
Please continue to refer to FIG. 2. After the position change information is determined, S104 may be performed: graph convolution processing is carried out based on the position change information obtained from the target image sequence, and the crowd behaviors corresponding to the multiple objects in the sequence are determined based on the extracted features obtained by the graph convolution.
First, S1042 may be performed: graph convolution processing is carried out based on the position change information obtained from the target image sequence, yielding the extracted features corresponding to the sequence.
The extracted features specifically include the feature matrices or feature vectors determined by the graph convolution processing (including spatial graph convolution and temporal graph convolution). It can be understood that, since the extracted features are determined based on the position change information of multiple pedestrian objects in the target image sequence, they are useful for determining crowd behavior.
In some examples, before performing S1042, the connection relationships between objects in each image of the target image sequence may be determined.
It can be understood that connection relationships determined with different rules have different meanings. For example, a connection relationship determined by the similarity between the image features of the regions in which two objects are located characterizes the degree of association between the two objects from the perspective of similarity; a connection relationship determined by the distance between two objects characterizes their degree of association from the perspective of distance.
In some examples, the image features corresponding to the regions in which the objects are located can be extracted from each image. These image features represent the image information of each object's location. If the image features of two objects are similar, the locations of the two objects are similar, i.e., the two objects are close to each other and have a connection relationship.
Afterwards, the similarity between any two objects can be determined based on the image features corresponding to the objects.
In some examples, two objects whose similarity reaches a first preset threshold can be determined as two objects having a connection relationship, where the first preset threshold is a threshold set according to experience and is not particularly limited in this application.
It should be noted that this application also does not specifically limit the method for calculating similarity; for example, Euclidean distance, cosine distance, or Mahalanobis distance may be used. In some examples, in order to improve the classification accuracy for the target image sequence, the connection relationships between objects may be determined based on the distances between them.
Please refer to FIG. 3, which is a schematic flowchart of a connection relationship determination method shown in this application. As shown in FIG. 3, S302 may be performed: image processing is carried out on each image to determine the position information of the objects in each image. Then S304 may be performed: the distance between any two objects is determined based on the position information corresponding to the objects.
After the distance between any two objects is determined, the connection relationship between any two objects included in each image can be determined based on that distance. In some examples, two objects whose distance does not reach a second preset threshold are determined as two objects having a connection relationship, where the second preset threshold is a threshold set according to experience and is not particularly limited in this application.
In some examples, if two objects are determined to have a connection relationship, the connection weight between them is set to 1; otherwise it is set to 0.
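A minimal sketch of this thresholding, assuming 2-D object coordinates and Euclidean distance; `dist_threshold` stands in for the second preset threshold:

```python
import numpy as np

def binary_connections(positions, dist_threshold):
    """Set the connection weight to 1 for object pairs whose distance does not
    reach the threshold, and to 0 otherwise.

    positions: (N, 2) object coordinates in one image.
    Returns an (N, N) 0/1 connection matrix with a zero diagonal.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)           # pairwise Euclidean distances
    conn = (dist < dist_threshold).astype(np.float32)
    np.fill_diagonal(conn, 0.0)                    # no self-connections here
    return conn
```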
Since these connection relationships are determined by the distances between objects, the spatiotemporal graph built from them can indicate the distance relationships between objects, and the extracted features determined by performing graph convolution on the spatiotemporal graph also contain the distance information between objects. Therefore, when classifying the crowd behavior in the target image sequence based on these extracted features, the classification accuracy for behaviors such as pedestrian gathering, pedestrian dispersion, or pedestrian retention can be improved.
In some examples, to further improve the classification accuracy, the connection weight between two objects can be determined according to the true distance between them.
Specifically, the determined distance between any two objects can be mapped into the interval formed by a third preset threshold and a fourth preset threshold, both of which are empirical thresholds. In some examples, the third preset threshold is 0 and the fourth preset threshold is 1.
After the mapping is completed, the mapped distance between any two objects can be determined as the connection weight between them, and the connection relationship between the two objects is indicated by this connection weight.
Since in this example the connection relationship between two objects is determined from their true distance, the spatiotemporal graph can indicate distance information that is closer to reality, further improving the classification accuracy.
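The application does not fix the mapping function; the sketch below assumes a simple min-max rescaling of the pairwise distances into the interval formed by the third and fourth preset thresholds (0 and 1 in the example above), with the mapped distances used directly as connection weights:

```python
import numpy as np

def distance_to_weights(dist, low=0.0, high=1.0):
    """Map a pairwise distance matrix into [low, high] and use the mapped
    values as connection weights.

    dist: (N, N) matrix of true distances between objects.
    """
    d_min, d_max = dist.min(), dist.max()
    scaled = (dist - d_min) / (d_max - d_min + 1e-8)  # min-max scale to [0, 1]
    return low + (high - low) * scaled                # map into [low, high]
```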
After the connection relationships between any two objects in each image of the target image sequence are determined, S104 can be continued.
Please continue to refer to FIG. 2. S1042 can be implemented by a graph convolution model.
The graph convolution model may be a model built on a spatiotemporal graph convolution network, which at least includes a spatial graph convolution network (GCN) for performing spatial graph convolution on each frame, and a temporal convolution network (TCN) for performing temporal convolution on the graph features corresponding to the frames.
Please refer to FIG. 4, which is a schematic flowchart of graph convolution processing shown in this application.
As shown in FIG. 4, in S1042, the position change information can be input into the GCN of the graph convolution model to perform S402: based on the object position information in each image of the target image sequence represented by the position change information, and the connection relationships between the objects in each image, spatial graph convolution processing is performed on each image to obtain the graph feature corresponding to each image.
In this step, the adjacency matrix corresponding to each image can be determined based on the connection relationships between the objects in that image. In some embodiments, a topology graph corresponding to each image can be generated first: for each image, the objects in it are taken as the vertices V of the topology graph, and the edges E are determined according to the connection relationships between the objects, yielding the topology graph corresponding to each image.
After the topology graph corresponding to each image is generated, the adjacency matrix A corresponding to each image can be determined from that topology graph, and the feature matrix X_0 corresponding to each image can be determined from the object position information.
After the adjacency matrix and the feature matrix are determined, the spatial graph convolution processing can be completed based on them, yielding the graph feature corresponding to each image.
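Putting the earlier sketches together, one frame's inputs to the spatial graph convolution could be assembled as follows (names hypothetical; the connection weights may come from either of the connection sketches above):

```python
import numpy as np

def frame_graph(positions, conn_weights):
    """Assemble one frame's adjacency matrix A and feature matrix X0.

    positions:    (N, 2) coordinates of the N objects (the vertices V).
    conn_weights: (N, N) connection weights between objects (the edges E).
    """
    A = np.asarray(conn_weights, dtype=np.float32)  # adjacency matrix A
    X0 = np.asarray(positions, dtype=np.float32)    # feature matrix X0
    return A, X0
```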
It should be noted that the present application does not specifically limit the graph convolution formula. In some examples, the following layer-wise propagation rule may be used:

$X^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}}\,\tilde{A}\,\tilde{D}^{-\frac{1}{2}}\,X^{(l)}\,\theta\right)$

where $\tilde{A} = A + I$ adds a self-loop to each vertex so that each vertex retains its own features, $\tilde{D}$ is the diagonal degree matrix of $\tilde{A}$ (with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$), $\sigma$ is an activation function, and $\theta$ is the network parameter of the graph convolutional network (the training process is shown later in this application and is not described here). $X^{(l)}$ is the input of the $(l+1)$-th hidden layer of the GCN, and $X^{(l+1)}$ is the output after the operation of the $(l+1)$-th hidden layer.
After the graph features corresponding to the images are obtained, they may be input into the TCN included in the graph convolution model to execute S404: temporal convolution is performed on the graph features corresponding to the images, yielding the extracted feature corresponding to the target image sequence.
In this step, the graph features corresponding to the images may first be sorted according to the temporal information represented by the position change information. Then, based on a preset one-dimensional convolution kernel, one-dimensional convolution is performed on the sorted graph features to obtain the extracted feature corresponding to the target image sequence.
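By way of illustration only, the temporal convolution step might look like the following PyTorch sketch; the tensor layout (batch, channels, time, objects) and the kernel size are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class TemporalConv(nn.Module):
    """Temporal (1-D) convolution over per-frame graph features.

    Input shape (batch, C, T, N): T frames sorted by time, N objects.
    The kernel spans only the time axis, so each object's features are
    aggregated across neighbouring frames.
    """

    def __init__(self, channels, kernel_size=9):
        super().__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.Conv2d(channels, channels,
                              kernel_size=(kernel_size, 1),
                              padding=(pad, 0))

    def forward(self, x):
        return self.conv(x)

# Usage: stack the per-frame graph features in temporal order first.
# x = torch.stack(graph_feats)         # (T, N, C), graph_feats sorted by time
# x = x.permute(2, 0, 1).unsqueeze(0)  # (1, C, T, N)
# extracted = TemporalConv(channels=x.shape[1])(x)
```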
Please continue to refer to FIG. 2. After the extracted feature corresponding to the target image sequence is obtained, S1044 may be executed: the crowd behavior corresponding to the multiple objects in the target image sequence is determined based on the extracted feature.
In this step, the extracted feature may be input into a pre-trained multi-classifier for classification, thereby obtaining the crowd behavior.
Please refer to FIG. 5, which is a schematic diagram of a classification flow according to the present application.
As shown in FIG. 5, the multi-classifier includes a down-sampling unit and a fully connected layer. The down-sampling unit may be used to process the extracted feature into a corresponding feature vector; for example, it may be an average pooling unit. The fully connected layer is used to classify based on the feature vector, yielding a confidence score for each preset classification type.
Continuing with FIG. 5, when S1044 is executed, the extracted feature may be input into the down-sampling unit to execute S502: average pooling is applied to the extracted feature to obtain the corresponding feature vector. The feature vector may then be input into the fully connected layer to execute S504: fully connected processing is applied to the feature vector, yielding the confidence score corresponding to each preset classification type.
After the confidence scores are obtained, the crowd behavior type corresponding to the maximum confidence score may be determined as the crowd behavior of the multiple objects in the target image sequence. The crowd behavior includes at least one of the following: pedestrian gathering; pedestrian dispersion; pedestrian retention; pedestrian counterflow.
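By way of illustration only, the multi-classifier described above might be sketched as follows; the label set, the input layout, and the module names are assumptions for the sketch.

```python
import torch
import torch.nn as nn

BEHAVIORS = ["gathering", "dispersion", "retention", "counterflow"]

class CrowdBehaviorHead(nn.Module):
    """Multi-classifier sketch: average pooling plus a fully connected layer."""

    def __init__(self, in_channels, num_classes=len(BEHAVIORS)):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)    # down-sampling unit (average pooling)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x):                      # x: (batch, C, T, N) extracted feature
        v = self.pool(x).flatten(1)            # feature vector, (batch, C)
        return self.fc(v)                      # one confidence score per preset class

# The class with the maximum confidence score is taken as the crowd behavior:
# scores = CrowdBehaviorHead(in_channels=64)(extracted)
# behavior = BEHAVIORS[scores.argmax(dim=1).item()]
```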
In the above method, object tracking is performed on the objects appearing in the target image sequence to determine the position change information of those objects in the sequence. Graph convolution is then performed based on the position change information to obtain the extracted feature corresponding to the target image sequence, and the crowd behavior corresponding to the multiple objects in the sequence is determined from that feature. The graph convolution principle is thus used to derive, from the target image sequence, extracted features that benefit crowd behavior detection, enabling accurate detection of the crowd behavior represented by the sequence.
An embodiment is described below in conjunction with a security scenario. A security scenario is usually provided with monitoring equipment, which can capture video sequences. It can be understood that, in the security scenario, what is actually classified are the video sequences captured by the monitoring equipment. Please refer to FIG. 6, which is a schematic diagram of a video sequence classification flow according to the present application.
After the target video sequence is acquired, S602 may be executed by the coordinate determination unit: image processing is performed on each image included in the target video sequence to determine the position information, in each image, of the pedestrians appearing in the video.
After the position information is determined, S604 may be executed by the pedestrian tracking unit: based on the position information, object tracking is performed on the pedestrians to determine their position change information in the target image sequence.
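By way of illustration only, a minimal constant-velocity Kalman filter of the kind such a tracking unit might maintain per pedestrian is sketched below. The noise magnitudes are placeholders, and the data association step that matches new detections to existing tracks is omitted.

```python
import numpy as np

class KalmanTrack:
    """Constant-velocity Kalman filter for one pedestrian centroid.

    State [x, y, vx, vy]; measurements are (x, y) detections per frame.
    """

    def __init__(self, xy):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])
        self.P = np.eye(4) * 10.0                 # initial state uncertainty
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0         # dt = 1 frame
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0         # observe position only
        self.Q = np.eye(4) * 1e-2                 # process noise (assumed)
        self.R = np.eye(2)                        # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                         # predicted (x, y)

    def update(self, xy):
        y = np.asarray(xy, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```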
After the position change information is determined, S606 may be executed by the graph convolution model included in the graph convolution classification model: graph convolution is performed based on the position change information to obtain the extracted feature corresponding to the target image sequence.
The graph convolution classification model may specifically be a classification model built from a graph convolution model and a multi-classification model. Through this model, on the one hand, a graph convolution operation can be performed on the spatiotemporal graph to determine its corresponding extracted feature; on the other hand, the target image sequence can be classified based on that feature to determine the classification type of the sequence.
After the extracted feature is determined, S608 may be executed by the multi-classification model included in the graph convolution classification model: the crowd behavior corresponding to the multiple objects in the target image sequence is determined based on the extracted feature.
In the above scheme, the graph convolution principle is first used to determine, from the position change information of the pedestrians in the video, extracted features that reflect the changes in distance between the pedestrians across the video sequence. The classification type of the video sequence is then determined based on these features, so as to identify the pedestrian behavior occurring in the sequence; corresponding arrangements can then be made according to the identified behavior to reduce the probability of dangerous events.
The above is an introduction to the image sequence classification scheme of the present application. The training method of the graph convolution classification model used is described below.
The graph convolution classification model may be used to implement the graph convolution processing described above.
In some examples, the graph convolution classification model may include a graph convolution model and a multi-classification model. The graph convolution model takes the position change information of the objects in the target image sequence as input and performs graph convolution to obtain the extracted feature corresponding to the sequence. The multi-classification model takes the extracted feature as input and classifies it to obtain the crowd behavior represented by the target image sequence.
It can be understood that training the graph convolution classification model is, in practice, the process of determining the model parameters of the graph convolution model and the multi-classification model.
A model training method is proposed in this application. The method trains the graph convolution classification model by constructing virtual training samples, so that model training can be carried out even in the absence of real samples.
Please refer to FIG. 7, which is a flowchart of a model training method according to this application. As shown in FIG. 7, the training method includes: S702, generating training samples, where a training sample contains the position change information of multiple objects together with annotation information of the crowd behavior based on that position change information.
In this step, S7022 may be executed first: based on a motion simulation platform, motion patterns are set for the objects appearing in the video.
The motion simulation platform may be any platform capable of motion simulation. In some examples, it may be a game development platform.
A motion pattern may include a speed and a direction of movement. Through the motion patterns, on the one hand, the coordinates of each object in each frame of the video can be determined, and thus the position change information of each object in the video; on the other hand, the crowd behavior represented by the video can be derived. For example, in a security scenario, when the motion patterns of the pedestrians all point in the same direction, the crowd behavior represented by the video can be determined to be pedestrian gathering; otherwise, it can be determined to be pedestrian dispersion.
After the motion pattern of each object is determined, S7024 may be executed: based on the motion patterns, the position change information corresponding to each object is determined, as well as the crowd behavior represented by that position change information. The crowd behavior may include pedestrian gathering, pedestrian dispersion, pedestrian retention, and the like.
After the position change information and the crowd behavior represented by the video are determined, S7026 may be executed: the training samples are generated based on the position change information and the crowd behavior it represents.
In this step, the position change information and the classification types may be encoded, for example by one-hot encoding, to obtain a number of training samples. The present application does not limit the specific encoding scheme.
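By way of illustration only, a strongly reduced stand-in for such simulation-based sample generation is sketched below; the two behavior classes, the speed range, and the rule mapping motion patterns to behaviors are assumptions for the sketch.

```python
import numpy as np

def make_sample(num_objects=8, num_frames=16, behavior="gathering"):
    """Generate one synthetic training sample: positions of shape (T, N, 2)
    plus a one-hot behavior label. A motion pattern is a speed and a
    direction per object, from which per-frame coordinates follow."""
    assert behavior in ("gathering", "dispersion")
    rng = np.random.default_rng()
    start = rng.uniform(0.0, 10.0, size=(num_objects, 2))
    center = start.mean(axis=0)
    # Gathering: move towards a common point; dispersion: move away from it.
    direction = center - start if behavior == "gathering" else start - center
    direction /= np.linalg.norm(direction, axis=1, keepdims=True) + 1e-8
    speed = rng.uniform(0.05, 0.2, size=(num_objects, 1))
    steps = np.arange(num_frames)[:, None, None]          # (T, 1, 1)
    positions = start[None] + steps * (speed * direction)[None]
    one_hot = np.eye(2)[0 if behavior == "gathering" else 1]
    return positions.astype(np.float32), one_hot
```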
After the training samples are obtained, S704 may be executed: the graph convolution classification model is trained based on preset loss information and the training samples until the model converges.
The preset loss information may be loss information set according to experience.
When training the model, hyperparameters such as the learning rate and the number of training epochs may be specified first. After these hyperparameters are determined, the graph convolution classification model (hereinafter, the model) may be trained in a supervised manner on the training samples.
In one supervised training pass, forward propagation may be performed to obtain the computation result output by the model. The error between the true classification type and the computation result can then be evaluated based on the preset loss information. After the error is obtained, the descending gradient can be determined by stochastic gradient descent, and the model parameters can then be updated via backpropagation.
This process may be repeated until the model converges. It should be noted that the convergence condition may be, for example, reaching a preset number of training iterations, or the change in error over M consecutive forward passes falling below a certain threshold. The present application does not specifically limit the convergence condition.
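By way of illustration only, the supervised training loop described above might be sketched as follows; cross-entropy as the preset loss, the hyperparameter values, and the data layout yielded by `loader` are assumptions for the sketch.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=50, lr=1e-2):
    """Supervised training: forward pass, loss against the annotated
    behavior, gradient by stochastic gradient descent, parameter update
    by backpropagation, repeated until the stopping condition is met."""
    criterion = nn.CrossEntropyLoss()          # one possible preset loss
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):                    # or stop once the error plateaus
        for samples, labels in loader:
            scores = model(samples)            # forward propagation
            loss = criterion(scores, labels)   # error vs. true classification type
            optimizer.zero_grad()
            loss.backward()                    # backpropagation
            optimizer.step()                   # update the model parameters
```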
In the above training method, since constructed training samples are used to train the graph convolution classification model, the training process does not need to rely on real training samples.
In some examples, the object position prediction model used to determine object positions, the object tracking model used for object tracking, and the graph convolution classification model used for graph convolution and classification may also be trained jointly.
In some examples, videos representing pedestrian gathering, pedestrian dispersion, and the like may be constructed on the motion simulation platform, and the constructed videos may be annotated with crowd behaviors to obtain training samples.
After the training samples are obtained, they may be input into the object position prediction model to obtain a first computation result. The first computation result is then input into the object tracking model to obtain a second computation result. The second computation result is then input into the graph convolution classification model to obtain the crowd behavior detection result for the video.
After the detection result is obtained, the parameters of each model can be updated via backpropagation according to the annotation information corresponding to the constructed video.
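By way of illustration only, one joint training step over the three-stage pipeline might be sketched as follows. Treating all three stages as differentiable torch.nn.Module instances is a simplification made for the sketch.

```python
import torch

def joint_step(detector, tracker, classifier, frames, labels, optimizer, criterion):
    """One joint optimization step across the three chained models."""
    first = detector(frames)        # first computation result: object positions
    second = tracker(first)         # second computation result: position changes
    scores = classifier(second)     # crowd behavior detection result
    loss = criterion(scores, labels)
    optimizer.zero_grad()
    loss.backward()                 # gradients flow back through all three models
    optimizer.step()
    return loss.item()
```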
In the above example, joint training of the models can be realized, improving training efficiency.
Corresponding to any of the above embodiments, the present application further provides a crowd behavior detection apparatus.
Please refer to FIG. 8, which is a schematic structural diagram of a crowd behavior detection apparatus according to the present application.
As shown in FIG. 8, the apparatus 80 includes: a position change information determination module 81, configured to determine the position change information of each object in a target image sequence containing multiple objects, based on the object tracking result of at least one object appearing in the sequence; and a crowd behavior detection module 82, configured to perform graph convolution based on the position change information obtained from the target image sequence, and to determine, based on the extracted feature obtained by the graph convolution, the crowd behavior corresponding to the multiple objects in the sequence.
In some illustrated embodiments, the position change information determination module 81 is specifically configured to: perform image processing on each image included in the target image sequence to determine the position information of each object in each image; and perform object tracking on each object to determine, based on the tracking result and the position information, the position change information of each object in the target image sequence.
In some illustrated embodiments, the position change information determination module 81 is specifically configured to: perform object tracking on each object using a Kalman filter algorithm or an object detection model; and determine the position change information of each object based on the position information of the same tracked object in each image.
In some illustrated embodiments, the crowd behavior detection module 82 includes: a spatial graph convolution module, configured to perform spatial graph convolution on each image, based on the object position information in each image of the target image sequence represented by the position change information and on the connection relationships between objects in each image, to obtain the graph feature corresponding to each image; and a crowd behavior determination module, configured to perform temporal convolution on the graph features corresponding to the images and to determine, based on the extracted feature obtained by the temporal convolution, the crowd behavior corresponding to the multiple objects in the target image sequence; where the crowd behavior includes at least one of the following: pedestrian gathering; pedestrian dispersion; pedestrian retention; pedestrian counterflow.
In some illustrated embodiments, the spatial graph convolution module is specifically configured to: determine the adjacency matrix corresponding to each image based on the connection relationships between objects in that image; determine the feature matrix corresponding to each image based on the object position information; and complete the spatial graph convolution based on the adjacency matrix and the feature matrix, obtaining the graph feature corresponding to each image.
In some illustrated embodiments, the apparatus 80 further includes: a connection relationship determination module, configured to determine the connection relationship between any two objects contained in each image of the target image sequence.
In some illustrated embodiments, the connection relationship determination module is specifically configured to: extract the image feature corresponding to the region of the image in which each object is located, where the image feature represents the image information of the object's location; determine, based on the image features of the objects, the similarity between any two objects; and determine two objects whose similarity does not reach a first preset threshold as two objects having a connection relationship.
In some illustrated embodiments, the connection relationship determination module is specifically configured to: perform image processing on each image to determine the position information of the objects in each image; determine, based on the position information of the objects, the distance between any two objects; and determine the connection relationship between any two objects contained in each image based on that distance.
In some illustrated embodiments, the connection relationship determination module is specifically configured to: map the determined distance between any two objects into an interval formed by a third preset threshold and a fourth preset threshold; determine the mapped distance between the two objects as the connection weight between them; and indicate the connection relationship between the two objects through that connection weight.
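By way of illustration only, such a distance-to-weight mapping might be sketched as follows; the linear form of the mapping and the bound values are assumptions for the sketch.

```python
def connection_weight(distance, t3=0.0, t4=1.0, d_min=0.0, d_max=50.0):
    """Map an inter-object distance into the interval [t3, t4] formed by the
    third and fourth preset thresholds; the mapped value is used as the
    connection weight between the two objects."""
    distance = min(max(distance, d_min), d_max)   # clamp to the known range
    return t3 + (t4 - t3) * (distance - d_min) / (d_max - d_min)
```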
In some illustrated embodiments, the graph convolution processing is implemented by a graph convolution classification model, and the training device of the graph convolution classification model includes: a generation module, configured to generate training samples, where a training sample contains the position change information of multiple objects together with annotation information of the crowd behavior based on that position change information; and a training module, configured to train a preset graph convolution model based on the position change information and the annotation information of the crowd behavior to obtain the graph convolution classification model.
In some illustrated embodiments, the generation module is specifically configured to: set motion patterns corresponding to multiple objects based on a motion simulation platform; determine the position change information corresponding to each object based on the motion patterns; determine the crowd behavior represented by the position change information of the objects; and generate the training samples based on the position change information and the crowd behavior it represents.
The embodiments of the crowd behavior detection apparatus of the present application can be applied to an electronic device. Accordingly, the present application discloses an electronic device, which may include a processor and a memory for storing processor-executable instructions, where the processor is configured to invoke the executable instructions stored in the memory to implement the crowd behavior detection method of any of the above embodiments.
Please refer to FIG. 9, which is a schematic diagram of the hardware structure of an electronic device according to this application.
As shown in FIG. 9, the electronic device may include a processor for executing instructions, a network interface for network connection, a memory for storing runtime data for the processor, and a non-volatile storage for storing the instructions corresponding to the crowd behavior detection apparatus.
The embodiments of the apparatus may be implemented in software, in hardware, or in a combination of software and hardware. Taking software implementation as an example, the apparatus, as a logical entity, is formed by the processor of the electronic device in which it resides reading the corresponding computer program instructions from the non-volatile storage into memory and running them. From a hardware perspective, in addition to the processor, memory, network interface, and non-volatile storage shown in FIG. 9, the electronic device in which the apparatus resides may also include other hardware according to its actual functions, which is not described further here. It can be understood that, to increase processing speed, the instructions corresponding to the crowd behavior detection apparatus may also be stored directly in memory, which is not limited here.
The present application proposes a computer-readable storage medium, which may be a volatile or non-volatile storage medium. The storage medium stores a computer program used to execute the crowd behavior detection method of any of the foregoing embodiments.
Those skilled in the art will appreciate that one or more embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
In this application, "and/or" means having at least one of the two; for example, "A and/or B" may cover three cases: A alone, B alone, and both A and B.
The embodiments in this application are described in a progressive manner; for the parts that are the same or similar between embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, since the data processing device embodiment is substantially similar to the method embodiment, its description is relatively brief, and reference may be made to the description of the method embodiment for the relevant details.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the appended claims. In some cases, the acts or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and the functional operations described in this application can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware (including the structures disclosed in this application and their structural equivalents), or in a combination of one or more of them. Embodiments of the subject matter described in this application can be implemented as one or more computer programs, that is, as one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by the data processing apparatus. A computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs, performing the corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus can likewise be implemented as special purpose logic circuitry.
Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The essential components of a computer include a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, such as a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this application contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as describing features of particular embodiments of a particular disclosure. Certain features that are described in this application in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of the various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the appended claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.
The above are only preferred embodiments of one or more embodiments of the present application and are not intended to limit them. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of one or more embodiments of the present application shall fall within the protection scope of one or more embodiments of the present application.

Claims (15)

  1. A crowd behavior detection method, characterized in that the method comprises:
    performing object tracking on at least one object appearing in a target image sequence containing multiple objects, and determining position change information of each object in the target image sequence;
    performing graph convolution processing based on the position change information obtained from the target image sequence, and determining, based on the extracted feature obtained by the graph convolution, the crowd behavior corresponding to the multiple objects in the target image sequence.
  2. The method according to claim 1, wherein the performing object tracking on at least one object appearing in a target image sequence containing multiple objects and determining position change information of each object in the target image sequence comprises:
    performing image processing on each image included in the target image sequence, and determining position information of each object in the corresponding image;
    performing object tracking on each object, to determine, based on the tracking result and the position information, the position change information of each object in the target image sequence.
  3. The method according to claim 2, wherein the performing object tracking on each object to determine, based on the tracking result and the position information, the position change information of each object in the target image sequence comprises:
    performing object tracking on each object using a Kalman filter algorithm or an object detection model;
    determining the position change information of each object based on the tracked position information of the same object in the corresponding images.
  4. The method according to any one of claims 1-3, wherein the performing graph convolution processing based on the position change information obtained from the target image sequence to obtain the crowd behavior corresponding to the multiple objects in the target image sequence comprises:
    performing spatial graph convolution on at least one image included in the target image sequence, based on the object position information in the at least one image represented by the position change information and on the connection relationships between objects in the at least one image, to obtain the graph feature corresponding to the at least one image;
    performing temporal convolution on the graph features corresponding to the at least one image, and determining, based on the extracted feature obtained by the temporal convolution, the crowd behavior corresponding to the multiple objects in the target image sequence; wherein the crowd behavior comprises at least one of the following: pedestrian gathering; pedestrian dispersion; pedestrian retention; pedestrian counterflow.
  5. The method according to claim 4, wherein the performing spatial graph convolution on the at least one image, based on the object position information in the at least one image represented by the position change information and on the connection relationships between objects in the at least one image, to obtain the graph feature corresponding to the at least one image comprises:
    determining, based on the connection relationships between objects in the at least one image, the adjacency matrix corresponding to the at least one image;
    determining, based on the object position information, the feature matrix corresponding to the at least one image;
    completing the spatial graph convolution based on the adjacency matrix and the feature matrix, obtaining the graph feature corresponding to each image.
  6. The method according to any one of claims 1-5, wherein before the step of performing graph convolution processing based on the position change information obtained from the target image sequence to obtain the extracted feature corresponding to the target image sequence, the method further comprises:
    determining the connection relationship between any two objects contained in at least one image included in the target image sequence.
  7. The method according to claim 6, wherein the determining the connection relationship between any two objects contained in at least one image included in the target image sequence comprises:
    extracting the image feature corresponding to the region of the image in which each of the at least one object contained in the at least one image is located, the image feature representing the image information of the location of the object; determining, based on the image features corresponding to the objects, the similarity between any two objects;
    determining two objects whose similarity does not reach a first preset threshold as two objects having a connection relationship.
  8. The method according to claim 6, wherein the determining the connection relationship between any two objects contained in at least one image included in the target image sequence comprises:
    performing image processing on the at least one image to determine the position information of the objects in the at least one image;
    determining, based on the position information corresponding to the objects, the distance between any two objects;
    determining, based on the distance, the connection relationship between any two objects contained in the at least one image.
  9. The method according to claim 8, wherein the determining, based on the distance, the connection relationship between any two objects contained in the at least one image comprises:
    mapping the determined distance between any two objects into an interval formed by a third preset threshold and a fourth preset threshold;
    determining the mapped distance between the two objects as the connection weight between the two objects;
    indicating the connection relationship between the two objects through the connection weight between them.
  10. The method according to any one of claims 1-9, wherein
    the graph convolution processing is implemented by a graph convolution classification model;
    wherein the training method of the graph convolution classification model comprises:
    generating training samples, wherein a training sample contains position change information of multiple objects together with annotation information of the crowd behavior based on the position change information of the multiple objects;
    training a preset graph convolution model based on the position change information and the annotation information of the crowd behavior, obtaining the graph convolution classification model.
  11. The method according to claim 10, wherein the generating training samples comprises:
    setting, based on a motion simulation platform, motion patterns corresponding to multiple virtual objects;
    determining, based on the motion patterns, position change information corresponding to at least one virtual object;
    determining the crowd behavior represented by the position change information corresponding to the at least one virtual object;
    generating the training samples based on the position change information and the crowd behavior represented by the position change information.
  12. A crowd behavior detection apparatus, characterized in that the apparatus comprises:
    a position change information determination module, configured to determine position change information of each object in a target image sequence containing multiple objects, based on the object tracking result of at least one object appearing in the target image sequence;
    a crowd behavior detection module, configured to perform graph convolution processing based on the position change information obtained from the target image sequence, and to determine, based on the extracted feature obtained by the graph convolution, the crowd behavior corresponding to the multiple objects in the target image sequence.
  13. An electronic device, characterized in that the device comprises:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to invoke the executable instructions stored in the memory to implement the crowd behavior detection method according to any one of claims 1-11.
  14. A computer-readable storage medium, characterized in that the storage medium stores a computer program, the computer program being used to execute the crowd behavior detection method according to any one of claims 1-11.
  15. A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to execute the crowd behavior detection method according to any one of claims 1-11.
PCT/CN2021/103579 2021-01-26 2021-06-30 Crowd behavior detection method and apparatus, and electronic device, storage medium and computer program product WO2022160591A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020237016722A KR20230090344A (en) 2021-01-26 2021-06-30 Crowd behavior detection method and device, electronic device, storage medium and computer program product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110106285.7 2021-01-26
CN202110106285.7A CN112800944B (en) 2021-01-26 2021-01-26 Crowd behavior detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022160591A1 true WO2022160591A1 (en) 2022-08-04

Family

ID=75811931

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/103579 WO2022160591A1 (en) 2021-01-26 2021-06-30 Crowd behavior detection method and apparatus, and electronic device, storage medium and computer program product

Country Status (3)

Country Link
KR (1) KR20230090344A (en)
CN (1) CN112800944B (en)
WO (1) WO2022160591A1 (en)

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN117409368A (en) * 2023-10-31 2024-01-16 大连海洋大学 Real-time analysis method of fish aggregation behavior and fish hunger behavior based on density distribution
CN118297989A (en) * 2024-06-05 2024-07-05 中国工程物理研究院流体物理研究所 Semi-supervised high-robustness infrared small target tracking method and system
CN118395062A (en) * 2024-04-01 2024-07-26 华中师范大学 Space-time track travel time estimation method based on context topological graph space-time aggregation
CN119992667A (en) * 2025-04-14 2025-05-13 泸州职业技术学院 Group behavior recognition method based on video data

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
CN112800944B (en) * 2021-01-26 2023-12-19 北京市商汤科技开发有限公司 Crowd behavior detection method and device, electronic equipment and storage medium
CN113569766B (en) * 2021-07-30 2022-10-04 中国电子科技集团公司第五十四研究所 Pedestrian abnormal behavior detection method for patrol of unmanned aerial vehicle
CN114639062A (en) * 2022-04-07 2022-06-17 上海闪马智能科技有限公司 Video classification method and device, storage medium, and electronic device
CN114943943B (en) * 2022-05-16 2023-10-03 中国电信股份有限公司 Target track obtaining method, device, equipment and storage medium
CN116311524A (en) * 2023-03-22 2023-06-23 凯通科技股份有限公司 Gait feature determining method and device based on camera set and terminal equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
US20170300759A1 (en) * 2016-03-03 2017-10-19 Brigham Young University Automated multiple target detection and tracking system
CN110827292A (en) * 2019-10-23 2020-02-21 中科智云科技有限公司 Video instance segmentation method and device based on convolutional neural network
CN112016413A (en) * 2020-08-13 2020-12-01 南京领行科技股份有限公司 Method and device for detecting abnormal behaviors between objects
CN112800944A (en) * 2021-01-26 2021-05-14 北京市商汤科技开发有限公司 Crowd behavior detection method and device, electronic equipment and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN109522793B (en) * 2018-10-10 2021-07-23 华南理工大学 Multi-person abnormal behavior detection and recognition method based on machine vision
CN110163890B (en) * 2019-04-24 2020-11-06 北京航空航天大学 A multi-target tracking method for space-based surveillance
AU2020100371A4 (en) * 2020-03-12 2020-04-16 Jilin University Hierarchical multi-object tracking method based on saliency detection

Also Published As

Publication number Publication date
CN112800944B (en) 2023-12-19
KR20230090344A (en) 2023-06-21
CN112800944A (en) 2021-05-14

Legal Events

121 (Ep: the epo has been informed by wipo that ep was designated in this application): Ref document number 21922189; Country of ref document: EP; Kind code of ref document: A1
ENP (Entry into the national phase): Ref document number 20237016722; Country of ref document: KR; Kind code of ref document: A
NENP (Non-entry into the national phase): Ref country code: DE
122 (Ep: pct application non-entry in european phase): Ref document number 21922189; Country of ref document: EP; Kind code of ref document: A1