CN114241373A - End-to-end vehicle behavior detection method, system, equipment and storage medium
- Publication number: CN114241373A (application CN202111526066.0A)
- Authority: CN (China)
- Prior art keywords: target, vehicle, image, module, frame image
- Legal status: Granted
Classifications
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/241 — Pattern recognition: classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
Abstract
The invention provides an end-to-end vehicle behavior detection method, system, equipment and storage medium, wherein the method comprises the following steps: acquiring collected real-time video data and single-frame images; identifying and classifying the target objects in the single-frame image through a convolutional neural network; cropping regions of interest from the single-frame image according to the target objects to obtain at least one target image; extracting feature data with a multi-target tracking method and assigning tracking IDs to the target images according to the feature data to obtain ID information; screening out the target images whose target object is a vehicle, acquiring the corresponding continuous video segments, and performing an interactive aggregation operation to obtain the interaction relationships of the target object; and performing behavior detection on the target object by combining the ID information, the continuous video segments and the interaction relationships, outputting the behavior information of the target object, and positioning the vehicle. The invention improves the accuracy, real-time performance and generalization capability of vehicle behavior detection and has a wide application range.
Description
Technical Field
The present invention relates to the field of vehicle behavior detection technologies, and in particular, to a method, a system, a device, and a storage medium for end-to-end vehicle behavior detection.
Background
In recent years, as car ownership has kept growing, traffic congestion and violations have become frequent, and how to use modern technical means to achieve efficient and accurate vehicle behavior detection has become a research hotspot in the traffic field. Vehicle behavior detection based on computer vision is simple and easy to implement, so it is widely applied in intelligent security, traffic monitoring and camera-based detection for unmanned systems. At present, common image-based detection of illegal parking in a fixed area combines an optical flow method with target detection to judge whether a target is present, but the usable scenes of this approach are limited and it can detect only a single kind of vehicle behavior.
Some researchers use traditional image processing methods, such as background subtraction and optical-flow trajectory tracking, to recognize vehicle trajectories, but these methods are not very accurate, generalize poorly and cannot be applied widely across scenes. Approaches that instead rely on multiple sensors are highly complex, depend on the accuracy of the various sensor readings, cannot achieve end-to-end behavior detection, and are difficult to apply where traffic signs are indistinct or weather conditions are poor. With the development of neural networks, some researchers use convolutional neural networks to predict and recognize vehicle behaviors, but most such work addresses only the current motion state of a single vehicle or the state of its driver and lacks the ability to recognize the behavior of vehicle formations. The prior art therefore lacks a vehicle behavior detection method with a wide application range, high recognition accuracy and strong generalization capability.
Disclosure of Invention
In view of the above, it is necessary to provide an end-to-end vehicle behavior detection method, system, device, and storage medium.
An end-to-end vehicle behavior detection method, comprising the steps of: acquiring collected real-time video data and obtaining single-frame images from the real-time video data; identifying and classifying the target objects in the single-frame image through a convolutional neural network; cropping regions of interest from the single-frame image according to the target objects to obtain at least one target image; extracting the feature data of the target object in each target image with a multi-target tracking method, and assigning tracking IDs to the target images according to the feature data to obtain ID information; screening out the target images whose target object is a vehicle, acquiring a continuous video segment from the single-frame image corresponding to each such target image, and performing interactive aggregation on the single-frame image and the continuous video segment to obtain the interaction relationships between the target object and the other target objects in the single-frame image; and performing behavior detection on the target object by combining the ID information, the continuous video segments and the interaction relationships, outputting the behavior information of the target object according to the detection result, and positioning the vehicle.
In one embodiment, acquiring the collected real-time video data and obtaining a single-frame image from it specifically includes: splitting the real-time video data second by second to obtain a sequence of one-second single-frame images; and selecting the continuous video segment corresponding to each single-frame image and storing it in a memory pool.
In one embodiment, identifying and classifying the target object in the single-frame image through a convolutional neural network specifically includes: acquiring a plurality of historical video data and splitting them into the corresponding historical single-frame images, taking the historical single-frame images as a training set and the target objects corresponding to the historical single-frame images as a test set; constructing an initial target detection model based on the YOLOv5 convolutional neural network, training the initial target detection model with the training set and the test set, and obtaining the trained target detection model; inputting the single-frame image into the target detection model and outputting the corresponding target objects; and classifying the target objects in all single-frame images through a classifier and storing the single-frame images by category.
In one embodiment, extracting the feature data of the target object in the target image with the multi-target tracking method and assigning tracking IDs to the target images according to the feature data specifically includes: matching all target images according to the feature data, screening out the target images that contain the same target object and storing the same ID mark with them to obtain the corresponding ID information; and classifying all target images according to their ID marks, so that each class of target images carries the behavior characteristics of its target object.
In one embodiment, screening out the target images whose target object is a vehicle, acquiring a continuous video segment from the corresponding single-frame image, and obtaining the interaction relationships between the target object and the other target objects in the single-frame image specifically includes: selecting a vehicle in the single-frame image as the target vehicle object, screening out the corresponding target images and single-frame images according to the target vehicle object, and acquiring the continuous video segments corresponding to those single-frame images; sampling the continuous video segments through two channel pathways and analyzing their static and dynamic content; determining from the static and dynamic content whether the single-frame image contains only one target vehicle object; if so, performing interactive aggregation on the continuous video segment and the target image to obtain the interaction relationships between the target vehicle object and the remaining target objects; and if not, performing interactive aggregation on the continuous video segment and the target images to obtain the interaction relationships between the target vehicle object and the other vehicle target objects.
In one embodiment, performing behavior detection on the target object by combining the ID information, the continuous video segments and the interaction relationships, outputting the behavior information according to the detection result, and positioning the vehicle specifically includes: detecting the behavior of the target vehicle object marked by the current ID according to the interaction relationships, the continuous video segments and the ID information to obtain a detection result; obtaining the pixel position, ID mark and vehicle behavior category of the target vehicle object from the detection result; and obtaining the positioning information of the target vehicle object by combining the altitude of the unmanned aerial vehicle that collects the real-time video data with the camera angle and focal length.
An end-to-end vehicle behavior detection system, comprising: a single-frame image acquisition module, a target detection module, an image cutting module, a target tracking module, an interactive aggregation module and a behavior detection module; the single-frame image acquisition module is connected to the target detection module, the target detection module to the image cutting module, the image cutting module to the target tracking module, the target tracking module to the interactive aggregation module, and the interactive aggregation module to the behavior detection module. The target detection module performs a convolution operation on the target area of the single-frame image through a convolutional neural network and divides the target objects into categories through a classifier, completing target detection; the image cutting module crops regions of interest from the detected image to obtain at least one target image and inputs it to the target tracking module; the target tracking module acquires the real-time video data, extracts the feature data of the target object through a multi-task network, tracks the target object across all single-frame images according to the feature data, stores the same ID mark for single-frame images with the same feature data to obtain the ID information, and transmits the continuous video segments, feature data and ID information corresponding to the single-frame images to the memory pool, which passes the continuous video segment information to the behavior detection module; the interactive aggregation module screens out the target images whose target object is a vehicle, acquires the continuous video segment from the corresponding single-frame image, and performs interactive aggregation on the single-frame image and the continuous video segment to obtain the interaction relationships between the target object and the other target objects in the single-frame image; and the behavior detection module performs behavior detection on the target object by combining the ID information, the continuous video segments and the interaction relationships, outputs the behavior information of the target object according to the detection result, and positions the vehicle.
In one embodiment, the behavior detection module further includes a monocular visual positioning unit configured to obtain the relevant information of the target object from the detection result and position the target object accordingly. The interactive aggregation module comprises a V-type interactive aggregation module and an O-type interactive aggregation module connected in a second-order series-parallel arrangement. The V-type interactive aggregation module detects the interaction relationships between vehicles: its input ends all correspond to target images of vehicle target objects, one input end receiving the feature data of the target vehicle and the remaining input ends receiving the feature data of the non-target vehicles, and the interaction relationships between the target vehicle and the non-target vehicles are obtained through the interactive aggregation operation. The O-type interactive aggregation module detects the interaction relationships between the vehicle and other target objects: its input ends receive the feature data of the target vehicle and of the non-vehicle objects, and the interaction relationships between the vehicle and the non-vehicle objects are obtained through the interactive aggregation operation.
An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of a method of end-to-end vehicle behavior detection as described in the various embodiments above when executing the program.
A storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of a method of end-to-end vehicle behaviour detection as described in the various embodiments above.
Compared with the prior art, the invention has the following advantages and beneficial effects: it improves the accuracy, real-time performance and generalization capability of vehicle behavior detection, has a wide application range, and can be used for recognizing vehicle behaviors in traffic flow, recognizing vehicle behaviors in aerial images, detecting abnormal behaviors in vehicle formations, positioning vehicles, and the like.
Drawings
FIG. 1 is a schematic flow diagram of an end-to-end vehicle behavior detection method in one embodiment;
FIG. 2 is a schematic diagram of an end-to-end vehicle behavior detection system in one embodiment;
FIG. 3 is a diagram of the data processing architecture of an end-to-end vehicle behavior detection system in one embodiment;
FIG. 4 is a schematic diagram illustrating the operation of the O-type interactive aggregation module in FIG. 3;
FIG. 5 is a schematic diagram illustrating the operation of the V-type interactive aggregation module in FIG. 3;
FIG. 6 is a schematic diagram of the internal structure of the apparatus in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings by way of specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The method of the present application processes real-time video collected by security monitoring and unmanned-system reconnaissance based on a recent spatio-temporal behavior detection method, Asynchronous Interaction Aggregation (AIA). It can be used for recognizing vehicle behaviors in traffic flow, recognizing vehicle behaviors in aerial images, detecting abnormal behaviors in vehicle formations and the like, where the vehicle behaviors mainly include vehicle acceleration and deceleration, abnormal lane changing, line pressing, and abnormal driving within a vehicle formation. In addition, a monocular visual positioning algorithm can be used to range, position and lock onto an identified abnormally driving vehicle and issue an alarm or reminder. In an unmanned aerial vehicle system, the application can exploit the aerial mobility of the drone to realize visual positioning and continuous target tracking and reconnaissance of the target vehicle, providing an end-to-end solution for vehicle behavior detection, target vehicle positioning and tracking.
In one embodiment, as shown in FIG. 1, there is provided an end-to-end vehicle behavior detection method comprising the steps of:
and S101, acquiring the acquired real-time video data, and acquiring a single-frame image according to the real-time video data.
Specifically, real-time video data of a road or parking area can be obtained through unmanned-system reconnaissance and split into a number of single-frame images, which makes the images easy to obtain and process.
Step S102: identifying and classifying the target objects in the single-frame image through a convolutional neural network.
Specifically, a convolution operation is performed on the target area of the single-frame image by the YOLOv5 convolutional neural network, and a classifier assigns each target object a category; the target object can be, for example, a vehicle, a pedestrian, a lane line marking or an obstacle. The corresponding single-frame image is thereby obtained.
Step S103: cropping regions of interest from the single-frame image according to the target objects to obtain at least one target image.
Specifically, a region of interest is cropped from the single-frame image according to the area in which each target object is located, yielding at least one target image; if several target objects are present, several target images are obtained, all of which correspond to the same single-frame image.
During cropping, each target object in the single-frame image is framed by a detection box that defines the target region, and the ROI (region of interest) is cut out according to the size of the detection box; the cropped image serves as the target image. Because every target image corresponds to its single-frame image, the single-frame image can later be looked up from any of its target images.
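As an illustration of this cropping step, the following is a minimal Python sketch, assuming detections arrive as [x1, y1, x2, y2, class_name] boxes in pixel coordinates; the box format and function name are assumptions for illustration, not part of the patent:

```python
def crop_targets(frame, detections):
    """Crop one target image (ROI) per detection box.

    A minimal sketch: `frame` is an H x W x C image array and `detections`
    is assumed to be an iterable of [x1, y1, x2, y2, class_name] boxes.
    """
    h, w = frame.shape[:2]
    targets = []
    for x1, y1, x2, y2, cls in detections:
        # Clamp the box to the frame so the ROI slice is always valid.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        # Keep the crop together with its class for later ID classification.
        targets.append({"class": cls, "image": frame[y1:y2, x1:x2].copy()})
    return targets
```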
Step S104: extracting the feature data of the target object in each target image with a multi-target tracking method, and assigning tracking IDs to the target images according to the feature data to obtain the ID information.
Specifically, with the multi-target tracking method, a multi-task network extracts the feature data from the cropped target images; the target objects in all target images are matched according to the feature data, the same target objects receive the same ID mark, and the ID information is obtained. The target images are then classified by ID mark, so that each class of target images carries the behavior characteristics of its target object and the behavior can be recognized from those target images.
Step S105: screening out the target images whose target object is a vehicle, acquiring a continuous video segment from the corresponding single-frame image, and performing interactive aggregation on the single-frame image and the continuous video segment to obtain the interaction relationships between the target object and the other target objects in the single-frame image.
Specifically, the target images whose target object is a vehicle are screened out and one of the vehicles is taken as the target vehicle object. All target images of that vehicle are then gathered by its ID mark, the continuous video segment covering the three seconds before the current moment is acquired from the single-frame image corresponding to the target image, and the target objects in the single-frame image are interactively aggregated over the continuous video segment to obtain the interaction relationships between the target vehicle object and the remaining target objects in the single-frame image.
Step S106: performing behavior detection on the target object by combining the ID information, the continuous video segments and the interaction relationships, outputting the behavior information of the target object according to the detection result, and positioning the vehicle.
Specifically, behavior detection is performed on the target object by combining the ID information, the continuous video segments and the interaction relationships, a detection result containing the behavior of the target object is obtained, and the corresponding behavior information is output. The behavior information may include the pixel position, ID mark and behavior category of the target object; the vehicle is positioned from the pixel position in the behavior information, realizing end-to-end vehicle behavior detection.
In this embodiment, the collected real-time video data is acquired and split into single-frame images; the target objects in the single-frame images are identified and classified through a convolutional neural network; the single-frame image is cropped according to the target objects to obtain at least one target image; the feature data of the target object in each target image is extracted with a multi-target tracking method and the target images are assigned tracking IDs according to the feature data to obtain the ID information; the target images whose target object is a vehicle are screened out and the continuous video segments are acquired from the corresponding single-frame images; and behavior detection is performed on the target object by combining the ID information and the continuous video segments, with the behavior information output according to the detection result. This improves the accuracy, real-time performance and generalization capability of vehicle behavior detection and gives the method a wide application range, making it suitable for vehicle behavior recognition including, but not limited to, recognizing vehicle behaviors in traffic flow, recognizing vehicle behaviors in aerial images, and detecting abnormal behaviors in vehicle formations.
Step S101 specifically includes: splitting the real-time video data second by second to obtain a sequence of one-second single-frame images; and selecting the continuous video segment corresponding to each single-frame image and storing it in a memory pool.
Specifically, the acquired real-time video data is split second by second to obtain a number of one-second single-frame images. For example, if 1 minute of real-time video of a road is acquired, that minute of video is split into 60 single-frame images, one per second. The continuous video segment corresponding to each single-frame image is then selected: if a single-frame image is timestamped 12:25:25 on a given date, the continuous video segment from 12:25:22 to 12:25:25 of that date can be retrieved and stored in the memory pool, with continuous video segments corresponding one-to-one to single-frame images, which facilitates their later retrieval. The time span of the continuous video segments can be set as needed, for example from three seconds before the current moment to the current moment, or from one second before it to one second after it. The span should not be set too long and can be kept within 5 seconds, because too many objects would otherwise hinder behavior detection of the target object.
To improve the accuracy of behavior detection, continuous video segments are extracted from the memory pool as the basis for behavior detection, while the data and image information in the memory pool are updated in real time to keep the pool from growing too large and exhausting memory.
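As a sketch of how the per-second splitting and the bounded memory pool described above might be realized, the Python/OpenCV fragment below keeps a rolling window of recent frames; the three-second window, the function name and the pool layout are assumptions for illustration:

```python
from collections import deque

import cv2

CLIP_SECONDS = 3  # rolling window length; the patent suggests keeping it within 5 s

def stream_frames(source):
    """Yield one key frame per second plus the continuous segment ending at it.

    A minimal sketch: the memory pool is a bounded deque, so frames older
    than CLIP_SECONDS are evicted automatically and memory use stays constant.
    """
    cap = cv2.VideoCapture(source)
    fps = int(cap.get(cv2.CAP_PROP_FPS)) or 25   # fall back if FPS is unreported
    pool = deque(maxlen=CLIP_SECONDS * fps)      # the "memory pool"
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        pool.append(frame)
        if idx % fps == 0:                        # one single-frame image per second
            yield idx // fps, frame, list(pool)   # (second, key frame, clip)
        idx += 1
    cap.release()
```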
Step S102 specifically includes: acquiring a plurality of historical video data and splitting them into the corresponding historical single-frame images, taking the historical single-frame images as a training set and the corresponding target objects as a test set; constructing an initial target detection model based on the YOLOv5 convolutional neural network, training it with the training set and the test set, and obtaining the trained target detection model; inputting the single-frame image into the target detection model and outputting the corresponding target objects; and classifying the target objects in all single-frame images through a classifier and storing the single-frame images by category.
Specifically, a plurality of historical video data, such as video of vehicles driving on a road, are acquired and split into the corresponding historical single-frame images; the historical single-frame images serve as the training set and the corresponding target objects as the test set. An initial target detection model is built on the YOLOv5 convolutional neural network and trained with the training set; the generalization ability of the trained model is evaluated on the test set, and once it reaches a preset requirement the target detection model is obtained. This model can detect vehicles as well as other targets related to vehicle behavior, such as pedestrians, lane line markings or obstacles.
The single-frame images are input into the target detection model, which recognizes the target objects in them, such as vehicles, pedestrians, lane line markings and obstacles; the single-frame images are then classified and stored by the classifier according to the recognized targets, i.e. single-frame images of the same category are stored together, which facilitates subsequent behavior detection. For behavior detection itself, of course, only the behaviors of vehicles or pedestrians are recognized and detected.
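For illustration, a minimal inference sketch using the public Ultralytics YOLOv5 hub model is shown below; the patent trains its own detector on historical traffic video, so the pretrained 'yolov5s' weights here are only a stand-in assumption:

```python
import torch

# Load a small pretrained YOLOv5 model from the public Ultralytics hub.
# The patent's own model would instead be trained on historical traffic frames.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def detect_targets(frame):
    """Run detection on one single-frame image.

    Returns rows of [x1, y1, x2, y2, confidence, class_id] in pixel
    coordinates, which feed the ROI cropping step above.
    """
    results = model(frame)            # frame: image array or image path
    return results.xyxy[0].tolist()   # boxes for the first (only) image
```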
Step S104 specifically includes: screening out the target images that contain the same target object according to the feature data and marking them with the same ID mark to obtain the corresponding ID information; and classifying all target images according to their ID marks, so that each class of target images carries the behavior characteristics of its target object.
Specifically, the feature data of all target images are matched against the extracted feature data, the target images containing the same target object are screened out and stored with the same ID mark, and the target images are classified by ID mark; each class of target images then carries the behavior characteristics of its target object, so those characteristics can be recognized from the target images. For example, if target image 1 carries feature data 1 of vehicle 1, all target images are matched against feature data 1 to find any others with the same features; those found are marked with ID mark 1, and all target images bearing ID mark 1 are stored in the same place. All of them carry the behavior characteristics of vehicle 1, so in subsequent behavior detection every target image of vehicle 1 can be retrieved by its ID mark.
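A greatly simplified sketch of this ID assignment by feature matching follows; real multi-target trackers (e.g. DeepSORT-style methods) also use motion cues and globally optimal matching, and the threshold value here is an assumption:

```python
import numpy as np

SIM_THRESHOLD = 0.7  # assumed cosine-similarity threshold for "same object"

class IDAssigner:
    """Give target images with matching appearance features the same ID mark.

    A greedy per-feature sketch; production trackers also add motion models
    and solve the frame-level assignment jointly (e.g. Hungarian matching).
    """

    def __init__(self):
        self.gallery = {}   # ID mark -> last feature vector of that track
        self.next_id = 0

    def assign(self, feature):
        feature = feature / (np.linalg.norm(feature) + 1e-12)  # L2-normalize
        best_id, best_sim = None, SIM_THRESHOLD
        for tid, ref in self.gallery.items():
            sim = float(feature @ ref)          # cosine similarity
            if sim > best_sim:
                best_id, best_sim = tid, sim
        if best_id is None:                     # unseen object: open a new track
            best_id, self.next_id = self.next_id, self.next_id + 1
        self.gallery[best_id] = feature         # refresh the track's feature
        return best_id
```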
Step S105 specifically includes: selecting a vehicle in the single-frame image as the target vehicle object, screening out the corresponding target images and single-frame images according to the target vehicle object, and acquiring the continuous video segments corresponding to those single-frame images; sampling the continuous video segments through two channel pathways and analyzing their static and dynamic content; determining from the static and dynamic content whether the single-frame image contains only one target vehicle object; if so, performing interactive aggregation on the continuous video segment and the target image to obtain the interaction relationships between the target vehicle object and the remaining target objects; and if not, performing interactive aggregation on the continuous video segment and the target images to obtain the interaction relationships between the target vehicle object and the other vehicle target objects.
Specifically, a vehicle in the single-frame image is selected as the target vehicle object and the corresponding target image and single-frame image are screened out; for example, if the target vehicle object is vehicle 1, the corresponding target image and single-frame image are obtained from vehicle 1, and the corresponding continuous video segment is obtained from the single-frame image. The continuous video segment is sampled through two channel pathways: a slow channel analyzes the static content of the segment, while a fast channel analyzes its dynamic content. Both channels use a 3D ResNet model, and the SlowFast algorithm classifies and outputs the actions of the target object over the time sequence.
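The two-pathway sampling just described can be sketched as follows; the 32-frame fast pathway and the speed ratio ALPHA = 8 are common SlowFast defaults, not values stated in the patent:

```python
import torch

ALPHA = 8  # fast/slow frame-rate ratio (a standard SlowFast default, assumed here)

def two_pathway_sampling(clip, fast_frames=32):
    """Split a clip tensor (C, T, H, W) into the slow and fast pathway inputs.

    The fast pathway samples densely to capture dynamic content (motion);
    the slow pathway keeps every ALPHA-th of those frames for static content.
    """
    c, t, h, w = clip.shape
    fast_idx = torch.linspace(0, t - 1, fast_frames).long()
    fast = clip[:, fast_idx]       # dense temporal sampling for the fast channel
    slow = fast[:, ::ALPHA]        # sparse sampling for the slow channel
    return slow, fast
```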
For example, it is determined from the analyzed static and dynamic content whether vehicle 1 is the only target vehicle object in the single-frame image. If it is, the target vehicle object and the remaining target objects are interactively aggregated over the single-frame image and the continuous video segment, yielding the interaction relationships between the target vehicle object and the remaining target objects. If vehicles 2 and 3 are also present in the single-frame image, interactive aggregation is performed on the target images of vehicles 1, 2 and 3 corresponding to the single-frame image, yielding the interaction relationships among vehicles 1, 2 and 3.
Step S106 specifically includes: detecting the behavior of the target object marked by the current ID according to the interaction relationships, the continuous video segments and the ID information to obtain a detection result; obtaining the pixel position, ID information and vehicle behavior category of the target object from the detection result; and obtaining the positioning information of the target object by combining the altitude of the unmanned aerial vehicle that collects the real-time video data with the camera angle and focal length.
Specifically, when vehicle 1 is the only vehicle in the single-frame image, the behavior detection model performs behavior detection on the target object marked by the current ID, determines the relationships between vehicle 1 and the remaining target objects, such as pedestrians and lane line markings, and obtains the behavior detection result of vehicle 1. When vehicles 2 and 3 are present in addition to vehicle 1, after the interaction relationships among vehicles 1, 2 and 3 are obtained, the correspondence among their positions, features and actions is determined from those interaction relationships together with the continuous video segment and the ID information of vehicle 1; this realizes the behavior judgment of vehicle 1 within a vehicle queue or cluster, i.e. the behavior detection of vehicle 1, and yields its behavior detection result.
From the behavior detection result of vehicle 1, its pixel position in the single-frame image, ID information and behavior category are output. Latitude and longitude positioning of vehicle 1 is then achieved by combining its pixel position with other state parameters of the unmanned system, such as the altitude of the unmanned aerial vehicle and the angle and focal length of its camera, yielding the latitude and longitude positioning information of vehicle 1.
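As a geometric sketch of this monocular positioning step, the pinhole-model calculation below recovers a ground-plane offset from the drone's altitude, the camera's downward pitch and its focal length; it assumes flat ground, ignores lens distortion and the drone's roll and yaw, and all names are illustrative:

```python
import math

def ground_offset(u, v, cx, cy, f_px, altitude_m, pitch_rad):
    """Estimate the (forward, lateral) ground offset in metres of pixel (u, v).

    A flat-ground pinhole sketch: (cx, cy) is the principal point, f_px the
    focal length in pixels, pitch_rad the camera's downward tilt from the
    horizon. The ray must actually hit the ground (depression angle > 0).
    """
    depression = pitch_rad + math.atan2(v - cy, f_px)  # ray angle below horizon
    if depression <= 0:
        raise ValueError("pixel ray does not intersect the ground plane")
    forward = altitude_m / math.tan(depression)   # along-track distance
    slant = altitude_m / math.sin(depression)     # range to the ground point
    lateral = slant * (u - cx) / f_px             # cross-track offset
    return forward, lateral
```

The resulting offsets, rotated by the drone's heading and added to its GPS position, would then give the latitude-longitude estimate described above.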
As shown in fig. 2, an end-to-end vehicle behavior detection system 20 is provided, connected to an unmanned system to acquire real-time video data, and comprising: a single-frame image acquisition module 21, a target detection module 22, an image cutting module 23, a target tracking module 24, an interactive aggregation module 25 and a behavior detection module 26; the single-frame image acquisition module 21 is connected to the target detection module 22, the target detection module 22 to the image cutting module 23, the image cutting module 23 to the target tracking module 24, the target tracking module 24 to the interactive aggregation module 25, and the interactive aggregation module 25 to the behavior detection module 26;
the single-frame image acquisition module 21 is configured to receive the collected real-time video data and obtain single-frame images from it;
the target detection module 22 is configured to perform a convolution operation on the target area of the single-frame image through a convolutional neural network and to divide the target objects into categories through a classifier, completing target detection;
the image cutting module 23 is configured to crop regions of interest from the detected image, obtain at least one target image and input it to the target tracking module 24;
the target tracking module 24 is configured to acquire the real-time video data, extract the feature data of the target object through a multi-task network, track the target object across all single-frame images according to the feature data, store the same ID mark for single-frame images with the same feature data to obtain the ID information, and transmit the continuous video segments, feature data and ID information corresponding to the single-frame images to the memory pool, which passes the continuous video segment information to the behavior detection module 26;
the interactive aggregation module 25 is configured to screen out the target images whose target object is a vehicle, acquire the continuous video segment from the corresponding single-frame image, and perform interactive aggregation on the single-frame image and the continuous video segment to obtain the interaction relationships between the target object and the remaining target objects in the single-frame image;
and the behavior detection module 26 is configured to perform behavior detection on the target object by combining the ID information, the continuous video segments and the interaction relationships, output the behavior information of the target object according to the detection result, and position the vehicle.
In one embodiment, as shown in fig. 3, the behavior detection module 26 further includes a monocular visual positioning unit configured to obtain the relevant information of the target object from the detection result and position the target object accordingly.
Specifically, after the behavior detection module 26 detects the vehicle behavior, the monocular visual positioning unit positions the target vehicle object. Building on the target behavior recognition, the latitude-longitude data or real-time distance of the vehicle is estimated from the monocular visual ranging principle and information such as the coordinate attitude of the camera, yielding the positioning information of the target vehicle; when the driving state of a vehicle is abnormal, the vehicle, surrounding vehicles or pedestrians can then be warned according to this positioning information.
As shown in fig. 4 and 5, the interactive aggregation module comprises a V-type interactive aggregation module and an O-type interactive aggregation module connected in a second-order series-parallel arrangement. The V-type interactive aggregation module detects the interaction relationships between vehicles: its input ends all correspond to target images of vehicle target objects, one input end receiving the feature data of the target vehicle and the remaining input ends receiving the feature data of the non-target vehicles, and the interaction relationships between the target vehicle and the non-target vehicles are obtained through the interactive aggregation operation. The input ends of the O-type interactive aggregation module correspond to the feature data of the target vehicle and of non-vehicle objects, and the interaction relationships between the vehicle and the non-vehicle objects are obtained through the interactive aggregation operation.
Specifically, since the behavior of a vehicle depends not only on its own state but also on its interaction with the target objects in the surrounding scene, the interactions between the vehicle and the remaining target objects can be recognized by the V-type and O-type interactive aggregation modules. The V-type module recognizes action relationships between vehicles, such as driving against traffic, formation abnormalities and traffic accidents. The O-type module recognizes action relationships between vehicles and other traffic elements, such as running a red light, pressing a lane line, or colliding with an obstacle or a pedestrian. The two modules can be connected in series, in parallel, or in series-parallel; the present application adopts a second-order series-parallel arrangement, in which each interactive module receives the outputs of all blocks and aggregates them with learnable weights.
Specifically, the inputs of the O-type interactive aggregation module correspond to the target image of a vehicle target object and those of the remaining objects. For example, if a single-frame image contains three target objects, the target images cropped by region of interest are input separately; one of the three is a vehicle target object and the others are non-vehicle target objects. The feature data of the target images are processed through scaling, softmax and similar operations, realizing the interaction between the vehicle and the other target objects in the single-frame image and yielding the interaction relationships.
Specifically, the V-type interactive aggregation module is similar to the O-type module, but all of its inputs are vehicle target objects: one of its three input ends corresponds to the main vehicle object and the other two to the secondary vehicle objects. Through the interactive aggregation operation, the interaction relationships of the main vehicle object with the other vehicle objects in position, features and actions are established, realizing the behavior judgment of the main vehicle within a vehicle queue or cluster.
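The scale and softmax operations mentioned for these blocks suggest an attention-style computation. The sketch below shows one such interactive aggregation step as scaled dot-product attention, with the target vehicle's feature as the query and the other objects' features as keys and values; the block internals are not specified in the patent, so this form is an assumption:

```python
import torch
import torch.nn.functional as F

def interaction_aggregate(target_feat, other_feats):
    """One interactive aggregation step as scaled dot-product attention.

    target_feat: (1, d) feature of the target (main) vehicle.
    other_feats: (N, d) features of the other objects - other vehicles for a
    V-type block, non-vehicle objects for an O-type block.
    """
    d = target_feat.shape[-1]
    scores = (target_feat @ other_feats.T) / d ** 0.5  # scaled similarities (1, N)
    weights = F.softmax(scores, dim=-1)                # interaction weights
    return weights @ other_feats                       # aggregated feature (1, d)

# Example: one target vehicle interacting with three surrounding objects.
fused = interaction_aggregate(torch.randn(1, 256), torch.randn(3, 256))
```

In the second-order series-parallel arrangement, the outputs of such blocks would then be combined with learnable weights before behavior classification.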
In one embodiment, a device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 6. The device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the device is configured to provide computing and control capabilities. The memory of the device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the device is used for storing configuration templates and also can be used for storing target webpage data. The network interface of the device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an end-to-end vehicle behavior detection method.
Those skilled in the art will appreciate that the configuration shown in fig. 6 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation on the devices to which the present application may be applied, and that a particular device may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a storage medium may also be provided, storing a computer program that comprises program instructions which, when executed by a computer (which may be part of an end-to-end vehicle behavior detection system of the kind referred to above), cause the computer to perform the method according to the preceding embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented in a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented in program code executable by a computing device, such that they may be stored on a computer storage medium (ROM/RAM, magnetic disks, optical disks) and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is a more detailed description of the present invention that is presented in conjunction with specific embodiments, and the practice of the invention is not to be considered limited to those descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.
Claims (10)
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111526066.0A (granted as CN114241373B) | 2021-12-14 | | End-to-end vehicle behavior detection method, system, device and storage medium |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN114241373A | 2022-03-25 |
| CN114241373B | 2025-04-18 |
Cited By (3)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2023246720A1 | 2022-06-20 | 2023-12-28 | 阿里云计算有限公司 | Roadside parking detection method, roadside parking system, and electronic device |
| CN117218678A | 2023-08-11 | 2023-12-12 | 浙江深象智能科技有限公司 | Behavior detection method and device and electronic equipment |
| CN118314535A | 2024-05-10 | 2024-07-09 | 北京积加科技有限公司 | Information generation method, device, electronic device and computer readable medium |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| | GR01 | Patent grant |