CN113486850A - Traffic behavior recognition method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113486850A (application CN202110852518.8A)
- Authority
- CN
- China
- Prior art keywords
- target
- vehicle
- human body
- region
- rider
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06F18/214 — Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/22 — Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06F18/24 — Pattern recognition; Analysing; Classification techniques
- G06N3/045 — Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
- G06N3/084 — Neural networks; Learning methods; Backpropagation, e.g. using gradient descent
Abstract
The application provides a traffic behavior recognition method and device, electronic equipment and a storage medium. The method may include: performing object detection on a received video stream to obtain at least one object and an object region corresponding to each object, where the type of each object includes at least one of a human body, a vehicle, and a rider; determining a target rider leaving the video stream based on results of multi-object tracking of the objects; performing region overlap detection on the target rider based on the obtained object regions to obtain a target human body and a target vehicle that meet a preset region overlap requirement with the target rider; and determining a behavior recognition result for the target rider based on a helmet recognition result for the target human body and a vehicle recognition result for the target vehicle.
Description
Technical Field
The present application relates to computer technologies, and in particular, to a traffic behavior recognition method and apparatus, an electronic device, and a storage medium.
Background
As traffic regulatory authorities strengthen supervision, traffic behavior recognition is required. In some traffic behavior recognition scenarios, there is a need to recognize the unsafe behavior in which a rider of a motorcycle or an electric bicycle does not wear a safety helmet, so that safety education can be carried out.
At present, traffic behavior recognition is generally performed on a single captured frame, so false recognition may be caused by the accidental nature of a single-frame snapshot.
Disclosure of Invention
In view of the above, the present application at least discloses a traffic behavior recognition method. The method may include: carrying out object detection on the received video stream to obtain at least one object and an object area corresponding to the object; the type of the object includes at least one of a human body, a vehicle, and a rider; determining a target rider leaving the video stream based on results of multi-object tracking of the object; performing region contact degree detection on the target rider based on the obtained object regions corresponding to the objects to obtain a target human body and a target vehicle which meet preset region contact degree requirements with the target rider; determining a behavior recognition result for the target rider based on the helmet recognition result for the target human body and the vehicle recognition result for the target vehicle.
In some embodiments, the performing, based on the obtained object region corresponding to each object, region contact ratio detection on the target rider to obtain a target human body and a target vehicle meeting a preset region contact ratio requirement with the target rider includes: respectively determining the region overlap ratio between other object regions except the target rider region and the target rider region in each object region based on the obtained object region corresponding to each object; determining a target human body region and a target vehicle region with the maximum region overlapping degree with the target rider region in the other object regions; and respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
In some embodiments, the determining, as the target human body and the target vehicle, a human body corresponding to the target human body region and a vehicle corresponding to the target vehicle region respectively includes: and in response to the fact that the area contact ratio between the target human body area and the target vehicle area reaches a preset contact ratio threshold value, respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
In some embodiments, the performing, based on the obtained object region corresponding to each object, region contact ratio detection on the target rider to obtain a target human body and a target vehicle meeting a preset region contact ratio requirement with the target rider includes: respectively determining a human body region and a vehicle region with the largest region overlapping degree with the target rider region in each target image of the video stream based on the obtained object regions corresponding to the objects; respectively determining the human body area and the vehicle area which are determined to have the maximum contact ratio times to reach a preset threshold value as a target human body area and a target vehicle area; and respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
In some embodiments, the performing, based on the obtained object region corresponding to each object, region contact ratio detection on the target rider to obtain a target human body and a target vehicle meeting a preset region contact ratio requirement with the target rider includes: determining a target human body region and a target vehicle region with the largest region overlapping degree with the target rider region in each target image of the video stream based on the obtained object regions corresponding to the objects, and taking the human body and the vehicle respectively corresponding to the target human body region and the target vehicle region as first associated objects corresponding to the target rider; determining a second associated object corresponding to the target rider, wherein in an object region corresponding to each object, the region overlapping degree of the target rider region and the second associated object is the largest, and the second associated object comprises a human body and a vehicle; and determining the target human body and the target vehicle according to the same associated object in the second associated object and the first associated object.
In some embodiments, the determining, based on the obtained object regions corresponding to the objects, a target human body region and a target vehicle region in each target image of the video stream, where a region overlapping degree with the target rider region is the largest, includes: respectively determining a human body region and a vehicle region with the largest region overlapping degree with the target rider region in each target image based on the obtained object regions corresponding to the objects; and respectively taking the human body area and the vehicle area which are determined to have the maximum number of times of the contact ratio as a preset threshold value as the target human body area and the target vehicle area.
In some embodiments, the determining the target human body and the target vehicle according to the same one of the second associated object and the first associated object comprises: in response to the fact that the same associated object comprises a human body and a vehicle, determining the human body and the vehicle as the target human body and the target vehicle respectively; or, in response to that the same associated object includes multiple human bodies and multiple vehicles, determining, as the target human body and the target vehicle, the human body and the vehicle, corresponding to the human body and the vehicle with the largest number of times, in the multiple human bodies and the multiple vehicles, respectively.
In some embodiments, before obtaining the behavior recognition result for the target rider according to the helmet recognition result for the target human body and the vehicle recognition result for the target vehicle, the method further comprises: acquiring a target image containing the target rider; performing helmet identification on the target human body in the target image by using a helmet identification network to obtain a helmet identification result aiming at the target human body; and utilizing a vehicle identification network to perform vehicle identification on the target vehicle in the target image to obtain a vehicle identification result aiming at the target vehicle.
In some embodiments, the helmet identification result indicates whether the target person wears a hat, and a type of the hat worn and hat assist attributes; the training method of the helmet identification network comprises the following steps: constructing a first training sample; the first training sample comprises first image samples of different hat types, and the first image samples comprise first marking information of hat sub-types and second marking information of hat auxiliary attributes; inputting the first training sample into the helmet identification network, and obtaining a helmet identification result of each sample image, wherein the helmet identification result comprises whether a helmet is worn, a type of the helmet and a hat auxiliary attribute; respectively determining a first loss and a second loss according to the helmet identification result and the first marking information and the second marking information; optimizing the helmet identification network based on the first loss and the second loss.
In some embodiments, the vehicle identification result indicates a vehicle type and a vehicle auxiliary attribute of the target vehicle; the training method of the vehicle identification network comprises the following steps: constructing a second training sample, where the second training sample comprises second image samples of different vehicle types, and the second image samples comprise third labeling information of the vehicle types and fourth labeling information of vehicle auxiliary attributes; inputting the second training sample into the vehicle recognition network, and obtaining a vehicle recognition result of each sample image, wherein the vehicle recognition result comprises a vehicle type and a vehicle auxiliary attribute; respectively determining a third loss and a fourth loss according to the vehicle identification result and the third labeling information and the fourth labeling information; and optimizing the vehicle identification network based on the third loss and the fourth loss.
In some embodiments, said obtaining a target image including said target rider comprises: acquiring a first image containing the target rider; and acquiring a target image meeting the requirement of image identification degree from the first image.
In some embodiments, the method further comprises: and if the vehicle type of the target vehicle is a preset vehicle type and the target human body does not wear a helmet, outputting relevant information of the target rider and/or the target vehicle.
The present application further provides a traffic behavior recognition device, the device including: an object detection module, configured to perform object detection on a received video stream to obtain at least one object and an object region corresponding to each object, where the type of the object includes at least one of a human body, a vehicle, and a rider; a determination module, configured to determine a target rider leaving the video stream based on a result of multi-object tracking of the objects; an overlap detection module, configured to perform region overlap detection on the target rider based on the obtained object regions corresponding to the objects to obtain a target human body and a target vehicle that meet a preset region overlap requirement with the target rider; and a recognition module, configured to determine a behavior recognition result for the target rider based on a helmet recognition result for the target human body and a vehicle recognition result for the target vehicle.
The present application further proposes an electronic device, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor executes the executable instructions to implement the traffic behavior recognition method as shown in any one of the foregoing embodiments.
The present application also proposes a computer-readable storage medium, which stores a computer program for causing a processor to execute the traffic behavior recognition method as shown in any of the preceding embodiments.
According to the technical solution of the embodiments, behavior recognition is performed on a rider who leaves the video stream, so traffic behavior can be recognized from the complete set of collected images in the video stream that contain the target rider, rather than by detecting traffic behavior in a single frame; this avoids false recognition caused by the accidental nature of a single-frame snapshot and improves the accuracy of traffic behavior recognition. In addition, the traffic behavior is recognized through a target human body and a target vehicle that meet the preset region overlap requirement with the target rider, so the spatial association among the human body, the vehicle, and the rider can be used to more accurately identify the vehicle and the human body matched with the rider, thereby improving the accuracy of traffic behavior detection.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate one or more embodiments of the present application or technical solutions in the related art, the drawings needed to be used in the description of the embodiments or the related art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in one or more embodiments of the present application, and other drawings can be obtained by those skilled in the art without inventive exercise.
Fig. 1 is a flow chart illustrating a method of traffic behavior recognition according to the present application;
fig. 2 is a schematic flow chart of an object detection method according to the present application;
FIG. 3 is a flow chart illustrating a method for determining a target human body and a target vehicle according to the present application;
FIG. 4 is a flow chart illustrating a method for determining a target human body and a target vehicle according to the present application;
fig. 5 is a schematic diagram illustrating a helmet identification process according to the present application;
FIG. 6 is a schematic view of a vehicle identification process shown herein;
fig. 7 is a flow chart of a traffic behavior recognition method shown in the present application;
fig. 8 is a schematic structural diagram of a traffic behavior recognition apparatus according to the present application;
fig. 9 is a schematic diagram of a hardware structure of an electronic device shown in the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
The application aims to provide a traffic behavior identification method (hereinafter referred to as an identification method).
Referring to fig. 1, fig. 1 is a flowchart illustrating a method of a traffic behavior recognition method according to the present application.
As shown in fig. 1, the method may include:
s102, carrying out object detection on the received video stream to obtain at least one object and an object area corresponding to the object; the type of the object includes at least one of a human body, a vehicle, and a rider;
s104, determining a target rider leaving the video stream based on a result of multi-object tracking performed on the object;
s106, carrying out region contact ratio detection on the target rider based on the obtained object regions corresponding to the objects to obtain a target human body and a target vehicle which meet the preset region contact ratio requirement with the target rider;
S108, determining a behavior recognition result for the target rider based on the helmet recognition result for the target human body and the vehicle recognition result for the target vehicle.
According to the traffic behavior recognition method, behavior recognition is performed on a rider leaving the video stream, so traffic behavior can be recognized from the complete set of collected images in the video stream that contain the target rider, rather than by detecting traffic behavior in a single frame; this avoids false recognition caused by the accidental nature of a single-frame snapshot and improves the accuracy of traffic behavior recognition. In addition, the traffic behavior is recognized through a target human body and a target vehicle that meet the preset region overlap requirement with the target rider, so the spatial association among the human body, the vehicle, and the rider can be used to more accurately identify the vehicle and the human body matched with the rider, thereby improving the accuracy of traffic behavior detection.
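For orientation, steps S102 to S108 can be read as a simple processing pipeline. The sketch below is illustrative only and assumes Python; all helper names (detect_objects, track_objects, associate_by_overlap, recognize_helmet, recognize_vehicle, decide_behavior) are hypothetical placeholders for the detection, tracking and recognition modules described in the embodiments, not functions defined by the application.

```python
# Illustrative pipeline for S102-S108; every helper function is a hypothetical
# placeholder for the modules described in the following embodiments.

def recognize_traffic_behavior(video_stream):
    results = []
    tracker_state = {}
    for frame in video_stream:
        # S102: per-frame object detection (human bodies, vehicles, riders + regions)
        detections = detect_objects(frame)
        # S104: multi-object tracking; returns riders that have just left the stream
        leaving_riders = track_objects(tracker_state, detections)
        for rider in leaving_riders:
            # S106: region-overlap association of the rider with a human body and a vehicle
            human, vehicle = associate_by_overlap(rider, tracker_state)
            # S108: helmet and vehicle recognition on a clear target image of the rider
            helmet_result = recognize_helmet(rider.best_image, human)
            vehicle_result = recognize_vehicle(rider.best_image, vehicle)
            results.append(decide_behavior(helmet_result, vehicle_result))
    return results
```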
The identification method can be applied to electronic equipment. The electronic equipment can execute the identification method through a software system carried thereon. The electronic equipment can be a notebook computer, a server, a mobile phone, a PAD terminal, or the like. The specific type of the electronic equipment is not particularly limited in this application.
It is understood that the identification method can be executed by a client device alone, by a server device alone, or by the client device and the server device in cooperation. The server may be a single server or a cloud constructed from a server cluster.
For example, the identification method may be integrated in the client device. After receiving an identification request, the client device can provide computing power through its own hardware environment to execute the identification method.
Also for example, the identification method may be integrated into a server device. After receiving an identification request, the server device can provide computing power through its own hardware environment to execute the identification method.
For example, the identification method can also be divided into two tasks: acquiring the video stream and recognizing the traffic behavior. The acquisition task may be integrated into the client device, and the recognition task may be integrated into the server device. The client device initiates an identification request to the server device after acquiring the video stream. After receiving the identification request, the server device may detect traffic behavior in the video stream in response to the request.
The following description will be given taking an execution body as an electronic device (hereinafter simply referred to as a device) as an example.
The device may receive a video stream.
The video stream may be a live video acquired by an image acquisition device (e.g., a monitoring camera, etc.) deployed on the site for a monitored area. The image capturing devices typically capture video for a fixed monitored area.
In some examples, an object such as a rider may be captured by the image capturing device when passing through the monitoring area to form a video stream, and the image capturing device may transmit the video stream to the device by wire or wirelessly for traffic behavior detection.
After receiving the video stream, the apparatus may perform S102. It should be noted that, in some embodiments, the operation performed on the video stream may be a correlation operation performed on each frame image in the video stream, or may be a correlation operation performed on each frame image sampled at a preset sampling interval from the video stream.
The rider may refer to a person driving a non-motorized vehicle. The rider zones disclosed herein may include the rider's body, as well as the rider-driven vehicle.
The vehicle may be a non-motor vehicle. For example, in the present application, the vehicle may refer to a motorcycle, an electric bicycle, a bicycle, and the like. In the present application, by performing vehicle recognition on a vehicle matched to a rider, it is possible to determine whether the rider is driving a motorcycle, an electric bicycle, or a bicycle, thereby excluding erroneous recognition due to the recognized rider not being a "real sense" rider. The vehicle region disclosed herein may include a vehicle appearing in the image.
The human body refers to a preset human body part of a person appearing in the image. In some examples, the predetermined body part may include at least one of a head, a torso, and a limb. In some examples, to increase the accuracy of behavior detection, the predetermined body parts may include a head, a torso, and limbs. By performing helmet recognition on a human body matched with a rider, it can be determined whether the rider wears a helmet. The body region disclosed herein may include a body appearing in an image.
Several objects may be included in the video stream. Wherein a number may represent one or more. I.e. an object or objects may be included in the image.
In some embodiments, in step S102, object detection may be performed on each object in the received video stream by using a pre-trained object detection network, so as to obtain an object region corresponding to each object.
The object area is an image area corresponding to a detection frame corresponding to each object, and information such as the position and size of the object area can be indicated through the coordinates of 4 vertexes of the detection frame.
The object detection network may specifically be a deep convolutional network model for object detection. For example, the object detection network may be an R-CNN (Region-based Convolutional Neural Network), a Fast R-CNN, or a Faster R-CNN network.
In practical application, before the object detection network is used for object detection, the model can be trained based on a plurality of training samples marked with the position information and the classification information corresponding to the human body detection frame, the rider detection frame and the vehicle detection frame until the model converges.
Referring to fig. 2, fig. 2 is a schematic flow chart of an object detection method according to the present application. Fig. 2 is a schematic diagram illustrating the object detection method, and does not limit the present application.
As shown in fig. 2, the target object detection network 20 may be a model constructed based on the Faster R-CNN network. The model may include at least a backbone network 21 (backbone), an RPN 22 (Region Proposal Network), and an RCNN 23 (Region-based Convolutional Neural Network).
The backbone network 21 may perform convolution operation on the target image for several times to obtain a target feature map of the target image. RPN22 may be used to process the target feature map to obtain anchors (anchor boxes) corresponding to each object in the target image. The RCNN23 is configured to perform bbox (bounding boxes, detection boxes) regression and classification according to an anchor frame output by the RPN network and a target feature map output by the backbone network, so as to obtain detection boxes corresponding to a human body, a vehicle, and a rider included in the target image.
After the detection frames corresponding to the respective objects are obtained, vertex coordinate information corresponding to the respective detection frames may be stored.
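As one possible concrete realization of such a detector (not necessarily the configuration used in the embodiments), a torchvision Faster R-CNN can be trained on the three object types and queried per frame. The class mapping, score threshold, and the assumption that frames are OpenCV-style BGR numpy arrays are all illustrative choices.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Assumed class mapping for a detector trained on the three object types (+ background).
CLASS_NAMES = {1: "human_body", 2: "vehicle", 3: "rider"}

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=4)
model.eval()

def detect_objects(frame_bgr, score_thresh=0.5):
    """Return (label, box) pairs; each box is (x1, y1, x2, y2) vertex coordinates."""
    image = to_tensor(frame_bgr[:, :, ::-1].copy())   # BGR -> RGB tensor in [0, 1]
    with torch.no_grad():
        output = model([image])[0]                    # dict with boxes, labels, scores
    objects = []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        if score >= score_thresh:
            objects.append((CLASS_NAMES[int(label)], box.tolist()))
    return objects
```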
After obtaining the object region corresponding to each object, the device may execute the step S104.
In some embodiments, in performing multi-object tracking, S1042 may be performed first to obtain each object included in each frame of image in the video stream. And then executing S1044, determining matched objects in the two adjacent frames of images, and determining the matched objects as the same object, thereby realizing the tracking of the same object. S1046 may be executed to assign the same identity ID to the same object, thereby implementing tagging of the same object.
In some embodiments, when S1042 is executed, the detection result of S102 may be obtained.
In some embodiments, in step S1044, each frame of image may be taken as a current image and each object in the current image as a current object, and the similarity between the first region visual feature corresponding to each object detected in the previous frame of the current image and the second region visual feature corresponding to the current object in the current frame may be calculated. The similarity may be calculated using a Euclidean distance, a cosine distance, a Mahalanobis distance, or the like, which is not particularly limited herein. Then, the object in the previous frame corresponding to the first region visual feature with the highest similarity to the second region visual feature may be taken as the object matched with the current object.
In some embodiments, in performing S1046, if the object is found not to have appeared in the previous frame image (i.e., there is no object in the previous frame image that matches the object), a new ID may be assigned to the object. The logic for assigning the new ID is not particularly limited in the present application. In some examples, a sequential assignment method may be employed, i.e., assigning one by one in the order of detection of new IDs.
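A minimal sketch of the matching and ID-assignment logic in S1044/S1046 is shown below. It assumes each detection already carries a region visual feature vector; cosine similarity, the similarity threshold, and the dictionary layout are illustrative choices rather than values fixed by the embodiments.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def match_and_assign_ids(prev_objects, curr_objects, next_id, sim_thresh=0.6):
    """prev/curr objects: list of dicts with 'feature' (np.ndarray); matched objects
    inherit the previous frame's 'id', unmatched objects get a new sequential ID."""
    for obj in curr_objects:
        best, best_sim = None, sim_thresh
        for prev in prev_objects:
            sim = cosine_similarity(prev["feature"], obj["feature"])
            if sim > best_sim:
                best, best_sim = prev, sim
        if best is not None:
            obj["id"] = best["id"]      # same object as in the previous frame
        else:
            obj["id"] = next_id         # new object: assign a new ID in detection order
            next_id += 1
    return next_id
```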
If no object matching a rider from the previous frame (hereinafter referred to as a first object) is found in the current image, it can be assumed that the first object may have left the video stream. In some examples, after determining the first object missing from the current image, S1048 may be performed to determine the first object as a target rider leaving the video stream. Compared with performing behavior recognition on every rider in every frame, this, on the one hand, allows traffic behavior to be recognized from the complete set of collected images in the video stream that contain the target rider rather than from a single frame, which avoids false recognition caused by the accidental nature of a single-frame snapshot and improves the accuracy of traffic behavior recognition; on the other hand, it reduces the amount of calculation and the resource consumption.
In some embodiments, whether the object leaves the video stream monitoring area or not can be determined through the rider moving track, so that the target rider leaving the video stream can be accurately acquired, and the error recognition is avoided. In executing S1048, it may be determined whether a movement trajectory of the first object in the video stream represents that the first object leaves the monitored area based on an object tracking result for the first object; if so, determining the first object as the target rider.
The object tracking result of the first object may include coordinate information of the first object at different time instants (i.e. motion trail information of the first object), and it may be determined whether the motion trail marks that the first object is about to leave the video stream by determining whether the coordinate of the first object at the latest time instant is at the edge position of the monitored area.
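One possible way to implement the check in S1048 is to compare the rider's most recent detection box against a margin around the border of the monitored area; the margin value below is an assumption for illustration.

```python
def is_leaving(last_box, frame_w, frame_h, margin=20):
    """last_box = (x1, y1, x2, y2) of the rider at the latest time it was matched.
    Returns True if the box touches the edge of the monitored area (assumed margin)."""
    x1, y1, x2, y2 = last_box
    return (x1 <= margin or y1 <= margin
            or x2 >= frame_w - margin or y2 >= frame_h - margin)
```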
In some embodiments, false recognition rates due to snapshot contingencies may be reduced by determining the length of time that the target rider appears in the video stream.
Specifically, after a certain object is subjected to object tracking for the first time, the number of image frames in which the object appears may be counted by an object tracking result for a subsequent frame image, and when the number of image frames reaches a preset threshold (empirical threshold), the object may be taken as the target object. Therefore, the false recognition rate caused by the accidental snapshot can be reduced.
After determining the target rider from the video stream, the apparatus may perform S106: and carrying out region contact degree detection on the target rider based on the obtained object regions corresponding to the objects to obtain a target human body and a target vehicle which meet preset region contact degree requirements with the target rider.
In some embodiments, the preset region overlap requirement may include any of the following: the region overlap degree reaches a standard threshold (an empirical value); the region overlap degree is the largest; the region overlap degrees are mutually the largest. The region overlap degree can represent the spatial association between two objects: the larger the overlap between the regions corresponding to two objects, the stronger their spatial association, which matches the spatial constraints among a rider, a human body, and a vehicle.
The region overlap degrees being mutually the largest means that each of the two objects' regions has the greatest overlap with the other. For example, for an object A and an object B to be mutually largest in region overlap, taking object A as the reference object and comparing candidates one by one, object B is the object whose region overlap with object A is the largest; and conversely, taking object B as the reference object and comparing candidates one by one, object A is the object whose region overlap with object B is the largest.
In this case, when S106 is executed, S1062 may be executed to determine, based on the obtained object regions corresponding to the respective objects, region overlapping degrees between the object regions other than the target rider region and the target rider region. Then, S1064 may be performed, and among the other object regions, a target human body region and a target vehicle region having a greatest degree of region overlap with the target rider region are determined. S1066 may be executed to determine the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle, respectively.
In S1062, the region overlap may be calculated as an IoU (Intersection over Union) between the two object regions.
Take the case where the other object is a human body as an example. When performing the IoU calculation, it may first be determined whether the human body detection frame (hereinafter referred to as frame 1) and the target rider detection frame (hereinafter referred to as frame 2) overlap; if so, the intersection over union IoU(frame 1, frame 2) may be obtained by dividing the area of the region where frame 1 and frame 2 overlap by the area of the union of frame 1 and frame 2.
Let the coordinates of the top-left corner of frame 1 be (p_x1, p_y1) and of its bottom-right corner be (p_x2, p_y2), and let the coordinates of the top-left corner of frame 2 be (h_x1, h_y1) and of its bottom-right corner be (h_x2, h_y2).
If the condition p_x1 > h_x2 || p_x2 < h_x1 || p_y1 > h_y2 || p_y2 < h_y1 evaluates to 1 (true), it may be determined that frame 1 does not overlap frame 2, i.e., the human body corresponding to frame 1 is not spatially associated with the target rider corresponding to frame 2.
If the condition evaluates to 0 (false), the length of the overlapping region may be determined as Len = min(p_x2, h_x2) - max(p_x1, h_x1), and the width of the overlapping region as Wid = min(p_y2, h_y2) - max(p_y1, h_y1).
After the length and width are determined, the area of the overlapping region of frame 1 and frame 2 is obtained as S1 = Len * Wid.
Then, the area of the union of frame 1 and frame 2 may be determined as S2 = S(p) + S(h) - S1, where S(p) = (p_y2 - p_y1) * (p_x2 - p_x1) and S(h) = (h_y2 - h_y1) * (h_x2 - h_x1).
Finally, according to the formula IoU = S1/S2, the value of IoU(frame 1, frame 2), i.e. the region overlap between the human body and the target rider, may be determined.
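The computation above is a standard IoU and can be written compactly as follows; the sketch mirrors the formulas in the preceding paragraphs, with boxes given by their top-left and bottom-right vertex coordinates.

```python
def region_overlap(box_p, box_h):
    """IoU of two detection frames given as (x1, y1, x2, y2)."""
    p_x1, p_y1, p_x2, p_y2 = box_p
    h_x1, h_y1, h_x2, h_y2 = box_h
    # Non-overlap test: any of these conditions means the frames do not overlap.
    if p_x1 > h_x2 or p_x2 < h_x1 or p_y1 > h_y2 or p_y2 < h_y1:
        return 0.0
    length = min(p_x2, h_x2) - max(p_x1, h_x1)
    width = min(p_y2, h_y2) - max(p_y1, h_y1)
    s1 = length * width                                # overlap area
    s_p = (p_x2 - p_x1) * (p_y2 - p_y1)
    s_h = (h_x2 - h_x1) * (h_y2 - h_y1)
    s2 = s_p + s_h - s1                                # union area
    return s1 / s2
```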
Thereafter, in S1064, the human body region and the vehicle region having the greatest region overlap (IoU) with the target rider region may be determined as the target human body region and the target vehicle region, respectively. S1066 may then be performed to obtain, from the target human body region and the target vehicle region, the target human body and the target vehicle spatially associated with the target rider.
Therefore, the target human body and the target vehicle with the largest contact degree with the target rider area can be obtained, the human body and the vehicle with the strongest spatial relevance with the rider can be obtained, and the accuracy of traffic behavior recognition is improved.
In some embodiments, the preset region overlapping degree requirement can be set to be that the region overlapping degree is the largest, so that the human body and the vehicle which are most strongly correlated with the rider in space can be accurately determined, and the traffic behavior identification accuracy is improved.
In some embodiments, after determining the target human body region and the target vehicle region, the region overlap ratio between the target human body region and the target vehicle region may be further verified to determine whether the overlap ratio reaches a preset overlap ratio threshold value. And in response to the fact that the area contact ratio between the target human body area and the target vehicle area reaches a preset contact ratio threshold value, respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
The preset contact ratio threshold may be an empirical threshold set according to the service requirement. For example, the preset overlap threshold may be 0.2 for region overlap in IoU.
The target human body region and the target vehicle region associated with the same rider region should correspond to the human body and the vehicle contained in that rider region, and should therefore partially overlap in space. Hence, in this example, whether the target human body and the target vehicle truly belong to the rider region is verified through the region overlap between the target human body region and the target vehicle region; if the overlap between the target human body and the target vehicle is below the threshold, the associated target human body and target vehicle may have been misjudged, and the target human body and target vehicle associated with the rider region are re-determined, thereby improving the accuracy of traffic behavior recognition.
In some embodiments, the preset region overlap requirement may be set such that the number of times the subject is determined to have the greatest degree of overlap with the target rider region and is determined to have the greatest degree of overlap with the target rider region reaches a preset threshold. Therefore, the human body and the vehicle which are most strongly and stably associated with the rider in space can be determined, and the accuracy of traffic behavior identification is improved. The preset threshold is determined by multiplying the number of target images including the target rider in the video stream by a preset ratio, and in the present embodiment, the preset ratio is 20%, but the value of the preset ratio is not limited thereto, and may be set according to actual conditions.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for determining a target human body and a target vehicle according to the present application.
As shown in fig. 3, when executing S106, S302 may be executed to determine, based on the obtained object regions corresponding to the objects, a human body region and a vehicle region in each target image of the video stream, where the region overlapping degree with the target rider region is the greatest, respectively. S304 may then be executed to determine the human body region and the vehicle region corresponding to the human body region and determined as the target human body region and the target vehicle region, respectively, where the number of times of the maximum contact ratio reaches the preset threshold. S306, respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
The preset threshold may be determined by multiplying the number of target images including the target rider in the video stream by a preset ratio, and in the embodiment, the preset ratio is 20%, but the value of the preset ratio is not limited thereto, and may be set according to actual conditions.
In some embodiments, in performing S302-S304, each target image may be taken as a current image, and the following steps are performed:
according to the method IoU, the region overlapping degree between each object in the current image and the target rider is calculated. And then respectively determining a human body region and a vehicle region which have the maximum region overlapping degree with the target rider in the current image, and updating the times which are determined as the maximum region overlapping degree and correspond to the human body region and the vehicle region.
After the above steps are completed for each target image, it may be determined, for each human body region and each vehicle region, whether the number of times it was determined to have the largest region overlap reaches the preset threshold (determined from the number of target images as described above); the human body region and the vehicle region whose counts reach the preset threshold are then determined as the target human body region and the target vehicle region, respectively. S306 may then be performed to obtain the target human body and the target vehicle spatially associated with the target rider. In this way, the human body and the vehicle that are most strongly and stably associated with the rider in space can be determined, improving the accuracy of traffic behavior recognition.
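A sketch of this counting scheme (S302-S306) is given below, assuming per-frame detections keyed by tracking ID; region_overlap is the IoU helper above, and the 20% ratio follows the embodiment, while the data layout is an assumption.

```python
from collections import Counter

def stable_associates(rider_boxes_by_frame, humans_by_frame, vehicles_by_frame, ratio=0.2):
    """rider_boxes_by_frame: {frame_idx: rider_box};
    humans/vehicles_by_frame: {frame_idx: {object_id: box}}.
    Returns (target_human_id, target_vehicle_id), or None if no stable associate."""
    human_hits, vehicle_hits = Counter(), Counter()
    for idx, rider_box in rider_boxes_by_frame.items():
        for hits, candidates in ((human_hits, humans_by_frame.get(idx, {})),
                                 (vehicle_hits, vehicles_by_frame.get(idx, {}))):
            best_id, best_iou = None, 0.0
            for obj_id, box in candidates.items():
                iou = region_overlap(rider_box, box)
                if iou > best_iou:
                    best_id, best_iou = obj_id, iou
            if best_id is not None:
                hits[best_id] += 1             # max-overlap object in this frame
    threshold = ratio * len(rider_boxes_by_frame)   # preset threshold: 20% of target images
    best_human = max(human_hits, key=human_hits.get, default=None)
    best_vehicle = max(vehicle_hits, key=vehicle_hits.get, default=None)
    if (best_human is not None and human_hits[best_human] >= threshold
            and best_vehicle is not None and vehicle_hits[best_vehicle] >= threshold):
        return best_human, best_vehicle
    return None
```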
In some embodiments, the target human body and the target vehicle may also be determined in the following manner.
Referring to fig. 4, fig. 4 is a flowchart illustrating a method for determining a target human body and a target vehicle according to the present application.
As shown in fig. 4, when S106 is executed, S402 may be executed to determine, based on the obtained object regions corresponding to the objects, a target human body region and a target vehicle region having a largest region overlapping degree with the target rider region in each target image of the video stream, and to set human bodies and vehicles corresponding to the target human body region and the target vehicle region, respectively, as first related objects corresponding to the target rider.
S404, determining a second related object corresponding to the target rider, wherein in the object region corresponding to each object, the region overlapping degree of the target rider region and the second related object is the largest, and the second related object comprises a human body and a vehicle.
S406, determining the target human body and the target vehicle according to the same associated object in the second associated object and the first associated object.
In some embodiments, in executing S402, the target human body region and the target vehicle region may be determined in the same manner as in S302, or the target human body region and the target vehicle region may be determined in the same manner as in S302 and S304, which is not described in detail herein, and then the human body and the vehicle corresponding to the target human body region and the target vehicle region, respectively, are taken as the first related object corresponding to the target rider.
In executing S404, each target image may be taken as a current image, and:
in the manner of the aforementioned IoU, it is determined whether the target rider region has the greatest degree of region overlap with the reference region, with each human object region and vehicle object region included in the current image as the reference region. If the second correlation object is the largest, the object corresponding to the reference area may be taken as the second correlation object.
In some embodiments, a second related object stably related to the target rider may be determined, and specifically, after the reference region having the largest degree of region overlap with the target rider region is determined, the number of times that the target rider region is determined to have the largest degree of region overlap with the reference region may be counted, and when the number of times reaches a second preset threshold value, an object corresponding to the reference region may be taken as the second related object.
The second preset threshold may be determined by a product of a number of target images in the video stream containing the target rider and a preset ratio. In the present embodiment, the preset ratio is 20%, but the value of the preset ratio is not limited to this, and may be set according to actual conditions. Thereby determining a second associated object that is stably associated with the target rider.
In S406, the first associated object set and the second associated object set may be intersected to obtain the same associated object. Therefore, the intersection is taken through the two associated objects, the target human body and the target vehicle which are stably associated with the target rider can be screened out, and false recognition caused by one-way association is eliminated, so that the accuracy of traffic behavior recognition is improved.
In some embodiments, if the same associated object includes only one human body and vehicle, the human body and the vehicle may be determined as the target human body and the target vehicle, respectively.
If the same associated object includes a plurality of human bodies and a plurality of vehicles, the human body and the vehicle corresponding to the largest number of times among the plurality of human bodies and the plurality of vehicles may be determined as the target human body and the target vehicle, respectively. In this way, the target human body and the target vehicle that are most strongly and stably associated with the target rider can be obtained, so that spurious associations between objects caused by accidental snapshots can be reduced, false recognition in scenes such as a stopped rider with pedestrians and vehicles passing by is reduced, and the accuracy of traffic behavior recognition is improved.
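The two-way check in S402-S406 can be expressed as an intersection of the two sets of associated objects. The sketch below assumes the rider-to-object associates (first) and the object-to-rider associates (second) have already been counted, keyed by object ID with their hit counts; using the summed counts as the tie-break is one possible reading of "the largest number of times", not the only one.

```python
def mutual_associates(first_assoc, second_assoc):
    """first_assoc / second_assoc: {object_id: hit_count} for human bodies (or vehicles).
    Returns the object ID associated in both directions; when several candidates remain,
    the one with the largest combined count wins."""
    common = set(first_assoc) & set(second_assoc)
    if not common:
        return None
    return max(common, key=lambda oid: first_assoc[oid] + second_assoc[oid])
```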
After obtaining the target rider, and the target human body and the target vehicle spatially associated with the target rider, S108 may be performed.
In some embodiments, prior to performing S108, S1082 may be performed, acquiring a target image including the target rider. Then executing S1084, performing helmet identification on the target human body in the target image by using a helmet identification network, and obtaining a helmet identification result for the target human body; and executing S1086, performing vehicle identification on the target vehicle in the target image by using a vehicle identification network, and obtaining a vehicle identification result for the target vehicle.
The target image may be any image in the video stream that contains the target rider. In some embodiments, the target image may be a high quality image that meets image requirements. Therefore, the identification accuracy of the helmet and the vehicle can be improved, and the identification accuracy of traffic behaviors is further improved.
At this time, in executing S1082, a first image including the target rider may be acquired from the video stream. Then, a target image meeting the requirement of image identification degree is obtained from the first image. The image identification degree is high, and the object in the image is easy to identify, so that the identification accuracy of the helmet and the vehicle can be improved.
In some embodiments, when S1084 is executed, the region of the target human body obtained after performing the object detection step described in S102 on the target image may be acquired, and a human body region feature map corresponding to the region of the target human body may then be input into the helmet identification network to obtain a helmet identification result for the target human body. The helmet identification network can be a multi-classification network.
In training the helmet identification network, a first training sample may be constructed in training the network. Wherein the first training sample comprises first image samples of different hat types and first annotation information indicating the hat types in the first image samples.
In the present application, the specific type of cap is not limited. For example, the cap may be a peaked cap, helmet, hard hat, fisherman's cap, or the like. The encoding can be performed by a one-hot (one-hot encoding) method when labeling the hat type. For example, 1000 may indicate that a peaked cap is worn, 0100 indicates that a helmet is worn, 0010 indicates that a safety helmet is worn, 0001 indicates that a fisherman cap is worn, and 0000 indicates that no hat is worn.
Then, network parameters of the helmet identification network can be optimized according to each first image sample in the first training samples and the first labeling information.
After the training of the helmet identification network is completed, the area visual features corresponding to the target human body in the target image can be input into the helmet identification network to obtain a helmet identification result corresponding to the human body.
The helmet recognition result may indicate whether the target person wears a hat, and a type of the hat worn. If the helmet recognition result indicates that the target person wears a helmet, the target rider may be considered to be wearing a helmet. On the contrary, if the helmet identification result indicates that the target human body does not wear any hat or does not wear a helmet, it may indicate that the target rider does not wear a helmet.
In some embodiments, the helmet identification result may also indicate an auxiliary attribute of the worn hat; the helmet identification network comprises a hat type identification sub-network and a hat auxiliary attribute identification sub-network; the hat type identification subnetwork and the hat auxiliary attribute identification subnetwork share a feature extraction network.
When the helmet identification network is trained, a joint training mode can be adopted, and the network training efficiency and the helmet identification network prediction effect are improved.
In some embodiments, the hat assist attribute comprises at least one of: the color of the cap; the angle of the cap; a hat shape. It will be appreciated that when the hat assistant attribute is of multiple types, the hat attribute identification sub-network may be constructed separately for each type of assistant attribute.
When training the helmet identification network, S11 may be performed first to construct a first training sample. The first training sample comprises first image samples of different hat types, and the first image samples comprise first marking information of hat sub-types and second marking information of hat auxiliary attributes. It will be appreciated that if there are multiple hat attributes, the information label may be made for each hat attribute. In this example, the first training sample may include hat color labeling information and hat angle labeling information.
Then, S12 may be executed to input the first training sample into the helmet identification network, and obtain a helmet identification result of each sample image, where the helmet identification result includes whether a helmet is worn, a type of the hat, and a hat auxiliary attribute.
Then, S13 and S14 may be executed, respectively determining a first loss and a second loss according to the helmet identification result and the first annotation information and the second annotation information, and optimizing the helmet identification network according to the first loss and the second loss.
Referring to fig. 5, fig. 5 is a schematic view illustrating a helmet identification process according to the present application.
The helmet identification network 50 shown in fig. 5 may include a hat type identification sub-network 53, a hat color identification sub-network 54, a hat angle identification sub-network 55, and a shared feature extraction network 52 and a backbone network 51 shared by the three networks.
The backbone network 51 may perform image processing such as convolution on the visual feature map of the human body object region and input the processing result to the shared feature extraction network. The shared feature extraction network 52 may further perform feature extraction processing such as convolution on the processing result and input the obtained result into the three sub-networks respectively to obtain a hat type recognition result, a color recognition result, and an angle recognition result. Each of the three sub-networks may include a feature extraction layer for extracting relevant features, a fully connected layer, and a confidence computation layer (softmax layer); these layers are not separately illustrated in fig. 5.
When performing S12-S14, a first image sample may be input to the helmet identification network to obtain a helmet identification result for the first image sample. Then, first loss information is determined according to the hat type indicated by the helmet identification result and the first annotation information, and the network parameters of the hat type identification sub-network are updated through back propagation according to the first loss information. Likewise, second loss information may be determined according to the hat auxiliary attribute indicated by the helmet identification result and the second annotation information, and the network parameters of the hat auxiliary attribute identification sub-network may be updated through back propagation according to the second loss information. In this example, the color loss information and the angle loss information may be determined according to the hat color annotation information and the hat angle annotation information respectively, and the network parameters of the hat color identification sub-network and the hat angle identification sub-network may then be updated according to the respective loss information.
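Under the same assumptions, one joint training step of S12-S14 could be sketched as follows; the equal weighting of the losses and the choice of optimizer are illustrative assumptions, not specified by the present application:

```python
import torch.nn.functional as F

def joint_training_step(net, optimizer, images, hat_type_labels,
                        color_labels, angle_labels):
    """One joint update: the first loss supervises the hat type head, the
    second (auxiliary) losses supervise the color and angle heads; gradients
    flow through the shared backbone and feature extractor."""
    type_logits, color_logits, angle_logits = net(images)
    first_loss = F.cross_entropy(type_logits, hat_type_labels)       # S13
    second_loss = (F.cross_entropy(color_logits, color_labels)
                   + F.cross_entropy(angle_logits, angle_labels))    # S13
    loss = first_loss + second_loss                                  # S14
    optimizer.zero_grad()
    loss.backward()   # back propagation updates all sub-networks jointly
    optimizer.step()
    return loss.item()
```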
In this example, because a supervised joint training method is adopted when the helmet identification network is trained, the hat type identification sub-network and the hat auxiliary attribute identification sub-network can be trained simultaneously, so that the models constrain and promote each other during training. On one hand, this improves the convergence efficiency of the sub-networks; on the other hand, it encourages the backbone network and the shared feature extraction network shared by the sub-networks to extract features that are beneficial to helmet attribute identification, thereby improving the helmet attribute identification accuracy and, in turn, the traffic behavior recognition accuracy.
After the training of the helmet identification network is completed, the region visual features corresponding to the target human body in the target image can be input into the helmet identification network to obtain a helmet identification result corresponding to the target human body. Whether the target human body wears a hat and the type of hat worn can then be determined according to the type recognition result indicated by the helmet identification result, so as to determine whether the target rider wears a helmet.
In some examples, when the vehicle identification step is executed, the region of the target vehicle obtained after the object detection step described in S102 is performed on the target image may be acquired, and the vehicle region feature map corresponding to the region of the target vehicle may then be input into the vehicle identification network to obtain a vehicle identification result for the target vehicle.
The vehicle identification network may be a multi-classification network. In training the network, a second training sample may be constructed, where the second training sample comprises second image samples of different vehicle types and third annotation information indicating the vehicle type in each second image sample.
In the present application, the specific vehicle type is not limited. For example, the vehicle may be a motorcycle, an electric bicycle, a scooter, a bicycle, or the like. When the vehicle type is labeled, one-hot encoding may be used. For example, a motorcycle may be represented by 1000, an electric bicycle by 0100, a scooter by 0010, a bicycle by 0001, and an unknown vehicle type by 0000. The vehicle identification network may then be trained according to each second image sample in the second training sample and the third annotation information until the network converges.
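As a non-limiting sketch of the one-hot labeling described above (the type order follows the example in the preceding paragraph):

```python
VEHICLE_TYPES = ["motorcycle", "electric_bicycle", "scooter", "bicycle"]

def one_hot_vehicle_label(vehicle_type):
    """Return e.g. [1, 0, 0, 0] for a motorcycle; all zeros for an unknown
    vehicle type, matching the 0000 code in the example above."""
    code = [0] * len(VEHICLE_TYPES)
    if vehicle_type in VEHICLE_TYPES:
        code[VEHICLE_TYPES.index(vehicle_type)] = 1
    return code
```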
After the network training is completed, the region visual features corresponding to the target vehicle in the target image can be input into the vehicle identification network to obtain the vehicle attribute recognition result of the vehicle. Whether the vehicle type of the vehicle is a preset vehicle type may then be further determined. The preset vehicle type may be a vehicle type set according to service requirements. For example, in traffic behavior recognition, the preset vehicle types may include a motorcycle, an electric bicycle, and the like.
If the vehicle type of the vehicle is a preset vehicle type, the vehicle is considered to be driven by a rider, so behavior recognition needs to be performed on the target rider associated with the vehicle; otherwise, behavior recognition of the target rider is not required.
In some embodiments, the vehicle identification result further includes an auxiliary attribute of the target vehicle; the vehicle identification network comprises a vehicle type identification sub-network and a vehicle auxiliary attribute identification sub-network; the vehicle type identification sub-network and the vehicle auxiliary attribute identification sub-network share a feature extraction network.
When the vehicle identification network is trained, a joint training mode can be adopted, which improves the network training efficiency and the prediction performance of the vehicle identification network.
In some embodiments, the vehicle auxiliary attribute comprises at least one of: a vehicle color; a vehicle shape. It will be appreciated that when there are multiple types of vehicle auxiliary attributes, a vehicle attribute identification sub-network may be constructed separately for each type of auxiliary attribute.
In training the vehicle identification network, S21 may be performed first to construct a second training sample. The second training sample comprises second image samples of different vehicle types, and each second image sample carries third annotation information of the vehicle type and fourth annotation information of the vehicle auxiliary attribute. It will be appreciated that if there are multiple vehicle auxiliary attributes, annotation may be performed for each of them. In this example, the second training sample may include vehicle color annotation information.
Then, S22 may be executed: the second training sample is input into the vehicle identification network to obtain a vehicle identification result for each sample image, where the vehicle identification result includes the vehicle type and the vehicle auxiliary attribute.
Then, S23 and S24 may be executed: a third loss and a fourth loss are respectively determined according to the vehicle identification result together with the third annotation information and the fourth annotation information, and the vehicle identification network is optimized according to the third loss and the fourth loss.
Referring to fig. 6, fig. 6 is a schematic view illustrating a vehicle identification process according to the present application.
The vehicle identification network 60 shown in fig. 6 may comprise a vehicle type identification sub-network 63 and a vehicle color identification sub-network 64, as well as a shared feature extraction network 62 and a backbone network 61 shared by the two sub-networks.
The description of the backbone network and the shared feature extraction network can refer to the foregoing embodiments, and will not be described in detail herein.
When performing S22-S24, a second image sample may be input to the vehicle identification network to obtain a vehicle identification result for the second image sample. Then, third loss information can be determined according to the vehicle type indicated by the vehicle identification result and the third annotation information, and the network parameters of the vehicle type identification sub-network are updated through back propagation according to the third loss information. Likewise, fourth loss information may be determined according to the vehicle auxiliary attribute indicated by the vehicle identification result and the fourth annotation information, and the network parameters of the vehicle auxiliary attribute identification sub-network may be updated through back propagation according to the fourth loss information. In this example, the color loss information may be determined according to the vehicle color annotation information, and the network parameters of the vehicle color identification sub-network may then be updated according to the color loss information.
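The vehicle-side training step mirrors the helmet-side sketch given earlier; a compact illustration under the same assumptions is shown below, where the vehicle network is assumed to return two outputs (type logits and color logits):

```python
import torch.nn.functional as F

def vehicle_training_step(net, optimizer, images, type_labels, color_labels):
    """One joint update for the vehicle identification network of fig. 6
    (assumed two-head variant: vehicle type and vehicle color)."""
    type_logits, color_logits = net(images)
    third_loss = F.cross_entropy(type_logits, type_labels)     # S23
    fourth_loss = F.cross_entropy(color_logits, color_labels)  # S23
    loss = third_loss + fourth_loss                            # S24
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```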
In this example, because a supervised joint training method is adopted when the vehicle identification network is trained, the vehicle type identification sub-network and the vehicle auxiliary attribute identification sub-network can be trained simultaneously, so that the models constrain and promote each other during training. On one hand, this improves the convergence efficiency of the sub-networks; on the other hand, it encourages the shared backbone network and the shared feature extraction network to extract features that are beneficial to vehicle attribute identification, thereby improving the vehicle attribute identification accuracy and, in turn, the traffic behavior recognition accuracy.
After the training of the vehicle identification network is completed, the region visual features corresponding to the target vehicle in the target image may be input into the vehicle identification network to obtain a vehicle identification result corresponding to the target vehicle.
After the helmet recognition result and the vehicle recognition result are obtained, S110 may be performed.
In some embodiments, when S110 is performed, whether the vehicle type of the target vehicle is a preset vehicle type, and hence whether to perform behavior recognition on the target rider, may be determined according to the vehicle type recognition result indicated by the vehicle identification result.
If the vehicle type of the target vehicle is a preset vehicle type and the target human body does not wear a helmet, the traffic behavior of the target rider is considered to be non-compliant and the behavior detection result for the target rider is determined as failed; otherwise the behavior detection result is determined as passed.
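Expressed as a minimal decision sketch (the result labels and the preset vehicle type set are illustrative assumptions):

```python
PRESET_VEHICLE_TYPES = {"motorcycle", "electric_bicycle"}

def behavior_detection_result(vehicle_type, wears_helmet):
    """S110: fail only when the rider is on a vehicle of a preset type
    and the associated target human body is not wearing a helmet."""
    if vehicle_type in PRESET_VEHICLE_TYPES and not wears_helmet:
        return "fail"
    return "pass"
```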
In some examples, if the behavior detection result for the target rider is failed, information related to the target rider and/or the target vehicle may be output so that a supervisor can conduct safety education for the target rider.
The following description takes as an example a scene in which it is detected whether a motorcycle rider wears a helmet.
A number of surveillance cameras are deployed in the scene. Each surveillance camera can send the video stream recorded in its monitored area to the recognition device for traffic behavior detection. It will be appreciated that the recognition device may perform recognition on each frame image in the video stream.
The recognition device can be loaded with a pre-trained rider-human-vehicle detection network (hereinafter referred to as network 1), a helmet recognition network (hereinafter referred to as network 2) and a vehicle recognition network (hereinafter referred to as network 3).
The network 1 is used for detecting riders, human bodies and vehicles appearing in an image, where the rider region corresponding to a rider may contain the human body and the vehicle. The vehicle is, for example, a motorcycle, an electric bicycle or a bicycle. The network 2 may adopt the network structure shown in fig. 5 and is used to obtain the helmet recognition result of a human body; it may be trained by the joint training method described above. The network 3 may adopt the network structure shown in fig. 6 and is used to obtain the vehicle recognition result corresponding to a vehicle; it may also be trained by the joint training method described above. The prediction accuracy of the network 2 and the network 3 is thereby improved, which improves the traffic behavior recognition accuracy.
The recognition device may also perform multi-target tracking on each object appearing in the video stream according to the detection results of the network 1, so as to identify objects newly appearing in the monitored area, objects still moving in the monitored area, and objects leaving the monitored area.
The recognition device also maintains a first associated object set and a second associated object set corresponding to the riders, human bodies and vehicles appearing in the video stream. The first associated object set comprises the first associated objects whose regions have the largest overlap with the rider region, such as the human body and the motorcycle whose regions overlap the rider region the most; in a preferred embodiment, the number of times a first associated object has the largest overlap with the rider region is greater than a preset threshold, for example 20 times. The second associated object set comprises the objects with the largest region overlapping degree; a second associated object is, for example, the human body and the motorcycle whose regions overlap the rider region the most.
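A minimal sketch of this bookkeeping, assuming per-object identity IDs from the multi-target tracking and the example threshold of 20 times, might look as follows (the class and method names are illustrative):

```python
from collections import defaultdict

class AssociationTracker:
    """Count, frame by frame, how often each human body / vehicle has the
    largest region overlap with a given rider region, and expose the objects
    whose count exceeds a preset threshold."""

    def __init__(self, count_threshold=20):
        self.count_threshold = count_threshold
        self.max_overlap_counts = defaultdict(int)  # object id -> times it was the max

    def update(self, best_human_id, best_vehicle_id):
        # called once per frame with the ids of the human body and the vehicle
        # whose regions overlap the rider region the most in that frame
        self.max_overlap_counts[best_human_id] += 1
        self.max_overlap_counts[best_vehicle_id] += 1

    def first_associated_objects(self):
        return {obj_id for obj_id, n in self.max_overlap_counts.items()
                if n > self.count_threshold}
```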
Referring to fig. 7, fig. 7 is a schematic flow chart of a traffic behavior recognition method according to the present application.
As shown in fig. 7, after the recognition device receives the video stream, S701 may be executed to recognize, through the network 1, the human bodies, riders and vehicles appearing in each frame image of the video stream and the region corresponding to each object.
S702 (not shown in the figure) may then be executed to determine the region overlap degree between each human body, rider and vehicle by computing the IoU (intersection over union), and to update the first and second associated object sets maintained by the recognition device for each object.
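A minimal IoU sketch for axis-aligned boxes given as (x1, y1, x2, y2) coordinates, offered only as an illustration of the overlap measure used above:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```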
After the regions corresponding to the objects are detected, S703 may be executed: through a multi-target tracking method, identity IDs are assigned to the human bodies, riders and vehicles, and a rider A who leaves the video stream is determined.
S704 may then be executed to obtain the first associated object set corresponding to the rider A. When any object in the first associated object set leaves the monitored area, the second associated object set corresponding to the rider A is updated, and associated objects that are not included in the second associated object set are deleted from the first associated object set.
When all the objects in the first associated object set have left the monitored area, S705 may be executed: the human body B and the vehicle C in the first associated object set are output and form a triple with the rider A.
S706 may then be performed to acquire a high-quality target image with high recognizability for the rider A. S707 and S708 are then executed: helmet attribute identification is performed, through the network 2, on the human body region feature map corresponding to the human body B in the target image to obtain a helmet recognition result; and vehicle attribute identification is performed, through the network 3, on the vehicle region feature map corresponding to the vehicle C in the target image to obtain a vehicle recognition result.
Finally, S709 may be executed: if the vehicle recognition result indicates that the vehicle C is a motorcycle and the helmet recognition result indicates that the human body B does not wear a helmet, the riding behavior of the rider A is non-compliant, and the relevant information of the rider A and the motorcycle may be sent to a supervisor so that the supervisor can conduct safety education for the rider A.
In this way, traffic behavior recognition is carried out according to the complete motion track of the target rider in the video stream rather than on a single-frame image, so that false recognition caused by the randomness of a single-frame snapshot is avoided and the accuracy of traffic behavior recognition is improved. In addition, because the traffic behavior is recognized through the target human body and the target vehicle that meet the preset region contact ratio requirement with the target rider, the spatial association among the human body, the vehicle and the rider can be utilized, which improves the accuracy of matching the vehicle and the human body to the rider and further improves the accuracy of traffic behavior detection.
Corresponding to any of the foregoing embodiments, the present application also provides a traffic behavior recognition device.
Please refer to fig. 8, fig. 8 is a schematic structural diagram of a traffic behavior recognition device according to the present application.
As shown in fig. 8, the traffic behavior recognition device 80 includes:
an object detection module 81, configured to perform object detection on a received video stream to obtain at least one object and an object region corresponding to the object; the type of the object includes at least one of a human body, a vehicle, and a rider;
a determination module 82 for determining a target rider from the video stream based on results of multi-object tracking of the object;
the contact ratio detection module 83 is configured to perform region contact ratio detection on the target rider based on the obtained object regions corresponding to the objects, so as to obtain a target human body and a target vehicle, which meet a preset region contact ratio requirement with the target rider;
an identification module 84 for determining a behavior recognition result for the target rider based on the helmet recognition result for the target human body and the vehicle recognition result for the target vehicle.
In some illustrated embodiments, the coincidence detection module 83 is specifically configured to:
respectively determining the region overlap ratio between other object regions except the target rider region and the target rider region in each object region based on the obtained object region corresponding to each object;
determining a target human body region and a target vehicle region with the maximum region overlapping degree with the target rider region in the other object regions;
and respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
In some illustrated embodiments, the coincidence detection module 83 is specifically configured to:
and in response to the fact that the area contact ratio between the target human body area and the target vehicle area reaches a preset contact ratio threshold value, respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
In some illustrated embodiments, the coincidence detection module 83 is specifically configured to:
respectively determining a human body region and a vehicle region with the largest region overlapping degree with the target rider region in each target image of the video stream based on the obtained object regions corresponding to the objects;
respectively determining the human body area and the vehicle area which are determined to have the maximum contact ratio times to reach a preset threshold value as a target human body area and a target vehicle area;
and respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
In some illustrated embodiments, the coincidence detection module 83 is specifically configured to:
determining a target human body region and a target vehicle region with the largest region overlapping degree with the target rider region in each target image of the video stream based on the obtained object regions corresponding to the objects, and taking the human body and the vehicle respectively corresponding to the target human body region and the target vehicle region as first associated objects corresponding to the target rider;
determining a second associated object corresponding to the target rider, wherein in an object region corresponding to each object, the region overlapping degree of the target rider region and the second associated object is the largest, and the second associated object comprises a human body and a vehicle;
and determining the target human body and the target vehicle according to the same associated object in the second associated object and the first associated object.
In some illustrated embodiments, the coincidence detection module 83 is specifically configured to:
respectively determining a human body region and a vehicle region with the largest region overlapping degree with the target rider region in each target image based on the obtained object regions corresponding to the objects;
and respectively taking the human body area and the vehicle area which are determined to have the maximum number of times of the contact ratio as a preset threshold value as the target human body area and the target vehicle area.
In some illustrated embodiments, the coincidence detection module 83 is specifically configured to:
in response to the fact that the same associated object comprises a human body and a vehicle, determining the human body and the vehicle as the target human body and the target vehicle respectively; or,
and in response to the fact that the same associated object comprises a plurality of human bodies and a plurality of vehicles, respectively determining the human body and the vehicle which correspond to the human body and the vehicle with the largest times as the target human body and the target vehicle.
In some embodiments shown, the apparatus 80 further comprises:
an acquisition module for acquiring a target image containing the target rider;
the helmet identification module is used for performing helmet identification on the target human body in the target image by utilizing a helmet identification network to obtain a helmet identification result aiming at the target human body; and
the vehicle identification module is used for carrying out vehicle identification on the target vehicle in the target image by utilizing a vehicle identification network to obtain a vehicle identification result aiming at the target vehicle.
In some embodiments shown, the helmet identification result indicates whether the target human body wears a hat, the type of hat worn, and the hat auxiliary attributes; the apparatus 80 further comprises:
the training module of the helmet identification network is used for constructing a first training sample; the first training sample comprises first image samples of different hat types, and the first image samples comprise first marking information of hat sub-types and second marking information of hat auxiliary attributes;
inputting the first training sample into the helmet identification network, and obtaining a helmet identification result of each sample image, wherein the helmet identification result comprises whether a helmet is worn, a type of the helmet and a hat auxiliary attribute;
respectively determining a first loss and a second loss according to the helmet identification result and the first marking information and the second marking information;
optimizing the helmet identification network based on the first loss and the second loss.
In some embodiments shown, the vehicle identification result indicates a vehicle type and a vehicle assistance attribute of the target vehicle;
the apparatus 80 further comprises:
the training module of the helmet identification network is used for constructing a second training sample; the second training sample comprises second image samples of different vehicle types, and the second image samples comprise third labeling information of the vehicle types and fourth labeling information of vehicle auxiliary attributes;
inputting the second training sample into the vehicle recognition network, and obtaining a vehicle recognition result of each sample image, wherein the vehicle recognition result comprises a vehicle type and a vehicle auxiliary attribute;
respectively determining a third loss and a fourth loss according to the vehicle identification result and the third labeling information and the fourth labeling information;
optimizing the vehicle identification network based on the third loss and the fourth loss.
In some embodiments shown, the obtaining module is specifically configured to:
acquiring a first image containing the target rider;
and acquiring a target image meeting the requirement of image identification degree from the first image.
In some embodiments shown, the apparatus 80 further comprises:
and the output module is used for outputting the relevant information of the target rider and/or the target vehicle if the vehicle type of the target vehicle is a preset vehicle type and the target human body is not wearing a helmet.
The embodiment of the traffic behavior recognition device shown in the application can be applied to electronic equipment. Accordingly, the present application discloses an electronic device, which may comprise: a processor.
A memory for storing processor-executable instructions.
Wherein the processor is configured to call the executable instructions stored in the memory to implement the traffic behavior recognition method shown in any of the foregoing embodiments.
Referring to fig. 9, fig. 9 is a schematic diagram of a hardware structure of an electronic device shown in the present application.
As shown in fig. 9, the electronic device may include a processor for executing instructions, a network interface for making network connections, a memory for storing operating data for the processor, and a non-volatile memory for storing instructions corresponding to the behavior recognizing apparatus.
The embodiments of the apparatus may be implemented by software, or by hardware, or by a combination of hardware and software. Taking a software implementation as an example, as a logical device, the device is formed by reading, by a processor of the electronic device where the device is located, a corresponding computer program instruction in the nonvolatile memory into the memory for operation. In terms of hardware, in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 9, the electronic device in which the apparatus is located in the embodiment may also include other hardware according to an actual function of the electronic device, which is not described again.
It is to be understood that, in order to increase the processing speed, the instruction corresponding to the traffic behavior recognition device may also be directly stored in the memory, which is not limited herein.
The present application proposes a computer-readable storage medium, which stores a computer program, which may be used to cause a processor to execute a traffic behavior recognition method as shown in any of the preceding embodiments.
One skilled in the art will recognize that one or more embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (which may include, but are not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
"and/or" in this application means having at least one of the two, for example, "a and/or B" may include three schemes: A. b, and "A and B".
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
Specific embodiments of the present application have been described. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Embodiments of the subject matter and functional operations described in this application may be implemented in the following: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware that may include the structures disclosed in this application and their structural equivalents, or combinations of one or more of them. Embodiments of the subject matter described in this application can be implemented as one or more computer programs, i.e., one or more modules encoded in computer program instructions that are carried by a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded in an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs may include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer may include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data can include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Although this application contains many specific implementation details, these should not be construed as limiting the scope of any disclosure or of what may be claimed, but rather as merely describing features of particular disclosed embodiments. Certain features that are described in this application in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the dispersion of various system modules and components in the described embodiments should not be understood as requiring such dispersion in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
While the invention has been described in terms of what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Claims (14)
1. A traffic behavior recognition method, characterized in that the method comprises:
carrying out object detection on the received video stream to obtain at least one object and an object area corresponding to the object; the type of the object includes at least one of a human body, a vehicle, and a rider;
determining a target rider leaving the video stream based on results of multi-object tracking of the object;
performing region contact degree detection on the target rider based on the obtained object regions corresponding to the objects to obtain a target human body and a target vehicle which meet preset region contact degree requirements with the target rider;
determining a behavior recognition result for the target rider based on the helmet recognition result for the target human body and the vehicle recognition result for the target vehicle.
2. The method according to claim 1, wherein the step of performing region contact degree detection on the target rider based on the obtained object region corresponding to each object to obtain the target human body and the target vehicle meeting a preset region contact degree requirement with the target rider comprises:
respectively determining the region overlap ratio between other object regions except the target rider region and the target rider region in each object region based on the obtained object region corresponding to each object;
determining a target human body region and a target vehicle region with the maximum region overlapping degree with the target rider region in the other object regions;
and respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
3. The method according to claim 2, wherein the determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle, respectively, comprises:
and in response to the fact that the area contact ratio between the target human body area and the target vehicle area reaches a preset contact ratio threshold value, respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
4. The method according to claim 1, wherein the step of performing region contact degree detection on the target rider based on the obtained object region corresponding to each object to obtain the target human body and the target vehicle meeting a preset region contact degree requirement with the target rider comprises:
respectively determining a human body region and a vehicle region with the largest region overlapping degree with the target rider region in each target image of the video stream based on the obtained object regions corresponding to the objects;
respectively determining the human body area and the vehicle area which are determined to have the maximum contact ratio times to reach a preset threshold value as a target human body area and a target vehicle area;
and respectively determining the human body corresponding to the target human body area and the vehicle corresponding to the target vehicle area as the target human body and the target vehicle.
5. The method according to claim 1, wherein the step of performing region contact degree detection on the target rider based on the obtained object region corresponding to each object to obtain the target human body and the target vehicle meeting a preset region contact degree requirement with the target rider comprises:
determining a target human body region and a target vehicle region with the largest region overlapping degree with the target rider region in each target image of the video stream based on the obtained object regions corresponding to the objects, and taking the human body and the vehicle respectively corresponding to the target human body region and the target vehicle region as first associated objects corresponding to the target rider;
determining a second associated object corresponding to the target rider, wherein in an object region corresponding to each object, the region overlapping degree of the target rider region and the second associated object is the largest, and the second associated object comprises a human body and a vehicle;
and determining the target human body and the target vehicle according to the same associated object in the second associated object and the first associated object.
6. The method according to claim 5, wherein the determining a target human body region and a target vehicle region having a largest region overlapping degree with the target rider region in each target image of the video stream based on the obtained object region corresponding to each object comprises:
respectively determining a human body region and a vehicle region with the largest region overlapping degree with the target rider region in each target image based on the obtained object regions corresponding to the objects;
and respectively taking the human body area and the vehicle area which are determined to have the maximum number of times of the contact ratio as a preset threshold value as the target human body area and the target vehicle area.
7. The method of claim 6, wherein determining the target human body and the target vehicle according to a same one of the second associated object and the first associated object comprises:
in response to the fact that the same associated object comprises a human body and a vehicle, determining the human body and the vehicle as the target human body and the target vehicle respectively; or,
and in response to the fact that the same associated object comprises a plurality of human bodies and a plurality of vehicles, respectively determining the human body and the vehicle which correspond to the human body and the vehicle with the largest times as the target human body and the target vehicle.
8. The method according to any one of claims 1 to 7, wherein before obtaining the behavior recognition result for the target rider based on the helmet recognition result for the target human body and the vehicle recognition result for the target vehicle, the method further comprises:
performing helmet identification on the target human body by using a helmet identification network to obtain a helmet identification result aiming at the target human body; and
utilizing a vehicle identification network to identify the target vehicle to obtain a vehicle identification result aiming at the target vehicle.
9. The method according to claim 8, wherein the helmet identification result indicates whether the target person wears a hat, and a type of the hat worn and hat auxiliary attributes;
the training method of the helmet identification network comprises the following steps:
constructing a first training sample; the first training sample comprises first image samples of different hat types, and the first image samples comprise first marking information of hat sub-types and second marking information of hat auxiliary attributes;
inputting the first training sample into the helmet identification network, and obtaining a helmet identification result of each sample image, wherein the helmet identification result comprises whether a helmet is worn, a type of the helmet and a hat auxiliary attribute;
respectively determining a first loss and a second loss according to the helmet identification result and the first marking information and the second marking information;
optimizing the helmet identification network based on the first loss and the second loss.
10. The method according to claim 8 or 9, wherein the vehicle identification result indicates a vehicle type and a vehicle assistance attribute of the target vehicle;
the training method of the vehicle identification network comprises the following steps:
constructing a second training sample; the second training sample comprises second image samples of different vehicle types, and the second image samples comprise third labeling information of the vehicle types and fourth labeling information of vehicle auxiliary attributes;
inputting the second training sample into the vehicle recognition network, and obtaining a vehicle recognition result of each sample image, wherein the vehicle recognition result comprises a vehicle type and a vehicle auxiliary attribute;
respectively determining a third loss and a fourth loss according to the vehicle identification result and the third labeling information and the fourth labeling information;
optimizing the vehicle identification network based on the third loss and the fourth loss.
11. The method according to any one of claims 1-10, further comprising:
and if the vehicle type of the target vehicle is a preset vehicle type and the target human body does not wear a helmet, outputting relevant information of the target rider and/or the target vehicle.
12. A traffic behavior recognition apparatus, characterized in that the apparatus comprises:
the object detection module is used for carrying out object detection on the received video stream to obtain at least one object and an object area corresponding to the object; the type of the object includes at least one of a human body, a vehicle, and a rider;
a determination module to determine a target rider leaving the video stream based on a result of multi-object tracking of the object;
the detection module is used for detecting the region contact ratio of the target rider based on the obtained object regions corresponding to the objects to obtain a target human body and a target vehicle which meet the preset region contact ratio requirement with the target rider;
a recognition module for determining a behavior recognition result for the target rider based on a helmet recognition result for the target human body and a vehicle recognition result for the target vehicle.
13. An electronic device, characterized in that the device comprises:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the traffic behavior recognition method according to any one of claims 1 to 11 by executing the executable instructions.
14. A computer-readable storage medium, characterized in that the storage medium stores a computer program for causing a processor to execute the traffic behavior recognition method according to any one of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110852518.8A CN113486850A (en) | 2021-07-27 | 2021-07-27 | Traffic behavior recognition method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113486850A true CN113486850A (en) | 2021-10-08 |
Family
ID=77942955
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110852518.8A Withdrawn CN113486850A (en) | 2021-07-27 | 2021-07-27 | Traffic behavior recognition method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113486850A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114005103A (en) * | 2021-10-29 | 2022-02-01 | 上海商汤临港智能科技有限公司 | Method and device for associating people and objects in vehicle, electronic equipment and storage medium |
CN114067441A (en) * | 2022-01-14 | 2022-02-18 | 合肥高维数据技术有限公司 | Shooting and recording behavior detection method and system |
CN114067441B (en) * | 2022-01-14 | 2022-04-08 | 合肥高维数据技术有限公司 | Shooting and recording behavior detection method and system |
CN115861907A (en) * | 2023-03-02 | 2023-03-28 | 山东华夏高科信息股份有限公司 | Helmet detection method and system |
CN118942045A (en) * | 2024-10-12 | 2024-11-12 | 成都理工大学 | Sports helmet recognition and tracking method and device in complex traffic scenes |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113486850A (en) | Traffic behavior recognition method and device, electronic equipment and storage medium | |
CN106952303B (en) | Vehicle distance detection method, device and system | |
US20170213081A1 (en) | Methods and systems for automatically and accurately detecting human bodies in videos and/or images | |
CN112800944B (en) | Crowd behavior detection method and device, electronic equipment and storage medium | |
CN110706247B (en) | Target tracking method, device and system | |
CN108009466B (en) | Pedestrian detection method and device | |
CN110428449A (en) | Target detection tracking method, device, equipment and storage medium | |
CN112434566A (en) | Passenger flow statistical method and device, electronic equipment and storage medium | |
US11307668B2 (en) | Gesture recognition method and apparatus, electronic device, and storage medium | |
CN113516099A (en) | Traffic behavior recognition method and device, electronic equipment and storage medium | |
CN112562159B (en) | Access control method and device, computer equipment and storage medium | |
CN111382637A (en) | Pedestrian detection and tracking method, device, terminal device and medium | |
KR20210062256A (en) | Method, program and system to judge abnormal behavior based on behavior sequence | |
WO2023279799A1 (en) | Object identification method and apparatus, and electronic system | |
CN111310728A (en) | Pedestrian re-identification system based on monitoring camera and wireless positioning | |
CN112166435A (en) | Target tracking method and device, electronic equipment and storage medium | |
CN114902299B (en) | Method, device, equipment and storage medium for detecting associated objects in images | |
CN111767839A (en) | Vehicle driving track determining method, device, equipment and medium | |
CN110942668B (en) | Image processing system, image processing method, and image processing apparatus | |
CN112597924A (en) | Electric bicycle track tracking method, camera device and server | |
US20240378887A1 (en) | Vision-based sports timing and identification system | |
CN114170545B (en) | Data processing method, device, storage medium and electronic device | |
US20240127474A1 (en) | Information processing apparatus, position estimation method, and non-transitory computer-readable medium | |
CN114333079A (en) | Method, device, equipment and storage medium for generating alarm event | |
CN113592910A (en) | Cross-camera tracking method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20211008 |