CN112883782B - Method, device, equipment and storage medium for identifying putting behaviors - Google Patents
- Publication number: CN112883782B (Application CN202110035578.0A)
- Authority
- CN
- China
- Prior art keywords
- frame
- action
- video
- determining
- candidate
- Prior art date: 2021-01-12
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of the present application disclose a method, device, equipment, and storage medium for identifying disposal behavior, where the method includes: determining time domain information of a video to be identified; performing frame extraction on the video to be identified to obtain a plurality of candidate video frames; determining target video frames corresponding to action events according to the candidate video frames; and generating a target frame sequence corresponding to each action event according to the target video frames corresponding to each action event and the time domain information, for disposal behavior identification. Through this technical scheme, the efficiency of disposal behavior identification is improved.
Description
Technical Field
The embodiments of the present application relate to the technical field of behavior recognition, and in particular to a method, device, equipment, and storage medium for identifying disposal behavior.
Background
Disposal behavior identification is applied to monitoring uncivil garbage disposal behaviors, including littering, failing to sort garbage, piling garbage indiscriminately, maliciously damaging public facilities, and the like. In intelligent garbage disposal systems, disposal behavior identification has important application value.
Disposal behavior identification mainly uses a computer to automatically analyze video data so as to identify the category of the disposal behavior. However, existing disposal behavior identification methods suffer from low identification efficiency.
Disclosure of Invention
The present application provides a method, device, equipment, and storage medium for identifying disposal behavior, so as to improve the efficiency of disposal behavior identification.
In a first aspect, an embodiment of the present application provides a disposal behavior identification method, where the method includes:
determining time domain information of a video to be identified;
performing frame extraction processing on the video to be identified to obtain a plurality of candidate video frames;
determining target video frames corresponding to action events according to the candidate video frames;
and generating a target frame sequence corresponding to each action event according to the target video frames corresponding to each action event and the time domain information, for disposal behavior identification.
In a second aspect, an embodiment of the present application further provides a disposal behavior identification device, where the device includes:
a time domain information determination module, used for determining time domain information of a video to be identified;
a candidate video frame acquisition module, used for performing frame extraction on the video to be identified to obtain a plurality of candidate video frames;
a target video frame determination module, used for determining target video frames corresponding to action events according to the candidate video frames;
and a disposal behavior identification module, used for generating a target frame sequence corresponding to each action event according to the target video frames corresponding to each action event and the time domain information, for disposal behavior identification.
In a third aspect, an embodiment of the present application further provides an electronic device, where the device includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement any of the disposal behavior identification methods provided in the embodiments of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements any of the disposal behavior identification methods provided in the embodiments of the first aspect.
In the embodiments of the present application, a plurality of candidate video frames are obtained by performing frame extraction on the video to be identified, the target video frames corresponding to the action events are screened out from the candidate video frames, and finally a target frame sequence corresponding to each action event is generated, according to the target video frames corresponding to each action event and the time domain information determined in advance from the video to be identified, for disposal behavior identification. Through this technical scheme, a target frame sequence that contains the key features of the action events and whose data volume is far smaller than the number of frames of the video to be identified is screened out for disposal behavior identification, effectively reducing the data volume processed for the video to be identified and improving the efficiency of disposal behavior identification.
Drawings
Fig. 1 is a flowchart of a disposal behavior identification method provided in Embodiment One of the present application;
Fig. 2 is a flowchart of a disposal behavior identification method provided in Embodiment Two of the present application;
Fig. 3 is a schematic diagram of a disposal behavior identification device provided in Embodiment Three of the present application;
Fig. 4 is a schematic diagram of an electronic device provided in Embodiment Four of the present application.
Detailed Description
The present application will be described in further detail below with reference to the drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to limit it. It should further be noted that, for convenience of description, the drawings show only the structures related to the present application rather than all structures.
Embodiment One
Fig. 1 is a flowchart of a disposal behavior identification method provided in Embodiment One of the present application. This embodiment is applicable to identifying disposal behavior in videos captured in a garbage disposal area. The method may be executed by a disposal behavior identification apparatus, which may be implemented in software and/or hardware and configured in an electronic device; the electronic device may be a mobile terminal or a fixed terminal, and the fixed terminal may be a background server that performs disposal behavior identification.
Referring to Fig. 1, the disposal behavior identification method provided in this embodiment of the present application includes:
S110. Determine time domain information of the video to be identified.
The video to be identified is a video clip acquired from an image acquisition device, or a clip cut from an existing video captured by the image acquisition device. The clip may cover an arbitrary time period or a fixed time period; for example, a clip may be captured or cut every hour.
The time domain information includes the start time, the end time, and the duration of the video to be identified. The start time is the moment at which the video to be identified begins within the surveillance video, and the end time is the moment at which it ends. The time domain information also includes the time information of each video frame in the video to be identified, which indicates the order in which the frames occur.
In this embodiment, surveillance video can be stored as video files by an image acquisition device (such as a surveillance camera) installed near a trash can or waste plant and uploaded to a background server at regular intervals; the background server then performs disposal behavior identification on the surveillance video. For example, an image acquisition device may upload captured surveillance video to the background server every 3 seconds, where each second of video may consist of 25 pictures, each called a frame.
S120. Perform frame extraction on the video to be identified to obtain a plurality of candidate video frames.
Frame extraction means extracting a certain number of video frames from the original video at a certain sampling frequency to obtain a plurality of candidate video frames. The sampling frequency may be set according to actual requirements, which is not specifically limited in the embodiments of the present application.
The candidate video frames obtained after frame extraction are fewer than the frames of the video to be identified and include the key features of the video to be identified.
It can be understood that performing frame extraction on the video to be identified reduces its data volume and thereby reduces the complexity of disposal behavior identification.
Optionally, frame extraction may be performed on the video to be identified at a predetermined sampling frequency to obtain a plurality of candidate video frames.
Alternatively, to perform frame extraction more reasonably, performing frame extraction on the video to be identified to obtain a plurality of candidate video frames includes: determining the temporal complexity of the video to be identified according to the time domain information; determining the frame extraction frequency according to the temporal complexity; and performing frame extraction on the video to be identified at the frame extraction frequency to obtain a plurality of candidate video frames.
Temporal complexity refers to the temporal variation of the video to be identified; a video with more motion generally has higher temporal complexity.
The frame extraction frequency specifies that one video picture is extracted for every fixed number of video pictures; for example, one video picture may be extracted every 5 frames.
Determining the frame extraction frequency according to the temporal complexity means setting a higher frame extraction frequency as the temporal complexity of the video to be identified increases.
It can be understood that performing frame extraction at a frequency determined by the temporal complexity of the video to be identified reduces the data volume, and hence the complexity, of disposal behavior identification, while the candidate video frames obtained still retain, as a whole, the key features needed for disposal behavior identification.
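A minimal sketch of this adaptive frame extraction follows, assuming OpenCV is available; estimating the temporal complexity from the mean inter-frame difference, and the complexity-to-interval mapping, are illustrative assumptions rather than definitions taken from this application:

```python
import cv2
import numpy as np

def extract_candidate_frames(video_path):
    """Decimate a video at a frequency chosen from its temporal complexity."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    ok, frame = cap.read()
    while ok:
        frames.append(frame)
        ok, frame = cap.read()
    cap.release()
    if len(frames) < 2:
        return frames

    # Temporal complexity estimated as the mean absolute inter-frame difference.
    gray = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    complexity = np.mean([cv2.absdiff(a, b).mean() for a, b in zip(gray, gray[1:])])

    # Hypothetical mapping: higher complexity -> denser sampling (smaller interval).
    interval = 2 if complexity > 10 else 5 if complexity > 3 else 10
    return frames[::interval]
```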
S130. Determine target video frames corresponding to action events according to the candidate video frames.
An action event is an action related to disposal behavior. Taking garbage disposal as an example, action events include lifting an arm, opening the trash can, lowering the arm, and so on.
In general, each disposal behavior is composed of several different action events and may also include repetitions of the same action event. Given the complexity of recognizing and analyzing as a whole a disposal behavior composed of multiple action events, and the feasibility of motion analysis, a disposal behavior is usually divided into multiple action events, improving the efficiency of action recognition and reducing its complexity.
A target video frame is a video frame related to an action event, extracted from the plurality of candidate video frames.
Optionally, according to the feature set corresponding to an action event, the candidate video frames lacking the features of that action event are removed, using a convolutional neural network for recognition, leaving target video frames that contain the features of the action event; alternatively, the candidate video frames whose recognition score exceeds a set recognition threshold and that contain the features of the action event are retained, yielding more target video frames. The recognition threshold can be set according to actual requirements.
It can be understood that further screening the candidate video frames to extract the target video frames related to the action events reduces the data volume of the video to be identified and facilitates subsequent efficient disposal behavior identification on the target video frames.
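A minimal sketch of the threshold-based screening described above; `score_frame` is a hypothetical stand-in for the convolutional neural network that scores a frame against the action event's feature set:

```python
def screen_target_frames(candidates, score_frame, recognition_threshold=0.8):
    """Keep candidate frames whose action-event score exceeds the set threshold.

    score_frame: hypothetical callable (e.g. a pretrained CNN) mapping a frame
    to a confidence in [0, 1] that the action event's features are present.
    """
    return [f for f in candidates if score_frame(f) > recognition_threshold]
```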
Optionally, edge detection is performed on the target video frames containing disposal behavior to identify the disposal type of the disposed object in the target video frame; the disposal types include a single-disposal type and a mixed-disposal type; and if the disposal type is the mixed-disposal type, the video clip including the target video frame is stored.
The disposed objects can be various kinds of household garbage, including kitchen waste, recyclable waste, and other garbage.
The single-disposal type corresponds to household garbage that has been sorted before disposal; correspondingly, the mixed-disposal type corresponds to household garbage disposed of without sorting.
Considering the difference in object shapes between the two types (the shapes of single-disposal objects are relatively uniform, while those of mixed-disposal objects are not), the disposal type can be identified with an edge detection algorithm. In general, edge detection may use a brightness-gradient method, detecting object edges by identifying the pixels where the brightness gradient changes most. For example, a disposed object containing both kitchen waste (leftovers, bones, leaves, etc.) and recyclable waste (paper cups, aluminum cans, soda bottles, etc.) has both relatively fine and relatively coarse edges, so it would be considered the mixed-disposal type.
It can be understood that performing edge recognition on the disposed objects involved in the disposal behavior in the target video frame effectively identifies the uncivil behavior of not sorting garbage. When the disposal type is identified as the mixed-disposal type, the garbage has not been sorted; in this case a warning may be issued and the video clip including the target video frame stored.
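A minimal sketch of such a brightness-gradient check, assuming OpenCV; the two-scale Canny comparison and the `mix_cutoff` value are heuristic assumptions standing in for whatever decision rule a deployment would actually calibrate:

```python
import cv2
import numpy as np

def classify_disposal_type(frame_bgr, mix_cutoff=0.5):
    """Heuristically label a disposed object as 'single' or 'mixed' disposal type.

    Edges are detected at a strict and a loose Canny setting; a large share of
    edges found only at the loose setting means the object mixes strong and
    weak (coarse and fine) edges, which we read as unsorted, mixed garbage.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    strict = cv2.Canny(gray, 100, 200) > 0  # strong edges only
    loose = cv2.Canny(gray, 20, 60) > 0     # strong and weak edges
    weak_only = np.logical_and(loose, np.logical_not(strict))
    share = weak_only.sum() / max(int(loose.sum()), 1)
    return "mixed" if share > mix_cutoff else "single"
```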
In some embodiments, an image acquisition device with an image recognition function can be installed above each trash can. When the device captures someone putting garbage into the trash can, it grabs a surveillance picture and performs edge detection on the disposed object in that picture. The edge detection result allows a quick judgment of whether the garbage is dry or wet; when the garbage is detected to be in an unsorted dry-and-wet state, the device stores the video as evidence, recognizes the face associated with the illegal disposal behavior by face recognition, and reports it to the background server.
S140. Generate a target frame sequence corresponding to each action event according to the target video frames corresponding to each action event and the time domain information, for disposal behavior identification.
The target frame sequence is the set of target video frames ordered according to the time domain information. Its length trades off the accuracy of disposal behavior identification against its speed and can be chosen according to actual requirements; for example, in a scene with a particular requirement on recognition speed, the sequence can be kept shorter to speed up identification while ensuring a certain recognition accuracy.
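A minimal sketch of assembling the sequence; the `(timestamp, frame)` tuples and the optional length cap are illustrative assumptions about how the time domain information is carried:

```python
def build_target_frame_sequence(target_frames, max_len=None):
    """Order an action event's target frames by time domain information.

    target_frames: iterable of (timestamp, frame) tuples.
    max_len: optional cap trading identification accuracy for speed; when set,
    the ordered sequence is subsampled evenly rather than truncated.
    """
    seq = [frame for _, frame in sorted(target_frames, key=lambda tf: tf[0])]
    if max_len and len(seq) > max_len:
        step = len(seq) / max_len
        seq = [seq[int(i * step)] for i in range(max_len)]
    return seq
```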
Optionally, before generating the target frame sequence corresponding to each action event according to the target video frames and the time domain information, the method further includes: extracting the human contour by static background subtraction. Specifically, based on a pre-established background model of the garbage disposal area, the image of the target video frame to be detected is subtracted from the background model image to obtain a difference image, which is then binarized and processed with dilation and erosion to detect the human body region, yielding the human contour required for disposal behavior identification.
It can be understood that static background subtraction filters out irrelevant background information, so that subsequent disposal behavior identification can attend only to the human body, avoiding interference from other irrelevant information in the picture.
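A minimal sketch of this preprocessing step, assuming OpenCV and that the pre-established background model is simply a stored image of the empty disposal area; the threshold and kernel sizes are illustrative:

```python
import cv2

def extract_human_mask(frame_bgr, background_bgr, diff_threshold=30):
    """Detect the human body region by static background subtraction."""
    frame = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    background = cv2.cvtColor(background_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(frame, background)                      # difference image
    _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.erode(mask, kernel)                             # suppress noise speckle
    mask = cv2.dilate(mask, kernel, iterations=2)              # fill the body region
    return mask
```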
In the embodiments of the present application, a plurality of candidate video frames are obtained by performing frame extraction on the video to be identified, the target video frames corresponding to the action events are screened out from the candidate video frames, and finally a target frame sequence corresponding to each action event is generated, according to the target video frames corresponding to each action event and the time domain information determined in advance from the video to be identified, for disposal behavior identification. Through this technical scheme, a target frame sequence that contains the key features of the action events and whose data volume is far smaller than the number of frames of the video to be identified is screened out for disposal behavior identification, effectively reducing the data volume processed for the video to be identified and improving the efficiency of disposal behavior identification.
Embodiment Two
Fig. 2 is a flowchart of a disposal behavior identification method provided in Embodiment Two of the present application; this embodiment optimizes the foregoing scheme on the basis of the foregoing embodiment.
Specifically, the operation of determining the target video frames corresponding to an action event according to the candidate video frames is refined into: for any current action event, determining an initial action frame of the current action event; generating at least one reference action frame according to the initial action frame and at least two candidate video frames following it; determining a cutoff action frame according to each reference action frame and the plurality of candidate video frames; and taking the initial action frame, the cutoff action frame, and the candidate video frames between them as the target video frames corresponding to the current action event, thereby refining the determination of the target video frames.
Explanations of terms identical or corresponding to those of the foregoing embodiment are omitted here.
Referring to Fig. 2, the disposal behavior identification method provided in this embodiment includes:
S210. Determine time domain information of the video to be identified.
S220. Perform frame extraction on the video to be identified to obtain a plurality of candidate video frames.
S230. For any current action event, determine an initial action frame of the current action event.
The initial action frame is the first frame at which an action event occurs.
Optionally, before determining the initial action frame of any current action event, the method further includes: based on a predetermined feature set of persons, matching the feature set against each video frame in the video to be identified and retaining the video frames whose match exceeds a set matching threshold, thereby recognizing from the video to be identified the video frames in which a person appears.
It can be understood that feature matching further reduces the number of frames of the video to be identified, which helps improve the efficiency of subsequent disposal behavior identification.
Alternatively, the initial action frame of an action event may be determined by a pre-trained network model, with different action events corresponding to different network models.
Alternatively, given the complexity of training network models, the initial action frame of an action event may be determined by a picture-similarity method, simplifying its determination. Specifically, for any current action event, determining the initial action frame of the current action event includes: sequentially determining the picture similarity of at least two candidate video frames and identifying the initial action frame according to the picture similarity; or taking the ending action frame of the previous action event as the initial action frame of the current action event.
The picture similarity can be calculated by methods such as Euclidean distance, cosine distance, and Hamming distance.
The ending action frame is, relative to the initial action frame, the last frame of an action event.
It can be understood that when no action event is occurring, the picture similarity of adjacent candidate video frames is very high. Based on this, the picture similarity of each pair of adjacent candidate video frames can be determined in sequence and the initial action frame identified from the result. Specifically, when the picture similarity drops sharply from a high value, the pictures of the two candidate video frames differ greatly; that is, an action event has occurred, such as a person in the video suddenly lifting an arm, and the initial action frame of the action event can be determined from the change in picture similarity. Alternatively, the initial action frame of the current action event can be determined more simply: when the ending action frame of the previous action event is known, it can be used directly as the initial action frame of the current action event.
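A minimal sketch of this similarity-drop test, assuming grayscale frames as NumPy arrays; cosine similarity (one of the distances listed above) and the drop margin are illustrative choices:

```python
import numpy as np

def find_initial_action_frame(candidates, drop_margin=0.05):
    """Locate the first frame of an action event from a drop in picture similarity.

    candidates: grayscale frames (2-D NumPy arrays) in time order.
    Returns the index of the frame where adjacent-frame cosine similarity
    falls sharply, or None if no action event is detected.
    """
    def cos_sim(a, b):
        a, b = a.ravel().astype(float), b.ravel().astype(float)
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    sims = [cos_sim(x, y) for x, y in zip(candidates, candidates[1:])]
    for i in range(1, len(sims)):
        if sims[i - 1] - sims[i] > drop_margin:  # similarity suddenly decreases
            return i + 1                         # first frame of the action event
    return None
```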
S240. Generate at least one reference action frame according to the initial action frame and at least two candidate video frames following the initial action frame.
A reference action frame is a frame predicted, from the initial action frame and at least two candidate video frames after it, to be related to the current action event; that is, the reference action frame belongs to part of the current action event.
Optionally, the next action in the video is predicted from the initial action frame and at least two subsequent candidate video frames on the basis of artificial intelligence and machine learning algorithms. Specifically, the reference action frames related to the current action event can be generated by feeding these frames into a pre-trained neural network.
Alternatively, to simplify the prediction of reference action frames, generating at least one reference action frame according to the initial action frame and at least two candidate video frames following it includes: determining the action change information of the current action event according to the initial action frame and at least two candidate video frames after it, where the action change information includes an action change amplitude; and generating at least one reference action frame according to the action change amplitude.
The action change information includes, but is not limited to, the action position and the action change amplitude. The action change amplitude is the magnitude of the action change between adjacent video frames, that is, the range of motion between two adjacent frame pictures; the action position is where each action appears in the video frame picture.
In some embodiments, generating at least one reference action frame according to the action change amplitude may mean computing the action change amplitudes of adjacent video frames from the initial action frame and at least two subsequent candidate video frames, and predicting several reference action frames, on the basis of the candidate video frames, from the average of those amplitudes.
It can be understood that the action change information captures the characteristics of the current action event, for example how large its action change amplitude is; according to the action change amplitude in the action change information, the reference action frames related to the current action event can be predicted more accurately.
Optionally, the action change information may further include an action change rate, that is, the rate at which the action amplitude changes. Like the action change amplitude, the action change rate can serve as action change information providing a basis for predicting the reference action frames. The action change rate relates the action change amplitude to time; since it tells how fast the action proceeds, the reference action frames can be predicted still more accurately.
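A minimal sketch of amplitude-based prediction, assuming grayscale frames; extrapolating by repeatedly adding the mean inter-frame change to the last observed frame is an illustrative stand-in for the prediction described above:

```python
import numpy as np

def generate_reference_frames(initial_frame, next_frames, n_refs=2):
    """Predict reference action frames from the average action change amplitude.

    initial_frame, next_frames: grayscale frames (NumPy arrays); next_frames
    holds at least two candidate frames that follow the initial action frame.
    """
    observed = [np.asarray(initial_frame, dtype=float)]
    observed += [np.asarray(f, dtype=float) for f in next_frames]
    deltas = [b - a for a, b in zip(observed, observed[1:])]
    mean_delta = np.mean(deltas, axis=0)      # average per-pixel action change
    refs, current = [], observed[-1]
    for _ in range(n_refs):
        current = np.clip(current + mean_delta, 0, 255)
        refs.append(current)
    return refs
```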
S250. Determine a cutoff action frame according to each reference action frame and the plurality of candidate video frames.
Among the reference action frames and the plurality of candidate video frames there exists an ending action frame of the current action event, which can be determined as follows. Specifically, each reference action frame is compared with the plurality of candidate video frames in turn, and the rearmost candidate video frame whose matching degree with a reference action frame exceeds a set threshold is taken as the cutoff action frame. The matching degree may be based on the similarity of the frame pictures, and the set threshold may be preset according to the actual precision requirement, for example 95%.
For example, suppose there are two predicted reference action frames B1 and B2, which are compared in turn with candidate video frames A1-A5. If the picture similarity between B1 and A3 is the highest, at 96%, and that between B2 and A5 is the highest, at 95.6%, then A5 is taken as the ending action frame of the current action event. Each reference action frame is matched to only the single candidate video frame with the highest matching degree.
It can be understood that the ending action frame, as the last frame of the current action event, matters for determining the subsequent target video frames: the more accurately its position is determined, the better the resulting target video frames describe the current action event.
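A minimal sketch of this matching pass, again assuming grayscale frames; cosine similarity stands in for the picture-matching degree, and the threshold of 0.95 mirrors the example above:

```python
import numpy as np

def find_cutoff_frame(reference_frames, candidates, match_threshold=0.95):
    """Return the index of the rearmost candidate matching any reference frame."""
    def cos_sim(a, b):
        a, b = a.ravel().astype(float), b.ravel().astype(float)
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    cutoff = None
    for ref in reference_frames:
        scores = [cos_sim(ref, c) for c in candidates]
        best = int(np.argmax(scores))            # one best match per reference frame
        if scores[best] > match_threshold:
            cutoff = best if cutoff is None else max(cutoff, best)
    return cutoff
```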
Optionally, if the action change information further includes the action change rate, then determining a cutoff action frame according to each reference action frame and the plurality of candidate video frames correspondingly includes: determining a candidate interval of candidate video frames corresponding to each reference action frame according to the action change rate; and determining the cutoff action frame according to the candidate video frames within the candidate interval and the reference action frame.
Specifically, according to the action change rate, a corresponding candidate interval can be determined for each reference action frame, and only the candidate video frames within that interval are compared with the corresponding reference action frame, so that each reference action frame need not be compared with all the candidate video frames.
For example, with two predicted reference action frames B1 and B2, the action change rate of the current event may suggest that the next action appears in video frame A3; allowing for tolerance, the candidate interval [A1-A4] is taken as the interval in which reference action frame B1 most likely appears and is used for comparison. Correspondingly, a candidate interval [A3-A5] may be determined for reference action frame B2. Each reference action frame is then compared in turn with the candidate video frames in its interval, and the rearmost candidate video frame whose matching degree exceeds the set threshold is taken as the cutoff action frame.
It can be understood that determining, from the action change rate in the action change information, the candidate interval of candidate video frames corresponding to each reference action frame limits the range of candidate video frames, reducing the computation of comparing each reference action frame with every candidate video frame one by one and making the determination of the cutoff action frame more efficient.
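A sketch of the interval-narrowed variant under the same assumptions; how the action change rate maps to an expected frame index, and the size of the tolerance window, are illustrative:

```python
import numpy as np

def find_cutoff_frame_with_intervals(reference_frames, candidates, change_rate,
                                     tolerance=1, match_threshold=0.95):
    """As find_cutoff_frame, but each reference frame is compared only against
    the candidate interval predicted from the action change rate."""
    def cos_sim(a, b):
        a, b = a.ravel().astype(float), b.ravel().astype(float)
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    cutoff = None
    for k, ref in enumerate(reference_frames, start=1):
        expected = int(round(k * change_rate))    # predicted frame index
        lo = max(0, expected - tolerance)
        hi = min(len(candidates), expected + tolerance + 1)
        for i in range(lo, hi):                   # compare within the interval only
            if cos_sim(ref, candidates[i]) > match_threshold:
                cutoff = i if cutoff is None else max(cutoff, i)
    return cutoff
```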
S260. Take the initial action frame, the cutoff action frame, and the candidate video frames between them as the target video frames corresponding to the current action event.
It can be understood that once the initial action frame and the cutoff action frame of an action event have been accurately identified, the initial action frame, the cutoff action frame, and the candidate video frames between them can serve as the target video frames of the current action event, preventing unnecessary candidate video frames from being carried along as target video frames into disposal behavior identification and reducing the data volume of the identification.
S270. Generate a target frame sequence corresponding to each action event according to the target video frames corresponding to each action event and the time domain information, for disposal behavior identification.
On the basis of the foregoing embodiment, this embodiment refines the determination of the target video frames: for any current action event, several reference action frames are generated from the initial action frame and at least two candidate video frames following it; a cutoff action frame is determined from the reference action frames and the plurality of candidate video frames; and finally the initial action frame, the cutoff action frame, and the candidate video frames between them are combined, as the target video frames of the current action event, with the time domain information to generate the target frame sequence of each action event for disposal behavior identification. In this way, target video frames that accurately describe the action events are determined within the video to be identified, reducing the number of video frames, improving the efficiency of action recognition, and improving the accuracy of disposal behavior identification.
Embodiment Three
Fig. 3 is a schematic structural diagram of a disposal behavior identification device provided in Embodiment Three of the present application. Referring to Fig. 3, the disposal behavior identification device provided in this embodiment of the present application includes: a time domain information determination module 310, a candidate video frame acquisition module 320, a target video frame determination module 330, and a disposal behavior identification module 340.
The time domain information determination module 310 is configured to determine time domain information of a video to be identified;
the candidate video frame acquisition module 320 is configured to perform frame extraction on the video to be identified to obtain a plurality of candidate video frames;
the target video frame determination module 330 is configured to determine, according to the candidate video frames, target video frames corresponding to action events;
and the disposal behavior identification module 340 is configured to generate a target frame sequence corresponding to each action event according to the target video frames corresponding to each action event and the time domain information, for disposal behavior identification.
In the embodiments of the present application, a plurality of candidate video frames are obtained by performing frame extraction on the video to be identified, the target video frames corresponding to the action events are screened out from the candidate video frames, and finally a target frame sequence corresponding to each action event is generated, according to the target video frames corresponding to each action event and the time domain information determined in advance from the video to be identified, for disposal behavior identification. Through this technical scheme, a target frame sequence that contains the key features of the action events and whose data volume is far smaller than the number of frames of the video to be identified is screened out for disposal behavior identification, effectively reducing the data volume processed for the video to be identified and improving the efficiency of disposal behavior identification.
Further, the candidate video frame acquisition module 320 includes:
a temporal complexity determining unit, used for determining the temporal complexity of the video to be identified according to the time domain information;
a frame extraction frequency determining unit, used for determining the frame extraction frequency according to the temporal complexity;
and a candidate video frame acquisition unit, used for performing frame extraction on the video to be identified at the frame extraction frequency to obtain a plurality of candidate video frames.
Further, the apparatus further comprises:
a detection module, used for performing edge detection on a target video frame containing disposal behavior to identify the disposal type of the disposed object in the target video frame, the disposal types including a single-disposal type and a mixed-disposal type;
and a storage module, used for storing the video clip including the target video frame if the disposal type of the disposed object is the mixed-disposal type.
Further, the target video frame determination module 330 includes:
an initial action frame determining unit, used for determining, for any current action event, an initial action frame of the current action event;
a reference action frame determining unit, used for generating at least one reference action frame according to the initial action frame and at least two candidate video frames after the initial action frame;
a cutoff action frame determining unit, used for determining a cutoff action frame according to each reference action frame and the plurality of candidate video frames;
and a target video frame determining unit, used for taking the initial action frame, the cutoff action frame, and the candidate video frames located between them as the target video frames corresponding to the current action event.
Further, the initial action frame determining unit includes:
an initial action frame determining subunit, used for sequentially determining the picture similarity of at least two candidate video frames and identifying the initial action frame according to the picture similarity; or taking the ending action frame of the previous action event as the initial action frame of the current action event.
Further, the reference action frame determining unit includes:
an action change information determining subunit, used for determining the action change information of the current action event according to the initial action frame and at least two candidate video frames after it, the action change information including an action change amplitude;
and a reference action frame generating subunit, used for generating at least one reference action frame according to the action change amplitude.
Further, the action change information further includes an action change rate; accordingly, the cutoff action frame determining unit includes:
a candidate interval determining subunit, used for determining a candidate interval of candidate video frames corresponding to the reference action frame according to the action change rate;
and a cutoff action frame determining subunit, used for determining the cutoff action frame according to the candidate video frames in the candidate interval and the reference action frame.
The disposal behavior identification device provided in this embodiment of the present application can execute the disposal behavior identification method provided in any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the executed method.
Embodiment Four
Fig. 4 is a schematic structural diagram of an electronic device provided in Embodiment Four of the present application. As shown in Fig. 4, the electronic device includes a processor 410, a memory 420, an input device 430, and an output device 440.
The number of processors 410 in the device may be one or more; one processor 410 is taken as an example in Fig. 4. The processor 410, memory 420, input device 430, and output device 440 in the device may be connected by a bus or by other means; connection by a bus is taken as an example in Fig. 4.
The input device 430 is used for receiving a video to be identified.
The output device 440 is used for outputting the target frame sequence.
The processor 410 may determine time domain information of the video to be identified according to the video to be identified input by the input device 430; may perform frame extraction on the video to be identified to obtain a plurality of candidate video frames; determine target video frames corresponding to action events according to the candidate video frames; generate a target frame sequence corresponding to each action event according to the target video frames corresponding to each action event and the time domain information; and may also transmit the target frame sequence to the output device 440 for disposal behavior identification.
The memory 420, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the disposal behavior identification method in the embodiments of the present application (for example, the time domain information determination module 310, the candidate video frame acquisition module 320, the target video frame determination module 330, and the disposal behavior identification module 340 in the disposal behavior identification device). By running the software programs, instructions, and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the device, thereby implementing the above disposal behavior identification method.
The memory 420 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the terminal (such as the video to be identified, the candidate video frames, the initial action frame, the reference action frames, the cutoff action frame, the target video frames, and the target frame sequence in the above embodiments). Further, the memory 420 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input means 430 may be used to receive input numeric or character information and to generate key signal inputs relating to user settings and function controls of the device. The output device 440 may include a display device such as a display screen.
Embodiment Five
A storage medium containing computer-executable instructions which, when executed by a computer processor, perform a disposal behavior identification method, the method including:
determining time domain information of a video to be identified;
performing frame extraction processing on the video to be identified to obtain a plurality of candidate video frames;
determining target video frames corresponding to action events according to the candidate video frames;
and generating a target frame sequence corresponding to each action event according to the target video frames corresponding to each action event and the time domain information, for disposal behavior identification.
From the above description of the embodiments, it will be clear to those skilled in the art that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly also by hardware, although the former is the better embodiment in many cases. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk of a computer, and which includes several instructions enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
It should be noted that, in the embodiment of the disposal behavior identification device, the included units and modules are divided only according to functional logic; the division is not limited to the above, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units are only for distinguishing them from one another and do not limit the protection scope of the application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. Those skilled in the art will understand that the present application is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of the application. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its spirit; the scope of the present application is determined by the scope of the appended claims.
Claims (9)
1. A disposal behavior identification method, characterized by comprising the following steps:
determining time domain information of a video to be identified;
performing frame extraction processing on the video to be identified to obtain a plurality of candidate video frames;
determining a target video frame corresponding to an action event according to the candidate video frames;
generating a target frame sequence corresponding to each action event according to the target video frame corresponding to each action event and the time domain information, and performing disposal behavior identification;
wherein, the determining a target video frame corresponding to an action event according to the candidate video frames comprises:
for any current action event, determining an initial action frame of the current action event;
generating at least one reference action frame according to the initial action frame and at least two candidate video frames after the initial action frame;
determining a cutoff action frame according to each of the reference action frames and the plurality of candidate video frames;
and taking the initial action frame, the cutoff action frame, and a candidate video frame between the initial action frame and the cutoff action frame as a target video frame corresponding to the current action event.
2. The method of claim 1, wherein the determining an initial action frame for the current action event comprises:
sequentially determining the picture similarity of at least two candidate video frames, and identifying an initial action frame according to the picture similarity; or,
and taking the ending action frame of the previous action event as the initial action frame of the current action event.
3. The method of claim 1, wherein the generating at least one reference action frame according to the initial action frame and at least two candidate video frames after the initial action frame comprises:
determining action change information of the current action event according to the initial action frame and at least two candidate video frames after the initial action frame, the action change information comprising an action change amplitude;
and generating at least one reference action frame according to the action change amplitude.
4. The method of claim 3, wherein the action change information further comprises an action change rate;
correspondingly, the determining a cutoff action frame according to each of the reference action frames and the candidate video frames comprises:
determining a candidate interval of candidate video frames corresponding to the reference action frame according to the action change rate;
and determining the cutoff action frame according to the candidate video frames in the candidate interval and the reference action frame.
5. The method according to claim 1, wherein the performing frame extraction processing on the video to be identified to obtain a plurality of candidate video frames comprises:
determining the temporal complexity of the video to be identified according to the time domain information;
determining the frame extraction frequency according to the temporal complexity;
and performing frame extraction processing on the video to be identified according to the frame extraction frequency to obtain a plurality of candidate video frames.
6. The method according to any one of claims 1-5, further comprising:
performing edge detection on a target video frame containing disposal behavior to identify the disposal type of a disposed object in the target video frame, the disposal types comprising a single-disposal type and a mixed-disposal type;
and if the disposal type of the disposed object is the mixed-disposal type, storing the video clip comprising the target video frame.
7. A disposal behavior identification device, characterized by comprising:
the time domain information determining module is used for determining the time domain information of the video to be identified;
the candidate video frame acquisition module is used for performing frame extraction processing on the video to be identified to obtain a plurality of candidate video frames;
the target video frame determining module is used for determining a target video frame corresponding to the action event according to the candidate video frames;
the disposal behavior identification module is used for generating a target frame sequence corresponding to each action event according to the target video frame corresponding to each action event and the time domain information, and performing disposal behavior identification;
the target video frame determination module comprises:
the initial action frame determining unit is used for determining an initial action frame of any current action event;
a reference action frame determining unit, configured to generate at least one reference action frame according to the initial action frame and at least two candidate video frames after the initial action frame;
a cutoff action frame determination unit, for determining a cutoff action frame according to each of the reference action frames and the plurality of candidate video frames;
and the target video frame determining unit is used for taking the initial action frame, the cutoff action frame, and a candidate video frame positioned between the initial action frame and the cutoff action frame as a target video frame corresponding to the current action event.
8. An electronic device, comprising:
one or more processors;
a storage device to store one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the disposal behavior identification method according to any one of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the disposal behavior identification method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110035578.0A CN112883782B (en) | 2021-01-12 | 2021-01-12 | Method, device, equipment and storage medium for identifying putting behaviors |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110035578.0A CN112883782B (en) | 2021-01-12 | 2021-01-12 | Method, device, equipment and storage medium for identifying putting behaviors |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112883782A CN112883782A (en) | 2021-06-01 |
CN112883782B true CN112883782B (en) | 2023-03-24 |
Family ID: 76044161
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110035578.0A Active CN112883782B (en) | 2021-01-12 | 2021-01-12 | Method, device, equipment and storage medium for identifying putting behaviors |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112883782B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113378002B (en) * | 2021-08-11 | 2022-01-21 | 北京达佳互联信息技术有限公司 | Information delivery method and device, electronic equipment and storage medium |
CN114550300A (en) * | 2022-02-25 | 2022-05-27 | 北京百度网讯科技有限公司 | Video data analysis method and device, electronic equipment and computer storage medium |
CN114679607B (en) * | 2022-03-22 | 2024-03-05 | 深圳云天励飞技术股份有限公司 | Video frame rate control method and device, electronic equipment and storage medium |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013107833A1 (en) * | 2012-01-19 | 2013-07-25 | Thomson Licensing | Method and device for generating a motion field for a video sequence |
CN107223344A (en) * | 2017-01-24 | 2017-09-29 | 深圳大学 | The generation method and device of a kind of static video frequency abstract |
CN111226288A (en) * | 2017-10-17 | 2020-06-02 | 威里利生命科学有限责任公司 | System and method for segmenting surgical video |
CN108366294A (en) * | 2018-03-06 | 2018-08-03 | 广州市千钧网络科技有限公司 | A kind of video method of cutting out and device |
WO2019205007A1 (en) * | 2018-04-25 | 2019-10-31 | Beijing Didi Infinity Technology And Development Co., Ltd. | Systems and methods for blink action recognition based on facial feature points |
CN109325435A (en) * | 2018-09-15 | 2019-02-12 | 天津大学 | Video Action Recognition and Localization Algorithm Based on Cascaded Neural Network |
CN109344755A (en) * | 2018-09-21 | 2019-02-15 | 广州市百果园信息技术有限公司 | Recognition methods, device, equipment and the storage medium of video actions |
WO2020118220A1 (en) * | 2018-12-07 | 2020-06-11 | Qualcomm Incorporated | Shared candidate list and parallel candidate list derivation for video coding |
CN109740499A (en) * | 2018-12-28 | 2019-05-10 | 北京旷视科技有限公司 | Video segmentation method, video action recognition method, device, equipment and medium |
CN109977262A (en) * | 2019-03-25 | 2019-07-05 | 北京旷视科技有限公司 | The method, apparatus and processing equipment of candidate segment are obtained from video |
CN110194107A (en) * | 2019-05-06 | 2019-09-03 | 中国第一汽车股份有限公司 | A kind of vehicle intelligent back-sight visual system of integrative display and warning function |
CN110197135A (en) * | 2019-05-13 | 2019-09-03 | 北京邮电大学 | A kind of video structural method based on multidimensional segmentation |
CN111539339A (en) * | 2020-04-26 | 2020-08-14 | 北京市商汤科技开发有限公司 | Data processing method and device, electronic equipment and storage medium |
CN112068771A (en) * | 2020-08-17 | 2020-12-11 | Oppo广东移动通信有限公司 | Video processing method, video processing device, terminal device and storage medium |
CN112069939A (en) * | 2020-08-21 | 2020-12-11 | 深圳市商汤科技有限公司 | Event detection method and device, electronic equipment and storage medium |
CN112019768A (en) * | 2020-09-04 | 2020-12-01 | 北京奇艺世纪科技有限公司 | Video generation method and device and electronic equipment |
Non-Patent Citations (2)
Title |
---|
Research on Moving Target Detection and Recognition Technology Based on MATLAB; Xi Hua et al.; Value Engineering; 2013-06-28 (No. 18); full text *
Research on Video Key Frame Extraction and Video Retrieval from the Perspective of Deep Learning; Su Xiaohan; Network Security Technology & Application; 2020-05-15 (No. 05); full text *
Also Published As
Publication number | Publication date |
---|---|
CN112883782A (en) | 2021-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112883782B (en) | Method, device, equipment and storage medium for identifying putting behaviors | |
CN109684979B (en) | Image recognition technology-based garbage classification method and device and electronic equipment | |
CN112052815B (en) | Behavior detection method and device and electronic equipment | |
CN111723773B (en) | Method and device for detecting carryover, electronic equipment and readable storage medium | |
CN111783744A (en) | Operation site safety protection detection method and device | |
CN109492577B (en) | Gesture recognition method and device and electronic equipment | |
CN107679504A (en) | Face identification method, device, equipment and storage medium based on camera scene | |
CN109410278B (en) | Target positioning method, device and system | |
CN112298844A (en) | Garbage classification monitoring method and device | |
WO2022041484A1 (en) | Human body fall detection method, apparatus and device, and storage medium | |
CN111161206A (en) | Image capturing method, monitoring camera and monitoring system | |
CN109195011B (en) | Video processing method, device, equipment and storage medium | |
CN112001230A (en) | Sleeping behavior monitoring method and device, computer equipment and readable storage medium | |
CN111881946B (en) | Security monitoring method and device, storage medium, electronic equipment and air conditioner | |
WO2014193220A2 (en) | System and method for multiple license plates identification | |
CN115909215B (en) | An edge intrusion early warning method and system based on target detection | |
CN111832749B (en) | Garbage bag identification method and related device | |
CN110188717A (en) | Image acquiring method and device | |
CN106781167B (en) | Method and device for monitoring motion state of object | |
CN110276300B (en) | Method and device for identifying quality of garbage | |
CN113470013B (en) | Method and device for detecting moving object | |
CN112241651A (en) | Data display method and system, data processing method, storage medium and system | |
CN114139016A (en) | Data processing method and system for intelligent cell | |
CN113612645A (en) | Internet of things data processing method and system | |
CN110956093A (en) | Big data-based model identification method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |