CN106599789A - Video category identification method and device, data processing device, and electronic device
- Publication number
- CN106599789A (Application No. CN201611030170.XA)
- Authority
- CN
- China
- Prior art keywords
- video
- segmenting
- time domain
- classification
- spatial domain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications

- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F18/00—Pattern recognition
        - G06F18/20—Analysing
          - G06F18/22—Matching criteria, e.g. proximity measures
          - G06F18/24—Classification techniques
    - G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
      - G06V20/00—Scenes; Scene-specific elements
        - G06V20/40—Scenes; Scene-specific elements in video content
          - G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
            - G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
Abstract
An embodiment of the invention discloses a video category identification method and device, a data processing device, and an electronic device. The method comprises the following steps: segmenting a video to obtain a plurality of segmented videos; sampling each of the segmented videos to obtain an original image and an optical flow image of each segmented video; processing the original image of each segmented video with a spatial-domain convolutional neural network to obtain a spatial-domain classification result of the video; processing the optical flow image of each segmented video with a time-domain convolutional neural network to obtain a time-domain classification result of the video; and fusing the spatial-domain classification result and the time-domain classification result to obtain the classification result of the video. The method and device can improve the accuracy of video category identification.
Description
Technical field
The present invention relates to the technical field of computer vision, and more particularly to a video category identification method and device, a data processing device, and an electronic device.
Background art

Action recognition is a popular research direction in computer vision. Action recognition technology mainly processes videos composed of sequences of color pictures in order to identify the actions in them. Its difficulty lies in how to process dynamically changing video content so as to overcome changes in distance, viewing angle, camera movement, and scene, and still correctly identify the action in the video.

At present, conventional action recognition techniques mainly use hand-designed feature descriptors together with classifiers such as support vector machines. The most representative method uses improved dense trajectory descriptors as features together with a support vector machine classifier. Because hand-designed feature descriptors cannot automatically improve the feature representation during training, such methods usually cannot achieve good recognition accuracy.

In recent years, with the rapid development of deep learning, and especially its application in the field of computer vision, action recognition based on deep learning has increasingly become the mainstream. These deep-learning-based methods mainly use convolutional neural networks to process the video and thereby identify the actions in it.
Summary of the invention
Embodiments of the present invention provide a video category identification scheme.
According to one aspect of the embodiments of the present invention, a video category identification method is provided, comprising:

segmenting a video to obtain a plurality of segmented videos;

sampling each of the plurality of segmented videos to obtain an original image and an optical flow image of each segmented video;

processing the original image of each segmented video with a spatial-domain convolutional neural network to obtain a spatial-domain classification result of the video, and processing the optical flow image of each segmented video with a time-domain convolutional neural network to obtain a time-domain classification result of the video; and

fusing the spatial-domain classification result and the time-domain classification result to obtain the classification result of the video.
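At a glance, the claimed pipeline reduces to a few lines. The sketch below is a simplified illustration under assumed interfaces (NumPy arrays for scores, each network given as a callable returning a per-class score vector); none of the names come from the patent, and the average consensus function and the 1:1.5 fusion ratio are merely the examples given later in the description.

```python
import numpy as np

def classify_video(segment_rgb, segment_flow, spatial_cnn, temporal_cnn,
                   fusion_weights=(1.0, 1.5)):
    """Sketch only. segment_rgb / segment_flow hold one sampled input
    per segmented video; the two networks are assumed callables."""
    # Per-segment preliminary results, combined by the (average) consensus function.
    spatial_result = np.mean([spatial_cnn(x) for x in segment_rgb], axis=0)
    temporal_result = np.mean([temporal_cnn(x) for x in segment_flow], axis=0)
    # Weighted fusion of the two streams; the highest-scoring class wins.
    w_s, w_t = fusion_weights
    return w_s * spatial_result + w_t * temporal_result
```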
In another embodiment based on the above method, segmenting the video comprises: segmenting the video evenly to obtain a plurality of segmented videos of equal length.
In another embodiment based on the above method, obtaining the original image of each segmented video comprises: randomly selecting one frame from each segmented video as the original image of that segmented video.

In another embodiment based on the above method, obtaining the optical flow image of each segmented video comprises: randomly selecting consecutive frames from each segmented video and obtaining the optical flow images of that segmented video from them.
In another embodiment based on the above method, each optical flow image is a grayscale image based on an 8-bit bitmap with 256 discrete gray levels, the middle value of the grayscale image being 128.
In another embodiment based on the above method, randomly selecting consecutive frames from each segmented video and obtaining its optical flow images comprises, for each segmented video: randomly selecting N consecutive frames from the segmented video, where N is an integer greater than 1; and computing optical flow from each pair of adjacent frames among the N frames to obtain N-1 groups of optical flow images, each group in the N-1 groups comprising one horizontal optical flow image and one vertical optical flow image.
In another embodiment based on the above method, processing the original image of each segmented video with the spatial-domain convolutional neural network to obtain the spatial-domain classification result of the video comprises: processing the original image of each segmented video with the spatial-domain convolutional neural network to obtain a preliminary spatial-domain classification result of each segmented video; and combining the preliminary spatial-domain classification results of the plurality of segmented videos with a spatial-domain consensus function to obtain the spatial-domain classification result of the video;

and/or, processing the optical flow image of each segmented video with the time-domain convolutional neural network to obtain the time-domain classification result of the video comprises: processing the optical flow images of each segmented video with the time-domain convolutional neural network to obtain a preliminary time-domain classification result of each segmented video; and combining the preliminary time-domain classification results of the plurality of segmented videos with a time-domain consensus function to obtain the time-domain classification result of the video.
In another embodiment based on the above method, the spatial-domain consensus function and/or the time-domain consensus function comprise: an average function, a maximum function, or a weighted average function.
In another embodiment based on the above method, the method further comprises: selecting, as the spatial-domain consensus function, whichever of the average function, maximum function, and weighted average function achieves the highest classification accuracy on a validation data set; and/or selecting, as the time-domain consensus function, whichever of the average function, maximum function, and weighted average function achieves the highest classification accuracy on a validation data set.
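Such a selection step might look like the following sketch, where the candidate functions and the validation interface are assumptions introduced for illustration rather than definitions from the patent:

```python
import numpy as np

CANDIDATES = {
    "average": lambda s: s.mean(axis=0),
    "max": lambda s: s.max(axis=0),
    # A weighted average with per-segment weights learned in training
    # would be the third candidate; omitted here for brevity.
}

def pick_consensus(val_segment_scores, val_labels):
    """val_segment_scores: list of (num_segments, num_classes) arrays,
    one per validation video; val_labels: true class indices."""
    def accuracy(fn):
        preds = [int(np.argmax(fn(np.asarray(s)))) for s in val_segment_scores]
        return float(np.mean(np.array(preds) == np.array(val_labels)))
    return max(CANDIDATES, key=lambda name: accuracy(CANDIDATES[name]))
```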
In another embodiment based on the above method, the preliminary spatial-domain classification result and the preliminary time-domain classification result are each a classification result vector whose dimension equals the number of classification categories; the spatial-domain classification result and the time-domain classification result of the video are each a classification result vector whose dimension equals the number of classification categories; and the classification result of the video is a classification result vector whose dimension equals the number of classification categories.
In another embodiment based on the above method, fusing the spatial-domain classification result and the time-domain classification result comprises: multiplying the spatial-domain classification result and the time-domain classification result by preset weight coefficients, respectively, and summing them to obtain the classification result of the video.

In another embodiment based on the above method, the ratio of the weight coefficients of the spatial-domain classification result and the time-domain classification result is 1:1.5.
In another embodiment based on the above method, the optical flow image is specifically an original optical flow image, and the time-domain convolutional neural network is specifically a first time-domain convolutional neural network; the original optical flow images of each segmented video are processed with the first time-domain convolutional neural network to obtain a first preliminary time-domain classification result of each segmented video; and the first preliminary time-domain classification results of the plurality of segmented videos are combined with a first time-domain consensus function to obtain a first time-domain classification result of the video.
In another embodiment based on the above method, the method further comprises: obtaining warped optical flow images by warping the original optical flow images; processing the warped optical flow images of each segmented video with a second time-domain convolutional neural network to obtain a second preliminary time-domain classification result of each segmented video; and combining the second preliminary time-domain classification results of the plurality of segmented videos with a second time-domain consensus function to obtain a second time-domain classification result of the video. Fusing the spatial-domain classification result and the time-domain classification result then comprises: fusing the spatial-domain classification result, the first time-domain classification result, and the second time-domain classification result to obtain the classification result of the video.
In another embodiment based on the above method, obtaining the warped optical flow images comprises: computing, for each pair of adjacent frames, the homography transformation matrix between the two frames; applying, according to the homography transformation matrix between each pair of adjacent frames, an affine transformation to the latter frame of the corresponding pair; and computing optical flow between the former frame of each pair and the transformed latter frame to obtain the warped optical flow images.

In another embodiment based on the above method, the computation over each pair of adjacent frames comprises: matching feature points between frames according to Speeded-Up Robust Features (SURF) feature point descriptors.
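A possible OpenCV realization of this warped-flow step is sketched below. The SURF detector requires the opencv-contrib package (and a non-free build), and the Farnebäck flow algorithm is an assumption, since the patent does not name a specific optical flow method; the patent also speaks of an affine transformation derived from the homography, which the sketch approximates with a full perspective warp.

```python
import cv2
import numpy as np

def warped_flow(prev_gray, next_gray):
    """Sketch: estimate inter-frame camera motion and compute flow on the
    motion-compensated (warped) latter frame."""
    surf = cv2.xfeatures2d.SURF_create()                 # opencv-contrib, non-free
    kp1, des1 = surf.detectAndCompute(prev_gray, None)
    kp2, des2 = surf.detectAndCompute(next_gray, None)

    matches = cv2.BFMatcher(cv2.NORM_L2).match(des1, des2)
    pts_prev = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    pts_next = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Homography mapping the latter frame onto the former (RANSAC for robustness).
    H, _ = cv2.findHomography(pts_next, pts_prev, cv2.RANSAC, 5.0)
    h, w = next_gray.shape
    warped = cv2.warpPerspective(next_gray, H, (w, h))   # cancel camera motion

    # Flow between the former frame and the warped latter frame.
    return cv2.calcOpticalFlowFarneback(prev_gray, warped, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```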
In another embodiment based on the above method, fusing the spatial-domain classification result, the first time-domain classification result, and the second time-domain classification result comprises: multiplying the spatial-domain classification result, the first time-domain classification result, and the second time-domain classification result by preset weight coefficients, respectively, and summing them to obtain the classification result of the video.

In another embodiment based on the above method, the ratio of the weight coefficients of the spatial-domain classification result, the first time-domain classification result, and the second time-domain classification result is 1:1:0.5.
In another embodiment based on the above method, the classification result of the video is a classification result vector whose dimension equals the number of classification categories; and the method further comprises: normalizing the classification result vector of the video with a softmax function to obtain a class probability vector giving the probability that the video belongs to each category.
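The softmax normalization is standard; a minimal, numerically stable version (a textbook formulation, not code from the patent) is:

```python
import numpy as np

def softmax(scores):
    """Normalize a class-score vector into a class-probability vector."""
    z = scores - np.max(scores)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```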
In another embodiment based on the above method, the method further comprises: presetting an initial spatial-domain convolutional neural network and an initial time-domain convolutional neural network; and, based on each video serving as a sample, training the initial spatial-domain convolutional neural network by stochastic gradient descent to obtain the spatial-domain convolutional neural network, and training the initial time-domain convolutional neural network by stochastic gradient descent to obtain the time-domain convolutional neural network.
In another embodiment based on the above method, training the initial spatial-domain convolutional neural network by stochastic gradient descent to obtain the spatial-domain convolutional neural network comprises: for a video serving as a sample, performing the operations starting from segmenting the video until the spatial-domain classification result of the video is obtained; comparing whether the deviation of the spatial-domain classification result of the video from a preset standard spatial-domain classification result of the video is less than a preset range; if it is not less than the preset range, adjusting the network parameters of the initial spatial-domain convolutional neural network, taking the network with the adjusted parameters as the initial spatial-domain convolutional neural network, and performing the operations starting from segmenting the video for the next video serving as a sample; and if it is less than the preset range, taking the current initial spatial-domain convolutional neural network as the spatial-domain convolutional neural network.
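This per-sample procedure (which applies equally to either stream) is essentially a stochastic-gradient-descent loop with an early-stopping test. The PyTorch-style sketch below is one possible reading, not the patent's code; the model interface (mapping one segment input to a per-class score vector), the learning rate, and the cross-entropy loss standing in for the "deviation" are all assumptions.

```python
import torch
import torch.nn.functional as F

def train_stream(model, samples, lr=1e-3, tolerance=0.05):
    """samples yields (segment_inputs, label) pairs: segment_inputs is a
    list of per-segment tensors, label a 0-dim long tensor."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for segment_inputs, label in samples:
        # Run the per-video pipeline: per-segment scores + (average) consensus.
        scores = torch.stack([model(x) for x in segment_inputs]).mean(dim=0)
        loss = F.cross_entropy(scores.unsqueeze(0), label.unsqueeze(0))
        if loss.item() < tolerance:   # deviation within the preset range
            break                     # current network becomes the trained one
        opt.zero_grad()
        loss.backward()
        opt.step()                    # adjust the network parameters
    return model
```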
In another embodiment based on the above method, training the initial time-domain convolutional neural network by stochastic gradient descent to obtain the time-domain convolutional neural network comprises: for a video serving as a sample, performing the operations starting from segmenting the video until the time-domain classification result of the video is obtained; comparing whether the deviation of the time-domain classification result of the video from a preset standard time-domain classification result of the video is less than a preset range; if it is not less than the preset range, adjusting the network parameters of the initial time-domain convolutional neural network, taking the network with the adjusted parameters as the initial time-domain convolutional neural network, and performing the operations starting from segmenting the video for the next video serving as a sample; and if it is less than the preset range, taking the current initial time-domain convolutional neural network as the time-domain convolutional neural network. The initial time-domain convolutional neural network comprises the first initial time-domain convolutional neural network or the second initial time-domain convolutional neural network, the time-domain classification result correspondingly comprises the first time-domain classification result or the second time-domain classification result, and the time-domain convolutional neural network correspondingly comprises the first time-domain convolutional neural network or the second time-domain convolutional neural network.
In another embodiment based on the above method, the method further comprises: normalizing the spatial-domain classification result of the video with a softmax function to obtain a spatial-domain class probability vector giving the probability that the video belongs to each category; and normalizing the time-domain classification result of the video with a softmax function to obtain a time-domain class probability vector of the video.
According to another aspect of the embodiments of the present invention, a video category identification device is provided, comprising: a segmenting unit, configured to segment a video to obtain a plurality of segmented videos; a sampling unit, configured to sample each of the plurality of segmented videos to obtain an original image and an optical flow image of each segmented video; a spatial-domain classification processing unit, configured to process the original image of each segmented video with a spatial-domain convolutional neural network to obtain a spatial-domain classification result of the video; a time-domain classification processing unit, configured to process the optical flow image of each segmented video with a time-domain convolutional neural network to obtain a time-domain classification result of the video; and a fusion unit, configured to fuse the spatial-domain classification result and the time-domain classification result to obtain the classification result of the video.
In another embodiment based on the above device, the segmenting unit is specifically configured to segment the video evenly to obtain a plurality of segmented videos of equal length.

In another embodiment based on the above device, the sampling unit comprises: an image sampling module, configured to randomly select one frame from each segmented video as the original image of that segmented video; and an optical flow sampling module, configured to randomly select consecutive frames from each segmented video and obtain the optical flow images of that segmented video.

In another embodiment based on the above device, each optical flow image is a grayscale image based on an 8-bit bitmap with 256 discrete gray levels, the middle value of the grayscale image being 128.

In another embodiment based on the above device, the optical flow sampling module is specifically configured, for each segmented video, to: randomly select N consecutive frames from the segmented video, where N is an integer greater than 1; and compute optical flow from each pair of adjacent frames among the N frames to obtain N-1 groups of optical flow images, each group comprising one horizontal optical flow image and one vertical optical flow image.
In another embodiment based on the above device, the spatial-domain classification processing unit comprises: a spatial-domain classification processing module, configured to process the original image of each segmented video with the spatial-domain convolutional neural network to obtain a preliminary spatial-domain classification result of each segmented video; and a first combining module, configured to combine the preliminary spatial-domain classification results of the plurality of segmented videos with a spatial-domain consensus function to obtain the spatial-domain classification result of the video. The time-domain classification processing unit comprises: a first time-domain classification processing module, configured to process the optical flow images of each segmented video with the time-domain convolutional neural network to obtain a preliminary time-domain classification result of each segmented video; and a second combining module, configured to combine the preliminary time-domain classification results of the plurality of segmented videos with a time-domain consensus function to obtain the time-domain classification result of the video.

In another embodiment based on the above device, the spatial-domain consensus function and/or the time-domain consensus function comprise: an average function, a maximum function, or a weighted average function.

In another embodiment based on the above device, the spatial-domain consensus function is specifically whichever of the average function, maximum function, and weighted average function achieves the highest classification accuracy on a validation data set; and the time-domain consensus function is specifically whichever of the average function, maximum function, and weighted average function achieves the highest classification accuracy on a validation data set.
In another embodiment based on the above device, the preliminary spatial-domain classification result and the preliminary time-domain classification result are each a classification result vector whose dimension equals the number of classification categories; the spatial-domain classification result and the time-domain classification result of the video are each a classification result vector whose dimension equals the number of classification categories; and the classification result of the video is a classification result vector whose dimension equals the number of classification categories.

In another embodiment based on the above device, the fusion unit is specifically configured to multiply the spatial-domain classification result and the time-domain classification result by preset weight coefficients, respectively, and sum them to obtain the classification result of the video.

In another embodiment based on the above device, the ratio of the weight coefficients of the spatial-domain classification result and the time-domain classification result is 1:1.5.
In another embodiment based on the above device, the optical flow image is specifically an original optical flow image, and the time-domain convolutional neural network is specifically a first time-domain convolutional neural network; the first time-domain classification processing module is specifically configured to process the original optical flow images of each segmented video with the first time-domain convolutional neural network to obtain a first preliminary time-domain classification result of each segmented video; and the second combining module is specifically configured to combine the first preliminary time-domain classification results of the plurality of segmented videos with a first time-domain consensus function to obtain a first time-domain classification result of the video.
In another embodiment based on the above device, the device further comprises: an optical flow processing unit, configured to obtain warped optical flow images by warping the original optical flow images. The time-domain classification processing unit further comprises: a second time-domain classification processing module, configured to process the warped optical flow images of each segmented video with a second time-domain convolutional neural network to obtain a second preliminary time-domain classification result of each segmented video; and a third combining module, configured to combine the second preliminary time-domain classification results of the plurality of segmented videos to obtain a second time-domain classification result of the video. The fusion unit is specifically configured to fuse the spatial-domain classification result, the first time-domain classification result, and the second time-domain classification result to obtain the classification result of the video.
In another embodiment based on the above device, the optical flow processing unit is specifically configured to: compute, for each pair of adjacent frames, the homography transformation matrix between the two frames; apply, according to the homography transformation matrix between each pair of adjacent frames, an affine transformation to the latter frame of the corresponding pair; and compute optical flow between the former frame of each pair and the transformed latter frame to obtain the warped optical flow images.

In another embodiment based on the above device, when computing over each pair of adjacent frames, the optical flow processing unit is specifically configured to match feature points between frames according to Speeded-Up Robust Features (SURF) feature point descriptors.
In another embodiment based on the above device, the fusion unit is specifically configured to multiply the spatial-domain classification result, the first time-domain classification result, and the second time-domain classification result by preset weight coefficients, respectively, and sum them to obtain the classification result of the video.

In another embodiment based on the above device, the ratio of the weight coefficients of the spatial-domain classification result, the first time-domain classification result, and the second time-domain classification result is 1:1:0.5.
In another embodiment based on the above device, the device further comprises: a first normalization unit, configured to normalize the classification result vector of the video with a softmax function to obtain a class probability vector giving the probability that the video belongs to each category.

In another embodiment based on the above device, the device further comprises: a network training unit, configured to store a preset initial spatial-domain convolutional neural network and a preset initial time-domain convolutional neural network; and, based on each video serving as a sample, to train the initial spatial-domain convolutional neural network by stochastic gradient descent to obtain the spatial-domain convolutional neural network, and to train the initial time-domain convolutional neural network by stochastic gradient descent to obtain the time-domain convolutional neural network.
In another embodiment based on the above device, when training the initial spatial-domain convolutional neural network by stochastic gradient descent, the network training unit is specifically configured to: for a video serving as a sample, compare whether the spatial-domain classification result of the video obtained by the spatial-domain classification processing unit is identical to a preset standard spatial-domain classification result of the video; if they differ, adjust the network parameters of the initial spatial-domain convolutional neural network, take the network with the adjusted parameters as the initial spatial-domain convolutional neural network, and perform the comparison operation for the next video serving as a sample; and if they are identical, take the current initial spatial-domain convolutional neural network as the spatial-domain convolutional neural network.
In another embodiment based on the above device, when training the initial time-domain convolutional neural network by stochastic gradient descent, the network training unit is specifically configured to: for a video serving as a sample, compare whether the time-domain classification result of the video obtained by the time-domain classification processing unit is identical to a preset standard time-domain classification result of the video; if they differ, adjust the network parameters of the initial time-domain convolutional neural network, take the network with the adjusted parameters as the initial time-domain convolutional neural network, and perform the comparison operation for the next video serving as a sample; and if they are identical, take the current initial time-domain convolutional neural network as the time-domain convolutional neural network. The initial time-domain convolutional neural network comprises the first initial time-domain convolutional neural network or the second initial time-domain convolutional neural network, the time-domain classification result correspondingly comprises the first time-domain classification result or the second time-domain classification result, and the time-domain convolutional neural network correspondingly comprises the first time-domain convolutional neural network or the second time-domain convolutional neural network.
In another embodiment based on the above device, the device further comprises: a second normalization unit, configured to normalize the spatial-domain classification result of the video with a softmax function to obtain a spatial-domain class probability vector giving the probability that the video belongs to each category, and to normalize the time-domain classification result of the video with a softmax function to obtain a time-domain class probability vector of the video.
According to yet another aspect of the embodiments of the present invention, a data processing device is provided, comprising the video category identification device described in any of the above embodiments.

In another embodiment based on the above data processing device, the data processing device comprises an Advanced RISC Machine (ARM) processor, a central processing unit (CPU), or a graphics processing unit (GPU).

According to yet another aspect of the embodiments of the present invention, an electronic device is provided, which is provided with the data processing device described in any of the above embodiments.
According to yet another aspect of the embodiments of the present invention, a computer storage medium is provided for storing computer-readable instructions, the instructions comprising: an instruction to segment a video to obtain a plurality of segmented videos; an instruction to sample each of the plurality of segmented videos to obtain an original image and an optical flow image of each segmented video; an instruction to process the original image of each segmented video with a spatial-domain convolutional neural network to obtain a spatial-domain classification result of the video, and to process the optical flow image of each segmented video with a time-domain convolutional neural network to obtain a time-domain classification result of the video; and an instruction to fuse the spatial-domain classification result and the time-domain classification result to obtain the classification result of the video.
According to yet another aspect of the embodiments of the present invention, a computer device is provided, comprising: a memory storing executable instructions; and one or more processors that communicate with the memory to execute the executable instructions so as to perform the operations corresponding to the video category identification method of any of the above embodiments of the present invention.
Based on the video category identification method and device, data processing device, and electronic device provided by the above embodiments of the present invention, a video is segmented to obtain a plurality of segmented videos; each of the segmented videos is sampled to obtain an original image and an optical flow image of each segmented video; the original image of each segmented video is processed with a spatial-domain convolutional neural network to obtain a spatial-domain classification result of the video, and the optical flow image of each segmented video is processed with a time-domain convolutional neural network to obtain a time-domain classification result of the video; and finally the spatial-domain classification result and the time-domain classification result are fused to obtain the classification result of the video. By dividing the video into a plurality of segmented videos and sampling a frame picture and inter-frame optical flow from each segmented video, the embodiments of the present invention can model long-duration actions when training the convolutional neural networks, so that when the trained network models are later used for video category identification, the accuracy of video category identification is improved relative to the prior art, the identification effect is improved, and the computational cost is low.
Brief description of the drawings

The accompanying drawings, which constitute a part of the description, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.

The present invention can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:

Fig. 1 is a flowchart of one embodiment of the video category identification method of the present invention.

Fig. 2 is a flowchart of another embodiment of the video category identification method of the present invention.

Fig. 3 is a flowchart of yet another embodiment of the video category identification method of the present invention.

Fig. 4 is a flowchart of one embodiment of training the initial spatial-domain convolutional neural network in an embodiment of the present invention.

Fig. 5 is a flowchart of one embodiment of training the initial time-domain convolutional neural network in an embodiment of the present invention.

Fig. 6 is a schematic structural diagram of one embodiment of the video category identification device of the present invention.

Fig. 7 is a schematic structural diagram of another embodiment of the video category identification device of the present invention.

Fig. 8 is a schematic structural diagram of yet another embodiment of the video category identification device of the present invention.

Fig. 9 is a schematic structural diagram of still another embodiment of the video category identification device of the present invention.

Fig. 10 is a schematic structural diagram of a further embodiment of the video category identification device of the present invention.

Fig. 11 is a schematic diagram of an application example of the video category identification device of the present invention.

Fig. 12 is a schematic structural diagram of one embodiment of the electronic device of the present invention.
Detailed description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of the components, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.

It should also be understood that, for ease of description, the sizes of the various parts shown in the drawings are not drawn according to actual proportional relationships.

The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present invention or its application or uses.

Techniques, methods, and apparatus known to a person of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and apparatus should be considered part of the description.

It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.

The embodiments of the present invention can be applied to computer systems/servers, which can operate together with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments including any of the above systems, and the like.

The computer system/server can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and so on, which perform specific tasks or implement specific abstract data types. The computer system/server can also be implemented in distributed cloud computing environments, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.
Among action recognition techniques based on deep learning, the two-stream convolutional neural network (Two-Stream Convolutional Neural Network) is a representative network model. A two-stream convolutional neural network uses two convolutional neural networks, namely a spatial-domain convolutional neural network and a time-domain convolutional neural network, to model frame pictures and inter-frame optical flow respectively, and identifies the action in a video by fusing the classification results of the two convolutional neural networks.

However, in the course of implementation, the inventors found that although two-stream convolutional neural networks can model frame pictures and inter-frame optical flow, i.e., short-term motion information, they lack the ability to model long-duration actions, and so the accuracy of action recognition cannot be guaranteed.
Fig. 1 is a flowchart of one embodiment of the video category identification method of the present invention. As shown in Fig. 1, the video category identification method of this embodiment includes:

102: Segment a video to obtain a plurality of segmented videos.

As a specific example, when segmenting the video, the video can be segmented evenly to obtain a plurality of segmented videos of equal length. For example, the video is divided into 3 or 5 segmented videos of equal length, with the specific number of segments determined according to the actual effect. Alternatively, the video can be segmented randomly, or several sections can be extracted from the video as the plurality of segmented videos.

In a specific implementation, after a video is received, the length of the video can be obtained, the length of each segment can be determined according to the length of the video and a preset number of segments, and the received video can be divided into a plurality of segmented videos of equal length accordingly.

When the video is segmented evenly, every segmented video has the same length. This can simplify the training process when a convolutional neural network model is trained on long videos; and when the trained convolutional neural networks are used for video category identification, the time needed to process each segmented video is similar, which can improve the overall efficiency of video category identification.
104: Sample each of the plurality of segmented videos to obtain an original image and optical flow images of each segmented video.

Illustratively, when obtaining the original image of each segmented video, one frame can be randomly selected from each segmented video as the original image of that segmented video.

Illustratively, when obtaining the optical flow images of each segmented video, consecutive frames can be randomly selected from each segmented video and its optical flow images computed from them.

In a specific example of the embodiments of the present invention, each optical flow image can be, for example, a grayscale image based on an 8-bit bitmap with 256 discrete gray levels, the middle value of the grayscale image being 128.

Since an optical flow field is a vector field, when optical flow is represented with grayscale images, two scalar-field pictures are needed per flow field, corresponding respectively to the amplitudes along the X-axis and Y-axis of the optical flow image coordinate system.

Specifically, randomly selecting consecutive frames from each segmented video and obtaining its optical flow images can be implemented as follows, for each segmented video: randomly select N consecutive frames from the segmented video, where N is an integer greater than 1; and compute optical flow from each pair of adjacent frames among the N frames to obtain N-1 groups of optical flow images, each group comprising one horizontal optical flow image and one vertical optical flow image.

For example, for each segmented video: 6 consecutive frames are randomly selected from the segmented video; optical flow is computed from each pair of adjacent frames among the 6 frames, yielding 5 groups of optical flow grayscale images, each group comprising one horizontal and one vertical optical flow grayscale image. This gives 10 optical flow grayscale images, which can be used as a 10-channel image.
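One way to produce such a 10-channel stack is sketched below; the Farnebäck flow algorithm and the clipping bound used for quantization are assumptions, since the patent fixes neither, only the 8-bit encoding with middle value 128.

```python
import cv2
import numpy as np

def flow_stack(frames, bound=20.0):
    """frames: 6 consecutive 8-bit grayscale frames -> 10 flow channels
    (5 adjacent pairs x {horizontal, vertical})."""
    channels = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        for comp in (flow[..., 0], flow[..., 1]):   # x (horizontal), y (vertical)
            # Map [-bound, bound] onto the 256 discrete levels, middle value 128.
            g = np.clip(comp, -bound, bound) * (127.0 / bound) + 128.0
            channels.append(g.astype(np.uint8))
    return np.stack(channels)                       # shape: (10, H, W)
```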
106: Process the original image of each segmented video with a spatial-domain convolutional neural network to obtain a spatial-domain classification result of the video; and process the optical flow images of each segmented video with a time-domain convolutional neural network to obtain a time-domain classification result of the video.

Here, the spatial-domain classification result and the time-domain classification result of the video are each a classification result vector whose dimension equals the number of classification categories. For example, if the classification categories are running, high jump, footrace, pole vault, long jump, and triple jump, 6 categories in total, then the spatial-domain classification result and the time-domain classification result are each classification result vectors of dimension 6.

108: Fuse the spatial-domain classification result and the time-domain classification result to obtain the classification result of the video.

Here, the classification result of the video is a classification result vector whose dimension equals the number of classification categories. For example, with the 6 categories above, the classification result of the video is a classification result vector of dimension 6.

As a specific example, fusing the spatial-domain classification result and the time-domain classification result can be done by multiplying the spatial-domain classification result and the time-domain classification result by preset weight coefficients, respectively, and summing them to obtain the classification result of the video. The weight coefficients are determined according to the classification accuracy of the corresponding convolutional neural network model on a validation data set: a model with higher classification accuracy receives a higher weight. The validation data set consists of videos that are labeled with their true categories and that did not participate in network training; it can be obtained in any feasible way, for example by searching a search engine for videos of the corresponding categories.

For example, in a specific application, the ratio of the weight coefficients of the spatial-domain classification result and the time-domain classification result can be 1:1.5.
Based on the video category identification method provided by the above embodiment of the present invention, a video is segmented into a plurality of segmented videos; each segmented video is sampled to obtain its original image and optical flow images; the original images are processed with a spatial-domain convolutional neural network to obtain a spatial-domain classification result of the video, and the optical flow images are processed with a time-domain convolutional neural network to obtain a time-domain classification result of the video; and finally the two results are fused to obtain the classification result of the video. By dividing the video into a plurality of segmented videos and sampling a frame picture and inter-frame optical flow from each segmented video, this embodiment can model long-duration actions when training the convolutional neural networks, so that when the trained network models are later used for video category identification, the accuracy of identification is improved relative to the prior art, the identification effect is improved, and the computational cost is low.
Fig. 2 is a flowchart of another embodiment of the video category identification method of the present invention. As shown in Fig. 2, the video category identification method of this embodiment includes:

202: Segment a video to obtain a plurality of segmented videos.

As a specific example, when segmenting the video, the video can be segmented evenly to obtain a plurality of segmented videos of equal length, which simplifies the training of the convolutional neural network models and improves the overall efficiency of video category identification. For example, the video is divided into 3 or 5 segmented videos of equal length, with the specific number of segments determined according to the actual effect. Alternatively, the video can be segmented randomly, or several sections can be extracted from the video as the plurality of segmented videos. As shown in Fig. 11, in an application embodiment of the video category identification method of the present invention, the video is divided into 3 segmented videos.

204: Sample each of the plurality of segmented videos to obtain an original image and optical flow images of each segmented video.

For example, one frame can be randomly selected from each segmented video as its original image, and consecutive frames can be randomly selected from each segmented video to obtain its optical flow images.

As shown in Fig. 11, in an application embodiment of the video category identification method of the present invention, each of the 3 segmented videos is sampled to obtain one original frame image and inter-frame optical flow images per segmented video. The original images are RGB color images, and the optical flow images are grayscale images.
206: Process the original image of each segmented video with the spatial-domain convolutional neural network to obtain a preliminary spatial-domain classification result of each segmented video; and process the optical flow images of each segmented video with the time-domain convolutional neural network to obtain a preliminary time-domain classification result of each segmented video.

Here, the preliminary spatial-domain classification result and the preliminary time-domain classification result are each a classification result vector whose dimension equals the number of classification categories. For example, with the 6 categories running, high jump, footrace, pole vault, long jump, and triple jump, the preliminary spatial-domain and time-domain classification results are each classification result vectors of dimension 6.

As shown in Fig. 11, in an application embodiment of the video category identification method of the present invention, the original images of the 3 segmented videos are processed with the spatial-domain convolutional neural network to obtain 3 preliminary spatial-domain classification results, and the optical flow images of the 3 segmented videos are processed with the time-domain convolutional neural network to obtain 3 preliminary time-domain classification results. In a specific implementation, the spatial-domain convolutional neural network and/or the time-domain convolutional neural network can first pass the input through a combination of convolutional layers, nonlinear layers, pooling layers, and the like to obtain a feature representation of the image, and then pass it through a linear classification layer to obtain a score for each category, i.e., the preliminary classification result of each segmented video. For example, with the 6 categories above, the preliminary spatial-domain and time-domain classification results of each segmented video are each 6-dimensional vectors containing the video's classification scores for those 6 categories.
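The patent does not fix the architecture beyond this description (convolution, nonlinearity, and pooling followed by a linear classification layer). A deliberately tiny PyTorch stand-in, purely for illustration and not the actual network used, could look like:

```python
import torch.nn as nn

class TinyStreamCNN(nn.Module):
    """Minimal stand-in: convolution + nonlinearity + pooling to get a
    feature vector, then a linear layer scoring each category."""
    def __init__(self, in_channels, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),      # global pooling -> feature vector
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.classifier(f)         # per-class scores
```

The spatial stream would take in_channels=3 (an RGB frame) and the temporal stream in_channels=10 (the stacked optical flow images).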
208, integrated treatment is carried out using the spatial domain preliminary classification result of the multiple segmenting videos of spatial domain common recognition function pair, obtain
The spatial domain classification results of video;And using time domain know together the multiple segmenting videos of function pair time domain preliminary classification result carry out it is comprehensive
Conjunction is processed, and obtains the time domain result of video.
Wherein, the time domain result of the spatial domain classification results and video of video is respectively dimension and is equal to class categories quantity
Classification results vector.
In a specific implementation, the spatial domain consensus function and/or the time domain consensus function may be an average function, a maximum function, or a weighted average function. Specifically, whichever of the average function, maximum function, and weighted average function achieves the highest classification accuracy on a validation data set is chosen as the spatial domain consensus function; or the one with the highest classification accuracy on the validation data set is chosen as the time domain consensus function.
Specifically, the average function takes the mean of the scores that the different segments assign to the same category and outputs it as that category's score; the maximum function selects the maximum of the scores that the different segments assign to the same category as that category's output score; the weighted average function outputs a weighted mean of the scores that the different segments assign to the same category, where every category uses the same set of weights, and the weights are obtained as network model parameters optimized during training.
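These three consensus functions can be sketched as follows; the array shapes (one row of category scores per segmented video) are an assumption for illustration:

```python
import numpy as np

def consensus(segment_scores: np.ndarray, kind: str = "avg",
              weights: np.ndarray | None = None) -> np.ndarray:
    """Fuse per-segment category scores, shape (num_segments, num_classes),
    into one category-score vector for the whole video."""
    if kind == "avg":       # mean of each category's scores across segments
        return segment_scores.mean(axis=0)
    if kind == "max":       # maximum of each category's scores across segments
        return segment_scores.max(axis=0)
    if kind == "weighted":  # weighted mean; one weight per segment, shared by all categories
        w = weights / weights.sum()
        return w @ segment_scores
    raise ValueError(kind)

scores = np.array([[0.2, 1.5], [0.4, 1.1], [0.3, 1.3]])  # 3 segments, 2 categories
print(consensus(scores, "avg"))                           # [0.3, 1.3]
```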
For example, in the application embodiment shown in Fig. 11, the average function may be chosen as both the spatial domain consensus function and the time domain consensus function. The spatial domain consensus function then computes, for each category, the mean of the 3 scores that the 3 spatial domain preliminary classification results of the 3 segmented videos assign to that category and takes it as that category's score, thereby obtaining one group of category scores over all categories as the spatial domain classification result of the video; the time domain consensus function likewise computes, for each category, the mean of the 3 scores in the 3 time domain preliminary classification results of the 3 segmented videos, thereby obtaining one group of category scores over all categories as the time domain classification result of the video. For example, if the classification categories are running, high jump, footrace, pole vault, long jump, and triple jump, 6 categories in total, then the spatial domain classification result and the time domain classification result of the video are each a 6-dimensional vector containing the video's score for each of these 6 categories.
210: Fuse the spatial domain classification result and the time domain classification result to obtain the classification result of the video.
The classification result of the video is a classification result vector whose dimension equals the number of classification categories.
As shown in Fig. 11, in an application embodiment of the video category recognition method of the present invention, the spatial domain classification result and the time domain classification result of the video are multiplied by weight coefficients in the ratio 1:1.5 and then summed, yielding the classification result of the video. For example, if the classification categories are running, high jump, footrace, pole vault, long jump, and triple jump, 6 categories in total, then the classification result of the video is a 6-dimensional vector containing the video's score for each of these 6 categories. The category with the highest score is the category of the video; in this embodiment the highest-scoring category is high jump, so the video is recognized as belonging to the high jump category.
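Continuing the running example, a minimal sketch of this weighted fusion and final decision (the 1:1.5 ratio follows this embodiment; the score values are made up):

```python
import numpy as np

CLASSES = ["running", "high jump", "footrace", "pole vault", "long jump", "triple jump"]

spatial = np.array([0.1, 2.3, 0.4, 0.8, 0.5, 0.2])   # spatial domain classification result
temporal = np.array([0.2, 1.9, 0.3, 0.6, 0.7, 0.1])  # time domain classification result

fused = 1.0 * spatial + 1.5 * temporal  # 1:1.5 weight ratio from the embodiment
print(CLASSES[int(fused.argmax())])     # highest-scoring category -> "high jump"
```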
According to the video category recognition method provided by the above embodiments of the present invention, a consensus function is applied across the segmented videos to synthesize their preliminary classification results into the classification result of the video. Because the consensus function places no restriction on the convolutional neural network model applied to each segmented video, the multiple segmented videos can share the parameters of one network model, keeping the number of model parameters small, so that a network model with fewer parameters can recognize the category of a video of arbitrary length. During training, a video of arbitrary length is segmented and passed through the segment networks, and supervised learning is performed by comparing the classification result with the true label of the whole video, which realizes video-level training supervision without being limited by the video length.
Fig. 3 is a flowchart of another embodiment of the video category recognition method of the present invention. As shown in Fig. 3, the method of this embodiment includes:
302: Segment the video to obtain multiple segmented videos.
304: Sample each of the multiple segmented videos to obtain the original image and the original optical flow image of each segmented video.
306: Obtain the warped optical flow image derived from the original optical flow image.
In a specific implementation, obtaining the warped optical flow image includes: computing, for each pair of adjacent frames, the homography transformation matrix between the two frames; applying, according to each such homography transformation matrix, an affine transformation to the latter frame of the corresponding pair of adjacent frames; and computing the optical flow between the former frame and the transformed latter frame of each pair, thereby obtaining the warped optical flow image.
Because no homography transformation remains between the feature points on the latter frame after the above affine transformation and the corresponding feature points on the former frame serving as the reference, the warped optical flow computed from the former frame and the transformed latter frame, when used as the input to video category recognition, can reduce the impact of camera motion on recognition performance.
Specifically, the computation over each pair of adjacent frames includes inter-frame feature point matching based on Speeded-Up Robust Features (SURF) feature descriptors.
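A rough sketch of this warped-flow computation with OpenCV follows, under stated assumptions: SURF is available only in opencv-contrib builds (cv2.xfeatures2d), and Farneback dense flow stands in here for whichever optical flow estimator is actually used:

```python
import cv2
import numpy as np

def warped_flow(prev_gray: np.ndarray, next_gray: np.ndarray) -> np.ndarray:
    """Estimate a homography between adjacent frames from SURF matches,
    warp the latter frame onto the former, then compute dense flow;
    the warp removes most camera-induced motion."""
    surf = cv2.xfeatures2d.SURF_create()          # requires opencv-contrib
    kp1, des1 = surf.detectAndCompute(prev_gray, None)
    kp2, des2 = surf.detectAndCompute(next_gray, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    h, w = prev_gray.shape
    aligned = cv2.warpPerspective(next_gray, H, (w, h))   # transform latter frame
    return cv2.calcOpticalFlowFarneback(prev_gray, aligned, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```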
308: Using the spatial domain convolutional neural network, process the original image of each segmented video to obtain its spatial domain preliminary classification result; using the first time domain convolutional neural network, process the original optical flow image of each segmented video to obtain its first time domain preliminary classification result; and using the second time domain convolutional neural network, process the warped optical flow image of each segmented video to obtain its second time domain preliminary classification result.
310: Using the spatial domain consensus function, synthesize the spatial domain preliminary classification results of the multiple segmented videos to obtain the spatial domain classification result of the video; using the first time domain consensus function, synthesize the first time domain preliminary classification results of the multiple segmented videos to obtain the first time domain classification result of the video; and using the second time domain consensus function, synthesize the second time domain preliminary classification results of the multiple segmented videos to obtain the second time domain classification result of the video.
312: Fuse the spatial domain classification result, the first time domain classification result, and the second time domain classification result to obtain the classification result of the video.
As a specific example, the fusion multiplies the spatial domain classification result, the first time domain classification result, and the second time domain classification result by preset weight coefficients and sums them to obtain the classification result of the video. Each weight coefficient is determined by the classification accuracy of the corresponding network model on a validation data set: a network model with higher classification accuracy receives a higher weight. For example, in a particular application, the weight coefficient ratio between the spatial domain classification result, the first time domain classification result, and the second time domain classification result may be 1:1:0.5.
The two-stream convolutional neural networks now in wide use represent short-term motion information with optical flow images, but camera motion is not accounted for when the optical flow images are extracted; when the camera moves significantly, this can make the action in the video unidentifiable and degrade recognition performance.
According to the video category recognition method provided by the above embodiments of the present invention, in addition to frame pictures and inter-frame optical flow, warped optical flow is used as an additional short-term motion representation, expanding the input of video category recognition to three kinds of information: frame pictures, inter-frame optical flow, and warped optical flow. Because warped optical flow removes the effect of camera motion, its use can reduce the impact of camera motion on recognition performance; during training, the network models are likewise trained on the three kinds of input information, which reduces the impact of camera motion on the network models and makes the video category recognition system more robust to camera motion.
The video category recognition method of the above embodiments of the present invention can be applied to the training stage of the convolutional neural network models, and equally to their test stage and subsequent concrete application stages.
In another embodiment of the video category recognition method of the present invention, when the method of the above embodiments is applied to the test stage or a subsequent concrete application stage of the convolutional neural network models, after the classification result of the video is obtained in operation 108, 210, or 312, the classification result vector obtained by the fusion may be normalized with a Softmax function to obtain the vector of probabilities that the video belongs to each category.
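For illustration, a numerically stable Softmax over the fused score vector might look as follows (the scores continue the assumed running example):

```python
import numpy as np

def softmax(scores: np.ndarray) -> np.ndarray:
    """Normalize fused category scores into class probabilities;
    subtracting the max keeps the exponentials numerically stable."""
    z = np.exp(scores - scores.max())
    return z / z.sum()

probs = softmax(np.array([0.4, 5.15, 0.85, 1.7, 1.55, 0.35]))
print(np.round(probs, 3))  # sums to 1: a probability over the 6 categories
```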
In another embodiment of the video category recognition method of the present invention, when the method of the above embodiments is applied to the training stage of the convolutional neural network models, the following operations may also be included:
presetting an initial spatial domain convolutional neural network and an initial time domain convolutional neural network;
training the initial spatial domain convolutional neural network by stochastic gradient descent (SGD), based on each video serving as a sample, to obtain the spatial domain convolutional neural network of the above embodiments; and training the initial time domain convolutional neural network by stochastic gradient descent to obtain the time domain convolutional neural network of the above embodiments.
Each video serving as a sample is annotated in advance with standard spatial domain classification result information.
Stochastic gradient descent updates the initial network models iteratively, one sample at a time; training the initial spatial domain convolutional neural network and the initial time domain convolutional neural network with stochastic gradient descent is therefore fast and improves network training efficiency.
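A minimal sketch of such a per-sample SGD update loop in PyTorch, reusing the assumed SpatialStream network sketched earlier; the cross-entropy loss, learning rate, and the hypothetical sample_videos iterable of (segment frames, label) pairs are illustrative assumptions, not the patent's settings:

```python
import torch
import torch.nn as nn

model = SpatialStream()                     # assumed network from the earlier sketch
optim = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()             # compares result with the video's true label

for frames, label in sample_videos:         # hypothetical (segments, label) pairs
    segment_scores = model(frames)          # preliminary results, one row per segment
    video_scores = segment_scores.mean(0, keepdim=True)  # average consensus
    loss = loss_fn(video_scores, label)     # video-level supervision
    optim.zero_grad()
    loss.backward()                         # iterate: one update per sample video
    optim.step()
```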
Fig. 4 is a flowchart of one embodiment of training the initial spatial domain convolutional neural network in an embodiment of the present invention. As shown in Fig. 4, the embodiment includes:
402: For one video serving as a sample, perform the operations of the flows shown in the above embodiments of the present invention until the spatial domain classification result of the video is obtained.
For example, perform the spatial-domain-related operations among operations 102~106, 202~208, or 302~310 to obtain the spatial domain classification result of the video.
404: Compare whether the deviation of the spatial domain classification result of the video from the preset standard spatial domain classification result of the video is less than a preset range.
If the deviation is not less than the preset range, perform operation 406. If the deviation is less than the preset range, end the training flow for the initial spatial domain convolutional neural network, take the current initial spatial domain convolutional neural network as the final spatial domain convolutional neural network, and do not perform the subsequent flow of this embodiment.
406: Adjust the network parameters of the initial spatial domain convolutional neural network.
408: Take the spatial domain convolutional neural network with the adjusted network parameters as the new initial spatial domain convolutional neural network and, for the next video serving as a sample, return to operation 402.
Fig. 5 is a flowchart of one embodiment of training the initial time domain convolutional neural network in an embodiment of the present invention. As shown in Fig. 5, the embodiment includes:
502: For one video serving as a sample, perform the operations starting from segmenting the video until the time domain classification result of the video is obtained.
For example, perform the time-domain-related operations among operations 102~106, 202~208, or 302~310 to obtain the time domain classification result of the video.
504: Compare whether the deviation of the time domain classification result of the video from the preset standard time domain classification result of the video is less than a preset range.
If the deviation is not less than the preset range, perform operation 506. If the deviation is less than the preset range, end the training flow for the initial time domain convolutional neural network, take the current initial time domain convolutional neural network as the final time domain convolutional neural network, and do not perform the subsequent flow of this embodiment.
506: Adjust the network parameters of the initial time domain convolutional neural network.
508: Take the time domain convolutional neural network with the adjusted network parameters as the new initial time domain convolutional neural network and, for the next video serving as a sample, return to operation 502.
Specifically, in the embodiment shown in Fig. 5, the initial time domain convolutional neural network may specifically be the first initial time domain convolutional neural network or the second initial time domain convolutional neural network; the time domain classification result is correspondingly the first or second time domain classification result, and the time domain convolutional neural network is correspondingly the first or second time domain convolutional neural network. That is, the embodiment shown in Fig. 5 can train the first initial time domain convolutional neural network and the second initial time domain convolutional neural network separately or simultaneously.
Further, when training the initial spatial domain convolutional neural network and the initial time domain convolutional neural network by the embodiments shown in Fig. 4 and Fig. 5, the following operations may also be included:
normalizing the spatial domain classification result of the video with a Softmax function to obtain the video's spatial domain class probability vector over all categories; and normalizing the time domain classification result of the video with a Softmax function to obtain the video's time domain class probability vector over all categories.
Correspondingly, the spatial domain classification result and the time domain classification result in Fig. 4 and Fig. 5 may specifically be either the unnormalized classification results or the normalized class probability vectors.
Fig. 6 is a structural schematic diagram of one embodiment of the video category recognition apparatus of the present invention. The apparatus of this embodiment can be used to implement the video category recognition methods of the above embodiments of the present invention. As shown in Fig. 6, the video category recognition apparatus of this embodiment includes: a segmenting unit, a sampling unit, a spatial domain classification processing unit, a time domain classification processing unit, and a fusion unit. Wherein:
The segmenting unit is configured to segment a video into multiple segmented videos.
As a specific example, the segmenting unit may segment the video evenly into multiple segmented videos of identical length, e.g., into 3 or 5 segmented videos of identical length, the concrete number of segments being determined by practical effect. Alternatively, the video may be segmented randomly, or several sections may be extracted from the video to serve as the multiple segmented videos.
In a specific implementation, after a video is received, its length is obtained, the length of each segment is determined from the video length and a preset number of segments, and the received video is accordingly divided into multiple segmented videos of identical length.
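A minimal sketch of this even segmentation over frame indices (representing a video by its frame indices is an assumption for illustration):

```python
def segment_video(num_frames: int, num_segments: int = 3) -> list[range]:
    """Split a video's frame indices into num_segments contiguous
    segments of (as near as possible) identical length."""
    bounds = [round(i * num_frames / num_segments) for i in range(num_segments + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(num_segments)]

print(segment_video(300, 3))  # [range(0, 100), range(100, 200), range(200, 300)]
```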
The sampling unit is configured to sample each of the multiple segmented videos to obtain the original image and the optical flow images of each segmented video.
Exemplarily, the sampling unit may specifically include:
an image sampling module, configured to randomly select one frame from each segmented video as that segmented video's original image; and
an optical flow sampling module, configured to randomly select consecutive frames from each segmented video and obtain the optical flow images of each segmented video.
In a specific example of the embodiments of the present invention, the optical flow images may, for example, be grayscale images based on an 8-bit map with 256 discrete levels, the middle value of the grayscale image being 128.
Specifically, the optical flow sampling module may, for each segmented video: randomly select N consecutive frames from the segmented video, where N is an integer greater than 1; and compute over each pair of adjacent frames among the N frames to obtain N-1 groups of optical flow images, each group comprising one horizontal optical flow image and one vertical optical flow image.
For example, for each segmented video: 6 consecutive frames may be randomly selected; computing over each pair of adjacent frames among the 6 frames yields 5 groups of optical flow grayscale images, each group comprising one horizontal and one vertical optical flow grayscale image, giving 10 optical flow grayscale images in total, which can serve as a 10-channel image.
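A sketch of this optical flow stacking and 8-bit quantization follows; Farneback flow and the clipping bound are illustrative assumptions, while the midpoint-128 grayscale encoding and the 2(N-1)-channel stacking follow the text:

```python
import cv2
import numpy as np

def flow_stack(frames: list[np.ndarray], bound: float = 20.0) -> np.ndarray:
    """From N consecutive grayscale frames, compute N-1 flows; each flow
    contributes a horizontal and a vertical channel, quantized to 8-bit
    grayscale with 128 as the zero-motion midpoint."""
    channels = []
    for prev, nxt in zip(frames, frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        for axis in (0, 1):  # horizontal (x), then vertical (y)
            q = np.clip(flow[..., axis], -bound, bound)
            channels.append((128 + q * 127 / bound).astype(np.uint8))
    return np.stack(channels)  # e.g. 6 frames -> shape (10, H, W)
```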
The spatial domain classification processing unit is configured to process the original image of each segmented video using the spatial domain convolutional neural network to obtain the spatial domain classification result of the video.
The spatial domain classification result of the video is a classification result vector whose dimension equals the number of classification categories. For example, if the classification categories are running, high jump, footrace, pole vault, long jump, and triple jump, 6 categories in total, then the spatial domain classification result is a 6-dimensional classification result vector.
The time domain classification processing unit is configured to process the optical flow images of each segmented video using the time domain convolutional neural network to obtain the time domain classification result of the video.
The time domain classification result of the video is likewise a classification result vector whose dimension equals the number of classification categories, e.g., a 6-dimensional classification result vector for the 6 categories above.
The fusion unit is configured to fuse the spatial domain classification result and the time domain classification result to obtain the classification result of the video.
The classification result of the video is a classification result vector whose dimension equals the number of classification categories, e.g., a 6-dimensional classification result vector for the 6 categories above.
As a specific example, the fusion multiplies the spatial domain classification result and the time domain classification result by preset weight coefficients and sums them to obtain the classification result of the video. Each weight coefficient is determined by the classification accuracy of the corresponding network model on a validation data set: a network model with higher classification accuracy receives a higher weight. For example, in a particular application, the weight coefficient ratio between the spatial domain classification result and the time domain classification result may be 1:1.5.
According to the video category recognition apparatus provided by the above embodiments of the present invention, a video is segmented into multiple segmented videos; each segmented video is sampled to obtain its original image and optical flow images; the spatial domain convolutional neural network and the time domain convolutional neural network respectively process the original image and optical flow images of each segmented video to obtain its spatial domain and time domain classification results; and finally the spatial domain and time domain classification results are fused to obtain the classification result of the video. By dividing the video into multiple segmented videos and sampling frame pictures and inter-frame optical flow from each, the embodiments of the present invention realize modeling of long-duration actions when training the convolutional neural networks, so that when the trained network model is subsequently used for video classification, the accuracy of video category recognition is improved relative to the prior art, recognition performance is improved, and the computational cost is low.
Fig. 7 is a structural schematic diagram of another embodiment of the video category recognition apparatus of the present invention. As shown in Fig. 7, compared with the embodiment shown in Fig. 6, in this embodiment the spatial domain classification processing unit specifically includes a spatial domain classification processing module and a first synthesis processing module. Wherein:
the spatial domain classification processing module is configured to process the original image of each segmented video using the spatial domain convolutional neural network to obtain the spatial domain preliminary classification result of each segmented video.
The spatial domain preliminary classification result is a classification result vector whose dimension equals the number of classification categories, e.g., a 6-dimensional classification result vector for the 6 categories of the running example.
The first synthesis processing module is configured to synthesize the spatial domain preliminary classification results of the multiple segmented videos using the spatial domain consensus function to obtain the spatial domain classification result of the video.
In a specific implementation, the spatial domain consensus function may be an average function, a maximum function, or a weighted average function; specifically, it is whichever of these achieves the highest classification accuracy on a validation data set.
Specifically, the average function takes the mean of the scores that the different segments assign to the same category and outputs it as that category's score; the maximum function selects the maximum of those scores as the category's output score; the weighted average function outputs a weighted mean of those scores, where every category uses the same set of weights, and the weights are obtained as network model parameters optimized during training.
For example, in a particular application, the average function may be chosen as the spatial domain consensus function and the video divided into 3 segmented videos. The spatial domain convolutional neural network then yields 3 groups of category scores, one per segmented video, so each category has 3 scores corresponding to the 3 segmented videos; the average function takes the mean of each category's 3 scores as that category's score, yielding one group of category scores over all categories.
Referring back to Fig. 7, in another embodiment, the time domain classification processing unit specifically includes a first time domain classification processing module and a second synthesis processing module. Wherein:
the first time domain classification processing module is configured to process the optical flow images of each segmented video using the time domain convolutional neural network to obtain the time domain preliminary classification result of each segmented video.
The time domain preliminary classification result is a classification result vector whose dimension equals the number of classification categories, e.g., a 6-dimensional classification result vector for the 6 categories of the running example.
The second synthesis processing module is configured to synthesize the time domain preliminary classification results of the multiple segmented videos using the time domain consensus function to obtain the time domain classification result of the video.
In a specific implementation, the time domain consensus function may be an average function, a maximum function, or a weighted average function; specifically, it is whichever of these achieves the highest classification accuracy on a validation data set.
According to the video category recognition apparatus provided by the above embodiments of the present invention, a consensus function is applied across the segmented videos to synthesize their preliminary classification results into the classification result of the video. Because the consensus function places no restriction on the convolutional neural network model applied to each segmented video, the multiple segmented videos can share the parameters of one network model, keeping the number of model parameters small, so that a network model with fewer parameters can recognize the category of a video of arbitrary length. During training, a video of arbitrary length is segmented and passed through the segment networks, and supervised learning is performed by comparing the classification result with the true label of the whole video, which realizes video-level training supervision without being limited by the video length.
Fig. 8 is a structural schematic diagram of yet another embodiment of the video category recognition apparatus of the present invention. As shown in Fig. 8, compared with the embodiments shown in Fig. 6 and Fig. 7, in this embodiment of the present invention the optical flow image is the original optical flow image, the time domain convolutional neural network is the first time domain convolutional neural network, and the video category recognition apparatus of this embodiment further includes:
an optical flow processing unit, configured to obtain the warped optical flow image derived from the original optical flow image.
In a specific implementation, the optical flow processing unit is specifically configured to compute, for each pair of adjacent frames, the homography transformation matrix between the two frames; to apply, according to each such homography transformation matrix, an affine transformation to the latter frame of the corresponding pair of adjacent frames; and to compute the optical flow between the former frame and the transformed latter frame of each pair, obtaining the warped optical flow image.
Specifically, when computing over each pair of adjacent frames, the optical flow processing unit performs inter-frame feature point matching based on Speeded-Up Robust Features (SURF) feature descriptors.
The time domain classification processing unit of this embodiment includes a first time domain classification processing module, a second synthesis processing module, a second time domain classification processing module, and a third synthesis processing module. Wherein:
the first time domain classification processing module is specifically configured to process the original optical flow image of each segmented video using the first time domain convolutional neural network to obtain the first time domain preliminary classification result of each segmented video;
the second synthesis processing module is specifically configured to synthesize the first time domain preliminary classification results of the multiple segmented videos using the first time domain consensus function to obtain the first time domain classification result of the video;
the second time domain classification processing module is configured to process the warped optical flow image of each segmented video using the second time domain convolutional neural network to obtain the second time domain preliminary classification result of each segmented video;
the third synthesis processing module is configured to synthesize the second time domain preliminary classification results of the multiple segmented videos using the second time domain consensus function to obtain the second time domain classification result of the video; and
the fusion unit is specifically configured to fuse the spatial domain classification result, the first time domain classification result, and the second time domain classification result to obtain the classification result of the video.
As a specific example, the fusion unit multiplies the spatial domain classification result, the first time domain classification result, and the second time domain classification result by preset weight coefficients and sums them to obtain the classification result of the video. Each weight coefficient is determined by the classification accuracy of the corresponding network on a validation data set: a network model with higher classification accuracy receives a higher weight. For example, in a particular application, the weight coefficient ratio between the spatial domain classification result, the first time domain classification result, and the second time domain classification result may be 1:1:0.5.
According to the video category recognition apparatus provided by the above embodiments of the present invention, in addition to frame pictures and inter-frame optical flow, warped optical flow is used as an additional short-term motion representation, expanding the input of video category recognition to three kinds of information: frame pictures, inter-frame optical flow, and warped optical flow. Because warped optical flow removes the effect of camera motion, its use can reduce the impact of camera motion on recognition performance; during training, the network models are likewise trained on the three kinds of input information, which reduces the impact of camera motion on the network models and makes the video category recognition system more robust to camera motion.
The video category recognition apparatus of the above embodiments of the present invention can be applied to the training stage of the convolutional neural network models, and equally to their test stage and subsequent concrete application stages.
Fig. 9 is a structural schematic diagram of yet another embodiment of the video category recognition apparatus of the present invention. As shown in Fig. 9, when the video category recognition apparatus of the above embodiments is applied to the test stage or a subsequent concrete application stage of the convolutional neural network models, the apparatus may further include: a first normalization processing unit, configured to normalize the classification result vector obtained by the fusion with a Softmax function to obtain the vector of probabilities that the video belongs to each category.
Fig. 10 is a structural schematic diagram of a further embodiment of the video category recognition apparatus of the present invention. When the video category recognition apparatus of the above embodiments is applied to the training stage of the convolutional neural network models, it may further include a network training unit, configured to store the preset initial spatial domain convolutional neural network and initial time domain convolutional neural network; to train, based on each video serving as a sample, the initial spatial domain convolutional neural network by stochastic gradient descent to obtain the final spatial domain convolutional neural network; and to train the initial time domain convolutional neural network by stochastic gradient descent to obtain the final time domain convolutional neural network.
In a specific example based on the embodiment shown in Fig. 10, when training the initial spatial domain convolutional neural network by stochastic gradient descent, the network training unit is specifically configured to:
for one video serving as a sample, compare whether the deviation of the spatial domain classification result of the video obtained by the spatial domain classification processing unit from the preset standard spatial domain classification result of the video is less than a preset range;
if the deviation is not less than the preset range, adjust the network parameters of the initial spatial domain convolutional neural network, take the spatial domain convolutional neural network with the adjusted network parameters as the new initial spatial domain convolutional neural network, and then, for the next video serving as a sample, restart the operation of comparing the spatial domain classification result obtained by the spatial domain classification processing unit against the preset standard spatial domain classification result of that video; and
if the deviation is less than the preset range, take the current initial spatial domain convolutional neural network as the final spatial domain convolutional neural network.
In another specific example based on the embodiment shown in Fig. 10, when training the initial time domain convolutional neural network by stochastic gradient descent, the network training unit is specifically configured to:
for one video serving as a sample, compare whether the deviation of the time domain classification result of the video obtained by the time domain classification processing unit from the preset standard time domain classification result of the video is less than a preset range;
if the deviation is not less than the preset range, adjust the network parameters of the initial time domain convolutional neural network, take the time domain convolutional neural network with the adjusted network parameters as the new initial time domain convolutional neural network, and then, for the next video serving as a sample, restart the operation of comparing the time domain classification result obtained by the time domain classification processing unit against the preset standard time domain classification result of that video; and
if the deviation is less than the preset range, take the current initial time domain convolutional neural network as the final time domain convolutional neural network.
The above initial time domain convolutional neural network may include the first initial time domain convolutional neural network or the second initial time domain convolutional neural network; the time domain classification result correspondingly includes the first time domain classification result or the second time domain classification result, and the time domain convolutional neural network correspondingly includes the first time domain convolutional neural network and the second time domain convolutional neural network.
Further, referring back to Fig. 10, when training the initial spatial domain convolutional neural network and the initial time domain convolutional neural network, the video category recognition apparatus of the above embodiments may further include a second normalization processing unit, configured to normalize the spatial domain classification result of the video with a Softmax function to obtain the video's spatial domain class probability vector over all categories, and to normalize the time domain classification result of the video with a Softmax function to obtain the video's time domain class probability vector over all categories.
As shown in Fig. 11, which is a concrete application example of the video category recognition apparatus of the present invention, the time domain convolutional neural network therein may specifically be the first time domain convolutional neural network, or may include both the first time domain convolutional neural network and the second time domain convolutional neural network.
In addition, an embodiment of the present invention further provides a data processing apparatus that includes the video category recognition apparatus of any of the above embodiments of the present invention.
The data processing apparatus provided by the above embodiment of the present invention is equipped with the video category recognition apparatus of the above embodiments; by dividing a video into multiple segmented videos and sampling frame pictures and inter-frame optical flow from each when training the convolutional neural networks, modeling of long-duration actions is realized, so that when the trained network model is used for video classification, the accuracy of video category recognition is improved relative to the prior art, recognition performance is improved, and the computational cost is low.
Specifically, the data processing apparatus of the embodiment of the present invention may be any device with a data processing function, including but not limited to an Advanced RISC Machine (ARM) processor, a central processing unit (CPU), or a graphics processing unit (GPU).
In addition, an embodiment of the present invention further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, etc., equipped with the data processing apparatus of any of the above embodiments of the present invention.
The electronic device provided by the above embodiment of the present invention is equipped with the data processing apparatus of the above embodiments; by dividing a video into multiple segmented videos and sampling frame pictures and inter-frame optical flow from each when training the convolutional neural networks, modeling of long-duration actions is realized, so that when the trained network model is used for video category recognition, the accuracy of recognition is improved relative to the prior art, recognition performance is improved, and the computational cost is low.
Fig. 12 is a structural schematic diagram of one embodiment of the electronic device of the present invention. As shown in Fig. 12, the electronic device for implementing an embodiment of the present invention includes a central processing unit (CPU) that can perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) or executable instructions loaded from a storage section into a random access memory (RAM). The CPU can communicate with the read-only memory and/or the random access memory to execute the executable instructions and thereby complete the operations corresponding to the video category recognition method provided by the embodiments of the present invention, for example: segmenting a video to obtain multiple segmented videos; sampling each of the multiple segmented videos to obtain the original image and the optical flow image of each segmented video; processing the original image of each segmented video using the spatial domain convolutional neural network to obtain the spatial domain classification result of each segmented video, and processing the optical flow image of each segmented video using the time domain convolutional neural network to obtain the time domain classification result of each segmented video; and fusing the spatial domain classification result and the time domain classification result to obtain the classification result of the video.
In addition, the RAM can also store various programs and data required for the operation of the system. The CPU, the ROM, and the RAM are connected to each other through a bus, to which an input/output (I/O) interface is also connected.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, etc.; an output section including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, etc.; a storage section including a hard disk, etc.; and a communication section including a network interface card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive as needed, so that a computer program read from it can be installed into the storage section as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for performing the methods shown in the flowcharts, and the program code may include instructions corresponding to the steps of any video category recognition method provided by the embodiments of the present invention, for example: instructions for segmenting a video to obtain multiple segmented videos; instructions for sampling each of the multiple segmented videos to obtain the original image and the optical flow image of each segmented video; instructions for processing the original image of each segmented video using the spatial domain convolutional neural network to obtain the spatial domain preliminary classification result of each segmented video, and instructions for processing the optical flow image of each segmented video using the time domain convolutional neural network to obtain the time domain preliminary classification result of each segmented video; instructions for synthesizing the spatial domain preliminary classification results of the multiple segmented videos to obtain the spatial domain classification result of the video, and instructions for synthesizing the time domain preliminary classification results of the multiple segmented videos to obtain the time domain classification result of the video; and instructions for fusing the spatial domain classification result and the time domain classification result to obtain the classification result of the video. The computer program may be downloaded and installed from a network through the communication section and/or installed from the removable medium. When the computer program is executed by the central processing unit (CPU), the above functions defined in the method of the present invention are performed.
An embodiment of the present invention further provides a computer storage medium for storing computer-readable instructions, the instructions including: instructions for segmenting a video to obtain multiple segmented videos; instructions for sampling each of the multiple segmented videos to obtain the original image and the optical flow image of each segmented video; instructions for processing the original image of each segmented video using the spatial domain convolutional neural network to obtain the spatial domain preliminary classification result of each segmented video, and instructions for processing the optical flow image of each segmented video using the time domain convolutional neural network to obtain the time domain preliminary classification result of each segmented video; instructions for synthesizing the spatial domain preliminary classification results of the multiple segmented videos to obtain the spatial domain classification result of the video, and instructions for synthesizing the time domain preliminary classification results of the multiple segmented videos to obtain the time domain classification result of the video; and instructions for fusing the spatial domain classification result and the time domain classification result to obtain the classification result of the video.
In addition, an embodiment of the present invention further provides a computer device, including:
a memory storing executable instructions; and
one or more processors communicating with the memory to execute the executable instructions so as to complete the operations corresponding to the video category recognition method of any of the above embodiments of the present invention.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. As the system embodiments substantially correspond to the method embodiments, their description is relatively brief, and the relevant parts may refer to the description of the method embodiments.
The methods, apparatuses, and devices of the present invention may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the methods is for illustration only, and the steps of the methods of the present invention are not limited to the order described above unless otherwise specified. Furthermore, in some embodiments, the present invention may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present invention; thus, the present invention also covers recording media storing programs for performing the methods according to the present invention.
The description of the present invention has been presented for the purposes of example and description, and is not intended to be exhaustive or to limit the present invention to the form disclosed. Many modifications and variations are obvious to those of ordinary skill in the art. The embodiments were chosen and described in order to better illustrate the principles and practical applications of the present invention, and to enable those of ordinary skill in the art to understand the present invention so as to devise various embodiments with various modifications suited to particular uses.
Claims (10)
1. A video category recognition method, characterized by comprising:
segmenting a video to obtain multiple segmented videos;
sampling each of the multiple segmented videos to obtain the original image and the optical flow image of each segmented video;
processing the original image of each segmented video using a spatial domain convolutional neural network to obtain the spatial domain classification result of the video, and processing the optical flow image of each segmented video using a time domain convolutional neural network to obtain the time domain classification result of the video; and
fusing the spatial domain classification result and the time domain classification result to obtain the classification result of the video.
2. The method according to claim 1, characterized in that segmenting the video comprises: segmenting the video evenly to obtain multiple segmented videos of identical length.
3. The method according to claim 1 or 2, characterized in that obtaining the original image of each segmented video comprises: randomly selecting one frame from each segmented video as the original image of that segmented video.
4. The method according to claim 1 or 2, characterized in that obtaining the optical flow image of each segmented video comprises: randomly selecting consecutive frames from each segmented video to obtain the optical flow image of each segmented video.
5. The method according to claim 4, characterized in that the optical flow image is a grayscale image based on an 8-bit map with 256 discrete levels, the middle value of the grayscale image being 128.
6. The method according to claim 4 or 5, characterized in that randomly selecting consecutive frames from each segmented video to obtain the optical flow image of each segmented video comprises:
for each segmented video: randomly selecting N consecutive frames from the segmented video, where N is an integer greater than 1; and
computing over each pair of adjacent frames among the N frames to obtain N-1 groups of optical flow images, each group of the N-1 groups comprising one horizontal optical flow image and one vertical optical flow image.
7. The method according to any one of claims 1 to 6, characterized in that processing the original image of each segmented video using the spatial domain convolutional neural network to obtain the spatial domain classification result of the video comprises:
processing the original image of each segmented video using the spatial domain convolutional neural network to obtain the spatial domain preliminary classification result of each segmented video; and
synthesizing the spatial domain preliminary classification results of the multiple segmented videos using a spatial domain consensus function to obtain the spatial domain classification result of the video;
and/or
processing the optical flow image of each segmented video using the time domain convolutional neural network to obtain the time domain classification result of the video comprises:
processing the optical flow image of each segmented video using the time domain convolutional neural network to obtain the time domain preliminary classification result of each segmented video; and
synthesizing the time domain preliminary classification results of the multiple segmented videos using a time domain consensus function to obtain the time domain classification result of the video.
8. A video category recognition apparatus, characterized by comprising:
a segmenting unit, configured to segment a video into multiple segmented videos;
a sampling unit, configured to sample each of the multiple segmented videos to obtain the original image and the optical flow image of each segmented video;
a spatial domain classification processing unit, configured to process the original image of each segmented video using a spatial domain convolutional neural network to obtain the spatial domain classification result of the video;
a time domain classification processing unit, configured to process the optical flow image of each segmented video using a time domain convolutional neural network to obtain the time domain classification result of the video; and
a fusion unit, configured to fuse the spatial domain classification result and the time domain classification result to obtain the classification result of the video.
9. A data processing apparatus, characterized by comprising the video category recognition apparatus according to claim 8.
10. An electronic device, characterized by being equipped with the data processing apparatus according to claim 9.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2016106196541 | 2016-07-29 | | |
| CN201610619654 | 2016-07-29 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106599789A | 2017-04-26 |
| CN106599789B | 2019-10-11 |
Family
ID=58592577
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201611030170.XA (Active) | Video category recognition method and apparatus, data processing apparatus, and electronic device | 2016-07-29 | 2016-11-15 |
Country (2)
| Country | Link |
|---|---|
| CN (1) | CN106599789B |
| WO (1) | WO2018019126A1 |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109120932B (en) * | 2018-07-12 | 2021-10-26 | 东华大学 | Video saliency prediction method based on a dual-SVM model in the HEVC compressed domain |
US11200424B2 (en) * | 2018-10-12 | 2021-12-14 | Adobe Inc. | Space-time memory network for locating target object in video content |
CN111753574A (en) * | 2019-03-26 | 2020-10-09 | 顺丰科技有限公司 | Throw area positioning method, device, equipment and storage medium |
CN112307821A (en) * | 2019-07-29 | 2021-02-02 | 顺丰科技有限公司 | Video stream processing method, device, equipment and storage medium |
US11138441B2 (en) * | 2019-12-06 | 2021-10-05 | Baidu Usa Llc | Video action segmentation by mixed temporal domain adaption |
CN111027482B (en) * | 2019-12-10 | 2023-04-14 | 浩云科技股份有限公司 | Behavior analysis method and device based on motion vector segmentation analysis |
CN111104553B (en) * | 2020-01-07 | 2023-12-12 | 中国科学院自动化研究所 | Efficient motor complementary neural network system |
CN111783713B (en) * | 2020-07-09 | 2022-12-02 | 中国科学院自动化研究所 | Weakly-supervised time-series behavior location method and device based on relational prototype network |
CN111951276B (en) * | 2020-07-28 | 2025-03-28 | 上海联影智能医疗科技有限公司 | Image segmentation method, device, computer equipment and storage medium |
CN113395542B (en) * | 2020-10-26 | 2022-11-08 | 腾讯科技(深圳)有限公司 | Video generation method and device based on artificial intelligence, computer equipment and medium |
CN114756115A (en) * | 2020-12-28 | 2022-07-15 | 阿里巴巴集团控股有限公司 | Interactive control method, device and equipment |
CN112580589A (en) * | 2020-12-28 | 2021-03-30 | 国网上海市电力公司 | Dual-stream behavior recognition method, medium and equipment for unbalanced data |
CN112731359B (en) * | 2020-12-31 | 2024-04-09 | 无锡祥生医疗科技股份有限公司 | Method and device for determining speed of ultrasonic probe and storage medium |
CN113128354B (en) * | 2021-03-26 | 2022-07-19 | 中山大学中山眼科中心 | Hand washing quality detection method and device |
CN112926549B (en) * | 2021-04-15 | 2022-06-24 | 华中科技大学 | Gait recognition method and system based on time domain-space domain feature joint enhancement |
CN114373194B (en) * | 2022-01-14 | 2024-11-12 | 南京邮电大学 | Human action recognition method based on keyframe and attention mechanism |
CN114861530A (en) * | 2022-04-21 | 2022-08-05 | 同济大学 | ENSO intelligent prediction method, device, equipment and storage medium |
CN115830698A (en) * | 2022-04-28 | 2023-03-21 | 西安理工大学 | Target detection and positioning method based on depth optical flow and YOLOv3 space-time fusion |
CN118214922B (en) * | 2024-05-17 | 2024-08-30 | 环球数科集团有限公司 | System for capturing video spatial and temporal features using CNN filters |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102129691A (en) * | 2011-03-22 | 2011-07-20 | 北京航空航天大学 | Video object tracking and segmentation method using a Snake contour model |
CN102289795A (en) * | 2011-07-29 | 2011-12-21 | 上海交通大学 | Spatio-temporal video enhancement method based on a fusion approach |
US20130071041A1 (en) * | 2011-09-16 | 2013-03-21 | Hailin Jin | High-Quality Denoising of an Image Sequence |
CN104217214A (en) * | 2014-08-21 | 2014-12-17 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Configurable convolutional neural network based RGB-D human behavior recognition method |
CN105550699A (en) * | 2015-12-08 | 2016-05-04 | 北京工业大学 | CNN-based video recognition and classification method with spatio-temporal saliency information fusion |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8345984B2 (en) * | 2010-01-28 | 2013-01-01 | Nec Laboratories America, Inc. | 3D convolutional neural networks for automatic human action recognition |
CN103218831B (en) * | 2013-04-21 | 2015-11-18 | 北京航空航天大学 | Video moving object classification and recognition method based on contour constraints |
CN104966104B (en) * | 2015-06-30 | 2018-05-11 | 山东管理学院 | Video classification method based on three-dimensional convolutional neural networks |
CN105740773B (en) * | 2016-01-25 | 2019-02-01 | 重庆理工大学 | Behavior recognition method based on deep learning and multi-scale information |
CN106599789B (en) * | 2016-07-29 | 2019-10-11 | 北京市商汤科技开发有限公司 | Video classification identification method and device, data processing device and electronic device |
2016
- 2016-11-15 CN CN201611030170.XA patent/CN106599789B/en active Active
2017
- 2017-07-12 WO PCT/CN2017/092597 patent/WO2018019126A1/en active Application Filing
Cited By (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018019126A1 (en) * | 2016-07-29 | 2018-02-01 | 北京市商汤科技开发有限公司 | Video category identification method and device, data processing device and electronic apparatus |
CN107330362A (en) * | 2017-05-25 | 2017-11-07 | 北京大学 | Video classification method based on space-time attention |
CN107330362B (en) * | 2017-05-25 | 2020-10-09 | 北京大学 | Video classification method based on space-time attention |
CN107463949A (en) * | 2017-07-14 | 2017-12-12 | 北京协同创新研究院 | Processing method and device for video action classification |
CN107463949B (en) * | 2017-07-14 | 2020-02-21 | 北京协同创新研究院 | A processing method and device for video action classification |
CN108229290A (en) * | 2017-07-26 | 2018-06-29 | 北京市商汤科技开发有限公司 | Video object segmentation method and device, electronic equipment, storage medium and program |
US11222211B2 (en) | 2017-07-26 | 2022-01-11 | Beijing Sensetime Technology Development Co., Ltd | Method and apparatus for segmenting video object, electronic device, and storage medium |
CN107943849A (en) * | 2017-11-03 | 2018-04-20 | 小草数语(北京)科技有限公司 | Video file retrieval method and device |
CN107943849B (en) * | 2017-11-03 | 2020-05-08 | 绿湾网络科技有限公司 | Video file retrieval method and device |
CN108010538B (en) * | 2017-12-22 | 2021-08-24 | 北京奇虎科技有限公司 | Audio data processing method and device, and computing device |
CN108010538A (en) * | 2017-12-22 | 2018-05-08 | 北京奇虎科技有限公司 | Audio data processing method and device, computing device |
CN108230413A (en) * | 2018-01-23 | 2018-06-29 | 北京市商汤科技开发有限公司 | Image description method and device, electronic equipment, computer storage medium and program |
CN108230413B (en) * | 2018-01-23 | 2021-07-06 | 北京市商汤科技开发有限公司 | Image description method and device, electronic equipment and computer storage medium |
CN108171222B (en) * | 2018-02-11 | 2020-08-25 | 清华大学 | A real-time video classification method and device based on multi-stream neural network |
CN108171222A (en) * | 2018-02-11 | 2018-06-15 | 清华大学 | Real-time video classification method and device based on a multi-stream neural network |
CN110321761B (en) * | 2018-03-29 | 2022-02-11 | 中国科学院深圳先进技术研究院 | Behavior recognition method, terminal device and computer-readable storage medium |
CN110321761A (en) * | 2018-03-29 | 2019-10-11 | 中国科学院深圳先进技术研究院 | Behavior recognition method, terminal device and computer-readable storage medium |
CN108764084A (en) * | 2018-05-17 | 2018-11-06 | 西安电子科技大学 | Video classification methods based on spatial domain sorter network and the time domain network integration |
CN108764084B (en) * | 2018-05-17 | 2021-07-27 | 西安电子科技大学 | Video classification method based on fusion of spatial classification network and temporal classification network |
CN110598504A (en) * | 2018-06-12 | 2019-12-20 | 北京市商汤科技开发有限公司 | Image recognition method and device, electronic equipment and storage medium |
CN110598504B (en) * | 2018-06-12 | 2023-07-21 | 北京市商汤科技开发有限公司 | Image recognition method and device, electronic equipment and storage medium |
CN109271840A (en) * | 2018-07-25 | 2019-01-25 | 西安电子科技大学 | Video gesture classification method |
CN109325430B (en) * | 2018-09-11 | 2021-08-20 | 苏州飞搜科技有限公司 | Real-time behavior identification method and system |
CN109325430A (en) * | 2018-09-11 | 2019-02-12 | 北京飞搜科技有限公司 | Real-time behavior recognition method and system |
CN109325435A (en) * | 2018-09-15 | 2019-02-12 | 天津大学 | Video Action Recognition and Localization Algorithm Based on Cascaded Neural Network |
CN109325435B (en) * | 2018-09-15 | 2022-04-19 | 天津大学 | Video action recognition and localization method based on cascaded neural network |
CN109376603A (en) * | 2018-09-25 | 2019-02-22 | 北京周同科技有限公司 | Video recognition method and device, computer equipment and storage medium |
CN109657546A (en) * | 2018-11-12 | 2019-04-19 | 平安科技(深圳)有限公司 | Neural-network-based video behavior recognition method and terminal device |
WO2020108023A1 (en) * | 2018-11-28 | 2020-06-04 | 北京达佳互联信息技术有限公司 | Video motion classification method, apparatus, computer device, and storage medium |
CN109726765A (en) * | 2019-01-02 | 2019-05-07 | 京东方科技集团股份有限公司 | Sample extraction method and device for video classification problems |
CN109740670A (en) * | 2019-01-02 | 2019-05-10 | 京东方科技集团股份有限公司 | Video classification method and device |
US11055535B2 (en) | 2019-01-02 | 2021-07-06 | Boe Technology Group Co., Ltd. | Method and device for video classification |
US11210522B2 (en) | 2019-01-02 | 2021-12-28 | Boe Technology Group Co., Ltd. | Sample extraction method and device targeting video classification problem |
CN109886165A (en) * | 2019-01-23 | 2019-06-14 | 中国科学院重庆绿色智能技术研究院 | An Action Video Extraction and Classification Method Based on Moving Object Detection |
WO2020155713A1 (en) * | 2019-01-29 | 2020-08-06 | 北京市商汤科技开发有限公司 | Image processing method and device, and network training method and device |
US11113536B2 (en) | 2019-03-15 | 2021-09-07 | Boe Technology Group Co., Ltd. | Video identification method, video identification device, and storage medium |
CN110020639B (en) * | 2019-04-18 | 2021-07-23 | 北京奇艺世纪科技有限公司 | Video feature extraction method and related equipment |
CN110020639A (en) * | 2019-04-18 | 2019-07-16 | 北京奇艺世纪科技有限公司 | Video feature extraction method and related device |
CN111820947A (en) * | 2019-04-19 | 2020-10-27 | 无锡祥生医疗科技股份有限公司 | Ultrasonic heart reflux automatic capturing method and system and ultrasonic imaging equipment |
CN111820947B (en) * | 2019-04-19 | 2023-08-29 | 无锡祥生医疗科技股份有限公司 | Ultrasonic heart reflux automatic capturing method and system and ultrasonic imaging equipment |
CN110062248B (en) * | 2019-04-30 | 2021-09-28 | 广州酷狗计算机科技有限公司 | Method and device for recommending live broadcast room |
CN110062248A (en) * | 2019-04-30 | 2019-07-26 | 广州酷狗计算机科技有限公司 | Method and apparatus for recommending live broadcast rooms |
CN112288345A (en) * | 2019-07-25 | 2021-01-29 | 顺丰科技有限公司 | Method and device for detecting loading and unloading port state, server and storage medium |
CN110602527A (en) * | 2019-09-12 | 2019-12-20 | 北京小米移动软件有限公司 | Video processing method, device and storage medium |
US11288514B2 (en) | 2019-09-12 | 2022-03-29 | Beijing Xiaomi Mobile Software Co., Ltd. | Video processing method and device, and storage medium |
CN111125405A (en) * | 2019-12-19 | 2020-05-08 | 国网冀北电力有限公司信息通信分公司 | Power monitoring image abnormality detection method and device, electronic device and storage medium |
CN111898458A (en) * | 2020-07-07 | 2020-11-06 | 中国传媒大学 | Violent video recognition method for bimodal task learning based on an attention mechanism |
CN111898458B (en) * | 2020-07-07 | 2024-07-12 | 中国传媒大学 | Violent video identification method for bimodal task learning based on attention mechanism |
CN111860353A (en) * | 2020-07-23 | 2020-10-30 | 北京以萨技术股份有限公司 | Video behavior prediction method, device and medium based on dual-stream neural network |
CN113139467A (en) * | 2021-04-23 | 2021-07-20 | 西安交通大学 | Hierarchical structure-based fine-grained video action identification method |
CN113395537A (en) * | 2021-06-16 | 2021-09-14 | 北京百度网讯科技有限公司 | Method and device for recommending live broadcast room |
CN113395537B (en) * | 2021-06-16 | 2023-05-16 | 北京百度网讯科技有限公司 | Method and device for recommending live broadcasting room |
CN113870040B (en) * | 2021-09-07 | 2024-05-21 | 天津大学 | Double-flow chart convolution network microblog topic detection method integrating different propagation modes |
CN113870040A (en) * | 2021-09-07 | 2021-12-31 | 天津大学 | Double-flow graph convolution network microblog topic detection method fusing different propagation modes |
CN114987551A (en) * | 2022-06-27 | 2022-09-02 | 吉林大学 | Lane departure early warning method based on double-current convolutional neural network |
CN116645917A (en) * | 2023-06-09 | 2023-08-25 | 浙江技加智能科技有限公司 | LED display brightness adjustment system and method thereof |
Also Published As
Publication number | Publication date |
---|---|
WO2018019126A1 (en) | 2018-02-01 |
CN106599789B (en) | 2019-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106599789A (en) | Video class identification method and device, data processing device and electronic device | |
Baldassarre et al. | Deep koalarization: Image colorization using cnns and inception-resnet-v2 | |
US10929649B2 (en) | Multi-pose face feature point detection method based on cascade regression | |
CN111445488B (en) | A Weakly Supervised Learning Approach to Automatically Identify and Segment Salt Bodies | |
CN105701508B (en) | Global-local optimization model and saliency detection algorithm based on multi-stage convolutional neural networks | |
CN106548192B (en) | Neural-network-based image processing method, device and electronic equipment | |
CN103984959B (en) | Data- and task-driven image classification method | |
CN108108751B (en) | Scene recognition method based on convolutional multi-features and a deep random forest | |
CN109858466A (en) | Face key point detection method and device based on convolutional neural networks | |
CN112101344B (en) | Video text tracking method and device | |
CN108681695A (en) | Video action recognition method and device, electronic equipment and storage medium | |
CN106504233A (en) | Power component recognition method and system for UAV inspection images based on Faster R-CNN | |
CN111126115B (en) | Violent sorting behavior identification method and device | |
CN109657612B (en) | Quality sorting system based on facial image features and application method thereof | |
CN107683469A (en) | Product classification method and device based on deep learning | |
WO2022152009A1 (en) | Target detection method and apparatus, device and storage medium | |
CN111368660A (en) | A single-stage semi-supervised image human object detection method | |
CN104866868A (en) | Metal coin recognition method and apparatus based on a deep neural network | |
CN109918971A (en) | Method and device for detecting the number of people in surveillance video | |
CN110543848B (en) | Driver action recognition method and device based on a three-dimensional convolutional neural network | |
CN109472193A (en) | Face detection method and device | |
CN112418032A (en) | Human behavior recognition method and device, electronic equipment and storage medium | |
CN112364791B (en) | Pedestrian re-identification method and system based on a generative adversarial network | |
CN112613579A (en) | Model training method and evaluation method for face or head image quality, and selection method for high-quality images | |
CN108961358A (en) | Method, apparatus and electronic equipment for obtaining sample pictures | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||