CN103824284B - Key frame extraction method and system based on a visual attention model

Key frame extraction method and system based on a visual attention model

Info

Publication number
CN103824284B
Authority
CN
China
Prior art keywords: saliency, key, salient, domain, key frame
Prior art date
Legal status
Expired - Fee Related
Application number
CN201410039072.7A
Other languages
Chinese (zh)
Other versions
CN103824284A
Inventor
纪庆革 (Ji Qingge)
赵杰 (Zhao Jie)
刘勇 (Liu Yong)
Current Assignee
Sun Yat Sen University
Guangzhou Zhongda Nansha Technology Innovation Industrial Park Co Ltd
Original Assignee
Sun Yat Sen University
Guangzhou Zhongda Nansha Technology Innovation Industrial Park Co Ltd
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University and Guangzhou Zhongda Nansha Technology Innovation Industrial Park Co Ltd
Priority to CN201410039072.7A
Publication of CN103824284A
Application granted
Publication of CN103824284B
Expired - Fee Related

Landscapes

  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The invention discloses a key frame extraction method and system based on a visual attention model. In the spatial domain, the method performs saliency detection by filtering global contrast with binomial coefficients and extracts the target region with an adaptive threshold; this not only preserves the boundary of the salient target region well, but also keeps the saliency within the region fairly uniform. In the temporal domain, the method defines a motion saliency, estimates the target motion through homography matrices, and detects saliency on key points instead of the target itself; it then fuses the spatial saliency data and obtains a bounding box as the temporal salient target region through an energy-function-based boundary expansion. Finally, the method reduces the richness of the video through the salient target regions and extracts key frames with a shot-adaptive method combined with online clustering.

Description

A key frame extraction method and system based on a visual attention model

Technical Field

The present invention relates to the technical field of video analysis, and in particular to a key frame extraction method and system based on a visual attention model.

Background Art

With the rapid development of Internet technology, we have entered an era of information explosion in which a wide range of network applications and multimedia technologies are in widespread use. Video, as a common carrier of network information, is vivid and intuitive and has strong expressive power, so it is widely used in many fields, which has led to a massive growth of video data. Taking the well-known video website YouTube as an example, about 60 hours of video are uploaded by users every minute (figure as of January 23, 2012), and the volume is still growing. How to store, manage, and access massive video resources quickly and effectively has become an important problem in current video applications. Because video has temporal correlation, users traditionally have to watch a video from beginning to end to grasp its content. Irrelevant videos not only take up a great deal of the user's time but also waste a large amount of network bandwidth. Therefore, auxiliary information needs to be added to videos to help users filter them more effectively. Mature systems currently rely on traditional text annotation, in which videos are classified manually and given artificial semantics through titles, descriptions, and other text. For massive volumes of video this task is not only labor-intensive; different people also understand the same video differently, so others cannot judge from the author's text annotations whether a video matches their interests.

Therefore, there is an urgent need for an automated way to summarize videos effectively.

Summary of the Invention

To overcome the deficiencies of the prior art, the present invention first provides a video key frame extraction method based on a visual attention model, which can effectively obtain key frames that are highly representative of video shots.

Another object of the present invention is to provide a video key frame extraction system based on a visual attention model.

To achieve the above objects, the technical solution of the present invention is as follows:

A video key frame extraction method based on a visual attention model, comprising:

In the spatial domain, performing saliency detection by filtering global contrast with binomial coefficients, and extracting the target region with an adaptive threshold; this not only preserves the boundary of the salient target region well, but also keeps the saliency within the region fairly uniform.

In the temporal domain, defining a motion saliency, estimating the target motion through homography matrices, detecting saliency on key points instead of the target itself, fusing the spatial saliency data, and obtaining a bounding box as the temporal salient target region through an energy-function-based boundary expansion;

Reducing the richness of the video through the salient target regions, and extracting key frames with a shot-adaptive method combined with online clustering.

A video key frame extraction system based on a visual attention model, comprising a salient region extraction module and a key frame extraction module.

Specifically, the salient region extraction module comprises:

a spatial salient region extraction module, configured to extract salient regions in the spatial domain;

a temporal key point saliency acquisition module, configured to extract the saliency values of key points in the temporal domain;

a fusion module, configured to fuse the salient regions in the spatial domain with the key points in the temporal domain and finally obtain the salient regions.

The key frame extraction module comprises:

a static shot key frame extraction module, configured to extract key frames from static shots;

a dynamic shot key frame extraction module, configured to extract key frames from dynamic shots;

a shot adaptation module, configured to control switching between the static shot key frame extraction module and the dynamic shot key frame extraction module.

Compared with the prior art, the beneficial effect of the present invention is that it can automatically and effectively summarize a video and obtain key frames that are highly representative of the video shots.

Brief Description of the Drawings

Fig. 1 is a flowchart of key frame extraction for static shots according to the present invention.

Fig. 2 is a flowchart of key frame extraction for dynamic shots according to the present invention.

Fig. 3 is a flowchart of shot-adaptive key frame extraction according to the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings.

A specific embodiment of the video key frame extraction method based on a visual attention model disclosed by the present invention is as follows:

First, in the spatial domain, saliency detection is performed by filtering global contrast with binomial coefficients, and the target region is extracted with an adaptive threshold, as follows:

(11) The binomial coefficients are constructed according to Pascal's (Yang Hui's) triangle, and the normalization factor of the N-th layer is 2^N. The fourth layer is selected, so the filter coefficients are B_4 = (1/16)[1 4 6 4 1];

(12) Let I be the original stimulus intensity, \bar{I} the mean of the surrounding stimulus intensities, and I_{B_4} the convolution of I with B_4. Each pixel is represented as a vector in the CIELAB color space to measure stimulus strength, and the contrast between stimuli is the Euclidean distance between two CIELAB vectors, so the stimulus (saliency) detection at pixel (x, y) is

S(x, y) = \| I_{B_4}(x, y) - \bar{I} \|    (1)

(13) After the saliency measurement set S_s = (s_{11}, s_{12}, ..., s_{NM}) is obtained, the target region is extracted with an adaptive threshold, where s_{ij} (0 ≤ i ≤ N, 0 ≤ j ≤ M) is the saliency of pixel (i, j), and M and N are the width and height of the image, respectively.
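
As an illustration, the following minimal sketch shows the spatial saliency of steps (11)–(13) in Python with OpenCV. It assumes the surrounding stimulus mean \bar{I} is approximated by the global mean CIELAB vector of the image; the function name and parameters are illustrative and not part of the patent.

```python
import cv2
import numpy as np

def spatial_saliency(bgr):
    """Minimal sketch of the binomial-filter global-contrast saliency of Eq. (1).

    The image is smoothed with the separable fourth-layer binomial kernel
    B4 = (1/16)[1 4 6 4 1], converted to CIELAB, and each pixel's saliency is
    the Euclidean distance between its filtered Lab vector and the mean Lab
    vector of the image (used here as the surrounding stimulus mean).
    """
    b4 = np.array([1, 4, 6, 4, 1], dtype=np.float32) / 16.0
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    smoothed = cv2.sepFilter2D(lab, -1, b4, b4)         # I * B4, separable filtering
    mean_lab = lab.reshape(-1, 3).mean(axis=0)          # global mean stimulus (assumption)
    sal = np.linalg.norm(smoothed - mean_lab, axis=2)   # Eq. (1): ||I_B4(x, y) - mean||
    return cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)
```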

Specifically, the extraction of the target region with the adaptive threshold is implemented as follows:

(21) Define the global saliency detection of pixel (x, y) as

S_g(x, y) = \frac{1}{A} \sum_{i=0}^{N} \sum_{j=0}^{M} \| I_{B_4}(x, y) - I(i, j) \|    (2)

where A is the detection area, I_{B_4}(x, y) is the stimulus intensity of pixel (x, y) after the original image is filtered by B_4, I(i, j) is the original stimulus intensity of pixel (i, j), and M and N are the width and height of the image, respectively;

(22) The computation is accelerated with a histogram: the original stimulus intensity I is mapped into the stimulus space I_{B_4}(I), and the saliency of the stimulus perceived by the user is

S(I_{B_4}(I)) = \frac{1}{(m-1) D(I_{B_4}(I))} \sum_{i=1}^{m} \left( D(I_{B_4}(I)) - \| I_{B_4}(I) - I_{B_4}(I_i) \| \right) S_g(I_{B_4}(I))    (3)

where D(I_{B_4}(I)) is the distance between the stimulus I_{B_4}(I) and its m nearest stimuli, and m is a manually set control parameter, taken as m = 8 in this embodiment;

(23) The foreground and background regions are specified by varying the threshold T_s, and the threshold that yields the minimum energy is taken as the optimal threshold. The energy function with threshold T_s is defined as

E(I, T_s, \lambda, \sigma) = \lambda \sum_{n=1}^{N} ( f(T_s, S_n) S_n ) + V(I, T_s, \sigma)    (4)

where S_n is obtained from formula (2), λ is the weight of the salient target energy (λ = 1.0 in this embodiment), N is the total number of pixels of the image, and f(T_s, S_n) = max(0, sign(S_n - T_s)). V(I, T_s, σ) measures the similarity to the surrounding stimuli and is computed over point pairs formed by each salient point under the current T_s and the pixels in its 8-neighborhood; dist(p, q) is the spatial distance between two points, and σ is a manually set control parameter, taken as σ = 10.0 in this embodiment.

Therefore, given an image and its saliency map, T_s is estimated by minimizing the energy function; a pixel is labeled 1 if it belongs to the salient target and 0 otherwise. The parameters λ and σ need to be set manually in advance.
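
A hedged sketch of how the optimal T_s could be found by brute-force search over candidate thresholds, given an energy of the form of Eq. (4). The pairwise term V(I, T_s, σ) is left to the caller as `energy_fn`, because its exact form is only partially specified above; all names are illustrative.

```python
import numpy as np

def estimate_threshold(sal, energy_fn, num_candidates=64):
    """Brute-force search for the threshold T_s minimising an energy of the
    form of Eq. (4).  `sal` is the saliency map; `energy_fn(ts, mask)` must
    return the scalar energy for candidate threshold `ts` and the foreground
    mask it induces (lambda, sigma and V(I, T_s, sigma) live inside it).
    """
    candidates = np.linspace(sal.min(), sal.max(), num_candidates)
    best_ts, best_e = candidates[0], np.inf
    for ts in candidates:
        mask = sal > ts                  # f(T_s, S_n) = max(0, sign(S_n - T_s))
        e = energy_fn(ts, mask)
        if e < best_e:
            best_e, best_ts = e, ts
    # Pixels above the optimal threshold are labelled 1 (salient), the rest 0.
    return best_ts, (sal > best_ts).astype(np.uint8)
```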

Then, in the temporal domain, the motion saliency is defined, the target motion is estimated through homography matrices, key points are used instead of the target for saliency detection, the spatial saliency data is then fused, and a bounding box is obtained as the temporal salient target region through an energy-function-based boundary expansion, as follows:

(31) Given an image, the key points of the image are obtained with the FAST (Features from Accelerated Segment Test) feature point detection algorithm, which has good real-time performance;

(32) Given two adjacent frames, FLANN (Fast Library for Approximate Nearest Neighbors) is used for fast matching of corresponding points;

(33) A homography matrix H is used to describe the motion of the key points. Since a single H describes only one form of motion and the motions present within a video are diverse, multiple H are needed to describe the different motions. In this embodiment, the RANSAC algorithm is applied iteratively to obtain a series of homography estimates H = {H_1, H_2, ..., H_n};

(34) The temporal saliency of a key point is defined as

S_t(p_m) = \frac{A_m}{W \times H} \sum_{i=1}^{n} A_i D(p_m, H_i)    (5)

where A_m is the distribution area of all key points in motion state H_m, and W and H are the width and height of the video image;

(35) The spatial saliency values are fused with the temporal saliency values of the acquired key points;

(36) A bounding box is obtained as the temporal salient target region with the energy-function-based boundary expansion method.
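
The following sketch illustrates steps (31)–(33) with OpenCV: FAST key points, FLANN matching between adjacent frames, and iterative RANSAC to collect several homographies. Using ORB descriptors on top of the FAST corners is an assumption made only so that FLANN has something to match; the patent itself names only FAST, FLANN, and RANSAC.

```python
import cv2
import numpy as np

def estimate_motions(prev_gray, cur_gray, max_models=5, min_inliers=12):
    """Collect several homographies H = {H1, ..., Hn}, one per motion, by
    repeatedly fitting RANSAC homographies to FAST/FLANN correspondences
    between two adjacent frames (steps (31)-(33))."""
    fast = cv2.FastFeatureDetector_create()
    orb = cv2.ORB_create()                              # assumed descriptor (see lead-in)
    kp1, des1 = orb.compute(prev_gray, fast.detect(prev_gray, None))
    kp2, des2 = orb.compute(cur_gray, fast.detect(cur_gray, None))
    if des1 is None or des2 is None:
        return []

    # FLANN matcher configured with LSH parameters for binary descriptors.
    flann = cv2.FlannBasedMatcher(
        dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1), {})
    matches = flann.match(des1, des2)
    if len(matches) < min_inliers:
        return []

    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])

    homographies, remaining = [], np.ones(len(matches), dtype=bool)
    for _ in range(max_models):
        if remaining.sum() < min_inliers:
            break
        H, inliers = cv2.findHomography(src[remaining], dst[remaining], cv2.RANSAC, 3.0)
        if H is None or inliers is None or inliers.sum() < min_inliers:
            break
        homographies.append(H)
        idx = np.flatnonzero(remaining)                       # drop this motion's inliers,
        remaining[idx[inliers.ravel().astype(bool)]] = False  # then look for the next one
    return homographies
```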

Specifically, the fusion of the spatial saliency values with the temporal saliency values of the acquired key points is implemented as follows:

(41) Define a motion saliency contrast, where the temporal saliency value S_t of a key point is obtained from formula (5) and \bar{S}_t is the mean of the temporal saliency values of the key points;

(42) The motion saliency should target objects that remain strongly discriminable in the spatial domain, so the statistical range of the temporal saliency S_t must be restricted: let p_i be the i-th key point of S_t; then p_i should satisfy a constraint defined with respect to \bar{S}_s, the mean spatial saliency value;

(43) Define a temporal weight and a spatial weight, and add the temporal and spatial saliency values of the key points satisfying (42) according to these weights.
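
A small sketch of one possible reading of steps (41)–(43). The patent's weight formulas and the exact spatial constraint of step (42) are not recoverable from the text above, so the weights are plain parameters here and the filter keeps key points whose spatial saliency is at least the mean; both are assumptions.

```python
import numpy as np

def fuse_saliency(spatial_vals, temporal_vals, w_t=0.5, w_s=0.5):
    """Fuse per-key-point spatial and temporal saliency values.

    `spatial_vals[i]` and `temporal_vals[i]` belong to the same key point.
    Returns the boolean mask of retained key points and their fused values.
    """
    spatial_vals = np.asarray(spatial_vals, dtype=np.float64)
    temporal_vals = np.asarray(temporal_vals, dtype=np.float64)
    keep = spatial_vals >= spatial_vals.mean()     # assumed form of the step (42) filter
    fused = w_t * temporal_vals[keep] + w_s * spatial_vals[keep]
    return keep, fused
```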

Specifically, the extraction of the temporal salient target region is implemented as follows:

A salient key point p in the spatial domain is taken as the seed point, and the seed region is a rectangular bounding box B. Let b_i be the four edges of bounding box B, with i ∈ {1, 2, 3, 4} indexing top, bottom, left, and right. The boundary expansion algorithm is as follows:

Initialization: the top, bottom, left, and right vertices of bounding box B are all set to the position of key point p, and p is an interior point of B.

Step 1: For i = 1 onward in increasing order, compute the saliency energy E_outer(i) on the outer boundary of b_i and the saliency energy E_inner(i) on the inner boundary, the energy function being computed as in formula (4); then compute the weight w(i) that determines whether edge i may be expanded outward, where l_i is the length of the i-th edge of the current bounding box B.

Step 2: If w(i) ≥ ε, the i-th edge is expanded outward by one pixel unit. ε is the threshold for the expansion decision and must be set in advance; in this embodiment it is set to 0.8·T_s′, where T_s′ is the mean spatial saliency inside the bounding box.

Step 3: If no new edge was expanded in Step 2, stop the algorithm and output bounding box B; otherwise, repeat Step 1 and Step 2.
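
A hedged sketch of the boundary-expansion loop (Steps 1–3). Because the exact expansion weight w(i) is not recoverable above, the mean saliency of the one-pixel strip just outside each edge is used as the expansion score and compared against ε = 0.8 times the mean saliency inside the current box; the names and the iteration cap are illustrative.

```python
import numpy as np

def grow_bounding_box(sal, seed, eps_scale=0.8, max_iter=10000):
    """Grow a rectangular box around a salient seed point (Steps 1-3)."""
    h, w = sal.shape
    y, x = seed
    top, bottom, left, right = y, y, x, x                  # box initialised to the seed point

    for _ in range(max_iter):
        eps = eps_scale * sal[top:bottom + 1, left:right + 1].mean()
        grown = False
        # One-pixel strips just outside each of the four edges (None at the image border).
        strips = {
            "top":    sal[top - 1, left:right + 1]    if top > 0        else None,
            "bottom": sal[bottom + 1, left:right + 1] if bottom < h - 1 else None,
            "left":   sal[top:bottom + 1, left - 1]   if left > 0       else None,
            "right":  sal[top:bottom + 1, right + 1]  if right < w - 1  else None,
        }
        for side, strip in strips.items():
            if strip is not None and strip.mean() >= eps:   # Step 2: expand this edge
                if side == "top":    top -= 1
                if side == "bottom": bottom += 1
                if side == "left":   left -= 1
                if side == "right":  right += 1
                grown = True
        if not grown:                                       # Step 3: no edge expanded
            break
    return top, bottom, left, right
```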

Finally, the richness of the video is reduced through the salient target regions, and key frames are extracted with a shot-adaptive method combined with online clustering, as follows:

(51) The RGB color space of the salient region is converted to the HSV color space, and the H (hue) and S (saturation) components are used to compute a hue-saturation histogram. Let H_p(i) denote the i-th bin value of the hue-saturation histogram of the salient target region of frame p. This embodiment uses the Bhattacharyya distance to measure the visual distance D_sal(p, q) between two frames p and q.

(52) Key frame extraction uses a shot-adaptive method combined with online clustering, with the clustering scheme for static shots as the primary mode and the clustering scheme for dynamic shots as a supplement. For static shots, online clustering is performed based on the hue-saturation histogram of the salient region, and an arbitrary frame in each cluster is selected as a key frame. For dynamic shots, the salient moving target is first tracked; the tracking of the salient moving target then serves as the basis for online clustering, and the position information of the salient target serves as the basis for extracting key frames from the clusters.
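
For step (51), the hue-saturation histogram and the Bhattacharyya distance can be computed directly with OpenCV, as sketched below; the bin counts are assumptions, since the text does not fix them. cv2.compareHist with HISTCMP_BHATTACHARYYA normalises the histograms internally, so the raw calcHist output can be passed in directly.

```python
import cv2

def hs_histogram(bgr_region, h_bins=30, s_bins=32):
    """Hue-saturation histogram of the salient region, step (51).
    The bin counts are assumptions; the patent does not fix them."""
    hsv = cv2.cvtColor(bgr_region, cv2.COLOR_BGR2HSV)
    return cv2.calcHist([hsv], [0, 1], None, [h_bins, s_bins], [0, 180, 0, 256])

def visual_distance(hist_p, hist_q):
    """Bhattacharyya distance D_sal(p, q) between the histograms of frames p and q."""
    return cv2.compareHist(hist_p, hist_q, cv2.HISTCMP_BHATTACHARYYA)
```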

Specifically, as shown in Fig. 1, online clustering for static shots is implemented through the following steps:

Initialization: compute the hue-saturation histogram of the first frame of the static shot, set the initial number of cells to N = 1, and take this histogram as the vector of the centroid C_1 of cell Cell_1, C_1 = f_1.

S11: If the current frame p belongs to a static shot, compute the hue-saturation histogram H_p of the current frame.

S12: Compute the visual distance between p and the centroid of each cell, and find the cell Cell_m with the smallest visual distance, where m is the index of that cell.

S13: Compare D_sal(p, C_m) with the threshold ε_c. If D_sal(p, C_m) ≤ ε_c, assign p to cell Cell_m and then replace the centroid of Cell_m with H_p. Otherwise, add a new cell Cell_{N+1}, take H_p as the vector of its centroid C_{N+1}, and finally update the number of cells, N = N + 1.

S14: Repeat S11, S12, and S13 for all static shot frames.
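
A minimal sketch of the online clustering of steps S11–S14 for one static shot, reusing the `visual_distance` helper sketched above. Returning the first frame of each cell as its key frame is one admissible reading of "any frame in the cluster".

```python
def cluster_static_shot(frame_hists, eps_c):
    """Online clustering of a static shot's frames (steps S11-S14).

    `frame_hists` is the list of hue-saturation histograms of the shot's
    frames; `eps_c` is the clustering threshold epsilon_c.  Each cell keeps
    the histogram of its most recent member as its centroid, as in S13.
    """
    centroids = [frame_hists[0]]              # C1 = histogram of the first frame
    members = [[0]]                           # frame indices belonging to each cell
    for idx, hist in enumerate(frame_hists[1:], start=1):
        dists = [visual_distance(hist, c) for c in centroids]
        m = min(range(len(dists)), key=dists.__getitem__)
        if dists[m] <= eps_c:
            members[m].append(idx)
            centroids[m] = hist               # replace the centroid of Cell_m with H_p
        else:
            centroids.append(hist)            # open a new cell Cell_{N+1}
            members.append([idx])
    # Any frame of a cell may serve as the key frame; take the first one here.
    return [cell[0] for cell in members]
```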

Specifically, as shown in Fig. 2, key frame extraction for dynamic shots is implemented through the following steps:

Initialization: obtain the first frame of the dynamic shot.

S21: Obtain the tracked target region, initialize the particles or resample, extract the next frame of the video, and check whether the frame is empty; if it is empty, terminate.

S22: Obtain the FAST feature vectors, match them with the FLANN algorithm, and update the feature vector weights; if there are not enough feature vectors, terminate.

S23: Update the weight of each particle, compute the key frame weights and the target region, and jump back to S21.

The key frame extraction system based on a visual attention model disclosed by the present invention includes a salient region extraction module and a key frame extraction module.

The salient region extraction module includes:

a spatial salient region extraction module, configured to extract salient regions in the spatial domain;

a temporal key point saliency acquisition module, configured to extract the saliency values of key points in the temporal domain;

a fusion module, configured to fuse the salient regions in the spatial domain with the key points in the temporal domain and finally obtain the salient regions.

The key frame extraction module includes:

a static shot key frame extraction module, configured to extract key frames from static shots;

a dynamic shot key frame extraction module, configured to extract key frames from dynamic shots;

a shot adaptation module, configured to control switching between the static shot key frame extraction module and the dynamic shot key frame extraction module.

The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. It should be understood that the present invention is not limited to the implementations described here; they are described to help those skilled in the art practice the invention. Those skilled in the art can easily make further improvements and refinements without departing from the spirit and scope of the present invention, and therefore any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (8)

1. A key frame extraction method based on a visual attention model, for extracting key frames from a video, characterized by comprising:

in the spatial domain, performing saliency detection by filtering global contrast with binomial coefficients, and extracting the target region with an adaptive threshold;

in the temporal domain, defining a motion saliency, estimating the target motion through homography matrices, detecting saliency on key points instead of the target, fusing the spatial saliency data, and obtaining a bounding box as the temporal salient target region through an energy-function-based boundary expansion;

reducing the richness of the video through the salient target regions, and extracting key frames with a shot-adaptive method combined with online clustering;

wherein, in the spatial domain, the saliency detection by filtering global contrast with binomial coefficients and the extraction of the target region with the adaptive threshold are performed as follows:

(11) the binomial coefficients are constructed according to Pascal's (Yang Hui's) triangle, and the normalization factor of the N-th layer is 2^N; the fourth layer is selected, so the filter coefficients are B_4 = (1/16)[1 4 6 4 1];

(12) let I be the original stimulus intensity, \bar{I} the mean of the surrounding stimulus intensities, and I_{B_4} the convolution of I with B_4; each pixel is represented as a vector in the CIELAB color space to measure stimulus strength, and the contrast between stimuli is the Euclidean distance between two CIELAB vectors, so the stimulus detection at pixel (x, y) is

S(x, y) = \| I_{B_4}(x, y) - \bar{I} \|    (1)

(13) after the saliency measurement set S_s = (s_{11}, s_{12}, ..., s_{NM}) is obtained, the target region is extracted with an adaptive threshold, where s_{ij} is the saliency of pixel (i, j), 0 ≤ i ≤ N, 0 ≤ j ≤ M, and M and N are the width and height of the image, respectively; the extraction of the target region with the adaptive threshold is implemented as follows:

(21) define the global saliency detection of pixel (x, y) as

S_g(x, y) = \frac{1}{A} \sum_{i=0}^{N} \sum_{j=0}^{M} \| I_{B_4}(x, y) - I(i, j) \|    (2)

where A is the detection area, I_{B_4}(x, y) is the stimulus intensity of pixel (x, y) after the original image is filtered by B_4, I(i, j) is the original stimulus intensity of pixel (i, j), and M and N are the width and height of the image, respectively;

(22) the computation is accelerated with a histogram, the original stimulus intensity I is mapped into the stimulus space I_{B_4}(I), and the saliency of the stimulus perceived by the user is

S(I_{B_4}(I)) = \frac{1}{(m-1) D(I_{B_4}(I))} \sum_{i=1}^{m} \left( D(I_{B_4}(I)) - \| I_{B_4}(I) - I_{B_4}(I_i) \| \right) S_g(I_{B_4}(I))    (3)

where D(I_{B_4}(I)) is the distance between the stimulus I_{B_4}(I) and its m nearest stimuli;

(23) the foreground and background regions are specified by varying the threshold T_s, and the threshold yielding the minimum energy is taken as the optimal threshold; the energy function with threshold T_s is defined as

E(I, T_s, \lambda, \sigma) = \lambda \sum_{n=1}^{N} ( f(T_s, S_n) S_n ) + V(I, T_s, \sigma)    (4)

where S_n is obtained from formula (2), λ is the weight of the salient target energy, N is the total number of pixels of the image, f(T_s, S_n) = max(0, sign(S_n - T_s)), V(I, T_s, σ) measures the similarity to the surrounding stimuli and is computed over point pairs formed by each salient point under the current T_s and the pixels in its 8-neighborhood, dist(p, q) is the spatial distance between two points, and σ is a control parameter.

2. The method according to claim 1, characterized in that, in the temporal domain, the motion saliency is defined, the target motion is estimated through homography matrices, key points are used instead of the target for saliency detection, the spatial saliency data is then fused, and a bounding box is obtained as the temporal salient target region through an energy-function-based boundary expansion, as follows:

(31) given an image, the key points of the image are obtained with the FAST feature point detection algorithm, which has good real-time performance;

(32) given two adjacent frames, FLANN is used for fast matching of corresponding points;

(33) multiple homography matrices H are used to describe the motion of the key points; the RANSAC algorithm is applied iteratively to obtain a series of homography estimates H = {H_1, H_2, ..., H_n};

(34) the temporal saliency of a key point is defined as

S_t(p_m) = \frac{A_m}{W \times H} \sum_{i=1}^{n} A_i D(p_m, H_i)    (5)

where A_m is the distribution area of all key points in motion state H_m, and W and H are the width and height of the video image;

(35) a bounding box is obtained as the temporal salient target region with the energy-function-based boundary expansion method.

3. The method according to claim 2, characterized in that the fusion of the spatial saliency values with the temporal saliency values of the acquired key points is implemented as follows:

(41) define a motion saliency contrast, where the temporal saliency value S_t of a key point is obtained from formula (5) and \bar{S}_t is the mean of the temporal saliency values of the key points;

(42) let p_i be the i-th key point of S_t; then p_i should satisfy a constraint defined with respect to \bar{S}_s, the mean spatial saliency value;

(43) define a temporal weight and a spatial weight, and add the temporal and spatial saliency values of the key points satisfying step (42) according to these weights.

4. The method according to claim 2, characterized in that the extraction of the temporal salient target region is implemented as follows:

a salient key point p in the spatial domain is taken as the seed point, and the seed region is a rectangular bounding box B; let b_i be the four edges of bounding box B, with i ∈ {1, 2, 3, 4} indexing top, bottom, left, and right; the boundary expansion algorithm is as follows:

initialization: the top, bottom, left, and right vertices of bounding box B are all set to the position of key point p, and p is an interior point of B;

step 1: for i = 1 onward in increasing order, compute the saliency energy E_outer(i) on the outer boundary of b_i and the saliency energy E_inner(i) on the inner boundary, the energy function being computed as in formula (4); then compute the expansion weight w(i) of the boundary, where l_i is the length of the i-th edge of the current bounding box B;

step 2: if w(i) ≥ ε, the i-th edge is expanded outward by one pixel unit; ε is the preset threshold for the expansion decision, set to 0.8·T_s′, where T_s′ is the mean spatial saliency inside the bounding box;

step 3: if no new edge was expanded in step 2, stop the algorithm and output bounding box B; otherwise, repeat step 1 and step 2.

5. The method according to claim 1, characterized in that the richness of the video is reduced through the salient target regions and key frames are extracted with a shot-adaptive method combined with online clustering, as follows:

(51) the RGB color space of the salient region is converted to the HSV color space, the H and S components are used to compute a hue-saturation histogram, H_p(i) denotes the i-th bin value of the hue-saturation histogram of the salient target region of frame p, and the Bhattacharyya distance is used to measure the visual distance D_sal(p, q) between frames p and q;

(52) key frame extraction uses a shot-adaptive method combined with online clustering, with the clustering scheme for static shots as the primary mode and the clustering scheme for dynamic shots as a supplement;

for static shots, online clustering is performed based on the hue-saturation histogram of the salient region, and an arbitrary frame in each cluster is selected as a key frame;

for dynamic shots, the salient moving target is first tracked; the tracking of the salient moving target then serves as the basis for online clustering, and the position information of the salient target serves as the basis for extracting key frames from the clusters.

6. The method according to claim 5, characterized in that online clustering for static shots is implemented through the following steps:

initialization: compute the hue-saturation histogram of the first frame of the static shot, set the initial number of cells to N = 1, and take this histogram as the vector of the centroid C_1 of cell Cell_1, C_1 = f_1;

S11: if the current frame p belongs to a static shot, compute the hue-saturation histogram H_p of the current frame;

S12: compute the visual distance between p and the centroid of each cell, and find the cell Cell_m with the smallest visual distance, where m is the index of that cell;

S13: compare D_sal(p, C_m) with the threshold ε_c; if D_sal(p, C_m) ≤ ε_c, assign p to cell Cell_m and replace the centroid of Cell_m with H_p; otherwise, add a new cell Cell_{N+1}, take H_p as the vector of its centroid C_{N+1}, and finally update the number of cells, N = N + 1;

S14: repeat S11, S12, and S13 for all static shot frames.

7. The method according to claim 5, characterized in that key frame extraction for dynamic shots is implemented through the following steps:

initialization: obtain the first frame of the dynamic shot;

S21: obtain the tracked target region, initialize the particles or resample, extract the next frame of the video, and check whether the frame is empty; if it is empty, terminate;

S22: obtain the FAST feature vectors, match them with the FLANN algorithm, and update the feature vector weights; if there are not enough feature vectors, terminate;

S23: update the weight of each particle, compute the key frame weights and the target region, and jump back to S21.

8. A system applying the key frame extraction method based on a visual attention model according to any one of claims 1 to 7, characterized by comprising a salient region extraction module and a key frame extraction module;

the salient region extraction module comprises:

a spatial salient region extraction module, configured to extract salient regions in the spatial domain;

a temporal key point saliency acquisition module, configured to extract the saliency values of key points in the temporal domain;

a fusion module, configured to fuse the salient regions in the spatial domain with the key points in the temporal domain and finally obtain the salient regions;

the key frame extraction module comprises:

a static shot key frame extraction module, configured to extract key frames from static shots;

a dynamic shot key frame extraction module, configured to extract key frames from dynamic shots;

a shot adaptation module, configured to control switching between the static shot key frame extraction module and the dynamic shot key frame extraction module.
CN201410039072.7A 2014-01-26 2014-01-26 Key frame extraction method based on visual attention model and system Expired - Fee Related CN103824284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410039072.7A CN103824284B (en) 2014-01-26 2014-01-26 Key frame extraction method based on visual attention model and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410039072.7A CN103824284B (en) 2014-01-26 2014-01-26 Key frame extraction method based on visual attention model and system

Publications (2)

Publication Number Publication Date
CN103824284A CN103824284A (en) 2014-05-28
CN103824284B true CN103824284B (en) 2017-05-10

Family

ID=50759326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410039072.7A Expired - Fee Related CN103824284B (en) 2014-01-26 2014-01-26 Key frame extraction method based on visual attention model and system

Country Status (1)

Country Link
CN (1) CN103824284B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598908B (en) * 2014-09-26 2017-11-28 浙江理工大学 A kind of crops leaf diseases recognition methods
CN104778721B (en) * 2015-05-08 2017-08-11 广州小鹏汽车科技有限公司 The distance measurement method of conspicuousness target in a kind of binocular image
CN105472380A (en) * 2015-11-19 2016-04-06 国家新闻出版广电总局广播科学研究院 Compression domain significance detection algorithm based on ant colony algorithm
CN106210444B (en) * 2016-07-04 2018-10-30 石家庄铁道大学 Motion state self adaptation key frame extracting method
CN107967476B (en) * 2017-12-05 2021-09-10 北京工业大学 Method for converting image into sound
CN110197107B (en) * 2018-08-17 2024-05-28 平安科技(深圳)有限公司 Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium
CN110322474B (en) * 2019-07-11 2021-06-01 史彩成 Image moving target real-time detection method based on unmanned aerial vehicle platform
CN110399847B (en) * 2019-07-30 2021-11-09 北京字节跳动网络技术有限公司 Key frame extraction method and device and electronic equipment
CN111191650B (en) * 2019-12-30 2023-07-21 北京市新技术应用研究所 Article positioning method and system based on RGB-D image visual saliency
CN111493935B (en) * 2020-04-29 2021-01-15 中国人民解放军总医院 Method and system for automatic prediction and recognition of echocardiography based on artificial intelligence
CN112418012B (en) * 2020-11-09 2022-06-07 武汉大学 A video summary generation method based on spatiotemporal attention model
CN114399729B (en) * 2021-12-20 2025-03-25 山东鲁软数字科技有限公司 Monitoring object movement identification method, system, terminal and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7263660B2 (en) * 2002-03-29 2007-08-28 Microsoft Corporation System and method for producing a video skim

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2207111A1 (en) * 2009-01-08 2010-07-14 Thomson Licensing SA Method and apparatus for generating and displaying a video abstract
CN102088597A (en) * 2009-12-04 2011-06-08 成都信息工程学院 Method for estimating video visual salience through dynamic and static combination
CN102695056A (en) * 2012-05-23 2012-09-26 中山大学 Method for extracting compressed video key frames

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Efficient visual attention based framework for extracting key frames from videos; Naveed Ejaz et al.; Signal Processing: Image Communication; 2012-10-17; pp. 34-44 *
Visual attention detection in video sequences using spatiotemporal cues; Yun Zhai et al.; Proceedings of the 14th ACM International Conference on Multimedia; 2006-10-31; pp. 816-821, sections 1.2-4 *
Adaptive video key frame extraction based on a visual attention model (基于视觉注意模型的自适应视频关键帧提取); Jiang Peng et al.; Journal of Image and Graphics (中国图象图形学报); 2009-08-31; vol. 14, no. 8; pp. 1651-1653, sections 2-4 *

Also Published As

Publication number Publication date
CN103824284A (en) 2014-05-28

Similar Documents

Publication Publication Date Title
CN103824284B (en) Key frame extraction method based on visual attention model and system
CN103258193B (en) A kind of group abnormality Activity recognition method based on KOD energy feature
CN110059581A (en) People counting method based on depth information of scene
CN102256065B (en) Automatic video condensing method based on video monitoring network
US9626585B2 (en) Composition modeling for photo retrieval through geometric image segmentation
CN103853724B (en) multimedia data classification method and device
CN103530638B (en) Method for pedestrian matching under multi-cam
CN106845621A (en) Dense population number method of counting and system based on depth convolutional neural networks
CN104599275A (en) Understanding method of non-parametric RGB-D scene based on probabilistic graphical model
CN105701467A (en) Many-people abnormal behavior identification method based on human body shape characteristic
CN103577875A (en) CAD (computer-aided design) people counting method based on FAST (features from accelerated segment test)
CN102034267A (en) Three-dimensional reconstruction method of target based on attention
Xu et al. Weakly supervised deep semantic segmentation using CNN and ELM with semantic candidate regions
Yang et al. The large-scale crowd density estimation based on sparse spatiotemporal local binary pattern
CN103309982A (en) Remote sensing image retrieval method based on vision saliency point characteristics
CN103400155A (en) Pornographic video detection method based on semi-supervised learning of images
CN108961385B (en) SLAM composition method and device
CN106815576A (en) Target tracking method based on consecutive hours sky confidence map and semi-supervised extreme learning machine
CN109034258A (en) Weakly supervised object detection method based on certain objects pixel gradient figure
Ma et al. Scene invariant crowd counting using multi‐scales head detection in video surveillance
CN104200235A (en) Time-space local feature extraction method based on linear dynamic system
CN103500456B (en) A kind of method for tracing object based on dynamic Bayesian network network and equipment
CN105809673A (en) SURF (Speeded-Up Robust Features) algorithm and maximal similarity region merging based video foreground segmentation method
Roy et al. A comprehensive survey on computer vision based approaches for moving object detection
Sun et al. Unsupervised fast anomaly detection in crowds

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170510