
CN103399893B - Method for retrieving objects on basis of hierarchical perception - Google Patents

Method for retrieving objects on basis of hierarchical perception

Info

Publication number
CN103399893B
CN103399893B (application CN201310311320.4A, also published as CN201310311320A)
Authority
CN
China
Prior art keywords
target
retrieved
particle
matching
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310311320.4A
Other languages
Chinese (zh)
Other versions
CN103399893A (en)
Inventor
陈宗海
项俊平
赵宇宙
郭明玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN201310311320.4A priority Critical patent/CN103399893B/en
Publication of CN103399893A publication Critical patent/CN103399893A/en
Application granted granted Critical
Publication of CN103399893B publication Critical patent/CN103399893B/en

Landscapes

  • Image Analysis (AREA)

Abstract

本发明涉及一种基于分层感知的目标检索方法,首先,在待检索视频的单帧图像中匹配被检索目标:通过布设“搜索粒子群”,寻找与待检索目标相似度度量高于一定值的粒子,这些粒子所在区域就是目标的大致位置,实现了低分辨率的目标感知;通过布设“聚焦粒子群”,进行多尺度、多邻域的粒子相似度匹配,得到目标的精确匹配位置。然后,综合被检索视频中多帧图像的匹配结果,确定待检索目标是否出现在被检索视频中。本发明可实现在视频片段或单帧图像中定位被检索目标,可以有效消除因视角、形变、尺度、色彩分布等造成的目标定位不准,可以在不进行视频目标前景/背景检测的前提下,从视频的第一帧开始进行目标检索和定位。

The invention relates to a target retrieval method based on hierarchical perception. First, the target to be retrieved is matched in a single frame of the searched video: a "search particle swarm" is laid out to find particles whose similarity measure with the target exceeds a threshold; the regions covered by these particles give the approximate location of the target, realizing low-resolution target perception. A "focusing particle swarm" is then laid out to perform multi-scale, multi-neighborhood particle similarity matching, yielding the precise matching position of the target. Finally, the matching results over multiple frames of the searched video are combined to decide whether the target to be retrieved appears in the video. The invention can locate the retrieved target in a video clip or a single frame, effectively compensates for localization errors caused by differences in viewpoint, deformation, scale, and color distribution, and can perform target retrieval and localization from the first frame of a video without any foreground/background detection.

Description

一种基于分层感知的目标检索方法An Object Retrieval Method Based on Hierarchical Perception

技术领域technical field

本发明涉及图像模式识别领域,尤其涉及一种基于分层感知的目标检索方法。The invention relates to the field of image pattern recognition, in particular to an object retrieval method based on layered perception.

背景技术Background technique

基于内容的目标检索技术是指通过计算机对图像(或视频)的内容进行分析,自动地提取出图像(或视频)中包括颜色、纹理、空间等各种特征信息,经过相似性判别和推理,最后检索出与其相类似的目标。随着图像和视频信息量的不断增加,这种基于内容的目标检索技术的研究具有重大的理论价值和广阔的应用前景,其关键技术在于对目标特征信息的提取和匹配。Content-based target retrieval technology refers to analyzing the content of images (or videos) by computer, and automatically extracting various feature information in images (or videos), including color, texture, space, etc., after similarity discrimination and reasoning, Finally, similar targets are retrieved. As the amount of image and video information continues to increase, the research on this content-based target retrieval technology has great theoretical value and broad application prospects. The key technology lies in the extraction and matching of target feature information.

目标的特征匹配就是目标间的特征值进行相似性度量,目前特征匹配大致分为两种:完全匹配和相似性匹配。所谓完全匹配是指目标特征值的精确匹配,两者完全相同;相似性匹配是指当目标特征值满足某些条件时,就认为两个目标相同。因为技术手段的差异性,相似目标间的特征值不可能是完全一样的,因此,相似性匹配是目前目标检索的主要方法之一,如何在匹配过程中消除不同目标的视角、形变、尺度、色彩分布等方面的差异性是需要重点考虑的问题,也是各种目标检索方法效果存在差别的主因之一。Feature matching measures the similarity of feature values between targets. At present, feature matching falls roughly into two kinds: exact matching and similarity matching. Exact matching requires the feature values of the two targets to be identical; similarity matching considers two targets the same when their feature values satisfy certain conditions. Owing to differences in acquisition conditions, the feature values of similar targets can never be exactly identical, so similarity matching is one of the main methods of target retrieval today. How to eliminate, during matching, the differences between targets in viewpoint, deformation, scale, and color distribution is a key issue, and is one of the main reasons why different target retrieval methods differ in effectiveness.

一个好的目标检索方法应该符合人类视觉系统的感知和判断。人类视觉系统是分层次感知目标的,首先是低分辨率的目标搜索,其次是高分辨率的目标定位,实现对目标的深层次理解和感知。A good object retrieval method should conform to the perception and judgment of the human visual system. The human visual system perceives targets in layers, first is low-resolution target search, and second is high-resolution target positioning to achieve deep understanding and perception of targets.

发明内容Contents of the invention

本发明技术解决问题:克服现有技术的不足,提供一种基于分层感知的目标检索方法,通过模拟人类视觉系统感知和推理机制,达到提高目标定位和检索精度的目的。The technical solution of the present invention is to overcome the deficiencies of the prior art, and provide a target retrieval method based on layered perception, and achieve the purpose of improving target positioning and retrieval accuracy by simulating the perception and reasoning mechanism of the human visual system.

本发明技术解决方案:一种基于分层感知的目标检索方法,本发明所述的目标检索是指给定待检索目标图像后,在被检索视频中寻找待检索目标,计算出目标在视频图像中的精确位置。为实现上述目的,本发明采用分层感知策略,将目标搜索过程当作灰色系统的白化过程,并给出匹配度的可视化定性描述。实现步骤如下:The technical solution of the present invention is a target retrieval method based on hierarchical perception. Target retrieval here means that, given an image of the target to be retrieved, the target is sought in the searched video and its exact position in the video frames is computed. To this end, the invention adopts a hierarchical perception strategy, treats the target search process as the whitening process of a grey system, and gives a visual qualitative description of the matching degree. The implementation steps are as follows:

步骤(1)将待检索目标归一化后用直方图分布进行描述;Step (1) After normalizing the target to be retrieved, describe it with a histogram distribution;

步骤(2)构造被检索视频单帧图像的空间闭包椭圆镞区域,定义为搜索粒子群,用直方图分布描述粒子,寻找与待检索目标相似度度量高于一定值的粒子,这些粒子所在区域就是目标可能处于的大致位置,实现了低分辨率的目标感知;Step (2) Construct the spatial closure ellipse region of the retrieved video single frame image, which is defined as the search particle group, use the histogram distribution to describe the particles, and find the particles whose similarity measure with the target to be retrieved is higher than a certain value. The area is the approximate location where the target may be, and realizes low-resolution target perception;

步骤(3)通过在上述目标的大致位置附近随机生成2W个多尺度、多邻域椭圆镞区域,定义为聚焦粒子群,用直方图分布描述粒子,进行多尺度、多邻域的粒子相似度匹配,得到目标在被检索视频的单帧图像中的精确匹配位置,实现高分辨率的目标定位;Step (3): randomly generate 2W multi-scale, multi-neighborhood elliptical regions near the approximate target position found above; these are defined as the focusing particle swarm. The particles are described by histogram distributions, and multi-scale, multi-neighborhood particle similarity matching yields the precise matching position of the target in the single frame of the searched video, realizing high-resolution target localization;

步骤(4)综合被检索视频中多帧图像的匹配结果,确定待检索目标是否出现在被检索视频中,并给出匹配度的可视化灰色定性描述图。Step (4) Synthesize the matching results of multiple frames of images in the retrieved video, determine whether the target to be retrieved appears in the retrieved video, and provide a visual gray qualitative description map of the matching degree.

所述步骤(1)的具体实现为:The concrete realization of described step (1) is:

步骤(1.1)将待检测目标按比例缩小到宽80像素、高152像素,并将其看做粒子s'=(x',y',Hx',Hy')=(40,76,40,76),所述粒子定义如下:粒子s=(x,y,Hx,Hy)是目标可能覆盖范围的一个椭圆描述子,其中(x,y)是椭圆Os的中心坐标,(Hx,Hy)分别是椭圆的长轴半长和短轴半长。Step (1.1) scale down the target to be detected to a width of 80 pixels and a height of 152 pixels, and regard it as a particle s'=(x',y',H x ',H y ')=(40,76, 40,76), the particle is defined as follows: particle s=(x, y, H x , H y ) is an ellipse descriptor of the possible coverage of the target, where (x, y) is the center coordinate of the ellipse O s , (H x , H y ) are the semi-major and semi-minor lengths of the ellipse, respectively.

步骤(1.2)将粒子s'用直方图刻画,所述粒子直方图分布定义如下:本发明中采用HSV空间中的大小为8×8×4的直方图来描述粒子s所在椭圆区域Os的图像特征,直方图由函数h(xi)=j,xi∈Os,j∈{1,2,...,256}确定,Os中像素的色彩分布py={py (v)}v=1,...,m由下式确定:Step (1.2) characterizes the particle s' with a histogram, and the distribution of the particle histogram is defined as follows: In the present invention, the size of the 8 × 8 × 4 histogram in the HSV space is used to describe the particle s in the elliptical region O s Image features, the histogram is determined by the function h( xi )=j, xi ∈O s , j∈{1,2,...,256}, the color distribution of pixels in O s p y ={p y ( v) } v=1,...,m is determined by the following formula:

p_y^{(v)} = f \sum_{i=1}^{I} k\left( \frac{\| y - x_i \|}{a} \right) \delta\left[ h(x_i) - v \right], \quad k(r) = \begin{cases} 1 - r^2, & r < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (1)

其中,I是Os中像素的个数,δ是Kronecker delta函数,a为尺度参数,归一化因子f确保直方图各分量之和为1,m=8×8×4=256。where I is the number of pixels in O s , δ is the Kronecker delta function, a is a scale parameter, and the normalization factor f ensures that the components of the histogram sum to 1; the number of bins is m=8×8×4=256.
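As an illustration (not part of the patent), Eq. (1) can be sketched in Python with NumPy. The function name, the interpretation of `a` as a kernel bandwidth, and the assumption that `f` normalises the histogram to sum to 1 are assumptions for this sketch; HSV values are assumed pre-scaled to [0, 1).

```python
import numpy as np

def particle_histogram(hsv_pixels, coords, center, bandwidth, bins=(8, 8, 4)):
    """Sketch of Eq. (1): a kernel-weighted HSV histogram of an ellipse region.

    hsv_pixels: (I, 3) array of H, S, V values scaled to [0, 1).
    coords:     (I, 2) pixel coordinates x_i inside the ellipse O_s.
    center:     ellipse centre y; bandwidth: the scale parameter a.
    """
    # Bin index h(x_i) in {0, ..., 255} for an 8x8x4 histogram (m = 256).
    idx = (np.clip((hsv_pixels * bins).astype(int), 0, np.array(bins) - 1)
           * np.array([bins[1] * bins[2], bins[2], 1])).sum(axis=1)
    # Epanechnikov-style kernel k(r) = 1 - r^2 for r < 1, else 0.
    r = np.linalg.norm(coords - center, axis=1) / bandwidth
    w = np.where(r < 1.0, 1.0 - r ** 2, 0.0)
    hist = np.bincount(idx, weights=w, minlength=int(np.prod(bins)))
    total = hist.sum()
    return hist / total if total > 0 else hist  # f normalises the sum to 1
```

The kernel down-weights pixels far from the ellipse centre, so the histogram emphasises the object core over its boundary.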

所述步骤(2)具体实现步骤如下:Described step (2) concrete realization steps are as follows:

步骤(2.1)布设搜索粒子群,所述搜索粒子群布设如下:在第i帧图像Pi中均匀选择N1个点作为“基本粒子群”Si 1={sij 1,j=1,...,N1}的中心,粒子长轴半长Hx均相等、短轴半长Hy均相等,Hx和Hy取满足如下两个条件的最小值:(a)粒子所在椭圆镞为Pi的闭包,即满足(b)粒子两两重合度不小于10%,即再在Pi中设置粒子群Si 1的翻转粒子群Si 2={sij 2,j=1,...,N1},使得sij 2与sij 1中心重合,sij 2的长轴半长Hx'等于sij 1的短轴半长Hy、sij 2的短轴半长Hy'等于sij 1的长轴半长Hx,Si 1∪Si 2组成了搜索粒子群;Step (2.1) lay out the search particle swarm, and the search particle swarm is arranged as follows: uniformly select N 1 points in the image P i of the i-th frame as the "basic particle swarm" S i 1 ={s ij 1 ,j=1, ..., N 1 }, the particle major axis semi-length H x is equal, and the minor axis semi-length H y is equal, and H x and H y take the minimum value that satisfies the following two conditions: (a) the ellipse where the particle is located arrowhead is the closure of P i , which satisfies (b) The coincidence degree of particles in pairs is not less than 10%, that is Then set the inverted particle group S i 2 of the particle group S i 1 in P i ={s ij 2 ,j=1,...,N 1 }, so that the centers of s ij 2 and s ij 1 coincide, and s ij 2 The major axis semi-length H x ' of s ij 1 is equal to the minor axis semi-length H y , and the minor axis semi-length H y ' of s ij 2 is equal to the major axis semi-length H x of s ij 1 , S i 1 ∪ S i 2 Formed a search particle swarm;

步骤(2.2)在搜索粒子群中,通过计算两个粒子分布之间的相似性度量,寻找与待检索目标相似度度量高于某个设定值的粒子,两个粒子分布p={p(v)}v=1,...,m和q={q(v)}v=1,...,m之间的相似性度量为如下的Bhattacharyya系数:Step (2.2) In the search particle swarm, by calculating the similarity measure between the two particle distributions, look for particles whose similarity measure with the target to be retrieved is higher than a certain set value, the two particle distributions p={p ( v) } v=1,...,m and q={q (v) } v=1,...,m The similarity measure is the Bhattacharyya coefficient as follows:

\rho[p, q] = \sum_{v=1}^{m} \sqrt{p^{(v)} q^{(v)}} \qquad (2)

步骤(2.3)记录下相似度高于设定值的粒子信息,这些粒子所在区域就是目标可能处于的大致位置,这些区域的并集就是步骤(3)需考虑的搜索范围,其它区域不再进行搜索。Step (2.3): record the particles whose similarity exceeds the set value. The regions covered by these particles are the approximate locations where the target may be; the union of these regions is the search range considered in step (3), and other regions are not searched further.
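The similarity measure of Eq. (2) is a one-liner; this sketch (the function name is our own) assumes both inputs are normalised distributions of equal length:

```python
import numpy as np

def bhattacharyya(p, q):
    """Eq. (2): Bhattacharyya coefficient of two normalised histograms."""
    return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))
```

For identical distributions the coefficient is 1, and for distributions with disjoint support it is 0, which is why thresholding it separates candidate particles from the background.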

所述步骤(3)通过在上述目标的大致位置布设“聚焦粒子群”,进行多尺度、多邻域的粒子相似度匹配,实现高分辨率的目标定位,得到目标在待检索视频的单帧图像中的精确匹配位置。具体实现步骤如下:In the step (3), by arranging a "focused particle swarm" at the approximate position of the target, multi-scale and multi-neighborhood particle similarity matching is performed to achieve high-resolution target positioning and obtain the single frame of the target in the video to be retrieved. Exact match location in the image. The specific implementation steps are as follows:

步骤(3.1)“聚焦粒子群”中的粒子是满足步骤(3)条件的搜索粒子的多尺度、多邻域拓展,确定聚焦粒子群如下:对搜索粒子群中的粒子j(记为sj=(xj,yj,Hx,j,Hy,j)),按如下步骤生成聚焦粒子群Si,j 3∪Si,j 4Particles in step (3.1) "focused particle swarm" are multi-scale and multi-neighborhood extensions of search particles that meet the conditions of step (3), and the focused particle swarm is determined as follows: For particle j in the search particle swarm (denoted as s j =(x j ,y j ,H x,j ,H y,j )), according to the following steps to generate the focused particle group S i,j 3 ∪S i,j 4 :

步骤(3.1.1)随机生成W个不重复的取值于[0.8,1]的随机数{αk}k=1,...,W和W个不重复的取值于[0.1,0.3]的随机数{βk}k=1,...,W,则其中:Step (3.1.1) Randomly generate W non-repeating random numbers {α k } k=1,..., W and W non-repeating values in [0.1, 0.3] ] random number {β k } k=1,...,W , then in:

sj,k 1={xj×(1-βk),yj×(1-βk),Hx,j×αk,Hy,j×αk},s j,k 1 ={x j ×(1-β k ),y j ×(1-β k ),H x,j ×α k ,H y,j ×α k },

sj,k 2={xj×(1-βk),yj×(1+βk),Hx,j×αk,Hy,j×αk},s j,k 2 ={x j ×(1-β k ),y j ×(1+β k ),H x,j ×α k ,H y,j ×α k },

sj,k 3={xj×(1+βk),yj×(1-βk),Hx,j×αk,Hy,j×αk},s j,k 3 ={x j ×(1+β k ),y j ×(1-β k ),H x,j ×α k ,H y,j ×α k },

sj,k 4={xj×(1+βk),yj×(1+βk),Hx,j×αk,Hy,j×αk},s j,k 4 ={x j ×(1+β k ),y j ×(1+β k ),H x,j ×α k ,H y,j ×α k },

sj,k 5={xj,yj,Hx,j×αk,Hy,j×αk};s j,k 5 ={x j ,y j ,H x,j ×α k ,H y,j ×α k };

步骤(3.1.2)随机生成W个不重复的取值于[1,1.2]的随机数{αk}k=1,...,W和W个不重复的取值于[0.1,0.3]的随机数{βk}k=1,...,W,则其中:Step (3.1.2) Randomly generate W non-repeating random numbers {α k } k=1,..., W and W non-repeating values in [0.1, 0.3] ] random number {β k } k=1,...,W , then in:

sj,k 1={xj×(1-βk),yj×(1-βk),Hx,j×αk,Hy,j×αk},s j,k 1 ={x j ×(1-β k ),y j ×(1-β k ),H x,j ×α k ,H y,j ×α k },

sj,k 2={xj×(1-βk),yj×(1+βk),Hx,j×αk,Hy,j×αk},s j,k 2 ={x j ×(1-β k ),y j ×(1+β k ),H x,j ×α k ,H y,j ×α k },

sj,k 3={xj×(1+βk),yj×(1-βk),Hx,j×αk,Hy,j×αk},s j,k 3 ={x j ×(1+β k ),y j ×(1-β k ),H x,j ×α k ,H y,j ×α k },

sj,k 4={xj×(1+βk),yj×(1+βk),Hx,j×αk,Hy,j×αk},s j,k 4 ={x j ×(1+β k ),y j ×(1+β k ),H x,j ×α k ,H y,j ×α k },

sj,k 5={xj,yj,Hx,j×αk,Hy,j×αk}。s j,k 5 ={x j ,y j ,H x,j ×α k ,H y,j ×α k }.

步骤(3.2)计算聚焦粒子群中的粒子与待检索目标粒子相似度度量,根据度量值进行定性映射,映射值满足一定条件的所有聚焦粒子所在位置的均值,就是在待检索视频的单帧图像中的精确匹配位置,自此实现了单帧图像中的目标定位。粒子的相似性度量是粒子椭圆区域图像的匹配度,人类对匹配度的感知是定性的,定性映射定义如下:Step (3.2): compute the similarity measure between each particle in the focusing swarm and the target particle to be retrieved, and apply a qualitative mapping to the measured values. The mean of the positions of all focusing particles whose mapped value satisfies the given condition is the precise matching position in the single frame of the searched video, which completes target localization in that frame. The particle similarity measure is the matching degree of the images in the particle ellipse regions; since human perception of matching degree is qualitative, a qualitative mapping is defined as follows:

所述步骤(4)综合被检索视频中多帧图像的匹配结果,确定待检索目标是否出现在被检索视频中,并给出匹配度的可视化定性描述图具体实现步骤如下:Said step (4) synthesizes the matching results of multiple frames of images in the retrieved video, determines whether the target to be retrieved appears in the retrieved video, and provides a visual qualitative description of the matching degree. The specific implementation steps are as follows:

步骤(4.1)在被检索视频Dest的M帧图像中,将匹配的帧序号依次排列。若存在连续T0帧均匹配的情形,则判定待检索目标出现在被检索视频中,出现时刻由连续匹配帧序列的首末帧序号标记;否则判定待检索目标未出现在被检索视频中。Step (4.1): arrange the matched frame indices of the M frames of the searched video Dest in order. If there is a run of T0 consecutive matching frames, the target is judged to appear in the searched video, with the time of appearance marked by the first and last frame indices of the run; otherwise, the target is judged not to appear in the searched video.
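The decision rule of step (4.1) amounts to finding a run of T0 consecutive matched frame indices; a minimal sketch (function name ours):

```python
def target_present(matched_frames, T0):
    """Step (4.1) sketch: return (found, first, last) where first/last
    are the frame indices bounding the first run of T0 consecutive
    matched frames, or (False, None, None) if no such run exists."""
    run_start, prev = None, None
    for f in sorted(matched_frames):
        if prev is None or f != prev + 1:
            run_start = f  # a gap breaks the run
        if f - run_start + 1 >= T0:
            return True, run_start, f
        prev = f
    return False, None, None
```

Requiring a consecutive run rather than a single matched frame suppresses spurious per-frame matches, at the cost of missing targets visible for fewer than T0 frames.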

步骤(4.2)给出匹配度的可视化定性描述图,用于直观显示目标在被检索图像G中的局部相似度,所述匹配度的可视化定性描述图是指由下式定义的灰度图:Step (4.2) provides a visual qualitative description map of the matching degree, which is used to visually display the local similarity of the target in the retrieved image G. The visual qualitative description map of the matching degree refers to a grayscale image defined by the following formula:

{(x,y,f(x,y))|x∈[0,width],y∈[0,height]} (4){(x,y,f(x,y))|x∈[0,width],y∈[0,height]} (4)

其中,f(x,y)为定性白化权函数,width、height分别为图像的宽和高。这里将待检索目标在被检索图像G中的位置信息表示成如下区间灰数四元组:Among them, f(x, y) is the qualitative whitening weight function, and width and height are the width and height of the image respectively. Here, the position information of the target to be retrieved in the retrieved image G is expressed as the following interval gray number quadruple:

\otimes G = (\otimes x, \otimes y, \otimes H_x, \otimes H_y) = \left( [0, \mathrm{width}],\ [0, \mathrm{height}],\ [0, \mathrm{width}/2],\ [0, \mathrm{height}/2] \right) \qquad (5)

其中⊗x、⊗y、⊗Hx、⊗Hy依次对应匹配对象在G中的中心横坐标x、中心纵坐标y、长轴半长Hx和短轴半长Hy的灰数。假设⊗G服从各变量独立的多元高斯分布,G中布设的粒子群为{(xi,yi,Hx,i,Hy,i),i=1,...,N},粒子群与待检索目标的相似性度量值为{ρi,i=1,...,N},则定义⊗G的白化权函数如下:where ⊗x, ⊗y, ⊗H x , ⊗H y correspond in turn to the grey numbers of the matching object's centre abscissa x, centre ordinate y, semi-major axis H x , and semi-minor axis H y in G. Assuming ⊗G follows a multivariate Gaussian distribution with independent components, let the particle swarm laid out in G be {(x i ,y i ,H x,i ,H y,i ),i=1,...,N} and the similarity measures between the particles and the target be {ρ i ,i=1,...,N}; the whitening weight function of ⊗G is then defined as:

f(x, y) = Q\left( \max_{i=1,\ldots,N} \left( \frac{1}{\sqrt{2\pi}\,\sigma_{i,x}} \exp\left( -\left( \frac{x - x_i}{2\sigma_{i,x}} \right)^2 \right) \times \frac{1}{\sqrt{2\pi}\,\sigma_{i,y}} \exp\left( -\left( \frac{y - y_i}{2\sigma_{i,y}} \right)^2 \right) \right) \right) \qquad (6)

\frac{1}{\sqrt{2\pi}\,\sigma_{i,x}} = \rho_i, \qquad \frac{1}{\sqrt{2\pi}\,\sigma_{i,y}} = \rho_i \qquad (7)

由定义可知,该白化权函数取值于定性集合。It can be seen from the definition that the whitening weight function takes the value of a qualitative set.
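Before the qualitative mapping Q (whose definition the source does not reproduce), the inner expression of Eqs. (6)-(7) can be sketched as a grey-level map; by Eq. (7) the Gaussian coefficient 1/(sqrt(2*pi)*sigma) equals the particle's similarity rho, so better-matching particles produce both taller and narrower peaks. The function name is an assumption for this sketch:

```python
import numpy as np

def qualitative_map(centers, rhos, width, height):
    """Sketch of the argument of Q in Eq. (6) over the image grid.

    centers: list of particle centres (x_i, y_i); rhos: similarity
    rho_i per particle, with sigma_{i,x} = sigma_{i,y} fixed by Eq. (7).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    best = np.zeros((height, width))
    for (xi, yi), rho in zip(centers, rhos):
        sigma = 1.0 / (np.sqrt(2 * np.pi) * rho)  # Eq. (7)
        gx = rho * np.exp(-((xs - xi) / (2 * sigma)) ** 2)
        gy = rho * np.exp(-((ys - yi) / (2 * sigma)) ** 2)
        best = np.maximum(best, gx * gy)  # max over i in Eq. (6)
    return best
```

The map peaks at the best-matching particle centre (with value rho squared), giving the intuitive grey-scale visualisation of local similarity described in step (4.2).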

本发明与现有技术相比优点在于:Compared with the prior art, the present invention has the advantages of:

(1)本发明可实现在视频片段或单帧图像中定位被检索目标,从而实现视频库或图像库中的目标检索;(1) The present invention can locate the target to be retrieved in a video segment or a single frame image, thereby realizing target retrieval in a video library or an image library;

(2)本发明模拟人类视觉的分层感知模型,模拟人类对灰色系统的定性描述思维,是一种对拟人模型应用的探索;(2) The present invention simulates the layered perception model of human vision, simulates the qualitative descriptive thinking of human beings to the gray system, and is a kind of exploration to the application of anthropomorphic models;

(3)本发明可以有效消除因视角、形变、尺度、色彩分布等造成的目标定位不准;(3) The present invention can effectively eliminate inaccurate target positioning caused by viewing angle, deformation, scale, color distribution, etc.;

(4)本发明可以在不进行视频目标前景/背景检测的前提下,从视频的第一帧开始进行目标检索和定位。(4) The present invention can perform target retrieval and positioning from the first frame of the video without performing video target foreground/background detection.

附图说明Description of drawings

图1为本发明的整体思路流程框图;Fig. 1 is a flow chart diagram of the overall idea of the present invention;

图2为本发明中目标匹配流程图;Fig. 2 is target matching flowchart in the present invention;

图3为目标匹配效果示意图,其中(a)为待检索目标图像,(b)为被检索视频中某一帧的匹配目标图。Figure 3 illustrates the target matching effect, where (a) is the image of the target to be retrieved and (b) is the matched target in one frame of the searched video.

具体实施方式detailed description

下面结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明的保护范围。附图1是本发明整体思路的流程框图,附图2是本发明所提出的目标匹配的流程图。所述一种基于分层感知和灰色定性推理的目标检索方法包括以下步骤:The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention. Accompanying drawing 1 is the flow chart of the overall thinking of the present invention, and accompanying drawing 2 is the flow chart of the target matching proposed by the present invention. Described a kind of target retrieval method based on hierarchical perception and gray qualitative reasoning comprises the following steps:

步骤S101:将待检索目标用直方图分布进行描述,计算待检测目标的直方图分布。对由人工选定或目标检测算法获取的待检测目标Target,依据(1)式计算其HSV色彩直方图分布q={q(v)}v=1,...,mStep S101: Describe the target to be retrieved with histogram distribution, and calculate the histogram distribution of the target to be detected. For the target to be detected Target obtained by manual selection or target detection algorithm, calculate its HSV color histogram distribution q={q (v) } v=1,...,m according to (1) formula;

步骤S102:布设“搜索粒子群”,计算被检索视频中单帧图像“搜索粒子群”的直方图分布。设被检索视频Dest共有M帧图像,在每帧图像Pi(i=1,...,M)中,按如下所述方法生成“搜索粒子群”Si 1∪Si 2:在待检测目标Target的第i帧图像Pi中均匀选择N1个点作为“基本粒子群”Si 1={sij 1,j=1,...,N1}的中心,粒子长轴半长Hx均相等、短轴半长Hy均相等,Hx和Hy取满足如下两个条件的最小值:(1)粒子所在椭圆镞为Pi的闭包,即满足粒子两两重合度不小于10%,即 再在Pi中设置粒子群Si 1的翻转粒子群Si 2={sij 2,j=1,...,N1},使得sij 2与sij 1中心重合,sij 2的长轴半长Hx'等于sij 1的短轴半长Hy、sij 2的短轴半长Hy'等于sij 1的长轴半长Hx,Si 1∪Si 2组成了搜索粒子群。对Si 1∪Si 2中的每个粒子,依据(1)式计算其HSV色彩直方图分布pi,j={pi,j (v)}v=1,...,m,i=1,...,M,j=1,...,2×N1Step S102: Layout the "search particle swarm", and calculate the histogram distribution of the single-frame image "search particle swarm" in the retrieved video. Assuming that the retrieved video Dest has a total of M frames of images, in each frame of image P i (i=1,...,M), generate a "search particle swarm" S i 1 ∪ S i 2 as follows: Evenly select N 1 points in the i-th frame image P i of the detection target Target as the center of the "basic particle group" S i 1 ={s ij 1 ,j=1,...,N 1 }, the long axis of the particle is half The lengths H x and H y are equal, and H x and H y take the minimum value that satisfies the following two conditions: (1) The ellipse where the particle is located is the closure of P i , which satisfies The coincidence degree of particles in pairs is not less than 10%, that is Then set the inverted particle group S i 2 of the particle group S i 1 in P i ={s ij 2 ,j=1,...,N 1 }, so that the centers of s ij 2 and s ij 1 coincide, and s ij 2 The major axis semi-length H x ' of s ij 1 is equal to the minor axis semi-length H y , and the minor axis semi-length H y ' of s ij 2 is equal to the major axis semi-length H x of s ij 1 , S i 1 ∪ S i 2 A search particle swarm is formed. For each particle in S i 1 ∪S i 2 , calculate its HSV color histogram distribution p i,j ={p i,j (v) } v=1,...,m , i=1,...,M,j=1,...,2×N 1 ;

步骤S103:筛选“搜索粒子群”,得到目标的大致位置。依据(2)式计算Target与Si 1∪Si 2中的每个粒子的直方图分布相似性度量值,并计算度量值{ρ[pi,j,q],j=1,...,2×N1}的均值μi,1和方差σi,1,将所有满足ρ[pi,j,q]>ρi,1i,1的j组成的集合按相似度从大到小排序后记为Bi,1Step S103: Screen the "search particle swarm" to obtain the approximate location of the target. Calculate the histogram distribution similarity value between Target and each particle in S i 1 ∪ S i 2 according to formula (2), and calculate the measurement value {ρ[p i,j ,q],j=1,... .,2×N 1 }'s mean value μ i,1 and variance σ i,1 , the set of all js satisfying ρ[p i,j ,q]>ρ i,1i,1 is sorted by similarity After sorting from large to small, it is recorded as B i,1 ;
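The screening in step S103 keeps particles whose similarity exceeds the swarm's mean plus its spread, sorted best-first. A minimal sketch, assuming sigma_{i,1} denotes the standard deviation of the scores (the source text says "variance" but adds it directly to the mean, which only makes dimensional sense for a standard deviation):

```python
import numpy as np

def select_candidates(similarities):
    """Step S103 sketch: indices of particles with similarity above
    mean + std of all scores, ordered from most to least similar."""
    rho = np.asarray(similarities, dtype=float)
    thr = rho.mean() + rho.std()  # mu_{i,1} + sigma_{i,1}
    keep = np.flatnonzero(rho > thr)
    return keep[np.argsort(-rho[keep])].tolist()
```

Using a data-dependent threshold means the number of surviving candidates adapts to each frame: cluttered frames with many mediocre matches still yield only the few clear outliers.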

步骤S104:布设“聚焦粒子群”,计算被检索视频中单帧图像“聚焦粒子群”的直方图分布。对Bi,1中的粒子j(记为sj=(xj,yj,Hx,j,Hy,j)),按如下所述方法生成“聚焦粒子群”Si,j 3∪Si,j 4Step S104: Layout the "focused particle swarm", and calculate the histogram distribution of the single-frame image "focused particle swarm" in the retrieved video. For particle j in B i,1 (denoted as s j =(x j ,y j ,H x,j ,H y,j )), generate “focused particle swarm” S i,j 3 as follows ∪S i,j 4 :

(1)随机生成W个不重复的取值于[0.8,1]的随机数{αk}k=1,...,W和W个不重复的取值于[0.1,0.3]的随机数{βk}k=1,...,W,则其中:(1) Randomly generate W non-repeated random numbers {α k } k=1,...,W and W non-repeated random numbers with values in [0.1,0.3] Number {β k } k=1,...,W , then in:

sj,k 1={xj×(1-βk),yj×(1-βk),Hx,j×αk,Hy,j×αk},s j,k 1 ={x j ×(1-β k ),y j ×(1-β k ),H x,j ×α k ,H y,j ×α k },

sj,k 2={xj×(1-βk),yj×(1+βk),Hx,j×αk,Hy,j×αk},s j,k 2 ={x j ×(1-β k ),y j ×(1+β k ),H x,j ×α k ,H y,j ×α k },

sj,k 3={xj×(1+βk),yj×(1-βk),Hx,j×αk,Hy,j×αk},s j,k 3 ={x j ×(1+β k ),y j ×(1-β k ),H x,j ×α k ,H y,j ×α k },

sj,k 4={xj×(1+βk),yj×(1+βk),Hx,j×αk,Hy,j×αk},s j,k 4 ={x j ×(1+β k ),y j ×(1+β k ),H x,j ×α k ,H y,j ×α k },

sj,k 5={xj,yj,Hx,j×αk,Hy,j×αk};s j,k 5 ={x j ,y j ,H x,j ×α k ,H y,j ×α k };

(2)随机生成W个不重复的取值于[1,1.2]的随机数{αk}k=1,...,W和W个不重复的取值于[0.1,0.3]的随机数{βk}k=1,...,W,则其中:(2) Randomly generate W non-repeating random numbers {α k } k=1,..., W and W non-repeating random numbers in [0.1, 0.3] Number {β k } k=1,...,W , then in:

sj,k 1={xj×(1-βk),yj×(1-βk),Hx,j×αk,Hy,j×αk},s j,k 1 ={x j ×(1-β k ),y j ×(1-β k ),H x,j ×α k ,H y,j ×α k },

sj,k 2={xj×(1-βk),yj×(1+βk),Hx,j×αk,Hy,j×αk},s j,k 2 ={x j ×(1-β k ),y j ×(1+β k ),H x,j ×α k ,H y,j ×α k },

sj,k 3={xj×(1+βk),yj×(1-βk),Hx,j×αk,Hy,j×αk},s j,k 3 ={x j ×(1+β k ),y j ×(1-β k ),H x,j ×α k ,H y,j ×α k },

sj,k 4={xj×(1+βk),yj×(1+βk),Hx,j×αk,Hy,j×αk},s j,k 4 ={x j ×(1+β k ),y j ×(1+β k ),H x,j ×α k ,H y,j ×α k },

sj,k 5={xj,yj,Hx,j×αk,Hy,j×αk}。s j,k 5 ={x j ,y j ,H x,j ×α k ,H y,j ×α k }.

对Si,j 3∪Si,j 4中的每个粒子,依据(1)式计算其HSV色彩直方图分布pi,j,k={pi,j,k (v)}v=1,...,m,i=1,...,M,k=1,...,10×W;For each particle in S i,j 3 ∪S i,j 4 , calculate its HSV color histogram distribution p i,j,k ={p i,j,k (v) } v= 1,...,m ,i=1,...,M,k=1,...,10×W;

步骤S105:筛选“聚焦粒子群”,进行推理。对Bi,1中的粒子j,依据(2)式计算Target与Si,j 3∪Si,j 4中的每个粒子的直方图分布相似性度量值,得到度量值集合Di,j={ρ[p(s),q]|s∈Si,j 3∪Si,j 4},根据(3)式计算Di,j取值,按下列推理规则决策匹配情况:"如果存在ρ∈Di,j使得Q(ρ)=0,则匹配,且s0=mean{s|s∈Si,j 3∪Si,j 4,Q(ρ[p(s),q])=0},,转步骤S107;否则,若对所有j均未找到匹配位置,转步骤S106";其中,s0=(x0,y0,Hx,0,Hy,0)为匹配的目标位置,mean表示取均值,由下式确定:mean{s|s∈Si,j 3∪Si,j 4}={(mean(x),mean(y),mean(Hx),mean(Hy))|(x,y,Hx,Hy)∈Si,j 3∪Si,j 4};Step S105: Screening "Focused Particle Swarms" for reasoning. For particle j in B i,1 , calculate the histogram distribution similarity measurement value between Target and each particle in S i,j 3 ∪ S i, j 4 according to formula (2), and obtain the measurement value set D i, j ={ρ[p(s),q]|s∈S i,j 3 ∪S i,j 4 }, calculate the value of D i,j according to formula (3), and decide the matching situation according to the following reasoning rules:" Match if there exists ρ∈D i,j such that Q(ρ)=0, and s 0 =mean{s|s∈S i,j 3 ∪S i,j 4 ,Q(ρ[p(s), q])=0}, go to step S107; otherwise, if no matching position is found for all j, go to step S106"; wherein, s 0 =(x 0 ,y 0 ,H x,0 ,H y,0 ) is the matching target position, and mean means taking the mean value, which is determined by the following formula: mean{s|s∈S i,j 3 ∪S i,j 4 }={(mean(x),mean(y),mean( H x ),mean(H y ))|(x,y,H x ,H y )∈S i,j 3 ∪S i,j 4 };

步骤S106:判定当前帧图像中没有与待检测目标相匹配的图像区域,转步骤S108;Step S106: determine that there is no image area matching the target to be detected in the current frame image, and turn to step S108;

步骤S107:判定当前帧图像中存在与待检测目标相匹配的图像区域,该区域由s0=(x0,y0,Hx,0,Hy,0)标示,实现高分辨率的单帧图像目标定位,转步骤S108;Step S107: Determine that there is an image area matching the target to be detected in the current frame image, and this area is marked by s 0 =(x 0 ,y 0 ,H x,0 ,H y,0 ), realizing high-resolution single Frame image target positioning, turn to step S108;

步骤S108:综合被检索视频中多帧图像的匹配结果,确定待检索目标是否出现在被检索视频中,并给出匹配度的可视化定性描述图。在被检索视频Dest的M帧图像中,将匹配的帧序号依次排列。若存在连续T0帧均匹配的情形,则判定待检索目标出现在被检索视频中,出现时刻由连续匹配帧序列的首末帧序号标记;否则判定待检索目标未出现在被检索视频中。对每一帧图像,由(4)式可以作出匹配度的定性描述图,用于可视化直观显示目标在被检索图像中的局部相似度,用于计算定性描述图的粒子群由步骤S102~步骤S107中所有使用过的粒子组成。Step S108: combine the matching results over the multiple frames of the searched video to decide whether the target appears in it, and output the visual qualitative description map of the matching degree. Arrange the matched frame indices of the M frames of the searched video Dest in order; if there is a run of T0 consecutive matching frames, the target is judged to appear in the video, with the time of appearance marked by the first and last frame indices of the run; otherwise, it is judged not to appear. For each frame, Eq. (4) yields a qualitative description map that visualises the local similarity of the target in the searched image; the particle swarm used to compute this map consists of all particles used in steps S102 to S107.

在本实施例中,取W=3。In this embodiment, W=3.

如图3所示,(a)中椭圆所标区域为待检索目标,(b)中椭圆所标区域为由本实施例方法计算得到的“某一帧被检索图像”中的匹配目标,在目标被部分遮挡的情况下实现了目标的精确匹配。As shown in Figure 3, the area marked by the ellipse in (a) is the target to be retrieved, and the area marked by the ellipse in (b) is the matching target in the "a certain frame of the retrieved image" calculated by the method of this embodiment. Accurate matching of the target is achieved in the case of being partially occluded.

本发明未详细阐述部分属于本领域技术人员的公知技术。Parts not described in detail in the present invention belong to the known techniques of those skilled in the art.

以上所述,仅为本发明较佳的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明披露的技术范围内,可轻易想到的变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应该以权利要求书的保护范围为准。The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person familiar with the technical field can easily conceive of changes or changes within the technical scope disclosed in the present invention. Replacement should be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention should be determined by the protection scope of the claims.

Claims (4)

1. A target retrieval method based on hierarchical perception, characterized in that it is implemented by the following steps:

Step (1): normalize the target to be retrieved and describe it with a histogram distribution;

Step (2): construct elliptical-cluster regions forming a spatial closure of a single frame of the searched video, defined as the search particle swarm; describe the particles with histogram distributions and find the particles whose similarity measure with the target to be retrieved is above a set value; the regions covered by these particles are the approximate locations where the target may be, realizing low-resolution target perception;

Step (3): randomly generate 2W multi-scale, multi-neighborhood elliptical-cluster regions near the approximate location of the target, defined as the focusing particle swarm; describe the particles with histogram distributions and perform multi-scale, multi-neighborhood particle similarity matching to obtain the precise matching position of the target in the single frame of the searched video, realizing high-resolution target localization;

Step (4): combine the matching results over multiple frames of the searched video to determine whether the target to be retrieved appears in the searched video, and output a visual gray-scale qualitative description map of the matching degree;

Step (2) is implemented by the following steps:

Step (2.1): lay out the search particle swarm as follows: uniformly select N_1 points in the i-th frame P_i as the centers of the "basic particle swarm" S_i^1 = {s_ij^1, j = 1, ..., N_1}; all particles share the same semi-major axis H_x and the same semi-minor axis H_y, and H_x, H_y take the minimum values satisfying the following two conditions: (a) the cluster of particle ellipses is a closure of P_i, i.e. P_i ⊆ ∪_j O(s_ij^1); (b) the pairwise overlap of the particles is no less than 10%, i.e. |O(s_ij^1) ∩ O(s_ik^1)| / |O(s_ij^1)| ≥ 10% for j ≠ k. Then lay out in P_i the flipped particle swarm S_i^2 = {s_ij^2, j = 1, ..., N_1} of S_i^1, such that s_ij^2 is concentric with s_ij^1, the semi-major axis H_x' of s_ij^2 equals the semi-minor axis H_y of s_ij^1, and the semi-minor axis H_y' of s_ij^2 equals the semi-major axis H_x of s_ij^1; S_i^1 ∪ S_i^2 constitutes the search particle swarm;

Step (2.2): within the search particle swarm, compute the similarity measure between particle distributions to find the particles whose similarity measure with the target to be retrieved is above a set value; the similarity measure between two particle distributions p = {p^(v)}_{v=1,...,m} and q = {q^(v)}_{v=1,...,m} is the following Bhattacharyya coefficient:

ρ[p, q] = Σ_{v=1}^{m} √(p^(v) q^(v))    (2)

Step (2.3): record the information of the particles whose similarity is above the set value; the regions covered by these particles are the approximate locations where the target may be, the union of these regions is the search range to be considered in step (3), and no search is performed in the other regions.
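The Bhattacharyya coefficient of equation (2) admits a very short implementation. The sketch below is illustrative only and not part of the patent text; it assumes both inputs are already normalized histograms of equal length:

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of equation (2) between two discrete
    distributions given as equal-length sequences of bin weights.

    Returns 1.0 for identical normalized distributions and 0.0 for
    distributions with disjoint support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    return sum(math.sqrt(pv * qv) for pv, qv in zip(p, q))

uniform = [0.25, 0.25, 0.25, 0.25]
print(bhattacharyya(uniform, uniform))        # → 1.0
print(bhattacharyya([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

In step (2.2) this value would be compared against the set threshold to keep or discard each search particle.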
2. The target retrieval method based on hierarchical perception according to claim 1, characterized in that step (1) is implemented as follows:

Step (1.1): proportionally scale the target to be retrieved down to a width of 80 pixels and a height of 152 pixels, and regard it as a particle s_o. A particle is defined as follows: a particle s = (x, y, H_x, H_y) is an elliptical descriptor of the possible coverage of the target, where (x, y) are the center coordinates of the ellipse O_s and H_x, H_y are the semi-major and semi-minor axes of the ellipse, respectively;

Step (1.2): characterize the particle s_o with a histogram. The particle histogram distribution is defined as follows: a histogram of size 8×8×4 in HSV space describes the image features of the elliptical region O_s covered by particle s; the histogram is determined by the bin-assignment function h(x_i) = j, x_i ∈ O_s, j ∈ {1, 2, ..., 256}, and the color distribution p_y = {p_y^(v)}_{v=1,...,m} of the pixels in O_s is determined by:

p_y^(v) = f Σ_{i=1}^{I} k(‖y − x_i‖ / a) δ[h(x_i) − v],  with k(r) = 1 − r² for r < 1 and k(r) = 0 otherwise    (1)

where I is the number of pixels in O_s, δ is the Kronecker delta function, a is the scale parameter, and f is a normalization factor ensuring Σ_{v=1}^{m} p_y^(v) = 1, with m = 8×8×4 = 256.
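Equation (1) is a kernel-weighted color histogram: each pixel votes into its HSV bin h(x_i), with its vote down-weighted by its normalized distance from the particle center through the kernel k(r) = 1 − r². The sketch below is an illustration under assumptions the claim does not fix: pixels are supplied as (position, bin) pairs, and the particle's two semi-axes play the role of the scale a:

```python
def weighted_histogram(pixels, center, axes, m=256):
    """Kernel-weighted histogram in the spirit of equation (1).

    pixels : iterable of ((px, py), bin_index) pairs, bin_index in [0, m)
    center : (x, y) particle center
    axes   : (Hx, Hy) semi-axes; normalized distance r < 1 means the
             pixel lies inside the particle ellipse
    Returns a list of m weights summing to 1 if any pixel lies inside.
    """
    cx, cy = center
    hx, hy = axes
    hist = [0.0] * m
    for (px, py), b in pixels:
        # squared normalized elliptical distance from the particle center
        r2 = ((px - cx) / hx) ** 2 + ((py - cy) / hy) ** 2
        if r2 < 1.0:                 # kernel k(r) = 1 - r^2 for r < 1
            hist[b] += 1.0 - r2
    total = sum(hist)                # normalization factor f
    return [w / total for w in hist] if total > 0 else hist

pix = [((0, 0), 3), ((1, 0), 3), ((5, 5), 7)]   # last pixel falls outside
h = weighted_histogram(pix, center=(0, 0), axes=(2.0, 2.0), m=8)
print(h[3])  # → 1.0  (all in-ellipse mass lands in bin 3)
```

The same distribution describes both the normalized target of step (1) and each candidate particle, so the two can be compared directly with equation (2).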
3. The target retrieval method based on hierarchical perception according to claim 1, characterized in that step (3) lays out a focusing particle swarm at the approximate location of the target and performs multi-scale, multi-neighborhood particle similarity matching, realizing high-resolution target localization and obtaining the precise matching position of the target in a single frame of the video to be searched; the specific steps are as follows:

Step (3.1): the particles of the focusing particle swarm are multi-scale, multi-neighborhood extensions of the search particles satisfying the condition of step (2); the focusing particle swarm is determined as follows: for particle j of the search particle swarm, denoted s_j = (x_j, y_j, H_{x,j}, H_{y,j}), generate the focusing particle swarm S_{i,j}^3 ∪ S_{i,j}^4 by the following steps:

Step (3.1.1): randomly generate W distinct random numbers {α_k}_{k=1,...,W} taking values in [0.8, 1] and W distinct random numbers {β_k}_{k=1,...,W} taking values in [0.1, 0.3]; then S_{i,j}^3 = {s_{j,k}^1, ..., s_{j,k}^5 | k = 1, ..., W}, where:

s_{j,k}^1 = (x_j (1 − β_k), y_j (1 − β_k), H_{x,j} α_k, H_{y,j} α_k),
s_{j,k}^2 = (x_j (1 − β_k), y_j (1 + β_k), H_{x,j} α_k, H_{y,j} α_k),
s_{j,k}^3 = (x_j (1 + β_k), y_j (1 − β_k), H_{x,j} α_k, H_{y,j} α_k),
s_{j,k}^4 = (x_j (1 + β_k), y_j (1 + β_k), H_{x,j} α_k, H_{y,j} α_k),
s_{j,k}^5 = (x_j, y_j, H_{x,j} α_k, H_{y,j} α_k);
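Step (3.1.1), and step (3.1.2) which differs from it only in drawing α from [1, 1.2] instead of [0.8, 1], can be sketched with a single helper. The names below are illustrative, not from the patent; the distinctness of the continuous random draws required by the claim holds almost surely and is not enforced explicitly:

```python
import random

def focusing_swarm(particle, w, alpha_range, beta_range=(0.1, 0.3)):
    """Generate the 5·W multi-scale, multi-neighborhood variants
    s_{j,k}^1 .. s_{j,k}^5 of one search particle.

    particle    : (x, y, hx, hy) ellipse descriptor
    alpha_range : (0.8, 1.0) for S^3 (step 3.1.1),
                  (1.0, 1.2) for S^4 (step 3.1.2)
    """
    x, y, hx, hy = particle
    swarm = []
    for _ in range(w):
        a = random.uniform(*alpha_range)   # axis scale factor alpha_k
        b = random.uniform(*beta_range)    # center offset factor beta_k
        for sx in (1 - b, 1 + b):          # four center-shifted variants
            for sy in (1 - b, 1 + b):
                swarm.append((x * sx, y * sy, hx * a, hy * a))
        swarm.append((x, y, hx * a, hy * a))   # scale-only variant s^5
    return swarm

s3 = focusing_swarm((100.0, 80.0, 40.0, 76.0), w=10, alpha_range=(0.8, 1.0))
s4 = focusing_swarm((100.0, 80.0, 40.0, 76.0), w=10, alpha_range=(1.0, 1.2))
print(len(s3) + len(s4))   # → 100  (2W draws, five variants each)
```

Each generated ellipse would then be histogram-described and matched against the target exactly as the search particles were.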
Step (3.1.2): randomly generate W distinct random numbers {α_k}_{k=1,...,W} taking values in [1, 1.2] and W distinct random numbers {β_k}_{k=1,...,W} taking values in [0.1, 0.3]; then S_{i,j}^4 = {s_{j,k}^1, ..., s_{j,k}^5 | k = 1, ..., W}, where:

s_{j,k}^1 = (x_j (1 − β_k), y_j (1 − β_k), H_{x,j} α_k, H_{y,j} α_k),
s_{j,k}^2 = (x_j (1 − β_k), y_j (1 + β_k), H_{x,j} α_k, H_{y,j} α_k),
s_{j,k}^3 = (x_j (1 + β_k), y_j (1 − β_k), H_{x,j} α_k, H_{y,j} α_k),
s_{j,k}^4 = (x_j (1 + β_k), y_j (1 + β_k), H_{x,j} α_k, H_{y,j} α_k),
s_{j,k}^5 = (x_j, y_j, H_{x,j} α_k, H_{y,j} α_k);

Step (3.2): compute the similarity measure between each particle of the focusing particle swarm and the particle of the target to be retrieved, and apply a qualitative mapping to the measured values; the mean of the positions of all focusing particles whose mapped value satisfies a given condition is the precise matching position in the single frame of the video to be searched, which completes target localization in a single frame; the similarity measure of a particle is the matching degree of the image in the particle's elliptical region, and since human perception of matching degree is qualitative, the qualitative mapping is defined on this basis.

4. The target retrieval method based on hierarchical perception according to claim 1, characterized in that step (4) combines the matching results over multiple frames of the searched video to determine whether the target to be retrieved appears in the searched video, and outputs a visual qualitative description map of the matching degree; the specific steps are as follows:

Step (4.1): among the M frames of the searched video Dest, arrange the indices of the matched frames in order; if there exist T_0 consecutive matched frames, it is judged that the target to be retrieved appears in the searched video, and the time of appearance is marked by the first and last frame indices of the consecutive matched sequence; otherwise it is judged that the target to be retrieved does not appear in the searched video;

Step (4.2): output the visual qualitative description map of the matching degree, used to intuitively display the local similarity of the target in a searched image G; the visual qualitative description map of the matching degree is the gray-scale image defined by:

{(x, y, f(x, y)) | x ∈ [0, width], y ∈ [0, height]}    (4)

where f(x, y) is a qualitative whitening weight function, and width and height are the width and height of the image, respectively.
The position information of the target to be retrieved in the searched image G is expressed as the following interval grey-number quadruple:

⊗G = (⊗x, ⊗y, ⊗H_x, ⊗H_y) = ([0, width], [0, height], [0, width/2], [0, height/2])    (5)

where ⊗x, ⊗y, ⊗H_x, ⊗H_y are, in order, the grey numbers of the center abscissa x, center ordinate y, semi-major axis H_x, and semi-minor axis H_y of the matching object in G. Assume that ⊗G follows a multivariate Gaussian distribution with mutually independent variables, that the particle swarm laid out in G is {(x_i, y_i, H_{x,i}, H_{y,i}), i = 1, ..., N}, and that the similarity measures between the particles and the target to be retrieved are {ρ_i, i = 1, ..., N}; then the whitening weight function representing ⊗G is as follows:

f(x, y) = Q( max_{i=1,...,N} ( (1/(√(2π) σ_{i,x})) exp(−((x − x_i)/(2σ_{i,x}))²) × (1/(√(2π) σ_{i,y})) exp(−((y − y_i)/(2σ_{i,y}))²) ) )    (6)

1/(√(2π) σ_{i,x}) = ρ_i,  1/(√(2π) σ_{i,y}) = ρ_i    (7)

It follows from the above formulas that the whitening weight function takes its values in a qualitative set.
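Read together, equations (5)–(7) say that each matched particle i contributes a separable Gaussian bump centered at (x_i, y_i) whose per-axis peak height equals its similarity ρ_i (equation (7) pins 1/(√(2π)σ) to ρ_i), and the gray value at a pixel is the maximum contribution over all particles, passed through the qualitative mapping Q. The sketch below is illustrative only: it takes Q as the identity (an assumption, since Q is not spelled out here) and follows the exponent form of equation (6):

```python
import math

def whitening_map(width, height, particles):
    """Gray map of equations (4)-(7) with Q taken as the identity.

    particles : list of (x, y, rho) — particle center and its similarity;
                equation (7) fixes the per-axis Gaussian peak
                1/(sqrt(2*pi)*sigma) at rho, so sigma = 1/(sqrt(2*pi)*rho).
    Returns a height x width grid of floats (brightest near good matches).
    """
    grid = [[0.0] * width for _ in range(height)]
    for px, py, rho in particles:
        sigma = 1.0 / (math.sqrt(2.0 * math.pi) * rho)
        for y in range(height):
            for x in range(width):
                gx = rho * math.exp(-(((x - px) / (2.0 * sigma)) ** 2))
                gy = rho * math.exp(-(((y - py) / (2.0 * sigma)) ** 2))
                grid[y][x] = max(grid[y][x], gx * gy)  # max over particles
    return grid

g = whitening_map(9, 9, [(4.0, 4.0, 0.9)])
print(g[4][4])   # peak at the particle center equals rho * rho
```

Scaling these values to [0, 255] would yield the gray-scale qualitative description map of step (4.2).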
CN201310311320.4A 2013-07-23 2013-07-23 Method for retrieving objects on basis of hierarchical perception Expired - Fee Related CN103399893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310311320.4A CN103399893B (en) 2013-07-23 2013-07-23 Method for retrieving objects on basis of hierarchical perception

Publications (2)

Publication Number Publication Date
CN103399893A CN103399893A (en) 2013-11-20
CN103399893B true CN103399893B (en) 2017-02-08

Family

ID=49563523

Country Status (1)

Country Link
CN (1) CN103399893B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629226B (en) * 2017-03-15 2021-10-22 纵目科技(上海)股份有限公司 Vehicle detection method and system based on image layering technology
CN113378804A (en) * 2021-08-12 2021-09-10 中国科学院深圳先进技术研究院 Self-service sampling detection method and device, terminal equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101789005A (en) * 2010-01-22 2010-07-28 深圳创维数字技术股份有限公司 Image searching method based on region of interest (ROI)
CN102156702A (en) * 2010-12-17 2011-08-17 南方报业传媒集团 Fast positioning method for video events from rough state to fine state
CN102509118A (en) * 2011-09-28 2012-06-20 安科智慧城市技术(中国)有限公司 Method for monitoring video retrieval
CN102662949A (en) * 2012-02-27 2012-09-12 安科智慧城市技术(中国)有限公司 Method and system for retrieving specified object based on multi-feature fusion

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US6826316B2 (en) * 2001-01-24 2004-11-30 Eastman Kodak Company System and method for determining image similarity
US8295551B2 (en) * 2009-04-08 2012-10-23 Samsung Electronics Co., Ltd. System and method of adaptive vertical search range tracking for motion estimation in digital video
US8290248B2 (en) * 2009-12-31 2012-10-16 Mitsubishi Electric Research Laboratories, Inc. Determining disparity search range in stereo videos

Non-Patent Citations (1)

Title
Analysis of the Research Status of Object Recognition Methods Based on Biological Vision; Guo Mingwei; Science Research; 2011-08-03; 916-923 *

Similar Documents

Publication Publication Date Title
CN106469299B (en) A vehicle search method and device
CN106845621B (en) Dense crowd counting method and system based on deep convolutional neural network
WO2018023734A1 (en) Significance testing method for 3d image
CN106295124B (en) The method of a variety of image detecting technique comprehensive analysis gene subgraph likelihood probability amounts
CN105405133B (en) A kind of remote sensing image variation detection method
CN103455794B (en) A Dynamic Gesture Recognition Method Based on Frame Fusion Technology
CN103577875B (en) A kind of area of computer aided CAD demographic method based on FAST
CN106960210B (en) The method and apparatus of target detection
CN102609680A (en) Method for detecting human body parts by performing parallel statistical learning based on three-dimensional depth image information
CN109035300B (en) Target tracking method based on depth feature and average peak correlation energy
CN103309982B (en) A kind of Remote Sensing Image Retrieval method of view-based access control model significant point feature
CN107123130B (en) A Kernel Correlation Filtering Target Tracking Method Based on Superpixel and Hybrid Hash
CN106407943A (en) Pyramid layer positioning based quick DPM pedestrian detection method
CN106952274A (en) Pedestrian detection and ranging method based on stereo vision
CN103699578A (en) Image retrieval method based on spectrum analysis
CN103729651A (en) Hyperspectral remote sensing image classification method based on manifold neighbor measurement through local spectral angles
CN105205135A (en) 3D (three-dimensional) model retrieving method based on topic model and retrieving device thereof
CN110175597A (en) A Video Object Detection Method Fusion of Feature Propagation and Aggregation
CN106503170A (en) A kind of based on the image base construction method for blocking dimension
CN105320764A (en) 3D model retrieval method and 3D model retrieval apparatus based on slow increment features
Ma et al. Location-aware box reasoning for anchor-based single-shot object detection
CN112541403A (en) Indoor personnel falling detection method utilizing infrared camera
CN104751463B (en) A kind of threedimensional model optimal viewing angle choosing method based on sketch outline feature
CN115359350A (en) Group target formation change identification method based on graph model
CN103226582A (en) Medical image retrieving method on basis of uncertain fixed point image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170208