
CN101493890B - Dynamic vision caution region extracting method based on characteristic - Google Patents


Info

Publication number
CN101493890B
CN101493890B
Authority
CN
China
Prior art keywords
feature
saliency
picture
small
basis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100466886A
Other languages
Chinese (zh)
Other versions
CN101493890A (en)
Inventor
侯小笛 (Hou Xiaodi)
祁航 (Qi Hang)
张丽清 (Zhang Liqing)
祝文骏 (Zhu Wenjun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong University filed Critical Shanghai Jiao Tong University
Priority to CN2009100466886A priority Critical patent/CN101493890B/en
Publication of CN101493890A publication Critical patent/CN101493890A/en
Application granted granted Critical
Publication of CN101493890B publication Critical patent/CN101493890B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a feature-based dynamic visual attention region extraction method in the technical field of machine vision. The steps are: first, apply independent component analysis to sparsely decompose a large set of natural images, obtaining a set of filter basis functions and a corresponding set of reconstruction basis functions; divide the input image into m×m RGB patches and project them onto this basis to obtain the features of the image. Second, using the efficient-coding principle, measure an incremental coding length index for each feature. Third, based on these indices, compute the saliency of each patch by redistributing energy among the features, finally obtaining the saliency map. The invention eliminates the "time slice" through continuous sampling, so that data from different frames can jointly guide the saliency computation; this solves the problem that the saliency of different frames had to be processed independently, and achieves dynamic behavior.

Description

Feature-based Dynamic Visual Attention Region Extraction Method

Technical Field

The present invention relates to a method in the technical field of image processing, and specifically to a feature-based dynamic visual attention region extraction method.

Background Art

With the continuous development of artificial intelligence, machine vision is finding more and more applications in everyday life. It uses computers to simulate human visual function, yet it is not merely an extension of the human eye: more importantly, it reproduces part of the function of the human brain — extracting information from images of objective things, processing and understanding it, and ultimately using it for practical detection, measurement, and control. Because machine vision is fast, information-rich, and versatile, it is widely used in quality inspection, identity authentication, object detection and recognition, robotics, autonomous vehicles, and so on.

Engineering can already produce sensors that exceed the human eye in every respect (including field of view, visual acuity, spectral range, and dynamic characteristics), so the exploration of "seeing" has reached a considerable level; a machine vision system, however, needs not only "seeing" but also "perceiving". Because the human selective attention mechanism guarantees the efficiency with which the eye acquires information, it has attracted wide attention and research, and various techniques for extracting visual attention regions have been proposed and broadly applied. For example, extraction of visual attention regions based on the selective attention mechanism can locate regions of interest in an image, and searching those regions first improves the efficiency of object detection and recognition; the located regions of interest also enable efficient image compression (a low compression ratio inside the region of interest, a high ratio elsewhere) and image scaling (the region of interest is scaled up more than other regions), and so on. Because visual attention region extraction offers a great advantage in the efficiency of information acquisition, it frequently appears in the processing pipelines of machine vision systems.

A literature search of the prior art shows that visual attention region extraction originates in the saliency map proposed by Koch and Ullman in 1985; the technique was later refined by Itti and Koch into a complete saliency map framework. See: LAURENT I, CHRISTOF K, ERNST N. A model of saliency-based visual attention for rapid scene analysis [J]. IEEE Transactions on PAMI, 1998, 20(11): 1254-1259. That method is a space-based extraction technique. The image is first split into several parallel channels such as color, orientation, intensity, and texture; information is then extracted from each channel to form feature maps that preserve the topology of the image while recording the strength of the response to each feature. Next, every feature map is filtered by "Mexican hat" (Difference of Gaussians) functions at a series of scales — the function obtained by subtracting two Gaussians of different scales. This function is very sensitive to change but responds very weakly to uniform, diffuse signals, a property of broad biological significance. Finally, a Winner-Take-All competitive network compares the candidate attention regions and produces a map of the saliency of every point in the scene, called the saliency map. Although this method and later space-based analysis techniques perform well in many scenes, they almost inevitably face the following problems: 1) they can only attend to a specific subset of visual cues; 2) the allocation of attention is discontinuous in time. For example, when observing continuous footage, the system cannot take multiple frames into account, so the saliency map must be re-analyzed independently at every instant, which greatly reduces the continuity and reliability of the system. Moreover, when the viewing angle or the position of an object changes, the prediction of the new saliency map is likely to drift from the previous frame, since there is no mechanism for tracking features. In addition, a range of visual attention behaviors, such as inhibition of return and viewpoint shifts, cannot be realized well in space-based analysis techniques.
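To make the prior-art center-surround step concrete, the following is a minimal sketch of the Difference-of-Gaussians ("Mexican hat") filter described above, assuming NumPy and SciPy are available; the two scale values are illustrative choices, not parameters prescribed by the Itti-Koch model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(feature_map, sigma_center=1.0, sigma_surround=4.0):
    """Subtract two Gaussian blurs of different scales.

    The result responds strongly to local change and only weakly to
    uniform, diffuse signals, which is why space-based saliency models
    use it as a center-surround contrast operator.
    """
    center = gaussian_filter(feature_map, sigma=sigma_center)
    surround = gaussian_filter(feature_map, sigma=sigma_surround)
    return center - surround
```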

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing a feature-based dynamic visual attention region extraction method. The method defines saliency on the features themselves rather than on differences in their spatial distribution, which eliminates the "time slice": sampling is continuous, so data from different frames (times) can jointly guide the saliency computation. This solves the problem that the saliency of different frames (times) had to be processed independently, and achieves dynamic behavior.

The present invention is realized through the following technical solution, comprising the following steps:

First step: apply independent component analysis to sparsely decompose a large set of natural images, obtaining a set of filter basis functions and a corresponding set of reconstruction basis functions; divide the input image into m×m RGB patches and project them onto this basis to obtain the features of the image.

Second step: using the efficient-coding principle — when a system encodes efficiently, its entropy is maximal — measure an incremental coding length index for each feature.

Third step: based on these incremental coding length indices, compute the saliency of each patch by redistributing energy among the features, finally obtaining the saliency map.

The first step is as follows:

① Divide the training images into RGB color patches of m×m pixels and vectorize each patch. Sampling natural images in this way yields a large number of m×m RGB color patches, which serve as training samples. m, the side length of each RGB patch, may be 8, 16, or 32.

② Train the basis functions (A, W) with the standard independent component analysis (ICA) method. The number of basis functions is m×m×3 = 3m², i.e. $W = [w_1, w_2, \ldots, w_{3m^2}]$, where $w_i$ is the $i$-th filter basis function ($A$ has the same size as $W$, $1 \le i \le 3m^2$). A and W are the basis functions trained by the ICA method; their values may take any range, determined by the input.

③ For an arbitrary image X, divide it into n m×m RGB patches, forming the sampling matrix $X = [x_1, x_2, \ldots, x_n]$, where $x_k$ is the vectorized representation of the $k$-th patch ($1 \le k \le n$). Apply the linear transform $S_k = W x_k = [s_{k,1}, s_{k,2}, \ldots, s_{k,3m^2}]$, where W is the trained filter basis. Then $S_k$ is the vector of basis-function coefficients, i.e. the feature of patch $x_k$, and $s_{k,i}$ — the coefficient of the $i$-th basis function — is the value of the $i$-th feature. Doing the same for every $x_k$ gives the features of X: $S = [S_1, S_2, \ldots, S_n]$. Here n, the number of RGB patches into which the input image X is divided, is determined by the size of X and the value of m.

After the first step, 3m² features S have been constructed for the input image X; the second step follows.
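By way of illustration only, the first step might be sketched in Python as below, using scikit-learn's FastICA as a stand-in for the standard ICA training the patent refers to; the function and variable names are hypothetical, not from the patent.

```python
import numpy as np
from sklearn.decomposition import FastICA

def train_basis(training_patches, m=8):
    """Train the filter basis W and the reconstruction basis A from
    vectorized m x m RGB patches (one patch per row, 3*m*m columns)."""
    ica = FastICA(n_components=3 * m * m, max_iter=500)
    ica.fit(training_patches)
    W = ica.components_   # filter basis: row i is w_i
    A = ica.mixing_       # reconstruction basis: column k is A_k
    return A, W

def extract_features(patches, W):
    """S_k = W x_k for every patch; returns S with S[i, k] = s_{k,i}."""
    return W @ patches.T
```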

The second step is as follows:

① For each feature $i$, compute the activation rate $p_i$:

$$p_i = \frac{\sum_k s_{k,i}^2}{\sum_j \sum_k s_{k,j}^2} \qquad (2.1)$$

This quantity represents the average energy emission level of the feature.

② Consider the change of the entropy with respect to the activation rate $p_i$ of the $i$-th feature, i.e. the incremental coding length index of the $i$-th feature. Let $p = \{p_1, p_2, \ldots, p_{3m^2}\}$ be the probability distribution of a random variable. Suppose the feature activation distribution at a particular moment is $p$; when the $i$-th feature is activated, it adds a small perturbation $\epsilon$ to $p_i$, so the new distribution $\hat{p}$ becomes:

$$\hat{p}_j = \begin{cases} \dfrac{p_j + \epsilon}{1 + \epsilon}, & \text{if } j = i \\[1.5ex] \dfrac{p_j}{1 + \epsilon}, & \text{if } j \neq i \end{cases} \qquad (2.2)$$

Therefore, the incremental coding length of the $i$-th feature is:

$$\mathrm{ICL}(p_i) = \frac{\partial H(p)}{\partial p_i} = 1 - H(p) - p_i - \log p_i - p_i \log p_i \qquad (2.3)$$

By means of the predictive coding principle, the present invention links energy, features, and saliency. The incremental coding length (ICL) measures the rate of change of the perceptual entropy for each feature. This index guides the energy allocation so that the system as a whole realizes predictive coding: common information elicits as little response from the system as possible, while rare information usually triggers a strong response.
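A small sketch of the second step, under the assumption that S is the feature matrix with S[i, k] = s_{k,i} from the first step; the ICL follows Eqs. (2.1) and (2.3) as printed above, with an epsilon guard added against log(0).

```python
import numpy as np

def activation_rates(S):
    """Eq. (2.1): p_i = sum_k s_{k,i}^2 / sum_j sum_k s_{k,j}^2."""
    energy = np.sum(S ** 2, axis=1)   # per-feature energy over all patches
    return energy / energy.sum()

def incremental_coding_length(p, eps=1e-12):
    """Eq. (2.3) as printed: ICL(p_i) = dH(p)/dp_i."""
    p = np.clip(p, eps, None)              # guard against log(0)
    H = -np.sum(p * np.log(p))             # entropy of the activation distribution
    return 1.0 - H - p - np.log(p) - p * np.log(p)
```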

The third step is as follows:

① Partition the salient feature set SF according to the incremental coding length index obtained for each feature:

$$SF = \{\, i \mid \mathrm{ICL}(p_i) > 0 \,\} \qquad (3.1)$$

The partition $\{SF, \overline{SF}\}$ uniquely determines the features that increase the entropy of the whole system, and it has a clear mathematical meaning: a feature belongs to SF only if it is rare in the feature distribution — that is, only if a new observation of that feature increases the entropy of the overall feature distribution $p$.

② Following the predictive coding principle, redistribute energy among the features. For a feature $i$ in the salient feature set, assign the weight $d_i$ ($i \in SF$):

$$d_i = \frac{\mathrm{ICL}(p_i)}{\sum_{k \in SF} \mathrm{ICL}(p_k)}, \quad \text{if } i \in SF \qquad (3.2)$$

For non-salient features, define $d_k = 0$ ($k \notin SF$).

③ For an image patch $x_k$, its saliency is defined as $m_k$:

$$m_k = \sum_{i \in SF} d_i\, w_i^{\mathsf{T}} x_k \qquad (3.3)$$

④ Given the saliency of each image patch, the saliency map M of the whole image is generated through the reconstruction basis A:

$$M = \sum_{k \in SF} A_k m_k \qquad (3.4)$$

where $A_k$ denotes the $k$-th column vector of the reconstruction basis A.

Formula (3.3) shows that the saliency of an image patch is not constant but changes over time. Moreover, since sampling in the method of the present invention is a continuous process and the feature weights change continuously as sampling accumulates, the sampling changes can be understood as the influence of context on the attention weights of the features. So-called "salient features" are salient only relative to the feature distribution of the current context.
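Continuing the sketch, the third step could look like the following; placing the per-patch saliencies back into image space, Eq. (3.4), is shown here in a simplified form (reshaping onto the patch grid), which is an assumption of this illustration rather than the patent's exact procedure.

```python
import numpy as np

def patch_saliency(patches, W, icl):
    """Eqs. (3.1)-(3.3): salient feature set, weights, patch saliencies."""
    sf = icl > 0                              # Eq. (3.1): SF = {i | ICL(p_i) > 0}
    d = np.zeros_like(icl)
    d[sf] = icl[sf] / icl[sf].sum()           # Eq. (3.2): weights d_i, zero outside SF
    S = W @ patches.T                         # s_{k,i} = w_i^T x_k
    return d @ S                              # Eq. (3.3): m_k = sum_i d_i s_{k,i}

def saliency_grid(m_values, grid_h, grid_w):
    """Simplified stand-in for Eq. (3.4): arrange the m_k on the patch grid,
    assuming patches were extracted in row-major order."""
    return m_values.reshape(grid_h, grid_w)
```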

The beneficial effects of the present invention are: (1) Because the filter basis is trained in advance, the basis functions need not be retrained when a new input image is processed, so processing is very fast and efficient and can be done in real time. (2) Because saliency is analyzed on the features themselves rather than on differences in their spatial distribution, restrictions on the spatial structure of the image are removed. In processing, continuous sampling eliminates the "time slice", so data from different frames (times) can jointly guide the saliency computation; this solves the problem that the saliency of different frames (times) had to be processed independently, and achieves dynamic behavior.

Brief Description of the Drawings

Figure 1. Saliency maps of static images;

wherein (a), (d), (g) are input images; (b), (e), (h) are saliency maps generated by the present invention; and (c), (f), (i) are annotated eye-movement data.

Figure 2. Saliency maps for video (dynamic vision).

Detailed Description of the Embodiments

An embodiment of the present invention is described in detail below with reference to the accompanying drawings. The embodiment is implemented on the premise of the technical solution of the present invention, and detailed implementation modes and specific operating procedures are given, but the protection scope of the present invention is not limited to the following embodiment.

1. Feature Construction

(1) RGB color patches of size 8×8 are used. Sampling a large number of natural images yields 120,000 8×8 RGB color patches, which serve as the training data for the basis functions.

(2) The basis functions (A, W) are trained with the ICA method. Since 8×8 RGB color patches are used as training samples, i.e. m = 8, the number of basis functions is 3×8² = 192.

(3) An input color image of size, say, 800×640 is divided into 8000 8×8 RGB color patches, i.e. n = 8000, forming the sampling matrix $X = [x_1, x_2, \ldots, x_{8000}]$, where $x_k$ is the vectorized representation of the $k$-th patch. The linear transform $S_k = W x_k = [s_{k,1}, s_{k,2}, \ldots, s_{k,192}]$ is applied, where W is the trained filter basis. Then $S_k$ is the vector of basis-function coefficients, i.e. the feature of patch $x_k$, and $s_{k,i}$, the coefficient of the $i$-th basis function, is the $i$-th feature.

2. Measuring the Incremental Coding Length (ICL) Index

(1) For each feature, compute its activation rate p according to formula (2.1).

(2) From the activation rate of each feature, measure its incremental coding length index according to formula (2.3).

3. Generating the Saliency Map

(1) From the incremental coding length indices obtained in step 2, partition the salient feature set SF using formula (3.1).

(2) Using formula (3.2), redistribute the energy of the features within the salient feature set.

(3) For each image patch $x_k$, compute its saliency $m_k$ according to formula (3.3).

(4) With the saliency of every patch of the input image, obtain the saliency map M of the input image using formula (3.4).

Example 1: Saliency Maps of Still Images

8×8 RGB patches are used to train the basis functions (A, W); their dimension is 192.

An input image of size 800×640 is divided into 8000 8×8 RGB color patches, i.e. n = 8000, forming the sampling matrix $X = [x_1, x_2, \ldots, x_{8000}]$. The coefficients of the basis functions — the features of X — are computed by the formula S = WX.

Each feature's activation rate p is obtained by formula (2.1), and from p and formula (2.3) the incremental coding length index of each feature is measured.

The salient feature set SF is partitioned from the incremental coding length indices of the features using formula (3.1), and formula (3.2) redistributes the energy of the features inside it. Then, for each image patch $x_k$, formula (3.3) gives its saliency $m_k$, and finally formula (3.4) generates the saliency map M of the input image.
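Chaining the sketches above, the end-to-end flow of this example would look roughly as follows (image loading and patch extraction are omitted; the names are the hypothetical ones introduced earlier, and an 800×640 image gives a 80×100 grid of 8×8 patches):

```python
# training_patches: (120000, 192); patches: (8000, 192) vectorized 8x8 RGB patches
A, W = train_basis(training_patches, m=8)    # A is used by Eq. (3.4) in the patent
S = extract_features(patches, W)             # S = WX, shape (192, 8000)
p = activation_rates(S)                      # Eq. (2.1)
icl = incremental_coding_length(p)           # Eq. (2.3)
m = patch_saliency(patches, W, icl)          # Eqs. (3.1)-(3.3), shape (8000,)
M = saliency_grid(m, grid_h=80, grid_w=100)  # simplified Eq. (3.4)
```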

When the image patches of a static picture are sampled sequentially, the feature distribution of the picture can be estimated and the saliency map constructed; the generated saliency map is then compared with human eye-movement data to verify the correctness of the model. In Figure 1, (a), (d), (g) are input images; (b), (e), (h) are saliency maps generated by the present invention; and (c), (f), (i) are annotated eye-movement data. In this embodiment, the eye-movement data provided in "BRUCE N, TSOTSOS J. Saliency Based on Information Maximization [J]. Advances in Neural Information Processing Systems, 2006, 18: 155-162" are used as the benchmark. Comparing this model with traditional models, the results show that the present invention achieves the best score.

Example 2: Saliency Maps in Video

Compared with previous methods of the same kind, a major advantage of the method of the present invention is that it is continuous: the incremental coding length is updated continuously. Changes in the distribution of feature activation rates can be spatial or temporal. If the temporal change is taken to follow a Laplacian distribution, and $p_t$ denotes frame $t$, then $p_t$ can be regarded as the cumulative sum of previous feature responses:

$$p_t = \frac{1}{Z} \sum_{\tau=0}^{t-1} \exp\!\left(\frac{\tau - t}{\lambda}\right) \hat{p}_\tau$$

where $\lambda$ is the half-life and $Z = \int \hat{p}_t(x)\, dx$ is the normalization function.
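A sketch of this cumulative update, with history a list of the per-frame activation distributions $\hat{p}_\tau$; the half-life value and the discrete normalization (dividing by the sum rather than evaluating the integral Z) are assumptions of the illustration.

```python
import numpy as np

def cumulative_distribution(history, lam=10.0):
    """p_t = (1/Z) * sum_{tau=0}^{t-1} exp((tau - t)/lam) * p_hat_tau."""
    t = len(history)                              # assumes at least one frame
    weights = np.exp((np.arange(t) - t) / lam)    # exponential (Laplacian) decay
    p_t = sum(w * p_hat for w, p_hat in zip(weights, history))
    return p_t / p_t.sum()                        # discrete stand-in for 1/Z
```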

When extracting visual attention from video, one usually faces the problems of target motion and observer (viewpoint) motion. Under the present feature-based attention framework, however, these problems are readily solved, because the features always move with the position of the object in the field of view.

The signal-to-noise ratio (SNR) of the image is analyzed; it is defined as follows:

$$\mathrm{SNR}(t) = \frac{\sum_{i \in F} m_i^t}{\sum_{j \notin F} m_j^t}$$

where F is the manually annotated "foreground". After 250 frames are manually annotated, the saliency of each frame is computed; apart from the different feature activation rate p, the procedure is identical to generating the saliency map of a still image. The generated saliency maps are then compared with the manual annotations and the SNR values analyzed. Figure 2 shows the result: the first row contains screenshots of the video, the second row shows the SNR of the present invention, and the last row the SNR of the Itti model. As the figure shows, the average SNR of the present invention is 0.4803, far better than the 0.1680 of the current mainstream Itti model.
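For completeness, the SNR used for evaluation can be computed as below, given a per-pixel saliency map and a hand-labeled boolean foreground mask F; this is a direct transcription of the definition above.

```python
import numpy as np

def snr(saliency_map, foreground_mask):
    """SNR = sum of saliency inside F / sum of saliency outside F."""
    fg = saliency_map[foreground_mask].sum()
    bg = saliency_map[~foreground_mask].sum()
    return fg / bg
```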

Claims (1)

1. A feature-based dynamic visual attention region extraction method, characterized by comprising the following steps:

First step: apply independent component analysis to sparsely decompose a large set of natural images, obtaining a set of filter basis functions and a corresponding set of reconstruction basis functions; divide the input image into m×m RGB patches and project them onto the set of filter basis functions to obtain the features of the image, specifically as follows:

① Divide the training images into RGB color patches of m×m pixels and vectorize each patch; sample natural images to obtain a large number of m×m RGB color patches as training samples; the value of m is 8, 16, or 32, m being the side length of each RGB color patch;

② Train the basis functions (A, W) with the standard independent component analysis method; the number of basis functions is m×m×3 = 3m², i.e. $W = [w_1, w_2, \ldots, w_{3m^2}]$, where $w_i$ is the $i$-th filter basis function, the number of basis functions in A is the same as in W, $1 \le i \le 3m^2$, and A, W are the basis functions trained by the ICA method;

③ For an arbitrary image X, divide it into n m×m RGB patches, forming the sampling matrix $X = [x_1, x_2, \ldots, x_n]$, where $x_k$ is the vectorized representation of the $k$-th image patch, $1 \le k \le n$; apply the linear transform $S_k = W x_k = [s_{k,1}, s_{k,2}, \ldots, s_{k,3m^2}]$, where W is the trained filter basis; then $S_k$ is the vector of basis-function coefficients, i.e. the feature of image patch $x_k$, and $s_{k,i}$, the coefficient of the $i$-th basis function, is the value of the $i$-th feature; doing the same for every $x_k$ gives the features of X, $S = [S_1, S_2, \ldots, S_n]$, where n, the number of RGB patches into which the input image X is divided, is determined by the size of X and the value of m;

Second step: measure the incremental coding length index for each feature, specifically as follows:

① For each feature $i$, compute its activation rate $p_i = \frac{\sum_k s_{k,i}^2}{\sum_j \sum_k s_{k,j}^2}$, and let $p = \{p_1, p_2, \ldots, p_{3m^2}\}$; then p is the probability density distribution of a random variable, with entropy H(p);

② Compute the incremental coding length of the $i$-th feature, $\mathrm{ICL}(p_i) = \frac{\partial H(p)}{\partial p_i}$;

Third step: based on these incremental coding length indices, compute the saliency of each patch by redistributing energy among the features, finally obtaining the saliency map, specifically as follows:

① Partition the salient feature set SF according to the obtained incremental coding length index of each feature:

$$SF = \{\, i \mid \mathrm{ICL}(p_i) > 0 \,\}$$

② Following the predictive coding principle, redistribute energy among the features; for a feature $i$ in the salient feature set, assign the weight $d_i$, $i \in SF$:

$$d_i = \frac{\mathrm{ICL}(p_i)}{\sum_{k \in SF} \mathrm{ICL}(p_k)}, \quad \text{if } i \in SF$$

and for non-salient features, define the weight $d_k = 0$, $k \notin SF$;

③ Then for an image patch $x_k$, its saliency is defined as $m_k = \sum_{i \in SF} d_i\, w_i^{\mathsf{T}} x_k$, where $d_i$ is the weight assigned to the $i$-th feature in step ②;

④ Given the saliency of each image patch, generate the saliency map M of the whole image through the reconstruction basis A: $M = \sum_{k \in SF} A_k m_k$, where $A_k$ denotes the $k$-th column vector of the reconstruction basis A.
CN2009100466886A 2009-02-26 2009-02-26 Dynamic vision caution region extracting method based on characteristic Expired - Fee Related CN101493890B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100466886A CN101493890B (en) 2009-02-26 2009-02-26 Dynamic vision caution region extracting method based on characteristic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100466886A CN101493890B (en) 2009-02-26 2009-02-26 Dynamic vision caution region extracting method based on characteristic

Publications (2)

Publication Number Publication Date
CN101493890A CN101493890A (en) 2009-07-29
CN101493890B true CN101493890B (en) 2011-05-11

Family

ID=40924482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100466886A Expired - Fee Related CN101493890B (en) 2009-02-26 2009-02-26 Dynamic vision caution region extracting method based on characteristic

Country Status (1)

Country Link
CN (1) CN101493890B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493890B (en) * 2009-02-26 2011-05-11 上海交通大学 Dynamic vision caution region extracting method based on characteristic
US20120288003A1 (en) * 2010-01-15 2012-11-15 Thomson Licensing Llc Video coding using compressive sensing
CN101840518A (en) * 2010-04-02 2010-09-22 中国科学院自动化研究所 Biological vision mechanism-based object training and identifying method
EP3958573B1 (en) 2010-04-13 2023-06-07 GE Video Compression, LLC Video coding using multi-tree sub-divisions of images
TWI864983B (en) 2010-04-13 2024-12-01 美商Ge影像壓縮有限公司 Sample region merging
CN106067985B (en) 2010-04-13 2019-06-28 Ge视频压缩有限责任公司 Cross-Plane Prediction
BR122020008249B1 (en) 2010-04-13 2021-02-17 Ge Video Compression, Llc inheritance in a multitree subdivision arrangement sample
CN101866484B (en) * 2010-06-08 2012-07-04 华中科技大学 Method for computing significance degree of pixels in image
TWI478099B (en) * 2011-07-27 2015-03-21 Univ Nat Taiwan Learning visual attention prediction system and method thereof
CN102568016B (en) * 2012-01-03 2013-12-25 西安电子科技大学 Compressive sensing image target reconstruction method based on visual attention
CN104778704B (en) * 2015-04-20 2017-07-21 北京航空航天大学 Image attention method for detecting area based on random pan figure sparse signal reconfiguring
CN105426399A (en) * 2015-10-29 2016-03-23 天津大学 Eye movement based interactive image retrieval method for extracting image area of interest

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493890A (en) * 2009-02-26 2009-07-29 上海交通大学 Dynamic vision caution region extracting method based on characteristic

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493890A (en) * 2009-02-26 2009-07-29 上海交通大学 Dynamic vision caution region extracting method based on characteristic

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LAURENT I, CHRISTOF K, ERNST N. A model of saliency-based visual attention for rapid scene analysis [J]. IEEE Transactions on PAMI, 1998, 20(11): 1254-1259. *

Also Published As

Publication number Publication date
CN101493890A (en) 2009-07-29

Similar Documents

Publication Publication Date Title
CN101493890B (en) Dynamic vision caution region extracting method based on characteristic
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN104240256B (en) A kind of image significance detection method based on the sparse modeling of stratification
CN108108751B (en) Scene recognition method based on convolution multi-feature and deep random forest
CN112487949B (en) Learner behavior recognition method based on multi-mode data fusion
CN101968850B (en) Method for extracting face feature by simulating biological vision mechanism
CN108334848A (en) A kind of small face identification method based on generation confrontation network
CN105590099B (en) A kind of more people's Activity recognition methods based on improvement convolutional neural networks
CN115797970B (en) Dense pedestrian target detection method and system based on YOLOv5 model
CN111027377B (en) A dual-stream neural network time-series action localization method
CN115375668B (en) Infrared single-frame small target detection method based on attention mechanism
CN116012709B (en) High-resolution remote sensing image building extraction method and system
CN117830783B (en) Sight estimation method based on local super-resolution fusion attention mechanism
CN103605993B (en) Image-to-video face identification method based on distinguish analysis oriented to scenes
CN103699874A (en) Crowd abnormal behavior identification method based on SURF (Speed-Up Robust Feature) stream and LLE (Locally Linear Embedding) sparse representation
CN117975577A (en) A deep fake detection method and system based on facial dynamic integration
Hussien et al. DeepFake video detection using vision transformer
CN110659724A (en) A construction method of target detection convolutional neural network based on target scale range
CN119888219B (en) A method and apparatus for dense tobacco shred segmentation based on an improved DeepLabv3+ network model
CN119339422B (en) A micro-expression recognition method based on visual Transformer
CN114519844A (en) Crowd density estimation method and system based on visual converter
CN101482917B (en) A face recognition system and method based on second-order two-dimensional principal component analysis
CN116168450B (en) An isolated sign language word recognition method based on event camera
CN102609732A (en) Object recognition method based on generalization visual dictionary diagram
CN111696070A (en) Multispectral image fusion power internet of things fault point detection method based on deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110511

Termination date: 20140226