
CN101493890B - Dynamic vision caution region extracting method based on characteristic - Google Patents


Info

Publication number
CN101493890B
CN101493890B
Authority
CN
China
Prior art keywords
feature
saliency
picture
small
basis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100466886A
Other languages
Chinese (zh)
Other versions
CN101493890A (en)
Inventor
侯小笛 (Hou Xiaodi)
祁航 (Qi Hang)
张丽清 (Zhang Liqing)
祝文骏 (Zhu Wenjun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong University filed Critical Shanghai Jiao Tong University
Priority to CN2009100466886A priority Critical patent/CN101493890B/en
Publication of CN101493890A publication Critical patent/CN101493890A/en
Application granted granted Critical
Publication of CN101493890B publication Critical patent/CN101493890B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a feature-based dynamic visual attention region extraction method in the technical field of machine vision. The steps are: first, apply independent component analysis to sparsely decompose a large set of natural images, obtaining a set of filter basis functions and a corresponding set of reconstruction basis functions; divide the input image into m×m RGB patches and project them onto this basis to obtain the features of the image. Second, using the efficient-coding principle, measure an incremental coding length index for each feature. Third, based on these indices, compute the saliency of each patch by redistributing energy among the features, finally obtaining the saliency map. The invention eliminates the "time slice" through continuous sampling, so that data from different frames can jointly guide the saliency computation; this solves the problem that the saliency of different frames had to be processed independently, and achieves dynamic behavior.

Description

Feature-based Dynamic Visual Attention Region Extraction Method

Technical Field

The present invention relates to a method in the technical field of image processing, and specifically to a feature-based dynamic visual attention region extraction method.

Background Art

With the continuous development of artificial intelligence, machine vision is finding more and more applications in everyday life. It uses computers to simulate human visual function, yet it is not merely an extension of the human eye: more importantly, it reproduces part of the function of the human brain — extracting information from images of objective things, processing and understanding it, and ultimately using it for practical detection, measurement, and control. Because machine vision is fast, information-rich, and versatile, it is widely used in quality inspection, identity authentication, object detection and recognition, robotics, autonomous vehicles, and so on.

Engineering can already produce sensors that exceed the human eye in every respect (including field of view, visual acuity, spectral range, and dynamic characteristics), so the exploration of "seeing" has reached a considerable level; a machine vision system, however, needs not only "seeing" but also "perceiving". Because the human selective attention mechanism guarantees the efficiency with which the eye acquires information, it has attracted wide attention and research, and various techniques for extracting visual attention regions have been proposed and broadly applied. For example, extraction of visual attention regions based on the selective attention mechanism can locate regions of interest in an image, and searching those regions first improves the efficiency of object detection and recognition; the located regions of interest also enable efficient image compression (a low compression ratio inside the region of interest, a high ratio elsewhere) and image scaling (the region of interest is scaled up more than other regions), and so on. Because visual attention region extraction offers a great advantage in the efficiency of information acquisition, it frequently appears in the processing pipelines of machine vision systems.

A literature search of the prior art shows that visual attention region extraction originates in the saliency map proposed by Koch and Ullman in 1985; the technique was later refined by Itti and Koch into a complete saliency map framework. See: LAURENT I, CHRISTOF K, ERNST N. A model of saliency-based visual attention for rapid scene analysis [J]. IEEE Transactions on PAMI, 1998, 20(11): 1254-1259. That method is a space-based extraction technique. The image is first split into several parallel channels such as color, orientation, intensity, and texture; information is then extracted from each channel to form feature maps that preserve the topology of the image while recording the strength of the response to each feature. Next, every feature map is filtered by "Mexican hat" (Difference of Gaussians) functions at a series of scales — the function obtained by subtracting two Gaussians of different scales. This function is very sensitive to change but responds very weakly to uniform, diffuse signals, a property of broad biological significance. Finally, a Winner-Take-All competitive network compares the candidate attention regions and produces a map of the saliency of every point in the scene, called the saliency map. Although this method and later space-based analysis techniques perform well in many scenes, they almost inevitably face the following problems: 1) they can only attend to a specific subset of visual cues; 2) the allocation of attention is discontinuous in time. For example, when observing continuous footage, the system cannot take multiple frames into account, so the saliency map must be re-analyzed independently at every instant, which greatly reduces the continuity and reliability of the system. Moreover, when the viewing angle or the position of an object changes, the prediction of the new saliency map is likely to drift from the previous frame, since there is no mechanism for tracking features. In addition, a range of visual attention behaviors, such as inhibition of return and viewpoint shifts, cannot be realized well in space-based analysis techniques.
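To make the prior-art center-surround step concrete, the following is a minimal sketch of the Difference-of-Gaussians ("Mexican hat") filter described above, assuming NumPy and SciPy are available; the two scale values are illustrative choices, not parameters prescribed by the Itti-Koch model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(feature_map, sigma_center=1.0, sigma_surround=4.0):
    """Subtract two Gaussian blurs of different scales.

    The result responds strongly to local change and only weakly to
    uniform, diffuse signals, which is why space-based saliency models
    use it as a center-surround contrast operator.
    """
    center = gaussian_filter(feature_map, sigma=sigma_center)
    surround = gaussian_filter(feature_map, sigma=sigma_surround)
    return center - surround
```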

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing a feature-based dynamic visual attention region extraction method. The method defines saliency on the features themselves rather than on differences in their spatial distribution, which eliminates the "time slice": sampling is continuous, so data from different frames (times) can jointly guide the saliency computation. This solves the problem that the saliency of different frames (times) had to be processed independently, and achieves dynamic behavior.

The present invention is realized through the following technical solution, comprising the following steps:

First step: apply independent component analysis to sparsely decompose a large set of natural images, obtaining a set of filter basis functions and a corresponding set of reconstruction basis functions; divide the input image into m×m RGB patches and project them onto this basis to obtain the features of the image.

Second step: using the efficient-coding principle — when a system encodes efficiently, its entropy is maximal — measure an incremental coding length index for each feature.

Third step: based on these incremental coding length indices, compute the saliency of each patch by redistributing energy among the features, finally obtaining the saliency map.

The first step is as follows:

① Divide the training images into RGB color patches of m×m pixels and vectorize each patch. Sampling natural images in this way yields a large number of m×m RGB color patches, which serve as training samples. m, the side length of each RGB patch, may be 8, 16, or 32.

② Train the basis functions (A, W) with the standard independent component analysis (ICA) method. The number of basis functions is m×m×3 = 3m², i.e. $W = [w_1, w_2, \ldots, w_{3m^2}]$, where $w_i$ is the $i$-th filter basis function ($A$ has the same size as $W$, $1 \le i \le 3m^2$). A and W are the basis functions trained by the ICA method; their values may take any range, determined by the input.

③ For an arbitrary image X, divide it into n m×m RGB patches, forming the sampling matrix $X = [x_1, x_2, \ldots, x_n]$, where $x_k$ is the vectorized representation of the $k$-th patch ($1 \le k \le n$). Apply the linear transform $S_k = W x_k = [s_{k,1}, s_{k,2}, \ldots, s_{k,3m^2}]$, where W is the trained filter basis. Then $S_k$ is the vector of basis-function coefficients, i.e. the feature of patch $x_k$, and $s_{k,i}$ — the coefficient of the $i$-th basis function — is the value of the $i$-th feature. Doing the same for every $x_k$ gives the features of X: $S = [S_1, S_2, \ldots, S_n]$. Here n, the number of RGB patches into which the input image X is divided, is determined by the size of X and the value of m.

After the first step, 3m² features S have been constructed for the input image X; the second step follows.
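By way of illustration only, the first step might be sketched in Python as below, using scikit-learn's FastICA as a stand-in for the standard ICA training the patent refers to; the function and variable names are hypothetical, not from the patent.

```python
import numpy as np
from sklearn.decomposition import FastICA

def train_basis(training_patches, m=8):
    """Train the filter basis W and the reconstruction basis A from
    vectorized m x m RGB patches (one patch per row, 3*m*m columns)."""
    ica = FastICA(n_components=3 * m * m, max_iter=500)
    ica.fit(training_patches)
    W = ica.components_   # filter basis: row i is w_i
    A = ica.mixing_       # reconstruction basis: column k is A_k
    return A, W

def extract_features(patches, W):
    """S_k = W x_k for every patch; returns S with S[i, k] = s_{k,i}."""
    return W @ patches.T
```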

The second step is as follows:

① For each feature $i$, compute the activation rate $p_i$:

$$p_i = \frac{\sum_k s_{k,i}^2}{\sum_j \sum_k s_{k,j}^2} \qquad (2.1)$$

This quantity represents the average energy emission level of the feature.

② Consider the change of the entropy with respect to the activation rate $p_i$ of the $i$-th feature, i.e. the incremental coding length index of the $i$-th feature. Let $p = \{p_1, p_2, \ldots, p_{3m^2}\}$ be the probability distribution of a random variable. Suppose the feature activation distribution at a particular moment is $p$; when the $i$-th feature is activated, it adds a small perturbation $\epsilon$ to $p_i$, so the new distribution $\hat{p}$ becomes:

$$\hat{p}_j = \begin{cases} \dfrac{p_j + \epsilon}{1 + \epsilon}, & \text{if } j = i \\[1.5ex] \dfrac{p_j}{1 + \epsilon}, & \text{if } j \neq i \end{cases} \qquad (2.2)$$

Therefore, the incremental coding length of the $i$-th feature is:

$$\mathrm{ICL}(p_i) = \frac{\partial H(p)}{\partial p_i} = 1 - H(p) - p_i - \log p_i - p_i \log p_i \qquad (2.3)$$

By means of the predictive coding principle, the present invention links energy, features, and saliency. The incremental coding length (ICL) measures the rate of change of the perceptual entropy for each feature. This index guides the energy allocation so that the system as a whole realizes predictive coding: common information elicits as little response from the system as possible, while rare information usually triggers a strong response.
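A small sketch of the second step, under the assumption that S is the feature matrix with S[i, k] = s_{k,i} from the first step; the ICL follows Eqs. (2.1) and (2.3) as printed above, with an epsilon guard added against log(0).

```python
import numpy as np

def activation_rates(S):
    """Eq. (2.1): p_i = sum_k s_{k,i}^2 / sum_j sum_k s_{k,j}^2."""
    energy = np.sum(S ** 2, axis=1)   # per-feature energy over all patches
    return energy / energy.sum()

def incremental_coding_length(p, eps=1e-12):
    """Eq. (2.3) as printed: ICL(p_i) = dH(p)/dp_i."""
    p = np.clip(p, eps, None)              # guard against log(0)
    H = -np.sum(p * np.log(p))             # entropy of the activation distribution
    return 1.0 - H - p - np.log(p) - p * np.log(p)
```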

The third step is as follows:

① Partition the salient feature set SF according to the incremental coding length index obtained for each feature:

$$SF = \{\, i \mid \mathrm{ICL}(p_i) > 0 \,\} \qquad (3.1)$$

The partition $\{SF, \overline{SF}\}$ uniquely determines the features that increase the entropy of the whole system, and it has a clear mathematical meaning: a feature belongs to SF only if it is rare in the feature distribution — that is, only if a new observation of that feature increases the entropy of the overall feature distribution $p$.

② Following the predictive coding principle, redistribute energy among the features. For a feature $i$ in the salient feature set, assign the weight $d_i$ ($i \in SF$):

$$d_i = \frac{\mathrm{ICL}(p_i)}{\sum_{k \in SF} \mathrm{ICL}(p_k)}, \quad \text{if } i \in SF \qquad (3.2)$$

For non-salient features, define $d_k = 0$ ($k \notin SF$).

③ For an image patch $x_k$, its saliency is defined as $m_k$:

$$m_k = \sum_{i \in SF} d_i\, w_i^{\mathsf{T}} x_k \qquad (3.3)$$

④ Given the saliency of each image patch, the saliency map M of the whole image is generated through the reconstruction basis A:

$$M = \sum_{k \in SF} A_k m_k \qquad (3.4)$$

where $A_k$ denotes the $k$-th column vector of the reconstruction basis A.

Formula (3.3) shows that the saliency of an image patch is not constant but changes over time. Moreover, since sampling in the method of the present invention is a continuous process and the feature weights change continuously as sampling accumulates, the sampling changes can be understood as the influence of context on the attention weights of the features. So-called "salient features" are salient only relative to the feature distribution of the current context.
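Continuing the sketch, the third step could look like the following; placing the per-patch saliencies back into image space, Eq. (3.4), is shown here in a simplified form (reshaping onto the patch grid), which is an assumption of this illustration rather than the patent's exact procedure.

```python
import numpy as np

def patch_saliency(patches, W, icl):
    """Eqs. (3.1)-(3.3): salient feature set, weights, patch saliencies."""
    sf = icl > 0                              # Eq. (3.1): SF = {i | ICL(p_i) > 0}
    d = np.zeros_like(icl)
    d[sf] = icl[sf] / icl[sf].sum()           # Eq. (3.2): weights d_i, zero outside SF
    S = W @ patches.T                         # s_{k,i} = w_i^T x_k
    return d @ S                              # Eq. (3.3): m_k = sum_i d_i s_{k,i}

def saliency_grid(m_values, grid_h, grid_w):
    """Simplified stand-in for Eq. (3.4): arrange the m_k on the patch grid,
    assuming patches were extracted in row-major order."""
    return m_values.reshape(grid_h, grid_w)
```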

The beneficial effects of the present invention are: (1) Because the filter basis is trained in advance, the basis functions need not be retrained when a new input image is processed, so processing is very fast and efficient and can be done in real time. (2) Because saliency is analyzed on the features themselves rather than on differences in their spatial distribution, restrictions on the spatial structure of the image are removed. In processing, continuous sampling eliminates the "time slice", so data from different frames (times) can jointly guide the saliency computation; this solves the problem that the saliency of different frames (times) had to be processed independently, and achieves dynamic behavior.

Brief Description of the Drawings

Figure 1. Saliency maps of static images;

wherein (a), (d), (g) are input images; (b), (e), (h) are saliency maps generated by the present invention; and (c), (f), (i) are annotated eye-movement data.

Figure 2. Saliency maps for video (dynamic vision).

Detailed Description of the Embodiments

An embodiment of the present invention is described in detail below with reference to the accompanying drawings. The embodiment is implemented on the premise of the technical solution of the present invention, and detailed implementation modes and specific operating procedures are given, but the protection scope of the present invention is not limited to the following embodiment.

1. Feature Construction

(1) RGB color patches of size 8×8 are used. Sampling a large number of natural images yields 120,000 8×8 RGB color patches, which serve as the training data for the basis functions.

(2) The basis functions (A, W) are trained with the ICA method. Since 8×8 RGB color patches are used as training samples, i.e. m = 8, the number of basis functions is 3×8² = 192.

(3) An input color image of size, say, 800×640 is divided into 8000 8×8 RGB color patches, i.e. n = 8000, forming the sampling matrix $X = [x_1, x_2, \ldots, x_{8000}]$, where $x_k$ is the vectorized representation of the $k$-th patch. The linear transform $S_k = W x_k = [s_{k,1}, s_{k,2}, \ldots, s_{k,192}]$ is applied, where W is the trained filter basis. Then $S_k$ is the vector of basis-function coefficients, i.e. the feature of patch $x_k$, and $s_{k,i}$, the coefficient of the $i$-th basis function, is the $i$-th feature.

2. Measuring the Incremental Coding Length (ICL) Index

(1) For each feature, compute its activation rate p according to formula (2.1).

(2) From the activation rate of each feature, measure its incremental coding length index according to formula (2.3).

3. Generating the Saliency Map

(1) From the incremental coding length indices obtained in step 2, partition the salient feature set SF using formula (3.1).

(2) Using formula (3.2), redistribute the energy of the features within the salient feature set.

(3) For each image patch $x_k$, compute its saliency $m_k$ according to formula (3.3).

(4) With the saliency of every patch of the input image, obtain the saliency map M of the input image using formula (3.4).

Example 1: Saliency Maps of Still Images

8×8 RGB patches are used to train the basis functions (A, W); their dimension is 192.

An input image of size 800×640 is divided into 8000 8×8 RGB color patches, i.e. n = 8000, forming the sampling matrix $X = [x_1, x_2, \ldots, x_{8000}]$. The coefficients of the basis functions — the features of X — are computed by the formula S = WX.

Each feature's activation rate p is obtained by formula (2.1), and from p and formula (2.3) the incremental coding length index of each feature is measured.

The salient feature set SF is partitioned from the incremental coding length indices of the features using formula (3.1), and formula (3.2) redistributes the energy of the features inside it. Then, for each image patch $x_k$, formula (3.3) gives its saliency $m_k$, and finally formula (3.4) generates the saliency map M of the input image.
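Chaining the sketches above, the end-to-end flow of this example would look roughly as follows (image loading and patch extraction are omitted; the names are the hypothetical ones introduced earlier, and an 800×640 image gives a 80×100 grid of 8×8 patches):

```python
# training_patches: (120000, 192); patches: (8000, 192) vectorized 8x8 RGB patches
A, W = train_basis(training_patches, m=8)    # A is used by Eq. (3.4) in the patent
S = extract_features(patches, W)             # S = WX, shape (192, 8000)
p = activation_rates(S)                      # Eq. (2.1)
icl = incremental_coding_length(p)           # Eq. (2.3)
m = patch_saliency(patches, W, icl)          # Eqs. (3.1)-(3.3), shape (8000,)
M = saliency_grid(m, grid_h=80, grid_w=100)  # simplified Eq. (3.4)
```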

When the image patches of a static picture are sampled sequentially, the feature distribution of the picture can be estimated and the saliency map constructed; the generated saliency map is then compared with human eye-movement data to verify the correctness of the model. In Figure 1, (a), (d), (g) are input images; (b), (e), (h) are saliency maps generated by the present invention; and (c), (f), (i) are annotated eye-movement data. In this embodiment, the eye-movement data provided in "BRUCE N, TSOTSOS J. Saliency Based on Information Maximization [J]. Advances in Neural Information Processing Systems, 2006, 18: 155-162" are used as the benchmark. Comparing this model with traditional models, the results show that the present invention achieves the best score.

Example 2: Saliency Maps in Video

Compared with previous methods of the same kind, a major advantage of the method of the present invention is that it is continuous: the incremental coding length is updated continuously. Changes in the distribution of feature activation rates can be spatial or temporal. If the temporal change is taken to follow a Laplacian distribution, and $p_t$ denotes frame $t$, then $p_t$ can be regarded as the cumulative sum of previous feature responses:

$$p_t = \frac{1}{Z} \sum_{\tau=0}^{t-1} \exp\!\left(\frac{\tau - t}{\lambda}\right) \hat{p}_\tau$$

where $\lambda$ is the half-life and $Z = \int \hat{p}_t(x)\, dx$ is the normalization function.
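A sketch of this cumulative update, with history a list of the per-frame activation distributions $\hat{p}_\tau$; the half-life value and the discrete normalization (dividing by the sum rather than evaluating the integral Z) are assumptions of the illustration.

```python
import numpy as np

def cumulative_distribution(history, lam=10.0):
    """p_t = (1/Z) * sum_{tau=0}^{t-1} exp((tau - t)/lam) * p_hat_tau."""
    t = len(history)                              # assumes at least one frame
    weights = np.exp((np.arange(t) - t) / lam)    # exponential (Laplacian) decay
    p_t = sum(w * p_hat for w, p_hat in zip(weights, history))
    return p_t / p_t.sum()                        # discrete stand-in for 1/Z
```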

When extracting visual attention from video, one usually faces the problems of target motion and observer (viewpoint) motion. Under the present feature-based attention framework, however, these problems are readily solved, because the features always move with the position of the object in the field of view.

The signal-to-noise ratio (SNR) of the image is analyzed; it is defined as follows:

$$\mathrm{SNR}(t) = \frac{\sum_{i \in F} m_i^t}{\sum_{j \notin F} m_j^t}$$

where F is the manually annotated "foreground". After 250 frames are manually annotated, the saliency of each frame is computed; apart from the different feature activation rate p, the procedure is identical to generating the saliency map of a still image. The generated saliency maps are then compared with the manual annotations and the SNR values analyzed. Figure 2 shows the result: the first row contains screenshots of the video, the second row shows the SNR of the present invention, and the last row the SNR of the Itti model. As the figure shows, the average SNR of the present invention is 0.4803, far better than the 0.1680 of the current mainstream Itti model.
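For completeness, the SNR used for evaluation can be computed as below, given a per-pixel saliency map and a hand-labeled boolean foreground mask F; this is a direct transcription of the definition above.

```python
import numpy as np

def snr(saliency_map, foreground_mask):
    """SNR = sum of saliency inside F / sum of saliency outside F."""
    fg = saliency_map[foreground_mask].sum()
    bg = saliency_map[~foreground_mask].sum()
    return fg / bg
```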

Claims (1)

1. A feature-based dynamic visual attention region extraction method, characterized by comprising the following steps:

First step: apply independent component analysis to sparsely decompose a large set of natural images, obtaining a set of filter basis functions and a corresponding set of reconstruction basis functions; divide the input image into m×m RGB patches and project them onto the set of filter basis functions to obtain the features of the image, specifically as follows:

① Divide the training images into RGB color patches of m×m pixels and vectorize each patch; sample natural images to obtain a large number of m×m RGB color patches as training samples; the value of m is 8, 16, or 32, m being the side length of each RGB color patch;

② Train the basis functions (A, W) with the standard independent component analysis method; the number of basis functions is m×m×3 = 3m², i.e. $W = [w_1, w_2, \ldots, w_{3m^2}]$, where $w_i$ is the $i$-th filter basis function, the number of basis functions in A is the same as in W, $1 \le i \le 3m^2$, and A, W are the basis functions trained by the ICA method;

③ For an arbitrary image X, divide it into n m×m RGB patches, forming the sampling matrix $X = [x_1, x_2, \ldots, x_n]$, where $x_k$ is the vectorized representation of the $k$-th image patch, $1 \le k \le n$; apply the linear transform $S_k = W x_k = [s_{k,1}, s_{k,2}, \ldots, s_{k,3m^2}]$, where W is the trained filter basis; then $S_k$ is the vector of basis-function coefficients, i.e. the feature of image patch $x_k$, and $s_{k,i}$, the coefficient of the $i$-th basis function, is the value of the $i$-th feature; doing the same for every $x_k$ gives the features of X, $S = [S_1, S_2, \ldots, S_n]$, where n, the number of RGB patches into which the input image X is divided, is determined by the size of X and the value of m;

Second step: measure the incremental coding length index for each feature, specifically as follows:

① For each feature $i$, compute its activation rate $p_i = \frac{\sum_k s_{k,i}^2}{\sum_j \sum_k s_{k,j}^2}$, and let $p = \{p_1, p_2, \ldots, p_{3m^2}\}$; then p is the probability density distribution of a random variable, with entropy H(p);

② Compute the incremental coding length of the $i$-th feature, $\mathrm{ICL}(p_i) = \frac{\partial H(p)}{\partial p_i}$;

Third step: based on these incremental coding length indices, compute the saliency of each patch by redistributing energy among the features, finally obtaining the saliency map, specifically as follows:

① Partition the salient feature set SF according to the obtained incremental coding length index of each feature:

$$SF = \{\, i \mid \mathrm{ICL}(p_i) > 0 \,\}$$

② Following the predictive coding principle, redistribute energy among the features; for a feature $i$ in the salient feature set, assign the weight $d_i$, $i \in SF$:

$$d_i = \frac{\mathrm{ICL}(p_i)}{\sum_{k \in SF} \mathrm{ICL}(p_k)}, \quad \text{if } i \in SF$$

and for non-salient features, define the weight $d_k = 0$, $k \notin SF$;

③ Then for an image patch $x_k$, its saliency is defined as $m_k = \sum_{i \in SF} d_i\, w_i^{\mathsf{T}} x_k$, where $d_i$ is the weight assigned to the $i$-th feature in step ②;

④ Given the saliency of each image patch, generate the saliency map M of the whole image through the reconstruction basis A: $M = \sum_{k \in SF} A_k m_k$, where $A_k$ denotes the $k$-th column vector of the reconstruction basis A.
CN2009100466886A 2009-02-26 2009-02-26 Dynamic vision caution region extracting method based on characteristic Expired - Fee Related CN101493890B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100466886A CN101493890B (en) 2009-02-26 2009-02-26 Dynamic vision caution region extracting method based on characteristic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100466886A CN101493890B (en) 2009-02-26 2009-02-26 Dynamic vision caution region extracting method based on characteristic

Publications (2)

Publication Number Publication Date
CN101493890A CN101493890A (en) 2009-07-29
CN101493890B true CN101493890B (en) 2011-05-11

Family

ID=40924482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100466886A Expired - Fee Related CN101493890B (en) 2009-02-26 2009-02-26 Dynamic vision caution region extracting method based on characteristic

Country Status (1)

Country Link
CN (1) CN101493890B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493890B (en) * 2009-02-26 2011-05-11 上海交通大学 Dynamic vision caution region extracting method based on characteristic
US20120288003A1 (en) * 2010-01-15 2012-11-15 Thomson Licensing Llc Video coding using compressive sensing
CN101840518A (en) * 2010-04-02 2010-09-22 中国科学院自动化研究所 Biological vision mechanism-based object training and identifying method
EP3958573B1 (en) 2010-04-13 2023-06-07 GE Video Compression, LLC Video coding using multi-tree sub-divisions of images
TWI864983B (en) 2010-04-13 2024-12-01 美商Ge影像壓縮有限公司 Sample region merging
CN106067985B (en) 2010-04-13 2019-06-28 Ge视频压缩有限责任公司 Cross-Plane Prediction
BR122020008249B1 (en) 2010-04-13 2021-02-17 Ge Video Compression, Llc inheritance in a multitree subdivision arrangement sample
CN101866484B (en) * 2010-06-08 2012-07-04 华中科技大学 Method for computing significance degree of pixels in image
TWI478099B (en) * 2011-07-27 2015-03-21 Univ Nat Taiwan Learning visual attention prediction system and method thereof
CN102568016B (en) * 2012-01-03 2013-12-25 西安电子科技大学 Compressive sensing image target reconstruction method based on visual attention
CN104778704B (en) * 2015-04-20 2017-07-21 北京航空航天大学 Image attention method for detecting area based on random pan figure sparse signal reconfiguring
CN105426399A (en) * 2015-10-29 2016-03-23 天津大学 Eye movement based interactive image retrieval method for extracting image area of interest

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493890A (en) * 2009-02-26 2009-07-29 上海交通大学 Dynamic vision caution region extracting method based on characteristic

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493890A (en) * 2009-02-26 2009-07-29 上海交通大学 Dynamic vision caution region extracting method based on characteristic

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LAURENT I, CHRISTOF K, ERNST N. A model of saliency-based visual attention for rapid scene analysis [J]. IEEE Transactions on PAMI, 1998, 20(11): 1254-1259. *

Also Published As

Publication number Publication date
CN101493890A (en) 2009-07-29

Similar Documents

Publication Publication Date Title
CN101493890B (en) Dynamic vision caution region extracting method based on characteristic
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN104240256B (en) A kind of image significance detection method based on the sparse modeling of stratification
CN108108751B (en) Scene recognition method based on convolution multi-feature and deep random forest
CN112487949B (en) Learner behavior recognition method based on multi-mode data fusion
CN101968850B (en) Method for extracting face feature by simulating biological vision mechanism
CN108334848A (en) A kind of small face identification method based on generation confrontation network
CN105590099B (en) A kind of more people's Activity recognition methods based on improvement convolutional neural networks
CN115797970B (en) Dense pedestrian target detection method and system based on YOLOv5 model
CN111027377B (en) A dual-stream neural network time-series action localization method
CN115375668B (en) Infrared single-frame small target detection method based on attention mechanism
CN116012709B (en) High-resolution remote sensing image building extraction method and system
CN117830783B (en) Sight estimation method based on local super-resolution fusion attention mechanism
CN103605993B (en) Image-to-video face identification method based on distinguish analysis oriented to scenes
CN103699874A (en) Crowd abnormal behavior identification method based on SURF (Speed-Up Robust Feature) stream and LLE (Locally Linear Embedding) sparse representation
CN117975577A (en) A deep fake detection method and system based on facial dynamic integration
Hussien et al. DeepFake video detection using vision transformer
CN110659724A (en) A construction method of target detection convolutional neural network based on target scale range
CN119888219B (en) A method and apparatus for dense tobacco shred segmentation based on an improved DeepLabv3+ network model
CN119339422B (en) A micro-expression recognition method based on visual Transformer
CN114519844A (en) Crowd density estimation method and system based on visual converter
CN101482917B (en) A face recognition system and method based on second-order two-dimensional principal component analysis
CN116168450B (en) An isolated sign language word recognition method based on event camera
CN102609732A (en) Object recognition method based on generalization visual dictionary diagram
CN111696070A (en) Multispectral image fusion power internet of things fault point detection method based on deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110511

Termination date: 20140226