CN102054178A

CN102054178A - Chinese painting image identifying method based on local semantic concept

Info

Publication number: CN102054178A
Application number: CN2011100233154A
Authority: CN
Inventors: 鲍泓; 冯松鹤; 张南; 娄海涛; 王迪菲; 潘卫国
Original assignee: Beijing Union University
Current assignee: Beijing Union University
Priority date: 2011-01-20
Filing date: 2011-01-20
Publication date: 2011-05-11
Anticipated expiration: 2031-01-20
Also published as: CN102054178B

Abstract

The invention relates to a Chinese painting image recognition method based on local semantic concepts, comprising the following steps: 1) using a scanning device to acquire images of the Chinese paintings to be recognized and storing them in a computer; 2) using a random extractor to divide the acquired images into a training sample set and a test sample set; 3) using a visual attention model to extract the salient-region images from the Chinese painting images in the training sample set and the test sample set; 4) building bag-of-words models for the Chinese painting images in the training sample set and for the corresponding salient-region images; 5) constructing, from the two bag-of-words models, a spatial pyramid model for the painting images and one for the salient-region images, and generating the two corresponding spatial pyramid feature histograms; 6) fusing the two spatial pyramid feature histograms generated in step 5) by serial combination; 7) recognizing the Chinese painting images in the test sample set with one or more of a clustering method, the K-nearest-neighbor method, a neural network, and a support vector machine, and outputting the recognition results as a recognition accuracy and a confusion matrix.

Description

A Chinese Painting Image Recognition Method Based on Local Semantic Concepts

Technical Field

The invention relates to an image recognition method, and in particular to a Chinese painting image recognition method based on local semantic concepts.

Background

In recent years, semantic image classification and annotation has become a research focus closely related to CBIR (content-based image retrieval): classifying images sensibly according to high-level semantics not only greatly improves the performance of semantic image retrieval but also bridges the "semantic gap" to some extent. Unlike conventional digital images, Chinese paintings carry semantic information that is richer and more abstract. If Chinese painting images could be classified and annotated automatically, the technique could be widely applied in digital painting-and-calligraphy museums and would become a key technology in major research projects such as digital libraries.

The purpose of image scene classification is to assign an image as a whole to a scene category. Most existing work, in China and abroad, concentrates on scene classification of natural images, that is, on automatically assigning the image to be recognized to one of a set of semantic categories (such as beach or mountain). In digital image research on Chinese painting and calligraphy, classification methods have mainly relied on low-level visual feature representations, with support vector machines, decision trees, and similar algorithms performing the automatic classification. No image classification method based on mid-level semantic modeling has yet been reported for Chinese painting images.

To bridge the semantic gap and handle image recognition in complex scenes, researchers classify image scenes by semantically modeling the scene. Mid-level image representations based on local semantic concepts do not depend on image segmentation results and show good classification performance, and they have become the mainstream approach.

In 2005, Fei-Fei proposed a new Bayesian hierarchical model for natural scene classification. Unlike earlier work, the method does not need a specially annotated training sample set; it represents an image by a bag of words formed by clustering local regions, and it achieved satisfactory classification performance on a large, complex scene set of 13 categories. Quelhas (2005) and Bosch (2006) each combined bag of words with the pLSA model; the two differ in how local descriptors are extracted, the former using sparse SIFT descriptors and the latter dense SIFT descriptors. In 2006, Perronin proposed an image classification approach based on bag of words and GMMs (Gaussian mixture models). The method can describe the image content of all recognized image categories, and it learns improved, adapted class vocabularies by training on representative samples of each class. Earlier visual-vocabulary methods describe an image with a single histogram, whereas the novelty of this method is to describe an image with a set of histograms.

Although effective, none of these methods considers or uses the spatial structure of the image. In a classification system for complex natural scenes, such spatial context (for example, the spatial relationships between neighboring local objects, or the absolute position of objects in certain scenes) can further improve classifier performance and lead to better classification results. In 2006, Lazebnik proposed spatial pyramid matching, a classification algorithm that goes beyond bag of words. The method partitions the image into progressively finer sub-regions, computes a local feature histogram for each sub-region, and represents the image with these histograms. The "spatial pyramid" is a simple and computationally efficient extension of the orderless bag-of-features image representation, and it shows significant improvements on very challenging scene classification problems. For image collections with large background areas, however, its classification results are biased.

Summary of the Invention

In view of the above problems, the object of the invention is to provide a Chinese painting image recognition method based on local semantic concepts that fuses global and local image features.

To achieve this object, the invention adopts the following technical solution: a Chinese painting image recognition method based on local semantic concepts, comprising the following steps: 1) using a scanning device to acquire images of the Chinese paintings to be recognized and storing them in a computer; 2) using a random extractor to divide the acquired images into a training sample set and a test sample set; 3) using a visual attention model to extract the salient-region images from the Chinese painting images in the training sample set and the test sample set; 4) building a bag-of-words model for the Chinese painting images in the training sample set and one for the corresponding salient-region images; 5) constructing, from the painting-image bag-of-words model and the salient-region bag-of-words model built on the training sample set, the spatial pyramid model of the painting images and the spatial pyramid model of the salient-region images, and generating the two corresponding spatial pyramid feature histograms; 6) fusing the two spatial pyramid feature histograms generated in step 5) by serial combination; 7) recognizing the Chinese painting images in the test sample set with one or more of a clustering method, the K-nearest-neighbor method, a neural network, and a support vector machine, and outputting the recognition results as a recognition accuracy and a confusion matrix.

In step 2), the training sample set and the test sample set are generated as follows: ① define the categories of Chinese painting images, numbered 1 to n, where n is a natural number; ② let the representative set of Chinese painting images to be recognized be P, written {P₁, P₂, P₃}, where P₁ denotes flower-and-bird paintings, written P₁ = {A₁, A₂, ..., A_i}, with A_i one of its images; P₂ denotes figure paintings, written P₂ = {B₁, B₂, ..., B_i}, with B_i one of its images; and P₃ denotes landscape paintings, written P₃ = {C₁, C₂, ..., C_i}, with C_i one of its images; ③ randomly select a set number of images from each of P₁, P₂, and P₃ as the training sample set Q, written {P₁′, P₂′, P₃′}, which is used to generate the Chinese painting recognition model; the remaining images in P₁, P₂, and P₃ form the test sample set, which is used for verification.

In step 4), building the bag-of-words models of the Chinese painting images comprises the following steps: ① grayscale conversion: the color images in the training sample set and the salient-region images are converted to grayscale with the formula Gray(i, j) = 0.11*R(i, j) + 0.59*G(i, j) + 0.3*B(i, j), where (i, j) is the position of a pixel in the image, R(i, j) is the red component of the color of that pixel, G(i, j) and B(i, j) are the green and blue components, and Gray(i, j) is the gray level of the pixel after conversion; ② for each grayscale image from step ①, select the key points of the SIFT (scale-invariant feature transform) descriptors, assign each key point an orientation from the distribution of gradient directions of the pixels in its neighborhood, generate the SIFT feature vectors, and illumination-normalize them where required; ③ from the SIFT feature vectors of the original paintings and of the salient-region images obtained in step ②, build a visual vocabulary for each; a visual vocabulary contains K visual words, where K is a natural number, typically 500 to 1200, with K = 1000 recommended; ④ using the two visual vocabularies, extract and represent the local semantic concept features: compute the Euclidean distance between the SIFT feature in the neighborhood of a SIFT key point and the SIFT feature of each visual word in the vocabulary, label the key point with its nearest visual word, map all SIFT key points into the visual vocabulary, and describe the image by the indices of its visual words, which gives the local semantic concept features of the image; these features are represented as a histogram.

In ② of step 4), the key points of the SIFT descriptors are selected as follows: A. the original Chinese painting images are sampled by grid sampling; B. for the salient-region images, scale-space extremum detection is used.

In step 5), constructing the spatial pyramid model comprises the following steps: ① partition the Chinese painting image in the two-dimensional image space into sub-image regions of different sizes, forming the spatial pyramid blocks; the number of pyramid layers is 2 to 5; ② build the corresponding spatial pyramid feature histogram for the resulting blocks.

In step 6), the two spatial pyramid feature histograms are fused by one of two methods. The first, serial combination, concatenates the two feature vectors end to end into a joint vector that serves as the new feature vector, and performs feature extraction in the higher-dimensional vector space. The second, parallel combination, merges the two feature vectors of the same sample into a complex vector and performs feature extraction in the complex vector space.
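The two fusion options can be illustrated with a minimal Python sketch. The function names `serial_fusion` and `parallel_fusion` are hypothetical, and zero-padding the shorter vector in the parallel case is an assumption made here so that the two vectors can be paired element by element; the text does not say how unequal lengths are handled.

```python
def serial_fusion(a, b):
    """Serial combination: concatenate the two feature vectors end to end."""
    return list(a) + list(b)


def parallel_fusion(a, b):
    """Parallel combination: one complex vector, a as real part, b as imaginary part.

    Assumption: the shorter vector is zero-padded to the length of the longer one.
    """
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return [complex(x, y) for x, y in zip(a, b)]


# Toy histograms standing in for the two spatial pyramid feature histograms.
global_hist = [0.2, 0.8]
salient_hist = [0.5, 0.5]
fused_serial = serial_fusion(global_hist, salient_hist)      # length 4, real-valued
fused_parallel = parallel_fusion(global_hist, salient_hist)  # length 2, complex-valued
```

Serial combination doubles the dimensionality but keeps the features real-valued, so it can be passed directly to a standard classifier.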

In step 7), classification with the support vector machine proceeds as follows: ① generate the classifier model: the recognition experiments use the LIBSVM-fast toolkit, and the parameters for training the classifier model are options = '-t4-s0-b1-c1', meaning that the kernel function is the intersection kernel, the SVM type is C-SVC, the C-SVC penalty coefficient is 1, and probability estimates are required; ② output the results for the Chinese painting images to be recognized in the test sample set: process those images with steps 3) to 6) to obtain their feature vectors, feed the vectors into the trained classifier model, and obtain the classification of each image from the classifier's decision function; ③ evaluate the recognition results using recognition accuracy and the confusion matrix.
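The two evaluation measures named in ③ can be computed as in the following Python sketch. It assumes class labels are integers 0 to n-1 and that rows of the matrix index the true class and columns the predicted class; both conventions, and the function names, are choices made for this illustration rather than details given in the text.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """matrix[t][p] counts the samples of true class t that were predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Recognition accuracy: correctly classified samples (the diagonal) over all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels: 0 = flower-and-bird, 1 = figure, 2 = landscape.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy(cm)
```

The off-diagonal entries show which categories are confused with each other, which the single accuracy figure cannot.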

With the above technical solution, the invention has the following advantages. Compared with the natural scene classification method proposed by Lazebnik et al., the invention additionally extracts local salient-region images (local images) from the whole Chinese painting images (global images), uses different methods to extract SIFT key points from the global and local images, and fuses the local semantic concept features of the two. Analyzing the local semantic concept features of the global painting image and the local salient image together yields more, and more discriminative, feature information for classification, and therefore improves the accuracy of Chinese painting image classification. Compared with the Chinese painting scene classification methods based on low-level visual features proposed by James Wang, Jiang Shuqiang, and others, the invention is more extensible. It also extends mid-level semantic modeling classification to Chinese painting images.

Description of the Drawings

Fig. 1 is a block diagram of the modules of the invention.

Fig. 2 is an original Chinese painting image input to the invention.

Fig. 3 is the salient-region image extracted from the original Chinese painting image.

Fig. 4 is a flowchart of the bag-of-words representation model.

Fig. 5 is a flowchart of the algorithm for extracting local SIFT descriptors from the original Chinese painting image.

Fig. 6 is a schematic diagram of the uniform grid sampling method.

Fig. 7 is an example of uniform grid sampling on a Chinese painting image.

Fig. 8 is a flowchart of the algorithm for extracting local SIFT descriptors from the salient-region image.

Fig. 9 illustrates the construction of the spatial pyramid for a Chinese painting image.

Fig. 10 is a flowchart of the classification and recognition process.

Detailed Description

The invention is described in detail below with reference to the drawings and embodiments.

By subject matter, Chinese paintings fall roughly into three categories: figure painting, landscape painting, and flower-bird-and-animal painting. Each category can be divided into subcategories. Figure painting, for example, takes people as its main subject; by source material it divides into religious and secular figure painting, and it can be further subdivided into portraits, narrative paintings, genre paintings, and so on. The Chinese painting image recognition method of the invention, based on local semantic concepts, comprises the following steps:

1) As shown in Fig. 1, the Chinese paintings to be recognized are scanned with a scanning device and stored in a computer. Any existing scanning device may be used; this embodiment uses an Expression10000XL flatbed scanner, and images are saved with 24-bit color depth at 400 dpi resolution in JPEG format.

2) The acquired painting images are fed to a random extractor, which divides them into a training sample set and a test sample set as follows:

① Define the categories of Chinese painting images, numbered 1, 2, …, n, where n is a natural number. This embodiment divides Chinese paintings by subject into three categories, flower-and-bird, figure, and landscape painting, so n = 3 (the description below uses n = 3 as an example, but the method is not limited to it).

② Let the representative set of Chinese painting images to be recognized be P, written {P₁, P₂, P₃}. P₁ denotes flower-and-bird paintings, written P₁ = {A₁, A₂, ..., A_i}, with A_i one of its images; P₂ denotes figure paintings, written P₂ = {B₁, B₂, ..., B_i}, with B_i one of its images; P₃ denotes landscape paintings, written P₃ = {C₁, C₂, ..., C_i}, with C_i one of its images. Here i is the number of images in the corresponding set.

③ Randomly select a set number of images from each of P₁, P₂, and P₃ as the training sample set Q, written {P₁′, P₂′, P₃′}, which is used to generate the Chinese painting recognition model; the remaining images in P₁, P₂, and P₃ form the test sample set, which is used for verification.
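As an illustration of the random extraction in step 2), the following Python sketch draws a fixed number of training images per category and leaves the rest as test images. The function name `split_per_class`, the file names, the category keys, and the fixed seed are all hypothetical; the text only states that a set number of images is chosen at random from each of P₁, P₂, and P₃.

```python
import random


def split_per_class(images_by_class, n_train, seed=0):
    """Randomly pick n_train images per class for training; the remainder is the test set."""
    rng = random.Random(seed)  # fixed seed only so the illustration is repeatable
    train, test = {}, {}
    for label, images in images_by_class.items():
        shuffled = list(images)
        rng.shuffle(shuffled)
        train[label] = shuffled[:n_train]
        test[label] = shuffled[n_train:]
    return train, test


# Hypothetical file names standing in for the scanned paintings in P1, P2, P3.
P = {
    "flower_bird": [f"A{i}.jpg" for i in range(1, 11)],
    "figure": [f"B{i}.jpg" for i in range(1, 11)],
    "landscape": [f"C{i}.jpg" for i in range(1, 11)],
}
Q, Q_test = split_per_class(P, n_train=6)
```

Splitting within each category, rather than over the pooled images, keeps every category represented in both sets.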

3) The original images (Fig. 2) of the training and test sample sets from step 2) are fed to a visual attention model, which extracts from the training sample set Q the set of salient-region images, denoted Q_sal (Fig. 3). Finding the salient objects in an image in this way follows the biological mechanism by which the human visual system selects salient objects in a scene; it retains as much as possible of the main semantic regions of the painting that help classification and discards redundant regions. The visual attention model may be the Itti-Koch visual attention model or Jonathan Harel's GBVS (Graph-Based Visual Saliency) algorithm, but is not limited to these. The Itti-Koch model has two main stages, visual feature extraction and saliency map computation; GBVS is an improvement on the classic Itti-Koch model.

4) As shown in Fig. 4, the bag-of-words models of the Chinese painting images are built from the training sample set Q and the salient-region image set Q_sal obtained in step 3), as follows:

① Grayscale conversion. The color images in the training sample set Q and in the salient-region image set Q_sal are converted to grayscale, giving Q′ and Q_sal′ respectively. The conversion is as follows:

A color image is converted to a grayscale image using the following conventional formula:

Gray(i, j) = 0.11*R(i, j) + 0.59*G(i, j) + 0.3*B(i, j)

where (i, j) is the position of a pixel in the image, R(i, j) is the red component of the color of the pixel at (i, j), G(i, j) and B(i, j) are likewise the green and blue components, and Gray(i, j) is the gray level of that pixel after conversion. Finally, all three RGB components of the pixel are set to Gray(i, j). Applying this to every pixel converts the color image to a grayscale image.
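The grayscale formula can be applied pixel by pixel as in this minimal Python sketch. The image is assumed to be a list of rows of (R, G, B) tuples, and `to_gray` is a hypothetical helper name. The default weights are the ones printed in the text (0.11 on red, 0.3 on blue); the commonly used ITU-R BT.601 luma weights are 0.299 for red, 0.587 for green, and 0.114 for blue, so the weights are left as a parameter.

```python
def to_gray(rgb_image, weights=(0.11, 0.59, 0.3)):
    """Gray(i, j) = wr*R(i, j) + wg*G(i, j) + wb*B(i, j) for every pixel.

    rgb_image: list of rows, each row a list of (R, G, B) tuples.
    weights: (wr, wg, wb); the default reproduces the coefficients given in the text.
    """
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in rgb_image]


# 2x2 toy image: white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_gray(image)
```

Because the three weights sum to 1, a white pixel keeps its full intensity and a black pixel stays at 0.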

② SIFT (scale-invariant feature transform) key points are selected in the grayscale images of Q′ and Q_sal′ obtained in step ①. Each key point is assigned an orientation from the distribution of gradient directions of the pixels in its neighborhood, a SIFT feature vector is generated, and the vectors are illumination-normalized where required.

Extraction of the local SIFT descriptors has two parts: one describes the original Chinese painting images locally, the other describes the salient-region images locally.

As shown in Fig. 5, SIFT key points for the images in Q′ are selected by uniform grid sampling: the image is sampled on a grid of M×M-pixel cells (Fig. 6), where M is an integer power of 2 (8 or 16 is recommended), so that the image is divided into a definite number of cells. With Width and Height the width and height of the image:

X = (Width % M)/2 + 1;

Y = (Height % M)/2 + 1;

X and Y are the coordinates at which grid sampling starts. In total (Width/X)*(Height/Y) uniform grid cells are generated. The grid intersections are taken as SIFT key points, and the neighborhood of a key point is the circle of radius M centered on it. Each key point is assigned an orientation from the distribution of gradient directions of the pixels in its neighborhood, and the SIFT feature vector is generated (Fig. 7).
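The grid key points can be enumerated as in the following Python sketch. It assumes 1-based pixel coordinates (suggested by the "+ 1" in the start formulas), integer division for "/2", and grid intersections spaced M pixels apart from the start point; `grid_keypoints` is a hypothetical name. The sketch only lists key-point positions and does not reproduce the cell-count expression from the text or compute SIFT descriptors.

```python
def grid_keypoints(width, height, m=8):
    """Return the (x, y) grid intersections used as SIFT key points, in 1-based coordinates.

    The start offsets follow X = (Width % M)/2 + 1 and Y = (Height % M)/2 + 1,
    which centres the grid when the image size is not a multiple of M.
    """
    x0 = (width % m) // 2 + 1
    y0 = (height % m) // 2 + 1
    xs = range(x0, width + 1, m)
    ys = range(y0, height + 1, m)
    return [(x, y) for y in ys for x in xs]


# Toy example: a 100x64 image sampled with M = 8.
keypoints = grid_keypoints(100, 64, m=8)
```

Unlike extremum detection, this sampling covers flat regions of the painting as well, so the whole image contributes descriptors.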

As shown in Fig. 8, SIFT key points for the images in Q_sal′ are selected by scale-space extremum detection: local extrema are detected simultaneously in the two-dimensional image plane and in the DoG (difference-of-Gaussian) scale space and taken as key points, which makes the features distinctive and stable. The DoG operator is defined as the difference of two Gaussian kernels at different scales; it is simple to compute and approximates the normalized LoG (Laplacian-of-Gaussian) operator.

The DoG operator is given by:

D(x, y, σ) = (G(x, y, kσ) - G(x, y, σ)) * I(x, y) = L(x, y, kσ) - L(x, y, σ)

where G(x, y, kσ) is the two-dimensional Gaussian function, k is the ratio between the two scale factors, σ is the standard deviation of the Gaussian (σ² being its variance), I(x, y) is the original image, and L is the scale space of the image.

L(x, y, σ) is defined as:

L(x, y, σ) = G(x, y, σ) * I(x, y)

The image I(x, y) is convolved with Gaussian kernels G(x, y, σ) at different scale factors in order to obtain feature points that are stable across scales.

G(x, y, σ) is defined as:

$G(x, y, \sigma) = \frac{1}{2 \pi \sigma^{2}} e^{-(x^{2} + y^{2}) / 2 \sigma^{2}}$

where (x, y) is the pixel position in the image and σ is the scale-space factor: the smaller its value, the less the image is smoothed and the smaller the corresponding scale. Large scales correspond to the overall appearance of the image, small scales to its details.
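The DoG definition can be made concrete with a small pure-Python sketch that builds the kernel G(x, y, kσ) - G(x, y, σ) and evaluates the response D at a single pixel. The value k = √2, the kernel radius of 3kσ, and treating pixels outside the image as 0 are assumptions made for this illustration; the function names are hypothetical, and the extremum search across neighboring scales is not shown.

```python
import math


def gaussian(x, y, sigma):
    """Two-dimensional Gaussian G(x, y, sigma) as defined in the text."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def dog_kernel(sigma, k=math.sqrt(2), radius=None):
    """Sampled kernel G(x, y, k*sigma) - G(x, y, sigma) on a (2*radius+1)^2 grid."""
    if radius is None:
        radius = int(math.ceil(3 * k * sigma))  # assumed cut-off at about 3 standard deviations
    coords = range(-radius, radius + 1)
    return [[gaussian(x, y, k * sigma) - gaussian(x, y, sigma) for x in coords]
            for y in coords]


def dog_response(image, kernel, row, col):
    """D at one pixel: convolution of the image with the DoG kernel (outside pixels count as 0)."""
    radius = len(kernel) // 2
    total = 0.0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = row - dy, col - dx
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                total += kernel[dy + radius][dx + radius] * image[r][c]
    return total


# Toy 5x5 grayscale image with a single bright pixel in the centre.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
kernel = dog_kernel(sigma=1.0)
response = dog_response(image, kernel, 2, 2)
```

With k > 1 the wider Gaussian is lower at the origin, so the kernel's center value is negative and an isolated bright pixel gives a negative response at its own position.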

③ As shown in Fig. 4, the visual vocabulary is built from the SIFT feature vectors obtained in step ②: all SIFT feature vectors generated on the training sample set Q are clustered with the K-Means algorithm, and each cluster center is taken as one visual word, giving a visual vocabulary of K visual words. The index of a visual word in the vocabulary may also be called a local semantic concept. K is a natural number, typically 500 to 1200; K = 1000 is recommended. The visual vocabulary is generated only during training. A visual vocabulary is built on Q_sal in the same way.
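The vocabulary construction can be sketched with a plain K-Means loop in Python. The example uses 2-dimensional toy vectors in place of 128-dimensional SIFT descriptors and K = 2 in place of the recommended 1000; the fixed iteration count, the random initialization from the data, and the function names are assumptions of this illustration, not details from the text.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster the vectors; the returned k cluster centres are the visual words."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]  # initialise from k distinct samples
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda j: sq_dist(v, centers[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                dim = len(members[0])
                centers[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centers


# Toy descriptors forming two well-separated groups.
descriptors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
vocabulary = kmeans(descriptors, k=2)
```

For real descriptor sets of this size a library implementation would normally be used; the loop above is only meant to show what "each cluster center is one visual word" means.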

④ Using the visual vocabulary obtained from the training sample set Q in step ③ and the one obtained from the salient-region image set Q_sal, local semantic concept features are extracted and represented for the images in Q and in Q_sal respectively:

First, compute the Euclidean distance between the SIFT feature in the neighborhood of a SIFT key point and the SIFT feature of every visual word in the vocabulary, and label the key point with its nearest visual word. Then process every SIFT key point of the given image in turn, mapping all key points into the visual vocabulary; describing the image by the indices of its visual words gives its local semantic concept features. Finally, represent these features as a histogram, namely the histogram of the distribution of visual words in the image.
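The mapping from descriptors to a visual-word histogram can be sketched as follows in Python. Toy 2-dimensional vectors stand in for SIFT descriptors, the vocabulary is given directly, and normalizing the counts so that they sum to 1 is an assumption based on the phrase "distribution of visual words"; the function names are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor.

    Squared distances are compared, which selects the same word as the Euclidean distance.
    """
    return min(range(len(vocabulary)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[j])))


def bow_histogram(descriptors, vocabulary):
    """Normalised histogram of visual-word occurrences for one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


# Toy vocabulary of three visual words and four descriptors from one image.
vocabulary = [(0, 0), (10, 10), (0, 10)]
descriptors = [(1, 1), (9, 9), (0, 0), (1, 9)]
histogram = bow_histogram(descriptors, vocabulary)
```

The histogram length equals the vocabulary size K regardless of how many key points the image has, which is what makes images of different sizes comparable.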

5) As shown in Fig. 9, spatial pyramid models are built on the training sample set Q and on the salient-region image set Q_sal obtained in step 3), in the following steps:

① Spatial pyramid partitioning of the Chinese painting image, specifically:

The whole Chinese painting image is divided in the two-dimensional image space into sub-image regions of different sizes, forming the image spatial pyramid G.

Let the number of layers of G be L, and let l denote the l-th layer of the spatial pyramid G, l = 0, 1, ..., L-1. Let the number of sub-image regions on a layer be D, and let r denote the sub-image region index, r = 0, 1, ..., D-1.

$D = 2^{l} \times 2^{l}$

When l = 0, the layer is the lowest layer of the pyramid and the image is divided into a single block. L typically takes a value of 3 to 5; L = 4 is recommended.
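The partitioning rule D = 2^l × 2^l can be illustrated with a short sketch. The row-major numbering of regions and the function names `regions_at_level` and `region_index` are assumptions for illustration; the text only fixes the number of regions per layer.

```python
def regions_at_level(level):
    """Number of sub-image regions D on layer l: (2^l) x (2^l)."""
    return (2 ** level) * (2 ** level)

def region_index(x, y, width, height, level):
    """Region index r of pixel (x, y) on the given layer, assuming the
    regions are numbered row by row starting at 0."""
    cells = 2 ** level
    col = min(int(x * cells / width), cells - 1)
    row = min(int(y * cells / height), cells - 1)
    return row * cells + col

L = 4  # recommended number of layers
regions_per_level = [regions_at_level(l) for l in range(L)]  # [1, 4, 16, 64]
```

A key point at pixel (300, 100) in a 512 by 512 image, for example, falls in region 1 on layer 1 under this numbering.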

② Construction of the spatial pyramid feature histogram, specifically:

First, the sub-block image of every region on every layer of the image spatial pyramid G is represented as a local semantic concept feature histogram. These histograms are then given suitable weights and serially combined into one overall feature histogram, the spatial pyramid feature histogram. In this embodiment the weight of layer l is $1/2^{L-l+1}$ for l ≥ 1, and $1/2^{L}$ for layer 0.

Let the histogram of the r-th sub-image region on layer l of the spatial pyramid G be $H_{r}^{l}$ (where r is the sub-image region index, r = 0, 1, ..., D-1); let $H^{l}$ denote the serially combined histogram on layer l of the spatial pyramid G; and let H be the overall feature histogram formed after the image has been represented by spatial pyramid blocks.

$H^{l} = [H_{0}^{l}, H_{1}^{l}, \ldots, H_{r}^{l}]$

$H = \frac{1}{2^{L}} H^{0} + \sum_{l=1}^{L-1} \frac{1}{2^{L-l+1}} H^{l}$

l = 1, 2, ..., L-1
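The weighted combination can be sketched as follows, reading the sum in the formula as serial concatenation of the weighted per-layer histograms, which is how the text describes the combination; that reading is an assumption, as are the function names and the toy histograms.

```python
def level_weight(level, num_levels):
    """Weight of a layer: 1/2^L for layer 0, 1/2^(L-l+1) for layer l >= 1."""
    if level == 0:
        return 1.0 / 2 ** num_levels
    return 1.0 / 2 ** (num_levels - level + 1)

def pyramid_histogram(region_histograms, num_levels):
    """region_histograms[l][r] is the visual-word histogram of region r on
    layer l. Returns the weighted, serially concatenated histogram H."""
    combined = []
    for level in range(num_levels):
        w = level_weight(level, num_levels)
        for hist in region_histograms[level]:
            combined.extend(w * v for v in hist)
    return combined

# Toy pyramid with L = 2 layers and a vocabulary of K = 2 words.
toy_regions = [
    [[0.5, 0.5]],                                        # layer 0: 1 region
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.25, 0.75]],  # layer 1: 4 regions
]
H = pyramid_histogram(toy_regions, num_levels=2)
```

The result has (1 + 4) × 2 = 10 entries here; with L = 4 and K = 1000 it would have (1 + 4 + 16 + 64) × 1000 = 85000 entries.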

In the same way, the spatial pyramid feature histogram of the salient-region images is built on Q_显.

6) The spatial pyramid feature histogram built on the training sample set Q in ② of step 5) is fused with the spatial pyramid feature histogram built on the salient-region image set Q_显. This comprises the following steps:

① Steps 4) and 5) are applied in turn to the Q obtained in step 2) and to the Q_显 obtained in step 3).

Step 2) yields the training sample set Q, and step 3) generates the salient-region images Q_显. Steps 4) and 5) process Q and Q_显 separately to obtain the corresponding spatial pyramid feature histograms, and step 6) fuses these histograms. The purpose of the fusion is for the feature to contain both global and local information, which gives a better recognition result.

Let the training sample set be Q = {q₁, q₂, ..., q_e}, where e is the number of images in the training sample set. The feature histograms obtained by processing Q with steps 4) and 5) are then H_原 = {H_原1, H_原2, ..., H_原e}, where the subscript 原 denotes the original images.

Likewise, the feature histograms obtained by processing the salient regions Q_显 of the Chinese painting images with steps 4) and 5) are H_显 = {H_显1, H_显2, ..., H_显e}, where the subscript 显 denotes the salient regions.

② The feature histograms H_原 and H_显 generated in ① are serially merged.

Two feature fusion methods currently exist. One concatenates the two sets of feature vectors end to end into a joint vector that serves as the new feature vector, and performs feature extraction in the higher-dimensional vector space; this is serial combination. The other uses a complex vector to combine the two sets of feature vectors of the same sample, and performs feature extraction in the complex vector space; this is parallel combination. The present invention uses serial combination here, and the final fused result is H = {H_原, H_显}.
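The two fusion schemes can be contrasted with a small sketch. The function names are hypothetical, and zero-padding the shorter vector in the parallel case is an assumption; the text does not say how vectors of different lengths are handled.

```python
def serial_combine(h_original, h_salient):
    """Serial combination: concatenate the two feature vectors end to end."""
    return list(h_original) + list(h_salient)

def parallel_combine(h_original, h_salient):
    """Parallel combination: one complex vector whose real part is the first
    feature and whose imaginary part is the second (shorter one zero-padded)."""
    n = max(len(h_original), len(h_salient))
    a = list(h_original) + [0.0] * (n - len(h_original))
    b = list(h_salient) + [0.0] * (n - len(h_salient))
    return [complex(x, y) for x, y in zip(a, b)]

h_original = [0.2, 0.8]  # toy histogram of the whole painting
h_salient = [0.5, 0.5]   # toy histogram of its salient region
fused_serial = serial_combine(h_original, h_salient)
fused_parallel = parallel_combine(h_original, h_salient)
```

Serial combination doubles the dimension but keeps real-valued features, which lets the fused vector be passed directly to a standard classifier.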

7) As shown in Figure 10, one or more existing classification methods, such as clustering, K-nearest neighbors, neural networks and support vector machines, are selected to recognize the Chinese painting images to be recognized in the test sample set, and the recognition results are output as a recognition accuracy and a confusion matrix. The specific steps are as follows:

① Generation of the classifier model

The feature vectors H extracted from the training sample set Q, the category labels H_label of the training sample set Q, and the relevant parameters options are the inputs for training the classifier model, and the classifier model model is the output. The invention uses the LIBSVM-fast toolkit for the recognition experiments, but is not limited to it. On the MATLAB R2008a simulation platform, training can be expressed by the following function call:

model = svmtrain(H, H_label, options);

Here H_label = {label_1, label_2, ..., label_e}, and each label takes a value from 1 to n; here n = 3, representing flower-and-bird paintings, landscape paintings and figure paintings, respectively.

Options (operating parameters): the available options have the following meanings:

-t kernel type: sets the kernel function type. The available types are

0: linear kernel; 1: polynomial kernel

2: RBF kernel; 3: sigmoid kernel

4: intersection kernel

-s SVM type: sets the SVM type. The available types are

0: C-SVC; 1: ν-SVC

2: one-class SVM; 3: ε-SVR

4: ν-SVR

-b probability estimates: whether to compute probability estimates for SVC or SVR; the value is 0 or 1, and the default is 0.

-c cost: sets the penalty coefficient C in C-SVC, ε-SVR and ν-SVR; the default is 1.

The parameter options = '-t 4 -s 0 -b 1 -c 1' means that the kernel function is the intersection kernel, the SVM type is C-SVC, the C-SVC penalty coefficient is 1, and probability estimates are required.
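For orientation, the sketch below computes the histogram intersection kernel, K(x, y) = Σ min(x_i, y_i), which is the usual definition of an intersection kernel for histogram features. That the toolkit's option -t 4 uses exactly this form is an assumption, and the function names are hypothetical.

```python
def intersection_kernel(x, y):
    """Histogram intersection: sum of the element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))

def gram_matrix(features):
    """Kernel matrix over a list of feature histograms."""
    return [[intersection_kernel(a, b) for b in features] for a in features]

toy_features = [[0.2, 0.8], [0.5, 0.5], [1.0, 0.0]]
K = gram_matrix(toy_features)
```

For histograms that each sum to 1, the kernel value is 1 when two histograms are identical and decreases toward 0 as their overlap shrinks.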

② Output of the results for the Chinese painting images to be recognized in the test sample set, specifically:

The Chinese painting images to be recognized in the test sample set are processed with steps 3) to 6) to obtain the corresponding feature vectors, which are input into the trained classifier model; the classification result of each image is then obtained from the classifier model's formula.

Steps 3) to 6) are applied in turn to the Chinese painting images to be recognized in the test sample set C_pQ, giving the corresponding feature histogram vectors H and labels H_label. The H and H_label of the test sample set C_pQ, together with the model generated in ① of step 7), are the inputs, and the test result for C_pQ is the recognition accuracy. The invention uses the LIBSVM-fast toolkit for the recognition experiments, but is not limited to it. On the MATLAB R2008a simulation platform, prediction can be expressed by the following function call:

[V, P] = svmpredict(H_label, H, model, libsvm_options);

Here libsvm_options = '-b 1' means that probability estimates are required. The output V contains the predicted category labels of the test sample set, and P is the recognition accuracy on the test sample set.

③ Evaluation of the recognition results

The final recognition results are evaluated in two ways: recognition accuracy and the confusion matrix. The recognition accuracy P is defined as:

P = n/N;

where n is the number of correctly recognized images and N is the total number of images to be recognized.
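The accuracy definition P = n/N translates directly into code. The sketch below uses made-up labels purely for illustration; the function name `recognition_accuracy` is hypothetical.

```python
def recognition_accuracy(true_labels, predicted_labels):
    """P = n / N: correctly recognized images over all images to recognize."""
    n = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    N = len(true_labels)
    return n / N

# Illustrative labels only (1 = flower-and-bird, 2 = landscape, 3 = figure).
true_labels = [1, 1, 2, 2, 3, 3, 3, 1]
predicted_labels = [1, 2, 2, 2, 3, 1, 3, 1]
P = recognition_accuracy(true_labels, predicted_labels)  # 6 of 8 correct
```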

The confusion matrix is a commonly used accuracy evaluation tool in pattern recognition. In evaluating image classification accuracy it is mainly used to compare the classification results with the true results, and the accuracy of the classification results can be displayed in a single confusion matrix. A perfect classification model predicts category A for every object that actually belongs to category A, and category B for every object that belongs to category B. In practice, however, a model often predicts category B for some objects of category A, and category A for some objects that are actually of category B. The confusion matrix collects how many predictions the model got right and how many it got wrong into a single table (see Table 2):

Table 2. Confusion matrix

Here the diagonal entries n_AA, n_BB and n_CC are the numbers of correct predictions for each class, and each off-diagonal entry is the number of images of one class wrongly predicted as another class; for example, n_BA is the number of class B images predicted as A, and n_AC is the number of class A images predicted as C.
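A confusion matrix with this indexing (entry n_XY counts images of true class X predicted as class Y) can be built as sketched below. The labels are invented for illustration and the function name is hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """matrix[i][j] = number of images of true class classes[i]
    that were predicted as classes[j]."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

classes = ["A", "B", "C"]
true_labels = ["A", "A", "A", "B", "B", "C", "C"]
predicted_labels = ["A", "A", "B", "B", "A", "C", "C"]
M = confusion_matrix(true_labels, predicted_labels, classes)
n_BA = M[1][0]  # class B images predicted as A
correct = sum(M[i][i] for i in range(len(classes)))  # diagonal sum
```

The diagonal sum divided by the number of images gives the recognition accuracy, so the matrix carries strictly more information than the single accuracy figure.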

The invention is applied to the classification of Chinese painting images, and its performance is demonstrated by the following classification experiment on real Chinese painting images. The experimental dataset is a Chinese painting image library scanned from the album 《中国绘画全集》 (Complete Works of Chinese Painting). It contains 1303 Chinese painting images (639 training samples and 664 test samples, a ratio of roughly 1:1). Each image is a color image in JPG format with a size of 512* (the larger of the length and width does not exceed 512). The sample sets for the classification experiment are as follows, where A denotes flower-and-bird paintings, B denotes figure paintings, and C denotes landscape paintings.

Training sample set: A, 262 images; B, 157 images; C, 220 images;

Test sample set: A, 261 images; B, 103 images; C, 300 images;
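As a quick consistency check, the per-class counts above add up to the stated totals of 639 training images, 664 test images and 1303 images overall. The variable names below are illustrative.

```python
train_counts = {"A": 262, "B": 157, "C": 220}
test_counts = {"A": 261, "B": 103, "C": 300}

train_total = sum(train_counts.values())     # 639
test_total = sum(test_counts.values())       # 664
dataset_total = train_total + test_total     # 1303
train_test_ratio = train_total / test_total  # close to 1:1
```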

The classifier is the currently mainstream support vector machine classifier, version Fast-Libsvm-2.84-1, with the experiment parameter options = '-t 4 -s 0 -b 1 -c 1'. The experimental results are shown in Table 3:

Table 3. Classification and recognition results for Chinese painting images

In the table, Method 1 recognizes using only the global features of the original Chinese painting images; Method 2 recognizes using only the local features of the salient-region images; Method 3 is the method proposed by the present invention, which fuses global and local features.

Tables 4, 5 and 6 are the confusion matrices corresponding to the recognition results of Method 1, Method 2 and Method 3, respectively.

Table 4. Confusion matrix for Method 1

Table 5. Confusion matrix for Method 2

Table 6. Confusion matrix for Method 3

Table 3 shows that the proposed method, which combines the global and local feature information of Chinese painting images for recognition, improves the recognition accuracy over the other two methods. Tables 4, 5 and 6 show that Method 3 correctly recognizes more class A and class B images than either Method 1 or Method 2. These results also provide a basis for choosing the parameter K in ③ of step 4) and the parameter L in ① of step 5): the optimal K and L are those that give a higher recognition accuracy and a larger number of correct recognitions.

Although the present invention already achieves good recognition results, taking color, texture and other features into account would further improve the accuracy and would help the automatic classification, annotation and retrieval of Chinese painting images.

Claims

1. A Chinese painting image recognition method based on local semantic concepts, comprising the following steps:

1) using a scanning device to acquire images of the Chinese painting works to be recognized, and storing them in a computer;

2) using a random extractor to divide the acquired Chinese painting images into a training sample set and a test sample set;

3) using a visual attention model to extract the salient-region images from the Chinese painting images in the training sample set and in the test sample set, respectively;

4) building bag-of-words models for the Chinese painting images in the training sample set and for the corresponding salient-region images, respectively;

5) according to the bag-of-words model of the Chinese painting images in the training sample set and the bag-of-words model of the corresponding salient-region images, constructing a spatial pyramid model of the Chinese painting images and a spatial pyramid model of the corresponding salient-region images, respectively, and generating the two corresponding spatial pyramid feature histograms;

6) fusing the two spatial pyramid feature histograms generated in step 5) by serial merging;

7) using one or more classification methods among the clustering method, the K-nearest-neighbor method, neural networks and the support vector machine method to recognize the Chinese painting images to be recognized in the test sample set, and outputting the recognition results as a recognition accuracy and a confusion matrix.

2. The Chinese painting image recognition method based on local semantic concepts according to claim 1, characterized in that the method of generating the training sample set and the test sample set in step 2) comprises:

① defining the categories of Chinese painting images, numbered 1 to n, where n is a natural number;

② letting P denote the set of Chinese painting images to be recognized, written P = {P₁, P₂, P₃}, where P₁ denotes flower-and-bird paintings, written P₁ = {A₁, A₂, ..., A_i}, with A_i being one Chinese painting image in it; P₂ denotes figure paintings, written P₂ = {B₁, B₂, ..., B_i}, with B_i being one Chinese painting image in it; and P₃ denotes landscape paintings, written P₃ = {C₁, C₂, ..., C_i}, with C_i being one Chinese painting image in it;

③ randomly selecting a set number of images from each of P₁, P₂ and P₃ as the training sample set Q, written {P₁', P₂', P₃'}, which is used to generate the Chinese painting image recognition model; and taking the remaining images in P₁, P₂ and P₃ as the test sample set C_pQ, which is used for verification.

3. The Chinese painting image recognition method based on local semantic concepts according to claim 1 or 2, characterized in that building the bag-of-words model of the Chinese painting images in step 4) comprises the following steps:

① graying of the Chinese painting images: the color Chinese painting images in the training sample set and the salient-region images are each converted to gray scale by the following formula:

Gray(i, j) = 0.11*R(i, j) + 0.59*G(i, j) + 0.3*B(i, j)

where (i, j) is the position of a pixel in the image, R(i, j) is the red component of the color of the pixel at (i, j), G(i, j) and B(i, j) are its green and blue components, respectively, and Gray(i, j) is the gray level of that pixel after conversion;

② for each gray image obtained in step ①, selecting the key points of the SIFT (Scale-Invariant Feature Transform) descriptor, assigning a direction parameter to each key point using the gradient direction distribution of the pixels in its neighborhood, generating the SIFT feature vectors, and normalizing the SIFT feature vectors for illumination as needed;

③ constructing a visual vocabulary from the SIFT feature vectors, obtained in step ②, of the original Chinese painting images and of the salient-region images, respectively; each visual vocabulary comprises K visual words, where K is a natural number, typically 500 to 1200, and K = 1000 is recommended;

④ using the two visual vocabularies obtained to extract and represent the local semantic concept features, namely: computing the Euclidean distance between the SIFT feature in the neighborhood of a given SIFT key point and the SIFT feature of each visual word in the visual vocabulary; assigning the key point to its nearest visual word; mapping all SIFT key points into the visual vocabulary and describing the image by the visual-word indices, which gives the local semantic concept feature of the image; and representing this feature with a histogram.

4. The Chinese painting image recognition method based on local semantic concepts according to claim 3, characterized in that the key points of the SIFT descriptor in ② of step 4) are selected as follows:

A. the original Chinese painting images are sampled with the grid sampling method;

B. the scale-space extremum detection method is applied to the salient-region images.

5. The Chinese painting image recognition method based on local semantic concepts according to claim 1, characterized in that constructing the spatial pyramid model in step 5) comprises the following steps:

① dividing the Chinese painting image in the two-dimensional image space into sub-image regions of different sizes, forming the spatial pyramid blocks; the number of spatial pyramid layers is 2 to 5;

② constructing the corresponding spatial pyramid feature histogram from the spatial pyramid block images so formed.

6. The Chinese painting image recognition method based on local semantic concepts according to claim 1, 2, 4 or 5, characterized in that in step 6) the fusion of the two spatial pyramid feature histograms uses one of the following two methods:

one is to concatenate the two feature vectors end to end into a joint vector that serves as the new feature vector, and to perform feature extraction in the higher-dimensional vector space, i.e. serial combination;

the other is to use a complex vector to combine the two feature vectors of the same sample, and to perform feature extraction in the complex vector space, i.e. parallel combination.

7. The Chinese painting image recognition method based on local semantic concepts according to claim 3, characterized in that in step 6) the fusion of the two spatial pyramid feature histograms uses one of the following two methods:

8. The Chinese painting image recognition method based on local semantic concepts according to claim 1, 2, 4, 5 or 7, characterized in that the steps of classifying with the support vector machine method in step 7) are as follows:

① generation of the classifier model

The LIBSVM-fast toolkit is used for the recognition experiments. The parameter required to train the classifier model is options = '-t 4 -s 0 -b 1 -c 1', which means that the kernel function is the intersection kernel, the SVM type is C-SVC, the C-SVC penalty coefficient is 1, and probability estimates are required;

② output of the results for the Chinese painting images to be recognized in the test sample set

The Chinese painting images to be recognized in the test sample set are processed with steps 3) to 6) to obtain the corresponding feature vectors, which are input into the trained classifier model; the classification result of each image is obtained from the classifier model's formula;

③ the recognition result evaluation method comprises two methods: recognition accuracy and the confusion matrix.

9. The Chinese painting image recognition method based on local semantic concepts according to claim 3, characterized in that the steps of classifying with the support vector machine method in step 7) are as follows:

① generation of the classifier model

10. The Chinese painting image recognition method based on local semantic concepts according to claim 6, characterized in that the steps of classifying with the support vector machine method in step 7) are as follows:

① generation of the classifier model