CN108108657A

CN108108657A - A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning

Info

Publication number: CN108108657A
Application number: CN201711135951.XA
Authority: CN
Inventors: 何霞; 汤平; 汤一平; 陈朋; 王丽冉; 袁公萍; 金宇杰
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2017-11-16
Filing date: 2017-11-16
Publication date: 2018-06-01
Anticipated expiration: 2037-11-16
Also published as: CN108108657B

Abstract

A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning, identification is carried out at the same time to vehicle vehicle, vehicle system, logo, color, car plate using multitask convolutional neural networks segmentation end to end parallel, feature based pyramid extracts the network module of vehicle image example aspects, the algorithm being ranked up using local sensitivity Hash-Sorting Algorithm is corrected to vehicle characteristics in database can not obtain the cross-module state text searching method of retrieval vehicle image.The present invention proposes a kind of multitask convolutional neural networks and amendment local sensitivity Hash vehicle retrieval method end to end, effectively improve automation and the intelligent level of vehicle retrieval, and with less memory space, faster retrieval rate meets the image retrieval requirement in big data epoch.

Description

A Modified Local Sensitive Hashing Vehicle Retrieval Method Based on Multi-task Deep Learning

技术领域technical field

本发明涉及计算机视觉、模式识别、信息检索、多任务学习、相似度测量、深度自编码卷积神经网络和深度学习技术在图像检索领域的应用，尤其涉及一种基于多任务深度学习的修正局部敏感哈希车辆检索方法。The invention relates to the application of computer vision, pattern recognition, information retrieval, multi-task learning, similarity measurement, deep self-encoding convolutional neural network and deep learning technology in the field of image retrieval, especially to a modified local Sensitive Hash Vehicle Retrieval Method.

背景技术Background technique

随着社会经济的快速发展，机动车在日益成为人们日常生活必不可少交通工具的同时，也成为犯罪分子和恐怖分子从事非法活动的必须工具。各省、市际高速公路和主干线、城市出入口及主要交通要道均部署了卡口设备，对过往车辆进行信息的采集，但当前的卡口一般都是基于车牌识别技术，嫌疑车辆一旦使用假牌、套牌、无牌以及不断更换车牌的方式，便可逃避现有卡口对其的跟踪和识别。With the rapid development of society and economy, motor vehicles have become an essential tool for criminals and terrorists to engage in illegal activities while increasingly becoming an indispensable means of transportation for people's daily life. All provinces and inter-city expressways and main roads, city entrances and exits, and major traffic arteries have deployed bayonet equipment to collect information from passing vehicles. However, current bayonets are generally based on license plate recognition technology. Once a suspected vehicle uses fake License plates, sets of cards, no license plates, and constant replacement of license plates can evade the tracking and identification of existing bayonets.

基于图像的车辆特征识别涉及图像处理、模式识别、计算机视觉等相关技术领域，目前国内外对于该技术的研究大致上可以分为三个方向：(1)基于牌照的车型识别，该方法仅从图像中识别出牌照信息，并没有直接分析获得车辆的类型，分类粒度粗，且对于套牌车辆更是无法辨别；(2)基于车标的车型识别，在实际应用中，由于车标过小、光线、遮挡等客观因素的存在，无法达到理想的效果；(3)基于外观特征的车型识别，该技术相比前两种方法具有较好的鲁棒性，识别的类型也更加的细化，可以精确到车辆的品牌、系列、型号、年款等。Image-based vehicle feature recognition involves image processing, pattern recognition, computer vision and other related technical fields. At present, the research on this technology at home and abroad can be roughly divided into three directions: (1) Vehicle type recognition based on license plate. The license plate information is recognized in the image, and the type of the vehicle is not directly analyzed. The classification granularity is coarse, and it is impossible to distinguish the license plate vehicles; The existence of objective factors such as light and occlusion cannot achieve the desired effect; (3) car model recognition based on appearance features, this technology has better robustness than the previous two methods, and the types of recognition are more refined. It can be accurate to the brand, series, model, year, etc. of the vehicle.

基于外观特征的车辆特征识别技术主要通过以下三个步骤来完成：车辆分割、车辆的特征提取和车辆的分类。传统车型识别的方法主要有：模板匹配法、统计模式识别法、神经网络识别方法、仿生模式(拓扑模式)的识别方法和支持向量机的方法。但是这些方法都存在各自的缺陷，无法同时满足车型分类的速度与准确度两个最重要的指标。然而，影响这两个指标最重要的因素就是提取的车辆特征和迅速地将车辆进行定位，所以特征提取及目标快速定位是整个识别过程的关键。车辆特征的提取受到诸多因素的影响，如车辆种类多但是没有明显的区别特征、车辆的移动以及摄像机的高度和角度导致车型的特征差别大、天气的影响，光照的影响等。The vehicle feature recognition technology based on appearance features is mainly completed through the following three steps: vehicle segmentation, vehicle feature extraction and vehicle classification. The traditional vehicle identification methods mainly include: template matching method, statistical pattern recognition method, neural network recognition method, bionic pattern (topological pattern) recognition method and support vector machine method. However, these methods have their own defects, and cannot meet the two most important indicators of speed and accuracy of vehicle classification at the same time. However, the most important factors affecting these two indicators are the extracted vehicle features and the rapid positioning of the vehicle, so feature extraction and rapid target positioning are the keys to the entire recognition process. The extraction of vehicle features is affected by many factors, such as many types of vehicles but no obvious distinguishing features, the movement of vehicles, and the height and angle of the camera lead to large differences in the characteristics of vehicle models, the influence of weather, and the influence of light.

深度学习技术的发展，推动了图片结构化和特征提取的能力。早期建设的卡口系统，智能分析能力弱，图片质量以及车牌识别准确率较低，经常要根据品牌型号颜色等车辆自身固有信息，从海量过车图片或视频中，人工查找目标车辆，由于一线警力有限、劳动强度大、车型种类多、光线角度不确定等因素，无法保证查找的准确性和时效性，特别是突发紧急事件，经常贻误最佳处理时机。通过使用车辆特征深度学习系统，对前端卡口或简易卡口获取的过车图片进行特征结构化分析识别，充分挖掘海量的卡口过车图片中有价值信息，不但可以提高车牌车型的准确率，而且增加了车辆特征的识别信息，实现了车辆子品牌、车身颜色、不系安全带、驾驶员接打电话、遮阳板状态等识别检测功能，对过车数据进行精细化校正，摆脱了传统单纯依靠车牌进行分析研判的单一手段，为卡口电警数据提供了更加丰富实用的车辆防控应用，可以实现对高危车辆的有效预警防控，优化警力部署进行针对性车辆排查，可以在大量涉车涉驾案件中有效锁定嫌疑车辆，提高刑事侦查效能，使治安防控手段从事后被动侦查向事前主动预警转变。The development of deep learning technology has promoted the ability of image structure and feature extraction. The bayonet system built in the early stage has weak intelligent analysis ability, low picture quality and low accuracy of license plate recognition. It is often necessary to manually search for the target vehicle from a large number of passing pictures or videos based on the inherent information of the vehicle itself, such as the brand, model and color. Due to factors such as limited police force, high labor intensity, various vehicle types, and uncertain light angles, the accuracy and timeliness of the search cannot be guaranteed, especially in emergencies, which often delay the best time to deal with them. By using the deep learning system of vehicle characteristics, the characteristic structural analysis and identification of the passing pictures obtained by the front-end bayonet or simple bayonet are carried out, and the valuable information in the massive bayonet passing pictures can be fully excavated, which can not only improve the accuracy of the license plate model , and increase the identification information of vehicle characteristics, realize the identification and detection functions of vehicle sub-brand, body color, not wearing seat belts, drivers answering and calling, sun visor status, etc., finely correct the passing data, get rid of the traditional The single method of analyzing and judging purely relying on the license plate provides a richer and more practical vehicle prevention and control application for the bayonet electric police data. Effectively lock suspect vehicles in cases involving vehicles and driving, improve the efficiency of criminal investigations, and transform public security prevention and control methods from post-passive investigations to pre-active early warnings.

中国发明专利申请号CN201510744990.4公开发明了一种基于相似度学习的车辆检索方法，给定车辆区域，获得SIFT特征点和描述后，使用聚类算法对SIFT特征离散化。为了弥补SIFT特征缺少位置信息的缺陷，进一步使用邻域内离散SIFT特征分布生成邻域特征，作为最终的特征点描述，每一个车辆图片由一批特征来表示，一对相似车辆图片的特征组成一个正样本，一对不同车辆图片的特征组成一个负样本。如此收集大量的正负样本后，利用随机森林方法进行相似度学习，获得分类器可以用来判断两个车辆是否相似，达到相似车辆检索的目的。该技术使用SIFT特征无法充分提取车辆特征。Chinese invention patent application number CN201510744990.4 discloses a vehicle retrieval method based on similarity learning. Given a vehicle area, after obtaining SIFT feature points and descriptions, a clustering algorithm is used to discretize SIFT features. In order to make up for the lack of position information of SIFT features, the discrete SIFT feature distribution in the neighborhood is further used to generate neighborhood features as the final feature point description. Each vehicle picture is represented by a batch of features, and a pair of features of similar vehicle pictures form a Positive samples, a pair of features of different vehicle images form a negative sample. After collecting a large number of positive and negative samples in this way, the random forest method is used for similarity learning, and the obtained classifier can be used to judge whether two vehicles are similar, so as to achieve the purpose of similar vehicle retrieval. This technique cannot sufficiently extract vehicle features using SIFT features.

中国发明专利申请号CN201610711333.4公开发明了一种基于大数据的车辆检索方法及装置，该方法包括：在目标车辆图像中提取目标车辆的各个车辆检验标志；根据各个车辆检验标志之间的位置关系对各个车辆检验标志进行融合，得到多个融合区域，融合区域包括至少一个所述车辆检验标志；确定各个融合区域包含的各个车辆检验标志的形状和颜色；根据车辆检验标志的数量、融合区域的数量、各个融合区域包含的车辆检验标志的数量、形状和颜色，在多个待检索车辆图像中逐层检索目标车辆。该技术仅仅使用单个特征对车辆进行检索。China Invention Patent Application No. CN201610711333.4 discloses a vehicle retrieval method and device based on big data. The method includes: extracting each vehicle inspection mark of the target vehicle from the target vehicle image; Relational fusion of each vehicle inspection mark to obtain a plurality of fusion areas, the fusion area includes at least one of the vehicle inspection marks; determine the shape and color of each vehicle inspection mark contained in each fusion area; according to the number of vehicle inspection marks, the fusion area The number, the number, shape and color of the vehicle inspection marks contained in each fusion area, and the target vehicle is retrieved layer by layer in multiple vehicle images to be retrieved. The technique uses only a single feature to retrieve vehicles.

中国发明专利申请号CN201710451957.1公开发明了一种基于机器视觉的套牌车辆检索识别系统。所述的系统主要包括车辆图像采集系统，数据库系统和检索系统，本发明提出一种基于机器视觉的套牌车辆检索识别系统，借助可疑车辆的车载装饰品，例如车辆的摆件、年检标签等特征进行嫌疑车辆的检索，通过对车载装饰品区域图像进行特征采集，并采用基于车载装饰品区域图像稀疏编码方法进行车辆检索，解决从海量交通场景图像中搜索目标车辆的问题，实现对套牌车辆的准确识别与发现。该技术在大量数据库中时间复杂度较高。China Invention Patent Application No. CN201710451957.1 discloses a machine vision-based retrieval and recognition system for licensed vehicles. The system mainly includes a vehicle image acquisition system, a database system, and a retrieval system. The present invention proposes a machine vision-based retrieval and recognition system for licensed vehicles, which uses the vehicle decorations of suspicious vehicles, such as vehicle decorations, annual inspection labels, etc. Carry out the retrieval of suspected vehicles, through the feature collection of the image of the vehicle decoration area, and use the sparse coding method based on the image of the vehicle decoration area to perform vehicle retrieval, solve the problem of searching for the target vehicle from a large number of traffic scene images, and realize the identification of licensed vehicles accurate identification and discovery. This technique has high time complexity in a large number of databases.

综上所述，利用深度自编码卷积神经网络和修正局部敏感哈希再排序方法进行以图搜图技术，目前尚存着在如下若干个棘手的问题：1)如何从复杂的背景中准确分割出被测车辆的整体图像；如何尽可能采用极少的标签图像数据来学习训练获得车辆车型的特征数据；2)如何对车型进行更加的细分类，识别出车辆的品牌、系列、车身颜色等更多的信息。另一方面是如何将车型、车牌和车标在同一深度卷积神经网络中进行并行处理，即实现深度学习的多任务的并行计算，以提高车辆身份识别水平；3)如何设计一种对车辆图像提取实例特征的方法用于相似类型车型检索；4)如何使用提取到的特征建立分层深度搜索，以获取更为精准的检索结果；5)如何减少大数据时代背景下图像检索系统存储空间消耗大、检索速度慢等问题。To sum up, there are still several thorny problems in the image search technology using the deep self-encoded convolutional neural network and the modified local sensitive hash reordering method: 1) How to accurately extract information from the complex background Segment the overall image of the vehicle under test; how to use as little label image data as possible to learn and train the characteristic data of the vehicle model; 2) how to classify the vehicle model more subdivided, and identify the brand, series, and body color of the vehicle Wait for more information. On the other hand, how to process the model, license plate and logo in parallel in the same deep convolutional neural network, that is, to realize the multi-task parallel computing of deep learning, so as to improve the level of vehicle identification; 3) how to design a vehicle The method of extracting instance features from images is used to retrieve similar types of vehicle models; 4) How to use the extracted features to establish a hierarchical deep search to obtain more accurate retrieval results; 5) How to reduce the storage space of image retrieval systems in the era of big data Large consumption, slow retrieval speed and other issues.

发明内容Contents of the invention

针对已有的车辆检索技术中自动化和智能化水平低、缺乏深度学习、难以获取精确的检索结果、检索技术存储空间消耗大，检索速度慢且难以满足大数据时代的图像检索需求等问题，本发明提出了一种基于深度自编码卷积神经网络端对端的通过分层深度搜索的车辆图像检索方法，利用深度学习方法提高了检索系统中的自动化和智能化水平同时使图像识别、特征获取、检索效率完美结合，使得整个检索系统获得了精准的检索结果，使用稀疏编码方式减少了系统对内存的依赖、加快了检索速度，从而满足大数据时代背景下的图像检索需求。Aiming at the low level of automation and intelligence in the existing vehicle retrieval technology, lack of deep learning, difficulty in obtaining accurate retrieval results, large storage space consumption of retrieval technology, slow retrieval speed and difficulty in meeting the image retrieval needs in the era of big data, this paper The invention proposes an end-to-end vehicle image retrieval method through layered depth search based on deep self-encoded convolutional neural network, which improves the automation and intelligence level of the retrieval system by using deep learning methods and simultaneously enables image recognition, feature acquisition, The perfect combination of retrieval efficiency enables the entire retrieval system to obtain accurate retrieval results, and the use of sparse coding reduces the system's dependence on memory and speeds up retrieval, thereby meeting the image retrieval needs in the era of big data.

为了解决上述技术问题，本发明提供如下的技术方案：In order to solve the above technical problems, the present invention provides the following technical solutions:

一种基于多任务深度学习的修正局部敏感哈希车辆检索方法，包括以下步骤：A modified local sensitive hash vehicle retrieval method based on multi-task deep learning, comprising the following steps:

1)构建用于深度学习和训练识别的多任务端到端的卷积神经网络，训练数据和逐层递进的网络结构深入地学习车辆各种属性信息，包括车型、车系、车标、颜色和车牌；1) Construct a multi-task end-to-end convolutional neural network for deep learning and training recognition. The training data and layer-by-layer progressive network structure can deeply learn various attribute information of vehicles, including model, car series, car logo, color and license plate;

2)利用步骤1)的多任务卷积神经网络采用分段并行学习和编码策略构建车辆属性哈希码；2) Utilize the multi-task convolutional neural network of step 1) to adopt segmented parallel learning and encoding strategies to construct vehicle attribute hash codes;

3)利用金字塔池化层和向量压缩层构建特征金字塔模块，以适应不同尺寸的卷积特征图输入提取车辆的实例特征；3) Use the pyramid pooling layer and the vector compression layer to construct a feature pyramid module to adapt to the convolution feature map input of different sizes to extract the instance features of the vehicle;

4)利用步骤3)得到的实例特征构建局部敏感再排序算法；4) Using the instance features obtained in step 3) to construct a locally sensitive reordering algorithm;

5)构建在无法获取检索车辆图像情况下的跨模态检索方法，实现车辆检索。5) Construct a cross-modal retrieval method under the condition that the retrieved vehicle image cannot be obtained to realize vehicle retrieval.

进一步，所述用于深度学习和训练识别的多任务端到端的卷积神经网络含有共享卷积模块，感兴趣区域坐标回归和识别模块，多任务学习模块，实例特征提取模块；Further, the multi-task end-to-end convolutional neural network for deep learning and training recognition contains a shared convolution module, a region of interest coordinate regression and recognition module, a multi-task learning module, and an instance feature extraction module;

共享卷积模块：共享网络由5个卷积模块组成，其中conv2_x到conv5_x的最后一层分别为{4²,8²,16²,16²}作为特征图的输出尺寸，conv1作为输入层只含有单层卷积层；Shared convolution module: The shared network consists of 5 convolution modules, where the last layer of conv2_x to conv5_x is {4 ² ,8 ² ,16 ² ,16 ² } as the output size of the feature map, conv1 as the input layer only Contains a single convolutional layer;

在共享卷积模块之后连接感兴趣区域坐标回归和识别模块，此模块可将任意大小的图像作为输入，输出目标区域的矩形预测框的集合，包含了每个预测框的位置坐标和数据集中类别的概率得分，为了生成区域建议框，首先输入图像经过卷积共享层生成特征图，然后在特征图上进行多尺度卷积操作，实现过程为：在每一个滑动窗口的位置使用3种尺度和3种长宽比，以当前滑动窗口中心为中心，并对应一种尺度和长宽比，则可以在原图上映射得到9种不同尺度的候选区域；如对于大小为w×h的共享卷积特征图，则总共有w×h×9个候选区域；最后，分类层输出w×h×9×2个候选区域的得分，即对每个区域是目标/非目标的估计概率，回归层输出w×h×9×4个参数，即候选区域的坐标参数；After the shared convolution module, the coordinate regression and recognition module of the region of interest is connected. This module can take an image of any size as input, and output a set of rectangular prediction frames of the target area, including the position coordinates of each prediction frame and the category in the data set. The probability score of , in order to generate a region proposal box, first input the image through the convolution shared layer to generate a feature map, and then perform a multi-scale convolution operation on the feature map. The implementation process is: use 3 scales and 3 kinds of aspect ratios, centered on the center of the current sliding window, and corresponding to one scale and aspect ratio, then 9 candidate regions of different scales can be mapped on the original image; for example, for a shared convolution with a size of w×h feature map, there are a total of w×h×9 candidate regions; finally, the classification layer outputs the scores of w×h×9×2 candidate regions, that is, the estimated probability of each region being a target/non-target, and the regression layer outputs w×h×9×4 parameters, that is, the coordinate parameters of the candidate area;

训练RPN网络时，给每个候选区域分配一个二进制标签，以此来标注该区域是否是对象目标，操作如下：1)与某个真正目标区域(Ground Truth，GT)有最高的IoU(Intersection-over-Union，交集并集之比)重叠候选区域；2)与任意GT包围盒有大于0.7的IoU交叠的候选区域。分配负标签给与所有GT包围盒的IoU比率都低于0.3的候选区域；3)介于两者之间的舍弃。When training the RPN network, assign a binary label to each candidate area to mark whether the area is an object target. The operation is as follows: 1) It has the highest IoU (Intersection- over-Union, the ratio of intersection and union) overlapping candidate regions; 2) candidate regions that have an IoU overlap greater than 0.7 with any GT bounding box. Assign negative labels to candidate regions whose IoU ratios to all GT bounding boxes are lower than 0.3; 3) discard in between.

有了这些定义，最小化目标函数。对一个图像的损失函数定义为:With these definitions, the objective function is minimized. The loss function for an image is defined as:

其中，i是第i个候选区域的索引，P_i是候选区域是第i类的概率。如果候选区域的标签为正，为1，如果候选区域标签为0，就是0。t_i是一个向量，表示预测的包围盒的4个参数化坐标，是对应的GT包围盒的坐标向量。N_cls和N_reg分别为分类损失函数与位置回归损失函数的归一化系数，λ为两者之间的权重参数，分类损失函数L_cls是两个类别的对数损失，两个类别为目标和非目标：where i is the index of the ith candidate region, and P _i is the probability that the candidate region is the ith class. If the label of the candidate region is positive, is 1, if the candidate region label is 0, It is 0. t _i is a vector representing the 4 parameterized coordinates of the predicted bounding box, is the coordinate vector of the corresponding GT bounding box. N _cls and N _reg are the normalization coefficients of the classification loss function and the position regression loss function respectively, λ is the weight parameter between the two, the classification loss function L _cls is the logarithmic loss of the two categories, and the two categories are the target and non-target:

对于位置回归损失函数L_reg，用以下函数定义：For the positional regression loss function L _reg , the following function is defined:

其中，R是鲁棒的损失函数(smooth L1)。Among them, R is a robust loss function (smooth L1).

然而，训练一个多任务深度学习网络并非是一件容易实现的过程，因为不同任务级别的信息有着各自不同的学习难点和收敛速度，因此，设计一个良好的多任务目标函数是至关重要的。多任务联合训练过程如下：假设，总任务数为T，对于第t个任务的训练数据记为其中t∈(1,T)，i∈(1,N)，N为总训练样本数，分别为第i样本的特征向量和标注标签，那么多任务目标函数则表示为：However, training a multi-task deep learning network is not an easy process because different task-level information has different learning difficulties and convergence speeds. Therefore, it is crucial to design a good multi-task objective function. The multi-task joint training process is as follows: Assume that the total number of tasks is T, and the training data for the t-th task is recorded as Where t∈(1,T), i∈(1,N), N is the total number of training samples, are the feature vector and label of the i-th sample respectively, then the multi-task objective function is expressed as:

式中是输入特征向量和权重参数w^t的映射函数，L(·)为损失函数，Φ(w^t)为权重参数的正则化值；In the formula is the input feature vector and the mapping function of the weight parameter w ^t , L( ) is the loss function, and Φ(w ^t ) is the regularization value of the weight parameter;

对于损失函数，利用softmax配合对数似然代价函数训练最后一层的特征，实现图像分类。softmax损失函数定义如下：For the loss function, the features of the last layer are trained using softmax and the log-likelihood cost function to achieve image classification. The softmax loss function is defined as follows:

式中，x_i为第i深度特征，W_j为最后一个全连接层中权重的第j列，b是偏置项，m,n分别为处理样本数量与类别数；In the formula, x _i is the i-th depth feature, W _j is the j-th column of the weight in the last fully connected layer, b is the bias item, m, n are the number of processing samples and the number of categories respectively;

卷积神经网络训练是一个反向传播过程，与BP算法类似，通过误差函数反向传播，利用随机梯度下降法对卷积参数和偏置进行优化调整，直到网络收敛或者达到最大迭代次数停止；Convolutional neural network training is a backpropagation process, similar to the BP algorithm, through the backpropagation of the error function, using the stochastic gradient descent method to optimize and adjust the convolution parameters and bias until the network converges or reaches the maximum number of iterations to stop;

该神经网络训练是一个反向传播过程，通过误差函数反向传播，利用随机梯度下降法对卷积参数和偏置进行优化调整，直到网络收敛或者达到最大迭代次数停止；The neural network training is a backpropagation process, through the backpropagation of the error function, the convolution parameters and bias are optimized and adjusted by the stochastic gradient descent method, until the network converges or reaches the maximum number of iterations to stop;

反向传播需要通过对带有标签的训练样本进行比较，采用平方误差代价函数，对于c个类别，N个训练样本的多类别进行识别，网络最终输出误差函数用公式(7)来计算误差，Backpropagation needs to compare the training samples with labels, and use the square error cost function to identify multiple categories of c categories and N training samples. The final output error function of the network uses formula (7) to calculate the error.

式中，E^N为平方误差代价函数，为第n个样本对应标签的第k维，为第n个样本对应网络预测的第k个输出；In the formula, E ^N is the square error cost function, is the k-th dimension of the label corresponding to the n-th sample, The nth sample corresponds to the kth output predicted by the network;

对误差函数进行反向传播时，采用传统的BP算法类似的计算方法，具体公式形式，如公式(8)所示，When backpropagating the error function, a calculation method similar to the traditional BP algorithm is used, and the specific formula form is shown in formula (8),

δ^l＝(W^l+1)^Tδ^l+1×f'(u^l)(u^l＝W^lx^l-1+b^l) (8)δ ^l ＝(W ^l+1 ) ^T δ ^l+1 ×f'(u ^l )(u ^l ＝W ^l x ^l-1 +b ^l ) (8)

式中，δ^l代表当前层的误差函数，δ^l+1代表上一层的误差函数，W^l+1为上一层映射矩阵，f'表示激活函数的反函数，即上采样，u^l表示未通过激活函数的上一层的输出，x^l-1表示下一层的输入，W^l为本层映射权值矩阵。In the formula, δ ^l represents the error function of the current layer, δ ^l+1 represents the error function of the previous layer, W ^l+1 is the mapping matrix of the previous layer, f' represents the inverse function of the activation function, that is, upsampling, u ^l Represents the output of the previous layer that has not passed the activation function, x ^l-1 represents the input of the next layer, and W ^l is the mapping weight matrix of this layer.

再进一步，多任务在学习过程中存在任务之间的关联性，即任务之间存在信息共享，在同时训练多个任务时，网络利用任务之间的共享信息增强系统的归纳偏置能力和分类器的泛化能力；所述多任务网络通过在感兴趣区域模块后添加五个全连接层分为五个子任务，每个全连接后连接softmax激活函数将阈值归一化在[0,1]之间，再将归一化后的值送入分段函数促进二进制码的输出，通过分段学习和编码策略降低哈希码之间的冗余性从而增强学习到的特征的鲁棒性；Furthermore, there is a correlation between tasks in the multi-task learning process, that is, there is information sharing between tasks. When training multiple tasks at the same time, the network uses the shared information between tasks to enhance the system's inductive bias ability and classification. The generalization ability of the device; the multi-task network is divided into five subtasks by adding five fully connected layers after the region of interest module, and each fully connected softmax activation function normalizes the threshold in [0,1] In between, the normalized value is sent to the segmentation function to promote the output of the binary code, and the redundancy between the hash codes is reduced through segmentation learning and coding strategies to enhance the robustness of the learned features;

将多任务学习网络分为T个任务，每个任务中含有c^t个类别，每个任务的全连接层一维向量输出使用m^t表示；首先利用softmax激活函数将全连接层的输出归一化在[0,1]之间，公式具体表现形式如下：The multi-task learning network is divided into T tasks, each task contains c ^t categories, and the one-dimensional vector output of the fully connected layer of each task is represented by m ^t ; firstly, the output of the fully connected layer is normalized by using the softmax activation function between [0,1], the specific expression of the formula is as follows:

其中θ表示随机超平面；得到归一化后的值送入阈值分段函数进行二值化，得到全连接层的二进制输出，公式具体表现形式如下：Among them, θ represents a random hyperplane; the normalized value is sent to the threshold segmentation function for binarization, and the binary output of the fully connected layer is obtained. The specific expression of the formula is as follows:

最后为了获取多任务卷积网络分段并行学习到的车辆属性哈希码，对公式(10)得到的H^t向量再次进行一定比例的融合，使用向量f_A表示，公式具体表现形式如下：Finally, in order to obtain the vehicle attribute hash code learned in parallel by the multi-task convolutional network, the H ^t vector obtained by the formula (10) is fused again in a certain proportion, and expressed by the vector f _A. The specific expression of the formula is as follows:

f_A＝[α¹H¹；α²H²；...；α^tH^t] (11)f _A = [α ¹ H ¹ ; α ² H ² ; . . . ; α ^t H ^t ] (11)

公式(11)中的α^t具体表现形式如下：The specific form of α ^t in formula (11) is as follows:

在每个H^t之前乘上惩罚因子α^t弥补了不同任务之间由于不同分类数造成的误差。Multiplying the penalty factor α ^t before each H ^t makes up for the error caused by different classification numbers between different tasks.

更进一步，手工设计的功能时代大量使用了形象化的图像金字塔，以至于像DPM这样的物体检测器需要高密度的采样才能获得好的结果(例如每倍频10个音阶)；对于识别任务，工程特征已经被深度卷积网络计算的特征大部分所取代。深度卷积网络除了能够表示更高级别的语义之外，还具有更强大的尺度变异性，从而有助于从单一输入尺度上计算的特征中识别出来；但是，即使有这种鲁棒性，金字塔仍然需要得到最准确的结果；在ImageNet和COCO检测挑战中最近的所有最重要的条目都使用了对特征化图像金字塔的多尺度测试；使图像金字塔的每个级别具有特征的主要优点在于它产生了多级别的特征表示，其中所有级别在语义上都是强的，包括高分辨率级别；Furthermore, the age of hand-designed features makes heavy use of figurative image pyramids, so that object detectors like DPM require high-density sampling to achieve good results (e.g., 10 octaves); for recognition tasks, Engineered features have been mostly replaced by features computed by deep convolutional networks. In addition to being able to represent higher-level semantics, deep convolutional networks have more robust scale variability, thereby facilitating recognition from features computed at a single input scale; however, even with this robustness, Pyramids are still needed to get the most accurate results; all recent top entries in the ImageNet and COCO detection challenges use multi-scale testing of featurized image pyramids; the main advantage of having features at each level of an image pyramid is that it A multi-level feature representation is produced, where all levels are semantically strong, including the high-resolution level;

利用卷积特征层次结构的金字塔形状，同时创建一个在所有尺度上都具有强大语义的特征金字塔；为了实现这个目标依靠一种结构将低分辨率，语义上强大的特征与高分辨率，语义上弱的特征通过自顶向下的路径和横向连接相结合，并且可以从单个输入图像比例快速构建，可以用来代替特征化的图像金字塔而不牺牲代表性的特征，速度或内存；为了得到车辆图像的实例特征并且适应任意尺寸的卷积特征图的输入，选择共享模块conv2_x到conv5_x每个单元的最后一层并结合感兴趣区域模块的输出，再添加一个金字塔池化层和向量压缩层将三维特征压缩成一个一维特征向量，这样选择是因为即可丰富特征金字塔得到的特征图信息，每个阶段的最深层又有最强大的特征表示功能；Utilizes the pyramid shape of the convolutional feature hierarchy while simultaneously creating a feature pyramid that is semantically strong at all scales; to achieve this goal relies on a structure that combines low-resolution, semantically powerful features with high-resolution, semantically Weak features are combined via top-down paths and lateral connections, and can be quickly constructed from a single input image scale, and can be used instead of a featurized image pyramid without sacrificing representative features, speed, or memory; in order to obtain vehicle The instance features of the image and adapt to the input of the convolution feature map of any size, select the last layer of each unit of the shared module conv2_x to conv5_x and combine the output of the region of interest module, and then add a pyramid pooling layer and a vector compression layer will be The three-dimensional features are compressed into a one-dimensional feature vector. This choice is because the feature map information obtained by the feature pyramid can be enriched, and the deepest layer of each stage has the most powerful feature representation function;

将每个模块的最后一层作为特征金字塔的输入，对于上述定义的网络conv2_x到conv5_x的最后一层依次选择{4²,8²,16²,16²}作为特征金字塔的输入特征图大小；用I表示输入图像，其长宽分别用字母h，w表示，第x层的共享卷积模块用convx_x表示，输入图像I后被激活为一个三维的特征向量T，维度为h′×w′×d是一系列二维特征图的集合，二维特征图长宽为h′×w′，T中含有d个二维特征图，用集合S＝S{S_n},n∈(1,d)表示，S_n对应为第n个通道特征图；接着将三维的特征向量T送入特征金字塔，经过多个尺度卷积核卷积过后得到三维特征向量T′，维度为l×l×d，同样包含一组二维特征图，可用S′＝S′{S′_n},n∈(1,d)表示，其中S_n′对应为第n个通道特征图，每个特征图大小为l×l，一共包含d个；然后利用k×k大小的滑动窗口并且选择最大池化对特征图进行逻辑回归，得到一组l/k×l/k大小的特征图，再对每个通道的S_n′进行融合得到一维向量，依次对d个通道进行相同操作，最后得到的个性特征向量f_B大小为(1,l/k×d)。最终的检索特征向量f如公式(13)所示：Take the last layer of each module as the input of the feature pyramid, and select {4 ² , 8 ² , 16 ² , 16 ² } as the input feature map size of the feature pyramid for the last layer of the network conv2_x to conv5_x defined above; Use I to represent the input image, and its length and width are represented by letters h and w respectively. The shared convolution module of the xth layer is represented by convx_x. After the input image I is activated into a three-dimensional feature vector T, the dimension is h′×w′ ×d is a set of a series of two-dimensional feature maps, the length and width of the two-dimensional feature maps are h′×w′, T contains d two-dimensional feature maps, and the set S=S{S _n },n∈(1, d) indicates that S _n corresponds to the feature map of the nth channel; then the three-dimensional feature vector T is sent into the feature pyramid, and the three-dimensional feature vector T′ is obtained after convolution with multiple scale convolution kernels, and the dimension is l×l× d, also contains a set of two-dimensional feature maps, which can be represented by S′=S′{S′ _n }, n∈(1,d), where S _n ′ corresponds to the nth channel feature map, and the size of each feature map It is l×l, including d in total; then use the sliding window of k×k size and select the maximum pooling to perform logistic regression on the feature map to obtain a set of feature maps of l/k×l/k size, and then for each Channel S _n ′ is fused to obtain a one-dimensional vector, and the same operation is performed on d channels in turn, and the finally obtained personality feature vector f _B has a size of (1,l/k×d). The final retrieval feature vector f is shown in formula (13):

f＝[f_A；f_B] (13)。f=[f _A ; f _B ] (13).

局部敏感哈希算法的基本思想是：将原始数据空间中的两个相邻数据点通过相同的映射或投影变换后，这两个数据点在新的数据空间中仍然相邻的概率很大，而不相邻的数据点被映射到同一个桶的概率很小。也就是说，如果我们对原始数据进行一些哈希映射后，我们希望原先相邻的两个数据能够被哈希到相同的桶内，具有相同的桶号。对原始数据集合中所有的数据都进行哈希映射后，这样就得到了一个哈希表，这些原始数据集被分散到了哈希表的桶内，每个桶会落入一些原始数据，属于同一个桶内的数据就有很大可能是相邻的，当然也存在不相邻的数据被哈希到了同一个桶内。因此，如果能够找到这样一些哈希函数，使得经过它们的哈希映射变换后，原始空间中相邻的数据落入相同的桶内的话，那么在该数据集合中进行近邻查找就变得容易了，只需要将查询数据进行哈希映射得到其桶号，然后取出该桶号对应桶内的所有数据，再进行线性匹配即可查找到与查询数据相邻的数据。换句话说，通过哈希函数映射变换操作，将原始数据集合分成了多个子集合，而每个子集合中的数据间是相邻的且该子集合中的元素个数较小，因此将一个在超大集合内查找相邻元素的问题转化为了在一个很小的集合内查找相邻元素的问题，这种算法能使得查找计算量大幅度下降；The basic idea of locality-sensitive hashing algorithm is: After transforming two adjacent data points in the original data space through the same mapping or projection, the probability that these two data points are still adjacent in the new data space is very high. The probability that non-adjacent data points are mapped to the same bucket is very small. That is to say, if we perform some hash mapping on the original data, we hope that two adjacent data can be hashed into the same bucket with the same bucket number. After hash mapping is performed on all the data in the original data set, a hash table is obtained. These original data sets are scattered into the buckets of the hash table, and each bucket will fall into some original data belonging to the same bucket. The data in a bucket is likely to be adjacent. Of course, there are also non-adjacent data that are hashed into the same bucket. Therefore, if some hash functions can be found so that after their hash map transformation, the adjacent data in the original space fall into the same bucket, then it becomes easy to search for neighbors in the data set , you only need to perform hash mapping on the query data to obtain its bucket number, then take out all the data in the bucket corresponding to the bucket number, and then perform linear matching to find the data adjacent to the query data. In other words, through the hash function mapping transformation operation, the original data set is divided into multiple sub-sets, and the data in each sub-set is adjacent and the number of elements in the sub-set is small, so one in The problem of finding adjacent elements in a very large set is transformed into the problem of finding adjacent elements in a small set. This algorithm can greatly reduce the amount of search calculations;

对于原本相邻的两个数据点经过哈希变换后落入相同桶内的哈希函数需要满足以下两个条件：For the hash function that two adjacent data points fall into the same bucket after hash transformation, the following two conditions need to be met:

如果d(x,y)≤d1，则h(x)＝h(y)的概率至少为p1；If d(x,y)≤d1, then h(x)=h(y) with probability at least p1;

如果d(x,y)≥d2，则h(x)＝h(y)的概率至多为p2；If d(x,y)≥d2, then the probability of h(x)=h(y) is at most p2;

其中d(x,y)表示x和y之间的距离，d1<d2，h(x)和h(y)分别表示对x和y进行哈希变换。Where d(x,y) represents the distance between x and y, d1<d2, h(x) and h(y) represent the hash transformation of x and y, respectively.

满足以上两个条件的哈希函数称为(d1,d2,p1,p2)-敏感。而通过一个或多个(d1,d2,p1,p2)-敏感的哈希函数对原始数据集合进行哈希生成一个或多个哈希表的过程称为局部敏感哈希。A hash function that satisfies the above two conditions is called (d1,d2,p1,p2)-sensitive. The process of generating one or more hash tables by hashing the original data set through one or more (d1, d2, p1, p2)-sensitive hash functions is called locality-sensitive hashing.

使用局部敏感哈希进行对海量数据建立索引，即哈希表并通过索引来进行近似最近邻查找的过程如下：The process of using local sensitive hash to index massive data, that is, hash table and perform approximate nearest neighbor search through index is as follows:

离线建立索引Offline indexing

(1)选取满足(d1,d2,p1,p2)-敏感的局部敏感哈希的哈希函数；(1) Select a hash function that satisfies (d1,d2,p1,p2)-sensitive locality-sensitive hashing;

(2)根据对查找结果的准确率，即相邻的数据被查找到的概率来确定哈希表的个数L，每个哈希表内的哈希函数的个数K，以及跟局部敏感哈希的哈希函数自身有关的参数；(2) Determine the number L of hash tables, the number K of hash functions in each hash table, and local sensitivity according to the accuracy of the search results, that is, the probability that adjacent data is found parameters related to the hash function itself;

(3)将所有数据经过局部敏感哈希的哈希函数哈希到相应的桶内，构成了一个或多个哈希表；(3) Hash all the data into corresponding buckets through the hash function of local sensitive hashing, forming one or more hash tables;

在线查找find online

(1)将查询数据经过局部敏感哈希的哈希函数哈希得到相应的桶号；(1) Hash the query data through the hash function of local sensitive hash to obtain the corresponding bucket number;

(2)将桶号中对应的数据取出；为了保证查找速度，只取出前2L个数据；(2) Take out the corresponding data in the bucket number; in order to ensure the search speed, only take out the first 2L data;

(3)计算查询数据与这2L个数据之间的相似度或距离，返回最近邻的数据；(3) Calculate the similarity or distance between the query data and the 2L data, and return the nearest neighbor data;

局部敏感哈希在线查找时间由两个部分组成：①通过局部敏感哈希的哈希函数计算哈希值，即计算桶号的时间；②将查询数据与桶内的数据进行比较计算的时间。因此，局部敏感哈希的查找时间至少是一个次线性时间。这是因为这里通过对桶内的属于建立索引来加快匹配速度，这时第②部分的耗时就从O(N)变成了O(logN)或O(1)，极大的减少了计算量。The local sensitive hash online search time consists of two parts: ① the time to calculate the hash value through the hash function of the local sensitive hash, that is, the time to calculate the bucket number; ② the time to compare the query data with the data in the bucket. Therefore, the lookup time of locality-sensitive hashing is at least a sublinear time. This is because the matching speed is accelerated by indexing the belongings in the bucket. At this time, the time consumption of part ② changes from O(N) to O(logN) or O(1), which greatly reduces the calculation quantity.

所述的局部敏感哈希的一个关键是：将相似的样本映射到同一个具有高概率的同一个桶；局部敏感哈希的哈希函数h(.)满足以下条件：A key of the locality-sensitive hashing is: to map similar samples to the same bucket with high probability; the hash function h(.) of the locality-sensitive hashing satisfies the following conditions:

s{h(f_Aq)＝h(f_A)}＝sim(f_Aq,f_A) (14)s{h(f _Aq )=h(f _A )}=sim(f _Aq ,f _A ) (14)

式中，sim(f_Aq,f_A)表示f_Aq与f_A的相似度，h(f_A)表示f_A的哈希函数，h(f_Aq)表示f_A的哈希函数，，其中的相似性度量与一个距离函数σ直接关联，如：In the formula, sim(f _Aq , f _A ) represents the similarity between f _Aq and f _A , h(f _A ) represents the hash function of f _A , h(f _Aq ) represents the hash function of f _A , where The similarity measure is directly related to a distance function σ, such as:

局部敏感哈希函数的典型分类由随机投影和阈值给出，如式(16)所示，A typical classification of locality-sensitive hash functions is given by random projections and thresholds, as shown in Equation (16),

h(f_A)＝sign(Wf_A+b) (16)h(f _A )=sign(Wf _A +b) (16)

式中，W是一个随机超平面向量，b是一个随机截距。where W is a random hyperplane vector and b is a random intercept.

所述的局部敏感哈希由预处理算法和最近邻搜索算法构成，通过这两个算法处理将将搜索图像特征表示成一串固定长度的二值编码；The locality-sensitive hash is composed of a preprocessing algorithm and a nearest neighbor search algorithm, and the search image features will be represented as a string of fixed-length binary codes through these two algorithm processing;

所述预处理算法的过程为：The process of the preprocessing algorithm is:

输入一组提取的图像特征p和哈希表的数l₁，使用随机哈希函数g(.)对图像特征进行映射，将点p_j存储到哈希表T_i相应的桶号g_i(p_j)中；输出哈希表T_i,i＝1,…,l₁；Input a set of extracted image features p and the number l ₁ of the hash table, use the random hash function g(.) to map the image features, and store the point p _j in the corresponding bucket number g _i ₍ p _j ); output hash table T _i , i=1,...,l ₁ ;

所述最近邻搜索算法的过程为：The process of the nearest neighbor search algorithm is:

输入一个检索图像特征q，访问由预处理算法所生成的哈希表T_i,i＝1,…,l₁最近邻的数目K，返回检索点q在数据集S中的K个最近邻数据；Input a retrieval image feature q, access the hash table T _i generated by the preprocessing algorithm, i=1,...,l ₁ the number K of the nearest neighbors, and return the K nearest neighbor data of the retrieval point q in the data set S ;

设Γ＝{I₁,I₂,…,I_n}为检索的由n个图像构成的数据集，其每幅图像所对应的二进制代码为Γ_H＝{H₁,H₂,…,H_n}，H_i∈{0,1}^h；给定搜索图像I_q和二进制代码H_q，将H_q与H_i∈Γ_H之间的汉明距离小于阈值T_H的那些图像放入到候选池P中，为候选图像。Let Γ={I ₁ ,I ₂ ,…,In } be the retrieved data set consisting of _n images, and the binary code corresponding to each image is Γ _H ＝{H ₁ ,H ₂ ,…,H _n }, H _i ∈ {0,1} ^h ; given a search image I _q and a binary code H _q , put those images whose Hamming distance between H _q and H _i ∈ Γ _H is smaller than the threshold T _H into In the candidate pool P, for candidate images.

利用实例特征构建局部敏感再排序算法；在传统的局部敏感哈希算法中，返回的主要是距离上相近的图像，即检索图像和候选池中的图像相似度接近1；主要是因为通过低维的车辆属性哈希码映射后可得到相同型号的车辆，但是相同型号下的车辆仍存某些难以区分的情况，从人的主观判断上存在明显区别，但是仅仅通过车辆属性哈希码不能有效区别这些差异；为了排查出候选池中与检索图片含有相同个性特征的车辆，在检索图像经过车辆属性哈希码映射到各个桶中后，再利用获取到的图像实例特征，对桶中的图片再次进行排序缩小类内误差，再排序公式表现形式如下：Using instance features to construct a local-sensitive reordering algorithm; in the traditional local-sensitive hashing algorithm, the returned images are mainly similar in distance, that is, the similarity between the retrieved image and the image in the candidate pool is close to 1; mainly because of the low-dimensional Vehicles of the same model can be obtained after mapping the vehicle attribute hash code, but there are still some indistinguishable situations between vehicles of the same model. There are obvious differences in human subjective judgment, but only through the vehicle attribute hash code cannot be effective Distinguish these differences; in order to find out the vehicles in the candidate pool that have the same personality characteristics as the retrieved pictures, after the retrieved images are mapped to each bucket through the vehicle attribute hash code, and then use the obtained image instance features to compare the images in the bucket Sorting is performed again to reduce the intra-class error, and the expression of the re-sorting formula is as follows:

在公式(17)中，k表示经过车辆属性哈希码映射选出的桶中第k个图像，表示惩罚因子且cos表示度量图像实例特征的余弦距离公式；为了排除车辆属性哈希码的错误映射，y表示映射前检索图像f_Aq与桶中图像的类型是否相等，如果相等则y为1，否则为0；In formula (17), k represents the kth image in the bucket selected by the vehicle attribute hash code mapping, represents the penalty factor and cos represents the cosine distance formula for measuring the characteristics of image instances; in order to eliminate the wrong mapping of vehicle attribute hash codes, y represents the retrieval image f _Aq before mapping and the image in the bucket Whether the types of are equal, if they are equal, y is 1, otherwise it is 0;

在进一步排序中，已经将H_q与H_i∈Γ_H之间的汉明距离小于阈值T_H的那些图像放入到候选池P中，为了得到更为精准的搜索结果，本发明在候选池基础上进一步采用再排序方法；In further sorting, those images whose Hamming distance between H _q and H _i ∈ Γ _H are smaller than the threshold T _H have been put into the candidate pool P. In order to obtain more accurate search results, the present invention selects On the basis of further adopting the reordering method;

再排序方法，给定搜索图像I_q和候选池P，使用实例特征来确定从候选池P中图像的前k个排名图像；使用公式(17)计算它们之间的相似程度，The reranking method, given a search image I _q and a candidate pool P, uses instance features to determine the top-k ranked images from the images in the candidate pool P; the similarity between them is calculated using Equation (17),

进一步，关于再排序的评价，使用一个以排名为基础的标准来进行评价；对于给定一个搜索图像I_q和一个相似性度量，对每个数据集图像进行一个排名；这里用评估前k个排名图像来表示一个搜索图像I_q的检索精度，用公式(18)表示；Further, regarding the evaluation of re-ranking, a ranking-based criterion is used for evaluation; given a search image I _q and a similarity measure, a ranking is performed for each dataset image; here, the top k Ranking images to represent the retrieval accuracy of a search image I _q , expressed by formula (18);

式中Rel(i)表示搜索图像I_q与第i个排名图像之间的真实相关，k表示排名图像的个数，Precision@k搜索精度；在计算真实相关时，只考虑有分类标签部分，Rel(i)∈{0,1}，如果搜索图像与第i个排名图像都具有相同的标签设置Rel(i)＝1，否则设置Rel(i)＝0，遍历候选池P中前k个排名图像就能得到搜索精度；In the formula, Rel(i) represents the true correlation between the search image I _q and the i-th ranked image, k represents the number of ranked images, Precision@k search accuracy; when calculating the true correlation, only the part with classification labels is considered, Rel(i) ∈ {0,1}, if the search image has the same label as the i-th ranked image, set Rel(i)=1, otherwise set Rel(i)=0, traverse the top k candidates in the pool P Rank images to get search accuracy;

所述步骤5)中，在无法获取检索的图像时，采用文本检索方式进行辅助检索，在不增加额外训练的情况下，使得通过文本得到的检索特征和卷积网络得到的文本特征可共用一套检索方式，其文本获取特征方法如下：In the step 5), when the retrieved image cannot be obtained, the text retrieval method is used for auxiliary retrieval, and without additional training, the retrieval features obtained through the text and the text features obtained by the convolutional network can be shared. Set retrieval method, its text acquisition feature method is as follows:

初始化：文本文件解析成词条向量；去除小词、重复词；检查词条确保解析的正确性；Initialization: parse the text file into entry vectors; remove small words and repeated words; check the entries to ensure the correctness of the analysis;

5.1)从输入文本O中取出随机组合的分词最小向量R＝(r₁,r₂,...,r_n)；5.1) Take out the word segmentation minimum vector R=(r ₁ ,r ₂ ,...,r _n ) of random combination from the input text O;

5.2)对R与f_A顺序及车辆属性哈希码整合，得到文本属性特此时的f_ATxt维度小于R的维度；5.2) Integrate R and f _A sequence and vehicle attribute hash code to get the text attribute characteristic At this time, the dimension of _fATxt is smaller than that of R;

5.3)使用局部敏感再排序哈希算法检索；5.3) Retrieve using locality-sensitive reordering hash algorithm;

5.4)返回相似图像组I。5.4) Return similar image group I.

要实现上述发明内容，必须解决几个核心问题：1)针对图像特征提取难的问题，利用深度自编码卷积神经网络的强大的特征表征能力实现特征自适应提；2)针对大规模图像检索速度慢的问题，设计一种多任务分层方法，使用查询图像与数据库中图像快速比对；3)设计一种对车辆图像提取实例特征的方法用于相似类型车型检索；4)设计一种修正局部敏感哈希再排序码增大类内车辆图像之间的区别；5)利用端到端深度网络的优势，设计一种端到端的深度自编码卷积神经网络将检测、识别、特征提取融合到一个网络。To realize the above invention, several core problems must be solved: 1) Aiming at the difficult problem of image feature extraction, using the powerful feature representation ability of deep self-encoding convolutional neural network to realize feature adaptive extraction; 2) Aiming at large-scale image retrieval For the problem of slow speed, design a multi-task layering method, use the query image to quickly compare with the image in the database; 3) design a method for extracting instance features from vehicle images for retrieval of similar types of vehicles; 4) design a Modify the locality-sensitive hash reordering code to increase the difference between vehicle images in the class; 5) take advantage of the end-to-end deep network to design an end-to-end deep self-encoding convolutional neural network that combines detection, recognition, and feature extraction into one network.

本发明的基于多任务深度学习的修正局部敏感哈希再排序车辆检索方法，包括以下过程：1)将图像送入深度自编码卷积神经网络，在特征图上进行逻辑回归，对检索图像上的感兴趣区域进行位置、类别的分割、预测；2)使用多任务深度自编码卷积神经网络提取图像分段并行学习到的车辆属性哈希码；3)利用卷积特征层次结构的金字塔形状提取每辆车辆的实例特征；4)使用修正的局部敏感哈希方法对提取到的特征检索；5)对于无法获取车辆图像的情况下采用跨模态检索；The modified local sensitive hash reordering vehicle retrieval method based on multi-task deep learning of the present invention includes the following processes: 1) sending the image into a deep self-encoding convolutional neural network, performing logistic regression on the feature map, and performing a logical regression on the retrieved image 2) Use the multi-task deep self-encoding convolutional neural network to extract the vehicle attribute hash code learned in parallel by image segmentation; 3) Use the pyramid shape of the convolutional feature hierarchy Extract the instance features of each vehicle; 4) use the modified local sensitive hashing method to retrieve the extracted features; 5) use cross-modal retrieval when the vehicle image cannot be obtained;

本发明的有益效果主要表现在：The beneficial effects of the present invention are mainly manifested in:

1)提供了一种多任务端到端的卷积神经网络对车辆车型、车系、车标、颜色、车牌进行识别；1) A multi-task end-to-end convolutional neural network is provided to identify vehicle models, car series, car logos, colors, and license plates;

2)利用深度卷积神经网络强大的特征表征能力实现特征自适应提取；2) Use the powerful feature representation ability of deep convolutional neural network to realize feature adaptive extraction;

3)构建了修正局部敏感哈希再排序码对卷积网络提取到的特征高校检索；3) A modified local sensitive hash reordering code is constructed to retrieve the features extracted by the convolutional network;

4)本设计兼顾了通用性和专用性，在通用性方面，检索速度、精度和实用性等方面满足各类用户的需求；专用性方面用户根据自己的特定需求，做一个专用数据集并对网络参数进行微调后，实现一种面向特定应用的车辆检索的系统。4) This design takes both versatility and specificity into account. In terms of versatility, retrieval speed, accuracy and practicability meet the needs of various users; After the network parameters are fine-tuned, an application-specific vehicle retrieval system is realized.

附图说明Description of drawings

图1为整体检索流程图。Figure 1 is the overall retrieval flow chart.

图2为整体训练网络流程图。Figure 2 is a flowchart of the overall training network.

图3为RPN网络展开图。Figure 3 is an expanded view of the RPN network.

图4为车辆属性哈希码无法区分车辆示意图。Fig. 4 is a schematic diagram of vehicles that cannot be distinguished by vehicle attribute hash codes.

图5为文本特征向量生成图。Figure 5 is a graph of text feature vector generation.

具体实施方式Detailed ways

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述。The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the drawings in the embodiments of the present invention.

参照图1～图5，一种基于多任务深度学习的修正局部敏感哈希车辆检索方法，整体流程图如图1所示，首先对数据库中的图片送入用于深度学习和训练识别的多任务端到端的卷积神经网络，大量地训练和逐层递进的网络结构深入地学习车辆各种属性信息，包括车型、车系、车标、颜色、车牌；再利用此卷积网络提取车辆图像分段并行学习到的车辆属性哈希码，对构建的特征金字塔模块提取车辆的实例特征；对检索车辆图像与数据库中图像使用修正局部敏感哈希再排序方法进行比对。Referring to Figures 1 to 5, a modified local sensitive hash vehicle retrieval method based on multi-task deep learning. The overall flow chart is shown in Figure 1. First, the pictures in the database are sent to multiple Task end-to-end convolutional neural network, a large number of training and layer-by-layer progressive network structure to deeply learn various attribute information of vehicles, including model, car series, car logo, color, license plate; and then use this convolutional network to extract vehicles The vehicle attribute hash code learned in parallel by image segmentation is used to extract the instance features of the vehicle from the constructed feature pyramid module; the retrieved vehicle image is compared with the images in the database using the modified local sensitive hash reordering method.

所述用于深度学习和训练识别的多任务端到端的卷积神经网络含有共享卷积模块，感兴趣区域坐标回归和识别模块，多任务学习模块和实例特征提取模块，整体流程图如图2所示，共含有4个共享卷积模块和4层的特征金字塔模块；图2中双点划线是压缩层提取的车辆实例特征；图2中虚线部分框出的是提出的划分和编码模块，通过不同任务学习车辆的紧凑特征，最后将提取的两个特征向量进行融合；The multi-task end-to-end convolutional neural network used for deep learning and training recognition contains a shared convolution module, a region of interest coordinate regression and recognition module, a multi-task learning module and an instance feature extraction module. The overall flow chart is shown in Figure 2 As shown, there are a total of 4 shared convolution modules and 4 layers of feature pyramid modules; the double dotted line in Figure 2 is the vehicle instance feature extracted by the compression layer; the dotted line in Figure 2 is the proposed division and encoding module , learn the compact features of the vehicle through different tasks, and finally fuse the two extracted feature vectors;

本发明包括以下步骤：The present invention comprises the following steps:

1)共享卷积模块：共享网络由5个卷积模块组成，其中conv2_x到conv5_x的最后一层分别为{4²,8²,16²,16²}作为特征图的输出尺寸，conv1作为输入层只含有单层卷积层；1) Shared convolution module: The shared network consists of 5 convolution modules, where the last layer of conv2_x to conv5_x is {4 ² , 8 ² , 16 ² , 16 ² } as the output size of the feature map, and conv1 as the input The layer contains only a single-layer convolutional layer;

在共享卷积模块之后连接感兴趣区域坐标回归和识别模块，此模块可将任意大小的图像作为输入，输出目标区域的矩形预测框的集合，包含了每个预测框的位置坐标和数据集中类别的概率得分，为了生成区域建议框，首先输入图像经过卷积共享层生成特征图，然后在特征图上进行多尺度卷积操作，具体实现为：在每一个滑动窗口的位置使用3种尺度和3种长宽比，以当前滑动窗口中心为中心，并对应一种尺度和长宽比，则可以在原图上映射得到9种不同尺度的候选区域；如对于大小为w×h的共享卷积特征图，则总共有w×h×9个候选区域。最后，分类层输出w×h×9×2个候选区域的得分，即对每个区域是目标/非目标的估计概率，回归层输出w×h×9×4个参数，即候选区域的坐标参数，具体形式如图3所示；After the shared convolution module, the coordinate regression and recognition module of the region of interest is connected. This module can take an image of any size as input, and output a set of rectangular prediction frames of the target area, including the position coordinates of each prediction frame and the category in the data set. The probability score of , in order to generate a region suggestion box, first input the image through the convolution shared layer to generate a feature map, and then perform a multi-scale convolution operation on the feature map. The specific implementation is: use 3 scales and 3 kinds of aspect ratios, centered on the center of the current sliding window, and corresponding to one scale and aspect ratio, then 9 candidate regions of different scales can be mapped on the original image; for example, for a shared convolution with a size of w×h feature map, there are a total of w×h×9 candidate regions. Finally, the classification layer outputs the scores of w×h×9×2 candidate regions, that is, the estimated probability of being a target/non-target for each region, and the regression layer outputs w×h×9×4 parameters, which are the coordinates of the candidate regions Parameters, the specific form is shown in Figure 3;

训练RPN网络时，给每个候选区域分配一个二进制标签，以此来标注该区域是否是对象目标。具体操作如下：1)与某个真正目标区域(Ground Truth，GT)有最高的IoU(Intersection-over-Union，交集并集之比)重叠候选区域；2)与任意GT包围盒有大于0.7的IoU交叠的候选区域。分配负标签给与所有GT包围盒的IoU比率都低于0.3的候选区域；3)介于两者之间的舍弃。When training the RPN network, a binary label is assigned to each candidate region to mark whether the region is an object target. The specific operation is as follows: 1) It has the highest IoU (Intersection-over-Union, ratio of intersection and union) overlapping candidate area with a real target area (Ground Truth, GT); Candidate regions for IoU overlap. Assign negative labels to candidate regions whose IoU ratios to all GT bounding boxes are lower than 0.3; 3) discard in between.

其中，i是第i个候选区域的索引，P_i是候选区域是第i类的概率。如果候选区域的标签为正，为1，如果候选区域标签为0，就是0。t_i是一个向量，表示预测的包围盒的4个参数化坐标，是对应的GT包围盒的坐标向量。N_cls和N_reg分别为分类损失函数与位置回归损失函数的归一化系数，λ为两者之间的权重参数。分类损失函数L_cls是两个类别(目标vs.非目标)的对数损失：where i is the index of the ith candidate region, and P _i is the probability that the candidate region is the ith class. If the label of the candidate region is positive, is 1, if the candidate region label is 0, It is 0. t _i is a vector representing the 4 parameterized coordinates of the predicted bounding box, is the coordinate vector of the corresponding GT bounding box. N _cls and N _reg are the normalization coefficients of the classification loss function and the position regression loss function respectively, and λ is the weight parameter between the two. The classification loss function L _cls is the log loss for two classes (target vs. non-target):

然而，训练一个多任务深度学习网络并非是一件容易实现的过程，因为不同任务级别的信息有着各自不同的学习难点和收敛速度。因此，设计一个良好的多任务目标函数是至关重要的。多任务联合训练过程如下：假设，总任务数为T，对于第t个任务的训练数据记为其中t∈(1,T)，i∈(1,N)，N为总训练样本数。分别为第i样本的特征向量和标注标签。那么多任务目标函数则可以表示为：However, training a multi-task deep learning network is not an easy process, because information at different task levels has different learning difficulties and convergence speeds. Therefore, it is crucial to design a good multi-task objective function. The multi-task joint training process is as follows: Assume that the total number of tasks is T, and the training data for the t-th task is recorded as Where t∈(1,T), i∈(1,N), N is the total number of training samples. are the feature vector and label of the i-th sample, respectively. Then the multi-task objective function can be expressed as:

式中是输入特征向量和权重参数w^t的映射函数，L(·)为损失函数，Φ(w^t)为权重参数的正则化值。In the formula is the input feature vector and the mapping function of the weight parameter w ^t , L( ) is the loss function, and Φ(w ^t ) is the regularization value of the weight parameter.

式中，δ^l代表当前层的误差函数，δ^l+1代表上一层的误差函数，W^l+1为上一层映射矩阵，f'表示激活函数的反函数，即上采样，u^l表示未通过激活函数的上一层的输出，x^l-1表示下一层的输入，W^l为本层映射权值矩阵；In the formula, δ ^l represents the error function of the current layer, δ ^l+1 represents the error function of the previous layer, W ^l+1 is the mapping matrix of the previous layer, f' represents the inverse function of the activation function, that is, upsampling, u ^l Represents the output of the previous layer that has not passed the activation function, x ^l-1 represents the input of the next layer, and W ^l is the mapping weight matrix of this layer;

2)多任务在学习过程中存在任务之间的关联性，即任务之间存在信息共享，在同时训练多个任务时，网络利用任务之间的共享信息增强系统的归纳偏置能力和分类器的泛化能力；所述多任务网络通过在感兴趣区域模块后添加五个全连接层分为五个子任务，每个全连接后连接softmax激活函数将阈值归一化在[0,1]之间，再将归一化后的值送入分段函数促进二进制码的输出，通过分段学习和编码策略降低哈希码之间的冗余性从而增强学习到的特征的鲁棒性；2) There is a correlation between tasks in the learning process of multi-tasks, that is, there is information sharing between tasks. When training multiple tasks at the same time, the network uses the shared information between tasks to enhance the system's inductive bias ability and classifier The generalization ability; the multi-task network is divided into five subtasks by adding five fully connected layers after the region of interest module, and each fully connected softmax activation function normalizes the threshold between [0,1] Then, the normalized value is sent to the segmentation function to promote the output of the binary code, and the redundancy between the hash codes is reduced through segmentation learning and coding strategies to enhance the robustness of the learned features;

在每个H^t之前乘上惩罚因子α^t弥补了不同任务之间由于不同分类数造成的误差；Multiplying the penalty factor α ^t before each H ^t makes up for the error caused by different classification numbers between different tasks;

3)利用卷积特征层次结构的金字塔形状，同时创建一个在所有尺度上都具有强大语义的特征金字塔；为了实现这个目标依靠一种结构将低分辨率，语义上强大的特征与高分辨率，语义上弱的特征通过自顶向下的路径和横向连接相结合，并且可以从单个输入图像比例快速构建，可以用来代替特征化的图像金字塔而不牺牲代表性的特征，速度或内存；为了得到车辆图像的实例特征并且适应任意尺寸的卷积特征图的输入，选择共享模块conv2_x到conv5_x每个单元的最后一层并结合感兴趣区域模块的输出，再添加一个金字塔池化层和向量压缩层将三维特征压缩成一个一维特征向量，这样选择是因为即可丰富特征金字塔得到的特征图信息，每个阶段的最深层又有最强大的特征表示功能；3) Utilize the pyramidal shape of the convolutional feature hierarchy while creating a feature pyramid that is semantically strong at all scales; to achieve this goal relies on a structure that combines low-resolution, semantically strong features with high-resolution, Semantically weak features are combined via top-down paths and lateral connections, and can be quickly constructed from a single input image scale, and can be used to replace featurized image pyramids without sacrificing representative features, speed, or memory; for Get the instance features of the vehicle image and adapt to the input of the convolution feature map of any size, select the last layer of each unit of the shared module conv2_x to conv5_x and combine the output of the region of interest module, and then add a pyramid pooling layer and vector compression The layer compresses the three-dimensional features into a one-dimensional feature vector. This choice is because the feature map information obtained by the feature pyramid can be enriched, and the deepest layer of each stage has the most powerful feature representation function;

f＝[f_A；f_B] (13)f=[f _A ; f _B ] (13)

局部敏感哈希算法的基本思想是：将原始数据空间中的两个相邻数据点通过相同的映射或投影变换后，这两个数据点在新的数据空间中仍然相邻的概率很大，而不相邻的数据点被映射到同一个桶的概率很小。也就是说，如果我们对原始数据进行一些哈希映射后，我们希望原先相邻的两个数据能够被哈希到相同的桶内，具有相同的桶号。对原始数据集合中所有的数据都进行哈希映射后，这样就得到了一个哈希表，这些原始数据集被分散到了哈希表的桶内，每个桶会落入一些原始数据，属于同一个桶内的数据就有很大可能是相邻的，当然也存在不相邻的数据被哈希到了同一个桶内。因此，如果能够找到这样一些哈希函数，使得经过它们的哈希映射变换后，原始空间中相邻的数据落入相同的桶内的话，那么在该数据集合中进行近邻查找就变得容易了，只需要将查询数据进行哈希映射得到其桶号，然后取出该桶号对应桶内的所有数据，再进行线性匹配即可查找到与查询数据相邻的数据。换句话说，通过哈希函数映射变换操作，将原始数据集合分成了多个子集合，而每个子集合中的数据间是相邻的且该子集合中的元素个数较小，因此将一个在超大集合内查找相邻元素的问题转化为了在一个很小的集合内查找相邻元素的问题，这种算法能使得查找计算量大幅度下降；The basic idea of locality-sensitive hashing algorithm is: After transforming two adjacent data points in the original data space through the same mapping or projection, the probability that these two data points are still adjacent in the new data space is very high. The probability that non-adjacent data points are mapped to the same bucket is very small. That is to say, if we perform some hash mapping on the original data, we hope that two adjacent data can be hashed into the same bucket with the same bucket number. After hash mapping is performed on all the data in the original data set, a hash table is obtained. These original data sets are scattered into the buckets of the hash table, and each bucket will fall into some original data belonging to the same bucket. The data in a bucket is likely to be adjacent, and of course there are non-adjacent data that are hashed into the same bucket. Therefore, if some hash functions can be found so that after their hash map transformation, the adjacent data in the original space fall into the same bucket, then it becomes easy to search for neighbors in the data set , you only need to perform hash mapping on the query data to obtain its bucket number, then take out all the data in the bucket corresponding to the bucket number, and then perform linear matching to find the data adjacent to the query data. In other words, through the hash function mapping transformation operation, the original data set is divided into multiple sub-sets, and the data in each sub-set is adjacent and the number of elements in the sub-set is small, so one in The problem of finding adjacent elements in a very large set is transformed into the problem of finding adjacent elements in a small set. This algorithm can greatly reduce the amount of search calculations;

离线建立索引Offline indexing

在线查找find online

局部敏感哈希在线查找时间由两个部分组成：①通过局部敏感哈希的哈希函数计算哈希值，即计算桶号的时间；②将查询数据与桶内的数据进行比较计算的时间。因此，局部敏感哈希的查找时间至少是一个次线性时间。这是因为这里通过对桶内的属于建立索引来加快匹配速度，这时第②部分的耗时就从O(N)变成了O(logN)或O(1)，极大的减少了计算量；The local sensitive hash online search time consists of two parts: ① the time to calculate the hash value through the hash function of the local sensitive hash, that is, the time to calculate the bucket number; ② the time to compare the query data with the data in the bucket. Therefore, the lookup time of locality-sensitive hashing is at least a sublinear time. This is because the matching speed is accelerated by indexing the belongings in the bucket. At this time, the time consumption of part ② changes from O(N) to O(logN) or O(1), which greatly reduces the calculation quantity;

h(f_A)＝sign(Wf_A+b) (16)h(f _A )=sign(Wf _A +b) (16)

4)利用实例特征构建局部敏感再排序算法；在传统的局部敏感哈希算法中，返回的主要是距离上相近的图像，即检索图像和候选池中的图像相似度接近1；主要是因为通过低维的车辆属性哈希码映射后可得到相同型号的车辆，但是相同型号下的车辆仍存某些难以区分的情况，从人的主观判断上存在明显区别，但是仅仅通过车辆属性哈希码不能有效区别这些差异，如图4所示；为了排查出候选池中与检索图片含有相同个性特征的车辆，在检索图像经过车辆属性哈希码映射到各个桶中后，再利用获取到的图像实例特征，对桶中的图片再次进行排序缩小类内误差，再排序公式表现形式如下：4) Using instance features to build a local-sensitive reordering algorithm; in the traditional local-sensitive hashing algorithm, the returned images are mainly close in distance, that is, the similarity between the retrieved image and the image in the candidate pool is close to 1; the main reason is that through The same type of vehicle can be obtained after low-dimensional vehicle attribute hash code mapping, but there are still some indistinguishable situations between vehicles of the same model. There are obvious differences in human subjective judgment, but only through the vehicle attribute hash code These differences cannot be effectively distinguished, as shown in Figure 4; in order to find out the vehicles in the candidate pool that have the same personality characteristics as the retrieved pictures, after the retrieved images are mapped to each bucket through the vehicle attribute hash code, then use the obtained images Instance features, reorder the pictures in the bucket to reduce the intra-class error, and the reordering formula is expressed as follows:

式中，Rel(i)表示搜索图像I_q与第i个排名图像之间的真实相关，k表示排名图像的个数，Precision@k搜索精度；在计算真实相关时，只考虑有分类标签部分，Rel(i)∈{0,1}，如果搜索图像与第i个排名图像都具有相同的标签设置Rel(i)＝1，否则设置Rel(i)＝0，遍历候选池P中前k个排名图像就能得到搜索精度；In the formula, Rel(i) represents the true correlation between the search image I _q and the i-th ranked image, k represents the number of ranked images, Precision@k search accuracy; when calculating the true correlation, only the part with classification labels is considered , Rel(i)∈{0,1}, if the search image has the same label as the i-th ranked image, set Rel(i)=1, otherwise set Rel(i)=0, traverse the top k in the candidate pool P The search accuracy can be obtained by ranking images;

5)在无法获取检索的图像时，采用文本检索方式进行辅助检索，在不增加额外训练的情况下，使得通过文本得到的检索特征和卷积网络得到的文本特征可共用一套检索方式，如果某个文本中含有车辆描述信息辨识标记符，如图5所示，其文本获取特征方法如下：5) When the retrieved image cannot be obtained, the text retrieval method is used for auxiliary retrieval. Without additional training, the retrieval features obtained through the text and the text features obtained by the convolutional network can share a set of retrieval methods. If A certain text contains a vehicle description information identification tag, as shown in Figure 5, the method for obtaining the features of the text is as follows:

初始化过程为：文本文件解析成词条向量；去除小词、重复词；检查词条确保解析的正确性；The initialization process is: the text file is parsed into entry vectors; small words and repeated words are removed; the entries are checked to ensure the correctness of the analysis;

5.4)返回相似图像组I；5.4) return similar image group I;

以上所述仅为本发明的较佳实施举例，并不用于限制本发明，凡在本发明精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。The above descriptions are only examples of the preferred implementation of the present invention, and are not intended to limit the present invention. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention within.

Claims

1. A modified locality sensitive hashing vehicle retrieval method based on multitask deep learning is characterized by comprising the following steps:

1) Constructing a multi-task end-to-end convolutional neural network for deep learning and training recognition, and deeply learning various attribute information of the vehicle, including vehicle type, vehicle system, vehicle logo, color and license plate, by training data and a network structure which progresses layer by layer;

2) Constructing a vehicle attribute hash code by using the multitask convolutional neural network in the step 1) and adopting a segmented parallel learning and coding strategy;

3) Constructing a characteristic pyramid module by utilizing a pyramid pooling layer and a vector compression layer so as to adapt to convolution feature maps of different sizes and input example characteristics of extracted vehicles;

4) Constructing a local sensitive reordering algorithm by using the example characteristics obtained in the step 3);

5) A cross-mode retrieval method under the condition that a retrieval vehicle image cannot be obtained is constructed, and vehicle retrieval is realized.

2. The revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 1, wherein: the multi-task end-to-end convolutional neural network for deep learning and training identification comprises a shared convolution module, an interesting region coordinate regression and identification module and a multi-task learning module;

a shared convolution module: the shared network consists of 5 convolution modules, where the final layers conv2_ x to conv5_ x are {4 } respectively ² ,8 ² ,16 ² ,16 ² Output size as characteristic diagram, conv1 as input layer only contains single layer convolution layer;

the method comprises the following steps that a region-of-interest coordinate regression and identification module is connected behind a shared convolution module, the module takes an image with any size as input, outputs a set of rectangular prediction frames of a target region, and comprises position coordinates of each prediction frame and probability scores of categories in a data set, in order to generate a region suggestion frame, firstly, the input image generates a feature map through a convolution sharing layer, and then, multi-scale convolution operation is carried out on the feature map, and the implementation process is as follows: using 3 scales and 3 length-width ratios at the position of each sliding window, taking the center of the current sliding window as the center and corresponding to one scale and length-width ratio, and then mapping to obtain 9 candidate regions with different scales on an original image; if the shared convolution characteristic diagram with the size of w × h is used, w × h × 9 candidate regions are totally obtained; finally, the classification layer outputs scores of w × h × 9 × 2 candidate regions, namely, the estimation probability that each region is a target/non-target, and the regression layer outputs w × h × 9 × 4 parameters, namely, coordinate parameters of the candidate regions;

when the RPN is trained, each candidate region is allocated with a binary label so as to mark whether the region is an object target, and the operation is as follows: 1) An IoU overlapping candidate region highest to a certain real target region GT; 2) Candidate regions with IoU overlap of more than 0.7 with any GT bounding box. Assigning a negative label to a candidate region for which the IoU ratio of all GT bounding boxes is below 0.3; 3) Between the two.

With these definitions, the objective function is minimized, and the loss function for an image is defined as:

where i is the index of the ith candidate region, P _i Is the probability that the candidate region is of the ith class. If the label of the candidate region is positive,is 1, if the candidate area label is 0,is exactly 0,t _i Is a vector, representing the 4 parameterized coordinates of the predicted bounding box,is the coordinate vector of the corresponding GT bounding box, N _cls And N _reg Normalization of classification loss function and position regression loss function respectivelyCoefficient, λ is a weighting parameter between the two, a classification loss function L _cls Is a log loss of two classes, target and non-target:

regression loss function L for position _reg Defined by the following function:

where R is a robust loss function (smooth L1).

However, training a multitask deep learning network is not an easy process to implement because information at different task levels has different learning difficulties and convergence rates, and the multitask joint training process is as follows: assuming that the total number of tasks is T, the training data for the T-th task is recorded asWherein t is the number of total training samples, i is the number of training samples (1, T), i is the number of training samples (1, N).Respectively, the feature vector and the label of the ith sample, and then the multitask objective function is expressed as:

in the formulaIs an input feature vectorAnd a weight parameter w ^t L (-) is a loss function, phi (w) ^t ) Is a regularization value of a weight parameter;

and for the loss function, training the characteristics of the last layer by utilizing softmax in cooperation with a log-likelihood cost function to realize image classification. The softmax loss function is defined as follows:

in the formula, x _i Is the ith depth feature, W _j B is a bias term, m and n are the number of processed samples and the category number respectively;

the convolutional neural network training is a back propagation process, and the convolutional parameters and the bias are optimized and adjusted by using a random gradient descent method through back propagation of an error function until the network is converged or the maximum iteration times is reached;

the neural network training is a back propagation process, the convolution parameters and the bias are optimized and adjusted by a random gradient descent method through back propagation of an error function until the network is converged or the maximum iteration times are reached;

the back propagation needs to compare the training samples with labels, adopt a square error cost function to identify multiple classes of the c classes and the N training samples, calculate the error by the formula (7) according to the final output error function of the network,

in the formula, E ^N In order to be a function of the squared error cost,for the kth dimension of the label for the nth sample,corresponding to the k output of the network prediction for the n sample;

when the error function is reversely propagated, a calculation method similar to the traditional BP algorithm is adopted, and a specific formula form is shown as a formula (8),

δ ^l ＝(W ^l+1 ) ^T δ ^l+1 ×f'(u ^l )(u ^l ＝W ^l x ^l-1 +b ^l ) (8)

in the formula, delta ^l Error function, δ, representing the current layer ^l+1 Representing the error function of the previous layer, W ^l+1 For the previous layer of the mapping matrix, f' represents the inverse of the activation function, i.e. upsampling, u ^l Output, x, representing the layer above the failed activation function ^l-1 Denotes the input of the next layer, W ^l The weight matrix is mapped for this layer.

3. The revised locality-sensitive hash-ordered vehicle retrieval method based on multitask deep learning according to claim 1 or 2, characterized by comprising: the multi-task has relevance between tasks in the learning process, namely information sharing exists between the tasks,

when a plurality of tasks are trained simultaneously, the network utilizes shared information among the tasks to enhance the induction bias capability of the system and the generalization capability of the classifier; the multitask network is divided into five subtasks by adding five full-connection layers behind an interested region module, each full-connection post-connection softmax activation function normalizes a threshold value between [0 and 1], then sends the normalized value to a segmentation function to promote the output of binary codes, and reduces the redundancy among the hash codes through a segmentation learning and coding strategy so as to enhance the robustness of the learned characteristics;

dividing the multi-task learning network into T tasks, wherein each task contains c ^t One class, full connected layer one-dimensional vector output per task uses m ^t Indicating that the full connection layer is first imported by utilizing softmax activation functionNormalized to [0,1]]The formula is expressed in the following form:

wherein θ represents a random hyperplane; and sending the normalized value into a threshold segmentation function for binarization to obtain binary output of the full-link layer, wherein the formula is specifically expressed as follows:

finally, H obtained by the formula (10) is compared with H in order to obtain the vehicle attribute hash code which is obtained by the segmentation and parallel learning of the multi-task convolutional network ^t The vectors are fused again in a certain proportion, using the vector f _A The formula is expressed in the following concrete form:

f _A ＝[α ¹ H ¹ ；α ² H ² ；...；α ^t H ^t ] (11)

α in the formula (11) ^t The concrete expression form is as follows:

at each H ^t Is previously multiplied by a penalty factor alpha ^t The error caused by different classification numbers among different tasks is compensated.

4. The revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 3, wherein: simultaneously creating a characteristic pyramid with strong semantics on all scales by utilizing the pyramid shape of the convolution characteristic hierarchical structure; to achieve this goal, low resolution, semantically strong features are combined with high resolution, semantically weak features by top-down paths and transverse connections by means of a structure; the feature pyramid has rich semantics at all levels, can be quickly constructed from a single input image scale, and can be used to replace a characterized image pyramid without sacrificing representative features, speed or memory. In order to obtain example features of the vehicle image and adapt to the input of a convolution feature map with any size, the last layer of each unit of the sharing modules conv2_ x to conv5_ x is selected and combined with the output of the region-of-interest module, and a pyramid pooling layer and a vector compression layer are added to compress three-dimensional features into a one-dimensional feature vector, so that the selection is that feature map information obtained by a feature pyramid can be enriched, and the deepest layer of each stage has the strongest feature representation function;

the last layer of each module is taken as input of the feature pyramid, and {4 } is selected in turn for the last layer of the networks conv2_ x to conv5_ x defined above ² ,8 ² ,16 ² ,16 ² The size of an input feature map of the feature pyramid is used as the size of the feature pyramid; the input image is represented by I, the length and the width of the input image are respectively represented by letters h and w, the shared convolution module of the x-th layer is represented by convx _ x, the input image I is activated into a three-dimensional feature vector T, the dimension h 'multiplied by w' multiplied by d is a set of a series of two-dimensional feature maps, the length and the width of the two-dimensional feature map are h 'multiplied by w', the T contains d two-dimensional feature maps, and the set S = S { S } is used _n Is represented by ∈ (1, d), S _n Corresponding to the nth channel characteristic diagram; and then, sending the three-dimensional feature vector T into a feature pyramid, and obtaining a three-dimensional feature vector T ' after convolution by a plurality of scale convolution kernels, wherein the dimension is l multiplied by d, the three-dimensional feature vector T ' also comprises a group of two-dimensional feature maps, and can be used as S ' = S ' { S ' _n Denotes n ∈ (1, d), where S _n ' corresponding to the nth channel feature map, each feature map is l × l in size, and the total number of the feature maps is d; then, a sliding window with the size of k multiplied by k is utilized and the maximum pooling is selected to carry out logistic regression on the characteristic graph to obtain a group of characteristic graphs with the size of l/k multiplied by l/k, and then S of each channel is carried out _n ' conducting fusion to obtain a one-dimensional vector, conducting the same operation on d channels in sequence, and finally obtaining an individual characteristic vector f _B The size was (1,l/kXd). The final search feature vector f is shown in equation (13):

f＝[f _A ；f _B ] (13)。

5. the revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 4, wherein: performing fast image comparison on the compact binary code by using a hash method and a Hamming distance; the Hash method adopts a locality sensitive Hash algorithm, namely, a Hash bit is constructed by adopting random projection transformation;

one key of the locality sensitive hash is: mapping similar samples to the same bucket with high probability; the hash function h () of the locality sensitive hash satisfies the following condition:

s{h(f _Aq )＝h(f _A )}＝sim(f _Aq ,f _A ) (14)

in the formula, sim (f) _Aq ,f _A ) Denotes f _Aq And f _A Similarity of (c), h (f) _A ) Denotes f _A Hash function of h (f) _Aq ) Denotes f _A The hash function of (1), wherein the similarity measure is directly associated with a distance function σ, such as:

a typical classification of a locality-sensitive hash function is given by a random projection and a threshold, as shown in equation (16),

h(f _A )＝sign(Wf _A +b) (16)

where W is a random hyperplane vector and b is a random intercept.

6. The revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 5, wherein: the locality sensitive hash is composed of a preprocessing algorithm and a nearest neighbor search algorithm, and the search image features are represented into a string of binary codes with fixed length through the processing of the two algorithms;

the preprocessing algorithm comprises the following processes:

inputting a set of extracted image features p and a number of hash tables l ₁ Mapping the image features using a random hash function g _j Store to hash table T _i Corresponding barrel number g _i (p _j ) The preparation method comprises the following steps of (1) performing; output hash table T _i ,i＝1,…,l ₁ ；

The nearest neighbor search algorithm comprises the following processes:

inputting a search image feature q, accessing a hash table T generated by a preprocessing algorithm _i ,i＝1,…,l ₁ The number K of nearest neighbors returns K nearest neighbor data of the retrieval point q in the data set S;

let Γ = { I ₁ ,I ₂ ,…,I _n Is a data set composed of n images, and the binary code corresponding to each image is Γ _H ＝{H ₁ ,H ₂ ,…,H _n }，H _i ∈{0,1} ^h (ii) a Given search image I _q And binary code H _q Is prepared from H _q And H _i ∈Γ _H The Hamming distance between is less than the threshold value T _H Are put into the candidate pool P,are candidate images.

7. The revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 6, wherein: in the step 4), a local sensitive reordering algorithm is constructed by using the example characteristics; in the traditional locality sensitive hashing algorithm, returned images which are close in distance are mainly used, namely the similarity between a retrieval image and an image in a candidate pool is close to 1; the vehicle attribute hash codes are mapped to obtain vehicles of the same model, but the vehicles of the same model are still difficult to distinguish, and the vehicle attribute hash codes are obviously distinguished from subjective judgment of people, but the differences cannot be effectively distinguished only through the vehicle attribute hash codes; in order to find out vehicles in the candidate pool which have the same individual characteristics as the retrieval pictures, after the retrieval images are mapped into each barrel through the vehicle attribute hash codes, the images in the barrels are sorted again by using the acquired image example characteristics to reduce the errors in the classes, and the representation form of a re-sorting formula is as follows:

in equation (17), k represents the kth image in the bucket selected by the vehicle attribute hash code mapping,represents a penalty factor andcos represents a cosine distance formula for measuring the characteristics of the image instance; to exclude the vehicle attribute hash code from being mapped incorrectly, y represents the pre-mapping search image f _Aq And images in bucketIs equal, y is 1 if equal, otherwise is 0.

In further ranking, H has already been assigned _q And H _i ∈Γ _H Hamming distance therebetween is less than a threshold value T _H The images are put into a candidate pool P, and in order to obtain a more accurate search result, a re-ordering method is further adopted on the basis of the candidate pool;

reordering method, given search image I _q And a candidate pool P, using the instance features to determine top k ranked images from the images in the candidate pool P; the degree of similarity between them is calculated using the formula (17),

further, with respect to the re-ranking evaluation, a ranking-based criterion is used for evaluation; for a given search image I _q And a similarity measure, one ranking for each dataset image; here, a search image I is represented by evaluating the top k ranking images _q Search essence ofDegree, expressed by equation (18);

where Rel (I) denotes a search image I _q The real correlation between the ith ranking image and k represents the number of the ranking images and precision @ k searching precision; when the real correlation is calculated, only a part with a classification label is considered, rel (i) ∈ {0,1}, if the search image and the ith ranking image have the same label setting Rel (i) =1, otherwise, rel (i) =0 is set, and the search precision can be obtained by traversing the first k ranking images in the candidate pool P.

8. The revised locality-sensitive hashing vehicle retrieval method based on multitask deep learning as claimed in claim 1, wherein: in the step 5), when the retrieved image cannot be obtained, a text retrieval mode is adopted for auxiliary retrieval, and under the condition of not adding extra training, the retrieval characteristics obtained through the text and the text characteristics obtained through the convolutional network can share one set of retrieval mode, and the text acquisition characteristic method comprises the following steps:

5.1 Parsing the input text O into entry vectors, removing small words and repeated words, and finally taking out the minimum word segmentation vector R = (R) of random combination ₁ ,r ₂ ,...,r _n )；

5.2 Pair of R and f _A Integrating the sequence and the vehicle attribute Hash codes to obtain text attribute characteristicsAt this time f _ATxt A dimension less than R;

5.3 Using a locality sensitive reordering hash algorithm;

5.4 ) return a similar group of images I.