
CN108833928B - Traffic monitoring video coding method - Google Patents

Traffic monitoring video coding method

Info

Publication number
CN108833928B
Authority
CN
China
Prior art keywords
vehicle
coded
background
current
block
Prior art date
Legal status
Active
Application number
CN201810720989.1A
Other languages
Chinese (zh)
Other versions
CN108833928A (en)
Inventor
刘东
马常月
吴枫
彭秀莲
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN201810720989.1A
Publication of CN108833928A
Application granted
Publication of CN108833928B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54 Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a traffic surveillance video coding method that performs coding on the basis of a vehicle database and a background database. At the cost of some storage space, the method effectively removes the global redundancy of traffic surveillance video along the time dimension, and thereby improves the overall coding performance of traffic surveillance video without noticeably increasing the complexity of the encoder or the decoder.

Description

Traffic Surveillance Video Coding Method

Technical Field

The present invention relates to the technical field of video coding, and in particular to a traffic surveillance video coding method.

Background Art

In recent years, with the rapid development of intelligent transportation, the volume of surveillance video data has grown explosively. To store and transmit surveillance video data efficiently, the first problem to be solved is how to encode it.

Currently, surveillance video is usually compressed with the general-purpose video coding standards H.264/AVC or H.265/HEVC. However, surveillance video has particular characteristics, such as the camera being stationary, and applying general-purpose coding techniques directly cannot fully exploit them. To further improve coding performance, researchers have devised a series of coding techniques specifically for surveillance video.

Generally speaking, the content of a surveillance video can be roughly divided into background content and foreground content, so coding techniques for surveillance video can be designed along two lines: optimized background coding and optimized foreground coding. Exploiting the stationary camera, optimized background coding usually first generates a high-quality background frame and then relies on quality propagation to improve the coding efficiency of the whole video. For the foreground, researchers have successively proposed foreground coding techniques based on models and on object segmentation.

Some works have also proposed other surveillance video coding techniques, for example:

Adaptive prediction based on background modeling (Xianguo Zhang, Tiejun Huang, Yonghong Tian, and Wen Gao, "Background-modeling-based adaptive prediction for surveillance video coding," IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 769-784, 2014.)

Global vehicle coding based on a vehicle 3D model database (Jing Xiao, Ruimin Hu, Liang Liao, Yu Chen, Zhongyuan Wang, and Zixiang Xiong, "Knowledge-based coding of objects for multisource surveillance video data," IEEE Transactions on Multimedia, vol. 18, no. 9, pp. 1691-1706, 2016.)

Disadvantages of the above methods:

1. Background coding based on high-quality background frames causes a surge in the bitstream when the high-quality background frame is generated, which harms network transmission, and its coding performance still leaves room for improvement.

2. Foreground coding based on models and object segmentation has inherent difficulty in segmenting the foreground precisely at the pixel level, and because the segmented foreground may be irregular in shape, the bit rate needed to represent it is very large.

3. Adaptive prediction based on background modeling subtracts the reconstructed background frame from both the current frame and the reference frame, and then encodes the foreground by performing inter prediction of the current frame's foreground pixels directly on the reference frame's foreground pixels. When the foreground segmentation is poor, this easily harms the gain in foreground coding efficiency.

4. Global vehicle coding based on a vehicle 3D model database does not store vehicle texture, so the reconstruction quality of vehicles cannot be improved. Moreover, the vehicle 3D models, the intrinsic and extrinsic parameters of the surveillance camera, and the positions and poses of vehicles on the road that the technique requires are hard to obtain or estimate, which hinders its practical use.

Summary of the Invention

The purpose of the present invention is to provide a traffic surveillance video coding method that improves the coding performance of traffic surveillance video.

The purpose of the present invention is achieved through the following technical solution:

A traffic surveillance video coding method, mainly comprising the following steps:

Step 1. Process the original traffic surveillance video sequence with a foreground-background segmentation method to separate the vehicles from the background, remove the redundancy among the separated vehicles and among the separated backgrounds respectively, and put them into a database.

Step 2. Apply the same foreground-background segmentation method to the traffic surveillance video to be encoded, separating the vehicles to be encoded from the background to be encoded. For each vehicle to be encoded, select a matching vehicle from the database by feature matching and fast motion estimation; for the background to be encoded, select a matching background from the database based on the sum of absolute differences (SAD).

Step 3. When the inter prediction mode or the intra prediction mode is used, decide by a predetermined criterion whether the vehicle or background to be encoded should undergo rate-distortion optimization on the matching vehicle or matching background; process it accordingly based on the decision, and encode it with the corresponding prediction mode.

As can be seen from the technical solution provided above, coding traffic surveillance video on the basis of vehicle and background databases can, at the cost of some storage space, effectively remove the global redundancy of the video along the time dimension. The overall effect is an effective improvement of the coding performance of traffic surveillance video without a noticeable increase in encoder or decoder complexity.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a traffic surveillance video coding method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of a traffic surveillance video coding framework provided by an embodiment of the present invention;

Fig. 3 is a flowchart of removing background SIFT features from the vehicle region provided by an embodiment of the present invention;

Fig. 4 is a flowchart of vehicle and background similarity analysis provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of the reference-index bit change information provided by an embodiment of the present invention;

Fig. 6 shows screenshots of the test sequences provided by an embodiment of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention rather than all of them. Based on these embodiments, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a traffic surveillance video coding method which, as shown in Fig. 1, mainly comprises the following steps:

Step 1. Process the original traffic surveillance video sequence with a foreground-background segmentation method to separate the vehicles from the background, remove the redundancy among the separated vehicles and among the separated backgrounds respectively, and put them into a database.

Step 2. Apply the same foreground-background segmentation method to the traffic surveillance video to be encoded, separating the vehicles to be encoded from the background to be encoded. For each vehicle to be encoded, select a matching vehicle from the database by feature matching and fast motion estimation; for the background to be encoded, select a matching background from the database based on the sum of absolute differences (SAD).

Step 3. When the inter prediction mode or the intra prediction mode is used, decide by a predetermined criterion whether the vehicle or background to be encoded should undergo rate-distortion optimization on the matching vehicle or matching background; process it accordingly based on the decision, and encode it with the corresponding prediction mode.

The schematic diagram of the whole coding framework is shown in Fig. 2, where the offline part corresponds to Step 1 above and the online part to Steps 2 and 3.

For ease of understanding, the three steps above are described in detail below.

I. Construction of the Vehicle and Background Databases

In an embodiment of the present invention, a foreground-background segmentation method (for example, the SuBSENSE method) is applied to the original traffic surveillance video sequence to separate the vehicles; the background is extracted from the background model produced during foreground separation. The vehicles and backgrounds belonging to the front part of the video sequence are used to build the databases. The main procedure may be as follows:

1. Vehicle database construction.

A preferred embodiment of vehicle database construction is as follows:

After the vehicles are separated from the front part of the original traffic surveillance video sequence and their redundancy is removed, number the vehicles from 1 to N, where N is the number of separated vehicles.

Initially, the database contains no vehicles. For a de-redundant vehicle v_i, retrieve similar vehicles {v_i1, v_i2, ..., v_im} from all the other vehicles with an inverted-index based method, where m is the number of similar vehicles.

To determine m, consider the number of SIFT features matched between vehicles v_i and v_j. Whether two SIFT features match can be decided with conventional techniques, or in the way described later for vehicle matching.

When retrieving similar vehicles, compare the number of matched SIFT features between vehicle v_i and each other vehicle v_j; when it satisfies the following conditions, put v_j into {v_i1, v_i2, ..., v_im}:

N_ij ≥ β × N_i;

N_ij ≥ min(N_0, N_i);

In the above, N_ij is the number of SIFT features matched between vehicle v_i and vehicle v_j, N_i is the number of SIFT features of vehicle v_i, and β and N_0 are constants; exemplarily, β and N_0 may be set to 0.1 and 4, respectively. After this processing, the similar vehicles {v_i1, v_i2, ..., v_im} corresponding to vehicle v_i are obtained.
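As a rough illustration, the two acceptance conditions above can be sketched as follows; this is a minimal sketch in which the helper `count_matches` and the attribute `num_sift` are assumptions, not part of the patent:

```python
def find_similar_vehicles(v_i, others, count_matches, beta=0.1, n0=4):
    """Keep every vehicle v_j whose matched-SIFT count N_ij passes both tests.

    count_matches(v_i, v_j) (assumed) returns N_ij, the number of matching
    SIFT pairs; v_i.num_sift (assumed) holds N_i. beta and n0 take the
    example values 0.1 and 4 given in the text."""
    similar = []
    for v_j in others:
        n_ij = count_matches(v_i, v_j)
        if n_ij >= beta * v_i.num_sift and n_ij >= min(n0, v_i.num_sift):
            similar.append(v_j)
    return similar
```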

Afterwards, compare the vehicles at the pixel level: for vehicle v_i, if the database contains no vehicles, put v_i into the database; otherwise compare v_i, at the pixel level, with those vehicles of {v_i1, v_i2, ..., v_im} that are already in the database. The comparison uses fast motion estimation, with the sum of absolute differences (SAD) as the loss function.

The fast motion estimation mentioned here can be implemented with conventional techniques, or with the specific fast motion estimation used later for vehicle matching.

If the computed average SAD is below a set value (for example, 5), the two vehicles are judged similar at the pixel level. Those skilled in the art will understand that each similarity computation compares vehicle v_i against one vehicle of {v_i1, v_i2, ..., v_im} already in the database: v_i is divided into blocks of a certain size, and each block of v_i undergoes fast motion estimation over the entire image of the database vehicle. With the 16x16 blocks mentioned later, each 16x16 block yields one SAD value, and the average SAD considered here is the average of the SADs of all 16x16 blocks of v_i.

If several consecutive vehicles (for example, 10) of {v_i1, v_i2, ..., v_im} already in the database are not pixel-level similar to vehicle v_i, put v_i into the database; otherwise, do not put v_i into the database.

If it is finally decided to put vehicle v_i into the database, compare the vehicles of {v_i1, v_i2, ..., v_im} already in the database with v_i at the pixel level; any database vehicle that is pixel-level similar to v_i is removed from the database. If, cumulatively, more than a certain number of vehicles prove not pixel-level similar to v_i, this checking process stops.

Each vehicle is processed in this way; the vehicles finally selected for the database are encoded and stored in it.
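The admission and eviction logic just described can be sketched as follows, assuming a helper `block_me(block, candidate)` that returns the minimum SAD of one 16x16 block inside a candidate image and a `pixels` attribute on each vehicle; the threshold 5 and the run length 10 are the example values from the text:

```python
def split_16x16(img):
    """Yield the 16x16 blocks tiling a patch (edge remainders ignored)."""
    h, w = img.shape[:2]
    for y in range(0, h - 15, 16):
        for x in range(0, w - 15, 16):
            yield img[y:y + 16, x:x + 16]

def mean_block_sad(vehicle, candidate, block_me):
    """Average, over all 16x16 blocks of `vehicle`, of the minimum SAD that
    fast motion estimation finds for the block inside `candidate`."""
    sads = [block_me(b, candidate) for b in split_16x16(vehicle)]
    return sum(sads) / max(len(sads), 1)

def maybe_admit(v_i, similar_in_db, database, block_me,
                sad_thresh=5.0, run_len=10):
    """Admit v_i only after a run of `run_len` consecutive database vehicles
    from its similar list proves pixel-dissimilar; then evict the database
    entries that v_i makes redundant."""
    run = 0
    for v in similar_in_db:
        if mean_block_sad(v_i.pixels, v.pixels, block_me) < sad_thresh:
            return False  # a pixel-level twin is already stored
        run += 1
        if run >= run_len:
            break
    database.append(v_i)
    for v in similar_in_db:  # eviction pass described above
        if v in database and mean_block_sad(v.pixels, v_i.pixels, block_me) < sad_thresh:
            database.remove(v)
    return True
```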

2. Background database construction.

For the de-redundant background, take one background frame at regular intervals (for example, every 20 s), encode it, and put it into the database.

In practical applications, once the surveillance camera is installed, the encoder first builds the vehicle and background databases. For vehicles, following the vehicle database construction steps above, the encoder encodes the vehicles destined for the database at high quality and puts the encoded vehicles into the database. The information identifying these vehicles is also written into the bitstream; after the decoder reconstructs the images, it performs the same vehicle database construction using the decoded vehicle identification information. For the background, following the background database construction steps above, the encoder encodes the generated background frames at high quality at regular intervals and puts the encoded backgrounds into the database. The high-quality encoded backgrounds and the information identifying them are also written into the bitstream; the decoder reconstructs the high-quality background frames from this information and puts them into its database. In this way, identical vehicle and background databases are built on the encoder and decoder sides.

In an embodiment of the present invention, the original traffic surveillance video sequence may be divided: the earlier part of the data is used to build the vehicle and background databases, and the later part serves as the traffic surveillance video to be encoded. Alternatively, the first day's traffic surveillance video may be used to build the databases, with the data from the second day onwards as the video to be encoded. The codec then encodes and decodes the traffic surveillance video according to the method of the present invention. Traffic surveillance video is usually retained for a period of several months; once the retained data is cleared, the above work is repeated.

II. Vehicle and Background Retrieval

1. Vehicle retrieval.

1) Separation of vehicles from the background, and redundancy removal.

In an embodiment of the present invention, the traffic surveillance video to be encoded likewise requires vehicle-background separation and redundancy removal; this part of the procedure is similar to that used when building the vehicle and background databases. A preferred embodiment is as follows:

After the vehicles in a surveillance video sequence (the original sequence or the video to be encoded) are separated with the SuBSENSE method, the vehicle shapes may be irregular, so the pixels in the rectangular region from the top-left corner to the bottom-right corner of each separated vehicle are taken as the vehicle, and the remainder as background. Extract the SIFT features of the vehicle and remove the background SIFT features among them; the removal procedure is shown in Fig. 3.

While the SuBSENSE method separates vehicles, it gradually generates a fairly clean background frame. When a vehicle is extracted from the surveillance video sequence, the background at the corresponding position of the background frame is extracted at the same time.

Taking the traffic surveillance video to be encoded as an example, for the separated current vehicle to be encoded and its corresponding background, extract the SIFT features of both. For each SIFT feature extracted from the current vehicle, search within a positional neighbourhood on the corresponding background according to:

(xs_c - xs_b)² + (ys_c - ys_b)² ≤ d²;

where xs_c and ys_c are the coordinates of a SIFT feature extracted from the current vehicle to be encoded, xs_b and ys_b are the coordinates of a SIFT feature extracted from the corresponding background, and d bounds the positional neighbourhood; exemplarily, d = 5.

If the normalized Euclidean distance between the retrieved SIFT feature with the smallest such distance and a SIFT feature of the current vehicle is below a threshold, D_min ≤ D_1 (where D_min is that smallest normalized Euclidean distance and D_1 is the threshold; exemplarily, D_1 = 1.1), then the background region contains a SIFT feature resembling that vehicle SIFT feature; the corresponding feature of the current vehicle is a background SIFT feature and is removed from the vehicle's SIFT features.
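A minimal sketch of this background-feature removal, assuming each feature is an (x, y, descriptor) triple with descriptors normalized as in the text:

```python
import numpy as np

def remove_background_sift(veh_feats, bg_feats, d=5.0, d1=1.1):
    """Drop vehicle SIFT features that have a look-alike in the co-located
    background. Each feature is an (x, y, descriptor) triple; d bounds the
    spatial neighbourhood and d1 is the normalized-distance threshold
    (the example values from the text)."""
    kept = []
    for (xc, yc, desc_c) in veh_feats:
        nearby = [f for f in bg_feats
                  if (xc - f[0]) ** 2 + (yc - f[1]) ** 2 <= d ** 2]
        if nearby:
            d_min = min(np.linalg.norm(desc_c - f[2]) for f in nearby)
            if d_min <= d1:
                continue  # a background feature resembles it: discard
        kept.append((xc, yc, desc_c))
    return kept
```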

2) Coarse retrieval by feature matching.

In an embodiment of the present invention, SIFT features are extracted from the vehicles (both the database vehicles and the vehicles to be encoded); the database vehicles are organized into an inverted index based on their SIFT features, and for a vehicle to be encoded, several candidate vehicles are coarsely retrieved from the database by SIFT feature matching. A preferred embodiment of this process is as follows:

Several candidate vehicles are coarsely selected from the database by feature matching. The SIFT features of every database vehicle are quantized into visual words with the k-means algorithm, and for each visual word the mean vector of the descriptors mapped to it is computed. Each SIFT feature of each database vehicle is then mapped to its nearest visual word, and comparing the mapped SIFT descriptor with the mean vector of that word yields a binary signature for each SIFT descriptor. Meanwhile, each database vehicle is represented by the frequency histogram of the visual words of its SIFT features, and these histograms are organized with an inverted index.

The current vehicle to be encoded is processed in the same way as the database vehicles: each of its SIFT features is assigned to the nearest visual word, giving the frequency histogram of the current vehicle, and the binary signature of each SIFT feature is computed.

When comparing the similarity between the current vehicle to be encoded and a database vehicle, under the condition that the Hamming distance between the binary signatures of SIFT features mapped to the same visual word is below a certain threshold, the distance between the tf-idf (term frequency-inverse document frequency) weighted frequency histograms is used as the similarity measure, giving a similarity score between the current vehicle and every database vehicle. The scores are sorted and the top-ranked vehicles are selected as candidates.

Exemplarily, 10 candidate vehicles may be retrieved in a specific implementation.
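The following sketch illustrates the index structure, assuming a precomputed k-means codebook, per-word mean vectors, and per-word idf weights; the tf-idf weighted histogram distance of the text is simplified here to accumulating the idf weight of each Hamming-verified match, so it is an outline rather than the exact scoring rule:

```python
import numpy as np
from collections import defaultdict

class VehicleIndex:
    """Rough outline of the coarse-retrieval index. `words` is the k-means
    codebook (k x 128) and `word_means` the per-word mean descriptor used
    for binarization; both are assumed to be precomputed offline."""

    def __init__(self, words, word_means):
        self.words, self.word_means = words, word_means
        self.inverted = defaultdict(list)  # word id -> [(vehicle id, signature)]

    def _quantize(self, desc):
        w = int(np.argmin(np.linalg.norm(self.words - desc, axis=1)))
        return w, desc > self.word_means[w]  # binary signature of the descriptor

    def add(self, vid, descriptors):
        for desc in descriptors:
            w, sig = self._quantize(desc)
            self.inverted[w].append((vid, sig))

    def query(self, descriptors, idf, hamming_max=24, top=10):
        # A feature pair only counts when it falls into the same visual word
        # AND its binary signatures are close in Hamming distance.
        scores = defaultdict(float)
        for desc in descriptors:
            w, sig = self._quantize(desc)
            for vid, db_sig in self.inverted[w]:
                if int(np.count_nonzero(sig != db_sig)) <= hamming_max:
                    scores[vid] += idf[w]
        return sorted(scores, key=scores.get, reverse=True)[:top]
```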

3) Selecting the matching vehicle by fast motion estimation.

In an embodiment of the present invention, one matching vehicle is selected from the candidate vehicles by fast motion estimation; a preferred embodiment of this process is as follows:

a. Align the current vehicle to be encoded with each candidate vehicle.

A preferred embodiment of the alignment is as follows:

For a SIFT feature of the current vehicle to be encoded, compute its distances to all SIFT features of a candidate vehicle and sort them in ascending order; if the following conditions hold, the feature is judged to have found a matching SIFT feature in that candidate vehicle:

d_1 ≤ D_2;

d_1 / d_2 ≤ α;

where d_1 and d_2 are the smallest and second-smallest distances, respectively, and D_2 and α are constants;

Every SIFT feature of the current vehicle is processed this way, yielding the SIFT matching pairs between the current vehicle and each candidate vehicle. From these matching pairs, the positional offset between the current vehicle and each candidate vehicle is computed as:

MV_x = (1/n) × Σ_{i=1..n} (xc_i - xv_i);

MV_y = (1/n) × Σ_{i=1..n} (yc_i - yv_i);

where MV_x and MV_y are the horizontal and vertical components of the offset, n is the number of matched SIFT feature pairs, xc_i and yc_i are the coordinates of the current vehicle's SIFT feature, xv_i and yv_i are the coordinates of the candidate vehicle's SIFT feature, and i indexes the SIFT matching pairs;

Outliers are then removed iteratively to obtain the final positional offset, and the current vehicle to be encoded is aligned with the corresponding candidate vehicle according to it.

An outlier can be identified as follows: if the motion vector computed from a SIFT matching pair deviates far from the mean motion vector (i.e., beyond a set value), that matching pair is an outlier.
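A sketch of the acceptance tests and the iterative outlier removal follows; the values of D2, alpha and the deviation threshold are illustrative (the patent only states that they are constants), and the sign convention (current minus candidate coordinates) follows the paste-position formula given later (x_0 = x_c + MV_x):

```python
import numpy as np

def estimate_offset(matches, D2=250.0, alpha=0.8, max_dev=8.0):
    """Acceptance tests d1 <= D2 and d1/d2 <= alpha, then iterative outlier
    removal around the mean offset. `matches` holds, per vehicle feature,
    ((xc, yc), ranked) with `ranked` an ascending list of at least two
    (descriptor distance, (xv, yv)) entries (assumed structure)."""
    pairs = []
    for (xc, yc), ranked in matches:
        (d1, (xv, yv)), (d2, _) = ranked[0], ranked[1]
        if d1 <= D2 and d1 / d2 <= alpha:
            pairs.append((xc - xv, yc - yv))
    if not pairs:
        return 0.0, 0.0
    pairs = np.asarray(pairs, dtype=float)
    # Iteratively drop the pairs deviating most from the mean offset.
    while len(pairs) > 1:
        mv = pairs.mean(axis=0)
        dev = np.linalg.norm(pairs - mv, axis=1)
        if dev.max() <= max_dev:
            break
        pairs = pairs[dev < dev.max()]
    return tuple(pairs.mean(axis=0))  # (MV_x, MV_y)
```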

b. The current vehicle to be encoded is then divided into blocks of fixed size 16x16, and each 16x16 block searches a candidate vehicle for the block minimizing a loss function composed of the SAD and the coding rate of the motion vector. The search starts from the position of the current 16x16 block and performs an eight-point diamond search within a range of 64 pixels in each direction around that starting point. The loss functions of all 16x16 blocks are accumulated as the overall loss of the current vehicle on that candidate vehicle, and the candidate with the smallest overall loss is kept as the matching vehicle.
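A sketch of the block-level search under the stated loss (SAD plus motion-vector rate); the step-halving schedule and the rate model below are assumptions, since the text only names an eight-point diamond search in a 64-pixel range, and `sad` is an assumed helper:

```python
def diamond_search(block, ref, x0, y0, sad, search_range=64, lam=4.0):
    """Eight-point diamond search of one 16x16 block around (x0, y0) in `ref`.

    Loss = SAD + lam * (rough MV bit cost). sad(block, ref, x, y) (assumed)
    compares the block against the 16x16 patch of `ref` at (x, y)."""
    best = (sad(block, ref, x0, y0), x0, y0)
    step = 8
    while step >= 1:
        improved = True
        while improved:
            improved = False
            _, bx, by = best
            for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step),
                           (step, step), (step, -step), (-step, step), (-step, -step)):
                x, y = bx + dx, by + dy
                if abs(x - x0) > search_range or abs(y - y0) > search_range:
                    continue
                mv_bits = abs(x - x0) + abs(y - y0)  # crude MV rate proxy
                cost = sad(block, ref, x, y) + lam * mv_bits
                if cost < best[0]:
                    best, improved = (cost, x, y), True
        step //= 2
    return best  # (minimum loss, best x, best y)
```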

2. Background retrieval.

In an embodiment of the present invention, the matching background for the background to be encoded is selected from the database based on the SAD; a preferred embodiment of this process is as follows:

With the SAD between co-located pixels of the current background to be encoded and a database background as the similarity criterion, compute the SAD between the current background and every background in the database:

SAD = Σ_{k∈B} |pc_k - pl_k|;

where pc_k and pl_k are the values of the k-th pixel of the current background to be encoded and of the database background, respectively, and B is the set of pixels of the current background to be encoded;

Sort the results in ascending order and take the background with the smallest SAD as the matching background of the current background to be encoded.
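The background retrieval reduces to a pixelwise SAD argmin, for example:

```python
import numpy as np

def best_matching_background(current_bg, database_bgs):
    """Return the index of the database background with the smallest pixelwise
    SAD against the background to be encoded (equal-sized arrays assumed)."""
    sads = [int(np.abs(current_bg.astype(np.int64) - bg.astype(np.int64)).sum())
            for bg in database_bgs]
    return int(np.argmin(sads))
```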

III. Coding.

1. Similarity analysis.

In an embodiment of the present invention, after the matching vehicle and matching background of the current vehicle and background to be encoded are determined, it is decided whether rate-distortion optimization (RDO) should be performed on the matching vehicle and background. When the current vehicle and background use inter prediction, the matching vehicle and background are compared, in RDO terms, against the information of the existing reference frames of the current vehicle and background; when they use intra prediction, the matching vehicle and background are compared, in RDO terms, against a rough intra prediction of the current vehicle and background. The flow of the vehicle and background similarity analysis is shown in Fig. 4. The RDO comparisons under inter and intra prediction are detailed below.

1) RDO comparison in inter prediction mode.

The comparison criterion for rate-distortion optimization in inter prediction mode is:

J = D + λ × R;

where J is the Lagrangian loss function, D is the SAD between the predicted block and the matched block, R is the number of bits used to represent the mode information, and λ is the Lagrange multiplier;

To compare the matching vehicle and background against the existing reference frames, the Lagrangian loss functions of the current vehicle and background to be encoded with respect to the existing reference frames are computed first; then, taking the retrieved matching vehicle and background into account, updated Lagrangian loss functions are computed. Comparing the loss functions before and after the update determines whether RDO is performed on the matching vehicle and background. A preferred embodiment of this process is as follows:

a. Compute the Lagrangian loss functions of the current vehicle and background to be encoded with respect to the existing reference frames:

For each existing reference frame of the current vehicle to be encoded, first estimate the displacement of the current vehicle on that reference frame, then obtain the optimal RDO result of the current vehicle on the existing reference frames, and finally compare it with the optimal RDO result of the current vehicle on the candidate matching vehicle to decide whether RDO should be performed on the matching vehicle. The procedure is as follows:

In units of 4x4 blocks, obtain the motion vectors (MV) of the inter-predicted 4x4 blocks at the corresponding position on the existing reference frame, together with the picture order count (POC) of their reference frames; on this basis, estimate the motion vector of the corresponding 4x4 block of the current vehicle to be encoded with the following formulas:

MVX_cur = MVX_ref × (POC_cur - POC_ref) / (POC_ref - POC_colref);

MVY_cur = MVY_ref × (POC_cur - POC_ref) / (POC_ref - POC_colref);

where MVX_ref and MVY_ref are the horizontal and vertical components of the motion vector of an inter-predicted 4x4 block on the existing reference frame; POC_cur, POC_ref and POC_colref are the POC of the frame containing the current vehicle to be encoded, the POC of the existing reference frame, and the POC of the reference frame of that inter-predicted 4x4 block, respectively; MVX_cur and MVY_cur are the estimated horizontal and vertical components of the motion vector of the corresponding 4x4 block of the current vehicle. Traverse every 4x4 block of the current vehicle to be encoded, recording the number of inter-predicted 4x4 blocks and the corresponding estimated 4x4 motion vectors of the current vehicle; the final estimate of the current vehicle's motion vector takes, as its horizontal and vertical components, the averages of the motion vectors of all inter-predicted 4x4 blocks.
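A sketch of the scaling and averaging just described, assuming each co-located inter-predicted 4x4 block carries its MV and the two POCs (and that POC_ref differs from POC_colref); the function names are illustrative:

```python
def scale_mv(mvx_ref, mvy_ref, poc_cur, poc_ref, poc_colref):
    # Temporal scaling of a 4x4 block MV found on the reference frame,
    # following the formulas above (integer rounding left unspecified).
    scale = (poc_cur - poc_ref) / (poc_ref - poc_colref)
    return mvx_ref * scale, mvy_ref * scale

def estimate_vehicle_displacement(inter_blocks, poc_cur):
    """Average the scaled MVs of all inter-predicted 4x4 blocks under the
    vehicle; each entry is assumed to be (mvx, mvy, poc_ref, poc_colref)."""
    scaled = [scale_mv(mx, my, poc_cur, pr, pc)
              for (mx, my, pr, pc) in inter_blocks]
    if not scaled:
        return 0.0, 0.0
    return (sum(v[0] for v in scaled) / len(scaled),
            sum(v[1] for v in scaled) / len(scaled))
```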

This yields the displacement of the current vehicle to be encoded on each existing reference frame. The current vehicle is then divided into blocks of fixed size 16x16, and each 16x16 block searches all existing reference frames in turn for the block minimizing a loss function composed of the SAD and the coding rate of the motion vector. The search starts from the position of the current 16x16 block shifted by the estimated displacement and performs an eight-point diamond search within a range of 64 pixels in each direction around that starting point. In units of 16x16 blocks, record the minimum loss of every block of the current vehicle against the matching blocks of all existing reference frames; traversing each 16x16 block of the current vehicle and accumulating the recorded minimum losses yields the Lagrangian loss function J_v,inter of the current vehicle to be encoded with respect to the existing reference frames.

The current background to be encoded is divided into 16x16 blocks. For each current 16x16 block, the matching block with the minimum loss is searched over all existing reference frames; the search compares the co-located 16x16 block of every reference frame with the current 16x16 block of the background and takes the smallest SAD as the loss of that block. Traversing all 16x16 blocks of the current background and accumulating their losses yields the Lagrangian loss function J_b,inter of the current background to be encoded.

b. Taking the matching vehicle and background into account, compute the updated Lagrangian loss functions:

For each 16x16 block of the current vehicle to be encoded, starting from the results computed for J_v,inter, compute its loss against the matching vehicle by fast motion estimation; compare this loss with the minimum loss against the existing reference frames obtained when computing J_v,inter, and take the smaller as the minimum loss of that 16x16 block. Traversing every 16x16 block of the current vehicle and accumulating these minimum losses gives an intermediate loss function. Meanwhile, for the current vehicle to be encoded, the induced change in bit count comprises the position index of the matching vehicle in the database, the position of the matching vehicle in the reference frame, the reference-index (index of the reference frame) bit change information, and the CTU-level indication information; combining these bit changes with the intermediate loss function yields the updated Lagrangian loss function J'_v,inter.

For each 16x16 block of the current background to be encoded, starting from the results computed for J_b,inter, compute its loss against the matching background; compare this loss with the minimum loss against the existing reference frames obtained when computing J_b,inter, and take the smaller as the minimum loss of that 16x16 block. Traversing every 16x16 block of the current background and accumulating these minimum losses gives an intermediate loss function. Meanwhile, for the current background to be encoded, the induced change in bit count comprises the position index of the matching background in the database and the reference-index bit change information; combining these bit changes with the intermediate loss function yields the updated Lagrangian loss function J'_b,inter.

The computation of the reference-index bit change information is taken as an example:

As shown in Fig. 5, for each 16x16 block of the current vehicle and background to be encoded, when its minimum loss against the existing reference frames and the matching vehicle or background is computed: if the matching block of its minimum loss has index n-1, where n is the number of existing reference frames, the bit count increases by 1; otherwise, if the matching block of its minimum loss lies on the matching vehicle or background, the bit count increases by n-1-idx, where idx is the index of the matching block of that 16x16 block's minimum loss when the matching vehicle and background are not considered. In all other cases the bit count is unchanged. Traversing every 16x16 block of the current vehicle and background, the final reference-index bit change is the sum of the bit changes of all 16x16 blocks. Combining the bit changes with the previously computed Lagrangian loss functions yields the updated Lagrangian loss functions corresponding to the current vehicle and background to be encoded.
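The bit-change rule of Fig. 5 can be written directly, assuming each 16x16 block reports the index of its best match (with index n standing for the pasted matching vehicle or background) together with its best index when the pasted content is ignored:

```python
def ref_index_bit_change(blocks, n_refs):
    """Accumulate the reference-index bit change over all 16x16 blocks.

    Each entry of `blocks` is assumed to be (best_ref, idx): best_ref is the
    index of the reference holding the block's minimum-loss match, with the
    value n_refs standing for the pasted matching vehicle/background, and
    idx is the best index when the pasted content is ignored."""
    delta = 0
    for best_ref, idx in blocks:
        if best_ref == n_refs - 1:
            delta += 1                # index n-1 now costs one more bit
        elif best_ref == n_refs:      # match lies on the pasted content
            delta += n_refs - 1 - idx
        # otherwise the bit count is unchanged
    return delta
```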

Finally, compare J_v,inter with the updated J'_v,inter: if J'_v,inter < J_v,inter, perform rate-distortion optimization on the matching vehicle. Likewise, compare J_b,inter with the updated J'_b,inter: if J'_b,inter < J_b,inter, perform rate-distortion optimization on the matching background.

2) RDO comparison in intra prediction mode.

The comparison criterion for rate-distortion optimization in intra prediction mode is similar to that in inter prediction mode and is likewise expressed as:

J = D + λ × R;

where J is the Lagrangian loss function, D is the SAD between the predicted block and the matched block, R is the number of bits used to represent the mode information, and λ is the Lagrange multiplier.

a. For the current background to be encoded, in intra prediction mode, rate-distortion optimization is always performed on the matching background.

b. For the current vehicle to be encoded, first roughly estimate its loss under intra prediction: divide the current vehicle into blocks of fixed size 16x16 and, for each 16x16 block, estimate the DC, planar, horizontal and vertical intra prediction modes in turn, obtaining the SAD of each block under each mode. During the intra mode estimation, the reference pixels of the current 16x16 block are derived from the original values of the neighbouring 16x16 blocks. For each 16x16 block, sort the SADs estimated over all modes in ascending order and take the smallest as its best matching result. Traversing all 16x16 blocks of the current vehicle and accumulating their best matching results yields the Lagrangian loss function J_v,intra of the current vehicle to be encoded.
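A sketch of the rough per-block intra estimation; the four predictors below are deliberately simplified stand-ins for the HEVC DC, planar, horizontal and vertical modes (in particular, planar is approximated as the mean of the two directional predictions), with reference pixels taken from the original neighbouring blocks as the text specifies:

```python
import numpy as np

def rough_intra_sad(block, left, top):
    """Smallest SAD of one 16x16 block over four rough intra predictors.

    `left` and `top` are the 16 reference pixels from the ORIGINAL
    neighbouring blocks; the predictors are simplified stand-ins."""
    n = 16
    horiz = np.tile(left.reshape(n, 1), (1, n))  # each row repeats its left pixel
    vert = np.tile(top.reshape(1, n), (n, 1))    # each column repeats its top pixel
    dc = np.full((n, n), (left.mean() + top.mean()) / 2.0)
    planar = (horiz + vert) / 2.0
    return min(float(np.abs(block - p).sum()) for p in (dc, planar, horiz, vert))

def rough_intra_loss(vehicle_blocks):
    """Sum the best rough-intra SAD over all 16x16 blocks of the vehicle,
    giving the J_v,intra estimate compared below."""
    return sum(rough_intra_sad(b, l, t) for (b, l, t) in vehicle_blocks)
```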

Taking the matching vehicle into account, compute the updated Lagrangian loss function: for each 16x16 block of the current vehicle to be encoded, starting from the results computed for J_v,intra, compute its loss (SAD) against the matching vehicle by fast motion estimation; compare this loss with the minimum SAD estimated by intra prediction when computing J_v,intra, and take the smaller as the minimum loss of that 16x16 block. Traversing every 16x16 block of the current vehicle and accumulating these minimum losses gives an intermediate loss function. Meanwhile, for the current vehicle to be encoded, the induced change in bit count comprises the position index of the matching vehicle in the database, the position of the matching vehicle in the reference frame, and the CTU-level indication information; combining these bit changes with the intermediate loss function yields the updated Lagrangian loss function J'_v,intra.

Compare J_v,intra with the updated J'_v,intra: if J'_v,intra < J_v,intra, perform rate-distortion optimization on the matching vehicle.

2. Coding of the vehicles and background.

1) When the inter prediction mode is used, if it is decided that rate-distortion optimization should be performed on the matching vehicle or matching background, allocate space for a new reference frame and paste the matching vehicle or background onto it; this new reference frame, together with the existing reference frames, serves the inter prediction of the current vehicle or background to be encoded. After inter prediction, traverse every 4x4 block covered by the current vehicle or background to be encoded; if a 4x4 block references the information of the matching vehicle or background, the corresponding syntax elements are written into the bitstream;

2) When the intra-frame prediction mode is adopted, if it is determined that rate-distortion optimization should be performed on the matching vehicle or matching background, space for a new reference frame is allocated, and the matching vehicle or matching background is pasted onto the newly allocated reference frame for intra-frame prediction of the current vehicle to be encoded or the background to be encoded.

In both parts above, the position at which the matching vehicle is pasted onto the newly allocated reference frame is determined by the following formulas:

x0 = xc + MVx;

y0 = yc + MVy;

where x0 and y0 denote the position at which the matching vehicle is pasted onto the newly allocated reference frame, xc and yc denote the position of the current vehicle to be encoded in the current frame, and MVx and MVy are the horizontal and vertical components of the offset of the current vehicle to be encoded relative to the matching vehicle (obtained by the aforementioned fast motion estimation);

When the matching background is pasted onto the reference frame, it is simply aligned with the reference frame position.
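
By way of illustration (not part of the patent text; the array layout and the assumption that the patch fits inside the frame are ours), applying the paste-position formulas looks like:

import numpy as np

def paste_matching_vehicle(ref_shape, patch, xc, yc, mv_x, mv_y):
    ref = np.zeros(ref_shape, dtype=patch.dtype)  # newly allocated frame
    x0, y0 = xc + mv_x, yc + mv_y                 # x0 = xc + MVx, etc.
    h, w = patch.shape[:2]
    ref[y0:y0 + h, x0:x0 + w] = patch             # paste the matching vehicle
    return ref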

3. Structure of the encoded bitstream

In this embodiment of the present invention, the encoded bitstream is structured in two layers: the slice layer and the coding tree unit (CTU) layer, where:

Slice layer: for the current vehicle to be encoded, the slice layer contains a flag indicating whether any matching vehicle is referenced in the current slice layer; the 4x4 blocks covered by all vehicles in the current slice layer are traversed to determine whether they reference a matching vehicle, and the flag is set to true if some 4x4 block references a matching vehicle, otherwise false; if the flag is true, the slice layer further contains a syntax element indicating the number of matching vehicles referenced in the current slice layer; for each matching vehicle, its position index in the database and its paste position on the newly allocated reference frame are encoded into the bitstream together; the number of referenced matching vehicles, the index of each matching vehicle, and the paste position of each matching vehicle on the newly allocated reference frame are encoded with fixed-length codes;

For the current background to be encoded, the slice layer contains a flag indicating whether the matching background is referenced in the current slice layer; all 4x4 blocks covered by the background in the current slice layer are traversed to determine whether they reference the matching background, and the flag is set to true if some 4x4 block references the matching background, otherwise false; if the flag is true, the slice layer further contains a syntax element for the position index of the referenced matching background in the database, encoded with a fixed-length code;

CTU layer: for the current vehicle to be encoded, the CTU layer contains a flag indicating whether the current CTU layer references pixels of a matching vehicle; every 4x4 block in the current CTU layer is traversed, and the flag is set to true if some 4x4 block references matching-vehicle pixels, otherwise false; when the flag is true, the CTU layer further contains a syntax element indicating the index of the matching vehicle;

For the current background to be encoded, the CTU layer contains a flag indicating whether the current CTU layer references pixels of the matching background.
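
For illustration only (not part of the patent text; the bit widths are our assumptions, since the text specifies only that fixed-length codes are used), the slice-level and CTU-level vehicle syntax could be serialized as:

class BitWriter:
    def __init__(self):
        self.bits = []
    def put(self, value, width):                  # fixed-length code, MSB first
        self.bits += [(value >> i) & 1 for i in reversed(range(width))]

def write_slice_vehicle_syntax(bw, matches):
    bw.put(1 if matches else 0, 1)                # flag: any matching vehicle referenced
    if matches:
        bw.put(len(matches), 8)                   # number of referenced matching vehicles
        for db_index, x, y in matches:
            bw.put(db_index, 16)                  # position index in the database
            bw.put(x, 12)                         # paste position on the newly
            bw.put(y, 12)                         # allocated reference frame

def write_ctu_vehicle_syntax(bw, referenced, match_index=0):
    bw.put(1 if referenced else 0, 1)             # flag: CTU references match pixels
    if referenced:
        bw.put(match_index, 8)                    # index of the matching vehicle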

In addition, tests were carried out to demonstrate the coding performance of the above scheme of the present invention.

The test conditions were as follows: 1) inter-frame configurations: Random Access (RA), Low-delay B (LDB), and Low-delay P (LDP); 2) the base quantization parameter (QP) was set to {27, 32, 37, 42}; the reference software was HM16.7, and the test material consisted of 14 self-captured test sequences, screenshots of which are shown in Figure 6. The experimental results are given in Table 1 and Table 2.

Table 1 shows the performance comparison under the RA, LDB, and LDP configurations, and Table 2 shows the encoder/decoder complexity comparison under the same configurations.

Figure GDA0002422074410000131

Table 1. Performance comparison under the RA, LDB, and LDP configurations

Figure GDA0002422074410000141

Table 2. Encoder/decoder complexity comparison under the RA, LDB, and LDP configurations

As can be seen from Tables 1 and 2, compared with HM16.7, the scheme of this embodiment achieves bit-rate savings of 35.1%, 31.3%, and 28.8% under the RA, LDB, and LDP configurations respectively, while the increase in encoder and decoder complexity remains within a reasonable range.

From the description of the above embodiments, those skilled in the art will clearly understand that the above embodiments can be implemented by software, or by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the above embodiments may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for causing a computing device (such as a personal computer, a server, or a network device) to execute the methods described in the various embodiments of the present invention.

The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A traffic monitoring video coding method is characterized by comprising the following steps:
processing an original traffic monitoring video sequence by adopting a foreground and background segmentation method, separating out vehicles and a background, and respectively removing redundancy existing between the separated vehicles and the background and then putting the vehicles and the background into a database;
for the traffic monitoring video to be coded, a foreground and background segmentation method is also adopted to separate the vehicle to be coded and the background to be coded; selecting matched vehicles from a database by adopting a characteristic matching and rapid motion estimation mode for the vehicles to be coded; selecting a matched background from a database on the basis of the sum of absolute differences for the background to be coded;
when an inter-frame prediction mode or an intra-frame prediction mode is adopted, judging whether the vehicle to be coded or the background to be coded needs to perform rate distortion optimization processing on a matched vehicle or a matched background by using a preset mode; performing corresponding processing according to the judgment result, and encoding by using a corresponding prediction mode;
wherein processing the original traffic monitoring video sequence by the foreground-background segmentation method to separate the vehicles and the background, respectively removing the redundancy among the separated vehicles and among the separated backgrounds, and putting them into the database comprises the following steps:
numbering vehicles from 1 to N after the redundancy is removed;
initially, the database contains no vehicles; for a certain redundancy-removed vehicle vi, similar vehicles {vi1, vi2, …, vim} are retrieved from all vehicles other than vi using an inverted-list-based method, where m is the number of similar vehicles;
when retrieving similar vehicles, the number of matched SIFT features between vehicle vi and any other vehicle vj is compared; when the number of SIFT features matched between vehicle vi and vehicle vj satisfies the following formulas, vehicle vj is put into {vi1, vi2, …, vim}:
Nij ≥ β × Ni;
Nij ≥ min(N0, Ni);
in the above formulas, Nij is the number of matched SIFT features between vehicle vi and vehicle vj, Ni is the number of SIFT features of vehicle vi, and β and N0 are constants;
then, pixel-level similarity comparison is performed for the vehicle: for vehicle vi, if the database contains no vehicles, vehicle vi is put into the database; otherwise, vehicle vi is compared for pixel-level similarity with those vehicles in {vi1, vi2, …, vim} that have already been put into the database, using fast motion estimation with the sum of absolute differences as the loss function; if the mean of the computed sums of absolute differences is smaller than a set value, the two vehicles are judged to be similar at the pixel level; if none of the vehicles in {vi1, vi2, …, vim} already put into the database is similar to vehicle vi at the pixel level, vehicle vi is put into the database, otherwise vehicle vi is not put into the database; if vehicle vi is finally decided to be put into the database, the vehicles in {vi1, vi2, …, vim} already put into the database are compared with vehicle vi for pixel-level similarity, and any vehicle already in the database that is similar to vehicle vi at the pixel level is removed from the database; the pixel-level similarity comparison is stopped as soon as a vehicle that is not similar to vehicle vi at the pixel level is encountered;
processing each vehicle in the above manner, determining the vehicle finally put into the database, coding the vehicle and putting the vehicle into the database;
and for the background after removing the redundancy, taking a frame of background at intervals, coding the background and then putting the background into a database.
2. The traffic monitoring video coding method according to claim 1, wherein when the foreground-background segmentation method is adopted to separate the vehicle from the background, the pixels in the square region from the upper-left corner to the lower-right corner of the separated vehicle are taken as the vehicle, and the remaining part is taken as the background;
for the separated current vehicle to be coded and the corresponding background, SIFT features of the vehicle to be coded and the corresponding background are respectively extracted, and for each SIFT feature extracted from the current vehicle to be coded, the following formula is adopted to search in a certain position neighborhood range on the corresponding background:
(xsc - xsb)^2 + (ysc - ysb)^2 ≤ d^2;
wherein xsc and ysc denote the coordinates of a SIFT feature extracted from the current vehicle to be encoded, xsb and ysb denote the coordinates of a SIFT feature extracted from the corresponding background, and d defines the range of the position neighborhood;
if the distance between the searched SIFT feature with the minimum Euclidean distance after normalization and a certain SIFT feature of the current vehicle to be coded is smaller than a threshold value, the fact that the SIFT feature similar to the SIFT feature of the current vehicle to be coded exists in the background region is indicated, the corresponding SIFT feature of the current vehicle to be coded is the background SIFT feature, and the SIFT feature is removed from the SIFT of the vehicle.
3. The traffic monitoring video coding method according to claim 1, wherein the selecting matched vehicles from the database by using the methods of feature matching and fast motion estimation for the vehicles to be coded comprises:
firstly, a plurality of candidate vehicles are roughly selected from the database by feature matching: the SIFT features of every vehicle in the database are quantized into visual words using the k-means algorithm, and a mapping mean vector is computed for each visual word; each SIFT feature of each vehicle in the database is mapped to its nearest-neighbor visual word, and the mapped SIFT feature vector is compared with the mapping mean vector of that visual word to obtain a binarized representation of the SIFT feature vector; at the same time, each vehicle in the database is represented by the frequency histogram of the visual words corresponding to its SIFT features, and the frequency histograms of the vehicles in the database are organized as an inverted list; for the current vehicle to be encoded, each SIFT feature is assigned to its nearest visual word in the same way as for the database vehicles, giving the frequency histogram of the current vehicle to be encoded, and the binarized representation of each SIFT feature is computed at the same time; when comparing the similarity between the current vehicle to be encoded and a vehicle in the database, provided the Hamming distance between the binarized representations of SIFT features mapped to the same visual word is smaller than a certain threshold, the distance between the tf-idf-weighted frequency histograms is used as the similarity measure, yielding a similarity comparison result between the current vehicle to be encoded and every vehicle in the database; the similarity results are sorted, and the vehicles ranking highest in similarity are selected as candidate vehicles;
then, a matching vehicle is selected from a plurality of candidate vehicles by using a rapid motion estimation mode: aligning a current vehicle to be coded with each candidate vehicle, dividing the current vehicle to be coded into blocks with the fixed size of 16x16, and searching a block with the minimum loss function in a certain candidate vehicle by each block with the size of 16x16, wherein the loss function consists of the sum of absolute differences and the coding code rate of a motion vector; the searching mode is that the position of the current 16x16 block is taken as a starting point, eight-point diamond type searching is carried out in the range of 64 pixels up, down, left and right around the starting point, and the loss functions of all the 16x16 blocks are accumulated to be used as the whole loss function of the whole current vehicle to be coded on a candidate vehicle; and finally, the candidate vehicle with the minimum overall loss function is reserved as the matching vehicle.
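
By way of illustration (not part of the claims; names are ours, the motion-vector rate term is omitted, and only the large-diamond pattern is shown), the eight-point diamond search restricted to 64 pixels around the starting point can be sketched as:

import numpy as np

def diamond_search(cur_block, ref, x0, y0, max_range=64):
    # Eight-point diamond: four points at distance 2, four diagonals at 1.
    offsets = [(-2, 0), (2, 0), (0, -2), (0, 2),
               (-1, -1), (-1, 1), (1, -1), (1, 1)]
    h, w = cur_block.shape

    def sad(x, y):                                # out-of-frame: infinite cost
        if x < 0 or y < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
            return np.inf
        return np.abs(ref[y:y + h, x:x + w].astype(np.int64)
                      - cur_block.astype(np.int64)).sum()

    bx, by, best = x0, y0, sad(x0, y0)
    improved = True
    while improved:                               # move while the center improves
        improved = False
        for dx, dy in offsets:
            x, y = bx + dx, by + dy
            if abs(x - x0) > max_range or abs(y - y0) > max_range:
                continue
            cost = sad(x, y)
            if cost < best:
                bx, by, best = x, y, cost
                improved = True
    return (bx - x0, by - y0), best               # offset and its SAD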
4. The traffic monitoring video coding method according to claim 3, wherein the alignment of the current vehicle to be coded with each candidate vehicle is performed as follows:
for a certain SIFT feature of the current vehicle to be coded, calculating the distance between the certain SIFT feature and all SIFT features of each candidate vehicle, sorting the calculated distances from small to large, and if the following formula is met, judging that the corresponding SIFT feature of the current vehicle to be coded finds a matched SIFT feature in the corresponding candidate vehicle:
d1 ≤ D2;
d1/d2 ≤ α;
wherein d1 and d2 are respectively the smallest and second-smallest distances, and D2 and α are constants;
each SIFT feature of the current vehicle to be encoded is processed in the above manner to obtain the SIFT feature matching pairs between the current vehicle to be encoded and each candidate vehicle; from the obtained SIFT feature matching pairs, the position offset between the current vehicle to be encoded and each candidate vehicle is calculated, as shown in the following formulas:
MVx = (1/n) × Σi (xci - xvi);
MVy = (1/n) × Σi (yci - yvi);
wherein MVx and MVy are the horizontal and vertical components of the offset, n is the number of matched SIFT feature pairs, xci and yci are the coordinates of a SIFT feature of the current vehicle to be encoded, xvi and yvi are the coordinates of the matched SIFT feature of the candidate vehicle, and i is the index of the SIFT feature matching pair;
removing abnormal points in an iteration mode to obtain a final position offset result; and aligning the current vehicle to be coded with the corresponding candidate vehicle according to the calculated position offset result.
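
By way of illustration (not part of the claims; the iteration count and the residual threshold used for removing abnormal points are our assumptions), the offset averaging can be sketched as:

import numpy as np

def estimate_offset(cur_pts, cand_pts, iters=3, thresh=4.0):
    cur = np.asarray(cur_pts, dtype=float)    # (n, 2) features of the current vehicle
    cand = np.asarray(cand_pts, dtype=float)  # (n, 2) matched features of the candidate
    mv = (cur - cand).mean(axis=0)            # initial (MVx, MVy)
    for _ in range(iters):
        resid = np.linalg.norm((cur - cand) - mv, axis=1)
        keep = resid <= thresh                # drop abnormal matching pairs
        if not keep.any():
            break
        mv = (cur[keep] - cand[keep]).mean(axis=0)
    return mv                                 # final (MVx, MVy)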
5. The traffic monitoring video coding method according to claim 1, wherein selecting the matching context from the database based on the sum of absolute differences for the context to be coded comprises:
taking the absolute difference sum of the current background to be coded and the pixels at the corresponding positions of the background in the database as a similarity evaluation criterion, calculating the absolute difference sum of the current background to be coded and each background in the database, as shown in the following formula:
SAD = ∑k∈B |pck - plk|;
wherein pck and plk respectively denote the k-th pixel value of the current background to be encoded and of a background in the database, and B is the set of pixels of the current background to be encoded;
and the calculation results are sorted in ascending order, and the background with the smallest sum of absolute differences is taken as the matching background of the current background to be encoded.
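
By way of illustration (not part of the claims), the matching-background selection reduces to:

import numpy as np

def select_matching_background(cur_bg, db_backgrounds):
    # SAD over co-located pixels between the current background and each
    # background in the database; the smallest SAD wins.
    sads = [np.abs(cur_bg.astype(np.int64) - b.astype(np.int64)).sum()
            for b in db_backgrounds]
    return int(np.argmin(sads))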
6. The traffic monitoring video coding method according to claim 1, wherein when the inter-frame prediction mode is adopted, the determining whether the vehicle to be coded or the background to be coded needs to perform rate distortion optimization processing on the matching vehicle or the matching background by using a predetermined mode comprises:
the comparison criterion for rate-distortion optimization in the inter-prediction mode is as follows:
J = D + λ × R;
wherein J is the Lagrangian loss function, D is the sum of absolute differences between the prediction block and the matching block, R is the number of bits used for representing the mode information, and λ is the Lagrange multiplier;
firstly, the Lagrangian loss functions of the current vehicle to be encoded and of the current background to be encoded with respect to the existing reference frames are calculated:
for each existing reference frame of the current vehicle to be encoded, the motion vectors of the inter-predicted 4x4 blocks at the corresponding position of the current vehicle to be encoded, together with the picture numbers of their reference frames, are obtained in units of 4x4 blocks; on this basis, the motion vector of the corresponding 4x4 block of the current vehicle to be encoded is estimated, the estimation formulas being as follows:
MVXcur = MVXref × (POCcur - POCref) / (POCref - POCcolref);
MVYcur = MVYref × (POCcur - POCref) / (POCref - POCcolref);
wherein MVXref and MVYref are respectively the horizontal and vertical components of the motion vector of an inter-predicted 4x4 block on the existing reference frame; POCcur, POCref and POCcolref are respectively the picture number of the frame where the current vehicle to be encoded is located, the picture number of the existing reference frame, and the picture number of the reference frame of the inter-predicted 4x4 block on the existing reference frame; MVXcur and MVYcur are the estimated horizontal and vertical components of the motion vector of the corresponding 4x4 block of the current vehicle to be encoded; each 4x4 block in the current vehicle to be encoded is traversed, the number of inter-predicted 4x4 blocks and the motion vectors of the corresponding 4x4 blocks of the current vehicle are recorded, and finally the horizontal and vertical components of the estimated displacement of the current vehicle to be encoded are taken as the averages over the motion vectors of all inter-predicted 4x4 blocks;
after the displacement of the current vehicle to be encoded on each existing reference frame is obtained, the current vehicle to be encoded is divided into blocks of fixed size 16x16, and each 16x16 block in turn searches all existing reference frames for the block with the smallest loss function, the loss function consisting of the sum of absolute differences and the coding rate of the motion vector; the search takes as its starting point the position of the current 16x16 block translated by the estimated displacement, and an eight-point diamond search is carried out within a range of 64 pixels around the starting point; the minimum loss functions of all blocks in the current vehicle to be encoded and their matching blocks in all existing reference frames are recorded in units of 16x16 blocks; each 16x16 block in the current vehicle to be encoded is traversed in turn and the minimum loss functions are accumulated, giving the Lagrangian loss function J_v of the current vehicle to be encoded with respect to the existing reference frames;
for the current background to be encoded, the background is divided into 16x16 blocks; for the current 16x16 block, the matching block corresponding to the smallest loss function is searched for among all existing reference frames; the search compares the sums of absolute differences between the 16x16 blocks at the corresponding positions of all reference frames and the current 16x16 block of the current background to be encoded, and the smallest sum of absolute differences is selected as the loss function of the current 16x16 block; all 16x16 blocks in the current background to be encoded are traversed and their loss functions accumulated as the Lagrangian loss function J_b of the current background to be encoded;
then, taking the matching vehicle and the matching background into account, the updated Lagrangian loss functions are calculated:
for each 16x16 block in the current vehicle to be encoded, on the basis of the per-block results obtained when computing J_v, the loss function against the matching vehicle is calculated by fast motion estimation; the loss of each 16x16 block against the matching vehicle is then compared with the minimum loss function against the existing reference frames obtained when computing J_v, and the smaller of the two is taken as the minimum loss function of the corresponding 16x16 block; each 16x16 block in the current vehicle to be encoded is traversed and the minimum loss functions of the 16x16 blocks are accumulated, giving the loss function D_v of the current vehicle to be encoded; meanwhile, for the current vehicle to be encoded, the change in the number of bits comprises the position index information of the matching vehicle in the database, the position information of the matching vehicle in the reference frame, the reference index bit change information, and the CTU-level representation information; this bit change is combined with D_v to obtain the updated Lagrangian loss function J'_v;
for each 16x16 block within the current background to be encoded, on the basis of the per-block results obtained when computing J_b, the loss function against the matching background is calculated; the loss of each 16x16 block against the matching background is then compared with the minimum loss function against the existing reference frames obtained when computing J_b, and the smaller of the two is taken as the minimum loss function of the corresponding 16x16 block; each 16x16 block in the current background to be encoded is traversed and the minimum loss functions of the 16x16 blocks are accumulated, giving the loss function D_b of the current background to be encoded; meanwhile, for the current background to be encoded, the change in the number of bits comprises the position index information of the matching background in the database and the reference index bit change information; this bit change is combined with D_b to obtain the updated Lagrangian loss function J'_b;
finally, the Lagrangian loss function J_v is compared with the updated Lagrangian loss function J'_v, and if J'_v < J_v, rate-distortion optimization is performed on the matching vehicle; the Lagrangian loss function J_b is compared with the updated Lagrangian loss function J'_b, and if J'_b < J_b, rate-distortion optimization is performed on the matching background.
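
By way of illustration (not part of the claims; the picture-number scaling below is our reading of the estimation formulas in claim 6), the per-block motion-vector estimation can be sketched as:

def scale_mv(mvx_ref, mvy_ref, poc_cur, poc_ref, poc_colref):
    num = poc_cur - poc_ref      # current frame -> existing reference frame
    den = poc_ref - poc_colref   # reference frame -> its own reference frame
    if den == 0:
        return mvx_ref, mvy_ref  # degenerate case: reuse the vector as-is
    s = num / den
    return mvx_ref * s, mvy_ref * s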
7. The traffic monitoring video coding method according to claim 1, wherein when the intra prediction mode is adopted, the determining whether the vehicle to be coded or the background to be coded needs to perform rate distortion optimization processing on the matching vehicle or the matching background by using a predetermined mode comprises:
the comparison criteria for rate-distortion optimization in intra prediction mode are:
J = D + λ × R;
wherein J is the Lagrangian loss function, D is the sum of absolute differences between the prediction block and the matching block, R is the number of bits used for representing the mode information, and λ is the Lagrange multiplier;
for the current background to be coded, carrying out rate distortion optimization processing on the matched background all the time in an intra-frame prediction mode;
for the current vehicle to be encoded, the loss function when the current vehicle adopts intra-frame prediction is first roughly estimated: the current vehicle to be encoded is divided into blocks of fixed size 16x16, and for each 16x16 block the DC, planar, horizontal and vertical intra prediction modes are estimated in turn, giving the sum of absolute differences of each 16x16 block under each mode; in the intra prediction mode estimation, the reference pixel values of the current 16x16 block are derived from the original values of the neighboring 16x16 blocks; for each 16x16 block, the sums of absolute differences estimated under all modes are sorted in ascending order, and the result with the smallest sum of absolute differences is taken as the optimal matching result of the current 16x16 block; all 16x16 blocks in the current vehicle to be encoded are traversed and the optimal matching results of the 16x16 blocks are accumulated, giving the Lagrangian loss function J_intra of the current vehicle to be encoded;
then, taking the matching vehicle into account, the updated Lagrangian loss function is calculated: for each 16x16 block in the current vehicle to be encoded, on the basis of the per-block results obtained when computing J_intra, the loss function against the matching vehicle is calculated by fast motion estimation; the loss of each 16x16 block against the matching vehicle is then compared with the minimum intra-prediction sum of absolute differences estimated when computing J_intra, and the smaller of the two is taken as the minimum loss function of the corresponding 16x16 block; each 16x16 block in the current vehicle to be encoded is traversed and the minimum loss functions of the 16x16 blocks are accumulated, giving the loss function D_intra of the current vehicle to be encoded; meanwhile, for the current vehicle to be encoded, the change in the number of bits comprises the position index information of the matching vehicle in the database, the position information of the matching vehicle in the reference frame, and the CTU-level representation information; this bit change is combined with D_intra to obtain the updated Lagrangian loss function J'_intra;
finally, the Lagrangian loss function J_intra is compared with the updated Lagrangian loss function J'_intra, and if J'_intra < J_intra, rate-distortion optimization is performed on the matching vehicle.
8. The traffic monitoring video coding method according to claim 1, 6 or 7, wherein corresponding processing is performed according to the judgment result and encoding is performed using the corresponding prediction mode, with the information of the matching vehicle or matching background referenced during encoding encoded into the bitstream, as follows:
When an inter-frame prediction mode is adopted, if the rate distortion optimization processing needs to be carried out on the matched vehicle or the matched background, a reference frame space is newly applied, and the matched vehicle or the matched background is attached to the newly applied reference frame and is used for inter-frame prediction of the current vehicle to be coded or the background to be coded together with the existing reference frame; after the interframe prediction is finished, traversing each 4x4 block covered by the current vehicle to be coded or the current background to be coded, and if a certain 4x4 block refers to the information of the current vehicle to be coded or the current background to be coded, coding a corresponding syntax element into a code stream;
when an intra-frame prediction mode is adopted, if rate-distortion optimization needs to be performed on the matching vehicle or the matching background, a reference frame space is newly allocated, and the matching vehicle or the matching background is pasted onto the newly allocated reference frame for intra-frame prediction of the current vehicle to be encoded or the background to be encoded;
the position at which the matching vehicle is pasted onto the newly allocated reference frame is determined by:
x0 = xc + MVx;
y0 = yc + MVy;
wherein x0 and y0 denote the position at which the matching vehicle is pasted onto the newly allocated reference frame, xc and yc denote the position of the current vehicle to be encoded in the current frame, and MVx and MVy are the horizontal and vertical components of the offset of the current vehicle to be encoded relative to the matching vehicle;
when the matched background is pasted on the reference frame, the matched background is aligned with the position of the reference frame.
9. The traffic monitoring video coding method according to claim 8, wherein the structure of the encoded bitstream is divided into two layers, a slice layer and a coding tree unit (CTU) layer; wherein:
slice layer: for the current vehicle to be encoded, the slice layer comprises a flag indicating whether a matching vehicle is referenced in the current slice layer; the 4x4 blocks covered by all vehicles in the current slice layer are traversed to judge whether they reference a matching vehicle, and the flag is set to true if a certain 4x4 block references a matching vehicle, otherwise false; if the flag is true, the slice layer further comprises a syntax element indicating the number of referenced matching vehicles in the current slice layer; for each matching vehicle, its position index in the database and its paste position on the newly allocated reference frame are encoded into the bitstream together, and the number of referenced matching vehicles, the index of each matching vehicle, and the paste position of each matching vehicle on the newly allocated reference frame are encoded with fixed-length codes;
for the current background to be encoded, the slice layer comprises a flag indicating whether the matching background is referenced in the current slice layer; all 4x4 blocks covered by the background in the current slice layer are traversed to judge whether they reference the matching background, and the flag is set to true if a certain 4x4 block references the matching background, otherwise false; if the flag is true, the slice layer further comprises a syntax element for the position index of the referenced matching background in the database, encoded with a fixed-length code;
CTU layer: for the current vehicle to be encoded, the CTU layer comprises a flag indicating whether the current CTU layer references pixels of a matching vehicle; every 4x4 block in the current CTU layer is traversed, and the flag is set to true if a certain 4x4 block references matching-vehicle pixels, otherwise false; when the flag is true, the CTU layer further comprises a syntax element indicating the index of the matching vehicle;
for the current background to be encoded, the CTU layer comprises a flag indicating whether the current CTU layer references pixels of the matching background.
CN201810720989.1A 2018-07-03 2018-07-03 Traffic monitoring video coding method Active CN108833928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810720989.1A CN108833928B (en) 2018-07-03 2018-07-03 Traffic monitoring video coding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810720989.1A CN108833928B (en) 2018-07-03 2018-07-03 Traffic monitoring video coding method

Publications (2)

Publication Number Publication Date
CN108833928A CN108833928A (en) 2018-11-16
CN108833928B true CN108833928B (en) 2020-06-26

Family

ID=64135268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810720989.1A Active CN108833928B (en) 2018-07-03 2018-07-03 Traffic monitoring video coding method

Country Status (1)

Country Link
CN (1) CN108833928B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871024A (en) * 2019-01-04 2019-06-11 中国计量大学 A UAV Pose Estimation Method Based on Lightweight Visual Odometry
CN111582251B (en) * 2020-06-15 2021-04-02 江苏航天大为科技股份有限公司 Method for detecting passenger crowding degree of urban rail transit based on convolutional neural network
CN112714322B (en) * 2020-12-28 2023-08-01 福州大学 An optimization method of inter-frame reference for game video
CN113891090B (en) * 2021-10-27 2025-08-19 北京达佳互联信息技术有限公司 Video encoding method, video encoding device, storage medium and electronic equipment


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009111498A3 (en) * 2008-03-03 2009-12-03 Videoiq, Inc. Object matching for tracking, indexing, and search
CN105849771A (en) * 2013-12-19 2016-08-10 Metaio GmbH Simultaneous positioning and mapping on mobile devices
CN104301735A (en) * 2014-10-31 2015-01-21 武汉大学 Global encoding method and system for urban traffic surveillance video
CN104539962A (en) * 2015-01-20 2015-04-22 北京工业大学 Layered video coding method fused with visual perception features

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Background-modeling-based adaptive prediction for surveillance video coding; Xianguo Zhang; IEEE Transactions on Image Processing; 2014-12-31; entire document *
Global coding of multi-source surveillance video data; Jing Xiao; Data Compression Conference (DCC); 2015-12-31; entire document *
Knowledge-based coding of objects for multisource surveillance video data; Jing Xiao; IEEE Transactions on Multimedia; 2016-12-31; entire document *
Surveillance video coding with vehicle library; Changyue Ma; 2017 IEEE International Conference on Image Processing (ICIP); 2017-09-20; Sections 1-4 *

Also Published As

Publication number Publication date
CN108833928A (en) 2018-11-16

Similar Documents

Publication Publication Date Title
CN110087087B (en) VVC inter-frame coding unit prediction mode early decision and block division early termination method
CN108833928B (en) Traffic monitoring video coding method
CN110809161B (en) Method and device for constructing motion information candidate list
US6618507B1 (en) Methods of feature extraction of video sequences
CN112437310B (en) VVC intra-frame coding rapid CU partition decision method based on random forest
CN101873500B (en) Interframe prediction encoding method, interframe prediction decoding method and equipment
US11070803B2 (en) Method and apparatus for determining coding cost of coding unit and computer-readable storage medium
CN107657228B (en) Video scene similarity analysis method and system, video encoding and decoding method and system
CN104754357B (en) Intraframe coding optimization method and device based on convolutional neural networks
CN109104609B (en) A Shot Boundary Detection Method Fusing HEVC Compression Domain and Pixel Domain
EP3405904B1 (en) Method for processing keypoint trajectories in video
WO2018192235A1 (en) Coding unit depth determination method and device
CN105430391B (en) The intraframe coding unit fast selecting method of logic-based recurrence classifier
CN107371022B (en) Inter-frame coding unit rapid dividing method applied to HEVC medical image lossless coding
Chen et al. A novel fast intra mode decision for versatile video coding
CN111479110B (en) Fast Affine Motion Estimation Method for H.266/VVC
CN103327327B (en) For the inter prediction encoding unit selection method of high-performance video coding HEVC
Song et al. An efficient low-complexity block partition scheme for VVC intra coding
CN108712647A (en) A kind of CU division methods for HEVC
CN102917225A (en) Method for quickly selecting HEVC (high-efficiency video coding) inframe coding units
CN106507106A (en) Video Inter-frame Predictive Coding Method Based on Reference Slices
CN116962708B (en) Intelligent service cloud terminal data optimization transmission method and system
CN103020138A (en) Method and device for video retrieval
CN106791828A (en) High performance video code-transferring method and its transcoder based on machine learning
CN110519597B (en) HEVC-based encoding method and device, computing equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant