
CN104301735B - Global encoding method and system for urban traffic surveillance video - Google Patents

Global encoding method and system for urban traffic surveillance video (Download PDF)

Info

Publication number: CN104301735B
Application number: CN201410616965.3A
Authority: CN (China)
Prior art keywords: vehicle, global, video, global motion, motion
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN104301735A (en)
Inventors: 胡瑞敏, 马民生, 肖晶, 胡金晖, 尹黎明
Original assignee: Wuhan University (WHU)
Current assignee: Wuhan University (WHU)
Application filed by Wuhan University (WHU); priority to CN201410616965.3A
Publication of application: CN104301735A; publication of grant: CN104301735B

Links

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a global encoding method and system for urban traffic surveillance video, comprising: step 1, segmenting the original surveillance video into a vehicle video and a vehicle-removed video; step 2, encoding the vehicle-removed video with a selective differential coding scheme; step 3, extracting the global feature parameter set of globally moving vehicles in the vehicle video; step 4, globally encoding the vehicle video based on the global feature parameters. On the basis of removing scene redundancy, the invention further removes the global redundancy in surveillance video, effectively improving the compression efficiency of urban traffic surveillance video coding.

Description

Global encoding method and system for urban traffic surveillance video

Technical Field

The invention belongs to the technical field of urban traffic surveillance video coding, and in particular relates to a global encoding method and system for urban traffic surveillance video.

Background Art

The goal of video compression coding is to represent video information with as few bits as possible while guaranteeing a given reconstruction quality. Traditional video coding methods based on Shannon information theory start at the signal-processing level: taking pixels and blocks as the basic representation, they adopt a hybrid coding framework combining transform, prediction, and entropy coding, and improve compression performance by exploiting the spatio-temporal redundancy of the image and video signal itself. Most current video compression techniques, however, target non-specific applications. In recent years, video compression techniques developed for the characteristics and needs of specialized applications, such as surveillance video, have become a research direction of great interest, for example surveillance video coding and transmission in urban traffic environments. Exploiting the long-term invariance of surveillance scenes, AVS-S2, the first international standard oriented to video surveillance, models the surveillance background and foreground and selectively encodes each block in original mode or differential mode, removing the large amount of existing "scene redundancy"; its coding efficiency is twice that of H.264/AVC. However, AVS-S2 cannot remove the "global redundancy" produced by global object motion, so the gain in compression efficiency is limited, and the conflict between data volume and storage capacity remains acute.

In surveillance video, vehicles of different models share similar video texture characteristics, vehicles of the same model share the identity of a common 3D object, and a single vehicle exhibits long-term stability of its appearance features. Urban vehicles with such similarity, identity, and long-term stability are repeatedly captured by surveillance cameras all over a city, producing a large amount of redundancy in urban surveillance data. Most urban monitoring points are under-covered, and the data generated by moving vehicles and people constitutes the main source of urban surveillance data. The redundancy in video surveillance data produced when the same moving vehicle is repeatedly recorded by the city's massive network of surveillance cameras is called global redundancy: texture similarity between different moving objects, shape consistency within the same semantic object class, and long-term similarity of a specific object together generate a large amount of moving-object global redundancy. Traditional video coding and scene-redundancy removal techniques remove local spatio-temporal redundancy, whereas the global redundancy arising from vehicles being repeatedly recorded by cameras over long periods offers great room for further improving the compression efficiency of surveillance video.

发明内容Contents of the invention

Aiming at the deficiencies of the prior art, the present invention provides a global encoding method and system for urban traffic surveillance video that takes global redundancy into account; the method can further improve the coding efficiency of urban traffic surveillance video.

To solve the above technical problem, the present invention adopts the following technical solution:

(1) A global encoding method for urban traffic surveillance video, comprising the steps of:

Step 1: segment the original surveillance video into a vehicle video and a vehicle-removed video;

Step 2: encode the vehicle-removed video with a selective differential coding scheme;

Step 3: extract the global feature parameter set of globally moving vehicles in the vehicle video, further comprising:

S31: extract the 2D appearance features of globally moving vehicles;

S32: build a vehicle 3D model database, which comprises generic 3D models, fine 3D models, and key model description parameter sets for all vehicle classes; the key model description parameters are obtained by dimensionality reduction of the model description parameter set;

S33: build the global vehicle texture dictionary of globally moving vehicles by sparse coding, further comprising:

Taking

$$\min_{D_1,\,a_1} \sum_{c=1}^{C} \left( \| y_c - D_1 a_1 \|_2^2 + \tau \| a_1 \|_1 \right)$$

as the cost function, obtain the first-layer knowledge dictionary from the texture information of globally moving vehicles, i.e., the dictionary of visual texture information common to all classes of globally moving vehicles;

Obtain the difference information r_c between each class of globally moving vehicles reconstructed with the first-layer knowledge dictionary and the original globally moving vehicles; taking

$$\min_{D_{2,c},\,a_{2,c}} \sum_{m=1}^{M} \left( \| r_c - D_{2,c} a_{2,c} \|_2^2 + \tau \| a_{2,c} \|_1 \right)$$

as the cost function, obtain from r_c the second-layer knowledge dictionary, i.e., the dictionary of 3D-structure and texture individuality information for each class of globally moving vehicles;

Obtain the difference information r_{c,m} between each individual vehicle of each class reconstructed with the second-layer knowledge dictionary and the original globally moving vehicle; taking

$$\min_{D_{3,c,m},\,a_{3,c,m}} \sum_{i=1}^{N} \left( \| r_{c,m} - D_{3,c,m} a_{3,c,m} \|_2^2 + \tau \| a_{3,c,m} \|_1 \right)$$

as the cost function, obtain from r_{c,m} the third-layer knowledge dictionary, i.e., the dictionary of long-term individuality update information for individual globally moving vehicles;

In the above, D_1 denotes the first-layer knowledge dictionary; C is the number of classes of globally moving vehicles and c the class index; y_c denotes the texture information of all classes of globally moving vehicles; a_1 denotes the coding coefficients; τ is a balance factor set empirically according to the actual situation, and the larger τ is, the sparser the coding coefficients; D_{2,c} denotes the second-layer knowledge dictionary, M the number of individual vehicles within a class, m the index of an individual vehicle within a class, and a_{2,c} the coding coefficients; D_{3,c,m} denotes the third-layer knowledge dictionary, N the number of individual vehicles of that class, i the index of an individual vehicle of that class, and a_{3,c,m} the coding coefficients;
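The three-layer residual dictionary construction above can be sketched numerically. The following is a minimal toy illustration, not the patent's implementation: it assumes ISTA for the L1 sparse-coding step, a simple least-squares dictionary update, and random data in place of vehicle texture patches; all function names and parameter values are illustrative.

```python
import numpy as np

def sparse_code(Y, D, tau, n_iter=100):
    """ISTA for min_A ||Y - D A||_F^2 + tau ||A||_1."""
    L = np.linalg.norm(D, 2) ** 2                 # squared spectral norm of D
    A = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        G = A - D.T @ (D @ A - Y) / L             # gradient step, size 1/(2L)
        A = np.sign(G) * np.maximum(np.abs(G) - tau / (2 * L), 0.0)  # soft threshold
    return A

def learn_dictionary(Y, n_atoms, tau, n_outer=20, seed=0):
    """Alternate sparse coding with a least-squares (MOD-style) dictionary update."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_outer):
        A = sparse_code(Y, D, tau)
        D = Y @ np.linalg.pinv(A)                 # least-squares dictionary update
        norms = np.linalg.norm(D, axis=0)
        D /= np.where(norms > 1e-12, norms, 1.0)  # renormalize atoms, guard dead ones
    return D

# Layer 1: dictionary of texture common to all vehicle classes (random stand-in data).
Y = np.random.default_rng(1).standard_normal((64, 200))
D1 = learn_dictionary(Y, n_atoms=32, tau=0.1)
R1 = Y - D1 @ sparse_code(Y, D1, 0.1)             # residual r_c feeds layer 2

# Layer 2: per-class individuality dictionary trained on the layer-1 residual.
D2 = learn_dictionary(R1, n_atoms=32, tau=0.1)
R2 = R1 - D2 @ sparse_code(R1, D2, 0.1)           # r_{c,m} would feed layer 3 the same way

print(np.linalg.norm(Y), np.linalg.norm(R1), np.linalg.norm(R2))
```

Each layer is trained on the reconstruction residual of the previous one, mirroring how r_c and r_{c,m} feed the second- and third-layer dictionaries.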

S34: match the 2D appearance features of globally moving vehicles against the models in the vehicle 3D model database to obtain the texture and key model description parameter information of the globally moving vehicles;

S35: extract the position information of globally moving vehicles from their 2D appearance features, and combine it with the pose information in the corresponding key model description parameters to form the position and pose parameters of the globally moving vehicles;

S36: losslessly compress the global feature parameter set and the coding coefficients of the three-level knowledge dictionary, where the global feature parameter set consists of the texture and key model description parameter information obtained in step S34 and the position and pose parameters obtained in step S35;

Step 4: globally encode the vehicle video based on the global feature parameters.

In step 1, background modeling and vehicle detection techniques are used to segment the original surveillance video into a vehicle video and a vehicle-removed video, specifically comprising:

S11: convert the original surveillance video image to YUV space and establish an automatically updated background model based on background subtraction;

S12: detect vehicles in the original surveillance video image with a vehicle detection method to obtain the vehicle video image;

S13: subtract the vehicle video image from the original surveillance video image to obtain a vehicle-removed video image containing background holes;

S14: fill the background holes in the video image obtained in S13 by overlaying the background model, obtaining the vehicle-removed video image.
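The segmentation pipeline of S11-S14 can be sketched with a toy background model. This is a hedged illustration only: a running-average model and a simple intensity threshold stand in for the patent's ViBe background model and trained vehicle detector, and all names are illustrative.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model (a simple stand-in for ViBe in S11)."""
    return (1 - alpha) * bg + alpha * frame

def split_frame(frame, bg, thresh=30):
    """S12-S14 sketch: flag the moving (vehicle) region, then split the frame
    into a vehicle image and a vehicle-removed image whose holes are filled
    from the background model."""
    mask = np.abs(frame - bg) > thresh     # crude stand-in for the vehicle detector
    vehicle = np.where(mask, frame, 0)     # vehicle video image (S12)
    removed = np.where(mask, bg, frame)    # holes filled from background (S13 + S14)
    return vehicle, removed, mask

bg = np.full((4, 4), 100.0)                # flat background
frame = bg.copy()
frame[1:3, 1:3] = 200.0                    # a bright "vehicle"
vehicle, removed, mask = split_frame(frame, bg)
print(mask.sum())                          # 4 pixels flagged as vehicle
print(removed.max())                       # vehicle-removed image is pure background
```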

Step 2 further comprises the sub-steps:

S21: generate a background image from the vehicle-removed video images, and reconstruct the background image after encoding;

S22: perform global motion estimation on the vehicle-removed video images to obtain a global motion vector;

S23: based on the reconstructed background image and the global motion vector, selectively encode each video block in original coding mode or differential coding mode.
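Sub-step S22's global motion estimation can be illustrated with a brute-force translational search. This toy sketch is not the patent's estimator: it assumes pure translation and uses a sum-of-absolute-differences cost.

```python
import numpy as np

def global_motion_vector(prev, curr, search=2):
    """Estimate a translational global motion vector by exhaustive search for
    the shift minimizing the SAD between curr and the shifted previous frame."""
    best, best_cost = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
            cost = np.abs(curr - shifted).sum()
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

prev = np.zeros((8, 8))
prev[2:5, 2:5] = 1.0
curr = np.roll(prev, 1, axis=1)            # whole scene shifted one pixel right
print(global_motion_vector(prev, curr))    # (0, 1)
```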

Sub-step S32 further comprises:

(1) construct a mesh-based generic 3D vehicle model;

(2) obtain a fine 3D vehicle model;

(3) obtain the 3D model description parameter set from the generic 3D vehicle model;

(4) reduce the dimensionality of the parameters in the 3D model description parameter set to obtain the key description parameters.
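Step (4)'s dimensionality reduction can be sketched with PCA, one common choice for compressing a description-parameter set down to key parameters. The patent does not specify the reduction method, so this is an assumed technique with illustrative data.

```python
import numpy as np

def reduce_parameters(P, k):
    """PCA sketch: project each model's description-parameter vector (a row of P)
    onto the top-k principal components to obtain 'key description parameters'."""
    mean = P.mean(axis=0)
    X = P - mean
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt: principal axes
    return X @ Vt[:k].T, Vt[:k], mean

rng = np.random.default_rng(0)
P = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 12))  # rank-2 data in 12-D
key, basis, mean = reduce_parameters(P, k=2)
recon = key @ basis + mean
print(np.allclose(recon, P))   # rank-2 data is captured exactly by 2 components
```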

Sub-step S35 further comprises:

(1) determine the position and angle parameters of the globally moving vehicle, ρ = [x, y, θ]^T, where x and y are the vertical projection coordinates of the vehicle center in the world coordinate system and θ is the angle between the vehicle's main motion direction and the OX axis;

(2) extract the motion region in the vehicle video by background modeling;

(3) obtain the 2D motion vectors of the globally moving vehicle with the sparse optical flow method;

(4) obtain the main motion direction θ and speed v of the globally moving vehicle in the world coordinate system;

(5) iteratively match the 2D projection of the generic 3D model matched to the globally moving vehicle against the size and shape of the motion region, obtaining the vehicle's position parameters (x, y) in the world coordinate system.
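Steps (3)-(4) can be illustrated by aggregating sparse-optical-flow vectors into a dominant heading θ and speed v. This is a toy sketch assuming the 2D motion vectors have already been mapped to the ground plane; the mean-vector aggregation is an assumed simplification.

```python
import numpy as np

def heading_and_speed(flow_vectors, dt=1.0):
    """Estimate the dominant motion direction theta (angle to the OX axis)
    and speed v from per-feature 2D motion vectors."""
    mean = flow_vectors.mean(axis=0)        # dominant 2D motion
    theta = np.arctan2(mean[1], mean[0])    # angle with the OX axis
    v = np.linalg.norm(mean) / dt
    return theta, v

vecs = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]])  # toy sparse-optical-flow output
theta, v = heading_and_speed(vecs)
print(theta, v)   # roughly 45 degrees at speed sqrt(2)
```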

Step 4 further comprises the sub-steps:

S41: perform motion estimation and motion compensation on globally moving vehicles based on the global vehicle feature parameters to obtain residual parameter information;

S42: obtain the illumination compensation parameters of the globally moving vehicles;

S43: fuse the residual parameter information and the illumination compensation parameters, and losslessly encode the residual parameter information and the illumination compensation parameters.
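Sub-steps S41-S43 can be sketched with a scalar gain/offset illumination model: the model-based prediction is illumination-compensated and subtracted from the observed vehicle patch, and the remaining residual would go to a lossless entropy coder. The gain/offset form is an assumed simplification, not the patent's actual compensation scheme.

```python
import numpy as np

def global_encode(patch, predicted):
    """S41-S43 sketch: fit a scalar illumination gain/offset, compensate the
    prediction, and keep the residual for a hypothetical lossless coder."""
    gain = patch.std() / max(predicted.std(), 1e-12)
    offset = patch.mean() - gain * predicted.mean()
    compensated = gain * predicted + offset    # illumination-compensated prediction
    residual = patch - compensated             # residual parameter information
    return {"gain": gain, "offset": offset, "residual": residual}

def decode(params, predicted):
    return params["gain"] * predicted + params["offset"] + params["residual"]

pred = np.arange(16.0).reshape(4, 4)
obs = 1.5 * pred + 10.0                        # same content under brighter lighting
enc = global_encode(obs, pred)
print(np.abs(enc["residual"]).max())           # illumination model absorbs the change
print(np.allclose(decode(enc, pred), obs))     # reconstruction is lossless
```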

(2) A global encoding system for urban traffic surveillance video, comprising:

(1) a video segmentation module, for segmenting the original surveillance video into a vehicle video and a vehicle-removed video;

(2) a selective differential encoding module, for encoding the vehicle-removed video with a selective differential coding scheme;

(3) a global feature parameter extraction module, for extracting the global feature parameter set of globally moving vehicles in the vehicle video, further comprising the sub-modules:

a 2D appearance feature extraction module, for extracting the 2D appearance features of globally moving vehicles;

a vehicle 3D model database construction module, for building the vehicle 3D model database, which comprises generic 3D models, fine 3D models, and key model description parameter sets for all vehicle classes; the key model description parameters are obtained by dimensionality reduction of the model description parameter set;

a global vehicle texture dictionary construction module, for building the global vehicle texture dictionary of globally moving vehicles by sparse coding, further comprising:

Taking

$$\min_{D_1,\,a_1} \sum_{c=1}^{C} \left( \| y_c - D_1 a_1 \|_2^2 + \tau \| a_1 \|_1 \right)$$

as the cost function, obtain the first-layer knowledge dictionary from the texture information of globally moving vehicles, i.e., the dictionary of visual texture information common to all classes of globally moving vehicles;

Obtain the difference information r_c between each class of globally moving vehicles reconstructed with the first-layer knowledge dictionary and the original globally moving vehicles; taking

$$\min_{D_{2,c},\,a_{2,c}} \sum_{m=1}^{M} \left( \| r_c - D_{2,c} a_{2,c} \|_2^2 + \tau \| a_{2,c} \|_1 \right)$$

as the cost function, obtain from r_c the second-layer knowledge dictionary, i.e., the dictionary of 3D-structure and texture individuality information for each class of globally moving vehicles;

Obtain the difference information r_{c,m} between each individual vehicle of each class reconstructed with the second-layer knowledge dictionary and the original globally moving vehicle; taking

$$\min_{D_{3,c,m},\,a_{3,c,m}} \sum_{i=1}^{N} \left( \| r_{c,m} - D_{3,c,m} a_{3,c,m} \|_2^2 + \tau \| a_{3,c,m} \|_1 \right)$$

as the cost function, obtain from r_{c,m} the third-layer knowledge dictionary, i.e., the dictionary of long-term individuality update information for individual globally moving vehicles;

In the above, D_1 denotes the first-layer knowledge dictionary; C is the number of classes of globally moving vehicles and c the class index; y_c denotes the texture information of all classes of globally moving vehicles; a_1 denotes the coding coefficients; τ is a balance factor set empirically according to the actual situation, and the larger τ is, the sparser the coding coefficients; D_{2,c} denotes the second-layer knowledge dictionary, M the number of individual vehicles within a class, m the index of an individual vehicle within a class, and a_{2,c} the coding coefficients; D_{3,c,m} denotes the third-layer knowledge dictionary, N the number of individual vehicles of that class, i the index of an individual vehicle of that class, and a_{3,c,m} the coding coefficients;

a texture and key model description parameter information acquisition module, for matching the 2D appearance features of globally moving vehicles against the models in the vehicle 3D model database to obtain the texture and key model description parameter information of the globally moving vehicles;

a position and pose parameter acquisition module, for extracting the position information of globally moving vehicles from their 2D appearance features and combining it with the pose information in the corresponding key model description parameters to form the position and pose parameters of the globally moving vehicles;

a lossless compression module, for losslessly compressing the global feature parameter set and the coding coefficients of the three-level knowledge dictionary, where the global feature parameter set consists of the texture and key model description parameter information obtained in step S34 and the position and pose parameters obtained in step S35;

(4) a global encoding module, for globally encoding the vehicle video based on the global feature parameters.

The above position and pose parameter acquisition module further comprises:

a position and angle parameter determination module, for determining the position and angle parameters of the globally moving vehicle, ρ = [x, y, θ]^T, where x and y are the vertical projection coordinates of the vehicle center in the world coordinate system and θ is the angle between the vehicle's main motion direction and the OX axis;

a motion region extraction module, for extracting the motion region in the vehicle video by background modeling;

a 2D motion vector acquisition module, for obtaining the 2D motion vectors of the globally moving vehicle with the sparse optical flow method;

a main motion direction and speed acquisition module, for obtaining the main motion direction θ and speed v of the globally moving vehicle in the world coordinate system;

a position parameter acquisition module, for iteratively matching the 2D projection of the generic 3D model matched to the globally moving vehicle against the size and shape of the motion region, obtaining the vehicle's position parameters (x, y) in the world coordinate system.

The above global encoding module further comprises:

a residual parameter information acquisition module, for performing motion estimation and motion compensation on globally moving vehicles based on the global vehicle feature parameters to obtain residual parameter information;

an illumination compensation parameter acquisition module, for obtaining the illumination compensation parameters of globally moving vehicles;

a lossless encoding module, for fusing the residual parameter information and the illumination compensation parameters and losslessly encoding the residual parameter information and the illumination compensation parameters.

Based on the mechanism by which global object redundancy arises in urban surveillance video, and exploiting the facts that vehicle information accounts for a large share of surveillance video and that vehicles are highly structured, similar in appearance, and rich in texture, the present invention segments the original video with vehicle detection techniques into a vehicle video and a vehicle-removed video, which are encoded separately in different ways. For the vehicle video, a vehicle knowledge dictionary is built with sparse coding and related techniques, and the global feature parameter set is extracted; since encoding describes only features such as the texture and pose of moving vehicles, the video data of globally moving vehicles is converted into feature description data carrying only a small amount of information, effectively removing the global redundancy of moving vehicles. The vehicle-removed video (comprising the background images and other moving objects) is encoded with AVS-S2-based selective differential coding. On the basis of removing scene redundancy, the invention further removes the global redundancy in surveillance video, effectively improving coding compression efficiency.

Brief Description of the Drawings

Fig. 1 is the detailed flowchart of the method of the invention;

Fig. 2 is the detailed flowchart of video segmentation;

Fig. 3 is the detailed flowchart of the vehicle detection method;

Fig. 4 is the detailed flowchart of global feature parameter extraction;

Fig. 5 is a schematic diagram of 2D vehicle appearance features, where (a) is the 3D model of a vehicle and (b) is a sampling of 2D vehicle templates;

Fig. 6 is a schematic diagram of the generic and fine 3D vehicle models, where (a) is the generic 3D vehicle model, (b) is the texture of the generic 3D vehicle model, and (c) is the fine 3D vehicle model;

Fig. 7 is a schematic diagram of the pose, position, and spatial angle parameters of a moving vehicle;

Fig. 8 is a schematic diagram of vehicle position and pose parameter extraction;

Fig. 9 is a schematic diagram of the global encoding flow for the vehicle video.

Detailed Description

To make the purpose, technical features, and advantages of the invention clearer, the invention is further described below with reference to the drawings and specific embodiments.

In urban traffic surveillance video, the monitored scene is relatively fixed, vehicle information accounts for a large share, and vehicle motion produces a large amount of global redundancy. Based on these characteristics of urban traffic surveillance video, and on the facts that vehicles are highly structured, similar in appearance, and rich in texture, the invention provides a global encoding method and system that encodes urban traffic surveillance video so as to remove both scene redundancy and global redundancy.

First, the invention divides the original surveillance video into a vehicle video and a vehicle-removed video by video segmentation. Then, for the vehicle-removed video, AVS-S2-based selective differential coding is used to remove the scene redundancy in that video. Next, for the vehicle video, a vehicle knowledge dictionary is created with sparse coding, a global feature parameter set including the vehicles' position and pose information as well as texture and parameter information is generated, and the global feature parameters are globally encoded. Since encoding describes only features such as the texture and pose of moving vehicles, the video data of globally moving vehicles is converted into feature description data carrying only a small amount of information, effectively removing the global redundancy of moving vehicles and further improving coding efficiency.

Fig. 1 is the detailed flowchart of the method of the invention; referring to Fig. 1, the specific steps of the method are as follows:

Step 1, video segmentation: segment the original surveillance video into a vehicle video and a vehicle-removed video.

Video segmentation can be realized with background modeling and vehicle detection techniques, dividing the original surveillance video into two parts: the vehicle video and the vehicle-removed video.

As shown in Fig. 2, this step operates on the video images and further comprises the sub-steps:

S11: convert the original surveillance video image to YUV space and establish an automatically updated background model based on background subtraction.

In this embodiment the background model is built with the ViBe method (visual background extraction), but the background modeling method is not limited to ViBe.

S12: detect vehicles in the original surveillance video image with a vehicle detection method and obtain the vehicle video image.

The detailed implementation of this step is shown in Fig. 3 and comprises the steps:

(1) Apply Gaussian filtering to the original surveillance video image and detect motion regions with the background model. Gaussian filtering removes Gaussian noise from the image, improving image quality and further ensuring the correctness of subsequent video processing.

(2) Select training samples, extract their SIFT features (scale-invariant feature transform), and train a vehicle detector with an Adaboost classifier; the training samples are a series of original surveillance video images.

(3) Classify the SIFT features of a motion region with the trained vehicle detector. If the proportion of SIFT features in the motion region classified as vehicle, relative to the total SIFT features, exceeds a threshold R, the motion region is judged to be a vehicle; otherwise it is a non-vehicle region. The threshold R is set empirically.

Preferably, after the motion regions are obtained, their motion shadows are also removed.

S13将原始监控视频图像减去车辆视频图像，获得包含背景空洞的去除车辆的视频图像。S13 Subtract the vehicle video image from the original surveillance video image to obtain a vehicle-removed video image containing background holes.

S14采用背景模型对S13获得的视频图像中背景空洞进行叠加填补,获得去除车辆的视频图像。S14 uses the background model to superimpose and fill the background hole in the video image obtained in S13 to obtain the video image without the vehicle.
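Steps S13 and S14 together amount to masking out the vehicle pixels and filling the resulting holes from the background model. A minimal numpy sketch, treating frames as 2D intensity arrays (real frames are YUV images; the values are illustrative):

```python
import numpy as np

def remove_vehicles_and_fill(frame, vehicle_mask, background):
    """Blank out vehicle pixels (S13), then fill the holes from the
    background model (S14)."""
    out = frame.copy()
    out[vehicle_mask] = background[vehicle_mask]
    return out

frame = np.full((3, 3), 50)
frame[1, 1] = 255                        # a vehicle pixel
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True                        # detected vehicle region
background = np.full((3, 3), 50)         # background model estimate
restored = remove_vehicles_and_fill(frame, mask, background)
print(restored[1, 1])                    # hole filled with background value 50
```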

步骤2，采用可选差分编码方式处理去除车辆的视频。Step 2: process the vehicle-removed video with the optional differential coding mode.

可选差分编码方式是视频编码技术领域的常规技术，即在传统混合编码标准方案（如H.264）基础上，增加基于背景帧的预测编码技术，扩展出背景参考预测和差分预测两种编码预测模式。当待编码模块为背景块，则通过背景参考预测使得残差更小；若待编码块是前背景混合块，则采用差分预测模式，即利用剪除掉背景后的前景部分进行预测；而纯前景块继续采用传统近邻预测模式。原则上，前背景混合块也可以采用传统近邻预测模式或差分预测模式。The optional differential coding mode is a conventional technique in the field of video coding: on top of a traditional hybrid coding standard (such as H.264), background-frame-based predictive coding is added, extending the coder with two prediction modes, background reference prediction and differential prediction. When the block to be coded is a background block, background reference prediction yields a smaller residual; when it is a mixed foreground-background block, the differential prediction mode is used, i.e., prediction is performed on the foreground part left after cutting away the background; pure foreground blocks continue to use the traditional neighbor prediction mode. In principle, mixed foreground-background blocks may also use the traditional neighbor prediction mode or the differential prediction mode.

本具体实施中，采用基于AVS-S2的可选差分编码方式对去除车辆的视频进行编码。基于AVS-S2的可选差分编码方式针对监控视频场景长期不变的特点，通过对监控背景和前景进行建模，以去除大量存在的"场景冗余"，编码效率是H.264/AVC编码方式的两倍。基于AVS-S2的可选差分编码方式对于各P帧的宏块，除使用现有编码方式外，还可以选择性的使用"最近参考帧与背景图像的差分结果"来对"当前宏块与其对应背景差分结果"进行预测编码。In this embodiment, the vehicle-removed video is encoded with the AVS-S2-based optional differential coding mode. Targeting the long-term invariance of surveillance scenes, this mode models the surveillance background and foreground to remove the large amount of "scene redundancy" present, achieving twice the coding efficiency of H.264/AVC. For the macroblocks of each P frame, besides the existing coding modes, it can selectively use the "difference between the most recent reference frame and the background image" to predictively code the "difference between the current macroblock and its corresponding background".

本步骤针对去除车辆的视频进行编码,进一步包括以下子步骤:This step encodes the video with the vehicle removed, and further includes the following sub-steps:

S21背景建模:S21 background modeling:

使用视频图像建模生成背景图像,经编码后重构背景图像。The background image is generated by using video image modeling, and the background image is reconstructed after encoding.

S22全局运动估计:S22 global motion estimation:

对去除车辆的视频图像进行像素或亚像素精度的全局运动估计,获得全局运动矢量。The global motion estimation with pixel or sub-pixel precision is performed on the video image without the vehicle, and the global motion vector is obtained.

S23编码模式选择:S23 encoding mode selection:

基于重构背景图像和全局运动矢量，选择性地使用原始编码模式或差分编码模式对各视频块进行编码。对背景块通过背景参考预测使得残差更小；对前背景混合块，采用差分预测模式，利用减除掉背景后的前景部分进行预测；对纯前景块则继续采用传统近邻预测模式。Based on the reconstructed background image and the global motion vector, each video block is selectively encoded in the original coding mode or the differential coding mode: background blocks use background reference prediction, which yields smaller residuals; mixed foreground-background blocks use the differential prediction mode, predicting from the foreground part left after subtracting the background; pure foreground blocks continue to use the traditional neighbor prediction mode.
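A toy version of the per-block mode decision: compute the residual for each candidate prediction and keep the mode with the least residual energy. The energy criterion stands in for the rate-distortion test a real encoder would run, and the block contents are hypothetical.

```python
import numpy as np

def choose_mode(block, bg_block, neighbor_pred):
    """Pick the prediction whose residual has the least energy:
    'background'   - predict the block from the reconstructed background;
    'differential' - predict the foreground part (block minus background)
                     from the reference's foreground part;
    'neighbor'     - conventional neighbor prediction."""
    candidates = {
        "background": block - bg_block,
        "differential": (block - bg_block) - (neighbor_pred - bg_block),
        "neighbor": block - neighbor_pred,
    }
    return min(candidates, key=lambda k: float(np.sum(candidates[k] ** 2)))

bg = np.zeros((2, 2))
# a block identical to the background is best coded against the background
print(choose_mode(bg.copy(), bg, np.full((2, 2), 7.0)))   # background
```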

步骤3,提取车辆视频中全局运动车辆的全局特征参数。Step 3, extract the global feature parameters of the global moving vehicle in the vehicle video.

见图4,本步骤进一步包括子步骤:See Figure 4, this step further includes sub-steps:

S31基于车辆视频提取全局运动车辆的2D外观特征。S31 Extracting 2D appearance features of a global moving vehicle based on the vehicle video.

对步骤1获得的车辆视频中各全局运动车辆提取其2D外观特征，具体实施方式如下：对某一款型车辆，预先计算其在不同视点下的2D图像轮廓。例如，对某款轿车，在360度车辆方向范围内将其量化为72段，在90度的仰角范围内将其量化为19段，共给出1368个2D形状模板。图5是车辆2D外观特征示意图。其中，图a为轿车3D模型图；图b为图a中轿车的2D模板的采样，第1~3行分别表示相机仰角为0度、15度和30度时的2D模板，第1~4列分别是车辆方向为0度、30度、90度和120度时的2D模板。Extract the 2D appearance features of each global moving vehicle in the vehicle video obtained in step 1, as follows: for a given vehicle model, pre-compute its 2D image contours under different viewpoints. For example, for a certain car, the 360-degree vehicle direction range is quantized into 72 segments and the 90-degree elevation range into 19 segments, giving 1368 2D shape templates in total. Figure 5 is a schematic diagram of vehicle 2D appearance features, where (a) is the 3D model of the car and (b) shows samples of its 2D templates: rows 1 to 3 correspond to camera elevation angles of 0, 15 and 30 degrees, and columns 1 to 4 correspond to vehicle directions of 0, 30, 90 and 120 degrees.
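With 72 direction bins over 360 degrees and 19 elevation bins over 90 degrees, a viewpoint maps to one of the 72 × 19 = 1368 templates. A small sketch of that quantization (the linear index layout is an assumption for illustration):

```python
def template_index(azimuth_deg, elevation_deg, n_az=72, n_el=19):
    """Map a (vehicle direction, camera elevation) viewpoint to one of the
    n_az * n_el = 1368 pre-computed 2D shape templates."""
    az_bin = int(azimuth_deg % 360 // (360 / n_az))          # 5 degrees per bin
    el_bin = min(int(elevation_deg // (90 / n_el)), n_el - 1)
    return el_bin * n_az + az_bin

print(72 * 19)                  # 1368 templates in total
print(template_index(0, 0))     # first template
print(template_index(30, 15))   # direction bin 6, elevation bin 3
```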

S32车辆3D模型数据库的建立。S32 Establishment of vehicle 3D model database.

建立车辆3D模型数据库时,根据车辆品牌和型号对车辆进行分类。When building a vehicle 3D model database, classify vehicles according to vehicle make and model.

建立车辆的通用3D模型和精细3D模型，即构成车辆3D模型数据库。车辆3D模型由5个主要部分组成：车身主体及4个车轮，具体采用CAD模型构建，车辆3D模型由网格结构组成，并存储各网格顶点坐标及网格面索引。车窗、车灯等组件由于其可辨识性和可区分性高，在描述车辆特征及区分车辆型号中作用重要，该类组件被称为车辆的关键组件，在车辆通用3D模型和车辆精细3D模型中对关键组件采用不同详细程度的表示。Establishing the general 3D model and the fine 3D model of each vehicle constitutes the vehicle 3D model database. A vehicle 3D model consists of 5 main parts (the vehicle body and 4 wheels), built as a CAD model; it is composed of a mesh structure and stores the mesh vertex coordinates and face indices. Components such as windows and lights, being highly identifiable and distinguishable, play an important role in describing vehicle characteristics and distinguishing vehicle models; such components are called the key components of the vehicle, and they are represented at different levels of detail in the general 3D model and the fine 3D model.

本步骤的具体实施方式如下:The specific implementation of this step is as follows:

(1)建立车辆通用3D模型(1) Establish a general 3D model of the vehicle

采用四边形网格表示车辆通用3D模型,见图6(a),该图为奥迪Q7车型的通用3D模型。四边形网格具有高度概括性,四边形网格边界并不完全与车辆关键组件边界重合,因此,采用附着于模型上的二维封闭线表示车辆关键组件。在图6(a)的纹理示意图上显示所有关键组件的轮廓线,见图6(b)。The quadrilateral grid is used to represent the general 3D model of the vehicle, as shown in Figure 6(a), which is the general 3D model of the Audi Q7 model. The quadrilateral grid is highly generalized, and the boundary of the quadrilateral grid does not completely coincide with the boundary of the key components of the vehicle. Therefore, the key components of the vehicle are represented by two-dimensional closed lines attached to the model. The contour lines of all key components are shown on the texture schematic in Fig. 6(a), see Fig. 6(b).

(2)获取车辆精细3D模型(2) Obtain the fine 3D model of the vehicle

各型号车辆出厂前，其基于CAD的精细3D模型就已经存在，可以在相关网站进行下载。精细3D模型中，为提高车辆型号的可辨识性，车辆关键组件不仅采用轮廓线表示，还保留各部件的外观特征。图6(c)展示了包括奥迪Q7车型的精细网格3D模型及其关键组件。Before each vehicle model leaves the factory, its CAD-based fine 3D model already exists and can be downloaded from the relevant websites. In the fine 3D model, to improve the recognizability of the vehicle model, the key components are not only outlined by contour lines but also retain the appearance characteristics of each part. Figure 6(c) shows the fine-mesh 3D model of the Audi Q7 and its key components.

(3)基于通用3D模型参数将车辆通用3D模型与车辆视频中全局运动车辆进行匹配。(3) Match the general 3D model of the vehicle with the global moving vehicle in the vehicle video based on the general 3D model parameters.

基于通用3D模型获得通用3D模型描述参数集，基于描述参数实现车辆通用3D模型与车辆视频中全局运动车辆的匹配，并通过调整通用3D模型描述参数实现车辆通用3D模型与车辆视频中全局运动车辆的最佳适配，车辆通用3D模型与全局运动车辆的匹配属于本技术领域内的常规技术。本具体实施中，通用3D模型描述参数包括车辆的轴距、车头宽度、引擎盖高度等30个参数。A set of general 3D model description parameters is obtained from the general 3D model; these description parameters are used to match the general vehicle 3D model to a global moving vehicle in the vehicle video, and the best fit is achieved by adjusting them. Matching a general vehicle 3D model to a global moving vehicle is a conventional technique in this field. In this embodiment, the general 3D model description parameters comprise 30 parameters such as the vehicle wheelbase, front width and hood height.

通过车辆通用3D模型与全局运动车辆的匹配,即在视频图像恢复时,将与全局运动车辆匹配的通用3D车辆模型放置于视频中该全局运动车辆对应的真实位置。Through the matching of the general 3D model of the vehicle and the global moving vehicle, that is, when the video image is restored, the general 3D vehicle model matched with the global moving vehicle is placed in the real position corresponding to the global moving vehicle in the video.

(4)通用3D模型描述参数的降维(4) Dimensionality reduction of general 3D model description parameters

采用主成分分析法（PCA）对30个通用3D模型描述参数进行降维以获得关键描述参数。参数数目少，计算简单，对噪声与低质的适应度高；但参数数目多，模型对车辆细节表达程度高，以及与实际车辆匹配程度高，所以需要在两者间进行平衡。Leotta通过实验得出：前6个PCA主成分就能较好表达车辆模型，同时有效降低计算量。所以降维后的通用3D模型描述参数，即关键描述参数p=[p_1, p_2, p_3, p_4, p_5, p_6]^T。Principal component analysis (PCA) is applied to the 30 general 3D model description parameters to obtain the key description parameters. Fewer parameters mean simpler computation and better robustness to noise and low image quality; more parameters let the model express vehicle details and match the actual vehicle more closely, so a balance must be struck between the two. Leotta showed experimentally that the first 6 PCA principal components express the vehicle model well while effectively reducing computation. The dimension-reduced general 3D model description parameters, i.e., the key description parameters, are p = [p_1, p_2, p_3, p_4, p_5, p_6]^T.
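A sketch of the dimensionality reduction: PCA via SVD on a hypothetical set of 30-parameter vehicle descriptions, keeping the first 6 principal components as the key description parameters. The data here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical training set: 100 vehicles x 30 shape-description parameters
X = rng.normal(size=(100, 30))
X_centered = X - X.mean(axis=0)

# PCA via SVD; keep the first 6 principal components (per Leotta's finding)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
components = Vt[:6]                  # 6 x 30 projection matrix
p = X_centered @ components.T        # key description parameters, 6 per vehicle
print(p.shape)                       # (100, 6)
```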

S33全局车辆纹理字典的构建S33 Construction of the global vehicle texture dictionary

构建全局车辆纹理字典的作用是,当将与全局运动车辆匹配的通用3D车辆模型放置于视频中对应的真实位置后,可依据全局车辆纹理字典对通用3D车辆模型进行纹理重建。The purpose of constructing the global vehicle texture dictionary is to reconstruct the texture of the general 3D vehicle model according to the global vehicle texture dictionary after the general 3D vehicle model matching the global moving vehicle is placed in the corresponding real position in the video.

根据全局车辆提取和识别结果,采用稀疏编码方式对车辆视频中各全局运动车辆构建全局车辆纹理字典,全局车辆纹理字典与车辆3D模型数据库共同构成车辆知识字典。According to the global vehicle extraction and recognition results, a global vehicle texture dictionary is constructed for each global moving vehicle in the vehicle video by means of sparse coding, and the global vehicle texture dictionary and the vehicle 3D model database jointly constitute a vehicle knowledge dictionary.

本步骤首先,构建全局车辆的共性视觉纹理信息知识库;接着,构建各类型车辆的三维结构及纹理个性信息知识字典;最后,构建个体车辆的个性长时更新信息知识字典。编码时仅通过对运动车辆的纹理、姿态等特征进行描述,将全局运动车辆的视频数据转变成仅包含少量信息的特征描述数据,有效去除运动车辆的全局冗余,进一步提升编码效率。In this step, the common visual texture information knowledge base of global vehicles is constructed first; then, the three-dimensional structure and texture individual information knowledge dictionary of various types of vehicles is constructed; finally, the individual vehicle personality long-term update information knowledge dictionary is constructed. When encoding, only by describing the texture, attitude and other features of the moving vehicle, the video data of the global moving vehicle is transformed into feature description data containing only a small amount of information, effectively removing the global redundancy of the moving vehicle, and further improving the coding efficiency.

本步骤的三层知识字典构建的具体实施方式如下:The specific implementation of the three-layer knowledge dictionary construction of this step is as follows:

(1)第一层知识字典构建,即所有类型全局运动车辆的共性视觉纹理信息知识字典的构建。(1) The first layer of knowledge dictionary construction, that is, the construction of the common visual texture information knowledge dictionary of all types of global moving vehicles.

通过稀疏编码方式构建所有类型运动车辆的共性视觉纹理信息知识字典，代价函数如下：The common visual texture information knowledge dictionary of all types of moving vehicles is constructed by sparse coding, with the following cost function:

min_{D_1, a_1} Σ_{c=1}^{C} ( ||y_c - D_1 a_1||_2^2 + τ||a_1||_1 ) (1)

其中，D_1表示第一层知识字典；C表示全局运动车辆的类型数，c表示全局运动车辆的类型编号；y_c表示全局运动车辆的共性视觉纹理信息，即所有类型全局运动车辆的纹理信息；a_1表示编码系数；τ为平衡因子，根据实际情况和经验设定，τ越大编码系数越稀疏。where D_1 denotes the first-layer knowledge dictionary; C is the number of global moving vehicle types and c the type index; y_c denotes the common visual texture information of global moving vehicles, i.e., the texture information of all types of global moving vehicles; a_1 denotes the coding coefficients; τ is a balance factor set according to the actual situation and experience; the larger τ is, the sparser the coding coefficients.

代价函数(1)用来计算稀疏度，通过约束项防止过拟合，同时也可以减少编码系数中非0元素。Cost function (1) accounts for sparsity: the constraint term prevents overfitting and also reduces the number of non-zero elements in the coding coefficients.
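A minimal sparse-coding sketch for a cost of the form ||y - D·a||² + τ||a||₁, solved by iterative soft-thresholding (ISTA). The dictionary here is a trivial identity matrix so the answer is easy to verify; dictionary learning (optimizing D itself, as the patent's dictionaries require) is not shown.

```python
import numpy as np

def sparse_code(y, D, tau=0.1, n_iter=200):
    """Minimize ||y - D a||_2^2 + tau * ||a||_1 by iterative
    soft-thresholding (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2          # spectral norm squared
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)           # half-gradient of the data term
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - tau / (2 * L), 0.0)
    return a

D = np.eye(3)                 # trivial dictionary for illustration
y = np.array([1.0, 0.0, 0.5])
a = sparse_code(y, D, tau=0.2)
print(np.round(a, 2))         # coefficients shrunk toward 0: [0.9, 0., 0.4]
```

With D = I the minimizer is the per-coordinate soft-threshold of y at τ/2, which the iteration reaches exactly.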

(2)第二层知识字典构建,即各类型全局运动车辆的三维结构及纹理个性信息知识字典的构建。(2) The second layer of knowledge dictionary construction, that is, the construction of the three-dimensional structure and texture individual information knowledge dictionary of various types of global moving vehicles.

通过稀疏编码方式构建各类型全局运动车辆的三维结构及纹理个性信息知识字典,具体为:Construct the three-dimensional structure and texture individual information knowledge dictionary of various types of global moving vehicles by means of sparse coding, specifically:

首先，提取各类全局运动车辆经第一层知识字典重建后与原始全局运动车辆的差异信息r_c：First, for each type, extract the difference r_c between the global moving vehicles reconstructed with the first-layer knowledge dictionary and the original global moving vehicles:

r_c = y_c - D_1 a_1 (2)

其次，针对各类型全局运动车辆分别构建第二层知识字典，代价函数如下：Second, build a second-layer knowledge dictionary for each type of global moving vehicle, with the following cost function:

min_{D_{2,c}, a_{2,c}} Σ_{m=1}^{M} ( ||r_c - D_{2,c} a_{2,c}||_2^2 + τ||a_{2,c}||_1 ) (3)

其中，D_{2,c}表示第二层知识字典，M表示某类型全局运动车辆下的个体车辆数量，m表示某类型全局运动车辆下个体车辆编号，a_{2,c}表示编码系数。where D_{2,c} denotes the second-layer knowledge dictionary, M is the number of individual vehicles of a given type of global moving vehicle, m is the index of an individual vehicle of that type, and a_{2,c} denotes the coding coefficients.

(3)第三层知识字典构建,即个体全局运动车辆的个性长时更新信息知识字典。(3) The third layer of knowledge dictionary construction, that is, the long-term update information knowledge dictionary of individual global moving vehicles.

通过稀疏编码方式构建个体运动车辆的个性长时更新信息知识字典,具体为:Construct the long-term update information knowledge dictionary of the personality of individual moving vehicles through sparse coding, specifically:

首先，提取各类全局运动车辆下各个体车辆经第二层知识字典重建后与原始全局运行车辆的差异信息r_{c,m}：First, for each individual vehicle of each type, extract the difference r_{c,m} between the vehicle reconstructed with the second-layer knowledge dictionary and the original global moving vehicle:

r_{c,m} = r_c - D_{2,c} a_{2,c} (4)

其次，针对各类型全局运动车辆中个体车辆分别构建第三层知识字典，代价函数如下：Second, build a third-layer knowledge dictionary for each individual vehicle of each type, with the following cost function:

min_{D_{3,c,m}, a_{3,c,m}} Σ_{i=1}^{N} ( ||r_{c,m} - D_{3,c,m} a_{3,c,m}||_2^2 + τ||a_{3,c,m}||_1 ) (5)

其中，D_{3,c,m}表示第三层知识字典，N表示该类型全局运动对象下的个体车辆数量，i表示该类型运动对象下的个体车辆编号；a_{3,c,m}表示编码系数。where D_{3,c,m} denotes the third-layer knowledge dictionary, N is the number of individual vehicles of this type of global moving object, i is the index of an individual vehicle of this type, and a_{3,c,m} denotes the coding coefficients.

车辆视频编码由基于三层知识字典的全局车辆特征表达来处理。一方面，由于三层知识字典随时间缓慢变化，可以将全局运动车辆的视频数据转变成仅包含少量信息的特征描述信息，特征描述信息仅包含全局运动对象视频数据的少量信息；另一方面，在城市公共安全应用中，由于监控摄像头位置相对固定，背景信息相对固定，因此，可以降低背景信息传输频率，比如100帧传1次背景帧信息，那么，每帧背景信息只需传输少量的特征描述信息，在解码端利用三层知识字典即可完成重建，从而大幅提升视频大数据的编码效率。Vehicle video coding is handled through global vehicle feature description based on the three-layer knowledge dictionary. On the one hand, since the three-layer knowledge dictionary changes only slowly over time, the video data of global moving vehicles can be converted into feature description information that carries only a small fraction of the original data. On the other hand, in urban public-security applications the surveillance cameras are essentially fixed and the background is relatively static, so the background transmission frequency can be reduced, for example transmitting one background frame every 100 frames; each frame then requires only a small amount of feature description information, and the decoder reconstructs the video using the three-layer knowledge dictionary, greatly improving the coding efficiency of video big data.
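The three-layer residual cascade can be sketched as follows. The dictionaries are random placeholders and the per-layer coefficients are solved by least squares rather than a true sparse solver; the point is only the structure y ≈ D1·a1 + D2·a2 + D3·a3, with each layer coding the previous layer's residual.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=8)                       # a vehicle texture signal

# hypothetical pre-trained layer dictionaries (random placeholders)
D1 = rng.normal(size=(8, 4))
D2 = rng.normal(size=(8, 4))
D3 = rng.normal(size=(8, 4))

def code(D, r):
    """Coefficients for residual r (least squares stands in for sparse coding)."""
    a, *_ = np.linalg.lstsq(D, r, rcond=None)
    return a

a1 = code(D1, y);   r_c = y - D1 @ a1        # layer-1 residual
a2 = code(D2, r_c); r_cm = r_c - D2 @ a2     # layer-2 residual
a3 = code(D3, r_cm)

y_hat = D1 @ a1 + D2 @ a2 + D3 @ a3          # decoder-side reconstruction
print(np.linalg.norm(y - y_hat) <= np.linalg.norm(r_c))   # each layer helps
```

Only the coefficient vectors (a1, a2, a3) need to be transmitted; the decoder holds the same dictionaries.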

S34获取全局运动车辆的纹理与模型参数信息。S34 acquires texture and model parameter information of the global moving vehicle.

将全局运动车辆的2D外观特征与车辆3D模型进行特征匹配,得到全局运动车辆的纹理及模型参数信息。The 2D appearance features of the global moving vehicle are matched with the vehicle 3D model to obtain the texture and model parameter information of the global moving vehicle.

特征匹配的具体步骤如下:The specific steps of feature matching are as follows:

(1)基于车辆3D模型数据库,预生成和建立车辆所有视点量化后(见步骤S31)的2D正交外观掩膜的索引;(1) Based on the vehicle 3D model database, pre-generate and establish the index of the 2D orthogonal appearance mask after all viewpoints of the vehicle are quantified (see step S31);

(2)对检测到的车辆矩形区域,通过基于区域的匹配和轮廓的匹配,评估选出参数(类型,倾角,方向)以匹配前景模型。(2) Evaluate selected parameters (type, inclination, direction) to match the foreground model for the detected rectangular area of the vehicle through area-based matching and contour matching.

S35根据全局运动车辆2D外观特征提取全局运动车辆位置信息,并结合车辆3D模型中对应的车辆姿态信息描述构成全局运动车辆的位置及姿态参数。S35 extracts the position information of the global moving vehicle according to the 2D appearance feature of the global moving vehicle, and describes the position and attitude parameters of the global moving vehicle in combination with the corresponding vehicle attitude information in the vehicle 3D model.

本步骤进一步包括子步骤:This step further includes sub-steps:

(1)提取车辆视频中各全局运动车辆的特征信息。(1) Extract the feature information of each global moving vehicle in the vehicle video.

第i个全局运动车辆的特征信息Fi包括空间位置(x,y,z)和姿态角度(α,β,γ)六个参数,见图8:The feature information F i of the i-th global moving vehicle includes six parameters of spatial position (x, y, z) and attitude angle (α, β, γ), as shown in Figure 8:

F_i = [x_i, y_i, z_i, α_i, β_i, γ_i] (6)

第一步,参数确定:The first step is to determine the parameters:

假定静态监控摄像头满足透视投影原理,摄像头的检校及拍摄区域的地平面参数均实现离线预处理。通常情况下,目标姿态采用3个位置参数(x,y,z)与3个角度参数(α,β,γ)进行描述。但在车辆运行场景下,可以认为车辆主要沿地平面运行,利用地平面约束,车辆姿态参数ρ可以约减为3个:Assuming that the static surveillance camera satisfies the principle of perspective projection, the calibration of the camera and the ground plane parameters of the shooting area are all realized offline preprocessing. Usually, the target pose is described by three position parameters (x, y, z) and three angle parameters (α, β, γ). However, in the vehicle running scenario, it can be considered that the vehicle mainly runs along the ground plane. Using the ground plane constraints, the vehicle attitude parameter ρ can be reduced to three:

ρ = [x, y, θ]^T (7)

其中,x、y为车辆中心在以地平面为XOY平面的世界坐标系(WCS)上的垂直投影坐标,θ为车辆主运动方向与OX轴的夹角。Among them, x and y are the vertical projection coordinates of the vehicle center on the world coordinate system (WCS) with the ground plane as the XOY plane, and θ is the angle between the main motion direction of the vehicle and the OX axis.

第二步,获取全局运动车辆的姿态参数:The second step is to obtain the attitude parameters of the global moving vehicle:

本步骤主要包括基于光流法的车辆姿态参数初始化和基于预测跟踪的车辆姿态参数更新。首先,基于车辆3D模型通过背景建模提取运动区域;然后,利用稀疏光流法计算车辆的二维运动矢量,结合摄像头检校结果得到车辆在世界坐标系下的主运动方向θ与速度v;最后,将车辆3D模型的二维投影与运动区域的大小与形状迭代匹配,得到车辆在世界坐标系下的位置参数(x,y)。This step mainly includes the initialization of vehicle attitude parameters based on the optical flow method and the update of vehicle attitude parameters based on predictive tracking. First, based on the 3D model of the vehicle, the motion area is extracted through background modeling; then, the two-dimensional motion vector of the vehicle is calculated using the sparse optical flow method, and the main motion direction θ and velocity v of the vehicle in the world coordinate system are obtained by combining the camera calibration results; Finally, the two-dimensional projection of the vehicle 3D model is iteratively matched with the size and shape of the motion area to obtain the position parameters (x, y) of the vehicle in the world coordinate system.
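Once calibration maps the optical-flow result onto the ground plane, θ and v follow directly from consecutive vehicle-center positions. A minimal sketch, where the positions and frame interval are hypothetical:

```python
import math

def heading_and_speed(p0, p1, dt):
    """Given two consecutive ground-plane positions (x, y) of the vehicle
    centre in the world coordinate system and the frame interval dt, return
    the main motion direction θ (angle to the OX axis) and the speed v."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    theta = math.atan2(dy, dx)          # angle to the OX axis, radians
    v = math.hypot(dx, dy) / dt         # ground-plane speed
    return theta, v

theta, v = heading_and_speed((0.0, 0.0), (3.0, 4.0), dt=0.04)   # 25 fps
print(round(math.degrees(theta), 1), round(v, 1))   # 53.1 125.0
```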

(2)提取重建全局运动车辆时三级知识字典中基元的索引信息，即字典中的编码系数向量ID_i=[a_1, a_{2,c}, a_{3,c,m}]。(2) Extract the index information of the primitives in the three-level knowledge dictionary used to reconstruct the global moving vehicle, i.e., the coding coefficient vector ID_i = [a_1, a_{2,c}, a_{3,c,m}].

(3)编码传输时,对特征信息Fi和编码系数向量IDi进行无损压缩,具体可采用熵编码进行无损压缩,可有效保障全局运动车辆的关键信息。(3) During encoding and transmission, lossless compression is performed on the feature information F i and the encoding coefficient vector ID i . Specifically, entropy encoding can be used for lossless compression, which can effectively protect the key information of the global moving vehicle.

步骤S35获得的车辆的位置及姿态参数及步骤S34获得的纹理与模型参数信息共同构成全局车辆特征参数集,对全局车辆特征参数集进行无损压缩。The position and attitude parameters of the vehicle obtained in step S35 and the texture and model parameter information obtained in step S34 together constitute a global vehicle characteristic parameter set, and lossless compression is performed on the global vehicle characteristic parameter set.

步骤4基于全局车辆特征参数对车辆视频进行全局编码。Step 4: Globally encode the vehicle video based on the global vehicle feature parameters.

本步骤的具体流程见图9,具体步骤如下:The specific process of this step is shown in Figure 9, and the specific steps are as follows:

S41基于全局特征参数对全局运动车辆进行运动估计和运动补偿,得到残差参数信息。S41 Perform motion estimation and motion compensation on the global moving vehicle based on the global feature parameters to obtain residual parameter information.

采用多帧参考图像、1/4或1/8像素精度的运动估计方法和可变尺寸块运动补偿方法，具体可采用基于MPEG-1/2/4、H.263、H.264/AVC、H.265/HEVC或AVS的运动估计方法和运动补偿方法，但不限于此。Motion estimation with multiple reference frames at 1/4- or 1/8-pixel precision and variable-block-size motion compensation are used; specifically, the motion estimation and motion compensation methods of MPEG-1/2/4, H.263, H.264/AVC, H.265/HEVC or AVS may be adopted, but the invention is not limited to these.
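An integer-pel full-search block-matching sketch of the motion-estimation step. Real codecs extend this with multiple reference frames and 1/4- or 1/8-pel refinement by interpolation; the frame contents here are synthetic.

```python
import numpy as np

def block_match(ref, cur, top, left, bsize=4, search=2):
    """Full-search block matching: find the integer motion vector (dy, dx)
    minimizing the SAD between the current block and the reference frame."""
    block = cur[top:top + bsize, left:left + bsize].astype(int)
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue                      # candidate falls outside the frame
            sad = int(np.abs(ref[y:y + bsize, x:x + bsize].astype(int) - block).sum())
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv, best

ref = np.zeros((8, 8), dtype=np.uint8)
ref[2:6, 1:5] = 200                       # object in the reference frame
cur = np.zeros((8, 8), dtype=np.uint8)
cur[2:6, 2:6] = 200                       # object moved right by one pixel
mv, sad = block_match(ref, cur, top=2, left=2)
print(mv, sad)                            # (0, -1) 0
```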

S42获取全局运动车辆的光照补偿参数。S42 Acquire the illumination compensation parameters of the globally moving vehicle.

由于光照变化因素对车辆视频的编码和恢复效果影响较大,本具体实施方式采用如下方法确定光照补偿参数:Since the lighting change factor has a great influence on the encoding and restoration effect of the vehicle video, the following method is used in this embodiment to determine the lighting compensation parameters:

(1)基于YUV颜色空间的UV分量建立颜色不变特征,并从车辆视频获得各种车辆样本构成样本集。(1) Establish color-invariant features based on the UV components of the YUV color space, and obtain various vehicle samples from vehicle videos to form a sample set.

(2)从样本集中随机抽样抽取N个样本点,N通常设定为样本集数量的1/4,优选的N>50。(2) Randomly sample N sample points from the sample set, N is usually set to 1/4 of the number of sample sets, preferably N>50.

(3)基于样本点获得光照补偿参数。(3) Obtain illumination compensation parameters based on sample points.

通过颜色特征对样本点进行筛选后获得，也可通过其他单个或组合特征进行筛选，如梯度特征、小波特征等，其他任何通过图像特征筛选结果获取光照补偿参数的方法都应当包含在本发明保护范围内。The illumination compensation parameters are obtained after screening the sample points by color features; screening may also use other single or combined features, such as gradient features or wavelet features, and any other method that obtains illumination compensation parameters from image-feature screening results falls within the protection scope of the present invention.
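The patent leaves the form of the illumination compensation parameters open; one common choice is a per-vehicle gain/offset on luminance. The sketch below samples a subset of matched points (about 1/4 of the set, as suggested above) and fits gain and offset by least squares. The data, and the gain/offset model itself, are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical matched luminance samples: model texture vs. observed vehicle
model_y = rng.uniform(50, 200, size=200)
true_gain, true_offset = 1.2, 10.0
observed_y = true_gain * model_y + true_offset

# random subsample (the patent suggests N of roughly 1/4 of the set, N > 50)
idx = rng.choice(len(model_y), size=len(model_y) // 4, replace=False)

# least-squares fit of observed = gain * model + offset
A = np.stack([model_y[idx], np.ones(len(idx))], axis=1)
gain, offset = np.linalg.lstsq(A, observed_y[idx], rcond=None)[0]
print(round(float(gain), 3), round(float(offset), 3))   # 1.2 10.0
```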

S43融合残差参数信息和光照补偿参数,并采用熵编码将残差参数信息和光照补偿参数传送到解码端,以对视频图像进行恢复。S43 fuses the residual parameter information and the illumination compensation parameter, and transmits the residual parameter information and the illumination compensation parameter to the decoding end by using entropy coding, so as to restore the video image.

对残差参数信息和光照补偿参数进行融合，采用熵编码对融合后的残差参数信息和光照补偿参数进行无损编码。熵编码可采用传统的变长编码和算法编码，实施方式包括但不限于CAVLC（基于上下文自适应的可变长编码）、CABAC（基于上下文的自适应二进制算术熵编码）、C2DVLC（基于上下文自适应的二维变长编码）和CBAC（基于上下文的二元算法编码）。The residual parameter information and the illumination compensation parameters are fused, and entropy coding is applied to losslessly encode the fused result. Entropy coding may use traditional variable-length coding or arithmetic coding; implementations include but are not limited to CAVLC (context-adaptive variable-length coding), CABAC (context-adaptive binary arithmetic coding), C2DVLC (context-based 2D variable-length coding) and CBAC (context-based binary arithmetic coding).
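As a concrete example of a variable-length entropy code, here is the unsigned order-0 Exp-Golomb code used by H.264/AVC for many syntax elements; CAVLC and CABAC themselves are far more involved and are not reproduced here.

```python
def exp_golomb(n):
    """Unsigned order-0 Exp-Golomb code for n >= 0:
    (len(bin(n+1)) - 1) leading zeros followed by the binary form of n + 1."""
    b = bin(n + 1)[2:]               # binary representation of n + 1
    return "0" * (len(b) - 1) + b

for n in range(5):
    print(n, exp_golomb(n))
# 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101
```

Small values get short codewords, which suits residuals and compensation parameters that cluster near zero.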

本发明利用城市监控视频中场景固定、存在大量全局冗余的特点,通过对视频车辆对象和剩余部分分别进行压缩处理,以更大程度消除视频序列中的冗余,获得更好的压缩性能。The present invention utilizes the characteristics of fixed scenes and a large amount of global redundancy in urban surveillance video, and compresses video vehicle objects and remaining parts to eliminate redundancy in video sequences to a greater extent and obtain better compression performance.

本文中所描述的具体实施例仅仅是对本发明精神作举例说明。本发明所属技术领域的技术人员可以对所描述的具体实施例做各种的修改或补充或采用类似的方式替代,但并不会偏离本发明的精神或者超越所附权利要求书所定义的范围。The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the technical field of the present invention can make various modifications or supplements to the described specific embodiments or adopt similar methods to replace them, but they will not deviate from the spirit of the present invention or go beyond the scope defined in the appended claims .

Claims (7)

1. A global coding method for urban traffic monitoring videos is characterized by comprising the following steps:
step 1, dividing an original monitoring video into a vehicle video and a video for removing a vehicle;
step 2, coding the video without the vehicle by adopting an optional differential coding mode;
step 3, extracting a global feature parameter set of a global motion vehicle in the vehicle video, further comprising:
s31, extracting 2D appearance characteristics of the global motion vehicle;
s32, a vehicle 3D model database is built, the vehicle 3D model database comprises general 3D models, fine 3D models and model key description parameter sets of various vehicles, and the model key description parameters are obtained by dimension reduction of the model description parameter sets;
s33, constructing a global vehicle texture dictionary of the global moving vehicle by adopting a sparse coding mode, and further comprising the following steps:
using min_{D_1, a_1} Σ_{c=1}^{C} ( ||y_c - D_1 a_1||_2^2 + τ||a_1||_1 ) as a cost function based on the texture information of the global motion vehicles, obtaining a first-layer knowledge dictionary, namely a common visual texture information knowledge dictionary of all classes of global motion vehicles;
obtaining difference information r_c between each class of global motion vehicles reconstructed by the first-layer knowledge dictionary and the original global motion vehicles; using Σ_{m=1}^{M} ( ||r_c - D_{2,c} a_{2,c}||_2^2 + τ||a_{2,c}||_1 ) as a cost function, acquiring, based on the difference information r_c, a second-layer knowledge dictionary, namely a three-dimensional structure and texture individuality information knowledge dictionary of each class of global motion vehicles;
obtaining difference information r_{c,m} between each individual global motion vehicle of each class reconstructed by the second-layer knowledge dictionary and the original global motion vehicle; using Σ_{i=1}^{N} ( ||r_{c,m} - D_{3,c,m} a_{3,c,m}||_2^2 + τ||a_{3,c,m}||_1 ) as a cost function, obtaining, based on the difference information r_{c,m}, a third-layer knowledge dictionary, namely an individuality long-term update information knowledge dictionary of individual global motion vehicles;
in the above, D_1 represents the first-layer knowledge dictionary; C represents the number of global motion vehicle types, and c represents the type index of a global motion vehicle; y_c represents the texture information of all types of global motion vehicles; a_1 represents the coding coefficients; τ is a balance factor; the larger τ is, the sparser the coding coefficients; D_{2,c} represents the second-layer knowledge dictionary, M represents the number of individual vehicles of a certain type of global motion vehicle, m represents the index of an individual vehicle of that type, and a_{2,c} represents the coding coefficients; D_{3,c,m} represents the third-layer knowledge dictionary, N represents the number of individual vehicles of this type of global motion vehicle, and i represents the index of an individual vehicle of this type; a_{3,c,m} represents the coding coefficients;
s34, carrying out feature matching on the 2D appearance features of the global motion vehicle and the models in the vehicle 3D model database to obtain the texture of the global motion vehicle and the key description parameter information of the models;
s35, extracting global motion vehicle position information according to the 2D appearance characteristics of the global motion vehicle, and combining the attitude information in the corresponding model key description parameter information to form the position and attitude parameters of the global motion vehicle;
s36, lossless compression is carried out on the coding coefficients corresponding to the global feature parameter set and the three-level knowledge dictionary, wherein the global feature parameter set is composed of the texture and model key description parameter information obtained in the step S34 and the position and posture parameters obtained in the step S35;
step 4, carrying out global coding on the vehicle video based on the global characteristic parameters;
step 4 further comprises the sub-steps of:
s41, motion estimation and motion compensation are carried out on the global motion vehicle based on the global vehicle characteristic parameters, and residual error parameter information is obtained;
s42, acquiring illumination compensation parameters of the global motion vehicle;
s43 fuses the residual parameter information and the illumination compensation parameter, and losslessly encodes the residual parameter information and the illumination compensation parameter.
2. The global encoding method for urban traffic monitoring video according to claim 1, characterized in that:
in the step 1, the original monitoring video is divided into the vehicle video and the video for removing the vehicle by adopting the background modeling and vehicle detection technology, and the method specifically comprises the following steps:
s11, converting the original monitoring video image into YUV space, and establishing an automatically updated background model based on a background subtraction method;
s12, detecting the vehicle in the original monitoring video image by using a vehicle detection method to obtain a vehicle video image;
s13, subtracting the vehicle video image from the original monitoring video image to obtain a vehicle-removed video image containing a background hole;
and S14, overlaying and filling the background hole in the video image obtained in S13 by adopting the background model, and obtaining the video image without the vehicle.
3. The global encoding method for urban traffic monitoring video according to claim 1, characterized in that:
step 2 further comprises the sub-steps of:
S21, generating a background image from the vehicle-removed video image, and reconstructing the background image after encoding;
S22, performing global motion estimation on the vehicle-removed video image to obtain a global motion vector;
S23, selectively encoding each video block in either the original coding mode or the differential coding mode, based on the reconstructed background image and the global motion vector.
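The per-block mode decision of sub-step S23 can be illustrated as below. Using the sum of absolute values as a proxy for coding cost is an assumption (a real encoder would use a rate-distortion cost), and the global motion vector is assumed to have already been applied when extracting the aligned background block.

```python
import numpy as np

def choose_block_mode(block, bg_block):
    """Pick original vs. background-differential coding per block (cf. S23).
    bg_block is the co-located patch of the reconstructed background, assumed
    already shifted by the global motion vector."""
    diff = block.astype(np.int16) - bg_block.astype(np.int16)
    cost_orig = np.abs(block.astype(np.int16)).sum()  # proxy cost of original mode
    cost_diff = np.abs(diff).sum()                    # proxy cost of differential mode
    return ("differential", diff) if cost_diff < cost_orig else ("original", block)
```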
4. The global encoding method for urban traffic monitoring video according to claim 1, characterized in that:
the sub-step S32 further includes:
(1) constructing a general vehicle 3D model based on a mesh structure;
(2) obtaining a fine 3D model of the vehicle;
(3) obtaining a 3D model description parameter set from the general vehicle 3D model;
(4) obtaining the model key description parameters by dimension reduction of the model description parameter set.
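The claim does not name the dimension-reduction technique. The sketch below illustrates one plausible choice, PCA via SVD, turning an N x d model-description-parameter matrix into k key description parameters per model; the function name and the choice of PCA are assumptions.

```python
import numpy as np

def reduce_model_parameters(P, k):
    """Reduce an N x d model-description-parameter matrix P to k principal
    components (one hedged reading of the 'dimension reduction' in claim 4)."""
    mean = P.mean(axis=0)
    centered = P - mean
    # SVD of the centered data yields the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                    # k key description directions (k x d)
    key_params = centered @ basis.T   # N x k key description parameters
    return key_params, basis, mean
```

A model is approximately recovered as `key_params @ basis + mean`.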
5. The global encoding method for urban traffic monitoring video according to claim 1, characterized in that:
the sub-step S35 further includes:
(1) determining the position and angle parameters ρ = [x, y, θ]^T of the global motion vehicle, where x and y are the vertical projection coordinates of the center of the global motion vehicle in the world coordinate system, and θ is the angle between the main motion direction of the global motion vehicle and the OX axis;
(2) extracting a motion area in the vehicle video through background modeling;
(3) obtaining a two-dimensional motion vector of the global motion vehicle by using a sparse optical flow method;
(4) obtaining the main motion direction θ and the speed v of the global motion vehicle in the world coordinate system;
(5) iteratively matching the two-dimensional projection of the general 3D model matched to the global motion vehicle against the size and shape of the motion region, to obtain the position parameters (x, y) of the global motion vehicle in the world coordinate system.
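Items (3) and (4) can be illustrated as follows: the sketch averages the sparse-optical-flow vectors into a dominant displacement and reads off its angle θ to the OX axis and its speed v. The averaging strategy is an assumption, and the image-to-world mapping (as well as the iterative 3D-projection matching of item (5)) is omitted.

```python
import numpy as np

def main_motion(vectors, dt=1.0):
    """Main motion direction theta (radians from the OX axis) and speed v,
    from per-feature 2D motion vectors (e.g. from sparse optical flow)."""
    v_mean = np.asarray(vectors, dtype=float).mean(axis=0)  # dominant displacement
    theta = np.arctan2(v_mean[1], v_mean[0])
    speed = np.linalg.norm(v_mean) / dt
    return theta, speed
```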
6. A global coding system for urban traffic monitoring video is characterized by comprising:
(1) the video segmentation module is used for segmenting the original monitoring video into a vehicle video and a vehicle-removed video;
(2) the selective differential coding module is used for coding the vehicle-removed video in a selective differential coding mode;
(3) the global characteristic parameter extraction module is used for extracting a global characteristic parameter set of a global motion vehicle in the vehicle video, and further comprises sub-modules:
the 2D appearance characteristic extraction module is used for extracting 2D appearance characteristics of the global motion vehicle;
the vehicle 3D model database construction module is used for constructing a vehicle 3D model database, the vehicle 3D model database comprises general 3D models, fine 3D models and model key description parameter sets of various vehicles, and the model key description parameters are obtained by reducing the dimensions of the model description parameter sets;
the global vehicle texture dictionary building module is used for building a global vehicle texture dictionary of the global motion vehicle by sparse coding, and further comprises:
taking min ||y_c − D1·a1||_2^2 + τ·||a1||_1 as the cost function, obtaining the first-layer knowledge dictionary, namely the knowledge dictionary of visual texture information common to all classes of global motion vehicles, based on the texture information of the global motion vehicles;
obtaining the difference information r_c between each class of global motion vehicle reconstructed by the first-layer knowledge dictionary and the original global motion vehicle; taking min ||r_c − D2,c·a2,c||_2^2 + τ·||a2,c||_1 as the cost function, acquiring, based on the difference information r_c, the second-layer knowledge dictionary, namely the knowledge dictionary of class-specific three-dimensional structure and texture information of each class of global motion vehicle;
obtaining the difference information r_c,m between each individual vehicle of each class reconstructed by the second-layer knowledge dictionary and the original global motion vehicle; taking min ||r_c,m − D3,c,m·a3,c,m||_2^2 + τ·||a3,c,m||_1 as the cost function, obtaining, based on the difference information r_c,m, the third-layer knowledge dictionary, namely the long-term updated individual information knowledge dictionary of each individual global motion vehicle;
in the above, D1 denotes the first-layer knowledge dictionary; c indexes the classes of global motion vehicles and C is the number of classes; y_c denotes the texture information of the class-c global motion vehicles; a1 denotes the coding coefficients; τ is a balance factor: the larger τ is, the sparser the coding coefficients; D2,c denotes the second-layer knowledge dictionary, m indexes the individual vehicles within a class, M is their number, and a2,c denotes the coding coefficients; D3,c,m denotes the third-layer knowledge dictionary, i indexes the individual vehicles under the given class, N is their number, and a3,c,m denotes the coding coefficients;
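The three-level construction can be illustrated with a minimal sparse-coding sketch. ISTA is used here as one standard solver for min_a ||y − D·a||² + τ||a||₁; the patent does not specify the solver, and dictionary learning itself is omitted (the dictionaries are taken as given).

```python
import numpy as np

def sparse_code(D, y, tau, n_iter=200):
    """ISTA for min_a 0.5*||y - D a||_2^2 + tau*||a||_1 at one dictionary level."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth term
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = a - D.T @ (D @ a - y) / L      # gradient step
        a = np.sign(g) * np.maximum(np.abs(g) - tau / L, 0.0)  # soft threshold
    return a

def cascade(dictionaries, y, tau):
    """Code y through successive knowledge-dictionary levels; each level codes
    the residual left by the previous one, as in the three-level scheme."""
    coeffs, r = [], y
    for D in dictionaries:
        a = sparse_code(D, r, tau)
        coeffs.append(a)
        r = r - D @ a                      # residual passed to the next level
    return coeffs, r
```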
the texture and model key description parameter information acquisition module is used for carrying out feature matching on the 2D appearance features of the global motion vehicle and the models in the vehicle 3D model database to acquire the texture and model key description parameter information of the global motion vehicle;
the position and attitude parameter acquisition module is used for extracting the position information of the global motion vehicle according to its 2D appearance features, and combining it with the attitude information in the corresponding model key description parameter information to form the position and attitude parameters of the global motion vehicle;
the lossless compression module is used for losslessly compressing the global feature parameter set and the coding coefficients corresponding to the three-level knowledge dictionary, wherein the global feature parameter set is composed of the obtained texture and model key description parameter information and the obtained position and attitude parameters;
(4) the global coding module is used for carrying out global coding on the vehicle video based on the global characteristic parameters;
the global encoding module further comprises sub-modules:
the residual parameter information acquisition module is used for carrying out motion estimation and motion compensation on the global motion vehicle based on the global vehicle characteristic parameters to obtain residual parameter information;
the illumination compensation parameter acquisition module is used for acquiring illumination compensation parameters of the global motion vehicle;
the lossless coding module is used for fusing the residual parameter information with the illumination compensation parameters and losslessly encoding the fused result.
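The illumination compensation parameter acquisition module can be illustrated with a least-squares gain/offset fit cur ≈ a·pred + b. The two-parameter linear model is an assumption; the claim only names "illumination compensation parameters" without fixing their form.

```python
import numpy as np

def illumination_params(pred, cur):
    """Least-squares gain a and offset b such that cur ~= a*pred + b,
    one common parameterisation of illumination compensation (assumed here)."""
    x = pred.astype(float).ravel()
    y = cur.astype(float).ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)   # design matrix [pred, 1]
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b
```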
7. The global urban traffic monitoring video coding system according to claim 6, wherein:
the position and posture parameter obtaining module further comprises:
a position and angle parameter determination module, used for determining the position and angle parameters ρ = [x, y, θ]^T of the global motion vehicle, where x and y are the vertical projection coordinates of the center of the global motion vehicle in the world coordinate system, and θ is the angle between the main motion direction of the global motion vehicle and the OX axis;
the motion region extraction module is used for extracting a motion region in the vehicle video through background modeling;
the two-dimensional motion vector obtaining module is used for obtaining a two-dimensional motion vector of the global motion vehicle by utilizing a sparse optical flow method;
the main motion direction and speed acquisition module is used for obtaining the main motion direction θ and the speed v of the global motion vehicle in the world coordinate system;
the position parameter acquisition module is used for iteratively matching the two-dimensional projection of the general 3D model matched to the global motion vehicle against the size and shape of the motion region, to obtain the position parameters (x, y) of the global motion vehicle in the world coordinate system.
CN201410616965.3A 2014-10-31 2014-10-31 The overall situation coding method of urban transportation monitor video and system Active CN104301735B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410616965.3A CN104301735B (en) 2014-10-31 2014-10-31 The overall situation coding method of urban transportation monitor video and system


Publications (2)

Publication Number Publication Date
CN104301735A CN104301735A (en) 2015-01-21
CN104301735B true CN104301735B (en) 2017-09-29

Family

ID=52321268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410616965.3A Active CN104301735B (en) 2014-10-31 2014-10-31 The overall situation coding method of urban transportation monitor video and system

Country Status (1)

Country Link
CN (1) CN104301735B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106210612A (en) * 2015-04-30 2016-12-07 杭州海康威视数字技术股份有限公司 Method for video coding, coding/decoding method and device thereof
CN105427583B (en) * 2015-11-27 2017-11-07 浙江工业大学 A kind of highway traffic data compression method encoded based on LZW
CN107404653B (en) * 2017-05-23 2019-10-18 南京邮电大学 A fast detection method for parking events in HEVC code stream
CN119251324A (en) * 2018-06-08 2025-01-03 松下电器(美国)知识产权公司 Three-dimensional data encoding method, three-dimensional data decoding method, three-dimensional data encoding device, and three-dimensional data decoding device
CN108898842A (en) * 2018-07-02 2018-11-27 武汉大学深圳研究院 A kind of high efficiency encoding method and its system of multi-source monitor video
CN108833928B (en) * 2018-07-03 2020-06-26 中国科学技术大学 Traffic monitoring video coding method
CN109447037B (en) * 2018-11-26 2021-04-16 武汉大学 A construction method of vehicle object multi-level knowledge dictionary for surveillance video compression
CN110113616B (en) * 2019-06-05 2021-06-01 杭州电子科技大学 A device and method for high-efficiency compression encoding and decoding of multi-level surveillance video

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101251927A (en) * 2008-04-01 2008-08-27 东南大学 Vehicle Detection and Tracking Method Based on Video Technology
CN102930242A (en) * 2012-09-12 2013-02-13 上海交通大学 Bus type identifying method
CN103236160A (en) * 2013-04-07 2013-08-07 水木路拓科技(北京)有限公司 Road network traffic condition monitoring system based on video image processing technology

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9626769B2 (en) * 2009-09-04 2017-04-18 Stmicroelectronics International N.V. Digital video encoder system, method, and non-transitory computer-readable medium for tracking object regions
US8457355B2 (en) * 2011-05-05 2013-06-04 International Business Machines Corporation Incorporating video meta-data in 3D models


Also Published As

Publication number Publication date
CN104301735A (en) 2015-01-21

Similar Documents

Publication Publication Date Title
CN104301735B (en) The overall situation coding method of urban transportation monitor video and system
Huang et al. Octsqueeze: Octree-structured entropy model for lidar compression
Biswas et al. Muscle: Multi sweep compression of lidar using deep entropy models
CN103069802B (en) The device of the method for the current block of reconstructed image and corresponding coding method, correspondence
CN116982311B (en) Multi-scale optical flow for learned video compression
CN104954780B (en) A DIBR virtual image restoration method suitable for high-definition 2D/3D conversion
CN114363623A (en) Image processing method, image processing apparatus, image processing medium, and electronic device
Liu et al. Three-dimensional point-cloud plus patches: Towards model-based image coding in the cloud
Xiao et al. Knowledge-based coding of objects for multisource surveillance video data
US20240144595A1 (en) 3d scene reconstruction with additional scene attributes
CN108898842A (en) A kind of high efficiency encoding method and its system of multi-source monitor video
CN103402087A (en) Video encoding and decoding method based on gradable bit streams
Wang et al. R-pcc: A baseline for range image-based point cloud compression
CN114627269B (en) A virtual reality security monitoring platform based on deep learning target detection
CN117242480A (en) Manhattan layout estimation using geometric and semantic information
Cui et al. A Du-Octree based cross-attention model for LiDAR geometry compression
Ma et al. Surveillance video coding with vehicle library
Zhao et al. Intelligent analysis oriented surveillance video coding
CN114550023A (en) Traffic target static information extraction device
CN112967326A (en) Monocular depth estimation method based on graph convolution neural network
Liu et al. H-pcc: Point cloud compression with hybrid mode selection and content adaptive down-sampling
Wang et al. Surveillance video coding for traffic scene based on vehicle knowledge and shared library by cloud-edge computing in cyber-physical-social systems
Liu et al. Fast intra coding algorithm for depth map with end-to-end edge detection network
CN109951705B (en) Reference frame synthesis method and device for vehicle object coding in surveillance video
CN117391961A (en) Image quality enhancement model training method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant