CN114331848A - Video image splicing method, device and equipment - Google Patents
- Publication number
- CN114331848A (application CN202111677340.4A)
- Authority
- CN
- China
- Prior art keywords
- video image
- roi
- training
- image frame
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Description
Technical Field
The present application relates to the technical field of autonomous driving, and in particular to a video image stitching method, apparatus and device.
Background
With the continuous development of autonomous driving technology, vehicles are equipped with more and more sensors of different functions, for example cameras and/or various types of radar deployed on the front, rear, left and right sides of the vehicle. The radar is mainly used to measure the distance between the vehicle and obstacles, while the cameras are used to capture images of the scene around the vehicle. In remote driving, the vehicle can send the video images collected by the sensors to a remote cockpit, and the remote cockpit can stitch the video images together.
In the related art, a traditional purely visual image stitching algorithm is used when stitching images, and the low real-time performance of such video image stitching methods is addressed by updating the stitching model asynchronously. However, the related-art video image stitching methods cannot handle differences in depth of field caused by parallax, so the stitching effect is poor.
Summary of the Invention
To solve or partially solve the problems existing in the related art, the present application provides a video image stitching method, apparatus and device, which can improve the stitching effect of video images.
A first aspect of the present application provides a video image stitching method, including:
obtaining the ratio of a region of interest (ROI) to an original video image frame according to vehicle-mounted radar ranging data and a first preset model;
obtaining display position information of the ROI on the original video image frame according to the ratio and a second preset model;
cropping a target ROI from the original video image frame according to the ratio and the display position information; and
stitching the cropped target ROIs into a target video image.
In one embodiment, obtaining the ratio of the ROI to the original video image frame according to the vehicle-mounted radar ranging data and the first preset model includes: inputting the vehicle-mounted radar ranging data into a pre-trained shallow neural network model, which outputs the ratio of the ROI to the original video image frame;
and obtaining the display position information of the ROI on the original video image frame according to the ratio and the second preset model includes: inputting the ratio into a fitting model, which outputs the display position information of the ROI on the original video image frame.
In one embodiment, the shallow neural network model is trained as follows:
the shallow neural network model is trained on a training set to obtain the pre-trained shallow neural network model, where the training set includes annotated ratios and training radar ranging data, and an annotated ratio is the ratio of an annotated training ROI to a training video image frame.
In one embodiment, the training set is obtained as follows:
selecting a set amount of data from the collected radar ranging data as the training radar ranging data;
according to the principle of picture continuity, annotating in each training video image frame an ROI that is visually continuous with the previous training video image frame, obtaining the annotated training ROI;
comparing the annotated training ROI with the training video image frame to obtain the annotated ratio; and
saving the training radar ranging data together with the annotated ratios as the training set.
In one embodiment, the fitting model is obtained as follows:
fitting the annotated display position of the annotated training ROI on the training video image frame against the annotated ratio to obtain the fitting model.
In one embodiment, fitting the annotated display position of the annotated training ROI on the training video image frame against the annotated ratio to obtain the fitting model includes:
inputting each annotated ratio as a known quantity into a target fitting equation in which the target display position is the unknown, and iteratively solving for the target display position, where the target fitting equation includes polynomial fitting coefficients; and
when the deviation between the target display position and the annotated display position of the annotated training ROI on the training video image frame is smaller than a preset threshold, taking the corresponding polynomial fitting coefficient values as the target fitting coefficient values, and using the target fitting equation determined by the target fitting coefficient values as the fitting model.
A second aspect of the present application provides a video image stitching apparatus, including:
a first output module, configured to obtain the ratio of a region of interest (ROI) to an original video image frame according to vehicle-mounted radar ranging data and a first preset model;
a second output module, configured to obtain display position information of the ROI on the original video image frame according to the ratio and a second preset model;
a target area module, configured to crop a target ROI from the original video image frame according to the ratio obtained by the first output module and the display position information obtained by the second output module; and
a stitching module, configured to stitch the target ROIs cropped by the target area module into a target video image.
In one embodiment, the first output module inputs the vehicle-mounted radar ranging data into a pre-trained shallow neural network model, which outputs the ratio of the ROI to the original video image frame;
and the second output module inputs the ratio into a fitting model, which outputs the display position information of the ROI on the original video image frame.
In one embodiment, the apparatus further includes:
a model training module, configured to train the shallow neural network model on a training set to obtain the pre-trained shallow neural network model, where the training set includes annotated ratios and training radar ranging data, and an annotated ratio is the ratio of an annotated training ROI to a training video image frame.
A third aspect of the present application provides an electronic device, including:
a processor; and
a memory storing executable code which, when executed by the processor, causes the processor to perform the method described above.
A fourth aspect of the present application provides a computer-readable storage medium storing executable code which, when executed by a processor of an electronic device, causes the processor to perform the method described above.
The technical solution provided by the present application can have the following beneficial effects:
The present application uses vehicle-mounted radar ranging data as the input. The data volume of the radar ranging data is small, so the computation required by the preset models is also small. In addition, the radar ranging data reflects differences in the depth of field of the video images. Therefore, the target ROI obtained from the ratio of the ROI to the original video image frame and from the display position information of the ROI on the original video image frame makes the subsequently stitched pictures more continuous, so the target video image stitched from the target ROIs displays better, improving the stitching effect of the video images.
Further, the present application inputs the vehicle-mounted radar ranging data into a pre-trained shallow neural network model, which outputs the ratio of the ROI to the original video image frame, and then inputs that ratio into a fitting model, which outputs the display position information of the ROI on the original video image frame. Because a shallow neural network has few layers, its computation is much simpler than that of a deep neural network, so it runs faster; and the fitting model is a conventional mathematical model, so outputting the display position information of the ROI on the original video image frame is also fast. Therefore, the technical solution of the present application can meet the high real-time requirements of video stitching.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit the present application.
Brief Description of the Drawings
The above and other objects, features and advantages of the present application will become more apparent from the following more detailed description of exemplary embodiments of the present application in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same parts.
Fig. 1 is a schematic flowchart of a video image stitching method according to an embodiment of the present application;
Fig. 2 is another schematic flowchart of the video image stitching method according to an embodiment of the present application;
Fig. 3 is another schematic flowchart of the video image stitching method according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of training the models in the video image stitching method according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of applying the models in the video image stitching method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a video image stitching apparatus according to an embodiment of the present application;
Fig. 7 is another schematic structural diagram of the video image stitching apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. Although embodiments of the present application are shown in the drawings, it should be understood that the present application may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "the" and "said" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms "first", "second", "third", etc. may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present application, first information may also be called second information, and similarly, second information may also be called first information. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present application, "plurality" means two or more, unless otherwise expressly and specifically defined.
The video image stitching methods of the related art cannot handle differences in depth of field caused by parallax, and their stitching effect is poor. To address this problem, embodiments of the present application provide a video image stitching method that can improve the stitching effect of video images.
The technical solutions of the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flowchart of a video image stitching method according to an embodiment of the present application.
Referring to Fig. 1, the method includes:
S101: obtain the ratio of a region of interest (ROI) to an original video image frame according to vehicle-mounted radar ranging data and a first preset model.
In this step, the vehicle-mounted radar ranging data can be input into a pre-trained shallow neural network model, which outputs the ratio of the ROI (region of interest) to the original video image frame.
The shallow neural network model is trained as follows: the shallow neural network model is trained on a training set to obtain the pre-trained shallow neural network model, where the training set includes annotated ratios and training radar ranging data, and an annotated ratio is the ratio of an annotated training ROI to a training video image frame.
The training set can be obtained as follows: select a set amount of data from the collected radar ranging data as the training radar ranging data; according to the principle of picture continuity, annotate in each training video image frame an ROI that is visually continuous with the previous training video image frame, obtaining the annotated training ROI; compare the annotated training ROI with the training video image frame to obtain the annotated ratio; and save the training radar ranging data together with the annotated ratios as the training set.
S102: obtain display position information of the ROI on the original video image frame according to the ratio and a second preset model.
In this step, the ratio can be input into a fitting model, which outputs the display position information of the ROI on the original video image frame.
The fitting model is obtained as follows: the annotated display position of the annotated training ROI on the training video image frame is fitted against the annotated ratio to obtain the fitting model.
For example, each annotated ratio can be input as a known quantity into a target fitting equation in which the target display position is the unknown, and the target display position is solved iteratively, where the target fitting equation includes polynomial fitting coefficients. When the deviation between the target display position and the annotated display position of the annotated training ROI on the training video image frame is smaller than a preset threshold, the corresponding polynomial fitting coefficient values are taken as the target fitting coefficient values, and the target fitting equation determined by the target fitting coefficient values is used as the fitting model.
S103: crop a target ROI from the original video image frame according to the ratio and the display position information.
Since the picture size of the original video image frame is known or fixed, once the ratio of the ROI to the original video image frame and the display position information of the ROI on the original video image frame are determined, the target ROI, i.e., the final ROI, can easily be cropped from the original video image frame.
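As an illustration, the cropping step above can be sketched as follows. The conventions used here (the ratio as a (width, height) fraction of the frame, the position as the ROI's top-left pixel) are assumptions for the example and are not prescribed by the application:

```python
import numpy as np

def crop_target_roi(frame, ratio, position):
    """Crop the target ROI from a frame.

    frame:    H x W x C image array
    ratio:    (w_ratio, h_ratio), ROI size as a fraction of the frame size
    position: (x, y), ROI top-left corner in pixels (illustrative convention)
    """
    h, w = frame.shape[:2]
    roi_w = int(round(w * ratio[0]))
    roi_h = int(round(h * ratio[1]))
    x, y = position
    return frame[y:y + roi_h, x:x + roi_w]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # a 1280x720 frame
roi = crop_target_roi(frame, (0.5, 0.5), (320, 180))
print(roi.shape)  # (360, 640, 3)
```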
S104: stitch the cropped target ROIs into a target video image.
After the target ROI, i.e., the final ROI, is obtained, using the target ROIs for subsequent video image stitching makes the pictures of the stitched video images more continuous and the stitching effect better.
It can be seen from this embodiment that the present application uses vehicle-mounted radar ranging data as the input. The data volume of the radar ranging data is small, so the computation required by the preset models is also small. In addition, the radar ranging data reflects differences in the depth of field of the video images. Therefore, the target ROI obtained from the ratio of the ROI to the original video image frame and from the display position information of the ROI on the original video image frame makes the subsequently stitched pictures more continuous, so the target video image stitched from the target ROIs displays better, improving the stitching effect of the video images.
Fig. 2 is another schematic flowchart of the video image stitching method according to an embodiment of the present application.
Referring to Fig. 2, the method includes:
S201: input the vehicle-mounted radar ranging data into a pre-trained shallow neural network model, which outputs the ratio of a region of interest (ROI) to an original video image frame.
The original video image frame is a video image frame of the vehicle's driving environment captured by a vehicle-mounted camera.
In the embodiments of the present application, the vehicle-mounted radar may be an ultrasonic radar or a millimeter-wave radar, etc. The radars may be deployed around the vehicle, and their number may be determined by actual requirements; for example, 12 ultrasonic radars may be deployed. The vehicle-mounted radar ranging data mainly contains the distance between the vehicle and targets or obstacles, measured by the radar in real time. This distance information corresponds to the image depth of field of the driving-environment video image frame captured by the vehicle-mounted camera at the same moment the radar ranging data is collected. Therefore, the vehicle-mounted radar ranging data also reflects differences in the depth of field of the video images.
A region of interest (ROI) is the region of an image that receives the most attention when the image is analyzed and processed; this region may be the position of a target or obstacle in the image. In machine vision and image processing, the region to be processed is outlined in the image as a box, circle, ellipse, irregular polygon, etc., and is called the region of interest. A typical ROI shape is a rectangle, though other shapes are possible. When the ROI is a rectangle, the ratio of the ROI to the original video image frame includes the ratio of the length of the ROI to the length of the original video image frame and the ratio of the width of the ROI to the width of the original video image frame.
The shallow neural network model may be a basic neural network model containing only an input layer, one hidden layer and an output layer, with each layer using the sigmoid (S-shaped) function as the activation function. The sigmoid function is used for the output of hidden-layer neurons; its range is (0, 1), so it maps any real number into the interval (0, 1) and can be used for binary classification. It works well when the features are complex or the differences between them are not particularly large. Because a shallow neural network has few layers, its computation is much simpler than that of a deep neural network, so it runs faster and can better meet high real-time requirements.
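The network shape described above (input layer, one hidden layer, output layer, sigmoid activations throughout) can be sketched as follows. The layer sizes are illustrative assumptions (12 ultrasonic distance readings in, two ratio values out); the application does not fix them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ShallowNet:
    """Input -> one hidden layer -> output, with sigmoid at every layer.

    Sizes are assumptions: 12 radar distances in, 2 ratios
    (width ratio, height ratio) out.
    """
    def __init__(self, n_in=12, n_hidden=16, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = sigmoid(x @ self.W1 + self.b1)
        # sigmoid at the output keeps the predicted ratios in (0, 1)
        return sigmoid(h @ self.W2 + self.b2)

net = ShallowNet()
distances = np.random.default_rng(1).uniform(0.0, 1.0, 12)  # normalized readings
ratios = net.forward(distances)
print(ratios.shape)  # (2,)
```

Because the output activation is a sigmoid, the predicted ratios are automatically bounded in (0, 1), matching their interpretation as fractions of the frame size.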
The present application can train the shallow neural network model on a training set to obtain the pre-trained shallow neural network model. The training set includes annotated ratios and training radar ranging data, where an annotated ratio is the ratio of an annotated training ROI to a training video image frame, and an annotated display position is the display position of the annotated training ROI on the training video image frame. The training set can be obtained as follows: select a set amount of data from the collected radar ranging data as the training radar ranging data; according to the principle of picture continuity, annotate in each training video image frame an ROI that is visually continuous with the previous training video image frame, obtaining the annotated training ROI; compare the annotated training ROI with the training video image frame to obtain the annotated ratio; and save the training radar ranging data together with the annotated ratios as the training set.
S202: input the ratio of the ROI to the original video image frame into a fitting model, which outputs the display position information of the ROI on the original video image frame.
In the embodiments of the present application, the role of the fitting model is that, given the ratio of the ROI to the original video image frame as input, it outputs the display position information of the ROI on the original video image frame.
Similar to the shallow neural network model, which is pre-trained before application, the fitting model here can also be a mathematical model obtained through a fitting process. The present application fits the annotated display position of the annotated training ROI on the training video image frame against the annotated ratio to obtain the fitting model.
For example, each annotated ratio is input as a known quantity into a target fitting equation in which the target display position is the unknown, and the target display position is solved iteratively, where the target fitting equation includes polynomial fitting coefficients. When the deviation between the target display position and the annotated display position of the annotated training ROI on the training video image frame is smaller than a preset threshold, the corresponding polynomial fitting coefficient values are taken as the target fitting coefficient values, and the target fitting equation determined by the target fitting coefficient values is used as the fitting model.
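One way to realize the iterative polynomial fitting described above is sketched below for a single position coordinate. The sample (ratio, position) pairs, the pixel threshold, and the use of `numpy.polyfit` with an increasing polynomial degree are all illustrative assumptions, not the application's prescribed procedure:

```python
import numpy as np

# Illustrative annotated data: (annotated ratio, annotated x-position in pixels).
ratios = np.array([0.30, 0.40, 0.50, 0.60, 0.70])
positions = np.array([410.0, 380.0, 345.0, 330.0, 300.0])

threshold = 5.0  # maximum acceptable deviation in pixels (assumed value)

# Raise the polynomial degree until every fitted position deviates from
# its annotated position by less than the threshold.
for degree in range(1, 5):
    coeffs = np.polyfit(ratios, positions, degree)
    predicted = np.polyval(coeffs, ratios)
    if np.max(np.abs(predicted - positions)) < threshold:
        break

fitting_model = np.poly1d(coeffs)  # the target fitting equation, coefficients fixed
print(fitting_model(0.55))         # predicted x-position for a new ratio
```

With five sample points the loop is guaranteed to terminate, since a degree-4 polynomial interpolates them exactly.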
S203: crop a target ROI from the original video image frame according to the ratio of the ROI to the original video image frame and the display position information of the ROI on the original video image frame.
Since the picture size of the original video image frame is known or fixed, once the ratio of the ROI to the original video image frame and the display position information of the ROI on the original video image frame are determined, the target ROI, i.e., the final ROI, can easily be cropped from the original video image frame.
S204: stitch the cropped target ROIs into a target video image frame.
The target ROIs obtained by the above process may come from original video image frames of the same camera or of different cameras (for example, the front camera and the left or right camera). When stitching these target ROIs, the original video image frame collected by one of the cameras, for example the camera directly in front of the vehicle, or the target ROI of the video image frame with better quality, can be used as the reference.
Stitching the cropped target ROIs into a target video image frame can be implemented with existing stitching methods, for example stitching according to reference factors such as the pose or the shooting time corresponding to the video image frames, which is not limited in the present application.
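Since the application leaves the splicing step itself to existing methods, the following is only a minimal placeholder that concatenates equal-height target ROIs side by side; a real pipeline would additionally align the ROIs by camera pose or timestamp and blend any overlaps:

```python
import numpy as np

def stitch_rois(rois):
    """Naive side-by-side splicing of target ROIs.

    Illustrative stand-in for a full stitching method: crops all ROIs to the
    smallest common height and concatenates them horizontally.
    """
    height = min(r.shape[0] for r in rois)
    return np.hstack([r[:height] for r in rois])

# Hypothetical target ROIs cropped from the left, front and right cameras.
left = np.zeros((360, 320, 3), dtype=np.uint8)
front = np.zeros((360, 640, 3), dtype=np.uint8)
right = np.zeros((360, 320, 3), dtype=np.uint8)
print(stitch_rois([left, front, right]).shape)  # (360, 1280, 3)
```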
It can be seen from this embodiment that the present application inputs the vehicle-mounted radar ranging data into a pre-trained shallow neural network model, which outputs the ratio of the ROI to the original video image frame, and then inputs that ratio into a fitting model, which outputs the display position information of the ROI on the original video image frame. Because a shallow neural network has few layers, its computation is much simpler than that of a deep neural network, so it runs faster; and the fitting model is a conventional mathematical model, so outputting the display position information of the ROI on the original video image frame is also fast. Therefore, the technical solution of the present application can meet the high real-time requirements of video stitching.
图3是本申请实施例示出的视频图像拼接方法的另一流程示意图。图3相对于图1和图2更详细描述了本申请方案。FIG. 3 is another schematic flowchart of the video image stitching method shown in an embodiment of the present application. Compared with FIG. 1 and FIG. 2, FIG. 3 describes the solution of the present application in more detail.
参见图3,该方法包括:Referring to Figure 3, the method includes:
S301、采集车载雷达测距数据和摄像头的视频图像帧并进行预处理。S301 , collecting vehicle-mounted radar ranging data and video image frames of a camera and performing preprocessing.
本申请中,可以当行驶车辆周边有物体进入超声波雷达有效范围内时,每隔设定时间例如5s记录当前所有摄像头的视频图像帧数据和车载雷达测距数据,并保存。In this application, when an object around the driving vehicle enters the effective range of the ultrasonic radar, the video image frame data and the vehicle radar ranging data of all the current cameras can be recorded every set time, such as 5s, and saved.
需要说明的是，为了方便训练浅层神经网络模型处理并进一步降低计算量，在本申请实施例中，可以对车载雷达测距数据进行预处理。该预处理例如包括进行数据清洗(例如，清除脏点数据即明显不符合要求的测距数据)和数据的归一化(normalization)等；归一化后的车载雷达测距数据对研究人员而言更加直观、方便。It should be noted that, in order to facilitate the training of the shallow neural network model and further reduce the amount of computation, in this embodiment of the present application the vehicle-mounted radar ranging data may be preprocessed. The preprocessing includes, for example, data cleaning (for example, removing dirty-point data, that is, ranging data that obviously does not meet the requirements) and data normalization; the normalized vehicle-mounted radar ranging data is also more intuitive and convenient for researchers.
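上述清洗与归一化步骤可以用如下Python代码示意，其中超声波雷达的有效量程(0.2米至5米)为假设值，并非本申请给出的参数。As an illustrative sketch (not part of the patented method), the cleaning and normalization steps above can be written in Python as follows; the assumed valid range of the ultrasonic radar (0.2 m to 5 m) is an example value, not a parameter given in this application.

```python
import numpy as np

def preprocess_ranging(distances, min_valid=0.2, max_valid=5.0):
    """Clean and min-max normalize ultrasonic ranging data.

    min_valid/max_valid are assumed sensor limits (meters), not values
    taken from the patent text.
    """
    d = np.asarray(distances, dtype=float)
    # Data cleaning: discard "dirty" readings outside the plausible range.
    d = d[(d >= min_valid) & (d <= max_valid)]
    # Min-max normalization to [0, 1] for the shallow-network input.
    return (d - min_valid) / (max_valid - min_valid)

norm = preprocess_ranging([0.5, 1.2, 9.9, 2.6])  # the 9.9 m reading is dropped
```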
S302、对浅层神经网络模型预先进行训练，得到训练后的浅层神经网络模型，对拟合模型进行拟合处理，得到处理后的拟合模型。S302, pre-train the shallow neural network model to obtain the trained shallow neural network model, and perform fitting processing on the fitting model to obtain the processed fitting model.
该步骤的处理过程可以参见图4所示,图4是本申请实施例示出的视频图像拼接方法中对模型进行训练的流程示意图。The processing process of this step can be referred to as shown in FIG. 4 , which is a schematic flowchart of training a model in the video image stitching method shown in the embodiment of the present application.
本申请可以采用训练集对浅层神经网络模型进行训练，得到预先训练的浅层神经网络模型。训练集可以按以下方式获得：从采集的雷达测距数据中选取设定数量数据作为训练用雷达测距数据；根据画面连续原则，从训练用视频图像帧中标注出与上一帧训练用视频图像帧画面连续的ROI，得到标注训练用ROI；根据标注训练用ROI与训练用视频图像帧进行对比，得到标注画面比例；将训练用雷达测距数据与标注画面比例保存作为训练集。The present application may use a training set to train the shallow neural network model to obtain the pre-trained shallow neural network model. The training set may be obtained as follows: selecting a set amount of data from the collected radar ranging data as training radar ranging data; according to the picture-continuity principle, annotating, in a training video image frame, the ROI whose picture is continuous with the previous training video image frame, to obtain an annotated training ROI; comparing the annotated training ROI with the training video image frame to obtain an annotated picture ratio; and saving the training radar ranging data together with the annotated picture ratio as the training set.
具体的,以下通过步骤S1至步骤S4说明获取训练集的过程。Specifically, the following describes the process of acquiring the training set through steps S1 to S4.
步骤S1:从采集的雷达测距数据中选取设定数量数据作为训练用雷达测距数据。Step S1: Select a set amount of data from the collected radar ranging data as training radar ranging data.
从采集的雷达测距数据中，可以选取第一设定数量数据作为训练用雷达测距数据，选择第二设定数量数据作为测试用雷达测距数据，选择第三设定数量数据作为验证用雷达测距数据。例如，从采集的雷达测距数据中，随机选择的50%作为训练用雷达测距数据，其他的用作对训练之后的浅层神经网络模型进行测试或验证。其中，选取的训练用雷达测距数据例如可以是车辆前向6个超声波雷达的雷达测距数据。From the collected radar ranging data, a first set amount of data may be selected as training radar ranging data, a second set amount as test radar ranging data, and a third set amount as validation radar ranging data. For example, 50% of the collected radar ranging data is randomly selected as training data, and the rest is used to test or validate the shallow neural network model after training. The selected training radar ranging data may be, for example, the ranging data of the six forward-facing ultrasonic radars of the vehicle.
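上述数据划分可以示意如下，其中仅50%的训练集比例来自上文，测试集与验证集各占25%为举例假设。The data split above can be sketched as follows; only the 50% training share comes from the text, while the 25%/25% test and validation shares are assumed for illustration.

```python
import random

def split_ranging_data(samples, train_frac=0.5, test_frac=0.25, seed=42):
    """Randomly split collected ranging data into train/test/validation sets.

    Only the 50% training share is stated above; the test/validation
    proportions and the seed are assumed example values.
    """
    data = list(samples)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train_frac)
    n_test = int(len(data) * test_frac)
    return (data[:n_train],
            data[n_train:n_train + n_test],
            data[n_train + n_test:])

train, test, val = split_ranging_data(range(100))
```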
雷达测距数据实时测得目标或者障碍物与车辆之间的距离信息,该距离信息可以对应于车载摄像头在与该车载雷达采集测距数据相同时刻拍摄的车辆行驶环境视频图像帧的图像景深。The radar ranging data measures the distance information between the target or obstacle and the vehicle in real time, and the distance information can correspond to the image depth of the vehicle driving environment video image frame captured by the vehicle-mounted camera at the same time as the vehicle-mounted radar collects the ranging data.
步骤S2:根据画面连续原则,从训练用视频图像帧中标注出与上一帧训练用视频图像帧画面连续的ROI,得到标注训练用ROI。Step S2: According to the principle of picture continuity, mark the ROI that is continuous with the previous frame of the training video image frame from the training video image frame, so as to obtain the labeling training ROI.
具体地，可以在训练用视频图像帧中选择区域进行拉伸或收缩，当发现与上一帧训练用视频图像帧画面连续时停止操作，则此时得到的区域为标注训练用ROI，此时标注训练用ROI的画面长度为标注长度，画面宽度为标注宽度。例如，若判断两者的像素没有明显跳变，则可以确定画面连续。Specifically, an area may be selected in the training video image frame and stretched or shrunk; the operation stops when the area is found to be continuous with the previous training video image frame, and the area obtained at that point is the annotated training ROI, whose picture length is the annotation length and whose picture width is the annotation width. For example, if there is no obvious jump between the pixels of the two, the pictures can be determined to be continuous.
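上述"画面连续"的判断可以用如下示意代码表达，其中以平均像素差作为"无明显跳变"的度量、阈值取10个灰度级均为举例假设。The picture-continuity judgment above can be expressed with the following sketch; using the mean absolute pixel difference as the measure of "no obvious jump", with an assumed threshold of 10 gray levels, is an illustrative choice only.

```python
import numpy as np

def pictures_continuous(candidate_roi, prev_frame_roi, threshold=10.0):
    """Heuristic continuity test: mean absolute pixel difference between the
    candidate ROI and the previous frame's ROI stays below a threshold.

    Both the metric and the 10-gray-level threshold are assumed examples;
    the patent only states the continuity principle itself. The two ROIs
    are assumed to have been resized to the same shape beforehand.
    """
    a = np.asarray(candidate_roi, dtype=float)
    b = np.asarray(prev_frame_roi, dtype=float)
    return float(np.abs(a - b).mean()) < threshold
```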
步骤S3:根据标注训练用ROI与训练用视频图像帧进行对比,得到标注画面比例。Step S3: According to the comparison between the ROI for labeling training and the video image frame for training, the scale of the labeling screen is obtained.
该步骤将标注训练用ROI与训练用视频图像帧进行对比,就可以得到标注画面比例。在标注训练用ROI为矩形时,标注画面比例包括ROI的标注长度与原始视频图像帧的画面长度之比以及ROI的标注宽度与原始视频图像帧的画面宽度之比。In this step, the labeled training ROI is compared with the training video image frame to obtain the labeled screen ratio. When the labeled training ROI is a rectangle, the labeled screen ratio includes the ratio of the labeled length of the ROI to the picture length of the original video image frame and the ratio of the labeled width of the ROI to the picture width of the original video image frame.
步骤S4:将训练用雷达测距数据与标注画面比例保存作为训练集。Step S4: Save the radar ranging data for training and the ratio of the labeled picture as a training set.
将训练用雷达测距数据与标注画面比例保存作为训练集，后续将训练用雷达测距数据以及与其对应的标注画面比例输入训练浅层神经网络模型，完成对浅层神经网络模型的训练。The training radar ranging data and the annotated picture ratios are saved as the training set; subsequently, the training radar ranging data and the corresponding annotated picture ratios are input into the shallow neural network model for training, completing the training of the shallow neural network model.
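步骤S1至S4构建训练集的过程可以示意如下，其中雷达读数与ROI尺寸均为假设的示例数据。The process of building the training set in steps S1 to S4 can be sketched as follows; the radar readings and ROI sizes are assumed example data.

```python
def annotation_ratio(roi_len, roi_wid, frame_len, frame_wid):
    """Annotated picture ratio: ROI annotation length/width over the
    original frame's picture length/width (step S3)."""
    return roi_len / frame_len, roi_wid / frame_wid

def build_training_set(ranging_samples, roi_sizes, frame_size):
    """Pair each training ranging sample with its annotated picture ratio
    (steps S1-S4 in miniature)."""
    frame_len, frame_wid = frame_size
    return [
        (ranging, annotation_ratio(rl, rw, frame_len, frame_wid))
        for ranging, (rl, rw) in zip(ranging_samples, roi_sizes)
    ]

training_set = build_training_set(
    ranging_samples=[[0.8, 1.1, 0.9, 1.4, 2.0, 1.7]],  # 6 forward radars, assumed readings
    roi_sizes=[(960, 540)],                            # annotated ROI, assumed pixels
    frame_size=(1920, 1080),
)
```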
其中,对拟合模型进行拟合处理,得到处理后的拟合模型的实现方式可以包括:The implementation manner of performing fitting processing on the fitting model to obtain the processed fitting model may include:
将各个标注画面比例作为已知量输入以目标显示位置为未知量的目标拟合方程,迭代求解目标显示位置,其中目标拟合方程包括多项式拟合系数;Input the scale of each annotated screen as a known quantity into a target fitting equation with the target display position as an unknown quantity, and iteratively solve the target display position, wherein the target fitting equation includes a polynomial fitting coefficient;
当目标显示位置与标注训练用ROI在训练用视频图像帧上的标注显示位置之间的偏差小于预设阈值时，确定对应的多项式拟合系数取值为目标拟合系数取值，以所述目标拟合系数取值确定的目标拟合方程作为拟合模型。预设阈值例如可以为0.05但不局限于此。When the deviation between the target display position and the annotated display position of the annotated training ROI on the training video image frame is smaller than a preset threshold, the corresponding polynomial fitting coefficient values are determined as the target fitting coefficient values, and the target fitting equation determined by the target fitting coefficient values is used as the fitting model. The preset threshold may be, for example, 0.05, but is not limited thereto.
上述实施例中，标注显示位置为拟合过程中的真实值，目标显示位置为拟合过程中的预测值，偏差可以是目标显示位置与标注显示位置之间的均方误差(Mean Square Error, MSE)或者均方根误差(Root Mean Square Error, RMSE)，目标拟合方程可以是线性方程，亦可以是非线性方程。此外需要说明的是，当目标显示位置与标注显示位置之间的偏差小于该预设阈值时，即可认为此时目标显示位置与标注显示位置之间的偏差最小。In the above embodiment, the annotated display position is the true value in the fitting process and the target display position is the predicted value; the deviation may be the mean square error (MSE) or the root mean square error (RMSE) between the target display position and the annotated display position, and the target fitting equation may be a linear equation or a nonlinear equation. In addition, it should be noted that when the deviation between the target display position and the annotated display position is smaller than the preset threshold, the deviation between the two can be considered minimal.
例如,假设标注画面比例也即比例数据(k)和标注显示位置也即位置数据(x,y)线性相关,目标拟合方程可以举例如下但不局限于此:For example, assuming that the scale of the annotation screen, that is, the scale data (k), and the display position of the annotation, that is, the position data (x, y), are linearly correlated, the target fitting equation can be exemplified as follows, but is not limited to this:
x=a+bk+ck^2+dk^3+ek^4+fk^5+gk^6x=a+bk+ck^2+dk^3+ek^4+fk^5+gk^6
y=h+ik+jk^2+lk^3+mk^4+nk^5+pk^6y=h+ik+jk^2+lk^3+mk^4+nk^5+pk^6
其中,第一个多项式方程中的a、b、c、d、e、f、g和第二个多项式方程中的h、i、j、l、m、n、p为多项式拟合系数。Among them, a, b, c, d, e, f, g in the first polynomial equation and h, i, j, l, m, n, p in the second polynomial equation are polynomial fitting coefficients.
所谓拟合，就是将平面上一系列的点，用一条光滑的曲线连接起来。因为这条曲线有无数种可能，从而有各种拟合方法。拟合的曲线一般可以用函数表示，根据这个函数的不同有不同的拟合名字。常用的拟合方法例如包括最小二乘曲线拟合法等，在MATLAB(一种数学软件)工具中也可以用polyfit函数来拟合多项式。polyfit函数是基于最小二乘法。上述多项式方程中，每个公式只要有七组及以上非线性数据即可有解，也即利用MATLAB中的polyfit函数进行拟合处理，就可以得到位置数据(x,y)与真实值(标注显示位置)最接近时所对应的多项式拟合系数的取值，则可以将此时对应的多项式拟合系数取值作为目标拟合系数取值，以该目标拟合系数取值所确定的目标拟合方程作为拟合模型。So-called fitting connects a series of points on a plane with a smooth curve. Because this curve has an infinite number of possibilities, there are various fitting methods. The fitted curve can generally be represented by a function, and the fitting is named differently according to the function. Common fitting methods include, for example, the least-squares curve fitting method; in MATLAB (a mathematical software) the polyfit function, which is based on the least-squares method, can also be used to fit polynomials. In the above polynomial equations, each formula can be solved as long as there are seven or more sets of nonlinear data; that is, by fitting with the polyfit function in MATLAB, the polynomial fitting coefficient values for which the position data (x, y) is closest to the true values (the annotated display positions) can be obtained. These coefficient values can then be taken as the target fitting coefficient values, and the target fitting equation determined by the target fitting coefficient values is used as the fitting model.
需说明的是，在MATLAB中采用polyfit函数来拟合多项式的具体过程可以按已有技术实现，本申请在此不加以限定。另外，上述方程多项式以七项式为例说明但不局限于此。It should be noted that the specific process of fitting a polynomial with the polyfit function in MATLAB can be implemented according to the prior art, which is not limited in this application. In addition, the above polynomial equations are illustrated with seven-term polynomials as an example, but are not limited thereto.
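上述拟合过程可以用Python中与MATLAB polyfit对应的numpy.polyfit示意如下，其中画面比例与显示位置之间的对应关系为构造的示例数据。The fitting process above can be sketched with numpy.polyfit, the Python counterpart of MATLAB's polyfit; the relation between picture ratio and display position below is synthetic example data.

```python
import numpy as np

# Annotated picture ratios k and annotated display positions (x, y).
# These are synthetic stand-ins for the annotated training samples.
k = np.linspace(0.1, 0.9, 9)
x_label = 0.50 + 0.30 * k           # assumed ground-truth relation
y_label = 0.20 + 0.60 * k ** 2      # assumed ground-truth relation

# Degree-6 fit, i.e. seven coefficients (a..g / h..p) per equation;
# numpy.polyfit, like MATLAB's polyfit, is based on least squares.
# At least seven samples are needed to determine seven coefficients.
coef_x = np.polyfit(k, x_label, 6)
coef_y = np.polyfit(k, y_label, 6)

# Deviation between predicted and annotated display positions (MSE);
# the coefficients are accepted as target fitting coefficients once the
# deviation falls below the preset threshold (0.05 in the example above).
mse = float(np.mean((np.polyval(coef_x, k) - x_label) ** 2
                    + (np.polyval(coef_y, k) - y_label) ** 2))
```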
S303、将车载雷达测距数据输入预先训练的浅层神经网络模型,输出感兴趣区域ROI与原始视频图像帧的画面比例。S303 , input the vehicle radar ranging data into the pre-trained shallow neural network model, and output the picture ratio of the ROI of the region of interest and the original video image frame.
浅层神经网络模型可以是仅包含输入层、一个隐藏层和输出层的基础神经网络模型,各层均采用sigmoid(S型函数)作为激活函数。由于浅层神经网络模型的层数本身较少,相对于深度神经网络而言,浅层神经网络模型的运算过程要简单很多,因此,计算速度较快,可以更好满足高实时性要求。The shallow neural network model can be a basic neural network model that only includes an input layer, a hidden layer and an output layer, and each layer uses a sigmoid (S-shaped function) as an activation function. Since the number of layers of the shallow neural network model itself is small, compared with the deep neural network, the operation process of the shallow neural network model is much simpler. Therefore, the calculation speed is faster and can better meet the high real-time requirements.
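这种仅含输入层、一个隐藏层和输出层、各层采用sigmoid激活的网络结构可以示意如下，其中各层宽度(6个雷达输入、8个隐藏单元、2个比例输出)为假设值。A network with only an input layer, one hidden layer and an output layer, using sigmoid activations, can be sketched as follows; the layer widths (6 radar inputs, 8 hidden units, 2 ratio outputs) are assumed values, not specified by the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ShallowNet:
    """Minimal input -> hidden -> output network with sigmoid activations,
    mirroring the structure described above (untrained, random weights)."""

    def __init__(self, n_in=6, n_hidden=8, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = sigmoid(x @ self.w1 + self.b1)
        # Two outputs: length ratio and width ratio, both squashed into (0, 1).
        return sigmoid(h @ self.w2 + self.b2)

ratios = ShallowNet().forward(np.array([0.1, 0.3, 0.5, 0.2, 0.4, 0.6]))
```

因输出层同样使用sigmoid，两个输出天然落在(0,1)区间内，恰好适合表示画面比例。Because the output layer also uses sigmoid, both outputs naturally fall in (0, 1), which suits a picture ratio.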
该步骤可以将车载雷达测距数据输入预先训练的浅层神经网络模型进行运算处理,输出感兴趣区域ROI与原始视频图像帧的画面比例。该步骤的描述可以参见步骤S201中的描述,此处不再赘述。In this step, the vehicle radar ranging data can be input into the pre-trained shallow neural network model for operation processing, and the picture ratio of the ROI of the region of interest and the original video image frame can be output. For the description of this step, reference may be made to the description in step S201, and details are not repeated here.
S304、将ROI与原始视频图像帧的画面比例输入拟合模型,输出ROI在原始视频图像帧上的显示位置信息。S304. Input the picture ratio of the ROI and the original video image frame into the fitting model, and output the display position information of the ROI on the original video image frame.
在本申请实施例中,拟合模型的作用在于当输入ROI与原始视频图像帧的画面比例时,可以输出ROI在原始视频图像帧上的显示位置信息。In this embodiment of the present application, the function of the fitting model is that when the aspect ratio of the ROI and the original video image frame is input, the display position information of the ROI on the original video image frame can be output.
该步骤中,将ROI与原始视频图像帧的画面比例输入拟合模型进行拟合运算,可以输出ROI在原始视频图像帧上的显示位置信息。In this step, the aspect ratio of the ROI and the original video image frame is input into the fitting model for fitting operation, and the display position information of the ROI on the original video image frame can be output.
S305、根据ROI与原始视频图像帧的画面比例以及ROI在原始视频图像帧上的显示位置信息,从原始视频图像帧裁剪得到目标ROI。S305 , according to the aspect ratio of the ROI and the original video image frame and the display position information of the ROI on the original video image frame, crop the target ROI from the original video image frame.
由于原始视频图像帧的画面尺寸已知或者固定，当ROI与原始视频图像帧的画面比例以及ROI在原始视频图像帧上的显示位置信息确定后，则可以很方便地从原始视频图像帧裁剪得到目标ROI也即最终的ROI。Since the picture size of the original video image frame is known or fixed, once the picture ratio of the ROI to the original video image frame and the display position information of the ROI on the original video image frame are determined, the target ROI, that is, the final ROI, can easily be cropped from the original video image frame.
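由画面比例与显示位置信息推出裁剪框的计算可以示意如下，其中将显示位置理解为ROI中心的像素坐标属于假设，本申请并未限定位置信息的具体编码方式。Deriving the crop rectangle from the picture ratio and the display position information can be sketched as follows; interpreting the display position as the pixel coordinates of the ROI center is an assumption, since this application does not fix the exact encoding of the position information.

```python
def crop_target_roi(frame_w, frame_h, ratio_w, ratio_h, cx, cy):
    """Turn the picture ratio plus a display position into a crop rectangle.

    (cx, cy) is assumed to be the ROI center in pixels; the rectangle is
    clamped so it always stays inside the frame.
    """
    roi_w = int(round(frame_w * ratio_w))
    roi_h = int(round(frame_h * ratio_h))
    left = max(0, min(int(round(cx - roi_w / 2)), frame_w - roi_w))
    top = max(0, min(int(round(cy - roi_h / 2)), frame_h - roi_h))
    return left, top, roi_w, roi_h

# frame[top:top + roi_h, left:left + roi_w] would then be the target ROI.
box = crop_target_roi(1920, 1080, 0.5, 0.5, 960, 540)  # centered half-size ROI
```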
其中,上述步骤S303至S305的过程可以同时参见图5所示,图5是本申请实施例示出的视频图像拼接方法中对模型进行应用的流程示意图。The process of the above steps S303 to S305 can also be referred to as shown in FIG. 5 . FIG. 5 is a schematic flowchart of applying a model in the video image stitching method shown in the embodiment of the present application.
S306、将各个裁剪得到的目标ROI拼接成目标视频图像帧。S306, splicing the target ROIs obtained by each cropping into target video image frames.
上述过程所得到的目标ROI，可能是来自同一摄像头或者不同摄像头(例如，前方的摄像头和左侧的摄像头或右侧的摄像头)的原始视频图像帧，在拼接这些目标ROI时，可以是以其中一个摄像头，例如车前正方的摄像头采集的原始视频图像帧或者以质量较好的视频图像帧的目标ROI为基准。The target ROIs obtained by the above process may come from original video image frames of the same camera or of different cameras (for example, the front camera and the left or right camera). When splicing these target ROIs, the original video image frame captured by one of the cameras, for example the camera directly in front of the vehicle, or the target ROI of a video image frame of better quality, may be taken as the reference.
其中，将各个裁剪得到的目标ROI拼接成目标视频图像帧可以利用已有的拼接处理方式实现，例如根据视频图像帧对应的位姿或拍摄时间等参考因素等进行拼接，本申请在此不加以限定。Splicing the cropped target ROIs into a target video image frame can be implemented with an existing splicing method, for example splicing according to reference factors such as the pose or shooting time corresponding to the video image frames, which is not limited in this application.
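以基准ROI高度为准、从左到右拼接的朴素做法可以示意如下；真实系统还会按位姿或拍摄时间对齐并对接缝做融合处理，此处仅演示基准对齐的布局。A naive left-to-right stitching that rescales every target ROI to the reference ROI's height can be sketched as follows; a real system would additionally align by pose or shooting time and blend the seams, so this only illustrates the reference-based layout.

```python
import numpy as np

def stitch_rois(rois, ref_index=0):
    """Rescale every target ROI to the height of the reference ROI (e.g. the
    front camera's) and concatenate them left to right. This is an assumed
    simplification of the "existing splicing methods" mentioned above."""
    ref_h = rois[ref_index].shape[0]
    scaled = []
    for roi in rois:
        h, w = roi.shape[:2]
        new_w = max(1, round(w * ref_h / h))
        # Nearest-neighbour resize by index sampling (keeps the sketch
        # dependency-free; a real pipeline would use proper interpolation).
        rows = np.arange(ref_h) * h // ref_h
        cols = np.arange(new_w) * w // new_w
        scaled.append(roi[rows][:, cols])
    return np.concatenate(scaled, axis=1)

panorama = stitch_rois([np.ones((4, 4)), np.zeros((8, 6))])  # 8x6 ROI shrinks to 4x3
```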
从该实施例可以看出，本申请采用车载雷达测距数据作为数据输入量，车载雷达测距数据的数据量本身较小，输入预设模型的计算量也比较小，车载雷达测距数据也可以反映出视频图像的景深不同；另外，本申请也采用浅层神经网络模型的运算，由于浅层神经网络模型的层数本身较少，相对于深度神经网络而言，浅层神经网络模型的运算过程要简单很多，运算速度快。因此，本申请的技术方案能够满足视频拼接对实时性高的要求，也能使得根据目标ROI拼接成的目标视频图像的画面显示效果更好，从而提升了视频图像的拼接效果。It can be seen from this embodiment that the present application uses vehicle-mounted radar ranging data as the data input; the data volume of the radar ranging data is small, so the amount of computation after it is input into the preset models is also small, and the ranging data can reflect the differing depths of field of the video images. In addition, the present application uses a shallow neural network model; since it has few layers, its computation is much simpler and faster than that of a deep neural network. Therefore, the technical solution of the present application can meet the high real-time requirement of video splicing, and also makes the picture of the target video image spliced from the target ROIs display better, thereby improving the splicing effect of the video images.
与前述应用功能实现方法实施例相对应,本申请还提供了一种视频图像拼接装置、电子设备及相应的实施例。Corresponding to the foregoing application function implementation method embodiments, the present application further provides a video image splicing device, an electronic device, and corresponding embodiments.
图6是本申请实施例示出的视频图像拼接装置的结构示意图。FIG. 6 is a schematic structural diagram of a video image splicing apparatus shown in an embodiment of the present application.
参见图6,本申请提供的一种视频图像拼接装置60,包括:第一输出模块601、第二输出模块602、目标区域模块603、拼接模块604。Referring to FIG. 6 , a video image splicing device 60 provided by the present application includes: a first output module 601 , a second output module 602 , a target area module 603 , and a splicing module 604 .
第一输出模块601,用于根据车载雷达测距数据和第一预设模型获得感兴趣区域ROI与原始视频图像帧的画面比例。其中,第一预设模型可以是浅层神经网络模型,其中,浅层神经网络模型采用以下方式训练得到:采用训练集对浅层神经网络模型进行训练,得到预先训练的浅层神经网络模型,其中训练集包括标注画面比例和训练用雷达测距数据,标注画面比例为标注训练用ROI与训练用视频图像帧的画面比例。The first output module 601 is configured to obtain the picture ratio of the region of interest ROI and the original video image frame according to the vehicle radar ranging data and the first preset model. Wherein, the first preset model may be a shallow neural network model, wherein the shallow neural network model is obtained by training in the following manner: using a training set to train the shallow neural network model to obtain a pre-trained shallow neural network model, The training set includes annotated screen ratio and training radar ranging data, and the marked screen ratio is the screen ratio of the ROI for marking training and the video image frame for training.
第二输出模块602,用于根据画面比例和第二预设模型获得ROI在原始视频图像帧上的显示位置信息。其中,第二预设模型可以是拟合模型。拟合模型可以采用以下方式得到:将标注训练用ROI在训练用视频图像帧上的标注显示位置和标注画面比例进行拟合,得到拟合模型。The second output module 602 is configured to obtain the display position information of the ROI on the original video image frame according to the screen ratio and the second preset model. Wherein, the second preset model may be a fitting model. The fitting model can be obtained in the following manner: the fitting model is obtained by fitting the labeling display position of the labeling training ROI on the training video image frame and the labeling screen ratio.
目标区域模块603，用于根据第一输出模块601得到的画面比例以及第二输出模块602得到的显示位置信息，从原始视频图像帧裁剪得到目标ROI。由于原始视频图像帧的画面尺寸已知或者固定，当ROI与原始视频图像帧的画面比例以及ROI在原始视频图像帧上的显示位置信息确定后，则目标区域模块603可以很方便地从原始视频图像帧裁剪得到目标ROI也即最终的ROI。The target area module 603 is configured to crop the target ROI from the original video image frame according to the picture ratio obtained by the first output module 601 and the display position information obtained by the second output module 602. Since the picture size of the original video image frame is known or fixed, once the picture ratio of the ROI to the original video image frame and the display position information of the ROI on the original video image frame are determined, the target area module 603 can easily crop the target ROI, that is, the final ROI, from the original video image frame.
拼接模块604,用于将目标区域模块603中各个裁剪得到的目标ROI拼接成目标视频图像。利用目标ROI进行拼接,可以使得拼接的视频图像的画面更连续,拼接效果更好。The splicing module 604 is used for splicing the target ROI obtained by each cropping in the target region module 603 into a target video image. Using the target ROI for stitching can make the picture of the stitched video images more continuous and the stitching effect is better.
从该实施例可以看出，本申请提供的视频图像拼接装置，采用车载雷达测距数据作为数据输入量，车载雷达测距数据的数据量本身较小，输入预设模型的计算量也比较小，另外车载雷达测距数据也可以反映出视频图像的景深不同，因此根据感兴趣区域ROI与原始视频图像帧的画面比例以及ROI在原始视频图像帧上的显示位置信息所得到的目标ROI可以使得后续拼接的画面更连续，使得根据目标ROI拼接成的目标视频图像的画面显示效果更好，从而提升了视频图像的拼接效果。It can be seen from this embodiment that the video image splicing apparatus provided by the present application uses vehicle-mounted radar ranging data as the data input. The data volume of the radar ranging data is small, so the amount of computation after it is input into the preset models is also small; moreover, the radar ranging data can reflect the differing depths of field of the video images. Therefore, the target ROI obtained according to the picture ratio of the ROI to the original video image frame and the display position information of the ROI on the original video image frame makes the subsequently spliced pictures more continuous, so that the target video image spliced from the target ROIs is displayed better, thereby improving the splicing effect of the video images.
图7是本申请实施例示出的视频图像拼接装置的另一结构示意图。FIG. 7 is another schematic structural diagram of a video image splicing apparatus shown in an embodiment of the present application.
参见图7,本申请提供的一种视频图像拼接装置60,包括:第一输出模块601、第二输出模块602、目标区域模块603、拼接模块604、模型训练模块605、数据收集模块606。Referring to FIG. 7 , a video image stitching apparatus 60 provided by the present application includes: a first output module 601 , a second output module 602 , a target area module 603 , a stitching module 604 , a model training module 605 , and a data collection module 606 .
第一输出模块601、第二输出模块602、目标区域模块603、拼接模块604的功能可以参见图6中的描述。For the functions of the first output module 601, the second output module 602, the target area module 603, and the splicing module 604, reference may be made to the description in FIG. 6 .
进一步的,第一输出模块601可以将车载雷达测距数据输入预先训练的浅层神经网络模型,输出感兴趣区域ROI与原始视频图像帧的画面比例。Further, the first output module 601 can input the vehicle radar ranging data into the pre-trained shallow neural network model, and output the picture ratio of the region of interest ROI and the original video image frame.
第二输出模块602可以将画面比例输入拟合模型,输出ROI在原始视频图像帧上的显示位置信息。The second output module 602 can input the screen ratio into the fitting model, and output the display position information of the ROI on the original video image frame.
模型训练模块605，用于采用训练集对浅层神经网络模型进行训练，得到预先训练的浅层神经网络模型，其中训练集包括标注画面比例和训练用雷达测距数据，标注画面比例为标注训练用ROI与训练用视频图像帧的画面比例。其中，训练集可以按以下方式获得：从采集的雷达测距数据中选取设定数量数据作为训练用雷达测距数据；根据画面连续原则，从训练用视频图像帧中标注出与上一帧训练用视频图像帧画面连续的ROI，得到标注训练用ROI；根据标注训练用ROI与训练用视频图像帧进行对比，得到标注画面比例；将训练用雷达测距数据与标注画面比例保存作为训练集。The model training module 605 is configured to train the shallow neural network model using a training set to obtain the pre-trained shallow neural network model, where the training set includes annotated picture ratios and training radar ranging data, and the annotated picture ratio is the picture ratio of the annotated training ROI to the training video image frame. The training set may be obtained as follows: selecting a set amount of data from the collected radar ranging data as training radar ranging data; according to the picture-continuity principle, annotating, in a training video image frame, the ROI whose picture is continuous with the previous training video image frame, to obtain an annotated training ROI; comparing the annotated training ROI with the training video image frame to obtain an annotated picture ratio; and saving the training radar ranging data together with the annotated picture ratio as the training set.
模型训练模块605还可以将标注训练用ROI在训练用视频图像帧上的标注显示位置和标注画面比例进行拟合处理，得到拟合模型。例如，将各个标注画面比例作为已知量输入以目标显示位置为未知量的目标拟合方程，迭代求解目标显示位置，其中目标拟合方程包括多项式拟合系数；当目标显示位置与标注训练用ROI在训练用视频图像帧上的标注显示位置之间的偏差小于预设阈值时，确定对应的多项式拟合系数取值为目标拟合系数取值，以目标拟合系数取值确定的目标拟合方程作为拟合模型。The model training module 605 may also fit the annotated display positions of the annotated training ROIs on the training video image frames against the annotated picture ratios to obtain the fitting model. For example, each annotated picture ratio is input, as a known quantity, into a target fitting equation in which the target display position is the unknown quantity, and the target display position is solved iteratively, where the target fitting equation includes polynomial fitting coefficients; when the deviation between the target display position and the annotated display position of the annotated training ROI on the training video image frame is smaller than a preset threshold, the corresponding polynomial fitting coefficient values are determined as the target fitting coefficient values, and the target fitting equation determined by the target fitting coefficient values is used as the fitting model.
数据收集模块606，用于采集车载雷达测距数据和摄像头的视频图像帧并进行预处理。本申请中，可以当行驶车辆周边有物体进入超声波雷达有效范围内时，每隔设定时间例如5s记录当前所有摄像头的视频图像帧数据和车载雷达测距数据，并保存。为了方便训练浅层神经网络模型处理并进一步降低计算量，数据收集模块606还可以对车载雷达测距数据进行预处理。该预处理例如包括进行数据清洗(例如，清除脏点数据即明显不符合要求的测距数据)和数据的归一化(normalization)等；归一化后的车载雷达测距数据对研究人员而言更加直观、方便。The data collection module 606 is configured to collect vehicle-mounted radar ranging data and video image frames of the cameras and to preprocess them. In this application, when an object around the driving vehicle enters the effective range of the ultrasonic radar, the video image frame data of all current cameras and the vehicle-mounted radar ranging data can be recorded and saved at set intervals, for example every 5 s. To facilitate the training of the shallow neural network model and further reduce the amount of computation, the data collection module 606 may also preprocess the vehicle-mounted radar ranging data. The preprocessing includes, for example, data cleaning (for example, removing dirty-point data, that is, ranging data that obviously does not meet the requirements) and data normalization; the normalized vehicle-mounted radar ranging data is also more intuitive and convenient for researchers.
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不再做详细阐述说明。Regarding the apparatus in the above-mentioned embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment of the method, and will not be described in detail here.
图8是本申请实施例示出的电子设备的结构示意图。FIG. 8 is a schematic structural diagram of an electronic device shown in an embodiment of the present application.
参见图8,电子设备800包括存储器810和处理器820。Referring to FIG. 8 , an electronic device 800 includes a memory 810 and a processor 820 .
处理器820可以是中央处理单元(Central Processing Unit，CPU)，还可以是其他通用处理器、数字信号处理器(Digital Signal Processor，DSP)、专用集成电路(Application Specific Integrated Circuit，ASIC)、现场可编程门阵列(Field-Programmable Gate Array，FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 820 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
存储器810可以包括各种类型的存储单元,例如系统内存、只读存储器(ROM)和永久存储装置。其中,ROM可以存储处理器820或者计算机的其他模块需要的静态数据或者指令。永久存储装置可以是可读写的存储装置。永久存储装置可以是即使计算机断电后也不会失去存储的指令和数据的非易失性存储设备。在一些实施方式中,永久性存储装置采用大容量存储装置(例如磁或光盘、闪存)作为永久存储装置。另外一些实施方式中,永久性存储装置可以是可移除的存储设备(例如软盘、光驱)。系统内存可以是可读写存储设备或者易失性可读写存储设备,例如动态随机访问内存。系统内存可以存储一些或者所有处理器在运行时需要的指令和数据。此外,存储器810可以包括任意计算机可读存储媒介的组合,包括各种类型的半导体存储芯片(例如DRAM,SRAM,SDRAM,闪存,可编程只读存储器),磁盘和/或光盘也可以采用。在一些实施方式中,存储器810可以包括可读和/或写的可移除的存储设备,例如激光唱片(CD)、只读数字多功能光盘(例如DVD-ROM,双层DVD-ROM)、只读蓝光光盘、超密度光盘、闪存卡(例如SD卡、min SD卡、Micro-SD卡等)、磁性软盘等。计算机可读存储媒介不包含载波和通过无线或有线传输的瞬间电子信号。Memory 810 may include various types of storage units, such as system memory, read only memory (ROM), and persistent storage. The ROM may store static data or instructions required by the processor 820 or other modules of the computer. Persistent storage devices may be readable and writable storage devices. Permanent storage may be a non-volatile storage device that does not lose stored instructions and data even if the computer is powered off. In some embodiments, persistent storage devices employ mass storage devices (eg, magnetic or optical disks, flash memory) as persistent storage devices. In other embodiments, persistent storage may be a removable storage device (eg, a floppy disk, an optical drive). System memory can be a readable and writable storage device or a volatile readable and writable storage device, such as dynamic random access memory. System memory can store some or all of the instructions and data that the processor needs at runtime. Additionally, memory 810 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (eg, DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), and magnetic and/or optical disks may also be employed. 
In some implementations, memory 810 may include a removable storage device that is readable and/or writable, such as a compact disc (CD), a read-only digital versatile disc (eg, DVD-ROM, dual-layer DVD-ROM), Read-only Blu-ray Disc, Ultra-Density Disc, Flash Card (eg SD Card, Min SD Card, Micro-SD Card, etc.), Magnetic Floppy Disk, etc. Computer readable storage media do not contain carrier waves and transient electronic signals transmitted over wireless or wire.
存储器810上存储有可执行代码,当可执行代码被处理器820处理时,可以使处理器820执行上文述及的方法中的部分或全部。Executable codes are stored on the memory 810, and when the executable codes are processed by the processor 820, the processor 820 can be caused to execute some or all of the above-mentioned methods.
此外,根据本申请的方法还可以实现为一种计算机程序或计算机程序产品,该计算机程序或计算机程序产品包括用于执行本申请的上述方法中部分或全部步骤的计算机程序代码指令。Furthermore, the method according to the present application can also be implemented as a computer program or computer program product comprising computer program code instructions for performing some or all of the steps in the above method of the present application.
或者,本申请还可以实施为一种计算机可读存储介质(或非暂时性机器可读存储介质或机器可读存储介质),其上存储有可执行代码(或计算机程序或计算机指令代码),当可执行代码(或计算机程序或计算机指令代码)被电子设备(或服务器等)的处理器执行时,使处理器执行根据本申请的上述方法的各个步骤的部分或全部。Alternatively, the present application can also be implemented as a computer-readable storage medium (or a non-transitory machine-readable storage medium or a machine-readable storage medium) on which executable codes (or computer programs or computer instruction codes) are stored, When the executable code (or computer program or computer instruction code) is executed by the processor of the electronic device (or server, etc.), the processor is caused to perform some or all of the steps of the above method according to the present application.
以上已经描述了本申请的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其他普通技术人员能理解本文披露的各实施例。Various embodiments of the present application have been described above, and the foregoing descriptions are exemplary, not exhaustive, and not limiting of the disclosed embodiments. Numerous modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or improvement over the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (11)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111677340.4A CN114331848B (en) | 2021-12-31 | 2021-12-31 | Video image stitching method, device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114331848A true CN114331848A (en) | 2022-04-12 |
CN114331848B CN114331848B (en) | 2025-05-30 |
Family ID: 81022318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111677340.4A Active CN114331848B (en) | 2021-12-31 | 2021-12-31 | Video image stitching method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114331848B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116320218A (en) * | 2023-05-24 | 2023-06-23 | 深圳金智凌轩视讯技术有限公司 | Multipath video synthesis analysis processing management system based on embedded computer platform |
CN118175434A (en) * | 2024-05-11 | 2024-06-11 | 成都索贝数码科技股份有限公司 | Distributed video real-time splicing method, device and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100231609A1 (en) * | 2006-08-30 | 2010-09-16 | Chatting David J | Providing an image for display |
KR20110072846A (en) * | 2009-12-23 | 2011-06-29 | 클레어픽셀 주식회사 | Surveillance camera device and image data processing method |
CN113763279A (en) * | 2021-09-10 | 2021-12-07 | 厦门理工学院 | A Precise Correction Processing Method for Image with Rectangular Frame |
- 2021-12-31: CN application CN202111677340.4A filed, granted as patent CN114331848B (en), status: Active
Non-Patent Citations (2)
Title |
---|
ZHAOBIN WANG ET AL: "Review on image-stitching techniques", Multimedia Systems, vol. 26, 20 March 2020 (2020-03-20), page 413, XP037178225, DOI: 10.1007/s00530-020-00651-y *
ZHAO WANGYU ET AL: "Front Vehicle Detection and Tracking Fusing Millimeter-Wave Radar and Monocular Vision", Geomatics and Information Science of Wuhan University, vol. 44, no. 12, 5 December 2019 (2019-12-05), pages 1832-1840 *
Also Published As
Publication number | Publication date |
---|---|
CN114331848B (en) | 2025-05-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3182369A1 (en) | Stereo matching method, controller and system | |
CN114331848A (en) | Video image splicing method, device and equipment | |
CN114078127B (en) | Object defect detection and counting method, device, equipment and storage medium | |
US12283120B2 (en) | Method for detecting three-dimensional objects in relation to autonomous driving and electronic device | |
CN118692043A (en) | Intelligent road stud target detection method and device based on improved YOLOv8 | |
CN111753858A (en) | Point cloud matching method and device and repositioning system | |
Zhanabatyrova et al. | Structure from motion-based mapping for autonomous driving: practice and experience | |
CN116206196B (en) | A multi-target detection method and detection system in marine low-light environment | |
US20220122341A1 (en) | Target detection method and apparatus, electronic device, and computer storage medium | |
CN116740712A (en) | Infrared image target labeling method, device, electronic equipment, and storage medium | |
US12260655B2 (en) | Method for detection of three-dimensional objects and electronic device | |
US11954924B2 (en) | System and method for determining information about objects using multiple sensors | |
CN113705432B (en) | A model training, three-dimensional target detection method, device, equipment and medium | |
US11995869B2 (en) | System and method to improve object detection accuracy by focus bracketing | |
CN116012418A (en) | Multi-target tracking method and device | |
CN117474961A (en) | Method, device, equipment and storage medium for reducing depth estimation model error | |
CN115620277A (en) | Monocular 3D environment perception method, device, electronic equipment, and storage medium | |
CN111985267B (en) | Multi-target detection method, device, computer equipment, vehicle and storage medium | |
TWI817579B (en) | Assistance method for safety driving, electronic device and computer-readable storage medium | |
TWI832302B (en) | Method for obtaining depth image , electronic device and computer-readable storage medium | |
CN113743340B (en) | Computer vision network model optimization method and related device for automatic driving | |
CN112966655B (en) | A method, device and computing equipment for identifying mobile phone playing behavior in office area | |
CN112784638B (en) | Training sample acquisition method and device, pedestrian detection method and device | |
US20230419682A1 (en) | Method for managing driving and electronic device | |
CN115131754A (en) | Method, system, equipment and storage medium for generalization of automatic driving scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||