CN112017300B - Mixed reality image processing method, device and equipment - Google Patents
- Publication number
- CN112017300B (application CN202010712798.8A)
- Authority
- CN
- China
- Prior art keywords
- target object
- image
- scene image
- displayed
- virtual object
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computer Graphics (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Processing Or Creating Images (AREA)
Abstract
The present invention discloses a mixed reality image processing method, device and equipment. The method comprises: acquiring an environmental scene image; recognizing a target object contained in the environmental scene image to obtain spatial position information of the target object; and superimposing description information corresponding to the target object onto the environmental scene image to obtain a mixed reality image.
Description
Technical Field
The embodiments of the present disclosure relate to the field of image processing technology, and more specifically to a mixed reality image processing method, a mixed reality image processing apparatus, and a head-mounted display device.
Background Art
Mixed Reality (MR) is a further development of virtual reality technology. By introducing real-world scene information into the virtual environment, it builds an interactive feedback loop between the virtual world, the real world, and the user, enhancing the perceived realism of the user experience.
At present, in the field of early childhood education, augmented reality technology can be used to process teaching videos, so that a head-mounted display device can present more realistic scenes to the user. However, the user cannot interact with objects in the scene, and the user experience is poor.
Therefore, it is necessary to provide a new solution for processing mixed reality images.
Summary of the Invention
The purpose of the embodiments of the present disclosure is to provide a new technical solution for processing mixed reality images.
According to a first aspect of the embodiments of the present disclosure, a mixed reality image processing method is provided, the method comprising:
acquiring an environmental scene image;
recognizing a target object contained in the environmental scene image to obtain spatial position information of the target object; and
superimposing, based on the spatial position information, description information corresponding to the target object onto the environmental scene image to obtain a mixed reality image.
Optionally, after the step of recognizing the target object contained in the environmental scene image and obtaining the spatial position information of the target object, the method further comprises:
acquiring an initial data set of a virtual object to be displayed;
rendering the virtual object to be displayed in the environmental scene image according to plane parameters of a target plane in the environmental scene image and the initial data set of the virtual object to be displayed; and
superimposing, based on the spatial position information and the plane parameters, the description information corresponding to the target object and description information corresponding to the virtual object to be displayed onto the environmental scene image to obtain a mixed reality image.
Optionally, the environmental scene image comprises a first image and a second image captured by a binocular camera of the head-mounted display device;
and the step of recognizing the target object contained in the environmental scene image and obtaining the spatial position information of the target object comprises:
recognizing the target object contained in the first image based on a predetermined recognition model to obtain first position information of the region where the target object is located;
recognizing the target object contained in the second image based on the predetermined recognition model to obtain second position information of the region where the target object is located; and
determining the spatial position information of the target object according to the first position information and the second position information.
Optionally, the method further comprises a step of training the predetermined recognition model:
acquiring historical environmental scene images;
determining the region where the target object is located in each historical environmental scene image, and annotating the position information of that region and the category information of the target object;
generating a data set from the annotated historical environmental scene images; and
training the recognition model on the data set.
Optionally, the method further comprises:
recognizing the target object contained in the environmental scene image to obtain category information of the target object; and
selecting, according to the category information of the target object, the description information corresponding to the target object from a pre-established database.
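As an illustrative sketch of this selection step only: the patent does not specify the database, so the in-memory mapping, the category labels, the description strings, and the `select_description` helper below are all hypothetical.

```python
# A minimal pre-established "database" mapping category labels to
# description information. In practice this could be any keyed store.
DESCRIPTION_DB = {
    "apple": "An apple is a sweet fruit that grows on trees.",
    "banana": "A banana is a long yellow fruit.",
}

def select_description(category: str) -> str:
    """Select the description corresponding to a recognized category;
    return an empty string when the category is not in the database."""
    return DESCRIPTION_DB.get(category, "")
```

The recognition model supplies the category string, and the returned text is what gets superimposed near the object in the mixed reality image.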
Optionally, the head-mounted display device comprises a first camera and a second camera;
and the step of superimposing, based on the spatial position information, the description information corresponding to the target object onto the environmental scene image to obtain a mixed reality image comprises:
determining three-dimensional coordinate information of a display position of the description information corresponding to the target object according to the spatial position information and a predetermined first offset;
converting the three-dimensional coordinate information to obtain a first pixel coordinate of the display position in the image coordinate system of the first camera and a second pixel coordinate of the display position in the image coordinate system of the second camera; and
superimposing, according to the first pixel coordinate and the second pixel coordinate, the description information corresponding to the target object onto the environmental scene image to obtain a mixed reality image.
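The conversion from the three-dimensional display position to per-camera pixel coordinates can be sketched with a standard pinhole projection. This is illustrative only: the patent does not specify a camera model, and the intrinsics, baseline, and offset values below are hypothetical.

```python
import numpy as np

def project_point(K, R, t, point_3d):
    """Pinhole projection p ~ K (R X + t); returns pixel coordinates (u, v)."""
    p_cam = R @ point_3d + t
    p_img = K @ p_cam
    return p_img[:2] / p_img[2]

# Hypothetical intrinsics, shared by both cameras.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
# First camera at the origin; second camera offset by a 6 cm baseline.
R1, t1 = np.eye(3), np.zeros(3)
R2, t2 = np.eye(3), np.array([-0.06, 0.0, 0.0])

# Display position = target's spatial position plus a predetermined
# first offset (here 5 cm upward in camera coordinates, where y points
# down, so the description text floats above the object).
target_pos = np.array([0.1, 0.0, 1.0])
first_offset = np.array([0.0, -0.05, 0.0])
display_pos = target_pos + first_offset

uv1 = project_point(K, R1, t1, display_pos)  # first pixel coordinate
uv2 = project_point(K, R2, t2, display_pos)  # second pixel coordinate
```

Rendering the description at `uv1` in the first image and `uv2` in the second image makes the overlaid text appear at a consistent depth when viewed through the binocular display.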
Optionally, the method further comprises a step of determining the plane parameters of the target plane in the environmental scene image:
extracting feature points from the environmental scene image;
constructing a feature point cloud from the extracted feature points;
performing plane detection on the environmental scene image based on the feature point cloud to determine the target plane; and
obtaining the plane parameters of the target plane, the plane parameters comprising center point coordinates and a normal vector.
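One common way to obtain such plane parameters from a feature point cloud is a least-squares fit via SVD; the sketch below assumes that approach, which the patent does not mandate (the sample points are hypothetical).

```python
import numpy as np

def fit_plane(points):
    """Fit a plane to a 3D point cloud; return (center, unit normal).
    The normal is the direction of least variance, i.e. the last right
    singular vector of the centered point matrix."""
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - center)
    normal = vt[-1]
    return center, normal / np.linalg.norm(normal)

# Feature points lying on a hypothetical tabletop plane z = 0.5.
pts = np.array([[0.0, 0.0, 0.5],
                [1.0, 0.0, 0.5],
                [0.0, 1.0, 0.5],
                [1.0, 1.0, 0.5]])
center, normal = fit_plane(pts)
```

The returned center point and normal vector are exactly the plane parameters used later for placing the virtual object and its description information. A robust implementation would wrap this fit in RANSAC to reject off-plane points.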
Optionally, the step of rendering the virtual object to be displayed in the environmental scene image according to the plane parameters of the target plane in the environmental scene image and the initial data set of the virtual object to be displayed comprises:
acquiring the initial data set of the virtual object to be displayed, the initial data set comprising three-dimensional coordinate information of a plurality of feature points that construct the virtual object;
determining a placement position of the virtual object to be displayed according to the center point coordinates of the target plane, and determining a placement direction of the virtual object to be displayed according to the normal vector of the target plane;
determining a target data set of the virtual object to be displayed according to the initial data set, the placement position, and the placement direction of the virtual object to be displayed; and
rendering the virtual object to be displayed in the environmental scene image according to the target data set of the virtual object to be displayed.
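A minimal sketch of determining the target data set, under the assumption that the placement direction is realized by rotating the object's up axis onto the plane normal and translating the object to the plane center. The patent does not fix this convention; `model_up` and the sample values are hypothetical.

```python
import numpy as np

def place_object(model_points, plane_center, plane_normal, model_up=(0.0, 0.0, 1.0)):
    """Transform a virtual object's feature points so that its up axis
    aligns with the plane normal and its origin sits at the plane center."""
    a = np.asarray(model_up, dtype=float)
    a /= np.linalg.norm(a)
    b = np.asarray(plane_normal, dtype=float)
    b /= np.linalg.norm(b)
    v, c = np.cross(a, b), float(np.dot(a, b))
    if np.isclose(c, -1.0):
        R = np.diag([1.0, -1.0, -1.0])  # antiparallel: 180-degree flip
    else:
        # Rodrigues formula: R = I + [v]_x + [v]_x^2 / (1 + c)
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])
        R = np.eye(3) + vx + vx @ vx / (1.0 + c)
    return np.asarray(model_points, dtype=float) @ R.T + plane_center

# One feature point of a hypothetical model, placed onto a plane whose
# normal points along +y with its center at (1, 2, 3).
model_pts = np.array([[0.0, 0.0, 1.0]])
placed = place_object(model_pts, np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0]))
```

Applying the same transform to every feature point in the initial data set yields the target data set handed to the renderer.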
Optionally, the step of superimposing, based on the spatial position information and the plane parameters, the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environmental scene image to obtain a mixed reality image comprises:
determining first position information of a display position of the description information corresponding to the target object according to the spatial position information and a predetermined first offset;
determining second position information of a display position of the description information corresponding to the virtual object to be displayed according to the center point coordinates of the target plane and a predetermined second offset; and
superimposing, according to the first position information and the second position information, the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environmental scene image to obtain a mixed reality image.
According to a second aspect of the embodiments of the present disclosure, a mixed reality image processing apparatus is provided, the apparatus comprising:
an acquisition module, configured to acquire an environmental scene image;
a recognition module, configured to recognize a target object contained in the environmental scene image and obtain spatial position information of the target object; and
an image generation module, configured to superimpose, based on the spatial position information, description information corresponding to the target object onto the environmental scene image to obtain a mixed reality image;
or,
the apparatus comprises a processor and a memory, the memory storing computer instructions which, when run by the processor, perform the method of any one of the first aspect of the embodiments of the present disclosure.
According to a third aspect of the embodiments of the present disclosure, a head-mounted display device is provided, comprising a binocular camera and the mixed reality image processing apparatus of the second aspect of the embodiments of the present disclosure.
According to the embodiments of the present disclosure, by processing the mixed reality image, description information related to the target object can be presented to the user at the same time as the target object itself. By processing each frame of a video in this way, the description information of the target object can be blended into the real-scene video, so that the user obtains the description information corresponding to the target object while watching the real-scene video through the head-mounted display device, giving a better user experience.
The embodiments of the present disclosure can be applied to teaching scenarios: while the target object is shown to the user, description information related to the target object is shown as well, which helps the user understand the target object, makes teaching more engaging, and improves the user experience.
Further features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments of the present invention with reference to the accompanying drawings.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings illustrate only certain embodiments of the present invention and should not be regarded as limiting its scope. Those of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the hardware configuration of a head-mounted display device that can be used to implement an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a mixed reality image processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a mixed reality image processing method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a scene of a mixed reality image processing method according to an embodiment of the present disclosure;
FIG. 5 is a structural block diagram of a mixed reality image processing apparatus according to an embodiment of the present disclosure;
FIG. 6 is a structural block diagram of a mixed reality image processing apparatus according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of the structure of a head-mounted display device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specifically stated, the relative arrangement of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Technologies, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, they should be considered part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not limiting. Therefore, other examples of the exemplary embodiments may have different values.
It should be noted that like reference numerals and letters refer to similar items in the following figures; therefore, once an item is defined in one figure, it need not be further discussed in subsequent figures.
<Hardware Configuration>
FIG. 1 is a schematic diagram of the hardware configuration of a head-mounted display device 100 that can be used to implement the mixed reality image processing method of an embodiment of the present disclosure.
In one embodiment, the head-mounted display device 100 may be a smart device such as a Virtual Reality (VR) device, an Augmented Reality (AR) device, or a Mixed Reality (MR) device.
In one embodiment, the head-mounted display device 100 includes a first camera and a second camera, which are used to simulate human eyes.
In one embodiment, as shown in FIG. 1, the head-mounted display device 100 may include a processor 110, a memory 120, an interface device 130, a communication device 140, a display device 150, an input device 160, a speaker 170, a microphone 180, a camera 190, and so on. The processor 110 may include, for example but not limited to, a central processing unit (CPU) or a microcontroller (MCU), and may further include a graphics processing unit (GPU). The memory 120 may include, for example but not limited to, ROM (read-only memory), RAM (random access memory), and non-volatile memory such as a hard disk. The interface device 130 may include, for example but not limited to, a USB interface, a serial interface, a parallel interface, or an infrared interface. The communication device 140 is capable of, for example, wired or wireless communication, which may specifically include WiFi communication, Bluetooth communication, and 2G/3G/4G/5G communication. The display device 150 may be, for example, a liquid crystal display, an LED display, or a touch display. The input device 160 may include, for example but not limited to, a touch screen, a keyboard, or somatosensory input. The speaker 170 and the microphone 180 may be used to output and input voice information. The camera 190 may be used to obtain image information and may be, for example, a binocular camera. Although multiple devices are shown for the head-mounted display device 100 in FIG. 1, the present invention may involve only some of them.
As applied in the embodiments of the present disclosure, the memory 120 of the head-mounted display device 100 is used to store instructions, and the instructions are used to control the processor 110 to operate so as to support implementation of the mixed reality image processing method of any embodiment provided in the first aspect of the present disclosure. Technicians can design the instructions according to the solutions disclosed in the embodiments of the present disclosure. How instructions control a processor to operate is well known in the art and is therefore not described in detail here.
<Method Embodiment 1>
Referring to FIG. 2, the mixed reality image processing method provided by an embodiment of the present disclosure is described. The method involves a head-mounted display device, which may be the head-mounted display device 100 shown in FIG. 1. The mixed reality image processing method comprises the following steps.
Step 210: acquire an environmental scene image.
In this embodiment, the environmental scene image is acquired through the head-mounted display device, which includes a binocular camera. The environmental scene image includes a first image captured by a first camera of the binocular camera and a second image captured by a second camera of the binocular camera, where the first image and the second image are captured at the same moment. Optionally, the first camera and the second camera can be triggered by the same clock trigger source to ensure their hardware synchronization. In this embodiment, the first image and the second image have the same image size, and the image size can be set in various ways.
In one embodiment, in a teaching scenario, different objects can be shown to the user through the environmental scene image, and corresponding teaching content can be displayed for each object shown. The environmental scene image includes a target object, which can be an object in the scene to be shown to the user. The environmental scene image may be, for example, an indoor scene image or an outdoor scene image. Target objects in an indoor scene image may be household items or food: household items such as a sofa, a dining table, or a chair; food such as vegetables, fruits, or snacks, for example an apple, banana, pitaya, tomato, or green vegetables. Target objects in an outdoor scene image may be, for example, a shop, a bus, or a traffic light.
After the environmental scene image is acquired, the method proceeds to step 220.
Step 220: recognize the target object contained in the environmental scene image and obtain the spatial position information of the target object.
In one embodiment, the step of recognizing the target object contained in the environmental scene image and obtaining its spatial position information may further include: recognizing the target object contained in the environmental scene image based on a predetermined recognition model, and obtaining the spatial position information of the target object. According to the embodiments of the present disclosure, locating the target object in the environmental scene image based on a predetermined recognition model can improve recognition accuracy.
In this embodiment, the mixed reality image processing method further includes a step of training the predetermined recognition model, which may further include steps 410-440.
Step 410: acquire historical environmental scene images.
In this example, a historical environmental scene image may be an image containing a target object. The environmental scene may be an indoor or outdoor environment. The target object may be an object to be shown to the user.
Step 420: determine the region where the target object is located in each historical environmental scene image, and annotate the position information of that region and the category information of the target object.
In this example, the target object in a historical environmental scene image can be selected by a selection window; that is, the region where the target object is located can be the region selected by the window. The position information of the region may include the coordinates of a corner point of the selection window and the size information of the window, where the size information may include its length and width. Optionally, the size information of the selection window may be a predetermined fixed value. Optionally, corresponding selection windows are set for different categories of target objects, and the size information of the windows for different categories may be the same or different. The shape of the selection window can also be set according to actual needs, for example a rectangle; the embodiments of the present disclosure do not limit this.
Step 430: generate a data set from the annotated historical environmental scene images.
Step 440: train the recognition model on the data set.
In this embodiment, the annotated historical environmental scene images are used as training samples, and a data set is generated from multiple annotated historical environmental scene images. The more training samples in the data set, the higher the accuracy of the training result. Moreover, once the number of training samples reaches a certain value, the gain in accuracy from adding further samples gradually diminishes until it levels off. The required number of training samples, i.e., the number of historical environmental scene images, can therefore be determined by balancing training accuracy against data processing cost.
In a more specific example, the step of generating a data set from the annotated historical environmental scene images may further include steps 431-434.
Step 431: acquire a first predetermined number of annotated historical environmental scene images as a first data set.
In this step, a small number of historical environmental scene images can be selected and manually annotated, so as to mark the position information of the region where the target object is located and the category information of the target object.
Step 432: acquire a second predetermined number of unannotated historical environmental scene images as a second data set.
In this step, the second predetermined number is much larger than the first predetermined number.
Step 433: cluster the historical environmental scene images in the second data set according to the first data set, obtaining, for each historical environmental scene image in the second data set, the position information of the region where the target object is located and the category information of the target object.
Step 434: use the first data set and the clustered second data set together as the data set.
According to the embodiments of the present disclosure, a large unannotated second data set can be clustered according to a small annotated first data set, so as to determine, for each historical environmental scene image in the second data set, the position information of the region where the target object is located and the category information of the target object; both the first data set and the clustered second data set are then used as the data set for training the recognition model. This improves the efficiency of building the data set, reduces labor costs, and can further improve the recognition accuracy of the model.
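The clustering of the unannotated set against the annotated set could, for instance, be realized as nearest-centroid pseudo-labeling over image features. This is an assumption: the patent does not specify the clustering algorithm, and the feature extraction step is omitted here, with hypothetical 2-D feature vectors standing in for real image features.

```python
import numpy as np

def pseudo_label(labeled_feats, labels, unlabeled_feats):
    """Assign each unlabeled sample the class of the nearest labeled
    class centroid, a simple stand-in for the clustering step."""
    classes = sorted(set(labels))
    centroids = np.stack([
        np.mean([f for f, l in zip(labeled_feats, labels) if l == c], axis=0)
        for c in classes
    ])
    result = []
    for f in unlabeled_feats:
        dists = np.linalg.norm(centroids - f, axis=1)
        result.append(classes[int(np.argmin(dists))])
    return result

# Hypothetical feature vectors for the small annotated first data set.
labeled = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
labels = ["apple", "apple", "banana", "banana"]
# Features of the much larger unannotated second data set (truncated).
unlabeled = np.array([[0.2, 0.1], [10.5, 9.9]])
pseudo = pseudo_label(labeled, labels, unlabeled)
```

The pseudo-labeled second data set is then merged with the first data set before training the recognition model.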
In a more specific example, the step of identifying the target object contained in the environment scene image and obtaining the spatial position information of the target object may further include steps 510-530.
Step 510: Identify the target object contained in the first image based on a predetermined recognition model, and obtain first position information of the region where the target object is located.
Step 520: Identify the target object contained in the second image based on the predetermined recognition model, and obtain second position information of the region where the target object is located.
Step 530: Determine the spatial position information of the target object according to the first position information and the second position information.
In this example, the head-mounted display device includes a binocular camera, i.e., a first camera and a second camera. The first image and the second image are acquired by the first camera and the second camera, respectively, and at the same moment; the two images therefore contain the same target object. The first position information may be the position information of the region where the target object is located in the image coordinate system of the first camera, and the second position information the corresponding position information in the image coordinate system of the second camera. For example, the position information of the region may consist of the coordinates of one corner point of the selection window together with the size of the window, from which the coordinates of all four corner points can be computed. Further, using the principle of stereo triangulation, the spatial position information of the region in the camera coordinate system of the first camera or of the second camera can be calculated from the first and second position information. The spatial position information of the region may be the three-dimensional coordinates of the four corner points of the selection window in that camera coordinate system. According to this embodiment, the target object is located by locating the region that contains it, so there is no need to locate the many feature points that make up the object, which improves positioning efficiency.
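The stereo triangulation described above can be illustrated for a single corner point of the selection window. The sketch below assumes an ideal rectified stereo pair with identical intrinsics (focal length f, principal point (cx, cy)) and baseline B, which is a simplification of the general two-camera setup; the patent itself only names the stereo triangulation principle.

```python
import numpy as np

def triangulate_corner(uv_left, uv_right, f, cx, cy, baseline):
    """Recover the 3D position of one selection-window corner from its
    pixel coordinates in a rectified stereo pair (illustrative only)."""
    uL, v = uv_left
    uR, _ = uv_right
    disparity = uL - uR             # horizontal shift between the two views
    Z = f * baseline / disparity    # depth from similar triangles
    X = (uL - cx) * Z / f
    Y = (v - cy) * Z / f
    return np.array([X, Y, Z])      # point in the left camera's frame
```

Applying this to the four corner points of the matched selection windows yields the spatial position information of the region where the target object is located.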
After the target object contained in the environment scene image has been identified and its spatial position information obtained, the process proceeds to step 230.
Step 230: Based on the spatial position information, superimpose the description information corresponding to the target object onto the environment scene image to obtain a mixed reality image.
In this embodiment, the description information includes at least teaching information about the target object, and may take the form of captions or illustrations. By superimposing the description information onto the environment scene image, the user is shown the target object together with its description. In this way, the content associated with the target object can be explained to the user, realizing both display and teaching of the target object. The description information may differ across categories of target objects. For example, the description information of a household product may include its name and usage scenarios; the description information of a fruit may include its name, place of origin, and growing environment; the description information of a traffic light may include its type, purpose, and usage scenarios. For example, as shown in Figure 4, description information can be added to the fruit in the scene.
In one embodiment, the mixed reality image processing method may further include: identifying the target object contained in the environment scene image to obtain its category information, and selecting the description information corresponding to the target object from a pre-established database according to that category information.
In one embodiment, the step of superimposing the description information corresponding to the target object onto the environment scene image based on the spatial position information to obtain a mixed reality image may further include steps 610-630.
Step 610: Determine the three-dimensional coordinate information of the display position of the description information corresponding to the target object according to the spatial position information and a predetermined first offset.
In one example, the three-dimensional coordinates of the upper-left corner point of the target object's selection window are obtained; from these coordinates and a predetermined first offset, the three-dimensional coordinate information of the display position of the corresponding description information can be determined. Determining the display position from a predetermined first offset ensures that the description information is displayed near the target object while preventing it from occluding the object.
In one example, determining the display position of the description information may mean determining the position of an information display window used to show it. Specifically, the three-dimensional coordinates of the upper-left corner point of the target object's selection window are obtained; from these coordinates and the predetermined first offset, the three-dimensional coordinates of the upper-left corner point of the information display window are determined; the description information, which may be a caption or an illustration, is then added for the target object according to that corner point and the size of the information display window.
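The offset step above is simple arithmetic: the information window's corner is the selection window's corner shifted by the first offset. A minimal sketch (the offset value itself is a design choice, not fixed by the patent):

```python
def info_window_position(corner_xyz, offset):
    """Shift the selection window's upper-left corner by the predetermined
    first offset to get the information display window's upper-left corner."""
    x, y, z = corner_xyz
    dx, dy, dz = offset
    return (x + dx, y + dy, z + dz)
```

A negative x-offset, for instance, places the caption to the left of the target object so it does not cover it.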
In one example, the pose of the head-mounted display device relative to the external environment can be obtained based on a SLAM (Simultaneous Localization And Mapping) algorithm. The pose information includes the rotation matrix RHMD and the translation vector THMD of the head-mounted display device in the world coordinate system. The three-dimensional coordinates of the display position of the description information are transformed according to this pose to determine the display position in the world coordinate system, specifically using the following formula (1).
Pw = RHMD * Pc + THMD (1)
where Pw is the three-dimensional coordinate of the display position of the description information in the world coordinate system, Pc is its three-dimensional coordinate in the camera coordinate system, RHMD is the rotation matrix of the head-mounted display device in the world coordinate system, and THMD is the translation vector of the head-mounted display device in the world coordinate system.
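Formula (1) is a standard rigid-body transform and can be written directly in code:

```python
import numpy as np

def camera_to_world(Pc, R_hmd, T_hmd):
    """Formula (1): Pw = RHMD * Pc + THMD, mapping a point from the
    headset's camera coordinate system into the world coordinate system."""
    return R_hmd @ Pc + T_hmd
```

For example, a 90-degree rotation about the z-axis maps the camera-frame point (1, 0, 0) to the world-frame direction (0, 1, 0) before the translation is added.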
Step 620: Transform the three-dimensional coordinate information to obtain a first pixel coordinate of the display position in the image coordinate system of the first camera and a second pixel coordinate of the display position in the image coordinate system of the second camera.
In one example, the first pixel coordinate and the second pixel coordinate are calculated based on the following formulas (2) and (3).
Puv1 = k1 * E * Pw (2)
Puv2 = k2 * E * Pw (3)
where Puv1 is the first pixel coordinate of the display position in the image coordinate system of the first camera, Puv2 is the second pixel coordinate of the display position in the image coordinate system of the second camera, k1 and k2 are the intrinsic parameter matrices of the first and second cameras, Pw is the three-dimensional coordinate of the display position in the world coordinate system, E is the transformation matrix of the head-mounted display device, RHMD is the rotation matrix of the head-mounted display device in the world coordinate system, and THMD is its translation vector in the world coordinate system.
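Formulas (2) and (3) can be sketched for one camera. The patent does not spell out the exact form of the transformation matrix E; the sketch below interprets it as the world-to-camera transform derived from the headset pose (the inverse of formula (1)), which is one plausible reading, followed by the intrinsic matrix and perspective division.

```python
import numpy as np

def project_to_pixels(Pw, K, R_hmd, T_hmd):
    """Project a world point into one camera's image, in the spirit of
    Puv = k * E * Pw (the form of E here is an assumption)."""
    Pc = R_hmd.T @ (Pw - T_hmd)   # invert formula (1): world -> camera
    uvw = K @ Pc                  # apply the camera intrinsic matrix k
    return uvw[:2] / uvw[2]       # perspective division -> pixel (u, v)
```

Calling this once with k1 and once with k2 yields the first and second pixel coordinates of the display position.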
Step 630: Superimpose the description information corresponding to the target object onto the environment scene image according to the first pixel coordinate and the second pixel coordinate to obtain a mixed reality image.
In this embodiment, the binocular camera of the head-mounted display device can overlay the description information on the environment scene image according to the obtained first and second pixel coordinates. Further, by doing this for each environment scene image of the real-scene video, the description information of the target object can be overlaid on the video.
According to the embodiments of the present disclosure, by processing the mixed reality image, the description information related to the target object can be shown to the user alongside the object itself. By processing every frame of the video, the description information of the target object can be integrated into the real-scene video, so that the user obtains it while watching the video through the head-mounted display device, which improves the user experience.
The embodiments of the present disclosure can be applied to teaching scenarios: displaying the description information related to the target object alongside the object helps the user understand it, makes teaching more engaging, and improves the user experience.
<Method Embodiment 2>
Referring to Fig. 3, a mixed reality image processing method provided by an embodiment of the present disclosure is described. The method involves a head-mounted display device, which may be the head-mounted display device 100 shown in Fig. 1. The method includes the following steps.
Step 310: Acquire an environment scene image.
Step 320: Identify the target object contained in the environment scene image and obtain the spatial position information of the target object.
In this embodiment, the specific processes of acquiring the environment scene image in step 310 and obtaining the spatial position information of the target object in step 320 are as described in the preceding embodiment and are not repeated here.
Step 330: Obtain an initial data set of the virtual object to be displayed.
In one embodiment, the initial data set of the virtual object to be displayed includes the three-dimensional coordinate information of the multiple feature points from which the virtual object is constructed, so the virtual object can be built from this data set. Initial data sets for multiple virtual objects can be generated in advance according to the user's actual usage scenarios. Optionally, the initial data of a virtual object to be displayed can be stored in the head-mounted display device or in a server. After the initial data set of the virtual object to be displayed is obtained, the process proceeds to step 340.
Step 340: Render the virtual object to be displayed in the environment scene image according to the plane parameters of the target plane in the environment scene image and the initial data set of the virtual object to be displayed.
In this embodiment, the target plane in the environment scene image can be the ground, a desktop, a platform, and so on, and the virtual object is generated on the target plane. According to this embodiment, during teaching through the head-mounted display device, a virtual object can be added to the real-scene video, so that the user can interact with the virtual object while watching the video and observe it from different angles, which improves the effect of teaching based on the head-mounted display device and provides a better user experience.
In one embodiment, the step of rendering the virtual object to be displayed in the environment scene image according to the plane parameters of the target plane and the initial data set of the virtual object may further include steps 710-740.
Step 710: Acquire the initial data set of the virtual object to be displayed, the initial data set including the three-dimensional coordinate information of the multiple feature points from which the virtual object is constructed.
Step 720: Determine the placement position of the virtual object to be displayed according to the coordinates of the center point of the target plane, and determine the placement direction of the virtual object according to the normal vector of the target plane.
Optionally, when generating the virtual object, its center of gravity can be made to coincide with the center point of the target plane, and the object can be placed along the direction of the plane's normal vector. Determining the placement position and direction from the plane parameters ensures that the virtual object is displayed in the middle of the mixed reality image and prevents it from appearing tilted.
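One concrete reading of this placement rule is: translate the model so its centroid sits at the plane's center point, and rotate it so its original "up" axis aligns with the plane normal. The +Z up-axis convention below is an assumption; the patent only says the object is placed along the normal direction.

```python
import numpy as np

def place_on_plane(points, plane_center, plane_normal):
    """Center a model's feature points on the plane's center point and
    align the model's +Z axis with the plane normal (Rodrigues rotation;
    the antiparallel case normal == -Z is omitted for brevity)."""
    n = plane_normal / np.linalg.norm(plane_normal)
    up = np.array([0.0, 0.0, 1.0])
    v = np.cross(up, n)                      # rotation axis
    c = float(np.dot(up, n))                 # cosine of rotation angle
    if np.isclose(c, 1.0):                   # already aligned
        R = np.eye(3)
    else:
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])  # skew-symmetric cross matrix
        R = np.eye(3) + vx + vx @ vx / (1.0 + c)  # Rodrigues formula
    centered = points - points.mean(axis=0)
    return centered @ R.T + plane_center
```

The returned coordinates are the transformed feature points that make up the target data set of step 730.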
Step 730: Determine the target data set of the virtual object to be displayed according to its initial data set and its placement position and direction.
Step 740: Render the virtual object to be displayed in the environment scene image according to its target data set.
In one example, the target data set of the virtual object includes the three-dimensional coordinate information of the multiple feature points from which the virtual object is constructed, these coordinates having been transformed according to the plane parameters of the target plane. Rendering the virtual object in the environment scene image specifically includes: transforming the three-dimensional coordinates of each feature point in the target data set to obtain the feature point's third pixel coordinate in the image coordinate system of the first camera and its fourth pixel coordinate in the image coordinate system of the second camera; the binocular camera of the head-mounted display device then renders the virtual object in the environment scene image according to the third and fourth pixel coordinates of every feature point in the target data set.
In one embodiment, the mixed reality image processing method may further include the step of determining the plane parameters of the target plane in the environment scene image. Specifically, this may further include steps 710-740.
Step 710: Extract feature points from the environment scene image.
In this step, the environment scene image includes a first image acquired by the first camera of the binocular camera and a second image acquired by the second camera. Feature points are extracted from the first image to obtain multiple first feature points. The extraction may use the FAST (Features from Accelerated Segment Test) corner detection algorithm, the SIFT (Scale-Invariant Feature Transform) algorithm, the ORB (Oriented FAST and Rotated BRIEF) algorithm, or another method; no limitation is imposed here.
Step 720: Construct a feature point cloud from the extracted feature points.
In one example, the feature point cloud may include the three-dimensional coordinates of the multiple first feature points in the world coordinate system. Specifically, epipolar matching is performed in the second image to obtain multiple second feature points that match the first feature points. Through triangulation, the three-dimensional coordinates of the first feature points in the first camera's coordinate system are computed from the matched pairs. These coordinates are then transformed according to the pose of the head-mounted display device at the time the current frame of the environment scene image was acquired, yielding the three-dimensional coordinates of the first feature points in the world coordinate system, i.e., the feature point cloud.
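The epipolar matching step can be illustrated for a rectified stereo pair, where matching reduces to a search along the same image row. The sum-of-squared-differences cost, patch size, and search range below are illustrative choices, not part of the patent.

```python
import numpy as np

def match_along_epipolar(left, right, x, y, patch=3, max_disp=32):
    """For a feature at (x, y) in the rectified left image, find the
    disparity of the minimum-SSD patch on the same row of the right image."""
    h = patch // 2
    ref = left[y-h:y+h+1, x-h:x+h+1].astype(np.float64)
    best_d, best_cost = 0, np.inf
    for d in range(0, max_disp + 1):
        if x - d - h < 0:                      # candidate window off-image
            break
        cand = right[y-h:y+h+1, x-d-h:x-d+h+1].astype(np.float64)
        cost = np.sum((ref - cand) ** 2)       # sum of squared differences
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d                              # disparity of the best match
```

The recovered disparity feeds the triangulation step, after which the camera-frame points are transformed to world coordinates with the headset pose as in formula (1).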
Step 730: Perform plane detection on the environment scene image based on the feature point cloud to determine the target plane.
In one example, a candidate plane is determined from three feature points randomly drawn from the feature point cloud; the normal vector of the candidate plane and the number of first inliers are obtained, and when they satisfy predetermined conditions the candidate plane is accepted as a valid plane. A first inlier is a feature point whose distance to the candidate plane is less than a first predetermined distance. After multiple valid planes have been determined, the plane with the largest number of first inliers is selected as the target plane.
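The selection rule above is essentially RANSAC plane fitting. A minimal sketch (iteration count and inlier threshold are illustrative):

```python
import numpy as np

def ransac_plane(cloud, iters=200, dist_thresh=0.01, seed=0):
    """Repeatedly fit a plane to three random points and keep the plane
    supported by the most inliers (points closer than dist_thresh)."""
    rng = np.random.default_rng(seed)
    best_n, best_d, best_count = None, 0.0, -1
    for _ in range(iters):
        p0, p1, p2 = cloud[rng.choice(len(cloud), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:                        # degenerate (collinear) sample
            continue
        n = n / norm
        d = -np.dot(n, p0)                     # plane equation: n.x + d = 0
        count = int(np.sum(np.abs(cloud @ n + d) < dist_thresh))
        if count > best_count:
            best_n, best_d, best_count = n, d, count
    return best_n, best_d, best_count
```

The center point of the winning plane's inliers and the normal vector n then serve as the plane parameters of step 740.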
Step 740: Obtain the plane parameters of the target plane, the plane parameters including the center point coordinates and the normal vector.
The rendering of a virtual object to be displayed in the environment scene image is illustrated below with a specific example in which the virtual object is a dinosaur model.
Step 901: Acquire an environment scene image.
Step 902: Determine the plane parameters of the target plane in the environment scene image, namely the three-dimensional coordinates Pcplane of the plane's center point and its normal vector Vcplane.
Step 903: Based on the following formulas (5) and (6), transform the center point coordinates Pcplane and the normal vector Vcplane of the target plane into the world coordinate system according to the pose information of the head-mounted display device.
Pwplane = RHMD * Pcplane + THMD (5)
Vwplane = RHMD * Vcplane + THMD (6)
where Pwplane is the three-dimensional coordinate of the center point of the target plane in the world coordinate system, Pcplane is its three-dimensional coordinate in the camera coordinate system, Vwplane is the normal vector of the target plane in the world coordinate system, Vcplane is the normal vector of the target plane in the camera coordinate system, RHMD is the rotation matrix of the head-mounted display device in the world coordinate system, and THMD is its translation vector in the world coordinate system.
Step 904: Obtain the initial data set of the dinosaur model to be displayed.
Step 905: Determine the target data set of the virtual object to be displayed according to the initial data set of the dinosaur model and the center point coordinates Pwplane and normal vector Vwplane of the target plane in the world coordinate system.
Step 906: Based on the following formulas (7) and (8), transform the three-dimensional coordinates of each feature point in the dinosaur model's target data set to obtain the feature point's third pixel coordinate in the image coordinate system of the first camera and its fourth pixel coordinate in the image coordinate system of the second camera.
Puv3 = k1 * E * Pwdinosaur (7)
Puv4 = k2 * E * Pwdinosaur (8)
where Pwdinosaur is the three-dimensional coordinate of any feature point in the target data set in the world coordinate system, Puv3 is that feature point's third pixel coordinate in the image coordinate system of the first camera, Puv4 is its fourth pixel coordinate in the image coordinate system of the second camera, k1 and k2 are the intrinsic parameter matrices of the first and second cameras, E is the transformation matrix of the head-mounted display device, RHMD is its rotation matrix in the world coordinate system, and THMD is its translation vector in the world coordinate system.
Step 907: Render the dinosaur model to be displayed in the environment scene image according to the third and fourth pixel coordinates of each feature point in its target data set.
After the virtual object to be displayed has been rendered in the environment scene image, the process proceeds to step 350.
Step 350: Based on the spatial position information and the plane parameters, superimpose the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environment scene image to obtain a mixed reality image.
In one embodiment, the step of superimposing the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environment scene image based on the spatial position information and the plane parameters to obtain a mixed reality image may further include steps 810-830.
Step 810: Determine first position information of the display position of the description information corresponding to the target object according to the spatial position information and a predetermined first offset.
The description information corresponding to the target object includes at least teaching information about the target object. Superimposing it onto the environment scene image shows the user the target object together with its related teaching information. In one embodiment, the target object contained in the environment scene image is identified to obtain its category information, and the corresponding description information is selected from a pre-established database according to that category information. The first position information is determined as described in the preceding embodiment and is not repeated here.
Step 820: Determine second position information of the display position of the description information corresponding to the virtual object to be displayed according to the center point coordinates of the target plane and a predetermined second offset.
The description information corresponding to the virtual object to be displayed includes at least teaching information about the virtual object, and may be a caption or an illustration. After the virtual object has been rendered in the environment scene image, its description information can be superimposed on the image, so that the user is shown the virtual object together with its related teaching information. In this embodiment, the content associated with the virtual object can be explained to the user according to its description information, realizing both display and teaching of the virtual object.
步骤830、根据所述第一位置信息和所述第二位置信息,将所述目标对象对应的描述信息、以及所述待展示的虚拟对象对应的描述信息叠加到所述环境场景图像上,获得混合现实图像。Step 830: Based on the first position information and the second position information, superimpose the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environmental scene image to obtain a mixed reality image.
根据预定的第一偏移量确定目标对象描述信息的显示位置的第一位置信息,根据预定的第二偏移量确定虚拟对象的描述信息的显示位置的第二位置信息,可以保证在目标对象和虚拟对象附近显示对应的描述信息的同时,可以避免描述信息对遮挡目标对象或者虚拟对象。By determining the first position information of the display position of the target object description information according to a predetermined first offset, and determining the second position information of the display position of the virtual object description information according to a predetermined second offset, it can be ensured that while the corresponding description information is displayed near the target object and the virtual object, the description information can be prevented from obstructing the target object or the virtual object.
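The offset-based placement described in steps 810 through 830 can be sketched as follows. This is purely an illustrative sketch, not the patented implementation: the coordinate convention, the anchor points, and the offset vectors are invented example values.

```python
def offset_position(anchor, offset):
    """Shift a 3-D anchor point by a fixed offset; both arguments are
    (x, y, z) tuples in the same coordinate frame."""
    return tuple(a + o for a, o in zip(anchor, offset))

# Spatial position of the recognized target object (example values).
target_position = (0.5, 0.25, 2.0)
# Center point of the target plane on which the virtual object is rendered.
plane_center = (-0.5, 0.0, 1.5)

FIRST_OFFSET = (0.0, 0.25, 0.0)   # raise the target object's label
SECOND_OFFSET = (0.0, 0.5, 0.0)   # raise the virtual object's label

# First and second position information for the two pieces of
# description information (steps 810 and 820).
first_position = offset_position(target_position, FIRST_OFFSET)
second_position = offset_position(plane_center, SECOND_OFFSET)
```

In practice the first and second offsets would be chosen so that each label sits beside, rather than on top of, the object it describes.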
According to the embodiments of the present disclosure, during teaching through a head-mounted display device, a virtual object can be integrated into the real-scene video, so that the user can interact with the virtual object while watching the real-scene video through the device and can conveniently observe the virtual object from different angles, thereby improving the effectiveness of teaching based on a head-mounted display device and providing a better user experience. In addition, during such teaching, the description information corresponding to the target object and the virtual object can be integrated into the real-scene video, so that the user obtains the corresponding teaching information while watching the real-scene video, further improving the user experience.
<Device Embodiment 1>
Referring to FIG. 5, an embodiment of the present disclosure provides a mixed reality image processing apparatus 50, which includes an acquisition module 51, a recognition module 52, and an image generation module 53.
The acquisition module 51 can be used to acquire an environmental scene image.
The recognition module 52 can be used to recognize the target object contained in the environmental scene image and obtain the spatial position information of the target object.
In a specific example, the recognition module 52 is specifically configured to recognize the target object contained in the first image based on a predetermined recognition model and obtain first position information of the region where the target object is located; recognize the target object contained in the second image based on the predetermined recognition model and obtain second position information of the region where the target object is located; and determine the spatial position information of the target object according to the first position information and the second position information.
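The patent does not spell out here how the spatial position is computed from the two per-image detections; for a rectified binocular camera, one common approach is triangulation from the horizontal disparity. The sketch below assumes a rectified stereo pair with known focal lengths, principal point, and baseline, all of which are invented example parameters rather than values from the patent.

```python
def triangulate(u_left, u_right, v, fx, fy, cx, cy, baseline):
    """Recover a 3-D point from a matched detection in a rectified stereo
    pair: depth follows Z = fx * baseline / disparity."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = fx * baseline / disparity
    x = (u_left - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# Example: the target object's center detected at u=700 in the first
# (left) image and u=660 in the second (right) image of a 1280x720 pair,
# with an assumed 6 cm baseline.
point = triangulate(u_left=700.0, u_right=660.0, v=400.0,
                    fx=800.0, fy=800.0, cx=640.0, cy=360.0, baseline=0.06)
```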
In a specific example, the recognition module 52 is specifically configured to recognize the target object contained in the environmental scene image and obtain the category information of the target object.
The image generation module 53 can be used to superimpose the description information corresponding to the target object onto the environmental scene image based on the spatial position information, obtaining a mixed reality image.
In one embodiment, the description information includes at least teaching information about the target object.
In a specific example, the image generation module 53 is specifically configured to determine the three-dimensional coordinate information of the display position of the description information corresponding to the target object according to the spatial position information and a predetermined first offset;
convert the three-dimensional coordinate information to obtain a first pixel coordinate of the display position in the image coordinate system of the first camera and a second pixel coordinate of the display position in the image coordinate system of the second camera;
and, according to the first pixel coordinate and the second pixel coordinate, superimpose the description information corresponding to the target object onto the environmental scene image to obtain a mixed reality image.
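The conversion from the three-dimensional display position to per-camera pixel coordinates is, in the usual formulation, a pinhole projection into each camera's image coordinate system. The following is a hedged sketch that assumes identical intrinsics for both cameras and a pure horizontal baseline between them; none of these parameter values come from the patent.

```python
def project(point, fx, fy, cx, cy, camera_x_offset=0.0):
    """Project a 3-D point (in the rig's reference frame) into one camera's
    image plane; camera_x_offset moves the point into that camera's frame."""
    x, y, z = point
    x -= camera_x_offset
    return (fx * x / z + cx, fy * y / z + cy)

# Three-dimensional display position of the description information
# (example value) and an assumed 6 cm horizontal baseline.
display_position = (0.09, 0.06, 1.2)
BASELINE = 0.06

# First pixel coordinate (first camera at the rig origin) and second
# pixel coordinate (second camera shifted along x by the baseline).
first_pixel = project(display_position, 800.0, 800.0, 640.0, 360.0)
second_pixel = project(display_position, 800.0, 800.0, 640.0, 360.0, BASELINE)
```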
In one embodiment, the mixed reality image processing apparatus 50 may further include a description information acquisition module, which can be used to select the description information corresponding to the target object from a pre-established database according to the category information of the target object.
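A minimal sketch of the pre-established database lookup: a mapping from the recognized category to its description information. The categories and teaching texts below are invented examples, not content from the patent.

```python
# Hypothetical pre-established database mapping a target object's
# category to its teaching/description text.
DESCRIPTION_DB = {
    "globe": "A globe is a spherical model of the Earth.",
    "skeleton": "The adult human skeleton consists of 206 bones.",
}

def lookup_description(category, db=DESCRIPTION_DB):
    """Select the description information for a recognized category,
    falling back to an empty string for unknown categories."""
    return db.get(category, "")
```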
In one embodiment, the mixed reality image processing apparatus 50 may further include an initial data set acquisition module and a virtual object generation module.
In this embodiment, the initial data set acquisition module can be used to acquire an initial data set of the virtual object to be displayed.
The virtual object generation module can be used to render the virtual object to be displayed in the environmental scene image according to the plane parameters of the target plane in the environmental scene image and the initial data set of the virtual object to be displayed.
In a specific example, the virtual object generation module is specifically configured to acquire an initial data set of the virtual object to be displayed, the initial data set including three-dimensional coordinate information of the feature points that construct the virtual object; determine the placement position of the virtual object according to the center point coordinates of the target plane, and determine the placement direction of the virtual object according to the normal vector of the target plane; determine a target data set of the virtual object according to its initial data set, placement position, and placement direction; and render the virtual object to be displayed in the environmental scene image according to the target data set.
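One way to realize this placement step is to rotate the initial feature points so that the model's up axis aligns with the target plane's normal vector and then translate them to the plane's center point. The sketch below uses Rodrigues' rotation formula; this is an assumed realization for illustration, not the patent's prescribed method.

```python
import math

def rotate_to_normal(points, normal):
    """Rotate a point set so that the model's +Y axis is aligned with the
    plane's normal vector (Rodrigues' rotation formula)."""
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny, nz = nx / length, ny / length, nz / length
    # Rotation axis k = up x normal with up = (0, 1, 0); the angle theta
    # satisfies cos(theta) = up . normal and sin(theta) = |k|.
    ax, ay, az = nz, 0.0, -nx
    s = math.sqrt(ax * ax + az * az)
    c = ny
    if s < 1e-9:  # normal already parallel (or anti-parallel) to +Y
        return list(points) if c > 0 else [(x, -y, -z) for x, y, z in points]
    ax, az = ax / s, az / s
    rotated = []
    for x, y, z in points:
        # Rodrigues: v' = v cos + (k x v) sin + k (k . v)(1 - cos)
        kxv = (ay * z - az * y, az * x - ax * z, ax * y - ay * x)
        kdv = ax * x + ay * y + az * z
        rotated.append(tuple(v * c + kv * s + k * kdv * (1 - c)
                             for v, kv, k in zip((x, y, z), kxv, (ax, ay, az))))
    return rotated

def place_on_plane(points, plane_center, plane_normal):
    """Build the target data set: orient the initial feature points to the
    plane's normal, then translate them to the plane's center point."""
    cx, cy, cz = plane_center
    return [(x + cx, y + cy, z + cz)
            for x, y, z in rotate_to_normal(points, plane_normal)]
```

The target data set produced this way can then be rasterized into the environmental scene image by the rendering pipeline.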
The image generation module 53 can also be used to superimpose the description information corresponding to the target object and the description information corresponding to the virtual object to be displayed onto the environmental scene image based on the spatial position information and the plane parameters, obtaining a mixed reality image.
In a specific example, the image generation module 53 is specifically configured to determine the first position information of the display position of the description information corresponding to the target object according to the spatial position information and a predetermined first offset; determine the second position information of the display position of the description information corresponding to the virtual object to be displayed according to the center point coordinates of the target plane and a predetermined second offset; and, according to the first position information and the second position information, superimpose both pieces of description information onto the environmental scene image to obtain a mixed reality image.
Referring to FIG. 6, an embodiment of the present disclosure provides a mixed reality image processing apparatus 60, which includes a processor 61 and a memory 62. The memory 62 stores a computer program which, when executed by the processor 61, implements the mixed reality image processing method disclosed in any of the foregoing embodiments.
<Device Embodiment 2>
Referring to FIG. 7, an embodiment of the present disclosure provides a head-mounted display device 70, which may be the head-mounted display device 100 shown in FIG. 1. The head-mounted display device 70 includes a binocular camera 71 and an image processing apparatus 72.
In one embodiment, the image processing apparatus 72 may be, for example, the mixed reality image processing apparatus 50 shown in FIG. 5, or the mixed reality image processing apparatus 60 shown in FIG. 6.
In one embodiment, the head-mounted display device 100 may be an intelligent device such as a virtual reality (VR) device, an augmented reality (AR) device, or a mixed reality (MR) device.
According to the embodiments of the present disclosure, during teaching through a head-mounted display device, a virtual object can be integrated into the real-scene video, so that the user can interact with the virtual object while watching the real-scene video through the device and can conveniently observe the virtual object from different angles, thereby improving the effectiveness of teaching based on a head-mounted display device and providing a better user experience. In addition, during such teaching, the description information corresponding to the target object and the virtual object can be integrated into the real-scene video, so that the user obtains the corresponding teaching information while watching the real-scene video, further improving the user experience.
Each embodiment in this specification is described in a progressive manner; for the parts that are the same or similar between embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. It should be clear to those skilled in the art, however, that the above embodiments can be used alone or in combination as needed. In addition, since the device embodiments correspond to the method embodiments, their description is relatively brief; for the relevant details, reference may be made to the corresponding parts of the method embodiments. The system embodiments described above are merely illustrative, and modules described as separate components may or may not be physically separate.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded to the respective computing/processing devices from a computer-readable storage medium, or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions; the electronic circuit can then execute the computer-readable program instructions, thereby implementing various aspects of the present invention.
Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium having the instructions stored therein comprises an article of manufacture including instructions that implement aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or actions, or by a combination of special-purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are all equivalent.
The embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.
Claims (8)
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112017300A CN112017300A (en) | 2020-12-01 |
| CN112017300B true CN112017300B (en) | 2024-11-05 |