WO2024138467A1

WO2024138467A1 - AR display system based on multi-view cameras and viewport tracking

Info

Publication number: WO2024138467A1
Application number: PCT/CN2022/143041
Authority: WO
Inventors: Gang Shen; Wei Zong; Iris XIA; Juan ZHAO; Hua Yang
Original assignee: Intel Corporation
Priority date: 2022-12-28
Filing date: 2022-12-28
Publication date: 2024-07-04

Abstract

The disclosure provides an AR display system, an apparatus and a method for AR display. The method for AR display may include: determining a viewport direction of a user viewing an AR display surface, based on user images captured by a first camera array; determining a position and a size of an object of interest to be projected on an AR display surface, based on the viewport direction of the user and object images captured by a second camera array; and transmitting the position and the size of the object to a display driver for driving the AR display surface to display an AR projection of the object.

Description

AR DISPLAY SYSTEM BASED ON MULTI-VIEW CAMERAS AND VIEWPORT TRACKING

TECHNICAL FIELD

Embodiments described herein generally relate to Augmented Reality (AR), and more particularly relate to an AR display system based on multi-view cameras and viewport tracking.

BACKGROUND

AR display has been widely used in various scenarios and may be a popular feature in modern vehicles. AR display technology may have many applications in vehicles. For example, an AR display system can be applied to a vehicle to augment text, images or videos, and create 3D animations on real-world views from windows or mirrors of the vehicle.

BRIEF DESCRIPTION OF THE DRAWINGS

The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:

FIG. 1 shows an overview of viewport-dependent AR display according to some embodiments of the present disclosure;

FIG. 2 shows an overview diagram of an example in-vehicle AR display system according to some embodiments of the present disclosure;

FIG. 3 shows example positions for arranging internal cameras and external cameras for a vehicle according to some embodiments of the present disclosure;

FIG. 4 shows an example processing flow for implementing viewport based AR display according to some embodiments of the present disclosure;

FIG. 5 shows example coordinates of multiple cameras placed on a car according to some embodiments of the present disclosure;

FIG. 6 shows a schematic diagram of a procedure for object detection with multiple cameras according to some embodiments of the present disclosure;

FIG. 7 shows a schematic diagram of object projection on a viewport-based virtual projection plane according to some embodiments of the present disclosure;

FIG. 8 shows a schematic diagram of a procedure for figuring out a size of an object on the viewport-based virtual projection plane according to some embodiments of the present disclosure;

FIG. 9 shows a flowchart of an overall procedure for AR display based on multi-view cameras and viewport tracking according to some embodiments of the present disclosure;

FIG. 10 is a block diagram illustrating components, according to some example embodiments, able to read instructions from a machine-readable or computer-readable medium and perform any one or more of the methodologies discussed herein;

FIG. 11 is a block diagram of an example processor platform in accordance with some embodiments of the disclosure.

DETAILED DESCRIPTION

Various aspects of the illustrative embodiments will be described using terms commonly employed by those skilled in the art to convey the substance of the disclosure to others skilled in the art. However, it will be apparent to those skilled in the art that many alternate embodiments may be practiced using portions of the described aspects. For purposes of explanation, specific numbers, materials, and configurations are set forth in order to provide a thorough understanding of the illustrative embodiments. However, it will be apparent to those skilled in the art that alternate embodiments may be practiced without the specific details. In other instances, well-known features may have been omitted or simplified in order to avoid obscuring the illustrative embodiments.

Further, various operations will be described as multiple discrete operations, in turn, in a manner that is most helpful in understanding the illustrative embodiments; however, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations need not be performed in the order of presentation.

AR display technology may have many applications in various scenarios. For example, an AR display system may be applied to a modern vehicle, where it can augment text, images or videos, and create 3D animations on real-world views from windows or mirrors of the vehicle. For example, a front windshield of the vehicle may be a "see-through" Liquid Crystal Display (LCD), and street names and warning signs can be overlaid directly on the windshield. This may be more intuitive than a current navigation map on a small LCD pad. As another example, the driver/passenger side windows can also be turned into AR-capable displays. Other cars in the mirrors can be marked with speed and distance information for safety purposes. As a third example, for in-car entertainment and advertising, the passenger-side window may be used as a display to highlight tourist attractions or stores when passing by.

All these use cases show the versatility of AR display technology and the usefulness and fun of enabling AR display in vehicles. The use cases also show a few significant differences between AR in a Head Mounted Device (HMD) and AR in vehicles.

Firstly, unlike the HMD, the viewports and viewing areas of the driver and the passenger may have a much more extensive changing range than the display surface of the HMD. Also, the field of view (FOV) of the driver or the passenger will be much broader.

Secondly, an AR display surface, like the front windshield, can be a “see-through” LCD, which is much larger than the HMD and has more positions/resolutions for augmented content.

Thirdly, there could be many concurrent users (therefore many viewports) using the AR display system.

In-vehicle AR display solutions are currently limited to small screens or mirrors. For example, some rear mirrors can display AR content by analyzing rear views from camera feeds. The analysis happens on a 2D single-view feed. However, the same approach is not applicable to the vehicle’s front view (windshield).

There are also attempts to use a Head-up Display (HUD) like those used in the cockpits of fighter aircraft. However, a HUD usually has a very small FOV, whereas the display surface in a vehicle can be much larger than that of a fighter aircraft cockpit.

According to the disclosure, an AR display solution based on multi-view cameras and viewport tracking is proposed to improve AR experiences in various scenarios. For example, the solution may be utilized in vehicles. Specifically, a windshield (partial or complete) may be made as a “see-through” LCD to project AR overlays on the actual view of the world. Multi-view cameras may be utilized to track both the viewports of the eyes inside the vehicle and the objects of interest outside it. A method is proposed to calculate correct sizes and positions of the AR overlays according to the multi-view camera array and geometric projection principles, and a method using multiple cameras and machine learning models (e.g., YOLO) is proposed to detect and consolidate objects of interest based on multi-view cameras and single-view object detection. It is noted that the machine learning model YOLO is described in B. Strbac, M. Gostovic, Z. Lukac and D. Samardzija, "YOLO Multi-Camera Object Detection and Distance Estimation," 2020 Zooming Innovation in Consumer Technologies Conference (ZINC), 2020, pp. 26-30, doi: 10.1109/ZINC50678.2020.9161805, which is incorporated herein by reference in its entirety.

In an example, both internal and external cameras may be utilized for AR displays in vehicles. The proper positions of objects or AR overlays to be projected on the AR display surface may depend on the head position and gazing direction of a user viewing the AR display surface (e.g. the driver or the passenger). To be less intrusive to drivers and passengers, visual-based detection (using internal cameras) may be a better choice than alternatives such as wearing a sensing device or glasses. Meanwhile, external cameras can be used for object detection and distance detection. The internal and external cameras for determining viewports and object positions, respectively, will need to be coordinated and synchronized to ensure correct and spontaneous AR experiences in vehicles, especially for the "see-through" LCD. In this disclosure, the internal cameras may be referred to as the first camera array and the external cameras may be referred to as the second camera array.

FIG. 1 shows an overview of viewport-dependent AR display of an AR overlay according to some embodiments of the present disclosure. As shown in FIG. 1, on the "see-through" LCD, the rendering of the AR overlay (the cylinder) will depend on the driver's viewport. The cylinder's size depends on the distance and location of the pedestrian outside compared to the AR display surface (the windshield or the “see-through” LCD). The internal and external cameras will provide the data for calculating the position and size of the "projected" figure of the pedestrian on the AR display surface, based on the driver's viewport.

FIG. 2 shows an overview diagram of an example in-vehicle AR display system according to some embodiments of the present disclosure. As shown in FIG. 2, the system may include two sets of cameras, i.e. the first camera array (e.g. internal camera-1 and internal camera-2) and the second camera array (e.g. external camera-1 and external camera-2). The first array of internal cameras may detect and track the viewing position and direction of the driver/passenger. The second array of external cameras may scan and find objects of interest in the surrounding area. A novel idea is to use the coordinates of projections of two cameras to calculate the positions of the objects of interest in the virtual display plane (the “see-through” AR display or the windshield), based on Cartesian coordinates.

A vehicle may have multiple AR displays (windshields, side windows, etc.). The same system can support multiple AR displays by tracking human eyes and viewports. For example, the windshield can use the “see-through” AR display.

Specifically, as shown in FIG. 2, the AR display system may include an AR display surface (e.g. AR display 1, AR display 2, AR display 3) to display an AR projection of an object of interest; a display driver to drive AR display on the AR display surface; a first camera array (e.g. internal camera-1, internal camera-2) to track a viewport of a user viewing the AR display surface; a second camera array (e.g. external camera-1, external camera-2) to detect the object of interest; and one or more processors coupled to the display driver, the first camera array and the second camera array.

According to the disclosure, the processors may be configured to determine a viewport direction of the user based on user images captured by the first camera array; determine a position and a size of the object of interest to be projected on the AR display surface based on the viewport direction of the user and object images captured by the second camera array; and provide the position and the size of the object to the display driver for driving the AR display surface to display the AR projection of the object.

Generally, assuming an AR display surface (an x-y plane), user images captured by two internal cameras can be used to determine the viewport using existing algorithms (like a gaze tracking algorithm). With this detected viewport, the projected position of the object of interest on the AR display surface may be calculated by triangulation from two external cameras after object detection (using an object detection model such as YOLO). The size of an AR overlay associated with an object of interest may depend on the projected size of the object on the AR display surface. The details about determination of the position and size of the object of interest will be described below with reference to FIG. 4 to FIG. 8.

The arrangement of the first camera array and the second camera array may vary depending on the application scenario of the AR display system. FIG. 3 shows example positions for arranging internal cameras and external cameras for a vehicle according to some embodiments of the present disclosure. As shown in FIG. 3, the internal camera array and the external camera array can be horizontally placed on two sides of the windshield.

In addition, it is important to note that all camera feeds (and possibly other information, such as depth information from Lidar or infrared sensors) should be synchronized to ensure correct object detection and calculation of positions. As shown in FIG. 2, the first camera array and the second camera array may be coupled to a clock synchronization module to ensure synchronization of the user images and the object images.
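As an illustration of what synchronized feeds make possible, the following minimal Python sketch pairs each user image with the object image closest to it in time. The disclosure only states that a clock synchronization module is used; the pairing strategy, the function name `pair_frames`, the `(timestamp, payload)` frame layout and the 5 ms tolerance are all assumptions made for this example.

```python
def pair_frames(user_frames, object_frames, tolerance_s=0.005):
    """Pair user and object frames whose timestamps differ by at most tolerance_s.

    Each frame is a (timestamp_seconds, payload) tuple and both lists are
    assumed to be sorted by timestamp (hypothetical layout for illustration).
    """
    pairs = []
    j = 0
    for ts_user, user_payload in user_frames:
        # Advance to the object frame that is closest in time to this user frame.
        while (j + 1 < len(object_frames)
               and abs(object_frames[j + 1][0] - ts_user)
               <= abs(object_frames[j][0] - ts_user)):
            j += 1
        # Keep the pair only if the two timestamps are close enough.
        if object_frames and abs(object_frames[j][0] - ts_user) <= tolerance_s:
            pairs.append((user_payload, object_frames[j][1]))
    return pairs


user_frames = [(0.000, "u0"), (0.033, "u1"), (0.066, "u2")]
object_frames = [(0.001, "o0"), (0.034, "o1"), (0.080, "o2")]
pairs = pair_frames(user_frames, object_frames)
```

In this example the third user frame is dropped because the nearest object frame is 14 ms away, which exceeds the assumed tolerance.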

FIG. 4 shows an example processing flow for implementing viewport-based AR display according to some embodiments of the present disclosure. Taking a vehicle as an applicable scene, the key problem for implementing an in-vehicle AR display system is to calculate the position on the projection plane (i.e. the AR display surface, e.g., the windshield) of a detected object outside the vehicle.

For example, the coordinate (X, Y, Z) may be used to represent the world coordinate, the X-axis may point to the vehicle heading direction, the Y-axis may point to the lateral direction, and the Z-axis may point to the vertical direction. FIG. 5 shows example coordinates of multiple cameras placed on a car according to some embodiments of the present disclosure.

First, camera and projector calibration may be performed to get the following intrinsic and extrinsic parameters of each camera relative to the (X, Y, Z) world coordinate system: the intrinsic matrix; the rotation matrix, or equivalently the Euler angles (roll, pitch, yaw); and the translation vector.

Then the transformation between the world 3D coordinate and the camera/projector pixel coordinate can be calculated in two steps: first the pixel homogeneous coordinate, and then the pixel Cartesian coordinate.
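The equations themselves are not reproduced in this text, so the following Python sketch assumes the standard pinhole camera model rather than the exact form used in the disclosure. The symbols `K` (intrinsic matrix), `R` (rotation matrix), `t` (translation vector) and the function name `project_world_to_pixel` are assumptions chosen for illustration, as are the numeric values.

```python
def project_world_to_pixel(K, R, t, P):
    """Project a 3D world point P to a pixel, assuming a pinhole camera model."""
    # Camera-frame coordinates: Pc = R * P + t
    pc = [sum(R[i][j] * P[j] for j in range(3)) + t[i] for i in range(3)]
    # Pixel homogeneous coordinate: (x, y, w) = K * Pc
    x, y, w = (sum(K[i][j] * pc[j] for j in range(3)) for i in range(3))
    if w == 0:
        raise ValueError("point lies in the camera's principal plane")
    # Pixel Cartesian coordinate: divide by the homogeneous scale factor
    return (x / w, y / w)


# Illustrative calibration values (not taken from the disclosure).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]

u, v = project_world_to_pixel(K, R, t, [1.0, 0.5, 10.0])
```

With an identity rotation the third world axis acts as the optical axis here; in a real installation `R` and `t` would come from calibration and would map the vehicle's (X, Y, Z) axes into each camera's frame.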

According to the processing flow of FIG. 4, with the external camera inputs (i.e. object images captured by the external cameras), the processor may use machine learning models such as YOLO or TTFNet to detect the object of interest and calculate the pixel coordinate (u, v) of the object in a camera plane of each external camera. Let (u_d, v_d) denote the pixel coordinate of the object (or the AR overlay) on the AR display surface (e.g. the windshield). The coordinate (u_d, v_d) may be obtained according to the viewport of the user and the coordinate (u, v) of the object in the camera plane of each external camera, as described in detail below with reference to FIG. 7.

On the other hand, with the internal camera inputs (i.e. user images captured by the internal cameras), the processor may use geometric triangulation or machine-learning-based models to detect and track the user’s viewport direction (roll, yaw, pitch).
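To make the (roll, pitch, yaw) representation concrete, the sketch below converts the three angles into a unit viewing direction in the (X, Y, Z) world frame described above (X forward, Y lateral, Z vertical). The rotation order (yaw about Z, then pitch about Y, then roll about X) and the sign conventions are assumptions, since the disclosure does not specify them, and `viewport_direction` is a hypothetical helper name.

```python
import math


def viewport_direction(roll, pitch, yaw):
    """Return the unit viewing direction for Euler angles given in radians.

    Assumes R = Rz(yaw) * Ry(pitch) * Rx(roll) applied to the forward axis
    (1, 0, 0). Roll rotates about the viewing axis itself, so it does not
    change the direction and is accepted only for interface completeness.
    """
    cos_yaw, sin_yaw = math.cos(yaw), math.sin(yaw)
    cos_pitch, sin_pitch = math.cos(pitch), math.sin(pitch)
    return (cos_yaw * cos_pitch, sin_yaw * cos_pitch, -sin_pitch)


straight_ahead = viewport_direction(0.0, 0.0, 0.0)
looking_sideways = viewport_direction(0.0, 0.0, math.pi / 2)
```

Under these assumed conventions a zero rotation looks along the vehicle heading, and a yaw of 90 degrees looks along the lateral axis.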

Generally, the procedure to figure out the right position on the AR display surface for a detected object may be as follows. External cameras may capture images or videos to detect the object and obtain the two-dimensional (2D) coordinate (u, v) of the object in a camera plane of each external camera (e.g., using YOLO). Meanwhile, the internal cameras may be used to capture the user's eyes and calculate the viewport direction (roll, pitch, yaw) of the user, which can be used to calculate a viewport virtual projection plane. Then the coordinate (u_d, v_d) on the virtual projection plane (e.g. the 2D plane of the windshield display) may be calculated. The coordinate (u_d, v_d) will be the input for AR rendering on the windshield display.

As illustrated in FIG. 4, the external cameras collect a global view. Two external cameras may each get a 2D projection of an object on that camera’s 2D plane. It is noted that the target projection plane is the plane of the windshield, and the projection position also depends on the driver’s or passenger’s viewport (roll, pitch, yaw). The processing flow in FIG. 4 may include multi-camera object detection and identification, which will be described in detail below with reference to FIG. 6.

FIG. 6 shows a schematic diagram of a procedure for object detection with multiple cameras according to some embodiments of the present disclosure. For object detection and classification with multiple external cameras, the processor may run the YOLO object detection model on output images from both cameras separately, obtaining bounding boxes and classifications for various objects (e.g., pedestrians, signs, cars) in the field of view. Assuming two external cameras are used for the object detection, corresponding objects in both images (i.e. the first camera image (Image 1) and the second camera image (Image 2)) captured by the two external cameras may be identified by applying a mathematical transformation on the second camera image to match the reference frame of the first camera image. Then it can be determined where the object is expected to appear in the first camera image compared to the transformed second image (via the generated bounding boxes), and it can be confirmed whether the two objects captured in the first camera image and the second camera image are identical.

Given P as the object’s 3D coordinates, P₁ = K R₁ P describes the coordinates of the object in the first camera image, where K is the camera intrinsic matrix and R₁ is the first camera’s rotation matrix. Similarly, P₂ = K R₂ P. Solving the second relation for P and substituting it into the first, the coordinates P₂ can be transformed to P₁ by the following equation: P₁ = K R₁ R₂⁻¹ K⁻¹ P₂.
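A minimal Python sketch of this transformation follows, using only the two definitions above. It assumes a zero-skew intrinsic matrix shared by both cameras and uses the fact that the inverse of a rotation matrix is its transpose. The helper names and numeric values are illustrative and not part of the disclosure. Because the model in the text contains no translation term, the mapping is exact only when the two cameras share an optical center, and is otherwise an approximation that improves for distant objects.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def intrinsic_inverse(K):
    """Closed-form inverse of a zero-skew intrinsic matrix (assumed form)."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    return [[1.0 / fx, 0.0, -cx / fx],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def rot_y(angle):
    """Rotation about the camera's vertical axis, angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def to_pixel(h):
    """Convert a homogeneous 3x1 column into Cartesian pixel coordinates."""
    return (h[0][0] / h[2][0], h[1][0] / h[2][0])


def transfer_to_camera1(K, R1, R2, uv2):
    """Map a pixel of camera 2 into camera 1 via H = K R1 R2^-1 K^-1."""
    H = matmul(matmul(matmul(K, R1), transpose(R2)), intrinsic_inverse(K))
    return to_pixel(matmul(H, [[uv2[0]], [uv2[1]], [1.0]]))


K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R1, R2 = rot_y(0.0), rot_y(0.1)
P = [[0.5], [0.2], [5.0]]                      # illustrative 3D point
uv1 = to_pixel(matmul(matmul(K, R1), P))       # P1 = K R1 P
uv2 = to_pixel(matmul(matmul(K, R2), P))       # P2 = K R2 P
uv1_from_2 = transfer_to_camera1(K, R1, R2, uv2)
```

Here `uv1_from_2` agrees with `uv1` up to floating-point error, which is the consistency check that lets a detection in the second image be compared with one in the first.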

Ultimately, given YOLO outputs of an object ID and a bounding box (simplified here to a single pixel in the image) for each image, the aim is to match locations of the same object in both images to perform the triangulation calculation in the following steps. More concretely, given (u1, v1, id1) from the first camera image and (u2, v2, id2) from the second camera image where id1 and id2 denote the same object with object_id, a mapping of (object_id) -> (u1, v1, u2, v2) may be obtained. Then the relative order of objects identified in both images may be checked to further confirm the obtained object mappings. Once the object is identified and located within the images captured by the external cameras, the two sets of coordinates of the object in the first camera image and the second camera image may be used to calculate the projection coordinates of the object on the AR display surface, as described below with reference to FIG. 7.
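The mapping and the order check described above can be sketched as follows. The sketch assumes each detection has already been reduced to a `(u, v, object_id)` tuple and that matching identifiers denote the same object in both images, as stated in the text. Comparing left-to-right order by the horizontal pixel coordinate is one possible reading of "relative order" and is an assumption, as are the function names.

```python
def match_detections(detections_1, detections_2):
    """Build {object_id: (u1, v1, u2, v2)} for objects seen by both cameras."""
    by_id_2 = {object_id: (u, v) for u, v, object_id in detections_2}
    mapping = {}
    for u1, v1, object_id in detections_1:
        if object_id in by_id_2:
            u2, v2 = by_id_2[object_id]
            mapping[object_id] = (u1, v1, u2, v2)
    return mapping


def same_relative_order(mapping):
    """Check that matched objects appear in the same left-to-right order."""
    order_1 = sorted(mapping, key=lambda object_id: mapping[object_id][0])
    order_2 = sorted(mapping, key=lambda object_id: mapping[object_id][2])
    return order_1 == order_2


detections_1 = [(100, 200, "pedestrian"), (400, 210, "sign")]
detections_2 = [(80, 200, "pedestrian"), (370, 212, "sign"), (600, 50, "car")]
mapping = match_detections(detections_1, detections_2)
consistent = same_relative_order(mapping)
```

Objects seen by only one camera (the car in this example) are left out of the mapping, since both pixel coordinates are needed for the later projection step.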

FIG. 7 shows a schematic diagram of object projection on a viewport-based virtual projection plane according to some embodiments of the present disclosure. In order to improve the user’s experience of AR display, it may be desirable to tag or overlay information on objects naturally from the user’s (e.g. a driver’s) view. Thus, it is proposed to realize the viewport-dependent AR display.

As shown in FIG. 7, it is assumed that image I₀ is captured by camera 0 and image I₁ is captured by camera 1. When the driver’s viewport moves to point “s”, image I_s is seen by the driver. An object P is captured in image I₀ as p₀ and in image I₁ as p₁. To find the object P in the driver’s observation image I_s, the following steps may be performed.

Firstly, the two cameras may be activated to ensure the driver’s viewport is in between these two cameras’ FOVs. As illustrated in FIG. 7, camera 0 is on the left and camera 1 is on the right, so as to capture image I₀ and image I₁. Then image stereo rectification may be performed on image I₀ and image I₁ to get the corresponding projection matrices Π₀ and Π₁, and the object coordinates p₀ and p₁ may be projected into the corresponding rectified images to get rectified object coordinates p₀′ and p₁′. Next, interpolation may be performed on the rectified object coordinates p₀′ and p₁′ to get the interpolated object coordinate p_s′ according to the driver’s viewport direction. Finally, inverse projection may be performed on the interpolated object coordinate p_s′ by an inverse matrix Π_s, so as to obtain the coordinate p_s that is mapped into the driver’s viewport. The inverse matrix Π_s may be represented as Π_s = (1 - s)Π₀ + sΠ₁, in which s is a blending weight for image blending.
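The interpolation and blending steps can be sketched in Python as below. The disclosure does not state how the weight s is derived from the viewport direction, nor the exact dimensions of the projection matrices, so this sketch assumes that s is already given in the range 0 to 1 and that the matrices act as 3×3 transforms on homogeneous pixel coordinates. The function names and numeric values are hypothetical.

```python
def interpolate_point(p0, p1, s):
    """Linear interpolation of rectified coordinates: (1 - s) * p0 + s * p1."""
    return tuple((1.0 - s) * a + s * b for a, b in zip(p0, p1))


def blend_matrices(M0, M1, s):
    """Element-wise blend of two matrices: (1 - s) * M0 + s * M1."""
    return [[(1.0 - s) * a + s * b for a, b in zip(row0, row1)]
            for row0, row1 in zip(M0, M1)]


def apply_transform(M, p):
    """Apply an assumed 3x3 transform to a 2D point in homogeneous form."""
    x, y, w = (M[i][0] * p[0] + M[i][1] * p[1] + M[i][2] for i in range(3))
    return (x / w, y / w)


# Illustrative rectified coordinates and matrices (not from the disclosure).
p0_rect, p1_rect = (100.0, 50.0), (140.0, 50.0)
M0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
M1 = [[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

ps_rect = interpolate_point(p0_rect, p1_rect, 0.25)   # interpolated coordinate
Ms = blend_matrices(M0, M1, 0.5)                      # blended matrix
ps = apply_transform(Ms, ps_rect)                     # mapped coordinate
```

Setting s to 0 or 1 reproduces the view of camera 0 or camera 1 respectively, which is a convenient sanity check when tuning how s follows the driver's viewport.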

After getting the coordinate p_s of the object on the viewport-based virtual projection plane (i.e. the AR display surface), the object or the AR overlay for the object may be displayed on the AR display surface according to the object coordinate p_s.

As described above, in order to display the object or add the AR overlay for the object on the AR display surface (e.g. the windshield in front of the driver), the position of the object on the viewport-based virtual projection plane may be determined. However, finding the object position is not enough; it is also necessary to figure out the size of the object or the AR overlay on the AR display surface.

FIG. 8 shows a schematic diagram of a procedure for figuring out a size of an object on the viewport-based virtual projection plane according to some embodiments of the present disclosure. As shown in FIG. 8, the following parameters are defined: F_c: camera focal length; L_c: object length in camera pixel per inch; F_p: projector focal length; L_p: object length in projection pixel per inch. Since the parameters F_c, L_c and F_p are known, L_p can be calculated by the similar triangles rule: L_p = F_p * L_c / F_c. In this way, the size of the object to be projected on the AR display surface may be determined based on a camera focal length of a camera for capturing an object image of the object, a size of the object in a camera plane of the camera, and a projection focal length for projecting the object onto the AR display surface.
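The similar-triangles relation can be written as a one-line helper. The function name, argument names and numeric values below are illustrative, and both focal lengths are assumed to be expressed in the same unit.

```python
def projected_length(camera_focal_length, camera_length, projector_focal_length):
    """Similar triangles: L_p = F_p * L_c / F_c."""
    if camera_focal_length <= 0:
        raise ValueError("camera focal length must be positive")
    return projector_focal_length * camera_length / camera_focal_length


# Example: F_c = 4.0, L_c = 120.0, F_p = 8.0 (illustrative values).
length_on_display = projected_length(4.0, 120.0, 8.0)
```

Doubling the projector focal length relative to the camera focal length doubles the projected length, so the example yields 240.0.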

In summary, a novel AR display system (including camera arrays, an AR display surface and processors) and a set of methods (utilizing camera positions and transformations) are proposed. Specifically, a windshield (partial or complete) may be made as a “see-through” LCD to project objects and AR overlays on the actual view of the world. Multi-view cameras may be utilized to track both the viewports of the eyes inside the vehicle and the objects of interest outside it. A method is proposed to calculate correct sizes and positions of the objects or AR overlays according to multi-view camera arrays and geometric projection principles, and a method using multiple cameras and machine learning models (e.g., YOLO) is proposed to detect and consolidate objects of interest based on multi-view cameras and single-view object detection.

It is noted that although the embodiments of the disclosure are described with reference to the vehicles, it is obvious that the AR display system may be applied to various scenarios and may be especially suitable for AR display applications in which a viewport or view area of an AR viewer has an extensive changing range.

FIG. 9 shows a flowchart of an overall procedure for AR display based on multi-view cameras and viewport tracking according to some embodiments of the present disclosure. As shown in FIG. 9, the procedure may be implemented by a processor circuitry and may include operations 910 to 930.

At operation 910, the processor circuitry may determine a viewport direction of a user viewing an AR display surface, based on user images captured by a first camera array.

At operation 920, the processor circuitry may determine a position and a size of an object of interest to be projected on an AR display surface, based on the viewport direction of the user and object images captured by a second camera array.

At operation 930, the processor circuitry may transmit the position and the size of the object to a display driver for driving the AR display surface to display an AR projection of the object.
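Operations 910 to 930 can be read as one processing step, sketched below in Python. The three callables passed in (`estimate_viewport`, `locate_object`, `send_to_driver`) are hypothetical placeholders for the viewport tracking, projection and display-driver stages; the `Projection` type and its fields are likewise assumptions made for this example.

```python
from typing import Callable, NamedTuple, Sequence, Tuple


class Projection(NamedTuple):
    position: Tuple[float, float]  # coordinate on the AR display surface
    size: float                    # projected size of the object


def ar_display_step(
    user_images: Sequence,
    object_images: Sequence,
    estimate_viewport: Callable,
    locate_object: Callable,
    send_to_driver: Callable,
) -> Projection:
    """Run one pass of the three operations with injected stage functions."""
    viewport = estimate_viewport(user_images)               # operation 910
    projection = locate_object(viewport, object_images)     # operation 920
    send_to_driver(projection.position, projection.size)    # operation 930
    return projection


# Stub stages that stand in for real models and a real display driver.
sent = []
result = ar_display_step(
    ["user-frame"],
    ["object-frame"],
    estimate_viewport=lambda images: (0.0, 0.1, 0.0),
    locate_object=lambda viewport, images: Projection((10.0, 20.0), 3.0),
    send_to_driver=lambda position, size: sent.append((position, size)),
)
```

Passing the stages in as callables keeps the sketch independent of any particular gaze tracker, detector or driver interface.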

Specifically, the operation 920 may include performing object detection based on an object image captured by each camera in the second camera array to identify the object and obtain a coordinate of the object on a camera plane of each camera in the second camera array; and determining a projection coordinate of the object on the AR display surface based on the viewport direction of the user and the coordinate of the object on the camera plane of each camera in the second camera array.

According to some embodiments, determining the projection coordinate of the object on the AR display surface may include: performing image stereo rectification on the object image captured by each camera in the second camera array to obtain a plurality of rectified coordinates of the object in a plurality of corresponding rectified object images; interpolating the plurality of rectified coordinates of the object based on the viewport direction of the user to obtain an interpolated coordinate of the object; and performing inverse projection on the interpolated coordinate of the object to obtain the projection coordinate of the object on the AR display surface.

According to some embodiments, the processor circuitry may determine the size of the object to be projected on the AR display surface based on a camera focal length of a camera in the second camera array, a size of the object in a camera plane of the camera in the second camera array, and a projection focal length for projecting the object onto the AR display surface.

According to some embodiments, the processor circuitry may further generate an AR overlay for the object based on AR content associated with the object; determine a position and a size of the AR overlay according to the position and the size of the object; and transmit the AR overlay and the position and the size of the AR overlay to the display driver for driving the AR display surface to display the AR overlay.

FIG. 10 is a block diagram illustrating components, according to some example embodiments, able to read instructions from a machine-readable or computer-readable medium (e.g., a non-transitory machine-readable storage medium) and perform any one or more of the methodologies discussed herein. Specifically, FIG. 10 shows a diagrammatic representation of hardware resources 1000 including one or more processors (or processor cores) 1010, one or more memory/storage devices 1020, and one or more communication resources 1030, each of which may be communicatively coupled via a bus 1040. For embodiments where node virtualization (e.g., NFV) is utilized, a hypervisor 1002 may be executed to provide an execution environment for one or more network slices/sub-slices to utilize the hardware resources 1000.

The processors 1010 may include, for example, a processor 1012 and a processor 1014 which may be, e.g., a central processing unit (CPU), a graphics processing unit (GPU), a tensor processing unit (TPU), a visual processing unit (VPU), a field programmable gate array (FPGA), or any suitable combination thereof.

The memory/storage devices 1020 may include main memory, disk storage, or any suitable combination thereof. The memory/storage devices 1020 may include, but are not limited to, any type of volatile or non-volatile memory such as dynamic random access memory (DRAM), static random-access memory (SRAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), Flash memory, solid-state storage, etc.

The communication resources 1030 may include interconnection or network interface components or other suitable devices to communicate with one or more peripheral devices 1004 or one or more databases 1006 via a network 1008. For example, the communication resources 1030 may include wired communication components (e.g., for coupling via a Universal Serial Bus (USB)), cellular communication components, NFC components, Bluetooth components (e.g., Bluetooth Low Energy), Wi-Fi components, and other communication components.

Instructions 1050 may comprise software, a program, an application, an applet, an app, or other executable code for causing at least any of the processors 1010 to perform any one or more of the methodologies discussed herein. The instructions 1050 may reside, completely or partially, within at least one of the processors 1010 (e.g., within the processor’s cache memory), the memory/storage devices 1020, or any suitable combination thereof. Furthermore, any portion of the instructions 1050 may be transferred to the hardware resources 1000 from any combination of the peripheral devices 1004 or the databases 1006. Accordingly, the memory of processors 1010, the memory/storage devices 1020, the peripheral devices 1004, and the databases 1006 are examples of computer-readable and machine-readable media.

FIG. 11 is a block diagram of an example processor platform in accordance with some embodiments of the disclosure. The processor platform 1100 can be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a personal digital assistant (PDA), an Internet appliance, a DVD player, a CD player, a digital video recorder, a Blu-ray player, a gaming console, a personal video recorder, a set top box, a headset or other wearable device, or any other type of computing device.

The processor platform 1100 of the illustrated example includes a processor 1112. The processor 1112 of the illustrated example is hardware. For example, the processor 1112 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In some embodiments, the processor implements one or more of the methods or processes described above.

The processor 1112 of the illustrated example includes a local memory 1113 (e.g., a cache). The processor 1112 of the illustrated example is in communication with a main memory including a volatile memory 1114 and a non-volatile memory 1116 via a bus 1118. The volatile memory 1114 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS Dynamic Random Access Memory (RDRAM), and/or any other type of random access memory device. The non-volatile memory 1116 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1114, 1116 is controlled by a memory controller.

The processor platform 1100 of the illustrated example also includes interface circuitry 1120. The interface circuitry 1120 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth interface, a near field communication (NFC) interface, and/or a PCI express interface.

In the illustrated example, one or more input devices 1122 are connected to the interface circuitry 1120. The input device(s) 1122 permit(s) a user to enter data and/or commands into the processor 1112. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, and/or a voice recognition system.

One or more output devices 1124 are also connected to the interface circuitry 1120 of the illustrated example. The output devices 1124 can be implemented, for example, by display devices (e.g., a light emitting diode (LED) , an organic light emitting diode (OLED) , a liquid crystal display (LCD) , a cathode ray tube display (CRT) , an in-place switching (IPS) display, a touchscreen, etc. ) , a tactile output device, a printer and/or speaker. The interface circuitry 1120 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.

The interface circuitry 1120 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1126. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.

For example, the interface circuitry 1120 may receive a training dataset input through the input device(s) 1122 or retrieved from the network 1126.

The processor platform 1100 of the illustrated example also includes one or more mass storage devices 1128 for storing software and/or data. Examples of such mass storage devices 1128 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.

Machine executable instructions 1132 may be stored in the mass storage device 1128, in the volatile memory 1114, in the non-volatile memory 1116, and/or on a removable non-transitory computer readable storage medium such as a CD or DVD.

Additional Notes and Examples:

Example 1 includes an apparatus for Augmented Reality (AR) display, comprising: interface circuitry; and processor circuitry coupled to the interface circuitry and configured to: determine a viewport direction of a user viewing an AR display surface, based on user images received via the interface circuitry from a first camera array; determine a position and a size of an object of interest to be projected on an AR display surface, based on the viewport direction of the user and object images received via the interface circuitry from a second camera array; and provide the position and the size of the object to the interface circuitry for transmission to a display driver for driving the AR display surface to display an AR projection of the object.

Example 2 includes the apparatus of Example 1, wherein the processor circuitry is configured to determine the position of the object to be projected on the AR display surface by: performing object detection based on an object image captured by each camera in the second camera array to identify the object and obtain a coordinate of the object on a camera plane of each camera in the second camera array; and determining a projection coordinate of the object on the AR display surface based on the viewport direction of the user and the coordinate of the object on the camera plane of each camera in the second camera array.

Example 3 includes the apparatus of Example 2, wherein determining the projection coordinate of the object on the AR display surface comprises: performing image stereo rectification on the object image captured by each camera in the second camera array to obtain a plurality of rectified coordinates of the object in a plurality of corresponding rectified object images; interpolating the plurality of rectified coordinates of the object based on the viewport direction of the user to obtain an interpolated coordinate of the object; and performing inverse projection on the interpolated coordinate of the object to obtain the projection coordinate of the object on the AR display surface.

Example 4 includes the apparatus of any of Examples 1 to 3, wherein the processor circuitry is configured to determine the size of the object to be projected on the AR display surface based on a camera focal length of a camera in the second camera array, a size of the object in a camera plane of the camera in the second camera array, and a projection focal length for projecting the object onto the AR display surface.

Example 5 includes the apparatus of any of Examples 1 to 4, wherein the first camera array and the second camera array are synchronized to ensure synchronization of the user images and the object images.

Example 6 includes the apparatus of any of Examples 1 to 5, wherein the processor circuitry is further configured to: generate an AR overlay for the object based on AR content associated with the object; determine a position and a size of the AR overlay according to the position and the size of the object; and provide the AR overlay and the position and the size of the AR overlay to the interface circuitry for transmission to the display driver for driving the AR display surface to display the AR overlay.

Example 7 includes the apparatus of any of Examples 1 to 6, wherein the apparatus is applied in a vehicle, and the user is a driver or a passenger in the vehicle.

Example 8 includes the apparatus of Example 7, wherein the AR display surface comprises at least one of a front windshield, a driver-side window, a passenger-side window, or a rear-view mirror of the vehicle.

Example 9 includes the apparatus of Example 8, wherein the front windshield of the vehicle is a see-through liquid crystal display (LCD) .

Example 10 includes the apparatus of any of Examples 7 to 9, wherein the first camera array comprises at least two internal cameras installed in the vehicle for tracking a viewport of the user, and the second camera array comprises at least two external cameras installed outside the vehicle for detecting the object of interest.

Example 11 includes an Augmented Reality (AR) display system, comprising: an AR display surface to display an AR projection of an object of interest; a display driver to drive AR display on the AR display surface; a first camera array to track a viewport of a user viewing the AR display surface; a second camera array to detect the object of interest; and processor circuitry coupled to the display driver, the first camera array and the second camera array and configured to: determine a viewport direction of the user based on user images captured by the first camera array; determine a position and a size of the object of interest to be projected on the AR display surface based on the viewport direction of the user and object images captured by the second camera array; and provide the position and the size of the object to the display driver for driving the AR display surface to display the AR projection of the object.

Example 12 includes the AR display system of Example 11, wherein the processor circuitry is configured to determine the position of the object to be projected on the AR display surface by: performing object detection based on an object image captured by each camera in the second camera array to identify the object and obtain a coordinate of the object on a camera plane of each camera in the second camera array; and determining a projection coordinate of the object on the AR display surface based on the viewport direction of the user and the coordinate of the object on the camera plane of each camera in the second camera array.

Example 13 includes the AR display system of Example 11 or 12, wherein the processor circuitry is configured to determine the size of the object to be projected on the AR display surface based on a camera focal length of a camera in the second camera array, a size of the object in a camera plane of the camera in the second camera array, and a projection focal length for projecting the object onto the AR display surface.

Example 14 includes the AR display system of any of Examples 11 to 13, wherein the first camera array and the second camera array are synchronized to ensure synchronization of the user images and the object images.

Example 15 includes the AR display system of any of Examples 11 to 14, wherein the processor circuitry is further configured to: generate an AR overlay for the object based on AR content associated with the object; determine a position and a size of the AR overlay according to the position and the size of the object; and provide the AR overlay and the position and the size of the AR overlay to the display driver for driving the AR display surface to display the AR overlay.

Example 16 includes the AR display system of any of Examples 11 to 15, wherein the system is applied in a vehicle, and the user is a driver or a passenger in the vehicle.

Example 17 includes the AR display system of Example 16, wherein the AR display surface comprises at least one of a front windshield, a driver-side window, a passenger-side window, or a rear-view mirror of the vehicle.

Example 18 includes the AR display system of Example 17, wherein the front windshield of the vehicle is a see-through liquid crystal display (LCD) .

Example 19 includes a method for Augmented Reality (AR) display, comprising: determining a viewport direction of a user viewing an AR display surface, based on user images captured by a first camera array; determining a position and a size of an object of interest to be projected on an AR display surface, based on the viewport direction of the user and object images captured by a second camera array; and transmitting the position and the size of the object to a display driver for driving the AR display surface to display an AR projection of the object.

Example 20 includes the method of Example 19, wherein determining the position of the object to be projected on the AR display surface comprises: performing object detection based on an object image captured by each camera in the second camera array to identify the object and obtain a coordinate of the object on a camera plane of each camera in the second camera array; and determining a projection coordinate of the object on the AR display surface based on the viewport direction of the user and the coordinate of the object on the camera plane of each camera in the second camera array.

Example 21 includes the method of Example 20, wherein determining the projection coordinate of the object on the AR display surface comprises: performing image stereo rectification on the object image captured by each camera in the second camera array to obtain a plurality of rectified coordinates of the object in a plurality of corresponding rectified object images; interpolating the plurality of rectified coordinates of the object based on the viewport direction of the user to obtain an interpolated coordinate of the object; and performing inverse projection on the interpolated coordinate of the object to obtain the projection coordinate of the object on the AR display surface.

Example 22 includes the method of any of Examples 19 to 21, wherein the size of the object to be projected on the AR display surface is determined based on a camera focal length of a camera in the second camera array, a size of the object in a camera plane of the camera in the second camera array, and a projection focal length for projecting the object onto the AR display surface.

Example 23 includes the method of any of Examples 19 to 22, further comprising: generating an AR overlay for the object based on AR content associated with the object; determining a position and a size of the AR overlay according to the position and the size of the object; and transmitting the AR overlay and the position and the size of the AR overlay to the display driver for driving the AR display surface to display the AR overlay.

Example 24 includes a computer-readable medium having instructions stored thereon, wherein the instructions, when executed by processor circuitry, cause the processor circuitry to perform any method of Examples 19 to 23.

Example 25 includes an apparatus, comprising means for performing any method of Examples 19 to 23.
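
By way of a non-limiting illustration, the position, size and overlay determinations of Examples 19 to 23 may be sketched in Python as follows. The sketch is not taken from the disclosure and rests on several assumptions: the second camera array is reduced to two cameras whose object coordinates are already stereo-rectified; the viewport direction is reduced to a hypothetical blending weight between 0 and 1; inverse projection and size determination are both modeled by the pinhole ratio of the projection focal length to the camera focal length; and the overlay is placed directly above the object with a hypothetical height ratio. All function and parameter names are illustrative only.

```python
from typing import Tuple

Point = Tuple[float, float]
Size = Tuple[float, float]


def interpolate_coordinate(coord_a: Point, coord_b: Point, weight: float) -> Point:
    """Blend the rectified object coordinates seen by two cameras.

    `weight` is a hypothetical value in [0, 1] derived from the viewport
    direction: 0 reproduces camera A's view, 1 reproduces camera B's view.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (
        (1.0 - weight) * coord_a[0] + weight * coord_b[0],
        (1.0 - weight) * coord_a[1] + weight * coord_b[1],
    )


def focal_scale(camera_focal: float, projection_focal: float) -> float:
    """Ratio used to map camera-plane lengths to display lengths (assumed pinhole model)."""
    if camera_focal <= 0.0 or projection_focal <= 0.0:
        raise ValueError("focal lengths must be positive")
    return projection_focal / camera_focal


def project_to_display(coord: Point, scale: float, display_center: Point) -> Point:
    """Simplified stand-in for inverse projection: scale, then offset by the display center."""
    return (display_center[0] + coord[0] * scale,
            display_center[1] + coord[1] * scale)


def scale_size(size: Size, scale: float) -> Size:
    """Scale the object's camera-plane width and height to display units."""
    return (size[0] * scale, size[1] * scale)


def place_overlay(obj_pos: Point, obj_size: Size,
                  height_ratio: float = 0.5) -> Tuple[Point, Size]:
    """Center an overlay directly above the object (y is assumed to grow downward)."""
    overlay_size = (obj_size[0], obj_size[1] * height_ratio)
    overlay_y = obj_pos[1] - obj_size[1] / 2.0 - overlay_size[1] / 2.0
    return (obj_pos[0], overlay_y), overlay_size


if __name__ == "__main__":
    scale = focal_scale(camera_focal=4.0, projection_focal=8.0)
    blended = interpolate_coordinate((10.0, 5.0), (20.0, 5.0), weight=0.25)
    position = project_to_display(blended, scale, display_center=(100.0, 50.0))
    size = scale_size((3.0, 2.0), scale)
    print(position, size, place_overlay(position, size))
```

In this sketch the resulting position and size would then be handed to the display driver; a practical implementation would replace the linear blend and the focal-length ratio with the calibrated rectification and projection models of the actual camera arrays and display surface.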

Various techniques, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, non-transitory computer readable storage medium, or any other machine-readable storage medium, wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the various techniques. The non-transitory computer readable storage medium may be a computer readable storage medium that does not include a signal. In the case of program code execution on programmable computers, the computing system may include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The volatile and non-volatile memory and/or storage elements may be a RAM, EPROM, flash drive, optical drive, magnetic hard drive, solid state drive, or other medium for storing electronic data. One or more programs that may implement or utilize the various techniques described herein may use an application programming interface (API), reusable controls, and the like. Such programs may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) may be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language, and may be combined with hardware implementations. Exemplary systems or devices may include, without limitation, laptop computers, tablet computers, desktop computers, smart phones, computer terminals and servers, storage databases, and other electronics which utilize circuitry and programmable memory, such as household appliances, smart televisions, digital video disc (DVD) players, heating, ventilating, and air conditioning (HVAC) controllers, light switches, and the like.

The above detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show, by way of illustration, specific embodiments that may be practiced. These embodiments are also referred to herein as “examples.” Such examples may include elements in addition to those shown or described. However, the present inventors also contemplate examples in which only those elements shown or described are provided. Moreover, the present inventors also contemplate examples using any combination or permutation of those elements shown or described (or one or more aspects thereof), either with respect to a particular example (or one or more aspects thereof), or with respect to other examples (or one or more aspects thereof) shown or described herein.

All publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference (s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.

In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one, independent of any other instances or usages of “at least one” or “one or more.” In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim is still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.

The above description is intended to be illustrative, and not restrictive. For example, the above-described examples (or one or more aspects thereof) may be used in combination with each other. Other embodiments may be used, such as by one of ordinary skill in the art upon reviewing the above description. The Abstract is to allow the reader to quickly ascertain the nature of the technical disclosure and is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment. The scope of the embodiments should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

Claims

An apparatus for Augmented Reality (AR) display, comprising: interface circuitry; and processor circuitry coupled to the interface circuitry and configured to:

determine a viewport direction of a user viewing an AR display surface, based on user images received via the interface circuitry from a first camera array;

determine a position and a size of an object of interest to be projected on an AR display surface, based on the viewport direction of the user and object images received via the interface circuitry from a second camera array; and

provide the position and the size of the object to the interface circuitry for transmission to a display driver for driving the AR display surface to display an AR projection of the object.
The apparatus of claim 1, wherein the processor circuitry is configured to determine the position of the object to be projected on the AR display surface by:

performing object detection based on an object image captured by each camera in the second camera array to identify the object and obtain a coordinate of the object on a camera plane of each camera in the second camera array; and

determining a projection coordinate of the object on the AR display surface based on the viewport direction of the user and the coordinate of the object on the camera plane of each camera in the second camera array.
The apparatus of claim 2, wherein determining the projection coordinate of the object on the AR display surface comprises:

performing image stereo rectification on the object image captured by each camera in the second camera array to obtain a plurality of rectified coordinates of the object in a plurality of corresponding rectified object images;

interpolating the plurality of rectified coordinates of the object based on the viewport direction of the user to obtain an interpolated coordinate of the object; and

performing inverse projection on the interpolated coordinate of the object to obtain the projection coordinate of the object on the AR display surface.
The apparatus of claim 1, wherein the processor circuitry is configured to determine the size of the object to be projected on the AR display surface based on a camera focal length of a camera in the second camera array, a size of the object in a camera plane of the camera in the second camera array, and a projection focal length for projecting the object onto the AR display surface.
The apparatus of claim 1, wherein the first camera array and the second camera array are synchronized to ensure synchronization of the user images and the object images.
The apparatus of claim 1, wherein the processor circuitry is further configured to:

generate an AR overlay for the object based on AR content associated with the object;

determine a position and a size of the AR overlay according to the position and the size of the object; and

provide the AR overlay and the position and the size of the AR overlay to the interface circuitry for transmission to the display driver for driving the AR display surface to display the AR overlay.
The apparatus of any of claims 1 to 6, wherein the apparatus is applied in a vehicle, and the user is a driver or a passenger in the vehicle.
The apparatus of claim 7, wherein the AR display surface comprises at least one of a front windshield, a driver-side window, a passenger-side window, or a rear-view mirror of the vehicle.
The apparatus of claim 8, wherein the front windshield of the vehicle is a see-through liquid crystal display (LCD) .
The apparatus of claim 7, wherein the first camera array comprises at least two internal cameras installed in the vehicle for tracking a viewport of the user, and the second camera array comprises at least two external cameras installed outside the vehicle for detecting the object of interest.
An Augmented Reality (AR) display system, comprising:

an AR display surface to display an AR projection of an object of interest;

a display driver to drive AR display on the AR display surface;

a first camera array to track a viewport of a user viewing the AR display surface;

a second camera array to detect the object of interest; and

processor circuitry coupled to the display driver, the first camera array and the second camera array and configured to:

determine a viewport direction of the user based on user images captured by the first camera array;

determine a position and a size of the object of interest to be projected on the AR display surface based on the viewport direction of the user and object images captured by the second camera array; and

provide the position and the size of the object to the display driver for driving the AR display surface to display the AR projection of the object.
The AR display system of claim 11, wherein the processor circuitry is configured to determine the position of the object to be projected on the AR display surface by:

performing object detection based on an object image captured by each camera in the second camera array to identify the object and obtain a coordinate of the object on a camera plane of each camera in the second camera array; and

determining a projection coordinate of the object on the AR display surface based on the viewport direction of the user and the coordinate of the object on the camera plane of each camera in the second camera array.
The AR display system of claim 11, wherein the processor circuitry is configured to determine the size of the object to be projected on the AR display surface based on a camera focal length of a camera in the second camera array, a size of the object in a camera plane of the camera in the second camera array, and a projection focal length for projecting the object onto the AR display surface.
The AR display system of claim 11, wherein the first camera array and the second camera array are synchronized to ensure synchronization of the user images and the object images.
The AR display system of claim 11, wherein the processor circuitry is further configured to:

generate an AR overlay for the object based on AR content associated with the object;

determine a position and a size of the AR overlay according to the position and the size of the object; and

provide the AR overlay and the position and the size of the AR overlay to the display driver for driving the AR display surface to display the AR overlay.
The AR display system of any of claims 11 to 15, wherein the system is applied in a vehicle, and the user is a driver or a passenger in the vehicle.
The AR display system of claim 16, wherein the AR display surface comprises at least one of a front windshield, a driver-side window, a passenger-side window, or a rear-view mirror of the vehicle.
The AR display system of claim 17, wherein the front windshield of the vehicle is a see-through liquid crystal display (LCD) .
A method for Augmented Reality (AR) display, comprising:

determining a viewport direction of a user viewing an AR display surface, based on user images captured by a first camera array;

determining a position and a size of an object of interest to be projected on an AR display surface, based on the viewport direction of the user and object images captured by a second camera array; and

transmitting the position and the size of the object to a display driver for driving the AR display surface to display an AR projection of the object.
The method of claim 19, wherein determining the position of the object to be projected on the AR display surface comprises:

performing object detection based on an object image captured by each camera in the second camera array to identify the object and obtain a coordinate of the object on a camera plane of each camera in the second camera array; and

determining a projection coordinate of the object on the AR display surface based on the viewport direction of the user and the coordinate of the object on the camera plane of each camera in the second camera array.
The method of claim 20, wherein determining the projection coordinate of the object on the AR display surface comprises:

performing image stereo rectification on the object image captured by each camera in the second camera array to obtain a plurality of rectified coordinates of the object in a plurality of corresponding rectified object images;

interpolating the plurality of rectified coordinates of the object based on the viewport direction of the user to obtain an interpolated coordinate of the object; and

performing inverse projection on the interpolated coordinate of the object to obtain the projection coordinate of the object on the AR display surface.
The method of claim 19, wherein the size of the object to be projected on the AR display surface is determined based on a camera focal length of a camera in the second camera array, a size of the object in a camera plane of the camera in the second camera array, and a projection focal length for projecting the object onto the AR display surface.
The method of claim 19, further comprising:

generating an AR overlay for the object based on AR content associated with the object;

determining a position and a size of the AR overlay according to the position and the size of the object; and

transmitting the AR overlay and the position and the size of the AR overlay to the display driver for driving the AR display surface to display the AR overlay.
A computer-readable medium having instructions stored thereon, wherein the instructions, when executed by processor circuitry, cause the processor circuitry to perform any method of claims 19 to 23.
An apparatus, comprising means for performing any method of claims 19 to 23.