
WO2024093372A1 - Distance measurement method and device (测距方法和装置) - Google Patents

Distance measurement method and device (测距方法和装置)

Info

Publication number
WO2024093372A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
camera
feature
vehicle
map
Application number
PCT/CN2023/108397
Other languages
English (en)
French (fr)
Inventor
周磊
蔡纪源
尹昊
蔡佳
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024093372A1


Classifications

    • G: PHYSICS
        • G01: MEASURING; TESTING
            • G01S: RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
                • G01S13/00: Systems using the reflection or reradiation of radio waves, e.g. radar systems; analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
                    • G01S13/02: Systems using reflection of radio waves, e.g. primary radar systems; analogous systems
                        • G01S13/06: Systems determining position data of a target
                            • G01S13/08: Systems for measuring distance only
                    • G01S13/86: Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
                        • G01S13/867: Combination of radar systems with cameras
                    • G01S13/88: Radar or analogous systems specially adapted for specific applications
                        • G01S13/93: Radar or analogous systems specially adapted for anti-collision purposes
                            • G01S13/931: Radar or analogous systems specially adapted for anti-collision purposes of land vehicles
                • G01S15/00: Systems using the reflection or reradiation of acoustic waves, e.g. sonar systems
                    • G01S15/02: Systems using reflection of acoustic waves
                        • G01S15/06: Systems determining the position data of a target
                            • G01S15/08: Systems for measuring distance only
                    • G01S15/86: Combinations of sonar systems with lidar systems; combinations of sonar systems with systems not using wave reflection
                    • G01S15/88: Sonar systems specially adapted for specific applications
                        • G01S15/93: Sonar systems specially adapted for anti-collision purposes
                            • G01S15/931: Sonar systems specially adapted for anti-collision purposes of land vehicles
                • G01S17/00: Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
                    • G01S17/02: Systems using the reflection of electromagnetic waves other than radio waves
                        • G01S17/06: Systems determining position data of a target
                            • G01S17/08: Systems determining position data of a target for measuring distance only
                    • G01S17/86: Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
                    • G01S17/88: Lidar systems specially adapted for specific applications
                        • G01S17/93: Lidar systems specially adapted for anti-collision purposes
                            • G01S17/931: Lidar systems specially adapted for anti-collision purposes of land vehicles
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T3/00: Geometric image transformations in the plane of the image
                    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
                        • G06T3/047: Fisheye or wide-angle transformations
                • G06T7/00: Image analysis
                    • G06T7/10: Segmentation; Edge detection
                        • G06T7/13: Edge detection
                        • G06T7/136: Segmentation; Edge detection involving thresholding
                    • G06T7/50: Depth or shape recovery
                    • G06T7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
                    • G06T7/90: Determination of colour characteristics

Definitions

  • the embodiments of the present application relate to the field of autonomous driving, and in particular to distance measurement methods and devices.
  • the distance to obstacles around the vehicle is mainly sensed by distance measuring sensors (such as lidar, millimeter wave radar and ultrasonic radar) installed around the vehicle.
  • the distance measuring sensors installed around the vehicle have detection blind spots.
  • the embodiment of the present application provides a distance measurement method and device for a vehicle, which enables the vehicle to measure the distance to obstacles in the detection blind spot. To achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
  • an embodiment of the present application provides a distance measurement method, which is applied to a vehicle, wherein the vehicle includes a first camera and a second camera, and the method includes: first acquiring a first image and a second image; then acquiring a first depth map and a second depth map; and thereafter determining the distance between an object in the first image and/or the second image and the vehicle based on the first depth map and the second depth map.
  • the first image is an image captured by the first camera
  • the second image is an image captured by the second camera
  • the first camera and the second camera have a common viewing area
  • the first camera is a fisheye camera
  • the second camera is a pinhole camera
  • the first depth map is a depth map corresponding to the first image
  • the second depth map is a depth map corresponding to the second image.
  • the distance measurement method provided in the embodiment of the present application obtains a first depth map and a second depth map from a first image captured by a fisheye camera with a large field of view and a second image captured by a pinhole camera that shares a common viewing area with the fisheye camera, and then determines the distance between the objects in the first image and/or the second image and the vehicle based on the first depth map and the second depth map.
  • the fisheye camera with a large field of view can make up for the blind spots inherent in the layout of the distance measurement sensor, so that the vehicle can measure the distance to obstacles (such as suspended obstacles) in the detection blind spots.
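  • As a rough illustration of the flow described above, the following sketch shows how the two images could be turned into depth maps and per-pixel distances; all helper names (target_network, to_vehicle_frame, the camera parameter objects) are hypothetical placeholders, not part of the disclosed implementation.

```python
import numpy as np

def measure_distances(fisheye_img, pinhole_img, target_network,
                      fisheye_params, pinhole_params, to_vehicle_frame):
    # The first image (fisheye) and second image (pinhole) share a common viewing area.
    # The target network outputs the first depth map and the second depth map.
    depth_fisheye, depth_pinhole = target_network(fisheye_img, pinhole_img)

    distances = {}
    for name, depth, params in (("fisheye", depth_fisheye, fisheye_params),
                                ("pinhole", depth_pinhole, pinhole_params)):
        # Lift every pixel to a 3D point in the unified vehicle coordinate system
        # using the camera intrinsics/extrinsics bundled in `params`.
        points_vehicle = to_vehicle_frame(depth, params)          # (H, W, 3)
        # Distance of each pixel to the vehicle origin; a variant could measure
        # to the vehicle's isometric contour instead of the origin.
        distances[name] = np.linalg.norm(points_vehicle, axis=-1)
    return distances
```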
  • the first image and the second image may be input into a target network to obtain a first depth map and a second depth map.
  • the first depth map and the second depth map are obtained, and then the distance between the object in the first image and the second image and the vehicle is determined according to the first depth map and the second depth map.
  • the fisheye camera with a large field of view can make up for the blind spots inherent in the layout of the ranging sensor, so that the vehicle can measure the distance to obstacles (such as suspended obstacles) in the detection blind spots.
  • the first camera may be a fisheye camera with a field of view greater than a preset angle.
  • the preset angle is 180 degrees or 192 degrees.
  • some objects may not be in the common viewing area of the first camera and the second camera, but exist alone in the field of view of the first camera or the second camera, that is, exist alone in the first image or the second image.
  • the distance between these objects and the vehicle can be determined by the first depth map and the second depth map.
  • a first feature map and a second feature map may be obtained. Then, a third feature map may be obtained based on a first feature point of the first feature map and a plurality of target feature points corresponding to the first feature point. Then, a fourth feature map may be obtained based on a second feature point of the second feature map and a plurality of target feature points corresponding to the second feature point. Then, the first depth map and the second depth map may be obtained based on the third feature map and the fourth feature map.
  • the first feature map is a feature map corresponding to the first image
  • the second feature map is a feature map corresponding to the second image.
  • the first feature point is any feature point in the first feature map.
  • the multiple target feature points corresponding to the first feature point are feature points in the second feature map that meet the epipolar constraint with the first feature point.
  • the second feature point is any feature point in the second feature map, and the multiple target feature points corresponding to the second feature point are feature points in the first feature map that meet the epipolar constraint with the second feature point.
  • the epipolar constraint refers to the geometric constraint formed, under the projection model, by the image points and the camera optical centers when the same spatial point is projected onto two images taken from different perspectives.
  • the epipolar line is not necessarily a straight line, but may also be a curve.
  • e1 is the intersection of the line O1O2 connecting the optical centers of the two cameras with the plane of image one
  • e2 is the intersection of the line O1O2 connecting the optical centers of the two cameras with the plane of image two.
  • matching a feature point against the target feature points that satisfy the epipolar constraint in the other image of the common viewing area reduces the computational complexity of the feature matching process. In addition, since those epipolar-constrained feature points have a high similarity with the feature point being matched, fusing their features into the matched feature point increases its distinctiveness, which allows the target network to recover the corresponding depth map more accurately from the fused feature map and thus provides high ranging accuracy.
  • the first feature map corresponding to the first image is flattened into a one-dimensional feature representation as [a0, a1, ..., aH1xW1], with a length of H1xW1
  • the second feature map corresponding to the second image is flattened into a one-dimensional feature representation as [b0, b1, ... bH2xW2], with a length of H2xW2.
  • the two one-dimensional features are concatenated into a single one-dimensional feature C, which is then mapped by a network into three features, Q, K, and V, whose dimensions remain the same as those of C.
  • the feature bi with index position i in the second feature map has n feature index positions corresponding to the depth range (dmin, dmax) in the first feature map after calculation through the epipolar constraint, which are ⁇ ad0, ad1, ..., adn ⁇ respectively.
  • the above operation is performed on each feature point to obtain a one-dimensional feature C', and then it is split and converted into a third feature map corresponding to the first feature map and a fourth feature map corresponding to the second feature map according to the splicing order of C.
  • the target feature points may also include both the feature points that satisfy the epipolar constraint in the other image of the common viewing area and the feature points around those epipolar-constrained feature points.
  • the first feature map corresponding to the first image is flattened into a one-dimensional feature representation as [a0, a1, ..., aH1xW1], with a length of H1xW1
  • the second feature map corresponding to the second image is flattened into a one-dimensional feature representation as [b0, b1, ... bH2xW2], with a length of H2xW2.
  • the feature bi at index position i in the second feature map has n feature index positions in the first feature map, calculated through the epipolar constraint for the depth range (dmin, dmax), namely {ad0, ad1, ..., adn}; after dilation processing, m additional candidate points are added, so the candidates are generally represented as {ad0, ad1, ..., adn, adn+1, ..., adn+m}.
  • the element qii in Q corresponding to the pinhole image feature index position i does not need to be dot-multiplied with all elements in K of length H1xW1+H2xW2; it only needs to be dot-multiplied with the n+m elements corresponding to {ad0, ad1, ..., adn, adn+1, ..., adn+m} to obtain the dot products between qii and each of those n+m elements.
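  • The restriction of the dot products to the epipolar candidates can be sketched as a simplified single-head attention; this is only an illustration, and the candidate index lists are assumed to have been precomputed from the epipolar geometry (and, optionally, dilation):

```python
import numpy as np

def epipolar_restricted_attention(Q, K, V, candidates):
    """Q, K, V: (L, d) features for the concatenated fisheye + pinhole tokens.
    candidates[i]: indices (in the concatenated token sequence) of the
    epipolar-constrained candidates {ad0, ..., adn(+m)} for token i."""
    L, d = Q.shape
    fused = np.empty_like(V)
    for i in range(L):
        idx = candidates[i]                     # only n+m candidates, not the full length
        scores = Q[i] @ K[idx].T / np.sqrt(d)   # dot products restricted to the candidates
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        fused[i] = weights @ V[idx]             # fuse the candidate features into token i
    return fused                                # split afterwards into the 3rd/4th feature maps
```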
  • the distance between the object and the vehicle in the first image and/or the second image can be determined based on the first depth map, the second depth map, the first structural semantics, and the second structural semantics, wherein the first structural semantics is used to indicate the edges and planes of the objects in the first image, and the second structural semantics is used to indicate the edges and planes of the objects in the second image.
  • the first image and the second image have a common viewing area, and the same object may exist in the first image and the second image. Since the depth of the pixel in the depth map is relative to the camera coordinate system corresponding to the image where the pixel is located, the depth of the same pixel in different camera coordinate systems may have deviations when converted to the unified coordinate system established by the vehicle. This deviation may affect the accuracy of the distance between the edge point and the vehicle. For this reason, the same pixel of different cameras can be aligned in the unified coordinate system by characterizing the structural semantics of the edge and plane structure of each object in the image to eliminate the deviation and improve the ranging accuracy.
  • the first image, the first feature map, or the third feature map may be input into the target network to obtain the first structural semantics.
  • the second image, the second feature map, or the fourth feature map may be input into the target network to obtain the second structural semantics.
  • when the third feature map is used as input, the first structural semantics obtained may be more accurate because the third feature map is a feature map fused based on the epipolar constraint; similarly, when the fourth feature map is used as input, the second structural semantics obtained may also be more accurate.
  • based on an image or a feature map, the target network can also output the edges and planes of the objects in the corresponding image. Since the first image and the second image have a common viewing area, the same object may exist in both images. Because the depth of a pixel in a depth map is relative to the camera coordinate system of the image where the pixel is located, the depth of the same pixel in different camera coordinate systems may show deviations when converted to the unified coordinate system established for the vehicle, and this deviation may affect the accuracy of the distance between the edge point and the vehicle. For this reason, the structural semantics characterizing the edge and plane structure of each object in the image can be used to align the same pixel observed by different cameras in the unified coordinate system, eliminating the deviation and improving the ranging accuracy.
  • the first image is an image captured by the first camera at a first moment
  • the second image is an image captured by the second camera at the first moment
  • the distance between the object and the vehicle in the first image and/or the second image may be determined according to the first depth map, the second depth map, the first instance segmentation result, the second instance segmentation result, the first distance information, and the second distance information.
  • the first instance segmentation result is used to indicate the background and movable objects in the first image
  • the second instance segmentation result is used to indicate the background and movable objects in the second image
  • the first distance information is used to indicate the distance between the object and the vehicle in the third image
  • the third image is the image captured by the first camera at the second moment
  • the second distance information is used to indicate the distance between the object and the vehicle in the fourth image
  • the fourth image is the image captured by the second camera at the second moment.
  • the first image and the second image have a common viewing area, and the same object may exist in the first image and the second image. Since the depth of the pixel in the depth map is relative to the camera coordinate system corresponding to the image where the pixel is located, the depth of the same pixel in different camera coordinate systems may have deviations when converted to the unified coordinate system established by the vehicle, and the deviation may affect the accuracy of the distance between the edge point and the vehicle.
  • using the distance between the objects and the vehicle in the third image (captured by the first camera at the second moment) together with the instance segmentation result of the first image, the distance between each object in the first image and the vehicle can be corrected with reference to the fixed background shared by the two images.
  • likewise, using the distance between the objects and the vehicle in the fourth image (captured by the second camera at the second moment) together with the instance segmentation result of the second image, the distances between each object and the vehicle in the first two images can be corrected with reference to the fixed background shared by the two images, so that the same pixel observed by different cameras is aligned in the unified coordinate system, eliminating the deviation and thereby improving the ranging accuracy.
  • the first image, the first feature map, or the third feature map may be input into the target network to obtain the first instance segmentation result.
  • the second image, the second feature map, or the fourth feature map may be input into the target network to obtain the second instance segmentation result.
  • when the third feature map is used as input, the first instance segmentation result obtained may be more accurate because the third feature map is a feature map fused based on the epipolar constraint; similarly, when the fourth feature map is used as input, the second instance segmentation result obtained may also be more accurate.
  • the target network can also output, based on a feature map, the instance segmentation result of the image corresponding to that feature map. Since the first image and the second image have a common viewing area, they may contain the same object. Because the depth of a pixel in a depth map is relative to the camera coordinate system of the image where the pixel is located, deviations may appear when the depth of the same pixel in different camera coordinate systems is converted to the unified coordinate system established for the vehicle, and these deviations may affect the accuracy of the distance between the edge point and the vehicle.
  • using the distance between the objects and the vehicle in the third image (acquired by the first camera at the second moment) together with the instance segmentation result of the first image, the distance between each object in the first image and the vehicle can be corrected with reference to the fixed background shared by the two images.
  • likewise, using the distance between the objects and the vehicle in the fourth image (acquired by the second camera at the second moment) together with the instance segmentation result of the second image, the distances between each object and the vehicle in the first two images can be corrected with reference to the fixed background, so that the same pixels observed by different cameras are aligned in the unified coordinate system, eliminating the deviation and thereby improving the ranging accuracy. One plausible realization of this correction is sketched below.
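  • A minimal sketch of such a background-based correction, assuming the earlier frame's distance information has already been warped into the current view and compensated for ego-motion (neither step is shown, and the function names are placeholders):

```python
import numpy as np

def correct_with_background(distance_current, distance_previous, is_background):
    """distance_current: per-pixel distances derived from the current depth map.
    distance_previous: distance information from the frame at the second moment,
    assumed to be already warped into the current view.
    is_background: boolean mask of fixed-background pixels taken from the
    instance segmentation result."""
    # Once ego-motion is compensated, the fixed background should sit at the same
    # distance in both frames, so the median residual over background pixels
    # estimates the per-camera offset; the median is robust to moving objects.
    offset = np.median(distance_current[is_background] - distance_previous[is_background])
    # Removing the offset aligns this camera's distances in the unified coordinate system.
    return distance_current - offset
```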
  • the first image may be calibrated according to the internal parameters of the first camera and the preset internal parameters of the fisheye camera, and then a corresponding depth map is obtained based on the calibrated first image to determine the distance between the object and the vehicle.
  • the first image captured by the first camera can be calibrated by presetting the internal parameters of the fisheye camera to eliminate the deviations and further improve the ranging accuracy.
  • the second image may be calibrated according to the internal parameters of the second camera and the preset pinhole camera internal parameters, and then a corresponding depth map is obtained based on the calibrated second image to determine the distance between the object and the vehicle.
  • the second image captured by the second camera can be calibrated by presetting the internal parameters of the pinhole camera to eliminate the deviations and further improve the ranging accuracy.
  • the objects in the first image and the second image may be three-dimensionally reconstructed according to the distances between the objects in the first image and the second image and the vehicle, and the three-dimensionally reconstructed objects may be displayed.
  • performing three-dimensional reconstruction of the objects in the first image and/or the second image based on the distance between the objects in the first image and/or the second image and the vehicle and displaying the three-dimensionally reconstructed objects can help users to more intuitively understand the positional relationship between the objects in the first image and/or the second image and the vehicle.
  • prompt information may be displayed according to the distance between the object in the first image and the second image and the vehicle.
  • a collision warning prompt message is displayed to remind the user that the vehicle may collide with the object in the first image.
  • distance prompt information is displayed to remind the user that the distance between the vehicle and the object in the second image is relatively close.
  • the distance between the object in the first image and/or the second image and the vehicle is the distance between the object in the first image and/or the second image and the isometric contour of the vehicle.
  • the isometric contour of the vehicle is an isometric contour set according to the outer contour of the vehicle.
  • the isometric contour may be an isometric line extending outward from the outer contour line of the vehicle body in a two-dimensional (2D) top view, or may be an isometric surface extending outward from the three-dimensional (3D) outer contour of the vehicle body.
  • the vehicle isometric contour may be adjusted according to the distance between the object and the vehicle in the first image and the second image.
  • the color of the isometric contour is adjusted to yellow.
  • the color of the isometric contour is adjusted to red.
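  • A minimal sketch of such a color adjustment, assuming two illustrative distance thresholds (the application does not specify the threshold values or the exact color mapping):

```python
def contour_color(min_distance_m, warn_threshold_m=1.0, danger_threshold_m=0.3):
    """Pick the display color of the vehicle's isometric (equidistant) contour
    from the smallest object-to-contour distance; thresholds are placeholders."""
    if min_distance_m < danger_threshold_m:
        return "red"      # object very close to the vehicle
    if min_distance_m < warn_threshold_m:
        return "yellow"   # object relatively close to the vehicle
    return "green"        # assumed default color when no object is nearby
```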
  • the first camera may be a rear-view fisheye camera
  • the second camera may be a rear-view pinhole camera
  • the first camera may be a forward-looking fisheye camera
  • the second camera may be a forward-looking pinhole camera
  • the first camera may be a left-looking fisheye camera
  • the second camera may be a left-looking pinhole camera
  • the first camera may be a right-looking fisheye camera
  • the second camera may be a right-looking pinhole camera
  • an embodiment of the present application provides a distance measuring device, which is applied to a vehicle including a first camera and a second camera, and the distance measuring device includes: an acquisition unit, a network unit and a determination unit.
  • the acquisition unit is used to acquire a first image and a second image, the first image is an image captured by the first camera, the second image is an image captured by the second camera, the first camera and the second camera have a common viewing area, the first camera is a fisheye camera, and the second camera is a pinhole camera.
  • the network unit is used to acquire a first depth map and a second depth map, the first depth map is a depth map corresponding to the first image, and the second depth map is a depth map corresponding to the second image.
  • the determination unit is used to determine the distance between the object in the first image and/or the second image and the vehicle based on the first depth map and the second depth map.
  • the network unit is specifically configured to: obtain a first feature map and a second feature map, wherein the first feature map is a feature map corresponding to the first image, and the second feature map is a feature map corresponding to the second image.
  • a third feature map is obtained based on a first feature point of the first feature map and a plurality of target feature points corresponding to the first feature point, wherein the first feature point is an arbitrary feature point in the first feature map, and the plurality of target feature points corresponding to the first feature point are feature points in the second feature map that meet the epipolar constraint with the first feature point.
  • a fourth feature map is obtained based on a second feature point of the second feature map and a plurality of target feature points corresponding to the second feature point, wherein the second feature point is an arbitrary feature point in the second feature map, and the plurality of target feature points corresponding to the second feature point are feature points in the first feature map that meet the epipolar constraint with the second feature point.
  • the first depth map and the second depth map are obtained based on the third feature map and the fourth feature map.
  • the determination unit is specifically used to determine the distance between the object and the vehicle in the first image and/or the second image based on the first depth map, the second depth map, the first structural semantics, and the second structural semantics, wherein the first structural semantics is used to indicate the edges and planes of the objects in the first image, and the second structural semantics is used to indicate the edges and planes of the objects in the second image.
  • the first image is an image captured by the first camera at a first moment.
  • the determination unit is specifically used to determine the distance between the object and the vehicle in the first image and/or the second image according to the first depth map, the second depth map, the first instance segmentation result, the second instance segmentation result, the first distance information and the second distance information, the first instance segmentation result is used to indicate the background and movable objects in the first image, the second instance segmentation result is used to indicate the background and movable objects in the second image, the first distance information is used to indicate the distance between the object and the vehicle in a third image, the third image is an image captured by the first camera at the second moment, the second distance information is used to indicate the distance between the object and the vehicle in a fourth image, and the fourth image is an image captured by the second camera at the second moment.
  • the acquisition unit is further configured to: calibrate the first image according to an intrinsic parameter of the first camera and a preset fisheye camera intrinsic parameter.
  • the acquisition unit is further used to: calibrate the second image according to an intrinsic parameter of the second camera and a preset pinhole camera intrinsic parameter.
  • the determination unit is further used to: perform three-dimensional reconstruction of the objects in the first image and the second image according to the distance between the objects in the first image and the second image and the vehicle; and display the three-dimensionally reconstructed objects.
  • the determining unit is further configured to: display prompt information according to the distance between the object and the vehicle in the first image and the second image.
  • an embodiment of the present application further provides a ranging device, which includes one or more processors; when the one or more processors execute program code or instructions, the method described in the above first aspect or any possible implementation thereof is implemented.
  • the distance measuring device may further include one or more memories, and the one or more memories are used to store the program code or instruction.
  • an embodiment of the present application further provides a chip, comprising: an input interface, an output interface, and one or more processors.
  • the chip also includes a memory.
  • the one or more processors are used to execute the code in the memory, and when the one or more processors execute the code, the chip implements the method described in the first aspect or any possible implementation thereof.
  • the above chip may also be an integrated circuit.
  • an embodiment of the present application further provides a computer-readable storage medium for storing a computer program, wherein the computer program includes instructions for implementing the method described in the above-mentioned first aspect or any possible implementation thereof.
  • an embodiment of the present application further provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to implement the method described in the first aspect or any possible implementation thereof.
  • the embodiment of the present application also provides a distance measuring device, including: an acquisition unit, a network unit and a determination unit.
  • the acquisition unit is used to acquire a first image and a second image, the first image is an image captured by a first camera, the second image is an image captured by a second camera, the first camera and the second camera have a common viewing area, the first camera is a fisheye camera, and the second camera is a pinhole camera;
  • the network unit is used to acquire a first depth map and a second depth map, the first depth map is a depth map corresponding to the first image, and the second depth map is a depth map corresponding to the second image;
  • the determination unit is used to determine the distance between the object in the first image and/or the second image and the vehicle according to the first depth map and the second depth map.
  • the above-mentioned distance measuring device is also used to implement the method described in the above-mentioned first aspect or any possible implementation thereof.
  • an embodiment of the present application provides a ranging system, including one or more first cameras, one or more second cameras, and a computing device, wherein the one or more first cameras are used to acquire a first image, and the one or more second cameras are used to acquire a second image.
  • the computing device is used to perform ranging based on the first image and the second image using the method described in the first aspect or any possible implementation thereof.
  • an embodiment of the present application provides a vehicle, which includes one or more fisheye cameras, one or more pinhole cameras and one or more processors, and the one or more processors implement the method described in the first aspect or any possible implementation thereof.
  • the vehicle also includes a display screen for displaying information such as road conditions, distance prompt information, a two-dimensional/three-dimensional model of the vehicle or a two-dimensional/three-dimensional model of an obstacle.
  • the vehicle also includes a speaker for playing voice prompt information, and the voice prompt information may include information such as danger prompts and/or the distance between the vehicle and the obstacle. For example, when the distance between the vehicle and the obstacle is less than a preset threshold, the voice prompts the driver to pay attention to the existence of the obstacle.
  • the vehicle can remind the driver using only the prompt information shown on the display screen, using only the voice prompt information, or using a combination of the on-screen display and the voice prompt. For example, when the distance between the vehicle and the obstacle is lower than a first threshold, the prompt information is only shown on the display screen; when the distance is lower than a second threshold (the second threshold being lower than the first threshold), a voice prompt reminds the driver to pay attention to the obstacle in addition to the displayed prompt information, thereby attracting the driver's attention. This two-level strategy is sketched below.
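  • A minimal sketch of the two-threshold display/voice strategy described above; the threshold values and the prompt functions are placeholders, not an API disclosed by the application:

```python
def remind_driver(distance_m, first_threshold_m, second_threshold_m,
                  show_on_display, play_voice_prompt):
    """second_threshold_m is assumed to be lower than first_threshold_m."""
    if distance_m < second_threshold_m:
        # Closest range: on-screen prompt plus voice prompt to attract attention.
        show_on_display(f"Obstacle at {distance_m:.1f} m")
        play_voice_prompt("Attention: obstacle close to the vehicle")
    elif distance_m < first_threshold_m:
        # Intermediate range: on-screen prompt only.
        show_on_display(f"Obstacle at {distance_m:.1f} m")
```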
  • the distance measuring device, computer storage medium, computer program product and chip provided in this embodiment are all used to execute the method provided above. Therefore, the beneficial effects that can be achieved can refer to the beneficial effects in the method provided above and will not be repeated here.
  • FIG1 is a schematic diagram of an image provided in an embodiment of the present application.
  • FIG2 is a schematic diagram of the structure of a distance measurement system provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of the structure of an image acquisition system provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of a flow chart of a distance measurement method provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of the structure of a target network provided in an embodiment of the present application.
  • FIG6 is a schematic diagram of extracting image features provided by an embodiment of the present application.
  • FIG7 is a schematic diagram of aligning pixels provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of a ranging scenario provided in an embodiment of the present application.
  • FIG9 is a schematic diagram of a display interface provided in an embodiment of the present application.
  • FIG10 is a schematic diagram of another display interface provided in an embodiment of the present application.
  • FIG11 is a schematic diagram of another ranging scenario provided in an embodiment of the present application.
  • FIG12 is a schematic diagram of another display interface provided in an embodiment of the present application.
  • FIG13 is a schematic diagram of another display interface provided in an embodiment of the present application.
  • FIG14 is a schematic diagram of another display interface provided in an embodiment of the present application.
  • FIG15 is a schematic diagram of another display interface provided in an embodiment of the present application.
  • FIG16 is a schematic diagram of another display interface provided in an embodiment of the present application.
  • FIG17 is a schematic diagram of another display interface provided in an embodiment of the present application.
  • FIG18 is a schematic diagram of another display interface provided in an embodiment of the present application.
  • FIG19 is a schematic diagram of the structure of a distance measuring device provided in an embodiment of the present application.
  • FIG20 is a schematic diagram of the structure of a chip provided in an embodiment of the present application.
  • FIG. 21 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application.
  • "A and/or B" in this document merely describes an association relationship between associated objects, indicating that three relationships may exist.
  • "A and/or B" can mean: A exists alone, A and B exist at the same time, or B exists alone.
  • the terms "first" and "second" in the description and drawings of the embodiments of the present application are used to distinguish different objects, or to distinguish different treatments of the same object, rather than to describe a specific order of objects.
  • Image distortion is caused by deviations in lens manufacturing precision and in the assembly process, which introduce distortion and deform the original image.
  • in general, camera images must be undistorted before use, especially fisheye images. If undistortion is not performed, the size distribution of targets in the camera's original image is uneven, which strongly interferes with the perception algorithm; therefore the original image must be undistorted. However, information is lost after the original image is undistorted, and this loss of information is critical in autonomous driving, carrying the potential risk of causing traffic accidents.
  • Epipolar constraint describes the constraints formed by the image point and the camera optical center under the projection model when the same point is projected onto two images with different perspectives.
  • for a spatial point P (or P') whose image point in image 1 is P1, its image point P2 in image 2 must lie on the epipolar line e2P2; this is expressed as the epipolar constraint.
  • the epipolar line is not necessarily a straight line, but may also be a curve.
  • e1 is the intersection of the line connecting the optical centers O1O2 of the two corresponding cameras of the two images and the plane of image 1
  • e2 is the intersection of the line connecting the optical centers O1O2 of the two corresponding cameras of the two images and the plane of image 2.
  • Common visual area refers to the area of visual field that has intersection or overlap.
  • Feature fusion: take the fusion of feature maps 2 and 3, corresponding to two images, as an example.
  • Feature map 2 is flattened into a one-dimensional feature representation as [a0, a1, ..., aH1xW1], with a length of H1xW1
  • feature map 3 is flattened into a one-dimensional feature representation as [b0, b1, ...bH2xW2], with a length of H2xW2.
  • the two one-dimensional features are concatenated into a feature C; an MLP network is then used to map C into three features, namely Q, K, and V, whose dimensions remain the same as those of C.
  • the three mapped features are then input into the Transformer network.
  • the fused feature C' is obtained, and then the fused feature C' is split into fused feature map 2 corresponding to feature map 2 and fused feature map 3 corresponding to feature map 3.
  • QKT (the product of Q and the transpose of K) represents the pairwise vector dot products.
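  • The fusion steps above can be summarized in the following PyTorch-style sketch; the single linear mappings and the single attention layer are simplifications of the MLP and Transformer networks mentioned in the text, so this is an illustrative sketch rather than the disclosed network:

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Flatten two feature maps, concatenate them, map to Q/K/V, apply attention,
    then split the fused sequence back into two fused feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.to_q = nn.Linear(channels, channels)
        self.to_k = nn.Linear(channels, channels)
        self.to_v = nn.Linear(channels, channels)

    def forward(self, feat2, feat3):              # feat2: (C, H1, W1), feat3: (C, H2, W2)
        c, h1, w1 = feat2.shape
        _, h2, w2 = feat3.shape
        a = feat2.reshape(c, h1 * w1).T           # flattened [a0, ..., aH1xW1]
        b = feat3.reshape(c, h2 * w2).T           # flattened [b0, ..., bH2xW2]
        cat = torch.cat([a, b], dim=0)            # concatenated one-dimensional feature C
        q, k, v = self.to_q(cat), self.to_k(cat), self.to_v(cat)
        attn = torch.softmax(q @ k.T / c ** 0.5, dim=-1)   # QK^T similarities
        fused = attn @ v                          # fused feature C'
        fused2, fused3 = fused[:h1 * w1], fused[h1 * w1:]  # split in the splicing order of C
        return fused2.T.reshape(c, h1, w1), fused3.T.reshape(c, h2, w2)
```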
  • the depth estimation of multiple images with overlapping areas is mainly performed through the disparity estimation method.
  • the disparity estimation method needs to determine the pixel point corresponding to each pixel point in the overlapping area of the image on the other image and calculate the disparity between the pixel point and the corresponding pixel point, and then calculate the depth of the pixel point through the disparity between the pixel point and the corresponding pixel point.
  • the disparity estimation method can calculate the depth of all pixels in the overlapping area from their disparity; however, pixels outside the overlapping area of an image have no corresponding pixels in the other image, so their disparity cannot be obtained and their depth cannot be calculated.
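  • For reference, the standard relationship used by disparity-based methods on a rectified stereo pair is depth = focal_length x baseline / disparity; the sketch below is generic background, not a formula taken from the application:

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Rectified-stereo relation depth = f * B / d. Pixels without a match
    (e.g. outside the overlapping area, disparity <= 0) get infinite depth,
    reflecting that no depth can be computed for them."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(disparity_px > 0,
                        focal_length_px * baseline_m / disparity_px,
                        np.inf)
```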
  • an embodiment of the present application provides a distance measurement method that can perform depth estimation on multiple images with overlapping areas.
  • the method is applicable to a distance measurement system, and FIG2 shows a possible existence form of the distance measurement system.
  • the distance measurement system includes an image acquisition system and a computer device.
  • the image acquisition system and the computer device can communicate in a wired or wireless manner.
  • An image acquisition system is used to acquire a first image and a second image having a common viewing area.
  • a computer device is used to determine the distance between the object in the first image and the second image and the vehicle based on the first image and the second image having a common viewing area acquired by the image acquisition system.
  • the image acquisition system may be composed of a plurality of cameras having a common viewing area.
  • the image acquisition system includes a first camera, which is a camera with a field of view greater than a preset angle.
  • the preset angle may be 180 degrees or 192 degrees.
  • the multiple cameras may be cameras of the same specification.
  • the image acquisition system can be composed of multiple fisheye cameras.
  • the multiple cameras may be cameras of different specifications.
  • the image acquisition system may be composed of one or more fisheye cameras and one or more pinhole cameras.
  • the image acquisition system may be arranged on a vehicle.
  • the above-mentioned vehicle may be a land vehicle or a non-land vehicle.
  • the above-mentioned land vehicles may include a compact car, a full-size sport utility vehicle (SUV), a van, a truck, a bus, a motorcycle, a bicycle, a scooter, a train, a snowmobile, a wheeled vehicle, a tracked vehicle or a rail-mounted vehicle.
  • Such non-land vehicles may include drones, airplanes, hovercraft, spacecraft, ships, and sailboats.
  • the image acquisition system can be composed of four fisheye cameras (front view, rear view, left view and right view) and six pinhole cameras (front view, rear view, left and right front side view, left and right rear side view) arranged around the vehicle body, and there is a common viewing area between the fisheye cameras and the pinhole cameras.
  • the image acquisition system may be composed of a front-view pinhole camera, a front-view fisheye camera, a rear-view pinhole camera, and a rear-view fisheye camera arranged around the vehicle body.
  • the field of view of the front-view pinhole camera and the field of view of the front-view fisheye camera have a common viewing area
  • the field of view of the rear-view pinhole camera and the field of view of the rear-view fisheye camera have a common viewing area.
  • the image acquisition system may be composed of a rear-view fisheye camera and a rear-view pinhole camera, and the two have a common viewing area.
  • the image acquisition system may be composed of a forward-looking fisheye camera and a forward-looking pinhole camera, and the two have a common viewing area.
  • the image acquisition system may be composed of a left-viewing fisheye camera and a left-viewing pinhole camera, and the two cameras have a common viewing area.
  • the image acquisition system may be composed of a right-viewing fisheye camera and a right-viewing pinhole camera, and the two cameras have a common viewing area.
  • the computer device may be a terminal or a server.
  • the terminal may be a vehicle-mounted terminal, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a smart TV, etc., but is not limited thereto.
  • the server may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers, or a cloud server providing cloud computing services.
  • the image acquisition system and the computer device can communicate via wired or wireless means.
  • the above wireless method can achieve communication through a communication network, which can be a local area network, a wide area network relayed through a relay device, or a combination of a local area network and a wide area network.
  • for example, the communication network can be a Wi-Fi hotspot network, a Wi-Fi P2P network, a Bluetooth network, a ZigBee network, a near field communication (NFC) network, a dedicated short-range communication (DSRC) network, or a possible future general short-range communication network, etc.
  • the above-mentioned communication network can also be a third-generation mobile communication technology (3G) network, a fourth-generation mobile communication technology (4G) network, a fifth-generation mobile communication technology (5G) network, a public land mobile network (PLMN) or the Internet, etc.; the embodiments of the present application are not limited to this.
  • the ranging method provided in the embodiment of the present application is introduced below in conjunction with the ranging system shown in FIG. 2 .
  • FIG4 shows a distance measurement method provided by an embodiment of the present application, which is applied to a vehicle, wherein the vehicle includes a first camera and a second camera.
  • the method can be executed by a computer device in the distance measurement system. As shown in FIG4 , the method includes:
  • a computer device acquires a first image and a second image.
  • the first image is an image captured by a first camera
  • the second image is an image captured by a second camera
  • the first camera is a fisheye camera
  • the second camera is a pinhole camera.
  • the common viewing area refers to the area where the visual field has an intersecting or overlapping range.
  • the first camera may be a camera with a field of view greater than a preset angle.
  • the preset angle may be 180 degrees or 192 degrees.
  • the computer device acquires, from the image acquisition system, the first image and the second image captured by the cameras.
  • the computer device may acquire a first image and a second image captured by a forward-looking fisheye camera and a forward-looking pinhole camera arranged on the vehicle body, wherein the forward-looking fisheye camera and the forward-looking pinhole camera have a common viewing area.
  • the computer device may also acquire multiple groups of images, each group of images includes a first image and a second image, and cameras that capture the first image and the second image of the same group have a common viewing area.
  • the computer device obtains 4 groups of images taken by 8 cameras from an image acquisition system consisting of 4 fisheye cameras (front view, rear view, left view and right view) and 4 pinhole cameras (front view, rear view, left view and right view) arranged around the vehicle body.
  • the first group of images includes the first image 1 taken by the front fisheye camera and the second image 1 taken by the front pinhole camera, and there is a common viewing area between the front fisheye camera and the front pinhole camera.
  • the second group of images includes the first image 2 taken by the rear fisheye camera and the second image 2 taken by the rear pinhole camera, and there is a common viewing area between the rear fisheye camera and the rear pinhole camera.
  • the third group of images includes the first image 3 taken by the left fisheye camera and the second image 3 taken by the left pinhole camera, and there is a common viewing area between the left fisheye camera and the left pinhole camera.
  • the fourth group of images includes the first image 4 taken by the right fisheye camera and the second image 4 taken by the right pinhole camera, and there is a common viewing area between the right fisheye camera and the right pinhole camera.
  • the computer device may acquire multiple groups of images captured by the image acquisition system within a period of time, wherein the images in the same group share the same acquisition time, and the acquisition times of the first images in different groups are different.
  • the computer device can acquire 5 groups of images, the acquisition time of the first image and the second image in the first group of images is both 10:00:00, the acquisition time of the first image and the second image in the second group of images is both 10:00:01, the acquisition time of the first image and the second image in the third group of images is both 10:00:02, the acquisition time of the first image and the second image in the fourth group of images is both 10:00:03, and the acquisition time of the first image and the second image in the fifth group of images is both 10:00:04.
  • the computer device may also calibrate the coordinates of the pixel points in each of the above images according to the intrinsic parameters of the camera corresponding to each image and the preset camera intrinsic parameters.
  • the first image is calibrated according to the intrinsic parameters of the first camera and the preset fisheye camera intrinsic parameters.
  • the second image is calibrated according to the intrinsic parameters of the second camera and the preset pinhole camera intrinsic parameters.
  • the camera intrinsic parameters of a fisheye camera include focal length (fx, fy), imaging center position (cx, cy) and distortion parameters (k1, k2, k3, k4)
  • the camera intrinsic parameters of the pinhole camera include the focal length (fx, fy), the imaging center position (cx, cy) and the corresponding distortion parameters.
  • the distortion parameters include radial distortion coefficients (k1, k2, k3) and tangential distortion coefficients (p1, p2).
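  • For context, the usual pinhole projection with such radial (k1, k2, k3) and tangential (p1, p2) coefficients follows the standard Brown–Conrady model; the sketch below is generic background rather than a quotation of the application:

```python
def distort_pinhole(x, y, k1, k2, k3, p1, p2):
    """Apply radial and tangential distortion to the normalized coordinates (x, y)
    of a point on the unit depth plane."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

def to_pixels(x_d, y_d, fx, fy, cx, cy):
    """Map distorted normalized coordinates to pixel coordinates with the
    focal length (fx, fy) and imaging center (cx, cy)."""
    return fx * x_d + cx, fy * y_d + cy
```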
  • the camera extrinsic parameters are defined relative to a preset coordinate system; they consist of the three-dimensional position offset (x, y, z) and the angles (yaw, pitch, roll) between the camera optical axis and the coordinate axes.
  • the preset coordinate system can be the body coordinate system established relative to the vehicle.
  • by inverse projection, the distorted point coordinates (xdistorted, ydistorted) on the unit depth plane are obtained from the pixel coordinates and the intrinsic parameters of the camera, and the distortion is removed to obtain the undistorted point coordinates (x, y) on the unit depth plane.
  • the coordinates (x, y) of the undistorted point on the unit depth plane are then passed through the imaging process of the corresponding camera of the template camera system (distortion followed by projection transformation); that is, the coordinates (u', v') of the calibrated pixel are obtained from the undistorted point coordinates and the preset camera intrinsic parameters.
  • in this way, a correspondence between the original pixel coordinates and the calibrated pixel coordinates is established, and each pixel in the first image and the second image is converted according to this correspondence, so that the camera images (the first image and the second image) are calibrated into images of the template camera.
  • an interpolation algorithm may be used to smooth the calibration image.
  • the image can be calibrated into the image of the template camera system.
  • the pixels of the camera image are back-projected onto the unit depth plane to simulate the real light entry path, and then projected onto the template camera.
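  • One way to implement this calibration into a template camera is to precompute a pixel-to-pixel remapping and resample the image with interpolation, which also provides the smoothing mentioned above. The OpenCV-based sketch below assumes fisheye intrinsics K and distortion D for both the real camera and the template camera; it is an illustration, not the application's exact procedure:

```python
import cv2
import numpy as np

def calibrate_to_template(img, K_real, D_real, K_template, D_template, size):
    """Remap an image from the real fisheye camera (K_real, D_real) into the
    image a preset 'template' fisheye camera (K_template, D_template) would see."""
    w, h = size
    # Pixel grid of the target (template) image.
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    grid = np.stack([u, v], axis=-1).reshape(-1, 1, 2)
    # Back-project the template pixels onto the unit depth plane (undistorted coordinates).
    normalized = cv2.fisheye.undistortPoints(grid, K_template, D_template)
    # Re-project through the real camera's intrinsics (distortion included) to find
    # where each template pixel comes from in the original image.
    src = cv2.fisheye.distortPoints(normalized, K_real, D_real).reshape(h, w, 2)
    map_x = src[..., 0].astype(np.float32)
    map_y = src[..., 1].astype(np.float32)
    # Bilinear interpolation smooths the calibrated image.
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```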
  • the computer device obtains a first depth map and a second depth map.
  • the first depth map is the depth map corresponding to the first image
  • the second depth map is the depth map corresponding to the second image.
  • a computer device may obtain a first feature map and a second feature map. Then, a third feature map is obtained based on the first feature point of the first feature map and the multiple target feature points corresponding to the first feature point. Then, a fourth feature map is obtained based on the second feature point of the second feature map and the multiple target feature points corresponding to the second feature point. Then, the first depth map and the second depth map are obtained based on the third feature map and the fourth feature map.
  • the first feature map is a feature map corresponding to the first image
  • the second feature map is a feature map corresponding to the second image.
  • the first feature point is any feature point in the first feature map
  • the multiple target feature points corresponding to the first feature point are feature points in the second feature map that meet the epipolar constraint with the first feature point.
  • the second feature point is any feature point in the second feature map, and the multiple target feature points corresponding to the second feature point are feature points in the first feature map that meet the epipolar constraint with the second feature point.
  • performing feature matching through the target feature points that satisfy the epipolar constraint in the image sharing a common viewing area with the image of the feature point can, on the one hand, reduce the amount of computation in the feature matching process. On the other hand, since those epipolar-constrained feature points have a high similarity with the feature point, matching through them allows the matched feature point to fuse the features of the target feature points, which increases the distinctiveness of the feature point, enables the corresponding depth map to be obtained more accurately from the fused feature maps, and thus improves the ranging accuracy.
  • the computer device may input the first image and the second image into the target network to obtain the first depth map and the second depth map.
  • the target network may include a first subnetwork, and the first subnetwork is used to output a feature map of an image based on an input image.
  • a first image with a size of HxW taken by a fisheye camera can be input into the first subnetwork to obtain a first feature map with a size of H1xW1 for characterizing features of the first image
  • a second image with a size of H’xW’ taken by a pinhole camera can be input into the first subnetwork to obtain a second feature map with a size of H2xW2 for characterizing features of the second image.
  • the same first subnetwork may be used to extract features of images captured by different cameras to obtain feature maps corresponding to the images.
  • the ResNet-50 feature extraction network is used to extract the features of the images captured by the pinhole camera and the fisheye camera to obtain the feature maps corresponding to the images.
  • different feature extraction networks may be used for different images to extract features of the images to obtain feature maps corresponding to the images.
  • the ResNet-50 feature extraction network is used to extract the features of the image to obtain the feature map corresponding to the image.
  • the ResNet-50 feature extraction network with deformable convolution is used to extract the features of the image to obtain the feature map corresponding to the image.
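  • As an illustration of such a first subnetwork, the sketch below uses the torchvision ResNet-50 trunk as a shared feature extractor for both the fisheye and the pinhole image; the deformable-convolution variant mentioned above is omitted, and the class name is hypothetical.

```python
import torch.nn as nn
import torchvision

class SharedBackbone(nn.Module):
    """Shared first subnetwork: a ResNet-50 trunk applied to both camera images."""
    def __init__(self):
        super().__init__()
        trunk = torchvision.models.resnet50()
        # Drop the average-pooling and classification layers, keep the feature trunk.
        self.features = nn.Sequential(*list(trunk.children())[:-2])

    def forward(self, img):          # img: (B, 3, H, W)
        return self.features(img)    # feature map: (B, 2048, H/32, W/32)
```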
  • the first feature map and the second feature map may be size-aligned.
  • the first feature map is a feature map of a first image captured by a fisheye camera
  • the second feature map is a feature map of a second image captured by a pinhole camera.
  • the focal length of the fisheye camera is N
  • the focal length of the pinhole camera is 4N, that is, the focal length of the pinhole camera is 4 times that of the fisheye camera.
  • the first feature map can be enlarged by 4 times to align the sizes of the first feature map and the second feature map.
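  • A minimal sketch of this size alignment, assuming feature maps stored as PyTorch tensors of shape (B, C, H, W); the 4x factor follows from the focal-length ratio in the example above.

```python
import torch
import torch.nn.functional as F

def align_feature_maps(fisheye_feat: torch.Tensor, focal_ratio: int = 4) -> torch.Tensor:
    """Upsample the fisheye feature map by the pinhole/fisheye focal-length ratio
    so that its size matches the pinhole feature map."""
    return F.interpolate(fisheye_feat, scale_factor=focal_ratio,
                         mode="bilinear", align_corners=False)
```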
  • the target network may include a second subnetwork, and the second subnetwork is used to output a corresponding fused feature map according to the input feature map.
  • the second subnetwork can obtain a third feature map (i.e., a fused feature map of the first feature map) based on the first feature point of the first feature map and the multiple target feature points corresponding to the first feature point.
  • a fourth feature map (i.e., a fused feature map of the second feature map) is obtained based on the second feature point of the second feature map and the multiple target feature points corresponding to the second feature point.
  • the first feature map corresponding to the first image is flattened into a one-dimensional feature representation as [a0, a1, ..., aH1xW1], with a length of H1xW1
  • the second feature map corresponding to the second image is flattened into a one-dimensional feature representation as [b0, b1, ...bH2xW2], with a length of H2xW2.
  • the two flattened features are concatenated into a one-dimensional feature C = [a0, a1, ..., aH1xW1, b0, b1, ..., bH2xW2], and a network is then used to map the one-dimensional feature C into three features, Q, K, and V.
  • for each feature point, attention is computed over its epipolar candidate positions (dot product with the corresponding elements of K, softmax, and weighted sum over the corresponding elements of V) to obtain its fused feature; performing this operation on each feature point yields a one-dimensional feature C', which is then split according to the concatenation order of C and converted into a third feature map corresponding to the first feature map and a fourth feature map corresponding to the second feature map.
  • the target feature points may also be the feature points that satisfy the epipolar constraint in the image sharing a common viewing area with the image of the feature point, together with the feature points around those epipolar-constrained feature points.
  • the first feature map corresponding to the first image is flattened into a one-dimensional feature representation as [a0, a1, ..., aH1xW1], with a length of H1xW1
  • the second feature map corresponding to the second image is flattened into a one-dimensional feature representation as [b0, b1, ... bH2xW2], with a length of H2xW2.
  • the network is then used to map the one-dimensional feature C into three features, namely Q, K, and V. Assuming that, after calculation through the epipolar constraint, the feature bi at index position i in the second feature map has n corresponding feature index positions {ad0, ad1, ..., adn} in the first feature map for the depth range (dmin, dmax), the element qi at index position i in Q only needs to be dot-multiplied with the n elements of K corresponding to {ad0, ad1, ..., adn} rather than with all H1xW1+H2xW2 elements of K; a softmax over the resulting products gives S = [s1, s2, ..., sn], and a weighted sum with the n elements of V corresponding to {ad0, ad1, ..., adn} gives the fused feature bi'.
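  • A minimal PyTorch sketch of this epipolar-restricted Q/K/V fusion is given below; the candidate index table epi_idx is assumed to be precomputed from the epipolar constraint for the depth range (dmin, dmax), the class and argument names are illustrative, and only the pinhole-side fused features bi' are computed (the fisheye side is symmetric).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EpipolarFusion(nn.Module):
    """Q/K/V attention restricted to epipolar candidate positions."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, feat_a, feat_b, epi_idx):
        # feat_a: (H1*W1, C) flattened fisheye features [a0 .. aH1xW1]
        # feat_b: (H2*W2, C) flattened pinhole features [b0 .. bH2xW2]
        # epi_idx: (H2*W2, n) long tensor of candidate indices {ad0 .. adn} into feat_a
        c = torch.cat([feat_a, feat_b], dim=0)      # one-dimensional feature C
        q, k, v = self.q(c), self.k(c), self.v(c)
        q_b = q[feat_a.shape[0]:]                   # queries of the pinhole part
        k_cand = k[epi_idx]                         # (H2*W2, n, C)
        v_cand = v[epi_idx]
        # Dot product only with the n epipolar candidates, then softmax + weighted sum.
        attn = F.softmax((q_b.unsqueeze(1) * k_cand).sum(-1), dim=-1)   # (H2*W2, n)
        b_fused = (attn.unsqueeze(-1) * v_cand).sum(dim=1)              # bi'
        return b_fused
```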
  • the target network may include a third subnetwork, and the third subnetwork is used to output a corresponding depth map according to the input feature map or fused feature map.
  • the third feature map and the fourth feature map may be input into a third subnetwork of the target network to obtain a first depth map and a second depth map.
  • the first feature map and the second feature map may be input into a third subnetwork of the target network to obtain a first depth map and a second depth map.
  • the third sub-network is trained using a first training data sample set, and the first training data sample set includes a plurality of images and depth maps corresponding to the plurality of images.
  • a ground-truth collection vehicle equipped with 360-degree laser scanning can be used to obtain synchronized frames of point cloud and image data. The depth map corresponding to each image is then obtained from the point cloud, and the third sub-network is trained with supervision using the images and their corresponding depth maps. Self-supervised training and consistency between temporal frames can also be used to assist in training the third sub-network.
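  • As a sketch of the supervised part of this training, assuming the LiDAR point cloud has already been projected into the image to give a sparse ground-truth depth map (zero where no point projects); the self-supervised and temporal-consistency terms mentioned above are omitted.

```python
import torch

def lidar_supervised_depth_loss(pred_depth: torch.Tensor,
                                lidar_depth: torch.Tensor) -> torch.Tensor:
    """L1 depth loss evaluated only on pixels covered by projected LiDAR points."""
    valid = lidar_depth > 0
    return (pred_depth[valid] - lidar_depth[valid]).abs().mean()
```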
  • S303 The computer device determines the distance between the object in the first image and/or the second image and the vehicle according to the first depth map and the second depth map.
  • the three-dimensional coordinates of each pixel in the first image in the first camera coordinate system can be obtained from the first depth map (or the third depth map), and the three-dimensional coordinates of each pixel in the second image in the second camera coordinate system can be obtained from the second depth map (or the fourth depth map). These coordinates are then converted into coordinates in the vehicle coordinate system, and the distance between the object in the first image and/or the second image and the vehicle is determined from the three-dimensional coordinates of the pixels in the vehicle coordinate system.
  • the first camera coordinate system is a coordinate system established with the optical center of the first camera as the coordinate origin
  • the second camera coordinate system is a coordinate system established with the optical center of the second camera as the coordinate origin
  • the vehicle coordinate system is a coordinate system established with the vehicle body reference point (such as the center of the rear axle of the vehicle) as the coordinate origin.
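  • The sketch below lifts a depth map of a calibrated (pinhole-model) image to 3-D points and expresses them in the vehicle coordinate system; K is the 3x3 intrinsic matrix and T_cam_to_vehicle the 4x4 extrinsic transform from the camera coordinate system to the vehicle coordinate system, and the function name is illustrative.

```python
import numpy as np

def depth_to_vehicle_points(depth, K, T_cam_to_vehicle):
    """Back-project a depth map to 3-D points in the vehicle coordinate system."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    pts_cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1).reshape(-1, 4)
    pts_veh = (T_cam_to_vehicle @ pts_cam.T).T[:, :3]
    return pts_veh.reshape(h, w, 3)

# The distance between an object pixel (v, u) and the vehicle reference point can
# then be taken, for example, as the planar distance np.linalg.norm(points[v, u, :2]).
```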
  • some objects may not be in the common viewing area of the first camera and the second camera, but may lie in the field of view of only one camera.
  • that is, such objects exist only within the field of view of the first camera or the second camera, and therefore appear only in the first image or the second image.
  • the distances between these objects and the vehicle can be determined by the first depth map and the second depth map.
  • the distance between the object and the vehicle in the first image and/or the second image can be determined based on the first depth map (or the third depth map), the second depth map (or the fourth depth map), the first structural semantics and the second structural semantics.
  • the three-dimensional coordinates of each pixel in the first image in the first camera coordinate system can be obtained from the first depth map (or the third depth map), and the three-dimensional coordinates of each pixel in the second image in the second camera coordinate system can be obtained from the second depth map (or the fourth depth map). These coordinates are then converted into coordinates in the vehicle coordinate system, the pixel points corresponding to the edge points of the target object are aligned according to the first structural semantics and the second structural semantics, and the distance between the object in the first image and/or the second image and the vehicle is determined from the three-dimensional coordinates of the pixels in the vehicle coordinate system.
  • the target object is an object that exists in both the first image and the second image.
  • an edge point of an object in space may appear simultaneously in the first image taken by the fisheye camera and in the second image taken by the pinhole camera. However, since each depth map is tied to its own camera, the coordinates derived from the fisheye camera and those derived from the pinhole camera for the same point may still deviate from each other in the unified coordinate system. The distance measurement method provided in the embodiment of the present application therefore uses structural semantics to align the pixel points corresponding to the edge points of the target object according to the above-mentioned first structural semantics and second structural semantics, which reduces this deviation.
  • the first image and the second image in FIG. 7 show the edge of object 1.
  • the pixel points corresponding to the edge of object 1 in the first image taken by the fisheye camera are converted to a string of points [q1, q2, ..., qm] in the vehicle coordinate system, and the pixel points corresponding to the edge of object 1 in the second image taken by the pinhole camera are converted to another string of points [p1, p2, ..., pn] in the vehicle coordinate system.
  • after the pixel points corresponding to the edge of object 1 in the first image are rotated and translated by an RT operation, they can be aligned with the pixel points corresponding to the edge of object 1 in the second image, that is, the sum of the Euclidean distances between nearest-neighbor point pairs is minimized.
  • the RT matrix can be solved by a gradient-based optimization algorithm to align the edges of object 1 in the first image and/or the second image; the same optimization can likewise be performed on the pixel points of other objects that appear in both the first image and the second image, as sketched below.
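  • The sketch below estimates such an RT by iterating nearest-neighbor matching and a closed-form (Kabsch) rigid update; the text above mentions a gradient-based solver, so this is only one possible way to minimize the sum of nearest-neighbor distances, and the names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def align_edge_points(q_pts, p_pts, iters=20):
    """Estimate R, t aligning fisheye edge points [q1..qm] with pinhole edge
    points [p1..pn] in the vehicle frame (ICP-style)."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(p_pts)
    for _ in range(iters):
        q_now = q_pts @ R.T + t
        _, nn = tree.query(q_now)               # nearest pinhole point for each q
        p_match = p_pts[nn]
        # Closed-form rigid update between the matched point sets.
        q_c, p_c = q_now.mean(axis=0), p_match.mean(axis=0)
        H = (q_now - q_c).T @ (p_match - p_c)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R_step = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        R, t = R_step @ R, R_step @ (t - q_c) + p_c
    return R, t
```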
  • the distance between the object and the vehicle in the first image and/or the second image can be determined based on the first depth map (or third depth map), the second depth map (or fourth depth map), the first instance segmentation result, the second instance segmentation result, the first distance information, and the second distance information.
  • the above-mentioned first instance segmentation result is used to indicate the background and movable objects in the above-mentioned first image
  • the above-mentioned second instance segmentation result is used to indicate the background and movable objects in the above-mentioned second image
  • the above-mentioned first distance information is used to indicate the distance between the object and the vehicle in the third image
  • the above-mentioned third image is the image captured by the above-mentioned first camera at the second moment
  • the above-mentioned second distance information is used to indicate the distance between the object and the vehicle in the fourth image
  • the above-mentioned fourth image is the image captured by the above-mentioned second camera at the above-mentioned second moment.
  • the three-dimensional coordinates of each pixel in the first image in the first camera coordinate system can be obtained from the first depth map (or the third depth map), and the three-dimensional coordinates of each pixel in the second image in the second camera coordinate system can be obtained from the second depth map (or the fourth depth map). These coordinates are then converted into coordinates in the vehicle coordinate system, and the distance between the object in the first image and/or the second image and the vehicle is determined from them. This distance is then corrected using the above-mentioned first instance segmentation result, second instance segmentation result, first distance information and second distance information.
  • since the position of the background is fixed, the distance between the vehicle and the background at one of two moments can be derived from the vehicle's pose change between the two moments and the distance between the vehicle and the background at the other moment.
  • for example, if the vehicle is 5 meters away from the wall at moment 1 and travels 0.5 meters away from the wall between moment 1 and moment 2, the vehicle is 5.5 meters away from the wall at moment 2.
  • this relationship is used, together with the first instance segmentation result, the second instance segmentation result, the first distance information and the second distance information, to correct the distance between the object in the first image and/or the second image and the vehicle, as sketched below.
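  • A minimal sketch of this temporal check for pixels labeled as background by the instance segmentation, assuming the ego-motion between moment 1 and moment 2 is available as a 4x4 transform T_t1_to_t2 between the two vehicle frames (names are illustrative):

```python
import numpy as np

def propagate_background_distance(background_pts_t1, T_t1_to_t2):
    """Re-express static background points measured at moment 1 in the vehicle
    frame of moment 2, so the depth-based distances at moment 2 can be
    cross-checked and corrected against them."""
    pts_h = np.concatenate([background_pts_t1,
                            np.ones((len(background_pts_t1), 1))], axis=1)
    pts_t2 = (T_t1_to_t2 @ pts_h.T).T[:, :3]
    return np.linalg.norm(pts_t2[:, :2], axis=1)   # planar distances at moment 2
```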
  • the distance measurement method obtains the first depth map and the second depth map corresponding to the above images by inputting, into the target network, the first image captured by a camera with a field of view greater than a preset angle and the second image captured by a camera that has a common viewing area with that camera, and then determines the distance between the objects in the first image and the second image and the vehicle according to the first depth map and the second depth map.
  • the camera with a field of view greater than the preset angle can make up for the blind spots inherent in the layout of the distance measurement sensor, so that the vehicle can measure the distance to obstacles in the detection blind spots.
  • the related art uses ultrasonic sensors for distance measurement. Due to the blind spots in the sensor layout, the ultrasonic sensor cannot detect the suspended obstacles behind the vehicle and give the user a reminder during the vehicle reversing process, which can easily cause accidents.
  • the distance measurement method provided in the embodiment of the present application uses the images collected by the first camera (such as a fisheye camera) and the second camera (such as a pinhole camera) with a common viewing area to jointly measure the distance, thereby compensating for the blind spots inherent in the distance measurement sensor layout, so that the vehicle can detect the suspended obstacles in the detection blind spots of the ultrasonic sensor and give the user a reminder, so that the user can promptly discover the suspended obstacles behind the vehicle, thereby reducing the probability of the vehicle colliding with the suspended obstacles.
  • the ranging method provided in the embodiment of the present application may further include:
  • the target network may include a fourth subnetwork, and the fourth subnetwork is used to output corresponding structural semantics according to the input image or feature map, wherein the structural semantics is used to indicate the edges and planes of objects in the image.
  • the first image, the first feature map or the third feature map may be input into the fourth subnetwork of the target network to obtain the first structural semantics.
  • the second image, the second feature map or the fourth feature map may be input into the fourth subnetwork of the target network to obtain the second structural semantics.
  • the first structural semantics is used to indicate the edges and planes of objects in the first image
  • the second structural semantics is used to indicate the edges and planes of objects in the second image.
  • the edges of an object may be represented by an edge heat map, and the planar structure of the object may be represented by a three-dimensional normal vector map.
  • instance segmentation and annotation can be performed on multiple images to obtain the edges of objects in the multiple images, and the plane normal vectors of each area of the image can then be calculated based on the geometric information of the point cloud and the semantic information of the instance segmentation annotation to obtain the planar structure of the objects in the multiple images. The fourth sub-network is then trained under the supervision of the object edges and the planar structures of the objects in the multiple images.
  • the target network may include a fifth subnetwork, which is used to output corresponding object attributes according to the input image or feature map.
  • the object attribute includes the instance segmentation result of the image.
  • the instance segmentation result of the image is used to indicate the background and movable objects in the image.
  • for example, the instance segmentation result of the image indicates movable objects such as vehicles and pedestrians in the image and background such as the ground and walls in the image.
  • the first image, the first feature map or the third feature map may be input into the fifth subnetwork of the target network to obtain a first instance segmentation result.
  • the second image, the second feature map or the fourth feature map may be input into the fifth subnetwork of the target network to obtain a second instance segmentation result.
  • the first instance segmentation result is used to indicate the instance segmentation result of the first image
  • the second instance segmentation result is used to indicate the instance segmentation result of the second image.
  • instance segmentation annotation may be performed on multiple images to obtain the instance segmentation results of the multiple images, and the fifth sub-network may then be trained using the multiple images and their instance segmentation results.
  • S306 The computer device performs three-dimensional reconstruction of the objects in the first image and the second image according to the distances between the objects in the first image and the second image and the vehicle.
  • the computer device performs three-dimensional reconstruction of the objects in the first image and the second image according to the distance between the objects and the vehicle and the color information in the first image and the second image, wherein the color information is used to indicate the color of each pixel in the first image and the second image.
  • the computer device displays the three-dimensionally reconstructed object.
  • the terminal may display the three-dimensionally reconstructed object through a display panel.
  • the server may send a display instruction to the terminal, and after receiving the display instruction, the terminal displays the three-dimensionally reconstructed object according to the display instruction.
  • S308 The computer device displays prompt information according to the distance between the object in the first image and the second image and the vehicle.
  • a collision warning prompt message is displayed to remind the user that the vehicle may collide with the object in the first image.
  • distance prompt information is displayed to remind the user that the vehicle is relatively close to the object in the second image.
  • the above prompt information may be text information, sound information or image information.
  • the distance between the object in the first image and/or the second image and the vehicle is the distance between the object in the first image and/or the second image and the isometric contour of the vehicle.
  • the isometric contour of the vehicle is an isometric contour set according to the outer contour of the vehicle.
  • the isometric contour may be an isometric line extending outward from the outer contour line of the vehicle body in a two-dimensional (2D) top view, or may be an isometric surface extending outward from the three-dimensional (3D) outer contour of the vehicle body.
  • the vehicle isometric profile may be adjusted according to the distance between the object and the vehicle in the first image and the second image.
  • for example, when an object is relatively close to the vehicle, the color of the equidistant contour is adjusted to yellow.
  • when the object is even closer to the vehicle, the color of the equidistant contour is adjusted to red, as sketched below.
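  • A small sketch of this color adjustment; the 0.5-meter value reuses the prompt threshold mentioned later in this description, while the 1.0-meter value is a purely illustrative assumption.

```python
def isometric_contour_color(min_distance_m: float,
                            warn_m: float = 1.0,
                            danger_m: float = 0.5) -> str:
    """Pick the display color of the vehicle's isometric contour from the
    smallest object-to-contour distance (thresholds are illustrative)."""
    if min_distance_m < danger_m:
        return "red"
    if min_distance_m < warn_m:
        return "yellow"
    return "default"
```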
  • the method may further include: displaying a first interface.
  • the first interface may include a display function control and a setting function control.
  • the display function is used to display objects around the vehicle.
  • the user can click the display function control to display the second interface, through which the sensor used in the display function operation process can be selected.
  • the user has selected to turn on the front fisheye camera, rear fisheye camera, front pinhole camera, rear pinhole camera, laser radar and ultrasonic detector.
  • object 1, object 2, and object 3 are located in the field of view of the forward-looking fisheye camera
  • object 2 and object 4 are located in the field of view of the forward-looking pinhole camera
  • Object 2 is located in the common field of view of the forward-looking fisheye camera and the forward-looking pinhole camera.
  • the user only chooses to turn on the forward-looking fisheye camera and turn off the forward-looking pinhole camera in the second interface, then returns to the first interface and clicks on the display function control. Then the third interface shown in Figure 13 is displayed. It can be seen from the third interface shown in Figure 13 that there are objects 1, 2, and 3 in front of the vehicle. Since the forward-looking pinhole camera is turned off, the third interface shown in Figure 13 lacks object 4 within the field of view of the forward-looking pinhole camera compared to the actual scene shown in Figure 11. Therefore, there is a detection blind spot when only using the forward-looking fisheye camera for object detection, and object 4 in front of the vehicle cannot be detected.
  • the user only chooses to turn on the front-view pinhole camera and turn off the front-view fisheye camera in the second interface, then returns to the first interface and clicks the display function control. The third interface shown in Figure 15 is then displayed, from which it can be seen that there are objects 2 and 4 in front of the vehicle. Since the front-view fisheye camera is turned off, the third interface shown in Figure 15 lacks objects 1 and 3 within the field of view of the front-view fisheye camera compared to the actual scene shown in Figure 11. Therefore, there is a detection blind spot when only using the front-view pinhole camera for object detection, and objects 1 and 3 in front of the vehicle cannot be found.
  • the user only chooses to turn on the front pinhole camera and the front fisheye camera in the second interface, then returns to the first interface and clicks the display function control. Then the third interface shown in FIG17 is displayed. It can be seen from the third interface shown in FIG17 that there are objects 1, 2, 3 and 4 in front of the vehicle, which is consistent with the actual scene shown in FIG11.
  • the embodiment of the present application can compensate for the blind spots inherent in the single sensor layout by selecting multiple sensors to measure distance together, so that the vehicle can measure the distance to obstacles in the detection blind spot of a single sensor.
  • the third interface may also display prompt information. For example, when the distance between the vehicle and the object is less than 0.5 meters, the third interface displays prompt information.
  • the embodiment of the present application can remind the user when the distance between an object and the vehicle is close, so that the user can deal with it in time, thereby avoiding a collision between the vehicle and the object.
  • the ranging method provided in the embodiment of the present application can be integrated into a public cloud and released externally as a service.
  • the distance measurement method can also protect the data uploaded by users. For example, for images, users can be required to upload images that have been encrypted in advance.
  • the distance measurement method provided in the embodiment of the present application can also be integrated into a private cloud and used internally as a service.
  • for the distance measurement method, whether to protect the data uploaded by users can be determined according to actual needs.
  • the ranging method provided in the embodiment of the present application may also be integrated into a hybrid cloud, wherein a hybrid cloud refers to an architecture including one or more public clouds and one or more private clouds.
  • the service may provide an application programming interface (API) and/or a user interface.
  • the user interface may be a graphical user interface (GUI) or a command user interface (CUI).
  • a business system such as an operating system or a software system may directly call the API provided by the service for distance measurement, or the service may receive an image input by the user through the GUI or CUI and perform distance measurement based on the image.
  • the distance measurement method provided in the embodiment of the present application can be packaged into a software package for sale, and the user can install and use it in the user's operating environment after purchasing the software package.
  • the above software package can also be pre-installed in various devices (for example, desktop computers, laptops, tablet computers, smart phones, etc.), and the user purchases a device with a pre-installed software package and uses the device to measure distance based on the image.
  • the distance measuring device for executing the above distance measuring method will be introduced below in conjunction with FIG. 19 .
  • the ranging device includes hardware and/or software modules corresponding to the execution of each function.
  • the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed in the form of hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementation should not be considered to exceed the scope of the embodiments of the present application.
  • the embodiment of the present application can divide the functional modules of the ranging device according to the above method example.
  • each functional module can be divided according to each function, or two or more functions can be integrated into one processing module.
  • the above integrated module can be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is only a logical function division. There may be other division methods in actual implementation.
  • Figure 19 shows a possible composition diagram of the ranging device involved in the above embodiment.
  • the ranging device 1800 may include: an acquisition unit 1801, a network unit 1802 and a determination unit 1803.
  • the acquisition unit 1801 is used to acquire a first image and a second image, where the first image is an image captured by a first camera, and the second image is an image captured by a second camera.
  • the first camera is a fisheye camera
  • the second camera is a pinhole camera.
  • the network unit 1802 is configured to obtain a first depth map and a second depth map, where the first depth map is a depth map corresponding to the first image, and the second depth map is a depth map corresponding to the second image.
  • the determining unit 1803 is configured to determine the distance between the object in the first image and/or the second image and the vehicle according to the first depth map and the second depth map.
  • the network unit is specifically used to: perform feature extraction on the first image and the second image to obtain a first feature map and a second feature map, wherein the first feature map is a feature map corresponding to the first image, and the second feature map is a feature map corresponding to the second image.
  • the network unit is specifically used to: input the third feature map into the target network to obtain the first depth map and the first structural semantics, and the first structural semantics is used to indicate the edges and planes of objects in the first image.
  • the determination unit is specifically configured to determine the distance between the object and the vehicle in the first image and the second image according to the first depth map, the second depth map, the first structural semantics, and the second structural semantics.
  • the network unit is specifically used to: input the third feature map into the target network to obtain the first depth map and the first instance segmentation result, and the first instance segmentation result is used to indicate the background and movable objects in the first image.
  • the first image is an image captured by the first camera at a first moment
  • the second image is an image captured by the second camera at the first moment
  • the determination unit is specifically used to determine the distance between the object and the vehicle in the first image and the second image according to the first depth map, the second depth map, the first instance segmentation result, the second instance segmentation result, the first distance information, and the second distance information, wherein the first distance information is used to indicate the distance between the object and the vehicle in the third image, the third image is the image captured by the first camera at the second moment, and the second distance information is used to indicate the distance between the object and the vehicle in the fourth image, and the fourth image is the image captured by the second camera at the second moment.
  • the acquisition unit is further used to: calibrate the first image according to an intrinsic parameter of the first camera and a preset camera intrinsic parameter.
  • the second image is an image captured by a second camera
  • the acquisition unit is further used to calibrate the second image according to an intrinsic parameter of the second camera and a preset camera intrinsic parameter.
  • the determination unit is further configured to: perform three-dimensional reconstruction of the objects in the first image and the second image according to the distance between the objects in the first image and the second image and the vehicle, and display the three-dimensionally reconstructed objects.
  • the determination unit is further configured to: display prompt information according to the distance between the object in the first image and the second image and the vehicle.
  • the first camera is a fisheye camera
  • the second camera is a pinhole camera
  • the embodiment of the present application also provides a chip.
  • the chip can be a system on chip (SOC) or other chips.
  • the chip 1900 includes one or more processors 1901 and an interface circuit 1902. Optionally, the chip 1900 may also include a bus 1903.
  • the processor 1901 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above distance measurement method may be completed by an integrated logic circuit of hardware in the processor 1901 or by instructions in the form of software.
  • the processor 1901 can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods and steps disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc.
  • the interface circuit 1902 can be used to send or receive data, instructions or information.
  • the processor 1901 can use the data, instructions or other information received by the interface circuit 1902 to process, and can send the processing completion information through the interface circuit 1902.
  • the chip also includes a memory, which may include a read-only memory and a random access memory, and provides operation instructions and data to the processor.
  • a portion of the memory may also include a non-volatile random access memory (NVRAM).
  • the memory stores executable software modules or data structures
  • the processor can perform corresponding operations by calling operation instructions stored in the memory (the operation instructions can be stored in the operating system).
  • the chip can be used in the distance measuring device involved in the embodiment of the present application.
  • the interface circuit 1902 can be used to output the execution result of the processor 1901.
  • for the distance measuring method provided by one or more embodiments of the present application, reference can be made to the above embodiments, which will not be repeated here.
  • processor 1901 and the interface circuit 1902 can be implemented through hardware design, software design, or a combination of hardware and software, and there is no limitation here.
  • the electronic device 100 may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a laptop computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a model processing device, or a chip or functional module in a model processing device.
  • FIG21 is a schematic diagram of the structure of an electronic device 100 provided in an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, charging management module 140, power management module 141, battery 142, antenna 1, antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, display screen 194, and subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently.
  • the components shown in the figure may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • Different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100.
  • the controller may generate an operation control signal according to the instruction operation code and the timing signal to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus.
  • the processor 110 can couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193.
  • the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), etc.
  • the processor 110 and the camera 193 communicate through the CSI interface to realize the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through the DSI interface to realize the display function of the electronic device 100.
  • the interface connection relationship between the modules illustrated in the embodiment of the present application is only a schematic illustration and does not constitute a structural limitation on the electronic device 100.
  • the electronic device 100 may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from a charger.
  • the charger can be a wireless charger or a wired charger.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and provides power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
  • the electronic device 100 implements the display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the electronic device 100 can realize the shooting function through ISP, camera 193, touch sensor, video codec, GPU, display screen 194 and application processor.
  • the ISP is used to process the data fed back by the camera 193. For example, when taking a photo, the shutter is opened and light is transmitted to the camera through the lens. On the camera photosensitive element, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converts it into an image visible to the naked eye.
  • the ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image.
  • the ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP can be set in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • an optical image is generated through a lens and projected onto a photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP for conversion into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard RGB, YUV or other format. It should be understood that in the description of the embodiments of the present application, an image in RGB format is used as an example for introduction, and the embodiments of the present application do not limit the image format.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • the digital signal processor is used to process digital signals, and can process not only digital image signals but also other digital signals. For example, when the electronic device 100 is selecting a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital videos.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record videos in a variety of coding formats, such as Moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the internal memory 121 can be used to store computer executable program codes, which include instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121.
  • the internal memory 121 may include a program storage area and a data storage area.
  • the electronic device 100 can implement audio functions such as music playing and recording through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
  • the button 190 includes a power button, a volume button, etc.
  • the button 190 can be a mechanical button. It can also be a touch button.
  • the electronic device 100 can receive button input and generate key signal input related to the user settings and function control of the electronic device 100.
  • the motor 191 can generate a vibration prompt.
  • the motor 191 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 can also correspond to different vibration feedback effects.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging status, power changes, and can also be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the electronic device 100 can be a chip system or a device with a similar structure as shown in Figure 21.
  • the chip system can be composed of chips, or it can include chips and other discrete devices.
  • the actions, terms, etc. involved in the various embodiments of the present application can refer to each other without limitation.
  • the message name or parameter name in the message exchanged between the various devices in the embodiments of the present application is only an example, and other names can also be used in the specific implementation without limitation.
  • the component structure shown in Figure 21 does not constitute a limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than those shown in Figure 21, or combine certain components, or arrange the components differently.
  • the processor and transceiver described in the present application can be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit, a mixed signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc.
  • the processor and transceiver can also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (positive channel metal oxide semiconductor, PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
  • An embodiment of the present application further provides a distance measurement device, which includes: one or more processors.
  • when the one or more processors execute program codes or instructions, the above-mentioned related method steps are implemented, so as to realize the distance measurement method in the above-mentioned embodiments.
  • the apparatus may further include one or more memories for storing the program code or instructions.
  • An embodiment of the present application also provides a vehicle, which includes one or more fisheye cameras, one or more pinhole cameras and one or more processors, and the processors can be used in the distance measurement method in the above embodiment.
  • the one or more processors can be implemented in the form of the above distance measurement device.
  • the vehicle also includes a display screen for displaying information such as road conditions, distance prompt information, a two-dimensional/three-dimensional model of the vehicle or a two-dimensional/three-dimensional model of an obstacle.
  • the vehicle also includes a speaker for playing voice prompt information, and the voice prompt information may include danger prompts and/or information such as the distance between the vehicle and the obstacle.
  • a voice prompt is used to remind the driver to pay attention to the existence of obstacles, etc.
  • the vehicle can remind the driver using only the prompt information displayed on the display screen, using only the voice prompt information, or using a combination of the display and the voice prompt. For example, when the distance between the vehicle and the obstacle is lower than the first threshold, only the prompt information is displayed on the display screen.
  • when the distance between the vehicle and the obstacle is lower than the second threshold (the present application does not limit the values of the second threshold and the first threshold, as long as the second threshold is lower than the first threshold; for example, the first threshold is 2 meters and the second threshold is 1 meter), the driver is prompted to pay attention to the obstacle by the voice prompt in combination with the display, thereby attracting the driver's attention.
  • the embodiment of the present application further provides a computer storage medium, in which computer instructions are stored.
  • when the computer instructions are run on the distance measuring device, the distance measuring device executes the above-mentioned related method steps to implement the distance measuring method in the above-mentioned embodiment.
  • the embodiment of the present application also provides a computer program product.
  • when the computer program product is run on a computer, the computer is enabled to execute the above-mentioned related steps to implement the ranging method in the above-mentioned embodiment.
  • the embodiment of the present application also provides a distance measuring device, which can be a chip, an integrated circuit, a component or a module.
  • the device may include a connected processor and a memory for storing instructions, or the device includes one or more processors for obtaining instructions from an external memory.
  • the processor can execute instructions so that the chip executes the distance measuring method in the above-mentioned method embodiments.
  • the size of the serial numbers of the above-mentioned processes does not mean the order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the above units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
  • Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, or the part thereof that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium, including several instructions to enable a computer device (which can be a personal computer, server, or network device, etc.) to execute all or part of the steps of the above methods in each embodiment of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and other media that can store program codes.

Abstract

A distance measurement method and apparatus, relating to the field of autonomous driving, capable of enabling a vehicle to measure the distance to obstacles. The method is applied to a vehicle that includes a first camera and a second camera, and includes: first acquiring a first image and a second image (S301); then obtaining a first depth map and a second depth map (S302); and then determining the distance between an object in the first image and/or the second image and the vehicle according to the first depth map and the second depth map (S303). The first image is an image captured by the first camera, the second image is an image captured by the second camera, the first camera and the second camera have a common viewing area, the first camera is a fisheye camera, the second camera is a pinhole camera, the first depth map is the depth map corresponding to the first image, and the second depth map is the depth map corresponding to the second image.

Description

Distance measurement method and apparatus
This application claims priority to Chinese patent application No. 202211349288.4, entitled "Distance measurement method and apparatus" and filed with the China National Intellectual Property Administration on October 31, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of autonomous driving, and in particular to a distance measurement method and apparatus.
Background
With the advancement of technology, advanced driver assistance and autonomous driving technologies are increasingly entering people's lives. In advanced driver assistance and autonomous driving, perceiving the distance of obstacles around the ego vehicle is an important basis for decision planning and driving safety.
In the related art, the distance of obstacles around the ego vehicle is mainly perceived by ranging sensors (such as lidar, millimeter-wave radar and ultrasonic radar) mounted around the vehicle body; however, ranging sensors mounted around the vehicle body have detection blind spots.
Therefore, how a vehicle can measure the distance to obstacles in detection blind spots is one of the problems that those skilled in the art urgently need to solve.
Summary
The embodiments of the present application provide a distance measurement method and apparatus for a vehicle, which enable the vehicle to measure the distance to obstacles in detection blind spots. To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a distance measurement method applied to a vehicle, where the vehicle includes a first camera and a second camera. The method includes: first acquiring a first image and a second image; then obtaining a first depth map and a second depth map; and then determining the distance between an object in the first image and/or the second image and the vehicle according to the first depth map and the second depth map. The first image is an image captured by the first camera, the second image is an image captured by the second camera, the first camera and the second camera have a common viewing area, the first camera is a fisheye camera, the second camera is a pinhole camera, the first depth map is the depth map corresponding to the first image, and the second depth map is the depth map corresponding to the second image.
In the distance measurement method provided in the embodiments of the present application, the first depth map and the second depth map are obtained from the first image captured by a fisheye camera with a large field of view and the second image captured by a pinhole camera that has a common viewing area with the fisheye camera, and the distance between the object in the first image and/or the second image and the vehicle is then determined according to the first depth map and the second depth map. The fisheye camera with a large field of view can compensate for the blind spots inherent in the layout of ranging sensors, so that the vehicle can measure the distance to obstacles (such as suspended obstacles) in the detection blind spots.
需要说明的是,不同相机存在共视区域是指不同相机的视野区域存在交叉或重叠的区域。
在一种可能的实现方式中,可以将所述第一图像和所述第二图像输入目标网络,得到第一深度图和第二深度图。
可以看出,通过将所述第一图像和所述第二图像输入目标网络,得到第一深度图和第二深度图,之后根据第一深度图和第二深度图确定第一图像和所述第二图像中的物体与车辆之间的距离。通过视场范围较大的鱼眼相机能够弥补测距传感器布局固有的盲区,使车辆对探测盲区中障碍物(如悬空障碍物)进行测距。
可选地,上述第一相机可以为视场范围大于预设角度的鱼眼相机。示例性地,所述预设角度为180度或192度。
需要说明的是,一些物体(如悬空障碍物)可能不处于第一相机和第二相机的共视区域,而是单独存在于第一相机或第二相机的视野范围内,即单独存在于第一图像或第二图像中,这些物体与车辆之间的距离可以通过第一深度图和第二深度图确定得到。
在一种可能的实现方式中,可以获取第一特征图和第二特征图。然后根据所述第一特征图的第一特征点及所述第一特征点对应的多个目标特征点,得到第三特征图。之后根据所述第二特征图的第二特征点及所述第二特征点对应的多个目标特征点,得到第四特征图。接着根据所述第三特征图和所述第四特征图,得到所述第一深度图和所述第二深度图。其中,所述第一特征图为所述第一图像对应的特征图,所述第二特征图为所述第二图像对应的特征图。所述第一特征点为所述第一特征图中的任意 特征点,所述第一特征点对应的多个目标特征点为所述第二特征图中与所述第一特征点符合极线约束的特征点。所述第二特征点为所述第二特征图中的任意特征点,所述第二特征点对应的多个目标特征点为所述第一特征图中与所述第二特征点符合极线约束的特征点。
其中,极线约束是指在描述同一个点投影到两个不同视角的图像上时,像点、相机光心在投影模型下形成的约束。以图1为例,针对图像一中沿针孔相机光心O1,像点P1光线上的空间点P或者P’,其在图像二中的像点P2一定在极线e2P2上,即表述像点P1和像点P2符合极线约束。对于异构的针孔和鱼眼相机对,极线不一定是直线,也可能是曲线。其中,e1为两图像对应相机光心O1O2连线与图像一平面的交点,e2为两图像对应相机光心O1O2连线与图像二平面的交点。
相较于通过所有特征点对特征点进行特征匹配，通过特征点对应的图像存在共视区域的图像中符合极线约束的目标特征点对特征点进行特征匹配，一方面可以降低特征匹配过程的计算量；另一方面，由于特征点对应的图像存在共视区域的图像中符合极线约束的特征点与特征点存在较高的相似度，通过特征点的目标特征点对特征点进行特征匹配可以使匹配后的特征点融合目标特征点的特征，增加了特征点的辨识度，使目标网络根据特征融合后的特征图能够更加准确地得到对应的深度图，提高了测距的准确度。第一图像与第二图像的像素点之间存在极线约束，第一图像的像素点与第一特征图的特征点之间有对应关系，第二图像的像素点与第二特征图的特征点之间有对应关系，从而可以确定第一特征图的特征点与第二特征图的特征点之间存在的极线约束关系。
示例性地,以获取鱼眼相机拍摄的第一图像和针孔相机拍摄的第二图像为例,将第一图像对应的第一特征图展平为一维特征表示为[a0,a1,…,aH1xW1],长度为H1xW1,将第二图像对应的第二特征图展平为一维特征表示为[b0,b1,…bH2xW2],长度为H2xW2。进一步将两个特征拼接为一维特征C=[a0,a1,…,aH1xW1,b0,b1,…bH2xW2]。之后利用网络将一维特征C映射为三个特征,分别为Q,K,V,其维度与C保持相同。假设第二特征图中索引位置为i的特征bi,通过极线约束计算后,其在第一特征图中对应深度范围(dmin,dmax)的特征索引位置有n个,分别为{ad0,ad1,…,adn},由此Q中对应针孔图像特征索引位置为i的元素qii,其不需要与长度为H1xW1+H2xW2的K中所有元素进行点乘,而只需与{ad0,ad1,…,adn}对应的n个元素进行逐个点乘得到qii与n个元素的各元素的乘积,再经过softmax运算得到S=[s1,s2,...,sn],最后与V中对应{ad0,ad1,…,adn}的n个元素进行加权求和运算,得到bi对应融合后的特征bi’,对每个特征点进行上述操作得到一维特征C’,然后按照C的拼接次序将其拆分并转换为第一特征图对应的第三特征图和第二特征图对应的第四特征图。
可选地,目标特征点也可以为与特征点对应的图像存在共视区域的图像中符合极线约束的特征点以及符合极线约束的特征点周围的特征点。
需要说明的是,对目标特征点位置进行一定范围的膨胀处理,可以提高相机外参细微变动的鲁棒性。以计算机设备获取鱼眼相机拍摄的第一图像和针孔相机拍摄的第二图像为例,将第一图像对应的第一特征图展平为一维特征表示为[a0,a1,…,aH1xW1],长度为H1xW1,将第二图像对应的第二特征图展平为一维特征表示为[b0,b1,…bH2xW2],长度为H2xW2。进一步将两个特征拼接为一维特征C=[a0,a1,…,aH1xW1,b0,b1,…bH2xW2]。之后利用网络将一维特征C映射为三个特征,分别为Q,K,V,假设第二特征图中索引位置为i的特征bi,通过极线约束计算后,其在第一特征图中对应深度范围(dmin,dmax)的特征索引位置有n个,分别为{ad0,ad1,…,adn},经过膨胀处理后又有m个候选点,总体表示为{ad0,ad1,…,adn,adn+1,…,adn+m}。由此Q中对应针孔图像特征索引位置为i的元素qii,其不需要与长度为H1xW1+H2xW2的K中所有元素进行点乘,而只需与{ad0,ad1,…,adn,adn+1,…,adn+m}对应的n+m个元素进行逐个点乘得到qii与n+m个元素的各元素 的乘积,再经过softmax运算得到S=[s1,s2,...,sn+m],最后与V中对应{ad0,ad1,…,adn,adn+1,…,adn+m}的n+m个元素进行加权求和运算,得到bi对应融合后的特征bi’。
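作为辅助理解，下面给出一个按上述极线约束做稀疏注意力特征融合的示意性Python代码片段。该片段仅是在若干假设（特征已展平拼接、极线候选索引已预先计算好、未做缩放与多头注意力处理）下的简化示意，其中的函数名与变量名均为示例性假设，并非本申请实施例的限定实现：

```python
import numpy as np

def softmax(x):
    # 数值稳定的softmax
    e = np.exp(x - np.max(x))
    return e / e.sum()

def epipolar_sparse_fusion(Q, K, V, epipolar_idx):
    """按极线约束做稀疏注意力融合的示意实现。

    Q, K, V: 形状为(L, d)的特征, L = H1xW1 + H2xW2, 由一维特征C经映射得到;
    epipolar_idx: epipolar_idx[i]给出特征点i在另一幅特征图中符合极线约束
                  (可含膨胀后的候选点)的索引列表{ad0, ..., adn+m}。
    返回融合后的一维特征C', 形状与V相同。
    """
    fused = np.empty_like(V)
    for i in range(len(Q)):
        cand = epipolar_idx[i]            # 仅与极线上的n(+m)个候选点运算
        scores = K[cand] @ Q[i]           # q_i与各候选元素逐个点乘
        weights = softmax(scores)         # S = [s1, s2, ..., sn(+m)]
        fused[i] = weights @ V[cand]      # 与V中对应候选元素加权求和, 得到b_i'
    return fused
```

相比与长度为H1xW1+H2xW2的K中所有元素做点乘，上述做法中每个特征点只与n(+m)个候选元素运算，计算量随候选点数量线性下降。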
在一种可能的实现方式中,可以根据所述第一深度图、所述第二深度图、第一结构语义和第二结构语义,确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离,其中,所述第一结构语义用于指示所述第一图像中的物体边缘和平面,所述第二结构语义用于指示所述第二图像中的物体边缘和平面。
可以理解的是，第一图像和第二图像存在共视区域，第一图像和第二图像中可能存在相同物体。由于深度图中像素点的深度是相对于像素点所在图片对应的相机坐标系的，因此不同相机坐标系的同一像素点的深度在转换为以车辆建立的统一坐标系中可能存在偏差，该偏差可能会影响该边缘点与车辆之间的距离的准确性，为此可以通过表征图像中每一物体的边缘与平面结构的结构语义将不同相机的同一像素点在统一坐标系中进行对齐，以消除偏差，从而提高测距精度。
在一种可能的实现方式中,可以将第一图像、所述第一特征图或所述第三特征图输入目标网络,得到所述第一结构语义。之后将第二图像、所述第二特征图或所述第四特征图输入目标网络,得到所述第二结构语义。当以第三特征图为输入,获取到的第一结构语义可能更准确,因为第三特征图是基于极线约束进行融合后的特征图;类似的,当以第四特征图为输入,获取到的第二结构语义也可能更准确。
可以看出，目标网络还可以根据图像或特征图输出对应图像中的物体边缘和平面。由于第一图像和第二图像存在共视区域，第一图像和第二图像中可能存在相同物体。由于深度图中像素点的深度是相对于像素点所在图片对应的相机坐标系的，因此不同相机坐标系的同一像素点的深度在转换为以车辆建立的统一坐标系中可能存在偏差，该偏差可能会影响该边缘点与车辆之间的距离的准确性，为此可以通过表征图像中每一物体的边缘与平面结构的结构语义将不同相机的同一像素点在统一坐标系中进行对齐，以消除偏差，从而提高测距精度。
可选地,所述第一图像为所述第一相机在第一时刻采集的图像,所述第二图像为第二相机在所述第一时刻采集的图像。
在一种可能的实现方式中,可以根据所述第一深度图、所述第二深度图、第一实例分割结果、第二实例分割结果、第一距离信息和第二距离信息,确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离。其中,所述第一实例分割结果用于指示所述第一图像中的背景和可移动物体,所述第二实例分割结果用于指示所述第二图像中的背景和可移动物体,所述第一距离信息用于指示第三图像中物体与车辆之间的距离,所述第三图像为所述第一相机在第二时刻采集的图像,所述第二距离信息用于指示第四图像中物体与车辆之间的距离,所述第四图像为所述第二相机在所述第二时刻采集的图像。
可以理解的是，第一图像和第二图像存在共视区域，第一图像和第二图像中可能存在相同物体。由于深度图中像素点的深度是相对于像素点所在图片对应的相机坐标系的，因此不同相机坐标系的同一像素点的深度在转换为以车辆建立的统一坐标系中可能存在偏差，该偏差可能会影响该边缘点与车辆之间的距离的准确性。为此可以利用第一相机在第二时刻采集的第三图像中物体与车辆之间的距离以及第一图像的实例分割结果，以两幅图像中的固定背景为参考对第一图像中各物体与车辆之间的距离进行修正；并利用第二相机在第二时刻采集的第四图像中物体与车辆之间的距离以及第二图像的实例分割结果，以两幅图像中的固定背景为参考对第二图像中各物体与车辆之间的距离进行修正，从而使不同相机的同一像素点在统一坐标系中进行对齐，以消除偏差，从而提高测距精度。
在一种可能的实现方式中，可以将第一图像、所述第一特征图或所述第三特征图输入目标网络，得到所述第一实例分割结果。之后将第二图像、所述第二特征图或所述第四特征图输入目标网络，得到所述第二实例分割结果。当以第三特征图为输入，获取到的第一实例分割结果可能更准确，因为第三特征图是基于极线约束进行融合后的特征图；类似的，当以第四特征图为输入，获取到的第二实例分割结果也可能更准确。
可以看出，目标网络还可以根据特征图输出特征图对应图像的实例分割结果。由于第一图像和第二图像存在共视区域，第一图像和第二图像中可能存在相同物体。由于深度图中像素点的深度是相对于像素点所在图片对应的相机坐标系的，因此不同相机坐标系的同一像素点的深度在转换为以车辆建立的统一坐标系中可能存在偏差，该偏差可能会影响该边缘点与车辆之间的距离的准确性。为此可以利用第一相机在第二时刻采集的第三图像中物体与车辆之间的距离以及第一图像的实例分割结果，以两幅图像中的固定背景为参考对第一图像中各物体与车辆之间的距离进行修正；并利用第二相机在第二时刻采集的第四图像中物体与车辆之间的距离以及第二图像的实例分割结果，以两幅图像中的固定背景为参考对第二图像中各物体与车辆之间的距离进行修正，从而使不同相机的同一像素点在统一坐标系中进行对齐，以消除偏差，从而提高测距精度。
在一种可能的实现方式中,可以根据所述第一相机的内参和预设鱼眼相机内参对所述第一图像进行校准。再基于校准后的第一图像获取其对应的深度图,从而用于确定物体与车辆的距离。
需要说明的是,由于相机模组制作时工艺产生的偏差,不同相机的参数会有偏差,因此可以通过预设鱼眼相机内参对第一相机采集的第一图像进行校准以消除偏差,进一步提升测距精度。
在一种可能的实现方式中,可以根据所述第二相机的内参和预设针孔相机内参对所述第二图像进行校准。再基于校准后的第二图像获取其对应的深度图,从而用于确定物体与车辆的距离。
需要说明的是,由于相机模组制作时工艺产生的偏差,不同相机的参数会有偏差,因此可以通过预设针孔相机内参对第二相机采集的第二图像进行校准以消除偏差,进一步提升测距精度。
在一种可能的实现方式中,可以根据所述第一图像和所述第二图像中的物体与车辆之间的距离对所述第一图像和所述第二图像中的物体进行三维重建。显示三维重建后的物体。
可以理解的是,根据第一图像和/或第二图像中的物体与车辆之间的距离对第一图像和/或第二图像中的物体进行三维重建并显示三维重建后的物体,有助于使用户更直观了解第一图像和/或第二图像中的物体与车辆的位置关系。
在一种可能的实现方式中,可以根据所述第一图像和所述第二图像中的物体与车辆之间的距离显示提示信息。
示例性地，在第一图像中的物体与车辆之间的距离小于距离阈值的时候，显示碰撞告警提示信息以提醒用户车辆可能与第一图像中的物体发生碰撞。
又示例性地,在第二图像中的物体与车辆之间的距离小于距离阈值的时候,显示距离提示信息以提醒用户车辆与第二图像中的物体距离较近。
可以理解的是，根据第一图像和所述第二图像中的物体与车辆之间的距离显示提示信息，可以在图像中的物体与车辆距离较近时提醒用户，以便用户及时处理，从而避免车辆与图像中的物体发生碰撞。
在一种可能的实现方式中,上述第一图像和/或第二图像中的物体与车辆之间的距离为第一图像和/或第二图像中的物体与车辆的等距轮廓之间的距离。车辆的等距轮廓是根据车辆外轮廓设置的等距轮廓。上述等距轮廓可以为二维(2D)的俯视视角下车身外轮廓线向外扩展的等距线,也可以为三维(3D)的车身三维外轮廓向外扩展的等距曲面。
在一种可能的实现方式中,可以根据所述第一图像和所述第二图像中的物体与车辆之间的距离调整车辆等距轮廓。
例如,当图像中的物体与车辆之间的距离小于或等于0.5m时,调整等距轮廓的颜色为黄色。
又例如,图像中的物体与车辆之间的距离小于或等于0.2m时,调整等距轮廓的颜色为红色。
在一种可能的实现方式中，第一相机可以为后视鱼眼相机，第二相机可以为后视针孔相机。
在一种可能的实现方式中，第一相机可以为前视鱼眼相机，第二相机可以为前视针孔相机。
在一种可能的实现方式中，第一相机可以为左视鱼眼相机，第二相机可以为左视针孔相机。
在一种可能的实现方式中，第一相机可以为右视鱼眼相机，第二相机可以为右视针孔相机。
第二方面,本申请实施例提供了一种测距装置,应用于包括第一相机和第二相机的车辆,该测距装置包括:获取单元、网络单元和确定单元。所述获取单元,用于获取第一图像和第二图像,所述第一图像为第一相机采集的图像,所述第二图像为第二相机采集的图像,所述第一相机和所述第二相机存在共视区域,所述第一相机为鱼眼相机,所述第二相机为针孔相机。所述网络单元,用于获取第一深度图和第二深度图,所述第一深度图为所述第一图像对应的深度图,所述第二深度图为所述第二图像对应的深度图。所述确定单元,用于根据所述第一深度图和所述第二深度图确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离。
在一种可能的实现方式中,所述网络单元具体用于:获取第一特征图和第二特征图,所述第一特 征图为所述第一图像对应的特征图,所述第二特征图为所述第二图像对应的特征图。根据所述第一特征图的第一特征点及所述第一特征点对应的多个目标特征点,得到第三特征图,所述第一特征点为所述第一特征图中的任意特征点,所述第一特征点对应的多个目标特征点为所述第二特征图中与所述第一特征点符合极线约束的特征点。根据所述第二特征图的第二特征点及所述第二特征点对应的多个目标特征点,得到第四特征图,所述第二特征点为所述第二特征图中的任意特征点,所述第二特征点对应的多个目标特征点为所述第一特征图中与所述第二特征点符合极线约束的特征点。根据所述第三特征图和所述第四特征图,得到所述第一深度图和所述第二深度图。
在一种可能的实现方式中,所述确定单元具体用于:根据所述第一深度图、所述第二深度图、第一结构语义和第二结构语义,确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离,其中,所述第一结构语义用于指示所述第一图像中的物体边缘和平面,所述第二结构语义用于指示所述第二图像中的物体边缘和平面。
可选地,所述第一图像为所述第一相机在第一时刻采集的图像。
在一种可能的实现方式中,所述确定单元具体用于:根据所述第一深度图、所述第二深度图、第一实例分割结果、第二实例分割结果、第一距离信息和第二距离信息,确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离,所述第一实例分割结果用于指示所述第一图像中的背景和可移动物体,所述第二实例分割结果用于指示所述第二图像中的背景和可移动物体,所述第一距离信息用于指示第三图像中物体与车辆之间的距离,所述第三图像为所述第一相机在第二时刻采集的图像,所述第二距离信息用于指示第四图像中物体与车辆之间的距离,所述第四图像为所述第二相机在所述第二时刻采集的图像。
在一种可能的实现方式中,所述获取单元还用于:根据所述第一相机的内参和预设鱼眼相机内参对所述第一图像进行校准。
在一种可能的实现方式中,所述获取单元还用于:根据所述第二相机的内参和预设针孔相机内参对所述第二图像进行校准。
在一种可能的实现方式中,所述确定单元还用于:根据所述第一图像和所述第二图像中的物体与车辆之间的距离对所述第一图像和所述第二图像中的物体进行三维重建;显示三维重建后的物体。
在一种可能的实现方式中,所述确定单元还用于:根据所述第一图像和所述第二图像中的物体与车辆之间的距离显示提示信息。
第三方面,本申请实施例还提供一种测距装置,该测距装置包括:一个或多个处理器,当所述一个或多个处理器执行程序代码或指令时,实现上述第一方面或其任意可能的实现方式中所述的方法。
可选地,该测距装置还可以包括一个或多个存储器,该一个或多个存储器用于存储该程序代码或指令。
第四方面,本申请实施例还提供一种芯片,包括:输入接口、输出接口、一个或多个处理器。可选地,该芯片还包括存储器。该一个或多个处理器用于执行该存储器中的代码,当该一个或多个处理器执行该代码时,该芯片实现上述第一方面或其任意可能的实现方式中所述的方法。
可选地,上述芯片还可以为集成电路。
第五方面,本申请实施例还提供一种计算机可读存储介质,用于存储计算机程序,该计算机程序包括用于实现上述第一方面或其任意可能的实现方式中所述的方法。
第六方面,本申请实施例还提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机实现上述第一方面或其任意可能的实现方式中所述的方法。
第七方面,本申请实施例还提供一种测距装置,包括:获取单元、网络单元和确定单元。获取单元用于获取第一图像和第二图像,所述第一图像为第一相机采集的图像,所述第二图像为第二相机采集的图像,所述第一相机和所述第二相机存在共视区域,所述第一相机为鱼眼相机,所述第二相机为针孔相机;网络单元用于获取第一深度图和第二深度图,上述第一深度图为上述第一图像对应的深度图,上述第二深度图为上述第二图像对应的深度图;确定单元,用于根据上述第一深度图和上述第二深度图确定上述第一图像和/或上述第二图像中的物体与车辆之间的距离。上述测距装置还用于实现上述第一方面或其任意可能的实现方式中所述的方法。
第八方面,本申请实施例提供一种测距系统,包括一个或多个第一相机、一个或多个第二相机、计算设备,所述一个或多个第一相机用于获取第一图像,一个或多个第二相机用于获取第二图像,所 述计算设备用于基于第一图像和第二图像,采用上述第一方面或其任意可能的实现方式中所述的方法进行测距。
第九方面,本申请实施例提供了一种车辆,该车辆包括一个或多个鱼眼相机、一个或多个针孔相机和一个或多个处理器,一个或多个处理器实现上述第一方面或其任意可能的实现方式中所述的方法。可选的,所述车辆还包括显示屏,用于显示路况、距离提示信息、车辆的二维/三维模型或障碍物的二维/三维模型等信息。可选的,所述车辆还包括扬声器,用于播放语音提示信息,语音提示信息可以包括危险提示和/或车辆与障碍物的距离等信息,例如,在车辆距离障碍物的距离小于预设阈值时,语音提示驾驶员注意障碍物的存在等。车辆既可以只用显示屏显示提示信息来提醒驾驶员,也可以只用语音提示信息来提醒驾驶员,或者将显示屏显示和语音提示结合起来提示驾驶员,例如,在车辆与障碍物的距离低于第一阈值时,只通过显示屏显示提示信息,当车辆与障碍物的距离低于第二阈值时(第二阈值低于第一阈值),在显示的同时结合语音提示驾驶员注意障碍物,从而引起驾驶员的注意。
本实施例提供的测距装置、计算机存储介质、计算机程序产品和芯片均用于执行上文所提供的方法,因此,其所能达到的有益效果可参考上文所提供的方法中的有益效果,此处不再赘述。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请实施例的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例提供的一种图像的示意图;
图2为本申请实施例提供的一种测距系统的结构示意图;
图3为本申请实施例提供的一种图像采集系统的结构示意图;
图4为本申请实施例提供的一种测距方法的流程示意图;
图5为本申请实施例提供的一种目标网络的结构示意图;
图6为本申请实施例提供的一种提取图像特征示意图;
图7为本申请实施例提供的一种对像素点进行对齐的示意图;
图8为本申请实施例提供的一种测距场景的示意图;
图9为本申请实施例提供的一种显示界面的示意图;
图10为本申请实施例提供的另一种显示界面的示意图;
图11为本申请实施例提供的另一种测距场景的示意图;
图12为本申请实施例提供的又一种显示界面的示意图;
图13为本申请实施例提供的又一种显示界面的示意图;
图14为本申请实施例提供的又一种显示界面的示意图;
图15为本申请实施例提供的又一种显示界面的示意图;
图16为本申请实施例提供的又一种显示界面的示意图;
图17为本申请实施例提供的又一种显示界面的示意图;
图18为本申请实施例提供的又一种显示界面的示意图;
图19为本申请实施例提供的一种测距装置的结构示意图;
图20为本申请实施例提供的一种芯片的结构示意图;
图21为本申请实施例提供的一种电子设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请实施例一部分实施例,而不是全部的实施例。基于本申请实施例中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请实施例保护的范围。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。
本申请实施例的说明书以及附图中的术语“第一”和“第二”等是用于区别不同的对象,或者用 于区别对同一对象的不同处理,而不是用于描述对象的特定顺序。
此外,本申请实施例的描述中所提到的术语“包括”和“具有”以及它们的任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选的还包括其他没有列出的步骤或单元,或可选的还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。
需要说明的是,本申请实施例的描述中,“示例性地”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性地”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优先或更具优势。确切而言,使用“示例性地”或者“例如”等词旨在以具体方式呈现相关概念。
在本申请实施例的描述中,除非另有说明,“多个”的含义是指两个或两个以上。
图像畸变:图像畸变是由于透镜制造精度以及组装工艺的偏差会引入畸变,导致原始图像失真。现在一般相机在使用时必须去畸变,特别是鱼眼相机,若不去畸变,相机的原始图像中目标尺寸分布不匀质,对感知算法造成较大干扰,因此要对原始图像去畸变。但是,原始图像去畸变之后会丢失信息。信息的丢失在无人驾驶中非常致命,有造成交通事故的潜在风险。
极线约束:描述同一个点投影到两个不同视角的图像上时,像点、相机光心在投影模型下形成的约束。
如图1所示,针对图像一,沿针孔相机光心O1,像点P1光线上的空间点P或者P’,其在图像二中的像点P2一定在极线e2P2上,即表述为极线约束。对于异构的针孔和鱼眼相机对,极线不一定是直线,也可能是曲线。其中,e1为两图像对应相机光心O1O2连线与图像一平面的交点,e2为两图像对应相机光心O1O2连线与图像二平面的交点。
共视区域:指视野区域有交叉或重叠范围。
特征融合：以两幅图像对应的特征图2和特征图3进行融合为例。特征图2展平为一维特征表示为[a0,a1,…,aH1xW1]，长度为H1xW1，特征图3展平为一维特征表示为[b0,b1,…bH2xW2]，长度为H2xW2。进一步将两个特征拼接为一维特征C=[a0,a1,…,aH1xW1,b0,b1,…bH2xW2]。之后利用MLP网络将特征C映射为三个特征，分别为Q,K,V，其维度与C保持相同。然后将映射得到的三个特征输入Transformer网络。相关技术中，Transformer网络会根据C’=softmax(QKT/sqrt(dk))V(其中dk=H1xW1+H2xW2)得到融合特征C’，之后将融合特征C’拆分为特征图2对应的融合特征图2和特征图3对应的融合特征图3。其中，QKT表示向量点乘。
可以看出，相关技术的特征融合过程中，默认需要将任意相机上的一个特征点与所有相机上的特征点进行匹配运算。这样不仅计算量大，而且不相关的图像信息也会影响网络学习融合的特征。
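为便于与上文的极线约束稀疏匹配作对比，下面用一个简化的Python片段示意相关技术中“任意特征点与所有特征点匹配”的全量注意力融合计算（按上文公式C’=softmax(QKT/sqrt(dk))V，省略多头注意力等细节，变量名为示例性假设，并非限定实现）：

```python
import numpy as np

def full_attention_fusion(Q, K, V):
    """相关技术的全量注意力融合: C' = softmax(Q K^T / sqrt(d_k)) V。

    Q, K, V: 形状为(L, d)的特征, L = H1xW1 + H2xW2; 按上文取d_k = L。
    每个特征点都要与全部L个特征点做点乘, 计算量为O(L^2)。
    """
    d_k = Q.shape[0]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # 数值稳定
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # 逐行softmax
    return weights @ V                              # 融合特征C'
```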
相关技术中,主要通过视差估计方法对具有重叠区域的多幅图像进行深度估计,视差估计方法中需要确定图像重叠区域的每个像素点在其他图像上对应的像素点并计算像素点与对应像素点之间的视差,然后通过像素点与对应像素点之间的视差计算得到像素点的深度。
可以看出视差估计方法可以通过视差计算出重叠区域所有像素点的深度,但由于图像中重叠区域外的像素点,在其他图像中不存在对应的像素点,因此无法得到图像中重叠区域外的像素点的视差,也无法计算图像中重叠区域外的像素点的深度。
为此本申请实施例提供了一种测距方法,能够对具有重叠区域的多个图像进行深度估计。该方法适用于测距系统,图2示出了该测距系统的一种可能的存在形式。
如图2所示该测距系统包括图像采集系统和计算机设备。图像采集系统和计算机设备可以通过有线或无线方式进行通信。
图像采集系统，用于采集存在共视区域的第一图像和第二图像。
计算机设备，用于根据图像采集系统采集的存在共视区域的第一图像和第二图像确定上述第一图像和上述第二图像中的物体与车辆之间的距离。
图像采集系统可以由多个存在共视区域的相机构成。
图像采集系统包括第一相机,第一相机为视场范围大于预设角度的相机。
示例性地,预设角度可以为180度或192度。
在一种可能的实现方式中,上述多个相机可以为同一规格的相机。
例如,图像采集系统可以由多个鱼眼相机构成。
在另一种可能的实现方式中,上述多个相机可以为不同规格的相机。
例如,图像采集系统可以由一个或多个鱼眼相机和一个或多个针孔相机构成。
在一种可能的实现方式中,图像采集系统可以布置在车辆上。
上述车辆可以为陆地车辆或非陆地车辆。
上述陆地车辆可以包括小型轿车、全尺寸运动型多功能车(sport utility vehicle,SUV)、货车、卡车、面包车、公共汽车、摩托车、自行车、踏板车、火车、雪地汽车、轮式车辆、履带式车辆或铁轨式车辆。
上述非陆地车辆可以包括无人机、飞机、气垫船、航天器、船舶和帆船。
在一种可能的实现方式中,图像采集系统可以由车身四周布置的4个鱼眼相机(前视、后视、左视和右视)和6个针孔相机(前视、后视、左右前向侧视、左右后向侧视)构成,并且鱼眼相机与针孔相机间存在共视区域。
如图3所示,在一种可能的实现方式中,图像采集系统可以由车身四周布置的前视针孔相机、前视鱼眼相机、后视针孔相机和后视鱼眼相机组成。其中,前视针孔相机的视野范围与前视鱼眼相机的视野范围存在共视区域,后视针孔相机的视野范围与后视鱼眼相机的视野范围存在共视区域。
在另一种可能的实现方式中，图像采集系统可以由后视鱼眼相机和后视针孔相机构成，并且两者存在共视区域。
在又一种可能的实现方式中，图像采集系统可以由前视鱼眼相机和前视针孔相机构成，并且两者存在共视区域。
在又一种可能的实现方式中，图像采集系统可以由左视鱼眼相机和左视针孔相机构成，并且两者存在共视区域。
在又一种可能的实现方式中，图像采集系统可以由右视鱼眼相机和右视针孔相机构成，并且两者存在共视区域。
上述计算机设备可以是终端或服务器。其中,终端可以是车载终端、智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表、智能电视等,但并不局限于此。服务器可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云计算服务的云服务器。
图像采集系统和计算机设备可以通过有线或无线方式进行通信。
需要说明的是,上述无线方式可以通过通信网络实现通信,该通信网络可以是局域网,也可以是通过中继设备转接的广域网,或者包括局域网和广域网。当该通信网络为局域网时,示例性地,该通信网络可以是wifi热点网络、wifi P2P网络、蓝牙网络、zigbee网络、近场通信(near field communication,NFC)网或者未来可能的通用短距离通信网络、专用短距通信(dedicated short range communication,DSRC)网络等。
示例性地,上述通信网络可以是第三代移动通信技术(3rd-generation wireless telephone technology,3G)网络、第四代移动通信技术(the 4th generation mobile communication technology,4G)网络、第五代移动通信技术(5th-generation mobile communication technology,5G)网络、公共陆地移动网(Public Land Mobile Network,PLMN)或因特网等,本申请实施例对此不做限定。
下面结合图2示出的测距系统介绍本申请实施例提供的测距方法。
图4示出了本申请实施例提供的一种测距方法,应用于车辆,所述车辆包括第一相机和第二相机,该方法可以由上述测距系统中的计算机设备执行,如图4所示,该方法包括:
S301、计算机设备获取第一图像和第二图像。
其中,上述第一图像为第一相机采集的图像,上述第二图像为第二相机采集的图像,上述第一相机和上述第二相机存在共视区域,上述第一相机为鱼眼相机,上述第二相机为针孔相机。
需要说明的是,共视区域是指视野区域有交叉或重叠范围。
可选地，上述第一相机可以为视场范围大于预设角度的相机。示例性地，上述预设角度可以为180度或192度。
示例性地,计算机设备从图像采集系统中获取相机拍摄的第一图像和第二图像。
例如,计算机设备可以获取由车身上布置的前视鱼眼相机和前视针孔相机拍摄的第一图像和第二图像。其中,前视鱼眼相机和前视针孔相机存在共视区域。
在一种可能的实现方式中,计算机设备也可以获取多组图像,每组图像包括第一图像和第二图像,同一组的第一图像和第二图像的拍摄相机存在共视区域。
示例性地,计算机设备从车身四周布置的4个鱼眼相机(前视、后视、左视和右视)和4个针孔相机(前视、后视、左视和右视)构成的图像采集系统中获取8个相机拍摄的4组图像。其中,第1组图像包括前视鱼眼相机拍摄的第一图像1和前视针孔相机拍摄的第二图像1,前视鱼眼相机和前视针孔相机存在共视区域。第2组图像包括后视鱼眼相机拍摄的第一图像2和后视针孔相机拍摄的第二图像2,后视鱼眼相机和后视针孔相机存在共视区域。第3组图像包括左视鱼眼相机拍摄的第一图像3和左视针孔相机拍摄的第二图像3,左视鱼眼相机和左视针孔相机存在共视区域。第4组图像包括右视鱼眼相机拍摄的第一图像4和右视针孔相机拍摄的第二图像4,右视鱼眼相机和右视针孔相机存在共视区域。
可选地，计算机设备可以获取图像采集系统一段时间内拍摄的多组图像。其中，同一组内的第一图像和第二图像的采集时间相同，不同组图像的采集时间不相同。
例如,计算机设备可以获取5组图像,第一组图像中第一图像和第二图像的采集时间均为10:00:00,第二组图像中第一图像和第二图像的采集时间均为10:00:01,第三组图像中第一图像和第二图像的采集时间均为10:00:02,第四组图像中第一图像和第二图像的采集时间均为10:00:03,第五组图像中第一图像和第二图像的采集时间均为10:00:04。
可选地,计算机设备还可以根据每个图像对应相机的内参和预设相机内参对每个上述图像中像素点的坐标进行校准。
例如,根据上述第一相机的内参和预设鱼眼相机内参对上述第一图像进行校准。
又例如,根据第二相机的内参和预设针孔相机内参对上述第二图像进行校准。
其中，鱼眼相机(如Kannala-Brandt(KB)模型的鱼眼相机)的相机内参包括焦距(fx,fy)、成像中心位置(cx,cy)和畸变参数(k1,k2,k3,k4)。
针孔相机的相机内参包括焦距(fx,fy),成像中心位置(cx,cy)以及相应的畸变参数。畸变参数包括径向畸变系数(k1,k2,k3)和切向畸变系数(p1,p2)。
相机外参则是相对预设坐标系,参数为三维位置偏移(x,y,z)和相机光轴与坐标轴的夹角(yaw,pitch,roll)。例如,预设坐标系可以是相对车辆建立的车身坐标系。
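为直观起见，下面用Python数据类给出上述内参、外参的一种示意性组织方式（字段名与结构均为示例性假设，实际实现可按需调整）：

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class FisheyeIntrinsics:
    """KB模型鱼眼相机内参: 焦距、成像中心与畸变参数k1~k4。"""
    fx: float
    fy: float
    cx: float
    cy: float
    k: Tuple[float, float, float, float]

@dataclass
class PinholeIntrinsics:
    """针孔相机内参: 焦距、成像中心、径向畸变k1~k3与切向畸变p1、p2。"""
    fx: float
    fy: float
    cx: float
    cy: float
    k_radial: Tuple[float, float, float]
    p_tangential: Tuple[float, float]

@dataclass
class Extrinsics:
    """相机相对预设坐标系(如车身坐标系)的外参。"""
    xyz: Tuple[float, float, float]             # 三维位置偏移(x, y, z)
    yaw_pitch_roll: Tuple[float, float, float]  # 相机光轴与坐标轴的夹角
```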
例如,根据针孔相机或鱼眼相机采集的图像中的每个像素点的坐标(u,v)和标准针孔模型成像过程公式逆投影求解得到单位深度平面上畸变的点坐标(xdistorted,ydistorted)。
再利用逆畸变变换求解得到单位深度平面上无畸变的点(x,y)，其中r²=x²+y²。
再将单位深度平面上无畸变的点的坐标(x,y)代入模板相机系统对应相机的内参下的成像过程中（畸变和投影变换后），然后根据无畸变的点的坐标和预设相机内参得到像素点校准后的坐标(u’,v’)。由此建立像素点校准前的坐标和校准后的坐标的对应关系，根据该对应关系对第一图像和第二图像中的每个像素点进行转换，从而将相机图像（第一图像和第二图像）校准成模板相机的图像。
可选地，在上述校准过程中也可利用插值算法对校准图像进行平滑处理。
需要说明的是,由于相机模组制作时工艺产生的偏差,不同图像采集系统的参数会有偏差。为了保证不同图像采集系统测距精度的一致性,同时便于训练深度估计模型,因此可以将图像校准成模板相机系统的图像。将相机图像的像素反投影到单位深度平面上,模拟真实的光线射入路线,再将其投影到模板相机上。
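下面以针孔相机为例，给出把实际相机图像校准为模板相机图像时映射关系构建过程的示意性Python实现。为便于重采样，这里按反向映射构建（对模板图像的每个像素求其在实际相机图像中的对应坐标），与上文建立的校准前后坐标对应关系等价；去畸变采用不动点迭代，函数名与参数组织均为示例性假设，鱼眼相机（KB模型）的去畸变/加畸变公式与此不同，此处未展开：

```python
import numpy as np

def undistort_normalized(xd, yd, k1, k2, k3, p1, p2, iters=8):
    """不动点迭代: 由畸变的归一化坐标(xd, yd)求单位深度平面上无畸变的点(x, y)。"""
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y                      # 对应上文 r² = x² + y²
        radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y

def distort_normalized(x, y, k1, k2, k3, p1, p2):
    """对单位深度平面上的无畸变点按针孔畸变模型加畸变。"""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return xd, yd

def build_calibration_map(cam, template, height, width):
    """建立模板图像像素到实际相机图像像素的反向映射, 用于把相机图像重采样为模板相机图像。

    cam / template: (fx, fy, cx, cy, (k1, k2, k3, p1, p2)) 形式的针孔内参, 组织方式为示例性假设。
    """
    fx_c, fy_c, cx_c, cy_c, dist_c = cam
    fx_t, fy_t, cx_t, cy_t, dist_t = template
    map_u = np.zeros((height, width), np.float32)
    map_v = np.zeros((height, width), np.float32)
    for v in range(height):
        for u in range(width):
            xd = (u - cx_t) / fx_t                      # 模板像素 -> 模板归一化(含畸变)坐标
            yd = (v - cy_t) / fy_t
            x, y = undistort_normalized(xd, yd, *dist_t)
            xs, ys = distort_normalized(x, y, *dist_c)  # 按实际相机内参加畸变并投影
            map_u[v, u] = fx_c * xs + cx_c
            map_v[v, u] = fy_c * ys + cy_c
    return map_u, map_v   # 可配合双线性插值(如cv2.remap)对校准图像做平滑重采样
```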
S302、计算机设备获取第一深度图和第二深度图。
其中,上述第一深度图为上述第一图像对应的深度图,上述第二深度图为上述第二图像对应的深 度图。
在一种可能的实现方式中,计算机设备可以获取第一特征图和第二特征图。然后根据上述第一特征图的第一特征点及上述第一特征点对应的多个目标特征点,得到第三特征图。之后根据上述第二特征图的第二特征点及上述第二特征点对应的多个目标特征点,得到第四特征图。接着根据上述第三特征图和上述第四特征图,得到上述第一深度图和上述第二深度图。其中,上述第一特征图为上述第一图像对应的特征图,上述第二特征图为上述第二图像对应的特征图。上述第一特征点为上述第一特征图中的任意特征点,上述第一特征点对应的多个目标特征点为上述第二特征图中与上述第一特征点符合极线约束的特征点。上述第二特征点为上述第二特征图中的任意特征点,上述第二特征点对应的多个目标特征点为上述第一特征图中与上述第二特征点符合极线约束的特征点。
相较于通过所有特征点对特征点进行特征匹配，通过特征点对应的图像存在共视区域的图像中符合极线约束的目标特征点对特征点进行特征匹配，一方面可以降低特征匹配过程的计算量；另一方面，由于特征点对应的图像存在共视区域的图像中符合极线约束的特征点与特征点存在较高的相似度，通过特征点的目标特征点对特征点进行特征匹配可以使匹配后的特征点融合目标特征点的特征，增加了特征点的辨识度，使根据特征融合后的特征图能够更加准确地得到对应的深度图，提高了测距的准确度。
在另一种可能的实现方式中,计算机设备可以将第一图像和第二图像输入目标网络,得到第一深度图和第二深度图。
如图5所示,在一种可能的实现方式中,目标网络可以包括第一子网络,第一子网络用于根据输入的图像输出图像的特征图。
示例性地,如图6所示,可以将鱼眼相机拍摄的尺寸为HxW的第一图像输入第一子网络以得到用于表征第一图像特征的尺寸为H1xW1的第一特征图,将针孔相机拍摄的尺寸为H’xW’的第二图像输入第一子网络以得到用于表征第二图像特征的尺寸为H2xW2的第二特征图。
在一种可能的实现方式中,可以对不同相机采集的图像采用相同的第一子网络提取图像的特征得到图像对应的特征图。
例如，对针孔相机和鱼眼相机采集的图像均采用ResNet50特征提取网络提取图像的特征得到图像对应的特征图。
在另一种可能的实现方式中,可以对不同的图像采用不同的特征提取网络提取图片的特征得到图像对应的特征图。
例如，对针孔相机采集的图像采用ResNet50特征提取网络提取图像的特征得到图像对应的特征图，对鱼眼相机采集的图像采用带形变卷积(deformable convolution)的ResNet50特征提取网络提取图像的特征得到图像对应的特征图。
在一种可能的实现方式中,可以对第一特征图和第二特征图进行尺寸对齐。
示例性地,第一特征图为鱼眼相机采集的第一图像的特征图,第二特征图为针孔相机采集的第二图像的特征图。上述鱼眼相机的焦距为N,针孔相机的焦距为4N,即针孔相机的焦距是鱼眼相机的焦距的4倍。这种情况下,可以将第一特征图放大4倍使第一特征图和第二特征图的尺寸对齐。
如图5所示,在一种可能的实现方式中,目标网络可以包括第二子网络,第二子网络用于根据输入的特征图输出对应的融合特征图。
示例性地,第二子网络可以根据上述第一特征图的第一特征点及上述第一特征点对应的多个目标特征点,得到第三特征图(即第一特征图的融合特征图)。根据上述第二特征图的第二特征点及上述第二特征点对应的多个目标特征点,得到第四特征图(即第二特征图的融合特征图)。
需要说明的是,以计算机设备获取鱼眼相机拍摄的第一图像和针孔相机拍摄的第二图像为例,将第一图像对应的第一特征图展平为一维特征表示为[a0,a1,…,aH1xW1],长度为H1xW1,将第二图像对应的第二特征图展平为一维特征表示为[b0,b1,…bH2xW2],长度为H2xW2。进一步将两个特征拼接为一维特征C=[a0,a1,…,aH1xW1,b0,b1,…bH2xW2]。之后利用网络将一维特征C映射为三个特征,分别为Q,K,V,假设第二特征图中索引位置为i的特征bi,通过极线约束计算后,其在第一特征图中对应深度范围(dmin,dmax)的特征索引位置有n个,分别为{ad0,ad1,…,adn}, 由此Q中对应针孔图像特征索引位置为i的元素qii,其不需要与长度为H1xW1+H2xW2的K中所有元素进行点乘,而只需与{ad0,ad1,…,adn}对应的n个元素进行逐个点乘得到qii与n个元素的乘积,再经过softmax运算得到S=[s1,s2,...,sn],最后与V中对应{ad0,ad1,…,adn}的n个元素进行加权求和运算,得到bi对应融合后的特征bi’,对每个特征点进行上述操作得到一维特征C’,然后按照C的拼接次序将其拆分并转换为第一特征图对应的第三特征图和第二特征图对应的第四特征图。
可选地,目标特征点也可以为与特征点对应的图像存在共视区域的图像中符合极线约束的特征点以及符合极线约束的特征点周围的特征点。
需要说明的是,对目标特征点位置进行一定范围的膨胀处理,可以提高相机外参细微变动的鲁棒性。以计算机设备获取鱼眼相机拍摄的第一图像和针孔相机拍摄的第二图像为例,将第一图像对应的第一特征图展平为一维特征表示为[a0,a1,…,aH1xW1],长度为H1xW1,将第二图像对应的第二特征图展平为一维特征表示为[b0,b1,…bH2xW2],长度为H2xW2。进一步将两个特征拼接为一维特征C=[a0,a1,…,aH1xW1,b0,b1,…bH2xW2]。之后利用网络将一维特征C映射为三个特征,分别为Q,K,V,假设第二特征图中索引位置为i的特征bi,通过极线约束计算后,其在第一特征图中对应深度范围(dmin,dmax)的特征索引位置有n个,分别为{ad0,ad1,…,adn},经过膨胀处理后又有m个候选点,总体表示为{ad0,ad1,…,adn,adn+1,…,adn+m}。由此Q中对应针孔图像特征索引位置为i的元素qii,其不需要与长度为H1xW1+H2xW2的K中所有元素进行点乘,而只需与{ad0,ad1,…,adn,adn+1,…,adn+m}对应的n+m个元素进行逐个点乘得到qii与n+m个元素的乘积,再经过softmax运算得到S=[s1,s2,...,sn+m],最后与V中对应{ad0,ad1,…,adn,adn+1,…,adn+m}的n+m个元素进行加权求和运算,得到bi对应融合后的特征bi’。
如图5所示,在一种可能的实现方式中,目标网络可以包括第三子网络,第三子网络用于根据输入的特征图或融合特征图输出对应的深度图。
示例性地,可以将上述第三特征图和上述第四特征图输入目标网络的第三子网络中,得到第一深度图和第二深度图。
又示例性地,可以将上述第一特征图和上述第二特征图输入目标网络的第三子网络中,得到第一深度图和第二深度图。
其中,上述第三子网络是利用第一训练数据样本集训练得到的,上述第一训练数据样本集中包括多个图像和上述多个图像对应的深度图。
例如,可以利用带有360度激光扫描的真值车获取点云与图像的同步帧数据。然后通过点云得到图像对应的深度图,利用图像和图像对应的深度图有监督训练第三子网络,同时可利用自监督训练与时序帧间一致性进行辅助训练第三子网络。
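作为示意，下面给出利用点云投影得到的稀疏深度真值对第三子网络做有监督训练时，一种常见的带掩码L1损失的PyTorch风格片段（损失形式与函数名均为示例性假设，自监督与时序帧间一致性损失在此未展开）：

```python
import torch

def depth_supervision_loss(pred_depth, gt_depth, valid_mask):
    """pred_depth: 网络输出的深度图; gt_depth: 点云投影得到的真值深度;
    valid_mask: 有激光点落入的像素位置(0/1), 仅在这些位置计算监督损失。"""
    diff = torch.abs(pred_depth - gt_depth) * valid_mask
    return diff.sum() / valid_mask.sum().clamp(min=1)
```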
S303、计算机设备根据第一深度图和第二深度图确定第一图像和/或第二图像中的物体与车辆之间的距离。
示例性地，可以根据第一深度图（或第三深度图）得到第一图像中各像素点在第一相机坐标系中的三维坐标，根据第二深度图（或第四深度图）得到第二图像中各像素点在第二相机坐标系中的三维坐标，然后将像素点在第一相机坐标系中的三维坐标和像素点在第二相机坐标系中的三维坐标转换为像素点在车辆坐标系的坐标，然后通过像素点在车辆坐标系的三维坐标确定第一图像和/或第二图像中的物体与车辆之间的距离。其中，第一相机坐标系是以第一相机的光心为坐标原点建立的坐标系，第二相机坐标系是以第二相机的光心为坐标原点建立的坐标系，车辆坐标系是以车身参考点（如车辆后轴中心）为坐标原点建立的坐标系。
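下面给出把针孔相机深度图反投影为相机坐标系三维点、再统一到车辆坐标系并求物体最近距离的示意性Python片段（外参矩阵、距离定义等均为示例性假设；鱼眼相机需按其投影模型反投影，此处未展开）：

```python
import numpy as np

def pinhole_depth_to_vehicle_points(depth, fx, fy, cx, cy, T_cam_to_veh):
    """将针孔相机深度图反投影为相机坐标系三维点, 再变换到车辆坐标系。

    depth: (H, W)深度图; T_cam_to_veh: 4x4外参矩阵(相机坐标系 -> 车辆坐标系), 为示例性假设。
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth                    # 相机坐标系下的X
    y = (v - cy) / fy * depth                    # 相机坐标系下的Y
    pts_cam = np.stack([x, y, depth, np.ones_like(depth)], axis=-1).reshape(-1, 4)
    pts_veh = (T_cam_to_veh @ pts_cam.T).T[:, :3]  # 统一到车辆坐标系
    return pts_veh

def min_distance_to_vehicle(pts_veh, object_mask):
    """object_mask: 展平后属于某一物体的像素布尔掩码;
    返回该物体到车辆坐标原点的最近距离(此处以俯视平面内距离为例, 实际可改为到等距轮廓的距离)。"""
    obj = pts_veh[object_mask.reshape(-1)]
    return float(np.linalg.norm(obj[:, :2], axis=1).min())
```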
需要说明的是,一些物体(如悬空障碍物)可能不处于第一相机和第二相机的共视区域,而是单 独存在于第一相机或第二相机的视野范围内,即单独存在于第一图像或第二图像中,这些物体与车辆之间的距离可以通过第一深度图和第二深度图确定得到。
在一种可能的实现方式中,可以根据上述第一深度图(或第三深度图)、上述第二深度图(或第四深度图)、第一结构语义和第二结构语义,确定上述第一图像和/或上述第二图像中的物体与车辆之间的距离。
示例性地,可以根据第一深度图(或第三深度图)得到第一图像中各像素点在第一相机坐标系中的三维坐标,根据第二深度图(或第四深度图)得到第二图像中各像素点在第二相机坐标系中的三维坐标,然后将像素点在第一相机坐标系中的三维坐标和像素点在第二相机坐标系中的三维坐标转换为像素点在车辆坐标系的坐标,之后根据上述第一结构语义和上述第二结构语义对目标物体的边缘点对应的像素点进行对齐,然后通过像素点在车辆坐标系的三维坐标确定第一图像和/或第二图像中的物体与车辆之间的距离。其中,目标物体为第一图像和第二图像中均存在的物体。
可以理解的是，实际空间中的点可能在多个相机中被观测到。例如，给定空间中的一个物体边缘点可能在鱼眼相机拍摄的第一图像和针孔相机拍摄的第二图像中同时出现，但由于图像对应的深度图是对应各个相机的，因此鱼眼相机提供的像素点坐标和针孔相机提供的像素点坐标仍可能在统一的坐标系中产生偏差。因此，本申请实施例提供的测距方法中，根据上述第一结构语义和上述第二结构语义对目标物体的边缘点对应的像素点进行对齐的处理，可以减少上述偏差。
例如,如图7所示,图7中第一图像和第二图像示出了物体1的边缘。以对物体1的边缘对应的像素点进行对齐为例。鱼眼相机拍摄的第一图像中的物体1边缘对应的像素点转换为车辆坐标系中表示为一串点[q1,q2,…,qm],针孔相机拍摄的第二图像中的物体1边缘对应的像素点转换为车辆坐标系中表示为另一串点[p1,p2,…,pn],通过将第一图像中的物体1边缘对应的像素点进行旋转平移RT操作后,可以与第二图像中的物体1边缘对应的像素点对齐,即两两最相邻的点之间欧式几何距离和最小。通过梯度求解算法,即可计算得到RT矩阵,从而将第一图像和/或第二图像中的物体1边缘对齐。同理,也可以对第一图像和/或第二图像中的其他相同物体的像素点进行一样的优化。
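下面给出一种对上述两串边缘点求取旋转平移RT并对齐的示意性Python实现。与上文目标一致（使两两最相邻点之间的欧氏距离和最小），这里采用最近邻匹配加SVD闭式解的迭代方式（即常见的ICP做法），实际实现也可如上文所述用梯度求解算法直接优化RT矩阵；函数名与迭代次数均为示例性假设：

```python
import numpy as np

def icp_align(src, dst, iters=20):
    """将源点串src(如第一图像中物体1边缘点, Nx3)对齐到目标点串dst(Mx3), 返回旋转R与平移t。"""
    R = np.eye(3)
    t = np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        # 1. 最近邻匹配: 为cur中每个点找dst中最近的点
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        nn = dst[d2.argmin(axis=1)]
        # 2. 去质心后用SVD求最优旋转(Kabsch), 再求平移
        mu_s, mu_d = cur.mean(0), nn.mean(0)
        H = (cur - mu_s).T @ (nn - mu_d)
        U, _, Vt = np.linalg.svd(H)
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:        # 处理反射情形
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_d - R_step @ mu_s
        # 3. 应用本次增量并累积总变换
        cur = (R_step @ cur.T).T + t_step
        R = R_step @ R
        t = R_step @ t + t_step
    return R, t
```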
在一种可能的实现方式中,可以根据上述第一深度图(或第三深度图)、上述第二深度图(或第四深度图)、第一实例分割结果、第二实例分割结果、第一距离信息和第二距离信息,确定上述第一图像和/或上述第二图像中的物体与车辆之间的距离。
其中,上述第一实例分割结果用于指示上述第一图像中的背景和可移动物体,上述第二实例分割结果用于指示上述第二图像中的背景和可移动物体,上述第一距离信息用于指示第三图像中物体与车辆之间的距离,上述第三图像为上述第一相机在第二时刻采集的图像,上述第二距离信息用于指示第四图像中物体与车辆之间的距离,上述第四图像为上述第二相机在上述第二时刻采集的图像。
示例性地,可以根据第一深度图(或第三深度图)得到第一图像中各像素点在第一相机坐标系中的三维坐标,根据第二深度图(或第四深度图)得到第二图像中各像素点在第二相机坐标系中的三维坐标,然后将像素点在第一相机坐标系中的三维坐标和像素点在第二相机坐标系中的三维坐标转换为像素点在车辆坐标系的坐标,然后通过像素点在车辆坐标系的三维坐标确定第一图像和/或第二图像中的物体与车辆之间的距离。之后通过上述第一实例分割结果、上述第二实例分割结果、第一距离信息和第二距离信息对得到的第一图像和/或第二图像中的物体与车辆之间的距离进行修正。
可以理解的是，背景的位置是固定的，可以通过两个时刻之间车辆的位置关系以及其中一个时刻车辆与背景之间的距离，确定另一时刻车辆与背景之间的距离。例如，车辆在时刻1与墙面相距5米，车辆在时刻1至时刻2之间向远离该墙面方向行驶了0.5米，则可以确定车辆在时刻2与墙面相距5+0.5=5.5米。因此，可以利用这一关系，结合第一实例分割结果、第二实例分割结果、第一距离信息和第二距离信息，对得到的第一图像和/或第二图像中的物体与车辆之间的距离进行修正。
根据第一实例分割结果可以将第一图像中的背景和可移动物体分割,利用第一时刻和第二时刻车辆的位置关系和第二时刻拍摄的第三图像中背景与车辆之间的距离换算出第一时刻拍摄的第一图像中该背景与车辆之间的换算距离。然后通过换算得到的换算距离对第一图像中的物体(该背景)与车辆之间的距离(测量距离)进行修正(如换算距离*0.2+测量距离*0.8=修正后的距离)。
例如,第一图像和第三图像中均包括墙体1,第三图像中车辆与墙体1的距离为0.5米,第二时刻至第一时刻之间车辆向该墙面方向行驶了0.2米,则通过换算可以确定第一时刻车辆与墙体之间的换算距离为0.5-0.2=0.3米,若通过第一图像的深度图测量得到的车辆与墙体之间的测量距离为0.28米,则 可以通过换算距离对测量距离进行修正,即确定修正后的距离为0.3*0.2+0.28*0.8=0.284米,以进一步提高测距的准确性。
根据第二实例分割结果可以将第二图像中的背景和可移动物体分割,利用第一时刻和第二时刻车辆的位置关系和第二时刻拍摄的第四图像中背景与车辆之间的距离换算出第一时刻拍摄的第二图像中该背景与车辆之间的换算距离。然后通过换算得到的换算距离对第二图像中的物体(该背景)与车辆之间的距离(测量距离)进行修正(如换算距离*0.2+测量距离*0.8=修正后的距离)。
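下面用一小段Python示意上文按固定背景进行距离修正的加权方式（0.2/0.8为上文示例权重，并非限定取值）：

```python
def corrected_distance(converted_dist, measured_dist, w_conv=0.2, w_meas=0.8):
    """按加权方式对背景物体的测量距离进行修正。

    converted_dist: 由第二时刻距离与两时刻车辆位移换算出的第一时刻换算距离;
    measured_dist:  由第一时刻深度图直接测得的测量距离。
    """
    return w_conv * converted_dist + w_meas * measured_dist

# 对应上文示例: 换算距离0.3米、测量距离0.28米 -> 修正后约0.284米
assert abs(corrected_distance(0.3, 0.28) - 0.284) < 1e-9
```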
可以看出，本申请实施例提供的测距方法，通过将视场范围大于预设角度的相机采集的第一图像和与该相机存在共视区域的相机采集的第二图像输入目标网络，得到上述图像对应的第一深度图和第二深度图，之后根据第一深度图和第二深度图确定第一图像和/或上述第二图像中的物体与车辆之间的距离。通过视场范围大于预设角度的相机能够弥补测距传感器布局固有的盲区，使车辆对探测盲区中障碍物进行测距。
如图8所示，相关技术利用超声波传感器进行测距，由于传感器布局存在盲区，在车辆倒车过程中超声波传感器无法发现车辆后方的悬空障碍物并给予用户提醒，容易造成事故。而本申请实施例提供的测距方法，通过存在共视区域的第一相机（如鱼眼相机）和第二相机（如针孔相机）采集的图像共同进行测距，从而可以弥补测距传感器布局固有的盲区，使车辆可以探测到超声波传感器的探测盲区中存在的悬空障碍物并给予用户提醒，使用户及时发现车辆后方的悬空障碍物，从而降低车辆碰撞悬空障碍物的概率。
可选地,本申请实施例提供的测距方法还可以包括:
S304、获取第一结构语义和第二结构语义。
如图5所示,在一种可能的实现方式中,目标网络可以包括第四子网络,第四子网络用于根据输入的图像或特征图输出对应的结构语义。其中,结构语义用于指示图像中的物体边缘和平面。
示例性地，可以将第一图像、上述第一特征图或上述第三特征图输入目标网络的第四子网络，得到第一结构语义。将上述第二图像、上述第二特征图或上述第四特征图输入目标网络的第四子网络，得到第二结构语义。其中，上述第一结构语义用于指示上述第一图像中的物体边缘和平面，上述第二结构语义用于指示上述第二图像中的物体边缘和平面。
可选地，物体边缘可以由边缘的热力图表示，物体的平面结构可以由三维法向量图表示。
例如,可以对多个图像进行实例分割标注得到多个图像中的物体边缘,然后结合点云的几何信息以及实例分割标注语义信息对图像各区域的平面法向量进行计算得到多个图像中物体的平面结构,之后通过多个图像中的物体边缘和多个图像中物体的平面结构监督训练第二子网络。
S305、获取第一实例分割结果和第二实例分割结果。
如图5所示，在一种可能的实现方式中，目标网络可以包括第五子网络，第五子网络用于根据输入的图像或特征图输出对应的实例分割结果。图像的实例分割结果用于指示图像中的背景和可移动物体。例如，图像的实例分割结果用于指示图像中的车辆、行人等可移动物体和图像中地面、墙面等背景。
示例性地,可以将第一图像、上述第一特征图或上述第三特征图输入目标网络的第五子网络,得到第一实例分割结果。将第二图像、上述第二特征图或上述第四特征图输入目标网络的第五子网络,得到第二实例分割结果。其中,上述第一实例分割结果用于指示第一图像的实例分割结果,上述第二实例分割结果用于指示第二图像的实例分割结果。
在一种可能的实现方式中，可以对多个图像进行实例分割标注得到多个图像的实例分割结果，然后通过多个图像和多个图像的实例分割结果训练第五子网络。
S306、计算机设备根据上述第一图像和上述第二图像中的物体与车辆之间的距离对上述第一图像和上述第二图像中的物体进行三维重建。
在一种可能的实现方式中,计算机设备根据上述第一图像和上述第二图像中的物体与车辆之间的距离和颜色信息对上述第一图像和上述第二图像中的物体进行三维重建。其中,颜色信息用于指示第一图像和第二图像中每个像素点的颜色。
S307、计算机设备显示三维重建后的物体。
示例性地,终端可以通过显示面板显示三维重建后的物体。
又示例性地,服务器可以向终端发送显示指令,终端在收到显示指令后根据显示指令显示三维重 建后的物体。
S308、计算机设备根据上述第一图像和上述第二图像中的物体与车辆之间的距离显示提示信息。
示例性地，在第一图像中的物体与车辆之间的距离小于距离阈值的时候，显示碰撞告警提示信息以提醒用户车辆可能与第一图像中的物体发生碰撞。
又示例性地,在第二图像中的物体与车辆之间的距离小于距离阈值的时候,显示距离提示信息以提醒用户车辆与第二图像中的物体距离较近。
上述提示信息可以是文字信息、声音信息或图像信息。
可以理解的是，根据第一图像和上述第二图像中的物体与车辆之间的距离显示提示信息，可以在图像中的物体与车辆距离较近时提醒用户，以便用户及时处理，从而避免车辆与图像中的物体发生碰撞。
在一种可能的实现方式中,上述第一图像和/或第二图像中的物体与车辆之间的距离为第一图像和/或第二图像中的物体与车辆的等距轮廓之间的距离。车辆的等距轮廓是根据车辆外轮廓设置的等距轮廓。上述等距轮廓可以为二维(2D)的俯视视角下车身外轮廓线向外扩展的等距线,也可以为三维(3D)的车身三维外轮廓向外扩展的等距曲面。
在一种可能的实现方式中,可以根据上述第一图像和上述第二图像中的物体与车辆之间的距离调整车辆等距轮廓。
例如,当图像中的物体与车辆之间的距离小于或等于0.5m时,调整等距轮廓的颜色为黄色。
又例如,图像中的物体与车辆之间的距离小于或等于0.2m时,调整等距轮廓的颜色为红色。
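下面用一小段Python示意按距离阈值调整等距轮廓颜色的逻辑（0.5m、0.2m为上文示例阈值；其余情形的默认颜色为示例性假设）：

```python
def contour_color(min_distance_m):
    """按物体与车辆等距轮廓的最近距离选择轮廓显示颜色。"""
    if min_distance_m <= 0.2:
        return "red"        # 距离小于或等于0.2m
    if min_distance_m <= 0.5:
        return "yellow"     # 距离小于或等于0.5m
    return "green"          # 默认颜色(示例性假设)
```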
在一种可能的实现方式中,该方法还可以包括:显示第一界面。如图9所示,第一界面可以包括显示功能控件和设置功能控件。其中,显示功能用于显示车辆周边物体。
如图10所示,用户可以通过点击显示功能控件,显示第二界面,通过第二界面可以选择显示功能运行过程所使用的传感器。通过图10可以看出,用户选择了开启前视鱼眼相机、后视鱼眼相机、前视针孔相机、后视针孔相机、激光雷达和超声波探测器。
如图11所示,在一种场景中车辆前方存在多个物体。其中,物体1、物体2和物体3位于前视鱼眼相机的视野范围内,物体2和物体4位于前视针孔相机的视野范围内。物体2位于前视鱼眼相机和前视针孔相机的共同视野范围内。
如图12所示,用户在第二界面仅选择开启前视鱼眼相机且关闭前视针孔相机,然后返回第一界面,点击显示功能控件。则之后显示图13所示的第三界面。通过图13显示的第三界面可以看出,车辆前方存在物体1、物体2和物体3。由于关闭了前视针孔相机,因此图13所显示的第三界面与图11所示的实际场景相比缺少了处于前视针孔相机视野范围内的物体4。因此,仅通过前视鱼眼相机进行物体探测存在探测盲区,无法发现车辆前方的物体4。
如图14所示，用户在第二界面仅选择开启前视针孔相机且关闭前视鱼眼相机，然后返回第一界面，点击显示功能控件。则之后显示图15所示的第三界面。通过图15显示的第三界面可以看出，车辆前方存在物体2和物体4。由于关闭了前视鱼眼相机，因此图15所显示的第三界面与图11所示的实际场景相比缺少了仅处于前视鱼眼相机视野范围内的物体1和物体3。因此，仅通过前视针孔相机进行物体探测存在探测盲区，无法发现车辆前方的物体1和物体3。
如图16所示，用户在第二界面仅选择开启前视针孔相机和前视鱼眼相机，然后返回第一界面，点击显示功能控件。则之后显示图17所示的第三界面。通过图17显示的第三界面可以看出，车辆前方存在物体1、物体2、物体3和物体4，与图11所示的实际场景一致。
可以看出,相比与单一传感器进行测距,本申请实施例可以通过选择多个传感器共同测距,从而弥补单一传感器布局固有的盲区,使车辆对单一传感器的探测盲区中障碍物进行测距。
在一种可能的实现方式中,第三界面还可以显示提示信息。例如,在车辆与物体之间的距离小于0.5米时,第三界面显示提示信息。
如图18所示,初始时,车辆前方2米处存在物体。车辆直行一段时间后,车辆与前方物体之间距离缩短为0.4米。这时第三界面中显示文字框提示用户距离物体较近。
可以看出,本申请实施例可以在物体与车辆距离较近时提醒用户,以便用户及时处理,从而避免车辆与物体发生碰撞。
在一种可能的实现方式中,本申请实施例提供的测距方法可集成到公有云作为一项服务对外发布。 当该测距方法集成到公有云作为一项服务对外发布,还可以对用户上传数据进行保护。例如对于图像,可以要求用户上传的图像已事先进行加密。
在另一种可能的实现方式中,本申请实施例提供的测距方法也可以集成到私有云,作为一项服务对内使用。当测距方法集成到私有云时,可以根据实际需要确定是否对用户上传数据进行保护。
在又一种可能的实现方式中,本申请实施例提供的测距方法还可以集成到混合云。其中,混合云是指包括一个或多个公有云和一个或多个私有云的架构。
当本申请实施例提供的测距方法以服务的方式提供给用户使用时，该服务可以提供应用程序编程接口(application programming interface,API)和/或用户界面(也称作用户接口)。其中，用户界面可以是图形用户界面(graphical user interface,GUI)或者是命令用户界面(command user interface,CUI)。如此，操作系统或软件系统等业务系统可以直接调用该服务提供的API进行测距，或者是服务通过GUI或CUI接收用户输入的图像，根据图像进行测距。
在又一种可能的实现方式中,本申请实施例提供的测距方法可以封装成软件包出售,用户购买软件包后可在该用户的运行环境下安装使用。当然,上述软件包也可以预安装在各种设备(例如,台式机、笔记本电脑、平板电脑、智能手机等)中,用户购买预安装软件包的设备,并使用该设备,根据图像进行测距。
下面将结合图19介绍用于执行上述测距方法的测距装置。
可以理解的是,测距装置为了实现上述功能,其包含了执行各个功能相应的硬件和/或软件模块。结合本文中所公开的实施例描述的各示例的算法步骤,本申请实施例能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。本领域技术人员可以结合实施例对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。
本申请实施例可以根据上述方法示例对测距装置进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块可以采用硬件的形式实现。需要说明的是,本实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在采用对应各个功能划分各个功能模块的情况下,图19示出了上述实施例中涉及的测距装置的一种可能的组成示意图,如图19所示,该测距装置1800可以包括:获取单元1801、网络单元1802和确定单元1803。
获取单元1801,用于获取第一图像和第二图像,所述第一图像为第一相机采集的图像,所述第二图像为第二相机采集的图像,所述第一相机和所述第二相机存在共视区域,所述第一相机为鱼眼相机,所述第二相机为针孔相机。
网络单元1802,用于获取第一深度图和第二深度图,上述第一深度图为上述第一图像对应的深度图,上述第二深度图为上述第二图像对应的深度图。
确定单元1803,用于根据上述第一深度图和上述第二深度图确定上述第一图像和/或上述第二图像中的物体与车辆之间的距离。
在一种可能的实现方式中,上述网络单元具体用于:对上述第一图像和上述第二图像进行特征提取,得到第一特征图和第二特征图,上述第一特征图为上述第一图像对应的特征图,上述第二特征图为上述第二图像对应的特征图。根据上述第一特征图的每一特征点的目标特征点对上述第一特征图的每一特征点进行特征匹配,得到第三特征图,上述目标特征点为与特征点对应的图像存在共视区域的图像中符合极线约束的特征点。根据上述第二特征图的每一特征点的目标特征点对上述第二特征图的每一特征点进行特征匹配,得到第四特征图。将上述第三特征图和上述第四特征图输入目标网络,得到第一深度图和第二深度图。
在一种可能的实现方式中,上述网络单元具体用于:将上述第三特征图输入目标网络,得到上述第一深度图和第一结构语义,上述第一结构语义用于指示上述第一图像中的物体边缘和平面。将上述第四特征图输入目标网络,得到上述第二深度图和第二结构语义,上述第二结构语义用于指示上述第二图像中的物体边缘和平面。
在一种可能的实现方式中,上述确定单元具体用于:根据上述第一深度图、上述第二深度图、上述第一结构语义和上述第二结构语义,确定上述第一图像和上述第二图像中的物体与车辆之间的距离。
在一种可能的实现方式中,上述网络单元具体用于:将上述第三特征图输入目标网络,得到上述第一深度图和第一实例分割结果,上述第一实例分割结果用于指示上述第一图像中的背景和可移动物体。将上述第四特征图输入目标网络,得到上述第二深度图和第二实例分割结果,上述第二实例分割结果用于指示上述第二图像中的背景和可移动物体。
可选地,上述第一图像为上述第一相机在第一时刻采集的图像,上述第二图像为第二相机在上述第一时刻采集的图像。
在一种可能的实现方式中,上述确定单元具体用于:根据上述第一深度图、上述第二深度图、上述第一实例分割结果、第二实例分割结果、第一距离信息和第二距离信息,确定上述第一图像和上述第二图像中的物体与车辆之间的距离,上述第一距离信息用于指示第三图像中物体与车辆之间的距离,上述第三图像为上述第一相机在第二时刻采集的图像,上述第二距离信息用于指示第四图像中物体与车辆之间的距离,上述第四图像为上述第二相机在上述第二时刻采集的图像。
在一种可能的实现方式中,上述获取单元还用于:根据上述第一相机的内参和预设相机内参对上述第一图像进行校准。
在一种可能的实现方式中,上述第二图像为第二相机采集的图像,上述获取单元还用于:根据上述第二相机的内参和预设相机内参对上述第二图像进行校准。
在一种可能的实现方式中,上述确定单元还用于:根据上述第一图像和上述第二图像中的物体与车辆之间的距离对上述第一图像和上述第二图像中的物体进行三维重建。显示三维重建后的物体。
在一种可能的实现方式中,上述确定单元还用于:根据上述第一图像和上述第二图像中的物体与车辆之间的距离显示提示信息。
在一种可能的实现方式中,上述第一相机为鱼眼相机,上述第二相机为针孔相机。
本申请实施例还提供了一种芯片。该芯片可以为系统级芯片(System on Chip,SOC)或其他芯片。
图20示出了一种芯片1900的结构示意图。芯片1900包括一个或多个处理器1901以及接口电路1902。可选的,上述芯片1900还可以包含总线1903。
处理器1901可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述测距方法的各步骤可以通过处理器1901中的硬件的集成逻辑电路或者软件形式的指令完成。
可选地，上述的处理器1901可以是通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件，可以实现或者执行本申请实施例中公开的各方法、步骤。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
接口电路1902可以用于数据、指令或者信息的发送或者接收,处理器1901可以利用接口电路1902接收的数据、指令或者其他信息,进行加工,可以将加工完成信息通过接口电路1902发送出去。
可选的,芯片还包括存储器,存储器可以包括只读存储器和随机存取存储器,并向处理器提供操作指令和数据。存储器的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。
可选的,存储器存储了可执行软件模块或者数据结构,处理器可以通过调用存储器存储的操作指令(该操作指令可存储在操作系统中),执行相应的操作。
可选的,芯片可以使用在本申请实施例涉及的测距装置中。可选的,接口电路1902可用于输出处理器1901的执行结果。关于本申请实施例的一个或多个实施例提供的测距方法可参考前述各个实施例,这里不再赘述。
需要说明的,处理器1901、接口电路1902各自对应的功能既可以通过硬件设计实现,也可以通过软件设计来实现,还可以通过软硬件结合的方式来实现,这里不作限制。
图21为本申请实施例提供的一种电子设备的结构示意图，电子设备100可以为手机、平板电脑、可穿戴设备、车载设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)、模型处理装置或者模型处理装置中的芯片或者功能模块。
示例性地,图21是本申请实施例提供的一例电子设备100的结构示意图。电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口 130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
其中,I2C接口是一种双向同步串行总线,处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现电子设备100的触摸功能。MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像，视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD)，有机发光二极管(organic light-emitting diode,OLED)，有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED)，柔性发光二极管(flex light-emitting diode,FLED)，MiniLED，MicroLED，Micro-OLED，量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中，电子设备100可以包括1个或N个显示屏194，N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,触摸传感器、视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
其中,ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄 像头感光元件上,光信号转换为电信号,摄像头感光元件将上述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。实例通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号,应理解,在本申请实施例的描述中,以RGB格式的图像为例进行介绍,本申请实施例对图像格式不做限定。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。SIM卡接口195用于连接SIM卡。
需要指出的是,电子设备100可以是芯片系统或有图21中类似结构的设备。其中,芯片系统可以由芯片构成,也可以包括芯片和其他分立器件。本申请的各实施例之间涉及的动作、术语等均可以相互参考,不予限制。本申请的实施例中各个设备之间交互的消息名称或消息中的参数名称等只是一个示例,具体实现中也可以采用其他的名称,不予限制。此外,图21中示出的组成结构并不构成对该电子设备100的限定,除图21所示部件之外,该电子设备100可以包括比图21所示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
本申请中描述的处理器和收发器可实现在集成电路(integrated circuit,IC)、模拟IC、射频集成电路、混合信号IC、专用集成电路(application specific integrated circuit,ASIC)、印刷电路板(printed circuit board,PCB)、电子设备等上。该处理器和收发器也可以用各种IC工艺技术来制造,例如互补金属氧化物半导体(complementary metal oxide semiconductor,CMOS)、N型金属氧化物半导体(nMetal-oxide-semiconductor,NMOS)、P型金属氧化物半导体(positive channel metal oxide semiconductor,PMOS)、双极结型晶体管(Bipolar Junction Transistor,BJT)、双极CMOS(BiCMOS)、硅锗(SiGe)、砷化镓(GaAs)等。
本申请实施例还提供一种测距装置,该装置包括:一个或多个处理器,当上述一个或多个处理器执行程序代码或指令时,实现上述相关方法步骤实现上述实施例中的测距方法。
可选地,该装置还可以包括一个或多个存储器,该一个或多个存储器用于存储该程序代码或指令。
本申请实施例还提供一种车辆，该车辆包括一个或多个鱼眼相机、一个或多个针孔相机和一个或多个处理器，所述处理器可以用于执行上述实施例中的测距方法。可选的，上述一个或多个处理器可以以上述测距装置的形式实施。可选的，所述车辆还包括显示屏，用于显示路况、距离提示信息、车辆的二维/三维模型或障碍物的二维/三维模型等信息。可选的，所述车辆还包括扬声器，用于播放语音提示信息，语音提示信息可以包括危险提示和/或车辆与障碍物的距离等信息，例如，在车辆距离障碍物的距离小于预设阈值时，语音提示驾驶员注意障碍物的存在等。车辆既可以只用显示屏显示提示信息来提醒驾驶员，也可以只用语音提示信息来提醒驾驶员，或者将显示屏显示和语音提示结合起来提示驾驶员，例如，在车辆与障碍物的距离低于第一阈值时，只通过显示屏显示提示信息，当车辆与障碍物的距离低于第二阈值时（本申请对第二阈值和第一阈值的取值不做限定，只需要满足第二阈值低于第一阈值，例如，第二阈值为1米，第一阈值为2米），在显示的同时结合语音提示驾驶员注意障碍物，从而引起驾驶员的注意。
本申请实施例还提供一种计算机存储介质,该计算机存储介质中存储有计算机指令,当该计算机指令在测距装置上运行时,使得测距装置执行上述相关方法步骤实现上述实施例中的测距方法。
本申请实施例还提供了一种计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述相关步骤,以实现上述实施例中的测距方法。
本申请实施例还提供一种测距装置,这个装置具体可以是芯片、集成电路、组件或模块。具体的,该装置可包括相连的处理器和用于存储指令的存储器,或者该装置包括一个或多个处理器,用于从外部存储器获取指令。当装置运行时,处理器可执行指令,以使芯片执行上述各方法实施例中的测距方法。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各实例的单元及算法步骤,能够以电子硬件,或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其他的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其他的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
上述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例上述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述，仅为本申请的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应以所述权利要求的保护范围为准。

Claims (20)

  1. 一种测距方法,应用于车辆,所述车辆包括第一相机和第二相机,其特征在于,包括:
    获取第一图像和第二图像,所述第一图像为所述第一相机采集的图像,所述第二图像为所述第二相机采集的图像,所述第一相机和所述第二相机存在共视区域,所述第一相机为鱼眼相机,所述第二相机为针孔相机;
    获取第一深度图和第二深度图,所述第一深度图为所述第一图像对应的深度图,所述第二深度图为所述第二图像对应的深度图;
    根据所述第一深度图和所述第二深度图确定所述第一图像和/或所述第二图像中的物体与所述车辆之间的距离。
  2. 根据权利要求1所述的方法,其特征在于,所述获取第一深度图和第二深度图,包括:
    获取第一特征图和第二特征图,所述第一特征图为所述第一图像对应的特征图,所述第二特征图为所述第二图像对应的特征图;
    根据所述第一特征图的第一特征点及所述第一特征点对应的多个目标特征点,得到第三特征图,所述第一特征点为所述第一特征图中的任意特征点,所述第一特征点对应的多个目标特征点为所述第二特征图中与所述第一特征点符合极线约束的特征点;
    根据所述第二特征图的第二特征点及所述第二特征点对应的多个目标特征点,得到第四特征图,所述第二特征点为所述第二特征图中的任意特征点,所述第二特征点对应的多个目标特征点为所述第一特征图中与所述第二特征点符合极线约束的特征点;
    根据所述第三特征图和所述第四特征图,得到所述第一深度图和所述第二深度图。
  3. 根据权利要求1或2所述的方法,其特征在于,所述根据所述第一深度图和所述第二深度图确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离,包括:
    根据所述第一深度图、所述第二深度图、第一结构语义和第二结构语义,确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离,其中,所述第一结构语义用于指示所述第一图像中的物体边缘和平面,所述第二结构语义用于指示所述第二图像中的物体边缘和平面。
  4. 根据权利要求1或2所述的方法,其特征在于,所述第一图像为所述第一相机在第一时刻采集的图像,所述第二图像为所述第二相机在所述第一时刻采集的图像,所述根据所述第一深度图和所述第二深度图确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离,包括:
    根据所述第一深度图、所述第二深度图、第一实例分割结果、第二实例分割结果、第一距离信息和第二距离信息,确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离,其中,所述第一实例分割结果用于指示所述第一图像中的背景和可移动物体,所述第二实例分割结果用于指示所述第二图像中的背景和可移动物体,所述第一距离信息用于指示第三图像中物体与车辆之间的距离,所述第三图像为所述第一相机在第二时刻采集的图像,所述第二距离信息用于指示第四图像中物体与车辆之间的距离,所述第四图像为所述第二相机在所述第二时刻采集的图像。
  5. 根据权利要求1至4中任一项所述的方法,其特征在于,所述方法还包括:
    根据所述第一相机的内参和预设鱼眼相机内参对所述第一图像进行校准。
  6. 根据权利要求1至5中任一项所述的方法,其特征在于,所述方法还包括:
    根据所述第二相机的内参和预设针孔相机内参对所述第二图像进行校准。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述方法还包括:
    根据所述第一图像和所述第二图像中的物体与车辆之间的距离对所述第一图像和所述第二图像中的物体进行三维重建;
    显示三维重建后的物体。
  8. 根据权利要求1至7中任一项所述的方法,其特征在于,所述方法还包括:
    根据所述第一图像和所述第二图像中的物体与车辆之间的距离显示提示信息。
  9. 一种测距装置,应用于包括第一相机和第二相机的车辆,其特征在于,包括:获取单元、网络单元和确定单元;
    所述获取单元,用于获取第一图像和第二图像,所述第一图像为所述第一相机采集的图像,所述第二图像为所述第二相机采集的图像,所述第一相机和所述第二相机存在共视区域,所述第一相机为 鱼眼相机,所述第二相机为针孔相机;
    所述网络单元,用于获取第一深度图和第二深度图,所述第一深度图为所述第一图像对应的深度图,所述第二深度图为所述第二图像对应的深度图;
    所述确定单元,用于根据所述第一深度图和所述第二深度图确定所述第一图像和/或所述第二图像中的物体与所述车辆之间的距离。
  10. 根据权利要求9所述的装置,其特征在于,所述网络单元具体用于:
    获取第一特征图和第二特征图,所述第一特征图为所述第一图像对应的特征图,所述第二特征图为所述第二图像对应的特征图;
    根据所述第一特征图的第一特征点及所述第一特征点对应的多个目标特征点,得到第三特征图,所述第一特征点为所述第一特征图中的任意特征点,所述第一特征点对应的多个目标特征点为所述第二特征图中与所述第一特征点符合极线约束的特征点;
    根据所述第二特征图的第二特征点及所述第二特征点对应的多个目标特征点,得到第四特征图,所述第二特征点为所述第二特征图中的任意特征点,所述第二特征点对应的多个目标特征点为所述第一特征图中与所述第二特征点符合极线约束的特征点;
    根据所述第三特征图和所述第四特征图,得到所述第一深度图和所述第二深度图。
  11. 根据权利要求9或10所述的装置,其特征在于,所述确定单元具体用于:
    根据所述第一深度图、所述第二深度图、第一结构语义和第二结构语义,确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离,其中,所述第一结构语义用于指示所述第一图像中的物体边缘和平面,所述第二结构语义用于指示所述第二图像中的物体边缘和平面。
  12. 根据权利要求9至11中任一项所述的装置,其特征在于,所述第一图像为所述第一相机在第一时刻采集的图像,所述第二图像为第二相机在所述第一时刻采集的图像,所述确定单元具体用于:
    根据所述第一深度图、所述第二深度图、第一实例分割结果、第二实例分割结果、第一距离信息和第二距离信息,确定所述第一图像和/或所述第二图像中的物体与车辆之间的距离,所述第一实例分割结果用于指示所述第一图像中的背景和可移动物体,所述第二实例分割结果用于指示所述第二图像中的背景和可移动物体,所述第一距离信息用于指示第三图像中物体与车辆之间的距离,所述第三图像为所述第一相机在第二时刻采集的图像,所述第二距离信息用于指示第四图像中物体与车辆之间的距离,所述第四图像为所述第二相机在所述第二时刻采集的图像。
  13. 根据权利要求9至12中任一项所述的装置,其特征在于,所述获取单元还用于:
    根据所述第一相机的内参和预设鱼眼相机内参对所述第一图像进行校准。
  14. 根据权利要求9至13中任一项所述的装置,其特征在于,所述获取单元还用于:
    根据所述第二相机的内参和预设针孔相机内参对所述第二图像进行校准。
  15. 根据权利要求9至14中任一项所述的装置,其特征在于,所述确定单元还用于:
    根据所述第一图像和所述第二图像中的物体与车辆之间的距离对所述第一图像和所述第二图像中的物体进行三维重建;
    显示三维重建后的物体。
  16. 根据权利要求9至15中任一项所述的装置,其特征在于,所述确定单元还用于:
    根据所述第一图像和所述第二图像中的物体与车辆之间的距离显示提示信息。
  17. 一种测距装置,包括一个或多个处理器和存储器,其特征在于,所述一个或多个处理器执行存储在存储器中的程序或指令,以使得所述测距装置实现上述权利要求1至8中任一项所述的方法。
  18. 一种车辆,其特征在于,所述车辆包括一个或多个鱼眼相机、一个或多个针孔相机和一个或多个处理器,所述处理器用于执行计算机指令以实现如上述权利要求1至8中任一项所述的方法。
  19. 一种计算机可读存储介质,用于存储计算机程序,其特征在于,当所述计算机程序在计算机或处理器运行时,使得所述计算机或所述处理器实现上述权利要求1至8中任一项所述的方法。
  20. 一种计算机程序产品,所述计算机程序产品中包含指令,其特征在于,当所述指令在计算机或处理器上运行时,使得所述计算机或所述处理器实现上述权利要求1至8中任一项所述的方法。
PCT/CN2023/108397 2022-10-31 2023-07-20 测距方法和装置 WO2024093372A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211349288.4A CN118011428A (zh) 2022-10-31 2022-10-31 测距方法和装置
CN202211349288.4 2022-10-31

Publications (1)

Publication Number Publication Date
WO2024093372A1 true WO2024093372A1 (zh) 2024-05-10

Family

ID=90929602

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/108397 WO2024093372A1 (zh) 2022-10-31 2023-07-20 测距方法和装置

Country Status (2)

Country Link
CN (1) CN118011428A (zh)
WO (1) WO2024093372A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4485355A1 (en) * 2023-06-27 2025-01-01 Lite-On Technology Corporation Ranging system and ranging method thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109708655A (zh) * 2018-12-29 2019-05-03 百度在线网络技术(北京)有限公司 导航方法、装置、车辆及计算机可读存储介质
DE102019100303A1 (de) * 2019-01-08 2020-07-09 HELLA GmbH & Co. KGaA Verfahren und Vorrichtung zum Ermitteln einer Krümmung einer Fahrbahn
CN114782911A (zh) * 2022-06-20 2022-07-22 小米汽车科技有限公司 图像处理的方法、装置、设备、介质、芯片及车辆

Also Published As

Publication number Publication date
CN118011428A (zh) 2024-05-10

Similar Documents

Publication Publication Date Title
US10726579B1 (en) LiDAR-camera calibration
US11468585B2 (en) Pseudo RGB-D for self-improving monocular slam and depth prediction
US20200387698A1 (en) Hand key point recognition model training method, hand key point recognition method and device
US12062138B2 (en) Target detection method and apparatus
TWI755762B (zh) 目標跟蹤方法、智慧移動設備和儲存介質
CN108140235A (zh) 用于产生图像视觉显示的系统和方法
US20190147606A1 (en) Apparatus and method of five dimensional (5d) video stabilization with camera and gyroscope fusion
CN106846410B (zh) 基于三维的行车环境成像方法及装置
US20240112404A1 (en) Image modification techniques
US20230005277A1 (en) Pose determining method and related device
WO2021203868A1 (zh) 数据处理的方法和装置
CN111950428A (zh) 目标障碍物识别方法、装置及运载工具
WO2022206517A1 (zh) 一种目标检测方法及装置
CN113673584A (zh) 一种图像检测方法及相关装置
TW202418218A (zh) 圖像中的物件移除
CN111104893A (zh) 目标检测方法、装置、计算机设备及存储介质
CN113378605A (zh) 多源信息融合方法及装置、电子设备和存储介质
CN116052461A (zh) 虚拟车位确定方法、显示方法、装置、设备、介质及程序
WO2024093372A1 (zh) 测距方法和装置
CN114881863B (zh) 一种图像拼接方法、电子设备及计算机可读存储介质
JP2025504144A (ja) 画像検出方法及び装置
CN112639864B (zh) 用于测距的方法和装置
WO2022142596A1 (zh) 一种图像处理方法,装置及储存介质
US20240048843A1 (en) Local compute camera calibration
CN115358937B (zh) 图像去反光方法、介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23884311

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023884311

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2023884311

Country of ref document: EP

Effective date: 20250320