
CN118982678A - Intelligent scene understanding method for humanoid robots based on laser point cloud - Google Patents

Intelligent scene understanding method for humanoid robots based on laser point cloud Download PDF

Info

Publication number
CN118982678A
Authority
CN
China
Prior art keywords
point cloud
point
laser radar
scene
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411118957.6A
Other languages
Chinese (zh)
Other versions
CN118982678B (en)
Inventor
郭明
吴祥玲
张潇澜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture filed Critical Beijing University of Civil Engineering and Architecture
Priority to CN202411118957.6A priority Critical patent/CN118982678B/en
Publication of CN118982678A publication Critical patent/CN118982678A/en
Application granted granted Critical
Publication of CN118982678B publication Critical patent/CN118982678B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G06V 10/40 Extraction of image or video features
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0475 Generative networks
    • G06N 3/094 Adversarial learning
    • G06T 7/85 Stereo camera calibration
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/817 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, by voting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The invention discloses an intelligent scene understanding method for humanoid robots based on laser point clouds, which comprises the following steps: the humanoid robot navigates autonomously and acquires three-dimensional real-scene data through a laser radar scanning system to obtain scene point cloud data; the obtained scene point cloud data are processed with an adaptive radius sampling module; RPV-Net-based point cloud semantic feature extraction is performed on the extracted effective point cloud data to obtain 3D bounding boxes and semantic labels of the point cloud data; a channel attention module processes the point cloud data, with average pooling capturing the overall features of the point cloud and maximum pooling extracting discriminative individual features, which are then aggregated into global point cloud features; finally, a multi-modal hybrid model combining a scene-understanding Bert pre-training model with an encoder-decoder model processes the global point cloud features. The method uses laser radar technology to provide real-time, high-precision three-dimensional point cloud data for the humanoid robot and enables deep perception and intelligent understanding of the environment.

Description

Intelligent humanoid robot scene understanding method based on laser point cloud
Technical Field
The invention relates to the field of humanoid robot perception and intelligent understanding. More specifically, the invention relates to an intelligent scene understanding method for humanoid robots based on laser point clouds.
Background
With the rapid development of robot technology, humanoid robots show broad application potential in scenarios such as disaster response, high-risk environment inspection, and daily services. However, for a humanoid robot to operate effectively in a complex environment, highly accurate environmental perception is required. Traditional robotic vision systems rely primarily on cameras or simple sensors, which face significant challenges under limited visibility or complex lighting conditions. In dynamic environments in particular, it is difficult for conventional vision systems to acquire high-quality environmental data accurately and in real time. In the research, development, and application of humanoid robots, acquiring accurate three-dimensional environmental information in real time and achieving efficient intelligent scene understanding are therefore key to improving the autonomy and functionality of the robot.
Disclosure of Invention
The invention provides an intelligent understanding method of a humanoid robot scene based on laser point clouds, which provides real-time and high-precision three-dimensional point cloud data for the humanoid robot by utilizing a laser radar technology and can realize depth perception and intelligent understanding of the environment.
To achieve these objects and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, there is provided a humanoid robot scene intelligent understanding method based on a laser point cloud, including:
A laser radar scanning system is mounted on the humanoid robot; the humanoid robot navigates autonomously and acquires three-dimensional real-scene data through the laser radar scanning system to obtain scene point cloud data;
The obtained scene point cloud data are processed with an adaptive radius sampling module to extract a sufficient number of effective points;
RPV-Net-based point cloud semantic feature extraction is performed on the extracted effective point cloud data to obtain 3D bounding boxes and semantic labels for the point cloud data that form the building components in the scene;
The point cloud data after semantic feature extraction are processed with a channel attention module: average pooling captures the overall features of the point cloud, maximum pooling extracts discriminative individual features, and the overall and individual features are aggregated to obtain the global features of the point cloud;
A multi-modal hybrid model combining a scene-understanding Bert pre-training model with an encoder-decoder model is constructed to process the acquired global point cloud features; the scene-understanding Bert pre-training model increases the diversity of the training data, and the encoder-decoder model interprets the three-dimensional point cloud objects as three-dimensional ground objects together with their parameter information, improving the humanoid robot's understanding of the scene.
Preferably, in the laser point cloud-based intelligent scene understanding method for humanoid robots, the laser radar scanning system comprises a laser radar scanner, an inertial measurement unit, and a PTP time soft-synchronization module; the laser radar scanner and the inertial measurement unit are fused in a tightly coupled manner, and the PTP time soft-synchronization module synchronizes the clocks of the laser radar, the inertial measurement unit, and the humanoid robot.
Preferably, the intelligent understanding method of the humanoid robot scene based on the laser point cloud further comprises calibrating the laser radar scanning system, specifically:
Using an Allan variance analysis method to evaluate noise characteristics and bias stability of the inertial measurement unit, and performing internal calibration on the inertial measurement unit;
and calibrating external parameters of the laser radar scanner and the inertial measurement unit by adopting a lidar_align calibration method, and acquiring a conversion matrix between a coordinate system of the laser radar scanner and a coordinate system of the inertial measurement unit so as to convert point cloud data under the coordinate system of the laser radar scanner into the coordinate system of the inertial measurement unit.
Preferably, the intelligent understanding method of the humanoid robot scene based on the laser point cloud further comprises the step of removing motion distortion of scene point cloud data acquired by the laser radar scanner after calibrating the laser radar scanning system, and the specific process is as follows:
Motion distortion in the scene point cloud data includes: a. if the scanning field of view of the laser radar scanner always falls on the same straight wall surface, then as the scanner moves in the direction perpendicular to the wall, the distance from the scanner to the same point on the wall keeps decreasing, so the point cloud bulges along the movement direction of the scanner; b. when the movement of the laser radar scanner has an angular offset, the movement of the centers of the scan-line petals stretches the petal point clouds into irregular shapes, i.e., center distortion; c. the laser radar scanner has a fixed field angle and cannot cover a full circle, so the field-of-view coverage scales with distance and the size of the scan-line petals scales as the scanner moves, i.e., edge distortion;
For the motion distortions a and b, the inertial measurement unit is used to perform motion compensation on the inter-frame point cloud. The specific calculation is: t_i = (i / S_num) × (t_{k+1} − t_k); T_{k,i} is obtained by interpolating the frame pose transformation T_{k,k+1} at the ratio t_i / (t_{k+1} − t_k); P_correct = T_{k,i} × P_i, where t_k and t_{k+1} are the start and end times of one frame of point cloud recorded by the laser radar scanner, t_i is the time of the i-th point P_i in the k-th frame point cloud set S relative to the start time, S_num is the total number of points in one frame, T_{k,k+1} is the frame pose transformation, T_{k,i} is the pose transformation of point P_i relative to the start time, and P_correct is the point coordinate after motion-distortion correction computed with the pose transformation matrix corresponding to P_i;
For motion distortion c, the unreliable edge scanning points are filtered out by limiting the field angle.
Preferably, in the intelligent understanding method of the humanoid robot scene based on the laser point cloud, after removing the motion distortion of the scene point cloud data, the method further comprises the step of reconstructing the scene point cloud data for point cloud storage, and the specific process is as follows:
One frame of point cloud data scanned by the laser radar scanner contains the 20 scan-line petals shown in the figure; each scan-line petal is continuous at the center, and each petal is stored sequentially into one row of a matrix, taking the point closest to the scan center as the starting point;
Spatially adjacent scan-line petals are stored in adjacent rows of the matrix, specifically: the 20 scan-line petals in one frame of point cloud data are obtained over 4 rotations with 5 scan-line petals per period; within one scanning period, the start point numbers of the 5 scan-line petals, taken clockwise, are 0, 1000, 2000, 500, and 1500 in turn; the start point numbers of corresponding scan-line petals differ by 2500 between the 4 scanning periods, and the start point numbers of the clockwise-adjacent scan-line petals of one frame of scanning point cloud are calculated from this point-number relationship; the corresponding scan-line petals are then indexed in turn by the obtained start point numbers and stored row by row, giving a matrix whose rows are spatially continuous.
Preferably, the intelligent understanding method of the humanoid robot scene based on the laser point cloud further comprises the following steps after the reconstructed point cloud storage is performed on the scene point cloud data: the formula for extracting the angular points and plane point characteristics of each frame of point cloud data is modified, and specifically comprises the following steps:
A. The vector differences from the point whose curvature is being calculated, and from the five points before and the five points after it, to the scanning origin are computed and summed; the length of the resulting sum reflects the curvature value of the point. The curvature is calculated as c_i = ‖ Σ_{j∈S} ( X^L_{k,i} − X^L_{k,j} ) ‖,
where c_i is the curvature value of scan point P_i, S is the set of integers from −5 to 5 excluding 0, and X^L_{k,i} and X^L_{k,j} are the vectors of the i-th and j-th points of the k-th frame point cloud data in the laser radar scanner coordinate system L;
B. When the curvature value of the scanning point is larger than 1, judging the scanning point as an angular point; when the curvature value of the scanning point is smaller than 0.1, judging the scanning point as a plane point;
C. and C, extracting corner points and plane points in the point cloud data of each frame according to the steps A and B.
Preferably, in the intelligent scene understanding method for humanoid robots based on laser point clouds, processing the obtained scene point cloud data with the adaptive radius sampling module specifically comprises: first, the Euclidean distances between the sampling center point and all points in the scene point cloud data are calculated; points whose distance exceeds a specific threshold R are marked as Q, and the remaining points keep their original values; then the points within the spherical sampling radius are sorted in ascending order of distance, and the nearest w points are extracted as sampling points; if the sampling points contain Q values, meaning that fewer than w valid points lie within the spherical sampling radius, the sampling radius is gradually increased until it contains at least w valid points, ensuring that enough valid points are available for feature learning;
Performing RPV-Net-based point cloud semantic feature extraction on the extracted effective point cloud data specifically comprises: the input point cloud is P ∈ R^{N×3}, i.e., point cloud data originally containing three dimensions, where N is the number of points; seeds are obtained through the semantic segmentation network; boundary voxel features of the whole pillar are extracted using the pillar-based voxel-column idea, the obtained pillars are voted on, and a Vote & Sampling & Grouping operation is then performed on the obtained point cloud features, in which each seed independently generates a vote through the voting module; the votes are then grouped into h groups and processed by the Proposal module to generate the final proposal, which outputs the 3D bounding boxes and semantic labels.
Preferably, in the intelligent scene understanding method for humanoid robots based on laser point clouds, the channel attention module is used to process the point cloud data after point cloud semantic feature extraction, and the specific process is as follows:
The features of the point cloud data can be represented as F ∈ R^{B×N×1×C}, where B, N, and C denote the batch size, the number of points, and the number of feature channels, respectively;
Average pooling and maximum pooling are applied along the feature-channel dimension C: average pooling captures the overall features of the point cloud, maximum pooling extracts the more discriminative individual features, and the overall and individual features are then aggregated to generate two distinct global point cloud features F_avg and F_max.
Preferably, in the intelligent scene understanding method for humanoid robots based on laser point clouds, increasing the diversity of the training data through the scene-understanding Bert pre-training model specifically comprises:
the scene-understanding Bert pre-training model expands the data set with a generative adversarial network (GAN) model, increasing the diversity of the training data;
the standard GAN model comprises a generator G and a discriminator D;
training samples are set for the scene point cloud data set by adjusting the parameter settings of the GAN model;
the objective of the generator G is to generate fake data as close as possible to the real data, the objective of the discriminator D is to distinguish the real data from the fake data generated by G, and a balance coefficient γ is added to adjust the weight between the two.
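As an illustration of this preference, the following minimal PyTorch-style sketch shows a standard GAN objective in which an assumed balance coefficient gamma weights the real-data term against the generated-data term of the discriminator loss. The network definitions, the name `gamma`, and the way the coefficient enters the loss are illustrative assumptions, not the patent's exact formulation.

```python
import torch
import torch.nn as nn

def discriminator_loss(D, real, fake, gamma=0.5):
    """Hypothetical weighted GAN discriminator loss.

    gamma balances the real-data term against the generated-data term;
    how the patent applies its balance coefficient is not specified here.
    """
    bce = nn.BCEWithLogitsLoss()
    loss_real = bce(D(real), torch.ones(real.size(0), 1))
    loss_fake = bce(D(fake.detach()), torch.zeros(fake.size(0), 1))
    return gamma * loss_real + (1.0 - gamma) * loss_fake

def generator_loss(D, fake):
    """Generator tries to make D classify generated samples as real."""
    bce = nn.BCEWithLogitsLoss()
    return bce(D(fake), torch.ones(fake.size(0), 1))
```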
Preferably, in the intelligent understanding method of the humanoid robot scene based on the laser point cloud, the processing procedure of the constructed encoder-decoder model is specifically as follows:
the point cloud encoder takes a point cloud P ∈ R^{n×d} as input, where n is the number of input points and d is the feature dimension of each input point;
the output of the point cloud encoder is a sequence of point features X = (x_1, x_2, …, x_m) ∈ R^{m×c}, where m is the number of output point features and c is the output point feature dimension;
the point features X are mapped by a multi-layer perceptron (MLP) to point tokens Y = (y_1, y_2, …, y_m) ∈ R^{m×c′}, where c′ is the dimension of the point tokens;
3D-MELL is a decoder-only model that accepts a token sequence consisting of text tokens and point tokens, i.e., a mixed token sequence, denoted Z = (z_1, z_2, …, z_k) ∈ R^{k×c′}, where k is the total number of tokens;
with the self-attention mechanism, 3D-MELL can understand the contextual relationships between the different types of tokens, enabling it to generate responses conditioned on both the text and the point cloud input; the output of 3D-MELL is a series of predicted tokens, and the prediction of the i-th token is conditioned on all previous tokens Z_{<i} = (z_1, z_2, …, z_{i−1}).
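The token-mixing step can be pictured with the short sketch below: point features from the encoder are mapped to point tokens by an MLP and concatenated with text tokens before being fed to the decoder-only model. All module names, dimensions, and the concatenation order are illustrative assumptions rather than the patent's implementation.

```python
import torch
import torch.nn as nn

class PointTokenizer(nn.Module):
    """Maps encoder point features (m, c) to point tokens (m, c_prime)."""
    def __init__(self, c: int, c_prime: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c_prime), nn.GELU(),
                                 nn.Linear(c_prime, c_prime))

    def forward(self, point_features: torch.Tensor) -> torch.Tensor:
        return self.mlp(point_features)

# Assumed shapes: m point features of dimension c, t text tokens of dimension c_prime.
m, c, c_prime, t = 256, 384, 512, 32
point_features = torch.randn(m, c)      # output X of the point cloud encoder
text_tokens = torch.randn(t, c_prime)   # embedded text prompt tokens

point_tokens = PointTokenizer(c, c_prime)(point_features)
mixed_tokens = torch.cat([text_tokens, point_tokens], dim=0)  # Z, with k = t + m tokens
# mixed_tokens would then be consumed autoregressively by the decoder-only model.
print(mixed_tokens.shape)  # torch.Size([288, 512])
```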
The invention at least comprises the following beneficial effects:
First, to address the problem of uneven point cloud density, the adaptive radius sampling module processes the obtained scene point cloud data to extract a sufficient number of valid points, ensuring that enough points are available for feature learning and improving the humanoid robot's understanding and segmentation accuracy in complex scenes.
Second, to address the slow processing and feature loss caused by the large volume of environmental data, the invention designs a channel attention module that reduces the number of parameters while avoiding the loss of both individual and overall features.
Third, the laser radar scanner used in the invention is a solid-state laser radar scanner. Considering its characteristics, such as the small field angle, the petal-shaped scanning trajectory, and the non-repetitive scanning pattern, the motion distortion is removed, the point cloud of the solid-state laser radar scanner is reconstructed, the spatial relationship between scan lines is established, and the feature extraction formula is corrected. This improves extraction efficiency and accuracy, effectively alleviates the insufficient and wrongly selected feature points caused by the scanner and the environment, and mitigates the worsening of point cloud layering.
Fourth, the multi-modal hybrid model of the Bert pre-training model and the encoder-decoder model is used for scene understanding, improving the humanoid robot's understanding of complex scenes.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
Fig. 1 is a schematic flow chart of a humanoid robot scene intelligent understanding method based on laser point clouds according to an embodiment of the invention;
FIG. 2 is a schematic diagram of PTP time synchronization according to an embodiment of the present invention;
FIG. 3 is a flow chart of the laser radar scanner and IMU external parameter calibration according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of distortion of a lidar scanner perpendicular to a wall surface in an embodiment of the present invention;
FIG. 5 is a schematic diagram of center distortion when the laser radar scanner is angularly offset in an embodiment of the present invention;
FIG. 6 is a schematic diagram of edge distortion occurring when the field angle of a lidar scanner is small in an embodiment of the present invention;
FIG. 7 is a schematic diagram of the petal-shaped point cloud with a certain inherent curvature produced by the laser radar scan in an embodiment of the present invention;
FIG. 8 is a graph showing the correspondence between the point number and the standard curvature difference under the standard condition in the embodiment of the present invention;
FIG. 9 is a graph showing the change of the distance correction value by calculating the difference between the curvatures of the points corresponding to different distances and the curvature of the points corresponding to the standard conditions as the distance correction value according to the embodiment of the invention;
FIG. 10 is a schematic diagram of a sampling scheme using Pointnet ++;
FIG. 11 is a schematic diagram of an adaptive radius sampling method of the present invention employed in an embodiment of the present invention;
FIG. 12 is a schematic diagram of point cloud semantic segmentation in an embodiment of the present invention;
FIG. 13 is a schematic diagram of a channel attention module according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of an encoder-decoder model process in accordance with an embodiment of the present invention;
FIG. 15 is a 3D bounding box estimation plot of an overall point cloud acquired by a lidar system in an example of the present invention;
FIG. 16 is a 3D bounding box estimation map of a local extraction map obtained in an example of the present invention;
Fig. 17 is a 3D bounding box estimation map of an accurate local extraction map obtained in an example of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings to enable those skilled in the art to practice the invention by referring to the description.
It will be understood that terms, such as "having," "including," and "comprising," as used herein, do not preclude the presence or addition of one or more other elements or groups thereof.
As shown in fig. 1, the embodiment of the invention provides a humanoid robot scene intelligent understanding method based on laser point cloud, which comprises the following steps:
S100, assembling a laser radar scanning system on the humanoid robot, wherein the humanoid robot performs autonomous navigation and acquires three-dimensional live-action data through the laser radar scanning system to obtain scene point cloud data.
In the ROS operating system of the humanoid robot, a move_base navigation planning framework is added and an mbot_navigation random autonomous navigation module is built, upgrading the humanoid robot from fixed-route travel to random autonomous navigation for three-dimensional real-scene data acquisition.
The laser radar scanning system includes: the system comprises a laser radar scanner, an inertial measurement unit and a PTP time soft synchronization module, wherein the laser radar scanner and the inertial measurement unit are fused in a tight coupling mode, and the PTP time soft synchronization module enables time synchronization of the laser radar, the inertial measurement unit and the humanoid robot.
Specifically, the laser radar scanner is a solid laser radar scanner, is a scanning sensor for acquiring the geometric shape of a measured target through ranging, mainly comprises a laser emitter, a detector and an information processing system, has strong anti-interference capability and can work in a severe environment. The working principle of the laser radar scanner is that visible and near infrared laser waves are emitted to a measured object, the emitted laser light can be reflected back to the laser radar after reaching the measured object, the laser radar is received by a receiver, the distance from the measured object to the scanner and the intensity of the reflected laser light are calculated by combining the light velocity by recording the time from the emission of the laser beam to the reflection of the laser beam, and therefore the geometric shape of the measured object is obtained. Specific measurement methods include triangulation, time of flight (ToF) and phase difference.
An Inertial Measurement Unit (IMU) is a sensor for measuring physical quantities of an object, including linear acceleration, angular velocity and magnetic field direction, and is mainly composed of an accelerometer, a gyroscope and a magnetometer. The IMU works on the principle of determining the motion state of the instrument by measuring acceleration and angular velocity. The accelerometer can measure linear acceleration along X, Y, Z three axes, the gyroscope can measure angular velocity around X, Y, Z three axes, and the magnetometer can measure the direction and intensity of geomagnetic field to calibrate drift of the gyroscope, so that the moving direction and the rotating state of the equipment are obtained. The deviation exists between the physical quantity measured by the IMU and the actual physical quantity, and the deviation is mainly divided into two types: systematic errors such as bias (bias) and random errors such as noise, the presence of errors is therefore taken into account in the calculation process.
The basic measurement model of the IMU is:
a_m = a_true + b_a + n_a
ω_m = ω_true + b_g + n_g
where a_m is the measured acceleration, a_true is the true acceleration, b_a is the accelerometer bias, n_a is the noise during the accelerometer measurements, ω_m is the measured angular velocity, ω_true is the true angular velocity, b_g is the gyroscope bias, and n_g is the noise during the gyroscope measurements.
In practical applications, such errors need to be compensated for in order to ensure measurement accuracy. The compensation method typically includes two calibration and filtering algorithms. Calibration may be performed with the device stationary or in motion, the magnitude of the error determined, and the calibrated error value subtracted from the subsequent measurement. The filtering algorithm (such as Kalman filtering) estimates and compensates for the bias in real time while in motion, reducing the effects of noise.
In order to obtain a high-precision point cloud map, a plurality of sensors are required to be used in a fusion mode, and multi-source data mutual constraint correction is obtained. The fusion mode of the sensor is loose coupling and tight coupling, and the fusion mode used in the embodiment of the invention is tight coupling. The tight coupling is the observation equation and the motion equation for motion state estimation constructed by the lidar scanner and the IMU together. The principle is that the data collected by the laser radar scanner and the IMU are put into the same state vector, the rotation matrix, the translation matrix, the speed and the IMU zero offset are jointly optimized, and the state vector is only required to be optimized. And constructing error items comprising laser radar measurement residual errors and IMU measurement residual errors by utilizing the laser radar scanner and IMU data so as to achieve the purpose of simultaneously optimizing the pose, the speed and the deviation of the tightly coupled system. The tight coupling system has compact structure, high integration among the sensors, mutual constraint among the sensor models, more sufficient information utilization and higher positioning and mapping precision and efficiency. The key problem of the multi-sensor fusion operation is multi-sensor data fusion, wherein the data fusion mainly comprises two aspects of content, namely time synchronization and coordinate system alignment of multi-sensor data.
For time synchronization, the embodiment of the invention adopts a PTP time soft synchronization module, which specifically comprises the following steps:
The lidar scanner used in the embodiments of the present invention is a solid state lidar, so that PTPv2 may be used for time synchronization between the lidar scanner and other devices without GPS and PPS hardware signals. The method requires minimal external hardware support and only needs to have one master clock device in the whole network. I.e. the clock of the solid-state laser radar scanner is synchronized to the host computer in use, the synchronization process follows the IEEE1588v2.0PTP protocol, and the delay request-response mechanism Livox Mid-40 in the protocol is used as a slave end to perform PTP time synchronization with the master port (clock equipment of the host computer).
The PTP synchronization principle is that periodic time information is sent from a master port, the slave port receives the information with a time stamp, and local time is adjusted by calculating time delay and time difference between sending and receiving of the information, so that time between the slave device and the master device is kept consistent, and the schematic diagram is shown in fig. 2.
The transmission path delay can be expressed as Delay = [(T2 − T1) + (T4 − T3)] / 2,
and the time offset between the slave and master clocks can be expressed as Offset = [(T2 − T1) − (T4 − T3)] / 2.
The synchronization flow is as follows:
(1) The master port sends a Sync synchronous message to the slave port, and records time T1;
(2) After receiving the Sync message from the port, recording the receiving time T2;
(3) The master port sends a Sync message, and then sends a Follow_Up following message with a T1 value to the slave port;
(4) The slave port sends delay_req Delay request message to the master port, and the sending time T3 is recorded;
(5) After receiving the delay_req message, the main port records the receiving time T4;
(6) The master port sends a delay_resp Delay request response message carrying T4 to the slave port.
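A minimal sketch of the delay request-response arithmetic described above, assuming the four timestamps are available in seconds and the transmission path is symmetric; the function name and example values are illustrative.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Compute mean path delay and clock offset from one Sync/Delay_Req exchange.

    t1: master sends Sync, t2: slave receives Sync,
    t3: slave sends Delay_Req, t4: master receives Delay_Req.
    Assumes a symmetric transmission path.
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2.0
    offset = ((t2 - t1) - (t4 - t3)) / 2.0
    return delay, offset

# Example: slave clock running 1.5 ms ahead over a 0.5 ms symmetric path.
delay, offset = ptp_offset_and_delay(10.0000, 10.0020, 10.0100, 10.0090)
print(delay, offset)  # 0.0005  0.0015 -> the slave adjusts its clock by -offset
```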
S110, calibrating the laser radar scanning system, specifically:
S111, evaluating noise characteristics and bias stability of the inertial measurement unit by using an Allan variance analysis method, and performing internal calibration on the inertial measurement unit.
The error of the IMU is mainly Gaussian white noise and bias (bias random walk), and the aim of calibrating the internal parameters of the IMU is to improve the accuracy of positioning, navigation and attitude estimation.
The Allan variance is a time-domain analysis method that can separate and identify noise from different sources in time-series data; the Allan variance σ²(τ) is a function of the observation time interval τ and is computed from the recorded data sequence,
where N is the number of data points, x_i is the i-th data point, and τ is the sampling time interval.
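A sketch of one common way to compute the Allan variance from a recorded gyroscope or accelerometer sequence. It uses non-overlapping cluster averages, which is an assumption about the variant used, since the formula itself is not reproduced above; the variable names and example data are illustrative.

```python
import numpy as np

def allan_variance(x: np.ndarray, dt: float, m: int):
    """Allan variance for cluster length m (tau = m * dt).

    x  : 1-D array of raw sensor samples (e.g. gyro rate in rad/s)
    dt : sampling interval in seconds
    m  : number of samples per cluster
    """
    tau = m * dt
    n_clusters = len(x) // m
    means = x[: n_clusters * m].reshape(n_clusters, m).mean(axis=1)
    diffs = np.diff(means)                    # consecutive cluster-average differences
    return tau, 0.5 * np.mean(diffs ** 2)     # (tau, sigma^2(tau))

# Example: sweep tau over a static IMU log to locate the bias-instability floor.
rate = np.random.normal(0.0, 0.01, size=200_000)   # placeholder static gyro data
results = [allan_variance(rate, 0.01, m) for m in (10, 100, 1000, 10_000)]
# results -> [(tau, sigma^2(tau)), ...]; plotted log-log to read off noise terms
```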
S112, calibrating external parameters of the laser radar scanner and the inertial measurement unit by adopting a lidar_align calibration method, and acquiring a conversion matrix between a coordinate system of the laser radar scanner and a coordinate system of the inertial measurement unit so as to convert point cloud data under the coordinate system of the laser radar scanner into the coordinate system of the inertial measurement unit.
Coordinate system alignment converts the point cloud data into the coordinate system of the IMU sensor to facilitate subsequent data processing. The relative pose between the two sensor coordinate systems must therefore be obtained when aligning the coordinate systems, which requires external parameter calibration of the laser radar and the IMU to obtain the transformation matrix between the two coordinate systems. External parameter calibration is in essence a data optimization problem: find the external parameters for which the point cloud data acquired by the laser radar scanner, once transformed into the IMU coordinate system, best overlaps between consecutive frames. The calibration method adopted in the embodiment of the invention is lidar_align. Its core principle is to randomly sample the original point cloud (most of the points extracted by default are repeated points), find the closest point in the next frame of point cloud with a kd-tree and compute the distance, which yields a K-nearest-neighbour error; when the transformation matrix is not optimal this error increases, so the nonlinear optimization library nlopt is used to minimize it until the K-nearest-neighbour error is approximately 0, and the rotation and translation matrices at that point are the solution.
The specific calibration flow is shown in fig. 3, wherein, the calibration data should be noted to contain a large amount of rotation and translation during the recording process, and the data amount is controlled within 2G, so that the calibration result has higher precision.
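The core loop of this style of calibration can be sketched as follows: sample points from one frame, find nearest neighbours in the next frame under a candidate transform, and minimise the summed nearest-neighbour error. The use of scipy instead of nlopt and the simplified yaw-plus-translation parameterisation are simplifying assumptions; lidar_align itself optimises the full 6-DoF transform.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.optimize import minimize

def knn_error(params, cloud_a, cloud_b):
    """Sum of nearest-neighbour distances after applying a yaw + translation transform.

    params = (yaw, tx, ty, tz); a full calibration run optimises all 6 DoF.
    """
    yaw, tx, ty, tz = params
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    transformed = cloud_a @ R.T + np.array([tx, ty, tz])
    dists, _ = cKDTree(cloud_b).query(transformed)
    return float(np.sum(dists))

def calibrate(frame_a, frame_b, n_samples=2000):
    """Randomly subsample frame_a and minimise the K-nearest-neighbour error."""
    idx = np.random.choice(len(frame_a), size=min(n_samples, len(frame_a)), replace=False)
    res = minimize(knn_error, x0=np.zeros(4), args=(frame_a[idx], frame_b),
                   method="Nelder-Mead")
    return res.x   # estimated (yaw, tx, ty, tz)
```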
S120, after calibrating the laser radar scanning system, the method further comprises the step of removing motion distortion of scene point cloud data acquired by the laser radar scanner, wherein the specific process is as follows:
Motion distortion in the scene point cloud data includes: a. if the scanning field of view of the laser radar scanner always falls on the same straight wall surface, then as the scanner moves in the direction perpendicular to the wall, the distance from the scanner to the same point on the wall keeps decreasing, so the point cloud bulges along the movement direction of the scanner, as shown in FIG. 4; b. when the movement of the laser radar scanner has an angular offset, the movement of the centers of the scan-line petals stretches the petal point clouds into irregular shapes, i.e., center distortion, as shown in FIG. 5; c. the field angle of the laser radar scanner is fixed and cannot cover a full circle, so the field-of-view coverage scales with distance and the size of the scan-line petals scales as the scanner moves, i.e., edge distortion, as shown in FIG. 6;
For the motion distortions a and b, the inertial measurement unit is used to perform motion compensation on the inter-frame point cloud; the acquisition frequency of the IMU is 100 Hz, so the 10 Hz laser radar point cloud can be interpolated. The specific calculation is: t_i = (i / S_num) × (t_{k+1} − t_k); T_{k,i} is obtained by interpolating the frame pose transformation T_{k,k+1} at the ratio t_i / (t_{k+1} − t_k); P_correct = T_{k,i} × P_i, where t_k and t_{k+1} are the start and end times of one frame of point cloud recorded by the laser radar scanner, t_i is the time of the i-th point P_i in the k-th frame point cloud set S relative to the start time, S_num is the total number of points in one frame, T_{k,k+1} is the frame pose transformation, T_{k,i} is the pose transformation of point P_i relative to the start time, and P_correct is the point coordinate after motion-distortion correction computed with the pose transformation matrix corresponding to P_i. It should be noted that in the calculation of t_i the solid-state laser radar scanner is assumed to emit laser pulses at uniform time intervals and its internal prism to rotate at a uniform speed; in the calculation of T_{k,i}, the lidar scanner is assumed to remain in uniform motion during one frame, so the pose change within one frame is also uniform.
By the motion distortion removal method, point clouds in the same frame can be corrected to the starting moment of the current frame approximately, and the influence of the motion distortion on the point cloud characteristics is eliminated.
For the motion distortion of c, the scanning points with unreliable edges are filtered out by limiting the size of the angle of view.
It should be noted that, in the embodiment of the invention, the scanning frequency of the solid-state laser radar scanner is 10Hz, and the scanning time of one frame of point cloud is 0.1s. Ideally, the same frame of point clouds should be acquired at the same time, but in practice, a frame of point clouds is acquired uniformly within 0.1s. While the laser radar scanner is in motion during data acquisition, when the last point of the point cloud of the current frame is acquired, the scanning positions of the laser radar scanner and the first point of the current frame are different. However, the time stamps of all scanning points in a frame are the time when the point cloud starts to be acquired, and the point cloud of the frame is considered to be acquired at the starting time. The motion of the laser radar scanner itself affects the scanning point cloud, and motion distortion of the point cloud is generated. The more intense the lidar motion during a frame time, the more severe the resulting motion distortion.
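A sketch of the per-point de-skewing described above, assuming points are emitted at uniform time intervals within the frame and that the frame pose change T_{k,k+1} can be interpolated linearly in translation and by slerp in rotation; the scipy-based slerp and the matrix form are implementation assumptions, not the patent's code.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew_frame(points, frame_rotation, frame_translation):
    """Correct motion distortion of one frame of lidar points.

    points            : (S_num, 3) array ordered by acquisition time
    frame_rotation    : scipy Rotation from frame start to frame end (from the IMU)
    frame_translation : (3,) translation over the same interval
    Each point i is assumed to be acquired at fraction s = i / S_num of the frame,
    so its pose T_{k,i} is the interpolation of T_{k,k+1} at ratio s.
    """
    s_num = len(points)
    slerp = Slerp([0.0, 1.0],
                  Rotation.concatenate([Rotation.identity(), frame_rotation]))
    corrected = np.empty_like(points)
    for i, p in enumerate(points):
        s = i / s_num
        R_i = slerp([s])[0].as_matrix()   # interpolated rotation at time t_i
        t_i = s * frame_translation       # interpolated translation at time t_i
        corrected[i] = R_i @ p + t_i      # P_correct = T_{k,i} * P_i
    return corrected
```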
S130, performing motion distortion removal on scene point cloud data, and then performing reconstruction point cloud storage on the scene point cloud data, wherein the specific process is as follows:
One frame of point cloud data scanned by the laser radar scanner contains the 20 scan-line petals shown in the figure; each scan-line petal is continuous at the center, and each petal is stored sequentially into one row of a matrix, taking the point closest to the scan center as the starting point;
Spatially adjacent scan-line petals are stored in adjacent rows of the matrix, specifically: the 20 scan-line petals in one frame of point cloud data are obtained over 4 rotations with 5 scan-line petals per period; within one scanning period, the start point numbers of the 5 scan-line petals, taken clockwise, are 0, 1000, 2000, 500, and 1500 in turn; the start point numbers of corresponding scan-line petals differ by 2500 between the 4 scanning periods, and the start point numbers of the clockwise-adjacent scan-line petals of one frame of scanning point cloud are calculated from this point-number relationship; the corresponding scan-line petals are then indexed in turn by the obtained start point numbers and stored row by row, giving a matrix whose rows are spatially continuous.
It should be noted that, a frame of point cloud after removing the motion distortion is generally stored in a matrix according to a scanning sequence, so that the algorithm can quickly index and call the point cloud row by row or column by column. The solid-state laser radar scanning point cloud used in the embodiment of the invention can be stored in the matrix, but the scanning point cloud does not have obvious row and column characteristics in space due to the non-repeated scanning mode of the solid-state laser radar, so that the row and column characteristics of the storage matrix have no physical significance, and the index and relation construction of the point cloud cannot be carried out in space by using the row and column. Therefore, the point cloud of the solid-state lidar needs to be reconstructed to establish the spatial relationship between the scanning lines.
The solid-state laser radar scanning point cloud consists of the 20 petals shown in the figure; each petal is continuous at the center, and the scanning point cloud of a single petal can be stored in the matrix row by row. Each scan-line petal starts from the point closest to the scan center, and the whole petal is stored sequentially into one row of the matrix. If a petal contains null points with no return value, which are recorded as the origin coordinates, these null points are removed to ensure that the scan line stored in each row of the matrix is spatially continuous. Since petals with consecutive point numbers on a scan line are not spatially adjacent, a matrix stored in point-number order has no spatial relationship between adjacent rows, so spatially adjacent petals must be stored in adjacent rows of the matrix.
After the point cloud spatial-relationship matrix is established according to step S130, the same feature can be quickly indexed in adjacent rows and columns by using the row-column relationship of the matrix during subsequent feature point extraction and matching, without traversing all feature points, which reduces the computation and speeds up the algorithm. Removing null points and storing the petals in spatially adjacent order makes the point numbers of adjacent rows and columns of the matrix discontinuous. Although a matrix stored in this way loses the original continuity of point numbers, it provides a more efficient and faster spatial index than indexing by point number and reconstructs the spatial relationships between points.
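Under the figures stated above (5 petals per period, 4 periods, 500 points per petal, in-period start numbers 0, 1000, 2000, 500, 1500, and a period-to-period offset of 2500), the row reordering could be sketched as below; the exact clockwise interleaving of the four periods and the array layout are assumptions made for illustration.

```python
import numpy as np

def reorder_frame(points, pts_per_petal=500):
    """Rebuild a (20, pts_per_petal, 3) matrix whose rows are spatially adjacent petals.

    points: (10000, 3) frame ordered by acquisition point number.
    The clockwise start numbers are generated from the in-period pattern
    [0, 1000, 2000, 500, 1500] shifted by 2500 per scanning period; whether
    the four periods interleave spatially is a simplifying assumption here.
    """
    base = [0, 1000, 2000, 500, 1500]
    starts = [p * 2500 + b for b in base for p in range(4)]   # assumed clockwise order
    rows = [points[s:s + pts_per_petal] for s in starts]
    return np.stack(rows)   # adjacent rows <-> spatially adjacent petals
```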
S140, after the scene point cloud data are subjected to the reconstruction point cloud storage, the method further comprises the following steps: the formula for extracting the angular points and plane point characteristics of each frame of point cloud data is modified, and specifically comprises the following steps:
A. The vector differences from the point whose curvature is being calculated, and from the five points before and the five points after it, to the scanning origin are computed and summed; the length of the resulting sum reflects the curvature value of the point. The curvature is calculated as c_i = ‖ Σ_{j∈S} ( X^L_{k,i} − X^L_{k,j} ) ‖,
where c_i is the curvature value of scan point P_i, S is the set of integers from −5 to 5 excluding 0, and X^L_{k,i} and X^L_{k,j} are the vectors of the i-th and j-th points of the k-th frame point cloud data in the laser radar scanner coordinate system L;
B. When the curvature value of the scanning point is larger than 1, judging the scanning point as an angular point; when the curvature value of the scanning point is smaller than 0.1, judging the scanning point as a plane point;
C. and C, extracting corner points and plane points in the point cloud data of each frame according to the steps A and B.
It should be noted that the embodiment of the invention uses a LEGO-LoopLOAM algorithm, which extracts the corner points and plane points of each frame of the point cloud by calculating the curvature of each point as the judgment condition. The method computes the vector differences from the point whose curvature is being calculated, and from the five points before and after it, to the scanning origin, sums them, and uses the length of the sum to reflect the curvature of the point; points with curvature greater than 1 are defined as corner points and points with curvature less than 0.1 as plane points. Let the spatial coordinates of the curvature point be P_i(x_i, y_i, z_i); the curvature is calculated as c_i = ‖ Σ_{j∈S} ( X^L_{k,i} − X^L_{k,j} ) ‖,
where c_i is the curvature of point P_i, S is the set of integers from −5 to 5 excluding 0, and X^L_{k,i} and X^L_{k,j} are the vectors of the i-th and j-th points of the k-th frame point cloud data in the laser radar scanner coordinate system L.
The method for replacing the point curvature by the calculated value skillfully reduces the calculated amount of the actual curvature calculation method, has higher efficiency and has good application effect. It requires that the points on the scan line have no height difference and can be projected on a plane under the radar coordinate system without loss. The non-repetitive petal-shaped scanning line of the solid-state laser radar scanner does not meet the requirement and is not suitable for the method, so that the characteristic extraction formula is corrected according to the scanning characteristics of the solid-state laser radar scanner.
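A sketch of the curvature test using the five neighbouring points on each side of the scan line and the thresholds given above; the neighbour indexing assumes the reordered row-major storage, and the vector form of the summed differences follows the description rather than a verified reference implementation.

```python
import numpy as np

def point_curvature(row, i):
    """Curvature proxy of point i on one scan-line row: length of the summed
    vector differences to its five predecessors and five successors."""
    p = row[i]
    neighbours = np.concatenate([row[i - 5:i], row[i + 1:i + 6]])
    return np.linalg.norm(np.sum(p - neighbours, axis=0))

def classify_row(row):
    """Label each eligible point as corner (curvature > 1) or plane (< 0.1)."""
    corners, planes = [], []
    for i in range(5, len(row) - 5):
        c = point_curvature(row, i)
        if c > 1.0:
            corners.append(i)
        elif c < 0.1:
            planes.append(i)
    return corners, planes
```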
The points of a single scan line of a mechanical laser radar scanner are arranged in a straight line along the XOY plane, and the curvature calculated with the original formula is the curvature of the point in the XOY plane. A single-frame scan of the solid-state laser radar scanner, however, is petal-shaped: the points are not arranged flat in a plane but have an inherent curvature of their own, as shown in FIG. 7. This inherent curvature of the scan line interferes with the result of the feature extraction formula and leads to misjudgment of corner and planar feature points.
Because the scan line of the solid-state laser radar scanner is curved, every point on the scan line has a corresponding curvature even when it lies on a plane, so the curvature values computed for the solid-state scanning point cloud with the above formula change discontinuously and do not agree with the actual curvature change. In actual experiments, planar feature points are sometimes misjudged as corner feature points, so the influence of the scan line's inherent curvature on the scan points needs to be eliminated.
First, the difference between the curvature of each point on the solid-state scanner's scan line and the curvature of a flat scan-line point must be determined under the standard condition. The study defines the standard condition as the scanning head being 10 meters from a vertical plane with the scanning direction perpendicular to that plane. The point curvature computed from the scanning point cloud under this condition is taken as the standard curvature difference c_stdif between the solid-state laser radar scan line and a flat scan line. Under the standard condition, the petals of every scan line in the scanning point cloud are essentially similar, each petal contains 500 points, and the two sides of the line connecting the scan center and the farthest end of a single petal are essentially symmetric. Therefore, only the standard curvature differences of the 250 points of a half petal need to be calculated, and the standard curvature differences of all points on the standard-condition scan line can be replaced by those of the corresponding points. Each point, from the point closest to the scan center on a single scan line to the farthest point of the petal, is numbered 0 through 249. The correspondence between the point number and the standard curvature difference under the standard condition is shown in FIG. 8, and the standard curvature difference as a function of point number can be approximated by:
c_stdif = 5.28 × 10⁻⁷ × Index² + 0.002
The standard curvature difference of the scanning point is not a constant value, and the size of the standard curvature difference is mainly influenced by two factors of an X-axis included angle of a connecting line of the scanning point and the scanning origin and a distance between the scanning point and the scanning origin. Therefore, the dynamic corresponding relation between the angle and the distance is required to be constructed according to the standard curvature difference so as to compensate the dynamic curvature difference of the scanner at different distances and different postures.
The inherent curvature of a point on a single petal of a single-frame point cloud changes with the angle θ between the X axis and the line connecting the scan point and the scan origin, so the correspondence between this angle and the standard curvature difference must be determined. The correspondence between the angle and the point number is first calculated, from which the variation of the standard curvature difference with the angle is derived. Since the laser radar point cloud records only the three-dimensional coordinates and reflection intensity of each point, the known quantities available in the subsequent expressions are limited to the three-dimensional coordinates, and the angle θ of any point on the scan line is computed from these coordinates.
The angle corresponding to each point number under the standard condition is calculated, and the fitted linear expression is:
θ = 1.9 × 10⁻⁴ × Index + 0.54
Combining the two fitted expressions, the correspondence between the standard curvature difference and the angle between the X axis and the line connecting the scan point and the scan origin is converted into a correspondence between the standard curvature difference and the spatial coordinates of the point.
The curvature calculation formula adopted by the invention is essentially a method for reflecting the curvature change through the difference value of the local point cloud distance. When the scanning distance becomes far, the distance difference from the adjacent point to the scanning center also becomes large, so that the corresponding curvature value is synchronously increased, and the calculated curvatures of the same feature under different scanning distances are different, so that the feature matching and loop detection precision are affected. Therefore, the influence of the distance on the curvature needs to be eliminated as much as possible, and the influence is mainly reflected in that the curvature calculated value becomes larger as the scanning distance increases.
To obtain a distance-correction equation for the curvature change, point clouds of the same wall surface were collected every 2 meters over a range of 10-60 meters, for 25 acquisitions in total. For each acquisition, the curvature of points with the same point number on a single petal was calculated, and the difference between the curvature at each distance x_i and the curvature of the corresponding point under the standard condition was taken as the distance correction value. The variation of the distance correction value is shown in FIG. 9, and the fitted expression is:
c_dis = 0.016 × x_i + 0.006
Combining the above expressions yields the dynamic correction value of the curvature, which is used to correct the curvature computed by the original formula and gives the corrected curvature calculation of the invention.
S200, processing the obtained scene point cloud data by using the adaptive radius sampling module to extract enough effective point cloud data.
The specific process of processing the obtained scene point cloud data with the adaptive radius sampling module is as follows: first, the Euclidean distances between the sampling center point and all points in the scene point cloud data are calculated; points whose distance exceeds a specific threshold R are marked as Q, and the remaining points keep their original values; then the points within the spherical sampling radius are sorted in ascending order of distance and the nearest w points are extracted as sampling points; if the sampling points contain Q values, meaning that fewer than w valid points lie within the spherical sampling radius, the sampling radius is gradually increased until it contains at least w valid points, ensuring that enough valid points are available for feature learning and improving the humanoid robot model's understanding and segmentation accuracy in complex scenes. As shown in FIG. 10 and FIG. 11, FIG. 10 shows the sampling scheme of Pointnet++ and FIG. 11 shows the adaptive radius sampling method of the invention, where the green dots represent the point cloud data within the initial radius, the yellow dots represent the point cloud data duplicated by selecting the maximum value, and the red and blue dots represent the point cloud features whose initial radius is learned.
In addition, the invention performs local sampling on the point cloud and defines a range; the points within this range are selected for local feature extraction so as to capture the fine geometric structure of a small neighborhood. Each local point is processed by a multi-layer perceptron (MLP), and local max pooling then yields a local global feature. The local features are regrouped and processed at a larger scale to generate feature expressions over a wider extent, and this process is repeated until the features of the entire point cloud are obtained, enhancing the model's ability to understand and represent the overall point cloud structure.
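As a rough illustration of one such stage, the hedged PyTorch sketch below applies a shared multi-layer perceptron to every point of each local group and max-pools the group into a single local feature; the layer sizes and the (B, G, w, C) grouping layout are assumptions made for illustration.

```python
# Hypothetical sketch of one local feature-extraction stage: a shared MLP is
# applied point-wise inside every local group, followed by local max pooling.
import torch
import torch.nn as nn

class LocalFeatureStage(nn.Module):
    def __init__(self, in_dim: int = 3, hidden: int = 64, out_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(                 # shared point-wise MLP
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim), nn.ReLU(),
        )

    def forward(self, groups: torch.Tensor) -> torch.Tensor:
        # groups: (B, G, w, C) - B batches, G local groups, w points per group
        feats = self.mlp(groups)                  # (B, G, w, out_dim)
        return feats.max(dim=2).values            # local max pooling -> (B, G, out_dim)
```

Repeating such a stage on regrouped outputs yields the progressively wider-scale features described above.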
S300, extracting point cloud semantic features based on RPV-Net from the extracted effective point cloud data, and acquiring a 3D bounding box and semantic tags of the point cloud data forming the building component in the scene.
The RPV-Net-based point cloud semantic feature extraction performed on the extracted effective point cloud data is specifically as follows: a point cloud P ∈ R^{N×3}, i.e. point cloud data originally containing three dimensions, is input; random sampling and processing through the semantic segmentation backbone network yield point features carrying depth information, the point features are clustered into M point subsets, a vote is obtained for each point subset, the shape of the bounding box is estimated and classified using a pillar-based BEV method, and the seeds are redefined, i.e. the seeds are obtained through the semantic segmentation network; using the pillar-based voxelization idea, the boundary voxel features of the whole pillar are extracted and a vote is cast on the obtained pillars; a Vote & Sampling & Grouping operation is then performed on the obtained point cloud features, with each seed independently generating a vote through the voting module; the votes are then grouped into h groups and processed by the Proposal module to generate the final proposal, which outputs the 3D bounding box and the semantic labels. As shown in fig. 12, given an input point cloud with XYZ coordinates, votes are generated from the point cloud and grouped into M clusters, and f grids are divided for each of the M voting cluster centers. Representative points in the partition grids indicate possible object regions. The seed points around each representative point are then revisited, and the surrounding seed point features are aggregated to the representative point. The proposal refinement and classification module fuses and processes the aggregated voting features and the revisited seed features to generate refined representative points and the semantic categories of the objects, which can conveniently be converted into three-dimensional object bounding boxes.
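The voting step can be pictured with the hedged PyTorch sketch below, in which every seed independently predicts a spatial offset toward an object center plus a feature residual; this is a generic VoteNet-style voting module written for illustration under assumed shapes, not the patented RPV-Net itself.

```python
# Hypothetical sketch of a voting module: each seed generates one vote
# consisting of an offset to its position and a residual added to its feature.
import torch
import torch.nn as nn

class VotingModule(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.head = nn.Sequential(                # predicts 3-D offset + feature residual
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, 3 + feat_dim),
        )

    def forward(self, seed_xyz: torch.Tensor, seed_feat: torch.Tensor):
        # seed_xyz: (B, M, 3), seed_feat: (B, M, feat_dim)
        out = self.head(seed_feat)
        vote_xyz = seed_xyz + out[..., :3]        # voted positions
        vote_feat = seed_feat + out[..., 3:]      # voted features
        return vote_xyz, vote_feat
```

The votes returned here would then be grouped into h clusters and passed to the proposal module as described above.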
S400, processing the point cloud data subjected to point cloud semantic feature extraction by using a channel attention module, acquiring the holistic features of the point cloud by average pooling, extracting discriminative individual features by max pooling, and aggregating the holistic features and the individual features to acquire the global features of the point cloud.
The specific processing procedure is as follows:
After obtaining the high-dimensional features of the three-dimensional point cloud data, the method focuses on the influence of each feature channel on the overall point cloud information and implements a channel attention mechanism for point cloud data. The mechanism highlights the information of key channels by analyzing the information associations on each channel of the point cloud features. First, information reflecting the global characteristics of the point cloud is extracted; the attention weights of all channels are then learned from this information, yielding the point cloud features processed by the channel attention module. The channel attention module is shown in fig. 13.
The features of the point cloud data can be denoted as F ∈ R^{B×N×1×C}, where B, N and C represent the batch size, the number of points and the number of feature channels, respectively. The invention performs pooling operations along the dimension of the feature channel number C to generate a point cloud channel attention map Mc ∈ R^{B×N×1×C}. The output feature Fc is obtained by multiplying the point cloud features F with Mc, and the formula can be expressed as: Fc = F × Mc(F).
To generate the point cloud channel attention map Mc, the method applies both average pooling and max pooling along the point dimension N. Average pooling is used to obtain the holistic features of the point cloud, while max pooling helps extract more discriminative features. The two pooling operations aggregate the input point cloud features F into two different global features F_avg and F_max. In addition, the invention uses a multi-layer perceptron (MLP) with shared parameters to process these features: the channel dimension is first reduced by a factor k and then restored to the original size C, and a Sigmoid nonlinear activation yields the required channel attention weights. The point cloud channel attention formula can be expressed as: Mc = Sigmoid{MLP(Avg(F)) + MLP(Max(F))}.
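A minimal PyTorch sketch of this channel attention module is given below, assuming the tensor layout F ∈ R^{B×N×1×C} and pooling over the point dimension to obtain per-channel statistics; the reduction factor k and the two-layer shared MLP are illustrative choices.

```python
# Hypothetical sketch of point cloud channel attention:
# Mc = Sigmoid(MLP(Avg(F)) + MLP(Max(F))),  Fc = F * Mc
import torch
import torch.nn as nn

class PointChannelAttention(nn.Module):
    def __init__(self, channels: int, k: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(                         # shared MLP: C -> C/k -> C
            nn.Linear(channels, channels // k), nn.ReLU(),
            nn.Linear(channels // k, channels),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # f: (B, N, 1, C)
        f_avg = f.mean(dim=1, keepdim=True)               # average pooling over the points
        f_max = f.max(dim=1, keepdim=True).values         # max pooling over the points
        mc = torch.sigmoid(self.mlp(f_avg) + self.mlp(f_max))  # (B, 1, 1, C) attention weights
        return f * mc                                     # Fc = F x Mc, broadcast over N
```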
S500, constructing a multimodal hybrid model of a scene understanding Bert pre-training model and an encoder-decoder model, processing the acquired global features of the point cloud, increasing the diversity of the training data through the scene understanding Bert pre-training model, and interpreting the three-dimensional point cloud objects as three-dimensional ground objects and their parameter information through the encoder-decoder model, thereby improving the humanoid robot's ability to understand the scene.
S510, increasing the diversity of the training data through the scene understanding Bert pre-training model, specifically:
S511, the scene understanding Bert pre-training model helps the generative adversarial network (GAN) model expand the data set, thereby increasing the diversity of the training data;
S512, the standard generative adversarial network (GAN) model comprises a generator G and a discriminator D;
S513, training samples are set for the scene point cloud data set by adjusting the parameter settings of the generative adversarial network (GAN) model;
the objective of the generator G is to generate fake data as close to the real data as possible, the objective of the discriminator D is to distinguish the real data from the fake data generated by the generator G, and a balance coefficient γ is added to adjust the weight between the two.
It should be noted that the scene understanding Bert pre-training model can perform noise reduction on the extracted global point cloud features, expand the variety of the data set, increase the diversity of the training data and strengthen the sample set. The output of the scene understanding Bert pre-training model is the data input of the generative adversarial network (GAN), which in its standard form consists mainly of a generator G and a discriminator D; the scene understanding Bert pre-training model helps the GAN model learn a wider data distribution and thereby improves the generalization ability of the GAN model.
Here V(D, G) is the overall optimization objective function, E denotes expectation, x is the real data, z is noise sampled from a prior distribution p_z, G(z) is the data generated by the generator from the noise, and γ is a dynamically adjusted parameter that is tuned according to the behavior of the generator and the discriminator during training to maintain the balance between the two. If the log(1 − D(G(z))) value is low, the loss of the generator can be strengthened by increasing γ; the accuracy of the discriminator is evaluated after each epoch and γ is adjusted accordingly.
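A hedged sketch of how the γ-weighted objective and the per-epoch adjustment might be implemented is shown below; the source only states that γ is tuned from the discriminator's accuracy after each epoch, so the target accuracy, step size and clamping are assumptions.

```python
# Hypothetical sketch of a gamma-weighted GAN discriminator loss and a simple
# per-epoch gamma update driven by the discriminator's accuracy.
import torch
import torch.nn.functional as F

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor,
                       gamma: float) -> torch.Tensor:
    # -E[log D(x)] - gamma * E[log(1 - D(G(z)))], with D outputs already in (0, 1)
    real_term = F.binary_cross_entropy(d_real, torch.ones_like(d_real))
    fake_term = F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    return real_term + gamma * fake_term

def update_gamma(gamma: float, disc_accuracy: float,
                 target: float = 0.7, step: float = 0.05) -> float:
    # illustrative rule: raise gamma when the discriminator is too accurate,
    # lower it otherwise, and keep gamma away from zero
    return max(0.1, gamma + step if disc_accuracy > target else gamma - step)
```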
S520, the processing procedure of the constructed encoder-decoder model is specifically as follows:
S521, the point cloud encoder takes a point cloud P ∈ R^{n×d} as input, where n is the number of input points and d is the feature dimension of each input point; the output of the point cloud encoder is a point feature sequence X = (x_1, x_2, …, x_m) ∈ R^{m×c}, where m is the number of output point features and c is the output point feature dimension;
S522, the point features X are mapped to point tokens Y = (y_1, y_2, …, y_m) ∈ R^{k×c′} by a multi-layer perceptron (MLP), where c′ is the dimension of the point tokens;
S523, 3D-MELL is a decoder-only interface that accepts a token sequence consisting of text tokens and point tokens, i.e. a mixed token sequence denoted Z = (z_1, z_2, …, z_k) ∈ R^{k×c′}, where k is the total number of tokens. Using a self-attention mechanism, 3D-MELL can understand the context between different types of tokens, so that it can generate responses based on both text and point cloud input. The output of 3D-MELL is a series of predicted tokens Z_{<i} = (z_1, z_2, …, z_{i−1}), where the prediction of the i-th token is conditioned on all previous tokens.
Fig. 14 is a schematic diagram of an encoder-decoder model processing procedure, where the point cloud encoder includes a self-attention mechanism module, a feedforward neural network module, and a geometric information aware adapter module.
The constructed encoder-decoder model can interpret the intelligently extracted three-dimensional point cloud objects as three-dimensional ground objects such as beams and columns, arches, tablets, pavements, flower beds and trees, together with information such as their size, age and material. The multimodal hybrid model can therefore intelligently recognize point cloud, image and text information.
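As a rough illustration of how the mixed token sequence Z could be assembled for such a decoder-only model, the hedged PyTorch sketch below projects point features to point tokens and concatenates them with embedded text tokens; the dimensions, vocabulary size and module names are assumptions, not the actual 3D-MELL interface.

```python
# Hypothetical sketch: build the mixed token sequence Z from point tokens and
# text tokens before feeding a decoder-only model.
import torch
import torch.nn as nn

class MixedTokenBuilder(nn.Module):
    def __init__(self, point_dim: int = 256, vocab: int = 32000, d_model: int = 512):
        super().__init__()
        self.point_proj = nn.Sequential(           # maps point features X to point tokens Y
            nn.Linear(point_dim, d_model), nn.GELU(),
            nn.Linear(d_model, d_model),
        )
        self.text_embed = nn.Embedding(vocab, d_model)

    def forward(self, point_feats: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
        # point_feats: (B, m, point_dim), text_ids: (B, t) integer token ids
        y = self.point_proj(point_feats)            # (B, m, d_model) point tokens
        w = self.text_embed(text_ids)               # (B, t, d_model) text tokens
        return torch.cat([y, w], dim=1)             # mixed token sequence Z of length m + t
```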
In summary, through the above embodiments, the humanoid robot will possess high environmental awareness and intelligent understanding capabilities, allowing it to work more effectively and autonomously in various application scenarios. This will not only push forward the development of humanoid robot technology but also contribute to the construction of a future intelligent society.
The following is an example.
The lidar scanning system is mounted behind the eyes of the robot head. Three-dimensional information of the component to be assembled is acquired by the laser radar scanning system, yielding high-precision three-dimensional scene point cloud data. The obtained scene point cloud data are processed by the adaptive radius sampling module, RPV-Net-based point cloud semantic features are extracted from the resulting effective point cloud data, and the 3D bounding boxes and semantic tags of the point cloud data forming the building components in the scene are acquired, as shown in figs. 15 to 17. Fig. 15 is a 3D bounding box estimation of the overall point cloud acquired by the lidar system; fig. 16 is a 3D bounding box estimation of the acquired local extraction; fig. 17 is a 3D bounding box estimation of the acquired accurate local extraction. Understanding and communication are then carried out in combination with a large language model.
The number of equipment and the scale of processing described herein are intended to simplify the description of the present invention. Applications, modifications and variations of the present invention will be readily apparent to those skilled in the art.
Although embodiments of the present invention have been disclosed above, the invention is not limited to the details and embodiments shown and described; it is well suited to various fields of use that will be readily apparent to those skilled in the art, and accordingly the invention is not limited to the specific details and illustrations shown and described herein, provided there is no departure from the general concepts defined in the claims and their equivalents.

Claims (10)

1. A method for intelligent scene understanding of a humanoid robot based on laser point cloud, characterized by comprising:
mounting a laser radar scanning system on a humanoid robot, the humanoid robot navigating autonomously and collecting three-dimensional real scene data through the laser radar scanning system to obtain scene point cloud data;
processing the obtained scene point cloud data with an adaptive radius sampling module to extract sufficient effective point cloud data;
performing RPV-Net-based point cloud semantic feature extraction on the extracted effective point cloud data to obtain the 3D bounding boxes and semantic labels of the point cloud data constituting the building components in the scene;
processing the point cloud data after point cloud semantic feature extraction with a channel attention module, obtaining the holistic features of the point cloud by average pooling, extracting discriminative individual features by max pooling, and aggregating the holistic features and the individual features to obtain the global features of the point cloud;
constructing a multimodal hybrid model of a scene understanding Bert pre-training model and an encoder-decoder model to process the obtained global features of the point cloud, increasing the diversity of the training data through the scene understanding Bert pre-training model, and interpreting three-dimensional point cloud objects as three-dimensional ground objects and their parameter information through the encoder-decoder model, thereby improving the humanoid robot's ability to understand the scene.
2. The method for intelligent scene understanding of a humanoid robot based on laser point cloud according to claim 1, characterized in that the laser radar scanning system comprises a laser radar scanner, an inertial measurement unit and a PTP time soft synchronization module, wherein the laser radar scanner and the inertial measurement unit are fused in a tightly coupled manner, and the PTP time soft synchronization module synchronizes the time of the laser radar, the inertial measurement unit and the humanoid robot.
3. The method for intelligent scene understanding of a humanoid robot based on laser point cloud according to claim 2, characterized by further comprising calibrating the laser radar scanning system, specifically:
using the Allan variance analysis method to evaluate the noise characteristics and bias stability of the inertial measurement unit, and performing internal calibration of the inertial measurement unit;
using the lidar_align calibration method to calibrate the extrinsic parameters of the laser radar scanner and the inertial measurement unit, and obtaining the transformation matrix between the coordinate system of the laser radar scanner and the coordinate system of the inertial measurement unit so as to transform point cloud data from the laser radar scanner coordinate system into the coordinate system of the inertial measurement unit.
4. The method for intelligent scene understanding of a humanoid robot based on laser point cloud according to claim 3, characterized in that, after calibrating the laser radar scanning system, it further comprises removing motion distortion from the scene point cloud data collected by the laser radar scanner, the specific process being:
the motion distortions appearing in the scene point cloud data include: a. if the scanning field of view of the laser radar scanner always falls on the same flat wall, then when the scanner moves in the direction perpendicular to the wall, the distance from the laser radar scanner to the same point on the wall keeps decreasing, so the point cloud data bulge along the direction of motion of the laser radar scanner; b. when the motion of the laser radar scanner involves an angular offset, the movement of the center of the scan-line petals stretches the scan-line petal point cloud into an irregular shape, i.e. center distortion; c. the field-of-view angle of the laser radar scanner is fixed and cannot cover a full circle, so the range covered by the field of view scales with distance, causing the size of the scan-line petals to scale with the motion of the laser radar scanner, i.e. edge distortion;
the motion distortions of a and b are removed by performing motion compensation on the inter-frame point cloud through the inertial measurement unit, calculated as P_correct = T_{k,i} × P_i, where t_k and t_{k+1} are the start and end times of one frame of point cloud recorded by the laser radar scanner, t_i is the time of the i-th point P_i of the k-th frame point cloud set S relative to the start time, S_num is the total number of points in one frame point cloud set, T_{k,k+1} is the inter-frame pose transformation, T_{k,i} is the pose transformation of point P_i relative to the start time, and P_correct is the motion-distortion-corrected point coordinate computed with the pose transformation matrix corresponding to point P_i;
the motion distortion of c is handled by limiting the field-of-view angle to filter out unreliable scanning points at the edges.
5. The method for intelligent scene understanding of a humanoid robot based on laser point cloud according to claim 4, characterized in that, after removing motion distortion from the scene point cloud data, it further comprises reconstructed point cloud storage of the scene point cloud data, the specific process being:
one frame of point cloud data scanned by the laser radar scanner graphically comprises 20 scan-line petals, each continuous at the center; taking the point closest to the scan center as the starting point, each scan-line petal is stored sequentially in one row of a matrix;
spatially adjacent scan-line petals are stored in adjacent rows of the matrix, specifically: the 20 scan-line petals in one frame of point cloud data are obtained by rotating 4 times with 5 scan-line petals as one period; in one scanning period, the starting point numbers of the 5 scan-line petals recorded clockwise are 0, 1000, 2000, 500 and 1500 in turn; the difference between the starting point numbers of the scan-line petals of the 4 scanning periods is 2500; the starting point numbers of adjacent scan-line petals of one frame of scanned point cloud, indexed clockwise, are calculated through this point-number relation; the corresponding scan-line petals are indexed in the order of the obtained starting point numbers and stored row by row, yielding a matrix whose spatial relations are continuous between rows.
6. The method for intelligent scene understanding of a humanoid robot based on laser point cloud according to claim 5, characterized in that, after reconstructed point cloud storage of the scene point cloud data, it further comprises correcting the formula for extracting corner-point and plane-point features from each frame of point cloud data, specifically:
A. the vector differences to the scan origin between the point whose curvature is sought and its five preceding and five following points are calculated and summed, and the length of the resulting sum reflects the curvature value of the point; the curvature value calculation formula is c_i = ||Σ_{j∈S} (X^L_{k,i} − X^L_{k,i+j})||, where c_i is the curvature value of scanning point P_i, S is the set of integers from −5 to 5 excluding 0, and X^L_{k,i}, X^L_{k,i+j} denote the vectors of the i-th point and its j-th neighboring point of the k-th frame of point cloud data in the laser radar scanner coordinate system L;
B. when the curvature value of a scanning point is greater than 1, the scanning point is judged to be a corner point; when the curvature value of a scanning point is less than 0.1, the scanning point is judged to be a plane point;
C. the corner points and plane points in each frame of point cloud data are extracted according to steps A and B.
7. The method for intelligent scene understanding of a humanoid robot based on laser point cloud according to claim 1, characterized in that the processing of the obtained scene point cloud data by the adaptive radius sampling module is specifically: first, the Euclidean distances between the sampling center point and all points in the scene point cloud data are calculated, points exceeding a specified threshold R are marked as Q, and the remaining points keep their original values; then the points within the spherical sampling radius are sorted in ascending order of distance and the nearest w points are extracted as sampling points; if the sampling points contain Q values, indicating that the number of effective points within the spherical sampling radius is less than w, the sampling radius is gradually increased until the condition of at least w effective points is met, so as to ensure that there are enough effective points for feature learning;
the RPV-Net-based point cloud semantic feature extraction performed on the extracted effective point cloud data is specifically: a point cloud P ∈ R^{N×3}, i.e. original point cloud data containing three dimensions, is input, where N denotes the number of points; the seeds are obtained through the semantic segmentation network; using the pillar-based voxelization idea, the boundary voxel features of the whole pillar are extracted and a vote is cast on the obtained pillars; a Vote & Sampling & Grouping operation is then performed on the obtained point cloud features, with each seed independently generating a vote through the voting module; the votes are then grouped into h groups and processed by the Proposal module to generate the final proposal, and finally the 3D bounding box and the semantic labels are output.
8. The method for intelligent scene understanding of a humanoid robot based on laser point cloud according to claim 1, characterized in that the point cloud data after point cloud semantic feature extraction are processed by a channel attention module, the specific process being:
the features of the point cloud data can be expressed as F ∈ R^{B×N×1×C}, where B, N and C respectively represent the batch size, the number of points and the number of feature channels;
an average pooling method and a max pooling method are applied along the dimension of the feature channel number C; the average pooling method is used to obtain the holistic features of the point cloud, the max pooling method extracts more discriminative individual features, and the holistic features and the individual features are then aggregated to produce two different point cloud global features F_avg and F_max.
9. The method for intelligent scene understanding of a humanoid robot based on laser point cloud according to claim 1, characterized in that increasing the diversity of the training data through the scene understanding Bert pre-training model is specifically:
the scene understanding Bert pre-training model expands the data set for the generative adversarial network (GAN) model, increasing the diversity of the training data;
the standard generative adversarial network (GAN) model comprises a generator G and a discriminator D;
training samples are set for the scene point cloud data set by adjusting the parameter settings of the generative adversarial network (GAN) model;
the objective of the generator G is to generate fake data as close to the real data as possible, the objective of the discriminator D is to distinguish the real data from the fake data generated by the generator G, and a balance coefficient γ is added to adjust the weight between the two.
10. The method for intelligent scene understanding of a humanoid robot based on laser point cloud according to claim 9, characterized in that the processing procedure of the constructed encoder-decoder model is specifically:
the point cloud encoder takes a point cloud P ∈ R^{n×d} as input, where n is the number of input points and d is the feature dimension of each input point;
the output of the point cloud encoder is a point feature sequence X = (x_1, x_2, …, x_m) ∈ R^{m×c}, where m is the number of output point features and c is the output point feature dimension;
a multi-layer perceptron (MLP) maps the point features X to point tokens Y = (y_1, y_2, …, y_m) ∈ R^{k×c′}, where c′ is the dimension of the point tokens;
3D-MELL is a decoder-only interface that accepts a token sequence consisting of text tokens and point tokens, i.e. a mixed token sequence denoted Z = (z_1, z_2, …, z_k) ∈ R^{k×c′}, where k is the total number of tokens;
using a self-attention mechanism, 3D-MELL is able to understand the contextual relations between different types of tokens, enabling it to generate responses based on text and point cloud input; the output of 3D-MELL is a series of predicted tokens Z_{<i} = (z_1, z_2, …, z_{i−1}), where the prediction of the i-th token is conditioned on all previous tokens.
CN202411118957.6A 2024-08-15 2024-08-15 Intelligent humanoid robot scene understanding method based on laser point cloud Active CN118982678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411118957.6A CN118982678B (en) 2024-08-15 2024-08-15 Intelligent humanoid robot scene understanding method based on laser point cloud

Publications (2)

Publication Number Publication Date
CN118982678A true CN118982678A (en) 2024-11-19
CN118982678B CN118982678B (en) 2025-03-04

Family

ID=93449331

Country Status (1)

Country Link
CN (1) CN118982678B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3885972A1 (en) * 2020-03-24 2021-09-29 Siemens Aktiengesellschaft Context based perception method and system for managing environmental safety in a computing environment
US20230032036A1 (en) * 2020-10-30 2023-02-02 Boe Technology Group Co., Ltd. Three-dimensional scene constructing method, apparatus and system, and storage medium
WO2022252274A1 (en) * 2021-05-31 2022-12-08 北京理工大学 Point cloud segmentation and virtual environment generation method and apparatus based on pointnet network
WO2022257801A1 (en) * 2021-06-09 2022-12-15 山东大学 Slam-based mobile robot mine scene reconstruction method and system
CN116265862A (en) * 2021-12-16 2023-06-20 动态Ad有限责任公司 Vehicle, system and method for vehicle, and storage medium
CN115421158A (en) * 2022-11-07 2022-12-02 中国人民解放军国防科技大学 Self-supervision learning solid-state laser radar three-dimensional semantic mapping method and device
US20240161478A1 (en) * 2022-11-16 2024-05-16 University Of Science And Technology Beijing Multimodal Weakly-Supervised Three-Dimensional (3D) Object Detection Method and System, and Device
CN116758497A (en) * 2023-03-21 2023-09-15 湖南大学 A laser point cloud three-dimensional target detection method based on unmanned system
CN116469095A (en) * 2023-04-21 2023-07-21 南京邮电大学 Time-Scene Adaptive 3D Target Detection Method Based on LesionSensor Fusion
WO2024217115A1 (en) * 2023-04-21 2024-10-24 长安大学 Three-dimensional object detection method based on multi-modal fusion and deep attention mechanism
CN117392625A (en) * 2023-09-12 2024-01-12 武汉极目智能技术有限公司 Multimode BEV (BEV) looking around sensing method, device, equipment and storage medium
CN117173527A (en) * 2023-09-15 2023-12-05 广东工业大学 Automatic driving multi-mode data updating and fusing method, device, equipment and medium
CN117036895A (en) * 2023-10-10 2023-11-10 之江实验室 Multi-task environment sensing method based on point cloud fusion of camera and laser radar
CN117496517A (en) * 2024-01-03 2024-02-02 广东工业大学 A method and system for lidar intelligent management and control in city-level real-life three-dimensional construction
CN118298416A (en) * 2024-03-13 2024-07-05 南京航空航天大学 Multi-mode 3D target detection method based on time sequence modeling
CN118298122A (en) * 2024-04-28 2024-07-05 江苏大学 NDT-ICP point cloud registration-based laser radar-IMU tight coupling mapping method for unmanned platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
S. Xu et al.: "FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection", 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 25 August 2021 (2021-08-25), pages 3047 - 3054 *
Luo Han et al.: "Multimodal 3D Visual Grounding Based on Semantic Consistency Constraints and Local-Global Perception" (基于语义一致性约束与局部-全局感知的多模态3D视觉定位), Application Research of Computers (计算机应用研究), vol. 41, no. 07, 5 July 2024 (2024-07-05), pages 2203 - 2208 *

Also Published As

Publication number Publication date
CN118982678B (en) 2025-03-04

Similar Documents

Publication Publication Date Title
CN110675418B (en) Target track optimization method based on DS evidence theory
CN108229366B (en) Deep Learning Vehicle Obstacle Detection Method Based on Radar and Image Data Fusion
US20230236280A1 (en) Method and system for positioning indoor autonomous mobile robot
CN107092021B (en) Vehicle Lidar 3D Scanning Method, Land Object Classification Method and System
CN110689562A (en) Trajectory loop detection optimization method based on generation of countermeasure network
CN114140527B (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN113160327A (en) Method and system for realizing point cloud completion
CN109547769B (en) A highway traffic dynamic three-dimensional digital scene acquisition and construction system and its working method
CN102494663B (en) Measuring system of swing angle of swing nozzle and measuring method of swing angle
CN114140539B (en) Method and device for acquiring position of indoor object
CN110555408A (en) Single-camera real-time three-dimensional human body posture detection method based on self-adaptive mapping relation
CN115128628A (en) Construction method of road grid map based on laser SLAM and monocular vision
CN117055004A (en) Three-dimensional human skeleton estimation method based on millimeter wave radar sparse point cloud
Cui et al. Precise landing control of UAV based on binocular visual SLAM
CN118640878B (en) Topography mapping method based on aviation mapping technology
CN119290016A (en) A multi-sensor fusion vehicle rapid positioning method, device, storage medium and electronic device
CN118982678B (en) Intelligent humanoid robot scene understanding method based on laser point cloud
Wlodarczyk-Sielicka et al. General concept of reduction process for big data obtained by interferometric methods
CN114429515A (en) A point cloud map construction method, device and device
CN118482684B (en) Elevation information acquisition method, device and system based on vision
CN119063717B (en) A method and system for unmanned intelligent mapping with synchronous positioning
CN118279568B (en) Multi-target identity judging method for distributed double-infrared sensor time sequence twin network
CN119152505B (en) Labeling method, labeling device, labeling equipment and storage medium
CN119006700B (en) Complex scene 3D reconstruction method based on multi-source fusion
CN119359766B (en) Unmanned aerial vehicle carries many targets and surveys tracking system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant