CN118982678A - Intelligent scene understanding method for humanoid robots based on laser point cloud - Google Patents
- Publication number: CN118982678A
- Application number: CN202411118957.6A
- Authority: CN (China)
- Prior art keywords: point cloud, point, laser radar, scene, data
- Legal status: Granted
Classifications
- G06V10/40 — Extraction of image or video features
- G06N3/0455 — Auto-encoder networks; encoder-decoder networks
- G06N3/0475 — Generative networks
- G06N3/094 — Adversarial learning
- G06T7/85 — Stereo camera calibration
- G06V10/24 — Aligning, centring, orientation detection or correction of the image
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
- G06V10/817 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, by voting
Abstract
The invention discloses a laser point cloud-based intelligent scene understanding method for a humanoid robot, which comprises the following steps: the humanoid robot performs autonomous navigation and acquires three-dimensional live-action data through a laser radar scanning system to obtain scene point cloud data; the obtained scene point cloud data are processed with an adaptive radius sampling module; RPV-Net-based point cloud semantic feature extraction is performed on the extracted effective point cloud data to obtain the 3D bounding boxes and semantic tags of the point cloud data; the point cloud data are processed with a channel attention module, the overall features of the point cloud are acquired by average pooling, distinctive individual features are extracted by max pooling, and the two are aggregated to obtain the global features of the point cloud; and a multi-modal hybrid model combining a scene-understanding Bert pre-training model and an encoder-decoder model is constructed to process the global features of the point cloud. The method provides real-time, high-precision three-dimensional point cloud data for the humanoid robot by means of laser radar technology, and can realize depth perception and intelligent understanding of the environment.
Description
Technical Field
The invention relates to the field of humanoid robot perception and intelligent understanding. More specifically, the invention relates to a humanoid robot scene intelligent understanding method based on laser point clouds.
Background
With the rapid development of robot technology, humanoid robots show wide application potential in various scenarios, such as disaster response, high-risk environment detection and daily services. However, for a humanoid robot to operate effectively in a complex environment, highly accurate environmental awareness is required. Traditional robotic vision systems rely primarily on cameras or simple sensors, which face significant challenges in environments with limited visibility or complex lighting conditions. Especially in dynamic environments, it is difficult for conventional vision systems to acquire high-quality environmental data accurately and in real time. In the research, development and application of humanoid robots, acquiring accurate three-dimensional environment information in real time and realizing efficient intelligent scene understanding are therefore key to improving the autonomy and functionality of the robot.
Disclosure of Invention
The invention provides an intelligent understanding method of a humanoid robot scene based on laser point clouds, which provides real-time and high-precision three-dimensional point cloud data for the humanoid robot by utilizing a laser radar technology and can realize depth perception and intelligent understanding of the environment.
To achieve these objects and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, there is provided a humanoid robot scene intelligent understanding method based on a laser point cloud, including:
A laser radar scanning system is assembled on a humanoid robot, the humanoid robot performs autonomous navigation and acquires three-dimensional live-action data through the laser radar scanning system to obtain scene point cloud data;
Processing the obtained scene point cloud data by using an adaptive radius sampling module to extract enough effective point cloud data;
Performing RPV-Net-based point cloud semantic feature extraction on the extracted effective point cloud data to obtain the 3D bounding boxes and semantic tags of the point cloud data of the building components in the scene;
Processing the point cloud data after point cloud semantic feature extraction with a channel attention module, acquiring the overall features of the point cloud by average pooling, extracting distinctive individual features by max pooling, and aggregating the overall features and the individual features to obtain the global features of the point cloud;
And constructing a multi-modal hybrid model of a scene-understanding Bert pre-training model and an encoder-decoder model to process the acquired global features of the point cloud, increasing the diversity of the training data through the scene-understanding Bert pre-training model, and interpreting the three-dimensional point cloud as three-dimensional ground objects and their parameter information through the encoder-decoder model, thereby improving the humanoid robot's ability to understand the scene.
Preferably, the laser point cloud-based intelligent understanding method for a humanoid robot scene, the laser radar scanning system comprises: the system comprises a laser radar scanner, an inertial measurement unit and a PTP time soft synchronization module, wherein the laser radar scanner and the inertial measurement unit are fused in a tight coupling mode, and the PTP time soft synchronization module enables time synchronization of the laser radar, the inertial measurement unit and the humanoid robot.
Preferably, the intelligent understanding method of the humanoid robot scene based on the laser point cloud further comprises calibrating the laser radar scanning system, specifically:
Using an Allan variance analysis method to evaluate noise characteristics and bias stability of the inertial measurement unit, and performing internal calibration on the inertial measurement unit;
and calibrating external parameters of the laser radar scanner and the inertial measurement unit by adopting a lidar_align calibration method, and acquiring a conversion matrix between a coordinate system of the laser radar scanner and a coordinate system of the inertial measurement unit so as to convert point cloud data under the coordinate system of the laser radar scanner into the coordinate system of the inertial measurement unit.
Preferably, the intelligent understanding method of the humanoid robot scene based on the laser point cloud further comprises the step of removing motion distortion of scene point cloud data acquired by the laser radar scanner after calibrating the laser radar scanning system, and the specific process is as follows:
Motion distortion occurring in scene point cloud data includes: a. if the scanning field of view of the laser radar scanner always falls on the same straight wall surface, then when the scanner moves in the direction perpendicular to the wall, the distance from the scanner to the same point of the wall keeps decreasing, so the point cloud data bulges along the movement direction of the scanner; b. when the movement of the laser radar scanner has an angular offset, the movement of the centers of the scan-line petals stretches the petal point clouds into an irregular shape, i.e. center distortion; c. the field angle of the laser radar scanner is fixed and cannot cover a full circle, so the coverage of the field of view scales with the change of distance, and the size of the scan-line petals scales with the movement of the scanner, i.e. edge distortion;
For the motion distortions of a and b, the inertial measurement unit is used to perform motion compensation on the intra-frame point cloud. The specific calculation is:
$t_i = \dfrac{i}{S_{num}}\,(t_{k+1} - t_k)$, $T_{k,i} = \dfrac{t_i}{t_{k+1} - t_k}\,T_{k,k+1}$, $P_{correct} = T_{k,i} \times P_i$,
wherein $t_k$ and $t_{k+1}$ are the start time and end time of a frame of point cloud recorded by the laser radar scanner, $t_i$ is the time of the i-th point $P_i$ relative to the start time in the k-th frame point cloud set S, $S_{num}$ is the total number of points in a frame point cloud set, $T_{k,k+1}$ is the frame pose transformation, $T_{k,i}$ is the pose transformation of point $P_i$ relative to the start time, and $P_{correct}$ is the point coordinate after motion distortion correction, calculated using the pose transformation matrix corresponding to point $P_i$;
for the motion distortion of c, the scanning points with unreliable edges are filtered out by limiting the size of the angle of view.
Preferably, in the intelligent understanding method of the humanoid robot scene based on the laser point cloud, after removing the motion distortion of the scene point cloud data, the method further comprises the step of reconstructing the scene point cloud data for point cloud storage, and the specific process is as follows:
The laser radar scanner scans a frame of point cloud data, wherein the frame of point cloud data comprises 20 scanning line petals in a graph, each scanning line petal is continuous at the center, and each scanning line petal sequentially stores the whole petals into one row of a matrix by taking the point closest to the scanning center as a starting point;
Storing spatially adjacent scan line petals into adjacent rows of the matrix, specifically: the 20 scanning line petals in one frame of point cloud data are obtained by rotating for 4 times by taking 5 scanning line petals as a period, and in one scanning period, the starting point numbers recorded by the 5 scanning line petals clockwise are sequentially 0, 1000, 2000, 500 and 1500; the difference of the start point numbers of the petals of the scanning lines in 4 scanning periods is 2500, and the start point numbers of the petals of the adjacent scanning lines indexed clockwise by one frame of scanning point cloud are calculated through the point number relation; and sequentially indexing corresponding scanning line petals through the obtained starting point numbers, and sequentially storing the petals according to rows to obtain a matrix with a continuous spatial relationship among the rows.
Preferably, the intelligent understanding method of the humanoid robot scene based on the laser point cloud further comprises the following steps after the reconstructed point cloud storage is performed on the scene point cloud data: the formula for extracting the angular points and plane point characteristics of each frame of point cloud data is modified, and specifically comprises the following steps:
A. Calculating the vector differences from the point whose curvature is being computed and from each of its five preceding and five following points to the scanning origin, summing these vector differences, and taking the length of the resulting sum as the curvature value of the point; the curvature value calculation formula is:
$c_i = \left\| \sum_{j \in S} \left( P^{L}_{(k,i)} - P^{L}_{(k,i+j)} \right) \right\|$,
wherein $c_i$ is the curvature value of the scan point $P_i$, S is the set of integers from -5 to 5 excluding 0, and $P^{L}_{(k,i)}$ and $P^{L}_{(k,i+j)}$ are the vectors of the i-th and (i+j)-th points in the k-th frame point cloud data under the laser radar scanner coordinate system L;
B. When the curvature value of the scanning point is larger than 1, judging the scanning point as an angular point; when the curvature value of the scanning point is smaller than 0.1, judging the scanning point as a plane point;
C. Extracting the corner points and plane points in each frame of point cloud data according to steps A and B.
Preferably, in the intelligent understanding method of the humanoid robot scene based on the laser point cloud, the adaptive radius sampling module is used for processing the obtained scene point cloud data, specifically: firstly, calculating Euclidean distances between a sampling center point and all points in scene point cloud data, marking the points exceeding a specific threshold value R as Q, and keeping original values of the rest points; then, the points in the spherical sampling radius are arranged in ascending order according to the distance, and the nearest w points are extracted as sampling points; if the sampling points contain Q values, which means that the number of effective points in the spherical sampling radius is less than w, the sampling radius needs to be gradually increased until at least w effective points are met, so that enough effective points are ensured to be used for feature learning;
The performing of RPV-Net-based point cloud semantic feature extraction on the extracted effective point cloud data is specifically: inputting a point cloud $P \in R^{N \times 3}$, i.e. point cloud data originally containing three dimensions, where N is the number of points; obtaining seed points through a semantic segmentation network; extracting boundary voxel features of the whole pillar using the pillar-based volume-column idea, and voting on the obtained pillars; then performing Vote & Sampling & Grouping operations on the obtained point cloud features, wherein each seed independently generates a vote through the voting module; the votes are then grouped into h groups and processed by the Proposal module to generate the final proposal, which outputs the 3D bounding box and semantic tags.
Preferably, in the intelligent understanding method of the humanoid robot scene based on the laser point cloud, the channel attention module is used for processing the point cloud data extracted by the semantic features of the point cloud, and the specific process is as follows:
The features of the point cloud data may be represented as $F \in R^{B \times N \times 1 \times C}$, where B, N and C represent the batch size, the number of points and the number of feature channels, respectively;
And an average pooling method and a maximum pooling method are adopted along the dimension of the feature channel number C, the average pooling method is used for acquiring the integral features of the point cloud, the maximum pooling method extracts individual features with more distinguishing property, and then the integral features and the individual features are subjected to feature aggregation to generate two different global features F avg and F max of the point cloud.
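As an illustration of the channel attention step described above, the following is a minimal PyTorch sketch, assuming a shared-MLP aggregation of the average-pooled and max-pooled descriptors; the pooling axis, reduction ratio and aggregation actually used by the invention are not specified here, so these choices are assumptions.

```python
# A minimal sketch (assumed layout) of a channel-attention block of the kind described
# above: features F of shape (B, N, 1, C) are reduced by average pooling and max pooling
# to two global descriptors F_avg and F_max, which are aggregated through a shared MLP
# into per-channel weights. Illustrative only; the invention's exact design may differ.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Shared MLP applied to both pooled descriptors
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # f: (B, N, 1, C) point features
        f_avg = f.mean(dim=(1, 2))   # (B, C) overall (average-pooled) features
        f_max = f.amax(dim=(1, 2))   # (B, C) distinctive (max-pooled) features
        weights = torch.sigmoid(self.mlp(f_avg) + self.mlp(f_max))  # feature aggregation
        return f * weights[:, None, None, :]  # re-weight the channels of the point features

# Usage: attn = ChannelAttention(64); out = attn(torch.randn(2, 1024, 1, 64))
```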
Preferably, in the laser point cloud-based intelligent scene understanding method for a humanoid robot, the increasing of the diversity of the training data through the scene-understanding Bert pre-training model is specifically:
the scene-understanding Bert pre-training model expands the dataset with a generative adversarial network (GAN) model, increasing the diversity of the training data;
the standard generative adversarial network (GAN) model includes a generator G and a discriminator D;
training samples are set for the scene point cloud dataset by adjusting the parameter settings of the generative adversarial network (GAN) model;
the objective of the generator G is to generate fake data as close as possible to the real data, and the objective of the discriminator D is to distinguish the real data from the fake data generated by the generator G; a balance coefficient γ is added to adjust the weight between the two.
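The following is a minimal PyTorch sketch of a standard GAN objective with a balance coefficient γ; where γ enters the loss is not fixed by the text above, so placing it on the fake-sample term of the discriminator loss is only an assumption for illustration.

```python
# A minimal sketch of the GAN objective with a balance coefficient gamma. The placement
# of gamma on the fake-sample term is an assumption; the patent only states that gamma
# adjusts the weight between the two objectives.
import torch
import torch.nn.functional as F

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor, gamma: float = 0.5):
    # d_real / d_fake: discriminator logits for real and generated point-cloud samples
    real_term = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    fake_term = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return real_term + gamma * fake_term

def generator_loss(d_fake: torch.Tensor):
    # The generator tries to make the discriminator label its samples as real
    return F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
```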
Preferably, in the intelligent understanding method of the humanoid robot scene based on the laser point cloud, the processing procedure of the constructed encoder-decoder model is specifically as follows:
the point cloud encoder takes a point cloud $P \in R^{n \times d}$ as input, where n is the number of input points and d is the feature dimension of each input point;
the output of the point cloud encoder is a point feature sequence $X = (x_1, x_2, \ldots, x_m) \in R^{m \times c}$, where m is the number of output point features and c is the output point feature dimension;
the point features X are mapped to point tokens $Y = (y_1, y_2, \ldots, y_m) \in R^{m \times c'}$, where c' is the dimension of a point token, by a multi-layer perceptron MLP;
3D-MELL is a decoder-only architecture that accepts a token sequence consisting of text tokens and point tokens, i.e. a mixed token sequence, denoted $Z = (z_1, z_2, \ldots, z_k) \in R^{k \times c'}$, where k is the total number of tokens;
with the self-attention mechanism, 3D-MELL can understand the contextual relationship between different types of tokens, enabling it to generate a response based on text and point cloud input; the prediction of the i-th output token is conditioned on all previous tokens $Z_{<i} = (z_1, z_2, \ldots, z_{i-1})$.
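As a rough illustration of the mixed token sequence described above, the following sketch assumes an MLP maps the point features X to point tokens Y, which are concatenated with embedded text tokens to form Z before being fed to a decoder-only model; the actual 3D-MELL interfaces are not specified here, so all module names are placeholders.

```python
# A minimal sketch of assembling the mixed token sequence Z from text tokens and point
# tokens. PointTokenizer and build_mixed_tokens are hypothetical helpers, not the
# invention's actual modules.
import torch
import torch.nn as nn

class PointTokenizer(nn.Module):
    def __init__(self, c: int, c_prime: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c_prime), nn.GELU(), nn.Linear(c_prime, c_prime))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (m, c) point features from the point cloud encoder -> (m, c') point tokens
        return self.mlp(x)

def build_mixed_tokens(text_tokens: torch.Tensor, point_feats: torch.Tensor,
                       tokenizer: PointTokenizer) -> torch.Tensor:
    # text_tokens: (t, c') embedded text tokens; point_feats: (m, c) encoder output
    point_tokens = tokenizer(point_feats)              # (m, c')
    return torch.cat([text_tokens, point_tokens], 0)   # Z: (k, c') with k = t + m
```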
The invention at least comprises the following beneficial effects:
For the problem of uneven point cloud density, the adaptive radius sampling module processes the obtained scene point cloud data to extract enough effective point cloud data, ensuring that enough points are available for feature learning and improving the humanoid robot's understanding capability and segmentation precision for complex scenes.
Secondly, for the problems of low processing speed and lost features caused by the large amount of environmental data, the invention designs a channel attention module that reduces the number of parameters while avoiding the loss of both individual and overall features.
Thirdly, the laser radar scanner used in the invention is a solid-state laser radar scanner. For its characteristics, such as the small field of view, petal-shaped scanning trajectory and non-repetitive scanning mode, the motion distortion is removed, the point cloud of the solid-state laser radar scanner is reconstructed to establish the spatial relationship between scanning lines, and the feature extraction formula is corrected, thereby improving extraction efficiency and precision, effectively improving the cases of insufficient and wrongly selected feature points caused by the laser radar scanner and the environment, and alleviating the problems of point cloud layering and aggravation.
Fourth, the multi-modal hybrid model of the Bert pre-training model and the encoder-decoder model is used for scene understanding, which improves the humanoid robot's understanding of complex scenes.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
Fig. 1 is a schematic flow chart of a humanoid robot scene intelligent understanding method based on laser point clouds according to an embodiment of the invention;
FIG. 2 is a schematic diagram of PTP time synchronization according to an embodiment of the present invention;
FIG. 3 is a flow chart of the laser radar scanner and IMU external parameter calibration according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of distortion of a lidar scanner perpendicular to a wall surface in an embodiment of the present invention;
FIG. 5 is a schematic diagram of center distortion when the laser radar scanner is angularly offset in an embodiment of the present invention;
FIG. 6 is a schematic diagram of edge distortion occurring when the field angle of a lidar scanner is small in an embodiment of the present invention;
FIG. 7 is a schematic diagram of the petal-shaped scanning point cloud of the laser scanning radar, which has a certain inherent radian, in an embodiment of the present invention;
FIG. 8 is a graph showing the correspondence between the point number and the standard curvature difference under the standard condition in the embodiment of the present invention;
FIG. 9 is a graph showing the change of the distance correction value, obtained by taking the difference between the curvatures of corresponding points at different distances and the curvature of the corresponding points under the standard condition as the distance correction value, in an embodiment of the invention;
FIG. 10 is a schematic diagram of a sampling scheme using Pointnet ++;
FIG. 11 is a schematic diagram of an adaptive radius sampling method of the present invention employed in an embodiment of the present invention;
FIG. 12 is a schematic diagram of point cloud semantic segmentation in an embodiment of the present invention;
FIG. 13 is a schematic diagram of a channel attention module according to an embodiment of the present invention;
FIG. 14 is a schematic diagram of an encoder-decoder model process in accordance with an embodiment of the present invention;
FIG. 15 is a 3D bounding box estimation plot of an overall point cloud acquired by a lidar system in an example of the present invention;
FIG. 16 is a 3D bounding box estimation map of a local extraction map obtained in an example of the present invention;
Fig. 17 is a 3D bounding box estimation map of an accurate local extraction map obtained in an example of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings to enable those skilled in the art to practice the invention by referring to the description.
It will be understood that terms, such as "having," "including," and "comprising," as used herein, do not preclude the presence or addition of one or more other elements or groups thereof.
As shown in fig. 1, the embodiment of the invention provides a humanoid robot scene intelligent understanding method based on laser point cloud, which comprises the following steps:
S100, assembling a laser radar scanning system on the humanoid robot, wherein the humanoid robot performs autonomous navigation and acquires three-dimensional live-action data through the laser radar scanning system to obtain scene point cloud data.
In the ROS operating system of the humanoid robot, a move_base navigation planning framework is added and a mbot_navigation random autonomous navigation function module is built, upgrading the three-dimensional live-action data acquisition of the humanoid robot from fixed-route travel to random autonomous navigation.
The laser radar scanning system includes: the system comprises a laser radar scanner, an inertial measurement unit and a PTP time soft synchronization module, wherein the laser radar scanner and the inertial measurement unit are fused in a tight coupling mode, and the PTP time soft synchronization module enables time synchronization of the laser radar, the inertial measurement unit and the humanoid robot.
Specifically, the laser radar scanner is a solid-state laser radar scanner, a scanning sensor that acquires the geometric shape of a measured target through ranging. It mainly comprises a laser emitter, a detector and an information processing system, has strong anti-interference capability and can work in harsh environments. Its working principle is to emit visible and near-infrared laser toward the measured object; the emitted laser is reflected back after reaching the object and is received by the receiver. By recording the time from emission to reception of the laser beam and combining it with the speed of light, the distance from the measured object to the scanner and the intensity of the reflected laser are calculated, thereby obtaining the geometric shape of the measured object. Specific measurement methods include triangulation, time of flight (ToF) and phase difference.
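A minimal sketch of the time-of-flight principle mentioned above (distance as half the round-trip time multiplied by the speed of light); the numeric example is illustrative only.

```python
# Time-of-flight ranging: distance = c * t / 2 for a round-trip time t.
C_LIGHT = 299_792_458.0  # speed of light in m/s

def tof_distance(round_trip_time_s: float) -> float:
    return C_LIGHT * round_trip_time_s / 2.0

# e.g. a round trip of about 66.7 ns corresponds to roughly 10 m
print(tof_distance(66.7e-9))
```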
An Inertial Measurement Unit (IMU) is a sensor for measuring physical quantities of an object, including linear acceleration, angular velocity and magnetic field direction, and is mainly composed of an accelerometer, a gyroscope and a magnetometer. The IMU works on the principle of determining the motion state of the instrument by measuring acceleration and angular velocity. The accelerometer can measure linear acceleration along X, Y, Z three axes, the gyroscope can measure angular velocity around X, Y, Z three axes, and the magnetometer can measure the direction and intensity of geomagnetic field to calibrate drift of the gyroscope, so that the moving direction and the rotating state of the equipment are obtained. The deviation exists between the physical quantity measured by the IMU and the actual physical quantity, and the deviation is mainly divided into two types: systematic errors such as bias (bias) and random errors such as noise, the presence of errors is therefore taken into account in the calculation process.
The basic measurement model of the IMU is:
$\tilde{a} = a_{true} + b_a + n_a$, $\tilde{\omega} = \omega_{true} + b_g + n_g$,
where $\tilde{a}$ is the measured acceleration, $a_{true}$ is the true acceleration, $b_a$ is the bias of the accelerometer, $n_a$ is the noise during the accelerometer measurement, $\tilde{\omega}$ is the measured angular velocity, $\omega_{true}$ is the true angular velocity, $b_g$ is the bias of the gyroscope, and $n_g$ is the noise during the gyroscope measurement.
In practical applications, such errors need to be compensated for in order to ensure measurement accuracy. The compensation method typically includes two calibration and filtering algorithms. Calibration may be performed with the device stationary or in motion, the magnitude of the error determined, and the calibrated error value subtracted from the subsequent measurement. The filtering algorithm (such as Kalman filtering) estimates and compensates for the bias in real time while in motion, reducing the effects of noise.
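A minimal numpy sketch of the measurement model and the simple calibration-based compensation described above, assuming the biases b_a and b_g were estimated during a static calibration; real systems additionally estimate and filter the noise terms (e.g. with a Kalman filter).

```python
# Calibration-based IMU compensation: subtract the calibrated biases from the raw
# measurements; noise handling is omitted in this sketch.
import numpy as np

def compensate_imu(accel_raw: np.ndarray, gyro_raw: np.ndarray,
                   b_a: np.ndarray, b_g: np.ndarray):
    # accel_raw, gyro_raw: (3,) raw accelerometer / gyroscope readings
    # b_a, b_g: accelerometer and gyroscope biases estimated during static calibration
    accel = accel_raw - b_a   # approximates a_true (noise n_a remains)
    gyro = gyro_raw - b_g     # approximates omega_true (noise n_g remains)
    return accel, gyro
```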
In order to obtain a high-precision point cloud map, a plurality of sensors are required to be used in a fusion mode, and multi-source data mutual constraint correction is obtained. The fusion mode of the sensor is loose coupling and tight coupling, and the fusion mode used in the embodiment of the invention is tight coupling. The tight coupling is the observation equation and the motion equation for motion state estimation constructed by the lidar scanner and the IMU together. The principle is that the data collected by the laser radar scanner and the IMU are put into the same state vector, the rotation matrix, the translation matrix, the speed and the IMU zero offset are jointly optimized, and the state vector is only required to be optimized. And constructing error items comprising laser radar measurement residual errors and IMU measurement residual errors by utilizing the laser radar scanner and IMU data so as to achieve the purpose of simultaneously optimizing the pose, the speed and the deviation of the tightly coupled system. The tight coupling system has compact structure, high integration among the sensors, mutual constraint among the sensor models, more sufficient information utilization and higher positioning and mapping precision and efficiency. The key problem of the multi-sensor fusion operation is multi-sensor data fusion, wherein the data fusion mainly comprises two aspects of content, namely time synchronization and coordinate system alignment of multi-sensor data.
For time synchronization, the embodiment of the invention adopts a PTP time soft synchronization module, which specifically comprises the following steps:
The lidar scanner used in the embodiments of the present invention is a solid-state lidar, so PTPv2 may be used for time synchronization between the lidar scanner and other devices without GPS and PPS hardware signals. The method requires minimal external hardware support and only needs one master clock device in the whole network. In use, the clock of the solid-state laser radar scanner is synchronized to the host computer; the synchronization process follows the IEEE 1588 v2.0 PTP protocol, and using the delay request-response mechanism of the protocol, the Livox Mid-40 acts as the slave end and performs PTP time synchronization with the master port (the clock device of the host computer).
The PTP synchronization principle is that periodic time information is sent from a master port, the slave port receives the information with a time stamp, and local time is adjusted by calculating time delay and time difference between sending and receiving of the information, so that time between the slave device and the master device is kept consistent, and the schematic diagram is shown in fig. 2.
The transmission path delay can be expressed as:
$Delay = \dfrac{(T_2 - T_1) + (T_4 - T_3)}{2}$
and the time offset can be expressed as:
$Offset = \dfrac{(T_2 - T_1) - (T_4 - T_3)}{2}$
The synchronization flow is as follows:
(1) The master port sends a Sync synchronous message to the slave port, and records time T1;
(2) After receiving the Sync message from the port, recording the receiving time T2;
(3) The master port sends a Sync message, and then sends a Follow_Up following message with a T1 value to the slave port;
(4) The slave port sends delay_req Delay request message to the master port, and the sending time T3 is recorded;
(5) After receiving the delay_req message, the main port records the receiving time T4;
(6) The master port sends a delay_resp Delay request response message carrying T4 to the slave port.
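Based on the four timestamps exchanged in the flow above, a minimal sketch of the delay and offset calculation might look as follows.

```python
# Delay and offset from the four PTP timestamps (T1: Sync sent, T2: Sync received,
# T3: Delay_Req sent, T4: Delay_Req received).
def ptp_delay_offset(t1: float, t2: float, t3: float, t4: float):
    delay = ((t2 - t1) + (t4 - t3)) / 2.0   # one-way transmission path delay
    offset = ((t2 - t1) - (t4 - t3)) / 2.0  # slave clock offset relative to the master
    return delay, offset

# The slave then adjusts its local time by the offset to stay synchronized with the master.
```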
S110, calibrating the laser radar scanning system, specifically:
S111, evaluating noise characteristics and bias stability of the inertial measurement unit by using an Allan variance analysis method, and performing internal calibration on the inertial measurement unit.
The error of the IMU is mainly Gaussian white noise and bias (bias random walk), and the aim of calibrating the internal parameters of the IMU is to improve the accuracy of positioning, navigation and attitude estimation.
The Allan variance is a time-domain analysis method that can separate and identify noise of different sources from time-series data [59]. The Allan variance $\sigma^2(\tau)$ is a function of the observation time interval $\tau$:
$\sigma^2(\tau) = \dfrac{1}{2(N-1)} \sum_{i=1}^{N-1} \left( x_{i+1} - x_i \right)^2$,
where N is the number of data points, $x_i$ is the i-th data point, and $\tau$ is the sampling time interval.
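A minimal numpy sketch of the Allan variance computation, assuming x already holds the values averaged over the interval τ; sweeping τ and examining σ²(τ) is what separates the different noise sources.

```python
# Allan variance of a sequence of data points averaged over the observation interval tau.
import numpy as np

def allan_variance(x: np.ndarray) -> float:
    # x: (N,) data points for one observation interval tau
    diffs = np.diff(x)                # x_{i+1} - x_i
    return 0.5 * np.mean(diffs ** 2)  # 1 / (2(N-1)) * sum of squared differences
```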
S112, calibrating external parameters of the laser radar scanner and the inertial measurement unit by adopting a lidar_align calibration method, and acquiring a conversion matrix between a coordinate system of the laser radar scanner and a coordinate system of the inertial measurement unit so as to convert point cloud data under the coordinate system of the laser radar scanner into the coordinate system of the inertial measurement unit.
Coordinate system alignment means converting the point cloud data into the coordinate system of the IMU sensor to facilitate subsequent data processing. Therefore, the relative positional relationship between the two sensor coordinate systems must be obtained, which requires external parameter calibration of the laser radar and the IMU to obtain the transformation matrix between the two coordinate systems. The essence of external parameter calibration is a data optimization problem: with which extrinsic values can the point cloud data acquired by the laser radar scanner be converted into the IMU coordinate system such that the point clouds of consecutive frames overlap to the greatest extent. The calibration method adopted in the embodiment of the invention is lidar_align. Its core principle is to randomly sample the original point cloud (most of the points extracted by default are repeated points), find the closest point in the next frame of point cloud with a k-d tree and compute the distance; this produces a k-nearest-neighbour error, which increases when the transformation matrix is not optimal. The nonlinear optimization library nlopt is used to minimize this error so that the k-nearest-neighbour error approaches 0, and the rotation matrix and translation matrix at that point are solved.
The specific calibration flow is shown in fig. 3, wherein, the calibration data should be noted to contain a large amount of rotation and translation during the recording process, and the data amount is controlled within 2G, so that the calibration result has higher precision.
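For illustration, a minimal sketch of the k-nearest-neighbour error that a lidar_align-style calibration minimizes is given below, assuming scipy is available; the actual lidar_align implementation and its nlopt optimization loop are more involved.

```python
# Nearest-neighbour error for a candidate extrinsic (R, t): transform one frame, look up
# the closest points of the next frame with a k-d tree, and sum the distances. The error
# shrinks as the candidate extrinsics approach the true ones.
import numpy as np
from scipy.spatial import cKDTree

def nn_error(frame_a: np.ndarray, frame_b: np.ndarray,
             R: np.ndarray, t: np.ndarray) -> float:
    # frame_a, frame_b: (N, 3) consecutive point clouds; R: (3, 3) rotation; t: (3,) translation
    transformed = frame_a @ R.T + t
    dists, _ = cKDTree(frame_b).query(transformed, k=1)
    return float(np.sum(dists))
```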
S120, after calibrating the laser radar scanning system, the method further comprises the step of removing motion distortion of scene point cloud data acquired by the laser radar scanner, wherein the specific process is as follows:
Motion distortion occurring in scene point cloud data includes: a. if the scanning field of view of the laser radar scanner always falls on the same straight wall surface, then when the scanner moves in the direction perpendicular to the wall, the distance from the scanner to the same point of the wall keeps decreasing, so the point cloud data bulges along the movement direction of the scanner, as shown in fig. 4; b. when the movement of the laser radar scanner has an angular offset, the movement of the centers of the scan-line petals stretches the petal point clouds into an irregular shape, i.e. center distortion, as shown in fig. 5; c. the field angle of the laser radar scanner is fixed and cannot cover a full circle, so the coverage of the field of view scales with the change of distance, and the size of the scan-line petals scales with the movement of the scanner, i.e. edge distortion, as shown in fig. 6;
For the motion distortions of a and b, the inertial measurement unit is used to perform motion compensation on the intra-frame point cloud; the acquisition frequency of the IMU is 100 Hz, so the 10 Hz laser radar point cloud can be interpolated. The specific calculation is:
$t_i = \dfrac{i}{S_{num}}\,(t_{k+1} - t_k)$, $T_{k,i} = \dfrac{t_i}{t_{k+1} - t_k}\,T_{k,k+1}$, $P_{correct} = T_{k,i} \times P_i$,
wherein $t_k$ and $t_{k+1}$ are the start time and end time of a frame of point cloud recorded by the laser radar scanner, $t_i$ is the time of the i-th point $P_i$ relative to the start time in the k-th frame point cloud set S, $S_{num}$ is the total number of points in a frame point cloud set, $T_{k,k+1}$ is the frame pose transformation, $T_{k,i}$ is the pose transformation of point $P_i$ relative to the start time, and $P_{correct}$ is the point coordinate after motion distortion correction, calculated using the pose transformation matrix corresponding to point $P_i$. It should be noted that, in the calculation of $t_i$, it is assumed that the solid-state laser radar scanner emits laser pulses at uniform time intervals and the internal prism rotates at a uniform speed; in the calculation of $T_{k,i}$, it is assumed that the laser radar scanner remains in uniform motion during one frame, so the pose change within one frame is also uniform.
By the motion distortion removal method, point clouds in the same frame can be corrected to the starting moment of the current frame approximately, and the influence of the motion distortion on the point cloud characteristics is eliminated.
For the motion distortion of c, the scanning points with unreliable edges are filtered out by limiting the size of the angle of view.
It should be noted that, in the embodiment of the invention, the scanning frequency of the solid-state laser radar scanner is 10Hz, and the scanning time of one frame of point cloud is 0.1s. Ideally, the same frame of point clouds should be acquired at the same time, but in practice, a frame of point clouds is acquired uniformly within 0.1s. While the laser radar scanner is in motion during data acquisition, when the last point of the point cloud of the current frame is acquired, the scanning positions of the laser radar scanner and the first point of the current frame are different. However, the time stamps of all scanning points in a frame are the time when the point cloud starts to be acquired, and the point cloud of the frame is considered to be acquired at the starting time. The motion of the laser radar scanner itself affects the scanning point cloud, and motion distortion of the point cloud is generated. The more intense the lidar motion during a frame time, the more severe the resulting motion distortion.
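A minimal numpy/scipy sketch of the per-point motion compensation described above, under the stated assumptions of uniform emission intervals and uniform motion within a frame; interpolating the pose with a linear translation and a spherical (slerp) rotation is one common realization and not necessarily the exact one used by the invention.

```python
# Per-point de-skew: interpolate the frame pose transformation T_{k,k+1} for each point
# according to its fraction of the frame time, then apply it to obtain P_correct.
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def deskew(points: np.ndarray, R_frame: np.ndarray, t_frame: np.ndarray) -> np.ndarray:
    # points: (S_num, 3) one frame of points ordered by emission time
    # R_frame, t_frame: rotation (3x3) and translation (3,) accumulated over the frame
    s_num = points.shape[0]
    fractions = np.arange(s_num) / s_num                         # t_i / (t_{k+1} - t_k)
    slerp = Slerp([0.0, 1.0], Rotation.from_matrix([np.eye(3), R_frame]))
    rots = slerp(fractions)                                      # per-point interpolated rotation
    corrected = rots.apply(points) + fractions[:, None] * t_frame
    return corrected                                             # P_correct for each point
```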
S130, performing motion distortion removal on scene point cloud data, and then performing reconstruction point cloud storage on the scene point cloud data, wherein the specific process is as follows:
The laser radar scanner scans a frame of point cloud data, wherein the frame of point cloud data comprises 20 scanning line petals in a graph, each scanning line petal is continuous at the center, and each scanning line petal sequentially stores the whole petals into one row of a matrix by taking the point closest to the scanning center as a starting point;
Storing spatially adjacent scan line petals into adjacent rows of the matrix, specifically: the 20 scanning line petals in one frame of point cloud data are obtained by rotating for 4 times by taking 5 scanning line petals as a period, and in one scanning period, the starting point numbers recorded by the 5 scanning line petals clockwise are sequentially 0, 1000, 2000, 500 and 1500; the difference of the start point numbers of the petals of the scanning lines in 4 scanning periods is 2500, and the start point numbers of the petals of the adjacent scanning lines indexed clockwise by one frame of scanning point cloud are calculated through the point number relation; and sequentially indexing corresponding scanning line petals through the obtained starting point numbers, and sequentially storing the petals according to rows to obtain a matrix with a continuous spatial relationship among the rows.
It should be noted that, a frame of point cloud after removing the motion distortion is generally stored in a matrix according to a scanning sequence, so that the algorithm can quickly index and call the point cloud row by row or column by column. The solid-state laser radar scanning point cloud used in the embodiment of the invention can be stored in the matrix, but the scanning point cloud does not have obvious row and column characteristics in space due to the non-repeated scanning mode of the solid-state laser radar, so that the row and column characteristics of the storage matrix have no physical significance, and the index and relation construction of the point cloud cannot be carried out in space by using the row and column. Therefore, the point cloud of the solid-state lidar needs to be reconstructed to establish the spatial relationship between the scanning lines.
The solid-state laser radar scanning point cloud is composed of 20 petals in a figure; each petal is continuous at the center, and the scanning point cloud of a single petal can be stored in a row of the matrix. Each scan-line petal takes the point closest to the scanning center as its starting point, and the whole petal is stored sequentially into one row of the matrix. If there is a point on a scan-line petal with no return value, which is replaced by the origin coordinates, the null point is removed to ensure that the scan lines in the same row of the matrix are spatially continuous. Since petals with consecutive point numbers on a scanning line are not spatially adjacent, a matrix stored in order of point numbers has no spatial relationship between adjacent rows, and spatially adjacent petals need to be stored in adjacent rows of the matrix.
After the point cloud spatial relationship matrix is established according to step S130, the matrix row-column relationship can be used to quickly index the same feature in adjacent rows and columns during subsequent feature point extraction and feature matching, without traversing all feature points, thereby reducing the amount of calculation and speeding up the algorithm. The method of removing null points and storing the petals in spatially adjacent order may make the point numbers of adjacent rows and columns of the matrix discontinuous. Although a matrix stored in this way loses the original continuity of the point numbers, it provides a more efficient and faster spatial index than indexing by point number and reconstructs the spatial relationship between points.
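A minimal numpy sketch of the reconstructed point cloud storage described above; the exact clockwise interleaving of petals from different scanning periods is not fully specified in the text, so the index ordering below is an assumption, and a full frame of 20 × 500 points is assumed.

```python
# Reorganize one frame so that each scan-line petal occupies one matrix row and
# spatially adjacent petals occupy adjacent rows.
import numpy as np

def petal_start_indices() -> list:
    base = [0, 1000, 2000, 500, 1500]   # clockwise start numbers within one scanning period
    # assumed interleaving: consecutive clockwise petals come from consecutive periods
    # (each period is offset by 2500 points)
    return [b + 2500 * period for b in base for period in range(4)]

def reorder_frame(points: np.ndarray, points_per_petal: int = 500) -> np.ndarray:
    # points: (10000, 3) frame stored in acquisition (point-number) order
    rows = [points[s:s + points_per_petal] for s in petal_start_indices()]
    return np.stack(rows)               # (20, points_per_petal, 3) spatially ordered rows
```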
S140, after the scene point cloud data are subjected to the reconstruction point cloud storage, the method further comprises the following steps: the formula for extracting the angular points and plane point characteristics of each frame of point cloud data is modified, and specifically comprises the following steps:
A. Calculating the vector differences from the point whose curvature is being computed and from each of its five preceding and five following points to the scanning origin, summing these vector differences, and taking the length of the resulting sum as the curvature value of the point; the curvature value calculation formula is:
$c_i = \left\| \sum_{j \in S} \left( P^{L}_{(k,i)} - P^{L}_{(k,i+j)} \right) \right\|$,
wherein $c_i$ is the curvature value of the scan point $P_i$, S is the set of integers from -5 to 5 excluding 0, and $P^{L}_{(k,i)}$ and $P^{L}_{(k,i+j)}$ are the vectors of the i-th and (i+j)-th points in the k-th frame point cloud data under the laser radar scanner coordinate system L;
B. When the curvature value of the scanning point is larger than 1, judging the scanning point as an angular point; when the curvature value of the scanning point is smaller than 0.1, judging the scanning point as a plane point;
C. Extracting the corner points and plane points in each frame of point cloud data according to steps A and B.
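A minimal numpy sketch of steps A to C, applied to one scan-line row of the reorganized matrix; the thresholds 1 and 0.1 follow the text above.

```python
# Curvature-based feature extraction: sum the vector differences to the five preceding
# and five following points, use the length of the sum as the curvature, and classify
# points as corner points (> 1) or plane points (< 0.1).
import numpy as np

def extract_features(scan_line: np.ndarray):
    # scan_line: (M, 3) points of one scan-line row, ordered along the line
    corners, planes = [], []
    offsets = [j for j in range(-5, 6) if j != 0]   # the set S
    for i in range(5, scan_line.shape[0] - 5):
        diff_sum = np.sum([scan_line[i] - scan_line[i + j] for j in offsets], axis=0)
        c = np.linalg.norm(diff_sum)                # curvature value c_i
        if c > 1.0:
            corners.append(i)
        elif c < 0.1:
            planes.append(i)
    return corners, planes
```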
It should be noted that the embodiment of the invention provides a LEGO-LoopLOAM algorithm, in which the corner points and plane points of each frame in the point cloud are extracted by using the curvature of each point as the judging condition. The method calculates the vector differences from the point whose curvature is computed and from its five preceding and five following points to the scanning origin, sums them, and uses the length of the sum to reflect the curvature of the point; corner points are defined as points with curvature values larger than 1 and plane points as points with curvature values smaller than 0.1. Let the spatial coordinates of the curvature point be $P_i(x_i, y_i, z_i)$; the curvature calculation formula is as follows:
$c_i = \left\| \sum_{j \in S} \left( P^{L}_{(k,i)} - P^{L}_{(k,i+j)} \right) \right\|$,
where $c_i$ is the curvature of point $P_i$, S is the set of integers from -5 to 5 that does not contain 0, and $P^{L}_{(k,i)}$ and $P^{L}_{(k,i+j)}$ are the vectors of the i-th and (i+j)-th points in the k-th frame point cloud data under the laser radar scanner coordinate system L.
The method for replacing the point curvature by the calculated value skillfully reduces the calculated amount of the actual curvature calculation method, has higher efficiency and has good application effect. It requires that the points on the scan line have no height difference and can be projected on a plane under the radar coordinate system without loss. The non-repetitive petal-shaped scanning line of the solid-state laser radar scanner does not meet the requirement and is not suitable for the method, so that the characteristic extraction formula is corrected according to the scanning characteristics of the solid-state laser radar scanner.
The single-scan-line point cloud of a mechanical laser radar scanner is arranged in a straight line along the XOY plane, and the curvature calculated according to the original formula is the curvature of the point on the XOY plane. However, the single-frame scanning point cloud of the solid-state laser radar scanner is petal-shaped: the arrangement of the points on a plane is not flat and itself has a certain curvature, as shown in fig. 7. The inherent radian of the scan line interferes with the calculation result of the feature point extraction formula, resulting in erroneous judgment of corner feature points and planar feature points.
Because the scanning line of the solid-state laser radar scanner has a radian, each point on the scanning line has a corresponding curvature even on a plane, so the curvature values calculated for the solid-state laser radar scanning point cloud with the above formula change discontinuously and are inconsistent with the actual curvature change. In actual experiments, planar feature points are sometimes misjudged as corner feature points, so the influence of the inherent curvature of the scan points needs to be eliminated.
First, the difference between the curvature of each point on the solid-state scanner scan line and the curvature of a flat scan-line point needs to be determined under standard conditions. This study defines the standard condition as the scanning head being 10 meters from a vertical plane with the scanning direction perpendicular to that plane. The point curvatures calculated for the scanning point cloud under this condition are taken as the standard curvature difference $c_{stdif}$ between the solid-state laser radar scan line and a flat scan line. Under the standard condition, the petals of each scanning line in the scanning point cloud are essentially similar, each scanning petal has 500 points, and the two sides of the line connecting the scanning center point and the farthest end point of a single petal are basically symmetrical. Therefore, only the standard curvature differences of the 250 points of a half petal need to be calculated, and the standard curvature differences of all points on the standard-condition scan line can be replaced by the corresponding points. Each point from the point closest to the scanning center on a single scan line to the farthest point of the petal is numbered, corresponding to numbers 0 through 249. The correspondence between the point number and the standard curvature difference under the standard condition is shown in fig. 8. The standard curvature difference as a function of the point number can be approximated by the following expression:
$c_{stdif} = 5.28 \times 10^{-7} \times \mathrm{Index}^2 + 0.002$
The standard curvature difference of the scanning point is not a constant value, and the size of the standard curvature difference is mainly influenced by two factors of an X-axis included angle of a connecting line of the scanning point and the scanning origin and a distance between the scanning point and the scanning origin. Therefore, the dynamic corresponding relation between the angle and the distance is required to be constructed according to the standard curvature difference so as to compensate the dynamic curvature difference of the scanner at different distances and different postures.
The inherent curvature of a point on a single petal of a single-frame point cloud can change along with the change of the angle theta of an X-axis connecting line of a scanning point and a scanning origin, so that the corresponding relation between the angle and a standard curvature difference needs to be determined. And calculating the corresponding relation between the included angle and the point number, so as to deduce the corresponding relation of the standard curvature difference along with the change of the included angle. The data recorded by the laser radar point cloud only has three-dimensional coordinates and reflection intensity of space points, so that the available known quantity calculated in the subsequent expression is limited to be the three-dimensional coordinates. The calculation formula of the included angle theta of any point on the scanning line comprises the following steps:
The angle corresponding to each point number is computed under the standard condition, and the fitted linear expression is:
θ = 1.9×10⁻⁴ × Index + 0.54
Combining the two fitted expressions yields the relation between the standard curvature difference and the angle between the X axis and the line connecting the scan point to the scan origin, which is then converted into a relation between the standard curvature difference and the spatial coordinates of the point, given by the following formula:
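The converted expression is not reproduced here. A sketch of how the two fits above combine, assuming direct substitution of the linear angle fit into the quadratic curvature fit, is:

```latex
% Eliminate Index from the two fitted expressions (assumed direct substitution)
\mathrm{Index} = \frac{\theta - 0.54}{1.9\times 10^{-4}}, \qquad
c_{\mathrm{stdif}}(\theta) = 5.28\times 10^{-7}\left(\frac{\theta - 0.54}{1.9\times 10^{-4}}\right)^{2} + 0.002
```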
The curvature formula adopted by the invention essentially reflects curvature change through differences of local point-to-point distances. As the scanning distance grows, the difference between the distances from neighboring points to the scan center also grows, so the computed curvature value increases accordingly. The same feature therefore yields different curvatures at different scanning distances, which degrades feature matching and loop-closure detection accuracy. The influence of distance on curvature, which manifests mainly as curvature values that grow with scanning distance, must therefore be removed as far as possible.
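For context only, a commonly used local-distance-difference curvature (a LOAM-style measure, which may differ in detail from the formula actually used by the invention) can be sketched as:

```python
import numpy as np

def local_curvature(scan: np.ndarray, i: int, half_window: int = 5) -> float:
    """LOAM-style curvature proxy at point i of one scan line: norm of the sum
    of coordinate differences to neighbours, normalised by the point range.
    Illustrative only; assumes i has half_window neighbours on each side."""
    neighbours = np.r_[scan[i - half_window:i], scan[i + 1:i + 1 + half_window]]
    diff = (scan[i] - neighbours).sum(axis=0)
    return float(np.linalg.norm(diff) / (len(neighbours) * np.linalg.norm(scan[i])))
```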
To obtain the distance-correction equation for the curvature change, point clouds of the same wall surface were collected every 2 meters over a distance range of 10-60 meters, for a total of 25 acquisitions. For each acquisition, the curvatures of the points with the same point numbers on a single petal of the scan lines at the different distances were computed, and the difference between the curvature of each corresponding point at distance x_i and the curvature of the corresponding point under the standard condition was taken as the distance correction value. The variation of the distance correction value is shown in fig. 9, and the corresponding fitted expression is:
c_dis = 0.016 × x_i + 0.006
Combining the above formulas, the dynamic correction value for the curvature is obtained as follows:
The corrected curvature formula of the invention is then:
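The corrected formula is not reproduced in this text. The sketch below shows only one plausible way the pieces could fit together, assuming the dynamic correction is the sum of the angle-dependent standard curvature difference and the distance term, and that it is subtracted from the raw curvature; all function names are illustrative:

```python
def dynamic_correction(theta: float, distance: float) -> float:
    """Assumed combination of the two fitted terms: the angle-dependent
    standard curvature difference plus the distance correction value."""
    index = (theta - 0.54) / 1.9e-4            # invert the linear angle fit
    c_stdif = 5.28e-7 * index ** 2 + 0.002     # standard curvature difference
    c_dis = 0.016 * distance + 0.006           # distance correction value
    return c_stdif + c_dis

def corrected_curvature(raw_curvature: float, theta: float, distance: float) -> float:
    """Raw curvature minus the dynamic correction (assumed form)."""
    return raw_curvature - dynamic_correction(theta, distance)
```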
S200, the obtained scene point cloud data are processed by the adaptive radius sampling module to extract a sufficient amount of valid point cloud data.
The adaptive radius sampling module processes the obtained scene point cloud data as follows. First, the Euclidean distances between the sampling center point and all points in the scene point cloud are computed; points whose distance exceeds a threshold R are marked as Q, and the remaining points keep their original values. Next, the points within the spherical sampling radius are sorted in ascending order of distance, and the nearest w points are extracted as sampling points. If the sampled points contain Q values, fewer than w valid points lie within the spherical sampling radius, and the sampling radius is gradually enlarged until at least w valid points are found. This guarantees enough valid points for feature learning and improves the humanoid robot model's understanding of complex scenes and its segmentation accuracy. As shown in fig. 10 and fig. 11, fig. 10 shows the sampling scheme of PointNet++ and fig. 11 shows the adaptive radius sampling method of the invention, where the green points represent point cloud data within the initial radius, the yellow points represent point cloud data duplicated by selecting the maximum value, and the red and blue points represent point cloud features learned from the initial radius.
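A minimal sketch of this sampling step, assuming 3D Euclidean distance and a simple multiplicative radius-growth rule; the growth factor is an illustrative choice not specified in the text:

```python
import numpy as np

def adaptive_radius_sample(points: np.ndarray, center: np.ndarray,
                           radius: float, w: int, grow: float = 1.2) -> np.ndarray:
    """Return the w nearest valid neighbours of `center`, enlarging the
    sampling radius until at least w points fall inside it."""
    if len(points) < w:
        raise ValueError("need at least w points in the scene point cloud")
    dist = np.linalg.norm(points - center, axis=1)
    while True:
        inside = np.where(dist <= radius)[0]          # valid points; the rest play the role of Q
        if len(inside) >= w:
            order = inside[np.argsort(dist[inside])]  # ascending distance
            return points[order[:w]]
        radius *= grow                                # gradually enlarge the radius
```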
In addition, the invention performs local sampling on the point cloud by defining a range and selecting the points within that range for local feature extraction, so as to capture the fine geometric structure of a small neighborhood. Each local point set is processed by a multi-layer perceptron (MLP) followed by local max pooling to obtain a local global feature. These local features are then regrouped and processed at a larger scale to produce feature representations over a wider extent, and the process is repeated until features for the entire point cloud have been obtained, strengthening the model's ability to understand and represent the overall point cloud structure.
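A sketch of one group-and-pool stage of this hierarchy, assuming a PointNet++-style set abstraction; the layer widths and the use of PyTorch are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LocalFeatureBlock(nn.Module):
    """One group-and-pool stage: a shared MLP on each local neighbourhood
    followed by max pooling, producing one feature vector per group."""
    def __init__(self, in_dim: int = 3, out_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim), nn.ReLU())

    def forward(self, groups: torch.Tensor) -> torch.Tensor:
        # groups: (num_groups, points_per_group, in_dim)
        feats = self.mlp(groups)            # per-point features
        return feats.max(dim=1).values      # local max pooling -> (num_groups, out_dim)
```

Repeating such a block on the regrouped outputs yields the progressively larger-scale features described above.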
S300, RPV-Net-based point cloud semantic feature extraction is performed on the extracted valid point cloud data, and the 3D bounding boxes and semantic labels of the point cloud data forming the building components in the scene are obtained.
The RPV-Net-based point cloud semantic feature extraction on the extracted valid point cloud data proceeds as follows. The input is a point cloud P ∈ R^{N×3}, i.e., point cloud data with three original dimensions. The points are randomly sampled and processed by a semantic segmentation backbone network to obtain point features carrying depth information; according to these features the points are clustered into M point subsets, and a vote is obtained for each subset. A pillar-based BEV method is used to estimate and classify the bounding-box shape, and the seeds are redefined, i.e., the seeds are obtained through the semantic segmentation network. Using the pillar (voxel-column) concept, boundary voxel features of the whole pillar are extracted and the obtained pillars are voted on; a Vote & Sampling & Grouping operation is then applied to the resulting point cloud features, with each seed independently generating a vote through the voting module. The votes are grouped into h groups and processed by the proposal module to generate the final proposal, which outputs the 3D bounding boxes and semantic labels. As shown in fig. 12, given an input point cloud with XYZ coordinates, votes are generated from the point cloud and grouped into M clusters, and f grids are divided for each of the M vote cluster centers. Representative points in the partition grid indicate possible object regions. The seed points around each representative point are then revisited, and the surrounding seed point features are aggregated onto the representative point. The proposal refinement and classification module fuses and processes the aggregated vote features and the revisited seed features to generate refined representative points and object semantic categories, which can readily be converted into three-dimensional object bounding boxes.
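Purely to make the data flow concrete, a highly simplified sketch of the vote-and-group step is given below; the offset-regression voting, the random choice of cluster centres, and the tensor shapes are illustrative assumptions and do not reproduce the actual RPV-Net backbone, pillar-based BEV head, or proposal module:

```python
import torch
import torch.nn as nn

class VoteAndGroup(nn.Module):
    """Seeds regress offsets toward object centres (votes); votes are then
    grouped by nearest cluster centre. Simplified illustration only."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.offset_head = nn.Linear(feat_dim, 3)   # one vote offset per seed

    def forward(self, seed_xyz: torch.Tensor, seed_feat: torch.Tensor, m: int):
        votes = seed_xyz + self.offset_head(seed_feat)        # (N, 3) votes
        centres = votes[torch.randperm(votes.shape[0])[:m]]   # crude choice of M cluster centres
        assign = torch.cdist(votes, centres).argmin(dim=1)    # nearest-centre grouping
        return votes, assign
```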
S400, the point cloud data after point cloud semantic feature extraction are processed by the channel attention module: average pooling is used to obtain the overall features of the point cloud, max pooling is used to extract discriminative individual features, and the overall and individual features are aggregated to obtain the global features of the point cloud.
The specific processing procedure is as follows:
After obtaining the high-dimensional features of the three-dimensional point cloud data, the method focuses on the influence of each feature channel on the point cloud information as a whole, implementing a channel attention mechanism for point cloud data. The mechanism highlights the information of key channels by analyzing the information association on each channel of the point cloud features. First, information reflecting the global characteristics of the point cloud is extracted; the attention weight of each channel is then learned from this information, yielding the point cloud features processed by the channel attention module. The channel attention module is shown in fig. 13.
The point cloud data features can be denoted F ∈ R^{B×N×1×C}, where B, N and C are the batch size, the number of points, and the number of feature channels, respectively. The invention applies pooling along the feature-channel dimension C to generate a point cloud channel attention map Mc ∈ R^{B×N×1×C}. The output feature Fc is obtained by multiplying the point cloud features F with Mc, which can be written as: Fc = F × Mc(F).
To generate the point cloud channel attention map Mc, the method applies both average pooling and max pooling along the point-number dimension N. Average pooling captures the global features of the point cloud, while max pooling helps extract more discriminative features. The two pooling operations aggregate the input point cloud features F into two different global point cloud features, F_avg and F_max. A multi-layer perceptron (MLP) with shared parameters is used to process the point cloud features: the channel dimension is first reduced by a coefficient k and then restored to the original size C, and a Sigmoid nonlinear activation yields the required channel attention weights. The point cloud channel attention map can be expressed as: Mc = s{MLP(Avg(F)) + MLP(Max(F))}, where s denotes the Sigmoid function.
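A compact sketch of this channel attention module, assuming the features are reshaped to (B, N, C), the pooling is taken over the point dimension N, and the module is implemented in PyTorch; the reduction coefficient k = 4 and the layer layout are illustrative:

```python
import torch
import torch.nn as nn

class PointChannelAttention(nn.Module):
    """Average- and max-pool point features over the point dimension,
    pass both through a shared MLP, add, and squash with a Sigmoid."""
    def __init__(self, channels: int, k: int = 4):
        super().__init__()
        self.shared_mlp = nn.Sequential(nn.Linear(channels, channels // k), nn.ReLU(),
                                        nn.Linear(channels // k, channels))

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # f: (B, N, C) point features
        f_avg = f.mean(dim=1)                    # global feature via average pooling
        f_max = f.max(dim=1).values              # discriminative feature via max pooling
        mc = torch.sigmoid(self.shared_mlp(f_avg) + self.shared_mlp(f_max))  # (B, C) weights
        return f * mc.unsqueeze(1)               # re-weight each feature channel
```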
S500, a multimodal hybrid model consisting of a scene-understanding BERT pre-training model and an encoder-decoder model is constructed to process the acquired global features of the point cloud. The scene-understanding BERT pre-training model increases the diversity of the training data, while the encoder-decoder model interprets three-dimensional point cloud objects as three-dimensional ground objects together with their parameter information, improving the humanoid robot's understanding of the scene.
S510, increasing the diversity of the training data through the scene-understanding BERT pre-training model, specifically:
S511, the scene-understanding BERT pre-training model helps the generative adversarial network (GAN) model produce an expanded data set, thereby increasing the diversity of the training data;
S512, the standard generative adversarial network GAN model comprises a generator G and a discriminator D;
S513, training samples are set for the scene point cloud data set by adjusting the parameter settings of the generative adversarial network GAN model;
The goal of the generator G is to produce fake data as close to the real data as possible, while the goal of the discriminator D is to distinguish the real data from the fake data produced by the generator G; a balance coefficient γ is added to adjust the weight between the real-data and fake-data terms.
It should be noted that the scene-understanding BERT pre-training model can denoise the extracted global point cloud features, enlarge the variety of the data set, increase the diversity of the training data, and strengthen the sample set. The output of the scene-understanding BERT pre-training model serves as the data input of the generative adversarial network GAN, which in its standard form consists mainly of a generator G and a discriminator D. The scene-understanding BERT pre-training model helps the GAN model learn a wider data distribution, improving its generalization ability.
Here V(D, G) is the overall optimization objective, E denotes expectation, x is real data, z is noise sampled from a prior distribution p_z, G(z) is the data generated from the noise by the generator, and γ is a dynamically adjusted parameter that is tuned during training according to the behavior of the generator and the discriminator so as to maintain the balance between the two. If the value of log(1 − D(G(z))) is low, the generator's loss can be strengthened by increasing γ; the discriminator's accuracy is evaluated after each epoch and γ is adjusted accordingly.
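The objective V(D, G) itself is not reproduced above. Assuming that γ simply re-weights the fake-data term of the standard GAN objective, one plausible form is:

```latex
\min_{G}\max_{D} V(D,G) \;=\;
\mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
\;+\;\gamma\,\mathbb{E}_{z\sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```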
S520, the processing procedure of the constructed encoder-decoder model is specifically as follows:
S521, the point cloud encoder takes a point cloud P ∈ R^{n×d} as input, where n is the number of input points and d is the feature dimension of each input point; the output of the point cloud encoder is a sequence of point features X = (x_1, x_2, …, x_m) ∈ R^{m×c}, where m is the number of output point features and c is the output point feature dimension;
S522, a multi-layer perceptron (MLP) maps the point features X to point tokens Y = (y_1, y_2, …, y_m) ∈ R^{m×c′}, where c′ is the dimension of the point tokens;
S523, 3D-MELL is a decoder-only interface that accepts a token sequence composed of text tokens and point tokens, i.e., a mixed token sequence denoted Z = (z_1, z_2, …, z_k) ∈ R^{k×c′}, where k is the total number of tokens. Using a self-attention mechanism, 3D-MELL can understand the context across the different token types, enabling it to generate responses based on both text and point cloud input. The output of 3D-MELL is a series of predicted tokens, with the prediction of the i-th token conditioned on all previous tokens Z_{<i} = (z_1, z_2, …, z_{i−1}).
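A minimal sketch of the decoder-only prediction loop, assuming greedy decoding over discrete token ids and a placeholder `decoder` model; 3D-MELL's actual interface, and the handling of continuous point tokens, are not specified here:

```python
import torch

@torch.no_grad()
def generate(decoder, token_ids: torch.Tensor, steps: int) -> torch.Tensor:
    """Greedy autoregressive decoding: the i-th prediction is conditioned
    on all previous tokens Z_<i of the mixed text/point sequence."""
    seq = token_ids                                   # (1, k) mixed token ids
    for _ in range(steps):
        logits = decoder(seq)                         # placeholder decoder-only model
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, next_id], dim=1)        # append and condition on it next step
    return seq
```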
Fig. 14 is a schematic diagram of the encoder-decoder model processing procedure; the point cloud encoder comprises a self-attention module, a feed-forward neural network module, and a geometric-information-aware adapter module.
The constructed encoder-decoder model can interpret the intelligently extracted three-dimensional point cloud objects as three-dimensional ground objects such as beams and columns, arches, tablets, pavements, flower beds, and trees, together with information such as their size, age, and material. The multimodal hybrid model can therefore intelligently recognize point cloud, image, and text information.
In summary, through the above embodiments the humanoid robot gains strong environmental perception and intelligent understanding capabilities, allowing it to work more effectively and autonomously in a variety of application scenarios. This will not only advance humanoid robot technology but also contribute to the construction of a future intelligent society.
The following is an example.
The lidar scanning system is mounted behind the eyes of the robot head. The lidar scanning system acquires three-dimensional information of the components to be assembled, producing high-precision three-dimensional scene point cloud data. The adaptive radius sampling module processes the obtained scene point cloud data, RPV-Net-based point cloud semantic feature extraction is performed on the extracted valid point cloud data, and the 3D bounding boxes and semantic labels of the point cloud data forming the building components in the scene are obtained, as shown in figs. 15 to 17. Fig. 15 is the 3D bounding box estimation of the overall point cloud acquired by the lidar system; fig. 16 is the 3D bounding box estimation of the acquired partial extraction; fig. 17 is the 3D bounding box estimation of the acquired accurate local extraction. Understanding and communication are then carried out in combination with a large language model.
The number of devices and the scale of processing described here are intended to simplify the description of the present invention. Applications, modifications, and variations of the present invention will be readily apparent to those skilled in the art.
Although embodiments of the present invention have been disclosed above, the invention is not limited to the details and embodiments shown and described; it is well suited to various fields of use that will be readily apparent to those skilled in the art, and accordingly the invention is not limited to the specific details and illustrations shown and described herein, without departing from the general concepts defined in the claims and their equivalents.