CN117058512A - Multi-mode data fusion method and system based on traffic big model - Google Patents
- Publication number
- CN117058512A (application CN202311083560.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- labeling
- mode
- traffic data
- traffic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/89—Lidar systems specially adapted for specific applications for mapping or imaging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/54—Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/44—Secrecy systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/08—Detecting or categorising vehicles
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a multi-mode data fusion method and system based on a traffic big model, and relates to the technical field of artificial intelligence. The method comprises the following steps: acquiring initial multi-mode traffic data; performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data; performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data; and performing scene integration and data encryption on the multi-mode labeled data to form multi-mode fused data. By sequentially applying matching and alignment, space-time multi-dimensional labeling, multi-target labeling, multi-type labeling, multi-task labeling, scene integration and encryption to data from several different modes, multi-mode fused data are obtained that can handle many kinds of complex traffic scenes, which solves the technical problem of low accuracy caused by reliance on a single data source.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a multi-mode data fusion method and system based on a traffic big model.
Background
In recent years, with urban development, the population has grown continuously, car ownership has kept rising, traffic flow has kept increasing, and traffic complexity has kept growing. With the development of artificial intelligence technology, current artificial intelligence works across multiple modes of data, and the general-purpose artificial intelligence obtained from massive multi-modal training data may also be called a big model.
Judging and warning of traffic congestion, iterative updating of vehicles within an area, and detection and early warning of abnormal traffic conditions are of great significance to the convenience and safety of travel. However, conventional methods depend on a single data source and cannot adapt to complex and varied traffic scenes, so the accuracy of traffic scene analysis is low.
Disclosure of Invention
The invention aims to solve the technical problem that conventional methods depend on a single data source and cannot adapt to complex and varied traffic scenes. To this end, the invention provides a multi-mode data fusion method and system based on a traffic big model.
The invention provides a multi-mode data fusion method based on a traffic big model, which comprises the following steps:
acquiring initial multi-mode traffic data;
performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data;
performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data;
and performing scene integration and data encryption on the multi-mode labeled data to form multi-mode fused data.
In one embodiment, the performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data includes:
taking the observation view of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference coordinate system;
and, taking the reference coordinate system as the alignment reference, applying angle rotation and position offset adjustment to the sensor coordinate system corresponding to traffic data of each mode in the initial multi-mode traffic data.
In one embodiment, the performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data further includes:
taking the acquisition frequency of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference frequency;
and, taking the reference frequency as the alignment reference, applying frame interpolation or frame decimation to traffic data of each mode in the initial multi-mode traffic data.
In one embodiment, the performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data includes:
performing multi-target and multi-type labeling in the spatial dimension on single-frame multi-mode traffic data;
and performing multi-target and multi-type labeling in the temporal dimension on multi-mode traffic data over continuous time periods.
In one embodiment, the performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data further includes:
performing target detection task labeling, multi-category target semantic segmentation task labeling, three-dimensional target detection task labeling, vehicle information recognition task labeling, vehicle target tracking and re-identification task labeling, and pedestrian pose recognition task labeling on the multi-mode traffic data.
The invention further provides a multi-mode data fusion system based on a traffic big model, which comprises:
a multi-mode data acquisition module, configured to acquire initial multi-mode traffic data;
a matching alignment module, configured to perform data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data;
a labeling module, configured to perform space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data;
and a data processing module, configured to perform scene integration and data encryption on the multi-mode labeled data to form multi-mode fused data.
In one embodiment, the matching alignment module includes:
a reference coordinate system selection module, configured to take the observation view of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference coordinate system;
and a coordinate system adjustment module, configured to apply angle rotation and position offset adjustment to the sensor coordinate system corresponding to traffic data of each mode in the initial multi-mode traffic data, taking the reference coordinate system as the alignment reference.
In one embodiment, the matching alignment module further includes:
a reference frequency selection module, configured to take the acquisition frequency of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference frequency;
and a frame rate adjustment module, configured to apply frame interpolation or frame decimation to traffic data of each mode in the initial multi-mode traffic data, taking the reference frequency as the alignment reference.
In one embodiment, the labeling module includes:
a single-frame labeling module, configured to perform multi-target and multi-type labeling in the spatial dimension on single-frame multi-mode traffic data;
and a continuous-frame labeling module, configured to perform multi-target and multi-type labeling in the temporal dimension on multi-mode traffic data over continuous time periods.
In one embodiment, the labeling module further includes:
a task labeling module, configured to perform target detection task labeling, multi-category target semantic segmentation task labeling, three-dimensional target detection task labeling, vehicle information recognition task labeling, vehicle target tracking and re-identification task labeling, and pedestrian pose recognition task labeling on the multi-mode traffic data.
In the multi-mode data fusion method and system based on a traffic big model, starting from the initial multi-mode traffic data, traffic data of different modes are matched and aligned so that they are fused under the same criteria, characterizing target features from different dimensions and comprehensively reflecting the features of each target in the traffic situation. Space-time multi-dimensional, multi-target, multi-type and multi-task data labeling of the multi-mode traffic data can cope with the variations of different traffic scenes and produces corresponding labeled data for different big-model tasks, so that the multi-mode labeled data can be fully used to train and optimize those tasks. Scene integration of the multi-mode labeled data allows targeted partitioning for different big-model tasks. Further, encrypting the multi-mode labeled data ensures its security. Therefore, in the multi-mode data fusion method based on a traffic big model, multi-mode fused data are obtained after data in several different modes are sequentially subjected to matching and alignment, space-time multi-dimensional labeling, multi-target labeling, multi-type labeling, multi-task labeling, scene integration and encryption; the method can handle many kinds of complex traffic scenes and avoids the low accuracy caused by reliance on a single data source.
With the multi-mode fused data obtained by the multi-mode data fusion method based on a traffic big model, algorithmic analysis can achieve scene understanding of traffic states in time and space. Further, from the multi-mode fused data, the results of traffic scene understanding can be extracted and mined to obtain information reflecting the traffic state, strengthening traffic construction and connectivity within and between administrative areas, improving the convenience of residents' travel, optimizing the traffic environment, and promoting economic development.
Drawings
Fig. 1 is a schematic flowchart of the steps of the multi-mode data fusion method based on a traffic big model.
Fig. 2 is a schematic structural diagram of the multi-mode data fusion system based on a traffic big model.
Detailed Description
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Referring to fig. 1, the invention provides a multi-mode data fusion method based on a traffic big model, which includes:
S10, acquiring initial multi-mode traffic data;
S20, performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data;
S30, performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data;
and S40, performing scene integration and data encryption on the multi-mode labeled data to form multi-mode fused data.
In this embodiment, the initial multi-mode traffic data includes radar-vision fusion mode traffic data, infrared mode traffic data, lidar mode traffic data, and the like. The radar-vision fusion mode traffic data is acquired by a radar-vision fusion unit, the infrared mode traffic data by an infrared camera, and the lidar mode traffic data by a lidar. The initial multi-mode traffic data comes from different traffic locations such as urban roads, intersections, roadside parking areas, expressways, and key areas such as the gates of schools and hospitals. It covers dynamic data on targets such as vehicles and pedestrians in different scenes, vehicle flow data over different time periods, records of abnormal conditions in different scenes such as vehicle congestion events, event statistics for roadside parking, expressway traffic flow data, and multi-dimensional data such as vehicle travel directions at intersections. Because the initial multi-mode traffic data contains both dynamic and static traffic data of different dimensions, it facilitates scene analysis and understanding of different scenes by the traffic big model.
The radar-vision fusion unit is a traffic sensor that integrates a monocular or multi-lens visible light camera with a millimeter-wave radar in a single device. Combining the visible light camera and the millimeter-wave radar makes it possible to handle more complex traffic scenes. The infrared camera uses thermal imaging to produce well-perceived night images; in night scenes, when the perception capability of the visible light camera declines, performing visual perception with the infrared images at the same time can improve perception accuracy. In daytime scenes, when some pedestrians or vehicles are occluded, the infrared camera can also provide important sensing information through its thermal sensing capability. The lidar can directly acquire sensing information such as the true position, distance, angle and speed of targets such as vehicles and pedestrians in a roadside parking scene. The sensing data obtained by the lidar can be used to generate three-dimensional bounding boxes of targets such as vehicles and pedestrians, and fusing these three-dimensional boxes with the two-dimensional image features obtained by the visible light and infrared cameras for analysis and judgment enables multi-dimensional perception, analysis and judgment of different traffic scenes that is more accurate, more stable and more resistant to interference. The geometric step underlying this fusion is sketched below.
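Fusing lidar-derived three-dimensional boxes with two-dimensional image features rests on projecting points from the lidar frame into the camera image. The patent does not describe this step concretely; the following is a minimal sketch of the standard pinhole projection, in which the intrinsic matrix, axis swap and translation are illustrative placeholders rather than parameters from the patent.

```python
# Minimal sketch: project a 3D point from the lidar frame into camera pixels.
# All calibration values here are illustrative placeholders.
import numpy as np

def project_to_image(p_lidar: np.ndarray, R: np.ndarray, t: np.ndarray, K: np.ndarray):
    """Pinhole projection of a lidar-frame point to (u, v) pixel coordinates."""
    p_cam = R @ p_lidar + t            # lidar frame -> camera frame (extrinsics)
    uvw = K @ p_cam                    # camera frame -> homogeneous pixel coordinates
    return uvw[:2] / uvw[2], p_cam[2]  # normalize by depth; also return the depth

K = np.array([[1000.0, 0.0, 640.0],    # assumed focal lengths and principal point
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[0.0, -1.0, 0.0],        # axis swap: lidar x-forward -> camera z-forward
              [0.0, 0.0, -1.0],
              [1.0, 0.0, 0.0]])
t = np.array([0.1, -0.2, 0.05])        # assumed lever arm between the two sensors
pixel, depth = project_to_image(np.array([10.0, 1.5, 1.0]), R, t, K)  # target ~10 m ahead
```

A pixel obtained this way can then be compared against detection boxes or masks in the visible-light or infrared image, which is what makes joint analysis of the 3D and 2D features possible.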
Devices such as the radar-vision fusion unit, the infrared camera and the lidar can be mounted together inside the same housing, or the three sensing devices can be mounted on a high-mounted pole at the same height. Data matching and alignment of the initial multi-mode traffic data covers both the temporal and the spatial dimension. Matching and alignment in the spatial dimension unifies traffic data of different modes under the same coordinate system, which removes the differences in range and distance among the traffic scene views captured by sensing devices of different modes and unifies the spatial positions of the multi-mode data. Matching and alignment in the temporal dimension unifies the data acquisition frequencies of the sensors of different modes; aligning the sensing devices of different modes in time ensures that the data acquired by the devices of the several modes are sensing records of the same scene at the same moment.
Space-time multi-dimensional labeling means labeling the data in both the temporal and the spatial dimension. Multi-target labeling can be understood as labeling targets of multiple categories. Multi-type labeling can be understood as applying different types of labels to the same target. Multi-task data labeling can be understood as performing different visual tasks with different visual algorithms, based on the characteristics of the data of each mode, when analyzing and understanding traffic scenes. To resolve the inconsistency of data labels across different traffic scenes in format, category, mode and other respects, and to let different scenes realize different functions, in this embodiment the multi-mode traffic data of different scenes is given unified space-time multi-dimensional, multi-target, multi-type and multi-task data labels to form multi-mode labeled data.
The multi-mode labeled data is partitioned by scene to meet the requirements of different traffic scenes when building the big model. Alternatively, the labels of the labeled data of different modes are partitioned, and when training the big model, data of different modes is drawn according to model performance, gradually iterating and optimizing the big model for each application scene. Encrypting the multi-mode labeled data ensures data security and prevents the private information of vehicles and pedestrians from being leaked. In one embodiment, the multi-mode labeled data is encrypted according to a federated learning method. Federated learning is a privacy-preserving distributed machine learning method in which information is exchanged between parties in encrypted form, so that the data at each site is protected; the trained federated model can reside with each party of the federated learning system or be shared among multiple parties.
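The patent leaves the encryption scheme itself unspecified beyond the federated-learning setting. As a minimal illustration of encrypting labeled records at rest before they are exchanged, the sketch below uses symmetric encryption from the Python `cryptography` package; the record layout, function names and key handling are assumptions for illustration, and a real federated deployment would additionally involve key management and secure aggregation.

```python
# Minimal sketch of encrypting multi-mode labeled records at rest. The record
# layout, function names and key handling are illustrative assumptions; they
# are not specified by the patent.
import json
from cryptography.fernet import Fernet

def encrypt_annotations(records: list, key: bytes) -> bytes:
    """Serialize labeled records to JSON and encrypt them with a symmetric key."""
    plaintext = json.dumps(records).encode("utf-8")
    return Fernet(key).encrypt(plaintext)

def decrypt_annotations(token: bytes, key: bytes) -> list:
    """Reverse of encrypt_annotations: decrypt and deserialize the records."""
    return json.loads(Fernet(key).decrypt(token))

key = Fernet.generate_key()  # in a federated setting each party keeps its own keys
token = encrypt_annotations(
    [{"frame": 17, "task": "detection", "bbox": [412, 230, 96, 54], "label": "car"}],
    key,
)
assert decrypt_annotations(token, key)[0]["label"] == "car"
```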
According to the multi-mode data fusion method based on a traffic big model, starting from the initial multi-mode traffic data, traffic data of different modes are matched and aligned so that they are fused under the same criteria, characterizing target features from different dimensions and comprehensively reflecting the features of each target in the traffic situation. Space-time multi-dimensional, multi-target, multi-type and multi-task data labeling of the multi-mode traffic data can cope with the variations of different traffic scenes and produces corresponding labeled data for different big-model tasks, so that the multi-mode labeled data can be fully used to train and optimize those tasks. Scene integration of the multi-mode labeled data allows targeted partitioning for different big-model tasks. Further, encrypting the multi-mode labeled data ensures its security. Therefore, multi-mode fused data are obtained after data in several different modes are sequentially subjected to matching and alignment, space-time multi-dimensional labeling, multi-target labeling, multi-type labeling, multi-task labeling, scene integration and encryption; the method can handle many kinds of complex traffic scenes and avoids the low accuracy caused by reliance on a single data source.
With the multi-mode fused data obtained by the multi-mode data fusion method based on a traffic big model, algorithmic analysis can achieve scene understanding of traffic states in time and space. Further, from the multi-mode fused data, the results of traffic scene understanding can be extracted and mined to obtain information reflecting the traffic state, strengthening traffic construction and connectivity within and between administrative areas, improving the convenience of residents' travel, optimizing the traffic environment, and promoting economic development.
In one embodiment, S20, performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data, includes:
S210, taking the observation view of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference coordinate system;
S220, taking the reference coordinate system as the alignment reference, applying angle rotation and position offset adjustment to the sensor coordinate system corresponding to traffic data of each mode in the initial multi-mode traffic data.
In this embodiment, data matching and alignment of the initial multi-mode traffic data covers two dimensions, the spatial dimension and the temporal dimension. In the spatial dimension, the observation view of the sensor of any one mode is taken as the reference coordinate system, and the sensors corresponding to traffic data of the other modes adjust their coordinate systems according to their mounting positions and sensing view ranges so as to align with the reference coordinate system.
In one embodiment, the observation view of the radar-vision fusion unit is taken as the reference, and the sensor coordinate systems of the other modes, such as the lidar and the infrared camera, are adjusted by angle rotation and position offset and aligned with the coordinates of the radar-vision fusion unit, achieving alignment in the spatial dimension. A sketch of this transformation is given below.
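As a minimal sketch of the angle rotation and position offset adjustment, the snippet below applies a rigid transform to lidar points to express them in the reference frame of the radar-vision fusion unit. The yaw angle and offset are illustrative placeholders; in practice they come from extrinsic calibration, and a full calibration would use a complete roll-pitch-yaw rotation rather than the single yaw angle used here for brevity.

```python
# Minimal sketch of spatial alignment: rotate and translate Nx3 sensor points
# into the reference coordinate system. The extrinsic values are placeholders.
import numpy as np

def align_to_reference(points: np.ndarray, yaw_rad: float, offset: np.ndarray) -> np.ndarray:
    """Apply a yaw rotation about the vertical axis plus a position offset."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    rotation = np.array([[c, -s, 0.0],
                         [s,  c, 0.0],
                         [0.0, 0.0, 1.0]])
    return points @ rotation.T + offset

lidar_points = np.array([[12.0, -3.5, 1.2],
                         [8.4, 0.9, 0.8]])
aligned = align_to_reference(lidar_points,
                             yaw_rad=np.deg2rad(2.5),               # assumed mounting yaw
                             offset=np.array([0.15, -0.05, 0.30]))  # assumed lever arm
```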
In one embodiment, S20, performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data, further includes:
S230, taking the acquisition frequency of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference frequency;
S240, taking the reference frequency as the alignment reference, applying frame interpolation or frame decimation to traffic data of each mode in the initial multi-mode traffic data.
In this embodiment, in the temporal dimension, the acquisition frequency of the sensor of any one mode is taken as the reference frequency, and the sensors corresponding to traffic data of the other modes apply frame decimation or frame interpolation according to the reference frequency, so that their data is aligned with the reference frequency in time. Mode traffic data below the reference frequency is interpolated, and mode traffic data above the reference frequency is decimated, until it matches the reference frequency. In one embodiment, the frame interpolation method includes, but is not limited to, bilinear interpolation and the like.
In one embodiment, the data acquisition frequency of the radar-vision fusion unit is taken as the reference frequency, and the sensing data of the other modes, such as the lidar and the infrared camera, is adjusted by frame interpolation or frame decimation so that it is aligned with the acquisition frequency of the radar-vision fusion unit, achieving alignment in the temporal dimension. A sketch of this resampling is given below.
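A minimal sketch of the reference-frequency alignment follows, assuming a scalar per-frame measurement: streams below the reference rate are filled in by interpolation, and streams above it are effectively decimated because only samples at the reference timestamps are kept. For image frames the interpolation would instead operate on pixels (e.g., the bilinear interpolation mentioned above); the function name and example values here are assumptions for illustration.

```python
# Minimal sketch of temporal alignment: resample one mode's per-frame
# measurements onto a uniform reference clock. Streams below the reference
# rate are interpolated; streams above it are effectively decimated, since
# only samples at the reference timestamps are kept.
import numpy as np

def resample_to_reference(src_t: np.ndarray, src_v: np.ndarray, ref_hz: float):
    """Resample (timestamp, value) samples onto a uniform reference frequency."""
    ref_t = np.arange(src_t[0], src_t[-1], 1.0 / ref_hz)
    return ref_t, np.interp(ref_t, src_t, src_v)

# A ~7 Hz stream aligned to an assumed 10 Hz reference sensor.
t = np.linspace(0.0, 2.0, 15)   # source timestamps in seconds
v = np.sin(t)                   # stand-in scalar measurement per frame
ref_t, ref_v = resample_to_reference(t, v, ref_hz=10.0)
```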
In one embodiment, S30, performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data, includes:
S310, performing multi-target and multi-type labeling in the spatial dimension on single-frame multi-mode traffic data;
S320, performing multi-target and multi-type labeling in the temporal dimension on multi-mode traffic data over continuous time periods.
In this embodiment, space-time multi-dimensional labeling means labeling data in both the temporal and the spatial dimension. Multi-target and multi-type labeling in the spatial dimension can be understood as multi-category, multi-target labeling of single-frame multi-mode data. Multi-target and multi-type labeling in the temporal dimension can be understood as labeling continuous time periods of multi-mode data for tasks such as target tracking. Multi-target labeling means labeling targets of multiple categories, including but not limited to pedestrians, motor vehicles, non-motor vehicles, lane markings, traffic signs, vegetation, and the like; vehicles include cars, buses, taxis, and so on. Multi-type labeling means applying different types of labels to the same target.
In one embodiment, taking a vehicle target as an example, multi-type labeling can apply several labeling forms to the vehicle, such as two-dimensional rectangular boxes, three-dimensional boxes, semantic-level classification labels, instance-level classification labels, and point cloud labels. The label type can be switched freely across scene tasks to suit the algorithm used by the scene analysis task and achieve the best model performance, as in the sketch below.
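The sketch below shows one way such switchable multi-type labels could be represented for a single vehicle target; the field names and container structure are assumptions for illustration, not a format defined by the patent.

```python
# Minimal sketch of a multi-type label record for one target. Field names and
# structure are illustrative assumptions, not a format defined by the patent.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Box2D:
    x: float        # top-left corner and size, in pixels
    y: float
    w: float
    h: float

@dataclass
class Box3D:
    cx: float       # box center in meters, in the reference coordinate system
    cy: float
    cz: float
    length: float
    width: float
    height: float
    heading_rad: float  # heading (course) angle of the target

@dataclass
class TargetLabel:
    target_id: int
    category: str                         # e.g. "car", "pedestrian", "bus"
    frame: int
    box2d: Optional[Box2D] = None         # two-dimensional rectangular box
    box3d: Optional[Box3D] = None         # three-dimensional box
    semantic_class: Optional[str] = None  # semantic-level classification label
    instance_id: Optional[int] = None     # instance-level classification label

label = TargetLabel(target_id=42, category="car", frame=120,
                    box2d=Box2D(412.0, 230.0, 96.0, 54.0),
                    box3d=Box3D(18.2, -2.1, 0.9, 4.6, 1.8, 1.5, heading_rad=0.12))
```

Because each label type is an optional field, a downstream task can read only the label form its algorithm needs, which is what allows the label type to be switched per scene task.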
In one embodiment, S30, performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data, further includes:
S330, performing target detection task labeling, multi-category target semantic segmentation task labeling, three-dimensional target detection task labeling, vehicle information recognition task labeling, vehicle target tracking and re-identification task labeling, and pedestrian pose recognition task labeling on the multi-mode traffic data.
In this embodiment, multi-task data labeling can be understood as performing different visual tasks with different visual algorithms, based on the characteristics of the data of each mode, when analyzing and understanding traffic scenes. In one embodiment, two-dimensional target detection and semantic segmentation algorithms are applied to visible-light image mode traffic data; three-dimensional target detection, depth estimation and the like are performed on laser point cloud mode traffic data; and obstacle detection is performed on millimeter-wave radar mode traffic data. Through the multi-mode labeled data, the strengths of the data of each mode are brought into full play, traffic scene understanding is carried out more comprehensively and deeply, and the application of big models in the traffic field is realized.
Target detection task labeling annotates visible-light and infrared image mode traffic data with the position coordinates of the detection box and the category information of each detected target. When the target detection task labeling step is executed, a target detection algorithm based on deep learning or on machine learning is used for labeling. Alternatively, target detection task labeling clusters the laser point cloud mode traffic data to label the positions of different targets. Target detection task labels serve target detection, a basic task of the traffic big model, enabling analysis of the positions and categories of the different targets in a traffic scene.
Multi-category target semantic segmentation task labeling classifies the targets in an image or point cloud pixel by pixel and then annotates the mask information and category information of the targets. The labeled categories include vegetation, parking-space lines, lane lines, manhole covers, zebra crossings, drain grates, and the like. Labeling multi-category targets for the semantic segmentation task, combined with a semantic segmentation algorithm, yields more accurate positions of targets such as vegetation and parking-space lines for subsequent scene analysis tasks. Three-dimensional target detection task labeling annotates lidar point cloud mode traffic data with three-dimensional information such as the length, width and height of targets such as vehicles and pedestrians, together with the heading angle of each target, so that the specific position of a vehicle can be judged and traffic conditions involving occlusion can be assessed more accurately.
Vehicle information recognition task labeling annotates images with character positions, character categories and vehicle type categories, and a license plate recognition result is obtained in combination with a license plate recognition algorithm. Vehicle target tracking and re-identification task labeling uses a tracking algorithm at roadsides, intersections and similar locations to label vehicle tracking across cameras, labeling the same vehicle targets under different cameras; it can be used for scenes such as tracking and investigating vehicles in violation. Pedestrian pose recognition task labeling annotates the interaction relations between pedestrians and other targets, and pedestrian pose recognition is realized in combination with a keypoint detection method. A sketch of cross-frame association for track labeling is given below.
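The patent names a tracking algorithm for the track labeling without specifying one. As an assumed baseline for illustration only, the sketch below associates detections between consecutive frames by greedy intersection-over-union (IoU) matching, one common way to bootstrap track IDs for labeling; the 0.3 threshold is arbitrary.

```python
# Minimal sketch of cross-frame track labeling by greedy IoU association.
# The algorithm choice and the threshold are assumptions for illustration;
# the patent does not specify a particular tracking algorithm.
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def associate(prev_tracks, detections, next_id, thr=0.3):
    """Greedily extend previous track IDs onto current-frame detections."""
    assigned, used = {}, set()
    for tid, pbox in prev_tracks.items():
        candidates = [(iou(pbox, d), i) for i, d in enumerate(detections) if i not in used]
        if candidates:
            score, idx = max(candidates)
            if score >= thr:
                assigned[tid] = detections[idx]
                used.add(idx)
    for i, d in enumerate(detections):  # unmatched detections start new tracks
        if i not in used:
            assigned[next_id] = d
            next_id += 1
    return assigned, next_id

tracks, next_id = associate({7: (100, 100, 40, 20)},
                            [(104, 102, 40, 20), (300, 50, 30, 60)], next_id=8)
# tracks == {7: (104, 102, 40, 20), 8: (300, 50, 30, 60)}
```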
In one embodiment, the multi-mode traffic data is further labeled for several additional tasks, such as instance segmentation task labeling, multi-target classification task labeling, human-vehicle interaction detection task labeling, vehicle trajectory prediction task labeling, and abnormal event detection task labeling, to form multi-mode labeled data; such data is of great value for building a traffic big model.
Referring to fig. 2, the invention provides a multi-mode data fusion system 100 based on a traffic big model. The multi-mode data fusion system 100 includes a multi-mode data acquisition module 10, a matching alignment module 20, a labeling module 30, and a data processing module 40. The multi-mode data acquisition module 10 is configured to acquire initial multi-mode traffic data. The matching alignment module 20 is configured to perform data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data. The labeling module 30 is configured to perform space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data. The data processing module 40 is configured to perform scene integration and data encryption on the multi-mode labeled data to form multi-mode fused data.
In this embodiment, for the multi-mode data acquisition module 10, reference may be made to the description of S10 in the above embodiments; for the matching alignment module 20, to the description of S20; for the labeling module 30, to the description of S30; and for the data processing module 40, to the description of S40.
In one embodiment, the matching alignment module 20 includes a reference coordinate system selection module and a coordinate system adjustment module. The reference coordinate system selection module is configured to take the observation view of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference coordinate system. The coordinate system adjustment module is configured to apply angle rotation and position offset adjustment to the sensor coordinate system corresponding to traffic data of each mode in the initial multi-mode traffic data, taking the reference coordinate system as the alignment reference.
In this embodiment, for the reference coordinate system selection module, reference may be made to the description of S210 in the above embodiments, and for the coordinate system adjustment module, to the description of S220.
In one embodiment, the matching alignment module 20 further includes a reference frequency selection module and a frame rate adjustment module. The reference frequency selection module is configured to take the acquisition frequency of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as the reference frequency. The frame rate adjustment module is configured to apply frame interpolation or frame decimation to traffic data of each mode in the initial multi-mode traffic data, taking the reference frequency as the alignment reference.
In this embodiment, for the reference frequency selection module, reference may be made to the description of S230 in the above embodiments, and for the frame rate adjustment module, to the description of S240.
In one embodiment, the labeling module 30 includes a single-frame labeling module and a continuous-frame labeling module. The single-frame labeling module is configured to perform multi-target and multi-type labeling in the spatial dimension on single-frame multi-mode traffic data. The continuous-frame labeling module is configured to perform multi-target and multi-type labeling in the temporal dimension on multi-mode traffic data over continuous time periods.
In this embodiment, for the single-frame labeling module, reference may be made to the description of S310 in the above embodiments, and for the continuous-frame labeling module, to the description of S320.
In one embodiment, the labeling module 30 further includes a task labeling module. The task labeling module is configured to perform target detection task labeling, multi-category target semantic segmentation task labeling, three-dimensional target detection task labeling, vehicle information recognition task labeling, vehicle target tracking and re-identification task labeling, and pedestrian pose recognition task labeling on the multi-mode traffic data.
In this embodiment, for the task labeling module, reference may be made to the description of S330 in the above embodiments.
In the various embodiments described above, the particular order or hierarchy of steps in the disclosed processes is merely exemplary. Based on design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order and are not meant to be limited to the specific order or hierarchy presented.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, and steps described in connection with the present invention may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, modules, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design requirements of the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation should not be understood as going beyond the scope of the embodiments of the present invention.
The various illustrative logical blocks or modules described in connection with the embodiments of the present invention may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the general purpose processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. In an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may reside in a user terminal. In the alternative, the processor and the storage medium may reside as distinct components in a user terminal.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not meant to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, and the like that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (10)
1. A multi-mode data fusion method based on a traffic big model, characterized by comprising:
acquiring initial multi-mode traffic data;
performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data;
performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data;
and performing scene integration and data encryption on the multi-mode labeled data to form multi-mode fused data.
2. The multi-mode data fusion method based on a traffic big model according to claim 1, wherein the performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data comprises:
taking the observation view of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference coordinate system;
and, taking the reference coordinate system as the alignment reference, applying angle rotation and position offset adjustment to the sensor coordinate system corresponding to traffic data of each mode in the initial multi-mode traffic data.
3. The multi-mode data fusion method based on a traffic big model according to claim 2, wherein the performing data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data further comprises:
taking the acquisition frequency of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference frequency;
and, taking the reference frequency as the alignment reference, applying frame interpolation or frame decimation to traffic data of each mode in the initial multi-mode traffic data.
4. The multi-mode data fusion method based on a traffic big model according to claim 1, wherein the performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data comprises:
performing multi-target and multi-type labeling in the spatial dimension on single-frame multi-mode traffic data;
and performing multi-target and multi-type labeling in the temporal dimension on multi-mode traffic data over continuous time periods.
5. The multi-mode data fusion method based on a traffic big model according to claim 4, wherein the performing space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data further comprises:
performing target detection task labeling, multi-category target semantic segmentation task labeling, three-dimensional target detection task labeling, vehicle information recognition task labeling, vehicle target tracking and re-identification task labeling, and pedestrian pose recognition task labeling on the multi-mode traffic data.
6. A multi-mode data fusion system based on a traffic big model, characterized by comprising:
a multi-mode data acquisition module, configured to acquire initial multi-mode traffic data;
a matching alignment module, configured to perform data matching and alignment on the initial multi-mode traffic data to form multi-mode traffic data;
a labeling module, configured to perform space-time multi-dimensional, multi-target, multi-type and multi-task data labeling on the multi-mode traffic data to form multi-mode labeled data;
and a data processing module, configured to perform scene integration and data encryption on the multi-mode labeled data to form multi-mode fused data.
7. The multi-mode data fusion system based on a traffic big model according to claim 6, wherein the matching alignment module comprises:
a reference coordinate system selection module, configured to take the observation view of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference coordinate system;
and a coordinate system adjustment module, configured to apply angle rotation and position offset adjustment to the sensor coordinate system corresponding to traffic data of each mode in the initial multi-mode traffic data, taking the reference coordinate system as the alignment reference.
8. The multi-mode data fusion system based on a traffic big model according to claim 7, wherein the matching alignment module further comprises:
a reference frequency selection module, configured to take the acquisition frequency of the sensor corresponding to traffic data of any one mode in the initial multi-mode traffic data as a reference frequency;
and a frame rate adjustment module, configured to apply frame interpolation or frame decimation to traffic data of each mode in the initial multi-mode traffic data, taking the reference frequency as the alignment reference.
9. The multi-mode data fusion system based on a traffic big model according to claim 6, wherein the labeling module comprises:
a single-frame labeling module, configured to perform multi-target and multi-type labeling in the spatial dimension on single-frame multi-mode traffic data;
and a continuous-frame labeling module, configured to perform multi-target and multi-type labeling in the temporal dimension on multi-mode traffic data over continuous time periods.
10. The multi-mode data fusion system based on a traffic big model according to claim 9, wherein the labeling module further comprises:
a task labeling module, configured to perform target detection task labeling, multi-category target semantic segmentation task labeling, three-dimensional target detection task labeling, vehicle information recognition task labeling, vehicle target tracking and re-identification task labeling, and pedestrian pose recognition task labeling on the multi-mode traffic data.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311083560.3A | 2023-08-25 | 2023-08-25 | Multi-mode data fusion method and system based on traffic big model |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117058512A | 2023-11-14 |
Family
Family ID: 88653336
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311083560.3A (CN117058512A) | Multi-mode data fusion method and system based on traffic big model | 2023-08-25 | 2023-08-25 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN117058512A (en) |
Application Events
- 2023-08-25: application CN202311083560.3A filed; patent CN117058512A pending
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119328777A | 2024-12-23 | 2025-01-21 | 浙江有鹿机器人科技有限公司 | Data-driven robot closed-loop joint optimization method and system |
| CN119328777B | 2024-12-23 | 2025-05-13 | 浙江有鹿机器人科技有限公司 | Robot closed loop joint optimization method and system based on data driving |
| CN119379816A | 2024-12-27 | 2025-01-28 | 浙江大华技术股份有限公司 | Camera calibration method, device and storage medium based on radar matching |
Legal Events
| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |