CN118590489A - A method for transmitting data fusion features of heterogeneous devices in Internet of Vehicles to the cloud

Info

Publication number: CN118590489A
Application number: CN202410818850.6A
Authority: CN
Inventors: 余晓霞; 侯胜强; 李鑫; 张志刚; 刘欢
Original assignee: Chongqing University of Technology
Current assignee: Chongqing University Of Technology & Tsinghua Automotive Research Institute & Linktron Measurement And Control Technology Co ltd; Chongqing University of Technology
Priority date: 2024-06-24
Filing date: 2024-06-24
Publication date: 2024-09-03
Anticipated expiration: 2044-06-24
Also published as: CN118590489B

Abstract

The present invention provides a method for transmitting data fusion features of heterogeneous devices in an Internet of Vehicles to the cloud. Using a multi-source heterogeneous data fusion algorithm, feature extraction and unsupervised fusion processing are performed on heterogeneous data such as text data, vibration data and audio data, which are fused and reduced in dimension to low-volume fused feature data for transmission to the cloud. In the cloud, the fused feature data of a vehicle-mounted edge device is parsed and converted, restored to preset text, vibration and audio data formats, and stored as the working data of that device; a corresponding device identifier and data model are set for each vehicle-mounted edge device. The method reduces the amount of data transmitted when heterogeneous data of vehicle-mounted edge devices is uploaded to the cloud, saves network bandwidth, reduces network load, helps improve data interaction efficiency within the Internet of Vehicles system, and enhances the system's overall perception of vehicle status information.

Description

A method for transmitting data fusion features of heterogeneous devices in Internet of Vehicles to the cloud

Technical Field

The present invention relates to the field of data input and processing for heterogeneous devices in an Internet of Things environment, and in particular to a method for transmitting data fusion features of heterogeneous devices in an Internet of Vehicles to the cloud.

Background Art

With the continuous advancement of science and technology, the Internet of Things (IoT) has become an important bridge connecting the world, intelligently linking devices and systems through the Internet. In this digital age, the Internet of Vehicles (Connected Vehicles), one of the important application areas of the Internet of Things, is attracting increasing attention. Internet of Vehicles technology connects cars to the Internet, which not only makes driving more intelligent and convenient, but also provides safer, more efficient and more intelligent transportation solutions. In the development of the Internet of Vehicles, the combination of sensors, communication technologies and data analysis technologies has given vehicles more intelligent functions, driving the upgrade and transformation of the entire transportation system.

In the field of the Internet of Vehicles, the data generated by vehicles includes not only traditional on-board sensor data, but also multi-source heterogeneous data such as audio, vibration and text data. These data may come from different devices on the vehicle, which can be called vehicle-mounted edge devices. The comprehensive use of the data of these vehicle-mounted edge devices is crucial to achieving a smarter and safer transportation system. Audio data can be used to identify ambient sound around the vehicle, vibration data can help monitor the status and health of the vehicle, and text data can contain information from on-board systems, navigation applications and communication devices. By effectively integrating and analyzing these multi-source heterogeneous data, more accurate vehicle health monitoring, smarter driver assistance systems and more efficient traffic management can be achieved. Such combined use of different types of data opens up more possibilities for Internet of Vehicles technology and pushes intelligent transportation systems in a smarter and more sustainable direction.

As the scale of the Internet of Vehicles expands and the types of vehicle sensors multiply, the amount of collected data is growing exponentially and is expected to exceed 40 Gbit per second. Transmitting data of this magnitude directly through the communication system would cause serious congestion, and heterogeneous device data is difficult to manage efficiently. The Internet of Vehicles therefore needs a more efficient system and method for transmitting heterogeneous device data or fusion features to the cloud.

At present, vehicle sensors mainly work independently, for example cameras for image recognition and radars for speed and distance detection, and vehicles still lack the ability to deeply and effectively fuse data from multiple sensors. The Internet of Vehicles needs to fuse multi-source heterogeneous data in order to reduce the amount of transmitted data, save network bandwidth, reduce network load and improve the overall perception capability of the vehicle.

Traditional centralized computing and processing can no longer meet the needs of the Internet of Vehicles. Although centralized processing in the cloud offers powerful computing capability, its communication and transmission performance does not match its computing performance, which easily leads to data congestion. To meet these challenges, the Internet of Vehicles needs to shift to a more distributed computing mode, using edge computing to pre-process data, reduce the amount of data transmitted, and improve real-time performance and efficiency. However, compared with the growth rate of sensor data volume, edge acquisition and computing devices have developed relatively slowly, and the edge acquisition and computing device of a single vehicle can hardly cope with the heavy demands of data fusion processing. To solve this problem, the Internet of Vehicles needs to share data and computing resources among equipment such as vehicle terminals, roadside units and base stations. This collaborative computing mode can meet the demands of massive data processing: by distributing data processing tasks across multiple edge acquisition and computing devices, data can be processed and analyzed efficiently, ensuring that the Internet of Vehicles system operates efficiently and provides more intelligent traffic solutions.

Summary of the Invention

In view of the deficiencies of the prior art described above, the present invention provides a method for transmitting data fusion features of heterogeneous devices in an Internet of Vehicles to the cloud, so as to reduce the amount of data transmitted when heterogeneous data of vehicle-mounted edge devices is uploaded to the cloud, save network bandwidth for cloud transmission and reduce network load, thereby helping to improve data interaction efficiency within the Internet of Vehicles system and enhance the system's overall perception of vehicle status information.

To solve the above technical problems, the present invention adopts the following technical solution:

A method for transmitting data fusion features of heterogeneous devices in an Internet of Vehicles to the cloud comprises the following steps:

S1. Construct, in the cloud, the device identifier and data model corresponding to each vehicle-mounted edge device in the Internet of Things; the device model corresponding to a vehicle-mounted edge device is used to identify the individual identity of that device, and the data model corresponding to a vehicle-mounted edge device is used to record the working data of that device;

S2. Deploy edge acquisition and computing devices in a preset data collection area, each edge acquisition and computing device being able to establish data communication connections with multiple vehicle-mounted edge devices, and build a data transmission channel between the edge acquisition and computing devices and the cloud, so that the working data collected by the vehicle-mounted edge devices can be transmitted to the cloud through the edge acquisition and computing devices;

S3. A vehicle-mounted edge device collects its working data and transmits it to the edge acquisition and computing device with which it has established a data communication connection; the working data includes text data, vibration data and audio data of the vehicle-mounted edge device;

S4. The edge acquisition and computing device performs feature extraction and unsupervised fusion processing on the text data, vibration data and audio data in the working data of the vehicle-mounted edge device to obtain the fused feature data of the vehicle-mounted edge device, and uploads it to the cloud through the data transmission channel;

S5. The cloud parses and converts the fused feature data of the vehicle-mounted edge device to obtain text recovery data, vibration recovery data and audio recovery data of the vehicle-mounted edge device, which are stored as working parsed data in the data model corresponding to the vehicle-mounted edge device.

As a preferred solution, in step S1, the data model corresponding to a vehicle-mounted edge device provides data storage space for the complete device and for its components, used to store the working data of the complete device and the working data of its components separately.

As a preferred solution, step S2 includes:

S201. Establish device access gateways between the cloud and the edge acquisition and computing devices, each device access gateway serving as the data access point for one or more edge acquisition and computing devices;

S202. Establish a data access channel in the cloud to serve as the data transmission channel between the cloud and the device access gateways, so that the working data of vehicle-mounted edge devices collected by the edge acquisition and computing devices can be uploaded to the cloud through the device access gateways and the data access channel.

As a preferred solution, step S4 includes:

S401. The vehicle-mounted edge device communicates with the edge acquisition and computing device, and the edge acquisition and computing device collects the working data of the vehicle-mounted edge device;

S402. The edge acquisition and computing device preprocesses the text data in the working data by word segmentation, sentence segmentation and stop-word removal;

S403. The edge acquisition and computing device performs word embedding feature extraction on the preprocessed text data to obtain the word feature vector of the text data;

S404. The edge acquisition and computing device slices the vibration data and audio data in the working data along the time sequence to obtain a vibration data slice vector and an audio data slice vector;

S405. The edge acquisition and computing device performs unsupervised dimensionality reduction and fusion processing on the word feature vector of the text data, the vibration data slice vector and the audio data slice vector to obtain the fused feature data of the vehicle-mounted edge device.

As a preferred solution, in step S401, the vehicle-mounted edge device connects to the edge acquisition and computing devices with which it can establish data communication, obtains the data processing resource occupancy rate of each of them, selects one edge acquisition and computing device whose data processing resource occupancy rate is below a preset resource occupancy threshold, and uploads its working data to that device.
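The selection rule in step S401 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `EdgeNode` type, the node names and the tie-breaking rule (choosing the least-loaded candidate) are assumptions, since the text only requires that the chosen device be below the threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EdgeNode:
    """Hypothetical record for one reachable edge acquisition and computing device."""
    node_id: str
    utilization: float  # fraction of data-processing resources in use, 0.0 to 1.0


def select_edge_node(nodes: Sequence[EdgeNode], threshold: float) -> Optional[EdgeNode]:
    """Return a node whose utilization is below the threshold, or None.

    Picking the least-loaded candidate is an assumption made here so the
    result is deterministic; the source only asks for one node under the
    preset threshold.
    """
    candidates = [n for n in nodes if n.utilization < threshold]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.utilization)


# Hypothetical roadside units reachable from one vehicle-mounted edge device.
nodes = [EdgeNode("rsu-1", 0.92), EdgeNode("rsu-2", 0.41), EdgeNode("rsu-3", 0.65)]
chosen = select_edge_node(nodes, threshold=0.8)
```

If no device is under the threshold the function returns `None`; how the vehicle-mounted edge device should react in that case (wait, retry, or raise the threshold) is not specified in the text.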

As a preferred solution, in step S402, the edge acquisition and computing device uses a BERT model pre-trained for text preprocessing to perform word segmentation, sentence segmentation and stop-word removal on the text data in the working data.

As a preferred solution, in step S403, the edge acquisition and computing device uses a pre-trained bert-base-chinese model to extract word embedding features from Chinese text data, and a pre-trained bert-base-uncased model to extract word embedding features from English text data. The resulting word feature vector of the text data is represented as a (Σ_{i=1}^{m} n_i)×N_x-dimensional matrix vector, a single sentence in the text data is represented as an n_i×N_x-dimensional vector, and a single character is represented as a 1×N_x-dimensional vector, where N_x denotes the feature dimension of feature extraction, n_i denotes the number of characters in the i-th sentence of the text data, and m denotes the number of sentences obtained after sentence segmentation. The word feature vector therefore contains not only the character information and semantic information of the whole text, but also the position information of each sentence and character.
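As a worked illustration of these dimensions, assume the rows of the word feature vector are stacked one per character across all m sentences. The sentence lengths below are hypothetical; N_x = 768 is the hidden size of BERT-base models, and the special tokens that BERT adds around each sentence are ignored for simplicity.

```latex
% Hypothetical example: m = 3 sentences, N_x = 768
\[
  n_1 = 12,\quad n_2 = 8,\quad n_3 = 20
  \;\Longrightarrow\;
  \sum_{i=1}^{3} n_i = 40,
  \qquad
  W_{\text{text}} \in \mathbb{R}^{40 \times 768}.
\]
```

Under these assumptions the text block contributes 40 × 768 = 30720 values before dimensionality reduction.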

As a preferred solution, in step S404, the edge acquisition and computing device reads the vibration data and the audio data in the working data separately and normalizes them, and then slices the vibration data and audio data according to the preset feature dimension N_x of feature extraction. The resulting vibration data slice vector is represented as an x×N_x-dimensional vector and the audio data slice vector as a y×N_x-dimensional vector, where x and y denote the number of slices of the vibration data and the audio data respectively. The vibration data slice vector and the audio data slice vector therefore contain not only the amplitude feature information of the vibration data and audio data, but also the position information of each slice.
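A minimal sketch of the normalize-then-slice step is given below. The text does not say which normalization is used or how a final partial slice is handled, so min-max scaling and zero-padding are assumptions here, and the function names are illustrative only.

```python
from typing import List


def min_max_normalize(signal: List[float]) -> List[float]:
    """Scale samples to [0, 1]. Min-max scaling is an assumed choice."""
    if not signal:
        return []
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]
    return [(v - lo) / (hi - lo) for v in signal]


def slice_signal(signal: List[float], n_x: int) -> List[List[float]]:
    """Cut a 1-D signal into consecutive rows of length n_x.

    The row index preserves the time order of the slices. The last row
    is zero-padded to n_x samples (an assumption; the source does not
    specify how a partial slice is treated).
    """
    slices = []
    for start in range(0, len(signal), n_x):
        chunk = signal[start:start + n_x]
        chunk = chunk + [0.0] * (n_x - len(chunk))
        slices.append(chunk)
    return slices


# Toy vibration record with 5 samples and N_x = 4, giving x = 2 slices.
vibration = [2.0, 4.0, 6.0, 8.0, 10.0]
vib_slices = slice_signal(min_max_normalize(vibration), n_x=4)
```

The same two functions would apply to the audio data, yielding the y×N_x block.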

As a preferred solution, in step S405, the edge acquisition and computing device first concatenates the word feature vector of the text data, the vibration data slice vector and the audio data slice vector to obtain a working data feature matrix vector, and then inputs the working data feature matrix vector into a pre-trained Autoencoder model for unsupervised dimensionality reduction encoding to obtain the fused feature data of the vehicle-mounted edge device. The working data feature matrix vector is represented as a (Σ_{i=1}^{m} n_i + x + y)×N_x-dimensional matrix vector, where N_x denotes the feature dimension of feature extraction, n_i denotes the number of characters in the i-th sentence of the text data, m denotes the number of sentences obtained after sentence segmentation, and x and y denote the number of slices of the vibration data and the audio data respectively.
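The concatenation part of step S405 can be sketched as a row-wise stack of the three feature blocks, which all share N_x columns. The sketch below covers only the stacking; the pre-trained Autoencoder that compresses the stacked matrix is not shown. The function name and the returned row ranges are illustrative assumptions, not something the text specifies.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def concatenate_features(
    text: Matrix, vibration: Matrix, audio: Matrix, n_x: int
) -> Tuple[Matrix, Dict[str, Tuple[int, int]]]:
    """Stack the three feature blocks row-wise into one matrix.

    Returns the stacked matrix and, for each modality, the half-open
    row range it occupies (an assumed bookkeeping aid, not in the source).
    """
    fused: Matrix = []
    row_ranges: Dict[str, Tuple[int, int]] = {}
    for name, block in (("text", text), ("vibration", vibration), ("audio", audio)):
        if any(len(row) != n_x for row in block):
            raise ValueError(f"every {name} row must have {n_x} columns")
        row_ranges[name] = (len(fused), len(fused) + len(block))
        fused.extend(block)
    return fused, row_ranges


# Toy input with N_x = 2: 3 text rows (the sum of n_i), x = 2, y = 1.
text = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
vibration = [[0.0, 0.5], [1.0, 0.0]]
audio = [[0.7, 0.9]]
fused, row_ranges = concatenate_features(text, vibration, audio, n_x=2)
```

Here `fused` has 3 + 2 + 1 = 6 rows and 2 columns, matching the stated dimension of the working data feature matrix vector.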

As a preferred solution, in step S5, the cloud first decodes the fused feature data of the vehicle-mounted edge device with the Autoencoder decoder to obtain the (Σ_{i=1}^{m} n_i + x + y)×N_x-dimensional working data feature matrix vector of the vehicle-mounted edge device. It then decomposes the working data feature matrix vector into the (Σ_{i=1}^{m} n_i)×N_x-dimensional word feature vector of the text data, the x×N_x-dimensional vibration data slice vector and the y×N_x-dimensional audio data slice vector. Based on the position information contained in these three vectors, the word features of the text data and the slices of the vibration data and audio data are each reassembled in position order, and the reassembled results are converted to the preset text data format, vibration data format and audio data format respectively, yielding the text recovery data, vibration recovery data and audio recovery data of the vehicle-mounted edge device. Finally, the text recovery data, vibration recovery data and audio recovery data are stored, as the working parsed data of the corresponding vehicle-mounted edge device, in the data model corresponding to that device.
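The decomposition and reassembly in step S5 can be sketched as below, starting from an already decoded matrix. The sketch assumes the cloud knows the row counts of each block and the original sample count of each signal (for example from metadata sent with the upload, which the text does not describe). Autoencoder decoding and the mapping of word features back to characters are not shown.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_features(
    fused: Matrix, text_rows: int, vib_rows: int, audio_rows: int
) -> Tuple[Matrix, Matrix, Matrix]:
    """Split the decoded matrix back into text, vibration and audio blocks."""
    if len(fused) != text_rows + vib_rows + audio_rows:
        raise ValueError("row counts do not match the decoded matrix")
    vib_end = text_rows + vib_rows
    return fused[:text_rows], fused[text_rows:vib_end], fused[vib_end:]


def reassemble_signal(slices: Matrix, original_length: int) -> List[float]:
    """Join slices in row (time) order and drop any trailing padding.

    original_length is an assumed piece of metadata; without it the
    padding added to the final slice could not be removed.
    """
    flat = [value for row in slices for value in row]
    return flat[:original_length]


# Toy decoded matrix with N_x = 2: 2 text rows, x = 2, y = 1.
decoded = [[0.1, 0.2], [0.3, 0.4], [0.0, 0.5], [1.0, 0.0], [0.7, 0.9]]
text, vibration, audio = split_features(decoded, text_rows=2, vib_rows=2, audio_rows=1)
vibration_signal = reassemble_signal(vibration, original_length=3)
```

Because an Autoencoder is a lossy compressor in general, the recovered values would approximate rather than exactly equal the originals in practice.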

Compared with the prior art, the present invention has the following beneficial effects:

1. The present invention proposes a method for transmitting data fusion features of heterogeneous devices in an Internet of Vehicles to the cloud. To address the large volume of heterogeneous data from vehicle-mounted edge devices, which easily causes congestion in cloud transmission, the method uses a multi-source heterogeneous data fusion algorithm to perform feature extraction and unsupervised fusion processing on heterogeneous data such as text data, vibration data and audio data, fusing and reducing them to low-volume fused feature data for transmission to the cloud, thereby saving network bandwidth for cloud transmission and reducing network load.

2. In the method of the present invention, the fused feature data of a vehicle-mounted edge device is parsed and converted in the cloud, restored to the preset text data, vibration data and audio data formats, and stored as the working data of the vehicle-mounted edge device. In addition, a corresponding device identifier and data model are set for each vehicle-mounted edge device, which makes it easy to attribute the stored heterogeneous data such as text, audio and vibration to the right vehicle-mounted edge device and provides data support for subsequent applications such as equipment operation and maintenance.

3. The method of the present invention reduces the amount of data transmitted when heterogeneous data of vehicle-mounted edge devices is uploaded to the cloud, saves network bandwidth for cloud transmission and reduces network load, while also preserving the integrity of the vehicle-mounted edge device data subsequently stored in the cloud. It thereby helps improve data interaction efficiency within the Internet of Vehicles system and enhances the system's overall perception of vehicle status information, and has good value for technical promotion and application.

Brief Description of the Drawings

To make the purpose, technical solution and advantages of the invention clearer, the present invention is described in further detail below with reference to the accompanying drawings, in which:

FIG. 1 is a flow chart of the method of the present invention for transmitting data fusion features of heterogeneous devices in an Internet of Vehicles to the cloud.

FIG. 2 is an overall working diagram of an application example of the method of the present invention for transmitting data fusion features of heterogeneous devices in an Internet of Vehicles to the cloud.

FIG. 3 is a schematic diagram of a device model constructed in an embodiment of the present invention.

FIG. 4 is a schematic diagram of the principle of the feature fusion algorithm for audio, text and vibration data in an embodiment of the present invention.

FIG. 5 is a schematic diagram of the principle of extracting sentence data with the BERT model in an embodiment of the present invention.

FIG. 6 is a schematic diagram of the structure of the AE model in an embodiment of the present invention.

Detailed Description

To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and shown in the drawings can be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

In view of the deficiencies in the prior art, the present invention mainly aims to solve the following problems:

(1) With the development of the Internet of Things and sensing technology, the amount of collected data is growing exponentially. Transmitting it directly through the communication system easily causes congestion in cloud transmission, and heterogeneous device data is difficult to manage efficiently and demanding to handle manually.

(2) Because of the large data volume and non-uniform data structures, data from multi-source heterogeneous devices is not fully utilized, leaving the Internet of Vehicles system with poor overall perception of vehicle-mounted devices.

(3) The processing and computing power of the vehicle-mounted edge devices themselves is insufficient to meet the requirements of more complex data fusion processing.

To address the above technical problems, the present invention provides a method for transmitting data fusion features of heterogeneous devices in an Internet of Vehicles to the cloud. Its flow is shown in FIG. 1 and comprises steps S1 to S5 set out above.

The method addresses the large volume of heterogeneous data from vehicle-mounted edge devices, which easily causes congestion in cloud transmission. Using a multi-source heterogeneous data fusion algorithm, it performs feature extraction and unsupervised fusion processing on heterogeneous data such as text data, vibration data and audio data, fusing and reducing them to low-volume fused feature data for transmission to the cloud, thereby saving network bandwidth for cloud transmission and reducing network load. In the cloud, the fused feature data of a vehicle-mounted edge device is in turn parsed and converted, restored to the preset text data, vibration data and audio data formats, and stored as the working data of the vehicle-mounted edge device. A corresponding device identifier and data model are set for each vehicle-mounted edge device, which makes it easy to attribute the stored heterogeneous data such as text, audio and vibration to the right device and provides data support for subsequent applications such as equipment operation and maintenance.

The method of the present invention therefore reduces the amount of data transmitted when heterogeneous data of vehicle-mounted edge devices is uploaded to the cloud, saves network bandwidth for cloud transmission and reduces network load, while also preserving the integrity of the vehicle-mounted edge device data subsequently stored in the cloud, thereby helping to improve data interaction efficiency within the Internet of Vehicles system and enhancing the system's overall perception of vehicle status information.

The method of the present invention for transmitting data fusion features of heterogeneous devices in an Internet of Vehicles to the cloud is described in detail below.

FIG. 2 shows a working diagram of one example of the method in a specific application. The description below is developed around this example.

In a specific implementation of step S1, the device identifier mainly serves to identify the vehicle-mounted edge device, and it can take various concrete forms: for example, a device ID or a visual device model. In current application scenarios, which place high demands on data visualization, a visual device model is preferred as the device identifier. The device model can be built according to an object construction specification using an object-oriented approach, so that device models of heterogeneous vehicle-mounted edge devices can be built quickly from a digital abstract model. As an abstraction layer over the edge devices, the device model hides differences between underlying terminals, standardizes how device capabilities are expressed and how devices interact, forms a hierarchical object structure, and provides standardized support for applications and other services. The device model depicts the structural portrait of the device and the data model describes the device's data map, so that information about the edge devices can be entered in the cloud.

When building device models, hierarchical modeling by device model, device object and device component can be used to improve model reuse. A device model is the digital abstract model of a class of equipment and represents a specific equipment type; a device component represents a class of component that is of interest and is monitored. A complete device model is subdivided into multiple subsystems, and each subsystem is further subdivided into components. Hierarchical modeling describes the structural portrait of the device model through its device components. A device object is an instance of a device model and inherits all the characteristics of that model.

Taking a car as an example, hierarchical modeling by device model, device object and device component is performed in the cloud, and the corresponding configuration information is entered to complete device registration. The device model is the digital abstract model of one class of car and represents a specific car model; each device model is subdivided into multiple subsystems, such as the engine and the transmission; each subsystem consists of multiple components, such as the drive shaft and the tires. The device object is an instance of the device model, for example one car with a unique ID.
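The layered structure described above can be illustrated with a minimal sketch. The class names, field names, the model name `car-A` and the ID `car-A-0001` are hypothetical and are not taken from the patent; the sketch only shows one way a device object might reuse the structure of its device model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """A monitored part, e.g. a drive shaft or a tire."""
    name: str


@dataclass
class Subsystem:
    """A subsystem of a device model, e.g. the transmission."""
    name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class DeviceModel:
    """Digital abstraction of one class of device (one car model)."""
    model_name: str
    subsystems: List[Subsystem] = field(default_factory=list)


@dataclass
class DeviceObject:
    """One concrete device; it reuses the structure of its model."""
    device_id: str
    model: DeviceModel

    def component_names(self) -> List[str]:
        # Flatten the structure inherited from the device model.
        return [c.name for s in self.model.subsystems for c in s.components]


car_model = DeviceModel(
    model_name="car-A",  # hypothetical model name
    subsystems=[
        Subsystem("transmission", [Component("drive shaft")]),
        Subsystem("chassis", [Component("tire")]),
    ],
)
car = DeviceObject(device_id="car-A-0001", model=car_model)  # hypothetical ID
```

Keeping the structure on the model and only the identity on the object means that many cars of the same type share a single structural definition.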

When building the device model, a data map must first be built for it; the data map defines the set of all sampled data sets of the device model. A visual device data model management function can be provided, and basic data models can be customized to define the data model in a structured way; finally, the data packet model is bound to the device identifier (the device model or device ID mentioned above). Data sets are divided into whole-machine level and component level. Each data set requires user-defined measurement point information, and the data types can include vibration signals, audio information, text information and so on. The structural portrait and the data map together make up a complete device model. Figure 3 shows the device model built for the gearbox subsystem, taking car A as an example.

In a specific implementation, in step S1 the data model corresponding to a vehicle-mounted edge device provides data storage space for the whole machine and for its components, used to store the working data of the whole machine and the working data of its components respectively. The data model can be regarded as the data repository for the device data map, storing the vibration signals, audio information, text information and other data of the vehicle-mounted edge device.

In a specific implementation, step S2 includes:

A device access gateway is created in the cloud. Each gateway is an access point (endpoint) for devices and represents an abstract device acquisition access terminal. It can manage several physical vehicle-mounted edge devices and is the unified entry and exit between the data acquisition side and the platform data access side. The related configuration includes: gateway configuration, covering the MQTT protocol configuration, the gateway name, region, organization and gateway description; transmission configuration, covering the message identifier, transmission period, transmission delay, data expiration time and data packet size; data encoding configuration, in which an encoding format such as JSON, binary or text is configured according to the needs of each device; and data encryption configuration, in which data can be encrypted before transmission according to the data requirements of each device, with support for mainstream symmetric algorithms such as AES and DES, to secure network transmission when the data is sent to the system data exchange layer.

When data access channels are created in the cloud, multiple data channels can be built under each gateway, and each channel can correspond to one data model. For example, four data access channels can be built, corresponding to four data models: a whole-vehicle recording data model with two data items, time and audio sampling data; a whole-vehicle log data model with two data items, time and content; a gearbox drive shaft vibration data model with two data items, time and vibration sampling data; and a fusion feature data model for the vehicle's audio, text and vibration data with two data items, time and fusion feature. Further channels can be defined in the same way.
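As an illustration of the gateway and channel configuration described above, the following sketch writes the four configuration groups and the four example channels as a plain dictionary. Every key name and value is a hypothetical placeholder; the patent does not specify a concrete configuration schema.

```python
import json

# All keys and values below are hypothetical placeholders.
gateway_config = {
    "gateway": {"name": "gw-01", "protocol": "MQTT", "region": "area-1",
                "organization": "org-1", "description": "roadside gateway"},
    "transmission": {"message_id": "car-A/fusion", "period_s": 10,
                     "delay_s": 0, "expiry_s": 3600, "max_packet_bytes": 65536},
    "encoding": "JSON",    # alternatives named in the text: binary, text
    "encryption": "AES",   # the text also lists DES
    "channels": [
        {"name": "vehicle_audio", "fields": ["time", "audio_samples"]},
        {"name": "vehicle_log", "fields": ["time", "content"]},
        {"name": "gearbox_shaft_vibration", "fields": ["time", "vibration_samples"]},
        {"name": "fusion_features", "fields": ["time", "fusion_feature"]},
    ],
}


def validate(config: dict) -> None:
    """Check that every channel declares a time item plus one payload item."""
    for channel in config["channels"]:
        if len(channel["fields"]) != 2 or channel["fields"][0] != "time":
            raise ValueError(f"bad channel definition: {channel['name']}")


validate(gateway_config)
serialized = json.dumps(gateway_config, indent=2)
```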

In a specific implementation, in step S3 the working data collected by the vehicle-mounted edge device includes its text data, vibration data, audio data and so on. Taking a car as an example, the working data collected by a given car model can include its whole-vehicle log data log.txt (text data), whole-vehicle recording data audio.wav (audio data) and gearbox vibration data vib.dat (vibration data). As a concrete parameter example, the recording data and the vibration data are sampled at 10240 Hz, and the text content of the log data is "变速器齿轮箱传动轴故障，异响严重" ("transmission gearbox drive shaft fault, severe abnormal noise").

In a specific implementation, the flowchart of step S4 is shown in Figure 4 and includes the following steps:

In a specific implementation, the edge acquisition computing device can be designed to include a communication unit, a text preprocessing unit, a text feature extraction unit, a time-series slicing unit and a fusion processing unit. The communication unit communicates with the vehicle-mounted edge device and collects its working data. The text preprocessing unit performs word segmentation, sentence segmentation and stop-word removal on the text data in the working data. The text feature extraction unit performs word embedding feature extraction on the preprocessed text data to obtain the word feature vector of the text data. The time-series slicing unit parses the vibration data and audio data in the working data as time series and slices them in time order to obtain the vibration data slice vector and the audio data slice vector. The fusion processing unit performs unsupervised fusion and dimensionality reduction on the word feature vector, the vibration data slice vector and the audio data slice vector to obtain the fusion feature data of the vehicle-mounted edge device.

In a specific implementation, step S401 of step S4 is as follows: the vehicle-mounted edge device connects to the edge acquisition computing devices with which it can establish data communication, obtains the data processing resource occupancy rate of each of them, selects one edge acquisition computing device whose occupancy rate is below a preset resource occupancy threshold, and uploads its working data to that device.

For example, car A determines that the volume of sensed data is too large and communicates with roadside units through its 4G communication module. It selects one edge acquisition computing device whose data processing resource occupancy rate is below the preset threshold (that is, one with low communication and computing load) to perform data collection and fusion processing. The data transmission between them includes initiating communication, identity authentication, connection establishment, data transmission, data acknowledgement and collaborative computing.
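The selection rule in step S401 can be sketched as follows. The patent only requires an occupancy rate below the preset threshold; choosing the least loaded of the qualifying devices is an additional assumption made here, and the device names and occupancy values are hypothetical.

```python
from typing import Dict, Optional


def select_edge_device(occupancy: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the reachable edge device with the lowest occupancy below threshold."""
    candidates = {dev: occ for dev, occ in occupancy.items() if occ < threshold}
    if not candidates:
        return None  # no device is lightly loaded enough
    # Assumption: among qualifying devices, prefer the least loaded one.
    return min(candidates, key=candidates.get)


reachable = {"rsu-1": 0.82, "rsu-2": 0.35, "rsu-3": 0.50}  # hypothetical values
chosen = select_edge_device(reachable, threshold=0.6)
```

Returning `None` when no device qualifies leaves the fallback policy (wait, retry, or send unfused data later) to the caller.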

In a specific implementation, step S402 of step S4 is as follows: the edge acquisition computing device uses a Bert model trained in advance for text preprocessing to perform word segmentation, sentence segmentation and stop-word removal on the text data in the working data.

For example, because the Bert model has a maximum input sequence length of 512, the text data is first split into sentences. Splitting follows the contextual semantics of the text, assisted by punctuation marks such as "。", "？", "！" and "；", and yields sentences of different lengths. Each sentence is then tokenized, Chinese by character and English by word, cutting the continuous natural-language text into the smallest semantically meaningful units (words or subwords). Finally, stop words, that is, common words that contribute little to the meaning of the text, are removed to reduce noise and redundancy and to improve subsequent text processing and analysis. For the whole-vehicle log data log.txt of the earlier example, "变速器齿轮箱传动轴故障，异响严重", preprocessing yields the 15 tokens "变", "速", "器", "齿", "轮", "箱", "传", "动", "轴", "故", "障", "异", "响", "严", "重".
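The three preprocessing stages can be illustrated with a rule-based sketch. This is not the Bert-based preprocessing the patent describes: it substitutes plain regular expressions for the model, and the stop list `STOP_TOKENS` is a hypothetical example. It only demonstrates sentence splitting on end punctuation, per-character tokenization for Chinese, per-word tokenization for English, and stop-token removal.

```python
import re
from typing import List

SENTENCE_END = "。？！；.?!;"           # end marks used for auxiliary splitting
STOP_TOKENS = {"，", ",", "的", "了"}   # hypothetical stop list


def split_sentences(text: str) -> List[str]:
    """Split on sentence-ending punctuation and drop empty pieces."""
    parts = re.split(f"[{re.escape(SENTENCE_END)}]", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> List[str]:
    # ASCII words stay whole; every other non-space character
    # (for example a Chinese character) becomes its own token.
    return re.findall(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", sentence)


def preprocess(text: str) -> List[List[str]]:
    """Return one token list per sentence with stop tokens removed."""
    return [[t for t in tokenize(s) if t not in STOP_TOKENS]
            for s in split_sentences(text)]


tokens = preprocess("变速器齿轮箱传动轴故障，异响严重")
```

For the log line of the running example this produces a single sentence of 15 character tokens, matching the count given in the text.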

In a specific implementation, step S403 of step S4 is as follows: the edge acquisition computing device uses the pre-trained Bert-base-chinese model to extract word embedding features from Chinese text data and the pre-trained Bert-base-uncased model to extract word embedding features from English text data. The resulting word feature vector of the text data is a matrix of dimension $(\sum_{i=1}^{m} n_i) \times N_x$; a single sentence in the text data is represented as an n_i×N_x-dimensional vector and a single character as a 1×N_x-dimensional vector, where N_x is the feature dimension of the feature extraction, n_i is the number of characters in the i-th sentence of the text data, and m is the number of sentences obtained after sentence segmentation. The word feature vector therefore contains not only the character and semantic information of the whole text but also the position information of each sentence and character.

The open-source pre-trained Bert-base-uncased model is used for English data and the Bert-base-chinese model for Chinese data. Features are extracted from the text data, and the text is represented as word vectors composed of character vectors, position vectors and text vectors.

For example, if the feature dimension of the Bert model is preset to N_x = 768, the Bert model outputs a 1×768-dimensional representation for a single character and an n_i×768-dimensional representation for a single sentence, where n_i is the number of characters in the i-th sentence; each sentence is normalized. The representations of the m sentences are concatenated with a concat operation, so the final representation of one piece of text data is a matrix of dimension $(\sum_{i=1}^{m} n_i) \times 768$, where m is the number of sentences obtained after sentence segmentation.

As shown in Figure 5, the whole-vehicle log data log.txt of the earlier example, "变速器齿轮箱传动轴故障，异响严重", passes through the Bert model and yields a word feature vector in the form of a 15×768-dimensional matrix (covering the 15 characters "变", "速", "器", "齿", "轮", "箱", "传", "动", "轴", "故", "障", "异", "响", "严", "重"). As the vector representation of the text data log.txt, this word feature vector contains not only the character and semantic information of the whole text but also the position information of each sentence and character, and is therefore sufficiently expressive.

In a specific implementation, step S404 of step S4 is as follows: the edge acquisition computing device reads the vibration data and the audio data in the working data, normalizes each of them, and then slices them according to the preset feature dimension N_x. The resulting vibration data slice vector is an x×N_x-dimensional vector and the audio data slice vector is a y×N_x-dimensional vector, where x and y are the numbers of slices of the vibration data and the audio data respectively. The two slice vectors therefore contain not only the amplitude feature information of the vibration data and the audio data but also the position information of each slice.

For example, the vibration data and the audio data are each read and normalized, then cut with a slice width of 768 into multiple data blocks, giving the vector representation x×768 of the vibration data and y×768 of the audio data. The two representations can then be joined with a concat operation into an (x+y)×768-dimensional matrix. For instance, the 4 s audio file audio.wav is parsed into 4×10240 data points, and the 20 s vibration file vib.dat is parsed into 20×10240 data points. After normalization, each is cut into slices of 768 points, with any shortfall in the last slice filled with the value 1, giving two data matrices of 54×768 and 280×768 dimensions; these are joined with concat into a 334×768-dimensional joint audio-vibration data matrix.
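The normalize-then-slice step can be sketched in plain Python. The patent does not state which normalization is used, so min-max scaling to [0, 1] is an assumption here, and the synthetic ramp stands in for real audio samples. The slice width of 768 and the padding value 1 follow the text.

```python
from typing import List


def normalize(samples: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; the scaling method is an assumption."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0 for _ in samples]
    return [(s - lo) / (hi - lo) for s in samples]


def slice_signal(samples: List[float], width: int = 768,
                 pad: float = 1.0) -> List[List[float]]:
    """Cut a signal into rows of `width` points, padding the last row with `pad`."""
    rows = [samples[i:i + width] for i in range(0, len(samples), width)]
    if rows and len(rows[-1]) < width:
        rows[-1] = rows[-1] + [pad] * (width - len(rows[-1]))
    return rows


# 4 s of audio at 10240 Hz, filled with a synthetic ramp for illustration.
audio = [float(i % 100) for i in range(4 * 10240)]
audio_rows = slice_signal(normalize(audio))
```

With 40960 samples the last of the 54 rows holds 256 real points followed by 512 padded values, so the row index preserves the temporal position of each slice.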

In a specific implementation, step S405 of step S4 is as follows: the edge acquisition computing device first concatenates the word feature vector of the text data, the vibration data slice vector and the audio data slice vector to obtain the working data feature matrix vector, and then feeds that matrix into a pre-trained Autoencoder model for unsupervised dimensionality-reduction encoding to obtain the fusion feature data of the vehicle-mounted edge device. The working data feature matrix vector is a matrix of dimension $(\sum_{i=1}^{m} n_i + x + y) \times N_x$, where N_x is the feature dimension of the feature extraction, n_i is the number of characters in the i-th sentence of the text data, m is the number of sentences obtained after sentence segmentation, and x and y are the numbers of slices of the vibration data and the audio data respectively.

For example, the vector representations of the vibration data and the audio data are concatenated with the text representation to obtain a 349×768-dimensional working data feature matrix vector, which is the input to the Autoencoder model.

The Autoencoder model is a commonly used model for dimensionality-reduction and fusion encoding; its architecture is shown in Figure 6. Its Encoder and Decoder each contain three one-dimensional convolutional layers. The three convolutional layers of the Encoder reduce the input matrix successively to 256, 64 and 32 dimensions, producing a 349×32-dimensional feature matrix as the hidden-layer output. The Decoder restores the hidden-layer output step by step to the 349×768-dimensional input matrix. Before the model is used for dimensionality-reduction and fusion encoding, it can be trained by minimizing the reconstruction error (MSE) between the encoder input and the decoder output until the desired reduction and fusion quality is reached. The MSE loss function is $\mathrm{MSE} = \frac{1}{K}\sum_{i=1}^{K}(y_i - \hat{y}_i)^2$, where y_i is the encoding label output by the model during training, $\hat{y}_i$ is the target encoding label for training, and K is the number of training samples. In a specific application, the training hyperparameters of the Autoencoder model are: Adam optimizer, learning rate 1e-4, batch size 10, Dropout rate 0.4, and 200 training epochs in total.
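Two quantities from the paragraph above can be checked with a short standard-library sketch: the mean squared reconstruction error and the size reduction from 768 to 32 columns. The sketch does not implement the convolutional Autoencoder itself. The `mse` helper averages over the entries of one matrix pair rather than over K training samples, which is a simplification, and all names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def mse(original: Matrix, reconstructed: Matrix) -> float:
    """Mean squared error over all entries of two equally shaped matrices."""
    total, count = 0.0, 0
    for row_a, row_b in zip(original, reconstructed):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def compression_ratio(rows: int, in_dim: int = 768, code_dim: int = 32) -> float:
    """Ratio of input entries to latent entries for a rows x in_dim matrix."""
    return (rows * in_dim) / (rows * code_dim)


# One entry differs by 2, so the summed squared error is 4 over 4 entries.
loss = mse([[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 2.0]])
# The 349 x 768 example matrix shrinks to 349 x 32.
ratio = compression_ratio(349)
```

Because the row count is unchanged by the encoder, the reduction factor depends only on the column widths, here 768 / 32 = 24.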

In a specific implementation, step S5 includes the following. The cloud first decodes the fusion feature data of the vehicle-mounted edge device with the Autoencoder decoder to recover the $(\sum_{i=1}^{m} n_i + x + y) \times N_x$-dimensional working data feature matrix vector of the device. It then decomposes this matrix into the $(\sum_{i=1}^{m} n_i) \times N_x$-dimensional word feature vector of the text data, the x×N_x-dimensional vibration data slice vector and the y×N_x-dimensional audio data slice vector. Using the position information contained in these three vectors, it reassembles the word features of the text data and the slices of the vibration data and audio data in position order, and converts each reassembled result into the preset text, vibration and audio data format, obtaining the text recovery data, vibration recovery data and audio recovery data of the device. Finally, these recovery data are stored, as the working analysis data of the device, in the data model corresponding to that vehicle-mounted edge device.
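The decomposition in step S5 amounts to splitting the decoded matrix by row counts and rejoining the slices in order. The sketch below assumes the rows were concatenated in the order text, vibration, audio, which the patent does not state explicitly, and uses a dummy matrix in place of real decoder output. The row counts 15, 280 and 54 are those of the running example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_features(matrix: Matrix, text_rows: int, vib_rows: int,
                   audio_rows: int) -> Tuple[Matrix, Matrix, Matrix]:
    """Split a decoded feature matrix into text, vibration and audio blocks.

    Assumes rows were concatenated in the order text, vibration, audio.
    """
    if len(matrix) != text_rows + vib_rows + audio_rows:
        raise ValueError("row counts do not match the decoded matrix")
    text = matrix[:text_rows]
    vib = matrix[text_rows:text_rows + vib_rows]
    audio = matrix[text_rows + vib_rows:]
    return text, vib, audio


def flatten(block: Matrix) -> List[float]:
    """Rejoin slices in row order to recover a one-dimensional signal."""
    return [value for row in block for value in row]


decoded = [[float(r)] * 768 for r in range(349)]  # stand-in for decoder output
text, vib, audio = split_features(decoded, 15, 280, 54)
```

The row counts have to travel with the fusion features (or be derivable from the data model), since the decoded matrix alone does not reveal where one modality ends and the next begins.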

The fused data features produced at the edge are transmitted to the cloud over the data transmission channel that has been built, together with the key authentication parameters. The edge side uses the industrial gateway to establish communication with the cloud and transmits the fused data features to the cloud efficiently. The raw data or the preprocessed heterogeneous data can optionally be sent when communication resources are idle and collected in the cloud data center as a backup.

Based on the data architecture between the cloud and the vehicle-mounted edge devices and the cloud transmission method described above, and in order to give vehicle-mounted edge devices highly compatible data access to the cloud, the task of highly compatible access for their heterogeneous data is decoupled into data access and transmission, data parsing and processing, and data storage and management. The overall process of transmitting data to the cloud can be summarized as follows:

Data access and transmission: an industrial gateway is established according to the device data model that has been built. The industrial data gateway is the network interface through which data enters the IoT cloud platform and provides the network transmission channel over which collected data is finally sent to the platform. Using the open-source Mosquitto message broker and the MQTT data transmission protocol, the collected data is received in standard JSON format and forwarded to a Kafka message queue for asynchronous processing, rate limiting and peak shaving.
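The JSON message format mentioned above can be illustrated without any broker. The sketch below only builds and parses a payload with the standard library; it does not call Mosquitto, MQTT or Kafka client APIs. The field names `device_id`, `time` and `fusion_feature` and the device ID are hypothetical, chosen to mirror the two data items of the fusion feature data model.

```python
import json
import time
from typing import List


def build_message(device_id: str, fusion_feature: List[List[float]]) -> str:
    """Serialize one fusion-feature record as a JSON string for publishing."""
    record = {
        "device_id": device_id,            # hypothetical field name
        "time": int(time.time() * 1000),   # millisecond timestamp
        "fusion_feature": fusion_feature,
    }
    return json.dumps(record, separators=(",", ":"))


def parse_message(payload: str) -> dict:
    """Inverse of build_message, as the cloud side might apply it."""
    record = json.loads(payload)
    for key in ("device_id", "time", "fusion_feature"):
        if key not in record:
            raise KeyError(f"missing field: {key}")
    return record


payload = build_message("car-A-0001", [[0.1, 0.2], [0.3, 0.4]])
record = parse_message(payload)
```

In a deployment, the string returned by `build_message` would be the payload handed to whichever MQTT client publishes to the gateway.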

Data parsing and processing: after the cloud receives the data transmitted by a device, the data must be parsed and processed. The raw data in the internal message queue is decrypted, and the decrypted data is parsed into the specified data structure according to the data access model. Following the unified object model, the device data is parsed into the standard formats for vibration data, audio data and text data, and is cleaned, validated and converted in preparation for subsequent storage and analysis.

Data storage and management: the parsed heterogeneous device data can be stored in a MongoDB database in the cloud for later query and analysis. The data storage and management framework, designed on loosely coupled principles, supports both structured and unstructured data storage and can be integrated with third-party data platforms, such as the Hadoop big data platform, through data adapters.

In summary, the method of the present invention has the following technical advantages:

2. In the method of the present invention for transmitting data fusion features of heterogeneous devices in the Internet of Vehicles to the cloud, the cloud parses and converts the fusion feature data of the vehicle-mounted edge device, restores it to the preset text, vibration and audio data formats, and stores the working data of the device. A corresponding device identifier and data model are set for each vehicle-mounted edge device, which makes it easy to tell which device the stored heterogeneous text, audio and vibration data belongs to and provides data support for later applications such as equipment operation and maintenance.

Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Those of ordinary skill in the art should understand that modifications or equivalent substitutions of the technical solution that do not depart from its purpose and scope are covered by the claims of the present invention.

Claims

1. The cloud uploading method for data fusion characteristic transmission of the heterogeneous equipment of the Internet of vehicles is characterized by comprising the following steps of:

S1, constructing a device identifier and a data model corresponding to each vehicle-mounted side device in the cloud end Internet of things; the equipment model corresponding to the vehicle-mounted side equipment is used for identifying the individual identity of the vehicle-mounted side equipment; the data model corresponding to the vehicle-mounted side equipment is used for recording working data of the corresponding vehicle-mounted side equipment;

S2, arranging edge acquisition computing devices in a preset data acquisition area, wherein each edge acquisition computing device can be connected with a plurality of vehicle-mounted side end devices in a data communication mode, and a data transmission channel is formed between the edge acquisition computing devices and a cloud end, so that working data acquired by the vehicle-mounted side end devices can be transmitted to the cloud end through the edge acquisition computing devices;

S3, the vehicle-mounted side equipment collects working data and transmits the working data to an edge collection computing device which establishes data communication connection; the working data comprise text data, vibration data and audio data of the vehicle-mounted side-end equipment;

S4, the edge acquisition computing device performs feature extraction and unsupervised fusion processing on text data, vibration data and audio data in the working data of the vehicle-mounted side equipment to obtain fusion feature data of the vehicle-mounted side equipment, and the fusion feature data is uploaded to a cloud end through a data transmission channel;

S5, the cloud end analyzes and converts the fusion characteristic data of the vehicle-mounted side equipment, and the obtained text recovery data, vibration recovery data and audio recovery data of the vehicle-mounted side equipment are stored as work analysis data into a data model corresponding to the vehicle-mounted side equipment.

2. The method for cloud computing transmission of data fusion features of heterogeneous equipment of the internet of vehicles according to claim 1, wherein in the step S1, a data storage space for the whole machine and parts thereof of the vehicle-mounted side equipment is provided in a data model corresponding to the vehicle-mounted side equipment, and the data storage space is used for respectively storing working data of the whole machine and working data of the parts thereof of the vehicle-mounted side equipment.

3. The method for cloud computing for data fusion feature transmission of heterogeneous internet of vehicles according to claim 1, wherein the step S2 comprises:

S201, establishing equipment access gateways between a cloud end and edge acquisition computing devices, wherein each equipment access gateway is used as a data access point of one or more edge acquisition computing devices;

S202, a data access channel is established in the cloud end and used as a data transmission channel between the cloud end and the equipment access gateway, so that working data of the vehicle-mounted side equipment acquired by the edge acquisition computing device can be uploaded to the cloud end through the equipment access gateway and the data access channel.

4. The method for cloud computing for data fusion feature transmission of heterogeneous internet of vehicles according to claim 1, wherein the step S4 comprises:

S401, carrying out data communication on the vehicle-mounted side equipment and an edge acquisition and calculation device, and acquiring working data of the vehicle-mounted side equipment through the edge acquisition and calculation device;

S402, preprocessing word segmentation, sentence segmentation and stop word removal of text data in the working data by an edge acquisition computing device;

S403, carrying out word embedding feature extraction on the preprocessed text data by the edge acquisition computing device to obtain word feature vectors of the text data;

S404, the edge acquisition computing device performs slice segmentation processing on vibration data and audio data in the working data according to time sequences to obtain vibration data slice vectors and audio data slice vectors;

S405, performing unsupervised dimension reduction fusion processing on the word feature vector, the vibration data slice vector and the audio data slice vector of the text data by the edge acquisition computing device to obtain fusion feature data of the vehicle-mounted side equipment.

5. The method for cloud computing transmission of heterogeneous equipment data fusion features of the internet of vehicles according to claim 4, wherein in the step S401, the vehicle-mounted side equipment is in data connection with an edge acquisition computing device capable of establishing data communication, the data processing resource occupancy rate of the edge acquisition computing device is obtained, one edge acquisition computing device with the data processing resource occupancy rate smaller than a preset resource occupancy threshold is selected, and the working data of the vehicle-mounted side equipment is uploaded.

6. The method for transmitting data fusion features of heterogeneous devices in Internet of Vehicles to the cloud according to claim 4, wherein in step S402, the edge acquisition computing device uses a BERT model pre-trained for text preprocessing to perform word segmentation, sentence segmentation and stop-word removal on the text data in the working data.
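The three preprocessing operations named in claim 6 can be illustrated with a simplified stand-in. The claim uses a pre-trained BERT model; the sketch below instead uses regular expressions and a small hypothetical stop-word list, purely to show the order of operations (sentence segmentation, word segmentation, stop-word removal). It handles only simple English text and would need a proper tokenizer for Chinese.

```python
import re
from typing import List

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to"}


def preprocess(text: str) -> List[List[str]]:
    """Split text into sentences, then tokens, and drop stop words."""
    # Sentence segmentation on common English and Chinese end punctuation.
    sentences = [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]
    result = []
    for sentence in sentences:
        # Word segmentation: lowercase alphanumeric runs (a stand-in for a tokenizer).
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        kept = [t for t in tokens if t not in STOP_WORDS]
        if kept:
            result.append(kept)
    return result


log = "The engine temperature is high. Brake pressure dropped to 2 bar!"
tokens = preprocess(log)
```

The result is a list of sentences, each a list of retained tokens, so sentence boundaries survive preprocessing and remain available to the later feature extraction step.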

7. The method for transmitting data fusion features of heterogeneous devices in Internet of Vehicles to the cloud according to claim 4, wherein in step S403, the edge acquisition computing device performs word embedding feature extraction on Chinese text data by using a pre-trained Bert-base-chinese model, and on English text data by using a pre-trained Bert-base-uncased model; the obtained word feature vector of the text data is characterized as a matrix vector of $\left(\sum_{i=1}^{m} N_i\right) \times N_x$ dimensions, wherein a single sentence in the text data is represented as an $N_i \times N_x$-dimensional vector, a single character is represented as a $1 \times N_x$-dimensional vector, $N_x$ represents the feature dimension of feature extraction, $N_i$ represents the number of characters in the $i$-th sentence of the text data, and $m$ represents the number of sentences obtained after sentence segmentation of the text data, so that the word feature vector contains not only the character information and global semantic information of the characters in the text data, but also the position information of each sentence and character.
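The shape bookkeeping in claim 7 can be sketched as below. The embedding function is a hash-based placeholder and not a BERT embedding (a real BERT-base model produces contextual 768-dimensional vectors); it is used only to show how per-character rows of width N_x stack into a matrix with one row per character across all m sentences. The names `embed_char` and `word_feature_matrix` and the value `N_X = 8` are hypothetical.

```python
import hashlib
from typing import List

N_X = 8  # feature dimension N_x; kept small for the demo


def embed_char(ch: str, n_x: int = N_X) -> List[float]:
    """Placeholder 1 x N_x embedding derived from a hash (NOT a BERT embedding)."""
    digest = hashlib.sha256(ch.encode("utf-8")).digest()
    return [digest[k] / 255.0 for k in range(n_x)]


def word_feature_matrix(sentences: List[str]) -> List[List[float]]:
    """Stack per-character rows sentence by sentence.

    Sentence i contributes N_i rows, so the result has sum(N_i) rows and
    N_x columns. Row order preserves the position of every sentence and character.
    """
    rows = []
    for sentence in sentences:
        for ch in sentence:
            rows.append(embed_char(ch))
    return rows


sentences = ["oil low", "check brake"]  # N_1 = 7, N_2 = 11 (spaces counted)
features = word_feature_matrix(sentences)
shape = (len(features), len(features[0]))  # (N_1 + N_2, N_x)
```

Because rows are appended in reading order, the sentence lengths N_i are enough to locate any character's row later, which is the position information the claim relies on.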

8. The method for transmitting data fusion features of heterogeneous devices in Internet of Vehicles to the cloud according to claim 4, wherein in step S404, the edge acquisition computing device reads the vibration data and the audio data in the working data respectively and performs normalization processing on them, and then performs slice segmentation processing on the vibration data and the audio data according to the preset feature dimension $N_x$ of feature extraction; the obtained vibration data slice vector is characterized as an $x \times N_x$-dimensional vector and the audio data slice vector as a $y \times N_x$-dimensional vector, wherein $x$ and $y$ respectively represent the numbers of slice segmentation fragments of the vibration data and the audio data, so that the vibration data slice vector and the audio data slice vector contain not only the amplitude feature information of the vibration data and the audio data, respectively, but also the position information of each slice segmentation fragment.
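A minimal sketch of the normalize-then-slice step in claim 8 follows. Two details are assumptions, since the claim does not fix them: min-max scaling is used as the normalization, and a final short slice is zero-padded to length N_x. The function names are hypothetical.

```python
from typing import List


def normalize(signal: List[float]) -> List[float]:
    """Min-max normalize samples to [0, 1]; a constant signal maps to all zeros."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        return [0.0 for _ in signal]
    return [(s - lo) / (hi - lo) for s in signal]


def slice_signal(signal: List[float], n_x: int) -> List[List[float]]:
    """Cut a normalized signal into consecutive rows of length n_x.

    The last row is zero-padded when the length is not a multiple of n_x
    (an assumption; the claim does not say how a remainder is handled).
    """
    norm = normalize(signal)
    rows = [norm[i:i + n_x] for i in range(0, len(norm), n_x)]
    if rows and len(rows[-1]) < n_x:
        rows[-1] = rows[-1] + [0.0] * (n_x - len(rows[-1]))
    return rows


vibration = [0.0, 2.0, 4.0, 1.0, 3.0, 4.0, 2.0, 0.0, 1.0, 3.0]
vib_slices = slice_signal(vibration, n_x=4)  # x = 3 rows, each of width N_x = 4
```

Slicing in time order means row k covers samples k*N_x through (k+1)*N_x - 1, so the row index itself carries the position information of each fragment.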

9. The method for transmitting data fusion features of heterogeneous devices in Internet of Vehicles to the cloud according to claim 4, wherein in step S405, the edge acquisition computing device performs splicing fusion processing on the word feature vector of the text data, the vibration data slice vector and the audio data slice vector to obtain a working data feature matrix vector, and then inputs the working data feature matrix vector into a pre-trained Autoencoder model for unsupervised dimension-reduction encoding processing to obtain the fusion feature data of the vehicle-mounted side equipment; wherein the working data feature matrix vector is characterized as a matrix vector of $\left(\sum_{i=1}^{m} N_i + x + y\right) \times N_x$ dimensions, $N_x$ represents the feature dimension of feature extraction, $N_i$ represents the number of characters in the $i$-th sentence of the text data, $m$ represents the number of sentences obtained after sentence segmentation of the text data, and $x$ and $y$ respectively represent the numbers of slice segmentation fragments of the vibration data and the audio data.
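The splicing and dimension-reduction step of claim 9 can be sketched as below, under loud assumptions: the encoder here is a single fixed linear projection standing in for the encoder half of a trained Autoencoder, its weights `W_ENC` are placeholders rather than learned values, and compressing each row from N_x to a smaller code width is one possible reading of "dimension reduction" that the claim does not spell out.

```python
from typing import List

Matrix = List[List[float]]


def splice(text_rows: Matrix, vib_rows: Matrix, audio_rows: Matrix) -> Matrix:
    """Stack the three blocks row-wise: (sum N_i + x + y) rows, N_x columns."""
    return text_rows + vib_rows + audio_rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product, enough for a tiny demo."""
    return [[sum(r[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for r in a]


# Placeholder encoder weights (N_x = 4 -> code width 2). A real system would load
# weights learned by training the autoencoder; these just average column pairs.
W_ENC = [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5], [0.0, 0.5]]

text_rows = [[0.1, 0.3, 0.5, 0.7], [0.2, 0.2, 0.4, 0.4]]  # sum(N_i) = 2
vib_rows = [[0.0, 0.5, 1.0, 0.25]]                          # x = 1
audio_rows = [[1.0, 1.0, 0.0, 0.0]]                         # y = 1

work_matrix = splice(text_rows, vib_rows, audio_rows)  # 4 x 4
fused = matmul(work_matrix, W_ENC)                     # 4 x 2, half as many values
```

In this toy setting the fused matrix holds 8 values instead of 16, which illustrates the bandwidth saving the method aims for; the actual compression ratio depends on the trained model.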

10. The method for transmitting data fusion features of heterogeneous devices in Internet of Vehicles to the cloud according to claim 9, wherein in step S5, the cloud performs Autoencoder decoding processing on the fusion feature data of the vehicle-mounted side equipment, and parses the decoded result to obtain the $\left(\sum_{i=1}^{m} N_i + x + y\right) \times N_x$-dimensional working data feature matrix vector of the vehicle-mounted side equipment; the working data feature matrix vector is decomposed to obtain the $\left(\sum_{i=1}^{m} N_i\right) \times N_x$-dimensional word feature vector of the text data, the $x \times N_x$-dimensional vibration data slice vector and the $y \times N_x$-dimensional audio data slice vector; feature recombination is then performed on the word features of the text data and on the slice segmentation fragments of the vibration data and the audio data, respectively, according to the position information contained in the word feature vector of the text data, the vibration data slice vector and the audio data slice vector; data format conversion processing is performed on the respective recombination results according to a preset text data format, vibration data format and audio data format to obtain text recovery data, vibration recovery data and audio recovery data of the vehicle-mounted side equipment; and finally, the text recovery data, vibration recovery data and audio recovery data are taken as the work analysis data of the corresponding vehicle-mounted side equipment and stored into the data model corresponding to that vehicle-mounted side equipment.
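The decomposition in claim 10 amounts to undoing the row-wise splice. The sketch below assumes the cloud knows the sentence lengths N_i and the slice counts x and y (for example from accompanying metadata, which is a hypothetical detail not stated in the claim) and that the Autoencoder decoding has already produced the full matrix. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_work_matrix(matrix: Matrix, sentence_lengths: List[int], x: int, y: int
                      ) -> Tuple[List[Matrix], Matrix, Matrix]:
    """Undo the row-wise splice using the known block sizes.

    Returns per-sentence text blocks (N_i rows each), the x vibration rows
    and the y audio rows, all in their original order.
    """
    n_text = sum(sentence_lengths)
    if len(matrix) != n_text + x + y:
        raise ValueError("row count does not match sum(N_i) + x + y")
    sentences, start = [], 0
    for n_i in sentence_lengths:
        sentences.append(matrix[start:start + n_i])
        start += n_i
    vibration = matrix[n_text:n_text + x]
    audio = matrix[n_text + x:]
    return sentences, vibration, audio


def flatten(rows: Matrix) -> List[float]:
    """Concatenate slice rows back into one time-ordered sample sequence."""
    return [v for row in rows for v in row]


# Stand-in for a decoded matrix: 6 rows = (N_1 + N_2) + x + y = (2 + 1) + 2 + 1.
decoded = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [5.0, 5.0], [6.0, 6.0]]
sents, vib, aud = split_work_matrix(decoded, sentence_lengths=[2, 1], x=2, y=1)
vib_samples = flatten(vib)
```

Converting the recovered text rows back to characters and the sample sequences back to the preset file formats is left out here, since the claim does not specify those formats.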