CN115618265A - Data integration method and system based on big data and edge computing

Info

Publication number: CN115618265A
Application number: CN202211374271.4A
Authority: CN
Inventors: 罗建
Original assignee: Guangzhou Mofan Network Technology Co ltd
Current assignee: Guangzhou Mofan Network Technology Co ltd
Priority date: 2022-11-04
Filing date: 2022-11-04
Publication date: 2023-01-17

Abstract

The invention relates to data processing technology and discloses a data integration method and system based on big data and edge computing. The method comprises: acquiring edge nodes of a client and collecting data with the edge nodes to obtain data to be processed; standardizing the data to be processed to obtain standard data; performing feature analysis on the standard data to obtain an optimal feature subset; performing integrated trust calculation on the optimal feature subset to obtain a trust interval; and performing feature synthesis on the standard data according to the trust interval to obtain integrated data. The invention can improve the efficiency of data integration.

Description

Data integration method and system based on big data and edge computing

Technical Field

The present invention relates to the technical field of data processing, and in particular to a data integration method and system based on big data and edge computing.

Background

In the early stages of enterprise informatization, the lack of unified planning led to information systems built on different core technologies. Because these systems hold many types of data at very large scale, enterprise information could not be circulated and used effectively. As the need for enterprise information integration emerged, data integration has continued to develop as a concept and method of resource integration. Most existing technologies perform data integration with the traditional cloud computing model, which must decompose multiple data computation programs before analyzing and processing them. This places high demands on resource allocation, and integrating big data often overloads the traditional cloud computing model and slows data integration. In summary, the prior art suffers from low data integration efficiency.

Summary of the Invention

The present invention provides a data integration method and system based on big data and edge computing, whose main purpose is to solve the problem of low data integration efficiency.

To achieve the above purpose, the present invention provides a data integration method based on big data and edge computing, comprising:

acquiring an edge node of a client, and collecting data with the edge node to obtain data to be processed;

standardizing the data to be processed to obtain standard data;

performing feature analysis on the standard data to obtain an optimal feature subset;

performing integrated trust calculation on the optimal feature subset to obtain a trust interval;

performing feature synthesis on the standard data according to the trust interval to obtain integrated data.

Optionally, collecting data with the edge node to obtain the data to be processed comprises:

installing edge node devices at the edge node to obtain a plurality of edge data sensors;

connecting the edge node and the cloud center through the edge data sensors to obtain a data transmission channel;

receiving, by the edge data sensors, target data over the data transmission channel;

storing the target data at the edge node to obtain the data to be processed.

Optionally, standardizing the data to be processed to obtain standard data comprises:

determining the form in which the data to be processed exists;

if the data to be processed is determined to be an array, executing preset transformation scheme one to obtain first data;

if the data to be processed is determined to be a matrix, executing preset transformation scheme two to obtain second data;

aggregating the first data and the second data to obtain the standard data.

Optionally, executing preset transformation scheme one to obtain the first data comprises:

performing multidimensional decomposition on the data to be processed to obtain a plurality of sample vectors;

calculating the mean and standard deviation of the sample vectors;

calculating the first data from the sample vectors according to the mean and standard deviation, using the following formula:

$$Z = \frac{X - M}{S}$$

where $Z$ denotes the first data; $X$ denotes a sample vector; $M$ denotes the mean of the sample vectors; and $S$ denotes the standard deviation of the sample vectors.
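
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes the mean $M$ and standard deviation $S$ are taken per dimension across the $n$ sample vectors, uses the population standard deviation, and maps constant dimensions to 0.0; the function name `standardize_samples` is hypothetical.

```python
from statistics import mean, pstdev

def standardize_samples(samples):
    """Apply Z = (X - M) / S per dimension across n sample vectors."""
    dims = list(zip(*samples))            # regroup the values by dimension
    means = [mean(d) for d in dims]       # M: one mean per dimension
    stds = [pstdev(d) for d in dims]      # S: one population std per dimension
    return [
        # Assumption: a dimension with zero deviation is mapped to 0.0.
        [(x - m) / s if s else 0.0 for x, m, s in zip(vec, means, stds)]
        for vec in samples
    ]

samples = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
z = standardize_samples(samples)
```

After this step both dimensions are on the same scale even though the raw values differ by a factor of ten, which is the effect the standardization step is meant to achieve.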

Optionally, executing preset transformation scheme two to obtain the second data comprises:

extracting column vectors from the data to be processed to obtain a plurality of column vectors;

calculating the mean and standard deviation of each column vector in the feature matrix, using the following formula:

[formula not preserved in the source text]

where, besides two symbols that appeared only in the formula (one for the standardized data value at row $i$, column $j$ of the data to be processed, $i = 1, 2, \ldots, n$, and one for the mean of column $j$ of the data to be processed), $H_{ij}$ denotes the element at row $i$, column $j$ of the data to be processed; $L_j$ denotes the standard deviation of column $j$ of the data to be processed; and $n$ denotes the number of rows in the data to be processed;

processing the column vectors one by one using their means and standard deviations to obtain the second data.
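
The formula for this step does not survive in the text. A plausible reconstruction, assuming the usual column-wise z-score that parallels $Z = (X - M)/S$ in scheme one, is given below. The symbols $\bar{H}_j$ (column mean) and $Z_{ij}$ (standardized value) are placeholders chosen here, and the original may divide by $n - 1$ rather than $n$:

```latex
\bar{H}_j = \frac{1}{n}\sum_{i=1}^{n} H_{ij}, \qquad
L_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(H_{ij}-\bar{H}_j\right)^{2}}, \qquad
Z_{ij} = \frac{H_{ij}-\bar{H}_j}{L_j}
```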

Optionally, performing integrated trust calculation on the optimal feature subset to obtain a trust interval comprises:

calculating a trust function value of the optimal feature subset, using the following formula:

$$B : 2^{v} \to [0, 1]$$

where $B(A)$ denotes the trust function value of the optimal feature subset $A$; $v$ denotes the preset model frame of the feature subsets; and $m(Q)$ denotes the probability function over all subsets $Q$ of the optimal feature subset $A$;

calculating a likelihood function value of the optimal feature subset from the trust function value, using the following formula:

$$P(A) = 1 - B(\bar{A}) = \sum_{Q \cap A \neq \varnothing} m(Q)$$

where $P(A)$ denotes the likelihood function value of the optimal feature subset $A$; $B(\bar{A})$ denotes the trust function value of the complement $\bar{A}$ of $A$ within the frame $v$; and $m(Q)$ denotes the probability function of the subsets $Q$ that intersect $A$;

expressing the trust function value and the likelihood function value together to obtain the trust interval.
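
The description later names D-S (Dempster-Shafer) evidence theory as the source of the synthesis rule. In that theory the two quantities above are usually written as follows. This is a hedged restatement in the document's symbols, assuming $m$ is a basic probability assignment on the frame $v$ and $\bar{A}$ is the complement of $A$ in $v$:

```latex
B(A) = \sum_{Q \subseteq A} m(Q), \qquad
P(A) = 1 - B(\bar{A}) = \sum_{Q \cap A \neq \varnothing} m(Q), \qquad
B(A) \le P(A)
```

Under that reading the trust interval $[B(A), P(A)]$ is always well formed, because every subset contained in $A$ also intersects $A$.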

Optionally, performing feature analysis on the standard data to obtain an optimal feature subset comprises:

performing feature extraction on the standard data to obtain a feature set;

performing weight evaluation on the feature set, and obtaining the optimal feature subset from the result of the weight evaluation.

Optionally, performing weight evaluation on the feature set and obtaining the optimal feature subset from the result comprises:

calculating weights for the feature set to obtain the feature weights of the feature set;

calculating the dimension difference degree of the feature set from the feature weights, using the following formula:

[formula not preserved in the source text]

where $\mathrm{diff}(a_1, a_2, i)$ denotes the dimension difference degree between feature data $a_1$ and feature data $a_2$ of the feature set in the $i$-th dimension; $a_{1i}$ and $a_{2i}$ denote the values of the individual data in the $i$-th dimension from which the difference is taken; $\max(f_i)$ denotes the maximum of the feature weights corresponding to the feature data contained in the $i$-th dimension; and $\min(f_i)$ denotes the minimum of the feature weights corresponding to the feature data contained in the $i$-th dimension;

adjusting the feature set according to the dimension difference degree to obtain the optimal feature subset.
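
The dimension difference formula itself does not survive in the text. The symbol definitions match the per-dimension difference used in Relief-style feature weighting, so one plausible reconstruction (an assumption, not confirmed by the source) is:

```latex
\mathrm{diff}(a_1, a_2, i) = \frac{\lvert a_{1i} - a_{2i} \rvert}{\max(f_i) - \min(f_i)}
```

With this form the difference is normalized by the spread of dimension $i$, so values from dimensions with different ranges become comparable against a single threshold.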

Optionally, performing feature synthesis on the standard data according to the trust interval to obtain integrated data comprises:

screening the standard data with the trust interval as a constraint to obtain valid data;

combining the valid data according to a preset synthesis rule to obtain the integrated data.

To solve the above problem, the present invention also provides a data integration system based on big data and edge computing, the system comprising:

a to-be-processed data collection module, configured to acquire an edge node of a client and collect data with the edge node to obtain data to be processed;

a standardization module, configured to standardize the data to be processed to obtain standard data;

a feature subset generation module, configured to perform feature analysis on the standard data to obtain an optimal feature subset;

a trust calculation module, configured to perform integrated trust calculation on the optimal feature subset to obtain a trust interval;

an integrated data generation module, configured to perform feature synthesis on the standard data according to the trust interval to obtain integrated data.

The embodiments of the present invention provide a data integration method and system based on big data and edge computing. Because data is collected by edge nodes, data generated by a data source no longer needs to be transmitted to a cloud data center for processing; analysis and processing are completed nearby, at the client edge, which improves the efficiency of data transmission. Standardizing the data to be processed yields standard data, which removes the influence of the data's own variation and magnitude and improves the accuracy of data processing. Feature analysis of the standard data yields an optimal feature subset, which avoids high-dimensional data processing, reduces the number of features and the processing time, and improves the efficiency of data integration. The data integration method, system, electronic device and computer-readable storage medium based on big data and edge computing proposed by the present invention can therefore solve the problem of low efficiency in data integration.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a data integration method based on big data and edge computing provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of collecting data with the edge node to obtain data to be processed, provided by an embodiment of the present invention;

FIG. 3 is a schematic flowchart of performing integrated trust calculation on the optimal feature subset to obtain a trust interval, provided by an embodiment of the present invention;

FIG. 4 is a functional block diagram of a data integration system based on big data and edge computing provided by an embodiment of the present invention.

The realization of the purpose, the functional characteristics and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it.

An embodiment of the present application provides a data integration method based on big data and edge computing. The method may be executed by, without limitation, at least one electronic device, such as a server or a terminal, that can be configured to execute the method provided by the embodiment of the present application. In other words, the method can be executed by software or hardware installed on a terminal device or a server device, and the software may be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server or a cloud server cluster. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.

FIG. 1 is a schematic flowchart of a data integration method based on big data and edge computing provided by an embodiment of the present invention. In this embodiment, the method comprises:

S1. Acquire an edge node of the client, and collect data with the edge node to obtain data to be processed.

In this embodiment, the client includes a router, through which data can travel between the cloud and the data source. The edge is a relative concept: it refers to any computing, storage and network resources on the path from the data source to the cloud computing center. An edge node is therefore any node between the source of the data and the cloud center that has computing and network resources. For example, a mobile phone can be an edge node between a person and the cloud center, and a gateway can be an edge node between a smart home and the cloud center.

Referring to FIG. 2, in this embodiment, collecting data with the edge node to obtain the data to be processed comprises:

S21. Install edge node devices at the edge node to obtain a plurality of edge data sensors;

S22. Connect the edge node and the cloud center through the edge data sensors to obtain a data transmission channel;

S23. The edge data sensors receive target data over the data transmission channel;

S24. Store the target data at the edge node to obtain the data to be processed.

In this embodiment, the edge data sensors sense the state of the edge node and collect the data the edge node requires. The cloud center guarantees the specifications and protocols governing interaction between the edge data sensors and the target data, and may take the form of a database, a file system, and so on. The data source defines location information and also encapsulates the interface through which the edge data sensors connect to it. The data transmission channel carries data generated by the data source to the edge data sensors, which can perform a preliminary screening of key states in the data, for example deleting blank values, and thereby reduce data processing time.
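
The preliminary screening mentioned above can be illustrated with a short Python sketch. It is only an assumption about what "deleting blank values" might look like: records are modeled as dictionaries, a value counts as blank when it is `None` or an empty or whitespace-only string, and a record with any blank value is dropped entirely. The names `prefilter`, `raw` and `pending` are hypothetical.

```python
def prefilter(records):
    """Drop records that contain blank values before storing them at the edge node."""
    def is_blank(value):
        return value is None or (isinstance(value, str) and value.strip() == "")
    return [r for r in records if not any(is_blank(v) for v in r.values())]

raw = [
    {"sensor": "s1", "temp": 21.5},
    {"sensor": "s2", "temp": None},   # blank reading
    {"sensor": "", "temp": 19.0},     # blank identifier
]
pending = prefilter(raw)   # data to be processed, kept at the edge node
```

Filtering at the sensor keeps incomplete records from ever reaching the later standardization and feature analysis steps.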

S2. Standardize the data to be processed to obtain standard data.

In this embodiment, standardizing the data to be processed comprises determining the form of the data to be processed, executing preset transformation scheme one if the form is an array or preset transformation scheme two if it is a matrix, and aggregating the resulting first data and second data into the standard data.

In this embodiment, because the data to be processed is a large volume of unstructured or structured data from various sources, the data sources are of many types. To improve the precision of standardization, a different transformation scheme can be set for each source data format, so that every source data format can be standardized.
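
The routing between the two transformation schemes can be sketched as below. The source does not say how the form of the data is detected, so this sketch assumes each batch already carries an explicit form tag (`"array"` or `"matrix"`); the function name `standardize` and the stand-in schemes are hypothetical.

```python
def standardize(batches, scheme_one, scheme_two):
    """Route each (form, payload) batch to its scheme and aggregate the results."""
    first_data, second_data = [], []
    for form, payload in batches:
        if form == "array":
            first_data.append(scheme_one(payload))
        elif form == "matrix":
            second_data.append(scheme_two(payload))
        else:
            raise ValueError(f"unsupported data form: {form!r}")
    return first_data + second_data   # aggregated standard data

# Stand-in schemes so the routing can be run on its own.
result = standardize(
    [("array", [1, 2]), ("matrix", [[1, 2], [3, 4]])],
    scheme_one=lambda p: ("scheme one", p),
    scheme_two=lambda p: ("scheme two", p),
)
```

Passing the schemes in as parameters keeps the dispatch independent of how each scheme standardizes its input, so a further source format would only need one more branch.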

In this embodiment, executing preset transformation scheme one to obtain the first data comprises decomposing the data to be processed into sample vectors, calculating the mean and standard deviation of the sample vectors, and calculating the first data from them.

In this embodiment, the data to be processed may be a multidimensional array, for example $X = [X_1, X_2, X_3, \ldots, X_n]$ ($n$ is the number of samples), where $X_1, X_2, X_3, \ldots, X_n$ are multidimensional sample vectors; for example, $X_1 = [X_{11}, X_{21}, \ldots, X_{d1}]$ ($d$ is the number of dimensions). That is, the multidimensional array $X$ contains $n$ sample vectors, each with $d$ components.

In this embodiment, the sample vectors can be calculated from the mean and standard deviation using the following formula:

$$Z = \frac{X - M}{S}$$

In this embodiment, executing preset transformation scheme two to obtain the second data comprises extracting column vectors from the data to be processed, calculating the mean and standard deviation of each column vector, and processing each column vector with its mean and standard deviation.

In this embodiment, the mean and standard deviation of the column vectors in the feature matrix can be calculated using the following formula:

[formula not preserved in the source text]

where, besides the symbol for the mean of column $j$ of the data to be processed (which appeared only in the formula), $L_j$ denotes the standard deviation of column $j$ of the data to be processed, and $n$ denotes the number of rows in the data to be processed.

In this embodiment, the data to be processed is a multidimensional matrix; extracting its column vectors reduces the dimensionality of the data to be processed and simplifies processing. The second data is a standard matrix, each column of which satisfies a condition whose formula is not preserved in the source text.
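
A column-wise version of the standardization can be sketched as follows. It assumes a row-major list of lists, the population standard deviation, and that the condition each column of the standard matrix satisfies is a mean of 0 and a standard deviation of 1 (the usual outcome of z-score standardization; the source's own condition is not preserved). The name `standardize_columns` is hypothetical.

```python
from statistics import mean, pstdev

def standardize_columns(matrix):
    """Standardize each column of a row-major matrix to zero mean and unit deviation."""
    columns = [list(col) for col in zip(*matrix)]      # column vector extraction
    scaled = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        # Assumption: a constant column (s == 0) is mapped to 0.0.
        scaled.append([(h - m) / s if s else 0.0 for h in col])
    return [list(row) for row in zip(*scaled)]         # back to row-major

H = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
Z = standardize_columns(H)
col_means = [mean(c) for c in zip(*Z)]
col_stds = [pstdev(c) for c in zip(*Z)]
```

Checking `col_means` and `col_stds` afterwards is a cheap way to confirm that every column of the result meets the assumed condition.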

S3. Perform feature analysis on the standard data to obtain an optimal feature subset.

In this embodiment, performing feature analysis on the standard data comprises extracting features from the standard data to obtain a feature set, and performing weight evaluation on the feature set to obtain the optimal feature subset.

In this embodiment, feature extraction may use a classifier to extract features from the standard data online. For example, suppose the set of standard data is $D = (b_t, c_t)$, $t = 1, \ldots, T$, where $T$ is the number of elements and $b_t$ and $c_t$ are multidimensional vectors in the standard data. Classifying the elements in order, a binary classifier can obtain the feature set through the linear function $\operatorname{sgn}(w_t^{\mathsf{T}} x_t)$, where $w_t$ is the classifier. The set of standard data may also be $D = (b_t, c_t, e_t)$, where $b_t$, $c_t$ and $e_t$ are all multidimensional vectors in the standard data, in which case a three-class classifier can extract features through a linear function to obtain the feature set, and so on. The classifier is determined by the number of categories of multidimensional vectors in the set of standard data.
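
The decision function $\operatorname{sgn}(w_t^{\mathsf{T}} x_t)$ can be illustrated with a small online linear classifier. The source gives only the decision function; the perceptron-style update below, the convention that `sgn(0)` is `+1`, and all names (`predict`, `online_update`, `stream`) are assumptions added for illustration, and the sketch does not show how the feature set is derived from the classifier.

```python
def sgn(value):
    # Assumption: zero is treated as the positive class.
    return 1 if value >= 0 else -1

def predict(w, x):
    """Binary decision sgn(w^T x)."""
    return sgn(sum(wi * xi for wi, xi in zip(w, x)))

def online_update(w, x, label, lr=1.0):
    """Perceptron-style step: change w only when the current prediction is wrong."""
    if predict(w, x) != label:
        return [wi + lr * label * xi for wi, xi in zip(w, x)]
    return w

w = [0.0, 0.0]
stream = [([1.0, 2.0], 1), ([2.0, -1.0], -1), ([0.5, 1.5], 1)]
for x_t, y_t in stream:        # elements are processed in order, one at a time
    w = online_update(w, x_t, y_t)
```

Because each element is seen once and in order, the classifier $w_t$ evolves with $t$, which is what makes the extraction "online".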

In this embodiment, performing weight evaluation on the feature set and obtaining the optimal feature subset from the result comprises calculating the feature weights of the feature set, calculating the dimension difference degree of the feature set from the feature weights, and adjusting the feature set according to the dimension difference degree.

In this embodiment, the weights may be calculated with the precedence chart method, based on the relative magnitudes of the data in the feature sets. For example, let 0 mean relatively unimportant, 1 mean relatively more important, and 0.5 mean equally important; if two of the feature sets are equally important, each is given 0.5 points. After every feature set has been compared in one round, the scores of each feature set are totalled to obtain the feature weight corresponding to that feature set.
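
The scoring round described above can be sketched as a pairwise comparison. The sketch assumes each feature set is summarized by a single comparable importance value and normalizes the totals so the weights sum to 1; the normalization, the function name `precedence_weights` and the example values are assumptions rather than details from the source.

```python
def precedence_weights(importance):
    """Score every pair (1 = more important, 0.5 = equal, 0 = less) and total per item."""
    names = list(importance)
    scores = {name: 0.0 for name in names}
    for a in names:
        for b in names:
            if a == b:
                continue
            if importance[a] > importance[b]:
                scores[a] += 1.0
            elif importance[a] == importance[b]:
                scores[a] += 0.5
    # Assumption: totals are normalized; guard against an all-zero total.
    total = sum(scores.values()) or 1.0
    return {name: s / total for name, s in scores.items()}

weights = precedence_weights({"f1": 3, "f2": 3, "f3": 1})
```

Here `f1` and `f2` tie with each other and both beat `f3`, so each totals 1.5 points and receives half of the weight, while `f3` receives none.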

In this embodiment, the dimension difference degree of the feature set can be calculated using the following formula:

[formula not preserved in the source text]

where $\mathrm{diff}(a_1, a_2, i)$ denotes the dimension difference degree between feature data $a_1$ and feature data $a_2$ of the feature set in the $i$-th dimension; $a_{1i}$ and $a_{2i}$ denote the values of the individual data in the $i$-th dimension from which the difference is taken; $\max(f_i)$ denotes the maximum of the feature weights corresponding to the feature data contained in the $i$-th dimension; and $\min(f_i)$ denotes the minimum of the feature weights corresponding to the feature data contained in the $i$-th dimension.

In this embodiment, adjusting the feature set means screening by the value of the dimension difference degree: feature sets whose dimension difference degree is below a preset threshold are retained, and the threshold may be 0.2. The optimal subset consists of the $k$ feature sets ranked last by dimension difference degree, where $k$ is typically 6.
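
The screening rule can be sketched as below. Two assumptions are made and should be read as such: the difference degree takes the Relief-style form $|a_{1i} - a_{2i}| / (\max(f_i) - \min(f_i))$, since the source formula is not preserved, and "ranked last" is read as the smallest values of a descending ranking. The names `diff`, `select_optimal` and the example degrees are hypothetical.

```python
def diff(a1, a2, i, f_max, f_min):
    """Assumed Relief-style difference of two feature data in dimension i."""
    span = f_max[i] - f_min[i]
    return abs(a1[i] - a2[i]) / span if span else 0.0

def select_optimal(degrees, threshold=0.2, k=6):
    """Keep candidates below the threshold, then take the k smallest degrees."""
    kept = [(name, d) for name, d in degrees.items() if d < threshold]
    kept.sort(key=lambda item: item[1])
    return [name for name, _ in kept[:k]]

example = diff([1.0, 4.0], [3.0, 4.0], 0, f_max=[5.0, 1.0], f_min=[1.0, 0.0])
degrees = {"s1": 0.05, "s2": 0.30, "s3": 0.12, "s4": 0.19}
best = select_optimal(degrees, k=2)
```

With the default threshold of 0.2, `s2` is rejected outright, and of the three survivors the two with the smallest difference degree are kept.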

S4. Perform integrated trust calculation on the optimal feature subset to obtain a trust interval.

Referring to FIG. 3, in this embodiment, performing integrated trust calculation on the optimal feature subset to obtain a trust interval comprises:

S31. Calculate the trust function value of the optimal feature subset;

S32. Calculate the likelihood function value of the optimal feature subset from the trust function value;

S33. Express the trust function value and the likelihood function value together to obtain the trust interval.

In this embodiment, the trust function value of the optimal feature subset can be calculated using the following formula:

$$B : 2^{v} \to [0, 1]$$

where $B(A)$ denotes the trust function value of the optimal feature subset $A$; $v$ denotes the preset model frame of the feature subsets; and $m(Q)$ denotes the probability function over all subsets $Q$ of the optimal feature subset $A$.

Specifically, the probability function gives the probability that a discrete random variable takes a particular value. For example, when calculating the probability of the number of heads obtained in 3 tosses of a fair coin, 3 is the feature value.

In this embodiment, the likelihood function value of the optimal feature subset can be calculated using the following formula:

$$P(A) = 1 - B(\bar{A}) = \sum_{Q \cap A \neq \varnothing} m(Q)$$

where $P(A)$ denotes the likelihood function value of the optimal feature subset $A$; $B(\bar{A})$ denotes the trust function value of the complement $\bar{A}$ of $A$ within the frame $v$; and $m(Q)$ denotes the probability function of the subsets $Q$ that intersect $A$.

Specifically, the likelihood function expresses the degree of trust in not denying $A$; it is the sum of the basic probability assignments of all feature subsets that intersect $A$.

In this embodiment, the trust interval may be expressed as $\mu(A) = [B(A), P(A)]$, where $\mu(A)$ denotes the trust interval of the optimal feature subset $A$; $B(A)$, the trust function value of $A$, is the lower limit of the trust interval; and $P(A)$, the likelihood function value of $A$, is the upper limit of the trust interval. For example, if the trust interval is (0.25, 0.85), the degree of trust that $A$ is true is 0.25, the degree of trust that $A$ is false is 0.15, and the degree of trust that $A$ is uncertain is 0.6.
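
The worked example (0.25, 0.85) can be reproduced with a short sketch. It assumes a two-element frame and a mass assignment chosen to match the example: 0.25 on "true", 0.15 on "false" and 0.60 on the whole frame (the uncertain part). Representing subsets as `frozenset` objects and the names `belief`, `plausibility` and `mass` are choices made here, not details from the source.

```python
def belief(mass, a):
    """B(A): total mass of the focal sets contained in A."""
    return sum(m for q, m in mass.items() if q <= a)

def plausibility(mass, a):
    """P(A): total mass of the focal sets that intersect A."""
    return sum(m for q, m in mass.items() if q & a)

frame = frozenset({"true", "false"})
mass = {
    frozenset({"true"}): 0.25,    # trust that A is true
    frozenset({"false"}): 0.15,   # trust that A is false
    frame: 0.60,                  # uncertain: mass left on the whole frame
}
A = frozenset({"true"})
interval = (belief(mass, A), plausibility(mass, A))
```

The lower bound counts only the mass committed to $A$, while the upper bound also counts the uncommitted mass, so the width of the interval equals the uncertain part.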

S5. Perform feature synthesis on the standard data according to the trust interval to obtain integrated data.

In this embodiment, performing feature synthesis on the standard data according to the trust interval comprises screening the standard data with the trust interval as a constraint to obtain valid data, and combining the valid data according to a preset synthesis rule to obtain the integrated data.

In this embodiment, screening the standard data means selecting the feature subsets, among those corresponding to the standard data, whose trust function value lies within the trust interval. For example, if feature subset $R$ has a trust function value of 0.56 and the trust interval is (0.25, 0.85), the value lies within the interval, so the standard data corresponding to $R$ is retained. The synthesis rule may be the combination rule of D-S evidence theory, a combination rule aimed mainly at multi-source information, which obtains the final result (the integrated data) by synthesizing feature information and using the upper and lower probability bounds (the trust interval) to solve the information fusion problem.
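
The two parts of this step can be sketched together: an interval check for the screening, and Dempster's rule of combination from D-S evidence theory for the synthesis. The sketch assumes mass functions keyed by `frozenset`, treats the interval bounds as inclusive, and uses made-up masses `m1` and `m2` for two sources; how the patent maps standard data onto mass functions is not stated, so that mapping is left out.

```python
def combine(m1, m2):
    """Dempster's rule: multiply masses, keep non-empty intersections, renormalize."""
    fused, conflict = {}, 0.0
    for q1, v1 in m1.items():
        for q2, v2 in m2.items():
            common = q1 & q2
            if common:
                fused[common] = fused.get(common, 0.0) + v1 * v2
            else:
                conflict += v1 * v2          # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {q: v / (1.0 - conflict) for q, v in fused.items()}

def within(interval, value):
    """Screening constraint (assumed inclusive at both ends)."""
    low, high = interval
    return low <= value <= high

T, F = frozenset({"true"}), frozenset({"false"})
TF = T | F
m1 = {T: 0.6, TF: 0.4}                 # illustrative masses from source 1
m2 = {T: 0.5, F: 0.3, TF: 0.2}         # illustrative masses from source 2
fused = combine(m1, m2)
keep_R = within((0.25, 0.85), 0.56)    # the feature subset R from the example
```

The renormalization by `1 - conflict` redistributes the mass that the two sources assign to contradictory subsets, so the fused masses again sum to 1.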

The present invention provides a data integration method and system based on big data and edge computing. Because data is collected by edge nodes, data generated by a data source no longer needs to be transmitted to a cloud data center for processing; analysis and processing are completed nearby, at the client edge, which improves the efficiency of data transmission. Standardizing the data to be processed yields standard data, which removes the influence of the data's own variation and magnitude and improves the accuracy of data processing. Feature analysis of the standard data yields an optimal feature subset, which avoids high-dimensional data processing, reduces the number of features and the processing time, and improves the efficiency of data integration. The data integration method based on big data and edge computing proposed by the present invention can therefore solve the problem of low data integration efficiency.

FIG. 4 is a functional block diagram of a data integration system based on big data and edge computing provided by an embodiment of the present invention.

The data integration system 100 based on big data and edge computing of the present invention may be installed in an electronic device. According to the functions implemented, the data integration system 100 may include a to-be-processed data collection module 101, a standardization processing module 102, a feature subset generation module 103, a trust calculation module 104, and an integrated data generation module 105. A module in the present invention, which may also be called a unit, refers to a series of computer program segments that can be executed by the processor of the electronic device, that perform a fixed function, and that are stored in the memory of the electronic device.

In this embodiment, the functions of the modules/units are as follows:

The to-be-processed data collection module 101 is configured to acquire edge nodes of a client and to collect data by using the edge nodes to obtain data to be processed;

The standardization processing module 102 is configured to perform standardization processing on the data to be processed to obtain standard data;

The feature subset generation module 103 is configured to perform feature analysis on the standard data to obtain an optimal feature subset;

The trust calculation module 104 is configured to perform integrated trust calculation on the optimal feature subset to obtain a trust interval;

The integrated data generation module 105 is configured to perform feature synthesis on the standard data according to the trust interval to obtain integrated data.
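The data flow between modules 101 to 105 can be sketched as a chain of callables. This is only an illustration of the order in which the modules hand data to one another; the class name, field names, and the toy stage functions below are hypothetical placeholders, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class DataIntegrationSystem:
    collect: Callable[[], Any]                 # module 101: collect data at the edge nodes
    standardize: Callable[[Any], Any]          # module 102: standardization processing
    select_features: Callable[[Any], Any]      # module 103: optimal feature subset
    trust_interval: Callable[[Any], Tuple]     # module 104: integrated trust calculation
    synthesize: Callable[[Any, Tuple], Any]    # module 105: feature synthesis

    def run(self) -> Any:
        raw = self.collect()
        standard = self.standardize(raw)
        subset = self.select_features(standard)
        interval = self.trust_interval(subset)
        # Module 105 works on the standard data, constrained by the trust interval.
        return self.synthesize(standard, interval)


# Toy stages, chosen only so that the pipeline can be executed end to end.
system = DataIntegrationSystem(
    collect=lambda: [1.0, 2.0, 3.0],
    standardize=lambda xs: [x - sum(xs) / len(xs) for x in xs],
    select_features=lambda xs: xs,
    trust_interval=lambda xs: (min(xs), max(xs)),
    synthesize=lambda xs, iv: [x for x in xs if iv[0] < x < iv[1]],
)
result = system.run()
```

Passing the stages in as callables mirrors the remark that a module is a program segment with a fixed function: each stage can be replaced without touching the others.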

In detail, when in use, the modules of the data integration system 100 based on big data and edge computing in this embodiment of the present invention employ the same technical means as the data integration method based on big data and edge computing described with reference to the accompanying drawings, and can produce the same technical effects, which are not repeated here.

This embodiment further provides an electronic device. The electronic device may include a processor, a memory, a communication bus, and a communication interface, and may further include a computer program that is stored in the memory and executable on the processor, such as a protection upgrade program based on information security big data.

In some embodiments, the processor may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor is the control core (Control Unit) of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing programs or modules stored in the memory (for example, the data integration program based on big data and edge computing) and by calling data stored in the memory.

The memory includes at least one type of readable storage medium, such as flash memory, a removable hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory may be an internal storage unit of the electronic device, such as a removable hard disk of the electronic device. In other embodiments, the memory may be an external storage device of the electronic device, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device. Further, the memory may include both an internal storage unit and an external storage device of the electronic device. The memory can be used not only to store application software installed on the electronic device and various kinds of data, such as the code of the data integration program based on big data and edge computing, but also to temporarily store data that has been output or is to be output.

The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to provide connection and communication between the memory and at least one processor, among other components.

The communication interface is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is generally used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a display or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display information processed in the electronic device and to display a visual user interface.

The data integration program based on big data and edge computing stored in the memory of the electronic device is a combination of multiple instructions which, when run by the processor, can implement the steps of the data integration method based on big data and edge computing described above.

Specifically, for the way in which the processor implements the above instructions, reference may be made to the description of the relevant steps in the embodiments corresponding to the accompanying drawings, which is not repeated here.

Further, if the modules/units integrated in the electronic device are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include any entity or system capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).

The present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the data integration method based on big data and edge computing described above.

These program codes may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of a flowchart.

Storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

In the several embodiments provided by the present invention, it should be understood that the disclosed devices, systems, and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments described above, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics.

The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced in the present invention. Any reference sign in a claim shall not be construed as limiting the claim concerned.

The embodiments of the present application may acquire and process the relevant data on the basis of artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

In addition, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or systems recited in the system claims may also be implemented by a single unit or system through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from the spirit and scope of those technical solutions.

Claims

1. A data integration method based on big data and edge computing, characterized by comprising the following steps:

acquiring edge nodes of a client, and collecting data by using the edge nodes to obtain data to be processed;

carrying out standardization processing on the data to be processed to obtain standard data;

performing feature analysis on the standard data to obtain an optimal feature subset;

performing integrated trust calculation on the optimal feature subset to obtain a trust interval;

and performing feature synthesis on the standard data according to the trust interval to obtain integrated data.

2. The data integration method based on big data and edge computing as claimed in claim 1, wherein said collecting data by using the edge node to obtain the data to be processed comprises:

arranging edge node devices at the edge nodes to obtain a plurality of edge data sensors;

connecting the edge nodes to the cloud center by using the edge data sensors to obtain a data transmission channel;

receiving, by the edge data sensors, target data through the data transmission channel;

and storing the target data in the edge nodes to obtain the data to be processed.

3. The data integration method based on big data and edge computing according to claim 1, wherein the carrying out standardization processing on the data to be processed to obtain standard data comprises:

judging the existence form of the data to be processed;

if the existence form of the data to be processed is judged to be an array, executing a preset first transformation scheme to obtain first data;

if the existence form of the data to be processed is judged to be a matrix, executing a preset second transformation scheme to obtain second data;

and summarizing the first data and the second data to obtain standard data.

4. The data integration method based on big data and edge computing according to claim 3, wherein the executing a preset first transformation scheme to obtain first data comprises:

carrying out multidimensional decomposition on the data to be processed to obtain a plurality of sample vectors;

calculating a mean and a standard deviation of a plurality of the sample vectors;

and calculating the first data from the plurality of sample vectors according to the mean and the standard deviation, using the following formula:

$$Z = \frac{X - M}{S}$$

wherein Z is the first data; X is a sample vector; M is the mean of the sample vectors; and S is the standard deviation of the sample vectors.
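A minimal sketch of the first transformation scheme, Z = (X - M)/S, applied to one sample vector. It assumes the population standard deviation (division by n), which the claim does not specify; the function name and the sample values are hypothetical.

```python
import statistics


def standardize_vector(x):
    """Return the z-scores of a sample vector: (value - mean) / standard deviation."""
    m = statistics.fmean(x)
    s = statistics.pstdev(x)  # population standard deviation (an assumption)
    if s == 0:
        raise ValueError("constant vector: standard deviation is zero")
    return [(value - m) / s for value in x]


# Mean is 5 and population standard deviation is 2 for this vector.
z = standardize_vector([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

After the transformation the vector has mean 0 and standard deviation 1, which is what removes the influence of the data's own variation and numerical magnitude.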

5. The data integration method based on big data and edge computing according to claim 3, wherein the executing a preset second transformation scheme to obtain second data comprises:

performing column vector extraction on the data to be processed to obtain a plurality of column vectors;

respectively calculating the mean value and the standard deviation of the column vectors;

calculating the mean and the standard deviation of the column vectors in the feature matrix using the following equations, respectively:

[equation images missing from the source text]

wherein, for i = 1, 2, …, n, the equations define the standard data value in the i-th row and j-th column of the data to be processed; $h_{ij}$ is the element in the i-th row and j-th column of the data to be processed; the mean is that of the j-th column of the data to be processed; $L_j$ is the standard deviation of the j-th column of the data to be processed; and n is the number of rows in the data to be processed;

and processing the column vectors one by using the mean value and the standard deviation of the column vectors to obtain second data.
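The equations of this claim are not present in the text, so the following is only a sketch of one common reading: each column j is standardized with its own mean and its own standard deviation L_j over the n rows. Dividing by n (population standard deviation) is an assumption, and the function name and the matrix H are hypothetical.

```python
import math


def standardize_columns(matrix):
    """Standardize each column of a row-major matrix with its own mean and standard deviation."""
    n = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(col) / n for col in columns]
    # Population standard deviation per column (an assumption).
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for col, m in zip(columns, means)
    ]
    if any(s == 0 for s in stds):
        raise ValueError("a constant column has zero standard deviation")
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in matrix
    ]


# Two columns on very different scales end up with identical standardized values.
H = [[1.0, 10.0], [3.0, 30.0]]
second_data = standardize_columns(H)
```

Processing the columns one by one, as the claim states, keeps a large-valued column from dominating a small-valued one in later steps.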

6. The data integration method based on big data and edge computing according to claim 1, wherein the performing integrated trust calculation on the optimal feature subset to obtain a trust interval comprises:

calculating a belief function value for the optimal feature subset;

calculating the belief function value of the optimal feature subset using the following formula:

$$B: 2^{v} \to [0,1], \qquad B(A) = \sum_{Q \subseteq A} m(Q)$$

wherein B(A) is the belief function value of the optimal feature subset A; v is the preset model framework of the feature subsets; and m(Q) is the probability function of a subset Q of the optimal feature subset A, summed over all such subsets;

calculating a likelihood function value for the optimal subset of features from the belief function values;

calculating the likelihood function value of the optimal feature subset using the following formula:

$$P(A) = 1 - B(\bar{A}) = \sum_{Q \cap A \neq \varnothing} m(Q)$$

wherein P(A) is the likelihood function value of the optimal feature subset A; $B(\bar{A})$ is the belief function value of $\bar{A}$, the complement of A in the framework v; and m(Q) is the probability function of a subset Q that intersects the optimal feature subset A, summed over all such subsets;

and representing the belief (trust) function value and the likelihood function value together as a trust interval.
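A minimal sketch of the belief value, the likelihood (plausibility) value, and the resulting trust interval, assuming the standard Dempster-Shafer definitions: the belief of A sums the masses of all subsets of A, and the likelihood of A sums the masses of all sets that intersect A. The mass function below and the feature names f1 and f2 are hypothetical.

```python
def belief(mass, a):
    """Sum of the masses of every focal set Q that is contained in A."""
    return sum(w for q, w in mass.items() if q <= a)


def plausibility(mass, a):
    """Sum of the masses of every focal set Q that intersects A."""
    return sum(w for q, w in mass.items() if q & a)


# Hypothetical mass function over the framework {"f1", "f2"}; masses sum to 1.
mass = {
    frozenset({"f1"}): 0.5,
    frozenset({"f2"}): 0.2,
    frozenset({"f1", "f2"}): 0.3,
}
A = frozenset({"f1"})
complement = frozenset({"f2"})

# The trust interval pairs the lower bound (belief) with the upper bound (likelihood).
trust_interval = (belief(mass, A), plausibility(mass, A))
```

For this mass function the interval is approximately (0.5, 0.8), and the upper bound equals one minus the belief of the complement of A.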

7. The data integration method based on big data and edge computing according to claim 1, wherein the performing feature analysis on the standard data to obtain an optimal feature subset comprises:

performing feature extraction on the standard data to obtain a feature set;

and performing weight evaluation on the feature set, and obtaining an optimal feature subset according to the result of the weight evaluation.

8. The data integration method based on big data and edge computing according to claim 7, wherein the performing weight evaluation on the feature set, and obtaining an optimal feature subset according to the result of the weight evaluation comprises:

carrying out weight calculation on the feature set to obtain the feature weight of the feature set;

calculating the dimension difference degree of the feature set according to the feature weight;

calculating the dimension difference degree of the feature set by using the following formula:

[formula image missing from the source text]

wherein $\mathrm{diff}(a_1, a_2, i)$ is the dimension difference degree between feature data $a_1$ and feature data $a_2$ of the feature set in the i-th dimension; $a_{1i}$ and $a_{2i}$ are the difference values of the individual data $a_1$ and $a_2$ in the i-th dimension; $\max(f_i)$ is the maximum of the feature weights corresponding to the feature data contained in the i-th dimension; and $\min(f_i)$ is the minimum of the feature weights corresponding to the feature data contained in the i-th dimension;

and adjusting the feature set according to the dimension difference degree to obtain an optimal feature subset.
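The formula for the dimension difference degree is not present in the text. The sketch below assumes a Relief-style normalized difference, |a_1i - a_2i| / (max(f_i) - min(f_i)), which uses the same four quantities the claim defines; this is an assumption, not the patent's confirmed formula, and the function name and the example values are hypothetical.

```python
def dimension_difference(a1, a2, i, weights_i):
    """Relief-style difference of two feature data in dimension i (assumed form)."""
    spread = max(weights_i) - min(weights_i)
    if spread == 0:
        # All weights in this dimension are equal; treat the dimension as uninformative.
        return 0.0
    return abs(a1[i] - a2[i]) / spread


a1 = [0.2, 0.9]
a2 = [0.5, 0.1]
weights_dim0 = [0.1, 0.4, 0.7]  # hypothetical feature weights for dimension 0

d = dimension_difference(a1, a2, 0, weights_dim0)
```

Under this assumed form, dividing by the spread of the weights keeps the difference degree comparable across dimensions with different weight ranges.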

9. The data integration method based on big data and edge computing according to any one of claims 1 to 8, wherein the performing feature synthesis on the standard data according to the trust interval to obtain integrated data comprises:

screening the standard data by taking the trust interval as a constraint condition to obtain effective data;

and combining the effective data according to a preset synthesis rule to obtain integrated data.

10. A data integration system based on big data and edge computing, the system comprising:

the to-be-processed data collection module is used for acquiring edge nodes of the client and collecting data by utilizing the edge nodes to obtain to-be-processed data;

the standardization processing module is used for standardizing the data to be processed to obtain standard data;

the feature subset generation module is used for performing feature analysis on the standard data to obtain an optimal feature subset;

the trust calculation module is used for performing integrated trust calculation on the optimal feature subset to obtain a trust interval;

and the integrated data generation module is used for carrying out feature synthesis on the standard data according to the trust interval to obtain integrated data.