CN117992834B - Data analysis method and related device - Google Patents
- Publication number: CN117992834B
- Application number: CN202410397523.8A
- Authority
- CN
- China
- Prior art keywords
- target
- gradient
- parameter
- data
- server
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/2433—Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
Abstract
The embodiment of the invention provides a data analysis method and a related device, belonging to the technical field of data processing. The method includes: obtaining training parameters, where the training parameters include at least a first gradient parameter of a first object and a second gradient parameter of a second object; performing gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter; obtaining a first update parameter of the first object and a second update parameter of the second object according to the target gradient parameter; sending the first update parameter to the first server so that the first server determines a first target data analysis model based on the first update parameter, and sending the second update parameter to the second server so that the second server determines a second target data analysis model based on the second update parameter; and obtaining a first analysis result obtained by analyzing the first current data according to the first target data analysis model and a second analysis result obtained by analyzing the second current data according to the second target data analysis model.
Description
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data analysis method and a related device.
Background
With the development of industrial Internet technology, operating data of oil and gas pipelines and industrial equipment can be collected and analyzed in real time, and the operating state of an oil and gas pipeline or industrial equipment can then be monitored in real time through methods such as machine learning and time-series analysis. In the related art, however, the operating data must be transmitted over the network, so the operating data may be leaked; if instead training is performed only on local operating data, the limitations of that data mean the operating state of the oil and gas pipeline or industrial equipment cannot be monitored accurately. A data analysis method is therefore needed that can accurately determine the operating state of an oil and gas pipeline or industrial equipment while guaranteeing the privacy and security of its operating data.
Disclosure of Invention
The main purpose of the embodiments of the present invention is to provide a data analysis method and a related device, aiming to solve the problem in the related art that the operating state of an oil and gas pipeline or industrial equipment cannot be accurately determined while the privacy and security of its operating data are guaranteed.
In a first aspect, an embodiment of the present invention provides a data analysis method, applied to a target server, including:
Training parameters respectively corresponding to a plurality of target objects of the same target type are obtained, wherein the training parameters at least comprise a first gradient parameter corresponding to a first object and a second gradient parameter corresponding to a second object; wherein the target object comprises industrial equipment or an oil and gas pipeline; the target type comprises the type of any one of industrial equipment or the type of equipment for carrying out data monitoring on any section of pipeline in pipeline transportation;
performing gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type;
obtaining a first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter and obtaining a second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter;
The first updating parameters are sent to a first server in communication connection with the target server, so that the first server determines a first target data analysis model corresponding to the first object according to the first updating parameters, and the second updating parameters are sent to a second server in communication connection with the target server, so that the second server determines a second target data analysis model corresponding to the second object according to the second updating parameters;
obtaining a first analysis result corresponding to the first object, which is sent by the first server and is obtained by analyzing first current data corresponding to the first object according to the first target data analysis model, and obtaining a second analysis result corresponding to the second object, which is sent by the second server and is obtained by analyzing second current data corresponding to the second object according to the second target data analysis model.
In a second aspect, an embodiment of the present invention provides a data analysis device, applied to a target server, including:
The data acquisition module is used for acquiring training parameters respectively corresponding to a plurality of target objects of the same target type, wherein the training parameters at least comprise a first gradient parameter corresponding to a first object and a second gradient parameter corresponding to a second object;
the data aggregation module is used for carrying out gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type;
The parameter determining module is used for obtaining a first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter and obtaining a second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter;
The data sending module is used for sending the first updating parameters to a first server in communication connection with the target server, so that the first server determines a first target data analysis model corresponding to the first object according to the first updating parameters, and sending the second updating parameters to a second server in communication connection with the target server, so that the second server determines a second target data analysis model corresponding to the second object according to the second updating parameters;
The result acquisition module is used for acquiring a first analysis result corresponding to the first object, which is sent by the first server and is obtained by analyzing first current data corresponding to the first object according to the first target data analysis model, and acquiring a second analysis result corresponding to the second object, which is sent by the second server and is obtained by analyzing second current data corresponding to the second object according to the second target data analysis model.
In a third aspect, embodiments of the present invention further provide a target server, the target server including a processor, a memory, a computer program stored on the memory and executable by the processor, and a data bus for enabling communication between the processor and the memory, wherein the computer program, when executed by the processor, implements the steps of any of the data analysis methods as provided in the present specification.
In a fourth aspect, embodiments of the present invention further provide a storage medium for computer readable storage, wherein the storage medium stores one or more programs executable by one or more processors to implement steps of any of the data analysis methods as provided in the present specification.
The embodiment of the application provides a data analysis method and a related device, wherein the method is applied to a target server and comprises the following steps: training parameters respectively corresponding to a plurality of target objects of the same target type are obtained, wherein the training parameters at least comprise a first gradient parameter corresponding to a first object and a second gradient parameter corresponding to a second object; performing gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type; obtaining a first updating parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter and obtaining a second updating parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter; the method comprises the steps of sending first updating parameters to a first server in communication connection with a target server, enabling the first server to determine a first target data analysis model corresponding to a first object according to the first updating parameters, and sending second updating parameters to a second server in communication connection with the target server, enabling the second server to determine a second target data analysis model corresponding to a second object according to the second updating parameters; obtaining a first analysis result corresponding to a first object, which is sent by a first server and is obtained by analyzing first current data corresponding to a first object according to a first target data analysis model, and obtaining a second analysis result corresponding to a second object, which is sent by a second server and is obtained by analyzing second current data corresponding to a second object according to a second target data analysis model. 
According to the application, training parameters of a plurality of target objects of the same target type are combined by gradient aggregation, and the models of the plurality of target objects are then adjusted separately using the target gradient parameter. In this way, the data-leakage problem caused by transmitting operating data is avoided, while gradient aggregation steers the training directions of the different target objects so that worse-performing target objects are pulled toward better-performing ones. This solves the problem in the related art that the operating state of an oil and gas pipeline or industrial equipment cannot be accurately determined while the privacy and security of its operating data are guaranteed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a data analysis method according to an embodiment of the present invention;
fig. 2 is a schematic block diagram of a data analysis device according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a target server according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The embodiment of the invention provides a data analysis method and a related device. The data analysis method can be applied to a target server; the target server may be an electronic device such as a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, or a wearable device, or it may be a cloud server or a server cluster.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a flow chart of a data analysis method according to an embodiment of the invention.
As shown in fig. 1, the data analysis method is applied to a target server, and includes steps S101 to S105.
Step S101, training parameters respectively corresponding to a plurality of target objects of the same target type are obtained, wherein the training parameters at least comprise a first gradient parameter corresponding to a first object and a second gradient parameter corresponding to a second object.
The target type may be a type of a certain device in industrial equipment, or may be a type of a device for monitoring data of a certain section of pipeline in pipeline transportation. For example, the target type is a certain type of engine or a certain type of air compressor in an industrial apparatus. The target type may also be a pressure sensor, a temperature sensor, etc. for data monitoring of the pipeline in pipeline transportation.
For example, multiple target objects of the same target type may be industrial devices of the same type distributed at different locations, e.g., air compressors of the same model may be respectively disposed at location 1, location 2, location 3, etc.
For example, operation data acquisition is performed on a plurality of target objects of the same target type, respectively, so as to monitor the target objects according to the acquired operation data. In order to ensure the safety of the operation data corresponding to the collected target objects, the operation data is trained in a local server, so that training parameters respectively corresponding to a plurality of target objects of the same target type are obtained.
For example, in order to identify the operating state of a target object and facilitate subsequent fault location and maintenance by the user, numerous state identification models have been proposed. Existing methods require the operating data of target objects scattered across locations to be gathered together for unified model construction. However, such centralized data collection may leak operating data during transfer. In order to guarantee the privacy and security of the operating data, model training is performed locally on the operating data corresponding to each target object, thereby obtaining the corresponding training parameters.
The training parameters include at least a first gradient parameter corresponding to a first object and a second gradient parameter corresponding to a second object, and the number of the objects contained in the training parameters is related to the number of a plurality of target objects corresponding to the same target type. For example, when the plurality of target objects corresponding to the same target type are 3, the training parameters include a first gradient parameter corresponding to the first object, a second gradient parameter corresponding to the second object, and a third gradient parameter corresponding to the third object; when the plurality of target objects corresponding to the same target type are 4, the training parameters comprise a first gradient parameter corresponding to the first object, a second gradient parameter corresponding to the second object, a third gradient parameter corresponding to the third object and a fourth gradient parameter corresponding to the fourth object, and the training parameters are deduced in the same way. The application is not particularly limited, and the user can set the device according to actual requirements.
In some embodiments, the obtaining training parameters respectively corresponding to the plurality of target objects of the same target type includes: obtaining the first gradient parameter sent by the first server, where the first gradient parameter is obtained by the first server after obtaining first data corresponding to the first object and performing data training on the first data using an initial data analysis model; and obtaining the second gradient parameter sent by the second server, where the second gradient parameter is obtained by the second server after obtaining second data corresponding to the second object and performing data training on the second data using the initial data analysis model.
The first server is a local server corresponding to the first object, the second server is a local server corresponding to the second object, the first server is in communication connection with the target server, the second server is in communication connection with the target server, no operation data transmission is performed between the first server and the target server, and only gradient parameter transmission is performed, so that the safety of the operation data corresponding to the first object is guaranteed. And similarly, the second server and the target server do not transmit operation data, and only transmit gradient parameters, so that the safety of the operation data corresponding to the second object is ensured.
The target server receives a first gradient parameter corresponding to a first object sent by a first server, and receives a second gradient parameter corresponding to a second object sent by a second server. Before the target server receives the first gradient parameters corresponding to the first object sent by the first server, the first server trains the first data corresponding to the first object by using the initial data analysis model, so that corresponding first gradient parameters are obtained. The initial data analysis model may be a data classification model, an anomaly identification model or a data clustering model, which is not particularly limited in the present application.
Similarly, before the target server receives the second gradient parameters corresponding to the second object sent by the second server, the second server trains the second data corresponding to the second object by using the initial data analysis model which is the same as that of the first object, so that corresponding second gradient parameters are obtained.
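The local training step described above (each server trains the shared initial model on its own data and exposes only the gradient) can be sketched as follows. The linear model with squared-error loss is an illustrative assumption, since the patent leaves the initial data analysis model open (classification, anomaly identification, or clustering):

```python
def local_gradient(w, X, y):
    """Mean-squared-error gradient of a linear model y_hat = w . x,
    computed entirely on the local server's own data X, y."""
    n = len(X)
    grad = [0.0] * len(w)
    for x, target in zip(X, y):
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - target
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / n
    return grad  # only this vector is sent to the target server

# e.g. the first server: its operating data never leaves the machine
g1 = local_gradient([0.0, 0.0], X=[[1.0, 2.0], [2.0, 0.0]], y=[1.0, 2.0])
```

Because both servers start from the same initial weights `w`, their gradients are directly comparable during aggregation.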
Illustratively, using the same initial data analysis model for the first object and the second object may provide good support for subsequent parameter updates and model optimizations.
Furthermore, the target server may obtain a first gradient parameter of the first object and a second gradient parameter of the second object, the number of gradient parameters obtained by the specific target server being related to the number of target objects.
Specifically, the application performs parameter training on a plurality of target objects of the same type by using the same initial data analysis model, so that when the model of each target object is optimized by using all operation data corresponding to the plurality of target objects, the data privacy can be well protected, the burden of a single server is reduced, and therefore, good support is provided for obtaining the model with better parameters subsequently and improving the data analysis precision of each target object.
Step S102, performing gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type.
Illustratively, the target server performs the gradient aggregation operation upon receiving the first gradient parameter and the second gradient parameter. Gradient aggregation may employ various methods, such as simple weighted averaging or federated-learning methods such as FedAvg, to comprehensively consider the values of the two gradient parameters. The gradient aggregation process aims to combine the gradient information of the different target objects so that the overall update direction of the model parameters is more accurate and stable.
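A minimal sketch of the simple weighted-averaging option mentioned above (FedAvg-style, weighting each object's gradient by its sample count); the gradient values and sample counts below are assumed for illustration:

```python
def weighted_average(gradients, sample_counts):
    """FedAvg-style aggregation: average the gradient vectors, weighting
    each participant by its number of training samples."""
    total = sum(sample_counts)
    dim = len(gradients[0])
    return [sum(n / total * g[i] for n, g in zip(sample_counts, gradients))
            for i in range(dim)]

g1 = [0.2, -0.4, 0.1]   # first gradient parameter (hypothetical values)
g2 = [0.4,  0.0, 0.3]   # second gradient parameter (hypothetical values)
target = weighted_average([g1, g2], sample_counts=[100, 300])
```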
In some embodiments, the gradient aggregating the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type includes: estimating a first information entropy corresponding to the first gradient parameter and estimating a second information entropy corresponding to the second gradient parameter; and determining, according to the first information entropy and the second information entropy, the target gradient parameter corresponding to the plurality of target objects after gradient aggregation.
For example, because of differences in installation position, frequency of use, and other environmental factors among the target objects, the first data and second data collected for each target object differ, so the performance of the models trained by each local server (such as the first server and the second server) also differs, which in turn is reflected as differences among the training parameters.
For example, in general, the sample distribution situation of the corresponding first data or second data in each local server often reflects the data quality, and the stronger the sample distribution regularity is, the higher the data quality is, and thus the better the trained model performance is. Therefore, the application utilizes the information entropy estimation algorithm to quantify the data distribution condition in each local server, thereby providing support for subsequent gradient aggregation.
For the first gradient parameter and the second gradient parameter, respectively, the corresponding information entropy is calculated, namely the first information entropy and the second information entropy. The information entropy is an index for measuring the uncertainty or confusion degree of data, and can be obtained by counting and calculating the value of the gradient parameter.
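Information entropy over continuous gradient values has to be estimated from the sample itself; one common approach (assumed here for illustration, not prescribed by the patent) is to discretize the values into equal-width bins and compute the Shannon entropy of the resulting histogram:

```python
import math
from collections import Counter

def histogram_entropy(values, bins=8):
    """Shannon entropy (in nats) of a sample after equal-width binning:
    higher entropy means more disordered, less regular data."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0      # degenerate case: all values equal
    labels = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())
```

A constant sample yields entropy 0, while a uniformly spread sample approaches `log(bins)`, matching the document's reading of entropy as a disorder measure.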
Illustratively, the plurality of target objects is not limited to 2. When there are many target objects, for example 10 or more, the target objects that need to participate in gradient aggregation can be determined according to the magnitudes of the obtained information entropies; in general, selecting objects with lower information entropy (i.e., more regular data) for aggregation is more beneficial to the robustness and generalization capability of the model. A gradient aggregation operation is then performed on the selected target objects, and the target gradient parameter is obtained by comprehensively considering their gradient information using methods such as weighted averaging or joint learning.
Specifically, gradient parameters are selected and aggregated by combining information entropy, so that stability and performance of the model are improved. This information entropy-based approach can effectively help select the appropriate participant and achieve more accurate target gradient parameters.
In some embodiments, the estimating to obtain a first information entropy corresponding to the first gradient parameter and the estimating to obtain a second information entropy corresponding to the second gradient parameter include: obtaining a first sample number of the first data corresponding to the first object, a first sample feature dimension corresponding to the first data and a first vector average corresponding to the first data; determining a first fitting function corresponding to the first object by combining the first sample number, the first sample feature dimension and the first vector average value through a dual gamma function; obtaining a first vector set corresponding to the first object under the first sample number according to the first fitting function, and further obtaining the first information entropy corresponding to the first gradient parameter according to the first vector set; obtaining a second sample number of the second data corresponding to the second object, a second sample feature dimension corresponding to the second data, and a second vector average corresponding to the second data; determining a second fitting function corresponding to the second object by combining the second sample number, the second sample feature dimension and the second vector average value through a dual gamma function; and obtaining a second vector set corresponding to the second object under the second sample number according to the second fitting function, and further obtaining the second information entropy corresponding to the second gradient parameter according to the second vector set.
Illustratively, a first sample number of the first data corresponding to the first object, that is, the number of operating-data samples used in model training, is obtained from the first server. The first sample feature dimension corresponding to the first data, that is, the feature dimension of each sample, is likewise obtained from the first server, and the vector average of the first data is computed to obtain a first vector average, which is used in the subsequent fitting-function calculation. A first fitting function corresponding to the first object is determined using the dual gamma (digamma) function in combination with the first sample number, the first sample feature dimension, and the first vector average; the dual gamma function allows an appropriate form of fitting function to be selected based on the characteristics of the data.
That is, according to the first vector average corresponding to the data vectors of the first data in the first server, together with the first sample number and first sample feature dimension of the first data, data fitting is performed using the dual gamma function to obtain a first fitting function for the first data corresponding to the first server; a plurality of vectors equal in number to the first sample number are then obtained according to the first fitting function, a first vector set is formed from these vectors, and the information entropy calculation is performed on the first vector set to obtain the first information entropy corresponding to the first gradient parameter.
Illustratively, a second sample number of the second data corresponding to the second object, that is, the number of operating-data samples used in model training, is obtained from the second server. The second sample feature dimension corresponding to the second data, that is, the feature dimension of each sample, is likewise obtained from the second server, and the vector average of the second data is computed to obtain a second vector average for the subsequent fitting-function calculation. A second fitting function corresponding to the second object is determined using the dual gamma function in combination with the second sample number, the second sample feature dimension, and the second vector average.
That is, according to the second vector average corresponding to the data vectors of the second data in the second server, together with the second sample number and second sample feature dimension of the second data, data fitting is performed using the dual gamma function to obtain a second fitting function for the second data corresponding to the second server; a plurality of vectors equal in number to the second sample number are then obtained according to the second fitting function, a second vector set is formed from these vectors, and the information entropy calculation is performed on the second vector set to obtain the second information entropy corresponding to the second gradient parameter.
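The patent does not spell out the fitting function. An established entropy estimator that, like the passage above, combines the sample number, the sample feature dimension, and the digamma ("dual gamma") function is the Kozachenko–Leonenko nearest-neighbour estimator; the sketch below implements that estimator as a plausible stand-in, not as the patented procedure:

```python
import math

EULER_GAMMA = 0.5772156649015329

def digamma_int(n):
    """psi(n) at a positive integer: -gamma + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))

def kl_entropy(X, k=1):
    """Kozachenko-Leonenko kNN entropy estimate (nats) for d-dim samples X.
    Uses the sample number n, feature dimension d, and the digamma function."""
    n, d = len(X), len(X[0])
    # log-volume of the unit d-ball
    log_unit_ball = d / 2.0 * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    acc = 0.0
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        acc += d * math.log(2.0 * dists[k - 1])   # diameter to k-th neighbour
    return digamma_int(n) - digamma_int(k) + log_unit_ball + acc / n
```

One sanity check on the design: scaling every sample by a factor `a` must raise the estimate by exactly `d * log(a)`, the known behaviour of differential entropy under scaling.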
Specifically, the smaller the information entropy is, the lower the data confusion degree is, the stronger the sample distribution regularity is, and the better the trained model performance is, the higher the weight is when gradient parameters are aggregated. According to the method, information entropy calculation is carried out, the data privacy is ensured, and meanwhile, the model quality corresponding to each target object can be better obtained, so that good support is provided for subsequent gradient aggregation.
In some embodiments, the determining the target gradient parameter corresponding to the gradient aggregation of the plurality of target objects according to the first information entropy and the second information entropy includes: obtaining a first reciprocal by solving the first information entropy and obtaining a second reciprocal by solving the second information entropy; obtaining the sum of the first reciprocal and the second reciprocal, and obtaining the sum of the reciprocal; determining the corresponding target gradient parameters after gradient aggregation of a plurality of target objects according to the first reciprocal, the second reciprocal, the reciprocal and the combination of the first gradient parameters and the second gradient parameters; wherein the target gradient parameter is obtained according to the following formula:
T = Σ_{i=1}^{N} (d_i / D) × G_i
where T represents the target gradient parameter, N represents the total number of target objects, d_i represents the reciprocal corresponding to the ith target object, D = d_1 + d_2 + … + d_N represents the sum of the reciprocals corresponding to all of the target objects, and G_i represents the gradient parameter corresponding to the ith target object.
For example, the smaller the information entropy, the lower the degree of data confusion, the stronger the regularity of the sample distribution, the better the performance of the trained model, and therefore the higher the weight assigned when the gradient parameters are aggregated. The application therefore obtains the first reciprocal by taking the reciprocal of the first information entropy and the second reciprocal by taking the reciprocal of the second information entropy, so that the reciprocal of the information entropy indicates the weight of each target object in the gradient aggregation.
Illustratively, the sum of the first reciprocal and the second reciprocal is obtained as the reciprocal sum, and the target gradient parameter corresponding to the plurality of target objects after gradient aggregation is then determined by combining the first reciprocal, the second reciprocal, the reciprocal sum, the first gradient parameter, and the second gradient parameter; wherein the target gradient parameter is obtained according to the following formula:
T = Σ_{i=1}^{N} (d_i / D) × G_i
where T represents the target gradient parameter, N represents the total number of target objects, d_i represents the reciprocal corresponding to the ith target object, D represents the sum of the reciprocals corresponding to all of the target objects, and G_i represents the gradient parameter corresponding to the ith target object.
For example, when there are 2 target objects, the training parameters include a first gradient parameter G_1 corresponding to the first object and a second gradient parameter G_2 corresponding to the second object. Then N is equal to 2, the first reciprocal corresponding to the first object is d_1 = 1/H_1, the second reciprocal corresponding to the second object is d_2 = 1/H_2 (H_1 and H_2 being the first and second information entropy), and D = d_1 + d_2. Substituting d_1, d_2, G_1, and G_2 into the formula gives the target gradient parameter:
T = (d_1 / (d_1 + d_2)) × G_1 + (d_2 / (d_1 + d_2)) × G_2
Specifically, the smaller the information entropy, the lower the degree of data confusion, the stronger the regularity of the sample distribution, the better the performance of the trained model, and therefore the higher the weight assigned when the gradient parameters are aggregated. The application determines the weights used in gradient aggregation from the reciprocal of the information entropy, so that higher-quality models play a larger role during gradient aggregation, improving the effect of the aggregation.
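The inverse-entropy weighted aggregation described above can be sketched as follows; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def aggregate_gradients(entropies, gradients):
    """Aggregate per-object gradient parameters into a target gradient.

    Each object i is weighted by the reciprocal of its information
    entropy: T = sum_i (d_i / D) * G_i, where d_i = 1 / H_i and
    D = sum_i d_i. Lower entropy (more regular data) -> higher weight.
    """
    d = 1.0 / np.asarray(entropies, dtype=float)  # reciprocals d_i
    weights = d / d.sum()                          # normalized d_i / D
    G = np.asarray(gradients, dtype=float)         # one row per object
    return weights @ G                             # sum_i (d_i / D) * G_i
```

For two objects with entropies H_1 = 2.0 and H_2 = 1.0, the weights are 1/3 and 2/3, so the lower-entropy object dominates the aggregated gradient, as the patent intends.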
Step S103, obtaining a first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter, and obtaining a second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter.
The target gradient parameter is a parameter obtained by model fusion of the operation data corresponding to the plurality of target objects, and takes into account the better-quality model training results among the plurality of target objects. Therefore, the difference between the target gradient parameter and the first gradient parameter is calculated to obtain the first updated parameter of the first object, and the difference between the target gradient parameter and the second gradient parameter is calculated to obtain the second updated parameter of the second object.
In some embodiments, the obtaining the first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter and obtaining the second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter includes: determining a first learning rate corresponding to the first object, and determining a first learning step length corresponding to the first object according to the target gradient parameter and the first learning rate; obtaining a first updating parameter corresponding to the first object according to the first learning step length and the first gradient parameter; determining a second learning rate corresponding to the second object, and determining a second learning step length corresponding to the second object according to the target gradient parameter and the second learning rate; and obtaining a second updating parameter corresponding to the second object according to the second learning step length and the second gradient parameter.
Illustratively, according to the target gradient parameter and the first gradient parameter, the update parameter corresponding to the first object in the current update may be determined. Typically, the model parameters may be adjusted using gradient descent or another optimization algorithm to minimize the loss function. The first update parameter corresponding to the first object is calculated by the following formula: first update parameter = first gradient parameter - first learning rate × target gradient parameter. The first learning rate is a hyperparameter that controls the speed and stability of the parameter update.
Similarly, according to the target gradient parameter and the second gradient parameter, the update parameter corresponding to the second object in the current update may be determined using the same formula: second update parameter = second gradient parameter - second learning rate × target gradient parameter. The second learning rate is likewise a hyperparameter that controls the speed and stability of the parameter update.
Through the above steps, the update parameter corresponding to each object in the current update can be calculated from the target gradient parameter and the gradient parameter of each object. These update parameters are then used to update the model parameters of each object, gradually optimizing the model and improving its performance and generalization ability. In practical applications, hyperparameters such as the learning rate can be adjusted according to the specific situation to obtain a better training effect.
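The per-object update rule above (update parameter = gradient parameter - learning rate × target gradient parameter) can be sketched as:

```python
def update_parameter(gradient_param, target_gradient, learning_rate=0.01):
    """Compute one object's update parameter.

    The learning step length is learning_rate * target_gradient; the
    update parameter is the object's own gradient parameter minus that
    step. The learning rate is a hyperparameter controlling the speed
    and stability of the update; the default value here is arbitrary.
    """
    learning_step = learning_rate * target_gradient
    return gradient_param - learning_step
```

The same function serves both the first and second objects, each with its own learning rate, matching the symmetric treatment in the text.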
Step S104, the first update parameter is sent to a first server communicatively connected to the target server, so that the first server determines a first target data analysis model corresponding to the first object according to the first update parameter, and the second update parameter is sent to a second server communicatively connected to the target server, so that the second server determines a second target data analysis model corresponding to the second object according to the second update parameter.
The first update parameter is sent to the first server in communication connection with the target server, so that after receiving it the first server continues to train the initial data analysis model according to the first update parameter until the first target data analysis model corresponding to the first object is obtained. Likewise, the second update parameter is sent to the second server in communication connection with the target server, so that after receiving it the second server continues to train the initial data analysis model according to the second update parameter until the second target data analysis model corresponding to the second object is obtained.
Step S105, obtaining a first analysis result corresponding to the first object, which is sent by the first server and obtained by analyzing the first current data corresponding to the first object according to the first target data analysis model, and obtaining a second analysis result corresponding to the second object, which is sent by the second server and obtained by analyzing the second current data corresponding to the second object according to the second target data analysis model.
In an exemplary embodiment, after the first server obtains the first target data analysis model, it performs data analysis on the first current data, that is, the current running data of the first object, by using the first target data analysis model to obtain the first analysis result corresponding to the first object, and then sends the first analysis result to the target server in communication connection with the first server. The first analysis result is related to the function of the first target data analysis model: when the first target data analysis model is an abnormality classification model, the first analysis result is a classification of the first current data as abnormal or normal operation; when the first target data analysis model is a fault recognition model, the first analysis result is the fault location corresponding to the first object, and so on.
Similarly, after the second server obtains the second target data analysis model, it performs data analysis on the second current data, that is, the current running data of the second object, by using the second target data analysis model to obtain the second analysis result corresponding to the second object, and then sends the second analysis result to the target server in communication connection with the second server. The second analysis result is related to the function of the second target data analysis model: when the second target data analysis model is an abnormality classification model, the second analysis result is a classification of the second current data as abnormal or normal operation; when the second target data analysis model is a fault recognition model, the second analysis result is the fault location corresponding to the second object, and so on.
In the gradient aggregation process, the privacy and security of the running data of each target object are preserved, while the quality differences among the running data corresponding to the target objects, which would otherwise affect model performance, are taken into account. An information entropy estimation method is introduced to obtain the information entropy corresponding to the target data, which characterizes the contribution of the gradient parameters of each target object to the final model parameters during aggregation, so that the analysis quality of each target object's target analysis model can be effectively improved while data privacy is preserved. This solves the problem in the related art that the operation state of an oil and gas pipeline or industrial equipment cannot be accurately judged while the privacy and security of its operation data are guaranteed.
In some embodiments, after the target server obtains the first analysis result of the first object and the second analysis result corresponding to the second object, the method further includes: obtaining a first accuracy rate corresponding to the first analysis result and a second accuracy rate corresponding to the second analysis result; determining an optimal object corresponding to the target type according to the first accuracy rate and the second accuracy rate; and performing model tuning on the rest objects except the optimal object corresponding to the target type according to the optimal object.
Illustratively, after obtaining the first analysis result of the first object and the second analysis result corresponding to the second object, the target server dispatches the corresponding technicians to handle the identified problems according to the first analysis result and the second analysis result. The technicians then perform quality assessment on the first and second analysis results based on real field investigation, thereby obtaining the first accuracy rate corresponding to the first analysis result and the second accuracy rate corresponding to the second analysis result.
Illustratively, the optimal object is determined based on the first accuracy rate and the second accuracy rate. In general, the object with the higher accuracy rate is selected as the optimal object, because its corresponding analysis results are more reliable. Model tuning is then performed on the remaining objects other than the optimal object. This includes steps such as checking model parameters, adjusting hyperparameters, and optimizing the model architecture, to further improve model performance and generalization ability. Through model tuning, the analysis results of the remaining objects become more accurate and reliable, improving the performance of the overall model.
Specifically, performing model tuning on the objects with lower accuracy makes more effective use of resources, so that the model achieves better results on every target object. This further addresses the problem of guaranteeing the security of the operation data corresponding to the target objects while ensuring accurate judgment of the operation state of the oil and gas pipeline or industrial equipment.
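A minimal sketch of selecting the optimal object by accuracy and listing the remaining objects as tuning candidates; the function name and the dictionary-based interface are illustrative assumptions.

```python
def select_optimal_object(accuracies):
    """Given a mapping {object_id: accuracy rate}, return the id with
    the highest accuracy (the 'optimal object') and the list of the
    remaining ids, which are the candidates for model tuning.
    """
    best = max(accuracies, key=accuracies.get)
    to_tune = [obj for obj in accuracies if obj != best]
    return best, to_tune
```

The tuning itself (checking parameters, adjusting hyperparameters, optimizing architecture) is model-specific and is therefore left outside this sketch.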
Referring to fig. 2, fig. 2 is a data analysis device 200 provided in an embodiment of the present application, where the data analysis device 200 includes a data acquisition module 201, a data aggregation module 202, a parameter determination module 203, a data transmission module 204, and a result acquisition module 205, where the data acquisition module 201 is configured to obtain training parameters corresponding to a plurality of target objects of a same target type, where the training parameters at least include a first gradient parameter corresponding to a first object and a second gradient parameter corresponding to a second object; the data aggregation module 202 is configured to perform gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type; a parameter determining module 203, configured to obtain a first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter, and obtain a second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter; the data sending module 204 is configured to send the first update parameter to a first server communicatively connected to the target server, so that the first server determines a first target data analysis model corresponding to the first object according to the first update parameter, and send the second update parameter to a second server communicatively connected to the target server, so that the second server determines a second target data analysis model corresponding to the second object according to the second update parameter; the result obtaining module 205 is configured to obtain a first analysis result corresponding to the first object, which is sent by the first server and obtained by analyzing first current data corresponding to the first object according to the 
first target data analysis model, and obtain a second analysis result corresponding to the second object, which is sent by the second server and obtained by analyzing second current data corresponding to the second object according to the second target data analysis model.
In some embodiments, the data obtaining module 201 performs, in the process of obtaining training parameters corresponding to a plurality of target objects of the same target type, the following steps:
Acquiring the first gradient parameter sent by the first server, wherein the first gradient parameter is the gradient parameter corresponding to the first data, obtained by the first server after obtaining the first data corresponding to the first object and performing data training on the first data by using an initial data analysis model;
Acquiring the second gradient parameter sent by the second server, wherein the second gradient parameter is the gradient parameter corresponding to the second data, obtained by the second server after obtaining the second data corresponding to the second object and performing data training on the second data by using the initial data analysis model.
In some embodiments, the data aggregation module 202 performs, in the process of performing gradient aggregation on the first gradient parameter and the second gradient parameter to obtain the target gradient parameter corresponding to the target type:
estimating to obtain a first information entropy corresponding to the first gradient parameter and estimating to obtain a second information entropy corresponding to the second gradient parameter;
determining the target gradient parameter corresponding to the plurality of target objects after gradient aggregation according to the first information entropy and the second information entropy.
In some embodiments, the data aggregation module 202 performs, in the process of estimating the first information entropy corresponding to the first gradient parameter and estimating the second information entropy corresponding to the second gradient parameter:
Obtaining a first sample number of the first data corresponding to the first object, a first sample feature dimension corresponding to the first data and a first vector average corresponding to the first data;
determining a first fitting function corresponding to the first object by combining the first sample number, the first sample feature dimension and the first vector average value through a digamma function;
obtaining a first vector set corresponding to the first object under the first sample number according to the first fitting function, and further obtaining the first information entropy corresponding to the first gradient parameter according to the first vector set;
Obtaining a second sample number of the second data corresponding to the second object, a second sample feature dimension corresponding to the second data, and a second vector average corresponding to the second data;
Determining a second fitting function corresponding to the second object by combining the second sample number, the second sample feature dimension and the second vector average value through a digamma function;
And obtaining a second vector set corresponding to the second object under the second sample number according to the second fitting function, and further obtaining the second information entropy corresponding to the second gradient parameter according to the second vector set.
In some embodiments, the data aggregation module 202 performs, in the determining the target gradient parameters corresponding to the gradient aggregation of the plurality of target objects according to the first information entropy and the second information entropy:
Obtaining a first reciprocal by taking the reciprocal of the first information entropy and obtaining a second reciprocal by taking the reciprocal of the second information entropy;
Summing the first reciprocal and the second reciprocal to obtain a reciprocal sum;
Determining the target gradient parameter corresponding to the plurality of target objects after gradient aggregation by combining the first reciprocal, the second reciprocal, the reciprocal sum, the first gradient parameter, and the second gradient parameter;
wherein the target gradient parameter is obtained according to the following formula:
T = Σ_{i=1}^{N} (d_i / D) × G_i
where T represents the target gradient parameter, N represents the total number of target objects, d_i represents the reciprocal corresponding to the ith target object, D represents the sum of the reciprocals corresponding to all of the target objects, and G_i represents the gradient parameter corresponding to the ith target object.
In some embodiments, the parameter determining module 203 performs, in the process of obtaining the first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter and obtaining the second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter:
determining a first learning rate corresponding to the first object, and determining a first learning step length corresponding to the first object according to the target gradient parameter and the first learning rate;
Obtaining a first updating parameter corresponding to the first object according to the first learning step length and the first gradient parameter;
Determining a second learning rate corresponding to the second object, and determining a second learning step length corresponding to the second object according to the target gradient parameter and the second learning rate;
And obtaining a second updating parameter corresponding to the second object according to the second learning step length and the second gradient parameter.
In some embodiments, the result obtaining module 205 further performs, after the target server obtains the first analysis result of the first object and the second analysis result corresponding to the second object:
Obtaining a first accuracy rate corresponding to the first analysis result and a second accuracy rate corresponding to the second analysis result;
determining an optimal object corresponding to the target type according to the first accuracy rate and the second accuracy rate;
and performing model tuning on the rest objects except the optimal object corresponding to the target type according to the optimal object.
In some embodiments, the data analysis device 200 may be applied to a target server.
It should be noted that, for convenience and brevity of description, the specific working process of the data analysis device 200 described above may refer to the corresponding process in the foregoing data analysis method embodiment, and will not be described herein.
Referring to fig. 3, fig. 3 is a schematic block diagram of a target server according to an embodiment of the present invention.
As shown in FIG. 3, the target server 300 includes a processor 301 and a memory 302, the processor 301 and the memory 302 being connected by a bus 303, such as an I2C (Inter-Integrated Circuit) bus.
In particular, the processor 301 is configured to provide computing and control capabilities to support the operation of the entire target server. The processor 301 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Specifically, the memory 302 may be a Flash chip, a read-only memory (ROM), a magnetic disk, an optical disk, a U-disk, a removable hard disk, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of a portion of the structure associated with an embodiment of the present invention and does not limit the target server to which an embodiment of the present invention may be applied; a particular server may include more or fewer components than shown, or may combine certain components, or may have a different arrangement of components.
The processor is configured to run a computer program stored in the memory, and implement any one of the data analysis methods provided by the embodiments of the present invention when the computer program is executed.
In an embodiment, the processor is configured to run a computer program stored in a memory and to implement the following steps when executing the computer program:
Training parameters respectively corresponding to a plurality of target objects of the same target type are obtained, wherein the training parameters at least comprise a first gradient parameter corresponding to a first object and a second gradient parameter corresponding to a second object;
performing gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type;
obtaining a first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter and obtaining a second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter;
The first updating parameters are sent to a first server in communication connection with the target server, so that the first server determines a first target data analysis model corresponding to the first object according to the first updating parameters, and the second updating parameters are sent to a second server in communication connection with the target server, so that the second server determines a second target data analysis model corresponding to the second object according to the second updating parameters;
obtaining a first analysis result corresponding to the first object, which is sent by the first server and is obtained by analyzing first current data corresponding to the first object according to the first target data analysis model, and obtaining a second analysis result corresponding to the second object, which is sent by the second server and is obtained by analyzing second current data corresponding to the second object according to the second target data analysis model.
In some embodiments, the processor 301 performs, in the process of obtaining training parameters corresponding to the plurality of target objects of the same target type, the following steps:
Acquiring the first gradient parameter sent by the first server, wherein the first gradient parameter is the gradient parameter corresponding to the first data, obtained by the first server after obtaining the first data corresponding to the first object and performing data training on the first data by using an initial data analysis model;
Acquiring the second gradient parameter sent by the second server, wherein the second gradient parameter is the gradient parameter corresponding to the second data, obtained by the second server after obtaining the second data corresponding to the second object and performing data training on the second data by using the initial data analysis model.
In some embodiments, the processor 301 performs, in the gradient aggregation of the first gradient parameter and the second gradient parameter to obtain the target gradient parameter corresponding to the target type:
estimating to obtain a first information entropy corresponding to the first gradient parameter and estimating to obtain a second information entropy corresponding to the second gradient parameter;
determining the target gradient parameter corresponding to the plurality of target objects after gradient aggregation according to the first information entropy and the second information entropy.
In some embodiments, the processor 301 performs, in the estimating of the first information entropy corresponding to the first gradient parameter and the estimating of the second information entropy corresponding to the second gradient parameter:
Obtaining a first sample number of the first data corresponding to the first object, a first sample feature dimension corresponding to the first data and a first vector average corresponding to the first data;
determining a first fitting function corresponding to the first object by combining the first sample number, the first sample feature dimension and the first vector average value through a digamma function;
obtaining a first vector set corresponding to the first object under the first sample number according to the first fitting function, and further obtaining the first information entropy corresponding to the first gradient parameter according to the first vector set;
Obtaining a second sample number of the second data corresponding to the second object, a second sample feature dimension corresponding to the second data, and a second vector average corresponding to the second data;
Determining a second fitting function corresponding to the second object by combining the second sample number, the second sample feature dimension and the second vector average value through a digamma function;
And obtaining a second vector set corresponding to the second object under the second sample number according to the second fitting function, and further obtaining the second information entropy corresponding to the second gradient parameter according to the second vector set.
In some embodiments, the processor 301 performs, in the determining the target gradient parameters corresponding to the gradient aggregation of the plurality of target objects according to the first information entropy and the second information entropy:
Obtaining a first reciprocal by taking the reciprocal of the first information entropy and obtaining a second reciprocal by taking the reciprocal of the second information entropy;
Summing the first reciprocal and the second reciprocal to obtain a reciprocal sum;
Determining the target gradient parameter corresponding to the plurality of target objects after gradient aggregation by combining the first reciprocal, the second reciprocal, the reciprocal sum, the first gradient parameter, and the second gradient parameter;
wherein the target gradient parameter is obtained according to the following formula:
T = Σ_{i=1}^{N} (d_i / D) × G_i
where T represents the target gradient parameter, N represents the total number of target objects, d_i represents the reciprocal corresponding to the ith target object, D represents the sum of the reciprocals corresponding to all of the target objects, and G_i represents the gradient parameter corresponding to the ith target object.
In some embodiments, the processor 301 performs, in the process of obtaining the first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter and obtaining the second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter:
determining a first learning rate corresponding to the first object, and determining a first learning step length corresponding to the first object according to the target gradient parameter and the first learning rate;
Obtaining a first updating parameter corresponding to the first object according to the first learning step length and the first gradient parameter;
Determining a second learning rate corresponding to the second object, and determining a second learning step length corresponding to the second object according to the target gradient parameter and the second learning rate;
And obtaining a second updating parameter corresponding to the second object according to the second learning step length and the second gradient parameter.
In some embodiments, the processor 301 further performs, after the target server obtains the first analysis result of the first object and the second analysis result corresponding to the second object:
Obtaining a first accuracy rate corresponding to the first analysis result and a second accuracy rate corresponding to the second analysis result;
determining an optimal object corresponding to the target type according to the first accuracy rate and the second accuracy rate;
and performing model tuning on the rest objects except the optimal object corresponding to the target type according to the optimal object.
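The selection-and-tuning step amounts to picking the object whose analysis result has the highest accuracy and flagging the remaining objects of the same target type for tuning against it. A sketch with illustrative names:

```python
def pick_optimal(accuracies):
    """accuracies: mapping of object id -> analysis accuracy for one
    target type. Returns the optimal object and the remaining objects
    whose models should be tuned against it (names are illustrative)."""
    best = max(accuracies, key=accuracies.get)
    rest = [obj for obj in accuracies if obj != best]
    return best, rest
```

For example, with a first accuracy of 0.92 and a second accuracy of 0.88, the first object is selected as the optimal object and the second object is scheduled for model tuning.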
It should be noted that, for convenience and brevity of description, those skilled in the art can clearly understand that, for the specific working process of the target server described above, reference may be made to the corresponding process in the foregoing data analysis method embodiment, and details are not repeated herein.
Embodiments of the present invention also provide a computer-readable storage medium, where the storage medium stores one or more programs executable by one or more processors to implement the steps of any of the data analysis methods provided in the embodiments of the present invention.
The storage medium may be an internal storage unit of the target server described in the foregoing embodiments, for example, a hard disk or a memory of the target server. The storage medium may also be an external storage device of the target server, such as a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash memory card equipped on the target server.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and functional modules/units in the apparatus and methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware embodiment, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is known to those skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
It should be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (8)
1. A data analysis method, applied to a target server, the method comprising:
obtaining training parameters respectively corresponding to a plurality of target objects of the same target type, wherein the training parameters at least comprise a first gradient parameter corresponding to a first object and a second gradient parameter corresponding to a second object; wherein the target object comprises industrial equipment or an oil and gas pipeline; and the target type comprises a type of any one of the industrial equipment or a type of equipment for performing data monitoring on any section of pipeline in pipeline transportation;
performing gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type;
obtaining a first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter and obtaining a second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter;
The first updating parameters are sent to a first server in communication connection with the target server, so that the first server determines a first target data analysis model corresponding to the first object according to the first updating parameters, and the second updating parameters are sent to a second server in communication connection with the target server, so that the second server determines a second target data analysis model corresponding to the second object according to the second updating parameters;
Obtaining a first analysis result corresponding to the first object, which is sent by the first server and is obtained by analyzing first current data corresponding to the first object according to the first target data analysis model, and obtaining a second analysis result corresponding to the second object, which is sent by the second server and is obtained by analyzing second current data corresponding to the second object according to the second target data analysis model;
Performing gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type, wherein the gradient aggregation comprises the following steps:
estimating to obtain a first information entropy corresponding to the first gradient parameter and estimating to obtain a second information entropy corresponding to the second gradient parameter;
determining the corresponding target gradient parameters of the plurality of target objects after gradient aggregation according to the first information entropy and the second information entropy;
The determining the target gradient parameters corresponding to the plurality of target objects after gradient aggregation according to the first information entropy and the second information entropy comprises the following steps:
Obtaining a first reciprocal by solving the first information entropy and obtaining a second reciprocal by solving the second information entropy;
summing the first reciprocal and the second reciprocal to obtain a reciprocal sum;
determining the target gradient parameter corresponding to the plurality of target objects after gradient aggregation according to the first reciprocal, the second reciprocal, the reciprocal sum, and the combination of the first gradient parameter and the second gradient parameter;
wherein the target gradient parameter is obtained according to the following formula:
t = Σ_{i=1}^{N} (d_i / sum_N) · g_i;
where t represents the target gradient parameter, N represents the total number of the target objects, d_i represents the reciprocal corresponding to the i-th target object, sum_N represents the sum of the reciprocals corresponding to all of the target objects, and g_i represents the gradient parameter corresponding to the i-th target object.
2. The method according to claim 1, wherein obtaining training parameters respectively corresponding to a plurality of target objects of a same target type comprises:
acquiring the first gradient parameter sent by the first server, wherein the first gradient parameter is the first gradient parameter corresponding to the first data, obtained after the first server acquires first data corresponding to the first object and performs data training on the first data by using an initial data analysis model;
acquiring the second gradient parameter sent by the second server, wherein the second gradient parameter is the second gradient parameter corresponding to the second data, obtained after the second server acquires second data corresponding to the second object and performs data training on the second data by using the initial data analysis model.
3. The method according to claim 2, wherein the estimating obtains a first information entropy corresponding to the first gradient parameter and the estimating obtains a second information entropy corresponding to the second gradient parameter, including:
Obtaining a first sample number of the first data corresponding to the first object, a first sample feature dimension corresponding to the first data and a first vector average corresponding to the first data;
determining a first fitting function corresponding to the first object by combining the first sample number, the first sample feature dimension and the first vector average value through a dual gamma function;
obtaining a first vector set corresponding to the first object under the first sample number according to the first fitting function, and further obtaining the first information entropy corresponding to the first gradient parameter according to the first vector set;
Obtaining a second sample number of the second data corresponding to the second object, a second sample feature dimension corresponding to the second data, and a second vector average corresponding to the second data;
Determining a second fitting function corresponding to the second object by combining the second sample number, the second sample feature dimension and the second vector average value through a dual gamma function;
And obtaining a second vector set corresponding to the second object under the second sample number according to the second fitting function, and further obtaining the second information entropy corresponding to the second gradient parameter according to the second vector set.
4. The method of claim 1, wherein the obtaining a first updated parameter corresponding to the first object from the target gradient parameter and the first gradient parameter and obtaining a second updated parameter corresponding to the second object from the target gradient parameter and the second gradient parameter comprises:
determining a first learning rate corresponding to the first object, and determining a first learning step length corresponding to the first object according to the target gradient parameter and the first learning rate;
Obtaining a first updating parameter corresponding to the first object according to the first learning step length and the first gradient parameter;
Determining a second learning rate corresponding to the second object, and determining a second learning step length corresponding to the second object according to the target gradient parameter and the second learning rate;
And obtaining a second updating parameter corresponding to the second object according to the second learning step length and the second gradient parameter.
5. The method according to claim 1, wherein after the target server obtains the first analysis result corresponding to the first object and the second analysis result corresponding to the second object, the method further comprises:
Obtaining a first accuracy rate corresponding to the first analysis result and a second accuracy rate corresponding to the second analysis result;
determining an optimal object corresponding to the target type according to the first accuracy rate and the second accuracy rate;
and performing model tuning on the rest objects except the optimal object corresponding to the target type according to the optimal object.
6. A data analysis device, applied to a target server, comprising:
The data acquisition module is used for acquiring training parameters respectively corresponding to a plurality of target objects of the same target type, wherein the training parameters at least comprise a first gradient parameter corresponding to a first object and a second gradient parameter corresponding to a second object; wherein the target object comprises industrial equipment or an oil and gas pipeline; the target type comprises the type of any one of industrial equipment or the type of equipment for carrying out data monitoring on any section of pipeline in pipeline transportation;
the data aggregation module is used for carrying out gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type;
The parameter determining module is used for obtaining a first updated parameter corresponding to the first object according to the target gradient parameter and the first gradient parameter and obtaining a second updated parameter corresponding to the second object according to the target gradient parameter and the second gradient parameter;
The data sending module is used for sending the first updating parameters to a first server in communication connection with the target server, so that the first server determines a first target data analysis model corresponding to the first object according to the first updating parameters, and sending the second updating parameters to a second server in communication connection with the target server, so that the second server determines a second target data analysis model corresponding to the second object according to the second updating parameters;
The result acquisition module is used for acquiring a first analysis result corresponding to the first object, which is sent by the first server and is obtained by analyzing first current data corresponding to the first object according to the first target data analysis model, and acquiring a second analysis result corresponding to the second object, which is sent by the second server and is obtained by analyzing second current data corresponding to the second object according to the second target data analysis model;
Performing gradient aggregation on the first gradient parameter and the second gradient parameter to obtain a target gradient parameter corresponding to the target type, wherein the gradient aggregation comprises the following steps:
estimating to obtain a first information entropy corresponding to the first gradient parameter and estimating to obtain a second information entropy corresponding to the second gradient parameter;
determining the corresponding target gradient parameters of the plurality of target objects after gradient aggregation according to the first information entropy and the second information entropy;
The determining the target gradient parameters corresponding to the plurality of target objects after gradient aggregation according to the first information entropy and the second information entropy comprises the following steps:
Obtaining a first reciprocal by solving the first information entropy and obtaining a second reciprocal by solving the second information entropy;
summing the first reciprocal and the second reciprocal to obtain a reciprocal sum;
determining the target gradient parameter corresponding to the plurality of target objects after gradient aggregation according to the first reciprocal, the second reciprocal, the reciprocal sum, and the combination of the first gradient parameter and the second gradient parameter;
wherein the target gradient parameter is obtained according to the following formula:
t = Σ_{i=1}^{N} (d_i / sum_N) · g_i;
where t represents the target gradient parameter, N represents the total number of the target objects, d_i represents the reciprocal corresponding to the i-th target object, sum_N represents the sum of the reciprocals corresponding to all of the target objects, and g_i represents the gradient parameter corresponding to the i-th target object.
7. A target server, wherein the target server comprises a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program and to implement the data analysis method according to any one of claims 1 to 5 when the computer program is executed.
8. A computer storage medium, characterized in that the computer storage medium stores one or more programs executable by one or more processors to implement the steps of the data analysis method of any of claims 1 to 5.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410397523.8A CN117992834B (en) | 2024-04-03 | 2024-04-03 | Data analysis method and related device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN117992834A CN117992834A (en) | 2024-05-07 |
| CN117992834B true CN117992834B (en) | 2024-06-25 |
Family
ID=90895421
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202410397523.8A Active CN117992834B (en) | 2024-04-03 | 2024-04-03 | Data analysis method and related device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117992834B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109716346A (en) * | 2016-07-18 | 2019-05-03 | Nantomics, LLC | Distributed machine learning system, device and method |
| CN117077811A (en) * | 2023-08-31 | 2023-11-17 | WeBank Co., Ltd. | Federated learning optimization method and related devices |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113807538B (en) * | 2021-04-09 | 2024-02-06 | JD Technology Holding Co., Ltd. | Federated learning method and device, electronic equipment, and storage medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9122273B2 (en) | Failure cause diagnosis system and method | |
| WO2020207214A1 (en) | Data processing method and apparatus, electronic device and storage medium | |
| US20180285780A1 (en) | Updating attribute data structures to indicate trends in attribute data provided to automated modelling systems | |
| US9930541B2 (en) | Maintaining the integrity of configuration information of a network of access points for use in positioning an apparatus | |
| US11216534B2 (en) | Apparatus, system, and method of covariance estimation based on data missing rate for information processing | |
| US20150379417A1 (en) | System analysis device and system analysis method | |
| US20180052441A1 (en) | Simulation system, simulation method, and simulation program | |
| CN117196353A (en) | Environmental pollution assessment and monitoring methods and systems based on big data | |
| US11265232B2 (en) | IoT stream data quality measurement indicator and profiling method and system therefor | |
| CN117056864A (en) | Pipeline leakage early warning method and system | |
| CN116151353A (en) | A training method for a sequence recommendation model and an object recommendation method | |
| CN114511088A (en) | Bayesian model updating method and system for structure damage recognition | |
| JP2015184818A (en) | Server, model application propriety determination method and computer program | |
| CN109976986B (en) | Abnormal equipment detection method and device | |
| CN114037673A (en) | Hardware connection interface monitoring method and system based on machine vision | |
| Peng et al. | Shrinkage estimation of varying covariate effects based on quantile regression | |
| CN117992834B (en) | Data analysis method and related device | |
| CN119398258A (en) | Water quality monitoring method and related device based on edge computing | |
| CN113296992B (en) | Abnormality cause determination method, device, equipment and storage medium | |
| CN118820906B (en) | System state prediction method and system of drying equipment | |
| CN102195814B (en) | A Method and Device for Forecasting and Forecasting IT Operation and Maintenance Index Using Correlation | |
| JP2020071493A (en) | Result prediction device, result prediction method and program | |
| CN117540325B (en) | Business database anomaly detection method and system based on data change capture | |
| US20250086475A1 (en) | Systems and methods for detecting anomalous data in federated learning using topological data analysis | |
| CN112308099B (en) | Sample feature importance determining method, classification model training method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||