CN114660931A

CN114660931A - Method and system for diagnosing and identifying petrochemical process fault

Info

Publication number: CN114660931A
Application number: CN202011530237.2A
Authority: CN
Inventors: 牛鲁娜; 韩磊; 兰正贵; 陈文武
Original assignee: China Petroleum and Chemical Corp; Sinopec Qingdao Safety Engineering Institute
Current assignee: China Petroleum and Chemical Corp; Sinopec Qingdao Safety Engineering Institute
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2022-06-24

Abstract

The invention provides a method and a system for diagnosing and identifying faults in a petrochemical process, and belongs to the technical field of chemical safety. The method comprises the following steps: calculating kernel function parameters of an SVM model from training set data; training the SVM model with the training set data, wherein the weight parameters of the SVM model are used to compute a contribution score of each process variable to the class of the training set data, the contribution scores are used to rank the process variables during training, and the ranking is used for feature selection during training; and after the training is finished, obtaining the optimal feature set data and the optimal SVM model. The invention is used for fault diagnosis of modern complex petrochemical processes.

Description

Method and system for diagnosing and identifying petrochemical process fault

Technical Field

The invention relates to the technical field of chemical safety, in particular to a model acquisition method for petrochemical process fault diagnosis and identification, an online diagnosis method for petrochemical process fault diagnosis and identification, a system for petrochemical process fault diagnosis and identification, an electronic device and a computer-readable storage medium.

Background

Because the petrochemical process has great complexity and has higher requirements on the safety and the reliability of the device, the online diagnosis and the fault identification of the petrochemical process are extremely important. The fault diagnosis and identification method is an important technical means for ensuring safe and stable operation of the process, fault points can be accurately positioned through monitoring the abnormal state of the system, and the process operation is optimized and adjusted in time, so that the stability, reliability and safety of the production process are ensured, and the aims of improving the production efficiency, product quality and production safety are fulfilled.

In some existing schemes, a Support Vector Machine (SVM) and a Relevance Vector Machine (RVM) are combined into a multi-classification model to monitor abnormal states of electrical equipment. Such schemes involve many classification cases, are time-consuming and hard to compute accurately, and are difficult to apply to online monitoring of a chemical plant when there are many features and the data set is very large. Other existing schemes use a conventional SVM as the classification model for a data monitoring system, with feature selection performed by manual analysis, typically based on experience. When there are many features and the data set is very large, it is likewise difficult to form a usable classification model; moreover, different persons may select different features for the same chemical plant, and unsuitable features may be chosen.

Disclosure of Invention

The invention aims to provide a method and a system for diagnosing and identifying faults in a petrochemical process, which solve the technical problem that, in the prior art, faults in a chemical process are difficult to diagnose and identify online because no classification model is suited to the chemical process.

In order to achieve the above object, an embodiment of the present invention provides an obtaining method of a model for diagnosing and identifying a petrochemical process fault, including:

calculating kernel function parameters of the SVM model through training set data;

training the SVM model with the training set data, wherein,

the weight parameters of the SVM model are used to score the contribution of process variables to the training set data types,

the contribution scores are used to rank the process variables by contribution score during the training process,

the contribution score ranking is used for feature selection in the training process;

and after the training is finished, obtaining the optimal feature set data and the optimal SVM model.

Specifically, the kernel function parameters of the SVM model are calculated by using training set data, where the training set data is obtained by preprocessing, and the preprocessing includes:

performing data normalization on the data of the training set to be processed;

performing data balancing on the normalized training set data, wherein,

the data balancing comprises performing N_TS1 repetitions of N_FS1-fold cross-validation on the normalized training set data and using the training-validation set data obtained after cross-validation as the training set data, where N_TS1 and N_FS1 are positive integers denoting, respectively, the number of repetitions of the cross-validation and the number of data groups (folds).

Specifically, the data normalization is performed on the training set data to be processed, where the calculation formula of the data normalization is:

$$y = \frac{x - \mu}{\sigma_1}$$

in the calculation formula, x is the training set data to be processed, μ is the mean of the training set data to be processed, σ₁ is the standard deviation of the training set data to be processed, and y is the training set data after data normalization.

Specifically, the calculating the kernel function parameter of the SVM model through the training set data includes:

performing N_TS2 repetitions of N_FS2-fold cross-validation on the training set data to calculate the Gaussian kernel function parameter of the SVM model, wherein the mean of the N_TS2 kernel function parameter values obtained after cross-validation is used as the Gaussian kernel function parameter of the SVM model, and N_TS2 and N_FS2 are positive integers denoting, respectively, the number of repetitions of the cross-validation and the number of data groups (folds).

Specifically, the training the SVM model by the training set data includes:

selecting an objective function, wherein the objective function is as follows:

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,k(x_i,x_j)-\sum_{i=1}^{n}\alpha_i$$

in the objective function formula, x_i ∈ R^m and x_j ∈ R^m, where x_i is the feature vector of the i-th sample over the m process variables and x_j is the feature vector of the j-th sample over the m process variables,

y_i ∈ {1, −1} and y_j ∈ {1, −1}, where y_i is the class of the i-th sample and y_j is the class of the j-th sample,

k(x_i, x_j) is a Gaussian kernel function, the Gaussian kernel function k(x_i, x_j) being:

$$k(x_i,x_j)=\exp\left(-\frac{\lVert x_i-x_j\rVert^{2}}{2\sigma_2^{2}}\right)$$

in the Gaussian kernel function, σ₂ is the parameter of the Gaussian kernel function,

x_i and x_j belong to the training feature set, y_i and y_j belong to the training classification set, and (x_i, y_i) and (x_j, y_j) belong to the training set data,

min is the minimum function, α_i and α_j are Lagrange multipliers, and n is the number of samples and is a positive integer;

selecting constraint conditions, wherein the constraint conditions are as follows:

$$\sum_{i=1}^{n}\alpha_i y_i = 0$$

0 ≤ α_i ≤ C, i = 1, 2, 3, ..., n

C in the constraint conditions is a preset trade-off factor;

selecting a classifier of the SVM model, the classifier f(x_q) being:

f(x_q) = w^T x_q + b

in the classifier, x_q is the data to be classified, b is a constant (bias) term, and w is the vector of weight parameters for x_i.

Specifically, the training the SVM model by the training set data further includes:

determining a mapping of the contribution score, the mapping being:

in the mapping formula, c_p is the contribution score.

performing an iterative computation for feature selection through a training feature set and a training class set corresponding to the training set data,

wherein, the one-time calculation in the iterative calculation comprises:

solving the objective function under the constraint condition through the current training feature set and the training classification set corresponding to the current training feature set, training the classifier, and obtaining the vector of the weight parameter corresponding to the vector of each feature,

calculating a contribution score for each process variable from the obtained vector of weight parameters, forming a ranking of the process variables by contribution score, and taking the formed ranking as the feature ranking of the feature vectors,

according to the feature sequence, removing the feature corresponding to the minimum score in the feature sequence from the current training feature set, and returning the removed training feature set to the step of training the classifier.

Specifically, one calculation in the iterative calculation further includes:

calculating at least an F1-score value among the performance indexes used to evaluate the classifier corresponding to the current training feature set;

recording the F1-score value.

Specifically, after the SVM model is trained by the training set data and before the obtaining of the optimal feature set data and the optimal SVM model, the method further includes:

obtaining quasi-determined optimal feature set data and a quasi-determined optimal SVM model according to the recorded magnitude relation of each F1-score value, wherein the quasi-determined optimal SVM model is provided with a classifier corresponding to the quasi-determined optimal feature set data;

and calculating the fault detection rate FDR, the false alarm rate FAR and the accuracy ACC of the quasi-determined optimal SVM model from a confusion matrix of true classes against predicted classes, and using the fault detection rate FDR, the false alarm rate FAR and the accuracy ACC as evaluation factors for the optimal feature set data and the optimal SVM model.

The embodiment of the invention provides an online diagnosis method for diagnosing and identifying faults in a petrochemical process, which comprises the following steps:

acquiring chemical process online acquisition data with the same data structure as the training set data;

establishing a test set of the online collected data through the optimal feature set data;

and identifying the classification of the test set according to the optimal SVM model, and determining the normality or the fault of the chemical process.

The embodiment of the invention provides a system for diagnosing and identifying faults in a petrochemical process, which comprises:

the off-line training module is used for calculating kernel function parameters of the SVM model through training set data;

training the SVM model with the training set data, wherein,

after training is completed, obtaining optimal feature set data and an optimal SVM model;

the system further comprises:

the online diagnosis module is used for acquiring the online collected data of the chemical process with the same data structure as the training set data;

establishing a test set of the online collected data of the chemical process according to the optimal feature set data;

In another aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes:

at least one processor;

a memory coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the foregoing method by executing the instructions stored by the memory.

In yet another aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions, which, when executed on a computer, cause the computer to perform the foregoing method.

Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention and not to limit the embodiments of the invention. In the drawings:

FIG. 1 is a schematic diagram of a petrochemical process fault diagnosis and identification process based on SVM nonlinear feature selection according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of the Tennessee Eastman process according to an embodiment of the present invention;

FIG. 3 is a schematic diagram comparing SVM fault diagnosis and identification data with normal data according to an embodiment of the present invention.

Detailed Description

The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.

Example 1

Referring to fig. 1, an embodiment of the present invention provides an acquisition method of a model for diagnosing and identifying a petrochemical process fault, including:

calculating kernel function parameters of the SVM model from training set data; training the SVM model with the training set data, wherein the weight parameters of the SVM model are used to compute a contribution score of each process variable (the process variables being the features of the SVM model) to the class of the training set data, the contribution scores are used to rank the process variables during training, and the ranking is used for feature selection during training; and after the training is finished, obtaining the optimal feature set data and the optimal SVM model.

In some implementations, the training set data may be recorded and collected by the sensor acquisition system, the control system and similar systems involved in the chemical process. The training set data may be pre-processed, the pre-processing including: performing data normalization on the training set data to be processed;

performing data balancing on the normalized training set data, wherein,

the data balancing comprises performing N_TS1 repetitions of N_FS1-fold cross-validation on the normalized training set data and using the training-validation set data obtained after cross-validation as the training set data, where N_TS1 and N_FS1 are positive integers denoting, respectively, the number of repetitions of the cross-validation and the number of data groups (folds). N_TS1 may be taken as 100 and N_FS1 may be taken as 5. This process may be referred to as normalizing the training set data and forming a balanced data set.

Specifically, the training set data is normalized to a standardized data set with a mean of 0 and a variance of 1. Zero-mean (Z-score, also known as standard score) normalization can be used for this: the difference between a value and the mean is divided by the standard deviation. The result indicates how far the raw data deviates from the mean, measured in units of the standard deviation. When the data distribution is too scattered for the maximum and minimum values to be judged, or when the data contains too many outliers, z-score normalization can be adopted. The mathematical expression is shown as formula (1):

$$y = \frac{x - \mu}{\sigma_1} \qquad (1)$$

in the calculation formula, x is the training set data to be processed, μ is the mean of the training set data to be processed, σ₁ is the standard deviation of the training set data to be processed, and y is the training set data after data normalization.
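As a minimal sketch of the z-score normalization in formula (1), the following Python snippet normalizes one process variable. It assumes the population standard deviation is meant for σ₁ (the text does not say whether the population or sample form is used); the function names and the sample readings are illustrative only.

```python
from statistics import fmean, pstdev


def zscore_fit(column):
    """Return (mu, sigma_1) for one process variable of the training set."""
    mu = fmean(column)
    sigma_1 = pstdev(column)  # assumption: population standard deviation
    return mu, sigma_1


def zscore_apply(column, mu, sigma_1):
    """Apply y = (x - mu) / sigma_1 to every value of the column."""
    if sigma_1 == 0:
        # A constant variable carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma_1 for x in column]


# Made-up sensor readings for a single process variable.
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, sigma_1 = zscore_fit(readings)
normalized = zscore_apply(readings, mu, sigma_1)
```

The mean and standard deviation are returned separately so that the same values can be reused later on data collected online.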

Specifically, to obtain a balanced training-validation data set, 100 repetitions of 5-fold cross-validation are used, since an unbalanced data set may leave the model insufficiently trained and therefore less accurate. During the calculation, the number of iterations may be set to 100 so that the parameters are as close to optimal as possible.
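The repeated cross-validation described above can be sketched as follows. This is an illustration under assumptions: the splitting is plain random shuffling without stratification, and `repeated_kfold`, its parameters and the seed are hypothetical names that do not come from the source.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, val_idx) for repeated k-fold splitting.

    n_repeats plays the role of N_TS1 and n_folds the role of N_FS1.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            val_idx = sorted(folds[f])
            train_idx = sorted(
                i for g in range(n_folds) if g != f for i in folds[g]
            )
            yield repeat, f, train_idx, val_idx


# 100 repetitions of 5-fold cross-validation on a toy set of 20 samples.
splits = list(repeated_kfold(n_samples=20, n_folds=5, n_repeats=100))
```

Each sample index appears in exactly one validation fold per repetition, so every repetition uses all of the data once for validation.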

For the training process, the method can be divided into determining the parameters of the Gaussian kernel function, calculating the score ranking of all the features, and determining the optimal feature set and the optimal SVM model.

Specifically, the kernel function parameter of the Gaussian kernel may be determined by 10-fold cross-validation (N_FS2 is taken as 10 in this case). Cross-validation first divides the training data equally into X small data sets; each time, one of them is taken as the test set and the other X−1 are taken as the training set. Repeating the training process (N_TS2 repetitions) yields X usable kernel parameter values, and the mean of these X values is taken as the kernel function parameter σ₂. X is at least 3 and usually takes the value 5 or 10.
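The averaging step can be illustrated with a short sketch. The source does not state how the kernel parameter is chosen within a single fold, so this example assumes a candidate list is searched and the best-scoring candidate is kept per fold; `select_sigma`, `validation_score` and the toy scoring rule are hypothetical stand-ins, not the source's procedure.

```python
from statistics import fmean


def select_sigma(folds, candidates, validation_score):
    """Average the per-fold best kernel parameter.

    folds: iterable of (train_idx, val_idx) pairs.
    candidates: candidate sigma values to try (assumption).
    validation_score: callable(sigma, train_idx, val_idx) -> float,
        higher is better (hypothetical; it would train and validate an SVM).
    """
    best_per_fold = []
    for train_idx, val_idx in folds:
        best = max(
            candidates,
            key=lambda s: validation_score(s, train_idx, val_idx),
        )
        best_per_fold.append(best)
    return fmean(best_per_fold)


# Toy demonstration: each fold "prefers" a different sigma.
folds = [([1, 2], [0]), ([0, 2], [1]), ([0, 1], [2])]
preferred = {0: 1.0, 1: 2.0, 2: 4.0}


def toy_score(sigma, train_idx, val_idx):
    # Stand-in for a real validation metric such as F1-score.
    return -abs(sigma - preferred[val_idx[0]])


sigma_2 = select_sigma(folds, [0.5, 1.0, 2.0, 4.0], toy_score)
```

In a real run `validation_score` would train the SVM on `train_idx` and score it on `val_idx`; only the averaging over folds is taken from the text.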

Specifically, to calculate the ranking of all variables by their contribution score to the data set class, backward feature selection can be performed based on the SVM feature ranking. The feature selection is carried out by iterative computation: all variables of the data set are scored and ranked, and in each iteration the lowest-ranked feature is removed, until the feature set is empty. In each iteration, the ranking of the features depends on the weight coefficients, whose mathematical expression is shown in equation (2):

the ranking score (i.e., contribution score) corresponding to the pth feature in each feature set is calculated as shown in equation (3):

in the binary classification problem, for training set data with n samples,

x_i ∈ R^m and x_j ∈ R^m, where x_i is the feature vector of the i-th sample over the m process variables and x_j is the feature vector of the j-th sample over the m process variables; x_i and x_j belong to the training feature set, and y_i and y_j belong to the training classification set; y_i ∈ {1, −1} and y_j ∈ {1, −1} (one of 1 and −1 indicates normal and the other indicates fault), where y_i is the class label of the i-th sample and y_j is the class label of the j-th sample. With these class labels, the classifier of the SVM model may be represented by formula (4):

f(x_q)＝w^Tx_q+b (4)

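in the classifier, x_q is the data to be classified, b is a constant (bias) term, and w is the vector of weight parameters for x_i, w = [w₁, w₂, ..., w_m]^T. By using a Gaussian kernel function, the SVM-based binary classification problem can be converted into equation (5):

$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,k(x_i,x_j)-\sum_{i=1}^{n}\alpha_i$$

$$\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\qquad 0\le\alpha_i\le C,\ i=1,2,3,\ldots,n \qquad (5)$$

in the objective function and constraints of the above formula, α_i and α_j are Lagrange multipliers and C is a preset trade-off factor between training accuracy and model complexity. The Gaussian kernel function k(x_i, x_j) can be:

$$k(x_i,x_j)=\exp\left(-\frac{\lVert x_i-x_j\rVert^{2}}{2\sigma_2^{2}}\right) \qquad (6)$$

where σ₂ is the Gaussian kernel function parameter.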
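To make the role of the kernel concrete, here is a small sketch of a Gaussian kernel and of a kernel-based decision value. Two assumptions are made: the kernel is taken in the common form exp(−‖x_i − x_j‖² / (2σ₂²)), and the decision value uses the standard kernel expansion Σ α_i y_i k(x_i, x_q) + b, which is the kernel counterpart of f(x_q) = w^T x_q + b. The multipliers and the bias in the demonstration are made-up numbers, not the result of solving the optimization problem.

```python
import math


def gaussian_kernel(x_i, x_j, sigma_2):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma_2^2)) (assumed form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-squared_distance / (2.0 * sigma_2 ** 2))


def decision_value(x_q, train_x, train_y, alphas, b, sigma_2):
    """Kernel expansion: sum_i alpha_i * y_i * k(x_i, x_q) + b."""
    return b + sum(
        alpha * y * gaussian_kernel(x, x_q, sigma_2)
        for alpha, y, x in zip(alphas, train_y, train_x)
    )


def classify(x_q, train_x, train_y, alphas, b, sigma_2):
    """Return 1 or -1 according to the sign of the decision value."""
    value = decision_value(x_q, train_x, train_y, alphas, b, sigma_2)
    return 1 if value >= 0 else -1


# Two illustrative training samples, one per class, with made-up multipliers.
train_x = [[0.0, 0.0], [2.0, 2.0]]
train_y = [1, -1]
alphas = [1.0, 1.0]
label_near_first = classify([0.1, 0.1], train_x, train_y, alphas, 0.0, 1.0)
label_near_second = classify([1.9, 1.9], train_x, train_y, alphas, 0.0, 1.0)
```

A query point close to a training sample receives a kernel value near 1 for that sample, so it takes on that sample's class label.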

Specifically, the iterative computation for feature selection is performed through a training feature set and a training class set corresponding to the training set data,

wherein, one calculation in the iterative calculation comprises:

In some cases, the iterative computation may begin with an initialization step in which the current training feature set is initialized to the original feature set and an empty ranking set is initialized. After each ranking by contribution score, the ordered features are added to the ranking set, which completes the feature ranking and removal.
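The backward elimination loop can be sketched as below. The scoring step is abstracted into a caller-supplied function, because training the SVM and mapping its weights to contribution scores is outside the scope of this illustration; `backward_feature_ranking`, `score_features` and the variable names in the demonstration are hypothetical.

```python
def backward_feature_ranking(features, score_features):
    """Rank features by repeatedly removing the lowest-scoring one.

    features: list of feature names (the original feature set).
    score_features: callable(current_features) -> {name: contribution score};
        a stand-in for training the classifier on the current feature set
        and computing one contribution score per feature.
    Returns the features ordered from most to least important.
    """
    current = list(features)  # current training feature set
    ranking = []              # ranking set, initially empty
    while current:
        scores = score_features(current)
        weakest = min(current, key=lambda name: scores[name])
        current.remove(weakest)
        # Features removed later are more important, so they go in front.
        ranking.insert(0, weakest)
    return ranking


# Toy scores that do not change between iterations (illustration only).
toy_scores = {"v44": 0.9, "v1": 0.6, "v4": 0.4, "v18": 0.1}
ranking = backward_feature_ranking(
    list(toy_scores),
    lambda current: {name: toy_scores[name] for name in current},
)
```

In an actual run the scores would be recomputed from a freshly trained classifier in every iteration, and a performance index such as the F1-score could be recorded for each intermediate feature set.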

Specifically, to determine the optimal feature set, models are built in sequence according to the variable ranking obtained from formula (3), and for each model the F1-score value (also called F-score or F-measure, a measure of test accuracy in binary classification), the fault detection rate FDR (Fault Detection Rate), the false alarm rate FAR (False Alarm Rate) and the accuracy ACC (Accuracy) are calculated. The features of the model with the largest F1-score value are selected as the optimal feature set.

Specifically, the SVM model corresponding to the optimal feature set is the optimal fault diagnosis and identification model. In the iterative process of feature selection, the optimal feature set and the optimal model are determined according to the performance indexes of the classifier corresponding to each feature set. F1-score, which evaluates the performance of a binary classifier, can be selected as the index quantity. F1-score evaluates the accuracy of the model by combining precision (Precision) and recall (Recall), as shown in formula (7):

$$F1 = \frac{2\cdot Precision\cdot Recall}{Precision + Recall} \qquad (7)$$

wherein the quantities in equation (7) are determined from the confusion matrix, which is shown in table 1.

TABLE 1 confusion matrix

where TP denotes correctly classified normal data, FP denotes incorrectly classified normal data, TN denotes correctly classified fault data, and FN denotes incorrectly classified fault data. The mathematical expressions for precision and recall are shown in formula (8):

$$Precision = \frac{TP}{TP+FP},\qquad Recall = \frac{TP}{TP+FN} \qquad (8)$$
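As a worked illustration of precision, recall and F1-score, the snippet below computes the three values from confusion-matrix counts using the standard definitions. The counts are invented for the example, and the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented counts: 90 true positives, 10 false positives, 30 false negatives.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

With these counts precision is 0.9 and recall is 0.75, and the F1-score is their harmonic mean, 9/11.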

To demonstrate the fault diagnosis performance of the proposed optimal model, the fault detection rate FDR, the false alarm rate FAR and the accuracy ACC of the model can be used as evaluation factors; the corresponding mathematical expressions are given in formula (9):

in some cases, the threshold value of the above evaluation factor may be selected according to the desired sensitivity of the failure diagnosis.
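The three evaluation factors can be sketched directly from label sequences. The definitions used here are the ones commonly applied in fault diagnosis and are an assumption about what the source's formula contains: FDR is the share of fault samples flagged as fault, FAR is the share of normal samples flagged as fault, and ACC is the share of all samples classified correctly. Encoding fault as −1 is also an assumption, since the text allows either label to denote normal.

```python
def evaluation_factors(true_labels, predicted_labels, fault_label=-1):
    """Return (FDR, FAR, ACC) under the assumed definitions."""
    fault_total = fault_detected = 0
    normal_total = false_alarms = 0
    correct = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == fault_label:
            fault_total += 1
            fault_detected += predicted == fault_label
        else:
            normal_total += 1
            false_alarms += predicted == fault_label
        correct += truth == predicted
    fdr = fault_detected / fault_total if fault_total else 0.0
    far = false_alarms / normal_total if normal_total else 0.0
    acc = correct / len(true_labels) if true_labels else 0.0
    return fdr, far, acc


# Invented labels: 4 normal samples (1) followed by 6 fault samples (-1).
true_labels = [1, 1, 1, 1, -1, -1, -1, -1, -1, -1]
predicted_labels = [1, 1, 1, -1, -1, -1, -1, -1, -1, 1]
fdr, far, acc = evaluation_factors(true_labels, predicted_labels)
```

In this example one normal sample raises a false alarm and one fault sample is missed, which gives FDR = 5/6, FAR = 0.25 and ACC = 0.8.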

The embodiment of the invention also provides an online diagnosis method for diagnosing and identifying the fault in the petrochemical process, which comprises the following steps:

establishing a test set of the online acquired data through the optimal feature set data;

Specifically, fault diagnosis and identification can be performed on data collected online. Process data of the same type as the offline training set data is collected online and used to verify the SVM fault diagnosis and identification model established in the offline stage. The collected data set is then preprocessed with the same z-score normalization used for the training set.

Specifically, the feature set corresponding to the test set may be determined according to the optimal feature set at the offline stage.

Specifically, the optimal SVM model can be used for diagnosing and identifying the feature set data determined on line. And calculating a classification performance index F1-score, a fault detection rate FDR, a false alarm rate FAR and an accuracy ACC of the model, and evaluating the fault diagnosis and identification effects.
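The online stage can be sketched as follows, assuming the offline stage has stored the mean and standard deviation of each variable together with the list of optimal features. The variable names, the stored statistics and the stand-in classifier are all hypothetical; a trained SVM decision function would take the place of `toy_classifier`.

```python
def prepare_online_sample(sample, train_stats, optimal_features):
    """Select the optimal features and z-score them with training statistics.

    sample: {variable name: raw online value}
    train_stats: {variable name: (mu, sigma_1)} computed on the training set
    optimal_features: variable names chosen in the offline stage
    """
    return [
        (sample[name] - train_stats[name][0]) / train_stats[name][1]
        for name in optimal_features
    ]


def diagnose(sample, train_stats, optimal_features, classifier):
    """Return 'normal' or 'fault' (assumption: label 1 denotes normal)."""
    x_q = prepare_online_sample(sample, train_stats, optimal_features)
    return "normal" if classifier(x_q) == 1 else "fault"


def toy_classifier(x_q):
    # Stand-in for the trained SVM: flags large deviations as faults.
    return 1 if abs(x_q[0]) < 3.0 else -1


train_stats = {"v44": (25.0, 2.0), "v1": (0.25, 0.05)}  # made-up values
optimal_features = ["v44"]
status_a = diagnose({"v44": 26.0, "v1": 0.30}, train_stats,
                    optimal_features, toy_classifier)
status_b = diagnose({"v44": 40.0, "v1": 0.30}, train_stats,
                    optimal_features, toy_classifier)
```

Reusing the training statistics, rather than recomputing them on the online data, keeps the online samples on the same scale as the data the model was trained on.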

Example 2

The embodiment of the invention belongs to the same inventive concept as the embodiment 1, and the embodiment of the invention provides a system for diagnosing and identifying the fault of the petrochemical process, which comprises the following steps:

training the SVM model with the training set data, wherein,

the system further comprises:

For the offline training module, wherein the training set data is obtained by pre-processing, the pre-processing comprising:

performing data normalization on the data of the training set to be processed;

performing data balancing on the normalized training set data, wherein,

the data balancing comprises performing N_TS1 repetitions of N_FS1-fold cross-validation on the normalized training set data and using the training-validation set data obtained after cross-validation as the training set data, where N_TS1 and N_FS1 are positive integers denoting, respectively, the number of repetitions of the cross-validation and the number of data groups (folds).

For the offline training module, data normalization is performed on the training set data to be processed, where the data normalization is calculated by:

$$y = \frac{x - \mu}{\sigma_1}$$

in the calculation formula, x is the training set data to be processed, μ is the mean of the training set data to be processed, σ₁ is the standard deviation of the training set data to be processed, and y is the training set data after data normalization.

The offline training module is specifically configured to: perform N_TS2 repetitions of N_FS2-fold cross-validation on the training set data to calculate the Gaussian kernel function parameter of the SVM model, wherein the mean of the N_TS2 kernel function parameter values obtained after cross-validation is used as the Gaussian kernel function parameter of the SVM model, and N_TS2 and N_FS2 are positive integers denoting, respectively, the number of repetitions of the cross-validation and the number of data groups (folds).

The offline training module is specifically configured to: selecting an objective function, wherein the objective function is as follows:

y_i ∈ {1, −1} and y_j ∈ {1, −1}, where y_i is the class of the i-th sample and y_j is the class of the j-th sample,

in the Gaussian kernel function, σ₂In order to be a parameter of the gaussian kernel function,

0≤α_i≤C，i＝1，2，3，...，n

c in the constraint condition is a preset balance factor;

f(x_q)＝w^Tx_q+b

The offline training module is specifically configured to: determining a mapping of the contribution score, the mapping being:

in the mapping formula, c_p is the contribution score.

The offline training module is specifically configured to perform iterative computations for feature selection through a training feature set and a training class set corresponding to the training set data,

wherein, one calculation in the iterative calculation comprises:

The offline training module is specifically configured to: calculating at least a F1-score value in a performance index quantity used to evaluate a classifier corresponding to the current training feature set;

recording the F1-score value.

The offline training module is further configured to: obtaining quasi-determined optimal feature set data and a quasi-determined optimal SVM model according to the recorded magnitude relation of each F1-score value, wherein the quasi-determined optimal SVM model is provided with a classifier corresponding to the quasi-determined optimal feature set data;

Example 3

This embodiment belongs to the same inventive concept as embodiments 1 and 2 and provides a diagnosis and identification method for the Tennessee Eastman (TE) process. The TE process is representative of general modern chemical processes, and this embodiment uses its process data as the basis for verifying the method.

1. As shown in FIG. 2 (P denotes pressure, F denotes flow, T denotes temperature, J denotes power, H denotes valve opening, L denotes liquid level, I denotes indication and C denotes control), the TE process consists mainly of several operating units such as a reactor, a condenser, a stripper, a gas-liquid separator and a compressor. The TE process has four gaseous reactants: A(g), D(g), E(g) and C(g) (abbreviated as A, D, E, C), each containing a small amount of inert gas B(g) (abbreviated as B). Under the action of a catalyst, four chemical reactions take place simultaneously in the reactor. The two main reactions produce the liquid products G(liq) (abbreviated as G) and H(liq) (abbreviated as H), and a liquid by-product F is produced as well. The four chemical reaction equations are shown in formula (10):

A(g)+C(g)+D(g)→G(liq)

A(g)+C(g)+E(g)→H(liq)

A(g)+E(g)→F(liq)

3D(g)→2F(liq) (10)

The TE process includes 41 measured variables and 12 manipulated variables, as shown in Table 2 below. The operating conditions include one normal operating condition and 21 fault operating conditions (the SVM may be targeted at fault classification or at fault condition classification). The sampling interval for all 21 conditions was set to 3 minutes. Under the normal operating condition of the industrial process, the 960 data samples generated in 48 hours of process operation are collected as normal data samples. Under each of the 21 fault conditions, the fault is introduced after the process has run stably for 8 hours, so the first 160 of the 960 collected data samples contain no fault and the remaining 800 data samples contain the fault. The 960 data samples collected under the normal condition are used as training samples, and all data samples containing faults are used as test samples. Fault modes 1-7 are step faults in TE process variables, fault modes 8-12 are random-variation faults in variables, fault mode 13 is a slow drift in the chemical reaction kinetics, fault modes 14-15 are valve sticking faults, fault modes 16-20 are unknown faults, and fault mode 21 is a constant-position fault.
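The sample counts quoted above follow directly from the sampling interval, as the short calculation below shows. The constant names and the label encoding (1 for normal, −1 for fault) are illustrative assumptions.

```python
SAMPLING_INTERVAL_MIN = 3  # minutes between samples
RUN_HOURS = 48             # length of each recorded run
FAULT_ONSET_HOURS = 8      # stable operation before the fault is introduced

total_samples = RUN_HOURS * 60 // SAMPLING_INTERVAL_MIN
fault_free_samples = FAULT_ONSET_HOURS * 60 // SAMPLING_INTERVAL_MIN
faulty_samples = total_samples - fault_free_samples

# Per-sample labels for one fault run (assumed encoding: 1 normal, -1 fault).
fault_run_labels = [1] * fault_free_samples + [-1] * faulty_samples
```

The totals are 960 samples per run, of which the first 160 are fault-free and the remaining 800 contain the fault.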

TABLE 2 Tennessee Eastman process faults

2. Data normalization. The raw data all come from multiple sensors, and each group of data has its own dimension (units) and order of magnitude. To integrate all the data into one data matrix for subsequent analysis, the data must be normalized. Normalization converts the data to be analyzed into relative values that can be compared on the same order of magnitude, and is an effective way to reduce the gap between values. The normal and fault 1 operating-condition data of the TE data set form the preprocessing data set, which is z-score normalized.

3. Cross-validation yields a balanced data set. 5-fold cross-validation divides the training set data evenly into five parts; each part is used in turn as the validation set while the other four parts are used as the training set. This balances the data set and avoids the loss of SVM model accuracy caused by unbalanced raw data.

4. The Gaussian kernel function parameter is determined by 100 repetitions of 10-fold cross-validation. Cross-validation first divides the training data evenly into X small data sets, each time taking one of them as the test set and the other X−1 as the training set. After the training process has been repeated X times, X usable kernel parameter values are generated, and the mean of these X values is taken as the kernel function parameter σ₂.

5. Determining the optimal SVM model and the optimal feature set. Take TE process fault 1 as an example. When the evaluation index F1-score of the optimal nonlinear SVM model is 100%, the fault detection rate is 100%, the false alarm rate is 0% and the model accuracy ACC is 100%, the optimal feature set consists of manipulated variable 44.

6. Online diagnosis and identification of fault 1. Based on variable 44, fault 1 can be diagnosed as shown in FIG. 3. In addition, the top-ranked variables for fault 1 are 44, 1, 4 and 18. If the C feed changes, the measured variable 18 (stripper temperature) changes; to keep the content of component B constant, the manipulated variable 44 (A feed) needs to be changed, which corresponds to measured variables 1 (A feed) and 4 (total feed).

The method is suitable for fault diagnosis of modern complex petrochemical processes. The Tennessee Eastman (TE) process faults can be selected as the basis for verifying the method, since this process is general and representative. The training set data is z-score normalized, so that data from different sensors with widely differing value ranges is standardized to a mean of 0 and a variance of 1. The Gaussian kernel function parameter of the SVM fault diagnosis and identification model is determined by 10-fold cross-validation, and then the score ranking of all variables used for modeling is calculated. The optimal SVM fault diagnosis and identification model and the optimal feature set are selected by F1-score, fault detection rate FDR, false alarm rate FAR and model accuracy ACC. Finally, a data set is collected online and fault diagnosis and identification is carried out with the SVM model. Compared with other data-driven fault diagnosis methods, the method is accurate, objective and efficient; it relies on the information in the data, the algorithm consumes little time during diagnosis and identification, and the efficiency of fault diagnosis and identification in modern complex petrochemical processes can be improved.

Although the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the embodiments of the present invention are not limited to the details of the above embodiments, and various simple modifications can be made to the technical solutions of the embodiments of the present invention within the technical idea of the embodiments of the present invention, and the simple modifications all belong to the protection scope of the embodiments of the present invention.

It should be noted that the various features described in the foregoing embodiments may be combined in any suitable manner without contradiction. In order to avoid unnecessary repetition, the embodiments of the present invention do not describe every possible combination.

Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program that is stored in a storage medium and includes several instructions for causing a single-chip microcomputer, a chip, or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

In addition, the various implementation manners of the embodiments of the present invention may be combined in any way, and any such combination should likewise be regarded as disclosed by the embodiments of the present invention, provided that it does not depart from their spirit.

Claims

1. A method for obtaining a model for petrochemical process fault diagnosis and identification, characterized by comprising the following steps:

training the SVM model with the training set data, wherein,

2. The method for acquiring the model for diagnosing and identifying the petrochemical process fault according to claim 1, wherein the kernel function parameters of the SVM model are calculated through training set data, wherein the training set data are obtained through preprocessing, and the preprocessing comprises:

performing data normalization on the data of the training set to be processed;

performing data balancing on the normalized training set data, wherein,

3. The method according to claim 2, wherein the data normalization is performed on the data of the training set to be processed, wherein the data normalization is calculated by:

4. The method for obtaining the model for diagnosing and identifying the petrochemical process fault according to claim 1, wherein the calculating the kernel function parameters of the SVM model through the training set data comprises:

performing N_TS2 repetitions of N_FS2-fold cross-validation on the training set data to calculate the Gaussian kernel function parameter of the SVM model, wherein the mean of the N_TS2 kernel function parameter values obtained from the cross-validation is used as the Gaussian kernel function parameter of the SVM model, N_TS2 and N_FS2 are positive integers, and N_TS2 and N_FS2 are respectively the number of repetitions of the cross-validation and the number of data groups.

5. The method for obtaining a model for petrochemical process fault diagnosis and identification according to claim 1, wherein the training of the SVM model through the training set data comprises:

selecting an objective function, wherein the objective function is as follows:

0 ≤ α_i ≤ C, i = 1, 2, 3, ..., n

C in the constraint condition is a preset balance factor;

f(x_q) = w^T x_q + b

6. The method for obtaining a model for petrochemical process fault diagnosis and identification according to claim 5, wherein the training the SVM model by the training set data further comprises:

determining a mapping of the contribution score, the mapping being:

in the mapping formula, c_p is the contribution score.

7. The method for acquiring the model for diagnosing and identifying the petrochemical process fault according to claim 6, wherein the training of the SVM model through the training set data further comprises:

wherein, one calculation in the iterative calculation comprises:

8. The method of obtaining a model for diagnosing and identifying petrochemical process faults according to claim 7, wherein one of the iterative calculations further comprises:

recording the F1-score value.

9. The method for acquiring the model for diagnosing and identifying the fault of the petrochemical process according to claim 8, wherein after the training of the SVM model by the training set data and before the obtaining of the optimal feature set data and the optimal SVM model, the method further comprises:

10. An online diagnosis method for diagnosing and identifying faults of a petrochemical process is characterized by comprising the following steps:

establishing a test set from the data acquired online according to the optimal feature set data of any one of claims 1 to 9;

identifying the class of the test set by means of the optimal SVM model of any one of claims 1 to 9, so as to determine whether the chemical process is normal or faulty.

11. A system for petrochemical process fault diagnosis and identification, the system comprising:

training the SVM model with the training set data, wherein,

the system further comprises:

establishing a test set from the data collected online in the chemical process according to the optimal feature set data;

12. An electronic device, comprising:

at least one processor;

a memory coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of any one of claims 1 to 10 by executing the instructions stored by the memory.

13. A computer readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 10.