CN114660931A - Method and system for diagnosing and identifying petrochemical process fault - Google Patents
- Publication number
- CN114660931A (application number CN202011530237.2A)
- Authority
- CN
- China
- Prior art keywords
- training
- data
- svm model
- set data
- training set
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Abstract
The invention provides a method and a system for diagnosing and identifying faults in a petrochemical process, and belongs to the technical field of chemical safety. The method comprises: calculating kernel function parameters of an SVM model from training set data; and training the SVM model with the training set data, wherein the weight parameters of the SVM model are used to compute a contribution score of each process variable to the class of the training set data, the contribution scores are used to rank the process variables during training, and this ranking is used for feature selection during training; after training is finished, the optimal feature set data and the optimal SVM model are obtained. The invention is used for fault diagnosis in modern complex petrochemical processes.
Description
Technical Field
The invention relates to the technical field of chemical safety, in particular to a model acquisition method for petrochemical process fault diagnosis and identification, an online diagnosis method for petrochemical process fault diagnosis and identification, a system for petrochemical process fault diagnosis and identification, an electronic device and a computer-readable storage medium.
Background
Because petrochemical processes are highly complex and impose strict requirements on the safety and reliability of equipment, online diagnosis and fault identification for these processes are extremely important. Fault diagnosis and identification is an important technical means of ensuring safe and stable process operation: by monitoring abnormal states of the system, fault points can be located accurately and process operation can be optimized and adjusted in time. This ensures the stability, reliability and safety of the production process and thereby improves production efficiency, product quality and production safety.
In some existing schemes, a Support Vector Machine (SVM) and a Relevance Vector Machine (RVM) form a multi-class classification model for monitoring abnormal states of electrical equipment. Such schemes involve many classification cases, are time-consuming, have difficulty achieving accurate computation, and are hard to apply to monitoring online chemical plants when there are many features and the data sets are large. Other existing schemes use a conventional SVM as the classification model of a data-monitoring system, with feature selection performed manually and largely based on experience. These schemes likewise struggle to form a usable classification model when there are many features and the data sets are large; moreover, different people may select different features for the same chemical plant, and inappropriate features may be included.
Disclosure of Invention
The invention aims to provide a method and a system for diagnosing and identifying faults in a petrochemical process, solving the technical problem in the prior art that faults in chemical processes are difficult to diagnose and identify online because no suitable classification model exists for chemical processes.
To achieve the above object, an embodiment of the present invention provides a method for obtaining a model for diagnosing and identifying petrochemical process faults, including:
calculating kernel function parameters of the SVM model through training set data;
training the SVM model with the training set data, wherein,
the weight parameters of the SVM model are used to compute contribution scores of the process variables to the class of the training set data,
the contribution scores are used to rank the process variables during training,
the contribution score ranking is used for feature selection in the training process;
and after the training is finished, obtaining the optimal feature set data and the optimal SVM model.
Specifically, the kernel function parameters of the SVM model are calculated by using training set data, where the training set data is obtained by preprocessing, and the preprocessing includes:
performing data normalization on the training set data to be processed;
performing data balancing on the normalized training set data, wherein
the data balancing comprises performing N_TS1 repetitions of N_FS1-fold cross-validation on the normalized training set data and using the training-validation set data obtained from the cross-validation as the training set data, where N_TS1 and N_FS1 are positive integers denoting the number of cross-validation repetitions and the number of data subsets (folds), respectively.
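One possible reading of the data-balancing step is sketched below under assumptions: scikit-learn's `RepeatedStratifiedKFold` produces N_TS1 × N_FS1 training/validation index pairs whose validation folds keep the normal/fault proportion of the full set. The patent does not say whether the classes are additionally resampled, so the helper name `balanced_splits` and the fixed seed are hypothetical.

```python
import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold

N_TS1 = 100  # number of cross-validation repetitions (value given in Example 1)
N_FS1 = 5    # number of data subsets (folds) per repetition (value given in Example 1)

def balanced_splits(X, y, n_repeats=N_TS1, n_splits=N_FS1, seed=0):
    """Yield (train_idx, val_idx) pairs; each validation fold is stratified by class."""
    rskf = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats,
                                   random_state=seed)
    for train_idx, val_idx in rskf.split(X, y):
        yield train_idx, val_idx

# Hypothetical example: 10 normal (+1) and 10 fault (-1) samples, 3 process variables.
X = np.random.default_rng(0).normal(size=(20, 3))
y = np.array([1] * 10 + [-1] * 10)
splits = list(balanced_splits(X, y))   # 100 x 5 = 500 index pairs
```

With 10 samples per class and 5 folds, every validation fold holds exactly 2 samples of each class, which is the sense in which the splits are "balanced" here.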
Specifically, data normalization is performed on the training set data to be processed using the formula:
$$y = \frac{x - \mu}{\sigma_1}$$
in the calculation formula, x is the training set data to be processed, μ is the mean of the training set data to be processed, σ₁ is the standard deviation of the training set data to be processed, and y is the training set data after normalization.
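A minimal NumPy sketch of this z-score step, applied per process variable (column). It assumes σ₁ is the population standard deviation (`ddof=0`), which the text does not specify, and adds a guard for constant variables; the function names are hypothetical.

```python
import numpy as np

def zscore_fit(X):
    """Return per-variable mean (mu) and standard deviation (sigma_1) of the training data."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)            # population standard deviation (ddof=0), an assumption
    sigma[sigma == 0] = 1.0          # guard: a constant variable would otherwise divide by zero
    return mu, sigma

def zscore_apply(X, mu, sigma):
    """Compute y = (x - mu) / sigma_1 column by column."""
    return (np.asarray(X, dtype=float) - mu) / sigma

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
mu, sigma = zscore_fit(X_train)
Y = zscore_apply(X_train, mu, sigma)   # each column now has mean 0 and variance 1
```

Keeping `mu` and `sigma` from the training set (rather than recomputing them) is what allows the same normalization to be reused on online data later.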
Specifically, the calculating the kernel function parameter of the SVM model through the training set data includes:
performing N_TS2 repetitions of N_FS2-fold cross-validation on the training set data to calculate the Gaussian kernel function parameter of the SVM model, wherein the mean of the N_TS2 kernel function parameter values obtained from the cross-validation is used as the Gaussian kernel function parameter of the SVM model, and N_TS2 and N_FS2 are positive integers denoting the number of cross-validation repetitions and the number of data subsets (folds), respectively.
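The sketch below shows one way to obtain a σ² value per repetition and average them, under stated assumptions: each repetition picks the best value from a hypothetical candidate grid by mean cross-validated accuracy (the patent does not say how each value is produced), and the kernel is taken as exp(-‖x_i − x_j‖²/(2σ²)), which corresponds to scikit-learn's RBF kernel with `gamma = 1/(2σ²)`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def select_sigma2(X, y, candidates, n_repeats, n_splits=10, C=1.0):
    """Average, over n_repeats runs of n_splits-fold CV, the best sigma^2 of each run.

    Assumes k(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2)), i.e. gamma = 1 / (2 sigma^2).
    """
    picks = []
    for r in range(n_repeats):
        # A different shuffle per repetition gives a different fold partition.
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=r)
        mean_acc = [
            cross_val_score(SVC(kernel="rbf", gamma=1.0 / (2.0 * s2), C=C),
                            X, y, cv=cv).mean()
            for s2 in candidates
        ]
        picks.append(candidates[int(np.argmax(mean_acc))])
    return float(np.mean(picks))

rng = np.random.default_rng(0)
y = np.array([1] * 20 + [-1] * 20)
X = rng.normal(size=(40, 3))
X[:, 0] += np.where(y == 1, 2.0, -2.0)   # only variable 0 separates the classes
sigma2 = select_sigma2(X, y, candidates=[0.1, 1.0, 10.0], n_repeats=3, n_splits=5)
```

N_FS2 defaults to 10 as in Example 1; N_TS2 is not given a value in the text, so `n_repeats` has no default.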
Specifically, the training the SVM model by the training set data includes:
selecting an objective function, wherein the objective function is:
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j k(x_i,x_j)-\sum_{i=1}^{n}\alpha_i$$
in the objective function, x_i ∈ R^m and x_j ∈ R^m, where x_i is the feature vector of the i-th sample over the m process variables and x_j is the feature vector of the j-th sample over the m process variables;
y_i ∈ {1, −1} and y_j ∈ {1, −1}, where y_i is the class of the i-th sample and y_j is the class of the j-th sample;
k(x_i, x_j) is a Gaussian kernel function, given by:
$$k(x_i,x_j)=\exp\left(-\frac{\lVert x_i-x_j\rVert^2}{2\sigma^2}\right)$$
in the Gaussian kernel function, σ² is the Gaussian kernel function parameter;
x_i and x_j belong to the training feature set, y_i and y_j belong to the training classification set, (x_i, y_i) and (x_j, y_j) belong to the training set data {(x_i, y_i)}, i = 1, …, n, min denotes minimization, α_i and α_j are Lagrange multipliers, and n is the number of samples, a positive integer;
selecting constraint conditions, wherein the constraint conditions are as follows:
$$0\le\alpha_i\le C,\quad i=1,2,3,\dots,n$$
where C in the constraint is a preset trade-off (balance) factor;
selecting a classifier of the SVM model, the classifier f(x_q) being:
$$f(x_q)=w^{T}x_q+b$$
in the classifier, x_q is the data to be classified, b is a constant (the bias term), and w is the vector of weight parameters applied to the features of x_i.
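To make the objective function and Gaussian kernel concrete, the sketch below evaluates the dual objective for a given vector of Lagrange multipliers and checks the box constraint 0 ≤ α_i ≤ C. It assumes the kernel form exp(-‖x_i − x_j‖²/(2σ²)); the function names are hypothetical, and the code only evaluates the objective rather than solving it (a QP solver such as libsvm would do that).

```python
import numpy as np

def gaussian_kernel(Xa, Xb, sigma2):
    """Matrix of k(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2)) over all row pairs."""
    sq = (np.sum(Xa ** 2, axis=1)[:, None]
          + np.sum(Xb ** 2, axis=1)[None, :]
          - 2.0 * Xa @ Xb.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))

def dual_objective(alpha, X, y, sigma2):
    """0.5 * sum_ij alpha_i alpha_j y_i y_j k(xi, xj) - sum_i alpha_i (to be minimised)."""
    ay = alpha * y
    return 0.5 * ay @ gaussian_kernel(X, X, sigma2) @ ay - alpha.sum()

def in_box(alpha, C):
    """True if 0 <= alpha_i <= C holds for every i."""
    return bool(np.all((alpha >= 0.0) & (alpha <= C)))

X = np.array([[0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])
value = dual_objective(alpha, X, y, sigma2=0.5)   # equals -1 - exp(-1)
```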
Specifically, the training the SVM model by the training set data further includes:
determining the mapping from the weight parameters to the contribution score, in which c_p denotes the contribution score of the p-th process variable.
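The mapping formula itself is not reproduced in this text. As an assumption, the sketch below uses the ranking criterion of the well-known SVM-RFE method, c_p = w_p², with w = Σ α_i y_i x_i recovered from a linear-kernel scikit-learn `SVC`; the function name and data are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def contribution_scores(X, y, C=1.0):
    """Return c_p = w_p^2 for every process variable p (assumed SVM-RFE criterion)."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # dual_coef_ holds y_i * alpha_i for the support vectors (sign set by scikit-learn's
    # convention), so this product is w = sum_i alpha_i y_i x_i up to an overall sign,
    # which does not affect w_p^2.
    w = (clf.dual_coef_ @ clf.support_vectors_).ravel()
    return w ** 2

rng = np.random.default_rng(1)
y = np.array([1] * 30 + [-1] * 30)
X = rng.normal(size=(60, 3))
X[:, 0] += np.where(y == 1, 3.0, -3.0)   # variable 0 is informative, 1 and 2 are noise
scores = contribution_scores(X, y)
```

An explicit w exists only for a linear kernel; with the Gaussian kernel the weights live in the implicit feature space, which is why nonlinear variants use a kernel-based criterion instead.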
Specifically, the training the SVM model by the training set data further includes:
performing iterative computation for feature selection using the training feature set and the training classification set corresponding to the training set data,
wherein one round of the iterative computation comprises:
solving the objective function under the constraint conditions with the current training feature set and its corresponding training classification set, training the classifier, and obtaining the weight parameter vector corresponding to the feature vectors;
calculating a contribution score for each process variable from the obtained weight parameter vector, ranking the process variables by contribution score, and using this ranking as the feature ranking of the feature vectors;
according to the feature ranking, removing the feature with the lowest score from the current training feature set, and returning the reduced training feature set to the classifier-training step.
Specifically, one calculation in the iterative calculation further includes:
calculating at least the F1-score among the performance indices used to evaluate the classifier corresponding to the current training feature set;
recording the F1-score value.
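The iterative loop above can be sketched as recursive feature elimination that logs the validation F1-score after each round. This is an illustration under assumptions, not the patent's implementation: it uses a linear-kernel SVC so that w is explicit, takes c_p = w_p² as the contribution score, and treats label 1 as the positive class; names and data are hypothetical.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.svm import SVC

def svm_rfe_with_f1(X_tr, y_tr, X_val, y_val, C=1.0, pos_label=1):
    """Backward elimination: drop the lowest-scoring feature each round, log F1.

    Returns (ranking, history): ranking lists feature indices from most to least
    important; history holds (feature subset, validation F1) for every round.
    """
    remaining = list(range(X_tr.shape[1]))   # current training feature set
    ranking, history = [], []                # ranking set starts empty
    while remaining:
        clf = SVC(kernel="linear", C=C).fit(X_tr[:, remaining], y_tr)
        f1 = f1_score(y_val, clf.predict(X_val[:, remaining]), pos_label=pos_label)
        history.append((list(remaining), float(f1)))
        scores = clf.coef_.ravel() ** 2            # contribution score c_p = w_p^2 (assumed)
        worst = int(np.argmin(scores))
        ranking.insert(0, remaining.pop(worst))    # removed last => ranked first
    return ranking, history

rng = np.random.default_rng(2)
y = np.array([1] * 40 + [-1] * 40)
X = rng.normal(size=(80, 4))
X[:, 2] += np.where(y == 1, 2.5, -2.5)     # only variable 2 is informative
X_tr, y_tr, X_val, y_val = X[::2], y[::2], X[1::2], y[1::2]
ranking, history = svm_rfe_with_f1(X_tr, y_tr, X_val, y_val)
```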
Specifically, after the SVM model is trained by the training set data and before the obtaining of the optimal feature set data and the optimal SVM model, the method further includes:
obtaining candidate optimal feature set data and a candidate optimal SVM model according to the relative magnitudes of the recorded F1-score values, wherein the candidate optimal SVM model has the classifier corresponding to the candidate optimal feature set data;
and calculating the fault detection rate FDR, the false alarm rate FAR and the accuracy ACC of the candidate optimal SVM model from the confusion matrix of true versus predicted classes, and using FDR, FAR and ACC as evaluation factors for the optimal feature set data and the optimal SVM model.
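Choosing the candidate optimal feature set from the recorded values reduces to an arg-max over the F1 history. The sketch assumes the history is a list of (feature subset, F1) pairs; the tie-breaking rule in favour of the smaller subset is a hypothetical choice that the text does not specify.

```python
def pick_optimal(history):
    """Pick the (feature subset, F1) entry with the highest recorded F1-score.

    Ties are broken in favour of the smaller subset (a hypothetical choice).
    """
    return max(history, key=lambda entry: (entry[1], -len(entry[0])))

history = [([0, 1, 2, 3], 0.91), ([0, 2, 3], 0.95), ([2, 3], 0.95), ([2], 0.88)]
best_subset, best_f1 = pick_optimal(history)   # -> [2, 3], 0.95
```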
The embodiment of the invention provides an online diagnosis method for diagnosing and identifying faults in a petrochemical process, which comprises the following steps:
acquiring online collected data of the chemical process that has the same data structure as the training set data;
building a test set from the online collected data according to the optimal feature set data;
and identifying the class of the test set with the optimal SVM model to determine whether the chemical process is normal or faulty.
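A minimal sketch of the online step, assuming the offline stage has stored the training mean and standard deviation, the column indices of the optimal feature set, and a fitted scikit-learn classifier. The helper `diagnose_online`, the toy data and the choice of label −1/1 are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def diagnose_online(X_online, mu, sigma, optimal_features, model):
    """Classify online samples as normal or fault using the offline artefacts.

    mu and sigma must be the training-set statistics (not recomputed online),
    and optimal_features the column indices of the optimal feature set.
    """
    Z = (np.asarray(X_online, dtype=float) - mu) / sigma   # same z-score as training
    return model.predict(Z[:, optimal_features])

# Hypothetical offline artefacts: 2 process variables, only variable 0 retained.
X_train = np.array([[-3.0, 0.1], [-2.5, -0.2], [-2.0, 0.3],
                    [2.0, -0.1], [2.5, 0.2], [3.0, 0.0]])
y_train = np.array([-1, -1, -1, 1, 1, 1])
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
optimal_features = [0]
model = SVC(kernel="rbf", gamma=0.5).fit(((X_train - mu) / sigma)[:, optimal_features], y_train)
pred = diagnose_online([[-2.8, 0.0], [2.7, 0.0]], mu, sigma, optimal_features, model)
```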
The embodiment of the invention provides a system for diagnosing and identifying faults in a petrochemical process, which comprises:
the off-line training module is used for calculating kernel function parameters of the SVM model through training set data;
training the SVM model with the training set data, wherein,
the weight parameters of the SVM model are used to compute contribution scores of the process variables to the class of the training set data,
the contribution scores are used to rank the process variables during training,
the contribution score ranking is used for feature selection in the training process;
after training is completed, obtaining optimal feature set data and an optimal SVM model;
the system further comprises:
the online diagnosis module is used for acquiring online collected data of the chemical process that has the same data structure as the training set data;
building a test set from the online collected data of the chemical process according to the optimal feature set data;
and identifying the class of the test set with the optimal SVM model to determine whether the chemical process is normal or faulty.
In another aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the foregoing method by executing the instructions stored by the memory.
In yet another aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions, which, when executed on a computer, cause the computer to perform the foregoing method.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention and not to limit the embodiments of the invention. In the drawings:
FIG. 1 is a schematic diagram of a petrochemical process fault diagnosis and identification process based on SVM nonlinear feature selection according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of the Tennessee Eastman process according to an embodiment of the present invention;
FIG. 3 is a schematic diagram comparing SVM fault diagnosis and identification results on fault data and on normal data according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
Example 1
Referring to FIG. 1, an embodiment of the present invention provides a method for obtaining a model for diagnosing and identifying petrochemical process faults, including:
calculating kernel function parameters of the SVM model from training set data; training the SVM model with the training set data, wherein the weight parameters of the SVM model are used to compute contribution scores of the process variables (i.e., the features of the SVM model) to the class of the training set data, the contribution scores are used to rank the process variables during training, and this ranking is used for feature selection during training; and after training is finished, obtaining the optimal feature set data and the optimal SVM model.
In some implementations, the training set data may be recorded and collected by sensor acquisition systems, control systems and the like involved in the chemical process. The training set data may be preprocessed as follows: performing data normalization on the training set data to be processed;
performing data balancing on the normalized training set data, wherein
the data balancing comprises performing N_TS1 repetitions of N_FS1-fold cross-validation on the normalized training set data and using the training-validation set data obtained from the cross-validation as the training set data, where N_TS1 and N_FS1 are positive integers denoting the number of cross-validation repetitions and the number of data subsets (folds), respectively; N_TS1 may be taken as 100 and N_FS1 as 5. This process may be referred to as normalizing the training set data and forming a balanced data set.
Specifically, the training set data are normalized to a standard normal data set with a mean of 0 and a variance of 1 using zero-mean normalization (z-score, also known as the standard score, which divides the difference between a value and the mean by the standard deviation). The result indicates how far each raw value deviates from the mean, measured in units of the standard deviation. When the data distribution is too scattered to determine the maximum and minimum values, or when the data contain many outliers, z-score normalization can be used; its mathematical expression is shown in formula (1):
$$y = \frac{x - \mu}{\sigma_1} \qquad (1)$$
in the calculation formula, x is the training set data to be processed, μ is the mean of the training set data to be processed, σ₁ is the standard deviation of the training set data to be processed, and y is the training set data after normalization.
Specifically, to obtain a balanced training-validation data set, 100 repetitions of 5-fold cross-validation are used, because an unbalanced data set may lead to insufficient model learning and therefore a less accurate model. In the algorithm, the number of iterations can be set to 100 to keep the parameters as close to optimal as possible.
The training process can be divided into three parts: determining the Gaussian kernel function parameter, calculating the score ranking of all features, and determining the optimal feature set and the optimal SVM model.
Specifically, the Gaussian kernel function parameter can be determined by 10-fold cross-validation (N_FS2 = 10 in this case). Cross-validation first divides the training data equally into X subsets; in each round, one subset is used as the test set and the remaining X−1 subsets as the training set. The training process is repeated N_TS2 times, the resulting X usable kernel parameter values are generated, and their mean is taken as the kernel function parameter σ². X is at least 3, and 5 and 10 are common values.
Specifically, the contribution score ranking of all variables with respect to the data set class can be computed by backward feature selection based on the SVM feature ranking. The feature selection is performed by iterative computation: all variables of the data set are scored and ranked, the lowest-ranked feature is removed, and the computation is repeated until the feature set is empty. In each iteration, the ranking of the features depends on the weight coefficients, which are given by equation (2).
The ranking score (i.e., the contribution score) of the p-th feature in each feature set is calculated by equation (3).
In the binary classification problem, consider training set data with n samples {(x_i, y_i)}, i = 1, …, n, where x_i ∈ R^m and x_j ∈ R^m are the feature vectors of the i-th and j-th samples over the m process variables; x_i and x_j belong to the training feature set, and y_i and y_j belong to the training classification set, with y_i ∈ {1, −1} and y_j ∈ {1, −1} (one of 1 and −1 denotes normal and the other denotes fault); y_i and y_j are the class labels of the i-th and j-th samples. Taking y_i = 1 as the positive class, the classifier of the SVM model can be expressed as formula (4):
$$f(x_q)=w^{T}x_q+b \qquad (4)$$
In the classifier, x_q is the data to be classified, b is a constant (the bias term), and w = [w_1, w_2, …, w_m]^T is the weight parameter vector. Using a Gaussian kernel function, the SVM-based binary classification problem can be converted into formula (5):
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j k(x_i,x_j)-\sum_{i=1}^{n}\alpha_i,\qquad 0\le\alpha_i\le C,\ i=1,2,3,\dots,n \qquad (5)$$
In the objective function and constraint above, α_i and α_j are Lagrange multipliers, and C is a preset trade-off factor balancing training accuracy against model complexity. The Gaussian kernel function k(x_i, x_j) can be written as formula (6):
$$k(x_i,x_j)=\exp\left(-\frac{\lVert x_i-x_j\rVert^2}{2\sigma^2}\right) \qquad (6)$$
where σ² is the Gaussian kernel function parameter.
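With a Gaussian kernel, w has no explicit m-dimensional form, so a per-variable score needs a kernel-based criterion. As one possible nonlinear variant (an assumption, not necessarily the patent's equation (3)), the sketch below computes the nonlinear SVM-RFE criterion DJ(p) = ½ αᵀHα − ½ αᵀH(−p)α, where H_ij = y_i y_j k(x_i, x_j) and H(−p) is recomputed with variable p removed while α is held fixed. Function names and data are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def rbf(Xa, Xb, sigma2):
    """exp(-||xi - xj||^2 / (2 sigma^2)) for all row pairs."""
    sq = np.sum(Xa ** 2, 1)[:, None] + np.sum(Xb ** 2, 1)[None, :] - 2.0 * Xa @ Xb.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))

def nonlinear_contribution(X, y, sigma2, C=1.0):
    """DJ(p) = 0.5 * (alpha^T H alpha - alpha^T H(-p) alpha), alpha kept fixed."""
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma2), C=C).fit(X, y)
    sv = clf.support_vectors_
    ay = clf.dual_coef_.ravel()        # y_i * alpha_i; an overall sign cancels in the quadratic form
    full = ay @ rbf(sv, sv, sigma2) @ ay
    scores = np.empty(X.shape[1])
    for p in range(X.shape[1]):
        keep = [q for q in range(X.shape[1]) if q != p]   # drop variable p only
        reduced = ay @ rbf(sv[:, keep], sv[:, keep], sigma2) @ ay
        scores[p] = 0.5 * (full - reduced)
    return scores

rng = np.random.default_rng(3)
y = np.array([1] * 30 + [-1] * 30)
X = rng.normal(size=(60, 2))
X[:, 0] += np.where(y == 1, 2.0, -2.0)   # variable 0 informative, variable 1 noise
scores = nonlinear_contribution(X, y, sigma2=1.0)
```

Removing an informative variable makes support vectors of opposite classes look more alike, which raises the cross-class kernel terms and yields a large positive score for that variable.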
Specifically, iterative computation for feature selection is performed using the training feature set and the training classification set corresponding to the training set data,
wherein one round of the iterative computation comprises:
solving the objective function under the constraint conditions with the current training feature set and its corresponding training classification set, training the classifier, and obtaining the weight parameter vector corresponding to the feature vectors;
calculating a contribution score for each process variable from the obtained weight parameter vector, ranking the process variables by contribution score, and using this ranking as the feature ranking of the feature vectors;
according to the feature ranking, removing the feature with the lowest score from the current training feature set, and returning the reduced training feature set to the classifier-training step.
In some cases, the iterative computation may begin with an initialization step in which the current training feature set is initialized to the original feature set and an empty ranking set is created; after each ranking by contribution score, the features are added to the ranking set in order, thereby completing feature ranking and removal.
Specifically, to determine the optimal feature set, models are established sequentially according to the variable ranking obtained from formula (3), and the F1-score (also called F-score or F-measure, a measure of test accuracy in binary classification), the fault detection rate (FDR), the false alarm rate (FAR) and the accuracy (ACC) of each model are calculated; the features of the model with the largest F1-score are selected to form the optimal feature set.
Specifically, the SVM model corresponding to the optimal feature set is the optimal fault diagnosis and identification model. During the iterative feature selection, the optimal feature set and the optimal model are determined from the performance indices of the classifier corresponding to each feature set. The F1-score, which evaluates the performance of binary classifiers, can be selected as the current index; it assesses model accuracy by combining Precision and Recall, as expressed in formula (7):
$$F1=\frac{2\times\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}} \qquad (7)$$
The quantities in formula (7) are determined from the confusion matrix shown in Table 1.
TABLE 1 confusion matrix
Here, TP denotes correctly classified normal data, FP denotes incorrectly classified normal data, TN denotes correctly classified fault data, and FN denotes incorrectly classified fault data. Precision and Recall are calculated as shown in formula (8):
$$\mathrm{Precision}=\frac{TP}{TP+FP},\qquad \mathrm{Recall}=\frac{TP}{TP+FN} \qquad (8)$$
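Formulas (7) and (8) translate directly into code. The small sketch below takes the confusion-matrix counts as plain integers and returns 0.0 where a denominator would be zero (that convention is an assumption); the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = 2PR/(P+R)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=6)   # 0.75, 0.5, 0.6
```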
To demonstrate the fault diagnosis effect of the proposed optimal model, the fault detection rate FDR, the false alarm rate FAR and the accuracy ACC of the model can be used as evaluation factors, with the corresponding mathematical expressions given in formula (9).
In some cases, the thresholds of these evaluation factors may be selected according to the desired sensitivity of fault diagnosis.
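Formula (9) is not reproduced in this text, so the sketch below uses the definitions common in process monitoring, stated here as assumptions: FDR is the fraction of true fault samples classified as fault, FAR is the fraction of true normal samples classified as fault, and ACC is the fraction of all samples classified correctly. Working from label arrays avoids depending on which class is called "positive"; `fault_label=-1` is a hypothetical choice.

```python
import numpy as np

def fault_metrics(y_true, y_pred, fault_label=-1):
    """Return (FDR, FAR, ACC) computed from true and predicted class labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    is_fault = y_true == fault_label
    flagged = y_pred == fault_label
    fdr = float(np.mean(flagged[is_fault])) if is_fault.any() else float("nan")
    far = float(np.mean(flagged[~is_fault])) if (~is_fault).any() else float("nan")
    acc = float(np.mean(y_true == y_pred))
    return fdr, far, acc

y_true = [1, 1, 1, 1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, 1]
fdr, far, acc = fault_metrics(y_true, y_pred)   # 0.75, 0.25, 0.75
```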
The embodiment of the invention also provides an online diagnosis method for diagnosing and identifying the fault in the petrochemical process, which comprises the following steps:
acquiring online collected data of the chemical process that has the same data structure as the training set data;
building a test set from the online collected data according to the optimal feature set data;
and identifying the class of the test set with the optimal SVM model to determine whether the chemical process is normal or faulty.
Specifically, fault diagnosis and identification can be performed on the collected online data: process data of the same type as the offline training set are collected online and used to verify the SVM fault diagnosis and identification model established in the offline stage, and the collected data set is preprocessed with the same z-score normalization method as the training set.
Specifically, the feature set corresponding to the test set may be determined according to the optimal feature set at the offline stage.
Specifically, the optimal SVM model can be used to diagnose and identify the feature set data determined online, and the classification performance index F1-score, the fault detection rate FDR, the false alarm rate FAR and the accuracy ACC of the model are calculated to evaluate the fault diagnosis and identification effect.
Example 2
This embodiment belongs to the same inventive concept as Example 1 and provides a system for diagnosing and identifying petrochemical process faults, the system comprising:
the off-line training module is used for calculating kernel function parameters of the SVM model through training set data;
training the SVM model with the training set data, wherein,
the weight parameters of the SVM model are used to compute contribution scores of the process variables to the class of the training set data,
the contribution scores are used to rank the process variables during training,
the contribution score ranking is used for feature selection in the training process;
after training is completed, obtaining optimal feature set data and an optimal SVM model;
the system further comprises:
the online diagnosis module is used for acquiring online collected data of the chemical process that has the same data structure as the training set data;
building a test set from the online collected data of the chemical process according to the optimal feature set data;
and identifying the class of the test set with the optimal SVM model to determine whether the chemical process is normal or faulty.
For the offline training module, wherein the training set data is obtained by pre-processing, the pre-processing comprising:
performing data normalization on the data of the training set to be processed;
performing data balancing on the normalized training set data, wherein
the data balancing comprises performing N_TS1 repetitions of N_FS1-fold cross-validation on the normalized training set data and using the training-validation set data obtained from the cross-validation as the training set data, where N_TS1 and N_FS1 are positive integers denoting the number of cross-validation repetitions and the number of data subsets (folds), respectively.
In the offline training module, the data normalization of the training set data to be processed is calculated as:
$$y = \frac{x - \mu}{\sigma_1}$$
where $x$ is the training set data to be processed, $\mu$ is its mean, $\sigma_1$ is its standard deviation, and $y$ is the normalized training set data.
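A minimal NumPy sketch of the normalization formula is shown below. It assumes the population standard deviation (`ddof=0`) and adds a guard for constant sensor channels. Both are illustrative choices. The statistics $\mu$ and $\sigma_1$ are fitted on the training data once and then reused for any data normalized later.

```python
import numpy as np


def zscore_fit(x_train):
    """Return per-variable mean (mu) and standard deviation (sigma_1)."""
    x_train = np.asarray(x_train, dtype=float)
    mu = x_train.mean(axis=0)
    sigma1 = x_train.std(axis=0)  # population std (ddof=0), an assumption
    sigma1[sigma1 == 0] = 1.0     # guard: a constant channel would divide by zero
    return mu, sigma1


def zscore_apply(x, mu, sigma1):
    """y = (x - mu) / sigma_1, applied column by column."""
    return (np.asarray(x, dtype=float) - mu) / sigma1


# Two sensors whose ranges differ by a factor of 100.
x = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
mu, sigma1 = zscore_fit(x)
y = zscore_apply(x, mu, sigma1)
```

After normalization, both columns have zero mean and unit variance, and they become identical even though their raw ranges differ by a factor of 100.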
The offline training module is specifically configured to calculate the Gaussian kernel function parameter of the SVM model by performing $N_{TS2}$ repetitions of $N_{FS2}$-fold cross-validation on the training set data. The mean of the $N_{TS2}$ kernel parameter values obtained from the cross-validation is used as the Gaussian kernel function parameter of the SVM model. Here $N_{TS2}$ and $N_{FS2}$ are positive integers giving the number of cross-validation repetitions and the number of data groups (folds), respectively.
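The kernel parameter search can be sketched as follows. The patent does not say how each cross-validation run produces a kernel parameter value, so this sketch assumes a grid search. In each repetition, the candidate $\sigma^2$ with the best mean validation accuracy is kept, and the $N_{TS2}$ winners are then averaged. The sketch also assumes the kernel $\exp(-\lVert x_i-x_j\rVert^2/(2\sigma^2))$, which corresponds to scikit-learn's `gamma = 1/(2*sigma2)`. The helper name `select_sigma2` and the candidate grid are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def select_sigma2(x, y, candidates, n_ts2=100, n_fs2=10, C=1.0):
    """Mean of the best sigma^2 found in each of n_ts2 repetitions of n_fs2-fold CV."""
    best = []
    for rep in range(n_ts2):
        cv = StratifiedKFold(n_splits=n_fs2, shuffle=True, random_state=rep)
        scores = [
            cross_val_score(SVC(kernel="rbf", C=C, gamma=1.0 / (2.0 * s2)), x, y, cv=cv).mean()
            for s2 in candidates
        ]
        best.append(candidates[int(np.argmax(scores))])  # winner of this repetition
    return float(np.mean(best))


# Toy data: a ring-shaped boundary that a Gaussian kernel can separate.
rng = np.random.default_rng(1)
x = rng.normal(size=(60, 2))
y = np.where(np.linalg.norm(x, axis=1) > 1.2, -1, 1)
sigma2 = select_sigma2(x, y, candidates=[0.1, 1.0, 10.0], n_ts2=5, n_fs2=3)
```

With the values from Example 3, the call would use `n_ts2=100, n_fs2=10`. The toy call uses fewer repetitions so that it runs quickly.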
The offline training module is specifically configured to select an objective function, the objective function being:
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j k(x_i,x_j)-\sum_{i=1}^{n}\alpha_i$$
where $x_i\in\mathbb{R}^m$ and $x_j\in\mathbb{R}^m$ are the feature vectors of the $i$-th and $j$-th samples over the $m$ process variables;
$y_i\in\{1,-1\}$ and $y_j\in\{1,-1\}$ are the classes of the $i$-th and $j$-th samples;
$k(x_i,x_j)$ is a Gaussian kernel function with kernel parameter $\sigma^2$;
$x_i$ and $x_j$ belong to the training feature set, $y_i$ and $y_j$ belong to the training classification set, and $(x_i,y_i)$ and $(x_j,y_j)$ belong to the training set data; $\min$ denotes minimization, $\alpha_i$ and $\alpha_j$ are Lagrange multipliers, and $n$ is the number of samples, a positive integer;
to select a constraint condition, the constraint condition being:
$$0\le\alpha_i\le C,\quad i=1,2,3,\ldots,n$$
where $C$ is a preset balance (penalty) factor;
and to select a classifier of the SVM model, the classifier $f(x_q)$ being:
$$f(x_q)=w^{T}x_q+b$$
where $x_q$ is the data to be classified, $b$ is a constant (bias) term, and $w$ is the vector of weight parameters associated with the features $x_i$.
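For a Gaussian kernel, $w$ lives in the kernel's feature space. In practice, $w^{T}x_q + b$ is therefore evaluated through the dual solution as $\sum_i \alpha_i y_i k(x_i, x_q) + b$, summed over the support vectors. The sketch below checks this formula against scikit-learn's `SVC`, whose `dual_coef_` attribute stores $\alpha_i y_i$ and whose `intercept_` stores $b$. The kernel scaling $2\sigma^2$ and the values of $\sigma^2$ and $C$ are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import SVC


def gaussian_kernel(a, b, sigma2):
    """Matrix of k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * sigma2)) (assumed scaling)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma2))


rng = np.random.default_rng(2)
x = rng.normal(size=(80, 3))
y = np.where(x[:, 0] + x[:, 1] > 0, 1, -1)
sigma2, C = 2.0, 1.0  # illustrative values only

clf = SVC(kernel="rbf", C=C, gamma=1.0 / (2.0 * sigma2)).fit(x, y)
alpha_y = clf.dual_coef_.ravel()                    # alpha_i * y_i per support vector
in_box = bool(np.all(np.abs(alpha_y) <= C + 1e-9))  # box constraint 0 <= alpha_i <= C

# f(x_q) = sum_i alpha_i y_i k(x_i, x_q) + b over the support vectors
x_q = rng.normal(size=(5, 3))
f_manual = gaussian_kernel(x_q, clf.support_vectors_, sigma2) @ alpha_y + clf.intercept_[0]
```

`f_manual` should match `clf.decision_function(x_q)`. Its sign gives the predicted class.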
The offline training module is specifically configured to determine a mapping that yields the contribution score $c_p$ of each process variable from the weight parameters.
The offline training module is specifically configured to perform an iterative calculation for feature selection, using the training feature set and the training classification set corresponding to the training set data,
wherein one iteration of the calculation comprises:
solving the objective function under the constraint condition with the current training feature set and its corresponding training classification set, thereby training the classifier and obtaining the weight parameter vector corresponding to each feature vector;
calculating the contribution score of each process variable from the obtained weight parameter vector, forming a contribution score ranking of the process variables, and using this ranking as the feature ranking of the feature vectors;
and, according to the feature ranking, removing the lowest-scoring feature from the current training feature set and returning the reduced training feature set to the classifier training step.
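The iteration above resembles SVM recursive feature elimination (SVM-RFE). This sketch assumes the contribution score is $c_p = w_p^2$ taken from a linear-kernel SVM, which is the usual SVM-RFE choice. The patent's own mapping may differ, and for the Gaussian-kernel model $w$ is not explicit. The sketch also records the validation F1-score at every iteration, with faults labeled 1 and normal samples -1 (an assumed convention). `svm_rfe` is a hypothetical name.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.svm import SVC


def svm_rfe(x_tr, y_tr, x_val, y_val, C=1.0):
    """Backward feature elimination driven by contribution scores.

    Assumed score: c_p = w_p**2 from a linear-kernel SVM (SVM-RFE).
    Returns one record per iteration: remaining features, scores, F1.
    """
    remaining = list(range(x_tr.shape[1]))
    history = []
    while remaining:
        clf = SVC(kernel="linear", C=C).fit(x_tr[:, remaining], y_tr)
        scores = clf.coef_.ravel() ** 2  # contribution score per remaining variable
        f1 = f1_score(y_val, clf.predict(x_val[:, remaining]), pos_label=1)
        history.append({"features": list(remaining), "scores": scores, "f1": f1})
        remaining.pop(int(np.argmin(scores)))  # drop the lowest-ranked variable
    return history


# Toy data: only variable index 1 carries the fault signature.
rng = np.random.default_rng(3)
x = rng.normal(size=(200, 4))
y = np.where(x[:, 1] > 0, 1, -1)
history = svm_rfe(x[:150], y[:150], x[150:], y[150:])
```

The informative variable receives by far the largest weight, so it survives until the final iteration.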
The offline training module is specifically configured to calculate at least the F1-score among the performance indices used to evaluate the classifier corresponding to the current training feature set,
and to record the F1-score value.
The offline training module is further configured to obtain candidate (quasi-determined) optimal feature set data and a candidate optimal SVM model by comparing the recorded F1-score values, the candidate optimal SVM model having the classifier corresponding to the candidate optimal feature set data;
and to calculate the fault detection rate FDR, the false alarm rate FAR and the accuracy ACC of the candidate optimal SVM model from the confusion matrix of true versus predicted classes. These three quantities serve as evaluation factors for the optimal feature set data and the optimal SVM model.
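Selecting the candidate optimum and computing FDR, FAR and ACC can be sketched as below, treating the fault class as positive. The sketch assumes the definitions FDR = TP/(TP+FN), FAR = FP/(FP+TN) and ACC = (TP+TN)/N, which are the usual ones in process-monitoring work. The tie-break toward fewer features in `pick_candidate` is also a hypothetical choice. `history` is expected to be a list of per-iteration records with `features` and `f1` keys.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def fault_metrics(y_true, y_pred, fault=1, normal=-1):
    """FDR, FAR and ACC from the confusion matrix, with the fault class as positive."""
    cm = confusion_matrix(y_true, y_pred, labels=[normal, fault])
    tn, fp, fn, tp = cm.ravel()
    fdr = tp / (tp + fn) if tp + fn else 0.0  # share of true faults detected
    far = fp / (fp + tn) if fp + tn else 0.0  # share of normal samples flagged
    acc = (tp + tn) / cm.sum()
    return fdr, far, acc


def pick_candidate(history):
    """Candidate optimum: highest recorded F1; ties go to the smaller feature set."""
    return max(history, key=lambda h: (h["f1"], -len(h["features"])))


y_true = np.array([1, 1, 1, 1, -1, -1, -1, -1])
y_pred = np.array([1, 1, 1, -1, -1, -1, 1, -1])
fdr, far, acc = fault_metrics(y_true, y_pred)
```

In the toy labels, three of the four faults are detected and one of the four normal samples is flagged. This gives FDR = 0.75, FAR = 0.25 and ACC = 0.75.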
Example 3
This embodiment belongs to the same inventive concept as Examples 1 and 2 and provides a method for diagnosing and identifying faults in the Tennessee Eastman (TE) process. The TE process is representative of modern chemical processes in general, so its process data are used here to verify the method.
1. As shown in Fig. 2 (P: pressure, F: flow, T: temperature, J: power, H: valve opening, L: liquid level, I: indication, C: control), the TE process consists mainly of several operating units, including a reactor, a condenser, a stripper, a gas-liquid separator and a compressor. The TE process has four gaseous reactants: A(g), D(g), E(g) and C(g), abbreviated A, D, E and C. Each of these contains a small amount of the inert gas B(g), abbreviated B. Under the action of a catalyst, four simultaneous reactions take place, mainly in the reactor. The two main reactions yield the liquid products G(liq) (G) and H(liq) (H), and a liquid by-product F is formed at the same time. The four reaction equations are given in Eq. (10):
A(g)+C(g)+D(g)→G(liq)
A(g)+C(g)+E(g)→H(liq)
A(g)+E(g)→F(liq)
3D(g)→2F(liq) (10)
The TE process has 41 measured variables and 12 manipulated variables. Its operating conditions comprise one normal condition and 21 process fault conditions, listed in Table 2. The SVM may target either fault classification or fault-condition classification. All conditions are sampled at 3-minute intervals. Under normal operation, the 960 samples produced during 48 hours of operation are collected as normal data samples. Under each of the 21 fault conditions, the fault is introduced after 8 hours of steady operation. As a result, the first 160 of the 960 collected samples contain no fault and the remaining 800 contain the fault. The 960 samples collected under normal conditions are used as training samples, and all samples containing faults are used as test samples. The fault types are as follows:
- Faults 1-7 are step changes in TE process variables.
- Faults 8-12 are random variations of variables.
- Fault 13 is a slow drift in the reaction kinetics.
- Faults 14-15 are valve sticking faults.
- Faults 16-20 are of unknown type.
- Fault 21 is a valve fixed at a constant position.
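The sample counts above follow from the 3-minute sampling interval. A 48-hour run at 20 samples per hour gives 960 samples. A fault introduced at 8 hours therefore leaves 8 × 20 = 160 fault-free samples followed by 800 faulty ones. The sketch below builds the corresponding per-sample labels for one fault run. The ±1 label convention and the helper name `fault_run_labels` are assumptions.

```python
import numpy as np

SAMPLE_MIN = 3          # sampling interval, minutes
RUN_HOURS = 48          # length of each run
FAULT_ONSET_HOURS = 8   # fault introduced after 8 h of steady operation

N_SAMPLES = RUN_HOURS * 60 // SAMPLE_MIN      # 960 samples per run
ONSET = FAULT_ONSET_HOURS * 60 // SAMPLE_MIN  # 160 fault-free samples


def fault_run_labels(n=N_SAMPLES, onset=ONSET, fault=1, normal=-1):
    """Per-sample labels for one fault run: normal before onset, fault after."""
    labels = np.full(n, normal)
    labels[onset:] = fault
    return labels


labels = fault_run_labels()
```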
Table 2 Tennessee Eastman process faults
2. Data standardization. The raw data come from many sensors, and each group of data has its own units and range of magnitudes. To combine all the data into a single data matrix for subsequent analysis, the data must be normalized. Normalization converts the data into relative values on a common scale, which effectively reduces the disparity between values. The normal-condition and fault 1 data of the TE process form the preprocessing data set, which is then z-score normalized.
3. Obtaining a balanced data set by cross-validation. Five-fold cross-validation divides the training set data evenly into five parts. Each part in turn serves as the training set, and the remaining parts serve as the test set. This balances the data set and avoids the loss of SVM model accuracy that imbalanced raw data would cause.
4. Determining the Gaussian kernel parameter by 100 repetitions of 10-fold cross-validation. In each cross-validation, the training data are first divided evenly into X subsets (here X = 10). Each time, one subset serves as the test set and the other X-1 subsets serve as the training set. Repeating this process X times yields X usable kernel parameter values, and their mean is taken as the kernel function parameter $\sigma^2$.
5. Determining the optimal SVM model and the optimal feature set. Take TE process fault 1 as an example. The optimal nonlinear SVM model achieves an F1-score of 100%, a fault detection rate of 100%, a false alarm rate of 0% and an accuracy ACC of 100%. The corresponding optimal feature set consists of manipulated variable 44.
6. Online diagnosis and identification of fault 1. Based on variable 44, fault 1 can be diagnosed, as shown in Fig. 3. The top-ranked variables for fault 1 are 44, 1, 4 and 18. If the C feed changes, measured variable 18 (stripper temperature) changes as well. To keep the content of component B constant, manipulated variable 44 (A feed) must then change, and this is reflected in measured variables 1 (A feed) and 4 (total feed).
The method is suitable for fault diagnosis in modern complex petrochemical processes. Tennessee Eastman (TE) process faults, which are general and representative, are used to verify it. The training set data are first z-score normalized, which rescales data from different sensors with widely differing value ranges to zero mean and unit variance. The Gaussian kernel parameter of the SVM fault diagnosis and identification model is determined by 10-fold cross-validation, and the contribution score ranking of all variables used for modeling is then calculated. The optimal SVM fault diagnosis and identification model and the optimal feature set are selected using the F1-score, the fault detection rate FDR, the false alarm rate FAR and the model accuracy ACC. Finally, data collected online are classified by the SVM model to diagnose and identify faults. Compared with other data-driven fault diagnosis methods, the method is accurate, objective and efficient. Because it relies only on information contained in the data, the algorithm needs little computation time during diagnosis and identification. This can improve the efficiency of fault diagnosis and identification in modern complex petrochemical processes.
Although the embodiments of the present invention have been described in detail above with reference to the accompanying drawings, they are not limited to the specific details described. Various simple modifications may be made to the technical solutions within the technical concept of the embodiments of the present invention, and all such simple modifications fall within their protection scope.
It should be noted that the various features described in the foregoing embodiments may be combined in any suitable manner without contradiction. In order to avoid unnecessary repetition, the embodiments of the present invention do not describe every possible combination.
Those skilled in the art will understand that all or part of the steps of the methods in the above embodiments may be implemented by a program. The program is stored in a storage medium and includes several instructions that cause a single-chip microcomputer, a chip or a processor to perform all or part of the steps of the methods in the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In addition, the various implementations of the embodiments of the present invention may be combined in any way. As long as such a combination does not depart from the spirit of the embodiments of the present invention, it should likewise be regarded as content disclosed by those embodiments.
Claims (13)
1. A method for acquiring a model for diagnosing and identifying faults in a petrochemical process, characterized by comprising the following steps:
calculating the kernel function parameter of an SVM model from training set data;
training the SVM model with the training set data, wherein
the weight parameters of the SVM model are used to score the contribution of each process variable to the classes of the training set data,
the contribution scores are used to rank the process variables during training,
and the contribution score ranking is used for feature selection during training;
and obtaining the optimal feature set data and the optimal SVM model after training is completed.
2. The method for acquiring a model for diagnosing and identifying faults in a petrochemical process according to claim 1, wherein, in calculating the kernel function parameter of the SVM model from the training set data, the training set data are obtained by preprocessing, the preprocessing comprising:
performing data normalization on the training set data to be processed;
performing data balancing on the normalized training set data, wherein
the data balancing comprises performing $N_{TS1}$ repetitions of $N_{FS1}$-fold cross-validation on the normalized training set data and using the training-validation set data obtained from the cross-validation as the training set data, where $N_{TS1}$ and $N_{FS1}$ are positive integers giving the number of cross-validation repetitions and the number of data groups (folds), respectively.
3. The method according to claim 2, wherein the data normalization of the training set data to be processed is calculated as:
$$y = \frac{x - \mu}{\sigma_1}$$
where $x$ is the training set data to be processed, $\mu$ is its mean, $\sigma_1$ is its standard deviation, and $y$ is the normalized training set data.
4. The method for acquiring a model for diagnosing and identifying faults in a petrochemical process according to claim 1, wherein calculating the kernel function parameter of the SVM model from the training set data comprises:
calculating the Gaussian kernel function parameter of the SVM model by performing $N_{TS2}$ repetitions of $N_{FS2}$-fold cross-validation on the training set data, the mean of the $N_{TS2}$ kernel parameter values obtained from the cross-validation being used as the Gaussian kernel function parameter of the SVM model, where $N_{TS2}$ and $N_{FS2}$ are positive integers giving the number of cross-validation repetitions and the number of data groups (folds), respectively.
5. The method for acquiring a model for diagnosing and identifying faults in a petrochemical process according to claim 1, wherein training the SVM model with the training set data comprises:
selecting an objective function, the objective function being:
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j k(x_i,x_j)-\sum_{i=1}^{n}\alpha_i$$
where $x_i\in\mathbb{R}^m$ and $x_j\in\mathbb{R}^m$ are the feature vectors of the $i$-th and $j$-th samples over the $m$ process variables;
$y_i\in\{1,-1\}$ and $y_j\in\{1,-1\}$ are the classes of the $i$-th and $j$-th samples;
$k(x_i,x_j)$ is a Gaussian kernel function with kernel parameter $\sigma^2$;
$x_i$ and $x_j$ belong to the training feature set, $y_i$ and $y_j$ belong to the training classification set, and $(x_i,y_i)$ and $(x_j,y_j)$ belong to the training set data; $\min$ denotes minimization, $\alpha_i$ and $\alpha_j$ are Lagrange multipliers, and $n$ is the number of samples, a positive integer;
selecting a constraint condition, the constraint condition being:
$$0\le\alpha_i\le C,\quad i=1,2,3,\ldots,n$$
where $C$ is a preset balance (penalty) factor;
and selecting a classifier of the SVM model, the classifier $f(x_q)$ being:
$$f(x_q)=w^{T}x_q+b$$
where $x_q$ is the data to be classified, $b$ is a constant (bias) term, and $w$ is the vector of weight parameters associated with the features $x_i$.
6. The method for acquiring a model for diagnosing and identifying faults in a petrochemical process according to claim 5, wherein training the SVM model with the training set data further comprises:
determining a mapping that yields the contribution score $c_p$ of each process variable from the weight parameters.
7. The method for acquiring a model for diagnosing and identifying faults in a petrochemical process according to claim 6, wherein training the SVM model with the training set data further comprises:
performing an iterative calculation for feature selection using the training feature set and the training classification set corresponding to the training set data,
wherein one iteration of the calculation comprises:
solving the objective function under the constraint condition with the current training feature set and its corresponding training classification set, thereby training the classifier and obtaining the weight parameter vector corresponding to each feature vector;
calculating the contribution score of each process variable from the obtained weight parameter vector, forming a contribution score ranking of the process variables, and using this ranking as the feature ranking of the feature vectors;
and, according to the feature ranking, removing the lowest-scoring feature from the current training feature set and returning the reduced training feature set to the classifier training step.
8. The method for acquiring a model for diagnosing and identifying faults in a petrochemical process according to claim 7, wherein one iteration of the calculation further comprises:
calculating at least the F1-score among the performance indices used to evaluate the classifier corresponding to the current training feature set;
and recording the F1-score value.
9. The method for acquiring a model for diagnosing and identifying faults in a petrochemical process according to claim 8, wherein, after training the SVM model with the training set data and before obtaining the optimal feature set data and the optimal SVM model, the method further comprises:
obtaining candidate (quasi-determined) optimal feature set data and a candidate optimal SVM model by comparing the recorded F1-score values, the candidate optimal SVM model having the classifier corresponding to the candidate optimal feature set data;
and calculating the fault detection rate FDR, the false alarm rate FAR and the accuracy ACC of the candidate optimal SVM model from the confusion matrix of true versus predicted classes, and using them as evaluation factors for the optimal feature set data and the optimal SVM model.
10. An online diagnosis method for diagnosing and identifying faults in a petrochemical process, characterized by comprising the following steps:
acquiring online data collected from the chemical process with the same data structure as the training set data;
building a test set from the online data according to the optimal feature set data of any one of claims 1 to 9;
and identifying the class of the test set with the optimal SVM model of any one of claims 1 to 9, thereby determining whether the chemical process is normal or faulty.
11. A system for diagnosing and identifying faults in a petrochemical process, the system comprising:
an offline training module, configured to calculate the kernel function parameter of an SVM model from training set data and to train the SVM model with the training set data, wherein
the weight parameters of the SVM model are used to score the contribution of each process variable to the classes of the training set data,
the contribution scores are used to rank the process variables during training,
and the contribution score ranking is used for feature selection during training;
after training is completed, the optimal feature set data and the optimal SVM model are obtained;
the system further comprises:
an online diagnosis module, configured to acquire online data collected from the chemical process with the same data structure as the training set data;
to build a test set from the online data according to the optimal feature set data;
and to identify the class of the test set with the optimal SVM model, thereby determining whether the chemical process is normal or faulty.
12. An electronic device, comprising:
at least one processor;
a memory coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the at least one processor implementing the method of any one of claims 1 to 10 by executing the instructions stored by the memory.
13. A computer readable storage medium storing computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011530237.2A CN114660931A (en) | 2020-12-22 | 2020-12-22 | Method and system for diagnosing and identifying petrochemical process fault |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011530237.2A CN114660931A (en) | 2020-12-22 | 2020-12-22 | Method and system for diagnosing and identifying petrochemical process fault |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114660931A true CN114660931A (en) | 2022-06-24 |
Family
ID=82024488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011530237.2A Pending CN114660931A (en) | 2020-12-22 | 2020-12-22 | Method and system for diagnosing and identifying petrochemical process fault |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114660931A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116484292A (en) * | 2023-04-25 | 2023-07-25 | 上海船舶运输科学研究所有限公司 | A Method for Predicting Lubricating Oil Pressure Faults of Marine Diesel Engines Through Algorithm Selection |
- 2020-12-22: CN202011530237.2A filed (CN); CN114660931A active, pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20240144043A1 (en) | Prediction model for predicting product quality parameter values | |
Samanta et al. | Artificial neural networks and genetic algorithm for bearing fault detection | |
CN103970092B (en) | Multi-stage fermentation process fault monitoring method based on self-adaption FCM algorithm | |
CN111222549A (en) | Unmanned aerial vehicle fault prediction method based on deep neural network | |
CN111580506A (en) | Industrial process fault diagnosis method based on information fusion | |
CN108062565A (en) | Double pivots-dynamic kernel principal component analysis method for diagnosing faults based on chemical industry TE processes | |
CN105488539B (en) | The predictor method and device of the generation method and device of disaggregated model, power system capacity | |
EP2478423A1 (en) | Supervised fault learning using rule-generated samples for machine condition monitoring | |
EP2930579A2 (en) | State monitoring system, state monitoring method and state monitoring program | |
CN113420061B (en) | Steady state working condition analysis method, optimization debugging method and system of oil refining and chemical production device | |
CN110352389A (en) | Information processing unit and information processing method | |
CN113721000B (en) | Method and system for detecting abnormity of dissolved gas in transformer oil | |
CN113092981A (en) | Wafer data detection method and system, storage medium and test parameter adjustment method | |
CN108334898A (en) | A kind of multi-modal industrial process modal identification and Fault Classification | |
CN112904810A (en) | Process industry nonlinear process monitoring method based on effective feature selection | |
CN110308713A (en) | A Method for Identification of Industrial Process Fault Variables Based on k-Nearest Neighbor Reconstruction | |
CN112434739A (en) | Chemical process fault diagnosis method of support vector machine based on multi-core learning | |
CN115730241A (en) | Construction method of cavitation noise identification model of water turbine | |
EP3712728A1 (en) | Apparatus for predicting equipment damage | |
US12217189B2 (en) | Hyperparameter adjustment device, non-transitory recording medium in which hyperparameter adjustment program is recorded, and hyperparameter adjustment program | |
Wang et al. | An entropy-and attention-based feature extraction and selection network for multi-target coupling scenarios | |
CN114660931A (en) | Method and system for diagnosing and identifying petrochemical process fault | |
CN115496108A (en) | Fault monitoring method and system based on manifold learning and big data analysis | |
CN118643320B (en) | Quality-related minor fault detection method based on dynamic orthogonal subspace | |
CN113253682A (en) | Nonlinear chemical process fault detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||