CN114970686B

CN114970686B - Multi-anomaly-mode-oriented partial information observable equipment fault detection method

Info

Publication number: CN114970686B
Application number: CN202210489809.XA
Authority: CN
Inventors: 段超群; 李逸凡; 李易璋; 王晓雯; 钟宋义; 刘富樯; 刘志杰; 孟献兵; 蒲华燕
Original assignee: SHANGHAI UNIVERSITY
Current assignee: SHANGHAI UNIVERSITY
Priority date: 2022-05-06
Filing date: 2022-05-06
Publication date: 2024-08-06
Anticipated expiration: 2042-05-06
Also published as: CN114970686A

Abstract

The invention discloses a fault detection method for partially observable equipment with multiple abnormal modes. First, from multivariate degradation data collected offline, a vector autoregressive model is fitted to the stationary part of the data and the residuals of the entire data set are calculated. Second, a hidden Markov degradation model that reflects the multiple abnormal modes of the equipment is built from the residual data, and its parameters are estimated with the expectation-maximization algorithm. Third, Bayes' theorem is used to construct the posterior probability that the equipment is in each abnormal mode; when a posterior probability exceeds its control threshold, an alarm is raised and a full inspection is performed. Finally, with the aim of minimizing the operating cost of the equipment, the control thresholds are optimized by an algorithm based on a semi-Markov decision process. The invention can thereby identify the abnormal mode of the equipment from the control thresholds and raise an alarm, which achieves fault detection.

Description

A fault detection method for partially observable equipment with multiple abnormal modes

Technical Field

The present invention belongs to the field of fault detection for partially observable equipment, and in particular relates to a fault detection method for partially observable equipment with multiple abnormal modes.

Background

As modern equipment becomes more diverse in function and more complex in structure, its internal state is increasingly difficult to observe directly. The true health of the equipment can only be assessed from partial or indirect state information obtained by sensors; such equipment is called partially observable equipment. It generally has multiple abnormal modes, and if a potential abnormality is not identified and prevented in time, it can lead to serious failures and irrecoverable production losses.

Fault detection for partially observable equipment usually relies on indirect monitoring information. Existing methods often target one or a few independent abnormal modes and ignore the relationships among multiple abnormal modes, such as degradation-type abnormalities (wear) and leakage-type abnormalities (water or oil leakage) in operating wind turbines. Under different abnormal modes, the degradation process of the same equipment shows markedly different characteristics, which makes fault detection from partial monitoring information more challenging. In addition, existing fault detection methods suffer from a high false detection rate on partially observable equipment, which limits their use in engineering practice. How to effectively detect faults of partially observable equipment under multiple abnormal modes, and thereby improve operational reliability, is therefore a problem in urgent need of a solution.

Summary of the Invention

The present invention provides a fault detection method for partially observable equipment with multiple abnormal modes, which makes full use of the partial observation information of the equipment for early fault detection under multiple abnormal modes. By processing the partial information effectively, health indicators of the equipment under multiple abnormal modes are constructed, possible faults are detected from the Bayesian posterior probability of each abnormal mode, and economical maintenance measures are provided.

The present invention is realized by the following technical solution:

A fault detection method for partially observable equipment with multiple abnormal modes, comprising the following steps:

(1) Data preprocessing for partially observable equipment: collect multivariate degradation data from the equipment, fit a vector autoregressive model to the stationary part of the data history, and calculate the residuals of the entire data set so that the preprocessed data satisfy multivariate normality;

(2) State modeling and parameter optimization for partially observable equipment: from the residual sequence obtained, establish a continuous-time hidden Markov process $(X_t : t \in \mathbb{R}_+)$ in which the healthy state is 0, the abnormal states are $1, \dots, M$, and the fault state is $M+1$. The state transition probability of the equipment is $P_{ij}(t) = P(X_t = j \mid X_0 = i)$, and the transition rates $\lambda_{00}, \dots, \lambda_{M,M+1}$ are estimated with the expectation-maximization algorithm;

(3) Fault detection for multiple abnormal modes: the posterior probabilities that the equipment is in abnormal states $1, \dots, M$ are each established as indicators of the equipment's health. Using Bayes' theorem, the posterior probability of each abnormal state is updated whenever the observation of a new sampling period is obtained. High-risk posterior probabilities are screened out against the control thresholds and an alarm is raised, which achieves fault detection. With the goal of minimizing the long-run expected average cost of the equipment, the optimal control thresholds of the fault detection method are computed by an algorithm under a semi-Markov decision process (SMDP).

Further, step (1) is implemented as follows:

First, the stationary part of the collected multivariate monitoring data is denoted as a sequence of observation vectors. The data in this stationary part are then assumed to follow a stationary vector autoregressive process:

where $p \in \mathbb{N}$ is the order of the model; $\delta_0 \in \mathbb{R}^2$ is the process mean of the stationary part; $\Phi_r \in \mathbb{R}^{2 \times 2}$ are the autocorrelation matrices; and $\varepsilon_n$ is an error term following the normal distribution $N_2(0, \Omega)$. The model parameters are estimated by the least squares method, and substituting the estimates into the model yields the residuals of the monitoring data.
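The preprocessing step can be illustrated in code. The following is a minimal sketch, assuming a first-order model ($p = 1$) written in the intercept form $Y_n = c + \Phi Y_{n-1} + \varepsilon_n$, which is equivalent to a mean-centered form with $c = (I - \Phi)\delta_0$. The function names `solve`, `fit_var1`, and `residuals`, and the demo data, are illustrative and are not taken from the patent.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination; b may have several columns."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[m[i][n + j] / m[i][i] for j in range(len(b[0]))] for i in range(n)]

def fit_var1(stationary):
    """Least-squares fit of Y_n = c + Phi Y_{n-1} + eps_n on the stationary part."""
    X = [[1.0] + list(y) for y in stationary[:-1]]  # regressors [1, Y_{n-1}]
    Y = [list(y) for y in stationary[1:]]           # targets Y_n
    k, d = len(X[0]), len(Y[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(d)] for i in range(k)]
    B = solve(xtx, xty)                             # normal equations; row 0 is the intercept
    c = B[0]
    phi = [[B[1 + j][i] for j in range(d)] for i in range(d)]
    return c, phi

def residuals(series, c, phi):
    """One-step-ahead residuals Y_n - (c + Phi Y_{n-1}) for a whole data history."""
    out = []
    for prev, cur in zip(series[:-1], series[1:]):
        pred = [c[i] + sum(phi[i][j] * prev[j] for j in range(len(prev)))
                for i in range(len(cur))]
        out.append([cur[i] - pred[i] for i in range(len(cur))])
    return out

# Illustrative data only: 20 two-dimensional "stationary" observations.
stationary = [[10 + math.sin(0.7 * n), 5 + math.cos(1.3 * n)] for n in range(20)]
c, phi = fit_var1(stationary)
res = residuals(stationary, c, phi)
```

In the method described here, the model is fitted on the stationary (healthy) part only, while `residuals` is then applied to the complete data history, so that departures from healthy behavior show up in the residuals.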

Further, step (2) is implemented as follows:

The state process of the equipment is modeled as a continuous-time hidden Markov process $(X_t : t \in \mathbb{R}_+)$ with $M+2$ states, where the healthy state is 0, the fault state is $M+1$, and the $M$ intermediate abnormal states represent the $M$ abnormal modes. The state space is defined as $S = \{0, \dots, M+1\}$. The state transition probability of the equipment is $P_{ij}(t) = P(X_t = j \mid X_0 = i)$, and the transition rates $\lambda_{00}, \dots, \lambda_{M,M+1}$ are estimated by the expectation-maximization algorithm. The resulting state transition probability matrix can be expressed as:

During observation, each observation vector is conditionally independent and follows the multivariate normal distribution $N_m(\mu_m, \Sigma_m)$, $m \in \{0, \dots, M\}$, of its corresponding state. The probability density function of the residual under abnormal state $m$ can therefore be expressed as:
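For a two-dimensional residual, the density of $N_m(\mu_m, \Sigma_m)$ is the standard bivariate normal density $f(y \mid m) = (2\pi)^{-1} |\Sigma_m|^{-1/2} \exp\{-\tfrac{1}{2}(y - \mu_m)^{\top} \Sigma_m^{-1} (y - \mu_m)\}$. A minimal sketch of its evaluation follows; the function name `bivariate_normal_pdf` is an assumption made for illustration.

```python
import math

def bivariate_normal_pdf(y, mu, sigma):
    """Density of a 2-D normal distribution with mean mu and 2x2 covariance sigma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c                      # determinant of the covariance matrix
    dx, dy = y[0] - mu[0], y[1] - mu[1]
    # Quadratic form (y - mu)^T sigma^{-1} (y - mu), using the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

# Illustrative call: standard bivariate normal evaluated at its mean.
peak = bivariate_normal_pdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Evaluating this density for every state $m$ at a new residual gives the likelihoods needed by the Bayesian update in step (3).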

Further, step (3) is implemented as follows:

The posterior probability that the equipment is in abnormal mode $m$, where $m \in \{1, \dots, M\}$, is calculated for each mode. Given the observation sequence $Y_1, \dots, Y_{n-1}$, the posterior probability of the equipment at the $(n-1)$-th sampling instant can be calculated as:

According to Bayes' theorem and the state transition probability matrix $P(t) = [P_{ij}(t)]_{i,j \in S}$ obtained in step (2), the posterior probability that the equipment is in abnormal mode $m$ at the $n$-th sampling instant can be updated to:
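One common form of such a recursive update is the hidden Markov filter: predict the state distribution one sampling interval ahead with $P(\Delta)$, weight each operational state by the likelihood of the new residual, and renormalize over the operational states $0, \dots, M$ (the fault state is observable, so the update conditions on the equipment still operating). The sketch below follows that standard form and is an assumption rather than the patent's exact formula; `update_posterior` and its arguments are hypothetical names.

```python
def update_posterior(prev, p_delta, likelihood):
    """One Bayesian filter step over the operational states 0..M.

    prev       -- posterior over states 0..M after observation n-1
    p_delta    -- transition matrix over one sampling interval (may include the fault state)
    likelihood -- density of the new residual under each state 0..M
    """
    n = len(prev)
    # Prediction: probability of being in state j at the next sampling instant.
    predicted = [sum(prev[i] * p_delta[i][j] for i in range(n)) for j in range(n)]
    # Correction: weight by the observation likelihood, then normalize.
    weighted = [likelihood[j] * predicted[j] for j in range(n)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Illustrative numbers only (healthy state plus two abnormal modes).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
post = update_posterior([0.8, 0.1, 0.1], identity, [0.1, 2.0, 0.1])
```

Entries 1 to $M$ of the returned list play the role of the per-mode posterior probabilities that are compared with the control thresholds.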

In addition, the conditional reliability function can be calculated as:

Fault detection is performed by monitoring whether the posterior probability of the equipment being in each abnormal mode exceeds its corresponding control threshold $W^1, \dots, W^M$. If an indicator is below its threshold, the operating risk is low and monitoring continues. As soon as any posterior probability exceeds its threshold, an alarm is raised and the equipment is shut down for inspection. If the fault found matches the identified abnormal mode, the alarm is judged to be true and preventive maintenance restores the equipment to an as-new condition; otherwise the alarm is false, and a minor adjustment restores the equipment to an as-new condition.
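The alarm rule described above can be sketched as follows. The function names and the threshold values in the example are hypothetical; only the decision logic (alarm when a posterior exceeds its threshold, then classify the alarm after inspection) comes from the text.

```python
def alarming_modes(posteriors, thresholds):
    """Return the abnormal modes (numbered from 1) whose posterior exceeds its threshold."""
    return [m for m, (p, w) in enumerate(zip(posteriors, thresholds), start=1) if p > w]

def classify_alarm(alarmed_mode, inspected_mode):
    """Decide the maintenance action after the shutdown inspection.

    inspected_mode is the abnormal mode confirmed by inspection, or None if no fault is found.
    """
    if inspected_mode == alarmed_mode:
        return "true alarm: preventive maintenance"
    return "false alarm: minor adjustment"

# Illustrative values: two abnormal modes with assumed thresholds of 0.2 each.
alarms = alarming_modes([0.2695, 0.01], [0.2, 0.2])
action = classify_alarm(alarms[0], inspected_mode=1)
```

An empty list from `alarming_modes` corresponds to the low-risk case in which monitoring simply continues to the next sampling instant.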

Further, after the posterior probability of each abnormal mode has been calculated, the optimal control threshold for each of these posterior probabilities must be determined. Following the cost-minimization criterion, the control thresholds are optimized by a policy iteration algorithm under the SMDP framework.

The fault detection scheme proposed in the present invention aims to minimize the long-run expected average cost per unit time. Let TC denote the total operation and maintenance cost of the equipment over one cycle, CL the length of one cycle, and $W^1, \dots, W^M \in (0, 1)$ the control thresholds. By renewal theory, the problem is equivalent to finding the optimal control thresholds such that:
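By the renewal-reward theorem, the long-run average cost per unit time equals the ratio of the expected cost of one cycle to the expected length of one cycle. Under that standard result, the objective can be written in the following form; the starred symbols for the optimal thresholds are notation assumed here for illustration.

```latex
g\left(W^{1*}, \dots, W^{M*}\right)
  = \inf_{W^{1}, \dots, W^{M} \in (0,1)}
    \frac{E\left[TC\right]}{E\left[CL\right]}
```

Both expectations depend on the thresholds, because the thresholds determine when a cycle ends through an alarm rather than through a failure.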

This problem is solved with a policy iteration algorithm based on the SMDP framework.

Further, establishing the SMDP framework requires defining its state space. First, the interval $[0, 1]$ of the posterior probability is divided into $L$ equal parts. If, at the $n$-th sampling, the posterior probability that the equipment is in abnormal mode $m$ lies in the interval $[(i-1)/L, i/L)$, the SMDP is defined to be in state $(i, m)$. This state set is denoted $L_1 = \{(i, m) \mid i = 1, \dots, W^m L\}$, where the first element is the index of the posterior probability interval and the second element is the number of the abnormal mode the equipment is in. When the posterior probability exceeds its control threshold $W^m$, the equipment is stopped and fully inspected. If the fault found matches abnormal mode $m$, the alarm is true and the SMDP is defined to be in state $I_1$, where the corresponding preventive maintenance restores the equipment to the initial state $(0, m)$. If the fault found does not match abnormal mode $m$, or the equipment has no fault, the SMDP is defined to be in state $I_0$, and a minor adjustment restores the equipment to the as-new state $(0, m)$. If a random failure occurs during operation, the SMDP is defined to be in state $F$, and failure repair restores the equipment to the as-new state $(0, m)$.
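The discretization of the posterior probability into SMDP states can be sketched as below. The function name `smdp_state` is hypothetical, and mapping a posterior of exactly 1 to the last interval is an assumption, since the half-open intervals in the text do not cover that endpoint.

```python
import math

def smdp_state(posterior, mode, L=20):
    """Map a posterior probability in [0, 1] to the SMDP state (i, m).

    The posterior lies in [(i - 1) / L, i / L), so i = floor(posterior * L) + 1.
    """
    i = min(int(math.floor(posterior * L)) + 1, L)  # clamp posterior == 1.0 to interval L
    return (i, mode)

# Illustrative call: a posterior of 0.2695 for abnormal mode 1 with L = 20.
state = smdp_state(0.2695, 1)
```

With $L = 20$, each interval has width 0.05, so a control threshold $W^m$ corresponds to the interval index $W^m L$ at which the operating states of mode $m$ end.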

The initial state set, the inspection state set, and the failure-repair state set are $L_0 = \{(0, m)\}$, $L_2 = \{I_0, I_1\}$, and $L_3 = \{F\}$, respectively, so the state space of the SMDP is $L = L_0 \cup L_1 \cup L_2 \cup L_3$.

Given this state space, the following SMDP elements must be derived before optimization: the transition probabilities, the mean sojourn times, and the mean costs, defined as follows:

$P_{k,r}$: the probability that the equipment, currently in state $k \in L$, is in state $r \in L$ at the next decision epoch;

$T_k$: the expected sojourn time from the current decision epoch to the next, given that the equipment is in state $k \in L$;

$C_k$: the expected cost incurred from the current decision epoch to the next, given that the equipment is in state $k \in L$;

For given control thresholds $W^1, \dots, W^M$, the operating cost $g(W^1, \dots, W^M)$ of the equipment can be obtained by solving the following system of linear equations:

where $(i, m) \in L$;

$u_{(s,m)} = 0$ for an arbitrarily chosen $(s, m) \in L$.
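In the standard average-cost formulation of a semi-Markov decision process, policy evaluation solves $u_k = C_k - g\,T_k + \sum_{r} P_{k,r}\,u_r$ for all states $k$, with $u_s = 0$ fixed for one reference state $s$. The sketch below solves that system for a generic finite state space; it assumes this textbook form rather than the patent's exact equations, and `solve_linear` and `evaluate_policy` are hypothetical names.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def evaluate_policy(P, T, C, ref=0):
    """Return (g, u) solving u_k = C_k - g*T_k + sum_r P[k][r]*u_r with u[ref] = 0."""
    n = len(C)
    others = [k for k in range(n) if k != ref]   # states with unknown relative value
    A, b = [], []
    for k in range(n):
        # Unknown vector is [g, u_k for k != ref]; rearranged: g*T_k + u_k - sum_r P_kr u_r = C_k.
        row = [T[k]] + [(1.0 if r == k else 0.0) - P[k][r] for r in others]
        A.append(row)
        b.append(C[k])
    x = solve_linear(A, b)
    u = [0.0] * n
    for idx, k in enumerate(others):
        u[k] = x[1 + idx]
    return x[0], u

# Illustrative two-state cycle: state 0 -> state 1 -> state 0.
g, u = evaluate_policy(P=[[0.0, 1.0], [1.0, 0.0]], T=[10.0, 1.0], C=[5.0, 100.0])
```

For this two-state cycle the average cost is the total cycle cost divided by the total cycle time, $(5 + 100)/(10 + 1)$, which gives a quick sanity check on the solver. In a threshold search, `evaluate_policy` would be called once per candidate threshold set, with `P`, `T`, and `C` rebuilt for that set.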

The SMDP elements are calculated as follows:

· Transition probabilities:

$P_{F,(0,m)} = 1$

· Expected sojourn times:

$\tau_F = T_F$

where $T_I$, $T_{Pr}$, and $T_F$ are the times spent on full inspection, preventive maintenance, and corrective maintenance, respectively.

· Expected costs:

$C_{(i,m)} = C_S$, $(i, m) \in L_1$

$C_{I_0} = C_I$

$C_{I_1} = C_I + C_{Pr}$

$C_F = C_F$

where $C_S$, $C_I$, $C_{Pr}$, and $C_F$ are the costs of sampling, full inspection, preventive maintenance, and corrective maintenance, respectively.

The beneficial effects of the present invention are as follows:

From the acquired multivariate monitoring data, a degradation model of partially observable equipment under multiple abnormal modes is established, and its parameters are estimated with the expectation-maximization algorithm. The posterior probabilities that the equipment is in each abnormal mode are constructed to reflect its operating risk, and high-risk posterior probabilities are screened out against the control thresholds to raise an alarm, which achieves fault detection. The present invention addresses the fault detection problem for partially observable equipment with multiple abnormal modes, improves the accuracy and efficiency of fault detection, and helps safeguard the performance and safety of the equipment.

Brief Description of the Drawings

FIG. 1 is a flow chart of the present invention.

FIG. 2 shows the real multivariate measurements of pump data history #1.

FIG. 3 shows the real multivariate measurements of pump data history #2.

FIG. 4 shows the residual distributions under the two abnormal modes.

FIG. 5 shows the residuals of the two data histories after data preprocessing.

FIG. 6 shows the fault detection process for abnormal mode 1 (data history #1).

FIG. 7 shows the fault detection process for abnormal mode 2 (data history #1).

FIG. 8 shows the fault detection process for abnormal mode 1 (data history #2).

FIG. 9 shows the fault detection process for abnormal mode 2 (data history #2).

Detailed Description

The embodiments of the present invention are described below through specific examples. Those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification.

Referring to FIG. 1, the fault detection method for partially observable equipment with multiple abnormal modes according to the present invention comprises the following steps:

Step 1, building the offline model: The present invention is validated on actual diagnostic data from a pneumatic diaphragm pump. The pump, which is driven by compressed air and used to pump corrosive and flammable liquids, is a partially observable device whose internal degradation cannot be observed directly. Different abnormal modes of the pump lead to different degradation trends: abnormal valve core wear raises both the operating frequency and the output pressure, whereas an abnormal blockage of the air inlet raises the operating frequency and lowers the output pressure. The true state of the pump can therefore be inferred indirectly by monitoring its output pressure and operating frequency.

The operating frequency and the pressure form the multivariate observation; they are sampled every 10 hours through magnetic sensing of the diaphragm drive shaft and a pressure sensor, respectively. Monitoring revealed two modes in the pump: valve core abnormality (abnormal mode 1) and air inlet abnormality (abnormal mode 2). In total, 27 sets of multivariate degradation data were collected, comprising U = 12 sets of valve core failure data and V = 15 sets of air inlet failure data. FIGS. 2 and 3 each show one data history of the pump. In the first history, the observations are fairly stable from the 1st to the 13th sampling instant and are regarded as the stationary (healthy) part. From the 14th sampling instant onward, the observations become unstable and rise with a positive correlation; they are regarded as observations under abnormal mode 1. In the second history, the observations from the 1st to the 15th sampling instant all belong to the healthy part. From the 16th sampling instant onward, the observations become unstable and the two signals are negatively correlated; they are regarded as observations under abnormal mode 2. The stationary parts of all data histories are selected and assumed to follow a stationary vector autoregressive process:

By the least squares method, the best estimates of the model parameters are obtained as:

and the residuals of the entire data set are calculated as:

The residual distributions under the two abnormal modes are shown in FIG. 4, where '○' marks the bivariate residuals of the healthy part, '*' those under abnormal mode 1, and '△' those under abnormal mode 2.

The residuals calculated for the two data histories of FIGS. 2 and 3 are shown in FIG. 5, where the solid line represents the pressure measurement and the dashed line represents the frequency measurement.

After data preprocessing, the residuals are used to build a hidden Markov model with four states that describes the degradation process of the pump. The healthy state is 0, the fault state is 3, and the two intermediate abnormal states 1 and 2 correspond to abnormal valve core wear and abnormal air inlet blockage, respectively. Only the fault state is observable; the other states are unobservable. The state space is defined as $S = \{0, 1, 2, 3\}$. The state process $(X_t : t \in \mathbb{R}_+)$ is represented as a multivariate continuous-time hidden Markov chain, where $X_t \in S$.

The expectation-maximization algorithm is used to estimate the model parameters. The optimal parameter estimates of the degradation model of the partially observable equipment and the estimation process are shown in Table 1.

Table 1. Hidden Markov model parameters estimated using the EM algorithm


Substituting the estimated state parameters into the Kolmogorov backward differential equations yields the state transition probability matrix:
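For a finite-state, time-homogeneous chain with generator matrix $Q$ built from the transition rates, the Kolmogorov backward equations $P'(t) = Q\,P(t)$ with $P(0) = I$ have the solution $P(t) = e^{Qt}$. The sketch below evaluates that matrix exponential by scaling and squaring with a truncated Taylor series. The rate values and the assumed transition structure (0 to 1, 0 to 2, 0 to 3, 1 to 3, 2 to 3) are purely illustrative and are not the estimates of Table 1.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transition_matrix(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(q * t) by a Taylor series on q*t/2**squarings, then squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]   # a^k / k!
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)                            # undo the scaling
    return result

# Hypothetical generator for states 0 (healthy), 1, 2 (abnormal), 3 (fault, absorbing).
Q = [[-0.05, 0.02, 0.02, 0.01],
     [0.0, -0.10, 0.0, 0.10],
     [0.0, 0.0, -0.08, 0.08],
     [0.0, 0.0, 0.0, 0.0]]
P10 = transition_matrix(Q, 10.0)   # one 10-hour sampling interval
```

Each row of `P10` sums to one, and the last row stays at the absorbing fault state, which are two properties any valid transition matrix of this model should satisfy.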

Substituting the estimated observation parameters into the probability density function of the residuals gives:

Step 2, development and optimization of the fault detection scheme: The posterior probabilities that the equipment is in each of the two abnormal modes are established. Bayes' theorem is then used to update both posterior probabilities after each new observation $Y_n$ is obtained:

The control thresholds of the fault detection scheme are defined as $W^1, W^2 \in (0, 1)$. By renewal theory, minimizing the long-run expected average cost of the equipment is equivalent to finding the optimal control thresholds such that:

To calculate the optimal control thresholds, the SMDP framework must first be defined. The posterior probability is discretized into $L = 20$ finite states. Fixed cost parameters $C_S = 5$, $C_I = 100$, $C_{Pr} = 300$, $C_F = 1500$ and time parameters $T_I = 1$, $T_{Pr} = 1$, $T_F = 8$ are then defined. The following three kinds of SMDP elements are obtained by calculation.

· Transition probabilities:

$P_{F,0} = 1$

· Expected sojourn times:

$\tau_F = T_F$

· Expected costs:

$C_{(i,m)} = C_S$, $(i, m) \in L_1$

$C_F = C_F$

For given control thresholds $W^1, W^2$, the long-run expected average cost $g(W^1, W^2)$ can be obtained by solving the following system of linear equations:

where $(i, m) \in L$;

$u_{(s,m)} = 0$ for an arbitrarily chosen $(s, m) \in L$.

Iterative calculation yields the optimal control thresholds and the corresponding minimum long-run expected average cost.

Step 3, online monitoring: Once the optimal control thresholds of the scheme have been obtained, fault detection can be performed on the pneumatic diaphragm pump. The case study gives the fault detection process for the two data histories of the pump, as shown in FIGS. 6, 7, 8, and 9.

In FIGS. 6 and 7, the equipment starts operating from the as-new state and is monitored at a constant sampling interval. At the first 13 sampling instants, the posterior probabilities of both abnormal modes are below their respective control thresholds. At the 14th sampling instant, the posterior probability of abnormal mode 1 is updated to 0.2695, which is above the set control threshold, and the alarm is triggered; the posterior probability of abnormal mode 2 is still below its threshold. The fault detection result given by the method is therefore that the equipment is in abnormal mode 1, which may lead to a valve core failure. A full shutdown inspection found the valve core severely worn and faulty, so the alarm raised by the proposed method was judged to be true.

In FIGS. 8 and 9, the equipment operates well at the first 15 sampling instants, and the posterior probabilities of both abnormal modes are below their corresponding control thresholds. At the 16th sampling instant, the posterior probability of abnormal mode 2 rises sharply, exceeds its control threshold, and triggers the alarm; the posterior probability of abnormal mode 1 is still below its threshold. The fault detection result given by the method is therefore that the equipment is in abnormal mode 2, which may lead to an air inlet failure. A full shutdown inspection found a large amount of viscous liquid on the filter screen of the pump air inlet, which constitutes a fault, so the alarm raised by the proposed method was judged to be true.

With the technical approach described above, the fault detection method for partially observable equipment with multiple abnormal modes designed in the present invention preprocesses the acquired multivariate degradation data and models the states of the multiple abnormal modes of partially observable equipment. Equipment risk is revealed by monitoring the posterior probability of each abnormal mode, and high-risk posterior probabilities are screened out against the optimized control thresholds to perform fault detection. The method has been verified algorithmically and tested on real multivariate diagnostic data from pneumatic diaphragm pumps. The results show that the method can identify the abnormal mode of the pump from the multivariate degradation data monitored online and can detect potential faults of the pneumatic diaphragm pump. The method supports accurate fault detection and allows appropriate maintenance measures to be taken to prevent failures. It improves the performance and safety of the equipment while reducing operation and maintenance costs.

The above is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A fault detection method for partially observable equipment with multiple abnormal modes, characterized by comprising the following steps:

(1) Data preprocessing for partially observable equipment: fitting the acquired multivariate monitoring data with a vector autoregressive model, and calculating the residuals between the predicted values of the vector autoregressive model and the true values of the data, wherein the residuals obtained satisfy multivariate normality;

(2) State modeling and parameter estimation for partially observable equipment: establishing, from the residual sequence obtained, a continuous-time hidden Markov process $(X_t : t \in \mathbb{R}_+)$ in which the healthy state is 0, the abnormal states are $1, \dots, M$, and the fault state is $M+1$; the state transition probability of the equipment is $P_{ij}(t) = P(X_t = j \mid X_0 = i)$, and the transition rates $\lambda_{00}, \dots, \lambda_{M,M+1}$ are estimated using the expectation-maximization algorithm;

(3) Fault detection for multiple abnormal modes: establishing the posterior probabilities that the equipment is in abnormal states $1, \dots, M$, respectively, as indices reflecting the health of the equipment; using Bayes' theorem, updating the posterior probability of each abnormal state when the observation of a new sampling period is obtained; screening out high-risk posterior probabilities against the control thresholds and raising an alarm so as to achieve fault detection; and solving for the optimal control thresholds of the fault detection method by a semi-Markov decision process algorithm with the goal of minimizing the operation and maintenance cost of the equipment.

2. The fault detection method for partially observable equipment with multiple abnormal modes according to claim 1, wherein step (1) is implemented as follows:

First, the stationary part of the acquired multivariate monitoring data is denoted as a sequence of observation vectors, and this stationary part is fitted with a vector autoregressive process:

wherein $p \in \mathbb{N}$ is the order of the model; $\delta_0 \in \mathbb{R}^2$ is the process mean of the stationary part; $\Phi_r \in \mathbb{R}^{2 \times 2}$ is the autocorrelation matrix; and $\varepsilon_n$ is an error term following the normal distribution $N_2(0, \Omega)$;

Finally, the model parameters are estimated by the least squares method, and the estimated parameters are substituted into the model to calculate the residuals of the monitoring data.

3. The method for detecting a failure of a part of information observable equipment facing a multi-anomaly mode according to claim 1, wherein the implementation process of the step (2) is as follows:

the state process of the device is established as a continuous time hidden markov process (X _t:t∈R₊) with m+2 states, where the healthy state is 0, the faulty state is m+1, there are M abnormal states in between, representing M abnormal modes, and the state space is defined as s= { 0..a., m+1}; the state transition probability of the device is P _ij(t)＝P(X_t＝j|X₀ =i), the transition rate lambda ₀₀,...,λ_MM+1 is estimated by the expectation maximization algorithm, and the calculated state transition probability matrix can be expressed as:

During observation, the observation vectors are conditionally independent and, given the corresponding state, obey the multivariate normal distribution N₂(μ_m, Σ_m), m ∈ {0, ..., M}; the probability density function of the residual in state m can be expressed as:
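For reference, the standard density of a d-dimensional normal distribution with mean μ_m and covariance Σ_m is given below, with d = 2 for bivariate residuals. The symbol f(y | m) is a notation assumed here for illustration.

```latex
f(y \mid m) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_m|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (y - \mu_m)^{\top} \Sigma_m^{-1} (y - \mu_m) \right),
\qquad m \in \{0, \dots, M\}
```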

4. The multi-anomaly-mode-oriented partial information observable equipment fault detection method according to claim 1, wherein step (3) is implemented as follows:

Once the observation sequence Y₁, ..., Y_(n−1) has been acquired, the posterior probability that the device is, at the (n−1)th sampling point, in the state corresponding to the mth abnormal mode can be expressed as follows:

According to Bayes' theorem and the obtained state transition probability matrix P(t) = [P_ij(t)], i, j ∈ S, the posterior probability that the device is, at the nth sampling point, in the state corresponding to abnormal mode m can be updated as follows:

and the conditional reliability function can be calculated as:

fault detection is performed by monitoring whether the posterior probabilities of the device being in the various abnormal modes exceed the corresponding control thresholds W¹, ..., W^M; if every index is below its threshold, the operating risk of the equipment is low and monitoring continues; once any posterior probability is detected to exceed its threshold, an alarm is raised and the equipment is shut down for inspection; if the fault found by inspection is consistent with the identified abnormal mode, the alarm is judged to be a true alarm and preventive maintenance is performed to restore the equipment to an as-good-as-new condition; otherwise, the alarm is a false alarm, and the equipment is restored to an as-good-as-new condition after a minor adjustment.
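The update-and-alarm logic of this claim can be sketched in plain Python as a standard hidden Markov filter step. This is an illustration under stated assumptions rather than the claimed formula: failure is assumed to be directly observable, so the posterior is normalized over the unobservable states 0..M only, and the names `update_posterior` and `alarms` and all numbers are hypothetical.

```python
def update_posterior(prior, P, likelihoods):
    """One Bayes step of a hidden Markov filter.

    prior       -- probabilities of states 0..M at sampling point n-1
    P           -- transition probabilities over one sampling interval,
                   P[i][j] for i, j in 0..M (failure column omitted)
    likelihoods -- density of the new observation under each state 0..M
    """
    states = range(len(prior))
    # Predict the probability of reaching state j without failing, then
    # weight it by how well state j explains the new observation.
    joint = [likelihoods[j] * sum(prior[i] * P[i][j] for i in states)
             for j in states]
    total = sum(joint)
    return [x / total for x in joint]

def alarms(posterior, thresholds):
    """Return the abnormal modes m whose posterior exceeds its threshold W^m."""
    return [m for m, w in thresholds.items() if posterior[m] > w]

# Hypothetical example with M = 2 abnormal modes, starting from a healthy device.
posterior = update_posterior(
    prior=[1.0, 0.0, 0.0],
    P=[[0.90, 0.05, 0.04], [0.0, 0.90, 0.0], [0.0, 0.0, 0.80]],
    likelihoods=[0.1, 0.5, 0.2],
)
triggered = alarms(posterior, thresholds={1: 0.2, 2: 0.3})
```

In this example only mode 1 crosses its threshold, so the inspection that follows the alarm would check for the fault associated with that mode.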

5. The multi-anomaly-mode-oriented partial information observable equipment fault detection method according to claim 4, wherein the objective of the fault detection method is to minimize the operation and maintenance cost per unit time, which, according to renewal theory, is equivalent to finding the optimal control thresholds such that:

where TC and CL respectively represent the total operation and maintenance cost and the length of one cycle of the equipment, and the fault detection method adopts an SMDP-based algorithm to solve for the optimal control thresholds and the minimum long-run expected average cost.
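By the renewal-reward theorem, the long-run average cost per unit time equals the expected cost of one cycle divided by the expected cycle length. A form of the objective consistent with the definitions of TC and CL can therefore be sketched as follows; the starred symbols for the optimal thresholds are assumed notation.

```latex
g\left(W^{1*}, \dots, W^{M*}\right)
= \min_{W^{1}, \dots, W^{M}} \frac{E[TC]}{E[CL]}
```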

6. The multi-anomaly-mode-oriented partial information observable equipment fault detection method according to claim 5, wherein establishing the SMDP requires determining its state space: first, the value interval [0, 1] of the posterior probability is divided into L equal parts; if the posterior probability that the device is in abnormal mode m, obtained at the nth sampling, lies in the interval [(i−1)/L, i/L), the SMDP is defined to be in state (i, m), and the state set is expressed as L₁ = {(i, m) | i = 1, ..., W^m L}, where the first element is the state value of the posterior probability interval and the second element is the abnormal mode number of the equipment; when the posterior probability exceeds the corresponding control threshold W^m, the equipment is stopped and a comprehensive inspection is performed; if the detected fault matches abnormal mode m, the alarm is a true alarm, the SMDP is defined to be in state I₁, and corresponding preventive maintenance is provided to restore the equipment to the initial state (0, m); if the detected fault does not match abnormal mode m or the equipment has no fault, the SMDP is defined to be in state I₀, and a minor adjustment restores the equipment to the as-new state (0, m); if a random failure occurs during operation, the SMDP state is defined as F and corrective maintenance is provided;

the initial state set, the inspection state set, and the corrective maintenance state set are expressed as L₀ = {(0, m)}, L₂ = {I₀, I₁}, and L₃ = {F}, respectively, so the state space of the SMDP is L = L₀ ∪ L₁ ∪ L₂ ∪ L₃.
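The discretization and the resulting state space can be sketched in plain Python as follows. The function names, the dictionary of thresholds, and the string labels "I0", "I1", and "F" are hypothetical, and the sketch assumes one initial state (0, m) per abnormal mode and that each W^m is a multiple of 1/L.

```python
import math

def posterior_to_state(prob, mode, L):
    """Map a posterior in [(i-1)/L, i/L) for abnormal mode `mode` to (i, mode).

    Values lying exactly on an interval boundary may be affected by
    floating-point rounding.
    """
    i = min(math.floor(prob * L) + 1, L)   # prob == 1.0 goes to the last interval
    return (i, mode)

def build_state_space(thresholds, L):
    """Enumerate L0, L1, L2, L3 for thresholds {m: W^m} and L intervals."""
    L0 = [(0, m) for m in thresholds]                 # initial states
    L1 = [(i, m) for m, w in thresholds.items()       # monitoring states
          for i in range(1, round(w * L) + 1)]
    L2 = ["I0", "I1"]                                 # false / true alarm inspection
    L3 = ["F"]                                        # failure
    return L0 + L1 + L2 + L3

# Hypothetical thresholds for two abnormal modes with L = 10 intervals.
space = build_state_space({1: 0.5, 2: 0.3}, L=10)
```

With these values the space contains 2 initial states, 5 + 3 monitoring states, 2 inspection states, and 1 failure state.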

7. The multi-anomaly-mode-oriented partial information observable equipment fault detection method according to claim 6, wherein the SMDP-based algorithm includes three elements, namely the transition probability, the expected residence time, and the expected cost, defined as follows:

P_k,r: the probability that the device, currently in state k ∈ L, transitions to state r ∈ L at the next decision epoch;

τ_k: the expected residence time from the current decision epoch to the next decision epoch, given that the device is in state k ∈ L;

C_k: the expected cost incurred from the current decision epoch to the next decision epoch, given that the device is in state k ∈ L;

for given control thresholds W¹, ..., W^M, the operating cost of the device g(W¹, ..., W^M) can be obtained by solving a system of linear equations:

where (i, m) ∈ L;

u_(s,m) = 0 for an arbitrarily chosen (s, m) ∈ L;

the calculation formula of the transition probability is as follows:

P_F,(0,m) = 1;

The expected residence time is calculated as follows:

τ_F = T_F;

where T_I, T_Pr, and T_F are the times spent on comprehensive inspection, preventive maintenance, and corrective maintenance, respectively;

the expected cost is calculated as follows:

C_(i,m) = C_S, (i, m) ∈ L₁;

C_F = C_F;

where C_S, C_I, C_Pr, and C_F are the costs of sampling, comprehensive inspection, preventive maintenance, and corrective maintenance, respectively.
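Once the three elements are available, the average cost g for fixed thresholds follows from a linear solve. The sketch below assumes the common average-cost form u_k = C_k − g τ_k + Σ_r P_k,r u_r with one relative value fixed at zero, which may differ in detail from the system in the claim. The function name `average_cost` and the two-state example are hypothetical.

```python
import numpy as np

def average_cost(P, tau, C, ref=0):
    """Solve u_k = C_k - g*tau_k + sum_r P[k, r]*u_r with u_ref = 0.

    Returns (g, u), where g is the long-run average cost per unit time
    and u holds the relative values of the states.
    """
    P = np.asarray(P, dtype=float)
    tau = np.asarray(tau, dtype=float)
    C = np.asarray(C, dtype=float)
    A = np.eye(len(C)) - P      # coefficients of the relative values u
    A[:, ref] = tau             # u_ref is zero, so its column carries g instead
    x = np.linalg.solve(A, C)
    g = x[ref]
    u = x.copy()
    u[ref] = 0.0
    return g, u

# Hypothetical two-state cycle: the states alternate, with residence times
# 1 and 2 and a cost of 3 in each state.
g, u = average_cost(P=[[0, 1], [1, 0]], tau=[1, 2], C=[3, 3])
```

In this example one full cycle costs 6 and lasts 3 time units, so g is 2, which agrees with the cost-per-cycle over cycle-length ratio. An outer search over the thresholds W¹, ..., W^M would call such a solve for each candidate and keep the thresholds with the smallest g.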