CN114358215A

CN114358215A - A wellbore fluid detection method based on deep anomaly detection

Info

Publication number: CN114358215A
Application number: CN202210267347.7A
Authority: CN
Inventors: 陈雁; 黄玉楠; 谌施宇; 易雨; 安玉钏; 苗波; 李平; 钟学燕; 钟原
Original assignee: Southwest Petroleum University
Current assignee: Southwest Petroleum University
Priority date: 2022-03-18
Filing date: 2022-03-18
Publication date: 2022-04-15

Abstract

The invention discloses a wellbore liquid accumulation detection method based on deep anomaly detection. The method first obtains high-frequency SCADA production data and reduces its dimensionality; it then fuses the dimension-reduced SCADA data with the A2 data and the geological parameter data; next, it builds and trains a data model on the fused features and calculates the prediction error; finally, it calculates a dynamic threshold from the prediction error and uses that threshold to judge whether liquid has accumulated in the wellbore. Because second-level SCADA data is used as a feature, the model attends not only to fluctuations between days but also to fluctuations within a day, and can therefore capture subtler changes in the data. The experimental results also show that intraday fluctuations better reflect actual production conditions. Reducing the dimensionality of the high-frequency SCADA data before modeling makes the data easier to model and avoids the curse of dimensionality.

Description

A wellbore fluid detection method based on deep anomaly detection

Technical field

The invention relates to the field of gas well development, and in particular to a wellbore liquid accumulation detection method based on deep anomaly detection.

Background

Liquid accumulation (liquid loading) in a gas well is the build-up of liquid in the wellbore that occurs when the gas can no longer carry the liquid out effectively. During production, gas and liquid flow out of the formation together and are produced to the surface through the wellbore. Early in production the gas rate is high, the two phases flow upward as annular flow, and the liquid is carried in two forms: droplets entrained in the gas core and a liquid film attached to the pipe wall. As formation pressure declines, the gas rate falls, the flow of the liquid (droplets or film) reverses, and the liquid can no longer be carried to the surface, so it accumulates. Field well tests show that liquid accumulation sharply increases the wellbore pressure gradient, which steepens the production decline and reduces the ultimate recovery of the well. Accurately predicting when a gas well will start to accumulate liquid, and taking drainage gas recovery measures in time, is therefore of great significance for keeping low-yield gas wells in stable production. The prediction and detection of liquid accumulation, and the gas production process itself, currently face the following problems:

1. The timing of gas production measures relies mainly on experience, and finer guidance on timing is urgently needed. If the running-in is too early, wellbore pressure is too high and operating under pressure carries high risk; if it is too late, the well's liquid-carrying capacity is poor, liquid accumulates easily, and later productivity suffers. Geological and engineering parameters differ greatly between wells, as do production capacities, so a uniform standard is hard to apply across the board.

2. The mechanism of two-phase flow is complex, and the results of different models differ widely, which makes it hard to guide field measures. Wellbore trajectories are complex and two-phase flow is difficult to simulate, so wellbore pressure and temperature distributions are hard to calculate accurately. There are many critical liquid-carrying models; their results vary greatly and they adapt poorly, so they cannot accurately guide the timing of liquid drainage.

3. Gas well liquid accumulation has been studied extensively, but there is no consensus on its mechanism. The values calculated by different prediction models deviate greatly from one another, so drainage and production process design in the field lacks effective guidance. The root cause is that each mechanistic model considers only a single influencing factor and is not compared against the actual production performance of gas wells.

4. SCADA systems collect large volumes of high-frequency production data that are not used effectively. Although industry has carried out some data mining projects, these still rely mainly on processed low-frequency static data, and the high-frequency production data remains underused.

Summary of the invention

In view of the above deficiencies in the prior art, the invention provides a wellbore liquid accumulation detection method based on deep anomaly detection.

To achieve this purpose, the invention adopts the following technical solution:

A wellbore liquid accumulation detection method based on deep anomaly detection, comprising the following steps:

S1. Obtain high-frequency SCADA production data and reduce its dimensionality;

S2. Fuse the dimension-reduced high-frequency SCADA production data with the A2 data and the geological parameter data;

S3. Perform data modeling and training using the features fused in step S2, and calculate the prediction error;

S4. Calculate a dynamic threshold from the prediction error obtained in step S3, and judge from the dynamic threshold whether liquid has accumulated in the wellbore.

The beneficial effect of the above scheme is that, in predicting the normal data trend of the next time period for wellbore liquid accumulation, the invention uses second-level SCADA data as a feature, so that the model attends not only to fluctuations between days but also to fluctuations within a day, and can capture subtler changes in the data. The experimental results also show that intraday fluctuations better reflect actual production conditions. In processing the high-frequency SCADA data, the invention reduces the dimensionality first, which better serves the data modeling and avoids the curse of dimensionality.

Further, the dimensionality reduction of the high-frequency SCADA production data in S1 is performed as follows:

An autoencoder reduces the time-series features of the acquired high-frequency SCADA production data to a specified dimension. The autoencoder comprises multiple LSTM layers; each layer has a different number of hidden units and extracts features of a different dimension from the time-series SCADA data.

The beneficial effect of the above scheme is that the dimensionality reduction is a processing step chosen to suit the characteristics of the SCADA data itself. It allows the SCADA data to be fully used while preventing the model from suffering the curse of dimensionality caused by the high dimensionality of the data.

Further, in S2 feature fusion is performed with a Concat operation: the dimension-reduced result of one day's dynamic SCADA data is concatenated with the A2 data and the static geological parameters to obtain the input feature vector.

The beneficial effect of the above scheme is as follows. Existing wellbore liquid accumulation algorithms are all limited by well type and block. To free the proposed model from these limits, the feature fusion step introduces the A2 data and the static geological parameter data, making the liquid accumulation prediction intelligent.

Further, the data modeling and training in S3 are performed as follows:

S31. Use the fused features obtained in step S2 to construct an LSTM model comprising a forget gate, an input gate and an output gate;

S32. Update the forget gate, input gate and output gate inside the LSTM model using the self-loop weight to obtain the updated LSTM model.

The beneficial effect of the above scheme is as follows. In actual production, data from liquid accumulation periods is scarcer than data from periods without accumulation. Modeling the data and detecting accumulation with a prediction-error-based method therefore keeps the model from being affected by the imbalance between positive and negative samples, that is, between accumulation and non-accumulation data.

Further, in S32 the forget gate is updated using the self-loop weight as follows:

$$f_i^{(t)} = \sigma\Big(b_i^f + \sum_j U_{i,j}^f x_j^{(t)} + \sum_j W_{i,j}^f h_j^{(t-1)}\Big)$$

where $f_i^{(t)}$ is the output of the $i$-th forget gate at time $t$, $b_i^f$ is the $i$-th bias of the forget gate, $U_{i,j}^f$ is the input weight that the $i$-th forget gate applies at time $t$ to the $j$-th input variable $x_j^{(t)}$, $W_{i,j}^f$ is the recurrent weight that the $i$-th forget gate applies at time $t$ to the $j$-th hidden-layer variable $h_j^{(t-1)}$, $h^{(t)}$ is the hidden-layer variable at time $t$ (its previous value $h^{(t-1)}$ enters the formula), and $\sigma$ denotes the sigmoid function.

Further, in S32 the input gate is updated as follows:

$$g_i^{(t)} = \sigma\Big(b_i^g + \sum_j U_{i,j}^g x_j^{(t)} + \sum_j W_{i,j}^g h_j^{(t-1)}\Big)$$

where $b_i^g$ is the $i$-th bias of the input gate, $x^{(t)}$ is the input vector at time $t$, $h^{(t)}$ is the hidden-layer vector at time $t$, $U_{i,j}^g$ is the input weight that the $i$-th input gate applies at time $t$ to the $j$-th input variable $x_j^{(t)}$, and $W_{i,j}^g$ is the recurrent weight that the $i$-th input gate applies at time $t$ to the $j$-th hidden-layer variable $h_j^{(t-1)}$.

Further, in S32 the internal state of the LSTM cell is updated as follows:

$$s_i^{(t)} = f_i^{(t)} s_i^{(t-1)} + g_i^{(t)}\,\sigma\Big(b_i + \sum_j U_{i,j} x_j^{(t)} + \sum_j W_{i,j} h_j^{(t-1)}\Big)$$

where $s_i^{(t)}$ is the internal state of the LSTM model at time $t$, $g_i^{(t)}$ is the input gate unit, $b$ is the bias weight, and $W$ is the recurrent weight of the hidden-layer variable.

Further, the output of the LSTM model is expressed as:

$$h_i^{(t)} = \tanh\big(s_i^{(t)}\big)\, q_i^{(t)}$$

$$q_i^{(t)} = \sigma\Big(b_i^o + \sum_j U_{i,j}^o x_j^{(t)} + \sum_j W_{i,j}^o h_j^{(t-1)}\Big)$$

where $h_i^{(t)}$ is the output of the $i$-th LSTM cell, $q_i^{(t)}$ is the output of the $i$-th output gate at time $t$, $s_i^{(t)}$ is the value of the $i$-th memory cell at time $t$, $b_i^o$ is the $i$-th bias of the output gate, $x^{(t)}$ is the input vector at time $t$, $h^{(t)}$ is the hidden-layer vector at time $t$, $U_{i,j}^o$ is the input weight that the $i$-th output gate applies at time $t$ to the $j$-th input variable $x_j^{(t)}$, and $W_{i,j}^o$ is the recurrent weight that the $i$-th output gate applies at time $t$ to the $j$-th hidden-layer variable $h_j^{(t-1)}$.

The beneficial effect of the above further scheme is that modeling the data with long short-term memory gates effectively extracts the temporal dependencies and thereby achieves time-series prediction.

Further, the prediction error in S3 is calculated as:

$$e^{(t)} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i^{(t)} - \hat{x}_i^{(t)}\right|$$

where $e^{(t)}$ is the average prediction error at time $t$, $n$ is the length of the input and output vectors, $x^{(t)}$ is the input data at time $t$, and $\hat{x}^{(t)}$ is the predicted data at time $t$.

The beneficial effect of the above scheme is as follows. Each day's feature data is a vector, so to simplify the subsequent threshold selection and liquid accumulation prediction, the expectation of the day's prediction error vector is used to represent that day's prediction error value.

Further, the dynamic threshold in S4 is calculated as:

$$\epsilon = \underset{\epsilon \in A}{\operatorname{argmax}}\;\frac{\Delta\mu(e)/\mu(e) + \Delta\sigma(e)/\sigma(e)}{|e_a| + |E_{seq}|^2}, \qquad A = \{\mu(e) + z\,\sigma(e)\}$$

where $A$ is the dynamic threshold vector; the $\operatorname{argmax}$ function selects from the dynamic threshold vector the dynamic threshold $\epsilon$ that maximizes the expression on the right-hand side; $\mu(e)$ is the mean of the prediction error vector; $\sigma(e)$ is the standard deviation of the prediction error vector; $z$ is the weight coefficient; $\Delta\mu(e)$ is the difference between the mean of the data and the mean after outliers are removed; $e = [e^{(t-h)}, \dots, e^{(t)}]$ is the vector of prediction error values, in which $e^{(t)}$ is the error value at time $t$ and $h$ is the size of the prediction error window; $\Delta\sigma(e)$ is the difference between the standard deviation of the data and the standard deviation after outliers are removed; $e_a$ is the set of error values $e^{(k)}$ satisfying $e^{(k)} \ge \epsilon$; and $E_{seq}$ is the set of runs of consecutive error values in $e_a$.

The beneficial effect of the above scheme is that selecting the threshold dynamically avoids the one-size-fits-all nature of a fixed threshold: the selected threshold is determined by recent production conditions and therefore matches actual industrial production.

Description of drawings

FIG. 1 is a schematic flowchart of the wellbore liquid accumulation detection method based on deep anomaly detection according to the invention.

FIG. 2 is a schematic structural diagram of the dimension-reduction autoencoder according to an embodiment of the invention.

FIG. 3 is a schematic diagram of feature fusion according to an embodiment of the invention.

FIG. 4 is a schematic structural diagram of the LSTM model according to an embodiment of the invention.

Detailed description

Specific embodiments of the invention are described below so that those skilled in the art can understand the invention. It should be clear, however, that the invention is not limited to the scope of the specific embodiments. To those of ordinary skill in the art, variations are obvious as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and every invention or creation that uses the inventive concept is protected.

A wellbore liquid accumulation detection method based on deep anomaly detection, as shown in Figure 1, comprises the following steps:

S1. Obtain high-frequency SCADA production data and reduce its dimensionality;

SCADA data is the high-frequency production data read from the SCADA production data acquisition system. Its time resolution is at the second level, so it reflects in detail how the data fluctuates within a day during production. After data cleaning, a single feature has 17,279 SCADA records per day, so the data is high-dimensional.

Because the high-frequency SCADA production data is high-dimensional and must later be fused with the A2 data and the geological parameter data, using the raw data directly for feature fusion and modeling would cause the curse of dimensionality and would leave the A2 data and geological parameters with too little influence on the model. SCADA data is second-level production data, whereas A2 data is the production data aggregated from the SCADA data by day; its time resolution is at the day level, and it is mainly used to observe fluctuations between days. The dimensionality reduction therefore raises the time scale of the SCADA data to the day level to match the A2 data. In this embodiment, an autoencoder reduces the time-series features of the acquired high-frequency SCADA production data to a specified dimension. The autoencoder comprises multiple LSTM layers; each layer has a different number of hidden units and extracts features of a different dimension from the time-series SCADA data, as shown in Figure 2.

The autoencoder has five layers in total. Each layer is an LSTM (Long Short-Term Memory) network with a different number of hidden units, and extracts features of a different dimension from the time-series data. The intermediate variable extracted from the encoder is in essence a set of the most representative features drawn from the high-dimensional input. It reduces the dynamic high-frequency SCADA production features from tens of thousands of records per day to a user-defined number, raising the time scale from seconds to days, before feature fusion with the A2 data and the static parameters.
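The encoding half of such a network can be sketched as follows. This is a minimal, untrained illustration in plain Python, assuming stacked LSTM layers whose hidden widths shrink toward the bottleneck. The widths `[8, 4, 2]`, the class name `LSTMLayer`, the helper `encode` and the synthetic input are all hypothetical, and the decoder and training loop that a real autoencoder needs are omitted.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class LSTMLayer:
    """A plain-Python LSTM layer with random, untrained weights (illustrative only)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        cols = n_in + n_hidden + 1  # input, previous hidden state, bias term
        # One weight matrix per gate: f=forget, g=input, o=output, c=candidate.
        self.w = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                   for _ in range(n_hidden)]
            for gate in "fgoc"
        }

    def run(self, seq):
        """Return the hidden vector produced at every time step of seq."""
        h = [0.0] * self.n_hidden
        s = [0.0] * self.n_hidden
        outputs = []
        for x in seq:
            z = list(x) + h + [1.0]
            act = {
                gate: [sigmoid(sum(w * v for w, v in zip(row, z))) for row in rows]
                for gate, rows in self.w.items()
            }
            # State update, then gated output, for every cell in the layer.
            s = [act["f"][i] * s[i] + act["g"][i] * act["c"][i]
                 for i in range(self.n_hidden)]
            h = [math.tanh(s[i]) * act["o"][i] for i in range(self.n_hidden)]
            outputs.append(h)
        return outputs

def encode(seq, layers):
    """Pass seq through the stacked layers; the last hidden vector is the code."""
    for layer in layers:
        seq = layer.run(seq)
    return seq[-1]

rng = random.Random(0)
widths = [8, 4, 2]                      # hypothetical hidden sizes
layers, n_in = [], 1
for n_hidden in widths:
    layers.append(LSTMLayer(n_in, n_hidden, rng))
    n_in = n_hidden

# Synthetic stand-in for one day of a single second-level SCADA feature.
day = [[math.sin(k / 20.0)] for k in range(500)]
code = encode(day, layers)              # 500 readings reduced to 2 values
```

The point of the sketch is only the shape change: a long per-day sequence goes in, and a short fixed-length vector comes out that can then be concatenated with day-level data.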

S2. Fuse the dimension-reduced high-frequency SCADA production data with the A2 data and the geological parameter data;

In this embodiment, the dimension-reduced SCADA data is fused with the A2 data and the geological parameter data. The geological parameters are those of each production well and are fixed attributes of the well. Fusion uses the Concat operation, which concatenates all features directly into a single vector so that the features available to the subsequent model are diverse and the modeling results improve. The dimension-reduced result of one day's dynamic SCADA data is concatenated with the A2 data and the static parameters to form the final input feature vector of the model. In this way the liquid accumulation prediction model considers the dynamic SCADA data while also making full use of the A2 data and the static parameters, and produces more accurate predictions. The data concatenation method of this model is shown in Figure 3.
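As a rough illustration of the Concat step, the three feature groups can simply be joined end to end. The function name `fuse_features`, the feature widths and all numeric values below are hypothetical; the source does not list the individual fields.

```python
def fuse_features(scada_code, a2_daily, static_params):
    """Concatenate the three feature groups into one flat input vector."""
    return [*scada_code, *a2_daily, *static_params]

# Hypothetical values for one well on one day.
scada_code = [0.12, -0.40, 0.88, 0.05]   # dimension-reduced SCADA features
a2_daily = [3.1, 5.4, 12000.0]           # day-level A2 production values
static_params = [2850.0, 0.08]           # fixed geological attributes of the well

x = fuse_features(scada_code, a2_daily, static_params)
```

The static parameters repeat unchanged for every day of the same well, which is how a single model can be shared across wells of different types and blocks.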

S3. Perform data modeling and training using the features fused in step S2, and calculate the prediction error;

In this embodiment, the model is first trained on normal data (data from periods without liquid accumulation), so that it learns to predict the next day's A2 data from normal data. The data modeling uses an LSTM model whose input and output are both time series of a specific sliding-window size. It is intended to capture the fluctuation relationship between days, which reflects the characteristics of liquid accumulation, such as a sudden drop in daily gas production or a widening difference between tubing pressure and casing pressure. Because the model is trained only on normal data, normal input yields an output with a small prediction error, whereas abnormal input (data from liquid accumulation periods) yields an output with a large prediction error. The model structure is shown in Figure 4.
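One way to prepare such training data is sketched below, under the assumption that each day already has a fused feature vector and a flag marking known liquid accumulation days. The function name `make_windows`, the single next-day target and the toy data are illustrative choices, not details given in the source.

```python
def make_windows(days, abnormal, window):
    """Build (input window, next-day target) pairs from normal days only.

    days     : list of per-day feature vectors, oldest first
    abnormal : list of bools, True where the day is a known accumulation day
    window   : number of consecutive days fed to the model
    """
    pairs = []
    for end in range(window, len(days)):
        span = range(end - window, end + 1)   # input days plus the target day
        if any(abnormal[k] for k in span):
            continue                          # skip windows touching abnormal days
        pairs.append((days[end - window:end], days[end]))
    return pairs

# Eight toy days with one feature each; day 5 is marked as abnormal.
days = [[float(d)] for d in range(8)]
abnormal = [False] * 8
abnormal[5] = True
pairs = make_windows(days, abnormal, window=3)
```

Dropping every window that touches an abnormal day keeps the training set purely normal, which is what lets a large prediction error later be read as a sign of accumulation.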

In this embodiment, the LSTM model is a special type of RNN that can learn long-term dependencies. Its forget gate, input gate and output gate let information pass selectively, which protects and controls the information. The forget gate operates as shown in Formula 1:

$$f_i^{(t)} = \sigma\Big(b_i^f + \sum_j U_{i,j}^f x_j^{(t)} + \sum_j W_{i,j}^f h_j^{(t-1)}\Big) \qquad \text{(Formula 1)}$$

where $x^{(t)}$ is the current input vector and $h^{(t)}$ is the current hidden-layer vector, which contains the outputs of all LSTM cells. $b^f$, $U^f$ and $W^f$ are respectively the bias, input weights and recurrent weights of the forget gate, and $\sigma$ denotes the sigmoid function, which sets the weight to a value between 0 and 1. The internal state of the LSTM cell is then updated as in Formula 2, which has a conditional self-loop weight $f_i^{(t)}$:

$$s_i^{(t)} = f_i^{(t)} s_i^{(t-1)} + g_i^{(t)}\,\sigma\Big(b_i + \sum_j U_{i,j} x_j^{(t)} + \sum_j W_{i,j} h_j^{(t-1)}\Big) \qquad \text{(Formula 2)}$$

where $b$, $U$ and $W$ are respectively the bias, input weights and recurrent weights in the LSTM cell.

The input gate unit $g_i^{(t)}$ is updated in a similar way to the forget gate but with its own parameters, as shown in Formula 3:

$$g_i^{(t)} = \sigma\Big(b_i^g + \sum_j U_{i,j}^g x_j^{(t)} + \sum_j W_{i,j}^g h_j^{(t-1)}\Big) \qquad \text{(Formula 3)}$$

The parameters have the same meaning as in Formula 1.

The output $h_i^{(t)}$ of the LSTM cell can also be shut off by the output gate $q_i^{(t)}$, as shown in Formulas 4 and 5:

$$h_i^{(t)} = \tanh\big(s_i^{(t)}\big)\, q_i^{(t)} \qquad \text{(Formula 4)}$$

$$q_i^{(t)} = \sigma\Big(b_i^o + \sum_j U_{i,j}^o x_j^{(t)} + \sum_j W_{i,j}^o h_j^{(t-1)}\Big) \qquad \text{(Formula 5)}$$

where $b^o$, $U^o$ and $W^o$ are respectively the bias, input weights and recurrent weights.
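A small worked example may help fix how the gates interact. It assumes the standard LSTM cell form that the text describes (sigmoid forget, input and output gates, a state that mixes the previous state with a gated candidate, and an output equal to $\tanh$ of the state scaled by the output gate). The numbers are chosen purely for illustration: a single cell with all weights and biases equal to zero and a previous state $s^{(t-1)} = 1$.

$$f^{(t)} = g^{(t)} = q^{(t)} = \sigma(0) = 0.5$$

$$s^{(t)} = f^{(t)} s^{(t-1)} + g^{(t)}\,\sigma(0) = 0.5 \cdot 1 + 0.5 \cdot 0.5 = 0.75$$

$$h^{(t)} = \tanh\big(s^{(t)}\big)\, q^{(t)} = \tanh(0.75) \cdot 0.5 \approx 0.635 \cdot 0.5 \approx 0.318$$

With the forget gate at 0.5 the cell keeps half of its previous state, and with the output gate at 0.5 it exposes only half of the squashed state. Driving the forget gate toward 1 preserves the state over many days, which is the mechanism that lets the model carry long-term dependencies.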

After the above model is built, it is trained on the normal data features. In this embodiment the optimizer is the Adam optimizer, with the learning rate set to 0.002 and the total number of iterations set to 500.
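To make these settings concrete, the following sketch runs the Adam update rule in plain Python with the stated learning rate of 0.002 and 500 iterations. The toy quadratic loss, the function name `adam_minimize` and the values of `beta1`, `beta2` and `eps` (the commonly used defaults) are assumptions; the source does not give them.

```python
import math

def adam_minimize(grad, theta, lr=0.002, steps=500,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam for a fixed number of steps and return the updated parameters."""
    m = [0.0] * len(theta)   # running mean of gradients
    v = [0.0] * len(theta)   # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(theta)
        for k in range(len(theta)):
            m[k] = beta1 * m[k] + (1 - beta1) * g[k]
            v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
            m_hat = m[k] / (1 - beta1 ** t)   # bias-corrected estimates
            v_hat = v[k] / (1 - beta2 ** t)
            theta[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Toy loss (theta - 1)^2 with gradient 2 * (theta - 1); illustrative only.
start = [0.0]
result = adam_minimize(lambda th: [2 * (th[0] - 1.0)], list(start))
```

Because Adam normalizes each step by the gradient's running magnitude, every update moves a parameter by at most roughly the learning rate, so 500 iterations at 0.002 allow a total movement of about 1 per parameter.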

Error calculation: in essence, the model uses the input features to predict the average tubing pressure, average casing pressure and average daily gas production in the next day's A2 data. From the actual production data $x^{(t)}$ and the predicted data $\hat{x}^{(t)}$ at time $t$, the prediction error is calculated as follows:

$$e^{(t)} = \frac{1}{n}\sum_{i=1}^{n}\left|x_i^{(t)} - \hat{x}_i^{(t)}\right| \qquad \text{(Formula 6)}$$

where $e^{(t)}$ is the average prediction error at time $t$, $n$ is the length of the input and output vectors, $x^{(t)}$ is the input data at time $t$, and $\hat{x}^{(t)}$ is the predicted data at time $t$.
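A per-day error of this kind can be computed as below. The sketch assumes the error is the mean absolute difference between the actual and predicted vectors; the source only states that the daily error vector is averaged, so the absolute value is an assumption, as are the function name `mean_prediction_error` and the sample numbers.

```python
def mean_prediction_error(actual, predicted):
    """Average absolute difference between two equal-length vectors."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

# Hypothetical day: three predicted A2 quantities against their actual values.
actual = [3.0, 5.0, 12000.0]
predicted = [2.5, 5.5, 11900.0]
e_t = mean_prediction_error(actual, predicted)
```

In this toy case the gas production term dominates the average because the quantities are on very different scales, so in practice the features would likely need to be scaled consistently before the errors are averaged.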

S4. Calculate a dynamic threshold from the prediction error obtained in step S3, and judge from the dynamic threshold whether liquid has accumulated in the wellbore.

From the prediction error vector, the dynamic threshold candidate vector $A$ is obtained; it is defined in Formula 7:

$$A = \{\epsilon \mid \epsilon = \mu(e) + z\,\sigma(e)\} \qquad \text{(Formula 7)}$$

where $\mu(e)$ is the mean, $\sigma(e)$ is the standard deviation, and $z$ is the weight coefficient. The error threshold $\epsilon$ at each time step is dynamic and is calculated as in Formulas 8 to 12:

$$\epsilon = \underset{\epsilon \in A}{\operatorname{argmax}}\;\frac{\Delta\mu(e)/\mu(e) + \Delta\sigma(e)/\sigma(e)}{|e_a| + |E_{seq}|^2} \qquad \text{(Formula 8)}$$

$$\Delta\mu(e) = \mu(e) - \mu\big(\{e^{(k)} \in e \mid e^{(k)} < \epsilon\}\big) \qquad \text{(Formula 9)}$$

$$\Delta\sigma(e) = \sigma(e) - \sigma\big(\{e^{(k)} \in e \mid e^{(k)} < \epsilon\}\big) \qquad \text{(Formula 10)}$$

$$e_a = \{e^{(k)} \in e \mid e^{(k)} \ge \epsilon\} \qquad \text{(Formula 11)}$$

$$E_{seq} = \text{runs of consecutive } e^{(k)} \in e_a \qquad \text{(Formula 12)}$$

The dynamic threshold method first calculates the threshold candidate vector $A$ from the prediction error vector and then selects the best threshold $\epsilon$ from the candidates, that is, the one given by Formula 8. In the calculation, Formula 9 partitions the historical error values $e$ by the candidate threshold $\epsilon$ and gives the difference between the mean of the historical error values, $\mu(e)$, and the mean of the data after the outliers are removed, $\mu(\{e^{(k)} \in e \mid e^{(k)} < \epsilon\})$. Likewise, Formula 10 gives the difference of the standard deviations. Formula 11 selects the set of outliers $e_a$ according to the threshold $\epsilon$. Formula 12 gives $E_{seq}$, the set of runs of consecutive outliers in $e_a$.
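The selection procedure can be sketched in plain Python as follows. This is an illustrative reading of the description, not the patent's implementation: candidates are generated as mean plus `z` times standard deviation for several `z` values, each candidate is scored by the relative drop in mean and standard deviation after removing the errors at or above it, and the score is divided by the number of flagged points plus the squared number of flagged runs. The names `select_threshold` and `count_runs`, the list of `z` values and the sample errors are hypothetical.

```python
from statistics import fmean, pstdev

def count_runs(indices):
    """Number of runs of consecutive integers in a sorted index list."""
    return sum(1 for n, k in enumerate(indices)
               if n == 0 or k != indices[n - 1] + 1)

def select_threshold(errors, z_values):
    """Pick the candidate mu + z*sigma that maximizes the selection score."""
    mu, sigma = fmean(errors), pstdev(errors)
    if mu == 0 or sigma == 0:
        return None                      # score undefined for a flat error window
    best_eps, best_score = None, float("-inf")
    for z in z_values:
        eps = mu + z * sigma             # one entry of the candidate vector
        kept = [e for e in errors if e < eps]
        flagged = [k for k, e in enumerate(errors) if e >= eps]
        if not kept or not flagged:
            continue                     # candidate separates nothing
        delta_mu = mu - fmean(kept)
        delta_sigma = sigma - pstdev(kept)
        score = ((delta_mu / mu + delta_sigma / sigma)
                 / (len(flagged) + count_runs(flagged) ** 2))
        if score > best_score:
            best_eps, best_score = eps, score
    return best_eps

# Hypothetical error window: two consecutive large errors among small ones.
errors = [0.10, 0.12, 0.09, 0.11, 0.10, 0.95, 1.00, 0.10, 0.11]
eps = select_threshold(errors, z_values=[0.5, 1.0, 1.5, 2.0, 2.5])
flags = [e >= eps for e in errors]
```

Squaring the run count penalizes thresholds that split the anomalies into many scattered fragments, so a candidate that isolates one compact run of large errors scores higher than one that flags isolated points throughout the window.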

The dynamic threshold method as a whole can be understood as follows: with the fewest anomalous sequences and the fewest anomalous points, make the mean and standard deviation of the sequence after anomaly removal differ as much as possible from those of the original sequence.

The dynamic threshold method thus combines the mean and the standard deviation and keeps updating the threshold as errors accumulate. Only one parameter needs tuning throughout: the weight coefficient z.

Once the threshold is found, liquid accumulation is judged for time t+1: if the prediction error at time t+1 is greater than or equal to the dynamic threshold, the wellbore is judged to be in a liquid accumulation state at that time; otherwise it is judged not to have accumulated liquid.
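The day-by-day decision can be sketched as a rolling loop: the threshold is recomputed from the most recent window of errors, and the next day's error is compared against it. The function name `detect` and the stand-in threshold rule (mean plus three standard deviations) are assumptions made to keep the example self-contained; any threshold selection function with the same signature could be passed in instead.

```python
from statistics import fmean, pstdev

def detect(errors, window, threshold_fn):
    """Flag each day after the first `window` days by comparing it to a fresh threshold."""
    flags = []
    for t in range(window, len(errors)):
        eps = threshold_fn(errors[t - window:t])    # threshold from recent history
        flags.append(eps is not None and errors[t] >= eps)
    return flags

def simple_threshold(recent):
    """Stand-in rule: mean plus three standard deviations of the recent errors."""
    return fmean(recent) + 3 * pstdev(recent)

# Hypothetical daily prediction errors; the sixth day shows a sharp jump.
errors = [0.10, 0.11, 0.09, 0.10, 0.12, 0.90, 0.10]
flags = detect(errors, window=4, threshold_fn=simple_threshold)
```

Because the window slides forward, a period of noisier but normal production raises the threshold for the following days, which is the behavior a fixed threshold cannot provide.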

In actual industrial production, various operational measures also cause the data to fluctuate more than in the normal state. To keep the model from misjudging such measures as liquid accumulation, the dynamic threshold method is used for the prediction.

The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks, can be implemented by computer program instructions. These instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the memory produce an article of manufacture that includes instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Specific embodiments are used herein to explain the principles and implementation of the invention. The description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Those of ordinary skill in the art will appreciate that the embodiments described here are intended to help the reader understand the principles of the invention, and that the scope of protection of the invention is not limited to these particular statements and embodiments. Based on the technical teaching disclosed by the invention, those of ordinary skill in the art can make various other specific modifications and combinations that do not depart from the essence of the invention, and these remain within its scope of protection.

Claims

1. A wellbore liquid accumulation detection method based on deep anomaly detection, characterized by comprising the following steps:

S1, obtaining high-frequency SCADA production data and reducing its dimensionality;

S2, performing feature fusion on the dimension-reduced high-frequency SCADA production data, the A2 data and the geological parameter data;

S3, performing data modeling and training using the features fused in step S2, and calculating a prediction error;

S4, calculating a dynamic threshold from the prediction error obtained in step S3 and judging from the dynamic threshold whether liquid has accumulated in the wellbore: when the prediction error at time t+1 is greater than or equal to the dynamic threshold, the wellbore is judged to be in a liquid accumulation state at that time; otherwise, the wellbore is judged to have no liquid accumulation at that time.
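The decision rule of step S4 can be illustrated with a minimal Python sketch. Claim 1 does not state how the dynamic threshold is computed, so `rolling_threshold` below is purely a hypothetical stand-in (mean plus `k` standard deviations of the preceding errors); the function names, the window length, the factor `k` and the sample errors are all assumptions made for illustration. Only the comparison "prediction error greater than or equal to the threshold" comes from the claim.

```python
from statistics import mean, stdev


def rolling_threshold(errors, window=5, k=2.0):
    """Hypothetical dynamic threshold (NOT from the patent):
    mean + k * sample std of the previous `window` prediction errors."""
    thresholds = []
    for t in range(len(errors)):
        history = errors[max(0, t - window):t]
        if len(history) < 2:
            # Too little history to estimate spread: never flag this step.
            thresholds.append(float("inf"))
        else:
            thresholds.append(mean(history) + k * stdev(history))
    return thresholds


def flag_liquid_accumulation(errors, thresholds):
    """Step S4 rule: flag a time step when its prediction error is
    greater than or equal to the dynamic threshold for that step."""
    return [e >= th for e, th in zip(errors, thresholds)]


# Made-up prediction errors with one obvious spike at index 5.
errors = [0.10, 0.12, 0.11, 0.09, 0.10, 0.45, 0.11]
flags = flag_liquid_accumulation(errors, rolling_threshold(errors))
print(flags)
```

With these made-up numbers only the spike is flagged; the step after it is not, because the spike itself inflates the assumed rolling threshold.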

2. The wellbore liquid accumulation detection method based on deep anomaly detection according to claim 1, wherein reducing the dimensionality of the high-frequency SCADA production data in step S1 comprises:

reducing the time-series features of the obtained high-frequency SCADA production data to a specified dimension using an autoencoder, wherein the autoencoder comprises multiple LSTM layers, each LSTM layer has a different number of hidden units, and each LSTM layer extracts features of a different dimension from the time-series high-frequency SCADA production data.

3. The wellbore liquid accumulation detection method based on deep anomaly detection according to claim 2, wherein in S2 feature fusion is performed using a Concat operation: the dimension-reduced result of one day of SCADA dynamic data is concatenated, as feature vectors, with the A2 data and the static geological parameters to obtain an input feature vector.
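As a rough illustration of the Concat operation, the sketch below joins three feature groups end to end into one input vector. The function name, the vector lengths and every numeric value are hypothetical; the claim does not specify which fields the A2 data or the geological parameters contain.

```python
def fuse_features(scada_embedding, a2_features, geological_params):
    """Concat fusion: join the three feature groups end to end."""
    return list(scada_embedding) + list(a2_features) + list(geological_params)


# Hypothetical values for one well on one day.
scada_embedding = [0.21, -0.05, 0.67, 0.12]  # dimension-reduced SCADA data (4 dims assumed)
a2_features = [3.2, 1.8, 12500.0]            # A2 data (fields assumed)
geological_params = [2450.0, 0.08]           # static geological parameters (fields assumed)

fused = fuse_features(scada_embedding, a2_features, geological_params)
print(len(fused))  # 4 + 3 + 2 = 9
```

Concatenation keeps each group's values unchanged, so the downstream model sees the intraday SCADA summary and the static context side by side in a single vector.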

4. The wellbore liquid accumulation detection method based on deep anomaly detection according to claim 3, wherein the data modeling and training in S3 comprise:

S31, constructing an LSTM model comprising a forget gate, an input gate and an output gate using the fused features obtained in step S2;

S32, updating the forget gate, the input gate and the output gate inside the LSTM model using self-loop weights to obtain the updated LSTM model.

5. The wellbore liquid accumulation detection method based on deep anomaly detection according to claim 4, wherein in S32 the forget gate is updated using the self-loop weight as follows:

$$f_i^{(t)} = \sigma\Big(b_i^{f} + \sum_j U_{i,j}^{f}\, x_j^{(t)} + \sum_j W_{i,j}^{f}\, h_j^{(t-1)}\Big);$$

wherein $f_i^{(t)}$ is the output of the $i$-th forget gate at time $t$, $b_i^{f}$ is the bias weight of the $i$-th forget gate, $U_{i,j}^{f}$ is the input weight, at time $t$, of the $j$-th input variable $x_j^{(t)}$ corresponding to the $i$-th forget gate, $W_{i,j}^{f}$ is the recurrent weight, at time $t$, of the $j$-th hidden layer variable $h_j^{(t-1)}$ corresponding to the $i$-th forget gate, $h_j^{(t-1)}$ is the hidden layer variable at time $t-1$, and $\sigma$ represents the sigmoid function.
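A minimal Python sketch of one forget-gate unit follows, assuming the conventional LSTM form in which the gate applies a sigmoid to a bias plus weighted sums of the current inputs and the previous hidden values. The function and argument names and all numeric values are illustrative assumptions, not taken from the patent.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gate_output(bias, input_weights, recurrent_weights, x_t, h_prev):
    """One gate unit i: sigmoid(b_i + sum_j U_ij * x_j + sum_j W_ij * h_j).

    x_t is the current input vector; h_prev is the hidden vector from the
    previous time step (assumed convention).
    """
    z = bias
    z += sum(u * x for u, x in zip(input_weights, x_t))
    z += sum(w * h for w, h in zip(recurrent_weights, h_prev))
    return sigmoid(z)


# Illustrative values: the weighted inputs cancel, so the pre-activation is 0.
f_i = gate_output(
    bias=0.0,
    input_weights=[0.5, -0.5],
    recurrent_weights=[1.0],
    x_t=[1.0, 1.0],
    h_prev=[0.0],
)
print(f_i)  # 0.5
```

A gate value near 1 keeps most of the previous cell state, while a value near 0 discards it.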

6. The wellbore liquid accumulation detection method based on deep anomaly detection according to claim 5, wherein in S32 the input gate is updated as follows:

$$g_i^{(t)} = \sigma\Big(b_i^{g} + \sum_j U_{i,j}^{g}\, x_j^{(t)} + \sum_j W_{i,j}^{g}\, h_j^{(t-1)}\Big);$$

wherein $g_i^{(t)}$ is the output of the $i$-th input gate at time $t$, $b_i^{g}$ is the $i$-th bias of the input gate, $x^{(t)}$ is the input vector at time $t$, $h^{(t-1)}$ is the hidden layer vector at time $t-1$, $U_{i,j}^{g}$ is the input weight, at time $t$, of the $j$-th input variable corresponding to the $i$-th input gate, and $W_{i,j}^{g}$ is the recurrent weight, at time $t$, of the $j$-th hidden layer variable corresponding to the $i$-th input gate.

7. The wellbore liquid accumulation detection method based on deep anomaly detection according to claim 6, wherein in S32 the internal state of the LSTM cell is updated as follows:

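$$s_i^{(t)} = f_i^{(t)}\, s_i^{(t-1)} + g_i^{(t)}\, \sigma\Big(b_i + \sum_j U_{i,j}\, x_j^{(t)} + \sum_j W_{i,j}\, h_j^{(t-1)}\Big);$$

wherein $s_i^{(t)}$ is the internal state of the LSTM model at time $t$, $g_i^{(t)}$ is the input gate unit, $b_i$ is the bias weight, and $W_{i,j}$ is the recurrent weight of the hidden layer variable.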

8. The wellbore liquid accumulation detection method based on deep anomaly detection according to claim 7, wherein the output of the LSTM model is expressed as:

$$h_i^{(t)} = \tanh\big(s_i^{(t)}\big)\, q_i^{(t)}, \qquad q_i^{(t)} = \sigma\Big(b_i^{o} + \sum_j U_{i,j}^{o}\, x_j^{(t)} + \sum_j W_{i,j}^{o}\, h_j^{(t-1)}\Big);$$

wherein $q_i^{(t)}$ is the output of the $i$-th output gate at time $t$, $h_i^{(t)}$ is the value of the $i$-th memory unit at time $t$, $b_i^{o}$ is the $i$-th bias of the output gate, $x^{(t)}$ is the input vector at time $t$, $h^{(t-1)}$ is the hidden layer vector at time $t-1$, $U_{i,j}^{o}$ is the input weight, at time $t$, of the $j$-th input variable corresponding to the $i$-th output gate, and $W_{i,j}^{o}$ is the recurrent weight, at time $t$, of the $j$-th hidden layer variable corresponding to the $i$-th output gate.
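The gate, state and output updates of claims 5 to 8 can be combined into one time step, sketched below in plain Python. This is a sketch under stated assumptions rather than the patented implementation: it assumes the conventional LSTM form (forget gate `f`, input gate `g`, output gate `q`, internal state `s`, output `h`), it applies a sigmoid to the candidate term although many implementations use tanh there, and the parameter layout, names and values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(b, U, W, x, h):
    """Return b[i] + sum_j U[i][j]*x[j] + sum_j W[i][j]*h[j] for every unit i."""
    return [
        b[i]
        + sum(U[i][j] * x[j] for j in range(len(x)))
        + sum(W[i][j] * h[j] for j in range(len(h)))
        for i in range(len(b))
    ]


def lstm_step(params, x, h_prev, s_prev):
    """One LSTM time step; params maps each block to a (b, U, W) tuple."""
    f = [sigmoid(z) for z in affine(*params["forget"], x, h_prev)]  # forget gate
    g = [sigmoid(z) for z in affine(*params["input"], x, h_prev)]   # input gate
    q = [sigmoid(z) for z in affine(*params["output"], x, h_prev)]  # output gate
    c = [sigmoid(z) for z in affine(*params["cell"], x, h_prev)]    # candidate (sigmoid assumed)
    # Internal state: keep part of the old state, add the gated candidate.
    s = [f[i] * s_prev[i] + g[i] * c[i] for i in range(len(s_prev))]
    # Output: squashed state, scaled by the output gate.
    h = [math.tanh(s[i]) * q[i] for i in range(len(s))]
    return h, s


# One hidden unit, two inputs, all-zero parameters (illustrative only).
zero = ([0.0], [[0.0, 0.0]], [[0.0]])
params = {name: zero for name in ("forget", "input", "output", "cell")}
h, s = lstm_step(params, x=[1.0, 2.0], h_prev=[0.0], s_prev=[1.0])
print(s)  # [0.75]
```

With all-zero parameters every sigmoid evaluates to 0.5, so the new state is 0.5 * 1.0 + 0.5 * 0.5 = 0.75, which makes the arithmetic easy to check by hand.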

9. The wellbore liquid accumulation detection method based on deep anomaly detection according to claim 8, wherein the prediction error in S3 is calculated as follows: