
CN113204054B - Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning - Google Patents


Info

Publication number
CN113204054B
CN113204054B (application CN202110386529.1A)
Authority
CN
China
Prior art keywords
parameter
network
inversion
resistivity
polarizability
Prior art date
Legal status
Active
Application number
CN202110386529.1A
Other languages
Chinese (zh)
Other versions
CN113204054A (en)
Inventor
董莉
江沸菠
李小龙
肖林
Current Assignee
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date
Filing date
Publication date
Application filed by Hunan University of Technology
Priority to CN202110386529.1A
Publication of CN113204054A
Application granted
Publication of CN113204054B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01VGEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V3/00Electric or magnetic prospecting or detecting; Measuring magnetic field characteristics of the earth, e.g. declination, deviation
    • G01V3/08Electric or magnetic prospecting or detecting; Measuring magnetic field characteristics of the earth, e.g. declination, deviation operating with magnetic or electric fields produced or modified by objects or geological structures or by detecting devices
    • G01V3/083Controlled source electromagnetic [CSEM] surveying


Abstract

The invention discloses a reinforcement-learning-based self-adaptive wide-area electromagnetic method induced polarization (IP) information extraction method. Sensitivity is defined as the feature for identifying the inversion parameters, and reinforcement learning is used to realize adaptive inversion-parameter identification and regularization setting, thereby achieving intelligent IP information extraction. In the early stage of the inversion, the influence of resistivity on the observed data is far greater than that of polarizability, so the sensitivity of the resistivity is higher than that of the polarizability; the inversion is then dominated by resistivity, a prior-information constraint is imposed on the resistivity parameter, and a strong restriction constraint is imposed on the polarizability parameter. In the later stage, the resistivity tends to be stable and the sensitivity of the polarizability exceeds that of the resistivity; the inversion is then dominated by polarizability, a prior-information constraint is imposed on the polarizability parameter, and a strong restriction constraint is imposed on the resistivity parameter. The specific regularization coefficients and constraints are set by reinforcement learning according to its judgment of the inversion stage.

Description

Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning
Technical Field
The invention belongs to the technical field of geophysics and relates to a self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning.
Background
The wide-area electromagnetic method (WFEM) is a new frequency-domain electromagnetic prospecting method. It combines the stable and reliable field-source signal of the controlled-source audio-frequency magnetotelluric method (CSAMT) with the non-far-zone measurement capability of the magnetic-dipole-source frequency sounding method (MELOS). The wide-area apparent resistivity defined by WFEM strictly retains the high-order terms in the series expansion of the electromagnetic-field expression, can be extracted by measuring only one physical quantity under various working modes, and is a whole-zone apparent resistivity that effectively corrects the non-far-zone distortion of electromagnetic sounding curves.
At present, WFEM has achieved a series of positive results in oil and gas resource exploration, metal-mine exploration, engineering survey, and other fields. In practical applications, however, the frequency-domain electromagnetic response of the subsurface medium is a combined reflection of electromagnetic induction and the induced polarization effect. Extracting induced polarization information from frequency-domain electromagnetic signals yields more physical-property parameters, allows the influence of the polarization effect on the electromagnetic signal to be analyzed quantitatively, and further improves the inversion interpretation accuracy of frequency-domain electromagnetic methods.
However, because the anomaly caused by uneven subsurface conductivity is much stronger than that caused by the induced polarization effect, the inversion process clearly splits into two parts: (1) a resistivity inversion part, in which the resistivity parameters dominate the fitness function, so the individuals quickly converge close to the correct resistivity parameters in the solution space; (2) a polarizability inversion part, in which the influence of the resistivity parameters on the fitness function has stabilized, the individuals begin to fine-tune near the resistivity parameters, and optimization of the polarizability parameters becomes the main driver of the decline of the fitness curve. Because the polarizability parameters are numerically far smaller than the resistivity parameters, their influence on the fitness function is far smaller as well; the algorithm then falls into a local extremum very easily, yields wrong polarizability parameters, and makes the induced polarization information harder to extract. How to extract the weak polarizability parameters under the influence of the resistivity parameters is therefore a complex engineering problem and a particularly great technical challenge.
Disclosure of Invention
The invention aims to provide a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method, which realizes the identification and regularization setting of adaptive inversion parameters by defining sensitivity as the characteristics of inversion parameter identification and adopting a reinforcement learning method, thereby improving the accuracy of induced polarization information extraction.
In order to achieve the purpose, the invention provides the following technical scheme:
The reinforcement-learning-based adaptive wide-area electromagnetic method induced polarization information extraction method provided by the invention defines sensitivity as the feature for inversion-parameter identification and adopts reinforcement learning to realize adaptive inversion-parameter identification and regularization setting, thereby realizing intelligent induced polarization information extraction.
The invention provides a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method, which comprises the following steps of:
S1, setting the calculation equation of the wide-area apparent resistivity:

[Equation (1), the wide-area apparent resistivity formula, is rendered as an image in the original.]

In formula (1), r is the distance from the observation point to the center of the dipole source, also called the transmitter-receiver offset; dL is the length of the horizontal current source; MN is the distance between observation points M and N; ρ is the resistivity; I is the current intensity; k is the propagation constant (wavenumber) of the electromagnetic wave; i is the imaginary unit; and the remaining symbol (also an image in the original) is the angle between r and the current source;
S2, setting the induced polarization model as:

ρ(ω) = ρ_a [1 − m(1 − 1/(1 + (iωτ)^c))]    (2)

In formula (2), ρ(ω) is the frequency-dependent wide-area complex resistivity when the polarization effect is considered; ρ_a is the wide-area apparent resistivity when no polarization effect is considered; m is the polarizability; τ is the time constant; c is the frequency dependence coefficient; and ω is the angular frequency;
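As a small numerical sketch of the polarization model, the following assumes the standard Cole-Cole form ρ(ω) = ρ_a[1 − m(1 − 1/(1 + (iωτ)^c))], since equation (2) is rendered as an image in the original; the function name is illustrative.

```python
def cole_cole(rho_a, m, tau, c, omega):
    """Complex wide-area resistivity under the assumed Cole-Cole form
    of Eq. (2).

    rho_a : wide-area apparent resistivity without polarization effect
    m     : polarizability, m in [0, 1]
    tau   : time constant (s)
    c     : frequency dependence coefficient
    omega : angular frequency (rad/s)
    """
    iwt = (1j * omega * tau) ** c          # (i*omega*tau)^c, complex
    return rho_a * (1.0 - m * (1.0 - 1.0 / (1.0 + iwt)))
```

A useful sanity check on this form: as ω → 0 the polarization vanishes and ρ → ρ_a, while at very high frequency ρ → ρ_a(1 − m).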
S3, setting the objective function of the inversion as follows:

fit = E(e) + λ₁R(ρ) + λ₂R(m)    (3)

In formula (3), R(ρ) and R(m) are minimum-structure constraint functions on the resistivity and polarizability, respectively; λ₁ and λ₂ are the regularization factors corresponding to R(ρ) and R(m). Two independent regularization factors are used because the value range of the polarizability (m ∈ [0, 1]) differs greatly from that of the resistivity (in general ρ >> m); a single shared regularization factor would fail to constrain the relatively small polarizability parameter. E(e) is the target error function, i.e. the data-fitting error of the inversion;

R(ρ) and R(m) are both computed using the following formula:

R(M) = Σ_k (M_{k+1} − M_k)²    (4)

In formula (4), M is the model parameter obtained by the inversion, comprising the resistivity ρ and the polarizability m, and the sum runs over adjacent model cells [equation (4) is rendered as an image in the original; the standard minimum-structure roughness is shown];
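The objective function of equation (3) can be sketched as follows, assuming the minimum-structure terms R(·) are the usual sum of squared differences between adjacent model cells (equation (4) appears only as an image in the original); all function names are illustrative.

```python
def min_structure(M):
    """Minimum-structure roughness of a model vector M: sum of squared
    differences between adjacent cells (assumed form of R(.) in Eq. (4))."""
    return sum((M[k + 1] - M[k]) ** 2 for k in range(len(M) - 1))

def inversion_objective(data_error, rho, m, lam1, lam2):
    """fit = E(e) + lam1*R(rho) + lam2*R(m), Eq. (3).

    Separate regularization factors lam1, lam2 keep the numerically small
    polarizability term from being swamped by the resistivity term."""
    return data_error + lam1 * min_structure(rho) + lam2 * min_structure(m)
```

With a smooth resistivity model the first penalty vanishes and the objective is driven by the data misfit and the polarizability roughness alone.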
S4, designing a staged extraction method for the different physical-property parameters by defining sensitivity as the feature for inversion-parameter identification, and distinguishing the stage of the current inversion through the sensitivity;

the sensitivities of resistivity and polarizability are defined as follows:

S = |fit(G) − fit(G−1)| / |M(G) − M(G−1)|    (5)

In formula (5), S is the sensitivity, G is the iteration number, fit is the fitness, and M is the model parameter obtained by the inversion, comprising the resistivity ρ and the polarizability m [equation (5) is rendered as an image in the original; the form shown is a reconstruction consistent with the surrounding definitions];
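Since equation (5) is rendered as an image in the original, the sketch below assumes the sensitivity measures the fitness change per unit parameter change between consecutive iterations G−1 and G; the small epsilon guards against a frozen parameter block.

```python
def sensitivity(fit_prev, fit_curr, M_prev, M_curr, eps=1e-12):
    """Sensitivity S of one parameter block at iteration G (assumed
    form of Eq. (5)): |fit(G) - fit(G-1)| / |M(G) - M(G-1)|."""
    d_fit = abs(fit_curr - fit_prev)
    d_M = sum((a - b) ** 2 for a, b in zip(M_curr, M_prev)) ** 0.5
    return d_fit / (d_M + eps)
```

Computing this separately for the resistivity block and the polarizability block, and comparing the two values, is what lets the agent decide which parameter currently dominates the inversion.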
S5, adopting reinforcement learning based on the deterministic policy gradient to judge the inversion stage and set the regularization coefficients;

the reinforcement learning involves three elements: state, action, and reward, and the system is modeled with these elements: the state is the sensitivities of the resistivity and the polarizability, the action is the regularization coefficients, and the reward is the improvement in fitness. The system judges the inversion stage from the current state and outputs the corresponding regularization coefficients, then computes the reward from the inversion result to adjust the policy and value function of the reinforcement learning. After repeated learning, once the policy and value function are stable, the inversion stage can be judged accurately and appropriate regularization coefficients are set;
and S6, controlling the constraints imposed on the inversion according to the regularization coefficients generated by the reinforcement learning, realizing adaptive inversion-parameter identification and regularization setting, and obtaining high-precision induced polarization information (including the resistivity and polarizability parameters).
Further, in step S5, the steps of the reinforcement learning include:

step one, randomly initialize four networks: the current policy network μ, the target policy network μ′, the current Q network Q, and the target Q network Q′;

their parameters are, respectively, the current policy network parameter θ, the target policy network parameter θ′, the current Q network parameter w, and the target Q network parameter w′; the current iteration count is t = 0;
step two, with S the initial state, input the state S into the current policy network to obtain the action A:

A = μ(S|θ) + N

where μ(·) is the policy output by the current policy network, θ is the current policy network parameter, and N is exploration noise;
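Step two can be sketched as follows, with the policy network stood in by a plain callable and Gaussian exploration noise; these choices are illustrative, not mandated by the patent text.

```python
import random

def select_action(policy, state, noise_scale=0.1, rng=random.Random(0)):
    """Step two: A = mu(S|theta) + N, where `policy` stands in for the
    current policy network and N is Gaussian exploration noise."""
    return policy(state) + noise_scale * rng.gauss(0.0, 1.0)
```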
step three, execute action A in state S to obtain the next state S′ and the reward R, and store the transition (S_t, A_t, R_t, S′_t) into the experience replay set D;

step four, update the state S ← S′; randomly sample n transitions {(S_i, A_i, R_i, S′_i)}, i = 1, 2, …, n, from the experience replay set D, and compute the target value y_i for the current Q network:

y_i = R_i + γQ′(S′_i, μ′(S′_i|θ′)|w′)

where R_i is the reward obtained by performing action A_i in state S_i, γ is the reward attenuation factor, Q′(·) is the Q value output by the target Q network, w′ is the target Q network parameter, μ′(·) is the policy output by the target policy network, and θ′ is the target policy network parameter;
step five, compute the loss L of the current Q network with the mean squared error (MSE) loss function and update all parameters w of the current Q network by gradient back-propagation through the neural network:

L = (1/n) Σ_{i=1}^{n} (y_i − Q(S_i, A_i|w))²

where n is the number of sampled transitions, Q(·) is the Q value output by the current Q network, S_i is the i-th state, A_i is the i-th action, and w is the current Q network parameter;
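Steps four and five can be sketched together, with the target networks represented by plain callables; the sketch only computes the TD target and MSE loss, leaving the gradient update to whatever framework hosts the networks.

```python
def td_target(reward, next_state, target_policy, target_q, gamma=0.99):
    """Step four: y_i = R_i + gamma * Q'(S'_i, mu'(S'_i|theta') | w')."""
    return reward + gamma * target_q(next_state, target_policy(next_state))

def critic_loss(batch, q, target_policy, target_q, gamma=0.99):
    """Step five: L = (1/n) * sum_i (y_i - Q(S_i, A_i|w))^2 over a
    sampled batch of (S, A, R, S') transitions."""
    total = 0.0
    for s, a, r, s_next in batch:
        y = td_target(r, s_next, target_policy, target_q, gamma)
        total += (y - q(s, a)) ** 2
    return total / len(batch)
```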
step six, update all parameters θ of the current policy network by gradient back-propagation of the performance index function J through the neural network, and increase the iteration count t by 1:

J(θ) = −(1/n) Σ_{i=1}^{n} Q(S_i, μ(S_i|θ)|w)

[the expression for J is rendered as an image in the original; the standard deterministic-policy-gradient objective is shown];
step seven, updating the target Q network parameter w 'and the target strategy network parameter theta' every fixed period;
w′=τw+(1-τ)w′
θ′=τθ+(1-τ)θ′
wherein tau is a network parameter soft update coefficient, theta is a current policy network parameter, and w is a current Q network parameter;
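The soft update of step seven is a one-liner per parameter; keeping the target parameters a slowly moving average of the online ones is what stabilizes the TD targets of step four.

```python
def soft_update(current, target, tau=0.005):
    """Step seven: w' <- tau*w + (1 - tau)*w' (and likewise theta'),
    applied element-wise to lists of parameters."""
    return [tau * c + (1.0 - tau) * t for c, t in zip(current, target)]
```

The default tau here is illustrative; the patent leaves its value to Table 1.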
and step eight, judging whether the strategy and the value function are stably converged, finishing the training if the terminal condition is reached, and returning to the step two if the terminal condition is not reached.
Further, in step S6, two types of constraints are imposed during the inversion: first, prior-information constraints on resistivity and polarizability, derived from the known physical properties of the exploration area, reduce the search space of the inversion algorithm; second, when the inversion stage is dominated by one physical-property parameter (the resistivity parameter or the polarizability parameter), a restriction constraint is imposed on the other parameter, i.e. its search is limited to a small range, which strengthens the influence of the dominant parameter on the fitness function.
In this way, the strengths of the different constraints are controlled by the different regularization coefficients generated by reinforcement learning, and accurate multi-parameter inversion is realized.
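The two kinds of constraint can be sketched for a single candidate value as follows; the 5% restriction band and the argument names are assumptions for illustration, not values given in the patent.

```python
def constrain(candidate, previous, prior_lo, prior_hi,
              restricted, frac=0.05):
    """Constrain one candidate parameter value during the inversion.

    restricted=False : prior-information constraint - clip the candidate
                       to the known physical bounds of the survey area.
    restricted=True  : strong restriction constraint - keep the search
                       within a narrow band around the previous value,
                       so the dominant parameter drives the fitness.
    """
    if restricted:
        lo = previous * (1.0 - frac)
        hi = previous * (1.0 + frac)
        lo, hi = min(lo, hi), max(lo, hi)
    else:
        lo, hi = prior_lo, prior_hi
    return min(max(candidate, lo), hi)
```

In the resistivity-dominated stage one would call this with restricted=False for resistivity and restricted=True for polarizability, and swap the roles in the later stage.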
According to the method, the influence of the resistivity on the observed data in the early stage of the inversion is far greater than that of the polarizability, so the sensitivity of the resistivity is higher; the early inversion is therefore dominated by resistivity, a prior-information constraint is imposed on the resistivity parameter, and a strong restriction constraint is imposed on the polarizability parameter. In the later stage, the resistivity tends to be stable and the sensitivity of the polarizability exceeds that of the resistivity; the inversion is then dominated by polarizability, a prior-information constraint is imposed on the polarizability parameter, and a strong restriction constraint is imposed on the resistivity parameter. The specific constraints are likewise set by reinforcement learning according to its judgment of the inversion stage.
The invention designs a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method, so that an inversion algorithm can automatically and quickly identify whether the main parameter of current inversion is polarizability or resistivity, and perform targeted inversion, thereby improving the accuracy of induced polarization information extraction.
Compared with the prior art, the invention has the following advantages:
(1) The method can judge the current inversion state (dominated by polarizability inversion or by resistivity inversion) from the sensitivities of the resistivity and the polarizability during the iterations, output the correct regularization coefficients, and impose the correct constraint conditions, thereby realizing intelligent induced polarization information extraction.
(2) The method can effectively solve the problem of uncertainty in multi-parameter inversion.
(3) The method can strengthen the influence of the polarizability in the later inversion stage and improve the accuracy of the extraction of the induced polarization information.
Drawings
Fig. 1 is a flow chart of an adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning.
FIG. 2 is a regularization coefficient and constraint setting strategy based on reinforcement learning.
Detailed Description
The invention will be further illustrated with reference to the following specific examples and the accompanying drawings:
example 1
The invention provides a reinforcement learning-based adaptive wide-area electromagnetic method induced polarization information extraction method, which comprises the following steps of:
S1, setting the calculation equation of the wide-area apparent resistivity:

[Equation (1), the wide-area apparent resistivity formula, is rendered as an image in the original.]

In formula (1), r is the distance from the observation point to the center of the dipole source, also called the transmitter-receiver offset; dL is the length of the horizontal current source; MN is the distance between observation points M and N; ρ is the resistivity; I is the current intensity; k is the propagation constant (wavenumber) of the electromagnetic wave; i is the imaginary unit; and the remaining symbol (also an image in the original) is the angle between r and the current source;

S2, setting the induced polarization model as:

ρ(ω) = ρ_a [1 − m(1 − 1/(1 + (iωτ)^c))]    (2)

In formula (2), ρ(ω) is the frequency-dependent wide-area complex resistivity when the polarization effect is considered; ρ_a is the wide-area apparent resistivity when no polarization effect is considered; m is the polarizability; τ is the time constant; c is the frequency dependence coefficient; and ω is the angular frequency;
S3, setting the objective function of the inversion as follows:

fit = E(e) + λ₁R(ρ) + λ₂R(m)    (3)

In formula (3), R(ρ) and R(m) are minimum-structure constraint functions on the resistivity and polarizability, respectively; λ₁ and λ₂ are the regularization factors corresponding to R(ρ) and R(m). Two independent regularization factors are used because the value range of the polarizability (m ∈ [0, 1]) differs greatly from that of the resistivity (in general ρ >> m); a single shared regularization factor would fail to constrain the relatively small polarizability parameter. E(e) is the target error function, i.e. the data-fitting error of the inversion;

R(ρ) and R(m) are both computed using the following formula:

R(M) = Σ_k (M_{k+1} − M_k)²    (4)

In formula (4), M is the model parameter obtained by the inversion, comprising the resistivity ρ and the polarizability m, and the sum runs over adjacent model cells [equation (4) is rendered as an image in the original; the standard minimum-structure roughness is shown];

S4, designing a staged extraction method for the different physical-property parameters by defining sensitivity as the feature for inversion-parameter identification, and distinguishing the stage of the current inversion through the sensitivity;

the sensitivities of resistivity and polarizability are defined as follows:

S = |fit(G) − fit(G−1)| / |M(G) − M(G−1)|    (5)

In formula (5), S is the sensitivity, G is the iteration number, fit is the fitness, and M is the model parameter obtained by the inversion, comprising the resistivity ρ and the polarizability m;
S5, adopting reinforcement learning based on the deterministic policy gradient to judge the inversion stage and set the regularization coefficients, as shown specifically in FIG. 2;

the reinforcement learning involves three elements: state, action, and reward, and the system is modeled with these elements: the state is the sensitivities of the resistivity and the polarizability, the action is the regularization coefficients, and the reward is the improvement in fitness. The system judges the inversion stage from the current state and outputs the corresponding regularization coefficients, then computes the reward from the inversion result to adjust the policy and value function of the reinforcement learning. After repeated learning, once the policy and value function are stable, the inversion stage can be judged accurately and appropriate regularization coefficients are set;
the steps of the reinforcement learning comprise:

step one, randomly initialize four networks: the current policy network μ, the target policy network μ′, the current Q network Q, and the target Q network Q′;

their parameters are, respectively, the current policy network parameter θ, the target policy network parameter θ′, the current Q network parameter w, and the target Q network parameter w′; the current iteration count is t = 0;

step two, with S the initial state, input the state S into the current policy network to obtain the action A:

A = μ(S|θ) + N

where μ(·) is the policy output by the current policy network, θ is the current policy network parameter, and N is exploration noise;

step three, execute action A in state S to obtain the next state S′ and the reward R, and store the transition (S_t, A_t, R_t, S′_t) into the experience replay set D;

step four, update the state S ← S′; randomly sample n transitions {(S_i, A_i, R_i, S′_i)}, i = 1, 2, …, n, from the experience replay set D, and compute the target value y_i for the current Q network:

y_i = R_i + γQ′(S′_i, μ′(S′_i|θ′)|w′)

where R_i is the reward obtained by performing action A_i in state S_i, γ is the reward attenuation factor, Q′(·) is the Q value output by the target Q network, w′ is the target Q network parameter, μ′(·) is the policy output by the target policy network, and θ′ is the target policy network parameter;

step five, compute the loss L of the current Q network with the mean squared error (MSE) loss function and update all parameters w of the current Q network by gradient back-propagation through the neural network:

L = (1/n) Σ_{i=1}^{n} (y_i − Q(S_i, A_i|w))²

where n is the number of sampled transitions, Q(·) is the Q value output by the current Q network, S_i is the i-th state, A_i is the i-th action, and w is the current Q network parameter;

step six, update all parameters θ of the current policy network by gradient back-propagation of the performance index function J through the neural network, and increase the iteration count t by 1:

J(θ) = −(1/n) Σ_{i=1}^{n} Q(S_i, μ(S_i|θ)|w)

[the expression for J is rendered as an image in the original; the standard deterministic-policy-gradient objective is shown];
step seven, updating the target Q network parameter w 'and the target strategy network parameter theta' every fixed period;
w′=τw+(1-τ)w′
θ′=τθ+(1-τ)θ′
wherein tau is a network parameter soft update coefficient, theta is a current policy network parameter, and w is a current Q network parameter;
step eight, judging whether the strategy and the value function are stably converged, finishing the training if the strategy and the value function reach the termination condition, and returning to the step two if the strategy and the value function do not reach the termination condition;
S6, controlling the constraints imposed on the inversion according to the regularization coefficients generated by the reinforcement learning, realizing adaptive inversion-parameter identification and regularization setting, and obtaining high-precision induced polarization information (including the resistivity and polarizability parameters);

two types of constraints are imposed during the inversion: first, prior-information constraints on resistivity and polarizability, derived from the known physical properties of the exploration area, reduce the search space of the inversion algorithm; second, when the inversion stage is dominated by one physical-property parameter (the resistivity parameter or the polarizability parameter), a restriction constraint is imposed on the other parameter, i.e. its search is limited to a small range, which strengthens the influence of the dominant parameter on the fitness function. In this way, the strengths of the different constraints are controlled by the different regularization coefficients generated by reinforcement learning, and accurate multi-parameter inversion is realized.
Example 2
The method was tested on a three-layer model whose resistivity parameter ρ, thickness parameter h, and polarizability parameter m are set as shown in Table 1. The inversion algorithm uses the grey wolf optimizer (GWO), whose population size P and iteration count t_max are set as shown in Table 1. The soft-update coefficient τ and the reward attenuation factor γ of the reinforcement learning are set as shown in Table 1, as are the regularization factors λ₁ and λ₂ of the minimum-structure function used when reinforcement learning is not employed.
TABLE 1
[Table 1 is rendered as an image in the original; it lists the three-layer model parameters ρ, h, m and the algorithm settings P, t_max, τ, γ, λ₁, λ₂.]
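The grey wolf optimizer used as the inversion engine can be sketched as below; the population size, iteration count, and the sphere-function demo are illustrative and do not reproduce the settings of Table 1.

```python
import numpy as np

def gwo(fitness, dim, lo, hi, pop=30, t_max=200, seed=0):
    """Minimal grey wolf optimizer (GWO): each wolf is pulled toward the
    three current best solutions (alpha, beta, delta), with coefficient
    `a` decaying linearly from 2 to 0 so the search shifts from
    exploration to exploitation."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, (pop, dim))
    for t in range(t_max):
        scores = np.array([fitness(x) for x in X])
        order = np.argsort(scores)
        leaders = X[order[:3]].copy()        # alpha, beta, delta wolves
        a = 2.0 * (1.0 - t / t_max)
        for i in range(pop):
            step = np.zeros(dim)
            for leader in leaders:
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2.0 * a * r1 - a
                C = 2.0 * r2
                D = np.abs(C * leader - X[i])
                step += leader - A * D
            X[i] = np.clip(step / 3.0, lo, hi)
    scores = np.array([fitness(x) for x in X])
    best = int(np.argmin(scores))
    return X[best], float(scores[best])

# Demo: minimize the sphere function on [-10, 10]^2
x_best, f_best = gwo(lambda x: float(np.sum(x ** 2)), dim=2, lo=-10.0, hi=10.0)
```

In the inversion of this patent, `fitness` would be the regularized objective of equation (3) evaluated through the forward model, with the stage-dependent constraints applied to each candidate.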
Table 2 compares the inversion results of the method provided by the invention with those of a method without reinforcement learning and of an Actor-Critic (single-network) method; the evaluation indices are the root mean square error (RMSE) and the coefficient of determination R².
TABLE 2

Method                           RMSE    R²
Without reinforcement learning   38.33   0.88
Actor-Critic method              30.24   0.91
Method of the invention          27.43   0.93
According to the inversion results, the reinforcement-learning-based methods (the Actor-Critic method and the method of the invention) outperform the method without reinforcement learning, because reinforcement learning can automatically identify the physical-property stage of the inversion, output correct regularization coefficients, and impose the corresponding constraints. The method of the invention outperforms the Actor-Critic method because it implements the Actor and Critic modules with separate paired networks; separating the current network from the target network (the double-network scheme) further improves the stability and generalization capability of the reinforcement learning.

Claims (4)

1. An adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning, characterized in that sensitivity is defined as the feature for inversion-parameter identification and reinforcement learning is adopted to realize adaptive inversion-parameter identification and regularization setting, thereby realizing intelligent induced polarization information extraction, comprising:

Step 1: by defining sensitivity as the feature for inversion-parameter identification, designing a staged extraction method for the different physical-property parameters, and distinguishing the stage of the current inversion through the sensitivity;

the sensitivities of resistivity and polarizability are defined as follows:

S = |fit(G) − fit(G−1)| / |M(G) − M(G−1)|    (5)

in formula (5), S is the sensitivity, G is the iteration number, fit is the fitness, and M is the model parameter obtained by the inversion, including the resistivity ρ and the polarizability m;

in the early stage the inversion is dominated by resistivity: a prior-information constraint is imposed on the resistivity parameter and a strong restriction constraint is imposed on the polarizability parameter; in the later stage the resistivity tends to be stable and the sensitivity of the polarizability exceeds that of the resistivity, so the inversion is dominated by polarizability: a prior-information constraint is imposed on the polarizability parameter and a strong restriction constraint is imposed on the resistivity parameter;

according to the sensitivities of resistivity and polarizability during the iterations, the current inversion state (dominated by polarizability inversion or by resistivity inversion) is judged, and the correct regularization coefficients are output and the correct constraints imposed, thereby realizing intelligent induced polarization information extraction;

Step 2: adopting reinforcement learning based on the deterministic policy gradient to judge the inversion stage and set the regularization coefficients;

Step 3: controlling the constraints imposed on the inversion according to the regularization coefficients generated by the reinforcement learning, realizing adaptive inversion-parameter identification and regularization setting, and obtaining high-precision induced polarization information including the resistivity and polarizability parameters.
2.根据权利要求1所述基于强化学习的自适应广域电磁法激电信息提取方法,其特征在于,所述步骤1之前还包括以下步骤:2. The self-adaptive wide-area electromagnetic method IP information extraction method based on reinforcement learning according to claim 1, is characterized in that, before described step 1, also comprises the following steps: S1、设置广域视电阻率的计算方程:S1. Set the calculation equation of the wide-area apparent resistivity:
Figure FDA0003567677330000021
Figure FDA0003567677330000021
In formula (1),

[equation image FDA0003567677330000022, not reproduced]

r is the distance from the observation point to the center of the dipole source, also called the transmitter-receiver offset; dL is the length of the horizontal current source;

[equation image FDA0003567677330000023, not reproduced]

is the distance between observation points M and N;
[equation image FDA0003567677330000024, not reproduced]

[equation image FDA0003567677330000025, not reproduced]

is the distance between observation points M and N; ρ is the resistivity; I is the current intensity;

[equation image FDA0003567677330000026, not reproduced]

k is called the propagation constant (or wave number) of the electromagnetic wave; i is the imaginary unit;

[equation image FDA0003567677330000027, not reproduced]

is the angle between r and the current source;
S2: set the induced polarization (IP) model as:
ρ(ω) = ρa[1 - m(1 - 1/(1 + (iωτ)^c))]  (2)
In formula (2), ρ(ω) is the frequency-dependent wide-area complex resistivity when the polarization effect is considered; ρa is the wide-area apparent resistivity when the polarization effect is not considered; m is the polarizability; τ is the time constant; c is the frequency correlation coefficient; ω is the angular frequency;

S3: set the objective function of the inversion as follows:

fit = E(e) + λ1R(ρ) + λ2R(m)  (3)

In formula (3), R(ρ) and R(m) are the minimum structure constraint functions on the resistivity and the polarizability, respectively, and λ1 and λ2 are the regularization factors corresponding to R(ρ) and R(m). Two independent regularization factors are used because the value space of the polarizability m differs greatly from that of the resistivity ρ: m ∈ [0, 1] and ρ >> m, so a single unified regularization factor would fail to constrain the relatively small polarizability parameter. E(e) is the objective error function, which during inversion is the fitting error of the data.

R(ρ) and R(m) are both calculated by the following formula:
[Formula (4): equation image FDA0003567677330000029, not reproduced]
In formula (4), M is the model parameter obtained by inversion, including the resistivity ρ and the polarizability m.
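The IP model of formula (2) has the form of the standard Cole-Cole complex resistivity model, and the dual-regularized objective of formula (3) can be sketched as follows. A hedged illustration only: the minimum structure constraint is assumed here to be a sum of squared differences of adjacent layer parameters, since the exact expression of formula (4) is contained in an equation image that is not reproduced.

```python
def complex_resistivity(rho_a, m, tau, c, omega):
    """Cole-Cole style complex resistivity rho(omega), as in formula (2)."""
    return rho_a * (1.0 - m * (1.0 - 1.0 / (1.0 + (1j * omega * tau) ** c)))


def min_structure(params):
    """Assumed minimum structure constraint: squared adjacent differences."""
    return sum((params[i + 1] - params[i]) ** 2 for i in range(len(params) - 1))


def fitness(data_misfit, rho, m, lam1, lam2):
    """Objective of formula (3): fit = E(e) + lam1*R(rho) + lam2*R(m)."""
    return data_misfit + lam1 * min_structure(rho) + lam2 * min_structure(m)


# at omega = 0 the polarization term vanishes and rho(0) equals rho_a
print(abs(complex_resistivity(100.0, 0.3, 1.0, 0.5, 0.0)))  # 100.0
```

At high frequency the model tends to ρa(1 - m), which is why the polarizability m ∈ [0, 1] controls the strength of the frequency dispersion that the inversion exploits.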
3. The reinforcement-learning-based adaptive wide-area electromagnetic method IP information extraction method according to claim 2, characterized in that, in step 2, the reinforcement learning comprises the following steps:

Step S201: randomly initialize four networks, namely the current policy network μ, the target policy network μ', the current Q network q, and the target Q network q'; their parameters are, respectively, the current policy network parameter θ, the target policy network parameter θ', the current Q network parameter w, and the target Q network parameter w'; the current iteration number is t = 0;

Step S202: let S be the initial state; input the state S into the current policy network to obtain the action A:

A = μ(S|θ) + N

where μ(·) is the policy output by the current policy network, S is the initial state, θ is the parameter of the current policy network, and N is noise;

Step S203: execute action A in state S to obtain the next state S' and the reward R, and store S, A, R, S' in the experience replay set D = {St, At, Rt, S't};

Step S204: update the state S to S'; randomly sample n samples {Si, Ai, Ri, S'i}, i = 1, 2, 3, …, n, from the experience replay set D, and calculate the output value yi of the current Q network q:

yi = Ri + γq'(S'i, μ'(S'i|θ')|w')

where Ri is the reward obtained by executing action Ai in state Si, γ is the reward decay factor, q'(·) is the Q value output by the target Q network, w' is the parameter of the target Q network, μ'(·) is the policy output by the target policy network, and θ' is the parameter of the target policy network;

Step S205: use the mean squared error (MSE) loss function to calculate the loss L of the current Q network, and update all parameters w of the current Q network through gradient backpropagation of the neural network:
L = (1/n)∑i=1n (yi - q(Si, Ai|w))^2
where n is the total number of samples obtained, q(·) is the Q value output by the current Q network, Si is the i-th state, Ai is the i-th action, and w is the parameter of the current Q network;

Step S206: use the performance index function J to update all parameters θ of the current policy network through gradient backpropagation of the neural network, and increase the iteration number t by 1;
∇θJ = (1/n)∑i=1n ∇a q(Si, a|w)|a=μ(Si|θ) ∇θ μ(Si|θ)
Step S207: update the target Q network parameter w' and the target policy network parameter θ' every fixed period:

w' = τw + (1 - τ)w'

θ' = τθ + (1 - τ)θ'

where τ is the soft-update coefficient of the network parameters, θ is the current policy network parameter, and w is the current Q network parameter;

Step S208: judge whether the policy and the value function have converged stably; if the termination condition is met, the training ends; otherwise, return to step S202.
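Steps S201 to S208 are the standard training loop of the deep deterministic policy gradient (DDPG) algorithm. The sketch below replaces the neural networks with linear function approximators so that the gradient backpropagation of steps S205 and S206 reduces to analytic gradients; the one-dimensional toy environment, learning rates, and batch size are all hypothetical, and only the loop structure follows the claim.

```python
import random

random.seed(0)

# Linear stand-ins for the networks: policy a = theta*s,
# critic q = w0*s + w1*a + w2*a*s (shapes are hypothetical).
def policy(theta, s):
    return theta * s

def q_value(w, s, a):
    return w[0] * s + w[1] * a + w[2] * a * s

def td_target(r, s_next, theta_t, w_t, gamma):
    """y = r + gamma * q'(s', mu'(s'))  (step S204)."""
    return r + gamma * q_value(w_t, s_next, policy(theta_t, s_next))

def soft_update(target, current, tau):
    """target' = tau*current + (1 - tau)*target  (step S207)."""
    return tau * current + (1.0 - tau) * target

# S201: initialize current and target parameters
theta, theta_t = 0.5, 0.5
w = [0.0, 0.0, 0.0]
w_t = list(w)
gamma, tau, lr = 0.9, 0.01, 1e-3
replay = []  # experience replay set D
s = 1.0
for t in range(200):  # t plays the role of the iteration counter of S206
    a = policy(theta, s) + random.gauss(0.0, 0.1)  # S202: action plus noise
    r = -(s + a) ** 2                              # hypothetical reward
    s_next = max(-2.0, min(2.0, s + a))            # hypothetical dynamics
    replay.append((s, a, r, s_next))               # S203: store transition
    s = s_next

    batch = random.sample(replay, min(8, len(replay)))  # S204: sample n
    # S205: critic update via the analytic gradient of the MSE loss
    grad_w = [0.0, 0.0, 0.0]
    for si, ai, ri, sn in batch:
        err = q_value(w, si, ai) - td_target(ri, sn, theta_t, w_t, gamma)
        for j, feat in enumerate((si, ai, ai * si)):
            grad_w[j] += 2.0 * err * feat / len(batch)
    w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    # S206: actor update along dq/da * dmu/dtheta
    grad_theta = sum((w[1] + w[2] * si) * si for si, ai, ri, sn in batch) / len(batch)
    theta += lr * grad_theta
    # S207: soft update of the target parameters
    theta_t = soft_update(theta_t, theta, tau)
    w_t = [soft_update(wt, wc, tau) for wt, wc in zip(w_t, w)]
```

In the patent the state would encode the sensitivities of resistivity and polarizability, and the action would be the pair of regularization coefficients (λ1, λ2); the toy scalar state and action above only stand in for that interface.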
4. The reinforcement-learning-based adaptive wide-area electromagnetic method IP information extraction method according to claim 2, characterized in that, in step 3, two types of constraints are imposed during the inversion:

Step S301: use the known physical property characteristics of the exploration area to impose prior-information constraints on the resistivity and the polarizability, thereby reducing the search space of the inversion algorithm;

Step S302: when the inversion is in the resistivity parameter stage or in the polarizability parameter stage, impose a limiting constraint on the other physical property parameter, i.e., restrict the search of the other physical property parameter to a very small range, so as to strengthen the influence of the dominant physical property parameter on the fitness function.
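The two constraint types of claim 4 can be illustrated as interval clipping on candidate models. A sketch only: the prior bounds and the 5 percent neighbourhood width are hypothetical values, and `ref` stands for the current reference model around which the non-dominant parameter is pinned.

```python
def constrain(candidate, ref, dominant, prior, tight=0.05):
    """Constrain one parameter vector according to claim 4.

    If the parameter is dominant in the current stage, clip it to the
    prior-information interval `prior` (step S301); otherwise restrict
    it to a small neighbourhood of the reference model `ref` (step
    S302).  Bound values and neighbourhood width are hypothetical.
    """
    lo, hi = prior
    out = []
    for x, r in zip(candidate, ref):
        if dominant:
            out.append(max(lo, min(hi, x)))
        else:
            span = tight * abs(r) if r != 0 else tight
            out.append(max(r - span, min(r + span, x)))
    return out


# resistivity-dominated stage: resistivity gets its prior bounds,
# polarizability is pinned near the reference model
rho = constrain([5000.0, 50.0], [100.0, 60.0], True, (1.0, 1000.0))
m = constrain([0.9, 0.4], [0.2, 0.3], False, (0.0, 1.0))
```

The same function serves both stages by swapping which parameter is marked dominant, which mirrors how the stage judgment of claim 1 drives the constraint assignment.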
CN202110386529.1A 2021-04-12 2021-04-12 Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning Active CN113204054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110386529.1A CN113204054B (en) 2021-04-12 2021-04-12 Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning


Publications (2)

Publication Number Publication Date
CN113204054A CN113204054A (en) 2021-08-03
CN113204054B true CN113204054B (en) 2022-06-10

Family

ID=77026635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110386529.1A Active CN113204054B (en) 2021-04-12 2021-04-12 Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113204054B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113960674B (en) * 2021-10-14 2023-11-21 湖北省水文地质工程地质勘察院有限公司 A two-dimensional inversion method using wide-field electromagnetic method
CN114755732B (en) * 2022-03-17 2025-04-25 核工业北京地质研究院 A method and system for extracting induced polarization parameters from magnetotelluric observation data
CN115166840A (en) * 2022-06-10 2022-10-11 阿里云计算有限公司 Method and device for realizing resistivity inversion
CN115793064B (en) * 2022-07-11 2023-06-02 成都理工大学 Improved extraction method of excitation information in semi-aviation transient electromagnetic data
CN115829001B (en) * 2022-11-08 2023-06-20 中国科学院地质与地球物理研究所 A method and system for transient electromagnetic-induced electric field separation and multi-parameter information extraction

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706587B (en) * 2009-11-24 2012-06-27 中南大学 Method for extracting induced polarization model parameters prospected by electrical method
CN102495428A (en) * 2011-12-12 2012-06-13 山东大学 Resistivity real-time imaging monitoring method and system for water-bursting geological disaster in construction period of underground engineering
CN107290793B (en) * 2017-06-05 2019-02-19 湖南师范大学 A parallel inversion method of ultra-high density electrical method based on weighted multi-strategy leapfrog algorithm
CN111143984A (en) * 2019-12-23 2020-05-12 贵州大方煤业有限公司 Magnetotelluric two-dimensional inversion method based on genetic algorithm optimization neural network
CN112083509B (en) * 2020-08-14 2022-06-07 南方科技大学 Method for detecting induced polarization abnormity in time-frequency electromagnetic method


Similar Documents

Publication Publication Date Title
CN113204054B (en) Self-adaptive wide-area electromagnetic method induced polarization information extraction method based on reinforcement learning
CN107630697B (en) Formation resistivity joint inversion method based on electromagnetic wave resistivity logging while drilling
CN110133733B (en) Conductance-polarizability multi-parameter imaging method based on particle swarm optimization algorithm
CN101650443B (en) Back-propagation network calculating method of apparent resistivity
CN118171582B (en) A method and system for inversion of azimuthal electromagnetic logging while drilling based on combined residual neural network and L-M algorithm
CN112699596A (en) Wide-area electromagnetic method induced polarization information nonlinear extraction method based on learning
Fang et al. Estimation of ultrasonic signal onset for flow measurement
Wang et al. Multi-objective particle swarm optimization for multimode surface wave analysis
CN117669362A (en) Deep learning-assisted iterative inversion method of azimuthal electromagnetic wave logging data while drilling
CN116992754A (en) A fast inversion method for logging while drilling data based on transfer learning
CN115079257B (en) Q-value estimation and earthquake attenuation compensation method based on fusion network
CN117784250A (en) A method for inversion of seafloor acoustic parameters based on model-independent element learning algorithm
CN119199990A (en) A full waveform inversion method for elastic waves based on optimal transfer function
CN110441815B (en) Simulated annealing Rayleigh wave inversion method based on differential evolution and block coordinate descent
CN112773396A (en) Medical imaging method based on full waveform inversion, computer equipment and storage medium
CN118761277A (en) A deep learning-based array induction logging inversion method
CN118534552A (en) A dual-parameter full waveform inversion method for ground penetrating radar based on data coding
Zhang et al. Soil water content estimation by using ground penetrating radar data full waveform inversion with grey wolf optimizer algorithm
CN112526621A (en) Ground-air electromagnetic data slow diffusion multi-parameter extraction method based on neural network
CN110956249A (en) A layered medium inversion method based on resampling optimization particle swarm optimization
Ren et al. Research on the Rayleigh surface wave inversion method based on the improved whale optimization algorithm
Zhang et al. First break of the seismic signals in oil exploration based on information theory
CN117875172A (en) Rayleigh surface wave inversion method based on joint constraints of borehole information and target exploration
CN113505705B (en) A double-layer pipe column eddy current signal denoising method, system and processing terminal
CN115980864A (en) A Transient Electromagnetic Inversion Method Based on Improved PSO-FADBN Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant