Disclosure of Invention
The invention aims to provide a reinforcement learning-based adaptive method for extracting induced polarization information in the wide-area electromagnetic method. By defining sensitivity as the feature for inversion-parameter identification and adopting a reinforcement learning method, the invention realizes adaptive identification of inversion parameters and adaptive regularization setting, thereby improving the accuracy of induced polarization information extraction.
In order to achieve the purpose, the invention provides the following technical scheme:
The invention provides a reinforcement learning-based adaptive wide-area electromagnetic method for induced polarization information extraction, in which sensitivity is defined as the feature for inversion-parameter identification and reinforcement learning is adopted to realize adaptive identification of inversion parameters and regularization setting, so that intelligent induced polarization information extraction is realized.
The invention provides a reinforcement learning-based adaptive wide-area electromagnetic method for induced polarization information extraction, which comprises the following steps:
S1, setting the calculation equation of the wide-area apparent resistivity:

ρ_a = (2πr³)/(dL·MN) · (ΔV_MN/I) · 1/[1 − 3sin²φ + e^(−ikr)·(1 + ikr)]   (1)

In formula (1), r is the distance from the observation point to the center of the dipole source, i.e., the transmitter-receiver offset; dL is the length of the horizontal current source; MN is the distance between observation points M and N; ΔV_MN is the potential difference measured between M and N; ρ is the resistivity, which enters formula (1) through the wavenumber; I is the current intensity; k is the propagation constant (wavenumber) of the electromagnetic wave; i is the imaginary unit; φ is the angle between r and the current source;
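Assuming the reconstructed form of formula (1) above, the wide-area apparent resistivity can be evaluated numerically as in the following sketch. Because the wavenumber k itself depends on the resistivity, the sketch resolves the implicit definition with a simple fixed-point iteration; the geometry arguments, the starting guess, and the iteration count are illustrative.

```python
import cmath
import math

def wide_area_apparent_resistivity(dV, I, r, dL, MN, phi, freq,
                                   rho0=100.0, n_iter=50):
    """Solve formula (1) for rho_a by fixed-point iteration; the wavenumber
    k = sqrt(-i*omega*mu0/rho) makes the definition implicit in rho."""
    mu0 = 4e-7 * math.pi
    omega = 2 * math.pi * freq
    rho = rho0                                     # starting guess (ohm-m)
    for _ in range(n_iter):
        k = cmath.sqrt(-1j * omega * mu0 / rho)
        F = 1 - 3 * math.sin(phi) ** 2 + cmath.exp(-1j * k * r) * (1 + 1j * k * r)
        rho = abs(2 * math.pi * r**3 / (dL * MN) * (dV / I) / F)
    return rho
```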
S2, setting the induced polarization model as follows:

ρ(ω) = ρ_a·[1 − m·(1 − 1/(1 + (iωτ)^c))]   (2)

In formula (2), ρ(ω) is the frequency-dependent wide-area complex resistivity when the polarization effect is considered; ρ_a is the wide-area apparent resistivity when no polarization effect is considered; m is the polarizability; τ is the time constant; c is the frequency dependence coefficient; ω is the angular frequency;
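For reference, the induced polarization model of formula (2) is straightforward to evaluate numerically; the parameter values in the example below are illustrative.

```python
import cmath
import math

def complex_resistivity(rho_a, m, tau, c, omega):
    """Formula (2): rho(omega) = rho_a*(1 - m*(1 - 1/(1 + (i*omega*tau)**c)))."""
    return rho_a * (1 - m * (1 - 1 / (1 + (1j * omega * tau) ** c)))

if __name__ == "__main__":
    for f in (0.01, 0.1, 1.0, 10.0):               # frequencies in Hz
        z = complex_resistivity(rho_a=100.0, m=0.2, tau=0.5, c=0.4,
                                omega=2 * math.pi * f)
        print(f"{f:6.2f} Hz  |rho| = {abs(z):7.2f}  phase = {cmath.phase(z):+.4f} rad")
```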
S3, setting the objective function of the inversion as follows:

fit = E(e) + λ₁R(ρ) + λ₂R(m)   (3)
In formula (3), R(ρ) and R(m) are minimum-structure constraint functions for resistivity and polarizability, respectively; λ₁ and λ₂ are the regularization factors corresponding to R(ρ) and R(m). Two independent regularization factors are adopted because the value space of the polarizability (m ∈ [0,1]) differs greatly from that of the resistivity (generally ρ ≫ m); if a single uniform regularization factor were adopted, the relatively small polarizability parameter could not be constrained. E(e) is the target error function, i.e., the fitting error of the data in the inversion;
R(ρ) and R(m) are both calculated here using the following formula:

R(M) = Σ_{j=1}^{N−1} (M_{j+1} − M_j)²   (4)

In formula (4), M is a model parameter obtained by inversion, i.e., either the resistivity ρ or the polarizability m, M_j is the value of that parameter in the j-th layer, and N is the number of layers;
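For illustration, formulae (3) and (4) can be evaluated as follows for a layered model; using the mean squared data misfit for E(e) is an assumption, since the text only specifies that E(e) is the data-fitting error.

```python
import numpy as np

def min_structure(M):
    """Formula (4): sum of squared differences between adjacent layers."""
    M = np.asarray(M, dtype=float)
    return float(np.sum(np.diff(M) ** 2))

def fitness(d_obs, d_pred, rho, m, lam1, lam2):
    """Formula (3): fit = E(e) + lambda_1*R(rho) + lambda_2*R(m).
    E(e) is taken here as the mean squared data misfit (an assumption)."""
    e = np.asarray(d_obs, float) - np.asarray(d_pred, float)
    return float(np.mean(e ** 2)) + lam1 * min_structure(rho) + lam2 * min_structure(m)
```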
S4, designing a staged extraction method for the different physical-property parameters by defining sensitivity as the feature for inversion-parameter identification, and distinguishing the stage of the current inversion through the sensitivity;
The sensitivities of resistivity and polarizability are defined as follows:

S = |(fit_G − fit_{G−1}) / (M_G − M_{G−1})|   (5)

In formula (5), S is the sensitivity, G is the iteration number, fit is the fitness of formula (3), and M is a model parameter obtained by inversion, i.e., either the resistivity ρ or the polarizability m;
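A small sketch of formula (5) follows; summarizing the vector of layer values by a Euclidean norm when forming the denominator is an illustrative choice, not specified by the text.

```python
import numpy as np

def sensitivity(fit_g, fit_prev, M_g, M_prev, eps=1e-12):
    """Formula (5): change of fitness per change of the model parameter
    between iterations G-1 and G; layer vectors are reduced to a norm."""
    dM = np.linalg.norm(np.asarray(M_g, float) - np.asarray(M_prev, float))
    return abs(fit_g - fit_prev) / (dM + eps)
```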
S5, adopting reinforcement learning based on the deterministic policy gradient to realize judgment of the inversion stage and setting of the regularization coefficients;
The reinforcement learning comprises three elements: state, action, and reward, and the system is modeled in terms of these three elements, where the state is the pair of sensitivities of resistivity and polarizability, the action is the pair of regularization coefficients, and the reward is the improvement of the fitness. The system judges the inversion stage according to the current state and outputs the corresponding regularization coefficients, then calculates the reward from the inversion result to adjust the policy and the value function of the reinforcement learning. Through repeated learning until the policy and the value function are stable, the inversion stage can be judged accurately and suitable regularization coefficients can be set;
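As an illustration of this modeling, the following is a minimal sketch in Python. The inversion itself is hidden behind a hypothetical callback run_inversion_iteration(lam1, lam2), assumed to perform one inversion iteration under the given regularization coefficients and to return the new fitness together with scalar summaries of the resistivity and polarizability models; this interface and the scalar summaries are illustrative assumptions, not part of the invention.

```python
import torch

class RegularizationEnv:
    """State: the sensitivities (S_rho, S_m) of formula (5); action: the
    regularization coefficients (lambda_1, lambda_2) of formula (3);
    reward: the improvement of the fitness fit between iterations."""

    def __init__(self, run_inversion_iteration):
        self.run_iter = run_inversion_iteration  # hypothetical callback
        self.prev = None                         # (fit, rho, m) of last call

    def step(self, action):
        lam1, lam2 = float(action[0]), float(action[1])
        fit, rho, m = self.run_iter(lam1, lam2)
        if self.prev is None:                    # first call: no history yet
            state, reward = torch.zeros(2), 0.0
        else:
            fit0, rho0, m0 = self.prev
            reward = fit0 - fit                  # fitness improvement
            # Sensitivities as in formula (5): change of fitness per change
            # of each model parameter over one inversion iteration.
            s_rho = abs((fit - fit0) / (rho - rho0 + 1e-12))
            s_m = abs((fit - fit0) / (m - m0 + 1e-12))
            state = torch.tensor([s_rho, s_m])
        self.prev = (fit, rho, m)
        return state, reward
```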
S6, controlling the constraints imposed by the inversion according to the regularization coefficients generated by reinforcement learning, realizing adaptive identification of inversion parameters and regularization setting, and obtaining high-precision induced polarization information (including the resistivity and polarizability parameters).
Further, in step S5, the reinforcement learning comprises the following steps:
Step one, randomly initialize four networks: the current policy network μ, the target policy network μ′, the current Q network Q, and the target Q network Q′;
their parameters are, respectively, the current policy network parameter θ, the target policy network parameter θ′, the current Q network parameter w, and the target Q network parameter w′; the current iteration number t = 0;
Step two, take S as the initial state, input the state S into the current policy network, and obtain the action A:
A = μ(S|θ) + N
where μ(·) is the policy output by the current policy network, S is the initial state, θ is the parameter of the current policy network, and N is the exploration noise;
Step three, execute the action A in the state S to obtain the next state S′ and the reward R, and store the transition {S_t, A_t, R_t, S′_t} into the experience replay set D;
Step four, update the state S to S′; randomly sample n transitions {S_i, A_i, R_i, S′_i}, i = 1, 2, 3, …, n, from the experience replay set D, and calculate the target value y_i of the current Q network:
y_i = R_i + γQ′(S′_i, μ′(S′_i|θ′)|w′)
where R_i is the reward obtained in the state S_i by performing the action A_i, γ is the reward discount factor, Q′(·) is the Q value output by the target Q network, w′ is the parameter of the target Q network, μ′(·) is the policy output by the target policy network, and θ′ is the parameter of the target policy network;
Step five, calculate the loss L of the current Q network using the mean squared error (MSE) loss function, and update all parameters w of the current Q network through gradient back-propagation of the neural network:
L = (1/n)·Σ_{i=1}^{n} (y_i − Q(S_i, A_i|w))²
where n is the total number of sampled transitions, Q(·) is the Q value output by the current Q network, S_i is the i-th state, A_i is the i-th action, and w is the parameter of the current Q network;
Step six, update all parameters θ of the current policy network through gradient back-propagation of the neural network using the performance index function J, and increase the iteration number t by 1;
Step seven, update the target Q network parameter w′ and the target policy network parameter θ′ at a fixed period:
w′ = τw + (1 − τ)w′
θ′ = τθ + (1 − τ)θ′
where τ is the soft-update coefficient of the network parameters, θ is the current policy network parameter, and w is the current Q network parameter;
Step eight, judge whether the policy and the value function have stably converged; if the termination condition is reached, finish the training; otherwise, return to step two.
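A compact sketch of steps one to eight in PyTorch follows, using the RegularizationEnv sketched above (or any object with the same step method). The network sizes, learning rates, batch size, and noise scale are illustrative assumptions; the structure (four networks, the experience replay set D, the target value y_i, the MSE loss of step five, the performance index of step six, and the soft update of step seven) mirrors the steps above.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 2, 2        # (S_rho, S_m) -> (lambda_1, lambda_2)

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

# Step one: four networks, current/target policy and current/target Q.
mu, mu_t = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
q, q_t = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
mu_t.load_state_dict(mu.state_dict())
q_t.load_state_dict(q.state_dict())

opt_mu = torch.optim.Adam(mu.parameters(), lr=1e-3)
opt_q = torch.optim.Adam(q.parameters(), lr=1e-3)
D = deque(maxlen=10_000)            # experience replay set D
gamma, tau, batch = 0.9, 0.01, 32   # discount, soft update, sample size n

def train_step(env, S):
    # Step two: action from the current policy network plus noise N.
    with torch.no_grad():
        A = mu(S) + 0.1 * torch.randn(ACTION_DIM)
    # Step three: execute A, observe S' and R, store the transition in D.
    S_next, R = env.step(A)
    D.append((S, A, R, S_next))
    if len(D) < batch:
        return S_next
    # Step four: sample n transitions and form the target values y_i.
    sample = random.sample(D, batch)
    Sb = torch.stack([s for s, a, r, sn in sample])
    Ab = torch.stack([a for s, a, r, sn in sample])
    Rb = torch.tensor([r for s, a, r, sn in sample]).unsqueeze(1)
    Sn = torch.stack([sn for s, a, r, sn in sample])
    with torch.no_grad():
        y = Rb + gamma * q_t(torch.cat([Sn, mu_t(Sn)], dim=1))
    # Step five: MSE loss of the current Q network, back-propagate on w.
    loss_q = ((y - q(torch.cat([Sb, Ab], dim=1))) ** 2).mean()
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()
    # Step six: performance index J, maximize Q under the current policy.
    loss_mu = -q(torch.cat([Sb, mu(Sb)], dim=1)).mean()
    opt_mu.zero_grad(); loss_mu.backward(); opt_mu.step()
    # Step seven: soft update of the target networks (every step here).
    with torch.no_grad():
        for p, p_t in zip(q.parameters(), q_t.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
        for p, p_t in zip(mu.parameters(), mu_t.parameters()):
            p_t.mul_(1 - tau).add_(tau * p)
    return S_next
```

In use, the state is initialized (for example S = torch.zeros(2)) and train_step is called repeatedly until the policy and value function stabilize, which corresponds to the convergence judgment of step eight.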
Further, in step S6, two types of constraints are imposed during the inversion process: first, prior-information constraints on resistivity and polarizability are applied by utilizing the known physical properties of the survey area, which reduces the search space of the inversion algorithm; second, when one physical-property parameter (the resistivity parameter or the polarizability parameter) is in its inversion stage, a limitation constraint is applied to the other physical-property parameter, i.e., the search for the other parameter is restricted to a small range, so that the influence of the dominant physical-property parameter on the fitness function is strengthened.
Therefore, the strength of different constraints is controlled by different regularization coefficients generated by reinforcement learning, and accurate multi-parameter inversion is realized.
In the early stage of inversion, the influence of the resistivity on the observed data is far greater than that of the polarizability, so the sensitivity of the resistivity is higher; that stage is therefore dominated by resistivity, a prior-information constraint is applied to the resistivity parameter, and a strong limitation constraint is applied to the polarizability parameter. In the later inversion stage, the resistivity tends to be stable and the sensitivity of the polarizability exceeds that of the resistivity; that stage is dominated by polarizability, a prior-information constraint is applied to the polarizability parameter, and a strong limitation constraint is applied to the resistivity parameter. The specific constraint to apply is likewise set by the reinforcement learning judgment of the inversion stage.
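The following sketch shows one way the two constraint types could be turned into search bounds for the inversion algorithm; the prior ranges and the width of the limitation window are illustrative assumptions.

```python
import numpy as np

def staged_bounds(stage, rho_now, m_now,
                  rho_prior=(1.0, 1000.0), m_prior=(0.0, 1.0), shrink=0.05):
    """Return (rho_bounds, m_bounds) for the current inversion stage:
    a prior-information range for the dominant parameter and a narrow
    limitation window around the current value of the other parameter."""
    rho_now, m_now = np.asarray(rho_now, float), np.asarray(m_now, float)
    if stage == "resistivity":
        rho_bounds = rho_prior                       # prior constraint on rho
        m_bounds = (np.maximum(m_prior[0], m_now * (1 - shrink)),
                    np.minimum(m_prior[1], m_now * (1 + shrink)))
    else:                                            # polarizability stage
        m_bounds = m_prior                           # prior constraint on m
        rho_bounds = (np.maximum(rho_prior[0], rho_now * (1 - shrink)),
                      np.minimum(rho_prior[1], rho_now * (1 + shrink)))
    return rho_bounds, m_bounds
```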
The invention designs a reinforcement learning-based adaptive wide-area electromagnetic method for induced polarization information extraction, so that the inversion algorithm can automatically and quickly identify whether the dominant parameter of the current inversion is the polarizability or the resistivity and perform targeted inversion, thereby improving the accuracy of induced polarization information extraction.
Compared with the prior art, the invention has the following advantages:
(1) The method can judge the current inversion state (polarizability-dominated or resistivity-dominated inversion) according to the sensitivities of the resistivity and the polarizability during the iteration, output the correct regularization coefficients, and apply the correct constraint conditions, thereby realizing intelligent induced polarization information extraction.
(2) The method can effectively solve the problem of uncertainty in multi-parameter inversion.
(3) The method can strengthen the influence of the polarizability in the later inversion stage and improve the accuracy of the extraction of the induced polarization information.
Detailed Description
The invention will be further illustrated with reference to the following specific examples and the accompanying drawings:
example 1
The invention provides a reinforcement learning-based adaptive wide-area electromagnetic method for induced polarization information extraction, which comprises the following steps:
S1, setting the calculation equation of the wide-area apparent resistivity:

ρ_a = (2πr³)/(dL·MN) · (ΔV_MN/I) · 1/[1 − 3sin²φ + e^(−ikr)·(1 + ikr)]   (1)

In formula (1), r is the distance from the observation point to the center of the dipole source, i.e., the transmitter-receiver offset; dL is the length of the horizontal current source; MN is the distance between observation points M and N; ΔV_MN is the potential difference measured between M and N; ρ is the resistivity, which enters formula (1) through the wavenumber; I is the current intensity; k is the propagation constant (wavenumber) of the electromagnetic wave; i is the imaginary unit; φ is the angle between r and the current source;
S2, setting the induced polarization model as follows:

ρ(ω) = ρ_a·[1 − m·(1 − 1/(1 + (iωτ)^c))]   (2)

In formula (2), ρ(ω) is the frequency-dependent wide-area complex resistivity when the polarization effect is considered; ρ_a is the wide-area apparent resistivity when no polarization effect is considered; m is the polarizability; τ is the time constant; c is the frequency dependence coefficient; ω is the angular frequency;
S3, setting the objective function of the inversion as follows:

fit = E(e) + λ₁R(ρ) + λ₂R(m)   (3)
In formula (3), R(ρ) and R(m) are minimum-structure constraint functions for resistivity and polarizability, respectively; λ₁ and λ₂ are the regularization factors corresponding to R(ρ) and R(m). Two independent regularization factors are adopted because the value space of the polarizability (m ∈ [0,1]) differs greatly from that of the resistivity (generally ρ ≫ m); if a single uniform regularization factor were adopted, the relatively small polarizability parameter could not be constrained. E(e) is the target error function, i.e., the fitting error of the data in the inversion;
R(ρ) and R(m) are both calculated here using the following formula:

R(M) = Σ_{j=1}^{N−1} (M_{j+1} − M_j)²   (4)

In formula (4), M is a model parameter obtained by inversion, i.e., either the resistivity ρ or the polarizability m, M_j is the value of that parameter in the j-th layer, and N is the number of layers;
S4, designing a staged extraction method for the different physical-property parameters by defining sensitivity as the feature for inversion-parameter identification, and distinguishing the stage of the current inversion through the sensitivity;
The sensitivities of resistivity and polarizability are defined as follows:

S = |(fit_G − fit_{G−1}) / (M_G − M_{G−1})|   (5)

In formula (5), S is the sensitivity, G is the iteration number, fit is the fitness of formula (3), and M is a model parameter obtained by inversion, i.e., either the resistivity ρ or the polarizability m;
S5, adopting reinforcement learning based on the deterministic policy gradient to realize judgment of the inversion stage and setting of the regularization coefficients, as shown in FIG. 2;
The reinforcement learning comprises three elements: state, action, and reward, and the system is modeled in terms of these three elements, where the state is the pair of sensitivities of resistivity and polarizability, the action is the pair of regularization coefficients, and the reward is the improvement of the fitness. The system judges the inversion stage according to the current state and outputs the corresponding regularization coefficients, then calculates the reward from the inversion result to adjust the policy and the value function of the reinforcement learning. Through repeated learning until the policy and the value function are stable, the inversion stage can be judged accurately and suitable regularization coefficients can be set;
The steps of the reinforcement learning are as follows:
Step one, randomly initialize four networks: the current policy network μ, the target policy network μ′, the current Q network Q, and the target Q network Q′;
their parameters are, respectively, the current policy network parameter θ, the target policy network parameter θ′, the current Q network parameter w, and the target Q network parameter w′; the current iteration number t = 0;
Step two, take S as the initial state, input the state S into the current policy network, and obtain the action A:
A = μ(S|θ) + N
where μ(·) is the policy output by the current policy network, S is the initial state, θ is the parameter of the current policy network, and N is the exploration noise;
Step three, execute the action A in the state S to obtain the next state S′ and the reward R, and store the transition {S_t, A_t, R_t, S′_t} into the experience replay set D;
Step four, update the state S to S′; randomly sample n transitions {S_i, A_i, R_i, S′_i}, i = 1, 2, 3, …, n, from the experience replay set D, and calculate the target value y_i of the current Q network:
y_i = R_i + γQ′(S′_i, μ′(S′_i|θ′)|w′)
where R_i is the reward obtained in the state S_i by performing the action A_i, γ is the reward discount factor, Q′(·) is the Q value output by the target Q network, w′ is the parameter of the target Q network, μ′(·) is the policy output by the target policy network, and θ′ is the parameter of the target policy network;
Step five, calculate the loss L of the current Q network using the mean squared error (MSE) loss function, and update all parameters w of the current Q network through gradient back-propagation of the neural network:
L = (1/n)·Σ_{i=1}^{n} (y_i − Q(S_i, A_i|w))²
where n is the total number of sampled transitions, Q(·) is the Q value output by the current Q network, S_i is the i-th state, A_i is the i-th action, and w is the parameter of the current Q network;
Step six, update all parameters θ of the current policy network through gradient back-propagation of the neural network using the performance index function J, and increase the iteration number t by 1;
Step seven, update the target Q network parameter w′ and the target policy network parameter θ′ at a fixed period:
w′ = τw + (1 − τ)w′
θ′ = τθ + (1 − τ)θ′
where τ is the soft-update coefficient of the network parameters, θ is the current policy network parameter, and w is the current Q network parameter;
Step eight, judge whether the policy and the value function have stably converged; if the termination condition is reached, finish the training; otherwise, return to step two;
S6, controlling the constraints imposed by the inversion according to the regularization coefficients generated by reinforcement learning, realizing adaptive identification of inversion parameters and regularization setting, and obtaining high-precision induced polarization information (including the resistivity and polarizability parameters);
Two types of constraints are imposed during the inversion process: first, prior-information constraints on resistivity and polarizability are applied by utilizing the known physical properties of the survey area, which reduces the search space of the inversion algorithm; second, when one physical-property parameter (the resistivity parameter or the polarizability parameter) is in its inversion stage, a limitation constraint is applied to the other physical-property parameter, i.e., the search for the other parameter is restricted to a small range, so that the influence of the dominant physical-property parameter on the fitness function is strengthened. Therefore, the strength of the different constraints is controlled by the different regularization coefficients generated by reinforcement learning, and accurate multi-parameter inversion is realized.
Example 2
The method was tested on a three-layer model whose resistivity parameter ρ, thickness parameter h, and polarizability parameter m are set as shown in Table 1. The inversion algorithm uses the grey wolf optimizer (GWO), in which the population size P and the maximum number of iterations t_max are set as shown in Table 1; the soft-update coefficient τ and the reward discount factor γ of the reinforcement learning are set as shown in Table 1; the regularization factors λ₁ and λ₂ of the minimum-structure functions when reinforcement learning is not employed are set as shown in Table 1.
TABLE 1
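For reference, a compact sketch of a grey wolf optimizer loop of the kind used as the inversion engine in this example follows; since the concrete values of Table 1 are not reproduced in this text, the population size, iteration count, and bounds below are illustrative.

```python
import numpy as np

def gwo(objective, lb, ub, pop_size=30, t_max=200, rng=None):
    """Grey wolf optimizer: minimize objective over the box [lb, ub]."""
    if rng is None:
        rng = np.random.default_rng()
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = rng.uniform(lb, ub, size=(pop_size, lb.size))      # wolf positions
    for t in range(t_max):
        fit = np.apply_along_axis(objective, 1, X)
        order = np.argsort(fit)
        # The three best wolves (alpha, beta, delta) lead the pack.
        alpha, beta, delta = (X[order[0]].copy(),
                              X[order[1]].copy(),
                              X[order[2]].copy())
        a = 2 - 2 * t / t_max              # control parameter: 2 -> 0
        for i in range(pop_size):
            new = np.zeros(lb.size)
            for leader in (alpha, beta, delta):
                A = a * (2 * rng.random(lb.size) - 1)
                C = 2 * rng.random(lb.size)
                new += leader - A * np.abs(C * leader - X[i])
            X[i] = np.clip(new / 3, lb, ub)  # average of the three moves
    fit = np.apply_along_axis(objective, 1, X)
    return X[np.argmin(fit)]
```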
The inversion results of the method provided by the invention compared with the method without reinforcement learning and with the Actor-Critic method (single network) are shown in Table 2; the evaluation indexes are the root mean square error (RMSE) and the coefficient of determination R².
TABLE 2
Method                            RMSE     R²
Without reinforcement learning    38.33    0.88
Actor-Critic method               30.24    0.91
The method of the invention       27.43    0.93
According to the inversion results, the reinforcement learning-based inversion methods (the Actor-Critic method and the method of the invention) outperform the inversion without reinforcement learning, because reinforcement learning can automatically identify the physical-property stage of the inversion, output the correct regularization coefficients, and apply the corresponding constraints. The method of the invention outperforms the Actor-Critic method because it uses dual networks to implement the Actor and Critic modules separately; compared with the Actor-Critic method, separating the current network from the target network (dual networks) further improves the stability and generalization capability of the reinforcement learning.