CN118152937B - Lithology recognition model training method and device, electronic equipment and storage medium - Google Patents
Lithology recognition model training method and device, electronic equipment and storage medium
- Publication number
- CN118152937B CN118152937B CN202410564812.2A CN202410564812A CN118152937B CN 118152937 B CN118152937 B CN 118152937B CN 202410564812 A CN202410564812 A CN 202410564812A CN 118152937 B CN118152937 B CN 118152937B
- Authority
- CN
- China
- Prior art keywords
- recognition model
- data set
- logging
- lithology
- decision tree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Abstract
The disclosure provides a training method, device, electronic equipment and storage medium for a lithology recognition model, and relates to the technical field of data processing. The method comprises: acquiring a logging-while-drilling data set of a plurality of wells; performing feature selection on the logging-while-drilling data set to obtain a target feature data set; partitioning the target feature data set to obtain a training data set; generating an initial recognition model from training data subsets in the training data set; dividing the decision trees in the initial recognition model into a plurality of sub-forests; pruning decision trees based on the pruning contribution of each decision tree in each sub-forest to obtain an intermediate recognition model; assigning weights to the decision trees in the intermediate recognition model according to the out-of-bag data in the training data set to obtain a weighted intermediate recognition model; and adjusting the hyperparameters of the weighted intermediate recognition model until a preset condition is met to obtain a target recognition model. The method improves training speed while ensuring recognition accuracy.
Description
Technical Field
The disclosure relates to the technical field of data processing, and in particular relates to a training method and device of a lithology recognition model, electronic equipment and a storage medium.
Background
During drilling, obtaining the formation lithology in time by monitoring and analyzing logging data in real time is critical to developing downhole closed-loop intelligent steering drilling technology. Timely and accurate formation lithology provides a basis for intelligent decision-making in drilling operations, improving drilling efficiency and success rate.
Current methods of obtaining formation lithology include identifying logging data with decision tree models, random forest models, and the like. On one hand, because logging data come in many types, have high dimensionality, and correlate with lithology categories in complex ways, extracting feature information during identification is difficult. On the other hand, these models often require a large number of decision trees as support, which occupies a large amount of memory. Although this improves model accuracy, the computational cost grows geometrically and training time lengthens markedly. If the number of decision trees is reduced to speed up training, space is saved but the recognition accuracy of the model suffers.
Disclosure of Invention
The disclosure provides a lithology recognition model training method, a lithology recognition model training device, electronic equipment and a storage medium, so as to at least solve the technical problems in the prior art.
According to a first aspect of the present disclosure, there is provided a method of training a lithology recognition model, the method comprising: acquiring a logging-while-drilling data set of a plurality of wells; performing feature selection on the logging-while-drilling data set to obtain a target feature data set; partitioning the target feature data set to obtain a training data set; generating an initial recognition model based on training data subsets in the training data set; dividing all decision trees in the initial recognition model into a plurality of sub-forests; for each sub-forest, pruning decision trees based on the pruning contribution of each decision tree in the sub-forest to obtain an intermediate recognition model; assigning weights to the decision trees in the intermediate recognition model according to the out-of-bag data in the training data set to obtain a weighted intermediate recognition model; and adjusting the hyperparameters of the weighted intermediate recognition model until a preset condition is met to obtain a target recognition model. The feature selection on the logging-while-drilling data set to obtain the target feature data set includes: determining first mutual information between any two logging-while-drilling data in the data set; determining second mutual information between each logging-while-drilling data and the lithology category; and determining the target feature data set from the first mutual information and the second mutual information.
In an embodiment, the dividing all decision trees in the initial recognition model to obtain a plurality of sub-forests includes: determining lithology recognition results of each decision tree in the initial recognition model; determining the identification accuracy of each decision tree on each lithology according to the lithology identification result; based on the identification accuracy of each lithology of each decision tree, dividing all decision trees to obtain a plurality of sub-forests.
In an embodiment, pruning the decision trees based on the pruning contribution of each decision tree in the sub-forest to obtain the intermediate recognition model includes: determining the accuracy contribution and the diversity contribution of the sub-forest after each decision tree is pruned; determining the pruning contribution of each decision tree based on the accuracy contribution and the diversity contribution; if the pruning contribution of a decision tree is greater than a preset threshold, pruning the decision tree; if the pruning contribution of a decision tree is not greater than the preset threshold, retaining the decision tree; and generating an intermediate recognition model based on the retained decision trees.
In an embodiment, adjusting the hyperparameters of the weighted intermediate recognition model until a preset condition is met to obtain the target recognition model includes: iteratively updating the initial hyperparameters of the weighted intermediate recognition model with a Bayesian optimization algorithm until the hyperparameters meet the preset condition, thereby obtaining the target recognition model.
According to a second aspect of the present disclosure, there is provided a lithology recognition method, the method comprising: acquiring a logging-while-drilling data set of a well to be identified; and inputting the logging-while-drilling data set of the well to be identified into a target recognition model for identification to obtain the lithology categories of the well to be identified, wherein the target recognition model is trained with the lithology recognition model training method of the above embodiments.
According to a third aspect of the present disclosure, there is provided a training apparatus for a lithology recognition model, the apparatus comprising: a data acquisition module for acquiring logging-while-drilling data sets of a plurality of wells; a feature selection module for performing feature selection on the logging-while-drilling data set to obtain a target feature data set; a training data set acquisition module for partitioning the target feature data set to obtain a training data set; an initial model generation module for generating an initial recognition model based on training data subsets in the training data set; a sub-forest generation module for dividing all decision trees in the initial recognition model into a plurality of sub-forests; an intermediate model generation module for pruning decision trees according to the pruning contribution of each decision tree in each sub-forest to obtain an intermediate recognition model; a weight assignment module for assigning weights to the decision trees in the intermediate recognition model according to the out-of-bag data in the training data set to obtain a weighted intermediate recognition model; and a hyperparameter optimization module for adjusting the hyperparameters of the weighted intermediate recognition model until a preset condition is met to obtain a target recognition model. The feature selection module is further configured to determine first mutual information between any two logging-while-drilling data in the data set, determine second mutual information between each logging-while-drilling data and the lithology category, and determine the target feature data set from the first and second mutual information.
According to a fourth aspect of the present disclosure, there is provided a lithology recognition device, the device comprising: a to-be-identified data acquisition module for acquiring the logging-while-drilling data set of a well to be identified; and an identification module for inputting the logging-while-drilling data set of the well to be identified into a target recognition model for identification to obtain the lithology categories of the well to be identified, wherein the target recognition model is obtained by training with the lithology recognition model training method according to any one of the above embodiments.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods described in the present disclosure.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of the present disclosure.
According to the lithology recognition model training method, device, electronic equipment and storage medium of the disclosure, on one hand, feature selection on the original logging-while-drilling data set alleviates the difficulties of numerous logging data types, high data dimensionality, and hard feature extraction. On the other hand, decision trees in the forest with poor recognition performance or high redundancy are pruned, each remaining decision tree is weighted according to the out-of-bag data, and the hyperparameters of the model are optimized, so that training speed is improved while recognition accuracy is maintained.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 shows a schematic implementation flow diagram of a lithology recognition model training method according to an embodiment of the disclosure;
FIG. 2 shows a second implementation flow diagram of a method for training a lithology recognition model according to an embodiment of the disclosure;
FIG. 3 is a schematic diagram illustrating an implementation flow of a lithology recognition method according to an embodiment of the disclosure;
FIG. 4 is a schematic structural diagram of a training device for lithology recognition model according to an embodiment of the present disclosure;
FIG. 5 shows a schematic structural diagram of a lithology recognition device according to an embodiment of the disclosure;
Fig. 6 shows a schematic diagram of a composition structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, features and advantages of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure will be clearly described in conjunction with the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person skilled in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
Fig. 1 is a schematic implementation flow diagram of a lithology recognition model training method according to an embodiment of the present disclosure, and according to a first aspect of the present disclosure, there is provided a lithology recognition model training method, as shown in fig. 1, including:
Step 101, acquiring a logging while drilling data set of a plurality of wells.
In a designated work area, logging-while-drilling data of a plurality of wells are collected during drilling to form a logging-while-drilling data set. Logging-while-drilling data reflect the physical properties of the subsurface rock, including natural gamma, borehole diameter, 2 MHz phase resistivity, 400 kHz attenuation resistivity, neutron porosity, density, acoustic travel time, and the like.
Logging-while-drilling data are acquired while drilling and record geological information just after the bit penetrates the formation. At this point the wellbore has not collapsed significantly, and mud invasion into the formation is shallow or even negligible. Logging-while-drilling data therefore better reflect the original formation information.
Step 102, performing feature selection on the logging-while-drilling data set to obtain a target feature data set.
From the large amount of logging-while-drilling data in the data set, the data with the greatest correlation to lithology categories and the least redundancy among themselves are selected to form the target feature data set used for subsequent model training. Specifically, the minimum-Redundancy Maximum-Relevance (mRMR) algorithm can be used to mine the correlations between each logging-while-drilling datum, the lithology category, and the other logging-while-drilling data.
In an embodiment, feature selection is performed on a logging while drilling data set to obtain a target feature data set, which may be implemented by the following technical means: determining first mutual information between any two logging while drilling data in the logging while drilling data set; determining second mutual information between each logging while drilling data and the lithology category in the logging while drilling data set; and determining a target characteristic data set according to the first mutual information and the second mutual information.
The mutual information between any two logging-while-drilling data in the data set is computed as the first mutual information, and the mutual information between each logging-while-drilling datum and the lithology category as the second mutual information. Mutual information measures the amount of information shared between two variables: the larger its value, the stronger the correlation. If the mutual information between two logging-while-drilling data (the first mutual information) is high, the information they carry overlaps heavily, i.e., redundancy is high. If the mutual information between a logging-while-drilling datum and the lithology category (the second mutual information) is high, their correlation is strong.
Specifically, the first mutual information and the second mutual information can be obtained through calculation according to the following formulas (1) and (2):
I(f_i; f_j) = ∫∫ p(f_i, f_j) log[ p(f_i, f_j) / (p(f_i) p(f_j)) ] df_i df_j    (1)
I(f_i; c) = ∫∫ p(f_i, c) log[ p(f_i, c) / (p(f_i) p(c)) ] df_i dc    (2)
where I(f_i; f_j) is the mutual information between logging-while-drilling data f_i and f_j, i.e., the first mutual information; c is the lithology category; I(f_i; c) is the mutual information between f_i and lithology category c, i.e., the second mutual information; p(f_i) and p(f_j) are the probability densities of f_i and f_j; p(f_i, f_j) is their joint probability density; p(c) is the probability density of lithology category c; and p(f_i, c) is the joint probability density of f_i and lithology category c.
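The integrals in formulas (1) and (2) can be approximated from data samples with a plug-in (histogram) estimator. A minimal sketch, assuming numpy; the function name and bin count are illustrative choices, not from the patent:

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in estimate of I(X; Y) in nats: discretize the joint density of
    formulas (1)-(2) with a 2-D histogram, then sum p*log(p/(px*py))."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                      # joint probability p(x, y)
    px = pxy.sum(axis=1, keepdims=True)        # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)        # marginal p(y)
    nz = pxy > 0                               # skip empty cells (0*log0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

A strongly dependent pair (e.g. a log curve against itself) scores much higher than a pair of independent variables, matching the interpretation of the first and second mutual information above.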
Then, the target feature data set can be obtained by performing feature selection according to the following formula (3):
max_{f_i ∈ F} [ I(f_i; c) − (1/m) Σ_{f_j ∈ F} I(f_i; f_j) ]    (3)
where F is the logging-while-drilling data set and m is the number of logging-while-drilling data in the data set.
Performing feature selection on the logging-while-drilling data set in this way reduces the number of features, lowers the learning difficulty and computational cost of the model, and improves its generalization ability.
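The selection in formula (3) is often realized greedily: at each step, pick the feature whose relevance to the lithology category most exceeds its mean redundancy with the features already picked. A minimal sketch, assuming numpy and precomputed mutual-information values (the function name `mrmr_select` and the array layout are illustrative, not from the patent):

```python
import numpy as np

def mrmr_select(relevance, redundancy, n_select):
    """Greedy mRMR per formula (3): relevance[f] is the second mutual
    information I(f; c); redundancy[f, g] is the first mutual information
    I(f; g). Picks n_select features."""
    selected = [int(np.argmax(relevance))]          # seed with the most relevant feature
    remaining = [f for f in range(len(relevance)) if f not in selected]
    while len(selected) < n_select and remaining:
        # score = relevance minus mean redundancy with already-selected features
        scores = [relevance[f] - np.mean([redundancy[f, s] for s in selected])
                  for f in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```

Note how a feature that is highly relevant but nearly duplicates an already-selected one loses to a less relevant, non-redundant feature.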
Step 103, partitioning the target feature data set to obtain a training data set.
In an embodiment, before dividing the target feature data set, the method further includes: and carrying out normalization processing on the data in the target feature data set to obtain normalized target feature data. Mapping the original target feature data to a new range of values, e.g., [0,1], by linear transformation, specifically, can be achieved by the following equation (4):
x' = (x − x_min) / (x_max − x_min)    (4)
where x is the original target feature data; x' is the normalized target feature data; and x_min and x_max are the minimum and maximum values in the target feature data set, respectively.
In model training, if the range of values of some features is far greater than other features, then these features may dominate the algorithm, resulting in the model not learning information of other features correctly. Therefore, in this embodiment, by performing normalization processing on the data, all the features are scaled to the same value range, so as to ensure that each feature can be treated equally in the model training process. The problems caused by the numerical dimension difference are eliminated, and the performance and the stability of the machine learning model are improved.
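The normalization of formula (4) can be sketched column-wise, so every feature lands in [0, 1] regardless of its original physical unit (the function name is illustrative):

```python
import numpy as np

def min_max_scale(X):
    """Column-wise min-max normalization per formula (4): each feature
    (column) of X is mapped linearly onto [0, 1]."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)
```

After scaling, a gamma-ray column in API units and a density column in g/cm3 contribute on the same numeric scale, which is the point made in the paragraph above.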
The normalized target feature data are then partitioned into a training data set and a test data set. The training data set is used to train the model; the test data set is used to evaluate its performance after training is complete.
For example, 70% of the target feature data set is extracted as the training data set by stratified random sampling, and the remaining 30% serves as the test data set.
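The stratified 70/30 split can be sketched as follows: sample the training fraction separately within each lithology class so class proportions are preserved (a minimal sketch; the function name and seed handling are illustrative):

```python
import numpy as np

def stratified_split(y, train_frac=0.7, seed=0):
    """Return a boolean training mask that keeps roughly train_frac of each
    lithology class, mirroring the stratified random split described above."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(len(y), dtype=bool)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))   # shuffle within class
        mask[idx[:int(round(len(idx) * train_frac))]] = True
    return mask
```

Stratification matters here because some lithology classes may have few sample points; a plain random split could leave such a class almost absent from the training set.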
Step 104, generating an initial recognition model based on the training data subset in the training data set.
n training data subsets are randomly drawn from the training data set with replacement using bootstrap sampling. Each training data subset has the same capacity as the training data set. During each bootstrap draw, some samples are never selected; these unselected samples are recorded as out-of-bag (OOB) data.
n decision trees are generated from the n training data subsets, and the n decision trees are then combined into an initial recognition model by equal-weight voting.
If every decision tree used the same training data, the correlation between the trees would be too high, reducing the diversity and generalization ability of the model. In this embodiment the training data subsets are therefore generated by randomly drawing, with replacement, samples up to the same capacity as the training data set; this randomness increases the variability between trees.
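The bootstrap step above can be sketched as follows: each subset is a with-replacement draw of the full training-set size, and the indices never drawn form that tree's OOB set (function name illustrative):

```python
import numpy as np

def bootstrap_subsets(n_samples, n_trees, seed=0):
    """Draw n_trees bootstrap subsets (sampling with replacement, same size
    as the training set) and record the out-of-bag (OOB) indices of each."""
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(n_trees):
        in_bag = rng.integers(0, n_samples, size=n_samples)   # with replacement
        oob = np.setdiff1d(np.arange(n_samples), in_bag)      # never drawn
        subsets.append((in_bag, oob))
    return subsets
```

On average about 36.8% of the samples (the limit of (1 − 1/n)^n) end up out-of-bag for each tree, which is what makes the OOB weighting of step 107 possible without a separate validation set.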
And 105, dividing all decision trees in the initial recognition model to obtain a plurality of sub-forests.
All decision trees are partitioned according to the accuracy of each decision tree in the initial recognition model for each lithology class, forming a plurality of sub-forests. For example, if decision tree A has the highest accuracy in identifying lithology category 1, it is assigned to sub-forest S_1; if decision tree B has the highest accuracy in identifying lithology category 2, it is assigned to sub-forest S_2; and so on.
And 106, cutting down the decision trees according to the cutting contribution degree of each decision tree in each sub-forest to obtain an intermediate recognition model.
The pruning contribution is a measure attached to each decision tree that decides whether the tree should be kept or pruned; it quantifies the tree's impact on the prediction accuracy of the model. If a decision tree contributes little to model performance, pruning it can improve the overall performance of the model. Conversely, if a decision tree contributes significantly to the model's prediction accuracy, it helps improve the overall recognition model and must be retained. Finally, an intermediate recognition model is generated from the retained decision trees.
Step 107, assigning weights to the decision trees in the intermediate recognition model according to the out-of-bag data set in the training data set to obtain the weighted intermediate recognition model.
The training data subsets and the OOB data sets are determined in step 104; the weight of each decision tree is then computed from its OOB data according to the following formula:
w_i = N_i^correct / N_OOB    (5)
where w_i is the weight of decision tree i in the intermediate recognition model; N_i^correct is the number of out-of-bag samples that decision tree i predicts correctly; and N_OOB is the total number of out-of-bag samples. After each decision tree has been assigned its weight, the weighted intermediate recognition model is obtained.
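Formula (5) and its use in combining the trees can be sketched as below. The weight computation follows (5) directly; combining the weighted trees by a weighted majority vote is an assumption here (the patent only states that weights are assigned), and the function names are illustrative:

```python
import numpy as np

def oob_weights(oob_correct, oob_total):
    """Formula (5): each tree's weight is its fraction of correctly
    predicted out-of-bag samples."""
    return np.asarray(oob_correct, dtype=float) / np.asarray(oob_total, dtype=float)

def weighted_vote(tree_preds, weights):
    """Combine per-tree class predictions with the OOB-derived weights
    (a weighted majority vote; the exact combiner is an assumption)."""
    tree_preds = np.asarray(tree_preds)            # shape (n_trees, n_samples)
    n_classes = int(tree_preds.max()) + 1
    scores = np.zeros((n_classes, tree_preds.shape[1]))
    for pred, w in zip(tree_preds, weights):
        scores[pred, np.arange(tree_preds.shape[1])] += w   # add tree weight to its class
    return scores.argmax(axis=0)
```

A tree that did well on its OOB data thus pulls harder in the final vote than a weaker tree.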
Step 108, adjusting the hyperparameters of the weighted intermediate recognition model until a preset condition is met to obtain the target recognition model.
Specifically, the hyperparameters of the weighted intermediate recognition model are optimized with a Bayesian optimization algorithm. The hyperparameters of the model include: the number of features considered when searching for the optimal split point, the number of trees in the forest, the maximum tree depth, the minimum number of samples required to split a node, the minimum number of samples in a leaf node, the maximum number of leaf nodes, the minimum impurity for a node split, and the like.
The optimal combination of hyperparameters is found by the Bayesian optimization algorithm. First, a prior inference is made using the initial hyperparameters of the weighted intermediate recognition model. The hyperparameters are then adjusted according to the inferred result, and the recognition result of the weighted intermediate recognition model is updated. This process iterates, continually adjusting the hyperparameters, until the obtained hyperparameters are considered the optimal solution, i.e., the preset condition is met. The model under this hyperparameter setting is determined to be the target recognition model.
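The propose-score-keep loop of step 108 can be sketched as follows. For brevity this stand-in draws candidates at random rather than from a Gaussian-process posterior; a Bayesian optimization library would replace the proposal step while keeping the same loop shape. All names here are illustrative:

```python
import numpy as np

def tune_hyperparams(evaluate, space, n_iter=30, seed=0):
    """Stand-in for the Bayesian optimization loop (random proposals shown
    for brevity): propose hyperparameters, score the weighted model via
    `evaluate` (e.g. OOB or validation accuracy), keep the best."""
    rng = np.random.default_rng(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: values[rng.integers(len(values))]   # propose a candidate
                  for name, values in space.items()}
        score = evaluate(params)                            # score the model
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice `evaluate` would retrain or re-score the pruned, weighted forest under the candidate hyperparameters, and the "preset condition" is the stopping rule of the optimizer.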
By applying the disclosed method, on one hand, feature selection with the minimum-redundancy maximum-relevance algorithm alleviates the difficulty of feature extraction caused by the many types and high dimensionality of logging data. On the other hand, decision trees in the forest with poor recognition performance or high redundancy are pruned, each remaining decision tree is weighted according to the OOB data, and the hyperparameters of the model are optimized, improving training speed while maintaining recognition accuracy.
In one embodiment, all decision trees in the initial recognition model are divided to obtain a plurality of sub-forests, which can be realized by the following technical means: determining lithology recognition results of each decision tree in the initial recognition model; determining the identification accuracy of each decision tree on each lithology according to lithology identification results; based on the identification accuracy of each lithology of each decision tree, dividing all decision trees to obtain a plurality of sub-forests.
First, the recognition accuracy A_{i,c} of each decision tree for each lithology is determined from the lithology recognition results of the decision trees in the initial recognition model, where i = 1, 2, …, n and c = 1, 2, …, z; n is the number of decision trees and z is the number of lithology categories. The recognition accuracy is calculated by the following formula (6):
A_{i,c} = n_{i,c}^correct / n_{i,c}    (6)
where i denotes any decision tree in the model and c denotes the lithology category; n_{i,c}^correct is the number of class-c lithology sample points that decision tree i identifies correctly; and n_{i,c} is the number of class-c lithology sample points in the training subset corresponding to decision tree i.
Then, according to the lithology category that each decision tree identifies with the highest accuracy, all decision trees in the initial model are divided into k sub-forests, denoted S_1, S_2, …, S_k.
If the decision trees were pruned directly without this partition, a decision tree might show low overall recognition accuracy merely because a certain lithology has few sample points. Pruning it directly could remove a tree that is actually effective at identifying that lithology, weakening the overall model's ability to recognize it. The partitioning of this embodiment ensures the recognition performance of the model on every lithology class.
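The partition step can be sketched directly from the accuracy matrix of formula (6): each tree joins the sub-forest of the class it recognizes best (function name illustrative):

```python
import numpy as np

def partition_sub_forests(acc):
    """Assign each decision tree to the sub-forest of the lithology class it
    recognises best; acc[i, c] is the accuracy A_{i,c} from formula (6)."""
    acc = np.asarray(acc)
    best_class = acc.argmax(axis=1)                 # class each tree is best at
    return {c: np.flatnonzero(best_class == c).tolist()
            for c in range(acc.shape[1])}
```

Pruning is then applied within each sub-forest separately, so a specialist for a rare lithology competes only against other specialists for that lithology.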
In an embodiment, for each sub-forest, decision trees are pruned based on the pruning contribution of each decision tree in the sub-forest to obtain the intermediate recognition model. This is achieved as follows: determine the accuracy contribution and diversity contribution of the sub-forest after each decision tree is pruned; determine the pruning contribution of each decision tree from the accuracy contribution and the diversity contribution; if the pruning contribution of a decision tree is greater than a preset threshold, prune the decision tree; if it is not greater than the preset threshold, retain the decision tree; and generate an intermediate recognition model from the retained decision trees.
Specifically, the pruning contribution of each decision tree is determined by the following formula (7):
C_i = ΔA_i + ΔD_i    (7)
where C_i is the pruning contribution of any decision tree i in sub-forest S_k; ΔA_i is the accuracy contribution to the sub-forest of pruning decision tree i; and ΔD_i is the diversity contribution to the sub-forest of pruning decision tree i.
The threshold is preset to 0 in this embodiment. When the pruning contribution of a decision tree is greater than 0, the decision tree is pruned; when it is not greater than 0, the decision tree is retained. The retained decision trees form the intermediate recognition model.
ΔA_i in formula (7) can be determined by the following formula (8):

ΔA_i = A_{F−i} − A_F (8)

wherein A_F characterizes the average lithology recognition accuracy of sub-forest F, and A_{F−i} characterizes the average lithology recognition accuracy after decision tree i is felled.

A_F can be determined by the following formula (9):

A_F = (1/m) Σ_{i=1}^{m} A_i, with A_i = n_i / N (9)

wherein A_F represents the average lithology recognition accuracy of the recognition model; m is the number of decision trees in the recognition model; A_i represents the lithology recognition accuracy of decision tree i; n_i characterizes the number of sample points correctly identified by decision tree i; and N represents the total number of training set sample points.

ΔD_i can be determined by the following formula (10):

ΔD_i = D_{F−i} − D_F (10)

wherein D_F characterizes the average diversity of sub-forest F, and D_{F−i} characterizes the average diversity after decision tree i is felled.

D_F can be determined by the following formula (11):

D_F = (2 / (m(m−1))) Σ_{i=1}^{m−1} Σ_{j=i+1}^{m} Div_{i,j}, with Div_{i,j} = d_{i,j} / N (11)

wherein D_F characterizes the average diversity of the recognition model; m is the number of decision trees in the recognition model; Div_{i,j} represents the diversity between any two decision trees i and j in the recognition model — this index evaluates model diversity by the independent classification differences produced between paired classifiers; and d_{i,j} characterizes the number of sample points for which decision trees i and j give different recognition results.
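As a concrete illustration, formulas (7)–(11) can be sketched in Python. This is a minimal sketch assuming the per-tree predictions are available as a NumPy array; the function and variable names are illustrative, not from the patent.

```python
import numpy as np

def felling_contributions(preds, y):
    """Compute the felling contribution J_i = dA_i + dD_i for every tree.

    preds : (m, N) array, preds[i, k] = class predicted by tree i for sample k
    y     : (N,) array of true lithology labels
    """
    m, N = preds.shape
    acc = (preds == y).sum(axis=1) / N                  # A_i in formula (9)
    # pairwise disagreement Div_ij = d_ij / N, formula (11)
    div = np.array([[np.mean(preds[i] != preds[j])
                     for j in range(m)] for i in range(m)])

    def avg_acc(idx):                                   # mean accuracy of a subset
        return acc[idx].mean()

    def avg_div(idx):                                   # mean pairwise diversity
        if len(idx) < 2:
            return 0.0
        s = 0.0
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                s += div[idx[a], idx[b]]
        return 2.0 * s / (len(idx) * (len(idx) - 1))

    all_idx = list(range(m))
    A, D = avg_acc(all_idx), avg_div(all_idx)
    J = np.empty(m)
    for i in range(m):
        rest = [t for t in all_idx if t != i]
        dA = avg_acc(rest) - A                          # formula (8)
        dD = avg_div(rest) - D                          # formula (10)
        J[i] = dA + dD                                  # formula (7)
    return J
```

A tree with J_i > 0 is felled, since removing it raises the sub-forest's combined accuracy/diversity score; the retained trees form the intermediate recognition model.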
The recognition model in this embodiment is an ensemble model, i.e., an ensemble of multiple decision-tree base learners. For an ensemble model, the higher the accuracy of the base learners and the greater their diversity, the better the overall performance; at the same time, accuracy and diversity are in a trade-off relation and must be considered together. By felling decision trees according to the felling contribution, trees with low recognition accuracy and redundant trees can be removed. This reduces the number of decision trees in the recognition model and the similarity between them, improves the training speed of the model, and improves the accuracy of the lithology recognition model.
For a better understanding of the foregoing embodiments, the method of the present disclosure is described in detail below through a specific embodiment. Fig. 2 shows a second implementation flow diagram of a training method of a lithology recognition model according to an embodiment of the present disclosure. As shown in fig. 2:
Logging while drilling data is first acquired to form a logging while drilling data set. Feature selection is then performed on the data set to obtain a target feature data set, and the data in the target feature data set are preprocessed, e.g., normalized. The preprocessed data are divided into a training data set and a test data set. The training data are further divided by Bootstrap sampling into training data subsets (training set 1, training set 2, …, training set n) and out-of-bag data. From the n training subsets, n decision trees are generated (decision tree 1, decision tree 2, …, decision tree n), and the recognition result of each decision tree for each lithology is determined (recognition result 1, recognition result 2, …, recognition result n). Based on these recognition results, the decision trees are divided into sub-forest 1, sub-forest 2, …, sub-forest k. Decision trees are then felled according to the felling contribution of each tree to its sub-forest, and the retained trees form a new forest, i.e., the intermediate recognition model. The out-of-bag data are input into the intermediate recognition model, and the decision trees in the forest are weighted for voting according to their out-of-bag recognition accuracy, yielding the intermediate recognition model after weight assignment. Finally, hyper-parameter optimization is performed on this model by a Bayesian optimization algorithm to obtain the target recognition model. The test data are input into the target recognition model to obtain lithology recognition results.
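The flow above can be sketched end to end with scikit-learn, omitting the sub-forest felling and Bayesian optimization steps for brevity. The class names, the toy data, and the simplified weighting (per-tree training accuracy in place of the true out-of-bag accuracy) are illustrative assumptions, not the patent's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in for a logging-while-drilling feature set (e.g. GR, RES, DEN curves).
X, y = make_classification(n_samples=400, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# 1) Bootstrap-sampled forest (each tree trains on its own bootstrap subset).
forest = RandomForestClassifier(n_estimators=50, bootstrap=True,
                                oob_score=True, random_state=0)
forest.fit(X_train, y_train)

# 2) Weight each tree by its own recognition accuracy (sketch: training
#    accuracy is used here; the patent uses the out-of-bag samples).
tree_acc = np.array([t.score(X_train, y_train) for t in forest.estimators_])
weights = tree_acc / tree_acc.sum()

# 3) Weighted voting over the trees gives the final lithology class.
def weighted_predict(X):
    votes = np.zeros((X.shape[0], forest.n_classes_))
    for w, tree in zip(weights, forest.estimators_):
        votes[np.arange(X.shape[0]), tree.predict(X).astype(int)] += w
    return votes.argmax(axis=1)

print("weighted-vote accuracy:", np.mean(weighted_predict(X_test) == y_test))
```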
Fig. 3 is a schematic flow chart illustrating an implementation of a lithology recognition method according to an embodiment of the present disclosure, and according to a second aspect of the present disclosure, a lithology recognition method is provided, including:
step 201, a logging while drilling dataset of a well to be identified is obtained.
Firstly, aiming at a well needing lithology identification, acquiring a logging-while-drilling data set of the well in the drilling process.
Step 202, inputting a logging while drilling data set of a well to be identified into a target identification model for identification, and obtaining lithology categories of the well to be identified; the target recognition model is obtained by training by using the lithology recognition model training method in the embodiment.
The logging while drilling data set of the well to be identified is input into the lithology recognition model to obtain the lithology result, such as sandstone, dacite, limestone, or mudstone.
Fig. 4 is a schematic structural diagram of a training device for a lithology recognition model according to an embodiment of the present disclosure, and according to a third aspect of the present disclosure, a training device for a lithology recognition model is provided, including:
A data acquisition module 301, configured to acquire logging while drilling data sets of a plurality of wells;
a feature selection module 302, configured to perform feature selection on the logging while drilling data set to obtain a target feature data set;
a training data set acquisition module 303, configured to divide the target feature data set to obtain a training data set;
an initial model generation module 304, configured to generate an initial recognition model based on a training data subset in the training data set;
a sub-forest generation module 305, configured to divide all decision trees in the initial recognition model to obtain a plurality of sub-forests;
an intermediate model generation module 306, configured to, for each sub-forest, fell decision trees based on the felling contribution of each decision tree in the sub-forest to obtain an intermediate recognition model;
a weight assignment module 307, configured to assign weights to the decision trees in the intermediate recognition model according to the out-of-bag data set in the training data set to obtain an intermediate recognition model after weight assignment; and
a hyper-parameter optimization module 308, configured to adjust the hyper-parameters of the intermediate recognition model after weight assignment to satisfy a preset condition, obtaining a target recognition model.
The feature selection module 302 is further configured to determine first mutual information between any two logging while drilling data in the logging while drilling data set; determine second mutual information between each logging while drilling data item and the lithology category in the logging while drilling data set; and determine the target feature data set according to the first mutual information and the second mutual information.
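The two-stage mutual-information selection (redundancy between pairs of logging curves, relevance of each curve to the lithology label) can be sketched as below. The redundancy threshold, the 10-bin discretization, and the curve names are illustrative assumptions, not values given by the patent.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def select_features(X, y, names, redundancy_thresh=0.5):
    """Keep curves with high MI to the lithology label (second mutual
    information), dropping the weaker member of any curve pair whose
    pairwise MI (first mutual information) marks them as redundant."""
    relevance = mutual_info_classif(X, y, random_state=0)
    order = list(np.argsort(relevance)[::-1])        # most relevant first
    selected = []
    for i in order:
        redundant = any(
            mutual_info_score(                        # MI between two curves,
                np.digitize(X[:, i],                  # estimated on 10 bins
                            np.histogram_bin_edges(X[:, i], 10)),
                np.digitize(X[:, j],
                            np.histogram_bin_edges(X[:, j], 10)),
            ) > redundancy_thresh
            for j in selected)
        if not redundant:
            selected.append(i)
    return [names[i] for i in selected]
```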
In an embodiment, the sub-forest generating module 305 is further configured to determine a lithology recognition result of each decision tree in the initial recognition model; determining the identification accuracy of each decision tree on each lithology according to the lithology identification result; based on the identification accuracy of each lithology of each decision tree, dividing all decision trees to obtain a plurality of sub-forests.
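The division rule performed by the sub-forest generating module can be sketched as follows; assigning each tree to the sub-forest of the lithology class it recognizes most accurately is an illustrative reading of the rule, and the array layout is an assumption.

```python
import numpy as np
from collections import defaultdict

def group_into_subforests(preds, y, n_classes):
    """preds: (m, N) per-tree predictions; y: (N,) true lithology labels.
    Each tree joins the sub-forest of the class it identifies best."""
    subforests = defaultdict(list)
    for i, p in enumerate(preds):
        # per-class recognition accuracy of tree i
        per_class_acc = [np.mean(p[y == c] == c) if np.any(y == c) else 0.0
                         for c in range(n_classes)]
        subforests[int(np.argmax(per_class_acc))].append(i)
    return dict(subforests)
```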
In an embodiment, the intermediate model generating module 306 is further configured to determine the accuracy contribution and the diversity contribution of the sub-forest after each decision tree is felled; determine the felling contribution of each decision tree based on the accuracy contribution and the diversity contribution; fell a decision tree if its felling contribution is greater than a preset threshold; retain a decision tree if its felling contribution is not greater than the preset threshold; and generate the intermediate recognition model based on the retained decision trees.
In an embodiment, the super-parameter optimization module 308 is configured to iteratively update the initial super-parameter of the intermediate recognition model after the weight is assigned by using a bayesian optimization algorithm until the super-parameter meets a preset condition, so as to obtain the target recognition model.
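The hyper-parameter step can be sketched with a simple random search standing in for the patent's Bayesian optimization algorithm (a true Bayesian optimizer would additionally fit a surrogate model over past trials to pick the next candidate). The parameter ranges, trial budget, and toy data are illustrative assumptions.

```python
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

rng = random.Random(0)
best_score, best_params = -1.0, None
for _ in range(10):                                   # trial budget
    params = {"n_estimators": rng.choice([25, 50, 100]),
              "max_depth": rng.choice([3, 5, None]),
              "min_samples_leaf": rng.choice([1, 2, 4])}
    score = cross_val_score(RandomForestClassifier(random_state=0, **params),
                            X, y, cv=3).mean()
    if score > best_score:                            # keep the best trial
        best_score, best_params = score, params
print(best_params, round(best_score, 3))
```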
Fig. 5 shows a schematic structural diagram of a lithology recognition device according to an embodiment of the present disclosure, and according to another aspect of the present disclosure, a lithology recognition device is provided, including:
A to-be-identified data acquisition module 401, configured to acquire a logging while drilling data set of a well to be identified;
the identification module 402 is configured to input the logging while drilling data set of the well to be identified into a target identification model for identification, so as to obtain a lithology category of the well to be identified; the target recognition model is obtained by training by using the lithology recognition model training method in the embodiment.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device and a readable storage medium.
Fig. 6 shows a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the various methods and processes described above, such as a method of training a lithology recognition model. For example, in some embodiments, a method of training a lithology recognition model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When the computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of a method of training a lithology recognition model as described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform a method of training the lithology recognition model in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
The foregoing is merely specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto; any changes or substitutions that a person skilled in the art can readily conceive within the technical scope of the disclosure are intended to be covered by the protection scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (7)
1. A method of training a lithology recognition model, the method comprising:
acquiring a logging while drilling data set of a plurality of wells;
Performing feature selection on the logging while drilling data set to obtain a target feature data set;
Dividing the target characteristic data set to obtain a training data set;
generating an initial recognition model based on a training data subset in the training data set;
dividing all decision trees in the initial recognition model to obtain a plurality of sub-forests;
For each sub-forest, cutting down the decision trees based on the cutting contribution degree of each decision tree in the sub-forest to obtain an intermediate recognition model;
Distributing weights to decision trees in the intermediate recognition model according to the out-of-bag data set in the training data set to obtain an intermediate recognition model with the weights distributed;
adjusting the superparameter of the intermediate recognition model after the weight distribution to meet a preset condition to obtain a target recognition model;
The feature selection of the logging while drilling data set to obtain a target feature data set includes: determining first mutual information between any two logging while drilling data in the logging while drilling data set; determining second mutual information between each logging while drilling data and a lithology category in the logging while drilling data set; determining a target characteristic data set according to the first mutual information and the second mutual information;
dividing all decision trees in the initial recognition model to obtain a plurality of sub-forests, wherein the method comprises the following steps: determining lithology recognition results of each decision tree in the initial recognition model; determining the identification accuracy of each decision tree on each lithology according to the lithology identification result; dividing all decision trees based on the identification accuracy of each lithology of each decision tree to obtain a plurality of sub-forests;
the felling of decision trees for each sub-forest based on the felling contribution of each decision tree in the sub-forest to obtain an intermediate recognition model comprises: determining the accuracy contribution and the diversity contribution of the sub-forest after each decision tree is felled; determining a felling contribution of each decision tree based on the accuracy contribution and the diversity contribution; felling a decision tree if its felling contribution is greater than a preset threshold; retaining a decision tree if its felling contribution is not greater than the preset threshold; and generating an intermediate recognition model based on the retained decision trees.
2. The method according to claim 1, wherein the adjusting the superparameter of the intermediate recognition model after the weight assignment to satisfy the preset condition to obtain the target recognition model includes:
And carrying out iterative updating on the initial super parameters of the intermediate recognition model after the weight allocation by a Bayesian optimization algorithm until the super parameters meet preset conditions, so as to obtain the target recognition model.
3. A lithology recognition method, the method comprising:
acquiring a logging while drilling data set of a well to be identified;
Inputting the logging while drilling data set of the well to be identified into a target identification model for identification to obtain lithology categories of the well to be identified; the object recognition model is trained by the training method of the lithology recognition model according to any one of claims 1-2.
4. A training device for lithology recognition models, the device comprising:
the data acquisition module is used for acquiring logging while drilling data sets of a plurality of wells;
the feature selection module is used for carrying out feature selection on the logging while drilling data set to obtain a target feature data set;
the training data set acquisition module is used for dividing the target characteristic data set to obtain a training data set;
the initial model generation module is used for generating an initial recognition model based on the training data subset in the training data set;
the sub-forest generation module is used for dividing all decision trees in the initial recognition model to obtain a plurality of sub-forests;
The middle model generation module is used for cutting down the decision trees according to the cutting contribution degree of each decision tree in each sub-forest to obtain a middle recognition model;
the weight distribution module is used for distributing weights to the decision trees in the intermediate recognition model according to the out-of-bag data set in the training data set to obtain an intermediate recognition model after the weights are distributed;
the super-parameter optimization module is used for adjusting the super-parameters of the intermediate recognition model after the weight is distributed to meet preset conditions to obtain a target recognition model;
The feature selection module is used for determining first mutual information between any two logging while drilling data in the logging while drilling data set; determining second mutual information between each logging while drilling data and a lithology category in the logging while drilling data set; determining a target characteristic data set according to the first mutual information and the second mutual information;
The sub-forest generation module is used for determining lithology recognition results of each decision tree in the initial recognition model; determining the identification accuracy of each decision tree on each lithology according to the lithology identification result; dividing all decision trees based on the identification accuracy of each lithology of each decision tree to obtain a plurality of sub-forests;
the intermediate model generation module is configured to determine the accuracy contribution and the diversity contribution of the sub-forest after each decision tree is felled; determine a felling contribution of each decision tree based on the accuracy contribution and the diversity contribution; fell a decision tree if its felling contribution is greater than a preset threshold; retain a decision tree if its felling contribution is not greater than the preset threshold; and generate the intermediate recognition model based on the retained decision trees.
5. A lithology recognition device, the device comprising:
the to-be-identified data acquisition module is used for acquiring a logging while drilling data set of the to-be-identified well;
the identification module is used for inputting the logging while drilling data set of the well to be identified into a target identification model for identification to obtain lithology categories of the well to be identified; the object recognition model is trained by the training method of the lithology recognition model according to any one of claims 1-2.
6. An electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-2.
7. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410564812.2A CN118152937B (en) | 2024-05-09 | 2024-05-09 | Lithology recognition model training method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118152937A CN118152937A (en) | 2024-06-07 |
CN118152937B true CN118152937B (en) | 2024-08-30 |
Family
ID=91300214
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410564812.2A Active CN118152937B (en) | 2024-05-09 | 2024-05-09 | Lithology recognition model training method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118152937B (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110138849A (en) * | 2019-05-05 | 2019-08-16 | 哈尔滨英赛克信息技术有限公司 | Agreement encryption algorithm type recognition methods based on random forest |
CN110826618A (en) * | 2019-11-01 | 2020-02-21 | 南京信息工程大学 | A personal credit risk assessment method based on random forest |
CN116702132A (en) * | 2023-06-05 | 2023-09-05 | 湖北工业大学 | Network intrusion detection method and system |
Non-Patent Citations (2)
Title |
---|
A real-time intelligent lithology identification method based on a dynamic felling strategy weighted random forest algorithm;Tie Yan等;《Petroleum Science》;20230916;第1135-1147页,图1 * |
Research on Key Technologies for Predicting Students' Academic Performance Based on Learning Big Data; Jin Yu; 《China Excellent Master's Theses Full-text Database》; 20220915; pp. 15-18, 35-39, 46-48 *
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |