WO2024059094A1 - Natural language-based search engine for information retrieval in energy industry - Google Patents
Natural language-based search engine for information retrieval in energy industry
- Publication number
- WO2024059094A1 (application PCT/US2023/032579)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- natural language
- query
- database
- language query
- conversion framework
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/24522—Translation of natural language queries to structured queries
Definitions
- a reservoir may be a subsurface formation that may be characterized at least in part by its porosity and fluid permeability.
- a reservoir may be part of a basin such as a sedimentary basin.
- a basin can be a depression (e.g., caused by plate tectonic activity, subsidence, and so forth) in which sediments accumulate.
- a petroleum system may develop within a basin, which may form a reservoir that includes hydrocarbon fluids (e.g., oil, gas, and so forth).
- IS22.0765-WO-PCT
- a method includes receiving, via the natural language query conversion framework, a natural language query; converting, via the natural language query conversion framework, the natural language query into a database query using a language model (LM); and executing, via the natural language query conversion framework, the database query against an oil and gas (O&G) database.
- LM: language model
- O&G: oil and gas
- FIG.1 illustrates an example computing system, in accordance with embodiments of the present disclosure.
- FIG.2 illustrates a survey operation being performed by a survey tool, such as a seismic truck, to measure properties of a subterranean formation, in accordance with embodiments of the present disclosure.
- FIG.3 illustrates a drilling operation being performed by drilling tools suspended by a rig and advanced into a subterranean formation to form a wellbore, in accordance with embodiments of the present disclosure.
- FIG.4 illustrates a wireline operation being performed by a wireline tool suspended by the rig and into the wellbore of FIG.3, in accordance with embodiments of the present disclosure.
- As used herein, the terms “connect,” “connection,” “connected,” “in connection with,” and “connecting” are used to mean “in direct connection with” or “in connection with via one or more elements”; and the term “set” is used to mean “one element” or “more than one element.” Further, the terms “couple,” “coupling,” “coupled,” “coupled together,” and “coupled with” are used to mean “directly coupled together” or “coupled together via one or more elements.” As used herein, the terms “up” and “down,” “uphole” and “downhole,” “upper” and “lower,” “top” and “bottom,” and other like terms indicating relative positions to a given point or element are utilized to more clearly describe some elements.
- these terms relate to a reference point as the surface from which drilling operations are initiated as being the top (e.g., uphole or upper) point and the total depth along the drilling axis being the lowest (e.g., downhole or lower) point, whether the well (e.g., wellbore, borehole) is vertical, horizontal or slanted relative to the surface.
- the terms “real time,” “real-time,” or “substantially real time” may be used interchangeably and are intended to describe operations (e.g., computing operations) that are performed without any human-perceivable interruption between operations.
- FIG.1 illustrates an example computing system 10 in accordance with embodiments of the present disclosure.
- the computing system 10 may include an individual computer 12A or an arrangement of distributed computers 12B, 12C, 12D.
- the computer 12A may include one or more geosciences analysis modules 14 that are configured to perform the various tasks described herein. To perform these various tasks, the geosciences analysis modules 14 may execute independently, or in coordination with, one or more processors 16, which may be connected to one or more storage media 18.
- the processor(s) 16 may also be connected to a network interface 20 to enable the computer 12A to communicate over a communication network 22 with one or more additional computers and/or computing systems, such as computers 12B, 12C, and/or 12D.
- the computers 12B, 12C and/or 12D may or may not share the same architecture as the computer 12A, and may be located in different physical locations.
- the computers 12A and 12B may be on a ship underway on the ocean, while in communication with one or more computers 12C and/or 12D that are located in one or more data centers on shore, other ships, and/or located in varying countries on different continents.
- the communication network 22 may be a private network, may use portions of public networks, and may include remote storage and/or applications processing capabilities (e.g., cloud computing).
- the processor(s) 16 may include a microprocessor, microcontroller, processor module or subsystem, programmable integrated circuit, programmable gate array, or another control or computing device.
- the storage media 18 may be implemented as one or more computer-readable or machine-readable storage media. It should be noted that while in the example embodiment of FIG.1, the storage media 18 is illustrated as being disposed within the computer 12A, in other embodiments, the storage media 18 may be distributed within and/or across multiple internal and/or external enclosures of the computer 12A and/or additional computers 12B, 12C, 12D.
- the storage media 18 may include one or more different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; optical media such as compact disks (CDs) or digital video disks (DVDs), BluRays or any other type of optical media; or other types of storage devices.
- the instructions discussed herein may be provided on one computer-readable or machine-readable storage medium, or alternatively, may be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes and/or non-transitory storage means.
- Such computer-readable or machine-readable storage medium or media may be considered to be part of an article (or article of manufacture).
- An article or article of manufacture may refer to any manufactured single component or multiple components.
- the storage medium or media may be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions may be downloaded over the communication network 22 for execution.
- the computer 12A is but one example of a computer; the computer 12A may have more or fewer components than shown, may combine additional components not depicted in the example embodiment of FIG.1, and/or may have a different configuration or arrangement of the components depicted in FIG.1.
- various components shown in FIG.1 may be implemented in hardware, software, or a combination of both hardware and software, including one or more signal processing and/or application specific integrated circuits.
- computers 12A, 12B, 12C, and 12D may include computers with keyboards, mice, touch screens, displays, and so forth.
- some computers in use in the computing system 10 may be desktop workstations, laptops, tablet computers, smartphones, server computers, and so forth.
- the steps in the processing methods described herein may be implemented by running one or more functional modules in information processing apparatus such as general purpose processors or application specific chips, such as ASICs, FPGAs, PLDs, or other appropriate devices. These modules, combinations of these modules, and/or their combination with general hardware are included within the scope of protection.
- FIGS.2-5 illustrate simplified, schematic views of an oilfield 24 having a subterranean formation 26 containing a reservoir 28 therein in accordance with implementations of various technologies and techniques described herein.
- FIG.2 illustrates a survey operation being performed by a survey tool, such as a seismic truck 30A, to measure properties of the subterranean formation 26.
- the survey operation may be a seismic survey operation for producing sound vibrations.
- one such sound vibration 32, generated by source 34, reflects off horizons 36 in the subterranean formation 26.
- a set of sound vibrations may be received by sensors 38 (e.g., geophone-receivers) situated at the surface of the oilfield 24.
- FIG.3 illustrates a drilling operation being performed by drilling tools 30B suspended by a rig 44 and advanced into the subterranean formation 26 to form a wellbore 46.
- a mud pit 48 may be used to draw drilling mud into the drilling tools 30B via a flow line 50 for circulating drilling mud down through the drilling tools, then up through the wellbore 46 and back to the surface of the oilfield 24.
- the drilling mud may be filtered and returned to the mud pit 48.
- a circulating system may be used for storing, controlling, or filtering the flowing drilling mud.
- the drilling tools 30B may be advanced into the subterranean formation 26 to reach the reservoir 28. In general, each well may target one or more reservoirs 28.
- the drilling tools 30B may be configured to measure downhole properties, for example, using logging while drilling (LWD) tools. In certain embodiments, the LWD tools may also be configured to capture a core sample 52, as illustrated.
- computer facilities (e.g., the surface unit 54)
- the surface unit 54 may be used to communicate with the drilling tools 30B and/or offsite operations, as well as with other surface or downhole sensors.
- the surface unit 54 may be configured to communicate with the drilling tools 30B to send control commands to the drilling tools 30B and to receive data therefrom.
- the surface unit 54 may also be configured to collect data generated during a drilling operation and produce data output 56, which may then be stored or transmitted.
- various sensors, such as gauges, may be positioned about the oilfield 24 to collect data relating to various oilfield operations as described herein.
- sensors may be positioned in one or more locations in the drilling tools 30B and/or at the rig 44 to measure drilling parameters, such as weight on bit, torque on bit, pressures, temperatures, flow rates, compositions, rotary speed, and/or other parameters of the field operation.
- sensors may also be positioned in one or more locations in the circulating system.
- the drilling tools 30B may include a bottom hole assembly (BHA) (not shown) near the drill bit of the drilling tools 30B (e.g., within several drill collar lengths from the drill bit).
- the bottom hole assembly may include capabilities for measuring, processing, and storing information, as well as communicating with the surface unit 54.
- the bottom hole assembly may further include drill collars for performing various other measurement functions.
- the bottom hole assembly may include a communication subassembly configured to communicate with the surface unit 54.
- the communication subassembly may be adapted to send signals to and receive signals from the surface using a dedicated communications channel such as mud pulse telemetry, electromagnetic telemetry, or wired drill pipe communications.
- the communication subassembly may include, for example, a transmitter that generates a signal, such as an acoustic or electromagnetic signal, which is representative of measured drilling parameters. It should be appreciated that a variety of telemetry systems may be employed, such as wired drill pipe, electromagnetic or other known telemetry systems.
- the wellbore 46 may be drilled according to a drilling plan that is established prior to drilling.
- the drilling plan typically sets forth equipment, pressures, trajectories and/or other parameters that define the drilling process for the wellsite.
- the drilling operation may then be performed according to the drilling plan. However, as information is gathered, the drilling operation may need to deviate from the drilling plan. Additionally, as drilling or other operations are performed, the subsurface conditions may change. In addition, the earth model may also need adjustment as new information is collected.
- the data gathered by the sensors may be collected by the surface unit 54 and/or other data collection sources for analysis or other processing. The data collected by the sensors may be used alone or in combination with other data.
- the data may be collected in one or more databases and/or transmitted on or offsite.
- the data may be historical data, real time data, or combinations thereof.
- the real time data may be used in real time, or stored for later use.
- the data may also be combined with historical data or other inputs for further analysis.
- the data may be stored in separate databases, or combined into a single database.
- the surface unit 54 may include a transceiver 58 configured to enable communications between the surface unit 54 and various portions of the oilfield 24 or other locations.
- the surface unit 54 may also be provided with or functionally connected to one or more controllers (not shown) for actuating mechanisms at the oilfield 24.
- the surface unit 54 may then send command signals to the oilfield 24 in response to the received data.
- the surface unit 54 may receive commands via the transceiver 58 or may itself execute commands to the controller.
- a processor may be provided to analyze the data (locally or remotely), make the decisions and/or actuate the controller. In this manner, the oilfield 24 may be selectively adjusted based on the data collected. This technique may be used to optimize (or improve) portions of the field operation, such as controlling drilling, weight on bit, pump rates, or other parameters. These adjustments may be made automatically based on computer protocol, and/or manually by an operator. In certain situations, well plans may be adjusted to select optimum (or improved) operating conditions, or to avoid problems.
- FIG.4 illustrates a wireline operation being performed by a wireline tool 30C suspended by the rig 44 and into the wellbore 46 of FIG.3.
- the wireline tool 30C may be adapted for deployment into the wellbore 46 for generating well logs, performing downhole tests and/or collecting samples.
- the wireline tool 30C may be used to provide another method and apparatus for performing a seismic survey operation.
- the wireline tool 30C may, for example, have an explosive, radioactive, electrical, or acoustic energy source 60 that sends electrical signals to, and/or receives electrical signals from, the surrounding subterranean formation 26 and fluids therein.
- the wireline tool 30C may be operatively connected to, for example, geophones 38 and a computer 12 of a seismic truck 30A of FIG.2.
- the wireline tool 30C may also provide data to the surface unit 54.
- the surface unit 54 may collect data generated during the wireline operation and may produce data output 56 that may be stored or transmitted.
- the wireline tool 30C may be positioned at various depths in the wellbore 46 to provide a survey or other information relating to the subterranean formation 26.
- Sensors, such as gauges, may be positioned about the oilfield 24 to collect data relating to various field operations, as described previously.
- FIG.5 illustrates a production operation being performed by a production tool 30D deployed from a production unit or Christmas tree 62 and into a completed wellbore 46 for drawing fluid from downhole reservoirs 28 into surface facilities 64.
- the fluid may flow from a reservoir 28 through perforations in the casing (not shown) and into the production tool 30D in the wellbore 46 and to surface facilities 64 via a gathering network 66.
- Sensors, such as gauges, may be positioned about the oilfield 24 to collect data relating to various field operations, as described previously.
- sensors may be positioned in the production tool 30D or associated equipment, such as the Christmas tree 62, the gathering network 66, the surface facilities 64, and/or a production facility, to measure fluid parameters, such as fluid composition, flow rates, pressures, temperatures, and/or other parameters of the production operation.
- production may also include injection wells for added recovery.
- one or more gathering facilities may be operatively connected to one or more of the wellsites for selectively collecting downhole fluids from the wellsite(s).
- while FIGS.2 through 5 illustrate tools 30 used to measure properties of an oilfield 24, it will be appreciated that the tools 30 may be used in connection with non-oilfield operations, such as gas fields, mines, aquifers, storage or other subterranean facilities.
- while certain data acquisition tools 30 are depicted, it will be appreciated that various measurement tools capable of sensing parameters, such as seismic two-way travel time, density, resistivity, production rate, and so forth, of the subterranean formation 26 and/or its geological formations may be used.
- Various sensors may be located at various positions along the wellbore 46 and/or the monitoring tools 30 to collect and/or monitor the desired data. In certain scenarios, other sources of data may also be provided from offsite locations.
- FIGS.2 through 5 are intended to provide a brief description of an example of a field usable with oilfield application frameworks.
- Part of, or the entirety of, an oilfield 24 may be on land, water, and/or sea.
- oilfield applications may be utilized with any combination of one or more oilfields, one or more processing facilities and one or more wellsites.
- FIG.6 illustrates a schematic view, partially in cross section, of an oilfield 24 having data acquisition tools 30E, 30F, 30G, 30H positioned at various locations along the oilfield 24 for collecting data for a subterranean formation 26 in accordance with implementations of various technologies and techniques described herein.
- the data acquisition tools 30E, 30F, 30G, 30H may be the same as the data acquisition tools 30A, 30B, 30C, 30D illustrated in, and described with reference to, FIGS.2-5, respectively, or others not depicted.
- the data acquisition tools 30E, 30F, 30G, 30H may each generate data plots or measurements 68E, 68F, 68G, 68H, respectively.
- data plots 68E, 68F, 68G, 68H are depicted along the oilfield 24 to demonstrate the data generated by the various operations.
- the data plots 68E, 68F, 68G are examples of static data plots that may be generated by the data acquisition tools 30E, 30F, 30G, respectively.
- the data plots 68E, 68F, 68G may also be data plots that are updated in substantially real time during deployment and operation of the respective data acquisition tools 30E, 30F, 30G.
- these measurements may be analyzed to better define the properties of one or more formation(s) 26A, 26B, 26C, 26D and/or determine the accuracy of the measurements and/or for checking for errors.
- the plots 68E, 68F, 68G, 68H of some of the respective measurements may be aligned and scaled with each other for comparison and verification of the properties of the one or more formation(s) 26A, 26B, 26C, 26D.
- the static data plot 68E may be a seismic two-way response over a period of time.
- the static plot 68F may be core sample data measured from a core sample of the subterranean formation 26.
- the core sample may be used to provide data, such as a graph of the density, porosity, permeability, or some other physical property of the core sample over the length of the core.
- tests for density and viscosity may be performed on the fluids in the core at varying pressures and temperatures.
- the static data plot 68G may be a logging trace that typically provides a resistivity or other measurement of the formation at various depths.
- a production decline curve or graph 68H may be a dynamic data plot of the fluid flow rate over time.
- the production decline curve 68H typically provides the production rate as a function of time.
- measurements may be taken of fluid properties, such as flow rates, pressures, composition, and so forth.
- other data may also be collected, such as historical data, user inputs, economic information, and/or other measurement data and other parameters of interest.
- the static and dynamic measurements may be analyzed and used to generate models of subterranean formation(s) 26 to determine characteristics thereof. Similar measurements may also be used to measure changes in formation aspects over time.
- a subterranean formation 26 may have a plurality of geological formations 26A, 26B, 26C, 26D.
- the illustrated subterranean formation 26 has several formations or layers, including a shale layer 26A, a carbonate layer 26B, a shale layer 26C, and a sand layer 26D.
- a fault 70 extends through the shale layer 26A and the carbonate layer 26B.
- the static data acquisition tools are adapted to take measurements and detect characteristics of the geological formations 26A, 26B, 26C, 26D.
- an oilfield 24 may contain a variety of geological structures and/or formations 26, sometimes having extreme complexity. For example, in some locations, typically below the water line, fluid may occupy pore spaces of the formations 26.
- Each of the data acquisition tools 30 may be used to measure properties of the formations 26 and/or their geological features. While each data acquisition tool 30E, 30F, 30G, 30H is illustrated in FIG.6 as being in a specific location in the oilfield 24, it will be appreciated that one or more types of measurement may be taken at one or more locations across one or more fields or other locations for comparison and/or analysis.
- the data collected from various sources may then be processed and/or evaluated as described in greater detail herein.
- seismic data displayed in the static data plot 68E, generated using data acquired by the data acquisition tool 30E, is used by a geophysicist to determine characteristics of the subterranean formations 26 and/or their geological features.
- the core data shown in the static plot 68F and/or the log data in the well log 68G are typically used by a geologist to determine various characteristics of the subterranean formation.
- the production data from the graph 68H is typically used by a reservoir engineer to determine fluid flow reservoir characteristics.
- FIG.7 illustrates an oilfield 24 for performing production operations in accordance with implementations of various technologies and techniques described herein.
- the oilfield 24 has a plurality of wellsites 72 operatively connected to a central processing facility 74.
- the oilfield configuration of FIG.7 is not intended to limit the scope of the embodiments described herein. Part, or all, of the oilfield 24 may be on land and/or sea.
- Each wellsite 72 has equipment that forms one or more wellbores 46 into the earth.
- the wellbores 46 extend through subterranean formations 26 including reservoirs 28 that contain fluids, such as hydrocarbons.
- the wellsites 72 draw fluid from the reservoirs 28 and direct the fluids to processing facilities 74 via surface networks 76.
- the surface networks 76 may include tubing and control mechanisms for controlling the flow of fluids from the wellsites 72 to processing facilities 74.
- the embodiments described herein include methods, techniques, and workflows for planning, forecasting, and/or optimizing production-related systems (e.g., model selections, reservoir maps, wells, and so forth). Some operations in the processing procedures, methods, techniques, and workflows described herein may be combined and/or the order of some operations may be changed. Those with skill in the art will recognize that in the geosciences and/or other multi-dimensional data processing disciplines, various interpretations, sets of assumptions, and/or domain models such as velocity models, may be refined in an iterative fashion. This concept is applicable to the procedures, methods, techniques, and workflows as described herein.
- This iterative refinement may include use of feedback loops executed on an algorithmic basis, such as at a computing device (e.g., the computing system 10 illustrated in, and described with reference to, FIG.1), and/or through manual control by a user who may make determinations regarding whether a given step, action, template, or model has become sufficiently accurate.
- the data comes in different formats, including structured data (e.g., organized tables) and unstructured data (e.g., text documents, logs, images, and so forth).
- the data is stored in either relational or non-relational databases, and used to gain valuable insights into various aspects of the O&G lifecycle, including production decision making, operational efficiency, regulatory compliance, and so forth.
- retrieving relevant information from the databases requires end-users to be relatively conversant with database query syntaxes and schema definitions, which is relatively challenging.
- the embodiments described herein introduce a novel framework to interact with O&G databases using natural language-based searches.
- Natural language interface over oil and gas (O&G) databases: A transformer-based language model (LM) has been developed to convert natural language queries (e.g., in English, Spanish, French, and so forth) to database query syntax (e.g., Structured Query Language (SQL), and so forth).
- the LM may be trained on curated oil and gas (O&G) datasets.
- Multi-lingual interaction with O&G databases: The framework described herein supports multi-lingual O&G databases with multi-lingual natural language queries. For example, an end user may query a Spanish database with English queries and vice versa.
- Multi-task training on O&G domain related tasks: Multi-task training has been adopted to create and train a robust LM capable of effectively handling O&G domain specific tasks. Through multi-task training, the LM shares parameters across multiple tasks, enabling it to capture common patterns and features. This approach improves the LM’s understanding of the domain compared to training on individual tasks in isolation.
- Database agnostic searches: The framework described herein may be adapted to different types of O&G databases without retraining the LM. This illustrates the zero-shot learning capabilities of the LM.
- Workflow for data generation: To train the LM on O&G domain related tasks, the required training data may be prepared using a data generation pipeline.
- FIG.8 illustrates an embodiment of a flowchart of a natural language query conversion framework 78 for converting natural language queries to database queries, as described in greater detail herein.
- the natural language query conversion framework 78 receives a natural language query 80 as an input, and uses a language model (LM) 82 to convert the natural language query 80 into a database query 84, which may be used to query an O&G database 86 to produce query results 88, as described in greater detail herein.
- the O&G database 86 may include data collected by the various data acquisition tools 30 described with reference to FIGS.2-6 above.
- the natural language query conversion framework 78 may be implemented by the computing system 10 described with reference to FIG.1 above.
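The flow of FIG.8 (receive a natural language query, convert it, execute the result) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: `stub_convert` is a hypothetical lookup table standing in for the LM 82, the `well` table and its columns are invented for the example, and an in-memory SQLite database stands in for the O&G database 86.

```python
import sqlite3
from typing import Callable, List, Tuple


def run_natural_language_query(
    question: str,
    convert: Callable[[str], str],
    connection: sqlite3.Connection,
) -> List[Tuple]:
    """Receive a natural language query, convert it, and execute the result."""
    database_query = convert(question)  # the step the LM 82 would perform
    return connection.execute(database_query).fetchall()


def stub_convert(question: str) -> str:
    # Hypothetical stand-in for the trained LM: a fixed lookup, not a model.
    canned = {
        "Which wells are in ABC Basin?":
            "SELECT name FROM well WHERE basin = 'ABC Basin' ORDER BY name",
    }
    return canned[question]


def build_demo_database() -> sqlite3.Connection:
    # Hypothetical schema and rows, used only to make the sketch runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE well (name TEXT, basin TEXT, spud_year INTEGER)")
    conn.executemany(
        "INSERT INTO well VALUES (?, ?, ?)",
        [
            ("W-1", "ABC Basin", 2008),
            ("W-2", "ABC Basin", 2014),
            ("W-3", "XYZ Basin", 2012),
        ],
    )
    return conn


results = run_natural_language_query(
    "Which wells are in ABC Basin?", stub_convert, build_demo_database()
)
print(results)
```

Passing the converter in as a function keeps the receive/convert/execute steps separate, so a real model could replace the stub without touching the execution step.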
- the embodiments described herein enable end users to interact with O&G databases using natural language, whether the query is written in the same natural language used by a particular O&G database or in a different natural language (e.g., English, Spanish, French, and so forth).
- a transformer-based LM 82 which converts natural language queries to database query syntaxes has been developed.
- a curated O&G dataset has been created to train the LM 82 on O&G domain queries.
- the LM 82 may be or include a text-to-text transfer transformer (T5) model, which may take input as a natural language query 80 appended with attributes of the database schema and then translate the natural language query 80 appended with attributes of the database schema into a database query 84, which may be used to query an O&G database 86 to generate query results 88.
- a few sample queries from domain experts pertaining to energy may be obtained.
- the training datasets may be prepared by augmenting domain expert queries with synthetically generated queries.
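One simple way to produce synthetically generated queries is to fill paired templates with schema information and example values. The sketch below is an assumption-laden illustration, not the pipeline of this disclosure: the `SCHEMA` dictionary, the example values, and both templates are hypothetical, and a real pipeline might also draw on historical logs of domain experts.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: table -> column -> example values.
SCHEMA: Dict[str, Dict[str, List[str]]] = {
    "well": {
        "basin": ["ABC Basin", "XYZ Basin"],
        "operator": ["Operator A"],
    },
}

# Hypothetical paired templates: (natural language, database query).
TEMPLATES: List[Tuple[str, str]] = [
    ("Show me all {table}s where {column} is {value}",
     "SELECT * FROM {table} WHERE {column} = '{value}'"),
    ("How many {table}s have {column} equal to {value}?",
     "SELECT COUNT(*) FROM {table} WHERE {column} = '{value}'"),
]


def generate_pairs(
    schema: Dict[str, Dict[str, List[str]]],
    templates: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Fill every template with every (table, column, value) combination."""
    pairs = []
    for table, columns in schema.items():
        for column, values in columns.items():
            for value in values:
                fields = {"table": table, "column": column, "value": value}
                for nl_template, sql_template in templates:
                    pairs.append(
                        (nl_template.format(**fields), sql_template.format(**fields))
                    )
    return pairs


pairs = generate_pairs(SCHEMA, TEMPLATES)
for question, query in pairs[:2]:
    print(question, "->", query)
```

Pairs generated this way could then be merged with the domain expert queries to form the training set.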
- a synthetic data generation pipeline may be implemented using database schema information, example values, and historical logs of domain experts.
- Multi-task training on O&G domain related tasks
- multi-task training was adopted to create a robust LM 82 capable of effectively handling O&G domain-specific tasks.
- the LM 82 shares parameters across tasks, enabling it to capture common patterns and features. This approach improves the LM’s understanding of the particular domains compared to training on individual tasks in isolation.
- the LM 82 exhibits an impressive ability to perform well on new tasks even with limited examples. Additionally, the LM 82 can generate outputs for tasks it was not specifically trained on when provided with a natural language prompt.
- FIG.9 illustrates a multi-task training framework 90 using a transformer-based LM 82, addressing multiple O&G domain tasks.
- multiple tasks 92, 94, 96 may be used to create and refine the LM 82 including, but not limited to, natural language to database query generation 92, target entity detection 94, and O&G discipline classification 96.
- the LM 82 may then be used to generate database queries 84 for particular target entities 98 in particular O&G disciplines 100, as described in greater detail herein.
- the tasks 92, 94, 96 used to create and refine the LM 82 may include enabling conversion of the natural language queries 80 and database queries 84 that are based in different natural languages, thereby enabling multi-lingual searching of O&G databases 86.
- Natural language to database query generation 92
- the LM 82 translates a natural language query 80 into its corresponding database query 84.
- an instruction prompt such as “Convert to database query” may be appended to the natural language query 80.
- the entity/schema attributes may be appended to the natural language query 80 to define the scope of output attributes for query generation.
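The input text for this task can be sketched as a simple string assembly. The instruction prompt "Convert to database query" comes from the description above; the placement of the prompt at the front, the separators, the attribute names, and the function name `build_query_generation_input` are assumptions made only for illustration.

```python
def build_query_generation_input(
    nl_query, schema_attributes, instruction="Convert to database query"
):
    """Combine the instruction prompt, the query, and the schema attributes.

    The "prompt: query | attributes: ..." layout is an assumed format; the
    source does not specify separators or ordering.
    """
    attributes = ", ".join(schema_attributes)
    return f"{instruction}: {nl_query} | attributes: {attributes}"


# Hypothetical attribute names for a well entity.
model_input = build_query_generation_input(
    "Show me wells in ABC Basin", ["well_name", "basin"]
)
print(model_input)
```

Listing the attributes in the input restricts the output to names the model has actually been shown, which is consistent with the stated goal of defining the scope of output attributes.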
- the objective of this task is to accurately identify the target entity for a given natural language query 80.
- the LM 82 may be trained using a “Detect target entity” instruction prompt as a prefix to the natural language query 80.
- the natural language query 80 may be appended with all possible entity classes to enable the LM 82 to learn the entity selection from a given set of entities. This approach provides scalability to the detection task, allowing for the inclusion of new O&G database entities without the need for fine-tuning the LM 82.
- Detect target entity Show me the profile of wells located in ABC Basin and spud after 2010
- wellbore marker set
- “Detect target entity” is the instruction prompt
- “Show me the profile of wells located in ABC Basin and spud after 2010” is the natural language query 80
- the remainder is the entity classes.
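The three-part input described above (instruction prompt, natural language query, entity classes) can be sketched as follows. Because the candidate classes travel with the input, supporting a new entity only means passing a longer list. The separators, the entity class "well", and the helper names are assumptions; only "wellbore marker set" and the prompt text appear in the source.

```python
def build_entity_detection_input(
    nl_query, entity_classes, instruction="Detect target entity"
):
    """Prefix the instruction prompt and append every candidate entity class."""
    options = ", ".join(entity_classes)
    return f"{instruction}: {nl_query} | entities: {options}"


def validate_prediction(predicted, entity_classes):
    """Return the matching class name, or None if the output is not a candidate."""
    by_lowercase = {name.lower(): name for name in entity_classes}
    return by_lowercase.get(predicted.strip().lower())


# "well" is a hypothetical class added for illustration.
classes = ["well", "wellbore marker set"]
text = build_entity_detection_input(
    "Show me the profile of wells located in ABC Basin and spud after 2010",
    classes,
)
```

The validation helper is an optional guard, not something the source describes: it rejects model output that is not one of the supplied classes.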
- O&G discipline classification 96
- The objective of this task is to classify definitions, explanations, and illustrations that are commonly used in the O&G industry to their specific disciplines. The disciplines may include drilling, production, geology, geophysics, reservoir characterization, and so forth.
- Anticline is an arch-shaped fold in rock in which rock layers are upwardly convex. The oldest rock layers form the core of the fold, and outward from the core progressively younger rocks occur. Anticlines form many excellent hydrocarbon traps, particularly in folds with reservoir-quality rocks in their core and impermeable seals in the outer layers of the fold.
- This process provides additional natural language and database query pairs for data augmentation with diverse natural language representation, without the need for domain experts to manually analyze the database queries 84, thereby improving the functionality of the natural language query conversion framework 78 described herein.
- Domain specific record matching for correct data retrieval
- While typing a natural language query, a user may make typographical or spelling errors. To rectify these mistakes, a record matching algorithm may be implemented, which may be invoked whenever no records are found using a translated database query 84. In general, the record matching algorithm compares a predicted value with all values corresponding to the predicted attribute in an O&G database 86.
- FIG. 11 illustrates examples of Levenshtein distance and metaphone algorithms, which may be used as part of a record matching algorithm.
- FIG. 12 illustrates a flowchart of an example record matching workflow 102 for spelling correction, which may be implemented by the record matching algorithm. As illustrated in FIG. 12, in certain embodiments, the workflow 102 may include receiving all values corresponding to a predicted attribute (e.g., block 104).
- a similarity to the predicted value may be calculated for every value (e.g., by determining a Levenshtein distance or phonetics similarity) (e.g., block 106). Then, a determination may be made as to whether the word sounds similar to the corrected value (decision block 108). If the word does sound similar to the corrected value, then the word may be replaced with the corrected value (block 110).
- a difference in similarity e.g., a Levenshtein distance or phonetics similarity distance
- level of similarity e.g., within a predetermined Levenshtein distance threshold or phonetics similarity distance threshold
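The Levenshtein branch of this record matching workflow can be sketched as below: compute the edit distance from the predicted value to every stored value, then replace the predicted value only when the closest stored value falls within a threshold. This is a minimal sketch that leaves out the phonetic (metaphone) comparison; the threshold of 2, the case-insensitive comparison, and the function names are assumptions.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def correct_value(predicted, candidates, max_distance=2):
    """Return the closest candidate within max_distance, else the input unchanged."""
    best, best_distance = None, max_distance + 1
    for candidate in candidates:
        distance = levenshtein(predicted.lower(), candidate.lower())
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best if best is not None else predicted


# Hypothetical stored values for the predicted attribute.
print(correct_value("ABC Basn", ["ABC Basin", "XYZ Basin"]))
```

A fixed threshold is the simplest choice; a threshold scaled by string length may behave better for very short or very long values.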
- the developed natural language query conversion framework 78 described herein may be deployed in an artificial intelligence or machine learning (AI/ML) platform having an application programming interface (API) that allows data experts and domain experts to input data that is used by the natural language query conversion framework 78 to receive natural language queries 80 and to convert the natural language queries 80 into database queries 84, as described in greater detail herein.
- natural language queries 80 may be input by a user via a data workspace application and processed locally by a data workspace application to generate database queries 84 and associated query results 88, as described in greater detail herein, which may be transmitted back to the data workspace application for display for the user.
- FIG. 13 is a flow diagram of a method 114 for utilizing the natural language query conversion framework 78, as described in greater detail herein. As illustrated in FIG. 13, in certain embodiments, the method 114 may include receiving, via the natural language query conversion framework 78, a natural language query 80 (block 116).
- the method 114 may include converting, via the natural language query conversion framework 78, the natural language query 80 into a database query 84 using a language model (LM) 82 (block 118).
- the method 114 may include executing, via the natural language query conversion framework 78, the database query 84 against an oil and gas (O&G) database 86 (block 120).
- the method 114 may include appending, via the natural language query conversion framework 78, the natural language query 80 with one or more database schema attributes prior to converting, via the natural language query conversion framework 78, the natural language query 80 into the database query 84 using the LM 82.
- the method 114 may include, upon determining that no records were found upon execution of the database query 84, comparing, via the natural language query conversion framework 78, a predicted value with all values corresponding to a predicted attribute in the O&G database 86; and replacing, via the natural language query conversion framework 78, the predicted value with a corrected value based on a similar value found in the O&G database 86.
- converting, via the natural language query conversion framework 78, the natural language query 80 into the database query 84 using the LM 82 includes using target entity detection 94 of the natural language query 80.
- converting, via the natural language query conversion framework 78, the natural language query 80 into the database query 84 using the LM 82 includes using O&G discipline classification 96 of the natural language query 80.
- values of the database query 84 are in a natural language different than the natural language query 80.
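The receive, convert, execute, and correct steps of the method can be sketched end to end against an in-memory SQLite database. Everything specific here is an assumption for illustration: `convert_to_query` is a hard-coded stand-in for the LM 82, the `well` table and its rows are invented, SQL is only one possible database query syntax, and `difflib.get_close_matches` is swapped in as a generic similarity measure in place of the Levenshtein or phonetic comparison.

```python
import difflib
import sqlite3


def convert_to_query(nl_query):
    """Hypothetical stand-in for the LM: returns SQL, attribute, predicted value."""
    # A real system would call the trained model here; this stub is hard-coded
    # and deliberately returns a misspelled value to trigger the fallback.
    return "SELECT name FROM well WHERE basin = ?", "basin", "ABC Basn"


def run_query(connection, nl_query):
    """Convert, execute, and fall back to record matching if nothing is found."""
    sql, attribute, value = convert_to_query(nl_query)
    rows = connection.execute(sql, (value,)).fetchall()
    if rows:
        return rows
    # No records found: compare the predicted value with all stored values of
    # the predicted attribute. The attribute name comes from the stub above;
    # a real system should validate it against the schema before using it.
    stored = [
        row[0]
        for row in connection.execute(f"SELECT DISTINCT {attribute} FROM well")
    ]
    matches = difflib.get_close_matches(value, stored, n=1, cutoff=0.8)
    if not matches:
        return []
    # Re-run the same query with the corrected value.
    return connection.execute(sql, (matches[0],)).fetchall()


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE well (name TEXT, basin TEXT)")
connection.executemany(
    "INSERT INTO well VALUES (?, ?)",
    [("W-1", "ABC Basin"), ("W-2", "XYZ Basin")],
)
result = run_query(connection, "Show me wells located in ABC Basn")
print(result)
```

The first execution finds no rows for the misspelled value, so the sketch substitutes the closest stored value and runs the query again.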
- with the natural language query conversion framework 78 described herein, a user may access the database information using a natural language search option. As such, the user need not be conversant in database query syntaxes and schema definitions, and need not know a complex database query language. In addition, retraining of the LM 82 described herein is not required for a new database schema. Where existing technology may require users to know database query languages to search oil and gas databases, the disclosed natural language query conversion framework 78 may act as a friendly frontend to enable natural language search options. The users may be more productive with such simplified searches.
- third parties may integrate the developed natural language query conversion framework 78 into their data management systems enabling such external data management systems to leverage the natural language search functionality described herein.
- third parties e.g., competitors and clients
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CA3267706A CA3267706A1 (en) | 2022-09-14 | 2023-09-13 | Natural language-based search engine for information retrieval in energy industry |
| EP23866134.2A EP4587935A4 (en) | 2022-09-14 | 2023-09-13 | NATURAL LANGUAGE-BASED SEARCH ENGINE FOR INFORMATION RETRIEVAL IN THE ENERGY INDUSTRY |
| AU2023342942A AU2023342942A1 (en) | 2022-09-14 | 2023-09-13 | Natural language-based search engine for information retrieval in energy industry |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IN202221052418 | 2022-09-14 | ||
| IN202221052418 | 2022-09-14 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024059094A1 true WO2024059094A1 (en) | 2024-03-21 |
Family
ID=90275621
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2023/032579 Ceased WO2024059094A1 (en) | 2022-09-14 | 2023-09-13 | Natural language-based search engine for information retrieval in energy industry |
Country Status (4)
| Country | Link |
|---|---|
| EP (1) | EP4587935A4 (en) |
| AU (1) | AU2023342942A1 (en) |
| CA (1) | CA3267706A1 (en) |
| WO (1) | WO2024059094A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170075953A1 (en) * | 2015-09-11 | 2017-03-16 | Google Inc. | Handling failures in processing natural language queries |
| US20170213157A1 (en) * | 2015-07-17 | 2017-07-27 | Knoema Corporation | Method and system to provide related data |
| US20210117625A1 (en) * | 2018-06-29 | 2021-04-22 | Microsoft Technology Licensing, LLC | Semantic parsing of natural language query |
| US20210133535A1 (en) * | 2019-11-04 | 2021-05-06 | Oracle International Corporation | Parameter sharing decoder pair for auto composing |
| KR20220109978A (en) * | 2021-01-29 | 2022-08-05 | 포항공과대학교 산학협력단 | Apparatus and method for gathering training set for natural language to sql system |
-
2023
- 2023-09-13 CA CA3267706A patent/CA3267706A1/en active Pending
- 2023-09-13 WO PCT/US2023/032579 patent/WO2024059094A1/en not_active Ceased
- 2023-09-13 AU AU2023342942A patent/AU2023342942A1/en active Pending
- 2023-09-13 EP EP23866134.2A patent/EP4587935A4/en active Pending
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170213157A1 (en) * | 2015-07-17 | 2017-07-27 | Knoema Corporation | Method and system to provide related data |
| US20170075953A1 (en) * | 2015-09-11 | 2017-03-16 | Google Inc. | Handling failures in processing natural language queries |
| US20210117625A1 (en) * | 2018-06-29 | 2021-04-22 | Microsoft Technology Licensing, LLC | Semantic parsing of natural language query |
| US20210133535A1 (en) * | 2019-11-04 | 2021-05-06 | Oracle International Corporation | Parameter sharing decoder pair for auto composing |
| KR20220109978A (en) * | 2021-01-29 | 2022-08-05 | 포항공과대학교 산학협력단 | Apparatus and method for gathering training set for natural language to sql system |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4587935A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4587935A4 (en) | 2025-10-29 |
| EP4587935A1 (en) | 2025-07-23 |
| CA3267706A1 (en) | 2024-03-21 |
| AU2023342942A1 (en) | 2025-03-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US9074454B2 (en) | Dynamic reservoir engineering | |
| US11657090B2 (en) | Data searching, enrichment and consumption techniques using exploration and/or production entity relationships | |
| US12129755B2 (en) | Information extraction from daily drilling reports using machine learning | |
| US10719893B2 (en) | Symbolic rigstate system | |
| US20240026776A1 (en) | System and method for determining well correlation | |
| US8099267B2 (en) | Input deck migrator for simulators | |
| US20230116731A1 (en) | Intelligent time-stepping for numerical simulations | |
| US12147464B2 (en) | Extracting user-defined attributes from documents | |
| US20230281507A1 (en) | Automated similarity measurement and property estimation | |
| AU2023342942A1 (en) | Natural language-based search engine for information retrieval in energy industry | |
| US20230193736A1 (en) | Infill development prediction system | |
| US20230273335A1 (en) | Integration of geotags and opportunity maturation | |
| WO2025090866A1 (en) | Artificial intelligence based unstructured document geotagging solution | |
| US20250384065A1 (en) | Method and system for metadata extraction for document identification | |
| WO2024243558A2 (en) | Implementation of generative artificial intelligence in oilfield operations | |
| WO2025250172A1 (en) | Adaptive 4d seismic survey design for monitoring of carbon storage sites | |
| US20160108706A1 (en) | Reservoir simulation system and method | |
| Srivastava | Performance Prediction for Deepwater Gulf of Mexico Using Data Mining | |
| WO2016182787A1 (en) | Well analytics framework |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23866134; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: AU2023342942; Country of ref document: AU |
| | ENP | Entry into the national phase | Ref document number: 2023342942; Country of ref document: AU; Date of ref document: 20230913; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023866134; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2023866134; Country of ref document: EP; Effective date: 20250414 |
| | WWP | Wipo information: published in national office | Ref document number: 2023866134; Country of ref document: EP |