
CN118760698B - A natural language query method and system based on spatiotemporal knowledge cube - Google Patents

Info

Publication number
CN118760698B
Authority
CN
China
Prior art keywords
cube
data
column
spatiotemporal
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411240362.8A
Other languages
Chinese (zh)
Other versions
CN118760698A (en)
Inventor
乐鹏
韦祎
李皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN202411240362.8A
Publication of CN118760698A
Application granted
Publication of CN118760698B
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/242: Query formulation
    • G06F16/243: Natural language query formulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/221: Column-oriented storage; Management thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2452: Query translation
    • G06F16/24522: Translation of natural language queries to structured queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F16/284: Relational databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/02: Knowledge representation; Symbolic representation
    • G06N5/022: Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a natural language query method and system based on a spatiotemporal knowledge cube. The method constructs a spatiotemporal data cube from relational data tables; builds, from that data cube, a virtual knowledge graph describing the spatiotemporal knowledge cube; parses the user's natural language input and combines it with the domain ontology of the spatiotemporal knowledge cube to assemble a prompt for a large language model; feeds the prompt to the large language model, which automatically generates a GeoSPARQL query; and executes the GeoSPARQL query in an ontology-based data access (OBDA) system to obtain the corresponding data. The method is simple to operate and improves the usability and value of the data. It lets non-expert users retrieve data quickly and accurately, and is particularly suited to fields that require a high degree of data interoperability and deep data understanding.

Description

Natural language query method and system based on a spatiotemporal knowledge cube
Technical Field
The invention belongs to the field of information technology, and in particular to large-model natural language processing and intelligent data mining and analysis. It specifically relates to a natural language query method and system based on a spatiotemporal knowledge cube.
Background
In recent years, spatiotemporal data cubes have become a popular data integration scheme in the earth observation field. This data structure can effectively organize, store and manage data with time and space dimensions, and thereby supports complex spatiotemporal data query and analysis. However, earth observation data come from many sources, including sensors, satellites, unmanned aerial vehicles and ground observation stations, and their structure is complex: each source may adopt different data formats, resolutions and measurement standards, which complicates integration and unified processing. In addition, querying existing database systems requires users to have substantial database skills and professional background knowledge, so interactivity is poor.
A virtual knowledge graph is a way of creating a knowledge graph from existing data sources without physically copying the data into a central repository. It allows users to query, through a graph query interface, raw data sources that are distributed across different locations and stored in different formats, such as relational databases, CSV files and web services. Text2SQL is a natural language processing (NLP) technique that converts natural language queries into SQL statements, so that users who do not know a database query language can still issue database queries in natural language, which makes querying more intuitive and easier. A large language model (LLM) is a deep learning model trained on massive text data; it can not only generate natural language text but also understand the meaning of text and handle a variety of natural language tasks. Prompt engineering is concerned with developing and optimizing prompts, helping users apply large language models effectively across application scenarios and research fields.
Although existing technologies such as virtual knowledge graphs, Text2SQL and large language models solve the problems of data integration and query to some extent, they still have shortcomings. First, a virtual knowledge graph can query data distributed across different locations, but its support for complex spatiotemporal queries is insufficient. Second, Text2SQL makes natural language queries more intuitive, but its accuracy on complex spatiotemporal queries still needs improvement. Third, large language models have strong natural language processing capability, and prompt engineering can optimize their application, but the result depends on how the prompts are designed, and performance still needs improvement when integrating and processing multi-source heterogeneous data. A solution is therefore needed that realizes intelligent integration and user-friendly interactive query of multidimensional data in the earth observation field.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a natural language query scheme based on a spatiotemporal knowledge cube. The spatiotemporal knowledge cube describes and analyzes data along different time and space dimensions, and enhances the meaning and relevance of the data through semantic extension. A virtual knowledge graph is used to introduce an intermediate ontology layer that enriches the semantic relationships among the data and thereby constructs the spatiotemporal knowledge cube; a large language model is then used to handle complex data queries over the spatiotemporal knowledge cube, realizing natural language query of multi-source spatiotemporal data and lowering the barrier to querying for users.
To achieve the above object, the invention provides a natural language query method based on a spatiotemporal knowledge cube, comprising the following steps:
Step 1: construct a spatiotemporal data cube based on relational data tables;
Step 2: construct a virtual knowledge graph describing the spatiotemporal knowledge cube according to the spatiotemporal data cube;
Step 2.1: construct a domain ontology according to the relational data tables;
Step 2.2: represent the ontology structure in the form of triples;
Step 2.3: according to the table names, fields and foreign keys of the relational data tables of the spatiotemporal data cube, and with reference to the OGC standard GeoSPARQL, define the ontology concepts, relationships and attributes of the spatiotemporal knowledge cube, and form an RDF graph;
Step 2.4: construct a mapping model according to the RDF graph, mapping the data in the relational data tables onto the concepts and attributes defined by the ontology of the spatiotemporal knowledge cube, and express the mapping using the W3C standard R2RML;
Step 2.5: take the ontology structure of the spatiotemporal knowledge cube expressed as triples in step 2.2, together with the mapping model constructed from the RDF graph in step 2.4, as the virtual knowledge graph of the spatiotemporal knowledge cube;
Step 3: based on the spatiotemporal knowledge cube, perform natural language data query using a large language model;
Step 3.1: use the large language model to perform entity recognition on the natural language input by the user, and parse out the required GeoSPARQL field information and calculation information;
Step 3.2: use the domain ontology in the spatiotemporal knowledge cube to assemble the prompt required by the large language model;
Step 3.3: based on the large language model, convert the natural language query into the corresponding GeoSPARQL query statement using the selected prompt;
Step 3.4: execute the GeoSPARQL query statement in the OBDA system to obtain the data result.
Further, the spatiotemporal data cube in step 1 is represented by relational data tables, including a product dimension table, a time dimension table, a space dimension table, a raster fact table and a vector fact table. The product dimension table contains 3 columns: product ID, product name and product type. The time dimension table contains 3 columns: time ID, observation time and update time. The space dimension table contains 4 columns: space ID, place name, bounding extent (the extent on four sides) and precise extent. The raster fact table contains 5 columns: raster fact ID, product ID, time ID, space ID and raster data address. The vector fact table contains 5 columns: vector fact ID, product ID, time ID, space ID and vector data address.
The raster fact table and the vector fact table are associated with the product, time and space dimension tables. The product ID is the primary key of the product dimension table and is referenced as a foreign key by the product ID in the raster fact table and the vector fact table. The time ID is the primary key of the time dimension table and is referenced as a foreign key by the time ID in the raster fact table and the vector fact table. The space ID is the primary key of the space dimension table and is referenced as a foreign key by the space ID in the raster fact table and the vector fact table.
Further, the core concepts of the domain ontology in step 2.1 cover "time", "space", "measure" and "dimension". "Time" and "space" form the basic dimensions of spatiotemporal data; "measure" refers to a value obtained by observation or calculation at a specific point in time and space; and "dimension" is not limited to time and space but can be extended to other analysis dimensions.
Further, in the triple (O, C, R, P) of step 2.2, O represents the ontology, C represents an ontology concept, R represents an ontology relationship, and P represents an ontology attribute.
Further, in step 3.1, the natural language understanding capability of the large language model is used to extract the important information elements of the user's query intent, including entity names, attributes and the relationships among entities. Based on these elements, an intermediate representation is constructed, from which the relationships among entities and attributes and the specific syntax elements required by the GeoSPARQL query are extracted.
Further, in step 3.2, prompt information suitable for the large language model is designed and organized according to the ontology concepts and relationships of the spatiotemporal knowledge cube defined in step 2.3, and the designed prompt information is integrated into the input of the large language model, drawing on the model's predictive capability, to form a complete prompt.
Further, in step 3.4, the virtual knowledge graph is connected to the ontology-based data access (OBDA) system through an API, and data are queried using GeoSPARQL query statements.
The invention also provides a natural language query system based on a spatiotemporal knowledge cube, which is used to implement the natural language query method based on a spatiotemporal knowledge cube described above.
Further, the system includes a processor and a memory, the memory storing program instructions and the processor invoking the instructions stored in the memory to perform the natural language query method based on a spatiotemporal knowledge cube described above.
Alternatively, the system comprises a readable storage medium storing a computer program which, when executed, implements the natural language query method based on a spatiotemporal knowledge cube described above.
Compared with the prior art, the invention has the following advantages:
1) Simplified querying: through natural language processing and a large language model, users can query spatiotemporal data in natural language alone, without mastering a complex database query language.
2) Multi-source data integration: the spatiotemporal data cube organization effectively organizes, stores and manages data with time and space dimensions, and virtual knowledge graph technology effectively integrates multi-source heterogeneous data.
3) Intelligent data query: using ontology-based prompts and a large language model, the user's natural language query is parsed accurately and converted into the corresponding structured (GeoSPARQL) query statement, enabling precise query and analysis of multidimensional data.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a natural language query method based on a spatiotemporal knowledge cube in accordance with an embodiment of the invention.
FIG. 2 is a schematic diagram of the ontology of a spatiotemporal knowledge cube defined in an embodiment of the present invention.
FIG. 3 is an RDF graph of an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings and examples of the present invention, and it is apparent that the described examples are some, but not all, examples of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in FIG. 1, an embodiment of the invention provides a natural language query method based on a spatiotemporal knowledge cube, including the following steps:
Step 1: construct a spatiotemporal data cube based on relational data tables.
The spatiotemporal data cube is represented by relational data tables, including a product dimension table, a time dimension table, a space dimension table, a raster fact table and a vector fact table. The product dimension table contains 3 columns: product ID, product name and product type. The time dimension table contains 3 columns: time ID, observation time and update time. The space dimension table contains 4 columns: space ID, place name, bounding extent (the extent on four sides) and precise extent. The raster fact table contains 5 columns: raster fact ID, product ID, time ID, space ID and raster data address. The vector fact table contains 5 columns: vector fact ID, product ID, time ID, space ID and vector data address.
The raster fact table and the vector fact table are associated with the product, time and space dimension tables. The product ID is the primary key of the product dimension table and is referenced as a foreign key by the product ID in the raster fact table and the vector fact table. The time ID is the primary key of the time dimension table and is referenced as a foreign key by the time ID in the raster fact table and the vector fact table. The space ID is the primary key of the space dimension table and is referenced as a foreign key by the space ID in the raster fact table and the vector fact table.
The five data table structures can be implemented in various relational database management systems, such as an Oracle Spatial database, a PostgreSQL database and the like. Taking PostgreSQL database schema as an example, the table information is as follows:
Table 1 spatiotemporal data cube metadata table
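The five-table structure described above can be sketched as a small star schema. The following is a minimal illustration only, using Python's built-in sqlite3 module rather than the PostgreSQL schema the embodiment refers to; every table name, column name and column type here is a hypothetical choice, since the source lists the columns only in prose.

```python
import sqlite3

# Hypothetical names and types: the source describes the columns in prose only.
SCHEMA = """
CREATE TABLE product_dim (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    product_type TEXT
);
CREATE TABLE time_dim (
    time_id          INTEGER PRIMARY KEY,
    observation_time TEXT,
    update_time      TEXT
);
CREATE TABLE space_dim (
    space_id        INTEGER PRIMARY KEY,
    place_name      TEXT,
    bounding_extent TEXT,  -- extent on four sides, kept as text in this sketch
    precise_extent  TEXT   -- precise extent, kept as text in this sketch
);
CREATE TABLE raster_fact (
    raster_fact_id INTEGER PRIMARY KEY,
    product_id     INTEGER REFERENCES product_dim(product_id),
    time_id        INTEGER REFERENCES time_dim(time_id),
    space_id       INTEGER REFERENCES space_dim(space_id),
    raster_address TEXT
);
CREATE TABLE vector_fact (
    vector_fact_id INTEGER PRIMARY KEY,
    product_id     INTEGER REFERENCES product_dim(product_id),
    time_id        INTEGER REFERENCES time_dim(time_id),
    space_id       INTEGER REFERENCES space_dim(space_id),
    vector_address TEXT
);
"""


def build_cube(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Create the three dimension tables and the two fact tables."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = build_cube(sqlite3.connect(":memory:"))
```

Each fact table carries one foreign key per dimension table, which is what later allows the foreign keys to be read as ontology relationships.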
Step 2: construct a virtual knowledge graph describing the spatiotemporal knowledge cube according to the spatiotemporal data cube.
Step 2.1: construct a domain ontology according to the relational data tables in step 1.
The core concepts of the domain ontology cover "time", "space", "measure" and "dimension". "Time" and "space" form the basic dimensions of spatiotemporal data; "measure" refers to a value observed or calculated at a specific point in time and space; and "dimension" is not limited to time and space but can be extended to other analysis dimensions.
Step 2.2: represent the ontology structure in the form of triples.
In the triple (O, C, R, P), O represents the ontology, C represents an ontology concept, R represents an ontology relationship, and P represents an ontology attribute.
An example of a domain ontology built from the relational data tables in step 1 is as follows:
@prefix : <http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
# Product is an ontology class.
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Product> rdf:type owl:Class .
# Productname is a string-valued attribute of Product.
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Productname> rdf:type owl:DatatypeProperty ;
    rdfs:domain :Product ;
    rdfs:range xsd:string .
# Observation is a dateTime-valued attribute of Product.
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Observation> rdf:type owl:DatatypeProperty ;
    rdfs:domain :Product ;
    rdfs:range xsd:dateTime .
# Raster is an ontology class.
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Raster> rdf:type owl:Class .
# belongsToProduct relates a Raster to its Product.
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#belongsToProduct> rdf:type owl:ObjectProperty ;
    rdfs:domain :Raster ;
    rdfs:range :Product .
Take the statement <http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Product> rdf:type owl:Class . as an example: the ontology O is http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Product, the ontology concept C is Product, the ontology relationship R is rdf:type, and the ontology attribute P is owl:Class. The statement as a whole means that Product is a class (Class).
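The (O, C, R, P) reading of a single statement, as walked through above, can be made concrete with a short sketch. The `OntologyStatement` type and the `decompose` helper are hypothetical names introduced only for illustration, and taking the concept to be the IRI fragment after `#` is an assumption drawn from the one example the source gives.

```python
from typing import NamedTuple


class OntologyStatement(NamedTuple):
    ontology: str      # O: full IRI of the statement's subject
    concept: str       # C: local name of the subject (the part after '#')
    relationship: str  # R: the predicate
    attribute: str     # P: the object


def decompose(subject_iri: str, predicate: str, obj: str) -> OntologyStatement:
    """Split one RDF statement into the (O, C, R, P) form used in step 2.2."""
    concept = subject_iri.rsplit("#", 1)[-1]
    return OntologyStatement(subject_iri, concept, predicate, obj)


stmt = decompose(
    "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Product",
    "rdf:type",
    "owl:Class",
)
```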
Step 2.3: according to the table names, fields and foreign keys of the relational data tables of the spatiotemporal data cube, and with reference to the OGC (Open Geospatial Consortium) standard GeoSPARQL (a geographic query language for RDF data), define the ontology concepts, relationships and attributes of the spatiotemporal knowledge cube, and form an RDF (Resource Description Framework) graph.
In this embodiment, the ontology relationships of the spatiotemporal knowledge cube defined in this way are shown in FIG. 2, and the resulting RDF graph is shown in FIG. 3.
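One way to picture how table names, fields and foreign keys turn into ontology concepts, attributes and relationships is the sketch below. It is an assumption-laden illustration, not the patent's procedure: the `TABLES` dictionary is a hypothetical hand-written schema description, and the rule "table becomes class, field becomes datatype property, foreign key becomes object property" is inferred from the ontology example in step 2.2.

```python
BASE = "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#"

# Hypothetical schema description; names follow the ontology example in step 2.2.
TABLES = {
    "Product": {"fields": {"Productname": "xsd:string"}, "foreign_keys": {}},
    "Raster": {"fields": {}, "foreign_keys": {"belongsToProduct": "Product"}},
}


def schema_to_turtle(tables: dict) -> list:
    """Emit one Turtle statement per table, field and foreign key."""
    statements = []
    for table, spec in tables.items():
        # A table name becomes an ontology concept (class).
        statements.append(f"<{BASE}{table}> rdf:type owl:Class .")
        # A field becomes an ontology attribute (datatype property).
        for field_name, xsd_type in spec["fields"].items():
            statements.append(
                f"<{BASE}{field_name}> rdf:type owl:DatatypeProperty ; "
                f"rdfs:domain <{BASE}{table}> ; rdfs:range {xsd_type} ."
            )
        # A foreign key becomes an ontology relationship (object property).
        for relation, target in spec["foreign_keys"].items():
            statements.append(
                f"<{BASE}{relation}> rdf:type owl:ObjectProperty ; "
                f"rdfs:domain <{BASE}{table}> ; rdfs:range <{BASE}{target}> ."
            )
    return statements


statements = schema_to_turtle(TABLES)
```

The emitted strings assume the `rdf`, `rdfs`, `owl` and `xsd` prefixes are declared wherever the statements are loaded.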
Step 2.4: construct a mapping model according to the RDF graph, mapping the data in the relational data tables onto the concepts and attributes defined by the ontology of the spatiotemporal knowledge cube, and express the mapping using the W3C (World Wide Web Consortium) standard R2RML (RDB to RDF Mapping Language).
An example of a mapping built from the RDF graph and expressed in the W3C standard R2RML is as follows:
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix stc: <http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .

# Maps each raster row to a stc:Raster resource linked to its product and geometry.
<#RasterMapping>
    rr:logicalTable [
        rr:sqlQuery """
            SELECT r.id AS id_rs, p.id AS id_pr, ST_AsText(s.geom) AS geom
            FROM "Raster" r, "Spatial" s, "Product" p
            WHERE r.spatial_key = s.key
              AND r.product_key = p.key
        """
    ] ;
    rr:subjectMap [
        rr:template "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube/raster-{id_rs}" ;
        rr:class stc:Raster
    ] ;
    rr:predicateObjectMap [
        rr:predicate stc:belongsToProduct ;
        rr:objectMap [
            rr:template "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube/product-{id_pr}"
        ]
    ] ;
    rr:predicateObjectMap [
        rr:predicate geo:hasGeometry ;
        rr:objectMap [
            rr:template "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube/geom-{id_rs}"
        ]
    ] .

# Maps the spatial extent of each raster to a geo:Geometry resource with a WKT literal.
<#GeometryMapping>
    rr:logicalTable [
        rr:sqlQuery """
            SELECT r.id AS id_rs, ST_AsText(s.geom) AS geom
            FROM "Raster" r, "Spatial" s
            WHERE r.spatial_key = s.key
        """
    ] ;
    rr:subjectMap [
        rr:template "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube/geom-{id_rs}" ;
        rr:class geo:Geometry
    ] ;
    rr:predicateObjectMap [
        rr:predicate geo:asWKT ;
        rr:objectMap [
            rr:column "geom" ;
            rr:datatype geo:wktLiteral
        ]
    ] .
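To see what the raster mapping above yields for a single row of its SQL result, the sketch below expands the three templates by hand. It is a simplified illustration, not an R2RML processor: `expand_template` and `raster_triples` are hypothetical helpers, the sample row is invented for the test only, and the percent-encoding that R2RML applies to template values is deliberately skipped.

```python
import re

BASE = "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube/"
STC = "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#"
GEO = "http://www.opengis.net/ont/geosparql#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def expand_template(template: str, row: dict) -> str:
    """Replace each {column} placeholder with that column's value in the row.

    Simplification: real R2RML percent-encodes the values; this sketch does not.
    """
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def raster_triples(rows: list) -> list:
    """Produce the triples that the raster mapping describes for each row."""
    triples = []
    for row in rows:
        subject = expand_template(BASE + "raster-{id_rs}", row)
        triples.append((subject, RDF_TYPE, STC + "Raster"))
        triples.append(
            (subject, STC + "belongsToProduct",
             expand_template(BASE + "product-{id_pr}", row))
        )
        triples.append(
            (subject, GEO + "hasGeometry",
             expand_template(BASE + "geom-{id_rs}", row))
        )
    return triples
```

In an OBDA setting these triples are not materialized; the mapping is used to rewrite graph queries into SQL at query time, which is why the knowledge graph is called virtual.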
Step 2.5: take the ontology structure of the spatiotemporal knowledge cube expressed as triples in step 2.2, together with the mapping model constructed from the RDF graph in step 2.4, as the virtual knowledge graph of the spatiotemporal knowledge cube.
Step 3: based on the spatiotemporal knowledge cube, perform natural language data query using a large language model.
Step 3.1: use the large language model to perform entity recognition on the natural language input by the user, and parse out the required GeoSPARQL field information and calculation information.
The natural language understanding capability of the large language model is used to extract the important information elements of the user's query intent, including entity names, attributes and the relationships among entities. Based on these elements, an intermediate representation is constructed, from which the GeoSPARQL field information (the relationships among entities and attributes) and the calculation information (the specific GeoSPARQL syntax elements the query requires) are extracted.
Step 3.2: use the domain ontology in the spatiotemporal knowledge cube to assemble the prompt required by the large language model.
Prompt information suitable for the large language model is designed and organized according to the ontology concepts and relationships of the spatiotemporal knowledge cube defined in step 2.3, and the designed prompt information is integrated into the input of the large language model, drawing on the model's predictive capability, to form a complete prompt.
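The prompt assembly in step 3.2 amounts to filling a fixed template with the ontology text and the user's question. The sketch below assumes this reading; `PROMPT_TEMPLATE` paraphrases the example prompt given later in this embodiment, and `build_prompt` is a hypothetical helper name. Sending the prompt to a model is left out, since the source does not name a model or API.

```python
# Paraphrase of the embodiment's example prompt; the wording is not normative.
PROMPT_TEMPLATE = (
    "This is a text2sparql task. Please convert the user question into a "
    "GeoSPARQL query statement according to the following ontology information. "
    "The ontology is {ontology}. The question is {question}."
)


def build_prompt(ontology_turtle: str, question: str) -> str:
    """Splice the domain ontology and the user's question into one prompt string."""
    return PROMPT_TEMPLATE.format(ontology=ontology_turtle, question=question)
```

Because `str.format` does not re-parse substituted values, Turtle text containing braces can be passed as the ontology without escaping.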
Step 3.3: based on the large language model, convert the natural language query into the corresponding GeoSPARQL query statement using the selected prompt.
Step 3.4: execute the GeoSPARQL query statement in the OBDA (ontology-based data access) system to obtain the data result.
The virtual knowledge graph is connected to the OBDA system through an API, and data are queried using GeoSPARQL statements.
Taking the user's natural language input "How many raster images are there in Wuhan?" as an example, the important information elements in the query sentence are parsed in turn according to the rules:
(1) Query fields (SELECT): all fields; precise extent (geom)
(2) Related tables (WHERE): raster image (Raster), space dimension table (Spatial), product dimension table (Product)
(3) Filter (FILTER): intersection (ST_Intersects)
The GeoSPARQL field information is "Raster", and the calculation information is "SELECT", "WHERE", "FILTER" and "COUNT".
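The parsed elements listed above can be held in a small intermediate representation from which the calculation keywords follow. This is a sketch under assumptions: the `ParsedQuestion` fields and the rule that derives keywords from them are hypothetical, chosen only so that the example question reproduces the four keywords stated above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedQuestion:
    """Hypothetical intermediate representation of a parsed user question."""
    select_fields: List[str] = field(default_factory=list)
    tables: List[str] = field(default_factory=list)
    spatial_filter: Optional[str] = None  # e.g. "ST_Intersects"
    wants_count: bool = False             # True for "how many ..." questions


def calculation_keywords(parsed: ParsedQuestion) -> List[str]:
    """Derive the query syntax elements that the parsed question requires."""
    keywords = ["SELECT", "WHERE"]
    if parsed.spatial_filter is not None:
        keywords.append("FILTER")
    if parsed.wants_count:
        keywords.append("COUNT")
    return keywords


example = ParsedQuestion(
    select_fields=["geom"],
    tables=["Raster", "Spatial", "Product"],
    spatial_filter="ST_Intersects",
    wants_count=True,
)
```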
This information is assembled into a prompt, which reads as follows:
"This is a text2sparql task. Please convert the user question into a GeoSPARQL query statement according to the following ontology information. The ontology is {RDF ontology}, and the question is {natural language question}."
Given the prompt above, the large language model generates the following GeoSPARQL query statement:
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX stc: <http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#>
SELECT (COUNT(?image) AS ?imageCount)
WHERE {
  ?image a stc:Raster ;
         geo:hasGeometry ?image_geom .
  ?image_geom geo:asWKT ?image_wkt .
  # ?wuhan_region is a geographic entity, generated by the large language model, that represents the geographic extent of Wuhan
  ?wuhan_region a geo:Geometry ;
                geo:asWKT ?region_wkt .
  FILTER(geof:sfIntersects(?image_wkt, ?region_wkt))
}
The GeoSPARQL query statement is executed in the OBDA system to obtain the data the end user needs: the total number of raster data items for Wuhan is 126.
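Submitting the generated query and reading back the count could look like the sketch below. It assumes, without confirmation from the source, that the OBDA system exposes an endpoint following the W3C SPARQL 1.1 Protocol and returns the SPARQL JSON results format; the endpoint URL is supplied by the caller, and `build_sparql_request` and `extract_count` are hypothetical helper names. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_sparql_request(endpoint_url: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol POST request that carries the query directly."""
    return urllib.request.Request(
        endpoint_url,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
    )


def extract_count(results_json: str, variable: str = "imageCount") -> int:
    """Read one integer binding from a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return int(bindings[0][variable]["value"])
```

A caller would pass the request to `urllib.request.urlopen` and hand the decoded response body to `extract_count`.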
Example 2
Based on the same inventive concept, the invention also provides a natural language query system based on a spatiotemporal knowledge cube, which comprises a processor and a memory, the memory storing program instructions and the processor invoking the program instructions in the memory to execute the natural language query method based on a spatiotemporal knowledge cube described above.
Example 3
Based on the same inventive concept, the invention also provides a natural language query system based on a spatiotemporal knowledge cube, which comprises a readable storage medium storing a computer program that, when executed, implements the natural language query method based on a spatiotemporal knowledge cube described above.
In a specific implementation, a person skilled in the art can use computer software technology to run the method of the technical solution of the invention as an automated process. System apparatus that implements the method, such as a computer-readable storage medium storing the corresponding computer program, or a computer device that runs the corresponding computer program, also falls within the protection scope of the invention.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute similar alternatives, without departing from the spirit of the invention or exceeding the scope defined in the accompanying claims.

Claims (7)

1. A natural language query method based on a spatiotemporal knowledge cube, characterized by comprising the following steps:
Step 1: construct a spatiotemporal data cube based on relational data tables;
the spatiotemporal data cube is represented by relational data tables, including a product dimension table, a time dimension table, a space dimension table, a raster fact table and a vector fact table;
the product dimension table contains 3 columns: the first column is the product ID, the second column is the product name, and the third column is the product type; the time dimension table contains 3 columns: the first column is the time ID, the second column is the observation time, and the third column is the update time;
the space dimension table contains 4 columns: the first column is the space ID, the second column is the place name, the third column is the bounding extent (the extent on four sides), and the fourth column is the precise extent; the raster fact table contains 5 columns: the first column is the raster fact ID, the second column is the product ID, the third column is the time ID, the fourth column is the space ID, and the fifth column is the raster data address;
the vector fact table contains 5 columns: the first column is the vector fact ID, the second column is the product ID, the third column is the time ID, the fourth column is the space ID, and the fifth column is the vector data address;
Step 2: construct a virtual knowledge graph describing the spatiotemporal knowledge cube according to the spatiotemporal data cube;
Step 2.1: construct a domain ontology according to the relational data tables;
Step 2.2: represent the ontology structure in the form of triples;
in the triple (O, C, R, P), O represents the ontology, C represents an ontology concept, R represents an ontology relationship, and P represents an ontology attribute;
Step 2.3: according to the table names, fields and foreign keys of the relational data tables of the spatiotemporal data cube, and with reference to the OGC standard GeoSPARQL, define the ontology concepts, relationships and attributes of the spatiotemporal knowledge cube, and form an RDF graph;
Step 2.4: construct a mapping model according to the RDF graph, map the data in the relational data tables onto the concepts and attributes defined by the ontology of the spatiotemporal knowledge cube, and express the mapping using the W3C standard R2RML;
Step 2.5: take the ontology structure of the spatiotemporal knowledge cube expressed as triples in step 2.2, together with the mapping model constructed from the RDF graph in step 2.4, as the virtual knowledge graph of the spatiotemporal knowledge cube;
Step 3: based on the spatiotemporal knowledge cube, perform natural language data query using a large language model;
Step 3.1: use the large language model to perform entity recognition on the natural language input by the user, and parse out the required GeoSPARQL field information and calculation information;
Step 3.2: use the domain ontology in the spatiotemporal knowledge cube to assemble the prompt required by the large language model;
Step 3.3: based on the large language model, convert the natural language query into the corresponding GeoSPARQL query statement using the selected prompt;
Step 3.4: execute the GeoSPARQL query statement in the OBDA system to obtain the data result;
the virtual knowledge graph is connected to the ontology-based data access (OBDA) system through an API, and data are queried using GeoSPARQL statements.
2. The natural language query method based on a spatiotemporal knowledge cube according to claim 1, characterized in that: in step 1, the raster fact table and the vector fact table are associated with the product dimension table, the time dimension table and the space dimension table; the product ID is the primary key of the product dimension table and is referenced as a foreign key by the product ID in the raster fact table and the vector fact table; the time ID is the primary key of the time dimension table and is referenced as a foreign key by the time ID in the raster fact table and the vector fact table; and the space ID is the primary key of the space dimension table and is referenced as a foreign key by the space ID in the raster fact table and the vector fact table.
3. The natural language query method based on a spatiotemporal knowledge cube according to claim 1, characterized in that: in step 2.1, the core concepts of the domain ontology cover "time", "space", "measure" and "dimension", wherein "time" and "space" form the basic dimensions of spatiotemporal data, "measure" refers to a value obtained by observation or calculation at a specific point in time and space, and "dimension" is not limited to time and space but can be extended to other analysis dimensions.
A natural language query method based on a spatiotemporal knowledge cube as described in claim 1, characterized in that: the core concepts of the domain ontology in step 2.1 include "time", "space", "measurement" and "dimension", among which "time" and "space" constitute the basic dimensions of spatiotemporal data, "measurement" refers to the numerical value obtained by observation or calculation at a specific time and space point, and "dimension" is not limited to time and space, but can also be extended to other analysis dimensions. 4.如权利要求1所述的一种基于时空知识立方体的自然语言查询方法,其特征在于:步骤3.1中利用大语言模型的自然语言理解能力,提取出用户查询意图中的重要信息元素,包括实体名称、属性以及实体间的关联关系,基于提取出的重要信息元素,构建中间表示形式,提取出实体间的关系、属性以及GeoSPARQL查询所需的特定语法元素。4. A natural language query method based on a spatiotemporal knowledge cube as described in claim 1, characterized in that: in step 3.1, the natural language understanding ability of the large language model is used to extract important information elements in the user's query intention, including entity names, attributes, and associations between entities, and based on the extracted important information elements, an intermediate representation is constructed to extract the relationships and attributes between entities and the specific grammatical elements required for GeoSPARQL queries. 5.如权利要求1所述的一种基于时空知识立方体的自然语言查询方法,其特征在于:步骤3.2中根据步骤2.3定义的时空知识立方体中的本体概念及关系,设计和组织适用于大语言模型的提示信息,利用大语言模型的预测能力,将设计的提示信息整合进大语言模型的输入中,形成完整的提示工程。5. A natural language query method based on a spatiotemporal knowledge cube as described in claim 1, characterized in that: in step 3.2, prompt information suitable for a large language model is designed and organized according to the ontological concepts and relationships in the spatiotemporal knowledge cube defined in step 2.3, and the designed prompt information is integrated into the input of the large language model by utilizing the predictive ability of the large language model to form a complete prompt project. 
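
The five-table layout of step 1 and the key relationships of claim 2 can be illustrated with a minimal sketch. It assumes SQLite as the relational store, and all table names, column names and types (`product_dim`, `bounding_extent`, and so on) are hypothetical: the claims specify only what each column holds, not how it is named or stored.

```python
import sqlite3

# Hypothetical names and types; the claims fix only the column meanings.
SCHEMA = """
CREATE TABLE product_dim (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    product_type TEXT
);
CREATE TABLE time_dim (
    time_id          INTEGER PRIMARY KEY,
    observation_time TEXT,
    update_time      TEXT
);
CREATE TABLE space_dim (
    space_id        INTEGER PRIMARY KEY,
    place_name      TEXT,
    bounding_extent TEXT,  -- assumed to hold a bounding box, e.g. as WKT
    precise_extent  TEXT   -- assumed to hold the exact geometry, e.g. as WKT
);
CREATE TABLE raster_fact (
    raster_fact_id      INTEGER PRIMARY KEY,
    product_id          INTEGER NOT NULL REFERENCES product_dim(product_id),
    time_id             INTEGER NOT NULL REFERENCES time_dim(time_id),
    space_id            INTEGER NOT NULL REFERENCES space_dim(space_id),
    raster_data_address TEXT
);
CREATE TABLE vector_fact (
    vector_fact_id      INTEGER PRIMARY KEY,
    product_id          INTEGER NOT NULL REFERENCES product_dim(product_id),
    time_id             INTEGER NOT NULL REFERENCES time_dim(time_id),
    space_id            INTEGER NOT NULL REFERENCES space_dim(space_id),
    vector_data_address TEXT
);
"""


def build_cube(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Create the three dimension tables and two fact tables."""
    # SQLite enforces foreign keys only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def raster_addresses(conn: sqlite3.Connection,
                     product_name: str, place_name: str) -> list:
    """Return raster data addresses for one product at one place."""
    rows = conn.execute(
        """
        SELECT f.raster_data_address
        FROM raster_fact AS f
        JOIN product_dim AS p ON p.product_id = f.product_id
        JOIN space_dim   AS s ON s.space_id   = f.space_id
        WHERE p.product_name = ? AND s.place_name = ?
        ORDER BY f.raster_fact_id
        """,
        (product_name, place_name),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch the fact tables store only an address for the raster or vector data, as in the claims, so a query resolves dimension values to fact rows through the foreign keys and returns the addresses.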
6. A natural language query system based on a spatiotemporal knowledge cube, characterized by comprising a processor and a memory, wherein the memory is used to store program instructions and the processor is used to call the program instructions in the memory to execute the natural language query method based on a spatiotemporal knowledge cube according to any one of claims 1 to 5.

7. A natural language query system based on a spatiotemporal knowledge cube, characterized by comprising a readable storage medium on which a computer program is stored, wherein the computer program, when executed, implements the natural language query method based on a spatiotemporal knowledge cube according to any one of claims 1 to 5.
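
As program instructions, the query stage of claim 1 (steps 3.2 and 3.3) might look roughly like the sketch below. It is an illustration under stated assumptions, not the patented implementation: the `cube:` namespace and every ontology term in `ONTOLOGY` are hypothetical, the prompt wording is invented, and the language model is represented by a caller-supplied function `llm` rather than any real model API. Only the `geo:` and `geof:` prefixes are the standard GeoSPARQL namespaces.

```python
from typing import Callable, Dict, List

GEOSPARQL_PREFIXES = (
    "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
    "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
    "PREFIX cube: <http://example.org/cube#>\n"  # hypothetical namespace
)

# Hypothetical ontology terms derived from table names, fields and foreign keys.
ONTOLOGY: Dict[str, List[str]] = {
    "concepts": ["cube:Product", "cube:Time", "cube:Space",
                 "cube:RasterFact", "cube:VectorFact"],
    "relations": ["cube:hasProduct", "cube:hasTime", "cube:hasSpace"],
    "attributes": ["cube:productName", "cube:observationTime",
                   "cube:placeName", "cube:rasterDataAddress", "geo:asWKT"],
}


def build_prompt(ontology: Dict[str, List[str]], question: str) -> str:
    """Step 3.2: assemble the prompt from the ontology and the user question."""
    lines = [
        "Translate the question into one GeoSPARQL query.",
        "Use only the ontology terms listed below.",
        "Prefixes:",
        GEOSPARQL_PREFIXES.rstrip(),
    ]
    for kind in ("concepts", "relations", "attributes"):
        lines.append(f"{kind.capitalize()}: " + ", ".join(ontology[kind]))
    lines.append(f"Question: {question}")
    lines.append("GeoSPARQL:")
    return "\n".join(lines)


def nl_to_geosparql(question: str,
                    llm: Callable[[str], str],
                    ontology: Dict[str, List[str]] = ONTOLOGY) -> str:
    """Step 3.3: ask the model for a query and apply a light sanity check."""
    query = llm(build_prompt(ontology, question)).strip()
    # Reject output that clearly is not a query before it reaches the OBDA system.
    if not query.upper().startswith(("PREFIX", "SELECT")):
        raise ValueError("model output does not look like a SPARQL query")
    return query
```

The returned string would then be submitted to the OBDA system for execution (step 3.4); that call is omitted here because the claims do not name a specific system or API.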
CN202411240362.8A 2024-09-05 2024-09-05 A natural language query method and system based on spatiotemporal knowledge cube Active CN118760698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411240362.8A CN118760698B (en) 2024-09-05 2024-09-05 A natural language query method and system based on spatiotemporal knowledge cube

Publications (2)

Publication Number Publication Date
CN118760698A CN118760698A (en) 2024-10-11
CN118760698B true CN118760698B (en) 2024-12-27

Family

ID=92949489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411240362.8A Active CN118760698B (en) 2024-09-05 2024-09-05 A natural language query method and system based on spatiotemporal knowledge cube

Country Status (1)

Country Link
CN (1) CN118760698B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114860884A (en) * 2022-05-23 2022-08-05 中国科学院空天信息创新研究院 Dynamic analysis-oriented spatio-temporal knowledge graph construction system and method
CN118364073A (en) * 2024-04-09 2024-07-19 中国科学院计算机网络信息中心 Distributed RDF data semantic retrieval method and system based on large model

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5799300A (en) * 1996-12-12 1998-08-25 International Business Machines Corporations Method and system for performing range-sum queries on a data cube
CN108885634B (en) * 2016-10-24 2022-09-09 北京亚控科技发展有限公司 Retrieval method for data object based on space-time database
CN112181980B (en) * 2020-09-16 2024-02-02 武汉大学 Large-scale analysis-oriented space-time big data cube organization method and system
CN113282584B (en) * 2021-05-28 2022-05-03 福州大学 A method and system for retrieving space-time cube data of Earth observation images
CN113505234B (en) * 2021-06-07 2023-11-21 中国科学院地理科学与资源研究所 Construction method of ecological civilized geographic knowledge graph
CN114547168B (en) * 2022-01-27 2022-09-20 大连理工大学 Fine chemical engineering safety production data fusion and reconstruction method based on virtual knowledge graph
CN117196029A (en) * 2023-09-15 2023-12-08 陕西师范大学 A real-time interactive extreme climate disaster event correlation mining method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
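GeoCube: a multi-source Earth observation spatiotemporal cube for large-scale analysis; Gao Fan (高凡); 遥感学报 (National Remote Sensing Bulletin); 2022-06-30; pp. 1051-1065 *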

Also Published As

Publication number Publication date
CN118760698A (en) 2024-10-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant