
CN119829641A - Fuzzy retrieval method and device for encrypted field of database - Google Patents



Publication number
CN119829641A
Authority
CN
China
Prior art keywords
queried
index value
target
database
speech tagging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411773990.2A
Other languages
Chinese (zh)
Inventor
张艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Cloud Technology Co Ltd
Original Assignee
China Telecom Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Cloud Technology Co Ltd filed Critical China Telecom Cloud Technology Co Ltd
Priority to CN202411773990.2A priority Critical patent/CN119829641A/en
Publication of CN119829641A publication Critical patent/CN119829641A/en
Pending legal-status Critical Current


Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of network security and discloses a fuzzy retrieval method and device for encrypted fields of a database. The method comprises: intercepting an execution statement of a database; when it is determined from the execution statement that a target encrypted field is being inserted or updated, performing part-of-speech tagging on the target encrypted field to determine a target index value corresponding to the target encrypted field; when it is determined from the execution statement that the target encrypted field is being queried, performing part-of-speech tagging on a value to be queried to determine an index value to be queried corresponding to that value; and matching the target index value against the index value to be queried in the database based on a preset operator, to determine the fuzzy retrieval result for the index value to be queried among the target index values. With the method provided by the embodiments of the disclosure, a fuzzy search only needs to match the generated encrypted index values, which improves search efficiency.

Description

Fuzzy retrieval method and device for encrypted field of database
Technical Field
The disclosure relates to the technical field of network security, in particular to a fuzzy retrieval method and device for encryption fields of a database.
Background
Many companies currently encrypt sensitive fields in databases to meet network security requirements. Since encrypted data cannot be compared directly with plaintext, an encrypted field cannot be fuzzy-matched with a LIKE query.
In the related art, the plaintext content of the encrypted field is often divided into specific lengths, and then each divided character string is individually encrypted and combined into ciphertext. When searching, the same segmentation and encryption processing are carried out on the search character string, and the generated encrypted result is subjected to fuzzy matching with ciphertext in the database.
However, the method adopted in the related art may cause an increase in the index length of the subsequent encryption field, thereby affecting the efficiency of the subsequent retrieval. How to improve the efficiency of fuzzy search of encrypted fields in a database is a problem to be solved.
Disclosure of Invention
In view of this, the present disclosure provides a method and an apparatus for fuzzy search of encrypted fields in a database, so as to solve the problem of how to improve the efficiency of fuzzy search of encrypted fields in the database.
The method comprises the steps of intercepting an execution statement of a database, performing part-of-speech labeling on a target encryption field to determine a target index value corresponding to the target encryption field when the target encryption field is determined to be inserted or updated according to the execution statement, performing part-of-speech labeling on a value to be queried to determine the index value to be queried corresponding to the value to be queried when the target encryption field is determined to be queried according to the execution statement, matching the target index value with the index value to be queried in the database based on a preset operator, and determining a fuzzy retrieval result corresponding to the index value to be queried in the target index value, wherein the fuzzy retrieval result is intersection or union.
The device comprises a first part-of-speech tagging module, a second part-of-speech tagging module and a fuzzy search module. The first part-of-speech tagging module is used for intercepting an execution statement of a database and, when it is determined from the execution statement that the target encrypted field is being inserted or updated, performing part-of-speech tagging on the target encrypted field to determine a target index value corresponding to the target encrypted field. The second part-of-speech tagging module is used for, when it is determined from the execution statement that the target encrypted field is being queried, performing part-of-speech tagging on a value to be queried to determine an index value to be queried corresponding to the value to be queried. The fuzzy search module is used for matching the target index value against the index value to be queried in the database based on a preset operator, and determining a fuzzy search result corresponding to the index value to be queried among the target index values, wherein the fuzzy search result is an intersection or a union.
The disclosure also provides a computer readable storage medium, wherein the computer readable storage medium stores computer instructions for causing a computer to implement the method for fuzzy search of the encrypted field of the database.
Another aspect of the present disclosure also provides a computer program product comprising computer instructions for causing a computer to perform the above-described method of fuzzy retrieval of a database encryption field.
According to the fuzzy search method and device for the encrypted field of the database, through part-of-speech annotation of the target encrypted field and the value to be queried and generation of the index value, complex query operation on encrypted data can be avoided, when fuzzy search is performed, only the generated encrypted index value is required to be matched, so that search efficiency is improved, and particularly when a large amount of data is involved, high calculation cost is avoided.
In addition, the above embodiments of the present disclosure support intersection or union matching of the target index value and the query index value based on the preset operator, so that different service requirements can be flexibly satisfied.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the related art, the drawings that are required to be used in the description of the embodiments or the related art will be briefly described below, and it is apparent that the drawings in the following description are some embodiments of the present disclosure, and other drawings may be obtained according to the drawings without inventive effort to those of ordinary skill in the art.
Fig. 1 is a schematic flow chart of a fuzzy search method for encrypted fields of a database according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a portion of a CTB part-of-speech tagging set of a method for fuzzy retrieval of encrypted fields of a database according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a target index value construction flow of a fuzzy search method for encrypted fields of a database according to an embodiment of the present disclosure;
Fig. 4 is a schematic diagram of a fuzzy search result obtaining flow of a fuzzy search method for encrypted fields of a database according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a fuzzy search device for encrypted fields of a database according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of another database encryption field fuzzy retrieval device according to an embodiment of the present disclosure.
Detailed Description
Many companies currently encrypt sensitive fields in databases to meet network security requirements. Since encrypted data cannot be compared directly with plaintext, an encrypted field cannot be fuzzy-matched with a LIKE query. For example, after encryption the value "communication operation" is stored in the database as "OTQ50LVZrKb3ZcxY", while the encrypted value of "communication" is "NyAW0LVZ", so searching with "communication" cannot match the record.
In the first related art, the plaintext content of the encrypted field is often divided into pieces of a specific length, and then each of the divided strings is individually encrypted and combined into ciphertext. When searching, the same segmentation and encryption processing are carried out on the search character string, and the generated encrypted result is subjected to fuzzy matching with ciphertext in the database.
The second related art is similar to the first. The main difference is that it uses symmetric encryption in Galois/Counter Mode (GCM) to obtain a hash value, stores the encrypted field value and the index value in the same field, and then performs the matching in the application layer.
The above related art may have the following problems:
1. The plaintext is divided into segments of a specific length. If the segment length is 2, a character string of length n is divided into segments with a total length of 2(n-1). Taking a hash of a 4-bit character as an example, the final length of the encrypted field index is 8(n-1). This increases the length of the encrypted field index and also reduces the efficiency of subsequent retrieval.
2. Because a hash value of a 4-bit character, rather than the original ciphertext, is used to represent each 2-bit segmented character, hash collisions exist and the search result may be inaccurate. Using a longer hash value to represent each segment can reduce hash collisions, but the index value of the encrypted field then becomes extremely long and occupies storage space. In addition, the hash algorithm is too simple and risks being reversed.
3. After fixed-length segmentation, the result of a union search may be inaccurate, which limits system functionality. The related art described above therefore supports only intersection-based search and cannot support union search.
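To make the length argument in item 1 concrete, the following is a minimal sketch of fixed-length segmentation, assuming the related art uses an overlapping (sliding) window of width 2. The function name `segment` and the sample string are illustrative assumptions, and no hashing or encryption is applied here.

```python
def segment(text: str, width: int = 2) -> list[str]:
    """Split text into overlapping windows of the given width."""
    return [text[i:i + width] for i in range(len(text) - width + 1)]


pieces = segment("abcde")                   # n = 5 characters
total_length = sum(len(p) for p in pieces)  # combined length of all segments

print(pieces)        # ['ab', 'bc', 'cd', 'de'], i.e. n - 1 segments
print(total_length)  # 8, i.e. 2 * (n - 1)
```

Every additional plaintext character adds one more segment, so any per-segment encoding multiplies the index length accordingly.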
In order to solve the problems, various embodiments of the present disclosure provide a fuzzy search method for an encrypted field of a database, which includes intercepting an execution statement of the database, performing part-of-speech labeling on a target encrypted field to determine a target index value corresponding to the target encrypted field when the target encrypted field is being inserted or updated according to the execution statement, performing part-of-speech labeling on a value to be queried to determine the index value to be queried corresponding to the value to be queried when the target encrypted field is being queried according to the execution statement, matching the target index value with the index value to be queried in the database based on a preset operator, and determining a fuzzy search result corresponding to the index value to be queried in the target index value, wherein the fuzzy search result is an intersection or a union.
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. Based on the embodiments in this disclosure, all other embodiments that a person skilled in the art would obtain without making any inventive effort are within the scope of protection of this disclosure.
Referring to fig. 1, fig. 1 is a flowchart of a method for fuzzy search of encrypted fields in a database according to an embodiment of the disclosure, where the method includes the following steps:
Step S101: intercept an execution statement of a database and, when it is determined from the execution statement that the target encrypted field is being inserted or updated, perform part-of-speech tagging on the target encrypted field to determine a target index value corresponding to the target encrypted field.
In this embodiment, the execution statements of the database may be SQL statements that are processed by the database management system, which may include, but are not limited to, insert, update, delete, or query operations. Here, the method for intercepting the execution statement of the database may include, but is not limited to, database triggers, database middleware, application layer interception.
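As an illustration of the dispatch implied by steps S101 and S102, the sketch below classifies an intercepted statement by its leading SQL keyword. The function name `classify_statement` and the action labels are hypothetical; a real interceptor (trigger, middleware, or application-layer hook) would also need to check whether the statement actually touches the target encrypted field, which this sketch does not do.

```python
import re

# Hypothetical mapping from the leading SQL keyword to the processing branch.
_ACTIONS = {
    "insert": "build_target_index",   # step S101
    "update": "build_target_index",   # step S101
    "select": "build_query_index",    # step S102
}


def classify_statement(sql: str) -> str:
    """Return the branch for an intercepted statement, or 'pass_through'."""
    match = re.match(r"\s*(\w+)", sql)
    keyword = match.group(1).lower() if match else ""
    return _ACTIONS.get(keyword, "pass_through")


print(classify_statement("INSERT INTO records (F) VALUES ('...')"))
print(classify_statement("select * from records where F like '%x%'"))
print(classify_statement("DELETE FROM records"))
```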
Part-of-speech tagging can be a text data processing technology for tagging parts of speech of words in a corpus according to meaning and context content by using a part-of-speech tagging algorithm based on a part-of-speech tagging set. The purpose of part-of-speech tagging may be to identify grammatical roles of each word, e.g., nouns, verbs, adjectives, so that the system understands the grammatical structure and meaning of the text to be queried.
Here, the part-of-speech tagging set defines all possible grammatical categories, and may be one of the Chinese Treebank (CTB) tag set, the Peking University corpus (PKU) tag set, or the 863 Program corpus tag set. Fig. 2 is a schematic diagram of part of the CTB part-of-speech tagging set used by a method for fuzzy retrieval of encrypted fields of a database according to an embodiment of the present disclosure.
The part-of-speech tagging algorithm may be a specific algorithm or model for automatically assigning part-of-speech tags.
As one example, the part-of-speech tagging algorithm may include at least one of hidden Markov model part-of-speech tagging, perceptron part-of-speech tagging, and deep learning part-of-speech tagging.
Step S102, when determining that the target encryption field is being queried according to the execution statement, performing part-of-speech tagging on the value to be queried, and determining the index value to be queried corresponding to the value to be queried.
In this embodiment, the term to be queried may refer to a field or a term in a query condition for retrieving an encrypted field, which is input by a user when performing a database query.
As an example, the encrypted value of "communication operation" is "OTQ50LVZrKb3ZcxY", and "communication operation" is the value of the target encrypted field. When a search is performed with "communication", "communication" is the value to be queried.
Step S103, based on the preset operator, matching the target index value with the index value to be queried in the database, and determining a fuzzy retrieval result corresponding to the index value to be queried in the target index value.
In this embodiment, the fuzzy search result is an intersection or a union. The preset operators may include an intersection operator and a union operator.
With the intersection operator, only records whose target index value contains every part of the index value to be queried are returned. With the union operator, all records whose target index value matches at least one part of the index value to be queried are returned, whether the match is full or partial.
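The difference between the two operators can be sketched by treating each index value as a set of tokens. This set representation is an assumption made only for illustration (the embodiments below store the tokens as one concatenated string), and the function names and token strings are hypothetical.

```python
def matches_intersection(index_tokens: set[str], query_tokens: set[str]) -> bool:
    """True only if every query token is present in the record's index."""
    return query_tokens <= index_tokens


def matches_union(index_tokens: set[str], query_tokens: set[str]) -> bool:
    """True if at least one query token is present in the record's index."""
    return bool(query_tokens & index_tokens)


query = {"tokA", "tokB"}
full_record = {"tokA", "tokB", "tokC"}   # contains both query tokens
partial_record = {"tokB", "tokD"}        # contains only one query token

print(matches_intersection(full_record, query), matches_union(full_record, query))
print(matches_intersection(partial_record, query), matches_union(partial_record, query))
```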
Here, the selection of the intersection operator or the union operator may be specifically set according to the need, and is not limited herein.
In the method and the device for fuzzy retrieval of the encrypted field of the database, the text data is converted into a keyword form which is easier to process by marking the parts of speech of the encrypted field and the query field, so that the accuracy and the flexibility of fuzzy retrieval are improved. According to the scheme, the index value is generated, so that fuzzy retrieval can be performed even if the field value is encrypted, direct access to an original plaintext is avoided, and the retrieval process is accelerated through the index. By generating an index for the encryption field and preprocessing the retrieval value according to the part-of-speech tags, the direct complex query operation on the encryption data can be avoided, and the retrieval efficiency is improved. The index can be flexibly generated, the hash collision problem caused by the fixed hash length is reduced, and the accuracy of the search result is ensured through accurate index generation.
In one possible implementation manner of the step S101, intercepting an execution statement of the database, and performing part-of-speech tagging on the target encryption field when it is determined that the target encryption field is being inserted or updated according to the execution statement includes:
intercepting SQL execution sentences of the database, and when the database is determined to be executing insert sentences or update sentences according to the SQL execution sentences, performing part-of-speech tagging on the target encryption field to determine a first part-of-speech tagging result set.
In this embodiment, insert statements or update statements are intercepted as the database executes, thereby capturing SQL requests containing encrypted fields. Wherein, insert statement is used for inserting new record in database, update statement is used for updating existing record in database.
As an example, part-of-speech tagging is performed on the value P of the encrypted field F, to obtain a part-of-speech tagging result set POS.
For example, assume that the value of the encrypted field F is P = "Country A will provide emergency humanitarian assistance to Country B to address Country B's urgent needs for food, medical care, etc." Part-of-speech tagging is performed on the value P to determine the first part-of-speech tagging result set POS = {Country A/NN, will/AD, to/P, Country B/NR, provide/VV, urgent/JJ, 的/DEG, humanitarian/NN, assistance/NN, ，/PU, for/VV, solve/VV, Country B/NR, food/NN, and/CC, medical/NN, etc., urgent need/NN, 。/PU}.
In the method and the device for fuzzy retrieval of the encrypted field of the database, the grammar structure and the vocabulary role in the field can be better understood by marking the parts of speech of the value of the target encrypted field, so that more accurate matching is performed in the subsequent fuzzy retrieval process. By marking the parts of speech and encrypting the encryption field, the sensitive information is ensured not to be exposed in a plaintext form, and the privacy protection of the database is enhanced.
In one possible implementation manner of the step S101, the determining the target index value corresponding to the target encryption field is implemented based on the following steps:
adding a corresponding target index field for the target encryption field in the database;
Screening the first part-of-speech tagging result set according to a preset screening standard to determine a second part-of-speech tagging result set;
Performing de-duplication processing on the second part-of-speech tagging result set, determining a third part-of-speech tagging result set, performing AES/CFB/NoPadding encryption and character string conversion on at least one word in the third part-of-speech tagging result set, and determining a fourth part-of-speech tagging result set;
And sequencing and splicing the fourth part-of-speech tagging result set, determining a target index value corresponding to the target encryption field, and storing the target index value into the target index field corresponding to the target encryption field.
In this embodiment, adding a corresponding target index field to the target encryption field in the database may include adding a target index field FIDX to the target encryption field F in the database. The target index field FIDX may be an index field associated with the target encryption field F, and may function to provide a processed, retrievable mapping value for the encryption field F.
Screening the first part-of-speech tagging result set according to a preset screening standard to determine a second part-of-speech tagging result set may include screening the first part-of-speech tagging result set POS, screening out parts of speech having obvious retrieval features such as nouns and verbs, and determining the second part-of-speech tagging result set POS2.
Here, the parts of speech to be retained may be set according to specific requirements. The more parts of speech are retained, the better the retrieval effect, but the larger the index becomes.
As one example, the first part-of-speech tagging result set POS is screened to determine the second part-of-speech tagging result set POS2 = {Country A/NN, Country B/NR, provide/VV, urgent/JJ, humanitarian/NN, assistance/NN, for/VV, solve/VV, Country B/NR, food/NN, medical/NN, urgent need/NN}.
Performing deduplication processing on the second part-of-speech tagging result set to determine a third part-of-speech tagging result set, which may include:
and performing duplicate removal processing on the second part-of-speech tagging result set POS2 to determine a third part-of-speech tagging result set POS3.
Here, deduplication may refer to removing repeated tagged words from the second part-of-speech tagging result set POS2, so that a word occurring multiple times does not affect the efficiency and accuracy of the index during subsequent retrieval.
As an example, the second part-of-speech tagging result set POS2 is deduplicated to obtain the third part-of-speech tagging result set POS3 = {Country A/NN, Country B/NR, provide/VV, urgent/JJ, humanitarian/NN, assistance/NN, for/VV, solve/VV, food/NN, medical/NN, urgent need/NN}.
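The screening and deduplication steps can be sketched as follows. The tagged words are transcribed from the example above as (word, tag) pairs, with the untagged "etc." token omitted and the particle and punctuation tokens written as placeholders. The retained tag set `KEEP_TAGS` is an assumption inferred from which tags survive in the POS2 example; the actual screening standard is configurable.

```python
# (word, tag) pairs from the example; particle/punctuation tokens are placeholders.
POS = [
    ("Country A", "NN"), ("will", "AD"), ("to", "P"), ("Country B", "NR"),
    ("provide", "VV"), ("urgent", "JJ"), ("的", "DEG"), ("humanitarian", "NN"),
    ("assistance", "NN"), ("，", "PU"), ("for", "VV"), ("solve", "VV"),
    ("Country B", "NR"), ("food", "NN"), ("and", "CC"), ("medical", "NN"),
    ("urgent need", "NN"), ("。", "PU"),
]

# Assumed screening standard: keep nouns, proper nouns, verbs and adjectives.
KEEP_TAGS = {"NN", "NR", "VV", "JJ"}

# Screening: drop function words and punctuation.
POS2 = [(word, tag) for word, tag in POS if tag in KEEP_TAGS]

# Deduplication: dict.fromkeys keeps the first occurrence and preserves order.
POS3 = list(dict.fromkeys(POS2))

print(len(POS2))  # 12 entries, "Country B" appears twice
print(len(POS3))  # 11 entries after removing the repeat
```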
Performing AES/CFB/NoPadding encryption and character string conversion on at least one word in the third part-of-speech tagging result set to determine a fourth part-of-speech tagging result set may include performing AES/CFB/NoPadding encryption and Base64 conversion on each word in the third part-of-speech tagging result set POS3 to obtain a fourth part-of-speech tagging result set POS4.
Here, AES/CFB/NoPadding encryption may be an encryption scheme that combines the Advanced Encryption Standard (AES) algorithm, Cipher Feedback (CFB) mode, and the NoPadding option.
AES is a symmetric encryption algorithm used to encrypt data. CFB mode encrypts by feeding the ciphertext back into the encryption process as input. NoPadding means that the data is not padded during encryption.
Base64 conversion may refer to converting ciphertext of each word encrypted by AES/CFB/NoPadding, after Base64 encoding, into a printable string form.
As an example, the fourth part-of-speech tagging result set obtained is POS4 = {NAwO0pxB, NQMX0YFqoo/IaMFb, Njsz0LRj, NwAE0opd, NA4Z3YtroKHcaeV0, NjsX0YBR, NyAL0LB2, OBMA0YxL, ORc80Zl5, NTgY05xv, NjQG3ZZ4}.
The sorting and splicing processing are performed on the fourth part-of-speech tagging result set, a target index value corresponding to the target encryption field is determined, and the target index value is stored in the target index field corresponding to the target encryption field, which may include:
And (3) sorting and splicing the fourth part-of-speech tagging result set POS4, determining a target index value IDX corresponding to the target encryption field, and storing the target index value IDX into a target index field FIDX corresponding to the target encryption field.
As an example, the target index value IDX may be as follows:
IDX = NA4Z3YtroKHcaeV0NAwO0pxBNQMX0YFqoo/IaMFbNTgY05xvNjQG3ZZ4NjsX0YBRNjsz0LRjNwAE0opdNyAL0LB2OBMA0YxLORc80Zl5
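The sorting and splicing step can be reproduced from the example values. The sketch below assumes that "sorting" means plain code-point ordering of the Base64 strings and that "splicing" means concatenation without a separator; both are assumptions, though together they yield the IDX value given above. The function name `build_index_value` is hypothetical.

```python
# POS4 from the example: Base64 strings of the encrypted words, in tagging order.
POS4 = [
    "NAwO0pxB", "NQMX0YFqoo/IaMFb", "Njsz0LRj", "NwAE0opd",
    "NA4Z3YtroKHcaeV0", "NjsX0YBR", "NyAL0LB2", "OBMA0YxL",
    "ORc80Zl5", "NTgY05xv", "NjQG3ZZ4",
]


def build_index_value(tokens: list[str]) -> str:
    """Sort tokens by code point and concatenate them without a separator."""
    return "".join(sorted(tokens))


IDX = build_index_value(POS4)
print(IDX)
```

Because the stored tokens are sorted, a query whose tokens are sorted the same way can be matched with a single in-order pattern.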
In the method and the device for fuzzy retrieval of the encrypted field of the database in the above embodiments of the present disclosure, the problem of inaccurate retrieval results caused by hash collision can be prevented by encrypting the vocabulary by using the AES/CFB/NoPadding encryption algorithm instead of the hash value. Through the sequence and the splice after encrypting the vocabulary, the target index value can be effectively matched with the vocabulary to be queried, so that the efficient fuzzy retrieval can be still realized under the encryption condition, the data safety is ensured, and the retrieval accuracy is improved. By means of de-duplication and sequencing of the part-of-speech tagging result set, storage of repeated data can be reduced, generated target index values are ensured to be as compact as possible, and occupation of storage space is reduced.
In one possible implementation manner of the step S102, when it is determined that the target encrypted field is being queried according to the execution statement, part-of-speech tagging is performed on the value to be queried, including:
Intercepting SQL execution sentences of the database, and when the database is determined to be executing the select sentences according to the SQL execution sentences, performing part-of-speech tagging on the values to be queried in the target encryption field, and determining a first set of tagged results to be queried.
In this embodiment, select statements are intercepted at the time of database execution, thereby capturing SQL requests containing encrypted fields. Wherein the select statement is used to query the database for records.
As an example, assume that the value to be queried for the target encrypted field F is "assistance of Country A". Part-of-speech tagging is performed on the value to be queried to obtain the first labeling result set to be queried SPOS = {Country A/NR, of/DEG, assistance/NN}.
In the method and the device for fuzzy retrieval of the encrypted field of the database, the part-of-speech tagging can help to identify useful information in the words to be queried, so that the fuzzy retrieval effect is optimized. Even if the field is encrypted, the effective fuzzy search processing can be performed based on the part-of-speech information of the plaintext query value.
In one possible implementation manner of the step S102, determining the index value to be queried corresponding to the value to be queried is implemented based on the following steps:
screening the first labeling result set to be queried according to a preset screening standard, and determining a second labeling result set to be queried;
Performing de-duplication processing on the second labeling result set to be queried to determine a third labeling result set to be queried, performing AES/CFB/NoPadding encryption and character string conversion on at least one word in the third labeling result set to be queried, and determining a fourth labeling result set to be queried;
And sequencing and splicing the fourth to-be-queried labeling result set to determine the index value to be queried corresponding to the value to be queried.
In this embodiment, the screening the first labeling result set to be queried according to the preset screening criteria, and determining the second labeling result set to be queried may include:
and screening the first labeling result set SPOS to be queried, and determining a second labeling result set SPOS2 to be queried.
As one example, SPOS2 = {Country A/NR, assistance/NN}.
Performing deduplication processing on the second labeling result set to be queried, and determining a third labeling result set to be queried may include:
and performing de-duplication processing on the second labeling result set SPOS2 to be queried, and determining a third labeling result set SPOS3 to be queried.
As one example, SPOS3 = {Country A/NR, assistance/NN}.
Performing AES/CFB/NoPadding encryption and character string conversion on at least one word in the third to-be-queried labeling result set to determine a fourth to-be-queried labeling result set, which may include:
And performing AES/CFB/NoPadding encryption and Base64 character string conversion on each word in the third to-be-queried labeling result set SPOS3, and determining a fourth to-be-queried labeling result set SPOS4.
As an example, SPOS4 = {NAwO0pxB, NjsX0YBR}.
The sorting and splicing processing are performed on the fourth to-be-queried labeling result set, and the determining of the to-be-queried index value corresponding to the to-be-queried value can comprise:
splicing the sorted fourth to-be-queried labeling result set by a first method, and taking the intersection of the at least one value in the fourth to-be-queried labeling result set as a first index value to be queried corresponding to the value to be queried;
and splicing the sorted fourth to-be-queried labeling result set by a second method, and taking the union of the at least one value in the fourth to-be-queried labeling result set as a second index value to be queried corresponding to the value to be queried.
Specifically, the elements of the fourth to-be-queried labeling result set SPOS4 are spliced with "%" so that the intersection of the values in SPOS4 serves as the index value to be queried SIDX corresponding to the value to be queried; alternatively, the elements of SPOS4 are spliced so that the union of the values in SPOS4 serves as the index value to be queried SIDX corresponding to the value to be queried.
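A minimal sketch of building the two query index forms from SPOS4 follows. The intersection form joins the sorted tokens with "%" into one LIKE pattern. The text does not state the separator used for the union form, so the sketch assumes one pattern per token, to be combined with OR by the caller. The function names are hypothetical.

```python
SPOS4 = ["NjsX0YBR", "NAwO0pxB"]  # encrypted query words, in arbitrary order


def intersection_pattern(tokens: list[str]) -> str:
    """One LIKE pattern requiring every token, in sorted order."""
    return "%" + "%".join(sorted(tokens)) + "%"


def union_patterns(tokens: list[str]) -> list[str]:
    """One LIKE pattern per token (assumed form; combine them with OR)."""
    return ["%" + token + "%" for token in sorted(tokens)]


print(intersection_pattern(SPOS4))  # %NAwO0pxB%NjsX0YBR%
print(union_patterns(SPOS4))        # ['%NAwO0pxB%', '%NjsX0YBR%']
```

Sorting the query tokens the same way as the stored index tokens matters for the intersection form: a pattern such as `%a%b%` only matches when `a` occurs before `b` in the stored string.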
In the method and the device for fuzzy retrieval of the encrypted field of the database according to the above embodiments, AES/CFB/NoPadding encryption of the words to be queried ensures that sensitive information in the query is protected during storage and retrieval. The encrypted labeling result set SPOS4 is sorted and spliced to form the final query index SIDX, so that database matching can be performed more efficiently. The intersection or the union may be selected as the final retrieval method according to requirements, which provides greater flexibility. Because it is produced by part-of-speech tagging followed by encryption, the query index SIDX does not contain the original field value, only a reduced, encrypted index value. This effectively reduces storage consumption while also speeding up queries.
In one possible implementation manner of the foregoing embodiment, based on a preset operator, matching a target index value with an index value to be queried in a database, and determining a fuzzy search result corresponding to the index value to be queried in the target index value includes:
based on a like operator of the database, matching a target index value with a first index value to be queried in the database, and determining a first fuzzy retrieval result corresponding to the first index value to be queried, wherein the first fuzzy retrieval result is an intersection;
And matching the target index value with a second index value to be queried in the database, and determining a second fuzzy search result corresponding to the second index value to be queried, wherein the second fuzzy search result is a union.
In this embodiment, matching the target index value with the first index value to be queried in the database based on the like operator of the database, and determining the first fuzzy retrieval result corresponding to the first index value to be queried, where the first fuzzy retrieval result is an intersection, may include: matching the target index value with the first index value to be queried in the database by using a first query condition, and determining the first fuzzy retrieval result corresponding to the first index value to be queried.
Here, the first query condition may be: select * from table where FIDX like '%NAwO0pxB%NjsX0YBR%'.
Matching the target index value with the second index value to be queried in the database, and determining the second fuzzy retrieval result corresponding to the second index value to be queried, where the second fuzzy retrieval result is a union, may include: matching the target index value with the second index value to be queried in the database by using a second query condition, and determining the second fuzzy retrieval result corresponding to the second index value to be queried.
Here, the second query condition may be: select * from table where (FIDX like '%NAwO0pxB%' or FIDX like '%NjsX0YBR%').
In the method and device for fuzzy retrieval of a database encryption field in the above embodiments of the present disclosure, the target index value is matched with the index value to be queried by using the like operator of the database, so that the system can perform the lookup quickly with the existing index structure of the database. Query conditions are processed separately in intersection mode and in union mode, so that records meeting the conditions can be screened out more accurately, unnecessary database scanning and computation are reduced, and the overall retrieval efficiency is improved.
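The difference between the two query conditions can be tried out with a small in-memory example. This is a sketch under stated assumptions: it uses Python's standard sqlite3 module rather than the database of the embodiment, the table name records and the sample FIDX values are invented for illustration, and the patterns are passed as bound parameters instead of being written into the statement text.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table with a few invented FIDX index values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, FIDX TEXT)")
    conn.executemany(
        "INSERT INTO records (id, FIDX) VALUES (?, ?)",
        [
            (1, "NAwO0pxBNjsX0YBR"),  # contains both tokens
            (2, "NAwO0pxBQq7Lm2Zt"),  # contains only the first token
            (3, "NjsX0YBRZz9Kp1Xw"),  # contains only the second token
            (4, "Aa1Bb2Cc"),          # contains neither token
        ],
    )
    return conn


def search_intersection(conn, tokens):
    """First query condition: one LIKE pattern requiring every token."""
    if not tokens:
        return []
    pattern = "%" + "%".join(sorted(tokens)) + "%"
    rows = conn.execute(
        "SELECT id FROM records WHERE FIDX LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


def search_union(conn, tokens):
    """Second query condition: one LIKE per token, combined with OR."""
    if not tokens:
        return []
    where = " OR ".join("FIDX LIKE ?" for _ in tokens)
    params = ["%" + token + "%" for token in sorted(tokens)]
    rows = conn.execute(
        f"SELECT id FROM records WHERE ({where}) ORDER BY id", params
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_demo_db()
    tokens = ["NAwO0pxB", "NjsX0YBR"]
    print("intersection:", search_intersection(conn, tokens))
    print("union:", search_union(conn, tokens))
```

Note that SQLite's LIKE is case-insensitive for ASCII characters by default, so a production system using case-sensitive tokens would need to account for the matching rules of its own database.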
Referring to fig. 3, fig. 3 is a schematic diagram of a target index value construction flow of a database encryption field fuzzy retrieval method according to an embodiment of the disclosure, where the flow may include the following steps:
Step S301, intercepting an insert statement or an update statement of data, and obtaining the value P of the encryption field F;
Step S302, performing part-of-speech tagging on P to obtain POS;
Step S303, screening POS to obtain POS2;
Step S304, de-duplicating POS2 to obtain POS3;
Step S305, performing AES/CFB/NoPadding encryption and character string conversion on POS3 to obtain POS4;
Step S306, sorting and splicing POS4 to obtain the target index value IDX;
Step S307, storing IDX in the FIDX field.
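Steps S302 to S306 can be sketched as a small pipeline. Everything below is illustrative and rests on labeled assumptions: the toy lexicon stands in for a real part-of-speech tagger, the set of kept tags stands in for the preset screening standard, and a keyed HMAC-SHA256 digest rendered as hexadecimal stands in for AES/CFB/NoPadding encryption plus character string conversion, purely so that the sketch runs with the Python standard library. The key, function names, and token length are hypothetical.

```python
import hashlib
import hmac

DEMO_KEY = b"demo-key-not-for-production"  # hypothetical key, illustration only

# Toy lexicon standing in for a real part-of-speech tagger (assumption).
TOY_LEXICON = {"the": "DET", "a": "DET", "of": "ADP", "in": "ADP", "is": "VERB"}
KEPT_TAGS = {"NOUN"}  # hypothetical screening standard: keep nouns only


def tag_parts_of_speech(text):
    """S302: return (word, tag) pairs; words not in the lexicon default to NOUN."""
    return [(word, TOY_LEXICON.get(word.lower(), "NOUN")) for word in text.split()]


def encrypt_token(word):
    """S305 stand-in: deterministic keyed digest as a 16-character hex string.

    NOT AES/CFB/NoPadding. Hex output is used so that tokens never contain
    the LIKE wildcards '%' or '_'.
    """
    digest = hmac.new(DEMO_KEY, word.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def build_target_index(plain_value):
    """Run S302 to S306 on the plaintext value P and return the index value IDX."""
    pos = tag_parts_of_speech(plain_value)              # S302: tagging
    pos2 = [word for word, tag in pos if tag in KEPT_TAGS]  # S303: screening
    pos3 = list(dict.fromkeys(pos2))                    # S304: de-duplication
    pos4 = [encrypt_token(word) for word in pos3]       # S305: encrypt + convert
    return "".join(sorted(pos4))                        # S306: sort and splice


if __name__ == "__main__":
    print(build_target_index("the address of Zhang Yan in Beijing"))
```

The token transformation has to be deterministic for a given word and key; otherwise the index value built at insert time could never match the one built at query time.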
Referring to fig. 4, fig. 4 is a schematic diagram of a fuzzy search result obtaining flow of a fuzzy search method for encrypting a field in a database according to an embodiment of the disclosure, where the flow may include the following steps:
Step S401, intercepting a select statement of data, and obtaining the value S to be retrieved in the encryption field F;
Step S402, performing part-of-speech tagging on S to obtain SPOS;
Step S403, screening SPOS to obtain SPOS2;
Step S404, de-duplicating SPOS2 to obtain SPOS3;
Step S405, performing AES/CFB/NoPadding encryption and character string conversion on SPOS3 to obtain SPOS4;
Step S406, confirming the retrieval method: if retrieval is performed by intersection, going to step S407; if retrieval is performed by union, going to step S408;
Here, the splicing method of SPOS4 is determined, wherein if the first method is used for splicing, the first query condition is used for retrieval;
Step S407, select * from table where FIDX like '%NAwO0pxB%NjsX0YBR%';
Here, the target index value is matched with the first index value to be queried in the database by using the first query condition, and the first fuzzy retrieval result corresponding to the first index value to be queried is determined;
Step S408, select * from table where (FIDX like '%NAwO0pxB%' or FIDX like '%NjsX0YBR%');
Here, the target index value is matched with the second index value to be queried in the database by using the second query condition, and the second fuzzy retrieval result corresponding to the second index value to be queried is determined.
In one embodiment, a database encryption field fuzzy retrieval device 50 is provided, which corresponds one-to-one to the database encryption field fuzzy retrieval method in the above embodiments. As shown in fig. 5, the database encryption field fuzzy retrieval device 50 includes a first part-of-speech tagging module 501, a second part-of-speech tagging module 502, and a fuzzy retrieval module 503, and each functional module is described in detail as follows:
The first part-of-speech tagging module 501 is configured to intercept an execution statement of a database, and when determining that a target encryption field is being inserted or updated according to the execution statement, perform part-of-speech tagging on the target encryption field, and determine a target index value corresponding to the target encryption field;
The second part of speech tagging module 502 is configured to perform the part of speech tagging on a value to be queried to determine an index value to be queried corresponding to the value to be queried when it is determined that the target encryption field is being queried according to the execution statement;
And the fuzzy retrieval module 503 is configured to match the target index value with the index value to be queried in the database based on a preset operator, and determine a fuzzy retrieval result corresponding to the index value to be queried in the target index value, where the fuzzy retrieval result is an intersection or a union.
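How an intercepted execution statement might be routed to the modules above can be sketched as follows. This is a simplified assumption, not the device's actual logic: the function name classify_statement and the returned labels are hypothetical, and the sketch only inspects the leading keyword, whereas a real interceptor would also need to parse the statement and check that the target encryption field is actually involved.

```python
import re

# Match the leading SQL keyword, ignoring case and leading whitespace.
_LEADING_KEYWORD = re.compile(r"^\s*(insert|update|select)\b", re.IGNORECASE)


def classify_statement(sql):
    """Decide which processing path an intercepted statement should take.

    Returns "build_target_index" for insert/update (module 501),
    "build_query_index" for select (modules 502 and 503), and
    "passthrough" for anything else.
    """
    match = _LEADING_KEYWORD.match(sql)
    if match is None:
        return "passthrough"
    keyword = match.group(1).lower()
    if keyword in ("insert", "update"):
        return "build_target_index"
    return "build_query_index"


if __name__ == "__main__":
    for statement in (
        "INSERT INTO records (F) VALUES (?)",
        "update records set F = ? where id = 1",
        "SELECT * FROM records WHERE F LIKE ?",
        "DELETE FROM records WHERE id = 1",
    ):
        print(classify_statement(statement), "<-", statement)
```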
In a specific embodiment, the first part of speech tagging module 501 is configured to intercept an SQL execution statement of a database, and perform part of speech tagging on a target encrypted field when it is determined that the database is executing an insert statement or an update statement according to the SQL execution statement, so as to determine a first part of speech tagging result set.
In a specific embodiment, the first part of speech tagging module 501 is configured to add a corresponding target index field to a target encryption field in the database;
Screening the first part-of-speech tagging result set according to a preset screening standard to determine a second part-of-speech tagging result set;
performing de-duplication processing on the second part-of-speech tagging result set, determining a third part-of-speech tagging result set, performing AES/CFB/NoPadding encryption and character string conversion on at least one word in the third part-of-speech tagging result set, and determining a fourth part-of-speech tagging result set;
and sequencing and splicing the fourth part-of-speech tagging result set, determining a target index value corresponding to the target encryption field, and storing the target index value into the target index field corresponding to the target encryption field.
In a specific embodiment, the second part-of-speech tagging module 502 is configured to intercept an SQL execution statement of a database, and perform part-of-speech tagging on a value to be queried in a target encryption field when it is determined that the database is executing a select statement according to the SQL execution statement, to determine a first set of tagged results to be queried.
In a specific embodiment, the second part-of-speech tagging module 502 is configured to filter the first to-be-queried tagging result set according to a preset filtering criterion, and determine a second to-be-queried tagging result set;
performing de-duplication processing on the second labeling result set to be queried to determine a third labeling result set to be queried, performing AES/CFB/NoPadding encryption and character string conversion on at least one word in the third labeling result set to be queried, and determining a fourth labeling result set to be queried;
And sequencing and splicing the fourth to-be-queried labeling result set to determine the to-be-queried index value corresponding to the to-be-queried value.
In a specific embodiment, the second part-of-speech tagging module 502 is configured to splice the sorted fourth part-of-speech tagging result set by using a first method, and take an intersection of at least one value to be queried in the fourth part-of-speech tagging result set as a first index value to be queried corresponding to the value to be queried;
And splicing the sorted fourth part-of-speech tagging result sets by adopting a second method, and taking the union of at least one value to be queried in the fourth part-of-speech tagging result sets as a second index value to be queried corresponding to the value to be queried.
In a specific embodiment, the fuzzy search module 503 is configured to match, in the database, the target index value with a first index value to be queried based on a like operator of the database, and determine a first fuzzy search result corresponding to the first index value to be queried, where the first fuzzy search result is an intersection;
and matching the target index value with a second index value to be queried in the database, and determining a second fuzzy search result corresponding to the second index value to be queried, wherein the second fuzzy search result is a union.
It should be noted that, when the device provided in the foregoing embodiment implements the corresponding database encryption field fuzzy retrieval method, the division into the program modules described above is only an example. In practical applications, the processing may be allocated to different program modules as needed; that is, the internal structure of the system may be divided into different program modules to complete all or part of the processing described above. In addition, the system provided in the foregoing embodiment belongs to the same concept as the method embodiment shown in fig. 1; its specific implementation process is detailed in the method embodiment and is not repeated here.
The embodiment of the disclosure also provides a computer device, which is provided with the database encryption field fuzzy retrieval device shown in the figure 5.
Referring to fig. 6, fig. 6 is a schematic structural diagram of another database encryption field fuzzy retrieval device according to an embodiment of the present disclosure. As shown in fig. 6, the computer device includes one or more processors 10, a memory 20, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is taken as an example in fig. 6.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.
The memory 20 stores instructions executable by the at least one processor 10, so that the at least one processor 10 performs the methods shown in the above embodiments.
The memory 20 may include a storage program area that may store an operating system, application programs required for at least one function, and a storage data area that may store data created according to the use of the computer device, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The memory 20 may comprise volatile memory, such as random access memory, or nonvolatile memory, such as flash memory, hard disk or solid state disk, or the memory 20 may comprise a combination of the above types of memory.
The computer device further comprises an input device 30 and an output device 40. The processor 10, the memory 20, the input device 30, and the output device 40 may be connected by a bus or by other means; connection by a bus is taken as an example in fig. 6.
The input device 30 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device; examples include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointer stick, one or more mouse buttons, a trackball, a joystick, and the like. The output device 40 may include a display device, an auxiliary lighting device (e.g., LEDs), a tactile feedback device (e.g., a vibration motor), and the like. Such display devices include, but are not limited to, liquid crystal displays, light-emitting diode displays, and plasma displays. In some alternative implementations, the display device may be a touch screen.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present disclosure also provide a computer-readable storage medium. The methods according to the embodiments of the present disclosure described above may be implemented in hardware or firmware, or as computer code recorded on a storage medium, or as computer code that is originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network, and stored on a local storage medium, so that the methods described herein can be carried out by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random-access memory, a flash memory, a hard disk, a solid-state disk, or the like; further, the storage medium may include a combination of the above types of memory. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated by the above embodiments.
Portions of the present disclosure may be applied as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present disclosure by way of operation of the computer. Those skilled in the art will appreciate that the existence of computer program instructions in a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and accordingly, the manner in which computer program instructions are executed by a computer includes, but is not limited to, the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled programs, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed programs. Herein, a computer-readable medium may be any available computer-readable storage medium or communication medium that can be accessed by a computer.
Although embodiments of the present disclosure have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the disclosure, and such modifications and variations are within the scope defined by the appended claims.

Claims (10)

1. A method for fuzzy retrieval of encrypted fields in a database, characterized in that the method comprises:
intercepting an execution statement of a database, and when it is determined according to the execution statement that a target encrypted field is being inserted or updated, performing part-of-speech tagging on the target encrypted field to determine a target index value corresponding to the target encrypted field;
when it is determined according to the execution statement that the target encrypted field is being queried, performing the part-of-speech tagging on a value to be queried to determine an index value to be queried corresponding to the value to be queried;
based on a preset operator, matching the target index value with the index value to be queried in the database to determine a fuzzy retrieval result corresponding to the index value to be queried in the target index value, wherein the fuzzy retrieval result is an intersection or a union.

2. The method according to claim 1, characterized in that intercepting the execution statement of the database, and when it is determined according to the execution statement that the target encrypted field is being inserted or updated, performing part-of-speech tagging on the target encrypted field, comprises:
intercepting an SQL execution statement of the database, and when it is determined according to the SQL execution statement that the database is executing an insert statement or an update statement, performing part-of-speech tagging on the target encrypted field to determine a first part-of-speech tagging result set.

3. The method according to claim 2, characterized in that determining the target index value corresponding to the target encrypted field is achieved based on the following steps:
adding a corresponding target index field for the target encrypted field in the database;
screening the first part-of-speech tagging result set according to a preset screening criterion to determine a second part-of-speech tagging result set;
de-duplicating the second part-of-speech tagging result set to determine a third part-of-speech tagging result set, and performing AES/CFB/NoPadding encryption and string conversion on at least one word in the third part-of-speech tagging result set to determine a fourth part-of-speech tagging result set;
sorting and concatenating the fourth part-of-speech tagging result set to determine the target index value corresponding to the target encrypted field, and storing the target index value in the target index field corresponding to the target encrypted field.

4. The method according to claim 1, characterized in that when it is determined according to the execution statement that the target encrypted field is being queried, performing the part-of-speech tagging on the value to be queried comprises:
intercepting an SQL execution statement of the database, and when it is determined according to the SQL execution statement that the database is executing a select statement, performing part-of-speech tagging on the value to be queried in the target encrypted field to determine a first to-be-queried tagging result set.

5. The method according to claim 4, characterized in that determining the index value to be queried corresponding to the value to be queried is achieved based on the following steps:
screening the first to-be-queried tagging result set according to a preset screening criterion to determine a second to-be-queried tagging result set;
de-duplicating the second to-be-queried tagging result set to determine a third to-be-queried tagging result set, and performing AES/CFB/NoPadding encryption and string conversion on at least one word in the third to-be-queried tagging result set to determine a fourth to-be-queried tagging result set;
sorting and concatenating the fourth to-be-queried tagging result set to determine the index value to be queried corresponding to the value to be queried.

6. The method according to claim 5, characterized in that sorting and concatenating the fourth to-be-queried tagging result set to determine the index value to be queried corresponding to the value to be queried comprises:
concatenating the sorted fourth part-of-speech tagging result set by a first method, and taking the intersection of at least one value to be queried in the fourth part-of-speech tagging result set as a first index value to be queried corresponding to the value to be queried;
concatenating the sorted fourth part-of-speech tagging result set by a second method, and taking the union of at least one value to be queried in the fourth part-of-speech tagging result set as a second index value to be queried corresponding to the value to be queried.

7. The method according to claim 6, characterized in that, based on the preset operator, matching the target index value with the index value to be queried in the database to determine the fuzzy retrieval result corresponding to the index value to be queried in the target index value comprises:
based on a like operator of the database, matching the target index value with the first index value to be queried in the database to determine a first fuzzy retrieval result corresponding to the first index value to be queried, wherein the first fuzzy retrieval result is an intersection;
matching the target index value with the second index value to be queried in the database to determine a second fuzzy retrieval result corresponding to the second index value to be queried, wherein the second fuzzy retrieval result is a union.

8. A device for fuzzy retrieval of encrypted fields in a database, characterized in that the device comprises:
a first part-of-speech tagging module, configured to intercept an execution statement of a database, and when it is determined according to the execution statement that a target encrypted field is being inserted or updated, perform part-of-speech tagging on the target encrypted field to determine a target index value corresponding to the target encrypted field;
a second part-of-speech tagging module, configured to, when it is determined according to the execution statement that the target encrypted field is being queried, perform the part-of-speech tagging on a value to be queried to determine an index value to be queried corresponding to the value to be queried;
a fuzzy retrieval module, configured to match the target index value with the index value to be queried in the database based on a preset operator, and determine a fuzzy retrieval result corresponding to the index value to be queried in the target index value, wherein the fuzzy retrieval result is an intersection or a union.

9. A computer-readable storage medium, characterized in that computer instructions are stored on the computer-readable storage medium, and the computer instructions are used to cause a computer to execute the method for fuzzy retrieval of encrypted fields in a database according to claims 1-7.

10. A computer program product, characterized in that it comprises computer instructions, and the computer instructions are used to cause a computer to execute the method for fuzzy retrieval of encrypted fields in a database according to any one of claims 1 to 7.
CN202411773990.2A 2024-12-04 2024-12-04 Fuzzy retrieval method and device for encrypted field of database Pending CN119829641A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411773990.2A CN119829641A (en) 2024-12-04 2024-12-04 Fuzzy retrieval method and device for encrypted field of database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411773990.2A CN119829641A (en) 2024-12-04 2024-12-04 Fuzzy retrieval method and device for encrypted field of database

Publications (1)

Publication Number Publication Date
CN119829641A true CN119829641A (en) 2025-04-15

Family

ID=95294568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411773990.2A Pending CN119829641A (en) 2024-12-04 2024-12-04 Fuzzy retrieval method and device for encrypted field of database

Country Status (1)

Country Link
CN (1) CN119829641A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination