CN118944913A - Data encryption management method and system - Google Patents
Data encryption management method and system
- Publication number
- CN118944913A, CN202410930253.2A
- Authority
- CN
- China
- Prior art keywords
- data
- ciphertext
- word segmentation
- query
- term
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0435—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply symmetric encryption, i.e. same key used for encryption and decryption
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0442—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/14—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using a plurality of keys or algorithms
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Automation & Control Theory (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Storage Device Security (AREA)
Abstract
Embodiments of the invention provide a data encryption management method and system, belonging to the technical field of data security. The method is executed by a user terminal and comprises: collecting data to be stored, performing word segmentation on the data to be stored, and, after the word segmentation is complete, asymmetrically encrypting the data to be stored to obtain encrypted data; symmetrically encrypting the plurality of word segmentation terms obtained by the word segmentation to obtain term ciphertext; sending the encrypted data and the term ciphertext to a cloud so that the cloud performs associated storage of the data; initiating a query request so that the cloud performs term ciphertext matching based on the query terms in the query request, and retrieving the encrypted data corresponding to the term ciphertext matched by the cloud; and decrypting the retrieved encrypted data to obtain query data. The scheme of the invention enables efficient fuzzy retrieval of data without breaking data encryption.
Description
Technical Field
The invention relates to the technical field of data security, in particular to a data encryption management method and a data encryption management system.
Background
Unencrypted data carries risks of leakage, tampering, regulatory violation, and loss of trust. Such data is susceptible to theft or tampering in transmission or storage, which exposes confidential information and causes serious financial and reputational losses to companies or individuals. In some industries and sectors (e.g., finance, healthcare, government agencies), leaving data unencrypted may also violate relevant regulations and legal provisions, increasing legal risk and the risk of other adverse effects.
Thus, to avoid these potential problems and risks, the data must be encrypted. Encryption ensures that data is not stolen or tampered with in transmission and storage and helps enterprises and individuals comply with relevant laws and regulations, further improving data security and protecting confidential information. In the current information age, encryption technology has become one of the important means of guaranteeing information security. Data encryption is not only a necessary condition for enterprise information security management, but also an inevitable trend in the informatization of society.
Although data encryption is a basic security measure, it adds difficulty and cost to data retrieval: encrypted data cannot be matched or queried as text directly and must first be decrypted before it can be searched. How to perform fuzzy data retrieval efficiently without breaking data encryption has therefore become a problem to be solved. Two solutions are currently available:
1) Use the encryption and decryption functions of the database to implement an encryption and decryption algorithm consistent with that of the program, and modify the fuzzy query condition so that the database function first decrypts the data and then performs the fuzzy search. The advantage of this approach is its low implementation, development, and usage cost: only a slight modification of the existing fuzzy search is required. The drawback is equally clear: the encrypted data cannot hit the database index, so retrieval efficiency drops sharply, and some databases cannot even guarantee an encryption and decryption algorithm consistent with that of the program.
2) Perform word segmentation and combination on the ciphertext data, encrypt each element of the resulting set separately, and store the encrypted results in an extension column, which is then searched by fuzzy query. However, the extension column in the table may contain a large number of repeated word segments, and index failure is difficult to avoid.
Because efficient fuzzy data retrieval cannot currently be achieved without breaking data encryption, a new data encryption management scheme is needed.
Disclosure of Invention
Embodiments of the invention aim to provide a data encryption management method and system that at least solve the current problem that efficient fuzzy retrieval of data cannot be achieved without breaking data encryption.
To achieve the above object, a first aspect of the present invention provides a data encryption management method executed by a user terminal, the method comprising: collecting data to be stored, performing word segmentation on the data to be stored, and, after the word segmentation is complete, asymmetrically encrypting the data to be stored to obtain encrypted data; symmetrically encrypting the plurality of word segmentation terms obtained by the word segmentation to obtain term ciphertext; sending the encrypted data and the term ciphertext to a cloud so that the cloud performs associated storage of the data; initiating a query request so that the cloud performs term ciphertext matching based on the query terms in the query request, and retrieving the encrypted data corresponding to the term ciphertext matched by the cloud; and decrypting the retrieved encrypted data to obtain query data.
Optionally, performing word segmentation on the data to be stored includes: initializing a pre-selected word segmentation tool, and loading the word segmentation tool after the initialization is complete; and performing word segmentation on the plaintext of the data to be stored with the word segmentation tool to obtain a plurality of word segmentation terms.
Optionally, asymmetrically encrypting the data to be stored after the word segmentation to obtain encrypted data includes: encrypting the data to be stored with the public key after the word segmentation to obtain the encrypted data; storing the encrypted data in a ciphertext table and obtaining a data ciphertext ID; and sending the private key and the data ciphertext ID to the cloud.
Optionally, symmetrically encrypting the plurality of word segmentation terms to obtain term ciphertext includes: generating a symmetric encryption key, and symmetrically encrypting the plurality of word segmentation terms with the symmetric encryption key to obtain the term ciphertext; and sending the symmetric encryption key and the term ciphertext to the cloud.
Optionally, causing the cloud to perform associated storage of the data includes the cloud being configured to: determine whether the term ciphertext already exists in a stored term ciphertext table; if it exists, store the association between the term ciphertext and the data ciphertext ID; if it does not exist, store the term ciphertext in the stored term ciphertext table and then store the association between the term ciphertext and the data ciphertext ID.
Optionally, initiating a query request so that the cloud performs term ciphertext matching based on the query terms in the query request, and retrieving the encrypted data corresponding to the matched term ciphertext, includes the cloud being configured to: obtain a plaintext word segmentation term from the query request, and encrypt it with the symmetric encryption key uploaded by the user terminal to obtain a query term ciphertext; determine whether the query term ciphertext exists in the stored term ciphertext table; if it does not exist, determine that no corresponding query data exists and feed back a query failure signal to the user terminal; if it exists, feed back to the user terminal a ciphertext table of the data ciphertext IDs corresponding to the matched term ciphertext.
Optionally, decrypting the retrieved encrypted data to obtain the query data includes: acquiring the corresponding private key from the cloud based on the data ciphertext IDs in the fed-back ciphertext table; and decrypting the ciphertext table with the private key to obtain a plaintext list, and pushing the plaintext list to the user as the query data.
A second aspect of the present invention provides a user terminal, comprising: an acquisition unit for collecting data to be stored, performing word segmentation on the data to be stored, and, after the word segmentation is complete, asymmetrically encrypting the data to be stored to obtain encrypted data; a term encryption unit for symmetrically encrypting the plurality of word segmentation terms obtained by the word segmentation to obtain term ciphertext; a transmission unit for sending the encrypted data and the term ciphertext to the cloud so that the cloud performs associated storage of the data; a request unit for initiating a query request so that the cloud performs term ciphertext matching based on the query terms in the query request, and for retrieving the encrypted data corresponding to the term ciphertext matched by the cloud; and a decryption unit for decrypting the retrieved encrypted data to obtain query data.
A third aspect of the present invention provides a data encryption management system, comprising the user terminal described above and a cloud communicatively connected to the user terminal.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon instructions that, when executed on a computer, cause the computer to perform the data encryption management method described above.
With the above technical scheme, data is stored in encrypted form together with its encrypted word segmentation terms, so that the fuzzy retrieval performance of the database is improved without affecting business development or the use of the database's core features. The original text content is not exposed, which provides a degree of confidentiality and security. The encryption algorithm can be customized at storage time, the scheme is completely independent of the database application system, and data encryption can be achieved without changing the database application system. Finally, the fuzzy search problem is handled at the storage layer, which reduces intrusiveness to the business and makes business development simpler and more efficient.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain, without limitation, the embodiments of the invention. In the drawings:
FIG. 1 is a flow chart of steps of a data encryption management method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a user terminal according to an embodiment of the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
Fig. 1 is a flowchart of a method for managing data encryption according to an embodiment of the present invention. As shown in fig. 1, an embodiment of the present invention provides a data encryption management method, including:
Step S10: collecting data to be stored, performing word segmentation on the data to be stored, and, after the word segmentation is complete, asymmetrically encrypting the data to be stored to obtain encrypted data.
Specifically, a pre-selected word segmentation tool is initialized and, once initialization is complete, loaded; word segmentation is then performed on the plaintext of the data to be stored with the word segmentation tool to obtain a plurality of word segmentation terms.
In the embodiment of the invention, the choice of word segmentation tool is the basis of the whole word segmentation process. Different word segmentation tools use different algorithms and produce different results; common tools include Jieba, Ansj, and THULAC, each with advantages and disadvantages when processing Chinese text. For example, the Jieba word segmentation tool uses a dictionary-based segmentation algorithm built on a Trie structure and supports three modes (accurate mode, full mode, and search engine mode) suited to different application scenarios; Ansj is based on a deep learning model and offers high segmentation accuracy and efficiency. The scheme of the invention allows the word segmentation tool to be customized according to user requirements. After an appropriate word segmentation tool is selected, it must be initialized. Initialization includes loading the segmentation dictionary, configuring segmentation parameters, and so on, and is critical to ensuring that the tool functions properly. For example, in Jieba, initialization includes loading a user-defined dictionary so that the segmentation results can include domain-specific nouns and new words; in Ansj, initialization may involve loading model files and configuring segmentation policies.
Further, unstructured text data may be converted into structured data through word segmentation. The structured data is convenient for subsequent database storage and indexing, and improves the efficiency of data retrieval. For example, in a search engine, structured word segmentation data may accelerate the keyword retrieval and matching process.
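As a rough illustration of dictionary-based word segmentation, the following minimal sketch applies forward maximum matching against a small dictionary. It is a simplified stand-in and not the algorithm of Jieba, Ansj, or THULAC; the function name `segment`, the `max_len` parameter, and the sample dictionary are assumptions made for this example only.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word.

    Hypothetical helper for illustration; real tools use richer models.
    """
    terms = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink; a single character
        # is always accepted so that the loop makes progress.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                terms.append(candidate)
                i += size
                break
    return terms


# Assumed sample dictionary, standing in for a loaded segmentation dictionary.
dictionary = {"数据", "加密", "管理", "方法"}
terms = segment("数据加密管理方法", dictionary)
print(terms)
```

The resulting list of terms is the structured form that the later steps encrypt and index; a production deployment would replace `segment` with the selected word segmentation tool.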
Preferably, asymmetrically encrypting the data to be stored after the word segmentation to obtain encrypted data includes: encrypting the data to be stored with the public key after the word segmentation to obtain the encrypted data; storing the encrypted data in a ciphertext table and obtaining a data ciphertext ID; and sending the private key and the data ciphertext ID to the cloud.
In an embodiment of the present invention, asymmetric encryption (also referred to as public key encryption) is an encryption technique that uses a pair of keys: a public key, used to encrypt data, and a private key, used to decrypt it. A major advantage of this technique is that confidentiality is preserved even though the public key is public: anyone can use the public key to encrypt data, but only the holder of the corresponding private key can decrypt it, so intercepted ciphertext cannot be read without the private key. After the word segmentation is complete, the data is encrypted with the public key and the encrypted data is stored in a ciphertext table. The ciphertext table is a database table dedicated to storing encrypted data; each record contains the encrypted data and a unique ciphertext ID that identifies the record. For security management, the private key and the ciphertext ID are sent to the cloud for storage and management. The cloud may provide higher security and availability, ensuring that the private key is not compromised or lost. When data needs to be decrypted, the corresponding encrypted data is located through the ciphertext ID and decrypted with the private key.
According to the scheme of the invention, asymmetric encryption ensures the security of the data during storage and transmission. Even if an attacker intercepts the encrypted data, the attacker cannot decrypt it without the corresponding private key. Asymmetric encryption not only protects the confidentiality of the data but also helps ensure its integrity, preventing the data from being tampered with during transmission and storage.
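The public-key encryption and ciphertext-table steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses textbook RSA with tiny, insecure parameters so that it runs on the Python standard library alone, and the names `encrypt_with_public_key`, `decrypt_with_private_key`, `store_encrypted`, and `ciphertext_table` are hypothetical. A real system would use a vetted cryptographic library with proper key sizes and padding.

```python
import itertools

# Toy textbook-RSA parameters (far too small to be secure; illustration only).
P, Q = 61, 53
N = P * Q                   # modulus shared by the public and private key
PHI = (P - 1) * (Q - 1)
E = 17                      # public exponent
D = pow(E, -1, PHI)         # private exponent (modular inverse, Python 3.8+)


def encrypt_with_public_key(plaintext):
    """Encrypt each UTF-8 byte separately with the public key (E, N)."""
    return [pow(b, E, N) for b in plaintext.encode("utf-8")]


def decrypt_with_private_key(ciphertext):
    """Recover the original bytes with the private key (D, N)."""
    return bytes(pow(c, D, N) for c in ciphertext).decode("utf-8")


ciphertext_table = {}       # data ciphertext ID -> encrypted data
_next_id = itertools.count(1)


def store_encrypted(plaintext):
    """Encrypt a record, store it in the ciphertext table, return its ID."""
    data_id = next(_next_id)
    ciphertext_table[data_id] = encrypt_with_public_key(plaintext)
    return data_id


data_id = store_encrypted("数据加密管理方法")
recovered = decrypt_with_private_key(ciphertext_table[data_id])
```

Only the ciphertext ID and the encrypted record are needed to locate the data later; decryption succeeds only with the private exponent.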
Step S20: symmetrically encrypting the plurality of word segmentation terms obtained by the word segmentation to obtain term ciphertext.
Specifically, a symmetric encryption key is generated, and the plurality of word segmentation terms obtained by the word segmentation are symmetrically encrypted with this key to obtain the term ciphertext; the symmetric encryption key and the term ciphertext are then sent to the cloud.
In the embodiment of the invention, symmetric encryption is a technique widely applied in modern information security that uses a single key for both encryption and decryption. The symmetric encryption process comprises three main steps: key generation, encryption, and decryption.
Further, a symmetric encryption key must be generated. Common symmetric encryption algorithms include the Advanced Encryption Standard (AES), the Data Encryption Standard (DES), and the Triple Data Encryption Standard (3DES). Key generation is described here using AES as an example. AES supports key lengths of 128, 192, and 256 bits; a 256-bit AES key may be generated, for example, with the cryptography library in Python. Symmetric encryption is then performed on the plurality of word segmentation terms obtained by the word segmentation. Word segmentation is a common technique in Natural Language Processing (NLP) that splits text into individual words, phrases, or other text segments. Finally, the symmetric encryption key and the term ciphertext are sent to the cloud. Cloud storage may use, for example, AWS S3, Google Cloud Storage, or another cloud storage service. HTTPS is used for transmission to ensure the security of the data in transit.
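A minimal sketch of key generation and per-term protection is shown below, using only the Python standard library. Note the substitution: because the standard library has no AES, the sketch derives a deterministic keyed token for each term with HMAC-SHA256 instead of AES encryption. Unlike AES, this token is one-way and cannot be decrypted; it is used here only because matching in the cloud requires that equal terms produce equal outputs under the same key. The names `generate_key` and `term_token` are assumptions for this example.

```python
import hashlib
import hmac
import secrets


def generate_key():
    """Generate a random 256-bit key (32 bytes), the size of an AES-256 key."""
    return secrets.token_bytes(32)


def term_token(key, term):
    """Deterministic keyed token for one term (HMAC-SHA256 stand-in for AES)."""
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


key = generate_key()
terms = ["数据", "加密", "数据"]
term_ciphertexts = [term_token(key, t) for t in terms]

# Equal terms map to equal tokens under the same key, which is what lets the
# cloud match a query term against stored term ciphertext.
print(term_ciphertexts[0] == term_ciphertexts[2])
```

With a real AES implementation, the same matching property would require a deterministic mode of operation, which is a design decision the text above does not specify.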
Step S30: sending the encrypted data and the term ciphertext to the cloud so that the cloud performs associated storage of the data.
Specifically, the cloud determines whether the term ciphertext already exists in a stored term ciphertext table; if it exists, the cloud stores the association between the term ciphertext and the data ciphertext ID; if it does not exist, the cloud stores the term ciphertext in the stored term ciphertext table and then stores the association between the term ciphertext and the data ciphertext ID.
In the embodiment of the invention, during cloud data processing, term ciphertext is the ciphertext produced by segmenting sensitive data and encrypting the resulting terms. To keep the data secure, this term ciphertext must be managed and stored efficiently. When new term ciphertext is generated, the system first queries the stored term ciphertext table to determine whether the term ciphertext already exists. This lookup is implemented with a hash algorithm and an encryption algorithm to keep the query both efficient and secure. If the term ciphertext exists, it has been processed and stored before, and the system only stores the association between the term ciphertext and the new data ciphertext ID, so that the relevant data can be located and retrieved quickly in subsequent queries and processing. If the term ciphertext does not exist, the system first stores it in the stored term ciphertext table and then stores the association between the term ciphertext and the data ciphertext ID, ensuring the integrity and traceability of the data.
Based on the scheme of the invention, by determining whether the term ciphertext already exists, the system avoids storing and processing the same term ciphertext repeatedly. This reduces storage usage and improves data processing efficiency, and it can significantly reduce system load, particularly in large-scale data processing environments. During transmission and storage, all word segmentation terms and data are encrypted, which keeps the data secure: even when stored and processed in the cloud, sensitive data is not revealed or tampered with. The existence check is in effect a data deduplication operation. Through this mechanism the system avoids storing redundant data and thereby saves storage resources, which matters greatly for cloud computing platforms that handle massive amounts of data.
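The associated-storage logic on the cloud side can be sketched with an in-memory mapping from term ciphertext to the set of data ciphertext IDs that contain it. The names `term_table` and `associate`, and the sample values, are assumptions for this illustration; an actual deployment would back this with database tables.

```python
term_table = {}  # term ciphertext -> set of associated data ciphertext IDs


def associate(term_ciphertext, data_id):
    """Store the association; return True if the term ciphertext was new."""
    is_new = term_ciphertext not in term_table
    if is_new:
        # Not yet in the stored term ciphertext table: create its entry first.
        term_table[term_ciphertext] = set()
    # In both cases, record the link to the data ciphertext ID.
    term_table[term_ciphertext].add(data_id)
    return is_new


first = associate("a3f1", 1)   # unseen term ciphertext: stored, then linked
second = associate("a3f1", 2)  # already stored: only the new link is added
```

Each distinct term ciphertext is stored once, however many records contain it, which is the deduplication effect described above.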
Step S40: initiating a query request so that the cloud performs term ciphertext matching based on the query terms in the query request, and retrieving the encrypted data corresponding to the term ciphertext matched by the cloud.
Specifically, the cloud obtains a plaintext word segmentation term from the query request and encrypts it with the symmetric encryption key uploaded by the user terminal to obtain a query term ciphertext; the cloud then determines whether the query term ciphertext exists in the stored term ciphertext table. If it does not exist, the cloud determines that no corresponding query data exists and feeds back a query failure signal to the user terminal; if it exists, the cloud feeds back to the user terminal a ciphertext table of the data ciphertext IDs corresponding to the matched term ciphertext.
In the embodiment of the invention, the user terminal initiates a query request, and the system obtains the corresponding plaintext word segmentation terms from the request. These terms are keywords or phrases extracted from the user's query and are used to locate and retrieve the relevant data. The system encrypts each plaintext term with the previously generated symmetric encryption key to produce the query term ciphertext, which protects the query terms during subsequent processing and prevents the plaintext from being intercepted or tampered with. The system then compares the query term ciphertext with the stored term ciphertext table. If the query term ciphertext is not in the table, the system determines that no corresponding query data exists and feeds back a query failure signal to the user terminal. If it is in the table, the system feeds back to the user terminal the ciphertext table of the data ciphertext IDs corresponding to the matched term ciphertext, so that the user can decrypt it and obtain the query data.
In the embodiment of the invention, symmetric encryption keeps the query terms encrypted during matching and storage, which effectively prevents leakage and tampering of plaintext data and ensures the security of data queries. By checking whether the query term ciphertext exists, the system can quickly determine whether corresponding data exists, which improves query efficiency; in large-scale data processing environments in particular, this mechanism can markedly reduce query time and improve system response speed. The system stores and processes only the associations between term ciphertext and data ciphertext IDs and so avoids storing large amounts of redundant data, which saves storage space and reduces the processing burden on the system.
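The matching step can be sketched as an exact lookup of the query term ciphertext in the stored table. As an assumption made so that the example runs on the Python standard library, HMAC-SHA256 stands in for the symmetric encryption of the query term; the names `term_token` and `query`, the fixed demo key, and the sample table are hypothetical.

```python
import hashlib
import hmac


def term_token(key, term):
    """Deterministic keyed token for a term (stand-in for symmetric encryption)."""
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


def query(term_table, key, plaintext_term):
    """Return the sorted data ciphertext IDs for a term, or None on failure."""
    query_ciphertext = term_token(key, plaintext_term)
    data_ids = term_table.get(query_ciphertext)
    if not data_ids:
        return None  # corresponds to the query failure signal
    return sorted(data_ids)


key = b"\x00" * 32  # fixed demo key; a real key would be randomly generated
term_table = {term_token(key, "加密"): {3, 1}}

hit = query(term_table, key, "加密")
miss = query(term_table, key, "解密")
```

Because the lookup is an exact match on the term ciphertext, it can use an ordinary index or hash table, in contrast to decrypting every row before a fuzzy comparison.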
Step S50: Decrypt the recovered encrypted data to obtain the query data.
Specifically, the corresponding private key is obtained from the cloud based on the data ciphertext IDs in the returned ciphertext table; the ciphertext table is decrypted with the private key to obtain a plaintext list, and the plaintext list is pushed to the user as the query data.
In the embodiment of the invention, after the user terminal submits a query request and the encryption and matching steps described above are completed, the system returns a ciphertext table containing a group of data ciphertext IDs to the user terminal. The user terminal then obtains the corresponding private key from the cloud and decrypts the ciphertext table with it to obtain the required plaintext list. The process comprises the following steps:
1) Acquiring the data ciphertext IDs: when the query term ciphertext exists, the system returns to the user terminal the ciphertext table of data ciphertext IDs corresponding to the matched term ciphertext. Each data ciphertext ID uniquely identifies an encrypted record and is used to locate and retrieve the relevant data.
2) Obtaining the private key: the user terminal obtains the corresponding private key from the cloud according to the data ciphertext IDs in the returned ciphertext table. The private key is typically stored in a secure key management system that only authorized users can access. This step ensures that only legitimate users can decrypt and access sensitive data.
3) Decrypting the ciphertext table: the user terminal uses the obtained private key to decrypt the encrypted records identified by the data ciphertext IDs in the ciphertext table. Because the records were encrypted with the public key, decryption uses the matching asymmetric algorithm (for example RSA) and restores the original plaintext data.
4) Acquiring the plaintext list: after decryption is completed, the user terminal obtains a plaintext data list. These plaintext data are the result of the user's query and can be displayed directly or processed further.
5) Pushing the query data: finally, the decrypted plaintext list is pushed to the user as the query data, which completes the data query and processing flow.
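The five steps above can be sketched end to end as follows. This is only an illustration under loud assumptions: the "private key" is a textbook RSA key built from two tiny primes (61 and 53), applied byte by byte, which is insecure and used purely to make the flow runnable with the standard library. The stores `ciphertext_table`, `key_store`, `authorized_users` and the function names are hypothetical.

```python
from typing import Optional

# Toy textbook-RSA parameters: n = 61 * 53, public exponent E, private exponent D.
# Illustration only; real systems use a vetted library and padded RSA.
N, E, D = 3233, 17, 2753


def toy_encrypt(data: bytes) -> list[int]:
    """Setup helper: encrypt each byte with the public key (E, N)."""
    return [pow(b, E, N) for b in data]


def toy_decrypt(blocks: list[int], private_key: tuple[int, int]) -> bytes:
    """Decrypt each block with the private key (d, n)."""
    d, n = private_key
    return bytes(pow(c, d, n) for c in blocks)


# Hypothetical cloud-side stores, both keyed by data ciphertext ID.
ciphertext_table = {
    "ct-001": toy_encrypt(b"quarterly invoice"),
    "ct-007": toy_encrypt(b"supplier contract"),
}
key_store = {"ct-001": (D, N), "ct-007": (D, N)}
authorized_users = {"alice"}


def fetch_private_key(user: str, ct_id: str) -> Optional[tuple[int, int]]:
    """Step 2: release a private key only to an authorized user."""
    if user not in authorized_users:
        return None
    return key_store.get(ct_id)


def retrieve(user: str, ct_ids: list[str]) -> list[str]:
    """Steps 3 to 5: decrypt every record the user may access and return the plaintext list."""
    plaintexts = []
    for ct_id in ct_ids:
        private_key = fetch_private_key(user, ct_id)
        if private_key is None:
            continue  # no key released, so the record stays encrypted
        plaintexts.append(toy_decrypt(ciphertext_table[ct_id], private_key).decode("utf-8"))
    return plaintexts


result = retrieve("alice", ["ct-001", "ct-007"])
denied = retrieve("mallory", ["ct-001"])
```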
Based on the scheme of the invention, through a multi-level encryption and decryption mechanism, the data is always in an encryption state in the transmission and storage processes, thereby effectively preventing the leakage and the tampering of the data. Sensitive data can be fully protected even if stored and processed in the cloud. The system can quickly locate and retrieve related data through the data ciphertext ID of the ciphertext table, and obtain a plaintext list through decryption of the private key. The mechanism greatly improves the efficiency of data query and processing, particularly in a large-scale data processing environment, can obviously shorten the query time and improve the response speed of the system.
Fig. 2 is a schematic structural diagram of a user terminal according to an embodiment of the present invention. As shown in Fig. 2, an embodiment of the present invention provides a user terminal, which includes: an acquisition unit, used for collecting data to be stored, performing word segmentation on the data to be stored, and performing asymmetric encryption on the data to be stored after the word segmentation is completed to obtain encrypted data.
Specifically, a pre-selected word segmentation tool is initialized and, after the initialization is completed, loaded; word segmentation is then performed on the plaintext of the data to be stored with the word segmentation tool to obtain a plurality of word segmentation terms.
In the embodiment of the invention, the choice of word segmentation tool is the basis of the whole word segmentation process. Different tools use different segmentation algorithms and give different results; common tools include Jieba, Ansj, and THULAC, each with its own advantages and disadvantages when processing Chinese text. For example, Jieba uses a dictionary-based segmentation algorithm built on a Trie structure and supports three modes (accurate mode, full mode, and search engine mode) suited to different application scenarios, while Ansj is based on a deep learning model and offers higher segmentation accuracy and efficiency. The scheme of the invention allows the word segmentation tool to be customized according to user requirements. After a suitable tool is selected, it must be initialized. Initialization includes loading the segmentation dictionary and configuring the segmentation parameters, and it is critical to ensuring that the tool works properly. For example, in Jieba, initialization includes loading a user-defined dictionary so that the segmentation results can contain domain-specific nouns and new words; in Ansj, initialization may involve loading model files and configuring segmentation policies.
Further, unstructured text data may be converted into structured data through word segmentation. The structured data is convenient for subsequent database storage and indexing, and improves the efficiency of data retrieval. For example, in a search engine, structured word segmentation data may accelerate the keyword retrieval and matching process.
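The initialize, load, and segment sequence can be illustrated with a small self-contained sketch. Note that this is not Jieba, Ansj, or THULAC: it is a hypothetical `DictSegmenter` class that uses plain forward maximum matching over a user-supplied dictionary, chosen only so the example runs without third-party packages.

```python
class DictSegmenter:
    """Minimal dictionary-based segmenter using forward maximum matching."""

    def __init__(self) -> None:
        # Initialization: start with an empty dictionary.
        self.words: set[str] = set()
        self.max_len = 1

    def load_dictionary(self, words: list[str]) -> None:
        """Load base or user-defined dictionary entries (e.g. domain-specific terms)."""
        for word in words:
            self.words.add(word)
            self.max_len = max(self.max_len, len(word))

    def segment(self, text: str) -> list[str]:
        """Split text into terms, preferring the longest dictionary match at each position."""
        terms: list[str] = []
        i = 0
        while i < len(text):
            # Try the longest candidate first; fall back to a single character.
            for size in range(min(self.max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in self.words:
                    terms.append(piece)
                    i += size
                    break
        return terms


segmenter = DictSegmenter()
segmenter.load_dictionary(["数据", "加密", "管理", "方法", "数据加密"])
terms = segmenter.segment("数据加密管理方法")
```

The resulting list of terms is the structured form that the later steps encrypt and index.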
Preferably, performing asymmetric encryption on the data to be stored after the word segmentation is completed to obtain encrypted data includes: encrypting the data to be stored after word segmentation with the public key to obtain encrypted data; storing the encrypted data in a ciphertext table and obtaining a data ciphertext ID; and sending the private key and the data ciphertext ID to the cloud.
In an embodiment of the present invention, asymmetric encryption (also referred to as public key encryption) is an encryption technique that uses a pair of keys: a public key and a private key. The public key is used to encrypt data and the private key is used to decrypt it. After the word segmentation is completed, the data are encrypted with the public key. The public key may be published, and anyone can use it to encrypt data, but only the holder of the corresponding private key can decrypt the data. The key advantage of asymmetric encryption is therefore that encrypted data cannot be decrypted even if it is intercepted, unless the interceptor also holds the private key. The encrypted data are then stored in a ciphertext table, which is a database table dedicated to storing encrypted data. Each record contains the encrypted data and a unique ciphertext ID that identifies the record. For security management, the private key and the ciphertext ID are sent to the cloud for storage and management. The cloud may provide higher security and availability, helping to ensure that the private key is not compromised or lost. When the data need to be decrypted, the corresponding encrypted data are found through the ciphertext ID and decrypted with the private key.
According to the scheme of the invention, asymmetric encryption protects the data during storage and transmission. Even if an attacker intercepts the encrypted data, the attacker cannot decrypt it without the corresponding private key. Asymmetric encryption thus protects the confidentiality of the data; when it is combined with a digital signature, tampering during transmission and storage can also be detected.
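The encrypt, store, and upload sequence can be sketched as follows. The sketch assumes textbook RSA with two tiny primes applied byte by byte, which is insecure and serves only to show the public/private key relationship using the standard library; a real deployment would use a vetted library with proper padding. The names `make_keypair`, `store_record`, `ciphertext_table`, and `cloud_key_store` are hypothetical.

```python
import uuid


def make_keypair(p: int, q: int, e: int = 17):
    """Build a textbook RSA key pair from two small primes (toy sizes, not secure)."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e, available since Python 3.8
    return (e, n), (d, n)


def encrypt(data: bytes, public_key) -> list[int]:
    """Encrypt each byte with the public key (e, n)."""
    e, n = public_key
    return [pow(b, e, n) for b in data]


def decrypt(blocks: list[int], private_key) -> bytes:
    """Decrypt each block with the private key (d, n)."""
    d, n = private_key
    return bytes(pow(c, d, n) for c in blocks)


ciphertext_table = {}  # data ciphertext ID -> encrypted record
cloud_key_store = {}   # data ciphertext ID -> private key held in the cloud


def store_record(plaintext: str) -> str:
    """Encrypt with the public key, store the record, and hand the private key to the cloud."""
    public_key, private_key = make_keypair(61, 53)
    ct_id = uuid.uuid4().hex  # unique data ciphertext ID
    ciphertext_table[ct_id] = encrypt(plaintext.encode("utf-8"), public_key)
    cloud_key_store[ct_id] = private_key
    return ct_id


ct_id = store_record("patient record 42")
roundtrip = decrypt(ciphertext_table[ct_id], cloud_key_store[ct_id]).decode("utf-8")
```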
A word segmentation encryption unit, used for performing symmetric encryption on the plurality of word segmentation terms obtained by word segmentation to obtain a term ciphertext.
Specifically, a symmetric encryption key is generated, and symmetric encryption is performed on the plurality of word segmentation terms obtained by word segmentation based on the symmetric encryption key to obtain the term ciphertext; the symmetric encryption key and the term ciphertext are then sent to the cloud.
In the embodiment of the invention, symmetric encryption is a technique widely used in modern information security that uses a single key for both encryption and decryption. The symmetric encryption process includes three main steps: key generation, encryption, and decryption.
Further, a symmetric encryption key needs to be generated. Common symmetric encryption algorithms include the Advanced Encryption Standard (AES), the Data Encryption Standard (DES), and the Triple Data Encryption Standard (3DES). Key generation is described here using AES as an example. AES supports key lengths of 128, 192, and 256 bits. A 256-bit AES key may be generated, for example, with the cryptography library in Python. Symmetric encryption is then performed on the plurality of word segmentation terms obtained by word segmentation. Word segmentation is a common technique in natural language processing (NLP) that splits text into individual words or phrases; the resulting terms may be words, phrases, or other text segments. Finally, the symmetric encryption key and the term ciphertext are sent to the cloud. The cloud storage may be, for example, AWS S3, Google Cloud Storage, or another cloud storage service. To ensure the security of the data in transit, HTTPS is used for the transmission.
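Key generation and term encryption can be sketched with the Python standard library alone. Two assumptions are made here and should be read as such: the key is produced with `secrets.token_bytes(32)`, which yields 32 random bytes (the size of a 256-bit AES key) rather than going through a dedicated AES API, and a keyed HMAC-SHA256 digest stands in for the AES term ciphertext so that the example stays dependency-free. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def generate_key() -> bytes:
    """Generate 32 cryptographically random bytes, i.e. a 256-bit symmetric key."""
    return secrets.token_bytes(32)


def term_token(key: bytes, term: str) -> str:
    """Deterministic keyed digest used as a stand-in for the term ciphertext."""
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


key = generate_key()

# The same term under the same key always maps to the same token,
# which is what makes later equality matching in the cloud possible.
tokens = [term_token(key, term) for term in ["data", "encryption", "data"]]

# A different key produces an unrelated token for the same term.
other_token = term_token(generate_key(), "data")
```

A reversible cipher such as AES would be needed if the terms themselves had to be recovered from the ciphertext; for equality matching alone, a deterministic keyed digest is sufficient.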
A transmission unit, used for sending the encrypted data and the term ciphertext to the cloud so that the cloud performs associated storage of the data.
Specifically, the cloud judges whether the term ciphertext already exists in the stored term ciphertext table; if it exists, the term ciphertext is stored in association with the data ciphertext ID; if it does not exist, the term ciphertext is first added to the stored term ciphertext table and then stored in association with the data ciphertext ID.
In the embodiment of the invention, the term ciphertext is the ciphertext obtained by segmenting sensitive data into terms and encrypting them. To keep the data secure, these term ciphertexts must be managed and stored efficiently. When a new term ciphertext is generated, the system first queries the stored term ciphertext table to determine whether the term ciphertext already exists. This lookup is implemented with a hash algorithm and an encryption algorithm to ensure that the query is both efficient and secure. If the term ciphertext exists, it has been processed and stored before, and the system associates the existing term ciphertext with the new data ciphertext ID so that the relevant data can be located and retrieved quickly in subsequent queries. If the term ciphertext does not exist, the system stores it in the stored term ciphertext table and then likewise associates it with the data ciphertext ID, which ensures the integrity and traceability of the data.
Based on the scheme of the invention, by judging whether the term ciphertext already exists, the system avoids storing and processing the same term ciphertext repeatedly. This reduces both the storage space occupied and the processing load on the system, particularly in a large-scale data processing environment. During transmission and storage, all word segmentation terms and data are encrypted, so sensitive data are not exposed in plaintext even when stored and processed in the cloud. The existence check is in effect a data deduplication operation: it prevents redundant data from being stored and thereby saves storage resources, which is important for cloud computing platforms that handle massive amounts of data.
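The existence check and associated storage can be sketched as a small inverted index. The sketch assumes the stored term ciphertext table is an in-memory dictionary that maps each term ciphertext to the set of data ciphertext IDs it is associated with; the function name `associate` and the sample values are hypothetical.

```python
def associate(index: dict[str, set[str]], term_ciphertexts: list[str], ct_id: str) -> int:
    """Link each term ciphertext to ct_id and return how many new table entries were created."""
    created = 0
    for token in term_ciphertexts:
        if token not in index:
            # Term ciphertext not stored yet: add it to the table first.
            index[token] = set()
            created += 1
        # In both cases, record the association with the data ciphertext ID.
        index[token].add(ct_id)
    return created


index: dict[str, set[str]] = {}
first = associate(index, ["tok-a", "tok-b"], "ct-001")   # both terms are new
second = associate(index, ["tok-b", "tok-c"], "ct-002")  # "tok-b" is reused
```

Each distinct term ciphertext is stored once, however many records contain it; only the set of associated IDs grows.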
The request unit is used for initiating a query request so that the cloud end can perform the word segmentation ciphertext matching based on the query word segmentation in the query request and recover the encrypted data corresponding to the word segmentation ciphertext matched by the cloud end.
Specifically, the cloud obtains a plaintext word segmentation term based on the query request, and encrypts the plaintext word segmentation term with the symmetric encryption key uploaded by the user terminal to obtain a query term ciphertext; the cloud then judges whether the query term ciphertext exists in the stored term ciphertext table; if it does not exist, the cloud determines that the corresponding query data does not exist and returns a query failure signal to the user terminal; if it exists, the cloud returns to the user terminal the ciphertext table of data ciphertext IDs corresponding to the matched term ciphertext.
In the embodiment of the invention, a user terminal initiates a query request, and a system acquires a corresponding plaintext word segmentation term according to the request. These terms are keywords or phrases extracted from the user's query terms for locating and retrieving relevant data. The system encrypts the plaintext word segmentation term by using a pre-generated symmetric encryption key to generate a query word segmentation term ciphertext. The process ensures the security of the query segmentation term in the transmission and processing process and prevents the plaintext information from being intercepted or tampered. The system compares the generated query term ciphertext with the stored term ciphertext table, and judges whether the ciphertext exists in the table. If the query term ciphertext does not exist in the stored term ciphertext table, the system judges that the corresponding query data does not exist and feeds back a query failure signal to the user side. If the query term ciphertext exists in the stored term ciphertext table, the system feeds back the ciphertext table of the data ciphertext ID corresponding to the matched term ciphertext to the user side so that the user can decrypt and obtain query data.
In the embodiment of the invention, the query segmentation term is always in an encryption state in the transmission and processing processes through symmetric encryption, so that the leakage and the falsification of plaintext data are effectively prevented, and the security of data query is ensured. By judging whether the query segmentation ciphertext exists or not, the system can quickly determine whether corresponding data exists or not, so that the query efficiency is improved. Particularly in a large-scale data processing environment, the mechanism can remarkably reduce the inquiry time and improve the response speed of the system. The system only stores and processes the association information of the word segmentation ciphertext and the data ciphertext ID, and a large amount of redundant data is prevented from being stored. This mechanism not only saves memory space, but also reduces the processing burden on the system.
A decryption unit, used for decrypting the recovered encrypted data to obtain the query data.
Specifically, the corresponding private key is obtained from the cloud based on the data ciphertext IDs in the returned ciphertext table; the ciphertext table is decrypted with the private key to obtain a plaintext list, and the plaintext list is pushed to the user as the query data.
In the embodiment of the invention, after the user terminal submits a query request and the encryption and matching steps described above are completed, the system returns a ciphertext table containing a group of data ciphertext IDs to the user terminal. The user terminal then obtains the corresponding private key from the cloud and decrypts the ciphertext table with it to obtain the required plaintext list.
The embodiment of the invention also provides a data encryption management system, which comprises the user side and a cloud end in communication connection with the user side.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores instructions, which when running on a computer, cause the computer to execute the data encryption management method.
Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a single-chip microcomputer, chip, or processor to perform all or part of the steps of the methods according to the embodiments of the invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The alternative embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the embodiments of the present invention are not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solutions of the embodiments of the present invention within the scope of the technical concept of the embodiments of the present invention, and all the simple modifications belong to the protection scope of the embodiments of the present invention. In addition, the specific features described in the above embodiments may be combined in any suitable manner without contradiction. In order to avoid unnecessary repetition, the various possible combinations of embodiments of the invention are not described in detail.
In addition, any combination of the various embodiments of the present invention may be made, so long as it does not deviate from the idea of the embodiments of the present invention, and it should also be regarded as what is disclosed in the embodiments of the present invention.
Claims (10)
1. A data encryption management method, wherein the method is performed by a user terminal, the method comprising:
Collecting data to be stored, performing word segmentation on the data to be stored, and performing asymmetric encryption on the data to be stored after the word segmentation is completed to obtain encrypted data;
Symmetric encryption is carried out on a plurality of participles obtained through participle processing, and participle ciphertext is obtained;
Sending the encrypted data and the term ciphertext to a cloud end so as to enable the cloud end to execute data association storage;
Initiating a query request, so that a cloud end performs word segmentation ciphertext matching based on a query word segmentation in the query request, and recovering encrypted data corresponding to the word segmentation ciphertext matched by the cloud end;
and executing decryption on the recovered encrypted data to obtain query data.
2. The method of claim 1, wherein the performing word segmentation on the data to be stored comprises:
based on a pre-selected word segmentation tool, executing the initialization of the word segmentation tool, and executing the loading of the word segmentation tool after the initialization is completed;
and executing word segmentation processing on the plaintext data of the data to be stored based on the word segmentation tool to obtain a plurality of word segmentation terms.
3. The method according to claim 1, wherein performing asymmetric encryption on the data to be stored after the word segmentation process is completed to obtain encrypted data, comprises:
performing data encryption on the data to be stored after word segmentation processing based on the public key to obtain encrypted data;
storing the encrypted data into a ciphertext table, and obtaining a data ciphertext ID;
And sending the private key and the data ciphertext ID to the cloud.
4. A method according to claim 3, wherein the performing symmetric encryption on the plurality of terms obtained by the term segmentation process to obtain a term ciphertext comprises:
Generating a symmetric encryption key, and performing symmetric encryption on a plurality of participles obtained by participle processing based on the symmetric encryption key to obtain participle ciphertext;
and sending the symmetric encryption key and the participle ciphertext to a cloud.
5. The method of claim 4, wherein the step of providing for cloud-based data-associative storage comprises:
The cloud is configured to:
Judging whether the word segmentation ciphertext exists in a stored word segmentation ciphertext table or not;
if the word-term ciphertext exists, executing corresponding word-term ciphertext and data ciphertext ID association storage;
If the word segmentation term ciphertext does not exist, storing the corresponding word segmentation term ciphertext into a stored word segmentation term ciphertext table, and executing the association storage of the corresponding word segmentation term ciphertext and the data ciphertext ID.
6. The method of claim 1, wherein the initiating the query request for the cloud to perform the term ciphertext matching based on the query term in the query request, and recovering the encrypted data corresponding to the term ciphertext matched by the cloud, comprises:
The cloud is configured to:
Obtaining a plaintext word segmentation term based on a query request, and encrypting the plaintext word segmentation term based on a symmetric encryption key uploaded by a user side to obtain a query word segmentation term ciphertext;
Judging whether the query term ciphertext exists in a stored term ciphertext table or not;
If the query term ciphertext does not exist, judging that the corresponding query data does not exist, and feeding back a query failure signal to the user side;
if the query term ciphertext exists, feeding back a ciphertext table of the data ciphertext ID corresponding to the matched term ciphertext to the user side.
7. The method of claim 1, wherein decrypting the recovered encrypted data to obtain the query data comprises:
acquiring a corresponding private key from the cloud based on the fed-back data ciphertext ID of the ciphertext table;
And decrypting the ciphertext table based on the private key to obtain a plaintext list, and pushing the plaintext list to a user as query data.
8. A client, characterized in that the client comprises:
The acquisition unit is used for acquiring data to be stored, performing word segmentation on the data to be stored, and performing asymmetric encryption on the data to be stored after the word segmentation is completed to obtain encrypted data;
the word segmentation encryption unit is used for performing symmetric encryption on a plurality of word segmentation items obtained by word segmentation processing to obtain a word segmentation item ciphertext;
the transmission unit is used for transmitting the encrypted data and the term ciphertext to the cloud end so as to enable the cloud end to execute data association storage;
the request unit is used for initiating a query request so that the cloud end can perform the word segmentation ciphertext matching based on the query word segmentation in the query request and recover the encrypted data corresponding to the word segmentation ciphertext matched by the cloud end;
And the decryption unit is used for decrypting the recovered encrypted data to obtain query data.
9. A data encryption management system, comprising the client of claim 8, and a cloud end communicatively connected to the client.
10. A computer readable storage medium having stored thereon instructions which, when run on a computer, cause the computer to perform the data encryption management method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410930253.2A CN118944913A (en) | 2024-07-11 | 2024-07-11 | Data encryption management method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118944913A true CN118944913A (en) | 2024-11-12 |
Family
ID=93359520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410930253.2A Pending CN118944913A (en) | 2024-07-11 | 2024-07-11 | Data encryption management method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||