CN113990399A - Privacy-protecting genetic data sharing method and device - Google Patents
- Publication number
- CN113990399A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/40—Encryption of genetic data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/008—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
Abstract
The invention discloses a privacy-preserving and secure method and device for sharing genetic data, which let the data provider control who can access and use the data. The method applied to the data providing terminal comprises the following steps: generating a group of virtual individual gene data, to make it as hard as possible for an attacker to link encrypted data to plaintext data by observation; generating a key with which the data provider encrypts the original file to obtain a symmetrically encrypted file; allocating an identifier to each virtual individual and each data provider, so that virtual individuals do not distort the statistics computed by a data querier; encrypting the original file, the virtual individual gene data, the key and the identifier with the collective public key and sending the encrypted data to the computing nodes; and setting an access policy and receiving and replying to data access notifications. The invention integrates several complementary technologies, including homomorphic encryption, encryption-based access control by the data owner, and blockchain technology, to achieve controllable, transparent and secure genome data sharing.
Description
Technical Field
The invention relates to the technical field of data encryption, in particular to a method and a device for sharing gene data for protecting privacy and safety.
Background
As the cost of DNA analysis falls, ever-growing genome datasets are expected to improve preventive medicine and support the development of more targeted therapies, which is where the value of large-scale genome datasets lies. The era of large-scale genomics has arrived: the direct-to-consumer personal genomics market has grown dramatically over the past few years, and large-scale population genomics programs are being implemented in many countries around the world.
However, the potential of large-scale genomics can only be realized if genomic data is widely available, and the privacy problem of gene data makes large-scale data sharing extremely difficult. Historically, the privacy of health data has been addressed by "de-identification", in particular by deleting fields that reveal the identity of individuals. Personal genome data cannot be protected effectively in this way, because even a small fraction of a genome is sufficient to identify an individual or their relatives, as evidenced, for example, by the successful use of DNA in forensics.
This has led to particular sensitivity and concern about genomic data privacy. The real or perceived risk of genetic discrimination in insurance, employment and other areas deters individuals from participating in population studies and from using direct-to-consumer genetic testing services.
In addition, individuals want to control the use of their genomic data and to be able to share it with researchers without risk of misuse, and population genomics programs should safeguard both. However, these guarantees are currently not available from population genomics programs.
Over the last few years, researchers from the information security community have proposed some solutions. Some of these focus on providing secure storage for genomic data, while others propose ways to perform certain calculations securely. However, to date, none of these solutions has been adopted in practice, primarily because of their very limited scope and applicability. Furthermore, the trust problem goes beyond secure storage and processing, as it is also closely tied to transparency and to personal control over data sharing and usage. Existing genome data sharing platforms all rely on broad consent policies. For example, the Beacon network built by the Global Alliance for Genomics and Health (GA4GH) facilitates genome data sharing among institutions under such policies, but does not give the individuals who provide genome data dynamic, fine-grained control. These problems have not been solved effectively.
The present invention is proposed to address these problems.
Disclosure of Invention
The invention aims to provide a method and a device for sharing gene data for protecting privacy and safety, which can realize controllable, transparent and safe genome data sharing by taking a user as a center.
In a first aspect, the present invention provides a method for sharing gene data with privacy and security, which is applied to a data providing terminal, and includes:
generating a group of virtual individuals and corresponding gene data by adopting a virtual individual generation algorithm, namely virtual individual gene data;
generating a symmetric encryption key for a data provider to encrypt an original file to obtain a symmetric encryption file;
assigning an identifier to each virtual individual and data provider;
respectively encrypting the original file, the virtual individual gene data, the key and the identification by using the collective public key to obtain encrypted data, and sending the encrypted data to the computing node;
setting an access policy; and, where the access policy is a dynamic agreement policy, further comprising:
when synchronizing with the blockchain state, receiving a query request notification for the data provider to review and approve or reject, and sending the result to the computing node.
In a second aspect, the present invention provides a method for sharing gene data to protect privacy and security, which is applied to two or more computing nodes, and includes:
each computing node generates a public/private key pair based on an additively homomorphic encryption algorithm and also generates a secret, and all computing nodes broadcast their own public keys so as to generate a collective public key; all the computing nodes jointly establish a blockchain;
receiving encrypted data sent by a data providing terminal, verifying the encrypted data by all the computing nodes, and storing the encrypted data in a block chain;
sending the symmetric encrypted file and the identifier encrypted by the collective public key to a storage unit; carrying out distributed re-encryption on the original file and the virtual individual gene data encrypted by the collective public key, and sending the original file and the virtual individual gene data to a storage unit;
receiving an inquiry request which is sent by a data inquiry terminal and is subjected to homomorphic encryption through addition, wherein the inquiry request is verified by all computing nodes and then stored in a block chain; carrying out distributed re-encryption on the query request, and sending the query request to a storage unit;
the distributed re-encryption converts data from the additively homomorphic encrypted form into a deterministically encrypted form.
Optionally or preferably, when the method is applied to a computing node, the distributed re-encryption adopts an ElGamal encryption algorithm EC-ElGamal on an elliptic curve, and includes two rounds of processes:
Let $E_K(v) = (C_1, C_2) = (rG,\ v + rK)$ denote the encryption under the collective public key of the data $v$ to be re-encrypted, where $v$ is the original file, the virtual individual gene data, or a query request; $K$ is the collective public key, $r$ is a random number, and $G$ is the base point on the elliptic curve.
Round 1: each computing node $i$ uses its secret $s_i$ to add $s_i G$ to $C_2$ and sends the result to the next computing node $i+1$; the final result is $(\tilde{C}_{1,0}, \tilde{C}_{2,0}) = (rG,\ v + rK + \sum_{i=1}^{n} s_i G)$.
Round 2: on receiving the ciphertext $(\tilde{C}_{1,i-1}, \tilde{C}_{2,i-1})$ from the previous computing node $i-1$, computing node $i$ computes $(\tilde{C}_{1,i}, \tilde{C}_{2,i}) = (s_i \tilde{C}_{1,i-1},\ s_i(\tilde{C}_{2,i-1} - k_i \tilde{C}_{1,i-1}))$, where $k_i$ is the private key of computing node $i$. The second component of the final result is the deterministic encryption of $v$, namely $DT(v) = s\,v + s \sum_{i=1}^{n} s_i G$, where $s = \prod_{i=1}^{n} s_i$.
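The two rounds can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: a tiny multiplicative group modulo a safe prime stands in for the elliptic curve (so `xG` becomes `pow(G, x, P)` and point addition becomes multiplication mod `P`), the parameters are far too small to be secure, the nodes run in one process, and the function names are hypothetical.

```python
import math
import secrets

# Toy group (assumption): order-Q subgroup of Z_P^*, with P = 2Q + 1.
# Additive notation maps as: xG -> pow(G, x, P), A + B -> A * B % P.
P, Q, G = 2039, 1019, 4


def rand_scalar():
    """Random non-zero scalar modulo the group order."""
    return secrets.randbelow(Q - 1) + 1


def encrypt(K, point):
    """E_K(v) = (rG, v + rK) with fresh randomness r."""
    r = rand_scalar()
    return pow(G, r, P), point * pow(K, r, P) % P


def distributed_reencrypt(ct, private_keys, node_secrets):
    """Turn a probabilistic ciphertext into a deterministic tag."""
    c1, c2 = ct
    # Round 1: each node i adds s_i * G to C2.
    for s in node_secrets:
        c2 = c2 * pow(G, s, P) % P
    # Round 2: each node i removes its key share (C2 - k_i * C1),
    # then scales both components by its secret s_i.
    for k, s in zip(private_keys, node_secrets):
        stripped = c2 * pow(pow(c1, k, P), -1, P) % P
        c1, c2 = pow(c1, s, P), pow(stripped, s, P)
    return c2  # the second component is the deterministic result


# Three hypothetical computing nodes.
keys = [rand_scalar() for _ in range(3)]
node_secrets = [rand_scalar() for _ in range(3)]
K = math.prod(pow(G, k, P) for k in keys) % P  # collective public key

v = pow(G, 42, P)  # the value 42 encoded as a group element
tag_a = distributed_reencrypt(encrypt(K, v), keys, node_secrets)
tag_b = distributed_reencrypt(encrypt(K, v), keys, node_secrets)
assert tag_a == tag_b  # same value, same tag, whatever r was used
```

Because the randomness `r` cancels in round 2, two independent encryptions of the same value end up with the same tag, which is what allows equality matching on re-encrypted data.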
Optionally or preferably, when the method is applied to a computing node, the method further includes:
reading an access strategy on a block chain, and associating the received encrypted data with the access strategy, wherein the access strategy is an all-agreement strategy or a dynamic agreement strategy; for all agreement policies, either agreeing to or denying the query request; for the dynamic agreement strategy, sending a query to the data providing terminal, and receiving and verifying a reply generated on the block chain and sent by the data provider to the query;
for a reply that grants consent, the symmetric key encrypted under the collective public key is converted into a symmetric key encrypted under the public key U of the querier.
Optionally or preferably, when the method is applied to a computing node, the method further includes:
and receiving a calculation result sent by the storage unit, and storing the calculation result on the block chain after the calculation result is verified by all the calculation nodes.
Optionally or preferably, when the method is applied to a computing node, the method further includes a step of converting the computing result into a query result, and the following distributed key exchange is performed jointly by all computing nodes:
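Let $E_K(R) = (C_1, C_2) = (rG,\ R + rK)$ be the calculation result encrypted under the collective public key $K$, and let $U$ be the public key of the querier.
First, $E_K(R) = (C_1, C_2)$ is modified to $(\tilde{C}_{1,0}, \tilde{C}_{2,0}) = (0,\ C_2)$. Each computing node $i$ then in turn generates a random number $v_i$ and computes $(\tilde{C}_{1,i}, \tilde{C}_{2,i})$, where $\tilde{C}_{1,i} = \tilde{C}_{1,i-1} + v_i G$ and $\tilde{C}_{2,i} = \tilde{C}_{2,i-1} - k_i C_1 + v_i U$, with $k_i$ the private key of computing node $i$. The final result, the query result encrypted under the querier's public key $U$, is $(vG,\ R + vU)$, where $v = v_1 + \dots + v_n$;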
And sending the query result encrypted by using the public key U of the querier to the data query terminal.
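The distributed key exchange can be sketched in the same spirit. This is a minimal sketch under assumptions: a toy multiplicative group replaces the elliptic curve (`xG` becomes `pow(G, x, P)`), all nodes are simulated in one loop, and `key_switch` and `decrypt` are hypothetical names. Each node removes its own key share from the second component and adds fresh randomness under the querier's public key `U`, so the plaintext is never exposed during the conversion.

```python
import math
import secrets

# Toy group (assumption): order-Q subgroup of Z_P^*, with P = 2Q + 1.
P, Q, G = 2039, 1019, 4


def rand_scalar():
    return secrets.randbelow(Q - 1) + 1


def encrypt(K, point):
    """E_K(R) = (rG, R + rK)."""
    r = rand_scalar()
    return pow(G, r, P), point * pow(K, r, P) % P


def key_switch(ct, private_keys, U):
    """Convert a ciphertext under the collective key into one under U."""
    c1, c2 = ct
    new_c1 = 1  # group identity, i.e. the "0" first component
    for k in private_keys:
        v = rand_scalar()  # fresh randomness v_i of node i
        new_c1 = new_c1 * pow(G, v, P) % P        # C1' += v_i * G
        c2 = c2 * pow(pow(c1, k, P), -1, P) % P   # C2' -= k_i * C1
        c2 = c2 * pow(U, v, P) % P                # C2' += v_i * U
    return new_c1, c2


def decrypt(ct, private_key):
    """Plain ElGamal decryption: C2 - u * C1."""
    c1, c2 = ct
    return c2 * pow(pow(c1, private_key, P), -1, P) % P


keys = [rand_scalar() for _ in range(3)]          # node private keys k_i
K = math.prod(pow(G, k, P) for k in keys) % P     # collective public key
u = rand_scalar()                                 # querier private key
U = pow(G, u, P)                                  # querier public key

R = pow(G, 7, P)  # a result encoded as a group element
switched = key_switch(encrypt(K, R), keys, U)
assert decrypt(switched, u) == R
```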
In a third aspect, the present invention provides a method for sharing gene data to protect privacy and security, which is applied to a data query terminal, and includes:
allocating query credits (points) and a pseudo-random identity to the data querier, and storing them on the blockchain;
generating a public key U and a private key u for the querier, and providing data field information that the querier combines with Boolean operators to build a query request; encrypting the query request with the collective public key and sending it, together with the querier's public key U, to any one of the computing nodes;
receiving the query result sent by a computing node, then updating the query credits and storing the updated query credits on the blockchain;
and decrypting the query result with the querier's private key u, or decrypting the symmetric key with the querier's private key u and then decrypting the encrypted data with the symmetric key to obtain the real individual data.
In a fourth aspect, the present invention provides a method for sharing gene data with privacy and security, which is applied to a storage unit, and includes:
receiving and storing a symmetric encrypted file, an identifier encrypted by a collective public key, an original file encrypted by the collective public key and then subjected to distributed re-encryption and virtual individual gene data, namely storage information;
receiving a query request subjected to distributed re-encryption, matching the query request with stored information, respectively performing the following two calculations according to the query request, and sending a calculation result to a calculation node:
let Φ be the set of the identity identifiers of the individuals successfully matched, including the data provider and the virtual individuals;
if the query request asks only for the number of successfully matched individuals, performing the following homomorphic calculation: $E_K(c) = \sum_{i \in \Phi} E_K(f_i)$, where $f_i$ is the identifier of individual $i$ and $c$ is the count obtained after decryption;
if the query request asks for the identity identifiers of the successfully matched individuals, performing the following calculation:
multiplying each individual identity identifier $ID_i$ by the homomorphically encrypted identifier, i.e. $E_K(ID_i \cdot f_i) = ID_i \cdot E_K(f_i)$ for each $i \in \Phi$. After the calculation result is decrypted, the identity identifier of every virtual individual is 0.
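Both calculations rely only on the additive homomorphism of the encryption scheme. The following minimal sketch assumes that the identifier is 1 for a real individual and 0 for a virtual one, that a single key pair stands in for the collective key, and that a toy multiplicative group replaces the elliptic curve; all function names are hypothetical. Decryption recovers small integers by brute force, which is only workable for toy parameters.

```python
import secrets
from functools import reduce

# Toy group (assumption): order-Q subgroup of Z_P^*, with P = 2Q + 1.
P, Q, G = 2039, 1019, 4


def rand_scalar():
    return secrets.randbelow(Q - 1) + 1


def encrypt_int(K, m):
    """E_K(m) = (rG, mG + rK) for a small integer m."""
    r = rand_scalar()
    return pow(G, r, P), pow(G, m, P) * pow(K, r, P) % P


def add(ct_a, ct_b):
    """Homomorphic addition: E(a) + E(b) = E(a + b)."""
    return ct_a[0] * ct_b[0] % P, ct_a[1] * ct_b[1] % P


def scalar_mul(n, ct):
    """Multiplication by a plaintext scalar: n * E(m) = E(n * m)."""
    return pow(ct[0], n, P), pow(ct[1], n, P)


def decrypt_int(ct, k):
    """Decrypt, then recover the integer by exhaustive search."""
    point = ct[1] * pow(pow(ct[0], k, P), -1, P) % P
    for m in range(Q):
        if pow(G, m, P) == point:
            return m
    raise ValueError("plaintext outside the toy range")


k = rand_scalar()
K = pow(G, k, P)

# Matched set: identity identifier -> flag (1 = real, 0 = virtual).
matched = {101: 1, 202: 0, 303: 1, 404: 0}
enc_flags = {ident: encrypt_int(K, flag) for ident, flag in matched.items()}

# Query type 1: count of matched individuals; virtual ones add 0.
count = decrypt_int(reduce(add, enc_flags.values()), k)

# Query type 2: identifiers; virtual individuals decrypt to 0.
ids = [decrypt_int(scalar_mul(ident, ct), k) for ident, ct in enc_flags.items()]

assert count == 2
assert ids == [101, 0, 303, 0]
```

The storage unit performs `add` and `scalar_mul` on ciphertexts only, so it never learns which matched individuals are virtual.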
In a fifth aspect, the present invention provides a gene data sharing apparatus for protecting privacy and security, which is applied to a data providing terminal, and includes:
the virtual individual gene data generation module is used for generating a group of virtual individuals and corresponding gene data by adopting a virtual individual generation algorithm, namely virtual individual gene data;
the data sending module is used for generating a symmetric encryption key for a data provider to encrypt an original file to obtain a symmetric encryption file; distributing an identifier for each virtual individual and each data provider, encrypting the original file, the gene data of the virtual individual, the key and the identifier by using the collective public key respectively to obtain encrypted data, and sending the encrypted data to the computing node;
and the auditing module is used for setting an access strategy, receiving an inquiry request notice for a data provider to review, decide approval or rejection and sending a result to the computing node when the dynamic approval strategy exists and the block chain state is synchronous.
In a sixth aspect, the present invention provides a genetic data sharing apparatus for protecting privacy and security, which is applied to a computing node, and includes:
the data receiving module is used for receiving the encrypted data sent by the data providing terminal, and storing the encrypted data in the block chain after being verified by all the computing nodes; receiving an inquiry request which is sent by a data inquiry terminal and is subjected to homomorphic encryption through addition, wherein the inquiry request is verified by all computing nodes and then stored in a block chain; receiving a calculation result sent by a storage unit, and storing the calculation result on a block chain after the calculation result is verified by all the calculation nodes;
the data processing module is used for generating a public/private key pair based on the additively homomorphic encryption algorithm and also generating a secret, all computing nodes broadcasting their own public keys so as to generate a collective public key; all the computing nodes jointly establishing a blockchain; sending the symmetric encrypted file and the identifier encrypted by the collective public key to a storage unit; performing the distributed re-encryption described in the second aspect on the original file and the virtual individual gene data encrypted by the collective public key, and sending them to the storage unit; and performing distributed re-encryption on the query request and sending it to the storage unit;
the access policy processing module is used for reading an access policy on the blockchain and associating the received encrypted data with the access policy, wherein the access policy is an all-agreement policy or a dynamic agreement policy; for the all-agreement policy, either agreeing to or denying the query request; for the dynamic agreement policy, sending a query to the data providing terminal, and receiving and verifying the reply to the query that the data provider generates and sends on the blockchain;
for a reply that grants consent, the symmetric key encrypted under the collective public key is converted into a symmetric key encrypted under the public key U of the querier.
In a seventh aspect, the present invention provides a gene data sharing device for protecting privacy and security, which is applied to a data query terminal, and the device includes:
the request generation module is used for allocating query credits (points) and a pseudo-random identity to the data querier and storing them on the blockchain; generating a public key U and a private key u for the querier, and providing data field information that the querier combines with Boolean operators to build a query request; encrypting the query request with the collective public key and sending it, together with the querier's public key U, to any one of the computing nodes;
the result receiving module is used for receiving the query result sent by a computing node, then updating the query credits and storing the updated query credits on the blockchain; and decrypting the query result with the querier's private key u, or decrypting the symmetric key with the querier's private key u and then decrypting the encrypted data with the symmetric key to obtain the real individual data.
In an eighth aspect, the present invention provides a gene data sharing apparatus for protecting privacy and security, which is applied to a storage unit, and the apparatus includes:
the data storage module is used for receiving and storing the symmetric encrypted file, the identifier encrypted by the collective public key, the original file encrypted by the collective public key and then encrypted in a distributed mode and the virtual individual gene data, and the original file and the virtual individual gene data are storage information;
the query matching module is used for receiving the query request subjected to distributed re-encryption, matching the query request with the stored information, respectively performing the following two calculations according to the query request, and sending the calculation result to the calculation node:
let Φ be the set of the identity identifiers of the individuals successfully matched, including the data provider and the virtual individuals;
if the query request asks only for the number of successfully matched individuals, performing the following homomorphic calculation: $E_K(c) = \sum_{i \in \Phi} E_K(f_i)$, where $f_i$ is the identifier of individual $i$ and $c$ is the count obtained after decryption;
if the query request asks for the identity identifiers of the successfully matched individuals, performing the following calculation:
multiplying each individual identity identifier $ID_i$ by the homomorphically encrypted identifier, i.e. $E_K(ID_i \cdot f_i) = ID_i \cdot E_K(f_i)$ for each $i \in \Phi$. After the calculation result is decrypted, the identity identifier of every virtual individual is 0.
The method and the device for sharing the gene data for protecting privacy and safety provided by the invention have the following beneficial effects:
The combination of homomorphic encryption, encryption-based access control by data providers (i.e., data owners), and blockchain technology enables individuals to share their genomic and clinical data in encrypted form. The devices applied to the data providing terminal, the computing nodes, the storage unit and the data query terminal, together with the blockchain, form a secure and trusted platform on which data uploaded by a data provider can be discovered by a data querier (such as a researcher) while remaining encrypted at all times, preventing unauthorized access. In addition, the method ensures that an authorized data querier can access and use the decrypted gene data for further analysis in a way that is controlled by the data provider and auditable. It thereby meets the requirements of distributed trust, end-to-end protection of genome data, secure data release, and control and auditability by individual data providers over their own data.
Most of the previous researches on privacy security are focused on the application of a single encryption technology, and the invention integrates multiple complementary technologies and realizes controllable, transparent and secure genome data sharing.
Previously proposed privacy-protection schemes focus on genome data sharing between organizations, whereas the invention centers on users (data providers and data queriers) and is guided by their needs for personal genomics services, so that users can control their personal genomic data.
The present invention is versatile and portable and can be easily integrated with secure computing protocols with centralized features, making it more secure (by distributing trust across multiple nodes) and more user-centric (by enabling individuals to have control over the use of their data).
Drawings
Fig. 1 is a schematic diagram illustrating the communication relationships among the data query terminal, the data providing terminal, the computing nodes, and the storage unit in the gene data sharing apparatus for protecting privacy and security according to the embodiment;
Fig. 2 is a schematic diagram illustrating the communication relationship between the data providing terminal and a computing node in the gene data sharing apparatus for protecting privacy and security in the embodiment;
Fig. 3 is a schematic diagram illustrating the communication relationship between the storage unit and a computing node in the gene data sharing apparatus for protecting privacy and security in the embodiment;
Fig. 4 is a schematic diagram illustrating the communication relationship between the data query terminal and a computing node in the gene data sharing apparatus for protecting privacy and security in the embodiment.
Detailed Description
The technical solutions of the present invention will be explained and illustrated in detail with reference to specific embodiments so that they will be more clearly understood and can be implemented by those skilled in the art. The steps listed in the examples are not limited to the order listed.
In a first embodiment, a method for sharing gene data with privacy and security is provided, and is applied to a data providing terminal, and the method includes:
S11: Generate a group of virtual individuals and their corresponding gene data (the virtual individual gene data) using a virtual individual generation algorithm.
The purpose of generating virtual individual gene data is to make it as hard as possible for an attacker to link encrypted data to plaintext data by observation. The virtual individual generation algorithm takes as input all the gene data that can be acquired, including the set of clinical attributes and gene variations, together with the corresponding population statistics, such as the frequency distribution of the gene variations and the prevalence of the clinical variables, and outputs a group of virtual individuals and their corresponding gene data.
A group contains at least two virtual individuals, each with its corresponding gene data. The greater the number of virtual individuals, the stronger the protection for the real data provider.
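The text does not specify the generation algorithm itself. As a purely hypothetical sketch of the input/output contract described above, the following Python function draws each variant and each clinical attribute independently according to its population frequency. The independence assumption, the function name, the field names and the example frequencies are all illustrative and are not taken from the source.

```python
import random


def generate_virtual_individuals(variant_freqs, clinical_prevalence, n, seed=None):
    """Sample n virtual individuals from population statistics.

    variant_freqs: variant name -> population frequency in [0, 1]
    clinical_prevalence: clinical attribute -> prevalence in [0, 1]
    Each feature is drawn independently (a simplifying assumption).
    """
    if n < 2:
        raise ValueError("a group needs at least two virtual individuals")
    rng = random.Random(seed)
    individuals = []
    for i in range(n):
        variants = sorted(v for v, f in variant_freqs.items() if rng.random() < f)
        clinical = sorted(c for c, p in clinical_prevalence.items() if rng.random() < p)
        individuals.append({
            "id": f"virtual-{i}",
            "variants": variants,
            "clinical": clinical,
            "flag": 0,  # 0 marks a virtual individual
        })
    return individuals


# Hypothetical population statistics.
group = generate_virtual_individuals(
    variant_freqs={"rs0001": 0.30, "rs0002": 0.05},
    clinical_prevalence={"hypertension": 0.25},
    n=4,
    seed=7,
)
for person in group:
    print(person)
```

A production algorithm would also need to preserve correlations between variants and clinical attributes so that virtual records are not trivially distinguishable from real ones.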
s 12: generating a symmetric encryption key S for a data provideriThe system comprises a first encryption module, a second encryption module and a third encryption module, wherein the first encryption module is used for encrypting an original file to obtain a symmetrical encrypted file; assigning an identifier f to each virtual individual and data provideri;
Encrypting S using a collective public key Ki:EK(Si) Respectively encrypting the original file and the virtual individual gene data by using the collective public key K, and simultaneously encrypting the identifier fi::EK(fi) And obtaining the encrypted data and sending the encrypted data to the computing node.
The original file, i.e. the file containing the genetic data and the clinical data, such as the genetic variation data, the clinical medical data, etc., all have a specific standard format.
Assigning an identifier f to each virtual individual and data provideriIn order to eliminate the influence of the virtual individual on the data analysis statistics of the subsequent data inquirer, the identifier may be set according to the requirement, for example, the length of the identifier is set to be 1 bit, the identifier represents a real individual when being 1, and the identifier represents a virtual individual when being 0.
S13: Set an access policy. Under a dynamic agreement policy, when synchronizing with the blockchain state, receive a query request notification for the data provider to review and approve or reject, and send the approval or rejection to the computing node.
The access policy generally has two basic options: all consent and dynamic consent. The all-agreement policy sets options that must be satisfied: when every option is satisfied, the computing nodes provide data access after all of them have verified the request; otherwise they refuse. Under the dynamic agreement policy, the data providing terminal receives a query notification when it synchronizes with the blockchain state; the notification contains information about the data access request, and the data provider reviews this information and then responds by agreeing or rejecting.
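The two policy options can be summarized as a small decision function. This is a minimal sketch under assumptions: the option names, the request fields and the `"approve"`/`"reject"`/`"pending"` return values are hypothetical, and the verification by all computing nodes and the blockchain transactions are left out.

```python
from enum import Enum


class Policy(Enum):
    ALL_CONSENT = "all"
    DYNAMIC_CONSENT = "dynamic"


def decide(policy, request, required=None, provider_reply=None):
    """Return 'approve', 'reject' or 'pending' for a query request.

    required: options that must all be satisfied (all-consent policy)
    provider_reply: True/False once the data provider has answered,
                    None while the notification is still unanswered
    """
    if policy is Policy.ALL_CONSENT:
        required = required or {}
        satisfied = all(request.get(name) == value for name, value in required.items())
        return "approve" if satisfied else "reject"
    # Dynamic consent: the provider reviews each request individually.
    if provider_reply is None:
        return "pending"
    return "approve" if provider_reply else "reject"


# Hypothetical pre-set options and query request.
required = {"purpose": "research", "institution_verified": True}
request = {"purpose": "research", "institution_verified": True, "querier": "pseudo-42"}

print(decide(Policy.ALL_CONSENT, request, required=required))
print(decide(Policy.DYNAMIC_CONSENT, request))
print(decide(Policy.DYNAMIC_CONSENT, request, provider_reply=False))
```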
The hash signature of the original file and the set access policy are linked to the identity of the data provider (the data provider's own public key, used to generate transactions on the blockchain).
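One way to picture this linkage is a record that binds the file hash and the access policy to the provider's public key. The sketch below is an assumption-laden illustration: the field names and the JSON layout are hypothetical, SHA-256 is chosen here only as an example hash function, and the actual signing of the blockchain transaction with the provider's private key is omitted.

```python
import hashlib
import json


def policy_record(file_bytes, access_policy, provider_public_key):
    """Bind a file hash and an access policy to a provider identity."""
    record = {
        "file_sha256": hashlib.sha256(file_bytes).hexdigest(),
        "access_policy": access_policy,
        "provider_public_key": provider_public_key,
    }
    # Canonical serialization so that equal records hash identically.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record_id = hashlib.sha256(payload).hexdigest()
    return record, record_id


record, record_id = policy_record(
    file_bytes=b"example file contents",       # placeholder file contents
    access_policy={"type": "dynamic"},
    provider_public_key="hypothetical-public-key",
)
print(record_id)
```

Any change to the file or to the policy changes `record_id`, so a record stored on the blockchain lets the computing nodes check that the data and policy they received are the ones the provider committed to.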
In this embodiment, the encrypted data sent to the computing node further includes a pseudo identity of the data provider, which is allocated by the data providing terminal to protect the real data provider, and also includes the access policy.
In a second embodiment, a method for sharing genetic data with privacy and security is provided, and applied to a computing node, the method includes:
S21: Each computing node generates a public/private key pair based on an additively homomorphic encryption algorithm and also generates a secret. For example, based on the EC-ElGamal additively homomorphic encryption algorithm, the $i$-th computing node generates a private key $k_i$, a public key $K_i$ and a secret $s_i$. All computing nodes broadcast their own public keys $K_i$, from which the collective public key $K = \sum_{i=1}^{n} K_i$ is generated. All computing nodes jointly establish a blockchain, which serves as an auditable and immutable log of the operations performed by all parties and also stores topology information about the storage units.
The collective public key ensures that data encrypted under it remains protected unless all computing nodes are compromised or the private keys of all computing nodes are stolen. The more computing nodes participate in generating the collective key, the higher the overall security of the system.
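The role of the collective public key can be illustrated with a short sketch. It assumes a toy multiplicative group in place of the elliptic curve (so the sum of public keys becomes a product mod `P`), simulates all nodes in one process, and uses hypothetical function names. Decryption succeeds only after every node has removed its own key share.

```python
import math
import secrets

# Toy group (assumption): order-Q subgroup of Z_P^*, with P = 2Q + 1.
P, Q, G = 2039, 1019, 4


def rand_scalar():
    return secrets.randbelow(Q - 1) + 1


def keygen():
    """Private key k_i and public key K_i = k_i * G of one node."""
    k = rand_scalar()
    return k, pow(G, k, P)


def collective_key(public_keys):
    """K = K_1 + ... + K_n (a product in multiplicative notation)."""
    return math.prod(public_keys) % P


def encrypt(K, point):
    r = rand_scalar()
    return pow(G, r, P), point * pow(K, r, P) % P


def partial_decrypt(ct, k):
    """One node removes its share: C2 - k_i * C1."""
    c1, c2 = ct
    return c1, c2 * pow(pow(c1, k, P), -1, P) % P


pairs = [keygen() for _ in range(3)]              # three hypothetical nodes
K = collective_key([pub for _, pub in pairs])

message = pow(G, 99, P)
ciphertext = encrypt(K, message)

state = ciphertext
for k, _ in pairs:                                # every node takes part
    state = partial_decrypt(state, k)
assert state[1] == message
```

If any single node withholds its share, the remaining nodes are left with the message still masked by that node's key, which is the sense in which trust is distributed.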
S22: Receive the encrypted data (containing the access policy and the pseudo identity of the data provider) sent by the data providing terminal; after verification by all the computing nodes, store the encrypted data on the blockchain.
Send the symmetric encrypted file and the identifier encrypted by the collective public key to the storage unit; perform distributed re-encryption on the original file and the virtual individual gene data encrypted by the collective public key, converting them from the additively homomorphic encrypted form into a deterministically encrypted form, and send them to the storage unit.
S23: Receive the additively homomorphically encrypted query request sent by the data query terminal; after verification by all the computing nodes, store the query request on the blockchain; perform distributed re-encryption on the query request, converting it from the additively homomorphic encrypted form into a deterministically encrypted form, and send it to the storage unit.
The purpose of distributed re-encryption is to convert data from an additively homomorphic encrypted form into a deterministically encrypted form, so that the storage unit can perform equality-match queries on the re-encrypted data. Specifically, the gene data stored in the storage unit, which were uploaded by the data providing terminal and distributed re-encrypted by the computing nodes, are matched against the query request data, which were uploaded by the data query terminal and distributed re-encrypted by the computing nodes.
Ordinary encryption algorithms are probabilistic: encrypting the same message twice yields different ciphertexts. Deterministic encryption, by contrast, yields the same ciphertext every time the same message is encrypted. Deterministic encryption may therefore reveal the distribution of the encrypted data. However, because the data providing terminal generates virtual individuals so that the overall distribution of the encrypted data is uniform, an adversary cannot mount a frequency analysis attack. As long as the encrypted individual identification is secure, a real individual cannot be distinguished from a virtual one.
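The effect of virtual individuals on frequency analysis can be sketched as follows. The padding rule used here (every value is padded up to the count of the most frequent value) is only an assumption for illustration and is not the virtual individual generation algorithm of the patent; `pad_with_virtual` and the sample values are hypothetical.

```python
from collections import Counter

def pad_with_virtual(values):
    """Return (value, flag) pairs: flag 1 for real, 0 for virtual individuals.

    Virtual individuals are added until every value occurs equally often, so
    that deterministic ciphertexts of the values show a flat distribution.
    """
    counts = Counter(values)
    target = max(counts.values())
    padded = [(value, 1) for value in values]
    for value, n in counts.items():
        padded.extend((value, 0) for _ in range(target - n))
    return padded

real_values = ["A", "A", "A", "B", "C", "C"]
padded = pad_with_virtual(real_values)

# What a storage unit could observe from deterministic ciphertexts alone.
observed = Counter(value for value, _ in padded)
# The true count is still recoverable from the (encrypted) flags.
real_total = sum(flag for _, flag in padded)
```

After padding, every value is observed three times, while the sum of the flags still equals the six real individuals.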
The distributed re-encryption protocol is a two-round process in which all computing nodes participate. This embodiment uses EC-ElGamal, the ElGamal encryption algorithm on elliptic curves, as an example to describe the protocol. The protocol may also be constructed on lattice-based cryptography.
Let $E_K(v) = (C_1, C_2) = (rG, v + rK)$ denote the ciphertext on which distributed re-encryption is performed, where $v$ is the original file, the virtual individual gene data, or a query request encrypted under the collective public key, $K$ is the collective public key, $r$ is a random number, and $G$ is a base point on the elliptic curve.
Round 1: each computing node $i$ uses its secret $s_i$ to add $s_i G$ to $C_2$ and sends the result to the next computing node $i+1$. The final result obtained is $(C_1, C_2 + \sum_{i=1}^{n} s_i G)$.
Round 2: each computing node $i$ receives the ciphertext from the previous computing node $i-1$ and then performs its calculation on it (the formula is not reproduced in the source text). The second component of the final result is taken as the deterministic encryption of $v$.
Distributed re-encryption thus converts the additively homomorphic encrypted form into a deterministically encrypted form.
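A minimal sketch of a two-round distributed re-encryption is given below. Round 1 follows the description above. The round-2 update is an assumption, since the formula is not available in this text: each node removes its key share with $k_i$ and blinds both components with $s_i$. The group is a toy stand-in (integers modulo a prime, no security), and all names are hypothetical.

```python
import secrets

Q = 2**61 - 1   # toy prime group order (assumption, insecure)
G = 5           # toy "base point" (assumption)

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def encrypt(v, K):
    """Probabilistic ElGamal ciphertext (rG, v + rK)."""
    r = rand_scalar()
    return (r * G) % Q, (v + r * K) % Q

def distributed_reencrypt(ciphertext, nodes):
    """nodes is a list of (k_i, s_i); returns the deterministic ciphertext."""
    c1, c2 = ciphertext
    # Round 1: every node adds s_i * G to C2.
    for _, s in nodes:
        c2 = (c2 + s * G) % Q
    # Round 2 (assumed form): every node strips its key share and blinds.
    for k, s in nodes:
        c2 = (s * (c2 - k * c1)) % Q
        c1 = (s * c1) % Q
    return c2   # second component of the final result

nodes = [(rand_scalar(), rand_scalar()) for _ in range(3)]
K = sum(k * G for k, _ in nodes) % Q

variant = (1234 * G) % Q
other = (5678 * G) % Q
tag_a = distributed_reencrypt(encrypt(variant, K), nodes)
tag_b = distributed_reencrypt(encrypt(variant, K), nodes)
tag_c = distributed_reencrypt(encrypt(other, K), nodes)
```

Under these assumptions the randomness $r$ cancels, so two independent encryptions of the same value give the same tag (`tag_a == tag_b`) and a different value gives a different tag, which is what an equality-match query requires.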
S24: Read the access policy on the blockchain and associate the received encrypted data with it. The access policy is set by the data providing terminal and may be an all-agreement policy or a dynamic agreement policy. Under an all-agreement policy, the query request is directly approved or rejected. Under a dynamic agreement policy, a query notification is sent to the data providing terminal, and the data provider's reply to the notification, generated on the blockchain, is received and verified. If the reply grants consent, the symmetric key encrypted with the collective public key is converted into a symmetric key encrypted with the querier's public key $U$.
S25: Receive the calculation result sent by the storage unit; after verification by all computing nodes, store the calculation result on the blockchain.
The calculation result is produced by the storage unit after query matching and calculation. All computing nodes then jointly execute a distributed key exchange that converts the calculation result into a query result, which is sent to the data query terminal.
Let $E_K(R) = (C_1, C_2) = (rG, R + rK)$ be the calculation result encrypted under the collective public key $K$, and let $U$ be the public key of the querier.
First, $E_K(R) = (C_1, C_2)$ is modified into an initial ciphertext. Each computing node then in turn generates a random number $v_i$ and updates the ciphertext (the formulas for these two steps are not reproduced in the source text). The final result, the query result encrypted under the querier's public key $U$, is $(vG, M + vU)$, where $M = C_2 - (k_1 + \dots + k_n) C_1$ and $v = v_1 + \dots + v_n$.
The calculation result is thereby converted into a query result encrypted under the querier's public key $U$. After the query result is sent to the data query terminal, the querier can decrypt it with the querier's own private key.
When the querier needs to obtain the original file, the key exchange step converts the symmetric key encrypted under the collective public key (that is, the symmetric encryption key generated by the data providing terminal) into a symmetric key encrypted under the querier's public key; the conversion follows the same key exchange process. The data query terminal decrypts the converted key with its own private key and, after downloading the original file, uses the decrypted key to decrypt the file and obtain the original data.
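The distributed key exchange can be sketched as follows. The final form, an ElGamal ciphertext of $M$ under $U$ with $v = v_1 + \dots + v_n$, follows the description; the starting ciphertext $(0, C_2)$ and the per-node update are assumptions chosen to reach that final form. The group is again a toy stand-in (integers modulo a prime, no security), and all function names are hypothetical.

```python
import secrets

Q = 2**61 - 1   # toy prime group order (assumption, insecure)
G = 5           # toy "base point" (assumption)

def rand_scalar():
    return secrets.randbelow(Q - 1) + 1

def encrypt(m_point, K):
    r = rand_scalar()
    return (r * G) % Q, (m_point + r * K) % Q

def decrypt(ciphertext, private_key):
    c1, c2 = ciphertext
    return (c2 - private_key * c1) % Q

def key_switch(ciphertext, private_keys, U):
    """Convert E_K(R) into an encryption of R under the querier's key U."""
    c1, c2 = ciphertext
    new_c1, new_c2 = 0, c2              # assumed start: (0, C2)
    for k in private_keys:              # one pass per computing node
        v = rand_scalar()               # fresh random number v_i
        new_c1 = (new_c1 + v * G) % Q   # accumulates vG
        new_c2 = (new_c2 - k * c1 + v * U) % Q   # accumulates M + vU
    return new_c1, new_c2

private_keys = [rand_scalar() for _ in range(3)]
K = sum(k * G for k in private_keys) % Q   # collective public key
u = rand_scalar()                           # querier's private key
U = (u * G) % Q                             # querier's public key

R = (17 * G) % Q                            # calculation result, encoded as 17*G
switched = key_switch(encrypt(R, K), private_keys, U)
result = decrypt(switched, u)
```

In this sketch no single node ever sees $R$ in the clear, and only the holder of $u$ can decrypt `switched`. The same routine would apply when the value being switched is the symmetric key rather than a calculation result.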
In a third embodiment, a privacy- and security-protecting genetic data sharing method is provided. The method is applied to a storage unit and includes:
S31: Receive and store the stored information sent by the computing node, namely the symmetrically encrypted file, the identifier encrypted with the collective public key, and the original file and virtual individual gene data that were encrypted with the collective public key and then distributed re-encrypted.
S32: Receive the distributed re-encrypted query request sent by the computing node, match the query request against the stored information, perform one of the following two calculations depending on the query request, and send the calculation result to the computing node:
let Φ be the set of the identity identifiers of the individuals successfully matched, including the data provider and the virtual individuals;
if the query request asks only for the number of successfully matched individuals, that is, the sum $R$ of the identifications of the successfully matched individuals, the encrypted identifications of the individuals in $\Phi$ are summed homomorphically (the formula is not reproduced in the source text);
if the query request asks for the identity identifiers of the successfully matched individuals, the following calculation is carried out:
each individual identity identifier $id$ is multiplied by the corresponding homomorphically encrypted identification (the formula is not reproduced in the source text). After the calculation result is decrypted, the identifier of a virtual individual is 0, so virtual individuals do not affect the query.
In both cases, executing the query match generates a blockchain transaction that includes the definition of the query and the identifiers of the successful matches. After verification by all computing nodes, the transaction is stored on the blockchain in tamper-proof form.
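The two calculations in S32 can be sketched as follows. The sketch assumes that the encrypted identification is 1 for a real individual and 0 for a virtual one, models the collective key pair with a single key pair for brevity, and uses plain strings as stand-ins for deterministic ciphertexts. The group is a toy stand-in (integers modulo a prime, no security), and all names and record values are hypothetical.

```python
import secrets

Q = 2**61 - 1   # toy prime group order (assumption, insecure)
G = 5           # toy "base point" (assumption)

def encrypt(m, K):
    """Encrypt a small integer m, encoded as the group element m*G."""
    r = secrets.randbelow(Q - 1) + 1
    return (r * G) % Q, (m * G + r * K) % Q

def add(ct_a, ct_b):
    """Homomorphic addition: E(a) + E(b) = E(a + b)."""
    return (ct_a[0] + ct_b[0]) % Q, (ct_a[1] + ct_b[1]) % Q

def scale(ct, a):
    """Multiply a ciphertext by a known integer: a * E(m) = E(a * m)."""
    return (a * ct[0]) % Q, (a * ct[1]) % Q

def decrypt_small(ct, k, limit=10_000):
    """Decrypt, then recover the small integer by exhaustive search."""
    point = (ct[1] - k * ct[0]) % Q
    for m in range(limit):
        if (m * G) % Q == point:
            return m
    raise ValueError("plaintext outside search range")

k = secrets.randbelow(Q - 1) + 1    # stands in for k_1 + ... + k_n
K = (k * G) % Q

records = [
    {"id": 101, "tag": "variant-A", "flag": encrypt(1, K)},  # real
    {"id": 102, "tag": "variant-A", "flag": encrypt(0, K)},  # virtual
    {"id": 103, "tag": "variant-B", "flag": encrypt(1, K)},  # real
    {"id": 104, "tag": "variant-A", "flag": encrypt(1, K)},  # real
]

matched = [rec for rec in records if rec["tag"] == "variant-A"]  # the set Phi

enc_count = (0, 0)                   # trivial encryption of 0
for rec in matched:
    enc_count = add(enc_count, rec["flag"])
enc_ids = [scale(rec["flag"], rec["id"]) for rec in matched]

count = decrypt_small(enc_count, k)
ids = [decrypt_small(ct, k) for ct in enc_ids]
```

Here three records match, but the decrypted count is 2 and the decrypted identifiers are 101, 0, and 104: the virtual individual contributes nothing, and the storage unit never learns which matches were real.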
In a fourth embodiment, a privacy- and security-protecting genetic data sharing method is provided. The method is applied to a data query terminal and includes:
S41: Assign query credits and a pseudo-random identity to the data querier and store them on the blockchain; generate the querier's public key $U$ and private key $u$; provide data field information that the querier combines with Boolean operators to build a query request; encrypt the query request with the collective public key and send it, together with the querier's public key $U$, to any one of the computing nodes.
When a data querier registers at the data query terminal, the terminal records the registration information, performs proper authentication, and then assigns the querier a pseudo-random identity and query credits. The pseudo-random identity protects the querier's privacy.
The query credits are recorded on the blockchain, and each query consumes credits. Query credits limit the total number of query requests that the same data querier can send, and thus limit the amount of undisclosed sensitive information about data providers that could be inferred from query results. After proper authentication, an authorized querier with sufficient query credits can run a secure data query and, depending on authorization level, obtains either the total number of individuals on the platform whose data match or the identifiers of those individuals; that is, different queriers have different query rights. If the same querier issues the same query request multiple times, the query credits are not reduced again.
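The credit rule can be illustrated with a small in-memory sketch. It assumes that each new query costs one credit and that repeated queries are recognized by a hash of the query text; neither detail is specified in the description, and the class name, method names, and identities are hypothetical. In the described system these records would live on the blockchain rather than in a Python object.

```python
import hashlib

class QueryCreditLedger:
    """In-memory stand-in for the query-credit records kept on the blockchain."""

    def __init__(self):
        self.credits = {}   # pseudo-random identity -> remaining credits
        self.charged = {}   # pseudo-random identity -> digests already paid for

    def register(self, pseudo_id, credits):
        self.credits[pseudo_id] = credits
        self.charged[pseudo_id] = set()

    def authorize(self, pseudo_id, query):
        """Return True if the query may run, charging one credit if it is new."""
        digest = hashlib.sha256(query.encode("utf-8")).hexdigest()
        if digest in self.charged[pseudo_id]:
            return True                      # repeated query: no extra charge
        if self.credits[pseudo_id] < 1:
            return False                     # credits exhausted
        self.credits[pseudo_id] -= 1
        self.charged[pseudo_id].add(digest)
        return True

ledger = QueryCreditLedger()
ledger.register("querier-7f3a", credits=2)
first = ledger.authorize("querier-7f3a", "variant=A AND diagnosis=X")
repeat = ledger.authorize("querier-7f3a", "variant=A AND diagnosis=X")
second = ledger.authorize("querier-7f3a", "variant=B")
third = ledger.authorize("querier-7f3a", "variant=C")
```

With two credits, the first query, its repetition, and a second distinct query are allowed, while a third distinct query is refused.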
Query requests can carry different query rights. If only the number of matches is needed, for example the number of individuals with a certain genetic variation, the generated query request mainly contains the content of the query and the pseudo-random identity of the querier. If specific information about the matched individuals, the original files, and so on is needed, the generated query request also contains the hash-based file signature and more detailed information about the querier, such as name, affiliation, and a description of the research that requires access to the personal data.
After the query request is generated, a transaction is created on the blockchain: a computing node receives the query request and the querier's public key $U$ sent by the data query terminal, and the transaction is stored on the blockchain after being verified by all computing nodes.
S42: Receive the query result sent by the computing node, then update the query credits and store the updated query credits on the blockchain.
For a query result that contains only the number of matches, the query result is decrypted directly with the querier's private key $u$;
for a query result that also contains detailed information such as the matched individual identifiers and the original file, the querier's private key $u$ is used to decrypt the symmetric key (that is, the symmetric encryption key generated by the data providing terminal), and the decrypted key is used to decrypt the encrypted original data to obtain the real individual data.
In a fifth embodiment, a privacy- and security-protecting genetic data sharing apparatus is provided. The apparatus is applied to a data providing terminal and includes:
the virtual individual gene data generation module is used for generating a group of virtual individuals and their corresponding gene data, namely the virtual individual gene data, by means of a virtual individual generation algorithm;
the data sending module is used for generating a symmetric encryption key with which the data provider encrypts the original file to obtain a symmetrically encrypted file; assigning an identifier to each virtual individual and each data provider; encrypting the original file, the gene data of the virtual individuals, the symmetrically encrypted file, and the identifier with the collective public key to obtain the encrypted data; and sending the encrypted data to a computing node;
and the auditing module is used for setting the access policy and, when a dynamic agreement policy is in place and the blockchain state is synchronized, receiving query request notifications for the data provider to review and approve or reject, and sending the approval or rejection to the computing node.
The data providing terminal may be any of various front-end devices that include a processor and a storage medium, together with a program, such as a web page, that comprises the virtual individual gene data generation module, the data sending module, and the auditing module, is stored in the storage medium, and can be executed by the processor. Data providers, that is, users, can securely upload the original files of their clinical and genetic variation data through the data providing terminal and share them with others for population genomics research.
In a sixth embodiment, a privacy- and security-protecting genetic data sharing apparatus is provided. The apparatus is applied to a computing node and includes:
the data receiving module is used for receiving the encrypted data sent by the data providing terminal and storing it on the blockchain after verification by all computing nodes; receiving the additively homomorphically encrypted query request sent by the data query terminal, which is stored on the blockchain after verification by all computing nodes; and receiving the calculation result sent by the storage unit and storing it on the blockchain after verification by all computing nodes;
the data processing module is used for generating a public/private key pair based on an additively homomorphic algorithm and also generating a secret, with all computing nodes broadcasting their own public keys so as to generate the collective public key; organizing all computing nodes into a blockchain; sending the symmetrically encrypted file and the identifier encrypted with the collective public key to the storage unit; performing distributed re-encryption, according to claim 2 or 3, on the original file and the virtual individual gene data encrypted with the collective public key and sending them to the storage unit; and performing distributed re-encryption on the query request and sending it to the storage unit.
The access policy processing module is used for reading the access policy on the blockchain and associating the received encrypted data with it, where the access policy is an all-agreement policy or a dynamic agreement policy. Under an all-agreement policy, the query request is directly approved or rejected. Under a dynamic agreement policy, a query notification is sent to the data providing terminal, and the data provider's reply to the notification, generated on the blockchain, is received and verified. If the reply grants consent, the symmetric key encrypted with the collective public key is converted into a symmetric key encrypted with the querier's public key $U$.
The data processing module is further configured to convert the calculation result into a query result; the conversion is the following distributed key exchange, executed jointly by all computing nodes:
let $E_K(R) = (C_1, C_2) = (rG, R + rK)$ be the calculation result encrypted under the collective public key $K$, and let $U$ be the public key of the querier;
first, $E_K(R) = (C_1, C_2)$ is modified into an initial ciphertext. Each computing node then in turn generates a random number $v_i$ and updates the ciphertext (the formulas for these two steps are not reproduced in the source text). The final result, the query result encrypted under the querier's public key $U$, is $(vG, M + vU)$, where $M = C_2 - (k_1 + \dots + k_n) C_1$ and $v = v_1 + \dots + v_n$;
and the data processing module sends the query result encrypted under the querier's public key $U$ to the data query terminal.
The apparatus applied to the computing node is a server hosted by mutually independent government, academic, or commercial institutions; multiple such apparatuses are responsible for collectively and securely processing the query requests of data queriers.
In a seventh embodiment, a privacy- and security-protecting genetic data sharing apparatus is provided. The apparatus is applied to a data query terminal and includes:
the request generation module is used for assigning query credits and a pseudo-random identity to the data querier and storing them on the blockchain; generating the querier's public key $U$ and private key $u$; providing data field information that the querier combines with Boolean operators to build a query request; and encrypting the query request with the collective public key and sending it, together with the querier's public key $U$, to any one of the computing nodes;
the result receiving module is used for receiving the query result sent by the computing node, then updating the query credits and storing the updated query credits on the blockchain; and decrypting the query result with the querier's private key $u$, or decrypting the symmetric key with the querier's private key $u$ and using the symmetric key to decrypt the encrypted data to obtain the real individual data.
Data queriers are generally researchers who use the data query terminal to send query requests and obtain query results, in order to find individuals with the clinical characteristics and genetic variations they are interested in and recruit them for clinical research or drug trials.
In an eighth embodiment, a privacy- and security-protecting genetic data sharing apparatus is provided. The apparatus is applied to a storage unit and includes:
the data storage module is used for receiving and storing the stored information, namely the symmetrically encrypted file, the identification encrypted with the collective public key, and the original file and virtual individual gene data that were encrypted with the collective public key and then distributed re-encrypted;
the query matching module is used for receiving the distributed re-encrypted query request, matching the query request against the stored information, performing one of the following two calculations depending on the querier's needs, and sending the calculation result to the computing node:
let $\Phi$ be the set of identity identifiers of the successfully matched individuals, including data providers and virtual individuals;
if the querier needs only the number of successfully matched individuals, that is, the sum $R$ of the identifications of the successfully matched individuals, the encrypted identifications of the individuals in $\Phi$ are summed homomorphically (the formula is not reproduced in the source text);
if the querier needs the identity identifiers of the successfully matched individuals, the following calculation is carried out:
each individual identity identifier $id$ is multiplied by the corresponding homomorphically encrypted identification (the formula is not reproduced in the source text). After the calculation result is decrypted by the data query terminal, the identifier of a virtual individual is 0.
The means applied to the storage unit may be one or more servers responsible for securely storing the clinical and genomic data of the data provider. The data provider may select any one of the servers to store their data. These servers may be distributed in any government, academic, or commercial establishment having an IT infrastructure capable of providing storage for large amounts of data.
The inventive concept is explained in detail herein using specific examples, which are given only to aid in understanding the core concepts of the invention. It should be understood that any obvious modifications, equivalents and other improvements made by those skilled in the art without departing from the spirit of the present invention are included in the scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111274064.7A CN113990399B (en) | 2021-10-29 | 2021-10-29 | Genetic data sharing method and device with privacy protection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111274064.7A CN113990399B (en) | 2021-10-29 | 2021-10-29 | Genetic data sharing method and device with privacy protection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113990399A | 2022-01-28 |
CN113990399B | 2025-01-14 |
Family
ID=79744585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111274064.7A Active CN113990399B (en) | 2021-10-29 | 2021-10-29 | Genetic data sharing method and device with privacy protection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113990399B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112840403A (en) * | 2018-07-17 | 2021-05-25 | 李伦京 | Methods for preserving and using genomes and genomic data |
CN109768987A (en) * | 2019-02-26 | 2019-05-17 | 重庆邮电大学 | A secure and private storage and sharing method of data files based on blockchain |
CN111723354A (en) * | 2019-03-21 | 2020-09-29 | 宏观基因有限公司 | Method for providing biological data, method for encrypting biological data, and method for processing biological data |
CN111191288A (en) * | 2019-12-30 | 2020-05-22 | 中电海康集团有限公司 | Block chain data access authority control method based on proxy re-encryption |
CN113468570A (en) * | 2021-07-15 | 2021-10-01 | 湖北央中巨石信息技术有限公司 | Private data sharing method based on intelligent contract |
Non-Patent Citations (1)
Title |
---|
隋爱娜等: "数字内容安全技术", 31 October 2016, 中国传媒大学出版社, pages: 229 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114996763A (en) * | 2022-07-28 | 2022-09-02 | 北京锘崴信息科技有限公司 | Private data security analysis method and device based on trusted execution environment |
CN114996763B (en) * | 2022-07-28 | 2022-11-15 | 北京锘崴信息科技有限公司 | Private data security analysis method and device based on trusted execution environment |
Also Published As
Publication number | Publication date |
---|---|
CN113990399B (en) | 2025-01-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |