Disclosure of Invention
The application provides a medical data classification and grading security management method combining a large model with blockchain technology. It addresses the technical problems of existing medical data classification and grading security management methods: high cost, poor accuracy and reliability, difficulty in tracing and auditing responsibility, and failure to meet the increasingly strict compliance requirements of the medical industry.
The application is realized by the following scheme:
A medical data classification and grading security management method combining a large model and blockchain technology comprises the following steps:
S1, constructing a medical data classification and grading large model: the large model is pre-trained on public medical data, then fine-tuned and parameter-optimized, to obtain the medical data classification and grading large model;
S2, medical data classification and grading: each piece of medical data is classified and graded by the trained medical data classification and grading large model, mapped one by one to the data classification and grading rules, and given the corresponding classification and grading labels;
S3, data on-chain and secure storage: the classified and graded medical data is processed and uploaded to the chain, recorded in a distributed ledger, and the correctness and integrity of the stored data are verified through the blockchain consensus mechanism, ensuring that the data cannot be tampered with;
S4, smart contract dynamic authorization and access control: after receiving a user request, the blockchain system invokes the relevant smart contracts to verify the user's access rights and returns the verification result;
S5, dynamic updating of the classification and grading specification: when the medical data classification and grading specification changes, a dynamic update flow is triggered, the knowledge base is updated, the updated data is uploaded to the blockchain system again, and user access rights are adjusted, safeguarding the privacy and security of the data.
Further, the step S1 specifically includes the steps of:
S11, pre-training a large model on a large amount of public medical data so that it acquires general professional medical knowledge;
S12, formulating classification and grading framework rules according to existing policy specifications and the basic properties, business attributes and potential risks of medical data;
S13, each medical institution uses data sampling to collect a representative small-sample data set from its own private database, covering data of different categories and levels and including representative diagnosis and treatment data and rare case data; the collected data is manually labeled with classification and grading labels through multi-expert evaluation to build a high-quality fine-tuning data set;
S14, deploying the pre-trained large model locally at each medical institution and fine-tuning it on that institution's own data through low-rank adaptation (LoRA): with the original parameters of the pre-trained model frozen, the model is trained on the labeled data set and optimized by gradient descent on the cross-entropy loss function to obtain a parameter-change matrix:
$$\mathcal{L} = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$
where $y_i$ is the actual label, $\hat{y}_i$ is the probability predicted by the model, and $n$ is the number of categories; applying this fine-tuning method yields the parameter-change matrices $(\Delta W_1, \dots, \Delta W_N)$ trained on the data of each medical institution;
S15, masking the gradient information in the parameter-change matrices $(\Delta W_1, \dots, \Delta W_N)$ trained on the data of each medical institution by using homomorphic encryption and differential privacy techniques, and sending the encrypted parameter-change matrices to a central aggregation server;
S16, detecting poisoning attacks through joint multi-index detection that combines spatial anomaly, behavioral anomaly and magnitude anomaly;
S17, calculating the contribution degree of each node from the quality gain of its local model, the amount of data provided, the training intensity and the number of anomaly marks;
S18, the central server performs weighted secure aggregation according to the contribution degree of each institution, so that the parameter-change matrix of an institution with a larger contribution degree carries a larger weight in training; the weighted sum of the parameter-change matrices is added to the parameter matrix of the pre-trained large model and the change in accuracy is recorded; when the change in accuracy is smaller than the expected value, the model is considered to have converged and training ends; otherwise the global model parameter change is encrypted and sent to each institution node, the local models are updated, and the process returns to step S13 until model training converges.
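The aggregation in step S18 can be sketched roughly as follows. This is a minimal illustration only: it assumes the parameter-change matrices have already been decrypted (the homomorphic encryption and differential privacy of step S15 are omitted), that contribution degrees are normalized into weights by simple division, and that the helper names `weighted_aggregate`, `apply_update` and `has_converged` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def weighted_aggregate(deltas: List[Matrix], contributions: List[float]) -> Matrix:
    """Contribution-weighted sum of per-institution parameter-change matrices.

    Assumption: weights are the contribution degrees normalized to sum to 1.
    """
    total = sum(contributions)
    if total <= 0:
        raise ValueError("contribution degrees must sum to a positive value")
    weights = [c / total for c in contributions]
    rows, cols = len(deltas[0]), len(deltas[0][0])
    aggregated = [[0.0] * cols for _ in range(rows)]
    for w, delta in zip(weights, deltas):
        for r in range(rows):
            for c in range(cols):
                aggregated[r][c] += w * delta[r][c]
    return aggregated


def apply_update(base: Matrix, delta: Matrix) -> Matrix:
    """Add the aggregated change to the pre-trained parameter matrix."""
    return [[b + d for b, d in zip(b_row, d_row)] for b_row, d_row in zip(base, delta)]


def has_converged(prev_accuracy: float, new_accuracy: float, tolerance: float) -> bool:
    """Training stops once the change in accuracy drops below the expected value."""
    return abs(new_accuracy - prev_accuracy) < tolerance
```

With two institutions whose contribution degrees are 3 and 1, the first institution's matrix receives weight 0.75 and the second 0.25, so the institution with the larger contribution degree dominates the global update.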
Further, the step S16 specifically includes the steps of:
S161, first mapping the parameter-change matrices uploaded by all institutions into a unified vector space, and then calculating the pairwise cosine similarity according to the following formula:
$$\cos(\Delta W_i, \Delta W_j) = \frac{\langle \Delta W_i, \Delta W_j \rangle}{\lVert \Delta W_i \rVert \, \lVert \Delta W_j \rVert}$$
The parameter change of a malicious node has markedly low similarity to those of most normal nodes and forms an isolated cluster; a threshold θ is set, and node i is considered an abnormal user when its average cosine similarity to the other nodes is smaller than θ;
S162, combining the parameter-change matrix of each medical institution with the base large model in turn and recording the resulting change in performance; a node is marked as an abnormal user when the change in model performance falls outside the normal interval;
S163, computing the norms of the parameter-change matrices uploaded by all nodes; a node mounting a malicious attack tends to produce a ΔW with a larger magnitude in an attempt to control the model, so a node whose norm is far larger than the mean is marked as an abnormal user. Abnormal users receive warning feedback requiring the node to recheck its data labeling or redo fine-tuning, the node's anomaly marks are accumulated, its ΔW is down-weighted during aggregation, and the institution node is permanently removed when the accumulated count exceeds a preset threshold.
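The spatial check of S161 and the magnitude check of S163 can be sketched as below. The sketch assumes that "mapping into a unified vector space" means flattening each ΔW into a vector, and that "far larger than the mean" is expressed as a multiple `factor` of the mean norm; both are illustrative assumptions, as are the function names. The behavioral check of S162 needs a model evaluation and is not shown.

```python
import math
from typing import Dict, List, Set


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two flattened parameter-change vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def spatial_outliers(updates: Dict[str, List[float]], theta: float) -> Set[str]:
    """Flag nodes whose average cosine similarity to all other nodes is below theta."""
    flagged = set()
    for node, vec in updates.items():
        sims = [cosine(vec, other) for name, other in updates.items() if name != node]
        if sims and sum(sims) / len(sims) < theta:
            flagged.add(node)
    return flagged


def magnitude_outliers(updates: Dict[str, List[float]], factor: float) -> Set[str]:
    """Flag nodes whose update norm exceeds `factor` times the mean norm (assumed rule)."""
    norms = {name: math.sqrt(sum(a * a for a in vec)) for name, vec in updates.items()}
    mean_norm = sum(norms.values()) / len(norms)
    return {name for name, n in norms.items() if n > factor * mean_norm}
```

In a toy run with three similar updates and one large update pointing the opposite way, both checks flag only the fourth node; in practice θ and the norm rule would need tuning on real traffic.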
Further, the step S17 specifically includes:
S171, using metrics such as accuracy, F1 score and ROC to record the local gain that each institution's ΔW brings to the global model, denoted G;
S172, counting information such as the number of labeled samples and the number of training rounds each medical institution used for fine-tuning, denoted Q, to prevent free-riding by nodes with very little data and very few training rounds;
S173, combining the number of times F that the node has been judged abnormal during training, calculating the contribution degree of each medical institution according to the following formula:
where the weight parameters α, β and γ can be adjusted dynamically as required, and a minimum protection threshold is set so that small medical institutions do not lose their say entirely.
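One way the quantities G, Q and F might be combined is sketched below. The linear form αG + βQ − γF, the default weight values and the interpretation of the minimum protection threshold as a floor on the score are all assumptions made for illustration; they are not taken from the formula of step S173.

```python
def contribution(G: float, Q: float, F: int,
                 alpha: float = 0.5, beta: float = 0.3, gamma: float = 0.2,
                 floor: float = 0.05) -> float:
    """Hypothetical contribution degree of one institution.

    G: local quality gain, Q: data/training effort, F: number of anomaly marks.
    Assumed form: gain and effort raise the score, anomaly marks lower it,
    and `floor` plays the role of the minimum protection threshold.
    """
    score = alpha * G + beta * Q - gamma * F
    return max(score, floor)
```

Under these assumptions an institution with no anomaly marks is scored on gain and effort alone, while a repeatedly flagged institution falls back to the floor rather than to zero or a negative weight.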
Further, the step S2 specifically includes the steps of:
S21, collecting structured and unstructured medical data, including electronic health records, medical images, laboratory test results, physician diagnosis reports and patient medical records, from the multi-source systems of medical institutions such as the hospital information system, the laboratory information management system and the picture archiving and communication system, and cleaning the data to remove noise and redundant information;
S22, building a knowledge base from the latest trustworthy medical-domain information, such as the latest medical data classification and grading management specifications, cutting-edge papers published in major medical journals, and authoritative research reports;
S23, retrieving from the knowledge base, through retrieval enhancement, the knowledge and rules relevant to the medical data to be classified and graded, and combining this with prompt engineering to help the large model classify and grade the data more accurately;
S24, classifying and grading each piece of medical data with the trained large model, mapping it one by one to the data classification and grading rules, and adding the corresponding classification and grading labels.
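Steps S23 and S24 can be illustrated with a toy retrieval and prompt-building sketch. It assumes a knowledge base of plain-text rule snippets and scores relevance by simple word overlap, which stands in for whatever retriever is actually used; the function names and the prompt wording are hypothetical, and the call to the large model itself is left out.

```python
from typing import List


def retrieve(query: str, knowledge_base: List[str], k: int = 2) -> List[str]:
    """Return the k snippets sharing the most words with the query (toy retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(record: str, rules: List[str]) -> str:
    """Combine the retrieved rules with the record into a classification prompt."""
    rule_text = "\n".join(f"- {rule}" for rule in rules)
    return (
        "Relevant classification and grading rules:\n"
        f"{rule_text}\n\n"
        f"Medical record:\n{record}\n\n"
        "Return the category and the level of this record."
    )
```

The resulting prompt would then be passed to the fine-tuned large model, whose answer becomes the classification and grading label attached to the record.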
Further, the step S3 specifically includes the steps of:
S31, calculating a hash value of the classified and graded medical data to ensure the uniqueness and integrity of the data;
S32, encrypting the medical data with an asymmetric encryption algorithm to ensure the security of the data during transmission and storage;
S33, uploading the encrypted medical data and its classification and grading labels to the blockchain network, recording them in the distributed ledger, and verifying the correctness and integrity of the stored data through the blockchain consensus mechanism to ensure that the data cannot be tampered with.
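The hashing of S31 and the tamper-evidence of S33 can be illustrated with a small hash-chained ledger. This is a single-process sketch under stated assumptions: SHA-256 over canonical JSON is chosen as the hash, the asymmetric encryption of S32 and the consensus mechanism are omitted, and the `Ledger` class and its field names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def record_hash(record: Dict) -> str:
    """SHA-256 over a canonical JSON encoding, so equal records hash equally."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Ledger:
    """Append-only list of blocks, each committing to the previous block's hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.blocks: List[Dict] = []

    def append(self, data_hash: str, label: str) -> Dict:
        prev_hash = self.blocks[-1]["block_hash"] if self.blocks else self.GENESIS
        body = {"prev_hash": prev_hash, "data_hash": data_hash, "label": label}
        block = dict(body, block_hash=record_hash(body))
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Recompute every block hash and check the chain links."""
        prev_hash = self.GENESIS
        for block in self.blocks:
            body = {k: block[k] for k in ("prev_hash", "data_hash", "label")}
            if block["prev_hash"] != prev_hash or block["block_hash"] != record_hash(body):
                return False
            prev_hash = block["block_hash"]
        return True
```

Changing the label or data hash of any stored block makes `verify()` fail, which is the property that supports auditing and responsibility tracing; a real deployment would rely on the blockchain platform's own consensus and storage rather than this in-memory list.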
Further, the step S4 specifically includes the steps of:
S41, a user packs the requesting subject, the target data object and the operation to be executed into a request R:
$R \leftarrow F(S, O, A)$,
where R denotes the user request, S denotes the subject attributes (including the user's unique id and permission level), O denotes the object attributes (including the data category and level), and A denotes the operation attributes (data operations such as read and write);
after signing the request with the private key, the user sends the public key, the digital signature and a timestamp to the blockchain medical data management system:
$B \leftarrow X\{PK_X, \mathrm{Sign}(R, SK_X), T_1\}$,
where B denotes the blockchain, X denotes the user, $PK_X$ denotes the public key of user X, $\mathrm{Sign}()$ denotes a digital signature, $SK_X$ denotes the private key of user X, and $T_1, \dots, T_m$ denote timestamps;
S42, after receiving the user request, the blockchain system uses the public key to parse the request and invokes the policy management contract to automatically match the data object corresponding to the request:
$B\{\mathrm{Sign}(R, SK_X), P(R), T_2\} \rightarrow X$,
where P denotes the policy management contract, which automatically matches the corresponding data object according to the user request and the policy rules;
S43, invoking the permission verification contract to verify the user's access rights:
$B\{V(X), T_3\} \rightarrow B$,
where V denotes the permission verification contract, which verifies whether the user has permission to access the specific data object; if verification passes, the data retrieval contract is invoked and the data corresponding to the request is returned:
$B\{D(R), T_4\} \rightarrow X$,
where D denotes the data retrieval contract, which retrieves the data object requested by the user from the blockchain and returns it; if verification fails, the user request does not satisfy the policy information in the policy management contract, and a rejection message is returned:
$B\{\mathrm{Refused}, T_4\} \rightarrow X$,
where Refused denotes the plaintext of the rejection message.
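The contract flow of S41 to S43 can be sketched off-chain as plain functions. The sketch assumes a request carries the attributes S, O and A described above, that the policy maps a (category, operation) pair to a minimum permission level, and that access also requires the user's level to be at least the data level; the policy table, the `store` mapping and all names are hypothetical, and signature verification and timestamps are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Request:
    user_id: str           # subject attribute S: unique id
    permission_level: int  # subject attribute S: permission level
    data_category: str     # object attribute O: category
    data_level: int        # object attribute O: level
    action: str            # operation attribute A: e.g. "read" or "write"


# Hypothetical policy: minimum permission level per (category, operation).
POLICY: Dict[Tuple[str, str], int] = {
    ("clinical", "read"): 2,
    ("clinical", "write"): 4,
}


def policy_contract(req: Request) -> Optional[int]:
    """P: match the request against the policy rules."""
    return POLICY.get((req.data_category, req.action))


def verification_contract(req: Request) -> bool:
    """V: check that the user's level covers both the policy and the data level."""
    required = policy_contract(req)
    if required is None:
        return False
    return req.permission_level >= max(required, req.data_level)


def handle(req: Request, store: Dict[Tuple[str, int], str]) -> str:
    """Return the requested data (D) on success, otherwise the plaintext 'Refused'."""
    if not verification_contract(req):
        return "Refused"
    return store.get((req.data_category, req.data_level), "Refused")
```

On an actual blockchain platform these three functions would be deployed as separate smart contracts and every call would be recorded on the ledger, which is what makes each access decision auditable.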
Further, the step S5 specifically includes the steps of:
S51, when the medical data classification and grading specification changes, reselecting and relabeling the fine-tuning data set according to the new specification and repeating step S1;
S52, updating the knowledge base to keep the retrieved information timely and accurate, provide the latest and most reliable knowledge support for medical data classification and grading management, and keep retrieval enhancement efficient and practical;
S53, updating the medical data classification and grading labels with the updated large model classification framework, and uploading the updated data to the blockchain system again;
S54, adjusting user access rights according to the updated medical data classification and grading specification to safeguard the privacy and security of the data.
The application also provides a medical data classification and grading security management device combining a large model and blockchain technology, which comprises:
a medical data classification and grading large model construction module, used for constructing the medical data classification and grading large model, which is obtained by pre-training the large model on public medical data followed by fine-tuning and parameter optimization;
a medical data classification and grading processing module, used for classifying and grading each piece of medical data with the trained medical data classification and grading large model, mapping it one by one to the data classification and grading rules, and adding the corresponding classification and grading labels;
a data on-chain and secure storage module, used for processing the classified and graded medical data and uploading it to the chain, recording it in the distributed ledger, and verifying the correctness and integrity of the stored data through the blockchain consensus mechanism to ensure that the data cannot be tampered with;
a smart contract dynamic authorization and access control module, in which, after receiving a user request, the blockchain system invokes the relevant smart contracts to verify the user's access rights and returns the verification result;
a classification and grading specification dynamic updating module, used for triggering a dynamic update flow when the medical data classification and grading specification changes, updating the knowledge base, uploading the updated data to the blockchain system again, and adjusting user access rights to safeguard the privacy and security of the data.
In another aspect, the application further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the medical data classification and grading security management method combining a large model and blockchain technology.
The application also provides a storage medium comprising a stored program, wherein the program, when run, controls the device in which the storage medium is located to execute the steps of the medical data classification and grading security management method combining a large model and blockchain technology.
Compared with the prior art, the application has the following beneficial effects:
The application builds an intelligent and secure medical data classification and grading management system through deep integration of a large model and blockchain technology. First, through federated learning, the application trains the medical data classification and grading large model collaboratively while each institution's data stays within its own domain, achieves accurate classification and grading of multi-modal medical data such as text and images, markedly improves the intelligence and accuracy of data management, and, by combining fine-tuning with retrieval enhancement, improves the model's response speed and adaptability to specification updates. Second, through the blockchain's distributed ledger, asymmetric encryption and hash algorithms, the application ensures that medical data is stored securely and cannot be tampered with, solves the problem of tracing responsibility in data management, and provides a highly trusted data management environment for medical institutions, patients and regulators. In addition, the application implements automatic identity verification and access control through smart contracts: predefined rules and logic ensure that only legitimate users can access data of a given level, reducing the risk of data leakage and abuse. Through the synergy of the large model and the blockchain, the application markedly improves the efficiency and security of medical data management and provides comprehensive, reliable technical support for releasing the value of data in the medical industry.
In addition to the objects, features and advantages described above, the present application has other objects, features and advantages. The present application will be described in further detail with reference to the drawings.
Detailed Description
Embodiments of the application are described in detail below with reference to the attached drawing figures, but the application can be practiced in a number of different ways, as defined and covered below.
Interpretation of related terms:
Data classification refers to distinguishing and categorizing data according to shared attributes or characteristics, following a given principle or method, and expanding along two dimensions (basic data properties and business application attributes) to form classification levels such as major class, middle class, minor class and subclass, so as to facilitate fine-grained management and use of the data.
Data grading refers to dividing data into different levels according to the degree of impact on national security, economic operation, social stability, public interest, or the legitimate rights and interests of individuals and organizations if the data is tampered with, destroyed, illegally obtained or illegally used, so as to distinguish how restricted the access, use or disclosure of the data should be.
Large model technology is an artificial intelligence method based on deep learning. Its core is a deep neural network architecture with billions of parameters or more that emulates human cognitive ability in natural language learning, enabling efficient processing and analysis of the complex patterns and relationships contained in data. By using algorithms such as the attention mechanism and pre-training on massive multi-source heterogeneous data, a large model can mine fine-grained features and latent regularities in the data and thereby achieve high-precision prediction, classification, generation and decision making. A large model also has strong transfer learning capability and can adapt quickly to new tasks and domains. Owing to its data representation capability and generalization performance, the large model shows clear advantages in multi-modal data processing fields such as natural language processing, computer vision and speech recognition.
Model fine-tuning is an efficient artificial intelligence technique based on transfer learning. It takes the general features that a pre-trained model has learned on a large-scale data set and trains the model further on a small-scale data set for a specific task, so that the model adapts quickly to the new task and its performance improves. Specifically, fine-tuning adjusts some or all of the parameters of the pre-trained model so that it captures the specific patterns and regularities of the target task while retaining the broad knowledge learned during pre-training. Fine-tuning markedly reduces the amount of target-task data required and lowers training cost, while improving the accuracy and robustness of the model on the new task. Fine-tuning also supports layer-wise strategies, such as adjusting only the last few layers or specific modules of the model, to adapt quickly to task requirements while preserving general features. Owing to its efficiency, flexibility and broad applicability, fine-tuning is widely used in natural language processing (such as text classification and question answering), computer vision (such as image segmentation and object detection) and speech recognition (such as dialect recognition and speech emotion analysis), and has become an important means of rapid model deployment and performance optimization.
Retrieval enhancement is an artificial intelligence method that combines information retrieval with model generation and aims to improve the generation capability and accuracy of a model by introducing an external knowledge source. The core idea is to retrieve task-relevant information dynamically during generation and supply the retrieved results as context, helping the model produce more accurate and more relevant output. Retrieval enhancement typically includes two key components, a retrieval module and a generation module. The generation module uses a pre-trained language model to combine the retrieved information with the original input and generate high-quality text output. The technique markedly improves model performance on knowledge-intensive tasks and effectively reduces the risk of generating false or irrelevant content. Retrieval enhancement is also flexible and extensible: it can be adapted to different domains and tasks simply by updating the data on the retrieval side, which lowers the cost of updating the model's knowledge.
Blockchain technology is a data management method based on a distributed ledger and cryptographic principles. Through decentralization, tamper resistance and traceability, it builds a highly secure and transparent system for storing and transmitting data. A blockchain consists of a series of blocks linked in chronological order; each block contains a set of cryptographically verified transaction data, and a consensus mechanism keeps the data state consistent across all nodes in the network. The distributed architecture of the blockchain removes the dependence on a central authority and strengthens the system's resistance to attack and its fault tolerance, while its tamper resistance provides a reliable basis for data auditing and tracing.
A smart contract is an automated, programmable protocol based on blockchain technology. Through predefined rules and logic, contract terms are executed automatically when specific conditions are met, without third-party intervention. Smart contracts rely on the decentralization and tamper resistance of the blockchain to ensure that contract execution is transparent, secure and reliable. The key technologies include a Turing-complete programming language, a state machine model and an event-driven mechanism, which allow a smart contract to handle complex business logic and to update and verify on-chain data automatically. The execution of a smart contract is fully transparent, and all operation records are stored permanently on the blockchain, so that every party can audit and trace them. Smart contracts also support multi-party collaboration: they can automatically coordinate and execute agreements among multiple participants, reducing human intervention and the cost of trust.
Federated learning is a distributed machine learning framework based on privacy protection and secure encryption. Its core is to train models on local devices and share only model parameters or gradients, giving a collaborative learning mechanism in which data never leaves its local environment. During federated training, each participant's data stays within its own domain, which avoids the risk of data leakage; model parameter updates are transmitted and aggregated over encrypted channels, which ensures both effective model training and compliance with data privacy requirements. With federated learning, a global model can be updated collaboratively across multiple data sources without directly exposing the raw data, enabling cross-institution and cross-region collaborative modeling while protecting data sovereignty.
As shown in FIG. 1, the preferred embodiment of the present application provides a medical data classification hierarchical security management method combining large model and blockchain technologies, comprising the steps of:
S1, constructing a medical data classification and grading large model: the large model is pre-trained on public medical data, then fine-tuned and parameter-optimized, to obtain the medical data classification and grading large model;
S2, medical data classification and grading: each piece of medical data is classified and graded by the trained medical data classification and grading large model, mapped one by one to the data classification and grading rules, and given the corresponding classification and grading labels;
S3, data on-chain and secure storage: the classified and graded medical data is processed and uploaded to the chain, recorded in a distributed ledger, and the correctness and integrity of the stored data are verified through the blockchain consensus mechanism, ensuring that the data cannot be tampered with;
S4, smart contract dynamic authorization and access control: after receiving a user request, the blockchain system invokes the relevant smart contracts to verify the user's access rights and returns the verification result;
S5, dynamic updating of the classification and grading specification: when the medical data classification and grading specification changes, a dynamic update flow is triggered, the knowledge base is updated, the updated data is uploaded to the blockchain system again, and user access rights are adjusted, safeguarding the privacy and security of the data.
To address the shortcomings of the prior art, the technical scheme adopted by this embodiment is divided into three main parts, namely a large model data classification and grading part, a blockchain part and a smart contract part, wherein:
The large model data classification and grading part automates and adds intelligence to the medical data classification and grading flow, removes a large amount of manual processing, and safeguards the accuracy of the classification and grading results.
The blockchain part stores the medical data and its classification and grading labels, guarantees that the data is tamper-proof, secure and reliable, and ensures that responsibility for data operations is traceable and transparently managed.
The smart contract part comprises a policy management contract, a permission verification contract and a data retrieval contract. The policy management contract formulates and executes management policies and automatically assigns operation permissions according to the attributes of users or roles; the permission verification contract compares the permission of the requesting subject with the permission required by the data being accessed and enforces access control through the policies in the policy management contract; the data retrieval contract retrieves and returns the corresponding data according to the user request.
This embodiment builds an intelligent and secure medical data classification and grading management system through deep integration of a large model and blockchain technology. First, through federated learning, the embodiment trains the medical data classification and grading large model collaboratively while each institution's data stays within its own domain, achieves accurate classification and grading of multi-modal medical data such as text and images, markedly improves the intelligence and accuracy of data management, and, by combining fine-tuning with retrieval enhancement, improves the model's response speed and adaptability to specification updates. Second, through the blockchain's distributed ledger, asymmetric encryption and hash algorithms, the embodiment ensures that medical data is stored securely and cannot be tampered with, solves the problem of tracing responsibility in data management, and provides a highly trusted data management environment for medical institutions, patients and regulators. In addition, the embodiment implements automatic identity verification and access control through smart contracts: predefined rules and logic ensure that only legitimate users can access data of a given level, reducing the risk of data leakage and abuse. Through the synergy of the large model and the blockchain, the embodiment markedly improves the efficiency and security of medical data management and provides comprehensive, reliable technical support for releasing the value of data in the medical industry.
Preferably, as shown in fig. 2, the step S1 specifically includes the steps of:
S11, pre-training a large model on a large amount of public medical data so that it acquires general professional medical knowledge;
S12, formulating classification and grading framework rules according to existing policy specifications and the basic properties, business attributes and potential risks of medical data;
S13, each medical institution uses data sampling to collect a representative small-sample data set from its own private database, covering data of different categories and levels and including representative diagnosis and treatment data and rare case data; the collected data is manually labeled with classification and grading labels through multi-expert evaluation to build a high-quality fine-tuning data set;
S14, deploying the pre-trained large model locally at each medical institution and fine-tuning it on that institution's own data through low-rank adaptation (LoRA): with the original parameters of the pre-trained model frozen, the model is trained on the labeled data set and optimized by gradient descent on the cross-entropy loss function to obtain a parameter-change matrix:
$$\mathcal{L} = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$
where $y_i$ is the actual label, $\hat{y}_i$ is the probability predicted by the model, and $n$ is the number of categories; applying this fine-tuning method yields the parameter-change matrices $(\Delta W_1, \dots, \Delta W_N)$ trained on the data of each medical institution;
S15, protecting the parameter-change matrices $(\Delta W_1, \ldots, \Delta W_N)$ trained on the data of each medical institution with homomorphic encryption and differential privacy so as to mask the gradient information, and sending the encrypted parameter-change matrices to a central aggregation server;
S16, detecting poisoning attacks by joint multi-index detection that combines spatial anomaly, behavioural anomaly and magnitude anomaly;
S17, calculating the contribution degree of each node by combining the quality gain of its local model, the amount of data provided, the training intensity and the number of anomaly flags;
S18, performing weighted secure aggregation at the central server according to the contribution degree of each institution, wherein the parameter-change matrix of an institution with a larger contribution degree receives a larger weight; adding the weighted sum of the parameter-change matrices to the parameter matrix of the pre-trained large model and recording the change in accuracy; when the change in accuracy is smaller than an expected value, the model is considered to have converged and training ends; otherwise, the global model parameter change is encrypted and sent to each institution node, the local models are updated, and the method returns to step S13 until model training converges.
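The loss of step S14 and the aggregation of step S18 can be sketched in a few lines. This is a minimal illustration only: the function names `cross_entropy`, `aggregate` and `converged` are hypothetical, matrices are plain nested lists, the contribution degrees are normalized to sum to one (an assumption, since the text only says that a larger contribution receives a larger weight), and encryption is omitted.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy over n categories: L = -sum_i y_i * log(y_hat_i)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))


def aggregate(base, deltas, contributions):
    """Add the contribution-weighted sum of parameter-change matrices to the base matrix.

    Assumption: weights are the contribution degrees normalized to sum to 1.
    """
    total = sum(contributions)
    weights = [c / total for c in contributions]
    merged = [row[:] for row in base]  # copy so the base matrix is not modified
    for weight, delta in zip(weights, deltas):
        for r, row in enumerate(delta):
            for c, value in enumerate(row):
                merged[r][c] += weight * value
    return merged


def converged(previous_accuracy, new_accuracy, tolerance):
    """Training stops when the change in accuracy is smaller than the expected value."""
    return abs(new_accuracy - previous_accuracy) < tolerance


if __name__ == "__main__":
    base = [[1.0, 1.0]]
    deltas = [[[1.0, 0.0]], [[0.0, 2.0]]]  # one parameter-change matrix per institution
    print(aggregate(base, deltas, contributions=[3, 1]))
```

In this toy run the first institution carries three quarters of the weight, so its update dominates the merged matrix.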
Specifically, step S16 includes the following steps:
S161, first mapping the parameter-change matrices uploaded by all institutions into a unified vector space, and then calculating the pairwise cosine similarity according to the following formula:
$$\cos(\Delta W_i, \Delta W_j) = \frac{\langle \Delta W_i, \Delta W_j \rangle}{\lVert \Delta W_i \rVert \, \lVert \Delta W_j \rVert},$$
wherein $\Delta W_i$ and $\Delta W_j$ denote the vectorized parameter-change matrices of nodes $i$ and $j$; the parameter change of a malicious node has markedly low similarity to those of most normal nodes and forms an isolated cluster, so a threshold $\theta$ is set, and when the average cosine similarity between node $i$ and the other nodes is smaller than the threshold, node $i$ is considered an abnormal user;
S162, combining the parameter-change matrix of each medical institution with the base large model in turn and recording the resulting change in performance, and marking the node as an abnormal user when the change in model performance falls outside the normal interval;
S163, computing the norms of the parameter-change matrices uploaded by all nodes; a node carrying out a malicious attack tends to produce a $\Delta W$ of large magnitude in an attempt to control the model, so a node whose norm is far larger than the average is marked as an abnormal user; an abnormal user receives warning feedback and is required to recheck its data labelling or to fine-tune again, its anomaly flags are accumulated, its $\Delta W$ is down-weighted during aggregation, and the institution node is permanently removed when the accumulated count exceeds a preset threshold.
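The three checks of steps S161 to S163 can be sketched as follows. This is a hedged illustration, not the applicant's implementation: the function names, the `factor` multiplier used for "far larger than the average", and the `low`/`high` bounds of the normal interval are all assumptions, and the updates are small unencrypted nested lists.

```python
import math


def flatten(matrix):
    """Map a parameter-change matrix into a single vector."""
    return [x for row in matrix for x in row]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def spatial_outliers(updates, theta):
    """S161: flag nodes whose average similarity to the other nodes is below theta."""
    vectors = [flatten(m) for m in updates]
    flagged = []
    for i, u in enumerate(vectors):
        sims = [cosine(u, v) for j, v in enumerate(vectors) if j != i]
        if sum(sims) / len(sims) < theta:
            flagged.append(i)
    return flagged


def behaviour_outliers(performance_changes, low, high):
    """S162: flag nodes whose measured performance change leaves [low, high]."""
    return [i for i, d in enumerate(performance_changes) if not low <= d <= high]


def magnitude_outliers(updates, factor):
    """S163: flag nodes whose update norm exceeds factor times the mean norm."""
    norms = [math.sqrt(sum(x * x for x in flatten(m))) for m in updates]
    mean = sum(norms) / len(norms)
    return [i for i, n in enumerate(norms) if n > factor * mean]


if __name__ == "__main__":
    updates = [[[1.0, 0.0]], [[0.9, 0.1]], [[1.0, 0.1]], [[-1.0, 0.0]]]
    print(spatial_outliers(updates, theta=0.0))  # the last node points the opposite way
```

A node flagged by any of the three checks would then have its anomaly count incremented and its update down-weighted, as described in step S163.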
Specifically, step S17 includes the following steps:
S171, using indexes such as accuracy, F1 score and ROC to record the local gain that the $\Delta W$ of each institution brings to the global model, denoted G;
S172, recording information such as the number of labelled samples and the number of training rounds that each medical institution used for fine-tuning, denoted Q, so as to prevent free-riding by nodes with very little data and very few training rounds;
S173, calculating the contribution degree of each medical institution according to the following formula, taking into account the number of times F that the node has been judged abnormal during training:
wherein the weight parameters α, β and γ can be adjusted dynamically as required, and a minimum protection threshold is set so that small medical institutions do not lose their influence entirely.
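The contribution formula itself is not reproduced in the text above, so the following sketch assumes one plausible form purely for illustration: a linear combination that rewards the gain G and the training effort Q, penalizes the anomaly count F, and is clipped from below by the minimum protection threshold. The function name `contribution`, the default weights and the `floor` value are all hypothetical.

```python
def contribution(gain, effort, anomaly_count,
                 alpha=0.5, beta=0.3, gamma=0.2, floor=0.05):
    """Assumed form: max(alpha * G + beta * Q - gamma * F, floor).

    gain          -- G, local gain on the global model (assumed scaled to [0, 1])
    effort        -- Q, data volume and training rounds (assumed scaled to [0, 1])
    anomaly_count -- F, number of times the node was flagged as abnormal
    floor         -- minimum protection threshold for small institutions
    """
    score = alpha * gain + beta * effort - gamma * anomaly_count
    return max(score, floor)


if __name__ == "__main__":
    print(contribution(gain=0.8, effort=0.5, anomaly_count=1))
    print(contribution(gain=0.1, effort=0.1, anomaly_count=3))  # clipped to the floor
```

Under this assumed form a repeatedly flagged node falls to the floor rather than to zero or below, which matches the stated intent of the protection threshold.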
Preferably, as shown in fig. 3, the step S2 specifically includes the steps of:
S21, collecting structured and unstructured medical data, including electronic health records, medical images, laboratory test results, doctors' diagnosis reports and patient medical records, from the multi-source systems of medical institutions, such as hospital information systems, laboratory information management systems, and picture archiving and communication systems, and cleaning the data to remove noise and redundant information;
S22, constructing a knowledge base from the latest trustworthy information in the medical field, such as the latest medical data classification and grading management specification documents, cutting-edge papers published in major medical journals, and authoritative research reports;
S23, retrieving from the knowledge base, by retrieval augmentation, the knowledge and rules relevant to the medical data to be classified and graded, and combining them with prompt engineering to help the large model classify and grade the data more accurately;
S24, classifying and grading each piece of medical data by using the trained large model, mapping the medical data to the data classifying and grading rules one by one, and adding corresponding classifying and grading labels.
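Steps S22 to S24 amount to retrieving the relevant rules and placing them in the prompt given to the large model. The sketch below shows that flow under simplifying assumptions: keyword overlap stands in for whatever retrieval method the system actually uses, the knowledge-base entries are invented examples, and the names `retrieve` and `build_prompt` are hypothetical. The call to the large model itself is left out.

```python
def tokenize(text):
    """Lower-case whitespace tokenization; a stand-in for a real retriever's encoder."""
    return set(text.lower().split())


def retrieve(query, knowledge_base, k=2):
    """Return the k knowledge-base entries sharing the most words with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(knowledge_base,
                    key=lambda entry: len(query_tokens & tokenize(entry)),
                    reverse=True)
    return ranked[:k]


def build_prompt(record, rules):
    """Assemble a prompt that places the retrieved rules before the record."""
    lines = ["Classification and grading rules:"]
    lines += [f"- {rule}" for rule in rules]
    lines += ["Record to classify:", record, "Answer with a category and a level."]
    return "\n".join(lines)


if __name__ == "__main__":
    kb = [  # hypothetical rule snippets, not taken from any real specification
        "genetic test results are level 4 health status data",
        "appointment schedules are level 2 administrative data",
        "medical images with patient identifiers are level 3 data",
    ]
    record = "genetic test results for one patient"
    print(build_prompt(record, retrieve(record, kb)))
```

Because the rules live in the knowledge base rather than in the model weights, updating an entry changes the prompt immediately, which is what lets the system follow specification updates without retraining.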
Preferably, the step S3 specifically includes the steps of:
S31, calculating a hash value of the classified and graded medical data to ensure the uniqueness and integrity of the data;
S32, encrypting the medical data with an asymmetric encryption algorithm to ensure the security of the data during transmission and storage;
S33, uploading the encrypted medical data and its classification and grading labels to the blockchain network, recording them in the distributed ledger, and verifying the correctness and integrity of the stored data through the blockchain consensus mechanism so that the data cannot be tampered with.
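The tamper-evidence property of steps S31 and S33 can be illustrated with a hash-chained, append-only list. This is a single-process sketch under stated assumptions: SHA-256 is chosen as the hash, the `Ledger` class and its field names are hypothetical, and the asymmetric encryption of step S32 and the consensus mechanism are not modelled (the payload passed in is assumed to be already encrypted).

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first block


def digest(payload: bytes) -> str:
    """SHA-256 hex digest of a byte string."""
    return hashlib.sha256(payload).hexdigest()


def block_hash(prev_hash, data_hash, label):
    """Hash of a block body, serialized with sorted keys for determinism."""
    body = {"prev_hash": prev_hash, "data_hash": data_hash, "label": label}
    return digest(json.dumps(body, sort_keys=True).encode())


class Ledger:
    """Append-only chain in which every block commits to the previous one."""

    def __init__(self):
        self.blocks = []

    def append(self, encrypted_payload: bytes, label: str):
        prev = self.blocks[-1]["block_hash"] if self.blocks else GENESIS
        data_hash = digest(encrypted_payload)
        self.blocks.append({
            "prev_hash": prev,
            "data_hash": data_hash,
            "label": label,
            "block_hash": block_hash(prev, data_hash, label),
        })

    def verify(self) -> bool:
        """Recompute every hash; any altered field breaks the chain."""
        prev = GENESIS
        for b in self.blocks:
            expected = block_hash(b["prev_hash"], b["data_hash"], b["label"])
            if b["prev_hash"] != prev or b["block_hash"] != expected:
                return False
            prev = b["block_hash"]
        return True


if __name__ == "__main__":
    ledger = Ledger()
    ledger.append(b"ciphertext-a", "health status / level 4")
    ledger.append(b"ciphertext-b", "administrative / level 2")
    print(ledger.verify())
```

Changing a stored label or data hash after the fact makes `verify` return `False`, which is the behaviour the ledger relies on for audit.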
Preferably, as shown in fig. 4, the step S4 specifically includes the steps of:
S41, the user packages the requesting subject, the target data object and the operation to be executed to generate a request R:
R←F(S,O,A),
wherein R represents the user request, S represents the subject attributes (including the unique ID and permission level of the user), O represents the object attributes (including the data category and level), and A represents the operation attributes (including data operations such as reading and writing);
after signing the request with its private key, the user sends its public key, the signature and a time stamp to the blockchain medical data management system:
B←X{PK_X, Sign(R, SK_X), T_1},
wherein B represents the blockchain, X represents the user, PK_X represents the public key of user X, Sign() represents a digital signature, SK_X represents the private key of user X, and T_1, …, T_m represent time stamps;
S42, after receiving the user request, the blockchain system uses the public key to verify and parse the request, and invokes the policy management contract to automatically match the data object corresponding to the request:
B{Sign(R, SK_X), P(R), T_2}→X,
wherein P represents the policy management contract, which is responsible for automatically matching the corresponding data object according to the user request and the policy rules;
S43, invoking the permission verification contract to verify the access permission of the user:
B{V(X), T_3}→B,
wherein V represents the permission verification contract, which is responsible for verifying whether the user has permission to access the specific data object; if the verification passes, the data retrieval contract is invoked and the data corresponding to the request is returned:
B{D(R), T_4}→X,
wherein D represents the data retrieval contract, which is responsible for retrieving the data object requested by the user from the blockchain and returning it; if the verification fails, the user request does not satisfy the policy information in the policy management contract, and rejection information is returned:
B{Refused, T_4}→X,
wherein Refused denotes the plaintext of the rejection information.
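The decision logic of steps S41 to S43 can be sketched as three plain functions standing in for the contracts P, V and D. This is an illustration under assumptions: the rule that a user may access data whose level does not exceed the user's permission level is assumed, the policy table and store contents are invented, and signature creation and verification are omitted because they require an asymmetric-cryptography library outside the standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """R = F(S, O, A): subject, object and operation attributes."""
    user_id: str      # S: unique user ID
    clearance: int    # S: permission level
    category: str     # O: data category
    level: int        # O: data level
    action: str       # A: operation such as "read" or "write"


def policy_match(request, policy):
    """P(R): look up the operations the policy allows on the requested category."""
    return policy.get(request.category)


def verify(request, policy):
    """V(X): assumed rule, action must be allowed and clearance must cover the level."""
    allowed = policy_match(request, policy)
    return (allowed is not None
            and request.action in allowed
            and request.clearance >= request.level)


def handle(request, policy, store, timestamp):
    """Return D(R) with a time stamp if verification passes, otherwise a refusal."""
    if verify(request, policy):
        return {"body": store.get((request.category, request.level)),
                "timestamp": timestamp}
    return {"body": "Refused", "timestamp": timestamp}


if __name__ == "__main__":
    policy = {"imaging": {"read"}}                # hypothetical policy rules
    store = {("imaging", 2): "ciphertext-123"}    # hypothetical on-chain record
    print(handle(Request("u1", 3, "imaging", 2, "read"), policy, store, 4.0))
    print(handle(Request("u2", 1, "imaging", 2, "read"), policy, store, 4.0))
```

Keeping P, V and D as separate functions mirrors the separation of the policy management, permission verification and data retrieval contracts in the text.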
Preferably, the step S5 specifically includes the steps of:
S51, when the medical data classification and grading specification changes, selecting and labelling the fine-tuning data set again according to the new specification, and repeating step S1;
S52, updating the knowledge base so that the retrieved information stays timely and accurate and retrieval augmentation draws on the latest and most reliable knowledge for medical data classification and grading management;
S53, updating the medical data classification labels through the updated large model classification framework, and uploading the updated data to the block chain system again;
S54, adjusting the access rights of the users according to the updated medical data classification grading specifications, and guaranteeing the privacy security of the data.
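Step S53 only needs to re-upload the records whose labels actually change under the new specification. A minimal sketch of that comparison follows; the function name `relabel`, the record contents and the stand-in classifier are hypothetical, and the classifier argument represents the updated large model.

```python
def relabel(records, old_labels, classify):
    """Return {record_id: new_label} for records whose label changed.

    records    -- mapping of record ID to record content
    old_labels -- mapping of record ID to the label currently on chain
    classify   -- callable standing in for the updated large model
    """
    changes = {}
    for record_id, content in records.items():
        new_label = classify(content)
        if old_labels.get(record_id) != new_label:
            changes[record_id] = new_label
    return changes


if __name__ == "__main__":
    records = {"r1": "genetic report", "r2": "appointment"}
    old_labels = {"r1": "level 3", "r2": "level 1"}

    def updated_model(content):
        # Hypothetical new rule: genetic data moves up to level 4.
        return "level 4" if "genetic" in content else "level 1"

    print(relabel(records, old_labels, updated_model))
```

The returned mapping is the set of records to write to the chain again and the set whose access permissions need to be re-evaluated in step S54.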
As shown in FIG. 5, another preferred embodiment of the present application further provides a medical data classification hierarchical security management apparatus combining large model and blockchain technologies, comprising:
A medical data classification and grading large model construction module, configured to construct the medical data classification and grading large model, which is obtained by pre-training the large model on public medical data, fine-tuning it and optimizing its parameters;
A medical data classification and grading processing module, configured to classify and grade each piece of medical data using the trained medical data classification and grading large model, map each piece to the data classification and grading rules one by one, and add the corresponding classification and grading labels;
A data on-chain and secure storage module, configured to process the classified and graded medical data and upload it to the chain, record it in the distributed ledger, and verify the correctness and integrity of the stored data through the blockchain consensus mechanism so that the data cannot be tampered with;
A smart contract dynamic authorization and access control module, configured such that, after receiving a user request, the blockchain system invokes the relevant smart contract to verify the access permission of the user and returns the verification result;
A classification and grading specification dynamic update module, configured to trigger a dynamic update flow when the medical data classification and grading specification changes, update the knowledge base, upload the updated data to the blockchain system again, and adjust the access permissions of users so as to protect the privacy and security of the data.
In summary, the above embodiment of the present application has the following features:
The application combines a large model with model fine-tuning and retrieval augmentation to optimize the medical data classification and grading workflow and improve its efficiency and accuracy. The multi-modal data processing capability of the large model reduces labour and time costs; fine-tuning lets the system adapt quickly to different kinds of medical data and supports high-precision classification and grading; and retrieval augmentation dynamically retrieves the relevant knowledge and rules, which improves the adaptability and extensibility of the system. Together these provide medical institutions with an efficient and reliable data management scheme and help release the value of data in the medical industry.
The application provides a multi-institution collaborative modelling method based on federated learning, which realizes cross-institution joint training in which data is usable but not visible, and thereby breaks down the data silos of traditional medical data. Meanwhile, a multi-dimensional poisoning attack detection mechanism is designed that combines spatial anomaly, behavioural anomaly and magnitude anomaly indexes, improving the security of the model training process. The contribution degree of each node is calculated by combining the quality gain of its local model, the amount of data provided, the training intensity and the number of anomaly flags, and the aggregation weights are adjusted dynamically, which reduces the influence of bad data and improves the overall accuracy and robustness of the model on medical data classification and grading tasks. The method is particularly suited to complex scenarios in which data quality differs across medical institutions, and therefore has high value for application and adoption in the industry.
The application uses the distributed ledger of the blockchain to store data in a decentralized manner, avoiding single points of failure and the risk of data loss, and combines asymmetric encryption with hash algorithms to ensure the integrity and tamper resistance of the data, preventing data from being maliciously altered or forged during transmission and storage. Meanwhile, the transparency and traceability of the blockchain allow every data management operation to be permanently recorded and open to inspection, so that management responsibility can be traced and medical institutions and regulators have a reliable basis for audit. This establishes a transparent, reliable and efficient technical framework for the security management of medical data.
The application uses smart contracts to set access rules according to the classification and grading labels of the data and deploys them on the blockchain, ensuring that data of each level is open only to users with the corresponding permission. This rule-based automatic access control mechanism avoids the operating errors and subjective bias of traditional manual management, and the tamper resistance and transparency of the blockchain ensure that the access rules are strictly executed and traceable. In addition, smart contracts support dynamic management of user permissions: access permissions can be adjusted in real time according to changes in user roles, data sensitivity and business requirements, which prevents users from performing unauthorized operations or misusing data. Through the synergy of smart contracts and the blockchain, the application automates the whole flow of data storage, access control and permission management, provides efficient and reliable technical support for the safe use of medical data, reduces management cost and risk, and helps the medical industry build a more intelligent and more secure data management system.
As shown in FIG. 6, the preferred embodiment of the present application also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the medical data classification and grading security management method combining large model and blockchain technologies of the above embodiment.
As shown in FIG. 7, the preferred embodiment of the present application also provides a computer device, which may be a terminal or a server, and whose internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with other external computer devices through a network connection. The computer program, when executed by the processor, performs the steps of the medical data classification and grading security management method combining large model and blockchain technologies described above.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
The preferred embodiment of the present application also provides a storage medium including a stored program, which when executed controls a device in which the storage medium is located to perform the steps of the medical data classification hierarchical security management method combining the large model and the blockchain technology in the above embodiment.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The functions described in the method of this embodiment, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in one or more storage media readable by a computing device. Based on this understanding, the part of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, etc.) to execute all or part of the steps of the method described in the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) having computer-usable program code embodied therein. The scheme in the embodiments of the present application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.