
CN120613060A - Medical data classification and grading security management method, device, electronic device and storage medium combining big model and blockchain technology

Info

Publication number: CN120613060A
Application number: CN202510591610.1A
Authority: CN
Inventors: 陈裕友; 马超群; 万丽
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2025-05-09
Filing date: 2025-05-09
Publication date: 2025-09-09

Abstract

This application discloses a method, device, electronic device, and storage medium for secure management of medical data classification and grading that combines a large model with blockchain technology. The method comprises the following steps: S1, constructing a large model for medical data classification and grading; S2, processing medical data classification and grading; S3, uploading data to a blockchain and securely storing it; S4, dynamic authorization and access control of smart contracts; and S5, dynamic updating of classification and grading specifications. Through the deep integration of large models and blockchain technology, the present invention achieves intelligent and secure management of medical data classification and grading. It enhances the intelligence and accuracy of data management through federated learning technology, and improves the model's responsiveness and adaptability to updates to regulatory standards through fine-tuning and other techniques. Blockchain technology addresses the issue of data management accountability. Furthermore, the present invention significantly improves the efficiency and security of medical data management, providing comprehensive and reliable technical support for unlocking the value of data in the medical industry.

Description

Medical data classification and grading security management method, device, electronic device and storage medium combining large model and blockchain technology

Technical Field

The invention relates to the technical field of blockchain and large models, and in particular to a medical data classification and grading security management method, device, electronic device and storage medium combining a large model and blockchain technology.

Background

Classified and graded security management of medical data refers to a management system that divides medical data into categories and grades according to its sensitivity, importance and potential risk, and applies differentiated security protection to each. The core aim is to balance data security against the use of data value, ensuring both patient privacy and the compliant circulation of medical information. Classification is generally based on data sources (e.g., patient records, public health data) and attributes (e.g., personal identity information, health status data), while grading generally divides data into three levels (core data, important data and general data) according to the extent of harm that leakage or tampering could cause.

The purposes of the management mechanism described above include:

1. Strengthening security protection: by clearly determining the grade of sensitive data (such as patient identity information and disease history), measures such as encryption and access control can be applied in a targeted manner, reducing the risk of leakage;

2. Meeting compliance requirements: implementing the provisions of the Data Security Law, the Personal Information Protection Law and similar regulations, and ensuring that data processing is lawful;

3. Promoting data sharing: opening non-sensitive data (such as desensitized statistical information) within a controllable scope to support scenarios such as medical research and medical insurance payment;

4. Optimizing data management: improving the data management efficiency of medical institutions through asset inventory and standard setting, and laying a foundation for precision medicine and smart hospital construction.

At present, the mainstream workflow for medical data classification and grading management is as follows:

X1. Establishing classification and grading rules: constructing a scientific and systematic data classification and grading framework and rules according to the relevant specifications and standards;

X2. Data asset inventory: comprehensively inventorying the institution's structured and unstructured data assets to form an original list of data assets, and defining the basic information and related parties of each asset;

X3. Mapping data assets to classification and grading: mapping the database tables, fields, data items, data files and the like in the original data asset list, one by one, to data asset units in the classification and grading rules through data element association, thereby defining the category and grade of each asset;

X4. Auditing the classification and grading asset unit list: auditing, optimizing and refining the mapping between data assets and classification and grading results;

X5. Labeling basic attributes of data assets: labeling the basic attributes of data asset units according to the description requirements of the data assets and the retrieval requirements of the data asset catalog;

X6. Auditing and optimizing the basic attribute labeling results, finally forming the data asset catalog;

X7. Dynamically updating and managing the classification and grading rules, the classification and grading asset unit list, the basic attribute label set, the data asset catalog and the like, in response to changes in the classification and grading elements or in factors that may affect them.

The management flow has the following defects:

① The complexity of classification and grading rules, together with the specialized and diverse nature of medical data, means that every step of classification and grading management requires a large investment of specialist staff; the economic and time costs are high, and small medical institutions find the process difficult to afford and implement;

② Classification and grading methods that rely on manual operation or basic rule engines are prone to misjudgment or omission, which affects the accuracy and reliability of the final results;

③ When classification and grading standards change, existing systems are slow to respond and cannot adapt to the new rules in time, which affects the timeliness of data processing and increases compliance risk;

④ Existing data management methods generally lack traceability for the classification and grading of data and for the recording of its storage addresses, define management responsibility vaguely, and cannot effectively track the change history and usage records of data; when violations such as data leakage or tampering occur, accountability tracing and auditing are therefore difficult, and the increasingly strict compliance requirements of the medical industry cannot be met.

Disclosure of Invention

The application provides a medical data classification and grading security management method combining a large model and blockchain technology. It aims to solve the technical problems that existing medical data classification and grading security management methods are costly, insufficiently accurate and reliable, and difficult to trace and audit for accountability, and therefore cannot meet the increasingly strict compliance requirements of the medical industry.

The application is realized by the following scheme:

A medical data classification and grading security management method combining a large model and blockchain technology comprises the following steps:

S1, constructing a medical data classification and grading large model: the model is obtained after the large model is pre-trained on public medical data, fine-tuned and trained with parameter optimization;

S2, medical data classification and grading processing: each piece of medical data is classified and graded using the trained model, mapped one by one to the data classification and grading rules, and given the corresponding classification and grading labels;

S3, on-chain upload and secure storage of data: the classified and graded medical data is processed and uploaded to the chain, recorded in the distributed ledger, and the correctness and integrity of data storage are verified through the blockchain consensus mechanism to ensure that the data cannot be tampered with;

S4, smart contract dynamic authorization and access control: after receiving a user request, the blockchain system invokes the relevant smart contracts to verify the user's access permission and returns the verification result;

S5, dynamic updating of the classification and grading specification: when the medical data classification and grading specification changes, a dynamic update flow is triggered, the knowledge base is updated, the updated data is uploaded to the blockchain system again, and user access permissions are adjusted to safeguard data privacy and security.

Further, the step S1 specifically includes the steps of:

S11, a large model is pre-trained on a large amount of public medical data so that it acquires general professional medical knowledge;

S12, based on existing policy specifications, classification and grading framework rules are formulated according to the basic properties, business attributes and potential risks of medical data;

S13, each medical institution uses data sampling techniques to collect representative small-sample datasets from its own private database, covering data of different categories and grades and including representative diagnosis and treatment data and rare case data; the collected data are manually labeled with categories and grades using a multi-expert evaluation method to construct a high-quality fine-tuning dataset;

S14, the pre-trained large model is deployed locally at each medical institution and fine-tuned on that institution's own data using low-rank adaptation (LoRA): with the original parameters of the pre-trained model frozen, the model is trained on the labeled dataset by gradient descent on a cross-entropy loss function to obtain a parameter variation matrix:

$$L = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$

where y_i is the actual label, \hat{y}_i is the probability predicted by the model, and n is the number of categories; applying this fine-tuning method yields the parameter variation matrices (ΔW_1, …, ΔW_N) trained on the data of each medical institution;

S15, homomorphic encryption and differential privacy techniques are applied to the parameter variation matrices (ΔW_1, …, ΔW_N) trained on the data of each medical institution to mask gradient information, and the encrypted parameter variation matrices are sent to a central aggregation server;

S16, poisoning attacks are detected by joint multi-indicator detection that combines spatial anomaly, behavioral anomaly and magnitude anomaly;

S17, the contribution degree of each node is calculated by combining the quality gain of its local model, the amount of data provided, the training intensity and the number of anomaly marks;

S18, the central server performs weighted secure aggregation according to the contribution degree of each institution, with the parameter variation matrices of institutions with larger contribution receiving larger weight; the weighted sum of the parameter variation matrices is added to the parameter matrix of the pre-trained large model and the change in accuracy is recorded; when the change in accuracy is smaller than the expected value, the model is considered to have converged and training ends; otherwise, the global model parameter variation is encrypted and transmitted to each institution node, the local models are updated, and the process returns to step S13 until model training converges.
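The masking in S15 and the weighted aggregation in S18 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: homomorphic encryption is omitted entirely, the differential privacy step is shown only as norm clipping plus Gaussian noise, and the names `clip_and_add_noise`, `weighted_aggregate`, `has_converged`, `clip_norm`, `noise_std` and `tolerance` are hypothetical.

```python
import numpy as np

def clip_and_add_noise(delta_w, clip_norm, noise_std, rng):
    """Assumed DP-style masking: clip the update's norm, then add Gaussian noise."""
    norm = np.linalg.norm(delta_w)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return delta_w * scale + rng.normal(0.0, noise_std, size=delta_w.shape)

def weighted_aggregate(base_w, deltas, contributions):
    """S18: add the contribution-weighted average of the updates to the base weights."""
    weights = np.asarray(contributions, dtype=float)
    weights = weights / weights.sum()          # normalise so the weights sum to 1
    combined = sum(w * d for w, d in zip(weights, deltas))
    return base_w + combined

def has_converged(prev_accuracy, new_accuracy, tolerance):
    """S18: training stops once the accuracy change falls below the expected value."""
    return abs(new_accuracy - prev_accuracy) < tolerance

rng = np.random.default_rng(0)
base = np.zeros((2, 2))
updates = [np.ones((2, 2)), 3.0 * np.ones((2, 2))]
new_weights = weighted_aggregate(base, updates, contributions=[1.0, 3.0])
```

In this toy run the second institution carries three quarters of the weight, so its update dominates the aggregated result.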

Further, the step S16 specifically includes the steps of:

S161, the parameter variation matrices uploaded by all institutions are first mapped into a unified vector space, and the pairwise cosine similarity is then calculated according to the following formula:

$$\cos(\Delta W_i, \Delta W_j) = \frac{\Delta W_i \cdot \Delta W_j}{\lVert \Delta W_i \rVert \, \lVert \Delta W_j \rVert}$$

The parameter variations of malicious nodes have markedly low similarity to those of most normal nodes and form an isolated cluster; a threshold θ is set, and node i is considered an abnormal user when its average cosine similarity with the other nodes is smaller than θ;

S162, the parameter variation matrix of each medical institution is combined separately with the base large model and the resulting performance change is recorded; when the model's performance change falls outside the normal interval, the institution is marked as an abnormal user;

S163, the norms of the parameter variation matrices uploaded by all nodes are collected; a node engaged in a malicious attack tends to produce a ΔW of large magnitude in an attempt to control the model, so a node whose norm is far larger than the average is marked as an abnormal user; abnormal users receive warning feedback requiring them to recheck their data labeling or redo fine-tuning, their anomaly marks are accumulated, their ΔW is down-weighted during aggregation, and an institution node is permanently removed when its accumulated count exceeds a preset threshold.
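The spatial check of S161 and the magnitude check of S163 can be sketched as follows. This is a rough illustration under stated assumptions: updates are flattened to vectors, "far larger than the average" is modelled as a hypothetical multiple `ratio` of the mean norm, and the behavioral check of S162 is left out because it requires evaluating a real model. The function names are hypothetical.

```python
import numpy as np

def flag_low_similarity(updates, theta):
    """S161: flag nodes whose mean cosine similarity to the other nodes is below theta."""
    vecs = np.stack([u.ravel() for u in updates])
    norms = np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    unit = vecs / norms                       # guard against zero-norm updates
    sim = unit @ unit.T                       # pairwise cosine similarities
    n = len(updates)
    mean_sim = (sim.sum(axis=1) - np.diag(sim)) / (n - 1)   # exclude self-similarity
    return mean_sim < theta

def flag_large_norm(updates, ratio):
    """S163: flag nodes whose update norm exceeds ratio times the mean norm (assumed rule)."""
    norms = np.array([np.linalg.norm(u) for u in updates])
    return norms > ratio * norms.mean()

# Three similar updates and one pointing the opposite way.
direction_case = [np.array([1.0, 0.0]), np.array([1.0, 0.1]),
                  np.array([0.9, 0.0]), np.array([-1.0, 0.0])]
# Three ordinary updates and one with a much larger magnitude.
magnitude_case = [np.ones(4), np.ones(4), np.ones(4), 10.0 * np.ones(4)]

spatial_flags = flag_low_similarity(direction_case, theta=0.0)
magnitude_flags = flag_large_norm(magnitude_case, ratio=2.0)
```

Both checks flag only the last node in their respective toy inputs; in a fuller system each flag would increment that node's accumulated anomaly count.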

Further, the step S17 specifically includes:

S171, metrics such as accuracy, F1 score and ROC are used to record the local gain that each institution's ΔW brings to the global model, denoted G;

S172, information such as the number of labeled samples and the number of training rounds each medical institution used for fine-tuning is collected and denoted Q, to prevent nodes with very little data and very few training rounds from free-riding;

S173, the contribution degree of each medical institution is calculated according to the following formula, taking into account the number of times F that the node was judged abnormal during training:

where the weight parameters α, β and γ can be adjusted dynamically as needed, and a minimum protection threshold is set so that small medical institutions do not lose their say entirely.
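One possible reading of S171 to S173 is a linear score with a floor. The linear form, the normalisation of G and Q to the range 0 to 1, the default weights and the floor value are all assumptions made for illustration; the text names only the inputs G, Q, F, the weights α, β, γ and the existence of a minimum protection threshold.

```python
def contribution(gain, effort, anomaly_count,
                 alpha=0.5, beta=0.3, gamma=0.2, floor=0.05):
    """Hypothetical contribution score.

    gain          -- G, local quality gain, assumed normalised to [0, 1]
    effort        -- Q, data volume / training intensity, assumed normalised to [0, 1]
    anomaly_count -- F, number of times the node was judged abnormal
    floor         -- assumed minimum protection threshold for small institutions
    """
    raw = alpha * gain + beta * effort - gamma * anomaly_count
    return max(raw, floor)

strong_node = contribution(gain=0.8, effort=0.5, anomaly_count=0)
flagged_node = contribution(gain=0.1, effort=0.1, anomaly_count=3)
```

With these assumed weights the repeatedly flagged node is clamped to the floor rather than dropping to a negative score.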

Further, the step S2 specifically includes the steps of:

S21, collecting structured and unstructured medical data, including electronic health records, medical images, laboratory test results, physicians' diagnostic reports and patient medical records, from the institution's multi-source systems such as the hospital information system, the laboratory information management system and the picture archiving and communication system, and cleaning the data to remove noise and redundant information;

S22, constructing a knowledge base from the latest trustworthy information in the medical field, such as the latest medical data classification and grading management specifications, frontier papers published in major medical journals, and authoritative research reports;

S23, retrieving from the knowledge base, through retrieval enhancement technology, the knowledge and rules relevant to the medical data to be classified and graded, and combining them with prompt engineering to help the large model classify and grade data more accurately;

S24, classifying and grading each piece of medical data by using the trained large model, mapping the medical data to the data classifying and grading rules one by one, and adding corresponding classifying and grading labels.
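Steps S22 to S24 can be sketched as retrieve-then-prompt. The retriever below ranks passages by plain word overlap, which is a deliberately simple stand-in for whatever retrieval method is actually used; the knowledge-base passages are invented placeholders, and `retrieve` and `build_prompt` are hypothetical names. The call to the large model itself is not shown.

```python
def retrieve(query, knowledge_base, top_k=2):
    """Rank knowledge-base passages by word overlap with the query (toy retriever)."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(p.lower().split())), p) for p in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]

def build_prompt(record, passages):
    """Combine the retrieved rules with the record into one prompt for the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Rules retrieved from the knowledge base:\n"
        f"{context}\n\n"
        f"Medical record: {record}\n"
        "Answer with the category and the grade of this record."
    )

# Placeholder passages, not taken from any real specification.
knowledge_base = [
    "patient identity information requires the strictest protection",
    "desensitized statistical information may be shared in a controlled scope",
    "ledger consensus protocol notes",
]
passages = retrieve("patient identity information and name", knowledge_base)
prompt = build_prompt("Name and ID number of an inpatient", passages)
```

Because only the knowledge base changes when a specification is revised, the prompt picks up new rules without retraining the model.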

Further, the step S3 specifically includes the steps of:

S31, calculating a hash value of classified medical data, and ensuring the uniqueness and the integrity of the data;

S32, encrypting the medical data by using an asymmetric encryption algorithm, so as to ensure the safety of the data in the transmission and storage processes;

S33, uploading the encrypted medical data and its classification and grading labels to the blockchain network, recording them in the distributed ledger, and verifying the correctness and integrity of data storage through the blockchain consensus mechanism to ensure that the data cannot be tampered with.
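The hashing in S31 and a later integrity check against the recorded value can be sketched with the standard library. SHA-256 and the canonical JSON encoding are assumptions (the text does not name a hash algorithm), the asymmetric encryption of S32 is omitted because it needs a cryptography library, and `record_digest` and `is_intact` are hypothetical names.

```python
import hashlib
import json

def record_digest(record, label):
    """S31: SHA-256 over a canonical JSON encoding of the record and its label."""
    payload = json.dumps({"record": record, "label": label},
                         sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def is_intact(record, label, on_chain_digest):
    """Recompute the digest and compare it with the value recorded on the chain."""
    return record_digest(record, label) == on_chain_digest

record = {"id": "r-001", "text": "laboratory test result"}
label = {"category": "diagnosis", "grade": "important"}
digest = record_digest(record, label)
```

Any change to either the record or its label produces a different digest, which is what lets an auditor detect tampering after the fact.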

Further, the step S4 specifically includes the steps of:

S41, the user packs the request subject, the target data object and the operation to be performed into a request R:

R←F(S,O,A),

Wherein R represents a user request, S represents a subject attribute (comprising a user unique id and a permission level), O represents an object attribute (comprising a data category and a level), and A represents an operation attribute (comprising data operations such as reading, writing and the like);

After signing the request with their private key, the user sends their public key, the signature and a timestamp to the blockchain medical data management system:

B←X{PK_X,Sign(R,SK_X),T₁},

where B represents the blockchain, X represents the user, PK_X represents the public key of user X, Sign() represents a digital signature, SK_X represents the private key of user X, and T₁,…,T_m represent timestamps;

S42, after receiving the user request, the blockchain system uses the public key to parse the request and calls the policy management contract to automatically match the data object corresponding to the request:

B{Sign(R,SK_X),P(R),T₂}→X,

where P represents the policy management contract, which is responsible for automatically matching the corresponding data object according to the user request and the policy rules;

S43, the permission verification contract is called to verify the user's access permission:

B{V(X),T₃}→B,

where V represents the permission verification contract, which is responsible for verifying whether the user has permission to access a specific data object; if verification passes, the data retrieval contract is called and the data corresponding to the request is returned:

B{D(R),T₄}→X,

where D represents the data retrieval contract, which is responsible for retrieving the requested data object from the blockchain and returning it to the user; if verification fails, the user request does not satisfy the policy information in the policy management contract, and rejection information is returned:

B{Refused,T₄}→X,

where Refused denotes the plaintext of the rejection information.
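The control flow of S41 to S43 can be sketched off-chain as three plain functions standing in for the contracts P, V and D. This is an illustration only: the digital signature step is omitted, the rule that a permission level must meet a per-object minimum is an assumption, and the policy table, store and field names are hypothetical.

```python
from dataclasses import dataclass
import time

@dataclass(frozen=True)
class Request:
    user_id: str           # subject attribute S: unique user id
    permission_level: int  # subject attribute S: permission level
    category: str          # object attribute O: data category
    grade: int             # object attribute O: data grade
    operation: str         # operation attribute A: e.g. "read" or "write"

def policy_match(request, policies):
    """P: find the policy for the requested data object, or None if there is none."""
    return policies.get((request.category, request.grade))

def verify(request, policy):
    """V: assumed rule, level must meet the minimum and the operation must be allowed."""
    return (policy is not None
            and request.permission_level >= policy["min_level"]
            and request.operation in policy["operations"])

def handle(request, policies, store):
    """Run P, then V, then either D (return the data) or the Refused response."""
    policy = policy_match(request, policies)
    timestamp = time.time()
    if verify(request, policy):
        return {"data": store[(request.category, request.grade)], "timestamp": timestamp}
    return {"data": "Refused", "timestamp": timestamp}

policies = {("diagnosis", 2): {"min_level": 2, "operations": {"read"}}}
store = {("diagnosis", 2): "record-001"}
granted = handle(Request("u1", 3, "diagnosis", 2, "read"), policies, store)
denied = handle(Request("u2", 1, "diagnosis", 2, "read"), policies, store)
```

Keeping policy lookup, verification and retrieval as separate functions mirrors the separation into three contracts, so a policy change touches only the table consulted by P.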

Further, the step S5 specifically includes the steps of:

S51, when the medical data classification and grading specification changes, the fine-tuning dataset is reselected and relabeled according to the new specification, and step S1 is repeated;

S52, the knowledge base is updated to keep the retrieved information timely and accurate, so that retrieval enhancement continues to supply the latest and most reliable knowledge for medical data classification and grading management;

S53, the classification and grading labels of the medical data are updated using the updated large model, and the updated data is uploaded to the blockchain system again;

S54, user access permissions are adjusted according to the updated classification and grading specification to safeguard data privacy and security.
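The relabeling in S53 can be sketched as re-running a classifier over stored records and collecting the records whose label changed, since those are the ones that need to be uploaded again and have their permissions reviewed. The `classify` callable is a stand-in for the updated large model, and the record fields, version numbers and toy rule are hypothetical.

```python
def relabel(records, classify, spec_version):
    """S53: re-run classification for every record under the new specification."""
    return [
        {"id": r["id"], "text": r["text"],
         "label": classify(r["text"]), "spec_version": spec_version}
        for r in records
    ]

def changed_ids(old_records, new_records):
    """Ids of records whose label differs after relabeling."""
    old_labels = {r["id"]: r["label"] for r in old_records}
    return [r["id"] for r in new_records if old_labels.get(r["id"]) != r["label"]]

old_records = [
    {"id": 1, "text": "blood test", "label": "general", "spec_version": 1},
    {"id": 2, "text": "genome sequence", "label": "important", "spec_version": 1},
]

def toy_classifier(text):
    # Invented rule used only to make the example change one label.
    return "core" if "genome" in text else "general"

new_records = relabel(old_records, toy_classifier, spec_version=2)
to_reupload = changed_ids(old_records, new_records)
```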

The application also provides a medical data classification and grading security management device combining a large model and blockchain technology, comprising:

A medical data classification and grading large model construction module, used for constructing the medical data classification and grading large model, which is obtained after the large model is pre-trained on public medical data, fine-tuned and trained with parameter optimization;

A medical data classification and grading processing module, used for classifying and grading each piece of medical data with the trained model, mapping the medical data one by one to the data classification and grading rules, and adding the corresponding classification and grading labels;

A data on-chain upload and secure storage module, used for processing the classified and graded medical data and uploading it to the chain, recording it in the distributed ledger, and verifying the correctness and integrity of data storage through the blockchain consensus mechanism to ensure that the data cannot be tampered with;

A smart contract dynamic authorization and access control module, used for invoking the relevant smart contracts, after the blockchain system receives a user request, to verify the user's access permission and return the verification result;

A classification and grading specification dynamic updating module, used for triggering a dynamic update flow when the medical data classification and grading specification changes, updating the knowledge base, uploading the updated data to the blockchain system again, and adjusting user access permissions to safeguard data privacy and security.

In another aspect, the present application further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the medical data classification hierarchical security management method combining large model and blockchain technology when the computer program is executed.

The application also provides a storage medium, which comprises a stored program, and when the program runs, the equipment where the storage medium is located is controlled to execute the step of the medical data classification hierarchical safety management method combining the large model and the blockchain technology.

Compared with the prior art, the application has the following beneficial effects:

The application constructs an intelligent and secure medical data classification and grading management system through deep integration of a large model and blockchain technology. First, through federated learning, the application trains the medical data classification and grading large model collaboratively across institutions while each institution's data stays within its own domain, achieves accurate classification and grading of multi-modal medical data such as text and images, markedly improves the intelligence and accuracy of data management, and, by combining fine-tuning with retrieval enhancement, improves the model's response speed and adaptability to specification updates. Second, through the blockchain's distributed ledger, asymmetric encryption and hash algorithms, the application ensures that medical data is stored securely and cannot be tampered with, solves the problem of tracing responsibility in data management, and provides a highly trusted data management environment for medical institutions, patients and regulators. In addition, through smart contracts the application automates identity verification and access control, using predefined rules and logic to ensure that only legitimate users can access data of a given grade, which reduces the risk of data leakage and abuse. Through the synergy of the large model and the blockchain, the application significantly improves the efficiency and security of medical data management and provides comprehensive and reliable technical support for releasing the value of data in the medical industry.

In addition to the objects, features and advantages described above, the present application has other objects, features and advantages. The present application will be described in further detail with reference to the drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.

FIG. 1 is a flow chart of a method for classified hierarchical security management of medical data combining large model and blockchain technologies in accordance with a preferred embodiment of the present application.

FIG. 2 is a schematic diagram of a medical data classification hierarchical large model construction flow in accordance with a preferred embodiment of the present application.

FIG. 3 is a schematic diagram of the medical data classification and grading flow based on large model technology in accordance with a preferred embodiment of the present application.

FIG. 4 is a schematic diagram of user data access control management based on blockchain smart contracts in accordance with the present application.

FIG. 5 is a schematic diagram of a medical data classification hierarchical security management apparatus combining large model and blockchain technologies in accordance with another preferred embodiment of the present application.

Fig. 6 is a schematic block diagram of an electronic device entity of the preferred embodiment of the present application.

Fig. 7 is an internal structural view of the computer device of the preferred embodiment of the present application.

Detailed Description

Embodiments of the application are described in detail below with reference to the attached drawing figures, but the application can be practiced in a number of different ways, as defined and covered below.

Interpretation of related terms:

Data classification refers to distinguishing and categorizing data according to shared attributes or characteristics, following a defined principle or method, to facilitate fine-grained management and use; the data is expanded along two dimensions, basic data properties and business application attributes, to form classification layers such as major class, middle class, minor class and subclass.

Data grading refers to dividing data into different levels according to the degree of impact on national security, economic operation, social stability, public interest or the lawful rights of individuals and organizations if the data were tampered with, destroyed, illegally acquired or illegally used, so as to differentiate how far access, use or disclosure of the data is restricted.
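The two definitions can be pictured as one label per data item: a classification path plus a grade. The sketch below assumes the three grades named in the Background (general, important, core), assigns them an ordering from least to most restricted, and uses hypothetical field and category names.

```python
from dataclasses import dataclass
from enum import IntEnum

class Grade(IntEnum):
    """Assumed ordering: a higher value means tighter restrictions."""
    GENERAL = 1
    IMPORTANT = 2
    CORE = 3

@dataclass(frozen=True)
class Label:
    major_class: str   # classification layers named in the definition above
    middle_class: str
    minor_class: str
    subclass: str
    grade: Grade

def more_restricted(a, b):
    """True when label a carries a stricter grade than label b."""
    return a.grade > b.grade

# Category names are illustrative placeholders only.
identity = Label("personal", "identity", "identifier", "id_number", Grade.CORE)
statistics = Label("operations", "statistics", "monthly", "summary", Grade.GENERAL)
```

Using an ordered enumeration lets access rules compare grades directly instead of matching on strings.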

Large model technology is an advanced artificial intelligence method based on deep learning. Its core is to use deep neural network architectures containing billions of parameters or more to simulate human cognitive ability in natural language learning, enabling efficient processing and analysis of the complex patterns and relationships contained in data. By using advanced algorithms such as the attention mechanism and pre-training on massive multi-source heterogeneous data, a large model can mine fine-grained features and latent regularities in the data in depth, and thereby achieve high-precision prediction, classification, generation and decision making. In addition, a large model has strong transfer learning capability and can adapt quickly to new tasks and fields. By virtue of its data representation capability and generalization performance, the large model shows clear advantages in multi-modal data processing fields such as natural language processing, computer vision and speech recognition.

The model fine tuning technology is an efficient artificial intelligence technology based on transfer learning, and is characterized in that the model fine tuning technology is further trained by utilizing general features learned by a pre-training model on a large-scale data set and a small-scale data set aiming at a specific task, so that new tasks are quickly adapted and the performance of the model is improved. Specifically, the fine tuning technique enables capturing a specific pattern and law of a target task by adjusting some or all of the parameters of the pre-training model while retaining its extensive knowledge learned during the pre-training phase. The fine tuning technology has the advantages that the requirement on the data volume of the target task can be remarkably reduced, the training cost is reduced, and meanwhile, the accuracy and the robustness of the model on a new task are improved. In addition, the tuning also supports hierarchical tuning strategies, such as tuning only the last few layers or specific modules of the model, to quickly adapt to task requirements while preserving generic features. By virtue of the high efficiency, flexibility and wide applicability, the fine tuning technology is widely applied to the fields of natural language processing (such as text classification and question-answering systems), computer vision (such as image segmentation and target detection), voice recognition (such as dialect recognition and voice emotion analysis) and the like, and becomes an important means for realizing rapid deployment of models and performance optimization.
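Step S14 of the method names low-rank adaptation (LoRA) as its fine-tuning technique and a cross-entropy loss as its objective. A minimal NumPy sketch of both follows, using the usual LoRA form in which the frozen weight W is augmented by ΔW = (α / r)·B·A with B initialised to zero. The dimensions, the scaling factor and the function names are illustrative assumptions, and no training loop is shown.

```python
import numpy as np

def lora_delta(a, b, alpha, rank):
    """Low-rank update: delta_W = (alpha / rank) * B @ A."""
    return (alpha / rank) * (b @ a)

def cross_entropy(y_true, y_prob):
    """L = -sum_i y_i * log(p_i) for a one-hot label y_true."""
    return float(-np.sum(y_true * np.log(y_prob + 1e-12)))

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 2
w_frozen = rng.normal(size=(d_out, d_in))        # pretrained weights, never updated
a = rng.normal(scale=0.01, size=(rank, d_in))    # trainable, small random init
b = np.zeros((d_out, rank))                      # trainable, zero init so delta_W starts at 0
delta_w = lora_delta(a, b, alpha=4.0, rank=rank)
w_effective = w_frozen + delta_w

trainable = a.size + b.size                      # rank * (d_in + d_out) parameters
loss = cross_entropy(np.array([0.0, 1.0, 0.0]), np.array([0.2, 0.7, 0.1]))
```

Only A and B are trained, so the matrix each institution has to share is the product B·A rather than the full weight matrix.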

Retrieval enhancement technology is an advanced artificial intelligence method that combines information retrieval with model generation, aiming to improve the generation capability and accuracy of a model by introducing an external knowledge source. The core idea is to retrieve task-relevant information dynamically during generation and to supply the retrieved results as context, helping the model produce more accurate and more relevant output. Specifically, retrieval enhancement typically includes two key components, a retrieval module and a generation module. The retrieval module finds the information relevant to the input in the external knowledge source. The generation module uses a pre-trained language model to combine the retrieved information with the original input and generate high-quality text output. The technology not only improves model performance markedly on knowledge-intensive tasks but also effectively reduces the risk of generating false or irrelevant content. In addition, retrieval enhancement is highly flexible and extensible: it can be adapted to different fields and tasks simply by updating the data behind the retrieval component, which lowers the cost of updating the model's knowledge.

Blockchain technology is a data management method based on a distributed ledger and cryptographic principles; through decentralization, tamper resistance and traceability, it builds a highly secure and transparent system for data storage and transmission. A blockchain consists of a series of blocks linked in time order; each block contains a group of cryptographically verified transaction data, and a consensus mechanism keeps the data state of all nodes in the network consistent. The distributed architecture of the blockchain removes the dependence on a central authority and strengthens the system's resistance to attack and its fault tolerance, while its tamper-resistant nature provides a reliable basis for data auditing and tracing.
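The linking of blocks described above can be illustrated with a toy hash chain in which each block stores the hash of its predecessor. This sketch covers only the tamper-evidence property; it has no network, no consensus mechanism and no signatures, and the function names and block fields are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64   # placeholder previous-hash for the first block

def block_hash(index, prev_hash, transactions):
    """SHA-256 over the block's index, previous hash and transactions."""
    body = json.dumps({"index": index, "prev_hash": prev_hash,
                       "transactions": transactions}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, transactions):
    """Append a block that commits to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    block = {"index": index, "prev_hash": prev_hash, "transactions": transactions,
             "hash": block_hash(index, prev_hash, transactions)}
    chain.append(block)
    return block

def chain_is_valid(chain):
    """Recompute every hash and check every link back to the first block."""
    expected_prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["prev_hash"],
                                       block["transactions"]):
            return False
        expected_prev = block["hash"]
    return True

chain = []
append_block(chain, ["label r-001 as important"])
append_block(chain, ["label r-002 as general"])
valid_before = chain_is_valid(chain)
chain[0]["transactions"] = ["label r-001 as general"]   # simulate tampering
valid_after = chain_is_valid(chain)
```

Editing an earlier block invalidates its stored hash, which is why a change to historical records is detectable by anyone who re-verifies the chain.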

Smart contracts are automated, programmable protocols based on blockchain technology. Through predefined rules and logic, contract terms are executed automatically when specific conditions are met, without third-party intervention. Smart contracts rely on the decentralized and tamper-resistant properties of the blockchain to ensure the transparency, security, and reliability of contract execution. The key technologies include a Turing-complete programming language, a state machine model, and an event-driven mechanism, through which a smart contract can process complex business logic and automatically update and verify data on the blockchain. The execution process of a smart contract is fully transparent, and all operation records are permanently stored on the blockchain, so that every party can audit and trace them. In addition, smart contracts support multi-party collaboration: they can automatically coordinate and execute agreements among multiple participants, reducing human intervention and the cost of trust.

Federated learning is a distributed machine learning framework based on privacy protection and secure encryption technology. Its core is a collaborative learning mechanism in which data never leaves its local environment: the model is trained on local devices and only model parameters or gradients are shared. During federated training, the data of each participant does not leave its own domain, which avoids the risk of data leakage, and model parameter updates are transmitted and aggregated over encrypted channels, which ensures both the effectiveness of model training and compliance with data privacy requirements. With federated learning, a global model can be updated cooperatively across multiple data sources without directly exposing the original data, so that cross-institution and cross-region collaborative modeling is achieved while data sovereignty is protected.

As shown in FIG. 1, the preferred embodiment of the present application provides a medical data classification hierarchical security management method combining large model and blockchain technologies, comprising the steps of:

To address the defects of the prior art, the technical solution adopted by this embodiment is divided into three parts: a large model data classification and grading part, a blockchain part, and a smart contract part, wherein:

The large model data classification and grading part automates the classification and grading of medical data, eliminates a large amount of manual processing, and ensures the accuracy of the classification and grading results.

The blockchain part stores the medical data and its classification and grading labels, ensures that the data is tamper-resistant, secure, and reliable, and ensures that responsibility for data operations is traceable and transparently managed.

The smart contract part comprises a policy management contract, a permission verification contract, and a data retrieval contract. The policy management contract is mainly used to formulate and execute management policies and automatically assigns operation permissions according to the attributes of users or roles. The permission verification contract compares the permission level of the requesting subject with the permission level required by the data to be accessed and performs access control according to the policies in the policy management contract. The data retrieval contract retrieves and returns the corresponding data according to the user request.
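The division of work among the three contracts can be sketched in plain Python. This is an off-chain illustration only: the role names, the numeric levels (a higher number is assumed to mean more sensitive data), the record contents, and the function names are all hypothetical.

```python
# Hypothetical policy table and records, invented for illustration.
ROLE_MAX_LEVEL = {"attending_physician": 3, "researcher": 2, "visitor": 1}
LEDGER = {
    "rec-001": {"level": 3, "payload": "diagnosis report"},
    "rec-002": {"level": 1, "payload": "public health notice"},
}

def policy_management(role):
    """Policy management contract: map a role to the highest level it may access."""
    return ROLE_MAX_LEVEL.get(role, 0)  # unknown roles get no access

def permission_verification(role, record_id):
    """Permission verification contract: compare subject level with data level."""
    record = LEDGER.get(record_id)
    if record is None:
        return False
    return policy_management(role) >= record["level"]

def data_retrieval(role, record_id):
    """Data retrieval contract: return data only after verification succeeds."""
    if not permission_verification(role, record_id):
        return "Refused"
    return LEDGER[record_id]["payload"]

granted = data_retrieval("attending_physician", "rec-001")
denied = data_retrieval("visitor", "rec-001")
```

Keeping the policy lookup separate from the comparison means a role's permissions can be changed in one place without touching the verification or retrieval logic.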

This embodiment constructs an intelligent and secure medical data classification and grading management system through the deep integration of a large model and blockchain technology. First, through federated learning, the embodiment trains the medical data classification and grading large model collaboratively while the data of each institution stays within its own domain, achieves accurate classification and grading of multi-modal medical data such as text and images, and improves the intelligence and accuracy of data management; by combining fine-tuning with retrieval enhancement technology, it also improves the model's response speed and adaptability to updates of the standards. Second, through the distributed ledger, asymmetric encryption, and hash algorithms of the blockchain, the embodiment ensures the secure storage and tamper resistance of medical data, solves the problem of tracing responsibility in data management, and provides a trusted data management environment for medical institutions, patients, and regulators. In addition, the embodiment implements automatic identity verification and access control through smart contracts: predefined rules and logic ensure that only legitimate users can access data of a given level, which reduces the risk of data leakage and abuse. Through the combined effect of the large model and the blockchain, the embodiment improves the efficiency and security of medical data management and provides technical support for releasing the value of data in the medical industry.

Preferably, as shown in fig. 2, the step S1 specifically includes the steps of:

Specifically, the step S16 specifically includes the steps of:

Specifically, the step S17 specifically includes:

Preferably, as shown in fig. 3, the step S2 specifically includes the steps of:

Preferably, the step S3 specifically includes the steps of:

Preferably, as shown in fig. 4, the step S4 specifically includes the steps of:

R←F(S,O,A),

B←X{PK_X,Sign(R,SK_X),T₁},

B{Sign(R,SK_X),P(R),T₂}→X,

B{V(X),T₃}→B,

B{D(R),T₄}→X,

B{Refused,T₄}→X,

Wherein Refused denotes the plaintext of the rejection information.

Preferably, the step S5 specifically includes the steps of:

As shown in FIG. 5, another preferred embodiment of the present application further provides a medical data classification hierarchical security management apparatus combining large model and blockchain technologies, comprising:

In summary, the above embodiment of the present application has the following features:

The application combines a large model with model fine-tuning, retrieval enhancement, and related techniques to optimize the medical data classification and grading workflow and improve its efficiency and accuracy. The large model's multi-modal data processing capability reduces labor and time costs; model fine-tuning allows the system to adapt quickly to different medical data and supports high-precision classification and grading; and retrieval enhancement improves the adaptability and extensibility of the system by dynamically retrieving relevant knowledge and rules. Together these provide medical institutions with an efficient and reliable data management solution and help the medical industry release the value of its data.

The application provides a multi-institution collaborative modeling method based on federated learning, which enables cross-institution joint training in which data is "usable but not visible" and overcomes the data-silo problem of traditional medical data. In addition, a multi-dimensional poisoning attack detection mechanism is designed that combines spatial anomaly, behavioral anomaly, and amplitude anomaly indicators, improving the security of the model training process. The contribution of each node is calculated from the quality gain of its local model, the amount of data provided, the training intensity, and the number of abnormal marks, and the contribution weights are adjusted dynamically to reduce the influence of bad data. This improves the overall accuracy and robustness of the model in medical data classification and grading tasks, particularly in complex scenarios where data quality differs across medical institutions, and gives the method high application value and broad industry suitability.

The application uses the distributed ledger technology of the blockchain to store data in a decentralized manner, avoiding single points of failure and data loss, and combines asymmetric encryption with hash algorithms to ensure the integrity and tamper resistance of the data, preventing it from being maliciously altered or forged during transmission and storage. The transparency and traceability of the blockchain also mean that every data management operation is permanently recorded and can be inspected, so that management responsibility is traceable. This provides a reliable audit basis for medical institutions and regulators and establishes a transparent, reliable, and efficient technical framework for the security management of medical data.

The application uses smart contracts to set access rules according to the data classification and grading labels and deploys these rules on the blockchain, ensuring that data of each level is open only to users with the corresponding permissions. This rule-based automatic access control mechanism avoids the operating errors and subjective bias of traditional manual management, and the tamper resistance and transparency of the blockchain ensure that the access rules are strictly executed and traceable. In addition, smart contracts support dynamic management of user permissions: access rights can be adjusted in real time according to changes in user roles, data sensitivity, and business requirements, which prevents unauthorized operations and misuse of data. Through the combined effect of smart contracts and the blockchain, the application automates the full workflow of data storage, access control, and permission management, provides an efficient and reliable technical guarantee for the safe use of medical data, reduces management cost and risk, and helps the medical industry build a more intelligent and more secure data management system.

As shown in FIG. 6, the preferred embodiment of the present application also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which when executed implements the steps of the medical data classification hierarchical security management method of the above embodiment that combines large model and blockchain techniques.

As shown in fig. 7, the preferred embodiment of the present application also provides a computer device, which may be a terminal or a living body detection server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with other external computer devices through network connection. The computer program, when executed by a processor, performs the steps of the medical data classification hierarchical security management method described above that combines large model and blockchain techniques.

It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

The preferred embodiment of the present application also provides a storage medium including a stored program, which when executed controls a device in which the storage medium is located to perform the steps of the medical data classification hierarchical security management method combining the large model and the blockchain technology in the above embodiment.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

The functions described in the method of this embodiment, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in one or more storage media readable by a computing device. Based on such understanding, the part of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to execute all or part of the steps of the method described in the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A medical data classification and grading security management method combining a large model with blockchain technology, characterized by comprising the following steps:

S1. Construction of a large model for medical data classification and grading. After pre-training, fine-tuning, and parameter optimization training of the large model using public medical data, a large model for medical data classification and grading is obtained.

S2. Classification and grading of medical data: Use the trained medical data classification and grading model to classify and grade each piece of medical data, map it one by one to the data classification and grading rules, and add corresponding classification and grading labels;

S3. Data on-chain and secure storage: The classified and graded medical data is processed and uploaded to the chain, recorded in a distributed ledger, and the correctness and integrity of the data storage are verified through the blockchain consensus mechanism to ensure that the data cannot be tampered with;

S4. Dynamic authorization and access control of smart contracts. After receiving a user request, the blockchain system calls the relevant smart contract to verify the user's access rights and returns the verification result.

S5. Dynamic update of classification and grading specifications. When the classification and grading specifications of medical data change, the dynamic update process is triggered, the knowledge base is updated, and the updated data is re-uploaded to the blockchain system, user access rights are adjusted, and data privacy and security are guaranteed.

2. The medical data classification and grading security management method combining a large model and blockchain technology according to claim 1 is characterized in that step S1 specifically includes the following steps:

S11. Pre-train large models using a large amount of public medical data to equip them with professional general medical knowledge.

S12. Based on existing policies and regulations, formulate a classification and grading framework based on the basic nature, business attributes, and potential risks of medical data;

S13. Each medical institution uses data sampling techniques to collect representative small sample datasets from different categories and levels of data from its own private database, including representative diagnosis and treatment data and rare case data. The collected data are manually classified and graded through multi-expert evaluation methods to construct a high-quality fine-tuning dataset;

S14. Deploy the pre-trained large model locally at each medical institution and fine-tune it on the personalized data of that institution using low-rank adaptation technology. With the original parameters of the pre-trained model frozen, train the model on the labeled dataset and obtain the parameter change matrix by gradient descent optimization of the cross-entropy loss function:

$$L = -\sum_{i=1}^{n} y_i \log \hat{y}_i$$

where $y_i$ is the true label, $\hat{y}_i$ is the probability predicted by the model, and n is the number of categories. Applying this fine-tuning method to the training data of each medical institution yields the parameter change matrices (ΔW₁, …, ΔW_N);

S15. Using homomorphic encryption and differential privacy techniques, the parameter change matrices (ΔW₁, …, ΔW_N) trained on the data of each medical institution are protected so that the gradient information is masked, and the encrypted parameter change matrices are sent to the central aggregation server;

S16. Use a method that combines spatial anomaly, behavioral anomaly, and amplitude anomaly multi-indicator detection to detect poisoning attacks;

S17. Calculate the contribution of each node based on the local model quality gain, the amount of data provided and the training intensity, and the number of abnormal labels;

S18. The central server performs weighted security aggregation based on the contribution of each institution. The weight change matrix of the institution with a large contribution occupies a larger weight in the training. The parameter change matrix after weighted addition is added to the parameter matrix of the pre-trained large model, and the accuracy change is recorded. When the accuracy change is less than the expected value, the model is considered to have converged and the training is terminated. Otherwise, the global model parameter change is encrypted and sent to each institution node. After updating the local model, return to step S13 until the model training converges.
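The weighted aggregation and convergence check of step S18 can be illustrated with a minimal sketch. The encryption and masking of step S15 are deliberately omitted, the matrices are tiny plain Python lists, and the function names, the normalization of contributions into weights that sum to one, and the tolerance value are assumptions made for illustration.

```python
def weighted_aggregate(deltas, contributions):
    """Combine per-institution parameter-change matrices, weighted by contribution."""
    total = sum(contributions)
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    weights = [c / total for c in contributions]  # assumed normalization
    rows, cols = len(deltas[0]), len(deltas[0][0])
    return [
        [sum(w * d[r][c] for w, d in zip(weights, deltas)) for c in range(cols)]
        for r in range(rows)
    ]

def apply_delta(base, delta):
    """Add the aggregated change to the pre-trained parameter matrix."""
    return [[b + d for b, d in zip(brow, drow)] for brow, drow in zip(base, delta)]

def has_converged(prev_accuracy, new_accuracy, tolerance=0.001):
    """Stop when the accuracy change falls below the expected value (tolerance)."""
    return abs(new_accuracy - prev_accuracy) < tolerance

base = [[1.0, 0.0], [0.0, 1.0]]
deltas = [
    [[0.4, 0.0], [0.0, 0.4]],  # institution 1
    [[0.0, 0.8], [0.8, 0.0]],  # institution 2
]
contributions = [3.0, 1.0]     # institution 1 contributed more
aggregated = weighted_aggregate(deltas, contributions)
updated = apply_delta(base, aggregated)
```

With these inputs institution 1 receives a weight of 0.75 and institution 2 a weight of 0.25, so the update from the higher-contribution institution dominates the aggregated change.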

3. The medical data classification and grading security management method combining a large model and blockchain technology according to claim 2 is characterized in that step S16 specifically includes the following steps:

S161. First, the parameter change matrices uploaded by the institutions are mapped to a unified vector space, and the cosine similarity between each pair is calculated according to the following formula:

$$\cos(\Delta W_i, \Delta W_j) = \frac{\langle \mathrm{vec}(\Delta W_i), \mathrm{vec}(\Delta W_j) \rangle}{\lVert \mathrm{vec}(\Delta W_i) \rVert \, \lVert \mathrm{vec}(\Delta W_j) \rVert}$$

The parameter changes of malicious nodes have significantly lower similarity to those of most normal nodes and form isolated clusters. A threshold θ is set; when the average cosine similarity between node i and the other nodes is less than θ, node i is considered an abnormal user.

S162. Next, combine the parameter change matrix of each medical institution with the base large model, record the resulting performance changes, and mark the user as an abnormal user when the model performance degradation value exceeds the normal range.

S163. Finally, the norms of the parameter change matrices uploaded by all nodes are computed. A node carrying out a malicious attack often uploads a ΔW with a large magnitude in an attempt to control the model. If the norm of a node's ΔW is much larger than the mean, the node is marked as an abnormal user. Warning feedback is given to each abnormal user, requiring the node to re-check its data labels or repeat fine-tuning; an abnormal mark is accumulated for the node, and its ΔW is down-weighted during aggregation. When the cumulative number of abnormal marks exceeds the preset threshold, the institution node is permanently removed.
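The three indicators of steps S161 to S163 can be combined as in the following sketch. It is an illustration under stated assumptions rather than the detection procedure of this application: the thresholds (`sim_threshold`, `max_drop`, `norm_factor`) are hypothetical values, the performance drops of step S162 are passed in as precomputed numbers, and each matrix is simply flattened into a vector.

```python
import math
import statistics

def flatten(matrix):
    """Map a parameter-change matrix to a single vector."""
    return [x for row in matrix for x in row]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def detect_anomalies(deltas, perf_drops, sim_threshold=0.0, max_drop=0.05, norm_factor=3.0):
    """Return the set of node indices flagged by any of the three indicators."""
    vectors = [flatten(d) for d in deltas]
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    mean_norm = statistics.mean(norms)
    flagged = set()
    for i, v in enumerate(vectors):
        others = [cosine(v, w) for j, w in enumerate(vectors) if j != i]
        if statistics.mean(others) < sim_threshold:  # spatial anomaly (S161)
            flagged.add(i)
        if perf_drops[i] > max_drop:                 # behavioral anomaly (S162)
            flagged.add(i)
        if norms[i] > norm_factor * mean_norm:       # amplitude anomaly (S163)
            flagged.add(i)
    return flagged

deltas = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[1.1, 0.0], [0.0, 1.0]],
    [[-10.0, 0.0], [0.0, -10.0]],  # opposite direction and very large magnitude
]
perf_drops = [0.0, 0.01, 0.0, 0.2]  # assumed accuracy drop caused by each update
flagged = detect_anomalies(deltas, perf_drops)
```

A node is flagged if any single indicator fires, which matches the text's use of several independent checks; accumulating marks and down-weighting flagged nodes would be handled by the caller.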

4. The medical data classification and grading security management method combining a large model and blockchain technology according to claim 2 is characterized in that step S17 specifically includes:

S171. Use indicators such as accuracy, F1 score, and ROC to record the local gain that the ΔW of each institution brings to the global model, denoted as G;

S172. Count the number of labeled samples and training rounds used for fine-tuning by each medical institution, denoted as Q, to avoid "free-riding" by nodes with extremely small data sizes and very few training rounds;

S173. Calculate the contribution of each medical institution based on the number of times F the node was identified as an abnormal node during training using the following formula:

The weight parameters α, β, and γ can be adjusted dynamically as needed. A minimum protection threshold is set so that small medical institutions do not lose their aggregation weight entirely.
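The contribution formula itself is not reproduced in the text above, so the following sketch should be read as one possible form and not as the formula of this application. It assumes a linear score that rewards the gain G and the data and training quantity Q, penalizes the anomaly count F, and applies the minimum protection threshold as a floor; the default weight values, the floor value, and the assumption that G and Q are normalized to the range 0 to 1 are all hypothetical.

```python
def contribution(gain, quantity, anomaly_count, alpha=0.5, beta=0.3, gamma=0.2, floor=0.05):
    """Assumed linear score: reward G and Q, penalize F, never drop below the floor."""
    score = alpha * gain + beta * quantity - gamma * anomaly_count
    return max(score, floor)  # minimum protection threshold

def normalize(scores):
    """Turn raw contribution scores into aggregation weights that sum to one."""
    total = sum(scores)
    return [s / total for s in scores]

scores = [
    contribution(gain=0.8, quantity=0.9, anomaly_count=0),  # large, clean institution
    contribution(gain=0.1, quantity=0.1, anomaly_count=0),  # small institution
    contribution(gain=0.6, quantity=0.7, anomaly_count=3),  # repeatedly flagged node
]
weights = normalize(scores)
```

In this sketch the floor keeps every participating node at a small positive weight; whether a repeatedly flagged node should also benefit from the floor, or be removed as in step S163, is a design decision left to the caller.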

5. The medical data classification and grading security management method combining a large model and blockchain technology according to claim 1 is characterized in that step S2 specifically includes the following steps:

S21. Collect structured and unstructured medical data, including electronic health records, medical images, laboratory test results, physician diagnostic reports, and patient medical records, from multiple source systems of medical institutions, such as hospital information systems, laboratory information management systems, image archiving and communication systems, and perform data cleaning to remove noise and redundant information;

S22. Build a knowledge base based on the latest credible medical information, including the latest medical data classification and grading management standards, cutting-edge papers published in important medical academic journals, and authoritative research reports;

S23. Through search enhancement technology, relevant knowledge and rules of medical data to be classified and graded are retrieved from the knowledge base, and combined with prompt engineering technology to assist the large model to achieve more accurate data classification and grading;

S24. Use the trained large model to classify and grade each piece of medical data, map them one by one to the data classification and grading rules, and add corresponding classification and grading labels.

6. The medical data classification and grading security management method combining a large model and blockchain technology according to claim 1 is characterized in that step S3 specifically includes the following steps:

S31. Calculate hash values for the classified and graded medical data to ensure the uniqueness and integrity of the data;

S32. Use asymmetric encryption algorithms to encrypt medical data to ensure data security during transmission and storage;

S33. Upload the encrypted medical data and its classification and grading labels to the blockchain network, record them in the distributed ledger, and verify the correctness and integrity of the data storage through the blockchain consensus mechanism to ensure that the data cannot be tampered with.

7. The medical data classification and grading security management method combining a large model and blockchain technology according to claim 1 is characterized in that step S4 specifically includes the following steps:

S41. The user packages the requesting subject, the target data object, and the operation to be performed to generate a request R:

R←F(S,O,A),

where R represents the user request, S represents the subject attributes (including the user's unique ID and permission level), O represents the object attributes (including data category and level), and A represents the operation attributes (including data operations such as read and write);

After signing the request with the private key, the user sends the public key, certificate signature, and timestamp to the blockchain medical data management system:

B←X{PK_X,Sign(R,SK_X),T₁},

where B represents the blockchain, X represents the user, PK_X represents the public key of user X, Sign() represents the digital signature, SK_X represents the private key of user X, and T₁,…,T_m represent timestamps;

S42. After receiving the user request, the blockchain system uses the user's public key to parse the request and calls the policy management contract to automatically match the data object corresponding to the request:

B{Sign(R,SK_X),P(R),T₂}→X,

where P represents the policy management contract, which is responsible for automatically matching the corresponding data objects according to user requests and policy rules;

S43. Call the permission verification contract to verify the user's access rights:

B{V(X),T₃}→B,

where V represents the permission verification contract, which is responsible for verifying whether the user has permission to access a specific data object. If the verification succeeds, the data retrieval contract is called to return the data corresponding to the request:

B{D(R),T₄}→X,

where D represents the data retrieval contract, which is responsible for retrieving the data object requested by the user from the blockchain and returning it. If the verification fails, the user request does not satisfy the policy information in the policy management contract, and a rejection message is returned:

B{Refused,T₄}→X,

Refused indicates the plain text of the rejection information.
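The message flow of steps S41 to S43 can be sketched as follows. One substitution must be stated plainly: the Python standard library has no asymmetric signature scheme, so HMAC-SHA256 with a shared key stands in for Sign(R, SK_X) purely to show the flow, whereas the text describes signing with a private key and checking with a public key. The data store, the read-only policy, and the integer timestamps are likewise assumptions made for illustration.

```python
import hashlib
import hmac
import json

DATA = {"rec-001": "encrypted diagnosis report"}  # hypothetical on-chain record

def pack_request(subject, obj, action):
    """R <- F(S, O, A): serialize the three attribute groups deterministically."""
    return json.dumps({"S": subject, "O": obj, "A": action}, sort_keys=True).encode("utf-8")

def sign(request_bytes, key):
    """Stand-in for Sign(R, SK_X); HMAC is symmetric, unlike the scheme in the text."""
    return hmac.new(key, request_bytes, hashlib.sha256).hexdigest()

def handle_request(request_bytes, signature, key, timestamp):
    """Chain side: check the signature, verify permission, then return data or refuse."""
    if not hmac.compare_digest(sign(request_bytes, key), signature):
        return ("Refused", timestamp)
    request = json.loads(request_bytes)
    subject, obj, action = request["S"], request["O"], request["A"]
    # Assumed policy: only reads are allowed, and the subject's level must
    # be at least the level of the requested data object.
    if action != "read" or obj["id"] not in DATA or subject["level"] < obj["level"]:
        return ("Refused", timestamp)
    return (DATA[obj["id"]], timestamp)

key = b"demo-key"  # placeholder secret for the stand-in signature
request = pack_request({"id": "u1", "level": 3}, {"id": "rec-001", "level": 2}, "read")
signature = sign(request, key)
reply = handle_request(request, signature, key, timestamp=4)
```

A request whose attributes are altered after signing no longer matches its signature and is refused before any policy check runs, which is the role the signature plays in step S42.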

8. The medical data classification and grading security management method combining a large model and blockchain technology according to claim 1 is characterized in that step S5 specifically includes the following steps:

S51. When the medical data classification and grading specifications change, reselect and relabel the fine-tuning dataset according to the new specifications, and repeat step S1;

S52. Update the knowledge base to ensure the timeliness and accuracy of retrieval information, provide the latest and most reliable knowledge support for the classification and grading management of medical data, and ensure the efficiency and practicality of retrieval enhancement technology;

S53. Update the classification and grading labels of the medical data using the updated large-scale model classification and grading architecture, and re-upload the updated data to the blockchain system;

S54. Adjust user access rights according to the updated medical data classification and grading specifications to ensure data privacy and security.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the steps of the medical data classification and grading security management method combining a large model with blockchain technology are implemented as described in any one of claims 1 to 8.

10. A storage medium comprising a stored program, characterized in that when the program is running, the device where the storage medium is located is controlled to execute the steps of the medical data classification and grading security management method combining a large model and blockchain technology as described in any one of claims 1 to 8.