
CN120613060A - Medical data classification and grading security management method, device, electronic device and storage medium combining big model and blockchain technology - Google Patents

Medical data classification and grading security management method, device, electronic device and storage medium combining big model and blockchain technology

Info

Publication number
CN120613060A
Authority
CN
China
Prior art keywords
data
grading
medical
classification
medical data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510591610.1A
Other languages
Chinese (zh)
Inventor
陈裕友
马超群
万丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN202510591610.1A
Publication of CN120613060A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Computing Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本申请公开了一种结合大模型与区块链技术的医疗数据分类分级安全管理方法、装置、电子设备及存储介质,所述方法包括步骤:S1、医疗数据分类分级大模型构建;S2、医疗数据分类分级处理;S3、数据上链与安全存储;S4、智能合约动态授权与访问控制;S5、分类分级规范动态更新。本发明通过大模型和区块链技术深度融合,实现了医疗数据分类分级管理智能化、安全化,通过联邦学习技术提升了数据管理的智能化水平与准确性,并通过微调等技术提高模型对规范标准更新的响应速度与适应性;通过区块链技术解决了数据管理的责任追溯问题;此外,本发明显著提升了医疗数据管理的效率与安全性,为医疗行业的数据价值释放提供了全面、可靠的技术支撑。

This application discloses a method, device, electronic device, and storage medium for secure management of medical data classification and grading that combines a large model with blockchain technology. The method comprises the following steps: S1, constructing a large model for medical data classification and grading; S2, processing medical data classification and grading; S3, uploading data to a blockchain and securely storing it; S4, dynamic authorization and access control of smart contracts; and S5, dynamic updating of classification and grading specifications. Through the deep integration of large models and blockchain technology, the present invention achieves intelligent and secure management of medical data classification and grading. It enhances the intelligence and accuracy of data management through federated learning technology, and improves the model's responsiveness and adaptability to updates to regulatory standards through fine-tuning and other techniques. Blockchain technology addresses the issue of data management accountability. Furthermore, the present invention significantly improves the efficiency and security of medical data management, providing comprehensive and reliable technical support for unlocking the value of data in the medical industry.

Description

Medical data classification and grading security management method, device, electronic device and storage medium combining large model and blockchain technology
Technical Field
The invention relates to the technical field of blockchains and large models, and in particular to a medical data classification and grading security management method, device, electronic device and storage medium combining a large model and blockchain technology.
Background
Classified and graded security management of medical data refers to a management system that divides medical data into categories and grades according to its sensitivity, importance and potential risk, and applies differentiated security protection to each. The core aim is to balance data security against value utilization, protecting patient privacy while keeping the circulation of medical information compliant. Classification is generally based on data sources (e.g., patient records, public health data) and attributes (e.g., personal identity information, health status data), while grading generally distinguishes three levels, namely core data, important data and general data, depending on the extent of the damage that leakage of or tampering with the data may cause.
The purposes of the management mechanism described above include:
1. Strengthening security protection: by clearly determining the grade of sensitive data (such as patient identity information and disease history), measures such as encryption and access control are applied in a targeted manner, reducing the risk of leakage;
2. Meeting compliance requirements: implementing the provisions of the Data Security Law, the Personal Information Protection Law and other regulations, and ensuring the legality of data processing;
3. Promoting data sharing: non-sensitive data (such as desensitized statistical information) is opened within a controllable range to support application scenarios such as medical research and medical insurance payment;
4. Optimizing data governance: improving the data management efficiency of medical institutions through asset inventory and standard formulation, laying a foundation for precision medicine and smart hospital construction.
At present, the mainstream flow of existing medical data classification and grading management is as follows:
X1. Establishing data classification and grading rules: constructing a scientific and systematic data classification and grading framework and rules according to the relevant specifications and standards;
X2. Data asset inventory: comprehensively inventorying the structured and unstructured data assets of the institution to form an original list of data assets, and defining the basic information and related parties of each data asset;
X3. Mapping data assets to classification and grading: mapping the database tables, fields, data items, data files and the like in the original list of data assets one by one to the data asset units in the data classification and grading rules through data element association, and defining the category and grade of each data asset;
X4. Auditing the classification and grading asset unit list: auditing, optimizing and refining the mapping result between data classification and grading and data assets;
X5. Labeling the basic attributes of data assets: labeling the basic attributes of the data asset units according to the description requirements of the data assets and the retrieval requirements of the data asset catalog;
X6. Auditing and optimizing the basic attribute labeling results of the data assets to finally form a data asset catalog;
X7. Dynamically updating and managing the data classification and grading rules, the classification and grading asset unit list, the basic attribute label set, the data asset catalog and the like according to changes in the data classification and grading elements and changes that may affect those elements.
The above management flow has the following defects:
① The complexity of the classification and grading rules, together with the specialized and diverse nature of medical data, means that every processing step of data classification and grading management requires a large input of professional human resources. The economic and time costs are high, and small medical institutions find the process difficult to afford and implement;
② Medical data classification and grading methods that rely on manual operation or basic rule engines are prone to misjudgment or omission, which affects the accuracy and reliability of the final classification and grading results;
③ When the classification and grading standards change, existing systems respond slowly and cannot adapt to the new rule requirements in time, which affects the timeliness of data processing and increases compliance risk;
④ Existing data management methods generally lack traceable records of how data was classified and graded and where it is stored, define management responsibility only vaguely, and cannot effectively track the change history and usage records of the data. When violations such as data leakage or tampering occur, responsibility tracing and auditing are therefore difficult, and the increasingly strict compliance requirements of the medical industry cannot be met.
Disclosure of Invention
The application provides a medical data classification and grading security management method combining a large model and blockchain technology, which aims to solve the technical problems that existing medical data classification and grading security management methods are costly, have poor accuracy and reliability, make responsibility tracing and auditing difficult, and cannot meet the increasingly strict compliance requirements of the medical industry.
The application is realized by the following scheme:
A medical data classification and grading security management method combining a large model and blockchain technology comprises the following steps:
S1, medical data classification and grading large model construction: the medical data classification and grading large model is obtained after the large model is pre-trained on public medical data, fine-tuned, and trained with parameter optimization;
S2, medical data classification and grading processing: each piece of medical data is classified and graded using the trained medical data classification and grading large model, mapped one by one to the data classification and grading rules, and given the corresponding classification and grading labels;
S3, data on-chaining and secure storage: the classified and graded medical data is processed and put on chain, recorded in a distributed ledger, and the correctness and integrity of the data storage are verified through the blockchain consensus mechanism, ensuring that the data cannot be tampered with;
S4, smart contract dynamic authorization and access control: after receiving a user request, the blockchain system invokes the relevant smart contracts to verify the access rights of the user and returns the verification result;
S5, dynamic updating of the classification and grading specification: when the medical data classification and grading specification changes, a dynamic update flow is triggered, the knowledge base is updated, the updated data is uploaded to the blockchain system again, and the access rights of users are adjusted, guaranteeing the privacy and security of the data.
Further, the step S1 specifically includes the steps of:
S11, pre-training a large model on a large amount of public medical data so that it acquires general professional medical knowledge;
S12, formulating the classification and grading framework rules according to existing policy specifications and the basic properties, business attributes and potential risks of medical data;
S13, each medical institution uses data sampling to collect, from its own private database, a representative small-sample dataset covering data of different categories and grades, including representative diagnosis and treatment data and rare case data; the collected data is manually labeled with categories and grades through multi-expert evaluation to construct a high-quality fine-tuning dataset;
S14, deploying the pre-trained large model locally in each medical institution and fine-tuning it on the institution's own data through low-rank adaptation (LoRA): with the original parameters of the pre-trained model frozen, the model is trained on the labeled dataset by gradient descent on a cross-entropy loss function to obtain the parameter variation matrix:
$$L = -\sum_{i=1}^{n} y_i \log(\hat{y}_i)$$
wherein $y_i$ is the actual label, $\hat{y}_i$ is the probability predicted by the model, and n is the number of categories; applying this fine-tuning method yields the parameter variation matrices $(\Delta W_1, \ldots, \Delta W_N)$ trained on the data of each medical institution;
S15, masking the gradient information of the parameter variation matrices $(\Delta W_1, \ldots, \Delta W_N)$ trained on the data of each medical institution using homomorphic encryption and differential privacy, and sending the encrypted parameter variation matrices to a central aggregation server;
S16, detecting poisoning attacks through joint multi-index detection that combines spatial anomaly, behavioral anomaly and amplitude anomaly;
S17, calculating the contribution of each node by combining the quality gain of its local model, the amount of data it provides, its training intensity and its number of anomaly marks;
S18, the central server performs weighted secure aggregation according to the contribution of each institution, wherein the parameter variation matrix of an institution with a larger contribution receives a larger weight; the weighted sum of the parameter variation matrices is added to the parameter matrix of the pre-trained large model and the change in accuracy is recorded; when the change in accuracy is smaller than an expected value, the model is considered to have converged and training ends; otherwise, the global model parameter variation is encrypted and transmitted to each institution node, the local models are updated, and the process returns to step S13 until model training converges.
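As a rough illustration of steps S14, S15 and S18, the following minimal sketch shows the cross-entropy loss, a clip-and-noise masking of a parameter variation matrix, and a contribution-weighted aggregation with a convergence check. All function names, the clipping bound, the noise scale and the tolerance are assumptions introduced here for illustration. Homomorphic encryption is not shown, and the Gaussian noise is only a stand-in for a differential privacy mechanism without any privacy accounting.

```python
import math
import random

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i y_i * log(y_hat_i) over the n categories."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))

def mask_update(delta, clip_norm=1.0, sigma=0.1, rng=None):
    """Clip delta_W to a Frobenius norm of clip_norm, then add Gaussian noise.

    Hypothetical stand-in for the masking in S15; sigma is an assumed value.
    """
    rng = rng or random.Random()
    norm = math.sqrt(sum(v * v for row in delta for v in row))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [[v * scale + rng.gauss(0.0, sigma * clip_norm) for v in row]
            for row in delta]

def aggregate(base, deltas, contributions):
    """W_new = W_base + sum_k (c_k / sum(c)) * delta_W_k, as in S18."""
    total = sum(contributions)
    if total <= 0:
        raise ValueError("contributions must sum to a positive value")
    weights = [c / total for c in contributions]
    return [[base[r][c] + sum(w * d[r][c] for w, d in zip(weights, deltas))
             for c in range(len(base[0]))]
            for r in range(len(base))]

def converged(prev_accuracy, new_accuracy, tolerance=1e-3):
    """Training stops once the accuracy change falls below the tolerance."""
    return abs(new_accuracy - prev_accuracy) < tolerance

# Two institutions; the second has three times the contribution of the first.
base = [[0.0, 0.0]]
deltas = [[[1.0, 1.0]], [[3.0, 3.0]]]
new_weights = aggregate(base, deltas, contributions=[1.0, 3.0])  # [[2.5, 2.5]]
```

In a real deployment the matrices would be the low-rank LoRA factors of a neural network rather than nested Python lists; the lists are used here only to keep the sketch dependency-free.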
Further, the step S16 specifically includes the steps of:
S161, first mapping the parameter variation matrices uploaded by all institutions into a unified vector space, and then calculating the pairwise cosine similarity according to the following formula:
$$\cos(\Delta W_i, \Delta W_j) = \frac{\langle \Delta W_i, \Delta W_j \rangle}{\lVert \Delta W_i \rVert \, \lVert \Delta W_j \rVert}$$
The parameter variation of a malicious node has markedly low similarity to that of most normal nodes and forms an isolated cluster; a threshold θ is set, and when the average cosine similarity between node i and the other nodes is smaller than θ, node i is considered an abnormal user;
S162, combining the parameter variation matrix of each medical institution with the base large model in turn and recording the resulting change in performance; when the change in model performance falls outside the normal interval, the institution is marked as an abnormal user;
S163, counting the norms of the parameter variation matrices uploaded by all nodes; a node carrying out a malicious attack often produces a ΔW with a larger variation amplitude in an attempt to control the model, so a node whose norm is far larger than the average is marked as an abnormal user. Warning feedback is sent to abnormal users, requiring the node to recheck its data labeling or redo fine-tuning; anomaly marks are accumulated for the node and its ΔW is down-weighted during aggregation; when the accumulated count exceeds a preset threshold, the institution node is permanently removed.
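The spatial check (S161) and the amplitude check (S163) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the default threshold `theta` and the multiplier `norm_factor` (how far above the mean norm counts as "far larger") are hypothetical, and the behavioral check of S162 is omitted because it needs an evaluation of the merged model.

```python
import math

def flatten(matrix):
    """Map a parameter variation matrix into a flat vector."""
    return [v for row in matrix for v in row]

def norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def cosine(u, v):
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def flag_anomalies(deltas, theta=0.0, norm_factor=3.0):
    """Return the indices of nodes flagged by the spatial or amplitude check."""
    vecs = [flatten(d) for d in deltas]
    norms = [norm(v) for v in vecs]
    mean_norm = sum(norms) / len(norms)
    flagged = set()
    for i, vi in enumerate(vecs):
        sims = [cosine(vi, vj) for j, vj in enumerate(vecs) if j != i]
        # Spatial anomaly: average similarity to the other nodes below theta.
        if sims and sum(sims) / len(sims) < theta:
            flagged.add(i)
        # Amplitude anomaly: norm far larger than the average norm.
        if norms[i] > norm_factor * mean_norm:
            flagged.add(i)
    return flagged

# Node 3 points in the opposite direction from the other three nodes.
updates = [[[1.0, 0.0]], [[1.0, 0.1]], [[0.9, 0.0]], [[-1.0, 0.0]]]
suspects = flag_anomalies(updates)  # {3}
```

Because the mean norm includes the suspect node itself, a robust statistic such as the median may be a better baseline when few nodes participate; the mean is used here only because the text refers to the average value.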
Further, the step S17 specifically includes:
S171, using indexes such as accuracy, F1 score and ROC to record the local gain that each institution's ΔW brings to the global model, denoted G;
S172, counting information such as the number of labeled samples and training rounds each medical institution used for fine-tuning, denoted Q, so as to prevent nodes with very small data scale and very few training rounds from free-riding;
S173, combining the number of times F that the node has been judged abnormal during training, calculating the contribution of each medical institution according to the following formula:
wherein the weight parameters α, β and γ can be dynamically adjusted as required, and a minimum protection threshold is set so that small medical institutions do not lose their say entirely.
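The contribution formula itself is not reproduced in the text above, so the following sketch assumes one plausible form: a linear combination that rewards the gain G and the data and training volume Q, penalizes the anomaly count F, and is floored at a minimum protection threshold. The linear form, the default weights and the floor value are all assumptions made for illustration, not the formula of the application.

```python
def contribution(gain, quantity, anomaly_count,
                 alpha=0.5, beta=0.3, gamma=0.2, floor=0.05):
    """Assumed form: max(alpha*G + beta*Q - gamma*F, floor).

    gain and quantity are expected to be normalized to a comparable range;
    that normalization is also an assumption of this sketch.
    """
    score = alpha * gain + beta * quantity - gamma * anomaly_count
    return max(score, floor)

def aggregation_weights(scores):
    """Normalize contribution scores so that they sum to one."""
    total = sum(scores)
    return [s / total for s in scores]

# A large, clean institution versus a small one flagged twice.
scores = [contribution(0.9, 1.0, 0), contribution(0.2, 0.1, 2)]
weights = aggregation_weights(scores)
```

With these assumed defaults the flagged institution falls back to the floor, so it keeps a small but non-zero weight in the aggregation.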
Further, the step S2 specifically includes the steps of:
S21, collecting structured and unstructured medical data, including electronic health records, medical images, laboratory test results, doctors' diagnostic reports, patient medical records and the like, from the multi-source systems of medical institutions such as the hospital information system, the laboratory information management system and the picture archiving and communication system, and cleaning the data to remove noise and redundant information;
S22, constructing a knowledge base from the latest trustworthy information in the medical field, such as the latest medical data classification and grading management specification documents, frontier papers published in major medical journals, and authoritative research reports;
S23, retrieving from the knowledge base, through retrieval augmentation, the knowledge and rules relevant to the medical data to be classified and graded, and combining prompt engineering to help the large model classify and grade the data more accurately;
S24, classifying and grading each piece of medical data using the trained large model, mapping the medical data one by one to the data classification and grading rules, and adding the corresponding classification and grading labels.
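The retrieval and prompting of step S23 can be sketched as below. This is a simplified illustration: word overlap stands in for whatever retriever a real system would use (typically embedding similarity), and the function names, the prompt wording and the sample rules are hypothetical.

```python
def retrieve(query, knowledge_base, top_k=2):
    """Rank knowledge base entries by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc)
              for doc in knowledge_base]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_prompt(record, rules):
    """Assemble the retrieved rules and the record into one model prompt."""
    context = "\n".join(f"- {rule}" for rule in rules) or "- (no matching rule found)"
    return (
        "Classify and grade the following medical record.\n"
        f"Relevant rules:\n{context}\n"
        f"Record: {record}\n"
        "Answer with one category and one grade."
    )

# Hypothetical knowledge base entries, not taken from any real specification.
kb = [
    "patient identity information is graded as core data",
    "desensitized statistics are graded as general data",
]
prompt = build_prompt("patient identity number 001", retrieve("patient identity number", kb))
```

The prompt would then be passed to the fine-tuned model, whose answer becomes the classification and grading label of step S24.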
Further, the step S3 specifically includes the steps of:
S31, calculating the hash value of the classified and graded medical data to ensure the uniqueness and integrity of the data;
S32, encrypting the medical data with an asymmetric encryption algorithm to ensure the security of the data during transmission and storage;
S33, uploading the encrypted medical data and its classification and grading labels to the blockchain network, recording them in the distributed ledger, and verifying the correctness and integrity of the data storage through the blockchain consensus mechanism to ensure that the data cannot be tampered with.
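The hashing of S31 and the tamper evidence of S33 can be illustrated with a single-node hash chain. This sketch uses SHA-256 from the Python standard library; the `Ledger` class and its field names are hypothetical, and both the asymmetric encryption of S32 and the multi-node consensus mechanism are deliberately left out.

```python
import hashlib
import json

def record_hash(payload):
    """SHA-256 over a canonical JSON encoding of the payload."""
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

class Ledger:
    """Append-only chain in which each block commits to the previous one."""
    FIELDS = ("index", "prev_hash", "data_hash", "label")

    def __init__(self):
        self.blocks = []

    def append(self, data_hash, label):
        prev = self.blocks[-1]["block_hash"] if self.blocks else "0" * 64
        body = {"index": len(self.blocks), "prev_hash": prev,
                "data_hash": data_hash, "label": label}
        block = dict(body, block_hash=record_hash(body))
        self.blocks.append(block)
        return block

    def verify(self):
        """Recompute every block hash and check the links between blocks."""
        prev = "0" * 64
        for block in self.blocks:
            body = {key: block[key] for key in self.FIELDS}
            if block["prev_hash"] != prev or block["block_hash"] != record_hash(body):
                return False
            prev = block["block_hash"]
        return True

ledger = Ledger()
ledger.append(record_hash({"id": "r1", "text": "..."}), "diagnosis/core")
ledger.append(record_hash({"id": "r2", "text": "..."}), "billing/general")
ok = ledger.verify()
```

Changing the label or data hash of any stored block makes `verify()` fail, which is the property the distributed ledger relies on for auditing.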
Further, the step S4 specifically includes the steps of:
S41, a user packs a request comprising the requesting subject, the target data object and the operation to be executed, generating a request R:
$R \leftarrow F(S, O, A)$,
wherein R represents the user request, S represents the subject attributes (including the user's unique ID and permission level), O represents the object attributes (including the data category and grade), and A represents the operation attributes (including data operations such as reading and writing);
after signing the request with the private key, the user sends its public key, the signed request and a timestamp to the blockchain medical data management system:
$B \leftarrow X\{PK_X, \mathrm{Sign}(R, SK_X), T_1\}$,
wherein B represents the blockchain, X represents the user, $PK_X$ represents the public key of user X, Sign() represents a digital signature, $SK_X$ represents the private key of user X, and $T_1, \ldots, T_m$ represent timestamps;
S42, after receiving the user request, the blockchain system uses the public key to parse the request and calls the policy management contract to automatically match the data object corresponding to the request:
$B\{\mathrm{Sign}(R, SK_X), P(R), T_2\} \rightarrow X$,
wherein P represents the policy management contract, which is responsible for automatically matching the corresponding data object according to the user request and the policy rules;
S43, invoking the permission verification contract to verify the access rights of the user:
$B\{V(X), T_3\} \rightarrow B$,
wherein V represents the permission verification contract, which is responsible for verifying whether the user has the right to access the specific data object; if the verification passes, the data retrieval contract is called and the data corresponding to the request is returned:
$B\{D(R), T_4\} \rightarrow X$,
wherein D represents the data retrieval contract, which is responsible for retrieving the data object requested by the user from the blockchain and returning it; if the verification fails, the user request does not satisfy the policy information in the policy management contract, and rejection information is returned:
$B\{\mathrm{Refused}, T_4\} \rightarrow X$,
wherein Refused denotes the plaintext of the rejection information.
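The contract chain P, V and D described above can be sketched as three plain functions. This is an off-chain illustration only: signature verification and timestamps are omitted, and the numeric encoding of permission levels and grades (a higher number means more sensitive data or more access) is an assumption, as are all names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    user_id: str    # S: unique user id
    clearance: int  # S: permission level (assumed: higher means more access)
    category: str   # O: data category
    grade: int      # O: data grade (assumed: 1 general, 2 important, 3 core)
    action: str     # A: operation such as "read" or "write"

def match_policy(req, policy):
    """P(R): look up the minimum clearance for this category and action."""
    return policy.get((req.category, req.action))

def verify(req, required):
    """V(X): the user needs the policy minimum and at least the data grade."""
    return required is not None and req.clearance >= max(required, req.grade)

def handle(req, policy, store):
    """Return D(R) when verification passes, otherwise the plaintext rejection."""
    if not verify(req, match_policy(req, policy)):
        return "Refused"
    return store.get((req.category, req.grade), "Refused")

# Hypothetical policy and data store.
policy = {("diagnosis", "read"): 1}
store = {("diagnosis", 3): "record-A"}
result = handle(Request("u1", 3, "diagnosis", 3, "read"), policy, store)
```

A request for an action with no policy entry, such as a write here, is refused by default, which mirrors the rule that only requests satisfying the policy management contract are served.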
Further, the step S5 specifically includes the steps of:
S51, when the medical data classification and grading specification changes, reselecting and relabeling the fine-tuning dataset according to the new specification and repeating step S1;
S52, updating the knowledge base to keep the retrieved information timely and accurate, providing the latest and most reliable knowledge support for medical data classification and grading management and keeping retrieval augmentation efficient and practical;
S53, updating the classification and grading labels of the medical data through the updated large model classification and grading framework, and uploading the updated data to the blockchain system again;
S54, adjusting the access rights of users according to the updated medical data classification and grading specification, guaranteeing the privacy and security of the data.
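The bookkeeping behind S53 and S54 can be sketched as follows. In this illustration a lookup table from category to grade stands in for the retrained large model, and the record layout, function names and numeric grades are hypothetical.

```python
def plan_update(records, new_spec):
    """Return {record_id: (old_grade, new_grade)} for records whose grade changes.

    new_spec maps a category to its grade under the updated specification;
    categories missing from new_spec keep their current grade.
    """
    changes = {}
    for record_id, record in records.items():
        new_grade = new_spec.get(record["category"], record["grade"])
        if new_grade != record["grade"]:
            changes[record_id] = (record["grade"], new_grade)
    return changes

def users_losing_access(clearances, new_grade):
    """Users whose permission level is below the new grade of a record."""
    return sorted(user for user, level in clearances.items() if level < new_grade)

# Hypothetical records: the new specification raises genomic data to grade 3.
records = {
    "r1": {"category": "genomic", "grade": 2},
    "r2": {"category": "billing", "grade": 1},
}
changes = plan_update(records, {"genomic": 3})
revoked = users_losing_access({"alice": 3, "bob": 2}, new_grade=3)
```

Each changed record would then be relabeled and written to the chain again, and the users in `revoked` would have their access to it withdrawn.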
The application also provides a medical data classification and grading security management device combining a large model and blockchain technology, which comprises:
a medical data classification and grading large model construction module, used for constructing the medical data classification and grading large model, which is obtained after the large model is pre-trained on public medical data, fine-tuned, and trained with parameter optimization;
a medical data classification and grading processing module, used for classifying and grading each piece of medical data with the trained medical data classification and grading large model, mapping the medical data one by one to the data classification and grading rules, and adding the corresponding classification and grading labels;
a data on-chaining and secure storage module, used for processing the classified and graded medical data and putting it on chain, recording it in the distributed ledger, and verifying the correctness and integrity of the data storage through the blockchain consensus mechanism, ensuring that the data cannot be tampered with;
a smart contract dynamic authorization and access control module, used for invoking, after the blockchain system receives a user request, the relevant smart contracts to verify the access rights of the user and return the verification result;
a classification and grading specification dynamic updating module, used for triggering a dynamic update flow when the medical data classification and grading specification changes, updating the knowledge base, uploading the updated data to the blockchain system again, and adjusting the access rights of users, guaranteeing the privacy and security of the data.
In another aspect, the application further provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the medical data classification and grading security management method combining a large model and blockchain technology when executing the computer program.
The application also provides a storage medium comprising a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the steps of the medical data classification and grading security management method combining a large model and blockchain technology.
Compared with the prior art, the application has the following beneficial effects:
The application constructs an intelligent and secure medical data classification and grading management system through the deep integration of a large model and blockchain technology. Firstly, through federated learning, the application trains the medical data classification and grading large model collaboratively while the data of each institution never leaves its own domain, achieves accurate classification and grading of multi-modal medical data such as text and images, significantly improves the intelligence and accuracy of data management, and, by combining fine-tuning with retrieval augmentation, improves the model's response speed and adaptability to updates of the specifications. Secondly, through the distributed ledger, asymmetric encryption and hash algorithms of the blockchain, the application ensures that medical data is stored securely and cannot be tampered with, solves the problem of responsibility tracing in data management, and provides a highly trusted data management environment for medical institutions, patients and regulators. In addition, the application realizes automatic identity verification and access control through smart contracts, ensuring through predefined rules and logic that only legitimate users can access data of a specific grade, and reducing the risk of data leakage and abuse. Through the synergy of the large model and the blockchain, the application significantly improves the efficiency and security of medical data management and provides comprehensive and reliable technical support for releasing the value of data in the medical industry.
In addition to the objects, features and advantages described above, the application has other objects, features and advantages. The application is described in further detail below with reference to the drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application.
FIG. 1 is a flow chart of a medical data classification and grading security management method combining a large model and blockchain technology in accordance with a preferred embodiment of the present application.
FIG. 2 is a schematic diagram of the construction flow of the medical data classification and grading large model in accordance with a preferred embodiment of the present application.
FIG. 3 is a schematic diagram of the medical data classification and grading flow based on large model technology in accordance with a preferred embodiment of the present application.
FIG. 4 is a schematic diagram of user data access control management based on blockchain smart contracts in accordance with the present application.
FIG. 5 is a schematic diagram of a medical data classification and grading security management device combining a large model and blockchain technology in accordance with another preferred embodiment of the present application.
FIG. 6 is a schematic block diagram of the electronic device of the preferred embodiment of the present application.
FIG. 7 is an internal structural view of the computer device of the preferred embodiment of the present application.
Detailed Description
Embodiments of the application are described in detail below with reference to the accompanying drawings, but the application can be practiced in many different ways as defined and covered below.
Interpretation of related terms:
Data classification refers to distinguishing and grouping data, by a defined principle or method, according to attributes or characteristics that the data have in common. It proceeds along two dimensions, the basic nature of the data and its business application attributes, and forms classification layers such as major category, middle category, minor category and sub-category that facilitate fine-grained management and use of the data.
Data grading refers to dividing data into different levels according to the degree of impact on national security, economic operation, social stability, the public interest, or the legitimate rights and interests of individuals and organizations if the data are tampered with, destroyed, illegally obtained or illegally used, so as to differentiate how far the access, use or disclosure of the data is restricted.
Large model technology is an artificial intelligence method based on deep learning. Its core is a deep neural network architecture with billions of parameters or more that simulates human cognitive ability in natural language learning, so that complex patterns and relationships contained in data can be processed and analyzed efficiently. By using algorithms such as the attention mechanism and pre-training on massive multi-source heterogeneous data, a large model can mine fine-grained features and latent regularities in the data and thereby achieve high-precision prediction, classification, generation and decision making. A large model also has strong transfer learning capability and can adapt quickly to new tasks and fields. Owing to its data representation capability and generalization performance, the large model shows clear advantages in multi-modal data processing fields such as natural language processing, computer vision and speech recognition.
Model fine-tuning is an efficient artificial intelligence technique based on transfer learning. It takes the general features that a pre-trained model has learned on a large-scale dataset and trains the model further on a small-scale dataset for a specific task, so that the model adapts quickly to the new task and its performance improves. Specifically, fine-tuning adjusts some or all of the parameters of the pre-trained model so that it captures the specific patterns and regularities of the target task while retaining the broad knowledge learned during pre-training. Fine-tuning reduces the amount of target-task data required and the training cost, and at the same time improves the accuracy and robustness of the model on the new task. Fine-tuning also supports layer-wise strategies, such as tuning only the last few layers or specific modules of the model, to adapt quickly to task requirements while preserving general features. Because of its efficiency, flexibility and wide applicability, fine-tuning is widely used in natural language processing (such as text classification and question-answering systems), computer vision (such as image segmentation and object detection), speech recognition (such as dialect recognition and speech emotion analysis) and other fields, and has become an important means of rapid model deployment and performance optimization.
Retrieval augmentation is an artificial intelligence method that combines information retrieval with model generation and aims to improve the generation capability and accuracy of the model by introducing an external knowledge source. The core idea is to retrieve task-related information dynamically during generation and to supply the retrieval results as context, helping the model generate more accurate and more relevant output. Retrieval augmentation typically includes two key components, a retrieval module and a generation module. The retrieval module looks up information relevant to the input in the external knowledge source. The generation module uses a pre-trained language model to combine the retrieved information with the original input and generate high-quality text output. The technique improves model performance on knowledge-intensive tasks and reduces the risk of generating false or irrelevant content. It is also flexible and extensible: the model can be adapted to different fields and tasks simply by updating the data behind the retrieval module, which lowers the cost of updating the model's knowledge.
Blockchain technology is a data management method based on a distributed ledger and cryptographic principles. Through decentralization, tamper resistance and traceability, it builds a highly secure and transparent system for data storage and transmission. A blockchain consists of a series of blocks linked in chronological order; each block contains a group of cryptographically verified transaction data, and a consensus mechanism ensures that all nodes in the network hold a consistent data state. The distributed architecture of the blockchain removes the dependence on a central authority and strengthens the system's resistance to attack and its fault tolerance, while its tamper resistance provides a reliable basis for data auditing and tracing.
A smart contract is an automated, programmable protocol based on blockchain technology. Through predefined rules and logic, it executes contract terms automatically when specific conditions are met, without the intervention of a third party. Smart contracts rely on the decentralized and tamper-resistant properties of the blockchain to ensure that contract execution is transparent, secure and reliable. Their key technologies include Turing-complete programming languages, state machine models and event-driven mechanisms, with which a smart contract can process complex business logic and automatically update and verify data on the blockchain. The execution of a smart contract is fully transparent, and all operation records are stored permanently on the blockchain, so that every party can audit and trace them. Smart contracts also support multi-party collaboration: they can automatically coordinate and execute agreements among several participants, reducing human intervention and the cost of trust.
Federated learning is a distributed machine learning framework based on privacy protection and secure encryption. Its core is a collaborative learning mechanism in which data never leaves its local environment: each participant trains the model on its local equipment and shares only model parameters or gradients. During federated training, the data of each participant does not leave its domain, which avoids the risk of data leakage; model parameter updates are transmitted and aggregated over encrypted channels, which ensures both effective model training and data privacy compliance. With federated learning, a global model can be updated collaboratively across multiple data sources without directly exposing the original data, so that cross-institution and cross-region collaborative modeling is achieved while data sovereignty is protected.
As shown in FIG. 1, the preferred embodiment of the present application provides a medical data classification and grading security management method combining large model and blockchain technologies, comprising the steps of:
S1, constructing the medical data classification and grading large model: the large model is pre-trained on public medical data, then fine-tuned and trained with parameter optimization to obtain the medical data classification and grading large model;
S2, medical data classification and grading: each piece of medical data is classified and graded with the trained medical data classification and grading large model, mapped one by one to the data classification and grading rules, and given the corresponding classification and grading labels;
S3, data on-chaining and secure storage: the classified and graded medical data are processed and put on the chain, recorded in the distributed ledger, and the correctness and integrity of the stored data are verified through the blockchain consensus mechanism to ensure that the data cannot be tampered with;
S4, smart contract dynamic authorization and access control: after receiving a user request, the blockchain system invokes the relevant smart contracts to verify the user's access permission and returns the verification result;
S5, dynamic update of the classification and grading specification: when the medical data classification and grading specification changes, a dynamic update flow is triggered, the knowledge base is updated, the updated data are uploaded to the blockchain system again, and user access permissions are adjusted to protect data privacy and security.
To address the shortcomings of the prior art, the technical solution adopted by this embodiment is divided into three parts, namely a large model data classification and grading part, a blockchain part and a smart contract part, wherein:
The large model data classification and grading part automates and makes intelligent the medical data classification and grading flow, removes a large amount of manual processing, and safeguards the accuracy of the classification and grading results.
The blockchain part stores the medical data and their classification and grading labels, guarantees that the data are tamper-proof, secure and reliable, and ensures that responsibility for data operations is traceable and transparently managed.
The smart contract part comprises a policy management contract, a permission verification contract and a data retrieval contract. The policy management contract formulates and executes management policies and automatically assigns operation permissions according to the attributes of users or roles. The permission verification contract compares the permissions of the requesting subject with the permissions required by the data being accessed and enforces access control using the policies in the policy management contract. The data retrieval contract retrieves and returns the corresponding data according to the user request.
This embodiment constructs an intelligent and secure medical data classification and grading management system through the deep integration of large model and blockchain technologies. First, through federated learning, the embodiment collaboratively trains the medical data classification and grading large model while the data of each institution never leaves its own domain, achieves accurate classification and grading of multi-modal medical data such as text and images, and improves the intelligence and accuracy of data management; by combining fine-tuning with retrieval augmentation, it also improves the model's response speed and adaptability to specification updates. Second, the embodiment ensures secure, tamper-proof storage of medical data through the blockchain's distributed ledger, asymmetric encryption and hash algorithms, solves the accountability tracing problem in data management, and provides a highly trusted data management environment for medical institutions, patients and regulators. In addition, the embodiment implements automatic identity verification and access control through smart contracts: predefined rules and logic ensure that only legitimate users can access data of a given level, which reduces the risk of data leakage and abuse. Through the synergy of the large model and the blockchain, the embodiment improves the efficiency and security of medical data management and provides comprehensive, reliable technical support for releasing the value of data in the medical industry.
Preferably, as shown in fig. 2, the step S1 specifically includes the steps of:
S11, pre-training a large model through a large amount of public medical data to enable the large model to have professional medical general knowledge;
S12, formulating the classification and grading framework rules on the basis of existing policy specifications and according to the basic nature, business attributes and potential risks of medical data;
S13, each medical institution using data sampling to collect, from its own private database, a representative small-sample dataset covering data of different categories and different levels, including representative diagnosis and treatment data and rare case data; the collected data are manually labeled with categories and grades through multi-expert evaluation to construct a high-quality fine-tuning dataset;
S14, deploying the pre-trained large model locally at each medical institution and fine-tuning it on that institution's own data through low-rank adaptation (LoRA): with the original parameters of the pre-trained model frozen, the model is trained on the labeled dataset and optimized by gradient descent on the cross-entropy loss function to obtain a parameter variation matrix:
$$L = -\sum_{i=1}^{n} y_i \log \hat{y}_i$$
where $y_i$ is the actual label, $\hat{y}_i$ is the probability predicted by the model, and $n$ is the number of categories; applying this fine-tuning method yields the parameter variation matrices $(\Delta W_1, \dots, \Delta W_N)$ trained on the data of each medical institution;
S15, masking the gradient information in the parameter variation matrices $(\Delta W_1, \dots, \Delta W_N)$ trained on the data of each medical institution by using homomorphic encryption and differential privacy, and sending the encrypted parameter variation matrices to a central aggregation server;
S16, detecting poisoning attacks through joint multi-indicator detection that combines spatial anomalies, behavioral anomalies and amplitude anomalies;
S17, calculating the contribution of each node by combining the quality gain of its local model, the amount of data it provides, its training intensity and its number of anomaly marks;
S18, the central server performing weighted secure aggregation according to the contribution of each institution, wherein the parameter variation matrix of an institution with a larger contribution carries a larger weight; the weighted sum of the parameter variation matrices is added to the parameter matrix of the pre-trained large model and the change in accuracy is recorded; if the change in accuracy is smaller than an expected value, the model is considered to have converged and training ends; otherwise, the global model parameter variation is encrypted and transmitted to each institution node, the local models are updated, and the flow returns to step S13 until model training converges.
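Steps S14, S15 and S18 can be illustrated with a minimal, pure-Python sketch. It is an illustration under stated assumptions, not the implementation of this embodiment: the function names, the clipping bound `clip_norm`, the noise scale `sigma` and the toy 2x2 matrices are all hypothetical, homomorphic encryption is omitted entirely, and the masking step is approximated by norm clipping plus Gaussian noise in the style of differential privacy.

```python
import math
import random

def cross_entropy(y_true, y_pred):
    """Cross-entropy loss for one sample: -sum(y_i * log(p_i))."""
    return -sum(y * math.log(p) for y, p in zip(y_true, y_pred) if y > 0)

def lora_delta(B, A):
    """Low-rank update dW = B @ A, with B of shape (d, r) and A of shape (r, k)."""
    r, k = len(A), len(A[0])
    return [[sum(row[t] * A[t][j] for t in range(r)) for j in range(k)] for row in B]

def clip_and_noise(delta, clip_norm, sigma, rng):
    """Clip dW to an L2 norm of clip_norm, then add Gaussian noise (assumed DP-style masking)."""
    norm = math.sqrt(sum(v * v for row in delta for v in row))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [[v * scale + rng.gauss(0.0, sigma) for v in row] for row in delta]

def weighted_aggregate(deltas, weights):
    """Contribution-weighted average of the institutions' dW matrices."""
    total = sum(weights)
    rows, cols = len(deltas[0]), len(deltas[0][0])
    return [[sum(w * d[i][j] for w, d in zip(weights, deltas)) / total
             for j in range(cols)] for i in range(rows)]

def apply_update(W, delta):
    """Add the aggregated dW to the frozen base weights W."""
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

# Two hypothetical institutions, each with a rank-1 update for a 2x2 weight matrix.
rng = random.Random(0)
d1 = lora_delta([[1.0], [0.0]], [[1.0, 2.0]])
d2 = lora_delta([[0.0], [1.0]], [[3.0, 4.0]])
masked = [clip_and_noise(d, clip_norm=10.0, sigma=0.01, rng=rng) for d in (d1, d2)]
W_new = apply_update([[0.0, 0.0], [0.0, 0.0]], weighted_aggregate(masked, [3.0, 1.0]))
```

In this sketch the institution with weight 3.0 influences the aggregate three times as strongly as the one with weight 1.0, which mirrors the contribution-based weighting described in S18; the convergence test on the change in accuracy would wrap these calls in a training loop.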
Specifically, the step S16 specifically includes the steps of:
S161, first mapping the parameter variation matrices uploaded by all institutions into a unified vector space, and then calculating the pairwise cosine similarity according to the following formula:
$$\cos(\Delta W_i, \Delta W_j) = \frac{\Delta W_i \cdot \Delta W_j}{\lVert \Delta W_i \rVert \, \lVert \Delta W_j \rVert}$$
The parameter variation of a malicious node has markedly low similarity to that of most normal nodes and forms an isolated cluster; a threshold $\theta$ is set, and node $i$ is considered an abnormal user when its average cosine similarity with the other nodes is smaller than $\theta$;
S162, combining the parameter variation matrix of each medical institution separately with the base large model and recording the resulting performance change; an institution is marked as an abnormal user when the model's performance change falls outside the normal interval;
S163, computing the norms of the parameter variation matrices uploaded by all nodes; a node carrying out a malicious attack tends to produce a $\Delta W$ of large magnitude in an attempt to control the model, so a node whose norm is far greater than the average is marked as an abnormal user; an abnormal user receives warning feedback and is required to recheck its data labeling or fine-tune again, its anomaly marks are accumulated, and its $\Delta W$ is down-weighted during aggregation; when the accumulated count exceeds a preset threshold, the institution node is permanently removed.
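The spatial check of S161 and the amplitude check of S163 can be sketched as follows. This is a minimal illustration, assuming that each parameter variation matrix has already been flattened into a vector; the function names are hypothetical, and the rule "norm greater than `factor` times the mean norm" is only one assumed reading of "far greater than the average". The behavioral check of S162 needs a model evaluation and is not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two flattened parameter-variation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def spatial_outliers(updates, theta):
    """Indices of nodes whose mean cosine similarity to the other nodes is below theta."""
    if len(updates) < 2:
        return []
    flagged = []
    for i, u in enumerate(updates):
        sims = [cosine(u, v) for j, v in enumerate(updates) if j != i]
        if sum(sims) / len(sims) < theta:
            flagged.append(i)
    return flagged

def amplitude_outliers(updates, factor):
    """Indices of nodes whose L2 norm exceeds factor times the mean norm (assumed rule)."""
    norms = [math.sqrt(sum(a * a for a in u)) for u in updates]
    mean = sum(norms) / len(norms)
    return [i for i, n in enumerate(norms) if n > factor * mean]

# Three similar updates and one large update pointing the opposite way.
updates = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [-5.0, -5.0]]
suspects = sorted(set(spatial_outliers(updates, theta=0.0)) |
                  set(amplitude_outliers(updates, factor=2.0)))
```

A node flagged by either check would then receive an anomaly mark and a reduced aggregation weight, as S163 describes.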
Specifically, the step S17 specifically includes:
S171, using metrics such as accuracy, F1 score and ROC to record the local gain that each institution's $\Delta W$ brings to the global model, denoted G;
S172, counting information such as the number of labeled samples and the number of training rounds that each medical institution used for fine-tuning, denoted Q, so as to prevent "free riding" by nodes with very little data and very few training rounds;
S173, calculating the contribution of each medical institution according to the following formula, taking into account the number of times F that the node was judged to be an abnormal node during training:
where the weight parameters α, β and γ can be dynamically adjusted as required, and a minimum protection threshold is set so that small medical institutions do not lose their weight entirely.
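The contribution formula itself is not recoverable from this text, so the sketch below uses an assumed form that is merely consistent with the description: a linear score αG + βQ − γF, bounded below by a minimum protection threshold. The exact formula, the default weights, the `floor` value and the function names are all hypothetical, and G and Q are assumed to be normalized to [0, 1].

```python
def contribution(G, Q, F, alpha=0.5, beta=0.3, gamma=0.2, floor=0.05):
    """Assumed contribution score: alpha*G + beta*Q - gamma*F, never below floor.

    G: quality gain of the local update, Q: data volume / training intensity,
    F: number of times the node was flagged as abnormal.
    """
    return max(alpha * G + beta * Q - gamma * F, floor)

def aggregation_weights(scores):
    """Normalize contribution scores so that the aggregation weights sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

# A strong node, a small but honest node, and a node flagged three times.
scores = [contribution(0.9, 0.8, 0), contribution(0.2, 0.1, 0), contribution(0.5, 0.5, 3)]
weights = aggregation_weights(scores)
```

With these assumed values the repeatedly flagged node falls back to the floor rather than to zero or a negative weight, which is the role the minimum protection threshold plays in the text.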
Preferably, as shown in fig. 3, the step S2 specifically includes the steps of:
S21, collecting structured and unstructured medical data, including electronic health records, medical images, laboratory test results, physician diagnosis reports and patient medical records, from the multi-source systems of medical institutions such as the hospital information system, the laboratory information management system and the picture archiving and communication system, and cleaning the data to remove noise and redundant information;
S22, constructing a knowledge base from up-to-date, trustworthy medical domain information such as the latest medical data classification and grading management specification documents, frontier papers published in major medical journals, and authoritative research reports;
S23, retrieving from the knowledge base, through retrieval augmentation, the knowledge and rules relevant to the medical data to be classified and graded, and combining them with prompt engineering to help the large model classify and grade the data more accurately;
S24, classifying and grading each piece of medical data with the trained large model, mapping the medical data one by one to the data classification and grading rules, and adding the corresponding classification and grading labels.
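Steps S23 and S24 can be sketched as a retrieve-then-prompt routine. The example is deliberately simplified and rests on assumptions: retrieval is done by word overlap rather than by a vector index, the knowledge-base entries and grade numbers are invented for illustration, and the resulting prompt would be passed to the fine-tuned large model, which is not shown.

```python
def retrieve(query, knowledge_base, top_k=2):
    """Return the top_k knowledge-base entries sharing the most words with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(knowledge_base,
                    key=lambda doc: len(query_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(record, rules):
    """Assemble a prompt that puts the retrieved rules in front of the record."""
    context = "\n".join(f"- {rule}" for rule in rules)
    return ("Classification and grading rules:\n" + context + "\n\n"
            "Medical record:\n" + record + "\n\n"
            "Return the category and the grade of this record.")

# Hypothetical knowledge-base entries; the grade numbers are illustrative only.
kb = [
    "genetic test results are grade 4",
    "appointment schedules are grade 1",
    "billing data",
]
record = "patient genetic test results"
prompt = build_prompt(record, retrieve(record, kb, top_k=1))
```

Because only the knowledge base changes when the specification is revised, the retrieval step can pick up new rules without retraining, which is the adaptability benefit the description attributes to retrieval augmentation.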
Preferably, the step S3 specifically includes the steps of:
S31, calculating a hash value of the classified and graded medical data to ensure the uniqueness and integrity of the data;
S32, encrypting the medical data with an asymmetric encryption algorithm to ensure the security of the data during transmission and storage;
S33, uploading the encrypted medical data and their classification and grading labels to the blockchain network, recording them in the distributed ledger, and verifying the correctness and integrity of the stored data through the blockchain consensus mechanism to ensure that the data cannot be tampered with.
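The tamper-evidence property behind S31 and S33 can be sketched with a hash-linked list of blocks. This is a minimal single-process illustration, not a blockchain: there is no network, consensus or asymmetric encryption, the ciphertext is treated as opaque bytes assumed to have been produced in S32, SHA-256 is an assumed choice of hash, and the block layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first block

def sha256_hex(data: bytes) -> str:
    """Hex digest of SHA-256 over the given bytes."""
    return hashlib.sha256(data).hexdigest()

def make_block(prev_hash, ciphertext: bytes, label: dict) -> dict:
    """Build a block holding the hash of the encrypted record and its label."""
    body = {"prev": prev_hash, "data_hash": sha256_hex(ciphertext), "label": label}
    block_hash = sha256_hex(json.dumps(body, sort_keys=True).encode())
    return {**body, "hash": block_hash}

def verify_chain(chain) -> bool:
    """Check every block's own hash and its link to the previous block."""
    prev = GENESIS
    for block in chain:
        body = {key: block[key] for key in ("prev", "data_hash", "label")}
        expected = sha256_hex(json.dumps(body, sort_keys=True).encode())
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True

b1 = make_block(GENESIS, b"ciphertext-1", {"category": "lab", "grade": 3})
b2 = make_block(b1["hash"], b"ciphertext-2", {"category": "imaging", "grade": 2})
chain_ok = verify_chain([b1, b2])
```

Changing the grade label of an earlier block changes its recomputed hash, so `verify_chain` fails for the altered chain; this is the property that makes label changes detectable after the data are on the chain.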
Preferably, as shown in fig. 4, the step S4 specifically includes the steps of:
S41, the user packages the request subject, the target data object and the operation to be performed into a request R:
$$R \leftarrow F(S, O, A),$$
where R denotes the user request, S denotes the subject attributes (including the user's unique id and permission level), O denotes the object attributes (including the data category and level), and A denotes the operation attributes (data operations such as read and write);
after signing the request with the private key, the user sends the signature, together with the user's public key and a timestamp, to the blockchain medical data management system:
$$B \leftarrow X\{PK_X, \mathrm{Sign}(R, SK_X), T_1\},$$
where B denotes the blockchain, X denotes the user, $PK_X$ denotes the public key of user X, Sign() denotes a digital signature, $SK_X$ denotes the private key of user X, and $T_1, \dots, T_m$ denote timestamps;
S42, after receiving the user request, the blockchain system uses the public key to parse the request and calls the policy management contract to automatically match the data object corresponding to the request:
$$B\{\mathrm{Sign}(R, SK_X), P(R), T_2\} \rightarrow X,$$
where P denotes the policy management contract, which is responsible for automatically matching the corresponding data object according to the user request and the policy rules;
S43, calling the permission verification contract to verify the user's access permission:
$$B\{V(X), T_3\} \rightarrow B,$$
where V denotes the permission verification contract, which is responsible for verifying whether the user has permission to access the specific data object; if verification passes, the data retrieval contract is called and the data corresponding to the request are returned:
$$B\{D(R), T_4\} \rightarrow X,$$
where D denotes the data retrieval contract, which is responsible for retrieving the data object requested by the user from the blockchain and returning it; if verification fails, the user request does not satisfy the policy information in the policy management contract, and rejection information is returned:
$$B\{\mathit{Refused}, T_4\} \rightarrow X,$$
where Refused denotes the plaintext of the rejection information.
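The decision logic of S41 to S43 can be sketched off-chain as follows. It is a simplified stand-in, not contract code: signature creation and checking with $SK_X$ and $PK_X$ and the timestamps are omitted, the policy table, its levels and all names are hypothetical, and the sketch serves read requests only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Request:
    """R = F(S, O, A): subject, object and operation attributes."""
    user_id: str     # S: unique user id
    user_level: int  # S: permission level
    category: str    # O: data category
    data_level: int  # O: data grade
    action: str      # A: operation, e.g. "read" or "write"

# Hypothetical policy table (stand-in for the policy management contract P):
# data grade -> minimum permission level required for each action.
POLICY = {
    1: {"read": 1, "write": 2},
    3: {"read": 3, "write": 4},
}

def verify(req: Request) -> bool:
    """Stand-in for the permission verification contract V."""
    required = POLICY.get(req.data_level, {}).get(req.action)
    return required is not None and req.user_level >= required

def handle(req: Request, store: dict) -> str:
    """Serve a read request (contract D) or return the plaintext 'Refused'."""
    if req.action != "read" or not verify(req):
        return "Refused"
    return store.get((req.category, req.data_level), "Refused")

store = {("lab", 3): "encrypted-lab-record"}
granted = handle(Request("u1", 3, "lab", 3, "read"), store)
denied = handle(Request("u2", 2, "lab", 3, "read"), store)
```

Keeping the policy table separate from the verification function mirrors the split between the policy management contract and the permission verification contract: the policy can be revised without touching the checking logic.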
Preferably, the step S5 specifically includes the steps of:
S51, when the medical data classification and grading specification changes, reselecting and relabeling the fine-tuning dataset according to the new specification and repeating step S1;
S52, updating the knowledge base to keep the retrieved information timely and accurate, to provide the latest and most reliable knowledge support for medical data classification and grading management, and to keep retrieval augmentation efficient and practical;
S53, updating the medical data classification and grading labels with the updated large model classification framework, and uploading the updated data to the blockchain system again;
S54, adjusting user access permissions according to the updated medical data classification and grading specification to protect data privacy and security.
As shown in FIG. 5, another preferred embodiment of the present application further provides a medical data classification hierarchical security management apparatus combining large model and blockchain technologies, comprising:
a medical data classification and grading large model construction module, configured to construct the medical data classification and grading large model, which is obtained by pre-training the large model on public medical data and then fine-tuning it and training it with parameter optimization;
a medical data classification and grading processing module, configured to classify and grade each piece of medical data with the trained medical data classification and grading large model, map the medical data one by one to the data classification and grading rules, and add the corresponding classification and grading labels;
a data on-chaining and secure storage module, configured to process the classified and graded medical data and put them on the chain, record them in the distributed ledger, and verify the correctness and integrity of the stored data through the blockchain consensus mechanism to ensure that the data cannot be tampered with;
a smart contract dynamic authorization and access control module, configured so that, after a user request is received, the blockchain system invokes the relevant smart contracts to verify the user's access permission and returns the verification result;
a classification and grading specification dynamic update module, configured to trigger a dynamic update flow when the medical data classification and grading specification changes, update the knowledge base, upload the updated data to the blockchain system again, and adjust user access permissions to protect data privacy and security.
In summary, the above embodiment of the present application has the following features:
The application combines large model, model fine-tuning and retrieval augmentation technologies to optimize the medical data classification and grading flow and improve its efficiency and accuracy. The large model's strong multi-modal data processing capability reduces labor and time costs; model fine-tuning lets the system adapt quickly to the characteristics of different medical data and ensures high-precision classification and grading; retrieval augmentation dynamically retrieves the relevant knowledge and rules and improves the adaptability and extensibility of the system. Together these provide medical institutions with an efficient and reliable data management scheme and help the medical industry release the value of its data.
The application provides a multi-institution collaborative modeling method based on federated learning, which enables cross-institution joint training in which data are "usable but not visible" and breaks through the data silo problem of traditional medical data. A multi-dimensional poisoning attack detection mechanism that combines spatial, behavioral and amplitude anomaly indicators improves the security of the model training process. The contribution of each node is calculated from the quality gain of its local model, the amount of data provided, the training intensity and the number of anomaly marks, and the contribution weights are adjusted dynamically, which reduces the influence of bad data and improves the overall accuracy and robustness of the model in medical data classification and grading tasks. The modeling effect holds up in particular in complex scenarios where data quality differs widely between medical institutions, which gives the method high value for adoption across the industry.
The application uses the distributed ledger of the blockchain to store data in a decentralized way, avoiding single points of failure and the risk of data loss, and combines asymmetric encryption with hash algorithms to ensure data integrity and tamper resistance, preventing data from being maliciously altered or forged during transmission and storage. The transparency and traceability of the blockchain also mean that every data management operation is permanently recorded and can be inspected, so that management responsibility is traceable and medical institutions and regulators have a reliable audit basis. This builds a transparent, reliable and efficient technical framework for the security management of medical data.
The application uses smart contracts to set access rules according to the data classification and grading labels and deploys them on the blockchain, ensuring that data of each level are open only to users with the corresponding permissions. This rule-based automatic access control mechanism avoids the operating errors and subjective bias of traditional manual management, and the tamper resistance and transparency of the blockchain ensure that access rules are strictly enforced and traceable. Smart contracts also support dynamic management of user permissions: access rights can be adjusted in real time according to changes in user roles, data sensitivity and business needs, which prevents unauthorized operations and misuse of data. Through the synergy of smart contracts and the blockchain, the application automates the full flow of data storage, access control and permission management, provides efficient and reliable technical safeguards for the secure use of medical data, reduces management cost and risk, and helps the medical industry build a more intelligent and more secure data management system.
As shown in FIG. 6, the preferred embodiment of the present application also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the medical data classification and grading security management method combining large model and blockchain technologies of the above embodiment.
As shown in FIG. 7, the preferred embodiment of the present application also provides a computer device, which may be a terminal or a living body detection server, the internal structure of which may be as shown in FIG. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used to communicate with other external computer devices through a network connection. The computer program, when executed by the processor, performs the steps of the medical data classification and grading security management method combining large model and blockchain technologies described above.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
The preferred embodiment of the present application also provides a storage medium including a stored program which, when executed, controls the device in which the storage medium is located to perform the steps of the medical data classification and grading security management method combining large model and blockchain technologies of the above embodiment.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The functions described in the method of this embodiment, if implemented as software functional units and sold or used as a stand-alone product, may be stored in one or more storage media readable by a computing device. On this understanding, the part of the present application that contributes over the prior art, or a part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to execute all or part of the steps of the method described in the embodiments of the present application. The storage medium includes a USB flash drive, a removable hard disk, read-only memory (ROM), random-access memory (RAM), a magnetic disk, an optical disc, and other media capable of storing program code.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the application can be implemented in various programming languages, such as the object-oriented language Java and the interpreted scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
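As an illustration only, the spatial and amplitude checks of step S16 and the contribution-weighted aggregation of step S18 could be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: each parameter change matrix ΔW is assumed to be already flattened into a plain list of floats, the function names, the threshold `theta`, the multiplier `norm_factor`, and the sample values are hypothetical, the norm of each node is compared with the mean norm of the other nodes, and the behavioral check of step S162 (which needs a model evaluation) is omitted. The contribution weights are taken as given inputs rather than computed.

```python
import math


def norm(v):
    """Euclidean norm of a flattened parameter-change vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity; defined here as 0.0 if either vector is all zeros."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def flag_anomalies(updates, theta, norm_factor):
    """Return the ids of nodes that fail the spatial or the amplitude check.

    updates: dict mapping node id -> flattened delta-W (needs >= 2 nodes).
    theta: hypothetical threshold on the mean cosine similarity.
    norm_factor: hypothetical multiplier for "much larger than the mean".
    """
    ids = list(updates)
    flagged = set()
    for i in ids:
        others = [j for j in ids if j != i]
        mean_sim = sum(cosine(updates[i], updates[j]) for j in others) / len(others)
        mean_other_norm = sum(norm(updates[j]) for j in others) / len(others)
        # Spatial anomaly: low average similarity to the other nodes.
        # Amplitude anomaly: norm far above the other nodes' mean norm.
        if mean_sim < theta or norm(updates[i]) > norm_factor * mean_other_norm:
            flagged.add(i)
    return flagged


def aggregate(updates, weights):
    """Weighted average of the updates; weights stand in for contributions."""
    total = sum(weights[i] for i in updates)
    size = len(next(iter(updates.values())))
    return [sum(weights[i] * updates[i][k] for i in updates) / total
            for k in range(size)]


# Made-up sample values: D points the opposite way, E is A scaled by ten.
updates = {
    "A": [1.0, 0.9, 1.1],
    "B": [1.1, 1.0, 0.9],
    "C": [0.9, 1.1, 1.0],
    "D": [-1.0, -1.0, -1.0],
    "E": [10.0, 9.0, 11.0],
}
flagged = flag_anomalies(updates, theta=0.2, norm_factor=3.0)
# Flagged nodes are down-weighted rather than dropped (0.1 is an assumption).
weights = {i: (0.1 if i in flagged else 1.0) for i in updates}
global_delta = aggregate(updates, weights)
```

In this sketch a flagged node keeps a small weight instead of being removed, which mirrors the down-weighting described for abnormal users; permanent removal after repeated flags would need a counter kept across training rounds.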

Claims (10)

1. A medical data classification and grading security management method combining a large model with blockchain technology, characterized by comprising the following steps:
S1. Constructing a large model for medical data classification and grading: a large model is pre-trained, fine-tuned, and parameter-optimized using public medical data to obtain the medical data classification and grading large model;
S2. Classifying and grading medical data: the trained medical data classification and grading large model is used to classify and grade each piece of medical data, map each one to the data classification and grading rules, and add the corresponding classification and grading labels;
S3. Putting data on-chain and storing it securely: the classified and graded medical data is processed and uploaded to the chain and recorded in a distributed ledger, and the correctness and integrity of the stored data are verified through the blockchain consensus mechanism to ensure that the data cannot be tampered with;
S4. Dynamic authorization and access control by smart contracts: after receiving a user request, the blockchain system calls the relevant smart contract to verify the user's access rights and returns the verification result;
S5. Dynamically updating the classification and grading specification: when the medical data classification and grading specification changes, a dynamic update process is triggered, the knowledge base is updated, the updated data is re-uploaded to the blockchain system, and user access rights are adjusted to protect data privacy and security.

2. The medical data classification and grading security management method combining a large model with blockchain technology according to claim 1, characterized in that step S1 specifically comprises the following steps:
S11. Pre-training the large model on a large amount of public medical data so that it acquires professional general medical knowledge;
S12. Formulating classification and grading framework rules according to existing policies and regulations, based on the basic nature, business attributes, and potential risks of medical data;
S13. Each medical institution uses data sampling to collect, from its own private database, representative small-sample datasets across the different categories and levels of data, including representative diagnosis and treatment data and rare case data; the collected data is manually labeled with categories and grades through multi-expert evaluation to construct a high-quality fine-tuning dataset;
S14. Deploying the pre-trained large model locally at each medical institution and fine-tuning it on each institution's own data using low-rank adaptation; with the original parameters of the pre-trained model frozen, the model is trained on the labeled dataset, and the parameter change matrix is obtained by gradient descent on the cross-entropy loss function:
$$L = -\sum_{i=1}^{n} y_i \log \hat{y}_i$$
where y_i is the true label, ŷ_i is the probability predicted by the model, and n is the number of categories; applying this fine-tuning method yields the parameter change matrices (ΔW_1, …, ΔW_N) trained on the data of each medical institution;
S15. Masking the gradient information in the parameter change matrices (ΔW_1, …, ΔW_N) trained on the data of each medical institution using homomorphic encryption and differential privacy techniques, and sending the encrypted parameter change matrices to the central aggregation server;
S16. Detecting poisoning attacks by jointly checking multiple indicators: spatial anomaly, behavioral anomaly, and amplitude anomaly;
S17. Calculating the contribution of each node from its local model quality gain, the amount of data and training intensity it provides, and its number of anomaly flags;
S18. The central server performs weighted secure aggregation according to the contribution of each institution, so that the parameter change matrix of an institution with a larger contribution receives a larger weight; the weighted sum of the parameter change matrices is added to the parameter matrix of the pre-trained large model, and the change in accuracy is recorded; when the change in accuracy is less than the expected value, the model is considered to have converged and training ends; otherwise, the global model parameter change is encrypted and sent to each institution node, the local models are updated, and the process returns to step S13 until training converges.

3. The medical data classification and grading security management method combining a large model with blockchain technology according to claim 2, characterized in that step S16 specifically comprises the following steps:
S161. First, mapping the parameter change matrices uploaded by the institutions into a unified vector space and calculating the pairwise cosine similarity according to the following formula:
$$\cos(\Delta W_i, \Delta W_j) = \frac{\langle \Delta W_i, \Delta W_j \rangle}{\lVert \Delta W_i \rVert \, \lVert \Delta W_j \rVert}$$
The parameter changes of malicious nodes have markedly lower similarity to those of most normal nodes and form isolated clusters; a threshold θ is set, and when the average cosine similarity between node i and the other nodes is less than θ, node i is regarded as an abnormal user;
S162. Second, combining the parameter change matrix of each medical institution with the base large model separately and recording the resulting change in performance; when the resulting drop in model performance falls outside the normal range, the node is marked as an abnormal user;
S163. Finally, computing the norms of the parameter change matrices uploaded by all nodes; a node that attacks maliciously tends to submit a ΔW of large magnitude in an attempt to control the model, so a node whose norm is far greater than the mean is marked as an abnormal user; abnormal users receive warning feedback requiring the node to re-check its data labels or redo the fine-tuning, their anomaly flags are accumulated, and their ΔW is down-weighted during aggregation; when the accumulated count exceeds a preset threshold, the institution node is permanently removed.

4. The medical data classification and grading security management method combining a large model with blockchain technology according to claim 2, characterized in that step S17 specifically comprises:
S171. Using metrics such as accuracy, F1 score, and ROC to record the local gain that the ΔW of each institution brings to the global model, denoted G;
S172. Recording information such as the number of labeled samples and the number of training rounds each medical institution used for fine-tuning, denoted Q, to prevent nodes with very little data and very few training rounds from free-riding;
S173. Together with the number of times F that the node was judged to be an abnormal node during training, calculating the contribution of each medical institution according to a formula in which the weight parameters α, β, and γ can be adjusted dynamically as needed and a minimum protection threshold is set so that small medical institutions do not lose their incentive entirely.

5. The medical data classification and grading security management method combining a large model with blockchain technology according to claim 1, characterized in that step S2 specifically comprises the following steps:
S21. Collecting structured and unstructured medical data, including electronic health records, medical images, laboratory test results, physicians' diagnostic reports, and patient medical records, from the multiple source systems of medical institutions, such as hospital information systems, laboratory information management systems, and picture archiving and communication systems, and cleaning the data to remove noise and redundant information;
S22. Building a knowledge base from up-to-date, credible medical information, such as the latest normative documents on medical data classification and grading management, frontier papers published in important medical journals, and authoritative research reports;
S23. Using retrieval-augmented techniques to retrieve from the knowledge base the knowledge and rules relevant to the medical data to be classified and graded, combined with prompt engineering, to help the large model classify and grade the data more accurately;
S24. Using the trained large model to classify and grade each piece of medical data, map each one to the data classification and grading rules, and add the corresponding classification and grading labels.

6. The medical data classification and grading security management method combining a large model with blockchain technology according to claim 1, characterized in that step S3 specifically comprises the following steps:
S31. Calculating a hash value for the classified and graded medical data to ensure the uniqueness and integrity of the data;
S32. Encrypting the medical data with an asymmetric encryption algorithm to ensure data security during transmission and storage;
S33. Uploading the encrypted medical data and its classification and grading labels to the blockchain network, recording them in the distributed ledger, and verifying the correctness and integrity of the stored data through the blockchain consensus mechanism to ensure that the data cannot be tampered with.

7. The medical data classification and grading security management method combining a large model with blockchain technology according to claim 1, characterized in that step S4 specifically comprises the following steps:
S41. The user packages a request comprising the requesting subject, the target data object, and the operation to be performed, generating the request R:
R ← F(S, O, A),
where R denotes the user request, S denotes the subject attributes (including the user's unique ID and permission level), O denotes the object attributes (including data category and level), and A denotes the operation attributes (including data operations such as read and write);
after signing with the private key, the user sends its public key, the certificate signature, and a timestamp to the blockchain medical data management system:
B ← X{PK_X, Sign(R, SK_X), T_1},
where B denotes the blockchain, X denotes the user, PK_X denotes the public key of user X, Sign() denotes the digital signature, SK_X denotes the private key of user X, and T_1, …, T_m denote timestamps;
S42. After receiving the user request, the blockchain system parses the request using the user's public key and calls the policy management contract to automatically match the data object corresponding to the request:
B{Sign(R, SK_X), P(R), T_2} → X,
where P denotes the policy management contract, which automatically matches the corresponding data object according to the user request and the policy rules;
S43. Calling the permission authentication contract to verify the user's access rights:
B{V(X), T_3} → B,
where V denotes the permission verification contract, which verifies whether the user has permission to access a specific data object; if verification passes, the data retrieval contract is called to return the data corresponding to the request:
B{D(R), T_4} → X,
where D denotes the data retrieval contract, which retrieves the data object requested by the user from the blockchain and returns it; if verification fails, the user request does not satisfy the policy information in the policy management contract, and a rejection message is returned:
B{Refused, T_4} → X,
where Refused denotes the plaintext rejection message.

8. The medical data classification and grading security management method combining a large model with blockchain technology according to claim 1, characterized in that step S5 specifically comprises the following steps:
S51. When the medical data classification and grading specification changes, reselecting and relabeling the fine-tuning dataset according to the new specification, and repeating step S1;
S52. Updating the knowledge base to keep the retrieved information timely and accurate, to provide the latest and most reliable knowledge support for medical data classification and grading management, and to keep the retrieval-augmented technique efficient and practical;
S53. Updating the classification and grading labels of the medical data using the updated large-model classification and grading architecture, and re-uploading the updated data to the blockchain system;
S54. Adjusting user access rights according to the updated medical data classification and grading specification to protect data privacy and security.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the medical data classification and grading security management method combining a large model with blockchain technology according to any one of claims 1 to 8.

10. A storage medium comprising a stored program, characterized in that, when the program runs, the device in which the storage medium is located is controlled to execute the steps of the medical data classification and grading security management method combining a large model with blockchain technology according to any one of claims 1 to 8.
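For orientation, the hashing of step S31 and the permission check of step S43 can be sketched in a few lines. This is a hedged illustration, not the contract logic of the application: the ledger is a plain in-memory dictionary rather than a blockchain, SHA-256 over canonical JSON is an assumed choice of hash, the rule that a user may act only when the action is listed for the data category and the user's level is at least the data's level is an assumed policy, and all names (`Request`, `record_digest`, `put_on_ledger`, `verify_access`, `handle_request`) and sample values are hypothetical. Signature creation and verification (S41, S42) and encryption (S32) are left out because they require an asymmetric cryptography library.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """R = F(S, O, A): subject, object, and operation attributes."""
    user_id: str     # S: unique user id
    user_level: int  # S: permission level
    object_id: str   # O: target data object
    action: str      # A: operation such as "read" or "write"


def record_digest(record):
    """SHA-256 over a canonical JSON encoding (assumed hash for S31)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def put_on_ledger(ledger, object_id, record, category, level):
    """Store a record with its classification label, grade, and digest."""
    ledger[object_id] = {
        "category": category,
        "level": level,
        "digest": record_digest(record),
        "record": record,
    }


def verify_access(request, entry, policy):
    """Stand-in for V: action must be allowed and user level high enough."""
    allowed_actions = policy.get(entry["category"], set())
    return request.action in allowed_actions and request.user_level >= entry["level"]


def handle_request(request, ledger, policy):
    """Stand-in for D: return the record, or the plaintext "Refused"."""
    entry = ledger.get(request.object_id)
    if entry is None or not verify_access(request, entry, policy):
        return "Refused"
    # Integrity check: the stored record must still match its digest.
    if record_digest(entry["record"]) != entry["digest"]:
        raise ValueError("stored record does not match its digest")
    return entry["record"]


# Made-up sample data for illustration.
ledger = {}
put_on_ledger(ledger, "rec-1", {"patient": "p01", "note": "example"},
              category="clinical", level=3)
policy = {"clinical": {"read"}}
result = handle_request(Request("u1", 3, "rec-1", "read"), ledger, policy)
```

Hashing a canonical encoding (sorted keys, fixed separators) is what makes the digest independent of the order in which fields were written, so the same record always yields the same value.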
CN202510591610.1A 2025-05-09 2025-05-09 Medical data classification and grading security management method, device, electronic device and storage medium combining big model and blockchain technology Pending CN120613060A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510591610.1A CN120613060A (en) 2025-05-09 2025-05-09 Medical data classification and grading security management method, device, electronic device and storage medium combining big model and blockchain technology

Publications (1)

Publication Number Publication Date
CN120613060A 2025-09-09

Family

ID=96924994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202510591610.1A Pending CN120613060A (en) 2025-05-09 2025-05-09 Medical data classification and grading security management method, device, electronic device and storage medium combining big model and blockchain technology

Country Status (1)

Country Link
CN (1) CN120613060A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN121188200A (en) * 2025-11-25 2025-12-23 国网湖北省电力有限公司 Sensitive data classification method, apparatus, device, and storage medium based on intelligent agents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination