CN117672440A - Electronic medical record text information extraction method and system based on neural network - Google Patents
- Publication number
- CN117672440A (application number CN202211031069.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- entity
- category
- text
- medical
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The application discloses a neural network-based method for extracting text information from electronic medical records, comprising: obtaining text data of an entered electronic medical record; classifying the text data to obtain category text data with a category identification; extracting medical entity data from the category text data, and composing the medical entity data into medical description data; and generating structured medical record data according to a preset structure based on the medical description data. In the method, the text data of the electronic medical record is classified, and medical entity data, that is, data representing the relevant information of the electronic medical record, is extracted from the classified text data and composed into medical description data according to a medical expression format. Because the medical description data follows medical principles, the method does not interfere with clinicians' free writing of medical records and helps patient information to be entered completely, thereby improving the accuracy of electronic medical records.
Description
Technical Field
The application relates to the field of medical resource management, and in particular to a neural network-based electronic medical record text information extraction method, a neural network-based electronic medical record text information extraction system, a data generation method, an electronic device, and a computer storage medium.
Background
An electronic medical record (EMR) is the digital information, such as text, symbols, charts, figures, numbers, and images, that medical staff produce about the course of a patient's illness and its treatment, and it is the basis on which doctors diagnose and treat disease. As the original record of the patient's entire diagnosis and treatment process, the electronic medical record documents the patient's condition after admission to the medical institution, the doctors' analysis, diagnosis, and treatment of that condition, the estimated prognosis, and the opinions of doctors at all levels during ward rounds and consultations.
At present, in the collection of structured electronic medical records, medical record text has long contained a large amount of unstructured data such as text, images, and sound. The high proportion of unstructured data is an obstacle to automatic processing by computer. In addition, data coding and standards, such as diagnosis codes and operation codes, are not used uniformly across systems, and data entry is arbitrary, which reduces the usability of the data and impairs clinical data quality and the exchange and sharing of information.
To address this, the prior art provides electronic medical records based on structured templates, in which a doctor fills in the corresponding information by following a menu bar. However, a structured-template electronic medical record requires many keyboard and mouse operations to complete, so filling in the record lacks flexibility and freedom. Moreover, the relatively fixed structure of the template interferes with doctors' writing habits and thinking, which hinders a complete description of the medical record information and in turn reduces the accuracy of the electronic medical record.
Therefore, how to enable an electronic medical record to describe medical record information completely, and thereby improve the accuracy of existing electronic medical records, has become a problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiments of the application provide a method that addresses the prior-art problem of how to enable an electronic medical record to completely capture a doctor's description of medical record information, and that improves the accuracy of existing electronic medical records.
The embodiment of the application provides a neural network-based electronic medical record text information extraction method, which comprises the following steps:
obtaining text data of the input electronic medical record;
classifying the text data to obtain category text data with category identification;
extracting medical entity data from the category text data, and forming the medical entity data into medical description data;
generating structured medical record data according to a preset structure based on the medical description data; the preset structure is a structure conforming to the natural language order.
Optionally, the classifying the text data to obtain category text data with category identification includes:
converting the text data into corresponding first vector data;
obtaining a medical record classification model which is generated in advance, and inputting first vector data corresponding to the text data into the medical record classification model so as to obtain category characteristics and category probability data corresponding to the category characteristics;
screening target category characteristics corresponding to target category probability data with the maximum probability value from the category probability data;
based on the target category characteristics, category text data with category identification is output.
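The screening step above amounts to taking the category with the largest probability. The following is a minimal sketch of that selection only, assuming the classification model has already produced a mapping from category to probability; the function names, category names, and output dictionary layout are hypothetical and are not taken from the application.

```python
from typing import Dict, Tuple


def pick_target_category(category_probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the category with the largest probability, and that probability."""
    if not category_probs:
        raise ValueError("category_probs must not be empty")
    target = max(category_probs, key=category_probs.get)
    return target, category_probs[target]


def tag_text(text: str, category_probs: Dict[str, float]) -> Dict[str, str]:
    """Attach the selected category identification to the text (layout is illustrative)."""
    category, _ = pick_target_category(category_probs)
    return {"category": category, "text": text}


# Hypothetical model output for one piece of record text.
probs = {"chief_complaint": 0.81, "medication": 0.12, "history": 0.07}
tagged = tag_text("Headache and fever for three days.", probs)
```

Here `tagged["category"]` is the category identification carried forward with the text.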
Optionally, before the text data is subjected to classification processing to obtain category text data with category identification, the method further comprises:
determining whether the text data has a category identification;
if the text data has the corresponding category identification, selecting a medical record classification model of the corresponding category for the text data with the category identification;
and if the text data does not have the corresponding category identification, selecting a corresponding medical record classification model for the text data from preset candidate medical record classification models.
Optionally, the pre-generated medical record classification model is generated by the following method:
obtaining text data of a plurality of electronic medical records and category identifiers respectively corresponding to the text data; the electronic medical record is the electronic medical record which is input by a user according to the writing habit of the user and combined with a medical expression format;
converting the text data and the category identifications corresponding to the text data into text vector data and category vector data respectively;
inputting part of the text vector data and the corresponding category vector data into an initial medical record classification model to perform iterative training for multiple rounds, and obtaining a first output result of the initial medical record classification model for each iterative round;
correspondingly adjusting the initial medical record classification model according to a first output result of the initial medical record classification model of each iteration round to obtain a target medical record classification model;
inputting the rest of the text vector data and the corresponding category vector data into the target medical record classification model to obtain a second output result of the target medical record classification model;
matching the second output result with a preset result, and taking the target medical record classification model as the medical record classification model if the matching value reaches a preset threshold;
and if the matching value does not reach the preset threshold, adjusting the model parameters of the initial medical record classification model, and/or adjusting the number of text data of the plurality of electronic medical records and of the corresponding category identifications, and/or converting the text data and the corresponding category identifications into new text vector data and new category vector data, and performing multiple rounds of iterative training on the adjusted initial medical record classification model with the new text vector data and the new category vector data in the manner described above, until the medical record classification model is determined.
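The generation procedure above follows a familiar pattern: train on part of the labelled data, check the result on the remainder against a threshold, and retrain with an adjustment if the threshold is not met. The sketch below illustrates only that control flow. It substitutes a toy token-vote classifier for the neural network, and the sample texts, labels, the `lowercase` adjustment, and all function names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (record text, category label)


def tokenize(text: str, lowercase: bool) -> List[str]:
    return (text.lower() if lowercase else text).split()


def train(samples: Sequence[Sample], lowercase: bool) -> Dict:
    """Toy stand-in for training: count which labels each token appears with."""
    votes = defaultdict(Counter)
    for text, label in samples:
        for token in tokenize(text, lowercase):
            votes[token][label] += 1
    return {"votes": votes, "lowercase": lowercase}


def predict(model: Dict, text: str) -> str:
    tally = Counter()
    for token in tokenize(text, model["lowercase"]):
        tally.update(model["votes"].get(token, {}))
    return tally.most_common(1)[0][0] if tally else "unknown"


def accuracy(model: Dict, samples: Sequence[Sample]) -> float:
    if not samples:
        return 0.0
    hits = sum(predict(model, text) == label for text, label in samples)
    return hits / len(samples)


def fit_with_retries(samples, n_train, settings, threshold):
    """Train on the first n_train samples, validate on the rest, retry per setting."""
    train_part, held_out = samples[:n_train], samples[n_train:]
    for lowercase in settings:
        model = train(train_part, lowercase)
        score = accuracy(model, held_out)
        if score >= threshold:  # matching value reaches the preset threshold
            return model, score
    raise RuntimeError("no setting reached the threshold")


SAMPLES = [
    ("Dexamethasone 30 MG orally", "medication"),
    ("Headache and fever", "symptom"),
    ("Aspirin 100 MG daily", "medication"),
    ("Fever since Monday", "symptom"),
    ("ibuprofen 200 mg", "medication"),  # held out
    ("fever at night", "symptom"),       # held out
]
model, score = fit_with_retries(SAMPLES, n_train=4, settings=[False, True], threshold=0.9)
```

In this toy run the case-sensitive setting misses the threshold on the held-out part, so the loop retries with lowercased tokens, which loosely mirrors re-converting the text into new vector data before retraining.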
Optionally, the extracting medical entity data from the category text data includes:
converting the category text data into corresponding second vector data;
obtaining a pre-generated entity recognition model, and inputting second vector data corresponding to the category text data into the entity recognition model to obtain an entity tag of the second vector corresponding to the category text data;
and extracting medical entity data from the category text data according to the entity tag.
Optionally, the obtaining the entity tag of the second vector corresponding to the category text data includes: obtaining the entity tag corresponding to the vector data of each word in the second vector corresponding to the category text data.
Optionally, the entity tags of the second vector corresponding to the category text data at least include a patient entity tag, a connective entity tag, a location entity tag, a medicine name entity tag, a presentation entity tag and a time entity tag.
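Once every token has an entity tag, extracting the medical entity data is a matter of grouping adjacent tokens that share a tag. The sketch below assumes a BIO tagging scheme (`B-` opens an entity, `I-` continues it, `O` is outside any entity), which is a common convention for sequence labelling but is not stated in the text above; the tag names `ROUTE`, `MEDICINE`, and `DOSE` are likewise hypothetical.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group tokens into (label, text) entities using BIO tags."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def flush() -> None:
        nonlocal current_tokens, current_label
        if current_tokens:
            entities.append((current_label, " ".join(current_tokens)))
        current_tokens, current_label = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()  # close any open entity before starting a new one
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            flush()  # "O" tags and stray "I-" tags end the current entity
    flush()
    return entities


tokens = ["oral", "dexamethasone", "30", "mg", "today"]
tags = ["B-ROUTE", "B-MEDICINE", "B-DOSE", "I-DOSE", "O"]
entities = extract_entities(tokens, tags)
```

For Chinese text the tokens would typically be individual characters, and joining them with an empty string instead of a space would be the natural adjustment.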
Optionally, the pre-generated entity recognition model is generated by the following method:
obtaining text data of a plurality of electronic medical records and medical entity data of the text data; the electronic medical record is the electronic medical record which is input by a user according to the writing habit of the user and combined with a medical expression format;
obtaining entity labels corresponding to medical entity data of the text data, and converting the medical entity data of the text data and the entity labels corresponding to the medical entity data of the text data into medical entity data vector data and entity label vector data of the text data respectively;
inputting part of medical entity data vector data of the text data and corresponding entity label vector data thereof into an initial entity recognition model for iterative training of multiple rounds, and obtaining a first output result of the initial entity recognition model of each iterative round;
correspondingly adjusting the initial entity recognition model according to a first output result of the initial entity recognition model of each iteration round to obtain a target entity recognition model;
inputting the medical entity data vector data of the rest part of the text data and the corresponding entity tag vector data thereof into the target entity recognition model to obtain a second output result of the target entity recognition model;
matching the second output result with a preset result, and taking the target entity recognition model as an entity recognition model if the matching value reaches a preset threshold;
and if the matching value does not reach the preset threshold value, adjusting model parameters of the initial entity recognition model, and/or adjusting the plurality of medical entity data and the number of entity labels corresponding to the medical entity data, and/or respectively converting the medical entity data of the text data and the entity labels corresponding to the medical entity data into medical entity data vector data and entity label vector data of new text data, and performing iterative training on the adjusted initial entity recognition model for multiple times by adopting the medical entity data vector data and the entity label vector data of the new text data according to the mode until the entity recognition model is determined.
Optionally, inputting the medical entity data vector data of the text data and the entity tag vector data corresponding to the medical entity data vector data into an initial entity recognition model for iterative training of multiple rounds, and obtaining a first output result of the initial entity recognition model of each iterative round, including:
obtaining vector data of each word in medical entity data of the text data and entity tag vector data corresponding to the vector data of each word;
and inputting the vector data of each word in the medical entity data of the text data and the entity label vector data corresponding to the vector data of each word into an initial entity recognition model for iterative training of multiple rounds, and obtaining a first output result of the initial entity recognition model of each iterative round.
Optionally, the composing the medical entity data into medical description data includes:
obtaining medical entity words in medical entity data and entity labels corresponding to the medical entity words;
obtaining a pre-generated entity relationship combination model, and inputting the medical entity words and entity labels corresponding to the medical entity words into the entity relationship combination model to obtain standard entity labels and logic relationships among the standard entity labels;
and matching the entity tag with the standard entity tag, and arranging medical entity words corresponding to the entity tag matched with the standard entity tag according to the logic relation to generate medical description data.
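As a rough illustration of the arranging step, the sketch below treats the logical relationship among standard entity tags as nothing more than a fixed ordering of tags. That is a simplifying assumption: the entity relationship combination model described above may encode richer relations. The tag names and the template are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def compose_description(
    entities: Sequence[Tuple[str, str]],
    template: Sequence[str],
) -> str:
    """Order entity words by a template of standard tags; drop unmatched tags."""
    by_label: Dict[str, List[str]] = {}
    for label, word in entities:
        by_label.setdefault(label, []).append(word)

    ordered: List[str] = []
    for label in template:  # the template fixes the output order
        ordered.extend(by_label.get(label, []))
    return " ".join(ordered)


# Entities in the order they happened to be extracted.
entities = [
    ("DOSE", "30 mg"),
    ("MEDICINE", "dexamethasone"),
    ("ROUTE", "oral"),
    ("OTHER", "note"),  # no matching standard tag, so it is left out
]
description = compose_description(entities, template=["ROUTE", "MEDICINE", "DOSE"])
```

With this template the words come out in route, medicine, dosage order regardless of the order in which they were extracted.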
Optionally, the pre-generated entity relationship combination model is obtained by the following way:
acquiring text data of a plurality of electronic medical records and medical entity data of the text data, and acquiring medical entity words in the medical entity data and entity tags corresponding to the medical entity words; the electronic medical record is an electronic medical record entered by a user according to the user's own writing habit in combination with a medical expression format;
labeling the medical entity words through a preset labeling system to obtain a plurality of medical entity sequence templates;
obtaining a logic relation among the entity labels, and matching the entity labels with the medical entity sequence template according to the logic relation to obtain an entity relation combination model.
Optionally, before the generating the structured medical record data based on the medical description data and according to a preset structure, the method further includes:
obtaining scoring information of the medical description data, and determining the medical description data meeting a preset scoring threshold as target medical description data; the target medical description data is used as medical description data for generating structured medical record data according to a preset structure.
Optionally, the obtaining the scoring information of the medical description data determines the medical description data meeting a preset scoring threshold as target medical description data, including:
converting the medical description data into third vector data;
obtaining a pre-generated evaluation model, inputting third vector data corresponding to the medical description data into the evaluation model, and outputting scoring information corresponding to the third vector data by the evaluation model;
and setting a preset scoring threshold value, and determining medical description data meeting the preset scoring threshold value as target medical description data.
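The thresholding step can be sketched as a simple filter. The example assumes that "meeting" the threshold means a score greater than or equal to it, and it replaces the evaluation model with a lookup table of made-up scores; both are assumptions for illustration only.

```python
from typing import Callable, List, Sequence


def select_target_descriptions(
    descriptions: Sequence[str],
    score: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Keep the descriptions whose score is at least the preset threshold."""
    return [text for text in descriptions if score(text) >= threshold]


# Hypothetical stand-in for the evaluation model: precomputed scores in [0, 1].
toy_scores = {"oral dexamethasone 30 mg": 0.93, "30 mg oral": 0.41}
targets = select_target_descriptions(
    list(toy_scores), toy_scores.__getitem__, threshold=0.8
)
```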
Optionally, the method further comprises:
converting the single words corresponding to each entity in the target medical description data into a monomer sequence;
converting the single words corresponding to each entity in the term library into a standard monomer sequence;
obtaining the ratio of the intersection of the monomer sequence and the standard monomer sequence to the union of the monomer sequence and the standard monomer sequence, and obtaining the entity corresponding to the sequence with the highest ratio;
and taking the entity corresponding to the sequence with the highest ratio as a standard word of the entity in the target medical description data.
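The ratio of intersection to union described above is the Jaccard similarity. A minimal sketch follows, assuming that a "monomer sequence" can be treated as the set of individual characters of an entity and that the term library is a plain list of strings; the function names and sample terms are hypothetical.

```python
from typing import Sequence, Set


def char_set(term: str) -> Set[str]:
    """Break a term into its individual characters, ignoring case and spaces."""
    return set(term.replace(" ", "").lower())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def standardize(entity: str, term_library: Sequence[str]) -> str:
    """Return the library term whose character set best overlaps the entity's."""
    if not term_library:
        raise ValueError("term_library must not be empty")
    entity_chars = char_set(entity)
    return max(term_library, key=lambda term: jaccard(entity_chars, char_set(term)))


library = ["dexamethasone", "aspirin", "ibuprofen"]
standard = standardize("dexamethason", library)  # misspelled entry
```

Because sets discard character order and repetition, this measure is tolerant of small spelling variations but can confuse terms that are anagrams of each other; that trade-off comes from the set-based assumption made here.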
Optionally, the method further comprises:
obtaining attribute information in the text data; the attribute information is information associated with the medical entity data;
fusing the attribute information to the medical entity data to form medical description data;
and generating structured medical record data according to a preset structure based on the medical description data.
Optionally, the method further comprises:
selecting a medical term text from a preset medical term library; the medical term library at least stores categories, names and attributes of medical terms;
acquiring a triggering operation for electronic medical record entry, and acquiring text data of the entered electronic medical record according to the medical term text and the triggering operation; or alternatively,
acquiring a first triggering operation for electronic medical record entry, and selecting a medical term text from a preset medical term library according to the first triggering operation; the medical term library at least stores the category, name and attribute of the term;
and acquiring a second triggering operation for electronic medical record entry, and acquiring text data of the entered electronic medical record according to the medical term text and the second triggering operation.
Optionally, the triggering operation includes an initial entry triggering operation and a multi-stage modification triggering operation.
Optionally, the method is applied to a server, and after structured medical record data is generated based on the medical description data and according to a preset structure, the structured medical record data is sent to a client for display.
Optionally, the method is applied to the client, and after the structured medical record data is generated based on the medical description data and according to a preset structure, the structured medical record data is displayed on a display interface of the client.
Optionally, the method is applied to a client; after the structured medical record data is generated based on the medical description data and according to a preset structure, a triggering operation on the display interface of the client is obtained, and the structured medical record data is transmitted to a designated client according to the triggering operation; the display interface of the client has an icon associated with the designated client, and the triggering operation is a triggering operation on that icon.
The embodiment of the application also provides an electronic medical record text information extraction system based on the neural network, which comprises:
the text data acquisition module is used for acquiring text data of the input electronic medical record;
the category text data module is used for classifying the text data to obtain category text data with category identification;
the data processing module is used for extracting medical entity data from the category text data and forming the medical entity data into medical description data;
the electronic medical record structuring module is used for generating structured medical record data according to a preset structure based on the medical description data; the preset structure is a structure conforming to the natural language order.
The embodiment of the application also provides a data generation method, which comprises the following steps:
acquiring input text data, wherein the text data conforms to the expression format of the application scenario of the text data;
classifying the text data to obtain category text data with category identification;
extracting entity data from the category text data, and composing the entity data into description data; the entity data is data with entity meaning in the category text data, and the description data is data conforming to the expression format of the data application scenario;
generating target data according to a preset structure based on the description data; the preset structure is a structure conforming to a natural language order, and the natural language order matches the expression format of the data application scenario.
The embodiment of the application also provides an electronic device, comprising: a processor; and a memory for storing a computer program that is executed by the processor to perform the method of any of the above.
Embodiments of the present application also provide a computer storage medium storing a computer program that is executed by a processor to perform a method according to any one of the above.
Compared with the prior art, the application has the following advantages:
The embodiment of the application provides a neural network-based electronic medical record text information extraction method, comprising: obtaining text data of an entered electronic medical record; classifying the text data to obtain category text data with a category identification; extracting medical entity data from the category text data, and composing the medical entity data into medical description data; and generating structured medical record data according to a preset structure based on the medical description data, the preset structure being a structure conforming to the natural language order. In the method, the text data of the electronic medical record is classified, and medical entity data, that is, data representing the relevant information of the electronic medical record, is extracted from the classified text data and composed into medical description data according to a medical expression format. Because the medical description data follows medical principles, the method does not interfere with clinicians' free writing of medical records and helps patient information to be entered completely, thereby improving the accuracy of electronic medical records. In addition, because the medical description data follows medical principles, the threshold for acquiring the data is lowered considerably. This provides a basis for information retrieval and data mining, supports clinical and management decisions, supports intelligent backfilling of diagnoses, examinations and the like into electronic medical records, reduces doctors' documentation burden, and promotes the interconnection and exchange of clinical information among systems within a hospital and among hospitals.
Drawings
Fig. 1 is a flowchart of a method for extracting electronic medical record text information based on a neural network according to a first embodiment of the present application.
Fig. 2 is a schematic diagram of the overall structure of the BiLSTM-CRF model according to the first embodiment of the present application.
Fig. 3 is a schematic diagram of an effect of electronic medical records after labeling according to the first embodiment of the present application.
Fig. 4 is a schematic diagram of a relationship model generating process according to the first embodiment of the present application.
Fig. 5 is a schematic diagram of an electronic medical record text information extraction system based on a neural network according to a second embodiment of the present application.
Fig. 6 is a flowchart of a data generating method according to a third embodiment of the present application.
Fig. 7 is a schematic diagram of a data generating device according to a fourth embodiment of the present application.
Fig. 8 is a schematic diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present application. The embodiments may, however, be implemented in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the embodiments; the embodiments of the present application are therefore not limited to the specific implementations disclosed below.
A medical record is the record of the medical activities, such as examination, diagnosis, and treatment, that medical staff carry out in relation to the occurrence, development, and prognosis of a patient's disease. It is also the patient's medical and health file, written in a specified format and according to specified requirements after the collected data has been summarized, organized, and comprehensively analyzed. Medical records are not only a summary of clinical practice but also a basis for exploring the patterns of disease and a legal basis for handling medical disputes. They play an important role in medical treatment, prevention, teaching, scientific research, and the management of medical institutions.
In this regard, the prior art provides electronic medical records based on structured templates, in which a doctor fills in the corresponding information according to a menu bar. However, an electronic medical record based on a structured template requires many keyboard and mouse operations from the doctor to complete, and filling it in lacks flexibility and freedom. In addition, the structure of such a record is relatively fixed, which interferes with the doctor's writing habits and thinking, makes it difficult for the doctor to describe the medical record information completely, and in turn affects the accuracy of the electronic medical record.
To address this, the first embodiment of the present application provides a neural-network-based method for extracting text information from an electronic medical record, so that a doctor can describe medical record information completely in the electronic medical record according to his or her own writing habits and thinking, thereby improving the accuracy of existing electronic medical record descriptions.
As shown in fig. 1, fig. 1 is a flowchart of a method for extracting electronic medical record text information based on a neural network according to a first embodiment of the present application. The electronic medical record text information extraction method based on the neural network provided by the first embodiment of the application comprises the following steps.
Step S101, obtaining text data of the input electronic medical record.
In this step, the text data of the entered electronic medical record are entered by a user (mainly a doctor) with a mouse and keyboard, and the obtained text data can be displayed on a display interface. The electronic medical record conforms to a medical expression format. The medical expression format refers to the word-order structure that a doctor uses when writing a medical record, according to his or her own writing habits, to describe content such as the patient's disease, symptoms, the operations performed, and the drugs used during the consultation. Description forms include orientation, site, symptom, and the like. For example, for "30 mg of dexamethasone taken orally", the corresponding description forms are administration route, drug, and dosage.
In addition, to help the user enter the electronic medical record quickly, the text data of the electronic medical record can be obtained through a triggering operation on a pre-generated term text for electronic medical record entry. Specifically, a term text is selected from a preset medical term library, where the medical term library stores at least the category, name, and attributes of each term. The medical term library may be generated by collecting the text data of historical electronic medical records. Then, a triggering operation for the electronic medical record is obtained, and the text data of the electronic medical record are obtained according to the term text and the triggering operation. While the electronic medical record is being entered, the system can generate the corresponding medical term text in advance from the medical term library, and once the medical term text has been generated, the user can enter the text data of the electronic medical record accordingly.
Alternatively, a first triggering operation for electronic medical record entry is obtained, and a medical term text is selected from the preset medical term library according to the first triggering operation; the medical term library stores at least the category, name, and attributes of each term. The first triggering operation is used to obtain the selected medical term text from the medical term library. Then, a second triggering operation for the electronic medical record is obtained, and the text data of the entered electronic medical record are obtained according to the medical term text and the second triggering operation. The second triggering operation is used to enter the text data of the electronic medical record according to the medical term text.
Of course, because a patient's course of care may involve multiple stages and multiple consultations, the content of the corresponding electronic medical record also needs to grow over time. For this purpose, the triggering operation includes an initial entry triggering operation and a multi-stage modification triggering operation. The initial entry triggering operation is the triggering operation for entering the electronic medical record required at the patient's first consultation, and the electronic medical record it generates is the initial electronic medical record. The multi-stage modification triggering operation is an operation that modifies the text data of the initial electronic medical record by adding to it. The modification operation is mainly a triggering operation that adds text data entries.
Step S102, classifying the text data to obtain category text data with category identification.
After the text data of the electronic medical record are obtained, a text classification algorithm assigns the entered electronic medical record to an electronic medical record type, which determines the detailed category of the record. This step matters because the text data extraction rules used in subsequent processing differ between types of electronic medical records.
In this step, after the text data of the electronic medical record are obtained, the text data are classified to obtain category text data with a category identifier. Specifically, the text data of the electronic medical record are first converted into corresponding first vector data, which are data that a computer device can recognize. The category text data with the category identifier can then be obtained through a pre-generated medical record classification model. The pre-generated medical record classification model is generated as follows. Text data of a plurality of electronic medical records and the category identifier corresponding to each are obtained, where each electronic medical record was entered by a user according to his or her own writing habits in combination with the medical expression format. The text data of the plurality of electronic medical records and their corresponding category identifiers can be divided into a training set, a test set, and a validation set. For example, if there are 100 pairs of text data and category identifiers, the training set contains 50, the test set contains 30, and the validation set contains 20.
Then, the text data and their corresponding category identifiers are converted into text vector data and category vector data, respectively. Part of the text vector data and the corresponding category vector data, namely data from the training set, are input into an initial medical record classification model for multiple rounds of iterative training, and a first output result of the initial medical record classification model is obtained for each iteration round. The text vector data and corresponding category vector data in the validation set can be used to validate the first output result of each round. The initial medical record classification model is adjusted according to the first output result of each iteration round to obtain a target medical record classification model. The remaining text vector data and the corresponding category vector data, namely data from the test set, are then input into the target medical record classification model to obtain a second output result. The second output result is then matched against a preset result, and if the matching value reaches a preset threshold, the target medical record classification model is used as the medical record classification model.
If the matching value does not reach the preset threshold, the model parameters of the initial medical record classification model are adjusted, and/or the number of electronic medical record text data items and corresponding category identifiers is adjusted, and/or the text data and their corresponding category identifiers are converted into new text vector data and new category vector data. The adjusted initial medical record classification model is then iteratively trained again in the manner described above, using the new text vector data and new category vector data, until the medical record classification model is determined.
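The 50/30/20 division used in the example above can be sketched as follows. This is only a minimal illustration: the function name split_records, the percentage parameters, the fixed random seed, and the placeholder records are assumptions made for the sketch and are not taken from the application.

```python
import random

def split_records(records, train_pct=50, test_pct=30, seed=0):
    """Shuffle labeled records and split them into training, test, and validation sets.

    Whatever is left after the training and test shares becomes the validation set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )

# Hypothetical (text, category identifier) pairs standing in for labeled records.
records = [(f"record {i}", i % 3) for i in range(100)]
train_set, test_set, validation_set = split_records(records)
print(len(train_set), len(test_set), len(validation_set))
```

Shuffling before splitting keeps each subset representative of the whole collection; integer percentages are used so the subset sizes do not depend on floating-point rounding.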
After the pre-generated medical record classification model is obtained, the first vector data corresponding to the text data are input into it. The medical record classification model reads the vector data to obtain category features and the category probability data corresponding to each category feature, selects from the category probability data the target category feature whose probability value is the largest, and outputs category text data with a category identifier based on that target category feature.
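The selection of the category with the largest probability can be sketched as follows. The application does not say how the probabilities are produced; the softmax normalization, the category names, and the raw scores below are assumptions used only to make the sketch runnable.

```python
import math

def softmax(scores):
    """Turn raw per-category scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

def pick_category(probabilities):
    """Return the category identifier with the largest probability."""
    return max(probabilities, key=probabilities.get)

# Hypothetical raw scores that a classification model might emit for one record.
scores = {"admission record": 2.1, "discharge summary": 0.4, "operation record": -0.3}
probabilities = softmax(scores)
category = pick_category(probabilities)
print(category)
```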
It should be noted that, as described above, because the text data extraction rules differ between types of electronic medical records, the method further includes, before the text data are classified to obtain category text data with a category identifier: determining whether the text data already have a category identifier; if so, selecting the medical record classification model of the corresponding category for the text data; if not, selecting a corresponding medical record classification model for the text data from preset candidate medical record classification models. In other words, before the text data are classified, it is determined whether they have already been classified, so that the medical record classification model of the appropriate category can be used to classify them.
Step S103, extracting medical entity data from the category text data, and composing the medical entity data into medical description data.
After classifying the text data to obtain category text data with category identification, extracting medical entity data from the category text data, and composing the medical entity data into medical description data. The medical entity data refers to data with entity meaning in the category text data, and the medical description data refers to data conforming to a medical expression format. The specific manner of extracting medical entity data from category text data and composing the medical entity data into medical description data will be described below, respectively.
Specifically, extracting medical entity data from the category text data includes converting the category text data into corresponding second vector data, which are data that a computer device can recognize, and extracting the medical entity data from the category text data through a pre-generated entity recognition model. The pre-generated entity recognition model is generated as follows. Text data of a plurality of electronic medical records and the medical entity data of the text data are obtained, where each electronic medical record was entered by a user according to his or her own writing habits in combination with the medical expression format, and the medical entity data are data with entity meaning in the category text data. For example, if the text data of the electronic medical record are "finding right auricle deformity", the medical entity data they contain are "finding", "right", "auricle", and "deformity". Then, the entity tag corresponding to each item of medical entity data is obtained; the entity tag is a representation of the medical entity data and determines the entity type. In the first embodiment of the present application, the entity classification (see fig. 2) may be set according to the I2B2 standard, and the entity types are: anatomical site, medication amount, observation object, diet, frequency, property, operation, negation, suspected, medication name, orientation, medication route, medication specification, performance, condition, time, examination name, unit, value, department, tense, instrument, approach, operation type, affirmation, allergen, patient, place, and connective. These entity types essentially cover the entity types that commonly occur in electronic medical records.
Continuing with the example of "finding right auricle deformity", the corresponding entity tags are affirmation, orientation, anatomical site, and performance.
After the medical entity data of the text data and the corresponding entity tags are obtained, they are converted into medical entity data vector data and entity tag vector data, respectively, both of which are data that a computer device can recognize. Part of the medical entity data vector data and the corresponding entity tag vector data are input into an initial entity recognition model for multiple rounds of iterative training, a first output result of the initial entity recognition model is obtained for each iteration round, and the initial entity recognition model is adjusted according to the first output result of each round to obtain a target entity recognition model. Then, the remaining medical entity data vector data and the corresponding entity tag vector data are input into the target entity recognition model to obtain a second output result. The second output result is matched against a preset result, and if the matching value reaches a preset threshold, the target entity recognition model is used as the entity recognition model. If the matching value does not reach the preset threshold, the model parameters of the initial entity recognition model are adjusted, and/or the number of medical entity data items and corresponding entity tags is adjusted, and/or the medical entity data of the text data and their corresponding entity tags are converted into new medical entity data vector data and new entity tag vector data. The adjusted initial entity recognition model is then iteratively trained again in the manner described above, using the new medical entity data vector data and entity tag vector data, until the entity recognition model is determined.
Further, in the first embodiment of the present application, to make the resulting entity recognition model more accurate in recognizing medical entity data vector data and the corresponding entity tag vector data, training the entity recognition model further includes: obtaining the vector data of each word in the medical entity data of the text data and the entity tag vector data corresponding to the vector data of each word, inputting both into the initial entity recognition model, and performing multiple rounds of iterative training in the manner described above until the entity recognition model is determined. The purpose of this training step is to enable the trained entity recognition model to assign a classification to each word in the electronic medical record.
In the first embodiment of the present application, the BIOES scheme may be used to convert each word in the text data into a tag: B-TYPE marks the beginning of an entity, I-TYPE the middle of an entity, E-TYPE the end of an entity, S-TYPE a single-word entity, and O a non-entity, where TYPE may be any of the entity types described above. The sequences in Table 1 below are obtained through this conversion. After the corresponding data are obtained, the text is converted into vector representations of its characters through a word embedding technique, the vector data of consecutive characters are input into the initial entity recognition model for training, and the entity recognition model is finally obtained.
TABLE 1
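The BIOES conversion described above can be sketched as follows. This is not a reproduction of Table 1; the function name bioes_tags, the tokenization of the dexamethasone example into pieces, and the type names ROUTE, DRUG, VALUE, and UNIT are hypothetical and chosen only so that a multi-token entity appears.

```python
def bioes_tags(tokens, entities):
    """Assign one BIOES tag per token.

    entities is a list of (start, end_exclusive, entity_type) spans over tokens.
    """
    tags = ["O"] * len(tokens)
    for start, end, entity_type in entities:
        if end - start == 1:
            tags[start] = f"S-{entity_type}"          # single-token entity
        else:
            tags[start] = f"B-{entity_type}"          # beginning of the entity
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{entity_type}"          # middle of the entity
            tags[end - 1] = f"E-{entity_type}"        # end of the entity
    return tags

# Hypothetical tokenization; the drug name is split into three pieces on purpose.
tokens = ["oral", "dexa", "metha", "sone", "30", "mg", "."]
entities = [(0, 1, "ROUTE"), (1, 4, "DRUG"), (4, 5, "VALUE"), (5, 6, "UNIT")]
tags = bioes_tags(tokens, entities)
for token, tag in zip(tokens, tags):
    print(token, tag)
```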
After the entity recognition model is obtained, the second vector data corresponding to the category text data are input into the entity recognition model to obtain the entity tags of the second vector data, and the medical entity data are extracted from the category text data according to the entity tags. The entity tags of the second vector data include at least a patient entity tag, a connective entity tag, a site entity tag, a medication name entity tag, a performance entity tag, and a time entity tag. Correspondingly, the entity recognition model recognizing the entity tags of the second vector data includes: the entity recognition model recognizing the entity tag corresponding to the vector data of each word in the second vector data.
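Extracting entities from per-word tags can be sketched as follows, assuming the model emits BIOES-style tags. The function name extract_entities, the sample tokens, and the type names are hypothetical; dropping inconsistent tag runs is a design choice of this sketch, not something the application specifies.

```python
def extract_entities(tokens, tags):
    """Collect (entity_text, entity_type) pairs from a BIOES tag sequence."""
    entities = []
    current, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "S":
            entities.append((token, entity_type))
            current, current_type = [], None
        elif prefix == "B":
            current, current_type = [token], entity_type
        elif prefix in ("I", "E") and current and entity_type == current_type:
            current.append(token)
            if prefix == "E":
                entities.append(("".join(current), entity_type))
                current, current_type = [], None
        else:
            # "O" or an inconsistent tag: discard any partially built entity.
            current, current_type = [], None
    return entities

# Hypothetical model output for a hypothetical tokenization.
tokens = ["oral", "dexa", "metha", "sone", "30", "mg", "."]
tags = ["S-ROUTE", "B-DRUG", "I-DRUG", "E-DRUG", "S-VALUE", "S-UNIT", "O"]
found = extract_entities(tokens, tags)
print(found)
```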
It should be noted that the entity recognition model mainly consists of a BiLSTM-CRF model, which has two parts, BiLSTM and CRF. BiLSTM stands for bidirectional long short-term memory network; it is a deep learning network whose main function is to encode the sequential data input to the model, and it can "memorize" the forward and backward context of the data. CRF stands for conditional random field; it is a model commonly used in named entity recognition tasks and is used as the last layer of the overall model to constrain the output results. The overall structure of the BiLSTM-CRF model is shown in FIG. 3, and FIG. 3 is a schematic diagram of the effect of the labeled electronic medical record according to the first embodiment of the present application.
After the medical entity data are extracted from the category text data, they are composed into medical description data. Specifically, the medical entity words in the medical entity data and the entity tag corresponding to each medical entity word are obtained and then processed by a pre-generated entity relation combination model. The pre-generated entity relation combination model is obtained as follows. Text data of a plurality of electronic medical records and the medical entity data of the text data are obtained, and the medical entity words in the medical entity data and their corresponding entity tags are obtained (as shown in fig. 3); each electronic medical record was entered by a user according to his or her own writing habits in combination with the medical expression format. The medical entity words are then labeled through a preset labeling system to obtain a plurality of medical entity sequence templates, as shown in Table 2 below:
TABLE 2
As shown in Table 2, the medical examination of liver and kidney function contains sub-items such as the ALB, K, and P indexes, which are in essence related to liver and kidney function. Labeling therefore yields a sequence such as [liver and kidney function, ALB, 35, g/L], and from a large number of such labels, together with some algorithmic processing, a medical entity sequence template of the form [examination item, examination sub-item, value, unit] can be summarized. There are multiple medical entity sequence templates of different types. In the embodiment of the application, the form of the medical entity sequence template is as follows:
Here, the left-hand side is understood as a sequence of consecutive tags, and the right-hand side is understood as a sequence indicating which numbered tags together constitute an entity. After the plurality of medical entity sequence templates are obtained, the logical relationships between the entity tags are obtained, and the entity tags are matched with the medical entity sequence templates according to those logical relationships to obtain the entity relation combination model. The entity relation combination model records a large number of medical entity sequence templates.
After labeling, the entity relation combination model may be collected into an entity relation combination model database. After the medical entity words in the medical entity data and their corresponding entity tags are obtained, they are input into the entity relation combination model to obtain standard entity tags and the logical relationships between the standard entity tags. The entity tags are matched against the standard entity tags, and the medical entity words whose entity tags match the standard entity tags are arranged according to the logical relationships to generate the medical description data.
For example, refer to fig. 4, which is a schematic diagram of the relationship model generating process according to the first embodiment of the present application. Two electronic medical records, electronic medical record A and electronic medical record B, are extracted from the electronic medical record library. The basic content of electronic medical record A is "admission check: brain CT shows obstructive hydrocephalus", and its corresponding label is "OOPRAND". The basic content of electronic medical record B is "admission check: head MRI shows a cystic nodular mass", and its corresponding label is also "OOPRAND". The labels in PRAND correspond to site, examination, connective, property, and performance. Accordingly, a relationship template corresponding to this logical relationship can be selected from the template (model) library. Finally, an electronic medical record to be verified (one with the same logical relationship) is passed through the matched relationship template to output the corresponding result. The structure of a relationship template is written as ['NEG', 'SYM', 'DUHao', 'SYM'] @ [[0,3], [0,1], [0,5]]. Here ['NEG', 'SYM', 'DUHao', 'SYM'] represents a segment of continuous text; a template is considered matched if and only if a text sequence can be found in which the type of each entity matches these tags exactly. Not all of the information in a template is useful, so the numbers [[0,3], [0,1], [0,5]] indicate which information in the template is useful. For example, [0,3] indicates that entity 0 (NEG) and entity 3 (SYM) are useful information, and the corresponding NEG and SYM entities are output in the final result.
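The template matching described above can be sketched as follows. The function name match_template and the sample entity words are hypothetical. The sketch uses only the index pairs [0,1] and [0,3], because a four-tag template has no entity 5, and it assumes that the DUHao tag marks a punctuation separator; neither point is confirmed by the application.

```python
def match_template(entities, template_tags, index_pairs):
    """Find contiguous tag runs equal to template_tags and output the paired words.

    entities is a list of (word, tag); index_pairs selects positions inside a match.
    """
    tags = [tag for _, tag in entities]
    width = len(template_tags)
    results = []
    for start in range(len(tags) - width + 1):
        # The template matches only if every tag in the window agrees exactly.
        if tags[start:start + width] == template_tags:
            window = entities[start:start + width]
            for first, second in index_pairs:
                results.append((window[first][0], window[second][0]))
    return results

template_tags = ["NEG", "SYM", "DUHao", "SYM"]
index_pairs = [[0, 1], [0, 3]]

# Hypothetical recognized entities for a phrase such as "no fever, cough".
entities = [("no", "NEG"), ("fever", "SYM"), (",", "DUHao"), ("cough", "SYM")]
pairs = match_template(entities, template_tags, index_pairs)
print(pairs)
```

Pairing the negation with each symptom in the window is what lets a single "no" apply to both items in the enumeration.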
In the first embodiment of the present application, to verify whether the medical description data are suitable, scoring information for the medical description data is obtained, and medical description data that satisfy a preset scoring threshold are determined to be target medical description data; the target medical description data are the medical description data used to generate structured medical record data according to a preset structure. Specifically, the medical description data are converted into third vector data, which are data that a computer device can recognize. A pre-generated evaluation model is then obtained, the third vector data are input into the evaluation model, and the evaluation model outputs the scoring information corresponding to the third vector data; a preset scoring threshold is set, and the medical description data that satisfy it are determined to be the target medical description data. Further, to improve the accuracy of verifying the target medical description data, the method further includes: converting the text of each entity in the target medical description data into a monomer sequence (a sequence of its individual characters); converting the text of each entity in the term library into a standard monomer sequence; computing the ratio of the intersection of the monomer sequence and the standard monomer sequence to their union; and taking the term library entity whose sequence has the highest ratio as the standard word for the entity in the target medical description data.
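The intersection-over-union ratio described above is the Jaccard similarity of two character sets. A minimal sketch follows; the function names and the three-entry term library are hypothetical, and treating each monomer sequence as a set of characters (ignoring order and repetition) is an assumption of the sketch.

```python
def jaccard(text_a, text_b):
    """Ratio of shared characters to all characters in either text."""
    chars_a, chars_b = set(text_a), set(text_b)
    union = chars_a | chars_b
    return len(chars_a & chars_b) / len(union) if union else 0.0

def standard_word(entity_text, term_library):
    """Return the library term with the highest Jaccard ratio to entity_text."""
    return max(term_library, key=lambda term: jaccard(entity_text, term))

# Hypothetical term library entries.
term_library = ["deformity of auricle", "auricle hematoma", "nasal polyp"]
best = standard_word("auricle deformity", term_library)
print(best, jaccard("auricle deformity", best))
```

Because only character sets are compared, a reordered wording of the same term scores 1.0, which is the behavior wanted when normalizing free-text entities to library terms.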
Step S104, based on the medical description data, generating structured medical record data according to a preset structure; the preset structure is a structure conforming to the natural language order.
After the medical description data are obtained, structured medical record data are generated from them according to a preset structure. The preset structure conforms to natural language order, and the natural language order matches the medical expression format. In the first embodiment of the present application, the preset structure is specifically a JSON structure. For example, suppose the electronic medical record is "finding right auricle deformity" and the extracted entity is "right auricle deformity". The name, type, standard word, index position, position, and symptom corresponding to "right auricle deformity" can each be presented in detail in the JSON structure. The advantage of converting to this format is that it is convenient for subsequent computer processing, so that the processing of electronic medical records can be packaged as a standard API (application programming interface) and deployed on a network for users. The structured medical record data are data presented according to the preset structure. No new data are generated when the structured medical record data are generated from the medical description data; throughout the process, the meaningful medical information in the electronic medical record is only extracted and integrated.
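A JSON structure of the kind described above might look like the following sketch. The key names and the values for type, standard word, position, and symptom are hypothetical, since the application lists the fields but does not give their exact names or contents; the index position is computed from the example text.

```python
import json

text = "finding right auricle deformity"
entity = "right auricle deformity"
start = text.find(entity)

# Hypothetical field names and values for the fields listed in the description.
structured = {
    "name": entity,
    "type": "symptom",
    "standard_word": "auricle deformity",
    "index": [start, start + len(entity)],
    "position": "right auricle",
    "symptom": "deformity",
}

output = json.dumps(structured, ensure_ascii=False, indent=2)
print(output)
```

Passing ensure_ascii=False keeps non-ASCII text, such as Chinese medical terms, readable in the serialized output.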
In addition, to improve the accuracy of the medical description data, the method in the first embodiment of the present application further includes: obtaining attribute information from the text data, where the attribute information is information associated with the medical entity data; fusing the attribute information with the medical entity data to form the medical description data; and generating structured medical record data from the medical description data according to the preset structure. The purpose of attribute information extraction is to find other content in the text data that modifies the entity data, rather than relying only on the entity data itself, as in "finding right auricle deformity". For example, suppose the text data of the electronic medical record are "headache and fever occurred without obvious cause on September 3, 2018, and improved after treatment with dexamethasone 30 mgiggt after admission". The time at which dexamethasone was used is September 3, 2018. This is the point of extracting attribute information, because the model cannot by itself associate entity data that are too far apart.
In this embodiment, the method may be applied to a server: after the structured medical record data are generated from the medical description data according to the preset structure, they are sent to a client for display. Alternatively, the method may be applied to a client: after the structured medical record data are generated, they are displayed on the display interface of the client. Furthermore, when the method is applied to a client, after the structured medical record data are generated, a triggering operation on the display interface of the client is obtained, and the structured medical record data are transmitted to a designated client according to the triggering operation; the display interface of the client is provided with an icon associated with the designated client, and the triggering operation is a triggering operation on that icon.
The first embodiment of the application provides a neural-network-based method for extracting text information from an electronic medical record, which includes: obtaining text data of an entered electronic medical record, where the electronic medical record conforms to a medical expression format; classifying the text data to obtain category text data with a category identifier; extracting medical entity data from the category text data and composing the medical entity data into medical description data, where the medical entity data are data with entity meaning in the category text data and the medical description data are data conforming to the medical expression format; and generating structured medical record data from the medical description data according to a preset structure, where the preset structure conforms to natural language order and the natural language order matches the medical expression format. In this method, the text data of the electronic medical record are classified, and medical entity data, which represent the relevant information of the electronic medical record, are extracted from the resulting category text data and composed into medical description data according to the medical expression format. Because the medical description data follow medical principles, the method does not interfere with clinicians writing medical records freely and helps patient information to be entered completely, thereby improving the accuracy of the electronic medical record.
In addition, structured medical record data is generated from the medical description data according to a preset structure. The structured medical record data is the data of the electronic medical record presented according to the preset structure. Because the preset structure conforms to the medical expression format, the resulting structured medical record data complies with medical criteria, which improves its applicability and accuracy. Moreover, because the medical description data follows medical principles, the threshold for acquiring the data can be greatly reduced. This provides a basis for information retrieval and data mining, supports clinical and management decision-making, supports intelligent backfilling of diagnoses, examinations and the like into electronic medical records, reduces the documentation burden on doctors, and promotes the interconnection and exchange of clinical information among systems within a hospital and between hospitals.
Corresponding to the electronic medical record text information extraction method based on the neural network provided in the first embodiment of the present application, a second embodiment of the present application provides an electronic medical record text information extraction system based on the neural network. Since the system embodiment is substantially similar to the first embodiment, the description is relatively simple, and reference is made to the partial description of the first embodiment for relevant points. The system embodiments described below are merely illustrative.
Fig. 5 is a schematic diagram of an electronic medical record text information extraction system based on a neural network according to a second embodiment of the present application. The electronic medical record text information extraction system based on the neural network comprises: a text data obtaining module 501, configured to obtain text data of an entered electronic medical record, where the electronic medical record is an electronic medical record that conforms to a medical expression format; the category text data module 502 is configured to perform classification processing on the text data to obtain category text data with category identification; a data processing module 503, configured to extract medical entity data from the category text data, and compose the medical entity data into medical description data; the medical entity data are data with entity significance in the category text data, and the medical description data are data conforming to a medical speech sentence pattern; an electronic medical record structuring module 504, configured to generate structured medical record data according to a preset structure based on the medical description data; the preset structure is a structure conforming to a natural language order, and the natural language order is matched with a sentence pattern conforming to a medical speech operation.
A third embodiment of the present application provides a data generating method, as shown in fig. 6, and fig. 6 is a flowchart of data generation provided in the third embodiment of the present application. The method comprises the following steps:
step S601, obtaining input text data, wherein the text data is text data conforming to a sentence pattern of a text data application scene.
In this step, the entered text data is obtained from the user's input via mouse and keyboard, and the obtained text data can be displayed on the display interface. The text data conforms to the sentence pattern of the text data application scene, where that sentence pattern refers to the word-order structure a user employs when writing in a specific scene field, following the user's own writing habits and the content of the current scene. Taking a medical scene as the data application scene, its expressions include orientation, site, symptom and the like. For example, for "30 mg of dexamethasone taken orally", the corresponding expressions are route of administration, drug and dose. The data application scene may also be a legal document description scene, a financial document description scene, and so on.
In addition, in order to facilitate the user to enter text data quickly, the text data may be obtained through a trigger operation of a pre-generated term text entry. Specifically, the term text is selected from a preset term library, wherein the term library at least stores the category, name and attribute of the term, different application scenes correspond to different term libraries, and text data under the scenes are stored in each term library. The term library may be generated by collecting historical text data. Then, a triggering operation for text data entry is obtained, and entered text data is obtained according to the term text and the triggering operation. When entering text data, the system generates corresponding term text in advance according to the term library, and after the term text is generated, a user can correspondingly input the text data.
Alternatively, a first triggering operation for text data entry is obtained, and the term text is selected from the preset term library according to the first triggering operation. Then a second triggering operation for text data entry is obtained, and the entered text data is obtained according to the term text and the second triggering operation. The second triggering operation is the operation by which text data is entered according to the term text.
Of course, since the process of text data entry may be multi-level, multi-department, the corresponding text data is also subject to ever-increasing content, for which the triggering operations include initial entry triggering operations and multi-level modification triggering operations. The initial input triggering operation is the triggering operation of inputting text data required by the user facing the first scene, and the text data generated by the initial input triggering operation is initial text data. The multi-stage modification triggering operation is an operation for modifying the initial text data, which is added on the basis of the initial text data. The modification operation is mainly a trigger operation for increasing text data entry.
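The term library described above can be pictured with a minimal sketch. The `Term` record, the `TERM_LIBRARIES` contents and the prefix-based `suggest_terms` lookup are all illustrative assumptions; the application only states that a term library stores at least the category, name and attribute of each term and that different application scenes have different libraries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    category: str   # e.g. an entity type
    name: str       # the term text offered to the user
    attribute: str  # free-form attribute (hypothetical values below)


# One library per application scene (contents are made up for illustration).
TERM_LIBRARIES = {
    "medical": [
        Term("drug_name", "dexamethasone", "glucocorticoid"),
        Term("route_of_administration", "oral", "administration"),
    ],
    "legal": [Term("party", "plaintiff", "role")],
}


def suggest_terms(scene, prefix):
    """Return term names in the scene's library that start with the prefix."""
    library = TERM_LIBRARIES.get(scene, [])
    return [t.name for t in library if t.name.startswith(prefix.lower())]


suggestions = suggest_terms("medical", "Dex")
```

A triggering operation on one of the suggested terms would then insert that term text into the entry field; how the trigger is wired to the interface is outside this sketch.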
Step S602, classifying the text data to obtain category text data with category identification.
After the text data is obtained, a text classification algorithm first divides the entered text data into types to determine its detailed classification. This is done because the extraction rules applied in the subsequent processing flow differ between types of text data.
In this step, after the text data is obtained, it is classified to obtain category text data with category identification. Specifically, the text data is first converted into corresponding first vector data, which is data recognizable by the computer device. Category text data with category identification can then be obtained through a pre-generated classification model. The pre-generated classification model is generated as follows. A plurality of text data and the category identifiers respectively corresponding to the text data are obtained, where the text data is entered by users according to their own writing habits and in combination with the sentence pattern of the text data application scene. The text data and the corresponding category identifiers are converted into text vector data and category vector data, respectively. Part of the text vector data and category vector data, namely the data in the training set, is input into an initial classification model for multiple rounds of iterative training, and a first output result of the initial classification model is obtained for each iteration round; the text vector data and category vector data in the validation set can be used to validate the first output result of each round. The initial classification model is adjusted according to the first output result of each iteration round to obtain the target classification model. After the target classification model is obtained, the remaining text vector data and category vector data, namely the data in the test set, are input into the target classification model to obtain a second output result of the target classification model.
And then, matching the second output result with a preset result, and taking the target classification model as a medical record classification model if the matching value reaches a preset threshold value. If the matching value does not reach the preset threshold value, the model parameters of the initial classification model are adjusted, and/or a plurality of text data and the number of class identifications corresponding to the text data are adjusted, and/or the text vector data and the class vector data are respectively and correspondingly converted into new text vector data and new class vector data, and the new text vector data and the new class vector data are adopted to carry out iterative training on the adjusted initial classification model for multiple times according to the mode until the classification model is determined.
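The data split and the threshold check described above can be sketched as follows. This is only an illustration under stated assumptions: the 80/10/10 split ratios, the reading of the "matching value" as plain accuracy against the preset results, and the names `split_dataset`, `match_value` and `accept_model` are not taken from the application.

```python
import random


def split_dataset(samples, train=0.8, val=0.1, seed=0):
    """Shuffle the samples and cut them into training, validation and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def match_value(predictions, expected):
    """Fraction of predictions that equal the preset (expected) results."""
    hits = sum(p == e for p, e in zip(predictions, expected))
    return hits / len(expected)


def accept_model(predictions, expected, threshold=0.9):
    """True if the matching value reaches the preset threshold."""
    return match_value(predictions, expected) >= threshold


train_set, val_set, test_set = split_dataset(range(100))
score = match_value(["a", "b", "c", "d"], ["a", "b", "c", "x"])
```

If `accept_model` returns `False`, the text above calls for adjusting the model parameters, the amount of labeled data, or the vector conversion, and then training again.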
After the pre-generated classification model is obtained, the first vector data corresponding to the text data is input into the classification model to obtain category features and the category probability data corresponding to each category feature. The target category feature corresponding to the target category probability data with the maximum probability value is then selected from the category probability data, and the classification model outputs category text data with category identification based on the target category feature.
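Selecting the category with the maximum probability value can be sketched in a few lines. The softmax normalization, the example scores and the category names in `CATEGORIES` are assumptions made for illustration; the application does not say how the probabilities are produced.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_category(scores, categories):
    """Return (category, probability) for the highest-probability class."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]


CATEGORIES = ["chief_complaint", "medication", "examination"]  # hypothetical
label, prob = pick_category([0.2, 2.5, 0.1], CATEGORIES)
# Category text data: the original text tagged with its category identification.
tagged = {"category": label, "text": "30 mg of dexamethasone taken orally"}
```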
It should be noted that, in combination with the above, since there is a difference in the extraction rules of different types of text data, before the text data is classified to obtain the category text data with the category identifier, the method further includes: determining whether text data has category identification, if the text data has corresponding category identification, selecting a classification model of a corresponding category for the text data with the category identification; if the text data does not have the corresponding category identification, a classification model is selected for the text data. In other words, before classifying the text data, it is determined whether the text data has been classified, so that a classification model corresponding to the classification is required to classify the text data.
Step S603, extracting entity data from the category text data, and composing the entity data into description data; the entity data refers to data with entity meaning in the category text data, and the description data refers to data conforming to a speaking sentence pattern of a data application scene.
After classifying text data to obtain category text data with category identification, extracting entity data from the category text data, and composing the entity data into description data. The entity data refers to data with entity meaning in the category text data, and the description data refers to data conforming to a speaking sentence pattern of a data application scene. Specific ways of extracting entity data from the category text data and composing the entity data into description data will be described below, respectively.
Specifically, extracting entity data from the category text data includes: and converting the category text data into corresponding second vector data, wherein the second vector data is data which can be identified by the computer equipment. And extracting entity data from the category text data through a pre-generated entity recognition model. The entity recognition model generated in advance is generated by the following steps: and obtaining a plurality of text data and entity data of the text data, wherein the text data is text data which is recorded by a user according to self writing habit and in combination with a speaking sentence pattern of a data application scene, and the entity data refers to data with entity meaning in the category text data. Taking a data application scene as a medical scene as an example, the text data is text data of an electronic medical record, wherein the text data of the electronic medical record is "finding right auricle deformity", and the entity data in the text data of the electronic medical record are "finding", "right side", "auricle" and "deformity". And then, obtaining an entity tag corresponding to the entity data of the text data, wherein the entity tag corresponding to the entity data is a representation of the entity data, and the entity tag can be used for defining the entity type. In the third embodiment of the present application, taking the data application scenario as an example of a medical scenario, entity classification (see fig. 2) may be set according to the I2B2 standard, where entity types are respectively anatomical site, dose, observation object, diet, frequency, property, operation, negation, suspected, drug name, orientation, route of administration, drug specification, performance, condition, time, examination name, unit, value, department, tense, instrument, approach, operation type, affirmation, allergen, patient, place, and connective. 
These entity types cover most of the entity types that commonly occur in electronic medical records. Continuing with the example of "finding right auricle deformity", the corresponding entity labels are affirmation, orientation, anatomical site and performance.
After the entity data of the text data and the corresponding entity tags are obtained, they are converted into entity data vector data and entity tag vector data, respectively, both of which are data recognizable by the computer device. The entity data vector data of the text data and the corresponding entity tag vector data are input into an initial entity recognition model for multiple rounds of iterative training, a first output result of the initial entity recognition model is obtained for each iteration round, and the initial entity recognition model is adjusted according to the first output result of each round to obtain a target entity recognition model. Then the entity data vector data of the remaining part of the text data and the corresponding entity tag vector data are input into the target entity recognition model to obtain a second output result of the target entity recognition model. The second output result is matched with a preset result, and if the matching value reaches a preset threshold, the target entity recognition model is taken as the entity recognition model. If the matching value does not reach the preset threshold, the model parameters of the initial entity recognition model are adjusted, and/or the number of entity data of the plurality of text data and of the corresponding entity tags is adjusted, and/or the entity data vector data of the text data and the corresponding entity tag vector data are converted into entity data vector data of new text data and corresponding new entity tag vector data. The adjusted initial entity recognition model is then iteratively trained multiple times in the manner described above, using the entity data vector data of the new text data and the corresponding entity tag vector data, until the entity recognition model is determined.
Further, in the third embodiment of the present application, to make the obtained entity recognition model recognize the entity data vector data and the corresponding entity tag vector data more accurately, the process of training the entity recognition model further includes: obtaining vector data of each word in the entity data of the text data and the entity tag vector data corresponding to the vector data of each word, inputting them into the initial entity recognition model, and performing multiple rounds of iterative training in the manner described above until the entity recognition model is determined. The purpose of this training step is to enable the trained entity recognition model to determine a classification for each word in the text data.
In the third embodiment of the present application, the BIOES scheme may be used to convert each word in the text data into a tag: B-TYPE marks the beginning of an entity, I-TYPE the middle of an entity, E-TYPE the end of an entity, S-TYPE a single-word entity, and O a non-entity, where TYPE may be any of the entity types described above. After the entity recognition model is obtained, the second vector data corresponding to the category text data is input into the entity recognition model, the entity recognition model recognizes the entity tags of the second vector data corresponding to the category text data, and the entity data is extracted from the category text data according to the entity tags. Correspondingly, recognizing the entity tags of the second vector data includes: the entity recognition model recognizes the entity tag corresponding to the vector data of each word in the second vector data corresponding to the category text data.
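To make the BIOES tagging concrete, the sketch below turns a tagged token sequence back into entity data. The function name `decode_bioes`, the example tokens and the handling of malformed tag sequences (partial spans are dropped) are assumptions for illustration. Pieces of one entity are joined without spaces, which suits character-level tags.

```python
def decode_bioes(tokens, tags):
    """Collect (entity_text, entity_type) pairs from BIOES tags."""
    entities, buffer, current = [], [], None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            buffer, current = [], None
            continue
        prefix, _, etype = tag.partition("-")
        if prefix == "S":                      # single-token entity
            entities.append((token, etype))
            buffer, current = [], None
        elif prefix == "B":                    # start a new entity span
            buffer, current = [token], etype
        elif prefix in ("I", "E") and current == etype:
            buffer.append(token)
            if prefix == "E":                  # span is complete
                entities.append(("".join(buffer), etype))
                buffer, current = [], None
        else:
            buffer, current = [], None         # malformed: drop the partial span
    return entities


tokens = ["oral", "dexa", "metha", "sone", "30", "mg"]
tags = ["S-route_of_administration", "B-drug_name", "I-drug_name",
        "E-drug_name", "S-dose", "S-unit"]
entities = decode_bioes(tokens, tags)
```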
It should be noted that the entity recognition model is mainly composed of a BiLSTM-CRF model, which has two parts: BiLSTM and CRF. BiLSTM stands for bidirectional long short-term memory network. It is a deep learning network whose main function is to encode the sequential data fed into the model, and it can "memorize" the forward and backward context of the data. CRF stands for conditional random field, a model commonly used in named entity recognition tasks; it is used as the last layer of the overall model to constrain the output results. The overall structure of the BiLSTM-CRF model is shown in FIG. 2.
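How a CRF layer constrains the output can be illustrated with Viterbi decoding over per-token scores. This is a sketch under assumptions, not the model of the application: the emission scores stand in for BiLSTM outputs, and the transition penalties are set by hand here, whereas a trained CRF would learn them.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag path.

    emissions: one {tag: score} dict per token (stand-in for BiLSTM output).
    transitions: {(prev_tag, tag): score}; missing pairs score 0.0.
    """
    best = [{t: emissions[0][t] for t in tags}]
    back = []
    for scores in emissions[1:]:
        column, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, t), 0.0))
            column[t] = best[-1][prev] + transitions.get((prev, t), 0.0) + scores[t]
            pointers[t] = prev
        best.append(column)
        back.append(pointers)
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):        # follow back-pointers to the start
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["O", "B-drug", "E-drug"]
# Hand-set penalties for tag pairs that are invalid under BIOES.
FORBIDDEN = [("B-drug", "O"), ("B-drug", "B-drug"),
             ("O", "E-drug"), ("E-drug", "E-drug")]
TRANSITIONS = {pair: -1e4 for pair in FORBIDDEN}
EMISSIONS = [
    {"O": 0.1, "B-drug": 1.0, "E-drug": 0.2},
    {"O": 1.2, "B-drug": 0.1, "E-drug": 1.0},
]

greedy = [max(TAGS, key=e.get) for e in EMISSIONS]   # per-token choice only
decoded = viterbi(EMISSIONS, TRANSITIONS, TAGS)      # sequence-level choice
```

Picking the best tag per token yields a B tag followed directly by O, which is not a valid BIOES sequence; decoding with the transition scores yields a consistent B then E pair instead.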
After extracting entity data from the category text data, composing the entity data into description data. Specifically, entity words and entity tags corresponding to the entity words in the entity data are obtained, and then the entity words and the entity tags corresponding to the entity words are processed through a pre-generated entity relation combination model. The entity relation combination model which is generated in advance is obtained by the following steps: obtaining a plurality of text data and entity data of the text data, and obtaining entity words in the entity data and entity tags corresponding to the entity words; and labeling the entity words through a preset labeling system to obtain a plurality of entity sequence templates. After a plurality of entity sequence templates are obtained, logic relations among the entity labels are obtained, and the entity labels are matched with the entity sequence templates according to the logic relations, so that an entity relation combination model is obtained.
After labeling the entity-relationship combination model, the entity-relationship combination model may be collected into an entity-relationship combination model database. After obtaining entity words and entity labels corresponding to the entity words in the entity data, inputting the entity words and the entity labels corresponding to the entity words into the entity relation combination model to obtain standard entity labels and logic relations among the standard entity labels, matching the entity labels with the standard entity labels, and arranging the entity words corresponding to the entity labels matched with the standard entity labels according to the logic relations to generate description data.
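The matching of entity labels against entity sequence templates can be sketched as follows. The template contents, the rule of using the first template whose labels are all present, and the name `compose_description` are illustrative assumptions; the application does not specify the matching procedure at this level of detail.

```python
# Hypothetical entity sequence templates: each lists standard entity labels
# in the order dictated by their logical relationship.
TEMPLATES = [
    ("affirmation", "orientation", "anatomical_site", "performance"),
    ("route_of_administration", "drug_name", "dose", "unit"),
]


def compose_description(entities, templates=TEMPLATES):
    """Arrange entity words in template order.

    entities: list of (word, label) pairs in any order.
    Returns the ordered words for the first template whose labels are all
    present, or None if no template matches.
    """
    by_label = {}
    for word, label in entities:
        by_label.setdefault(label, []).append(word)
    for template in templates:
        if all(label in by_label for label in template):
            return [by_label[label][0] for label in template]
    return None


extracted = [("deformity", "performance"), ("auricle", "anatomical_site"),
             ("right", "orientation"), ("finding", "affirmation")]
description = compose_description(extracted)
```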
In the first embodiment of the present application, in order to verify whether the description data is suitable, it is necessary to obtain scoring information of the description data, and the description data satisfying a preset scoring threshold is determined as target description data; the target description data is used for generating target data according to a preset structure, and specifically, the description data is converted into third vector data, wherein the third vector data is data which can be identified by computer equipment. And then, a pre-generated evaluation model is obtained, third vector data corresponding to the description data is input into the evaluation model, the evaluation model outputs grading information corresponding to the third vector data, a preset grading threshold value is set, and the description data meeting the preset grading threshold value is determined to be target description data. Further, to improve accuracy of verification of the target description data, the method further includes: converting the single text corresponding to each entity in the target description data into a monomer sequence, converting the single text corresponding to each entity in the term library into a standard monomer sequence, obtaining the ratio of the intersection of the monomer sequence and the standard monomer sequence to the union of the monomer sequence and the standard monomer sequence, obtaining the entity corresponding to the sequence with the highest ratio, and taking the entity corresponding to the sequence with the highest ratio as the standard word of the entity in the target description data.
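The ratio of intersection to union described above is the Jaccard similarity. The sketch below assumes that a "monomer sequence" is the set of individual characters of an entity; that reading, the example term library and the misspelled input are assumptions for illustration.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two character sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def standard_word(entity, term_library):
    """Return the library term whose character set is most similar to the entity."""
    return max(term_library, key=lambda term: jaccard(entity, term))


LIBRARY = ["dexamethasone", "prednisone", "aspirin"]  # hypothetical term library
normalized = standard_word("dexamethazone", LIBRARY)
```

Note that `standard_word` raises `ValueError` on an empty term library; a production version would need a fallback and probably a minimum similarity below which no standard word is assigned.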
Step S604, generating target data according to a preset structure based on the description data; the preset structure is a structure conforming to a natural language order, and the natural language order is matched with a sentence pattern conforming to a data application scene.
After the description data is obtained, generating target data according to a preset structure based on the description data; the preset structure is a structure conforming to a natural language order, and the natural language order is matched with a sentence pattern conforming to a data application scene. In the third embodiment of the present application, the preset structure is a structure conforming to the natural language order, specifically, a JSON structure. The target data is data which is displayed according to a preset structure by information in the input text data, no new data is generated in the process of generating the target data according to the preset structure based on the description data, and meaningful information in the input text data is extracted and integrated in the whole process.
The name, type, standard word, index position, position and symptom corresponding to the right auricle deformity can be respectively shown in detail as a JSON structure, and the conversion into the format has the advantage of being convenient for subsequent computer processing, so that the standard API can be packaged for deployment to a network for users.
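As a rough illustration of such a JSON structure, the sketch below builds one record for the "right auricle deformity" example. The field names, the field values and the helper `to_structured` are assumptions; the application names the kinds of information shown but does not give the exact keys.

```python
import json


def to_structured(text, name, entity_type, standard_word, position, symptom):
    """Build one structured record; the index is the character span in the text."""
    start = text.index(name)  # raises ValueError if the name is not in the text
    return {
        "name": name,
        "type": entity_type,
        "standard_word": standard_word,
        "index": [start, start + len(name)],
        "position": position,
        "symptom": symptom,
    }


text = "finding right auricle deformity"
record = to_structured(text, "right auricle deformity", "performance",
                       "auricle deformity", "right auricle", "deformity")
payload = json.dumps(record, ensure_ascii=False, indent=2)
```

Because every value is copied or derived from the entered text, no new information is created, which matches the statement that generating the target data only extracts and integrates what is already present.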
In addition, to improve the accuracy of the description data, in the third embodiment of the present application the method further includes: obtaining attribute information in the text data, where the attribute information is information associated with the entity data; fusing the attribute information with the entity data to form the description data; and generating target data according to a preset structure based on the description data. The purpose of attribute information extraction is to find other content in the text data that modifies a given piece of entity data, which matters for texts longer than a simple "finding right auricle deformity". For example, in electronic medical record text stating that headache and fever appeared on September 3, 2018 with no obvious cause and improved after treatment with dexamethasone ("30 mgiggt" in the record) following admission, the time of use of the dexamethasone is September 3, 2018. Establishing this kind of link is the point of extracting attribute information, because the model cannot associate entity data that lie too far apart.
The text data is classified, entity data is extracted from the processed classified text data, the entity data is data of text data related information, and the entity data is composed into description data according to a speech sentence pattern of a data application scene. The description data follows the principle of speaking in the application scene, does not influence the free writing of text data by a user, and is beneficial to the integrity of text information input, thereby improving the accuracy of the text data.
The fourth embodiment of the present application provides a data generating apparatus corresponding to the data generating method provided by the third embodiment of the present application. Since the apparatus embodiment is substantially similar to the third embodiment, the description is relatively simple, and reference is made to the description of the third embodiment for relevant points. The apparatus embodiment described below is merely illustrative.
Fig. 7 is a schematic diagram of a data generating apparatus according to a fourth embodiment of the present application. The data generation device comprises: the text data obtaining unit 701 is configured to obtain entered text data, where the text data is text data conforming to a sentence pattern of a text data application scene. A category text data obtaining unit 702, configured to perform a classification process on the text data to obtain category text data with category identification. A data processing unit 703 for extracting entity data from the category text data and composing the entity data into description data; the entity data refers to data with entity meaning in the category text data, and the description data refers to data conforming to a speaking sentence pattern of a data application scene. A target data generating unit 704, configured to generate target data according to a preset structure based on the description data; the preset structure is a structure conforming to a natural language order, and the natural language order is matched with a sentence pattern conforming to a data application scene.
The fifth embodiment of the present application provides an electronic device, corresponding to the electronic medical record text information extraction method based on the neural network provided in the first embodiment of the present application and the data generation method provided in the third embodiment. Fig. 8 is a schematic diagram of an electronic device according to a fifth embodiment of the present application. The electronic device includes: a processor 801; a memory 802 for storing a computer program that is executed by a processor to perform the electronic medical record text information extraction method based on a neural network provided by the first embodiment and the data generation method provided by the third embodiment.
Corresponding to the electronic medical record text information extraction method based on a neural network provided by the first embodiment of the present application and the data generation method provided by the third embodiment, the sixth embodiment of the present application provides a computer storage medium storing a computer program that is executed by a processor to perform those two methods.
While the preferred embodiment has been described, it is not intended to limit the invention thereto, and any person skilled in the art may make variations and modifications without departing from the spirit and scope of the present invention, so that the scope of the present invention shall be defined by the claims of the present application.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Claims (10)
1. The electronic medical record text information extraction method based on the neural network is characterized by comprising the following steps of:
obtaining text data of the input electronic medical record;
classifying the text data to obtain category text data with category identification;
extracting medical entity data from the category text data, and forming the medical entity data into medical description data;
generating structured medical record data according to a preset structure based on the medical description data; the preset structure is a structure conforming to the natural language order.
2. The method for extracting text information from electronic medical records based on neural network according to claim 1, wherein the classifying the text data to obtain category text data with category identification comprises:
converting the text data into corresponding first vector data;
obtaining a medical record classification model which is generated in advance, and inputting first vector data corresponding to the text data into the medical record classification model so as to obtain category characteristics and category probability data corresponding to the category characteristics;
screening target category characteristics corresponding to target category probability data with the maximum probability value from the category probability data;
based on the target category characteristics, category text data with category identification is output.
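The selection step of claim 2 can be sketched in Python as follows. The sketch assumes the medical record classification model has already produced a probability for each category; the category names, the example probabilities, and the function names are hypothetical, and the model itself is not reproduced here.

```python
from typing import Dict, Tuple


def select_target_category(category_probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the category with the maximum probability value."""
    if not category_probs:
        raise ValueError("category_probs must not be empty")
    target = max(category_probs, key=category_probs.get)
    return target, category_probs[target]


def attach_category(text: str, category_probs: Dict[str, float]) -> Dict[str, object]:
    """Output category text data: the text together with its category identification."""
    category, probability = select_target_category(category_probs)
    return {"category": category, "probability": probability, "text": text}


# Hypothetical model output for one piece of medical record text.
example_probs = {
    "chief_complaint": 0.72,
    "medication_order": 0.21,
    "past_history": 0.07,
}
example_output = attach_category("Cough and fever for three days.", example_probs)
```

If two categories tie for the maximum, `max` returns the first one encountered; the claim does not state how ties should be handled.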
3. The electronic medical record text information extraction method based on the neural network according to claim 2, wherein, before the text data is classified to obtain category text data with category identification, the method further comprises:
determining whether the text data has a category identification;
if the text data has the corresponding category identification, selecting a medical record classification model of the corresponding category for the text data with the category identification;
and if the text data does not have the corresponding category identification, selecting a corresponding medical record classification model for the text data from preset candidate medical record classification models.
4. The electronic medical record text information extraction method based on the neural network according to claim 2, wherein the medical record classification model generated in advance is generated by:
obtaining text data of a plurality of electronic medical records and the category identifiers respectively corresponding to the text data; each electronic medical record is one entered by a user according to the user's own writing habits in combination with a medical expression format;
converting the text data and the category identifications corresponding to the text data into text vector data and category vector data respectively;
inputting part of the text vector data and the corresponding category vector data into an initial medical record classification model to perform iterative training for multiple rounds, and obtaining a first output result of the initial medical record classification model for each iterative round;
correspondingly adjusting the initial medical record classification model according to a first output result of the initial medical record classification model of each iteration round to obtain a target medical record classification model;
inputting the rest of the text vector data and the corresponding category vector data into the target medical record classification model to obtain a second output result of the target medical record classification model;
matching the second output result with a preset result, and taking the target medical record classification model as the medical record classification model if the matching value reaches a preset threshold; if the matching value does not reach the preset threshold, adjusting the model parameters of the initial medical record classification model, and/or adjusting the number of text data of the plurality of electronic medical records and of the category identifiers corresponding to the text data, and/or converting the text data and the category identifiers corresponding to the text data into new text vector data and new category vector data respectively, and repeating the multi-round iterative training on the adjusted initial medical record classification model in the manner described above, using the new text vector data and the new category vector data, until the medical record classification model is determined.
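The training procedure of claim 4 can be sketched in Python as below. This is a minimal sketch under stated assumptions: the "matching value" is interpreted as accuracy on the held-out part of the data, the adjustment after a failed attempt is represented by passing the attempt number to a hypothetical `make_model` factory, and a nearest-centroid toy classifier stands in for the neural network. None of these choices is specified in the application.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (text vector, category index)
Model = Dict[int, List[float]]        # toy model: one centroid per category


def train_until_threshold(
    samples: List[Sample],
    make_model: Callable[[int], Model],
    train_round: Callable[[Model, List[Sample]], Model],
    predict: Callable[[Model, Sequence[float]], int],
    threshold: float = 0.9,
    train_fraction: float = 0.8,
    rounds: int = 5,
    max_attempts: int = 3,
) -> Optional[Model]:
    """Train on part of the data, evaluate on the rest, retry if below threshold."""
    split = int(len(samples) * train_fraction)
    train_set, held_out = samples[:split], samples[split:]
    if not train_set or not held_out:
        raise ValueError("need samples for both training and evaluation")

    for attempt in range(max_attempts):
        model = make_model(attempt)            # adjusted initial model
        for _ in range(rounds):                # multiple iterative rounds
            model = train_round(model, train_set)
        correct = sum(predict(model, x) == y for x, y in held_out)
        match_value = correct / len(held_out)  # assumption: accuracy
        if match_value >= threshold:
            return model
    return None  # no attempt reached the preset threshold


# Toy stand-in for the classifier (nearest centroid), for illustration only.
def make_model(attempt: int) -> Model:
    return {}


def train_round(model: Model, train_set: List[Sample]) -> Model:
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vec, label in train_set:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model: Model, vec: Sequence[float]) -> int:
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], vec)))


example_samples: List[Sample] = [
    ([0.0, 0.1], 0), ([1.0, 0.9], 1), ([0.1, 0.0], 0), ([0.9, 1.0], 1),
    ([0.2, 0.1], 0), ([0.8, 0.9], 1), ([0.0, 0.2], 0), ([1.0, 0.8], 1),
    ([0.1, 0.1], 0), ([0.9, 0.9], 1),
]
trained = train_until_threshold(example_samples, make_model, train_round, predict)
```

Returning `None` when no attempt reaches the threshold is a design choice of the sketch; the claim instead repeats the adjustment until a model is determined.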
5. The electronic medical record text information extraction method based on the neural network according to claim 1, wherein the extracting medical entity data from the category text data comprises:
converting the category text data into corresponding second vector data;
obtaining a pre-generated entity recognition model, and inputting second vector data corresponding to the category text data into the entity recognition model to obtain an entity tag of the second vector corresponding to the category text data;
and extracting medical entity data from the category text data according to the entity tag.
6. The method for extracting text information from electronic medical records based on neural networks according to claim 5, wherein the obtaining the entity tag of the second vector corresponding to the category text data comprises: obtaining the entity tag corresponding to the vector data of each text in the second vector corresponding to the category text data.
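Claims 5 and 6 describe obtaining an entity tag for each text unit and then extracting medical entity data according to those tags. The Python sketch below shows one common way to do this, assuming a BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity). The BIO scheme, the tag names, and the whitespace joining of tokens are assumptions; the application does not state which tagging scheme the entity recognition model uses.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-token BIO tags into (entity_type, entity_text) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity being built, if any.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open entity,
            # is treated as outside any entity.
            flush()
    flush()
    return entities


# Hypothetical tagger output for a short medication order.
example_tokens = ["dexamethasone", "5", "mg", "once", "daily"]
example_tags = ["B-DRUG", "B-DOSE", "I-DOSE", "B-FREQ", "I-FREQ"]
example_entities = extract_entities(example_tokens, example_tags)
```

For Chinese text the tokens would typically be single characters joined without spaces; the sketch joins with spaces only to keep the English example readable.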
7. An electronic medical record text information extraction system based on a neural network is characterized by comprising:
the text data acquisition module is used for acquiring text data of the input electronic medical record;
the category text data module is used for classifying the text data to obtain category text data with category identification;
the data processing module is used for extracting medical entity data from the category text data and forming the medical entity data into medical description data;
the electronic medical record structuring module is used for generating structured medical record data according to a preset structure based on the medical description data; the preset structure is a structure conforming to the natural language order.
8. A data generation method, comprising:
acquiring input text data, wherein the text data is text data conforming to a speech sentence pattern of a data application scene;
classifying the text data to obtain category text data with category identification;
extracting entity data from the category text data, and forming the entity data into description data; the entity data are data with entity meaning in the category text data, and the description data are data conforming to the speech sentence pattern of the data application scene;
generating target data according to a preset structure based on the description data; the preset structure is a structure conforming to a natural language order, and the natural language order matches the speech sentence pattern of the data application scene.
9. An electronic device, the electronic device comprising: a processor; a memory for storing a computer program to be run by a processor for performing the method of any one of claims 1-7, 8.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program, which is executed by a processor, for performing the method of any of claims 1-7, 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211031069.1A CN117672440A (en) | 2022-08-26 | 2022-08-26 | Electronic medical record text information extraction method and system based on neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211031069.1A CN117672440A (en) | 2022-08-26 | 2022-08-26 | Electronic medical record text information extraction method and system based on neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117672440A (en) | 2024-03-08 |
Family
ID=90068555
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211031069.1A Pending CN117672440A (en) | 2022-08-26 | 2022-08-26 | Electronic medical record text information extraction method and system based on neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117672440A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118132753A (en) * | 2024-05-08 | 2024-06-04 | 奇点数联(北京)科技有限公司 | Acquisition system of medical record text corresponding label |
CN119444420A (en) * | 2024-12-24 | 2025-02-14 | 证通股份有限公司 | A compliance task generation method and system based on natural language processing |
CN119740558A (en) * | 2025-03-05 | 2025-04-01 | 中国人民解放军总医院 | Method, system, equipment and storage medium for extracting electronic medical record information |
- 2022-08-26 CN CN202211031069.1A patent/CN117672440A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20240078448A1 (en) | Prognostic score based on health information | |
US20190347269A1 (en) | Structured report data from a medical text report | |
US11610678B2 (en) | Medical diagnostic aid and method | |
CN107644011B (en) | System and method for fine-grained medical entity extraction | |
US11791048B2 (en) | Machine-learning-based healthcare system | |
US20200334416A1 (en) | Computer-implemented natural language understanding of medical reports | |
EP3827442A1 (en) | Deep learning-based diagnosis and referral of diseases and disorders using natural language processing | |
Wang et al. | Continuous patient-centric sequence generation via sequentially coupled adversarial learning | |
CN112015917A (en) | Data processing method and device based on knowledge graph and computer equipment | |
CN117672440A (en) | Electronic medical record text information extraction method and system based on neural network | |
JP7704731B2 (en) | Deep Learning Architectures for Analysing Unstructured Data | |
US10936962B1 (en) | Methods and systems for confirming an advisory interaction with an artificial intelligence platform | |
US11527312B2 (en) | Clinical report retrieval and/or comparison | |
CN113724830A (en) | Medicine taking risk detection method based on artificial intelligence and related equipment | |
Braddon et al. | Exploring the utility of synthetic data to extract more value from sensitive health data assets: A focused example in perinatal epidemiology | |
Memarzadeh et al. | A study into patient similarity through representation learning from medical records | |
CN116762133A (en) | Decomposed feature representation for analyzing the content and style of radiology reports | |
CN112071431B (en) | Clinical path automatic generation method and system based on deep learning and knowledge graph | |
CN118428460A (en) | A method, device and system for constructing an interstitial lung disease knowledge base | |
CN117633209A (en) | Method and system for patient information summary | |
US20200118660A1 (en) | Summarization of clinical documents with end points thereof | |
EP3937105A1 (en) | Methods and systems for user data processing | |
Almuhana et al. | Classification of specialities in textual medical reports based on natural language processing and feature selection | |
Bhasin et al. | Early Detection of Alzheimer’s Disease using Medical Data | |
Li | Early diagnosis of alzheimer's disease using hybrid word embedding and linguistic characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||