CN112487816B - Named entity identification method based on network classification - Google Patents
Named entity identification method based on network classification
- Publication number: CN112487816B
- Application number: CN202011472395.7A
- Authority
- CN
- China
- Prior art keywords
- named entity
- individual
- sample
- classification
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Abstract
The invention discloses a named entity identification method based on network classification. Model training comprises the following steps: step 1: input named entity training sample text data and convert the text data into vector data; step 2: preprocess the named entity training sample data; step 3: construct a classification network and train the named entity recognition model by iteratively selecting part of the samples. Named entity identification comprises: step 4: input the named entity sample data to be identified; step 5: preprocess the named entity sample data to be identified; step 6: identify the sample data through the named entity classification model and judge the category of the named entity to which it belongs. The invention can rapidly and effectively extract the key attributes of named entities from massive texts and identify their categories, improves the efficiency of named entity recognition, and provides a basis for information extraction, question-answering systems, syntactic analysis, machine translation and the like.
Description
Technical Field
The invention relates to the fields of natural language processing technology and named entity recognition, in particular to a named entity recognition method based on network classification.
Background
Named entity recognition (Named Entity Recognition, NER for short), also known as "proper noun recognition", refers to recognizing entities with specific meaning in text, mainly including person names, place names, organization names, proper nouns, and the like. It generally comprises two parts: (1) entity boundary identification; (2) entity category determination (person name, place name, organization name, or other). NER is a fundamental key task in NLP. From the perspective of the natural language processing pipeline, NER can be regarded as part of unregistered-word recognition in lexical analysis; unregistered words are the most numerous, the hardest to recognize, and have the greatest influence on word segmentation quality. Meanwhile, NER is also the basis of many NLP tasks such as relation extraction, event extraction, knowledge graph construction, machine translation, and question-answering systems.
Named entity recognition and the information extraction tasks built on it are urgently needed in practical production, but named entities are unbounded in number, flexible in word formation and fuzzy in category, which makes them difficult to recognize. Conventional classification algorithms only consider the physical characteristics of the data (e.g., similarity, distance, distribution) and ignore its semantic characteristics (e.g., the contextual semantic information that may exist in text).
Traditional classification learning methods, such as SVM and some other network-based classification algorithms, require all training data in practical implementations, and the noise present in massive data reduces the recognition efficiency of named entities.
Disclosure of Invention
The invention provides a named entity identification method based on network classification, which constructs a classification network from a selected subset of named entity samples and uses it to identify the named entity samples to be detected, thereby improving the recognition efficiency of named entities and providing technical support for information extraction, question-answering systems, syntactic analysis, machine translation and the like.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the invention relates to a named entity identification method based on network classification, which is characterized by comprising the following steps:
step one: named entity classification model training:
step 1.1: text data of T named entity samples are obtained and converted into vector data ψ= ((x) using Word2Vec natural language processing tool 1 ,y 1 ),(x 2 ,y 2 ),…,(x t ,y t ),…,(x T ,y T )),(x t ,y t ) Represents the t thVector data of named entity samples, where x t Representing attribute features of the t-th named entity sample, an Representing the attribute characteristics of the d-th named entity in the t-th named entity sample; y is t A label representing a T-th named entity sample, t=1, 2, …, T;
step 1.2: for the attribute characteristics x of the t named entity sample t Performing standardization processing to obtain feature vectors of the t named entity samples Representing the d-th feature about the named entity in the t-th named entity sample;
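The standardization in step 1.2 is not spelled out in the text; the sketch below assumes ordinary z-score normalization of the numeric attribute features (the function name and the use of NumPy are illustrative, not from the patent):

```python
import numpy as np

def standardize(X):
    """Z-score standardize each attribute column so every feature has
    zero mean and unit variance (one common reading of step 1.2)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0   # constant features: keep centered values at 0
    return (X - mu) / sigma
```

Any other feature scaling (e.g., min-max) would fit the wording equally well; z-score is simply the most common choice.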
step 1.3: construct two objective functions f_1 and f_2 using equations (1) and (2), both of which are to be minimized:
min f_1 = Rr(V_s) (1)
min f_2 = 1 − Acc(Net(V_s)) (2)
In equation (1), V_s is the vector data selected from the T vector data Ψ, and Rr(V_s) is the proportion of the selected vector data V_s in the T vector data Ψ; in equation (2), Net(V_s) is the classification network constructed from the selected vector data V_s, and Acc(Net(V_s)) is the classification accuracy of the network Net(V_s);
step 1.4: take S candidate sets of named entity sample vector data as the initial population P = {p_1, …, p_S}, where p_s, the s-th candidate set of named entity sample vector data, is one individual;
encode the initial population P with binary codes of length T: if the t-th bit of the binary code of an individual p_s is 1, the t-th named entity sample x_t is selected and used to construct the classification network Net;
step 1.5: define the current iteration number as n and the maximum iteration number as N, and initialize n = 1; take the initial population P as the parent population P^n of the nth iteration;
step 1.6: randomly select two individuals p_x and p_y from the parent population P^n of the nth iteration by binary tournament and construct the classification networks Net(p_x) and Net(p_y) respectively; if the accuracy of Net(p_x) is higher than that of Net(p_y), obtain from P^n all individuals whose accuracy is higher than that of Net(p_y) and randomly select one individual p_z from them; cross-mutate p_y and p_z to obtain the mutated individuals p'_y and p'_z; from p_y, p'_y and p'_z, select the individual whose classification network has the highest accuracy to replace p_y; finally, cross-mutate the replaced p_y with p_x to generate the offspring P'^n of the nth iteration;
step 1.7: merge the parent population P^n and the offspring P'^n of the nth iteration to obtain the merged population of the nth iteration, and obtain the importance IMP(p_n) of any individual p_n in the merged population using equation (3):
IMP(p_n) = α × Acc(p_n) + (1 − α) × (−Red(p_n)) (3)
In equation (3), α is a trade-off factor, Acc(p_n) is the accuracy of individual p_n, and Red(p_n) is the redundancy of individual p_n:
Red(p_n) = (a_1 × b_1 + a_2 × b_2 + … + a_i × b_i + … + a_m × b_m) / m (4)
In equation (4), m is the number of individuals in the merged population of the nth iteration other than p_n; a_i is the redundancy in the source space between p_n and the ith individual other than p_n, obtained by dividing the number of identical named entity samples selected by both p_n and the ith individual by T, i ∈ {1, …, m}; b_i is the redundancy between p_n and the ith individual in the accuracy target space, obtained by equation (5):
b_i = 1 − |Acc(i) − Acc(p_n)| (5)
In equation (5), Acc(i) is the accuracy of the classification network constructed by the ith individual and Acc(p_n) is the accuracy of the classification network constructed by p_n;
step 1.8: rank all individuals p_n in the merged population of the nth iteration by the importance obtained from equation (3) and select the top S individuals as the parent population P^(n+1) of the (n+1)th iteration;
step 1.9: assign n+1 to n and judge whether n > N; if so, select the vector data of the named entity samples corresponding to the individual with the highest classification-network accuracy in the parent population of the nth iteration, use it to construct the optimal network classifier, and execute step two; otherwise, return to step 1.6;
step two: named entity identification:
step 2.1: input the text data of a named entity sample to be identified and process it according to steps 1.1 and 1.2 to obtain the feature vector of the sample to be identified;
step 2.2: classify the feature vector of the sample to be identified with the optimal network classifier; the obtained label indicates the named entity category corresponding to the sample.
The named entity recognition method based on network classification is further characterized in that the classification network Net is constructed as the k-associated optimal graph under the Euclidean distance, as follows:
for the feature vectors, obtain the Euclidean distance d_ti between the feature vector of the t-th named entity sample and that of the i-th named entity sample using equation (6), and connect each sample to the k nearest named entities of the same category, thereby forming the classification network:
d_ti = √( Σ_{d=1}^{D} (x̄_t^d − x̄_i^d)^2 ) (6)
In equation (6), x̄_t^d is the d-th feature of the feature vector of the t-th named entity sample.
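The k-associated optimal graph construction can be sketched as follows, in a simplified reading where each sample is linked to its k nearest same-category neighbours under the Euclidean distance of equation (6) (function and variable names are illustrative):

```python
import numpy as np

def k_associated_graph(X, y, k=2):
    """Link each sample to its k nearest neighbours of the same class
    under Euclidean distance; returns adjacency as {index: [indices]}."""
    X = np.asarray(X, dtype=float)
    edges = {}
    for t in range(len(X)):
        same = [i for i in range(len(X)) if i != t and y[i] == y[t]]
        dists = [(np.linalg.norm(X[t] - X[i]), i) for i in same]
        edges[t] = [i for _, i in sorted(dists)[:k]]
    return edges
```

The classifier built on this graph then decides which category's component a sample to be identified attaches to; the patent leaves that decision rule to the constructed network.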
Compared with the prior art, the invention has the beneficial effects that:
1. the invention is different from the traditional classification method, provides a named entity identification method based on network classification, comprehensively considers the physical and semantic characteristics of named entity sample data, and constructs a classification network by screening and training the named entity sample data, so that noise points are removed, and the named entity can be identified more efficiently.
2. The present invention defines two objectives, the number of samples in the selected named entity recognition sample set and the classification accuracy of the constructed network, and formulates sample selection as an optimization problem over both; by optimizing them jointly, high-quality named entity sample data is selected and a classification network with a better classification effect is constructed, improving the performance and accuracy of named entity recognition.
3. In the iterative process, the method adopts a precision-preference solution strategy: low-precision named entity recognition sample sets are guided by higher-precision ones to produce better offspring, which effectively improves the quality of the constructed classification network, so that the classifier finally used for named entity recognition has a better classification effect and higher recognition accuracy.
4. In selecting the next-generation named entity recognition sample sets, the method adopts an importance-based selection strategy: all candidate sample sets are ranked by importance and the best ones enter the next generation, which guarantees continuous improvement over the iterative process and gives the final named entity recognition classifier a better classification effect and better performance.
Drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed Description
In this embodiment, a named entity recognition method based on network classification includes a named entity classification model training step and a named entity recognition step, specifically, as shown in fig. 1, the method is performed according to the following steps:
step one: named entity classification model training:
step 1.1: taking person-name recognition as an example, obtain text data of T named entity samples and convert it into vector data Ψ = ((x_1, y_1), (x_2, y_2), …, (x_t, y_t), …, (x_T, y_T)) using the Word2Vec natural language processing tool, where (x_t, y_t) is the vector data of the t-th named entity sample, x_t = (x_t^1, …, x_t^d, …, x_t^D) represents the attribute features of the t-th named entity sample, and x_t^d is the d-th attribute feature describing the t-th person name; common attributes include date of birth, native place, height, weight, nickname, main contribution and the like; y_t, the label of the t-th named entity sample, marks the category the named entity belongs to, here a person name; the named entity recognition problem is thus converted into a multi-classification problem, with y_t representing the person name described by the t-th named entity sample, t = 1, 2, …, T;
step 1.2: standardize the attribute features x_t of the t-th named entity sample to obtain its feature vector x̄_t = (x̄_t^1, …, x̄_t^d, …, x̄_t^D), where x̄_t^d is the d-th standardized feature of the t-th named entity sample;
step 1.3: construct two objective functions f_1 and f_2 using equations (1) and (2), both of which are to be minimized:
min f_1 = Rr(V_s) (1)
min f_2 = 1 − Acc(Net(V_s)) (2)
In equation (1), V_s is the vector data selected from the T vector data Ψ, and Rr(V_s) is the proportion of the selected vector data V_s in the T vector data Ψ; in equation (2), Net(V_s) is the classification network constructed from the selected vector data V_s, and Acc(Net(V_s)) is the classification accuracy of the network Net(V_s);
step 1.4: take S candidate sets of named entity sample vector data as the initial population P = {p_1, …, p_S}, where p_s, the s-th candidate set of named entity sample vector data, is one individual;
encode the initial population P with binary codes of length T: if the t-th bit of the binary code of an individual p_s is 1, the t-th named entity sample x_t is selected and used to construct the classification network Net; for example, assume a total of 10 named entity samples and that bits 3, 5, 8 and 9 of p_s are 1; the named entity sample set selected by p_s is then (x_3, x_5, x_8, x_9);
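The binary encoding and the worked example above can be sketched directly (1-based sample positions follow the patent's convention; names are illustrative):

```python
def decode_individual(bits):
    """Return the 1-based positions of the named entity samples an
    individual selects (bits[t-1] == 1 means sample x_t is chosen)."""
    return [t + 1 for t, b in enumerate(bits) if b == 1]

# The example individual: 10 samples, bits 3, 5, 8 and 9 set.
bits = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0]
```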
step 1.5: define the current iteration number as n and the maximum iteration number as N, and initialize n = 1; take the initial population P as the parent population P^n of the nth iteration;
step 1.6: randomly select two individuals p_x and p_y from the parent population P^n of the nth iteration by binary tournament and construct the classification networks Net(p_x) and Net(p_y), which are used for classification; if the accuracy of Net(p_x) is higher than that of Net(p_y), obtain from P^n all individuals whose accuracy is higher than that of Net(p_y) and randomly select one individual p_z from them; cross-mutate p_y and p_z to obtain the mutated individuals p'_y and p'_z; from p_y, p'_y and p'_z, select the individual whose classification network has the highest accuracy to replace p_y, so that the weaker of the two tournament picks is guided by better individuals; finally, cross-mutate the replaced p_y with p_x to generate the offspring P'^n of the nth iteration;
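The precision-guided variation of step 1.6 can be sketched as below; the patent does not fix the crossover and mutation operators, so single-point crossover with bit-flip mutation, the `accuracy` callback, and all names are assumptions:

```python
import random

def cross_mutate(a, b, pm=0.1, rng=random):
    """Single-point crossover of two bit strings, then bit-flip mutation."""
    cut = rng.randrange(1, len(a))
    child = a[:cut] + b[cut:]
    return [1 - bit if rng.random() < pm else bit for bit in child]

def guided_offspring(pop, accuracy, rng=random):
    """One precision-guided step: the weaker of two tournament picks is
    pulled toward a randomly chosen higher-accuracy individual."""
    px, py = rng.sample(pop, 2)
    if accuracy(px) < accuracy(py):
        px, py = py, px                        # px is now the stronger pick
    better = [p for p in pop if accuracy(p) > accuracy(py)]
    pz = rng.choice(better) if better else px  # guide individual p_z
    cands = [py, cross_mutate(py, pz, rng=rng), cross_mutate(pz, py, rng=rng)]
    py = max(cands, key=accuracy)              # keep the best variant of p_y
    return cross_mutate(py, px, rng=rng)       # offspring of p_y and p_x
```

Here `accuracy` stands in for training a classification network on the samples an individual selects and measuring its accuracy.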
step 1.7: merge the parent population P^n and the offspring P'^n of the nth iteration to obtain the merged population of the nth iteration, and obtain the importance IMP(p_n) of any individual p_n in the merged population using equation (3):
IMP(p_n) = α × Acc(p_n) + (1 − α) × (−Red(p_n)) (3)
In equation (3), α is a trade-off factor, usually taken as 0.8; Acc(p_n) is the accuracy of individual p_n and Red(p_n) is its redundancy; the importance obtained by combining accuracy and redundancy evaluates an individual in a more balanced way:
Red(p_n) = (a_1 × b_1 + a_2 × b_2 + … + a_i × b_i + … + a_m × b_m) / m (4)
In equation (4), m is the number of individuals in the merged population of the nth iteration other than p_n; a_i is the redundancy in the source space between p_n and the ith individual other than p_n, obtained by dividing the number of identical named entity samples selected by both p_n and the ith individual by T, i ∈ {1, …, m}; the larger a_i is, the higher the redundancy between p_n and the ith individual in the source space; b_i is the redundancy between p_n and the ith individual in the accuracy target space; combining the redundancy in the source space with that in the accuracy target space gives a clear and reasonable redundancy analysis for each individual and a stronger basis for the subsequent importance judgment; b_i is obtained by equation (5):
b_i = 1 − |Acc(i) − Acc(p_n)| (5)
In equation (5), Acc(i) is the accuracy of the classification network constructed by the ith individual and Acc(p_n) is that of the classification network constructed by p_n; the larger b_i is, the higher the redundancy between p_n and the ith individual in the accuracy target space;
step 1.8: rank all individuals p_n in the merged population of the nth iteration by the importance obtained from equation (3) and select the top S individuals as the parent population P^(n+1) of the (n+1)th iteration;
step 1.9: assign n+1 to n and judge whether n > N; if so, select the vector data of the named entity samples corresponding to the individual with the highest classification-network accuracy in the parent population of the nth iteration, use it to construct the optimal network classifier, and execute step two; otherwise, return to step 1.6;
step two: named entity identification, classifying the samples to be detected with the optimal network classifier obtained in step one:
step 2.1: input the text data of a named entity sample to be identified and process it according to steps 1.1 and 1.2 to obtain the feature vector of the sample to be identified; common features include date of birth, native place, height, weight, nickname, main contribution and the like;
step 2.2: classify the feature vector of the sample to be identified with the optimal network classifier; the obtained label indicates the named entity category corresponding to the sample.
The classification network Net is constructed as the k-associated optimal graph under the Euclidean distance, as follows:
for the feature vectors, obtain the Euclidean distance d_ti between the feature vector of the t-th named entity sample and that of the i-th named entity sample using equation (6), and connect each sample to the k nearest named entities of the same category, thereby forming the classification network:
d_ti = √( Σ_{d=1}^{D} (x̄_t^d − x̄_i^d)^2 ) (6)
In equation (6), x̄_t^d is the d-th feature of the feature vector of the t-th named entity sample.
The method is tested and verified by adopting objectively collected data.
1) Acquire text data of named entity samples related to person names, i.e., collect sentences or paragraphs concerning person names from the literature; convert the real-world text data into vector data processable by a computer using the Word2Vec tool; divide the processed data set into training samples and test samples; select the best training samples through ten-fold cross-validation to construct the classification network; and perform named entity recognition on the test samples.
2) Evaluation indexes;
the classification accuracy is used as an evaluation index for the present example to evaluate the performance of the named entity recognition. The higher the accuracy is, the better the classification effect is, and the higher the identification accuracy is.
3) Performing an experiment on the dataset;
the validity of the invention is verified by experimental results on the dataset. In the present day of high information diversification, it is important to accurately and efficiently identify named entities from texts and analyze the named entities. Experiments show that the method can rapidly and effectively extract key attributes of the named entities from massive texts and identify the categories of the named entities, improves the recognition efficiency of the named entities and provides a basis for information extraction, question-answering systems, syntactic analysis, machine translation and the like.
Claims (2)
1. A named entity identification method based on network classification is characterized by comprising the following steps:
step one: named entity classification model training:
step 1.1: text data of T named entity samples are obtained and converted into vector data ψ= ((x) using Word2Vec natural language processing tool 1 ,y 1 ),(x 2 ,y 2 ),…,(x t ,y t ),…,(x T ,y T )),(x t ,y t ) Representing the t-th named entity sampleVector data of the present, where x t Representing attribute features of the t-th named entity sample, an Representing the attribute characteristics of the d-th named entity in the t-th named entity sample; y is t A label representing a T-th named entity sample, t=1, 2, …, T;
step 1.2: for the attribute characteristics x of the t named entity sample t Performing standardization processing to obtain feature vectors of the t named entity samples Representing the d-th feature about the named entity in the t-th named entity sample;
step 1.3: construct two objective functions f_1 and f_2 using equations (1) and (2), both of which are to be minimized:
min f_1 = Rr(V_s) (1)
min f_2 = 1 − Acc(Net(V_s)) (2)
In equation (1), V_s is the vector data selected from the T vector data Ψ, and Rr(V_s) is the proportion of the selected vector data V_s in the T vector data Ψ; in equation (2), Net(V_s) is the classification network constructed from the selected vector data V_s, and Acc(Net(V_s)) is the classification accuracy of the network Net(V_s);
step 1.4: take S candidate sets of named entity sample vector data as the initial population P = {p_1, …, p_S}, where p_s, the s-th candidate set of named entity sample vector data, is one individual;
encode the initial population P with binary codes of length T: if the t-th bit of the binary code of an individual p_s is 1, the t-th named entity sample x_t is selected and used to construct the classification network Net;
step 1.5: define the current iteration number as n and the maximum iteration number as N, and initialize n = 1; take the initial population P as the parent population P^n of the nth iteration;
step 1.6: randomly select two individuals p_x and p_y from the parent population P^n of the nth iteration by binary tournament and construct the classification networks Net(p_x) and Net(p_y) respectively; if the accuracy of Net(p_x) is higher than that of Net(p_y), obtain from P^n all individuals whose accuracy is higher than that of Net(p_y) and randomly select one individual p_z from them; cross-mutate p_y and p_z to obtain the mutated individuals p'_y and p'_z; from p_y, p'_y and p'_z, select the individual whose classification network has the highest accuracy to replace p_y; finally, cross-mutate the replaced p_y with p_x to generate the offspring P'^n of the nth iteration;
step 1.7: merge the parent population P^n and the offspring P'^n of the nth iteration to obtain the merged population of the nth iteration, and obtain the importance IMP(p_n) of any individual p_n in the merged population using equation (3):
IMP(p_n) = α × Acc(p_n) + (1 − α) × (−Red(p_n)) (3)
In equation (3), α is a trade-off factor, Acc(p_n) is the accuracy of individual p_n, and Red(p_n) is the redundancy of individual p_n:
Red(p_n) = (a_1 × b_1 + a_2 × b_2 + … + a_i × b_i + … + a_m × b_m) / m (4)
In equation (4), m is the number of individuals in the merged population of the nth iteration other than p_n; a_i is the redundancy in the source space between p_n and the ith individual other than p_n, obtained by dividing the number of identical named entity samples selected by both p_n and the ith individual by T, i ∈ {1, …, m}; b_i is the redundancy between p_n and the ith individual in the accuracy target space, obtained by equation (5):
b_i = 1 − |Acc(i) − Acc(p_n)| (5)
In equation (5), Acc(i) is the accuracy of the classification network constructed by the ith individual and Acc(p_n) is the accuracy of the classification network constructed by p_n;
step 1.8: rank all individuals p_n in the merged population of the nth iteration by the importance obtained from equation (3) and select the top S individuals as the parent population P^(n+1) of the (n+1)th iteration;
step 1.9: assign n+1 to n and judge whether n > N; if so, select the vector data of the named entity samples corresponding to the individual with the highest classification-network accuracy in the parent population of the nth iteration, use it to construct the optimal network classifier, and execute step two; otherwise, return to step 1.6;
step two: named entity identification:
step 2.1: inputting text data of a named entity sample to be identified, processing according to the steps 1.1 and 1.2, and obtaining a feature vector of the sample to be identified;
step 2.3: classifying the feature vectors of the sample to be detected by using the optimal network classifier, wherein the obtained label represents a named entity corresponding to the sample to be detected.
2. The network classification-based named entity recognition method of claim 1, wherein the classification network in formula (6)The method adopts the construction mode of the k-associated optimal diagram of Euclidean distance, and comprises the following steps:
for feature vectorsObtaining the Euclidean distance d between the feature vector of the d named entity in the t named entity sample and the feature vector of the d named entity in the i named entity sample by using the formula (6) ti And selecting k named entities of the same category closest to the network connection to form a scoreClass network:
in the formula (6), the amino acid sequence of the compound,representing the feature vector of the (d) th named entity in the (t) th named entity sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011472395.7A CN112487816B (en) | 2020-12-14 | 2020-12-14 | Named entity identification method based on network classification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011472395.7A CN112487816B (en) | 2020-12-14 | 2020-12-14 | Named entity identification method based on network classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112487816A CN112487816A (en) | 2021-03-12 |
CN112487816B true CN112487816B (en) | 2024-02-13 |
Family
ID=74916987
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011472395.7A Active CN112487816B (en) | 2020-12-14 | 2020-12-14 | Named entity identification method based on network classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112487816B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007137487A1 (en) * | 2006-05-15 | 2007-12-06 | Panasonic Corporation | Method and apparatus for named entity recognition in natural language |
CN107203511A (en) * | 2017-05-27 | 2017-09-26 | 中国矿业大学 | A kind of network text name entity recognition method based on neutral net probability disambiguation |
WO2018072351A1 (en) * | 2016-10-20 | 2018-04-26 | 北京工业大学 | Method for optimizing support vector machine on basis of particle swarm optimization algorithm |
CN109581339A (en) * | 2018-11-16 | 2019-04-05 | 西安理工大学 | A kind of sonar recognition methods based on brainstorming adjust automatically autoencoder network |
CN110162795A (en) * | 2019-05-30 | 2019-08-23 | 重庆大学 | A kind of adaptive cross-cutting name entity recognition method and system |
-
2020
- 2020-12-14 CN CN202011472395.7A patent/CN112487816B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007137487A1 (en) * | 2006-05-15 | 2007-12-06 | Panasonic Corporation | Method and apparatus for named entity recognition in natural language |
WO2018072351A1 (en) * | 2016-10-20 | 2018-04-26 | 北京工业大学 | Method for optimizing support vector machine on basis of particle swarm optimization algorithm |
CN107203511A (en) * | 2017-05-27 | 2017-09-26 | 中国矿业大学 | A kind of network text name entity recognition method based on neutral net probability disambiguation |
CN109581339A (en) * | 2018-11-16 | 2019-04-05 | 西安理工大学 | A kind of sonar recognition methods based on brainstorming adjust automatically autoencoder network |
CN110162795A (en) * | 2019-05-30 | 2019-08-23 | 重庆大学 | A kind of adaptive cross-cutting name entity recognition method and system |
Non-Patent Citations (1)
Title |
---|
冯艳红 ; 于红 ; 孙庚 ; 孙娟娟 ; .基于BLSTM的命名实体识别方法.计算机科学.2017,(第02期),全文. * |
Also Published As
Publication number | Publication date |
---|---|
CN112487816A (en) | 2021-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111444342B (en) | Short text classification method based on multiple weak supervision integration | |
CN110795564B (en) | Text classification method lacking negative cases | |
CN113672718B (en) | Dialogue intention recognition method and system based on feature matching and field self-adaption | |
Alotaibi et al. | Optical character recognition for quranic image similarity matching | |
CN110909116B (en) | Entity set expansion method and system for social media | |
CN108959305A (en) | A kind of event extraction method and system based on internet big data | |
CN111222318A (en) | Trigger word recognition method based on two-channel bidirectional LSTM-CRF network | |
CN112417132B (en) | New meaning identification method for screening negative samples by using guest information | |
CN114491062B (en) | Short text classification method integrating knowledge graph and topic model | |
CN114936277B (en) | Similar question matching method and user similar question matching system | |
CN111191033A (en) | Open set classification method based on classification utility | |
CN112800249A (en) | A Fine-Grained Cross-Media Retrieval Method Based on Generative Adversarial Networks | |
CN110910175A (en) | Tourist ticket product portrait generation method | |
CN113779282B (en) | Fine-grained cross-media retrieval method based on self-attention and generation countermeasure network | |
CN111159332A (en) | Text multi-intention identification method based on bert | |
CN112860898A (en) | Short text box clustering method, system, equipment and storage medium | |
CN113987168A (en) | Business review analysis system and method based on machine learning | |
CN108108184A (en) | A kind of source code writer identification method based on depth belief network | |
CN109977227B (en) | Method, system and device for text feature extraction based on feature coding | |
CN115168590A (en) | Text feature extraction method, model training method, device, equipment and medium | |
CN113535928A (en) | Service discovery method and system based on long short-term memory network based on attention mechanism | |
CN114093445A (en) | Patient screening and marking method based on multi-label learning | |
CN119128125A (en) | Personalized document recommendation method and system based on knowledge graph | |
CN112487816B (en) | Named entity identification method based on network classification | |
CN111191448A (en) | Word processing method, device, storage medium and processor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |