CN114626378B - Named entity recognition method, named entity recognition device, electronic equipment and computer readable storage medium
- Publication number: CN114626378B (application CN202011529431.9A)
- Authority: CN (China)
- Prior art keywords: word, address text, score, vector, named entity
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/295 — Natural language analysis; recognition of textual entities; named entity recognition
- G06F16/3344 — Query execution using natural language analysis
- G06F16/3346 — Query execution using probabilistic model
- G06F16/35 — Information retrieval of unstructured textual data; clustering; classification
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F40/216 — Parsing using statistical methods
- G06N3/045 — Neural network architectures; combinations of networks
- G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08 — Neural network learning methods
Abstract
The embodiment of the application provides a named entity recognition method, a named entity recognition device, an electronic device and a computer-readable storage medium, relating to the field of artificial intelligence. Each word in an address text to be parsed is mapped to a corresponding word vector to obtain a word vector sequence; the score of each word vector in the sequence for different preset category labels is determined; and a labeling sequence is obtained from the scores and a preset transfer matrix, completing the parsing of the address text. The parsing process learns the contextual relationships of the address text well without configuring an address hierarchy in advance or setting address resolution rules, and improves the efficiency and accuracy of word segmentation and parsing of address text.
Description
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a named entity recognition method, a named entity recognition device, an electronic device and a computer-readable storage medium.
Background
Named Entity Recognition (NER) is a basic task in the field of natural language processing and an important foundational tool for numerous natural language processing tasks such as information extraction, question-answering systems, syntactic analysis and machine translation; it identifies the boundaries and categories of entity mentions in natural text.
At present, when named entity recognition is performed on address text, prior art schemes generally first construct an administrative division database and then build a custom region dictionary. During address parsing, the text is first segmented into words, and the structured information of the address is then obtained by looking up the segmented words in order, in combination with a decision tree. This requires administrative division data to be configured manually, cascade relationships to be established and address resolution rules to be defined; when the format of an address text differs greatly from the format template, wrong word segmentation and parsing results may be obtained, so the accuracy is low.
Disclosure of Invention
The application aims to solve at least one of the above technical defects, in particular the low accuracy of address text parsing results.
In a first aspect, a method for named entity recognition is provided, the method comprising:
acquiring an address text to be resolved;
Based on a preset recognition model, the following steps are executed:
Mapping at least one target word in the address text into corresponding target word vectors respectively, and obtaining a word vector sequence based on the target word vectors;
respectively determining the score of each word vector in the word vector sequence corresponding to different preset category labels;
and obtaining a labeling sequence based on the score and a preset transfer matrix.
In an alternative embodiment of the first aspect, mapping at least one target word in the address text to a corresponding target word vector comprises:
Determining a target word vector corresponding to at least one target word in the address text based on a preset lookup table; the lookup table comprises a plurality of words and a word vector corresponding to each word.
In an alternative embodiment of the first aspect, determining the score of each word vector in the sequence of word vectors corresponding to a different preset class label, respectively, comprises:
Extracting forward features in the word vector sequence to obtain a forward hidden vector;
Extracting backward features in the word vector sequence to obtain a backward hidden vector;
Splicing the forward hidden vector and the backward hidden vector to obtain a target hidden vector; the target hidden vector includes scores for word vectors corresponding to different preset class labels.
In an alternative embodiment of the first aspect, obtaining the labeling sequence based on the score and the preset transfer matrix includes:
Determining an optimal path based on the score and the transfer matrix;
and acquiring the labeling sequence based on the optimal path.
In an alternative embodiment of the first aspect, determining the optimal path based on the score and the transfer matrix comprises:
the transfer matrix comprises transfer scores of different transfer paths among the class labels;
taking the scores as emission scores, and determining the emission scores of the word vectors corresponding to different preset category labels;
an optimal path is obtained based on the transfer scores and the emission scores.
In an optional embodiment of the first aspect, before obtaining the address text to be parsed, the method further includes:
acquiring a sample address text; each word in the sample address text is provided with a corresponding sample category label;
training the initial recognition model according to the sample address text and the sample class label to obtain a recognition model; the parameters of the recognition model include a transfer matrix.
In a second aspect, an apparatus for named entity recognition is provided, including:
The acquisition module is used for acquiring the address text to be analyzed;
the mapping module is used for mapping at least one target word in the address text into corresponding target word vectors based on a preset recognition model, and obtaining a word vector sequence based on the target word vectors;
the determining module is used for respectively determining the score of each word vector in the word vector sequence corresponding to different preset category labels based on the recognition model;
and the labeling module is used for acquiring a labeling sequence based on the score and a preset transfer matrix through the identification model.
In an alternative embodiment of the second aspect, the mapping module is specifically configured to, when mapping at least one target word in the address text to a corresponding target word vector:
Determining a target word vector corresponding to at least one target word in the address text based on a preset lookup table; the lookup table includes a plurality of words and a word vector corresponding to each word.
In an optional embodiment of the second aspect, the determining module is specifically configured to, when determining, respectively, a score of each word vector in the sequence of word vectors corresponding to a different preset class label:
Extracting forward features in the word vector sequence to obtain a forward hidden vector;
Extracting backward features in the word vector sequence to obtain a backward hidden vector;
Splicing the forward hidden vector and the backward hidden vector to obtain a target hidden vector; the target hidden vector includes scores for word vectors corresponding to different preset class labels.
In an optional embodiment of the second aspect, the labeling module is specifically configured to, when acquiring the labeling sequence based on the score and the preset transfer matrix:
Determining an optimal path based on the score and the transfer matrix;
and acquiring the labeling sequence based on the optimal path.
In an alternative embodiment of the second aspect, the labeling module is specifically configured to, when determining the optimal path based on the score and the transfer matrix:
the transfer matrix comprises transfer scores of different transfer paths among the class labels;
taking the scores as emission scores, and determining the emission scores of the word vectors corresponding to different preset category labels;
an optimal path is obtained based on the transfer scores and the emission scores.
In an alternative embodiment of the second aspect, the apparatus further comprises a training module, specifically configured to:
acquiring a sample address text; each word in the sample address text is provided with a corresponding sample category label;
training the initial recognition model according to the sample address text and the sample class label to obtain a recognition model; the parameters of the recognition model include a transfer matrix.
In an alternative embodiment of the second aspect, the training module is specifically configured to, when training the initial recognition model according to the sample address text and the sample class label:
inputting the sample address text into an initial recognition model to obtain a labeling result corresponding to the sample address text; the parameters of the initial recognition model comprise an initial transfer matrix;
Determining a negative log-likelihood function based on the sample class label and the labeling result, and taking the negative log-likelihood function as a loss function of the initial recognition model;
Training the initial recognition model according to the address text sample to obtain a trained recognition model and a transfer matrix in the recognition model.
In a third aspect, an electronic device is provided, the electronic device comprising:
the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the named entity identification method of any embodiment when executing the program.
In a fourth aspect, the present invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements a named entity recognition method according to any of the embodiments described above.
According to the named entity recognition method, each word in the address text to be parsed is mapped to a corresponding word vector to obtain a word vector sequence; the score of each word vector in the sequence for different preset category labels is determined; and the labeling sequence is obtained from the scores and the preset transfer matrix, completing the parsing of the address text. No address hierarchy needs to be configured in advance and no address resolution rules are required; the contextual relationships of the address text are learned well, improving the efficiency and accuracy of word segmentation and parsing of address text.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a flow chart of a named entity recognition method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of encoding words in a named entity recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of mapping a sparse matrix into a low-dimensional matrix that preserves semantic relationships through an Embedding layer in a named entity recognition method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of LSTM unit in a named entity recognition method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the BiLSTM layer in a named entity recognition method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a model overall architecture in a named entity recognition method according to an embodiment of the present application;
FIG. 7 is a flowchart of a named entity recognition method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a named entity recognition device according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an electronic device for named entity recognition according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combinations of one or more of the associated listed items.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Natural language is a crystallization of human intelligence, and NLP (Natural Language Processing) is a sub-field of artificial intelligence. With the rapid development of artificial intelligence, NLP has achieved remarkable results in more and more fields, yet natural language processing remains one of the most difficult problems in artificial intelligence.
Named Entity Recognition (NER) is a very basic task in NLP and an important foundational tool for many NLP tasks such as information extraction, question-answering systems, syntactic analysis and machine translation. It is used for the segmentation and parsing of entities with specific meaning in text, such as person names, place names, organization names and proper nouns; specifically, it identifies the boundaries and categories of entity mentions in natural text.
When named entity recognition is performed on address text, prior art schemes generally first construct an administrative division database and then build a custom region dictionary. During address parsing, the text is first segmented into words, and the structured information of the address is then obtained by looking up the segmented words in order, in combination with a decision tree. This requires administrative division data to be configured manually, cascade relationships to be established and address resolution rules to be defined; when the format of an address text differs greatly from the format template, wrong word segmentation and parsing results may be obtained. Since the database and the address resolution rules are configured manually, the resolution rules and the cascade relationships contained in the database must be continuously adjusted, modified and supplemented as the entity names involved in address texts increase, so the maintenance cost is high.
In addition, general-purpose word segmentation methods tend to fragment the entity names in address text, so the segmented entity names match the category labels poorly, which affects the final parsing result and lowers parsing accuracy.
The application provides a named entity identification method, a named entity identification device, electronic equipment and a computer readable storage medium, and aims to solve the technical problems in the prior art.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The named entity identification method provided by the embodiment of the application can be applied to a server and a terminal.
As will be appreciated by those skilled in the art, the "terminal" as used herein may be a cell phone, a tablet computer, a PDA (Personal Digital Assistant), a MID (Mobile Internet Device), etc.; the "server" may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
The embodiment of the application provides a named entity identification method, which can be applied to a server and a terminal, as shown in fig. 1, and comprises the following steps:
Step S100, obtaining an address text to be resolved.
In the embodiment of the application, the address text to be resolved refers to the address text to be subjected to word segmentation and resolution, and the address text can be Chinese or other languages.
The address text may be acquired in various ways: it may be selected from local text, read from user input in an application client, or obtained by scanning with a camera device in fields such as postal and express delivery; the application is not limited in this respect.
Step S200, at least one target word in the address text is mapped into corresponding target word vectors respectively, and a word vector sequence is obtained based on the target word vectors.
In embodiments of the present application, each word in the address text may be mapped to a corresponding word vector to represent semantically similar or related words by distance in vector space.
Specifically, an unmapped target word may be selected from words included in the address text, and a target word vector corresponding to the target word may be determined based on a preset lookup table. The lookup table comprises a plurality of words and word vectors corresponding to each word.
In embodiments of the present application, words may be mapped to word vectors by a word vector embedding (Embedding) layer from the deep learning field.
Specifically, a preset number of words with high occurrence frequency can be selected from a broad-coverage word dataset and used as high-frequency common words; the preset number can be determined according to requirements, and a suitable value can be chosen through experiment.
Further, the selected words may be encoded. Such encoding represents each high-frequency common word as a sparse tensor with very few non-zero elements, which consumes considerable resources when the data size is large and makes it difficult to capture the context between words. The sparse tensor may therefore be further mapped to a low-dimensional word vector through a fully connected neural network, so that each word has a corresponding word vector, yielding a lookup table containing the high-frequency common words and their word vectors.
The mapping process of the target word into the target word vector may be similar to the lookup process, i.e. the lookup table is obtained based on the above steps to determine the target word vector corresponding to the target word.
In one example, when the language of the address text is Chinese, the 10000 Chinese characters with the highest occurrence frequency may be extracted from a Chinese character corpus dataset, and One-hot encoding may be performed on these 10000 characters.
A word vector obtained by One-hot encoding has two properties: it is high-dimensional and sparse. As shown in fig. 2, assuming the sentence is "I love China", the encoding result shown in the figure can be obtained.
After One-hot encoding the 10000 Chinese characters, each character corresponds to a 10000-dimensional one-hot vector, and the 10000 one-hot vectors can be regarded as a sparse matrix. The word vector embedding (Embedding) layer converts this sparse matrix, through the mapping of a fully connected neural network, into a low-dimensional matrix that preserves semantic relationships. For example, as shown in fig. 3, after each one-hot vector is mapped to a 300-dimensional word vector, a 10000 x 300 weight parameter matrix containing the contextual relationships of the Chinese characters is obtained, and this weight parameter matrix is used as the lookup table.
The process of mapping a target word selected from the address text into a 300-dimensional vector is similar to a table lookup: the index label of the target word is obtained, and that index is looked up in the lookup table to obtain the 300-dimensional row vector corresponding to the target word, i.e. the target word vector.
In the embodiment of the present application, the weights of the Embedding layer may be replaced by a pre-trained model; if the Embedding layer is replaced by a pre-trained model, the weight parameters need not be trained in advance.
In an embodiment of the present application, a BERT (Bidirectional Encoder Representations from Transformers) pre-trained model may be employed as the Embedding layer.
The BERT model is well suited to sentence- and paragraph-level tasks, performs better on high-level semantic information extraction tasks, and has the advantage of producing context-dependent bidirectional feature representations.
In the embodiment of the application, mapping words to word vectors through the word vector embedding (Embedding) layer from the deep learning field converts large sparse vectors into a low-dimensional space that preserves semantic relationships, reducing resource usage while retaining the internal relationships among a large number of words, relationships which can be refined to maturity during training.
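A minimal sketch of the lookup described above, in PyTorch; the vocabulary size and dimension follow the 10000 x 300 example, while all variable names and index values are illustrative assumptions:

```python
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM = 10000, 300

# nn.Embedding is equivalent to multiplying a one-hot row vector by a
# 10000 x 300 weight matrix; the weight matrix plays the role of the lookup table.
embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)

# Each character of the address text is first converted to its index label
# in the vocabulary (the indices here are made up for illustration).
char_indices = torch.tensor([[102, 57, 3301, 8]])  # one address, 4 characters

word_vectors = embedding(char_indices)             # shape: (1, 4, 300)
```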
Step S300, determining the score of each word vector in the word vector sequence corresponding to different preset category labels.
In the embodiment of the application, the word vector sequence can be formed by combining word vectors corresponding to each word in the address text, the score of each word vector corresponding to different preset category labels can be determined through the word vector sequence, and the context relation between words in a certain type of language is fully considered.
The preset category labels may be set according to a labeling scheme for the sequence labeling (Sequence Labeling) problem.
Specifically, each element (word) in the address text may be labeled as "B-X", "I-X", or "O" using a BIO labeling scheme, where "B-X" indicates that the segment in which the element is located is of the X type and that the element is at the beginning of the segment, "I-X" indicates that the segment in which the element is located is of the X type and that the element is at the middle of the segment, and "O" indicates that the element is not of any type.
A BIOES labeling scheme may also be used, where "B-X" indicates that the segment in which the element is located is of type X and that the element is at the beginning of the segment, "I-X" indicates that the segment in which the element is located is of type X and that the element is at the middle of the segment, "O" indicates that it is not of any type, "E-X" indicates that the segment in which the element is located is of type X and that the element is at the end of the segment, "S" indicates that the element itself may constitute an entity.
The labeling scheme can be optimized based on other general labeling schemes, can be set according to the requirements of a certain application field, and is not limited in the application.
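As an illustration of the BIO scheme described above (a sketch; the category names "PROV", "CITY" and "ROAD" are hypothetical and not labels fixed by the application):

```python
# Hypothetical BIO labeling of a Chinese address, character by character.
chars  = ["浙", "江", "省", "杭", "州", "市", "文", "一", "路"]
labels = ["B-PROV", "I-PROV", "I-PROV",   # province segment
          "B-CITY", "I-CITY", "I-CITY",   # city segment
          "B-ROAD", "I-ROAD", "I-ROAD"]   # road segment
```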
Determining the score of each word vector in the word vector sequence for different preset category labels means determining a score of each word vector for each category label. For example, given two word vectors w1 and w2 and two category labels A and B, the scores of w1 and w2 for label A and for label B are determined respectively; a score represents the probability that the predicted labeling result for that word vector is that label.
And step S400, acquiring a labeling sequence based on the score and a preset transfer matrix.
In an embodiment of the application, the scores may be the output of the BiLSTM layer, and each word in the address text may have a different score for each category label.
The preset transfer matrix can be obtained by training the initial recognition model, and the transfer matrix can comprise the transfer scores of the different transfer paths between category labels. The scores and transfer scores can be used to calculate the optimal path from the first word to the last word of the address text, i.e. the label sequence with the highest probability once the semantic associations of natural language are taken into account, thereby parsing the address text as a whole.
In the embodiment of the present application, step S300, determining the score of each word vector in the word vector sequence corresponding to a different preset class label, may include the following steps:
(1) Extracting forward features in the word vector sequence to obtain a forward hidden vector;
(2) Extracting backward features in the word vector sequence to obtain a backward hidden vector;
(3) Splicing the forward hidden vector and the backward hidden vector to obtain a target hidden vector; the target hidden vector includes scores for word vectors corresponding to different preset class labels.
In the embodiment of the application, a BiLSTM (Bi-directional Long Short-Term Memory) can be used to determine the score of each word vector in the word vector sequence corresponding to different preset category labels.
LSTM (Long Short-Term Memory) is a type of RNN (Recurrent Neural Network). Owing to its design characteristics, LSTM is well suited to modeling sequential data such as text. BiLSTM combines a forward LSTM with a backward LSTM and is commonly used to model context information in natural language processing tasks.
The LSTM unit is shown schematically in FIG. 4 and mainly comprises a cell unit and three gating units. The cell unit is used for storing and transmitting state information; the three gating units are the forget gate, the input gate and the output gate:
(1) Forget Gate: the original input and the hidden state of the LSTM pass through the forget gate to generate a forgetting signal, which determines which information in the cell state at time t-1 should be discarded. Equation (1.1) is as follows:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (1.1)

(2) Input Gate: the input gate likewise generates a gating signal from the hidden state at time t-1 and the input at the current time t, and filters the candidate input information $\tilde{C}_t$ generated at time t to determine which parts may enter the cell state. Equations (1.2) and (1.3) are as follows:

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ (1.2)

$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$ (1.3)

The cell state is then updated as $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$.

(3) Output Gate: similar to the two gates above, the output gate takes its input from the hidden state at time t-1 and the input at time t, and generates a gating signal that determines which parts of the activated cell state pass through the output gate as output information. Equations (1.4) and (1.5) are as follows:

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ (1.4)

$h_t = o_t * \tanh(C_t)$ (1.5)
In the LSTM, after the information carried by $C_t$ passes through the output-gate restriction formed from the current input, it contains more information about the current time; the resulting $h_t$ can therefore be said to carry short-term memory relative to $C_t$, while $C_t$ carries long-term memory.
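A direct transcription of equations (1.1)-(1.5) into code, with the cell-state update written out explicitly; a sketch in which all parameter names and shapes are assumptions (each W is shaped (hidden_dim + input_dim, hidden_dim) so the concatenation [h_{t-1}, x_t] can be right-multiplied):

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM time step implementing equations (1.1)-(1.5)."""
    z = torch.cat([h_prev, x_t], dim=-1)   # [h_{t-1}, x_t]
    f_t = torch.sigmoid(z @ W_f + b_f)     # (1.1) forget gate
    i_t = torch.sigmoid(z @ W_i + b_i)     # (1.2) input gate
    c_tilde = torch.tanh(z @ W_c + b_c)    # (1.3) candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde     # cell-state update
    o_t = torch.sigmoid(z @ W_o + b_o)     # (1.4) output gate
    h_t = o_t * torch.tanh(c_t)            # (1.5) hidden state (short-term memory)
    return h_t, c_t
```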
A BiLSTM can be composed of two LSTM units running in opposite directions: a forward LSTM exploits past information and a backward LSTM exploits future information, so that at time t the information from time t-1 and from time t+1 can be used simultaneously, making prediction more accurate than with a unidirectional LSTM.
In the embodiment of the application, two LSTM units in opposite directions can be used to form the BiLSTM; the BiLSTM receives the word vectors output by the Embedding layer one by one and generates corresponding output vectors that are passed to the next neural network layer. As shown in FIG. 5, the BiLSTM layer unrolled along the time axis has two separate chains of LSTM cells, one running left-to-right and one right-to-left, receiving the input data; the outputs in the two directions are then spliced into the final output data.
In one example, the first layer of the model may be the Embedding layer, which maps each word in the address text from its One-hot vector to a low-dimensional dense word vector. The second layer may be the BiLSTM layer, which automatically extracts sentence features: the word vector sequence composed of the word vectors of the words of the address text is used as the input of each time step of the BiLSTM; the output of the forward LSTM is taken as the forward hidden vector and the output of the backward LSTM as the backward hidden vector; and the hidden states output by the two at each position are spliced position by position to obtain the complete hidden vector sequence (the target hidden vector), which includes the score of each word for the different category labels.
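A sketch of this feature-extraction step; the hidden size and tag count are assumptions, and the final linear projection is one common way of realizing "the target hidden vector includes the scores":

```python
import torch
import torch.nn as nn

EMBED_DIM, HIDDEN_DIM, NUM_TAGS = 300, 128, 7      # assumed sizes

# bidirectional=True runs a forward and a backward LSTM and concatenates
# their hidden states at each position, as described above.
bilstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, bidirectional=True, batch_first=True)
hidden2tag = nn.Linear(2 * HIDDEN_DIM, NUM_TAGS)   # hidden vectors -> label scores

word_vectors = torch.randn(1, 9, EMBED_DIM)        # output of the Embedding layer
hidden_seq, _ = bilstm(word_vectors)               # (1, 9, 2 * HIDDEN_DIM)
emission_scores = hidden2tag(hidden_seq)           # (1, 9, NUM_TAGS)
```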
In the embodiment of the application, when the labeling sequence is acquired based on the score and the preset transfer matrix, the optimal path can be determined based on the score and the transfer matrix, and then the labeling sequence can be acquired based on the optimal path.
In the embodiment of the application, determining the optimal path based on the score and the transfer matrix may include the following steps:
(1) The transfer matrix comprises the transfer scores of different transfer paths between the category labels; the scores are taken as emission scores, and the emission scores of the word vectors corresponding to different preset category labels are determined.
Specifically, the transfer matrix may be obtained by training the initial recognition model, and may include the transfer scores of the different transfer paths between category labels. The scores may be the output of the BiLSTM layer; each word in the address text may have a different score for each category label, and these scores are taken as the emission scores.
(2) An optimal path is obtained based on the transfer scores and the emission scores.
In an embodiment of the present application, the optimal path may be determined by a CRF (Conditional Random Field) based on the scores and the transfer matrix.
The CRF is a special case of the Markov random field and belongs to the undirected probabilistic graphical models. It can be used for sentence-level sequence labeling problems: it considers a linearly weighted combination of local features over the whole sentence, i.e. it scans the whole sentence through feature templates, and optimizes the prediction of the whole sequence by finding the sequence with the highest probability.
Specifically, the transfer paths between labels of different types represent different predictions for the overall sequence. The score of each prediction consists of two parts: one part is the score of each word vector for the different category labels output by the BiLSTM, i.e. the emission score; the other part is the transfer score of the different paths determined by the transfer matrix in the CRF. After processing by a softmax layer for normalization, the probabilities corresponding to the different overall sequence predictions are obtained.
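In standard linear-chain CRF notation, the two score components and the normalization described above can be written as follows, where $E_{t,y_t}$ is the emission score of position $t$ for label $y_t$, $T$ is the transfer matrix, and $y_0$ and $y_{n+1}$ denote the START and END labels:

```latex
\mathrm{score}(x, y) = \sum_{t=1}^{n} E_{t,\,y_t} + \sum_{t=0}^{n} T_{y_t,\,y_{t+1}},
\qquad
P(y \mid x) = \frac{\exp\big(\mathrm{score}(x, y)\big)}{\sum_{y'} \exp\big(\mathrm{score}(x, y')\big)}
```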
In one example, the CRF prediction process may use the Viterbi algorithm, solved by dynamic programming, to find the optimal path: when scoring each path, earlier computation results are reused rather than recomputed. The Viterbi algorithm is a well-performing special-purpose dynamic programming algorithm that can solve the shortest-path problem in a graph.
In the embodiment of the application, determining the optimal path through the CRF based on the scores and the transfer matrix takes the dependencies of the output layer into account, and global constraint information is learned from the training corpus, so that contextual dependencies are fully considered when predicting labels. Meanwhile, the path with the maximum probability is obtained by dynamic programming, which fits the named entity recognition task well and improves prediction accuracy.
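A minimal Viterbi decoder over the emission scores and the transfer matrix, assuming both are given as plain NumPy arrays; START/END handling is omitted for brevity, so this is a sketch rather than a complete CRF layer:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (seq_len, num_tags); transitions[i, j]: transfer score of tag i -> tag j.
    Returns the highest-scoring label sequence, i.e. the optimal path."""
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()           # best score ending in each tag at position 0
    backpointers = []
    for t in range(1, seq_len):
        # total[i, j] = score of reaching tag j at step t via tag i at step t-1
        total = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(total.argmax(axis=0))
        score = total.max(axis=0)
    # Trace the best path backwards from the best final tag.
    best_tag = int(score.argmax())
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = int(bp[best_tag])
        path.append(best_tag)
    return list(reversed(path))
```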
In the embodiment of the application, before the address text to be resolved is acquired, the method further comprises the following steps: acquiring a sample address text; each word in the sample address text is provided with a corresponding sample category label; training the initial recognition model according to the sample address text and the sample class label to obtain a recognition model; the parameters of the recognition model include a transfer matrix.
The sample address texts may be real Chinese postal addresses recorded during the operation of an actual production system; the addresses are segmented and labeled, and the labeling results correspond one-to-one to the Chinese characters in each address.
The initial recognition model can be divided into three layers: the bottom layer can be the Embedding layer, the middle layer can be the BiLSTM layer, and the upper layer can be a CRF (conditional random field) layer. The initial recognition model can be trained according to the sample address texts and sample category labels to obtain the recognition model. The parameters of the recognition model include the transfer matrix, whose weight parameters change over the course of model training.
In the embodiment of the application, training the initial recognition model according to the sample address text and the sample class label can comprise the following steps:
(1) Inputting the sample address text into an initial recognition model to obtain a labeling result corresponding to the sample address text; the parameters of the initial recognition model include an initial transfer matrix.
Specifically, the initial transfer matrix may be a parameter of the initial recognition model and may be randomly initialized before model training; the weight parameters in the initial transfer matrix are updated over the training iterations. Two additional labels, "START" and "END", may also be added to the initial transfer matrix, where "START" represents the beginning of a text segment and "END" marks the end of a text segment.
The sample address text can be input into the initial recognition model, and a sequence labeling result corresponding to the address text is output.
(2) Determining a negative log-likelihood function based on the sample class label and the labeling result, and taking the negative log-likelihood function as a loss function of the initial recognition model;
Specifically, the negative log-likelihood of the model's predicted labeling result with respect to the manually labeled sample category labels can be computed and used as the loss function of the initial recognition model; during training, the loss function can then be minimized by stochastic gradient descent to optimize the model.
(3) Training the initial recognition model according to the address text sample to obtain a trained recognition model and a transfer matrix in the recognition model.
The initial recognition model can be trained according to the address text samples to obtain the trained recognition model. The recognition model comprises the transfer matrix, whose weight parameters are updated over the training iterations, allowing the network to learn constraints based on global features.
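A sketch of this training loop, assuming the third-party pytorch-crf package (whose CRF module returns the log-likelihood of the gold tag sequence from its forward pass); the emission scores and gold labels here are dummy stand-ins for the BiLSTM output and the manually labeled samples:

```python
import torch
from torchcrf import CRF   # third-party pytorch-crf package; API as assumed here

NUM_TAGS = 7
crf = CRF(NUM_TAGS, batch_first=True)        # holds the randomly initialized transfer matrix
optimizer = torch.optim.SGD(crf.parameters(), lr=0.01)

emissions = torch.randn(1, 9, NUM_TAGS)      # stand-in for BiLSTM output on one sample
gold_tags = torch.tensor([[0, 1, 1, 2, 3, 3, 4, 5, 5]])  # stand-in sample category labels

for _ in range(100):                         # iterative training process
    loss = -crf(emissions, gold_tags)        # negative log-likelihood loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# crf.transitions now holds the trained transfer matrix.
```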
In the embodiment of the application, the recognition network model can be formed of three layers; the overall model architecture is shown in fig. 6, with the Embedding layer at the bottom, the BiLSTM layer in the middle and a CRF (conditional random field) layer on top. Specifically, when the address text is Chinese, word vectors are obtained through the Embedding layer; the word vectors are used as the input of the BiLSTM layer, which generates a multi-dimensional vector sequence; and this sequence is mapped into the final labeling sequence through the CRF layer.
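Putting the three layers together, a compact sketch of such a recognition model; the class name, sizes and helper method are illustrative assumptions, again using the pytorch-crf package:

```python
import torch.nn as nn
from torchcrf import CRF   # third-party pytorch-crf package; API as assumed here

class AddressNER(nn.Module):
    """Embedding -> BiLSTM -> CRF, mirroring the three-layer architecture above."""
    def __init__(self, vocab_size=10000, embed_dim=300, hidden_dim=128, num_tags=7):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        self.hidden2tag = nn.Linear(2 * hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def decode(self, char_indices):
        hidden, _ = self.bilstm(self.embedding(char_indices))
        emissions = self.hidden2tag(hidden)
        return self.crf.decode(emissions)    # final labeling sequence (optimal path)
```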
In order to explain the named entity recognition method of the present application more clearly, it is further described below with reference to a specific example.
In one example, the present application provides a named entity recognition method, as shown in fig. 7, comprising the following steps:
Step S701, collecting Chinese address samples, and determining labeling information corresponding to each Chinese character through word segmentation and labeling;
Step S702, taking the Chinese address samples as the input of an initial neural network model, obtaining the negative log-likelihood of the labeling result output by the model with respect to the labeling information, and taking the negative log-likelihood as the loss function of the initial neural network;
Step S703, training the initial neural network model, minimizing the loss function by stochastic gradient descent, and obtaining a trained neural network model;
Step S704, mapping each Chinese character in the Chinese address into a corresponding multi-dimensional word vector based on the neural network model, for example mapping a specific Chinese character into a 300-dimensional row vector, where the dimension and its weight parameters can be determined by a lookup table; the lookup table can be obtained by One-hot encoding 10000 high-frequency Chinese characters and then applying a fully connected neural network transformation, and its size can be 10000 x 300;
Step S705, respectively extracting forward features and backward features in a word vector sequence to obtain forward hidden vectors and backward hidden vectors;
Step S706, the forward hidden vector and the backward hidden vector are spliced to obtain the scores of each word vector corresponding to different category labels;
Step S707, determining emission scores based on the scores of each word vector corresponding to different category labels, and determining transfer scores based on the transfer matrix.
Step S708, a path with the highest score is found out to be the optimal path by using a dynamic programming algorithm based on the emission score and the transfer score, and sequence labeling is carried out according to the optimal path, so that a final labeling sequence is obtained.
An embodiment of the present application provides a named entity recognition device; as shown in fig. 8, the named entity recognition device 800 may include: an acquisition module 8001, a mapping module 8002, a determination module 8003, and a labeling module 8004, wherein,
An obtaining module 8001, configured to obtain an address text to be resolved;
the mapping module 8002 is configured to map at least one target word in the address text to a corresponding target word vector based on a preset recognition model, and obtain a word vector sequence based on the target word vector;
a determining module 8003, configured to determine, based on the recognition model, a score of each word vector in the word vector sequence corresponding to a different preset category label;
the labeling module 8004 is configured to obtain a labeling sequence based on the score and a preset transfer matrix through the recognition model.
According to the named entity recognition device, each word in the address text to be parsed is mapped to a corresponding word vector to obtain a word vector sequence; the score of each word vector in the sequence for different preset category labels is determined; and the labeling sequence is obtained from the scores and the preset transfer matrix, completing the parsing of the address text. No address hierarchy needs to be configured in advance and no address resolution rules are required; the contextual relationships of the address text are learned well, improving the efficiency and accuracy of word segmentation and parsing of address text.
In the embodiment of the present application, when mapping at least one target word in the address text to a corresponding target word vector, the mapping module 8002 is specifically configured to:
Determining a target word vector corresponding to at least one target word in the address text based on a preset lookup table; the lookup table includes a plurality of words and a word vector corresponding to each word.
In the embodiment of the present application, when determining the score of each word vector corresponding to a different preset class label in the word vector sequence, the determining module 8003 is specifically configured to:
Extracting forward features in the word vector sequence to obtain a forward hidden vector;
Extracting backward features in the word vector sequence to obtain a backward hidden vector;
Splicing the forward hidden vector and the backward hidden vector to obtain a target hidden vector; the target hidden vector includes scores for word vectors corresponding to different preset class labels.
In the embodiment of the present application, when the labeling module 8004 obtains a labeling sequence based on a score and a preset transfer matrix, the labeling module is specifically configured to:
Determining an optimal path based on the score and the transfer matrix;
and acquiring the labeling sequence based on the optimal path.
In the embodiment of the present application, the labeling module 8004 is specifically configured to:
the transfer matrix comprises transfer scores of different transfer paths among the class labels;
taking the scores as emission scores, and determining the emission scores of the word vectors corresponding to different preset category labels;
an optimal path is obtained based on the transfer scores and the emission scores.
In the embodiment of the application, the device further comprises a training module, which is specifically used for:
Acquiring a sample address text; each word in the sample address text is provided with a corresponding sample category label;
training the initial recognition model according to the sample address text and the sample class label to obtain a recognition model; the parameters of the recognition model include a transfer matrix.
In the embodiment of the application, when training the initial recognition model according to the sample address text and the sample class label, the training module is specifically used for:
inputting the sample address text into an initial recognition model to obtain a labeling result corresponding to the sample address text; the parameters of the initial recognition model comprise an initial transfer matrix;
Determining a negative log-likelihood function based on the sample class label and the labeling result, and taking the negative log-likelihood function as a loss function of the initial recognition model;
Training the initial recognition model according to the address text sample to obtain a trained recognition model and a transfer matrix in the recognition model.
The embodiment of the application provides an electronic device, comprising: a memory and a processor; at least one program is stored in the memory for execution by the processor. When executed by the processor, the program: maps each word in the address text to be parsed to a corresponding word vector to obtain a word vector sequence; determines the score of each word vector in the sequence for different preset category labels; and obtains a labeling sequence from the scores and a preset transfer matrix, completing the parsing of the address text.
In an alternative embodiment, there is provided an electronic device, as shown in fig. 9, the electronic device 4000 shown in fig. 9 includes: a processor 4001 and a memory 4003. Wherein the processor 4001 is coupled to the memory 4003, such as via a bus 4002. Optionally, the electronic device 4000 may also include a transceiver 4004. It should be noted that, in practical applications, the transceiver 4004 is not limited to one, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 4001 may also be a combination implementing computing functionality, e.g. a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that transfers information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean there is only one bus or one type of bus.
Memory 4003 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage, optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 is used for storing the application program code for executing the scheme of the present application, and its execution is controlled by the processor 4001. The processor 4001 is configured to execute the application program code stored in the memory 4003 to implement what is shown in the foregoing method embodiments.
The electronic device includes, but is not limited to, mobile terminals such as mobile phones, notebook computers and PADs, and fixed terminals such as digital TVs and desktop computers.
Embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, which when run on a computer, causes the computer to perform the corresponding method embodiments described above.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may comprise a plurality of sub-steps or stages that are not necessarily performed at the same moment but may be executed at different times, and whose order of execution is not necessarily sequential: they may be performed in turn, or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some of the embodiments of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations shall also fall within the scope of the present invention.
Claims (9)
1. A named entity recognition method, comprising:
acquiring an address text to be parsed;
performing, based on a preset recognition model, the following steps:
mapping at least one target word in the address text to a corresponding target word vector, and obtaining a word vector sequence based on the target word vectors;
determining, for each word vector in the word vector sequence, scores corresponding to different preset category labels; and
obtaining a labeling sequence based on the scores and a preset transfer matrix;
wherein determining, for each word vector in the word vector sequence, the scores corresponding to different preset category labels comprises:
extracting forward features in the word vector sequence to obtain a forward hidden vector;
extracting backward features in the word vector sequence to obtain a backward hidden vector;
concatenating the forward hidden vector and the backward hidden vector to obtain a target hidden vector, wherein the target hidden vector comprises the scores of each word vector corresponding to the different preset category labels.
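For illustration, a minimal sketch of the scoring step of claim 1 using PyTorch. The class name, layer sizes, and the choice of an LSTM as the bidirectional feature extractor are assumptions; the claim only requires forward and backward feature extraction followed by concatenation into a target hidden vector of per-label scores.

```python
# Minimal sketch, assuming PyTorch and a BiLSTM; all names are hypothetical.
import torch.nn as nn

class AddressTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word -> word vector
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              bidirectional=True, batch_first=True)
        # Maps the concatenated forward/backward hidden vector (the "target
        # hidden vector") to one score per preset category label.
        self.emit = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_ids):              # word_ids: (batch, seq_len)
        vectors = self.embed(word_ids)        # word vector sequence
        hidden, _ = self.bilstm(vectors)      # forward + backward, concatenated
        return self.emit(hidden)              # (batch, seq_len, num_tags) scores
```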
2. The named entity recognition method of claim 1, wherein mapping at least one target word in the address text to a corresponding target word vector comprises:
determining a target word vector corresponding to at least one target word in the address text based on a preset lookup table, wherein the lookup table includes a plurality of words and a word vector corresponding to each word.
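As a toy illustration of the lookup table of claim 2 (the words and vectors below are invented; a real table would hold one learned vector per vocabulary word, typically with a shared fallback for unknown words):

```python
# Hypothetical toy lookup table; real word vectors are learned, not hand-set.
lookup_table = {
    "<unk>": [0.0, 0.0, 0.0],
    "road":  [0.2, -0.1, 0.5],
    "city":  [0.7, 0.3, -0.4],
}

def map_to_word_vectors(words):
    """Map each target word to its word vector, falling back to <unk>."""
    return [lookup_table.get(w, lookup_table["<unk>"]) for w in words]

print(map_to_word_vectors(["city", "road", "plaza"]))  # "plaza" -> <unk> vector
```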
3. The named entity recognition method of claim 1, wherein obtaining a labeling sequence based on the scores and the preset transfer matrix comprises:
determining an optimal path based on the scores and the transfer matrix;
and acquiring the labeling sequence based on the optimal path.
4. The named entity recognition method of claim 3, wherein the transfer matrix comprises transfer scores of different transfer paths among the category labels, and wherein determining an optimal path based on the scores and the transfer matrix comprises:
taking the scores as emission scores, thereby determining the emission scores of each word vector corresponding to the different preset category labels; and
obtaining the optimal path based on the transfer scores and the emission scores.
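Combining emission scores with a transfer matrix to find the optimal path, as in claims 3 and 4, is classically done with Viterbi decoding; below is a pure-Python sketch under that assumption (the claims do not name a specific decoding algorithm):

```python
# Viterbi sketch for claims 3-4; pure Python, shapes are assumptions.
def viterbi(emissions, transfer):
    """emissions: per-word lists of num_tags scores (the emission scores);
    transfer[i][j]: transfer score for moving from tag i to tag j."""
    num_tags = len(emissions[0])
    best = list(emissions[0])        # best path score ending in each tag
    back = []                        # backpointers, one list per position
    for emit in emissions[1:]:
        ptrs, scores = [], []
        for j in range(num_tags):
            # Best previous tag for current tag j.
            i = max(range(num_tags), key=lambda i: best[i] + transfer[i][j])
            ptrs.append(i)
            scores.append(best[i] + transfer[i][j] + emit[j])
        best = scores
        back.append(ptrs)
    # Trace the optimal path back from the best final tag.
    tag = max(range(num_tags), key=lambda t: best[t])
    path = [tag]
    for ptrs in reversed(back):
        tag = ptrs[tag]
        path.append(tag)
    return path[::-1]                # labeling sequence as tag indices
```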
5. The named entity recognition method of claim 4, further comprising, before acquiring the address text to be parsed:
acquiring a sample address text, wherein each word in the sample address text is provided with a corresponding sample category label; and
training an initial recognition model according to the sample address text and the sample category labels to obtain the recognition model, wherein the transfer matrix is included in the parameters of the recognition model.
6. The named entity recognition method of claim 5, wherein training an initial recognition model based on the sample address text and the sample category labels comprises:
inputting the sample address text into the initial recognition model and obtaining a labeling result corresponding to the sample address text, wherein the parameters of the initial recognition model comprise an initial transfer matrix;
determining a negative log-likelihood function based on the sample category labels and the labeling result, and taking the negative log-likelihood function as the loss function of the initial recognition model; and
training the initial recognition model according to the sample address text to obtain the trained recognition model and the transfer matrix in the recognition model.
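For illustration, the negative log-likelihood of claim 6 can be sketched as a linear-chain CRF loss in PyTorch, with the transfer matrix as a learnable parameter. The batch-size-1, no-start/stop-state formulation below is a simplifying assumption, not the patent's prescribed implementation:

```python
# Hedged CRF negative log-likelihood sketch; shapes and names are assumptions.
import torch

def crf_nll(emissions, tags, transfer):
    """emissions: (seq_len, num_tags) scores from the network;
    tags: (seq_len,) gold label indices;
    transfer: (num_tags, num_tags) learnable transfer matrix."""
    # Score of the gold labeling: emission plus transfer scores along the path.
    gold = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        gold = gold + transfer[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # Log partition function over all paths, via the forward algorithm.
    alpha = emissions[0]                                  # (num_tags,)
    for t in range(1, emissions.size(0)):
        # alpha'[j] = logsumexp_i(alpha[i] + transfer[i, j]) + emissions[t, j]
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transfer, dim=0) + emissions[t]
    log_z = torch.logsumexp(alpha, dim=0)
    return log_z - gold                                   # negative log-likelihood

# Hypothetical usage with the AddressTagger sketched under claim 1:
#   loss = crf_nll(model(word_ids)[0], gold_tags, transfer)
#   loss.backward(); optimizer.step()  # gradient descent on all parameters
```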
7. A named entity recognition apparatus, comprising:
an acquisition module, configured to acquire an address text to be parsed;
a mapping module, configured to map at least one target word in the address text to a corresponding target word vector based on a preset recognition model, and to obtain a word vector sequence based on the target word vectors;
a determining module, configured to determine, for each word vector in the word vector sequence, scores corresponding to different preset category labels based on the recognition model; and
a labeling module, configured to obtain, through the recognition model, a labeling sequence based on the scores and the preset transfer matrix;
wherein the determining module is specifically configured to:
extracting forward features in the word vector sequence to obtain a forward hidden vector;
extracting backward features in the word vector sequence to obtain a backward hidden vector;
concatenate the forward hidden vector and the backward hidden vector to obtain a target hidden vector, wherein the target hidden vector comprises the scores of each word vector corresponding to the different preset category labels.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the named entity recognition method of any one of claims 1-6.
9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the named entity recognition method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011529431.9A CN114626378B (en) | 2020-12-22 | 2020-12-22 | Named entity recognition method, named entity recognition device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114626378A CN114626378A (en) | 2022-06-14 |
CN114626378B (en) | 2024-06-18
Family
ID=81897507
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011529431.9A Active CN114626378B (en) | 2020-12-22 | 2020-12-22 | Named entity recognition method, named entity recognition device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114626378B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114781387B (en) * | 2022-06-20 | 2022-09-02 | 北京惠每云科技有限公司 | Medical named entity recognition method and device, electronic equipment and storage medium |
CN118171656B (en) * | 2024-05-14 | 2024-08-09 | 北京嘉和海森健康科技有限公司 | Word multi-entity recognition method and device, storage medium and electronic equipment |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112052670A (en) * | 2020-08-28 | 2020-12-08 | 丰图科技(深圳)有限公司 | Address text word segmentation method and device, computer equipment and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109960728B (en) * | 2019-03-11 | 2021-01-22 | 北京市科学技术情报研究所(北京市科学技术信息中心) | Method and system for identifying named entities of open domain conference information |
CN110489523B (en) * | 2019-07-31 | 2021-12-17 | 西安理工大学 | Fine-grained emotion analysis method based on online shopping evaluation |
CN111339764A (en) * | 2019-09-18 | 2020-06-26 | 华为技术有限公司 | A Chinese named entity recognition method and device |
CN110688449A (en) * | 2019-09-20 | 2020-01-14 | 京东数字科技控股有限公司 | Address text processing method, device, equipment and medium based on deep learning |
CN111538816B (en) * | 2020-07-09 | 2020-10-20 | 平安国际智慧城市科技股份有限公司 | Question-answering method, device, electronic equipment and medium based on AI identification |
- 2020-12-22: CN202011529431.9A filed; granted as CN114626378B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN114626378A (en) | 2022-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109992782B (en) | Legal document named entity identification method and device and computer equipment | |
CN108363790B (en) | Method, device, equipment and storage medium for evaluating comments | |
CN110209806B (en) | Text classification method, text classification device and computer readable storage medium | |
CN110363049B (en) | Method and device for detecting, identifying and determining categories of graphic elements | |
Sun et al. | Sentiment analysis for Chinese microblog based on deep neural networks with convolutional extension features | |
CN110263325B (en) | Chinese word segmentation system | |
CN110851604B (en) | Text classification method and device, electronic equipment and storage medium | |
CN110727779A (en) | Question-answering method and system based on multi-model fusion | |
CN107798624B (en) | Technical label recommendation method in software question-and-answer community | |
CN113486670B (en) | Text classification method, device, equipment and storage medium based on target semantics | |
CN112528001B (en) | Information query method and device and electronic equipment | |
Hu et al. | Considering optimization of English grammar error correction based on neural network | |
CN111144093B (en) | Intelligent text processing method and device, electronic equipment and storage medium | |
CN111563375B (en) | Text generation method and device | |
CN114781380B (en) | A Chinese named entity recognition method, device and medium integrating multi-granularity information | |
CN108959556A (en) | Entity answering method, device and terminal neural network based | |
CN116991877B (en) | Method, device and application for generating structured query statement | |
CN118113810A (en) | Patent retrieval system combining patent image and text semantics | |
CN116579327B (en) | Text error correction model training method, text error correction method, device and storage medium | |
CN112163089A (en) | Military high-technology text classification method and system fusing named entity recognition | |
CN110019795B (en) | Sensitive word detection model training method and system | |
CN113987174A (en) | Core statement extraction method, system, equipment and storage medium for classification label | |
CN114626378B (en) | Named entity recognition method, named entity recognition device, electronic equipment and computer readable storage medium | |
CN116955644A (en) | Knowledge fusion method, system and storage medium based on knowledge graph | |
US20210303777A1 (en) | Method and apparatus for fusing position information, and non-transitory computer-readable recording medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |