
CN108595426B - A word vector optimization method based on the structural information of Chinese characters - Google Patents


Info

Publication number
CN108595426B
CN108595426B (application CN201810368909.0A)
Authority
CN
China
Prior art keywords
word
vector
processed
words
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810368909.0A
Other languages
Chinese (zh)
Other versions
CN108595426A (en)
Inventor
郭宇春
潘常玮
陈一帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN201810368909.0A priority Critical patent/CN108595426B/en
Publication of CN108595426A publication Critical patent/CN108595426A/en
Application granted granted Critical
Publication of CN108595426B publication Critical patent/CN108595426B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract


The invention provides a word vector optimization method based on the structural information of Chinese characters. The method includes: obtaining a distributed word vector for the word to be processed; constructing a morphological feature representation from the Chinese characters contained in the word, thereby obtaining its morphological feature vector; and combining the morphological feature vector with the distributed word vector to obtain an optimized feature vector for the word. The invention designs a scheme that uses Chinese glyph-structure information to optimize word-vector representation: building on existing neural-network distributed word representations and combining them with the glyph-structure features of Chinese, word vectors are optimized for actual natural-language-processing tasks. This strengthens the expressive power and generalization/transfer ability of the word vectors and helps improve their feature representation of low-frequency and unknown words.


Description

Word vector optimization method based on Chinese character font structural information
Technical Field
The invention relates to the technical field of word vector representation, in particular to a word vector optimization method based on Chinese character font structural information.
Background
In the traditional approach, words in a text are represented numerically by one-hot representation, but this method merely symbolizes the words: it carries no semantic information and yields a high-dimensional, sparse representation. The distributional hypothesis — that the semantics of a word are determined by its context — made it possible to further optimize word-vector representations by incorporating semantics into the word representation. Neural-network-based distributed representations, commonly called word embeddings (distributed representations), compress the original sparse, high-dimensional space into an embedding of much smaller dimension. Semantic representation in word-vector form underlies neural translation models and has become the foundation of a wide range of natural-language-processing tasks. Designing a better word-vector model is therefore a common challenge across tasks such as text classification, machine translation, and language modeling.
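The contrast between the two representations can be sketched as follows; the toy vocabulary and the random dense vectors are illustrative placeholders, not trained embeddings:

```python
import numpy as np

# Toy vocabulary; words and indices are illustrative.
vocab = {"海洋": 0, "海水": 1, "天空": 2}

def one_hot(word, vocab):
    """Sparse one-hot vector: dimensionality equals vocabulary size,
    and no semantic relationship between words is captured."""
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

# A distributed representation embeds the same vocabulary into a small
# dense space; random vectors here stand in for trained embeddings.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 8))  # 8-dim dense embedding

print(one_hot("海洋", vocab))           # [1. 0. 0.] -- sparse, symbolic
print(embedding[vocab["海洋"]].shape)   # (8,) -- dense, low-dimensional
```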
For low-frequency and unknown words, prior-art neural-network distributed representation methods substitute a special word vector (such as "UNK"). Because distributed semantic representation is itself a statistical learning method, the accuracy of the representation rests on sufficient sample data: statistical commonality is learned from the samples and encoded as a distributed, low-dimensional numerical representation. When a word occurs rarely, or has never been seen at all, the confidence of its word-vector representation is low, and the idiosyncrasies of individual samples produce semantic deviation.
Disclosure of Invention
The embodiment of the invention provides a word vector optimization method based on Chinese character font structural information, which aims to overcome the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
A word vector optimization method based on Chinese character font structural information comprises the following steps:
acquiring a distributed word vector of a word to be processed;
performing word shape feature representation of the words according to the Chinese characters contained in the words to be processed, and acquiring word shape feature vectors of the words to be processed;
and combining and representing the word form characteristic vector and the distributed word vector of the word to be processed to obtain an optimized characteristic vector of the word to be processed.
Further, the obtaining of the distributed word vector of the word to be processed includes:
firstly performing word-segmentation preprocessing on the original text containing the word to be processed, then computing distributed word-vector representations for the words in the preprocessed text, thereby obtaining the distributed word vector of the word to be processed;
setting a word-frequency threshold, counting the frequency of the word to be processed using a preset lexicon, and judging whether its frequency is below the threshold.
Further, the obtaining of the word shape feature vector of the word to be processed according to the word shape feature representation of the word by the Chinese character contained in the word to be processed includes:
automatically extracting and learning Chinese-character structural information through deep learning, and storing all of the structural information in a Chinese-character structure database;
decomposing and counting all characters in the original text of the word to be processed, respectively querying the Chinese character structure database according to each character, acquiring the structure information of each character, and expressing the structure information of each character as a low-dimensional feature vector by using an unsupervised feature extraction method;
and carrying out an averaging operation on the low-dimensional feature vectors corresponding to all the characters, and taking the obtained average as the morphological feature vector of the word to be processed.
Further, the combining and representing the morphological feature vector and the distributed word vector of the word to be processed to obtain the optimized feature vector of the word to be processed includes:
performing dimension connection on the word form characteristic vector and the distributed word vectors to generate a fused word vector, and taking the fused word vector as an optimized characteristic vector of the word to be processed;
further, the combining and representing the morphological feature vector and the distributed word vector of the word to be processed to obtain the optimized feature vector of the word to be processed includes:
finding one or more neighbor words of the word to be processed in a word bank by utilizing the morphological feature vector through a set similarity calculation index, then carrying out an averaging operation on the distributed word vector of the one or more neighbor words and the distributed word vector of the word to be processed, taking the obtained average value as an optimized feature vector of the word to be processed, and taking the optimized feature vector as a semantic expression word vector common to the one or more neighbor words and the word to be processed.
It can be seen from the technical solutions provided by the embodiments of the present invention that a scheme for optimizing word vector expression by using chinese font structure information is designed in the embodiments of the present invention, and the characteristics of word vectors are optimized based on actual natural language processing tasks by using the original neural network word distributed expression technology in combination with chinese font structure characteristics, so that the expression capability and generalization migration capability of word vectors are enhanced. The method is beneficial to improving the word feature representation of the word vector on low-frequency words and unknown words, and the purpose of optimizing the performance of natural language processing tasks such as text classification is realized.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of an implementation of a word vector optimization method based on structural information of Chinese character patterns according to an embodiment of the present invention;
fig. 2 is a processing flow chart of a word vector optimization method based on structural information of Chinese character patterns according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
The embodiment of the invention discloses a word vector optimization scheme based on the structural information of Chinese characters, which mainly involves: a method for extracting the glyph structure of Chinese characters; a method for representing a word feature vector from glyphs; combination strategies between the glyph-structure feature vectors and the statistics-based distributed word feature vectors; and improved use of the optimized word vectors in actual natural-language-processing tasks, including how to choose among the different combination schemes for different scenarios, so that more accurate representation optimization is achieved.
The implementation principle schematic diagram of the word vector optimization method based on the Chinese character font structural information provided by the embodiment of the invention is shown in fig. 1, the specific processing flow is shown in fig. 2, and the method comprises the following processing steps:
step 21: and extracting the structure information of the Chinese characters, and storing the structure information of all the Chinese characters in a Chinese character structure database.
Through deep learning, the structural information of Chinese characters is extracted and learned automatically, and all of the structural information is stored in a Chinese-character structure database.
And step 22, obtaining the distributed word vector of the word to be processed.
First, preprocessing such as word segmentation is performed on the original text containing the words to be processed; distributed word-vector representations are then computed for the words in the text using methods such as CBOW (Continuous Bag-of-Words) and Skip-gram, yielding the distributed word vector of each word to be processed.
A word-frequency threshold is set, the frequency of the word to be processed is counted using a preset lexicon, and it is judged whether the frequency falls below the threshold. Prior-art methods may only substitute a uniform "UNK" representation for low-frequency words. The method of the embodiment is aimed mainly at low-frequency words below the threshold, but it can also be applied to high-frequency words above it.
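The frequency check can be sketched as follows; the token list and the threshold value are illustrative, since the patent leaves the threshold open:

```python
from collections import Counter

# Illustrative word-frequency filter: words below the threshold are the
# low-frequency / unknown words the method targets; a conventional
# baseline would map them all to a single "UNK" vector instead.
tokens = ["海洋", "海水", "海洋", "天空", "汪洋"]
freq = Counter(tokens)
THRESHOLD = 2  # assumed threshold; the patent does not fix a value

def is_low_frequency(word):
    return freq.get(word, 0) < THRESHOLD

print(is_low_frequency("海洋"))   # False: appears twice
print(is_low_frequency("汪洋"))   # True: appears once
print(is_low_frequency("陌生词"))  # True: never seen (unknown word)
```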
And step 23, performing word shape feature representation of the words according to the Chinese characters contained in the words to be processed, and acquiring word shape feature vectors of the words to be processed.
The definition of a word comprises its constituent characters and the combination relations among them. All characters in the original text of the word to be processed are decomposed and counted, the Chinese-character structure database is queried for each character, and the structural information of each character is obtained. An unsupervised feature-extraction method then expresses each character's structural information as a low-dimensional feature vector.
The low-dimensional feature vectors of all the characters are then averaged, and the resulting mean is taken as the morphological feature vector of the word to be processed. Adding character-structure features to the word's morphological feature vector completes the optimization of the word representation.
For example, for the word 海洋 ("ocean"), 32-dimensional glyph feature vectors are computed for the characters 海 and 洋 using an unsupervised deep-learning model, and averaging the two vectors yields the glyph-structure feature vector of 海洋.
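This per-character averaging can be sketched as follows; the 32-dimensional glyph vectors are random placeholders for the output of the unsupervised glyph model, and the character set is illustrative:

```python
import numpy as np

# Sketch of step 23: look up each character's glyph-structure vector
# in the database and average them into the word's morphological
# feature vector. The mock database stands in for the real one.
rng = np.random.default_rng(1)
glyph_db = {ch: rng.normal(size=32) for ch in "海洋水汪"}  # mock database

def word_shape_vector(word, db):
    vecs = [db[ch] for ch in word if ch in db]
    return np.mean(vecs, axis=0)  # element-wise average over characters

v = word_shape_vector("海洋", glyph_db)
print(v.shape)  # (32,)
```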
And 24, combining and representing the morphological characteristic vector and the distributed word vector of the word to be processed to obtain an optimized characteristic vector of the word to be processed.
The characteristic combination of the word form characteristic vector and the distributed word vector of the word to be processed is carried out in two ways:
one way is that: directly connecting the word shape feature vector and the distributed word vector in dimensionality to generate a 160-dimensional fused word vector; using the fused word vector as the optimized characteristic vector of the word to be processed
For example, for the word 海洋, a 128-dimensional distributed word vector is computed in the conventional contextual distributed-semantics manner, and the previous step produced a 32-dimensional morphological feature vector; concatenating the two along the feature dimension yields a 160-dimensional fused word vector.
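The first combination strategy reduces to a single concatenation; the vectors below are random placeholders for trained ones:

```python
import numpy as np

# Fuse the 128-dim distributed word vector with the 32-dim
# morphological (glyph) vector into one 160-dim feature vector.
rng = np.random.default_rng(2)
distributed = rng.normal(size=128)  # placeholder distributed vector
shape_vec = rng.normal(size=32)     # placeholder morphological vector

fused = np.concatenate([distributed, shape_vec])
print(fused.shape)  # (160,)
```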
The other way: using the morphological feature vector and a chosen similarity metric, find one or more neighbouring words of the word to be processed in a lexicon; then average the distributed word vectors of the neighbouring words with the distributed word vector of the word to be processed, take the resulting mean as the optimized feature vector of the word, and use that vector as the semantic-representation word vector shared by the neighbouring words and the word to be processed.
Assuming that 海洋 is most similar in glyph form to 汪洋 ("vast waters") and 海水 ("sea water"), the 128-dimensional distributed word vectors of those two words can be averaged with the word vector of 海洋 to obtain the optimized feature vector of 海洋, which also serves as the semantic-representation word vector shared by the three words.
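The second combination strategy can be sketched as below; cosine similarity is assumed as the similarity metric (the patent only requires "a set similarity calculation index"), and all vectors are random stand-ins for trained ones:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Find the glyph-nearest neighbours of the target word, then average
# their distributed vectors with the target's distributed vector.
rng = np.random.default_rng(3)
words = ["海洋", "汪洋", "海水", "天空"]
shape_vecs = {w: rng.normal(size=32) for w in words}   # glyph features
dist_vecs = {w: rng.normal(size=128) for w in words}   # distributed vectors

target = "海洋"
neighbours = sorted(
    (w for w in words if w != target),
    key=lambda w: cosine(shape_vecs[target], shape_vecs[w]),
    reverse=True,
)[:2]  # the two most glyph-similar words

optimized = np.mean(
    [dist_vecs[w] for w in neighbours] + [dist_vecs[target]], axis=0
)
print(optimized.shape)  # (128,)
```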
The optimized feature vector of the word to be processed is then input into a sentence-representation model framework to obtain a sentence representation, and a classifier outputs the final classification result.
In summary, the embodiment of the present invention designs a scheme for optimizing word vector expression by using chinese font structure information, and performs characteristic optimization of word vectors based on actual natural language processing tasks by using the original neural network word distributed expression technology and combining chinese font structure characteristics, so that the expression capability and generalization migration capability of word vectors are enhanced. The method is beneficial to improving the word feature representation of the word vector on low-frequency words and unknown words, and the purpose of optimizing the performance of natural language processing tasks such as text classification is realized.
On the basis of the original distributed word-vector representation method and a method for extracting two-dimensional glyph-image structural information, the embodiment of the invention realizes an optimization strategy specific to Chinese word vectors, enhancing their practical performance on natural-language-processing tasks such as text classification.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, for apparatus or system embodiments, which are substantially similar to the method embodiments, the description is relatively brief, and reference may be made to the corresponding parts of the method embodiments. The apparatus and system embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment's solution. One of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (2)

1. A word vector optimization method based on Chinese character font structural information is characterized by comprising the following steps:
acquiring a distributed word vector of a word to be processed;
performing word shape feature representation of the words according to the Chinese characters contained in the words to be processed, and acquiring word shape feature vectors of the words to be processed, wherein the word shape feature vectors specifically comprise:
automatically extracting and learning Chinese-character structural information through deep learning, and storing all of the structural information in a Chinese-character structure database;
decomposing and counting all characters in the original text of the word to be processed, respectively querying the Chinese character structure database according to each character, acquiring the structure information of each character, and expressing the structure information of each character as a low-dimensional feature vector by using an unsupervised feature extraction method;
carrying out an averaging operation on the low-dimensional feature vectors corresponding to all the characters, and taking the obtained average as a morphological feature vector of the word to be processed;
the expression method comprises the following steps of combining and representing the morphological characteristic vector and the distributed word vector of the word to be processed to obtain an optimized characteristic vector of the word to be processed, and specifically comprises the following steps:
performing dimension connection on the morphological feature vector and the distributed word vector to generate a fused word vector, taking the fused word vector as an optimized feature vector of the word to be processed, or finding one or more neighbor words of the word to be processed in a word bank by utilizing the morphological feature vector through a set similarity calculation index, then performing an averaging operation on the distributed word vector of the one or more neighbor words and the distributed word vector of the word to be processed, taking the obtained average value as the optimized feature vector of the word to be processed, and taking the optimized feature vector as a semantic expression word vector common to the one or more neighbor words and the word to be processed.
2. The method of claim 1, wherein obtaining the distributed word vector of the word to be processed comprises:
firstly performing word-segmentation preprocessing on the original text containing the word to be processed, then computing distributed word-vector representations for the words in the preprocessed text, thereby obtaining the distributed word vector of the word to be processed;
setting a word frequency threshold, counting the word frequency of the word to be processed by utilizing a preset word bank, and judging whether the word frequency of the word to be processed is lower than the set word frequency threshold.
CN201810368909.0A 2018-04-23 2018-04-23 A word vector optimization method based on the structural information of Chinese characters Active CN108595426B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810368909.0A CN108595426B (en) 2018-04-23 2018-04-23 A word vector optimization method based on the structural information of Chinese characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810368909.0A CN108595426B (en) 2018-04-23 2018-04-23 A word vector optimization method based on the structural information of Chinese characters

Publications (2)

Publication Number Publication Date
CN108595426A CN108595426A (en) 2018-09-28
CN108595426B true CN108595426B (en) 2021-07-20

Family

ID=63614131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810368909.0A Active CN108595426B (en) 2018-04-23 2018-04-23 A word vector optimization method based on the structural information of Chinese characters

Country Status (1)

Country Link
CN (1) CN108595426B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408814B (en) * 2018-09-30 2020-08-07 中国地质大学(武汉) Chinese-English cross-language vocabulary representation learning method and system based on paraphrased primitive words
CN111563380A (en) * 2019-01-25 2020-08-21 浙江大学 Named entity identification method and device
CN110348022A (en) * 2019-07-18 2019-10-18 北京香侬慧语科技有限责任公司 A kind of method, apparatus of similarity analysis, storage medium and electronic equipment
CN110795935A (en) * 2020-01-06 2020-02-14 广东博智林机器人有限公司 Training method and device for character word vector model, terminal and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN107168952A (en) * 2017-05-15 2017-09-15 北京百度网讯科技有限公司 Information generating method and device based on artificial intelligence
CN107273355A (en) * 2017-06-12 2017-10-20 大连理工大学 A kind of Chinese word vector generation method based on words joint training
CN107341152A (en) * 2016-04-28 2017-11-10 阿里巴巴集团控股有限公司 A kind of method and device of parameter input

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
JP2003288362A (en) * 2002-03-27 2003-10-10 Seiko Epson Corp Specific element vector generation device, character string vector generation device, similarity calculation device, specific element vector generation program, character string vector generation program and similarity calculation program, and specific element vector generation method, character string vector generation method and similarity calculation Method

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN107341152A (en) * 2016-04-28 2017-11-10 阿里巴巴集团控股有限公司 A kind of method and device of parameter input
CN107168952A (en) * 2017-05-15 2017-09-15 北京百度网讯科技有限公司 Information generating method and device based on artificial intelligence
CN107273355A (en) * 2017-06-12 2017-10-20 大连理工大学 A kind of Chinese word vector generation method based on words joint training

Non-Patent Citations (4)

Title
Jinxing Yu et al., "Joint Embeddings of Chinese Words, Characters, and Fine-grained Subcharacter Components," EMNLP 2017, pp. 286-291. *
Tzu-Ray Su et al., "Learning Chinese Word Representations From Glyphs Of Characters," EMNLP 2017, pp. 264-273. *
李伟康 et al., "深度学习中汉语字向量和词向量结合方式探究" (A study of ways to combine Chinese character vectors and word vectors in deep learning), 中文信息学报 (Journal of Chinese Information Processing), vol. 31, no. 6, June 2017. *

Also Published As

Publication number Publication date
CN108595426A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108133045B (en) Keyword extraction method and system, and keyword extraction model generation method and system
CN109508379A (en) A kind of short text clustering method indicating and combine similarity based on weighted words vector
CN108595426B (en) A word vector optimization method based on the structural information of Chinese characters
CN114372465B (en) Mixup and BQRNN-based legal naming entity identification method
Yang et al. Sentiment analysis of Weibo comment texts based on extended vocabulary and convolutional neural network
CN109993216B (en) Text classification method and device based on K nearest neighbor KNN
CN112380319A (en) Model training method and related device
CN113065331A (en) Entity emotion recognition method and system based on entity context discrimination
CN109446333A (en) A kind of method that realizing Chinese Text Categorization and relevant device
CN110516098A (en) An Image Annotation Method Based on Convolutional Neural Network and Binary Coded Features
CN109684476A (en) A kind of file classification method, document sorting apparatus and terminal device
CN114297388B (en) A text keyword extraction method
CN118711198B (en) Information identification method and device
CN111782804A (en) TextCNN-based same-distribution text data selection method, system and storage medium
CN111241271B (en) Text emotion classification method and device and electronic equipment
CN110413992A (en) A kind of semantic analysis recognition methods, system, medium and equipment
CN109858035A (en) A kind of sensibility classification method, device, electronic equipment and readable storage medium storing program for executing
Yao et al. Dual-disentangled deep multiple clustering
Jenckel et al. Impact of Training LSTM-RNN with Fuzzy Ground Truth.
CN118981511A (en) Method, system, device and medium for improving retrieval performance of RAG knowledge base
CN110852102B (en) Chinese part-of-speech tagging method and device, storage medium and electronic equipment
CN113220936A (en) Intelligent video recommendation method and device based on random matrix coding and simplified convolutional network and storage medium
Zhang et al. Associating spatially-consistent grouping with text-supervised semantic segmentation
CN109446321B (en) Text classification method, text classification device, terminal and computer readable storage medium
CN115455955B (en) Chinese named entity recognition method based on local and global character representation enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant