CN118966194A - A text intelligent analysis method and system based on natural language processing - Google Patents

A text intelligent analysis method and system based on natural language processing

Info

Publication number
CN118966194A
Authority
CN
China
Prior art keywords
text
data
named entity
entity recognition
responsible
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411441166.7A
Other languages
Chinese (zh)
Inventor
张帅
李照川
张尧臣
王冠军
张野
李会
侯冬刚
孙源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Technology Co Ltd
Original Assignee
Inspur Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Technology Co Ltd
Priority to CN202411441166.7A
Publication of CN118966194A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to the technical field of artificial intelligence, and in particular to a text intelligent analysis method and system based on natural language processing. The method uses crawler technology to crawl data from a selected website, cleans the data by removing text noise and standardizing the data format, and fills the text with real data. A named entity recognition model is then constructed with Bidirectional Encoder Representations from Transformers (BERT) as its foundation, combined with a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF). The model is trained, the trained model is loaded, and named entity recognition is performed on new input text, outputting each entity and its category label. The method and system extract key information automatically, which improves processing efficiency and the accuracy of key information extraction, reduces the risk of human error, broadens the scope of application, and improves the user experience.

Description

Text intelligent analysis method and system based on natural language processing
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a text intelligent analysis method and system based on natural language processing.
Background
As society develops, the number of documents of all kinds keeps growing, and some documents still have to be signed off or reviewed manually, which raises a series of problems. First, manual review requires a thorough knowledge of what must be checked in each kind of document; because there are so many document types, a reviewer may be unfamiliar with the key content of some of them and must spend a great deal of time reading and understanding it. Second, manual review carries a high risk of error. It is therefore important to develop a system that can extract key information with high accuracy.
To accurately identify and mark the key content to be reviewed in various documents, and so make the reviewers' work easier, the invention provides a text intelligent analysis method and system based on natural language processing.
Disclosure of Invention
To remedy the shortcomings of the prior art, the invention provides a simple and efficient text intelligent analysis method and system based on natural language processing.
The invention is realized by the following technical scheme:
A text intelligent analysis method based on natural language processing comprises the following steps:
Step S1, data collection
A website is selected as required, and crawler technology is used to crawl the files on that website and obtain the corresponding data;
Step S2, data cleaning
After crawling is complete, the crawled data is cleaned: text noise is removed and the data format is standardized, so as to improve the accuracy and quality of the crawled data;
In step S2, data cleaning first removes text noise, then standardizes the data format and converts the obtained data to text so that each piece of data occupies its own line; finally, the content is corrected, and repeated and erroneous content is removed from the text.
Step S3, data filling
After data cleaning is complete, the text is filled with real data so as to improve the accuracy of entity recognition;
In step S3, the text to be input into the named entity recognition model is preprocessed, including word segmentation and stop word removal, to convert the text into a word sequence.
Step S4, constructing a named entity recognition model
With Bidirectional Encoder Representations from Transformers (BERT) as the foundation, a named entity recognition model is constructed by combining a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), so as to improve the accuracy of entity recognition;
In step S4, BERT serves as the pre-trained language model and is responsible for extracting deep text features from the original text; the BiLSTM is responsible for further capturing information in the text sequence; the CRF is responsible for constraining the output tag sequence so that entity tags appear in a valid order and the output of the model is optimized.
In step S4, the BiLSTM serves as the encoding layer of the named entity recognition model and is responsible for capturing the context of the text in both directions, so that the model takes the preceding and following text fully into account when generating the output tag sequence, which improves prediction accuracy;
The CRF is responsible for determining the dependencies between tags in the tag sequence, thereby optimizing the named entity recognition process.
Step S5, model training
Effective features are extracted from the word sequences, the featurized text fragments are mapped onto a predefined set of classes, and the named entity recognition model is trained so that it learns the correspondence between text features and class labels;
Step S6, named entity recognition
The best model obtained from training is loaded, named entity recognition is performed on new input text, and each entity and its category label are output.
A system for implementing the text intelligent analysis method based on natural language processing comprises:
The data collection module, which is responsible for selecting a website as required and crawling the files on that website with crawler technology to obtain the corresponding data;
The data cleaning module, which is responsible for cleaning the crawled data once crawling is complete, removing text noise and standardizing the data format, so as to improve the accuracy and quality of the crawled data;
The data cleaning module first removes text noise, then standardizes the data format and converts the obtained data to text so that each piece of data occupies its own line; finally, it corrects the content and removes repeated and erroneous content from the text.
The data filling module, which is responsible for filling the text with real data after data cleaning is complete, so as to improve the accuracy of entity recognition;
The system further comprises a preprocessing module, which is responsible for preprocessing the text to be input into the named entity recognition model, including word segmentation and stop word removal, and for converting the text into a word sequence.
The named entity recognition model construction module, which is responsible for constructing a named entity recognition model with Bidirectional Encoder Representations from Transformers (BERT) as the foundation, combined with a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), so as to improve the accuracy of entity recognition;
The model training module, which is responsible for extracting effective features from the word sequences, mapping the featurized text fragments onto a predefined set of classes, and training the named entity recognition model so that it learns the correspondence between text features and class labels;
The named entity recognition module, which is responsible for loading the best model obtained from training, performing named entity recognition on new input text, and outputting each entity and its category label.
A computing device, comprising:
one or more processors, one or more memories, and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods described above.
A readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements a method as described above.
The beneficial effects of the invention are as follows: the text intelligent analysis method and system based on natural language processing can automatically extract key information, greatly reduce the time and workload of manually processing the text, improve the processing efficiency, improve the accuracy of key information extraction, reduce the risk of human errors, expand the application range and improve the user experience.
Drawings
To illustrate the embodiments of the present invention and the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a text intelligent analysis method based on natural language processing.
FIG. 2 is a diagram of a named entity recognition model according to the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solution of the present invention, the following description will make clear and complete descriptions of the technical solution of the present invention in combination with the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
The intelligent text analysis method based on natural language processing comprises the following steps:
Step S1, data collection
A website is selected as required, and crawler technology is used to crawl the files on that website and obtain the corresponding data;
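The source does not say how the crawler locates files on the selected website. As a minimal sketch, assuming the files are linked from an HTML listing page, the link-extraction part of step S1 could look like the following. The class name `FileLinkParser`, the function `extract_file_links`, and the list of file extensions are all hypothetical and not taken from the patent.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FileLinkParser(HTMLParser):
    """Collect absolute URLs of <a> links whose target looks like a document file."""

    def __init__(self, base_url, extensions=(".pdf", ".doc", ".docx", ".txt")):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions  # assumed file types, not specified by the source
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links that end in one of the assumed document extensions.
        if href and href.lower().endswith(self.extensions):
            self.links.append(urljoin(self.base_url, href))


def extract_file_links(html_text, base_url):
    """Return the document links found in one already-downloaded HTML page."""
    parser = FileLinkParser(base_url)
    parser.feed(html_text)
    return parser.links
```

In practice the page text would first be downloaded, for example with `urllib.request.urlopen`, and each returned link fetched in turn; that network part is left out here so the sketch stays self-contained.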
Step S2, data cleaning
After crawling is complete, the crawled data is cleaned, because it is varied and does not follow any particular standard: text noise is removed and the data format is standardized, so as to improve the accuracy and quality of the crawled data;
In step S2, data cleaning first removes text noise produced during crawling, such as HTML tags and garbled characters; it then standardizes the data format and converts the obtained data to text so that each piece of data occupies its own line; finally, the content is corrected, and repeated and erroneous content is removed from the text.
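The cleaning order described above (noise removal, one record per line, removal of repeats) can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the function name `clean_records` is hypothetical, garbled characters are approximated by the Unicode replacement character, and correcting erroneous content is not attempted because the source does not say how it is done.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")   # crude HTML tag matcher, adequate for a sketch
WS_RE = re.compile(r"\s+")


def clean_records(raw_records):
    """Strip crawl noise, put each record on one line, and drop exact repeats."""
    seen = set()
    cleaned = []
    for raw in raw_records:
        text = TAG_RE.sub(" ", raw)           # remove HTML tags
        text = html.unescape(text)            # decode entities such as &nbsp;
        text = text.replace("\ufffd", "")     # drop replacement characters left by bad decoding
        text = WS_RE.sub(" ", text).strip()   # collapse whitespace so one record is one line
        if text and text not in seen:         # skip empty and repeated records
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

For real pages a proper HTML parser is usually more robust than a regular expression; the regular expression is used here only to keep the example short.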
Step S3, data filling
After data cleaning is complete, the text is filled with real data so as to improve the accuracy of entity recognition;
In step S3, the text to be input into the named entity recognition model is preprocessed, including word segmentation and stop word removal, to convert the text into a word sequence.
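The preprocessing in step S3 can be illustrated with a small sketch. The source names neither a segmenter nor a stop word list, so both are assumptions here: the tokenizer splits Latin-script words with a regular expression and treats each CJK character as one token, which is only a stand-in for a real Chinese word segmenter, and `STOP_WORDS` is an illustrative list.

```python
import re

# Illustrative stop word list; the source does not specify one.
STOP_WORDS = {"the", "of", "and", "to", "is", "by"}

# Latin-script words and digit runs, or single CJK characters (placeholder segmentation).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def to_word_sequence(text, stop_words=STOP_WORDS):
    """Segment text into tokens and drop stop words, returning a word sequence."""
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t.lower() not in stop_words]
```

Note that BERT-style models normally apply their own subword tokenizer to the input, so in a real system this step would have to be aligned with the tokenizer of the chosen pre-trained model.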
Step S4, constructing a named entity recognition model
With Bidirectional Encoder Representations from Transformers (BERT) as the foundation, a named entity recognition model is constructed by combining a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), so as to improve the accuracy of entity recognition;
In step S4, BERT serves as the pre-trained language model and is responsible for extracting deep text features from the original text; the BiLSTM is responsible for further capturing information in the text sequence; the CRF is responsible for constraining the output tag sequence so that entity tags appear in a valid order and the output of the model is optimized.
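The source gives no formulas for this three-layer stack. One common way to write a BERT + BiLSTM + CRF tagger, offered here as an assumption rather than as the patent's own definition, is shown below. All symbols are introduced for illustration: x_{1:n} is the input token sequence, y_{1:n} a candidate tag sequence, W and b project each BiLSTM state onto per-tag scores, and T is a learned tag transition matrix.

```latex
\begin{aligned}
h_{1:n} &= \mathrm{BiLSTM}\bigl(\mathrm{BERT}(x_{1:n})\bigr), \qquad E_i = W h_i + b \\
s(x, y) &= \sum_{i=1}^{n} E_{i,\,y_i} \;+\; \sum_{i=2}^{n} T_{y_{i-1},\,y_i} \\
P(y \mid x) &= \frac{\exp s(x, y)}{\sum_{y'} \exp s(x, y')}
\end{aligned}
```

In this formulation the first sum scores each tag against its token, and the second sum scores each pair of adjacent tags, which is where ordering constraints on the output tag sequence enter. Such a model is typically trained by minimizing the negative log-likelihood, -log P(y | x), of the annotated tag sequences.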
Results show that the BERT-based model achieves a marked performance improvement on a specific named entity dataset, which supports the design. In addition, given BERT's wide use and strong performance in natural language processing, key entities in the text are identified accurately by fine-tuning the pre-trained BERT model on task-specific text data.
To understand the text more deeply and identify the named entities in it accurately, in step S4 the BiLSTM serves as the encoding layer of the named entity recognition model and is responsible for capturing the context of the text in both directions, so that the model takes the preceding and following text fully into account when generating the output tag sequence, which improves prediction accuracy;
However, relying on the BiLSTM alone for prediction can produce inconsistent tag sequences, for example erroneous tag combinations such as "B-JiaFang, 0-JiaFang, I-JiaFang". The CRF is used to avoid this problem: it is responsible for determining the dependencies between tags in the tag sequence, thereby optimizing the named entity recognition process. Applying the CRF as a constraint on the output sequence yields more accurate and more consistent recognition results.
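The effect of constraining the output sequence can be shown with a small decoding sketch. It is a simplification and an assumption, not the patent's method: a trained CRF learns numeric transition scores, whereas here the transitions are hard-coded as allowed or forbidden under the usual BIO rule that an I-X tag may only follow B-X or I-X. The function names and the per-token scores are hypothetical.

```python
NEG_INF = float("-inf")


def allowed(prev_tag, curr_tag):
    """BIO rule: I-X may only follow B-X or I-X; every other transition is allowed."""
    if curr_tag.startswith("I-"):
        label = curr_tag[2:]
        return prev_tag in ("B-" + label, "I-" + label)
    return True


def viterbi(emissions, tags):
    """Return the best-scoring valid tag path.

    emissions: one dict per token mapping each tag to a score.
    """
    # A sequence may not start with an I- tag.
    best = {t: (NEG_INF if t.startswith("I-") else emissions[0][t]) for t in tags}
    back = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for curr in tags:
            # Best allowed predecessor for this tag.
            prev_score, prev_tag = max((best[p], p) for p in tags if allowed(p, curr))
            new_best[curr] = prev_score + scores[curr]
            pointers[curr] = prev_tag
        best = new_best
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With per-token scores whose independent maxima would give the invalid sequence B-JiaFang, O, I-JiaFang, this decoder instead returns a path in which every I- tag continues an open entity.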
Step S5, model training
Effective features are extracted from the word sequences, the featurized text fragments are mapped onto a predefined set of classes, and the named entity recognition model is trained so that it learns the correspondence between text features and class labels;
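Before training, the predefined class set has to be turned into tag labels that the model can predict. The source does not describe the labeling scheme beyond the B-/I- prefixed tags it quotes, so the following is a sketch assuming standard BIO tagging; the function names and the class name "Amount" are hypothetical, while "JiaFang" appears in the source.

```python
def build_label_map(classes):
    """Map a predefined class set to BIO tag ids: O, then B-X and I-X per class."""
    labels = ["O"]
    for name in classes:
        labels += ["B-" + name, "I-" + name]
    return {label: i for i, label in enumerate(labels)}


def spans_to_bio(n_tokens, spans):
    """Turn annotated spans into one BIO tag per token.

    spans: list of (start, end_exclusive, class_name) over token indices.
    """
    tags = ["O"] * n_tokens
    for start, end, name in spans:
        tags[start] = "B-" + name          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + name          # remaining tokens of the entity
    return tags
```

The tag ids from `build_label_map` would serve as training targets, with one target per token of the word sequence.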
Step S6, named entity recognition
The best model obtained from training is loaded, named entity recognition is performed on new input text, and each entity and its category label are output.
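The final output of step S6, entities with their category labels, can be derived from a predicted tag sequence as sketched below. This assumes BIO tags and joins entity tokens with a space, which suits space-delimited text; for Chinese text the tokens would be joined without a separator. The function name `extract_entities` is hypothetical.

```python
def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, category_label) pairs."""
    entities = []
    current_tokens, current_label = [], None

    def flush():
        # Emit the entity collected so far, if any.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or a stray I- tag closes any open entity.
            flush()
            current_tokens, current_label = [], None
    flush()
    return entities
```

A stray I- tag that does not continue an open entity of the same class is ignored rather than promoted to a new entity; that is a design choice of this sketch.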
The system for implementing the text intelligent analysis method based on natural language processing comprises:
The data collection module, which is responsible for selecting a website as required and crawling the files on that website with crawler technology to obtain the corresponding data;
The data cleaning module, which is responsible for cleaning the crawled data once crawling is complete, removing text noise and standardizing the data format, so as to improve the accuracy and quality of the crawled data;
The data cleaning module first removes text noise, then standardizes the data format and converts the obtained data to text so that each piece of data occupies its own line; finally, it corrects the content and removes repeated and erroneous content from the text.
The data filling module, which is responsible for filling the text with real data after data cleaning is complete, so as to improve the accuracy of entity recognition;
The system further comprises a preprocessing module, which is responsible for preprocessing the text to be input into the named entity recognition model, including word segmentation and stop word removal, and for converting the text into a word sequence.
The named entity recognition model construction module, which is responsible for constructing a named entity recognition model with Bidirectional Encoder Representations from Transformers (BERT) as the foundation, combined with a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), so as to improve the accuracy of entity recognition;
The model training module, which is responsible for extracting effective features from the word sequences, mapping the featurized text fragments onto a predefined set of classes, and training the named entity recognition model so that it learns the correspondence between text features and class labels;
The named entity recognition module, which is responsible for loading the best model obtained from training, performing named entity recognition on new input text, and outputting each entity and its category label.
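How the modules hand data to one another at analysis time can be sketched as a small pipeline object. This is an assumption about the wiring, not something the source specifies: the class name `AnalysisPipeline` and its field names are hypothetical, and each module is represented by a plain callable so that any implementation can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class AnalysisPipeline:
    """Chain the cleaning, preprocessing, and recognition modules."""

    clean: Callable[[List[str]], List[str]]                 # data cleaning module
    preprocess: Callable[[str], List[str]]                  # preprocessing module
    recognize: Callable[[List[str]], List[Tuple[str, str]]] # named entity recognition module

    def run(self, raw_records):
        """Return, for each cleaned record, its list of (entity, label) pairs."""
        results = []
        for line in self.clean(raw_records):
            words = self.preprocess(line)
            results.append(self.recognize(words))
        return results
```

Data collection, model construction, and training are left out of the sketch because they run before analysis rather than once per record.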
The computing device includes:
one or more processors, one or more memories, and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods described above.
The readable storage medium has stored thereon a computer program which, when executed by a processor, implements a method as described above.
Compared with the prior art, the text intelligent analysis method and system based on natural language processing have the following characteristics:
(1) By automatic key information extraction, the time and workload of manually processing the text can be greatly reduced, and the processing efficiency is improved.
(2) The text content can be more accurately understood based on the natural language processing technology, so that the key information is more accurately extracted, the accuracy is improved, and the risk of human errors is reduced.
(3) The method can be applied to text data in various types and fields, is not limited by a specific format or structure, expands the application range, and has strong universality and expandability.
(4) For users who need to process a large amount of texts, the method can provide more convenient and efficient service, and improves user experience and satisfaction.
The text intelligent analysis method and system based on natural language processing of the embodiments of the invention have been described in detail above. The specific examples in this section are provided to explain the principles and embodiments of the invention and to aid understanding of its core concepts. All other embodiments obtained by a person skilled in the art without departing from the principles of the invention fall within the scope of the invention.

Claims (9)

1. A text intelligent analysis method based on natural language processing, characterized in that the method comprises the following steps:
Step S1, data collection
A website is selected as required, and crawler technology is used to crawl the files on that website and obtain the corresponding data;
Step S2, data cleaning
After crawling is complete, the crawled data is cleaned: text noise is removed and the data format is standardized, so as to improve the accuracy and quality of the crawled data;
Step S3, data filling
After data cleaning is complete, the text is filled with real data so as to improve the accuracy of entity recognition;
Step S4, constructing a named entity recognition model
With Bidirectional Encoder Representations from Transformers (BERT) as the foundation, a named entity recognition model is constructed by combining a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), so as to improve the accuracy of entity recognition;
Step S5, model training
Effective features are extracted from the word sequences, the featurized text fragments are mapped onto a predefined set of classes, and the named entity recognition model is trained so that it learns the correspondence between text features and class labels;
Step S6, named entity recognition
The best model obtained from training is loaded, named entity recognition is performed on new input text, and each entity and its category label are output.
2. The text intelligent analysis method based on natural language processing according to claim 1, wherein: in step S2, data cleaning first removes text noise, then standardizes the data format and converts the obtained data to text so that each piece of data occupies its own line; finally, the content is corrected, and repeated and erroneous content is removed from the text.
3. The text intelligent analysis method based on natural language processing according to claim 1, wherein: in step S3, the text to be input into the named entity recognition model is preprocessed, including word segmentation and stop word removal, to convert the text into a word sequence.
4. The text intelligent analysis method based on natural language processing according to claim 1, wherein: in step S4, BERT serves as the pre-trained language model and is responsible for extracting deep text features from the original text; the BiLSTM is responsible for further capturing information in the text sequence; the CRF is responsible for constraining the output tag sequence so that entity tags appear in a valid order and the output of the model is optimized.
5. The text intelligent analysis method based on natural language processing according to claim 4, wherein: in step S4, the BiLSTM serves as the encoding layer of the named entity recognition model and is responsible for capturing the context of the text in both directions, so that the model takes the preceding and following text fully into account when generating the output tag sequence, which improves prediction accuracy;
The CRF is responsible for determining the dependencies between tags in the tag sequence, thereby optimizing the named entity recognition process.
6. A system for implementing the text intelligent analysis method based on natural language processing according to any one of claims 1 to 5, characterized by comprising:
The data collection module, which is responsible for selecting a website as required and crawling the files on that website with crawler technology to obtain the corresponding data;
The data cleaning module, which is responsible for cleaning the crawled data once crawling is complete, removing text noise and standardizing the data format, so as to improve the accuracy and quality of the crawled data;
The data filling module, which is responsible for filling the text with real data after data cleaning is complete, so as to improve the accuracy of entity recognition;
The named entity recognition model construction module, which is responsible for constructing a named entity recognition model with Bidirectional Encoder Representations from Transformers (BERT) as the foundation, combined with a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), so as to improve the accuracy of entity recognition;
The model training module, which is responsible for extracting effective features from the word sequences, mapping the featurized text fragments onto a predefined set of classes, and training the named entity recognition model so that it learns the correspondence between text features and class labels;
The named entity recognition module, which is responsible for loading the best model obtained from training, performing named entity recognition on new input text, and outputting each entity and its category label.
7. The system according to claim 6, wherein: the system further comprises a preprocessing module, which is responsible for preprocessing the text to be input into the named entity recognition model, including word segmentation and stop word removal, and for converting the text into a word sequence.
8. A computer readable storage medium storing one or more programs, characterized in that: the one or more programs include instructions which, when executed by a computing device, cause the computing device to perform any of the methods of claims 1-5.
9. A computing device, characterized by comprising:
one or more processors, one or more memories, and one or more programs, wherein the one or more programs are stored in the one or more memories and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-5.
CN202411441166.7A 2024-10-16 2024-10-16 A text intelligent analysis method and system based on natural language processing Pending CN118966194A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411441166.7A CN118966194A (en) 2024-10-16 2024-10-16 A text intelligent analysis method and system based on natural language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411441166.7A CN118966194A (en) 2024-10-16 2024-10-16 A text intelligent analysis method and system based on natural language processing

Publications (1)

Publication Number Publication Date
CN118966194A true CN118966194A (en) 2024-11-15

Family

ID=93401786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411441166.7A Pending CN118966194A (en) 2024-10-16 2024-10-16 A text intelligent analysis method and system based on natural language processing

Country Status (1)

Country Link
CN (1) CN118966194A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111858944A (en) * 2020-07-31 2020-10-30 电子科技大学 A Entity Aspect-Level Sentiment Analysis Method Based on Attention Mechanism
CN112597373A (en) * 2020-12-29 2021-04-02 科技谷(厦门)信息技术有限公司 Data acquisition method based on distributed crawler engine
WO2021072852A1 (en) * 2019-10-16 2021-04-22 平安科技(深圳)有限公司 Sequence labeling method and system, and computer device
CN112749562A (en) * 2020-12-31 2021-05-04 合肥工业大学 Named entity identification method, device, storage medium and electronic equipment
CN113128227A (en) * 2020-01-14 2021-07-16 普天信息技术有限公司 Entity extraction method and device
CN114386422A (en) * 2022-01-14 2022-04-22 淮安市创新创业科技服务中心 Intelligent auxiliary decision-making method and device based on enterprise pollution public opinion extraction
CN114626346A (en) * 2022-01-21 2022-06-14 锦创科技股份有限公司 A method of NLP analysis, identification and data cleaning based on artificial intelligence

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
中国电子学会编著: "《大数据学科与技术路线图》", 30 November 2021, 中国科学技术出版社, pages: 125 - 126 *

Similar Documents

Publication Publication Date Title
CN111027327B (en) Machine reading understanding method, device, storage medium and device
CN108334489B (en) Text core word recognition method and device
CN113946677A (en) Event identification and classification method based on bidirectional cyclic neural network and attention mechanism
CN110188158B (en) Keyword and topic label generation method, device, medium and electronic equipment
CN110390049B (en) Automatic answer generation method for software development questions
CN113806548A (en) Petition factor extraction method and system based on deep learning model
CN113836896A (en) Patent text abstract generation method and device based on deep learning
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
CN119577148A (en) A text classification method, device, computer equipment and storage medium
CN117520561A (en) Entity relationship extraction method and system for constructing knowledge graph in helicopter assembly field
CN116955534A (en) Intelligent complaint work order processing method, intelligent complaint work order processing device, intelligent complaint work order processing equipment and storage medium
CN114356990A (en) Base named entity recognition system and method based on transfer learning
CN114357160A (en) Early rumor detection method and device based on generation propagation structure characteristics
CN113762589A (en) A system and method for predicting changes in power transmission and transformation projects
CN118966194A (en) A text intelligent analysis method and system based on natural language processing
CN120372618A (en) Cross-modal feature-based vulnerability positioning method and system
CN115658956B (en) Hot topic mining method and system based on conference audio data
CN117390156A (en) Cross-modal question and answer dialogue methods, systems, devices and storage media
CN112732908B (en) Test question novelty evaluation method and device, electronic equipment and storage medium
CN109885827B (en) Deep learning-based named entity identification method and system
CN118132733B (en) Test question retrieval method, system, storage medium and electronic equipment
CN113673236A (en) Model training method, table recognition method, device, electronic equipment and storage medium
US12511478B2 (en) Artificial intelligence based log mask prediction for communications system testing
CN116738289B (en) Text emotion classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20241115