CN111160024B - Chinese word segmentation method, system, device and storage medium based on statistics - Google Patents
Chinese word segmentation method, system, device and storage medium based on statistics
- Publication number: CN111160024B (application number CN201911392455.1A)
- Authority: CN (China)
- Prior art keywords: word segmentation, word, probability, target text, combining
- Prior art date
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Machine Translation (AREA)
- Document Processing Apparatus (AREA)
Abstract
Description
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911392455.1A CN111160024B (en) | 2019-12-30 | 2019-12-30 | Chinese word segmentation method, system, device and storage medium based on statistics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111160024A (en) | 2020-05-15 |
CN111160024B (en) | 2023-08-15 |
Family
ID=70558931
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911392455.1A Active CN111160024B (en) | 2019-12-30 | 2019-12-30 | Chinese word segmentation method, system, device and storage medium based on statistics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111160024B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113779062B (en) * | 2021-02-23 | 2025-02-21 | 北京沃东天骏信息技术有限公司 | SQL statement generation method, device, storage medium and electronic device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4819271A (en) * | 1985-05-29 | 1989-04-04 | International Business Machines Corporation | Constructing Markov model word baseforms from multiple utterances by concatenating model sequences for word segments |
CN101819772A (en) * | 2010-02-09 | 2010-09-01 | 中国船舶重工集团公司第七○九研究所 | Phonetic segmentation-based isolate word recognition method |
CN104408034A (en) * | 2014-11-28 | 2015-03-11 | 武汉数为科技有限公司 | Text big data-oriented Chinese word segmentation method |
CN105718586A (en) * | 2016-01-26 | 2016-06-29 | 中国人民解放军国防科学技术大学 | Word division method and device |
CN105975454A (en) * | 2016-04-21 | 2016-09-28 | 广州精点计算机科技有限公司 | Chinese word segmentation method and device of webpage text |
CN106844350A (en) * | 2017-02-15 | 2017-06-13 | 广州索答信息科技有限公司 | A kind of computational methods of short text semantic similarity |
CN108170680A (en) * | 2017-12-29 | 2018-06-15 | 厦门市美亚柏科信息股份有限公司 | Keyword recognition method, terminal device and storage medium based on Hidden Markov Model |
CN108984159A (en) * | 2018-06-15 | 2018-12-11 | 浙江网新恒天软件有限公司 | A kind of breviary phrase extended method based on markov language model |
CN109033085A (en) * | 2018-08-02 | 2018-12-18 | 北京神州泰岳软件股份有限公司 | The segmenting method of Chinese automatic word-cut and Chinese text |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090326916A1 (en) * | 2008-06-27 | 2009-12-31 | Microsoft Corporation | Unsupervised chinese word segmentation for statistical machine translation |
Non-Patent Citations (1)
Title |
---|
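Peng Yu (彭瑜). "Design and Implementation of a Grammar-Based Word Segmentation System". China Master's Theses Full-text Database, Information Science and Technology Series. 2014, I138-906. * |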
Also Published As
Publication number | Publication date |
---|---|
CN111160024A (en) | 2020-05-15 |
Similar Documents
Publication | Title |
---|---|
CN109857845B (en) | Model training and data retrieval method, device, terminal and computer-readable storage medium |
US20210193121A1 (en) | Speech recognition method, apparatus, and device, and storage medium |
US10073673B2 (en) | Method and system for robust tagging of named entities in the presence of source or translation errors |
Xu et al. | Minimum bayes risk decoding and system combination based on a recursion for edit distance |
EP4131255A1 (en) | Method and apparatus for decoding voice data, computer device and storage medium |
US20180276525A1 (en) | Method and neural network system for human-computer interaction, and user equipment |
KR101543992B1 (en) | Intra-language statistical machine translation |
CN108287820B (en) | Text representation generation method and device |
CN112100354B (en) | Man-machine conversation method, device, equipment and storage medium |
CN111859964B (en) | Method and device for identifying named entities in sentences |
CN111739514B (en) | Voice recognition method, device, equipment and medium |
CN106503231B (en) | Search method and device based on artificial intelligence |
CN110619043A (en) | Automatic text abstract generation method based on dynamic word vector |
US8356065B2 (en) | Similar text search method, similar text search system, and similar text search program |
CN102479191A (en) | Method and device for providing multi-granularity word segmentation result |
JPH0689302A (en) | Dictionary memory |
US11423237B2 (en) | Sequence transduction neural networks |
CN111666374A (en) | Method for integrating additional knowledge information into deep language model |
CN117648933A (en) | Natural language ambiguity resolution method and system based on deep learning and knowledge base |
CN111160024B (en) | Chinese word segmentation method, system, device and storage medium based on statistics |
Fusayasu et al. | Word-Error Correction of Continuous Speech Recognition Based on Normalized Relevance Distance. |
CN112447172B (en) | Quality improvement method and device for voice recognition text |
CN112766002B (en) | Text alignment method and system based on dynamic programming |
Besacier et al. | Word confidence estimation for speech translation |
CN113111651A (en) | Chinese word segmentation method and device and search word bank reading method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: 510663 research institute office building, No.9, Kelin Road, Science City, Guangzhou high tech Industrial Development Zone, Guangzhou City, Guangdong Province; Patentee after: GRG BANKING IT Co.,Ltd.; Country or region after: China; Patentee after: Guangdian Yuntong Group Co.,Ltd.; Address before: 510663 research institute office building, No.9, Kelin Road, Science City, Guangzhou high tech Industrial Development Zone, Guangzhou City, Guangdong Province; Patentee before: GRG BANKING IT Co.,Ltd.; Country or region before: China; Patentee before: GRG BANKING EQUIPMENT Co.,Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 20240618; Address after: Room 701, No. 11, Kelin Road, Science City, Huangpu District, Guangzhou City, Guangdong Province, 510663; Patentee after: GRG BANKING IT Co.,Ltd.; Country or region after: China; Address before: 510663 research institute office building, No.9, Kelin Road, Science City, Guangzhou high tech Industrial Development Zone, Guangzhou City, Guangdong Province; Patentee before: GRG BANKING IT Co.,Ltd.; Country or region before: China; Patentee before: Guangdian Yuntong Group Co.,Ltd. |