CN105512110B - Method for constructing a knowledge base of erroneous characters and words based on fuzzy matching and statistics - Google Patents
- Publication number
- CN105512110B (application number CN201510934356.7A)
- Authority
- CN
- China
- Prior art keywords
- word
- merged
- string
- adjacent
- word string
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510934356.7A CN105512110B (zh) | 2015-12-15 | 2015-12-15 | Method for constructing a knowledge base of erroneous characters and words based on fuzzy matching and statistics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510934356.7A CN105512110B (zh) | 2015-12-15 | 2015-12-15 | Method for constructing a knowledge base of erroneous characters and words based on fuzzy matching and statistics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105512110A (zh) | 2016-04-20 |
CN105512110B (zh) | 2018-04-06 |
Family
ID=55720103
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510934356.7A Active CN105512110B (zh) | 2015-12-15 | 2015-12-15 | Method for constructing a knowledge base of erroneous characters and words based on fuzzy matching and statistics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105512110B (zh) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
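CN106528532B (zh) * | 2016-11-07 | 2019-03-12 | 上海智臻智能网络科技股份有限公司 | Text error correction method, device, and terminal |
JP7027696B2 (ja) * | 2017-04-25 | 2022-03-02 | 富士フイルムビジネスイノベーション株式会社 | Information processing device and information processing program |
CN107180084B (zh) * | 2017-05-05 | 2020-04-21 | 上海木木聚枞机器人科技有限公司 | Lexicon update method and device |
CN108280051B (zh) * | 2018-01-22 | 2019-04-05 | 清华大学 | Method, device, and equipment for detecting erroneous characters in text data |
CN108564086B (zh) * | 2018-03-17 | 2024-05-10 | 上海柯渡医学科技股份有限公司 | Character string recognition and verification method and device |
CN108984515B (zh) * | 2018-05-22 | 2022-09-06 | 广州视源电子科技股份有限公司 | Typo detection method and device, computer-readable storage medium, and terminal equipment |
CN108717412A (zh) * | 2018-06-12 | 2018-10-30 | 北京览群智数据科技有限责任公司 | Chinese proofreading and error correction method and system based on Chinese word segmentation |
CN110807321B (zh) * | 2018-07-20 | 2024-11-12 | 北京搜狗科技发展有限公司 | Word formation method and device, electronic equipment, and readable storage medium |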
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
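JP2007073054A (ja) * | 2005-09-08 | 2007-03-22 | Fujitsu Ltd | Bilingual phrase presentation program, bilingual phrase presentation method, and bilingual phrase presentation device |
CN101639826A (zh) * | 2009-09-01 | 2010-02-03 | 西北大学 | Text hiding method based on Chinese sentence pattern template transformation |
CN101655982A (zh) * | 2009-09-04 | 2010-02-24 | 上海交通大学 | Image registration method based on improved Harris corners |
CN101950306A (zh) * | 2010-09-29 | 2011-01-19 | 北京新媒传信科技有限公司 | String filtering method in new word discovery |
CN104915264A (zh) * | 2015-05-29 | 2015-09-16 | 北京搜狗科技发展有限公司 | Input error correction method and device |
CN104991889A (zh) * | 2015-06-26 | 2015-10-21 | 江苏科技大学 | Automatic proofreading method for non-multi-character word errors based on fuzzy word segmentation |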
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1652107A (zh) * | 1998-06-04 | 2005-08-10 | 松下电器产业株式会社 | Language conversion rule generating device, language conversion device, and program recording medium |
2015
- 2015-12-15: CN application CN201510934356.7A filed; patent CN105512110B; status Active
Non-Patent Citations (5)
Title |
---|
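Metrics for Measuring Domain Independence of Semantic Classes; Andrew Pargellis et al.; 《Proc. of European Speech Processing》; 20010930; pp. 1-4 * |
Research on automatic proofreading techniques for Chinese text; 骆卫华 et al.; 《计算机研究与发展》; 20040131; Vol. 41 (No. 1); pp. 244-249 * |
Detecting errors in Chinese text using a trigram model and dependency parsing; 马金山 et al.; 《情报学报》; 20041231; Vol. 23 (No. 6); p. 724, right column, para. 6, and p. 725, left column, para. 2 * |
Research on methods for constructing seed confusion sets of Chinese characters; 施恒利 et al.; 《计算机科学》; 20140831; Vol. 41 (No. 8); pp. 229-233 * |
Automatic text error detection method for domain question-answering systems; 刘亮亮 et al.; 《中文信息学报》; 20130531; Vol. 27 (No. 3); p. 79, Section 4.2, para. 1; p. 80, Table 1, Section 4.3, para. 1, Section 4.4, para. 1, and Section 4.5; p. 83, left column, para. 1 * |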
Also Published As
Publication number | Publication date |
---|---|
CN105512110A (zh) | 2016-04-20 |
Similar Documents
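Publication | Title | Publication Date |
---|---|---|
CN105512110B (zh) | Method for constructing a knowledge base of erroneous characters and words based on fuzzy matching and statistics | |
CN109271626B (zh) | Text semantic analysis method | |
CN111310470B (zh) | Chinese named entity recognition method fusing character and word features | |
CN105279149A (zh) | Automatic correction method for Chinese text | |
US10417335B2 (en) | Automated quantitative assessment of text complexity | |
CN104133812B (zh) | Hierarchical method and device for computing Chinese sentence similarity oriented to user query intent | |
CN110427618A (zh) | Adversarial example generation method, medium, device, and computing equipment | |
CN100524293C (zh) | Method and system for obtaining word-pair translations from bilingual sentence pairs | |
CN106528524A (zh) | Word segmentation method based on the MMseg algorithm and the pointwise mutual information algorithm | |
CN113177412A (zh) | BERT-based named entity recognition method, system, electronic equipment, and storage medium | |
CN101876975A (zh) | Method for recognizing Chinese place names | |
CN111626042A (zh) | Coreference resolution method and device | |
CN108536724A (zh) | Subject identification method for metro design specifications based on a two-level hash index | |
Attia et al. | GWU-HASP: Hybrid Arabic spelling and punctuation corrector | |
CN115017335A (zh) | Knowledge graph construction method and system | |
CN107229611A (zh) | Word segmentation method for historical classics based on word alignment | |
CN103942188B (zh) | Method and device for identifying the language of a corpus | |
US20230274088A1 (en) | Sentiment parsing method, electronic device, and storage medium | |
CN114912437B (zh) | Method, system, terminal, and medium for detecting and extracting kaomoji in bullet-screen comments | |
Khan et al. | Knowledge-based word tokenization system for Urdu | |
Naemi et al. | Informal-to-formal word conversion for Persian language using natural language processing techniques | |
CN115238694A (zh) | Text extraction method, device, equipment, and medium | |
CN115221335A (zh) | Method for constructing a knowledge graph | |
CN102110087A (zh) | Method and device for entity resolution in character data | |
CN114528824A (zh) | Text error correction method, device, electronic equipment, and storage medium |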
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventors after: Liu Liangliang, Liu Haibo, Wu Jiankang, Gu Dezhi, Zhang Zaiyue, Zhang Xiaoru; Inventors before: Liu Haibo, Liu Liangliang, Wu Jiankang, Gu Dezhi, Zhang Zaiyue, Zhang Xiaoru |
GR01 | Patent grant | |
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20160420; Assignee: JIANGSU KEDA HUIFENG SCIENCE AND TECHNOLOGY Co.,Ltd.; Assignor: JIANGSU University OF SCIENCE AND TECHNOLOGY; Contract record no.: X2020980007325; Denomination of invention: A method of building wrong word knowledge base based on fuzzy matching and statistics; Granted publication date: 20180406; License type: Common License; Record date: 20201029 |
EC01 | Cancellation of recordation of patent licensing contract | Assignee: JIANGSU KEDA HUIFENG SCIENCE AND TECHNOLOGY Co.,Ltd.; Assignor: JIANGSU University OF SCIENCE AND TECHNOLOGY; Contract record no.: X2020980007325; Date of cancellation: 20201223 |
TR01 | Transfer of patent right | Effective date of registration: 20221230; Address after: Room 02A-084, Building C (Second Floor), No. 28, Xinxi Road, Haidian District, Beijing 100085; Patentee after: Jingchuang United (Beijing) Intellectual Property Service Co.,Ltd.; Address before: 212003, No. 2, Mengxi Road, Zhenjiang, Jiangsu; Patentee before: JIANGSU University OF SCIENCE AND TECHNOLOGY |
TR01 | Transfer of patent right | Effective date of registration: 20221230; Address after: Room 606-609, Compound Office Complex Building, No. 757, Dongfeng East Road, Yuexiu District, Guangzhou, Guangdong Province, 510699; Patentee after: China Southern Power Grid Internet Service Co.,Ltd.; Address before: Room 02A-084, Building C (Second Floor), No. 28, Xinxi Road, Haidian District, Beijing 100085; Patentee before: Jingchuang United (Beijing) Intellectual Property Service Co.,Ltd. |