CN101308512A - Web-page-based method and device for extracting mutual translation pairs - Google Patents
Web-page-based method and device for extracting mutual translation pairs
- Publication number
- CN101308512A
- Authority
- CN
- China
- Prior art keywords
- text
- bilingual
- tuples
- unit
- extract
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Transfer Between Computers (AREA)
- Machine Translation (AREA)
Description
Merged bilingual tuple | Frequency |
---|---|
(“木马”,“trojan horse”) | 4 |
(“特洛伊木马”,“trojan horse”) | 4 |
(“叫做特洛伊木马”,“trojan horse”) | 1 |
(“全称叫做特洛伊木马”,“trojan horse”) | 1 |
(“的全称叫做特洛伊木马”,“trojan horse”) | 1 |
(“木马的全称叫做特洛伊木马”,“trojan horse”) | 1 |
(“全称特洛伊木马”,“trojan horse”) | 1 |
(“的特洛伊木马”,“trojan horse”) | 1 |
(“点的特洛伊木马”,“trojan horse”) | 1 |
(“好点的特洛伊木马”,“trojan horse”) | 1 |
(“比较好点的特洛伊木马”,“trojan horse”) | 1 |
(“个比较好点的特洛伊木马”,“trojan horse”) | 1 |
(“介绍个比较好点的特洛伊木马”,“trojan horse”) | 1 |
(“能介绍个比较好点的特洛伊木马”,“trojan horse”) | 1 |
(“谁能介绍个比较好点的特洛伊木马”,“trojan horse”) | 1 |
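The candidates in the table above appear to be every word-level suffix of the Chinese text that immediately precedes "trojan horse" (木马, literally "wooden horse", is the short form; 特洛伊木马 is "Trojan horse"). For example, 木马的全称叫做特洛伊木马 ("the full name of a trojan is Trojan horse") yields 木马, 特洛伊木马, 叫做特洛伊木马, and so on up to the whole phrase. The following is a minimal sketch of that merge-and-count step, not the patent's own procedure. It assumes the Chinese context is already word-segmented; the segmentations below are a guess inferred from which suffixes appear in the table, and the helper names `suffix_candidates` and `merge_tuples` are hypothetical. With only these three contexts, 木马 and 特洛伊木马 reach a count of 3 rather than the table's 4, which suggests the source data held at least one further occurrence not shown on this page.

```python
from collections import Counter


def suffix_candidates(words):
    """Return every word-level suffix of a segmented Chinese context.

    Hypothetical helper: assumes a candidate term must end at the word
    immediately before the English term. Shortest suffix comes first.
    """
    return ["".join(words[i:]) for i in range(len(words) - 1, -1, -1)]


def merge_tuples(contexts, english):
    """Merge identical (chinese_candidate, english) tuples and count them."""
    counts = Counter()
    for words in contexts:
        for cand in suffix_candidates(words):
            counts[(cand, english)] += 1
    return counts


# Assumed segmentations of the three contexts behind the table rows.
contexts = [
    ["木马", "的", "全称", "叫做", "特洛伊", "木马"],
    ["全称", "特洛伊", "木马"],
    ["谁", "能", "介绍", "个", "比较", "好", "点", "的", "特洛伊", "木马"],
]

counts = merge_tuples(contexts, "trojan horse")

# Print tuples by descending frequency, then by the Chinese candidate.
for (zh, en), freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0][0])):
    print(f"{zh}\t{en}\t{freq}")
```

The 15 distinct tuples this produces are the same 15 rows listed in the table. Only the two shortest candidates have different counts.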
Bilingual tuple | Score |
---|---|
(“木马”,“trojan horse”) | 4.39 |
(“特洛伊木马”,“trojan horse”) | 7.17 |
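Only the two candidates with frequency 4 in the merged table reappear in this scored table. Every frequency-1 candidate is gone. A simple minimum-frequency cutoff would produce that result. The following is a minimal sketch of such a filter. The cutoff value of 2 and the name `filter_by_frequency` are assumptions, since this page states neither the patent's filtering rule nor how the scores 4.39 and 7.17 are computed. For that reason the scoring function is not reconstructed here.

```python
def filter_by_frequency(freqs, min_freq=2):
    """Keep bilingual tuples seen at least min_freq times (hypothetical cutoff)."""
    return {pair: f for pair, f in freqs.items() if f >= min_freq}


# A subset of the merged frequencies from the first table.
merged = {
    ("木马", "trojan horse"): 4,
    ("特洛伊木马", "trojan horse"): 4,
    ("叫做特洛伊木马", "trojan horse"): 1,
    ("全称特洛伊木马", "trojan horse"): 1,
    ("谁能介绍个比较好点的特洛伊木马", "trojan horse"): 1,
}

kept = filter_by_frequency(merged)
print(sorted(zh for zh, _ in kept))  # ['木马', '特洛伊木马']
```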
Claims (19)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810126468XA (CN101308512B, zh) | 2008-06-25 | 2008-07-03 | Web-page-based method and device for extracting mutual translation pairs |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810125774.1 | 2008-06-25 | ||
CN200810125774 | 2008-06-25 | ||
CN200810126468XA (CN101308512B, zh) | 2008-06-25 | 2008-07-03 | Web-page-based method and device for extracting mutual translation pairs |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101308512A (zh) | 2008-11-19 |
CN101308512B (zh) | 2011-09-14 |
Family ID: 40124967
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200810126468XA (CN101308512B, zh), active | 2008-06-25 | 2008-07-03 | Web-page-based method and device for extracting mutual translation pairs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101308512B (zh) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
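CN102043808A (zh) * | 2009-10-14 | 2011-05-04 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for extracting bilingual entries using web page structure |
CN102550049A (zh) * | 2009-09-25 | 2012-07-04 | Yahoo! Inc. | Obtaining out-of-vocabulary translations by dynamically learning extraction rules |
CN102902667A (zh) * | 2012-10-12 | 2013-01-30 | Zeng Liren | Method for displaying translation memory matching results |
CN103186645A (zh) * | 2011-12-31 | 2013-07-03 | Beijing Kingsoft Software Co., Ltd. | Network-based method and device for acquiring specific resources |
CN103970732A (zh) * | 2014-05-22 | 2014-08-06 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for mining translations of new words |
CN105653516A (zh) * | 2015-12-30 | 2016-06-08 | Wuhan Transn Information Technology Co., Ltd. | Method and device for aligning parallel corpora |
CN106055543A (zh) * | 2016-05-23 | 2016-10-26 | Nanjing University | Spark-based method for training large-scale phrase translation models |
CN109977424A (zh) * | 2017-12-27 | 2019-07-05 | Beijing Sogou Technology Development Co., Ltd. | Method and device for training a machine translation model |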
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
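CN102550049A (zh) * | 2009-09-25 | 2012-07-04 | Yahoo! Inc. | Obtaining out-of-vocabulary translations by dynamically learning extraction rules |
CN102550049B (zh) * | 2009-09-25 | 2016-05-25 | Yahoo! Inc. | Obtaining out-of-vocabulary translations by dynamically learning extraction rules |
CN102043808B (zh) * | 2009-10-14 | 2014-06-18 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for extracting bilingual entries using web page structure |
CN102043808A (zh) * | 2009-10-14 | 2011-05-04 | Tencent Technology (Shenzhen) Co., Ltd. | Method and device for extracting bilingual entries using web page structure |
CN103186645A (zh) * | 2011-12-31 | 2013-07-03 | Beijing Kingsoft Software Co., Ltd. | Network-based method and device for acquiring specific resources |
CN102902667A (zh) * | 2012-10-12 | 2013-01-30 | Zeng Liren | Method for displaying translation memory matching results |
CN103970732A (zh) * | 2014-05-22 | 2014-08-06 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for mining translations of new words |
CN103970732B (zh) * | 2014-05-22 | 2017-05-10 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Method and device for mining translations of new words |
CN105653516A (zh) * | 2015-12-30 | 2016-06-08 | Wuhan Transn Information Technology Co., Ltd. | Method and device for aligning parallel corpora |
CN105653516B (zh) * | 2015-12-30 | 2018-08-10 | IOL (Wuhan) Information Technology Co., Ltd. | Method and device for aligning parallel corpora |
CN106055543A (zh) * | 2016-05-23 | 2016-10-26 | Nanjing University | Spark-based method for training large-scale phrase translation models |
CN106055543B (zh) * | 2016-05-23 | 2019-04-09 | Nanjing University | Spark-based method for training large-scale phrase translation models |
CN109977424A (zh) * | 2017-12-27 | 2019-07-05 | Beijing Sogou Technology Development Co., Ltd. | Method and device for training a machine translation model |
CN109977424B (zh) * | 2017-12-27 | 2023-08-08 | Beijing Sogou Technology Development Co., Ltd. | Method and device for training a machine translation model |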
Also Published As
Publication number | Publication date |
---|---|
CN101308512B (zh) | 2011-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI636452B (zh) | | Speech recognition method and system |
CN107797991B (zh) | | Knowledge graph expansion method and system based on dependency syntax trees |
CN101308512B (zh) | | Web-page-based method and device for extracting mutual translation pairs |
Tran et al. | | JAIST: Combining multiple features for answer selection in community question answering |
US8612206B2 (en) | | Transliterating semitic languages including diacritics |
CN109635297B (zh) | | Entity disambiguation method, device, computer device, and computer storage medium |
Kothari et al. | | SMS based interface for FAQ retrieval |
JP5379138B2 (ja) | | Creating a domain dictionary |
KR102491172B1 (ko) | | Natural language question answering system and training method therefor |
JP5710581B2 (ja) | | Question answering device, method, and program |
CN101782898A (zh) | | Method for analyzing the sentiment orientation of sentiment words |
Alshalabi et al. | | Arabic light-based stemmer using new rules |
Jayan et al. | | A hybrid statistical approach for named entity recognition for malayalam language |
CN101187924A (zh) | | Method and system for obtaining word-pair translations from bilingual sentence pairs |
Gadri et al. | | Information retrieval: A new multilingual stemmer based on a statistical approach |
CN107797995A (zh) | | Method for generating Chinese-English fragment corpora |
Kilgarriff et al. | | Longest–commonest match |
Sahu et al. | | Twitter sentiment analysis--a more enhanced way of classification and scoring |
CN107894977A (zh) | | Vietnamese part-of-speech tagging method combining a POS disambiguation model for multi-category words with a dictionary |
CN106776590A (zh) | | Method and system for obtaining translations of entries |
Sabouri et al. | | naab: A ready-to-use plug-and-play corpus for Farsi |
Naemi et al. | | Informal-to-formal word conversion for persian language using natural language processing techniques |
Kaur et al. | | Toward normalizing Romanized Gurumukhi text from social media |
Gupta et al. | | Product review translation: Parallel corpus creation and robustness towards user-generated noisy text |
JP4088171B2 (ja) | | Text analysis device, method, program, and recording medium storing the program |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
ASS | Succession or assignment of patent right | Owner name: BEIJING KINGSOFT OFFICE SOFTWARE CO., LTD. Former owners: BEIJING JINSHAN SOFTWARE CO., LTD.; BEIJING JINSHAN DIGITAL ENTERTAINMENT SCIENCE AND TECHNOLOGY CO., LTD. Effective date: 20140312 |
C41 | Transfer of patent application or patent right or utility model | |
COR | Change of bibliographic data | Address corrected from 100083 Haidian, Beijing to 100085 Haidian, Beijing |
TR01 | Transfer of patent right | Effective date of registration: 20140312. Address after: Kingsoft Building, No. 33 Xiaoying Road, Haidian District, Beijing 100085. Patentee after: Beijing Kingsoft WPS Office Co., Ltd. Address before: No. 20, Bai Yan Building, No. 238 North Fourth Ring Road, Haidian District, Beijing 100083. Patentees before: Beijing Jinshan Software Co., Ltd.; Beijing Jinshan Digital Entertainment Science and Technology Co., Ltd. |
C56 | Change in the name or address of the patentee | |
CP01 | Change in the name or title of a patent holder | Address after: Kingsoft Building, No. 33 Xiaoying Road, Haidian District, Beijing 100085. Patentee after: Beijing Kingsoft Office Software, Inc. Address before: Kingsoft Building, No. 33 Xiaoying Road, Haidian District, Beijing 100085. Patentee before: Beijing Kingsoft WPS Office Co., Ltd. |