CN110705269A

CN110705269A - Automatic construction method of multi-source information fusion new word library

Info

Publication number: CN110705269A
Application number: CN201910764965.0A
Authority: CN
Inventors: 李吉平; 古万荣; 朱凯
Original assignee: South China Agricultural University
Current assignee: South China Agricultural University
Priority date: 2019-08-19
Filing date: 2019-08-19
Publication date: 2020-01-17
Anticipated expiration: 2039-08-19
Also published as: CN110705269B

Abstract

The invention discloses a method for automatically constructing a new word library with multi-source information fusion. New words are divided into reading new words and phonetic new words, and the new word library is divided into two multi-level libraries, one for short-term memory and one for long-term memory. The invention fuses information such as user operations, mouth shape, voice, and memory to recognize new words automatically and update the libraries dynamically, so that construction of the library is fully automated. Compared with earlier approaches in which users judge for themselves and then confirm manually, learning efficiency can be improved. Dividing new words into reading new words and phonetic new words, unlike earlier approaches limited to memorizing unknown or unfamiliar words, can promote reading ability and listening and speaking skills at the same time. Dividing the library into multi-level short-term and long-term memory libraries provides a means for studying personalized memory rules.

Description

An automatic construction method of new vocabulary database based on multi-source information fusion

Technical Field

The invention relates to the technical field of computer applications, and in particular to a method for automatically constructing a new word library with multi-source information fusion.

Background

Internationalization is an important feature of social development. In daily life and work, people increasingly need to communicate in foreign languages. Memorizing new words is an effective way to improve foreign language ability, and many software tools on the market assist with memorizing new words. The shortcomings of these tools are mainly the following:

(1) Users must first judge for themselves whether a word is a new word and then update the new word library by manual confirmation, which limits learning efficiency;

(2) "New words" are understood only as words that are unknown or unfamiliar. Memorizing such words helps reading ability but does not directly promote listening and speaking ability;

(3) New words are reviewed according to the group-level regularity of the Ebbinghaus forgetting curve, ignoring individual differences in forgetting speed.

Summary of the Invention

The object of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide a method for automatically constructing a new word library with multi-source information fusion. The method automatically recognizes reading new words from user operation information, automatically recognizes phonetic new words from information such as the user's mouth shape and voice, and automatically and dynamically updates the multi-level new word libraries according to the regularities of memory and forgetting.

The object of the present invention is achieved through the following technical solution:

A method for automatically constructing a new word library with multi-source information fusion, in which new words comprise two types, reading new words and phonetic new words, and the new word library is divided into two multi-level libraries, one for short-term memory and one for long-term memory. The method includes the following steps:

S1, automatic recognition of new words;

S11, automatically recognizing reading new words according to user operation information;

The user operation information includes the operation reaction time and whether the operation is correct. If the user makes no operation within the specified time or makes a wrong operation, the word is recognized as a reading new word;

S12, automatically recognizing phonetic new words according to the user's mouth shape and voice information;

S2, automatic update of the new word library;

S21, the recognized reading new words and phonetic new words are automatically stored in the reading new word library and the phonetic new word library, respectively;

S22, automatic dynamic update of the multi-level new word libraries.

Preferably, S12 further includes the following steps:

S121, if the speech recognition device receives no user speech or receives incorrect user speech within the specified time, but the mouth shape recognition device recognizes that the user's mouth shape for the pronunciation is correct, the user is prompted to read the word again;

S122, if the speech recognition device receives no user speech or receives incorrect user speech within the specified time, and the mouth shape recognition device recognizes that the user's mouth shape for the pronunciation is also wrong, the word is recognized as a phonetic new word.

Preferably, the multi-level new word libraries in S22 are divided according to the general regularities of memory and forgetting and can serve as a basis for studying personalized memory rules. S22 further includes the following steps:

S221, a new word recognized for the first time is automatically stored in the library that must be reviewed within the shortest time;

S222, a new word that is correctly memorized within its memory period is automatically moved to the next-level library, which has a longer memory period;

S223, a new word that is not correctly memorized within its memory period is automatically moved to the previous-level library, which has a shorter memory period;

S224, a new word in the library with the longest memory period is deleted from the new word library once it is correctly memorized within that memory period.

Compared with the prior art, the present invention has the following beneficial effects:

(1) The invention fuses information such as user operations, mouth shape, voice, and memory to recognize new words automatically and update the new word library dynamically, fully automating the construction of the library. Compared with earlier approaches in which users judge for themselves and then confirm manually, learning efficiency can be improved;

(2) The invention divides new words into reading new words and phonetic new words. Compared with earlier approaches limited to memorizing unknown or unfamiliar words, it can promote reading ability and listening and speaking skills at the same time;

(3) The invention divides the new word library into two multi-level libraries, one for short-term memory and one for long-term memory, which provides a means for studying personalized memory rules.

Brief Description of the Drawings

Fig. 1 is a schematic flow chart of the present invention;

Fig. 2 is a schematic diagram of the multi-source information acquisition device of the present invention;

Fig. 3 is a schematic flow chart of the automatic recognition of new words of the present invention;

Fig. 4 is a schematic diagram of the interface for automatic recognition of reading new words of the present invention;

Fig. 5 is a schematic diagram of the interface for automatic recognition of phonetic new words of the present invention;

Fig. 6 is a schematic diagram of the structure of the new word library of the present invention;

Fig. 7 is a schematic flow chart of the automatic update of the new word library of the present invention;

Fig. 8 is a schematic flow chart of the automatic dynamic update of the multi-level new word libraries of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

The invention proposes a method for automatically constructing a new word library with multi-source information fusion, comprising two processes: automatic recognition of new words and automatic update of the new word library. The method targets two uses, document reading and spoken communication, and accordingly divides new words into reading new words and phonetic new words. According to the regularities of memory and forgetting, the new word library is divided into two multi-level libraries, one for short-term memory and one for long-term memory. The method automatically recognizes reading new words from user operation information, automatically recognizes phonetic new words from information such as the user's mouth shape and voice, and automatically and dynamically updates the multi-level libraries according to the regularities of memory and forgetting.

Specifically, as shown in Figs. 1 to 8, in the method for automatically constructing a new word library with multi-source information fusion, new words comprise two types, reading new words and phonetic new words, and the new word library is divided into two multi-level libraries, one for short-term memory and one for long-term memory. The method includes the following steps:

Step 1: automatic recognition of new words.

(1) Automatically recognize reading new words according to user operation information.

The user operation information includes the operation reaction time and whether the operation is correct. If the user makes no operation within the specified time or makes a wrong operation, the word is recognized as a reading new word.

(2) Automatically recognize phonetic new words according to the user's mouth shape and voice information.

If the speech recognition device receives no user speech or receives incorrect user speech within the specified time, but the mouth shape recognition device recognizes that the user's mouth shape for the pronunciation is correct, the user is prompted to read the word again. If the speech recognition device receives no user speech or receives incorrect user speech within the specified time, and the mouth shape recognition device recognizes that the user's mouth shape is also wrong, the word is recognized as a phonetic new word.

Step 2: automatic update of the new word library.

(1) The recognized reading new words and phonetic new words are automatically stored in the reading new word library and the phonetic new word library, respectively.

(2) Automatic dynamic update of the multi-level new word libraries.

The multi-level new word libraries are divided according to the general regularities of memory and forgetting and can serve as a basis for studying personalized memory rules, wherein:

A new word recognized for the first time is automatically stored in the library that must be reviewed within the shortest time. A new word that is correctly memorized within its memory period is automatically moved to the next-level library, which has a longer memory period. A new word that is not correctly memorized within its memory period is automatically moved to the previous-level library, which has a shorter memory period. A new word in the library with the longest memory period is deleted from the new word library once it is correctly memorized within that memory period.

As shown in Fig. 1, this embodiment proposes a method 100 for automatically constructing a new word library with multi-source information fusion, comprising two processes: automatic recognition of new words 300 and automatic update of the new word library 700.

The multi-source information includes user operations, mouth shape, voice, memory, and other information. As shown in Fig. 2, the multi-source information acquisition device 200 includes, but is not limited to, a camera 201, a touch screen 202, a speaker 203, a microphone 204, and the software and hardware systems used for computation and data storage. The camera 201 acquires mouth shape information, the touch screen 202 acquires user operation information, and the microphone 204 acquires voice information. Memorized information is gradually forgotten over time, and forgetting generally proceeds quickly at first and then more slowly. Setting up multi-level new word libraries with different memory periods makes it possible to record the user's personalized memory information, which provides an effective way to analyze personalized memory characteristics and derive personalized memory rules.

The invention targets two uses, document reading and spoken communication, and divides new words into two types: reading new words and phonetic new words. As shown in Fig. 3, the automatic recognition of new words 300 includes the steps of automatically recognizing reading new words according to user operation information 301, and automatically recognizing phonetic new words according to information such as the user's mouth shape and voice 302.

As shown in Fig. 4, the acquisition interface 400 for recognizing reading new words includes a word 401, option one 403, option two 404, option three 405, and option four 406. The relationship between word 401 and the four options 403, 404, 405, and 406 is that of a single-choice question: exactly one option is correct. When word 401 is in Chinese, the four options are in the foreign language; when word 401 is in the foreign language, the four options are in Chinese. Word 401 is drawn at random from the word bank and displayed on the screen, and the four corresponding options appear at the positions indicated by 403, 404, 405, and 406. Interface 400 may also include prompt information 402, which displays user operation information such as whether the selection was correct and the time taken; the prompt information can also be played through the speaker 203. When word 401 appears on the screen, if the user selects the correct option among 403, 404, 405, and 406 within the specified time, 402 shows that the selection was correct together with the time taken. If the user makes no selection within the specified time or selects a wrong option, 402 shows a selection error or timeout message, and word 401 is recognized as a reading new word.
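The single-choice item described above can be sketched in a few lines of Python. This is only an illustration: the `QuizItem` type, the `make_quiz_item` function, and the layout of `word_bank` (foreign word mapped to its Chinese meaning, with distinct meanings) are assumptions, and the source does not say how the three incorrect options are chosen, so they are simply sampled from the other entries.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QuizItem:
    prompt: str          # word 401 shown on screen
    options: List[str]   # options 403-406; exactly one is correct
    answer_index: int    # position of the correct option


def make_quiz_item(word_bank: Dict[str, str], rng: random.Random,
                   foreign_prompt: bool = True) -> QuizItem:
    """Draw one word at random and build a four-option single-choice item.

    word_bank maps a foreign word to its Chinese meaning (hypothetical layout;
    meanings are assumed to be distinct).
    """
    if len(word_bank) < 4:
        raise ValueError("need at least four entries to build four options")
    words = sorted(word_bank)
    foreign = rng.choice(words)
    # Assumption: distractors are sampled from the remaining entries.
    distractors = rng.sample([w for w in words if w != foreign], 3)
    if foreign_prompt:
        # Foreign prompt, Chinese options.
        prompt, correct = foreign, word_bank[foreign]
        options = [word_bank[w] for w in distractors] + [correct]
    else:
        # Chinese prompt, foreign options.
        prompt, correct = word_bank[foreign], foreign
        options = distractors + [correct]
    rng.shuffle(options)
    return QuizItem(prompt, options, options.index(correct))
```

Passing a seeded `random.Random` keeps the draw reproducible for testing; an interactive application would more likely use an unseeded generator.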

The present invention does not limit the specified time for the user's selection. It may be a fixed duration of no more than 10 seconds, or a duration calculated from the user's personalized memory characteristics.
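The recognition rule for reading new words (no selection in time, or a wrong selection) reduces to a small predicate. The sketch below assumes a hypothetical `is_reading_new_word` function and a 10-second default limit; the source leaves the limit open, so the default is only a placeholder.

```python
from typing import Optional

# Assumed default; the source only says the limit may be within 10 seconds
# or may be derived from the user's personalized memory characteristics.
DEFAULT_LIMIT_S = 10.0


def is_reading_new_word(chosen_index: Optional[int], answer_index: int,
                        reaction_time_s: float,
                        limit_s: float = DEFAULT_LIMIT_S) -> bool:
    """Return True when the word should be recorded as a reading new word.

    chosen_index is None when the user made no selection at all.
    """
    timed_out = chosen_index is None or reaction_time_s > limit_s
    wrong = chosen_index is not None and chosen_index != answer_index
    return timed_out or wrong
```

A per-user limit can be supplied through `limit_s` without changing the rule itself.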

Interface 400 may also include a game character 407, using the entertainment value of a game to offset the tedium of memorizing words. When the user makes three correct selections in a row, game character 407 starts dancing; otherwise, game character 407 stops dancing.
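One way to read the dancing rule is that the character dances while the current run of correct answers is at least three and stops as soon as the run is broken. A minimal sketch under that reading follows; the `DanceStreak` class is hypothetical.

```python
class DanceStreak:
    """Tracks consecutive correct answers for game character 407."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # three in a row, per the description
        self.streak = 0

    def record(self, correct: bool) -> bool:
        """Record one answer and return whether the character is dancing."""
        # A wrong answer or timeout resets the run to zero.
        self.streak = self.streak + 1 if correct else 0
        return self.streak >= self.threshold
```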

As shown in Fig. 5, the acquisition interface 500 for recognizing phonetic new words includes a word 401. Word 401 is drawn at random from the word bank and displayed on the screen. Interface 500 may also include prompt information 402, which displays information such as whether the pronunciation was correct, whether the word needs to be read aloud again, and the time taken; the prompt information can also be played through the speaker 203. If the microphone 204 receives the user's correct reading of the word within the specified time, 402 shows that the pronunciation was correct together with the time taken. If the microphone 204 receives no user speech or receives incorrect user speech within the specified time, but the camera 201 recognizes that the user's mouth shape for the pronunciation is correct, 402 prompts the user to read the word again and the timer restarts. If the microphone 204 receives no user speech or receives incorrect user speech within the specified time, and the camera 201 recognizes that the user's mouth shape is also wrong, 402 shows a pronunciation error or timeout message, and word 401 is recognized as a phonetic new word.

The present invention does not limit the specified time within which the microphone 204 must receive the user's correct reading of the word. It may be a fixed duration of no more than 10 seconds, or a duration calculated from the user's personalized memory characteristics.
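The three outcomes of a spoken attempt can be written as a small decision function. In the sketch below, `voice_ok` stands for "correct speech was received within the specified time" and `mouth_ok` for "the mouth shape was recognized as correct"; both are assumed to come from upstream recognizers that the source deliberately leaves unspecified. The names `SpeechOutcome`, `classify_speech_attempt`, and `recognize_phonetic_new_word` are hypothetical.

```python
from enum import Enum
from typing import Iterable, Tuple


class SpeechOutcome(Enum):
    CORRECT = "correct"      # correct speech received in time
    RETRY = "retry"          # speech missing or wrong, mouth shape correct
    NEW_WORD = "new_word"    # speech missing or wrong, mouth shape wrong


def classify_speech_attempt(voice_ok: bool, mouth_ok: bool) -> SpeechOutcome:
    """Classify one timed attempt from the two recognizer results."""
    if voice_ok:
        return SpeechOutcome.CORRECT
    if mouth_ok:
        return SpeechOutcome.RETRY
    return SpeechOutcome.NEW_WORD


def recognize_phonetic_new_word(
        attempts: Iterable[Tuple[bool, bool]]) -> SpeechOutcome:
    """Walk through attempts, one (voice_ok, mouth_ok) pair per timed window.

    A RETRY restarts the timer, so the next pair is examined.
    """
    for voice_ok, mouth_ok in attempts:
        outcome = classify_speech_attempt(voice_ok, mouth_ok)
        if outcome is not SpeechOutcome.RETRY:
            return outcome
    return SpeechOutcome.RETRY  # still waiting for another reading
```

The source sets no cap on the number of retries, so none is imposed here; a practical application might add one.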

Interface 500 may also include the game character 407, using the entertainment value of a game to offset the tedium of memorizing words. When the user pronounces three words correctly in a row, game character 407 starts dancing; otherwise, game character 407 stops dancing.

The present invention does not limit the speech and mouth shape recognition methods. Artificial intelligence methods may be used to judge, by comparing audio and mouth shapes, whether the pronunciation of the word and the mouth shape are correct.

As shown in Fig. 6, the new word library 600 includes a reading new word library 601 and a phonetic new word library 602. Memory retention varies over time, and there are two kinds of memory: short-term and long-term. Input information becomes short-term memory after it has been learned through the process of attention. Without timely review these memories are forgotten, whereas with timely review they become long-term memory and are retained in the brain for a long time. Therefore, the present invention subdivides each of 601 and 602 into a multi-level short-term memory library 603 and a multi-level long-term memory library 604.

The present invention does not limit the number of levels in the multi-level libraries. According to the general regularities of memory and forgetting, the short-term memory levels may include a 5-minute library, a 30-minute library, and a 12-hour library; the long-term memory levels may include a 1-day library, a 2-day library, a 4-day library, a 7-day library, and a 15-day library.
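The example levels above can be written down as an ordered configuration. The following sketch uses the eight example periods from the text; the names `SHORT_TERM_LEVELS`, `LONG_TERM_LEVELS`, `REVIEW_LEVELS`, `new_library`, and `is_short_term`, and the dictionary layout, are illustrative assumptions rather than a prescribed structure.

```python
from datetime import timedelta
from typing import Dict

# Example memory periods from the description; the number of levels is not fixed.
SHORT_TERM_LEVELS = [timedelta(minutes=5), timedelta(minutes=30),
                     timedelta(hours=12)]
LONG_TERM_LEVELS = [timedelta(days=d) for d in (1, 2, 4, 7, 15)]
REVIEW_LEVELS = SHORT_TERM_LEVELS + LONG_TERM_LEVELS


def new_library() -> Dict[int, dict]:
    """One library (reading 601 or phonetic 602): level index -> stored words."""
    return {level: {} for level in range(len(REVIEW_LEVELS))}


def is_short_term(level: int) -> bool:
    """True for levels in the short-term group 603, False for group 604."""
    return level < len(SHORT_TERM_LEVELS)


# Library 600 holds one multi-level library per type of new word.
library_600 = {"reading": new_library(), "phonetic": new_library()}
```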

As shown in Fig. 7, the automatic update of the new word library 700 includes the steps of automatically storing the recognized reading new words and phonetic new words in the reading new word library and the phonetic new word library, respectively 701, and the automatic dynamic update of the multi-level new word libraries 800.

As shown in Fig. 8, the automatic dynamic update of the multi-level new word libraries 800 includes the following steps: a new word recognized for the first time is automatically stored in the library that must be reviewed within the shortest time 801; a new word that is correctly memorized within its memory period is automatically moved to the next-level library with a longer memory period 802; a new word that is not correctly memorized within its memory period is automatically moved to the previous-level library with a shorter memory period 803; and a new word in the library with the longest memory period is deleted from the new word library once it is correctly memorized within that period 804. For example, a new word recognized for the first time is automatically stored in the 5-minute library at time t. If the user memorizes it correctly within t+5 minutes, the word moves from the 5-minute library to the 30-minute library. If the user then memorizes it correctly within t+30 minutes, the word moves from the 30-minute library to the 12-hour library; otherwise, the word moves back from the 30-minute library to the 5-minute library. In general, once a new word has been correctly memorized a total of 8 times within 15 days, passing through the 3 short-term memory levels and the 5 long-term memory levels, it is no longer a new word for the user and is finally deleted from the 15-day library.
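Steps 801 to 804 amount to a small state machine over the level index. The sketch below is one possible reading, not the patented implementation: the `Library` layout and the function names are assumptions, each memory period is measured from the moment the word entered its current level, and a correct recall that arrives after the period has elapsed is treated the same as a failed recall.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# Example memory periods from the description, shortest first.
PERIODS = [timedelta(minutes=5), timedelta(minutes=30), timedelta(hours=12),
           timedelta(days=1), timedelta(days=2), timedelta(days=4),
           timedelta(days=7), timedelta(days=15)]

# word -> (level index, time the word entered that level)
Library = Dict[str, Tuple[int, datetime]]


def add_new_word(lib: Library, word: str, now: datetime) -> None:
    """Step 801: a newly recognized word enters the shortest-period level."""
    lib[word] = (0, now)


def review(lib: Library, word: str, correct: bool,
           now: datetime) -> Optional[int]:
    """Apply steps 802-804; return the new level, or None if the word was deleted."""
    level, stored_at = lib[word]
    in_period = now - stored_at <= PERIODS[level]
    if correct and in_period:
        if level == len(PERIODS) - 1:
            del lib[word]                # step 804: no longer a new word
            return None
        new_level = level + 1            # step 802: longer memory period
    else:
        new_level = max(level - 1, 0)    # step 803: shorter memory period
    lib[word] = (new_level, now)
    return new_level
```

With these periods, eight consecutive in-period correct recalls carry a word from the 5-minute level through the 15-day level and then remove it, matching the count of 8 correct recalls given in the example.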

The present invention has further described the automatic recognition of new words and the automatic update of the new word library in the method; it does not cover how a user's personalized memory rules are obtained. However, dividing the new word library into multi-level short-term and long-term memory libraries can support research on users' personalized memory rules. For example, a new word recognized for the first time is automatically stored in the 5-minute library at time t. If user A does not recall it within t+5 minutes, the word remains in the 5-minute library; if user A recalls it within the [t+5, t+30] minute range but incorrectly, the word also remains in the 5-minute library. If user B does not recall it within t+5 minutes, the word remains in the 5-minute library; if user B recalls it correctly within the [t+5, t+30] minute range, the word moves directly from the 5-minute library to the 12-hour library. If this pattern occurs with sufficiently high probability, it can be concluded that user B's short-term memory is better than user A's. Then, when user A recalls a word correctly within t+5 minutes, the word moves from the 5-minute library to the 30-minute library, whereas when user B recalls a word correctly within t+5 minutes, the word can move directly from the 5-minute library to the 12-hour library. In this way, differences in personalized memory rules are ultimately reflected in the dynamic update of the multi-level libraries, which can further improve learning efficiency.
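The user A and user B example suggests a per-user promotion step: one level for most users, two levels for users whose late recalls are reliably correct. The source does not say how "sufficiently high probability" is measured, so the sketch below is purely illustrative. The functions `promotion_step` and `promote`, and the thresholds `min_samples` and `min_rate`, are assumptions.

```python
from typing import List


def promotion_step(late_recall_outcomes: List[bool], min_samples: int = 20,
                   min_rate: float = 0.8) -> int:
    """Return how many levels to advance on a correct in-period recall.

    late_recall_outcomes holds, for one user, whether each recall attempted
    in the [t+5, t+30] minute window was correct. min_samples and min_rate
    are illustrative thresholds, not values from the source.
    """
    if len(late_recall_outcomes) < min_samples:
        return 1  # not enough evidence yet; use the default single step
    rate = sum(late_recall_outcomes) / len(late_recall_outcomes)
    return 2 if rate >= min_rate else 1


def promote(level: int, step: int, top_level: int = 7) -> int:
    """Advance by the user's step without passing the longest-period level."""
    return min(level + step, top_level)
```

With level 0 as the 5-minute library and level 2 as the 12-hour library, a step of 2 reproduces user B's direct move from the 5-minute library to the 12-hour library.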

The invention fuses information such as user operations, mouth shape, voice, and memory to recognize new words automatically and update the new word library dynamically, fully automating the construction of the library; compared with earlier approaches in which users judge for themselves and then confirm manually, learning efficiency can be improved. Dividing new words into reading new words and phonetic new words, compared with earlier approaches limited to memorizing unknown or unfamiliar words, can promote reading ability and listening and speaking skills at the same time. Dividing the new word library into two multi-level libraries, one for short-term memory and one for long-term memory, provides a means for studying personalized memory rules.

The above are preferred embodiments of the present invention, but the embodiments of the present invention are not limited by the above. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims

1. A method for automatically constructing a new word library with multi-source information fusion, characterized in that new words comprise two types, reading new words and phonetic new words, and the new word library is divided into two multi-level libraries, one for short-term memory and one for long-term memory, the method including the following steps:

S1, automatic recognition of new words;

S11, automatically recognizing reading new words according to user operation information;

the user operation information includes the operation reaction time and whether the operation is correct; if the user makes no operation within the specified time or makes a wrong operation, the word is recognized as a reading new word;

S12, automatically recognizing phonetic new words according to the user's mouth shape and voice information;

S2, automatic update of the new word library;

S21, the recognized reading new words and phonetic new words are automatically stored in the reading new word library and the phonetic new word library, respectively;

S22, automatic dynamic update of the multi-level new word libraries.

2. The method for automatically constructing a new word library with multi-source information fusion according to claim 1, characterized in that S12 further includes the following steps:

S121, if the speech recognition device receives no user speech or receives incorrect user speech within the specified time, but the mouth shape recognition device recognizes that the user's mouth shape for the pronunciation is correct, the user is prompted to read the word again;

S122, if the speech recognition device receives no user speech or receives incorrect user speech within the specified time, and the mouth shape recognition device recognizes that the user's mouth shape for the pronunciation is also wrong, the word is recognized as a phonetic new word.

3. The method for automatically constructing a new word library with multi-source information fusion according to claim 1, characterized in that the multi-level new word libraries in S22 are divided according to the general regularities of memory and forgetting and can serve as a basis for studying personalized memory rules, and S22 further includes the following steps:

S221, a new word recognized for the first time is automatically stored in the library that must be reviewed within the shortest time;

S222, a new word that is correctly memorized within its memory period is automatically moved to the next-level library, which has a longer memory period;

S223, a new word that is not correctly memorized within its memory period is automatically moved to the previous-level library, which has a shorter memory period;

S224, a new word in the library with the longest memory period is deleted from the new word library once it is correctly memorized within that memory period.