WO2015096429A1 - Call voice recognition method and apparatus - Google Patents
- Publication number
- WO2015096429A1 (PCT/CN2014/080661)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- call
- model library
- sample
- sound model
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
Definitions
- The present invention relates to the field of mobile applications, and in particular to a call voice recognition method and apparatus.
- BACKGROUND OF THE INVENTION Communication technology has developed greatly. As the communications industry grows rapidly, criminal activity that uses these means of communication for fraud is becoming increasingly rampant, and telephone fraud is one form of it. Telephone fraud means using the telephone to carry out fraud. One important technique is for the criminal to impersonate an acquaintance of the victim and call the victim. In many cases the victim cannot immediately tell the caller's identity from the voice, or, out of concern for saving face, does not question the caller's identity in time, which may lead to fraud.
- A call voice recognition method, including: acquiring a sound sample of the call object taking part in a call; comparing the sound sample with the sounds in a sound model library; and recognizing the call voice according to the comparison result.
- The method further includes: sampling and saving the voices of the contacts in the address book of the mobile terminal to build a sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal.
- Sampling and saving the voices of the contacts in the address book of the mobile terminal includes: performing sound feature extraction on the sampled sound, converting it into a digital vector, and saving the digital vector.
- Comparing the sound sample with the sounds in the sound model library includes: acquiring the counterpart number of the call; looking up a sound in the sound model library according to the counterpart number; and comparing the sound sample with the sound found.
- If the lookup by counterpart number fails, the method further includes: comparing the sound sample with all the sounds in the sound model library. Recognizing the call voice according to the comparison result includes: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to that sound model in the library; when the similarity is less than the threshold, determining that the call object is a stranger. The method further includes: notifying the mobile terminal of the recognition result of the call object.
- A call voice recognition apparatus, including: an acquisition module, configured to acquire a sound sample of the call object taking part in a call on the mobile terminal; a comparison module, configured to compare the sound sample with the sounds in a sound model library; and a recognition module, configured to recognize the call voice according to the comparison result.
- The apparatus further includes: a saving module, configured to sample and save the voices of the contacts in the address book of the mobile terminal to build a sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal.
- The saving module includes: an extracting unit, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit, configured to save the digital vector.
- The comparison module includes: an acquiring unit, configured to acquire the counterpart number of the call; and a comparing unit, configured to look up a sound in the sound model library according to the counterpart number and compare the sound sample with the sound found.
- The comparison module is further configured to compare the sound sample with all the sounds in the sound model library if the lookup by counterpart number fails.
- The comparison module and the recognition module are located in the mobile terminal or in a server on the network side.
- The recognition module is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the library is greater than or equal to a threshold, and to determine that the call object is a stranger when the similarity is less than the threshold.
- The apparatus further includes: a notification module, configured to notify the mobile terminal of the recognition result of the call object. With the present invention, a sound sample of the call object is acquired, the sample is compared with the sounds in the sound model library, and the call voice is recognized according to the comparison result. This solves the problem in the related art that a terminal cannot identify the peer caller from the call voice, which easily leads to fraud; the terminal can identify the peer caller by voice, which improves security.
- FIG. 1 is a flowchart of a call voice recognition method according to an embodiment of the present invention
- FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention
- FIG. 3 is optional block diagram 1 of a call voice recognition apparatus according to an embodiment of the present invention
- FIG. 4 is optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention
- FIG. 5 is optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention
- FIG. 6 is optional block diagram 4 of a call voice recognition apparatus according to an embodiment of the present invention
- FIG. 7 is a module composition diagram of a call voice recognition system according to an embodiment of the present invention
- FIG. 8 is a flowchart of the call voice recognition function according to an embodiment of the present invention
- Step S102: acquire a sound sample of the call object taking part in the call
- Step S104: compare the sound sample with the sounds in the sound model library
- Step S106: recognize the call voice according to the comparison result
- Through these steps, the acquired sound sample of the call object is compared with sounds stored in advance in the sound model library, and the call voice is recognized according to the comparison result.
- In the prior art, the terminal cannot identify the peer caller from the call voice. With the steps above, the voice at the far end of the call can be recognized and the identity of the peer caller determined, which helps the mobile terminal user judge whether the peer is a stranger. Preferably, the user can decide, based on the result, whether to continue the call or adjust its content, and can also choose to alert the police, which effectively reduces mobile phone fraud and improves security.
- The sound model library may be built in advance, before the sound sample is compared with the sounds in the library. The library can be built in several ways. This embodiment provides a preferred one, in which the sound model library is built by sampling and saving the voices of the contacts in the address book of the mobile terminal, and is stored in a remote server and/or in the mobile terminal.
- For example, sampling may consist of choosing to record each time a call from the contact is received, and obtaining a sound sample of the contact from the recording.
- For such a recording the user knows the contact's voice, so a more accurate sound sample can be obtained.
- The sound model library may be per user; for example, user A and user B each have their own sound model library.
- Alternatively, the sound model library may be shared by multiple users or a group of users. For example, all users of a company or a group share one sound model library, formed by pooling the sound samples that each user records.
- In addition, as a service that an operator can offer, the operator can treat the sound samples obtained from all users as one large sound model library, which can provide users with more comprehensive voice recognition.
- Sampling and saving a contact's voice can be implemented in several ways. This embodiment provides a preferred one: sound feature extraction is performed on the sampled sound, the result is converted into a digital vector, and the digital vector is saved. This completes the sampling and saving of the voices of the contacts in the address book of the mobile terminal.
- In another optional embodiment, there are many ways to determine the calling party. A relatively direct one is to acquire the counterpart number of the call, look up a sound in the sound model library according to that number, and compare the sound sample with the sound found.
- When the counterpart number is in the address book of the mobile terminal, and the sound model library was built by sampling and saving the voices of the contacts in that address book, the sound for the counterpart number is looked up directly in the library and the sound sample is compared with it. When the counterpart number is not in the address book, the library is checked for a sound corresponding to the number; if there is one, the sound sample is compared with it.
- If the lookup by counterpart number fails, the sound sample can be compared with all the sounds in the sound model library.
- Similarity can be used to recognize the sound.
- When the similarity between the sound sample and the sound found in the sound model library is greater than or equal to the threshold, the call object is identified as the user corresponding to that sound model in the library; when the similarity is less than the threshold, the call object is determined to be a stranger.
- The recognition result of the call object may also be notified to the mobile terminal.
- A call voice recognition apparatus is also provided to implement the foregoing method; what has already been described is not repeated here.
- The names of the modules in the apparatus should not be understood as limiting the modules. For example, the acquisition module, configured to acquire a sound sample of the call object taking part in a call, may also be described as "a module for acquiring a sound sample of the call object taking part in a call". The functions of the modules described below can be implemented by a processor.
- FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: an acquisition module 22, a comparison module 24, and a recognition module 26.
- The acquisition module 22 is configured to acquire a sound sample of the call object taking part in the call; the comparison module 24 is configured to compare the sound sample with the sounds in the sound model library; and the recognition module 26 is configured to recognize the call voice according to the comparison result.
- The comparison module 24 and the recognition module 26 may be located in the mobile terminal or in a server on the network side.
- FIG. 3 is optional block diagram 1 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes: a saving module 32, configured to sample and save the voices of the contacts in the address book of the mobile terminal to build a sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal.
- As shown in FIG. 4, the saving module 32 includes: an extracting unit 42, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit 44, configured to save the digital vector.
- FIG. 5 is optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 5, the comparison module 24 includes: an acquiring unit 52, configured to acquire the counterpart number of the call; and a comparing unit 54, configured to look up a sound in the sound model library according to the counterpart number and compare the sound sample with the sound found.
- The comparison module 24 is further configured to compare the sound sample with all the sounds in the sound model library if the lookup by counterpart number fails.
- The recognition module 26 is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the library is greater than or equal to the threshold, and to determine that the call object is a stranger when the similarity is less than the threshold.
- FIG. 6 is optional block diagram 4 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus further includes: a notification module 62, configured to notify the mobile terminal of the recognition result of the call object.
- The apparatus in this optional embodiment includes two subsystems: a front-end subsystem and a back-end subsystem.
- The front-end subsystem can include four modules: 1. user interface module; 2. sound sampling module; 3. sound feature extraction module; 4. communication interface module.
- The back-end subsystem includes five modules: 1. user configuration management module; 2. sound feature extraction module; 3. sound model creation module; 4. voice recognition module; 5. communication interface module.
- The voice recognition module implements the functions of the comparison module 24 and the recognition module 26 described above. These modules are described below.
- Sound sampling module: captures the voice of the other party during the call and hands it to the sound feature extraction module of the front-end subsystem.
- Sound feature extraction module: extracts features from the captured sound and converts them into a digital vector.
- Sound model creation module: builds a sound model from the feature-extracted digital vector.
- Voice recognition module: identifies the caller based on the voice.
- FIG. 7 is a module composition diagram of a call voice recognition system according to an embodiment of the present invention.
- The front-end subsystem includes: a user interface module, a sound sampling module, a sound feature extraction module, and a communication interface module.
- The back-end subsystem includes: a user configuration management module, a sound feature extraction module, a voice recognition module, a sound model creation module, and a communication interface module.
- The front-end subsystem can be deployed on the user's smartphone, and the back-end subsystem can be deployed either on the user's smartphone or on a back-end server. If the back-end subsystem is deployed on the smartphone, the two subsystems communicate through the internal communication mechanism of the phone's operating system. If the back-end subsystem is deployed on a back-end server, the two subsystems communicate over Wi-Fi or a 3G network.
- The back-end subsystem creates and stores the sound models of the contacts in the address book for the mobile phone user. The front-end subsystem samples the voice of the peer speaker during a call and uploads the sampled, feature-extracted sound sample to the back-end subsystem.
- FIG. 8 is a flowchart of the call voice recognition function according to an embodiment of the present invention. As shown in FIG. 8, the process includes the following steps:
- Step S801: The phone receives an incoming call.
- Step S802: The front-end subsystem matches the caller number against the phone's address book. If the caller number is already in the address book, go to S803; otherwise, go to S804.
- Step S803: The front-end subsystem checks whether the number already has a sound model in the sound model library. If it does, go to S804; otherwise, go to S807.
- Step S804: The front-end subsystem samples the voice of the peer caller during the call, and its sound feature extraction module performs feature extraction; then go to S805.
- Step S805: The front-end subsystem passes the sound features extracted in S804 to the voice recognition module of the back-end subsystem as input, and the voice recognition module identifies the peer caller according to the sound models in the sound model library; then go to S806.
- Step S806: The user interface module notifies the mobile phone user of the identity of the peer speaker.
- Step S807: The sound sampling module of the front-end subsystem uploads the sampled sound sample to the back-end subsystem through the communication interface module, and the sound feature extraction module of the back-end subsystem performs feature extraction on the sample; then go to S808.
- Step S808: The sound model creation module of the back-end subsystem builds a sound model from the feature-extracted sound sample and stores the model in the sound model library.
- Unlike the earlier approach of relying on human judgment, the method or apparatus of this optional embodiment identifies the voice in a mobile phone call automatically, which can effectively prevent the mobile phone user from being deceived in telephone fraud.
- The above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices.
- Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be made into an individual integrated circuit module; or several of the modules or steps may be made into a single integrated circuit module.
- The invention is not limited to any specific combination of hardware and software.
- The above is only an optional embodiment of the present invention and is not intended to limit it; various modifications and changes can be made to the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the spirit and scope of the present invention are intended to be included within its scope.
- The present invention relates to the field of mobile applications. It acquires a sound sample of the call object taking part in a call, compares the sound sample with the sounds in the sound model library, and recognizes the call voice according to the comparison result. This solves the problem in the related art that a terminal cannot identify the peer caller from the call voice, which easily leads to fraud, and enables the terminal to identify the peer caller by voice, thereby improving security.
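The branching of the FIG. 8 incoming-call flow listed above can be traced with a few lines of Python. This is only an illustrative sketch: the function name `call_flow` and its two boolean parameters are hypothetical, and the labels S801 and S802 for the first two steps are inferred from the step references S803 to S808 that appear in the text.

```python
def call_flow(in_address_book, has_model):
    """Return the sequence of FIG. 8 step labels visited for one incoming call."""
    # S801: incoming call received; S802: check the number against the address book.
    steps = ["S801", "S802"]
    if in_address_book:
        # S803: does this number already have a sound model in the library?
        steps.append("S803")
        if not has_model:
            # No model yet: upload the sample (S807) and build a model (S808).
            return steps + ["S807", "S808"]
    # Sample and extract features (S804), recognize (S805), notify the user (S806).
    return steps + ["S804", "S805", "S806"]


print(call_flow(True, False))   # ['S801', 'S802', 'S803', 'S807', 'S808']
print(call_flow(False, False))  # ['S801', 'S802', 'S804', 'S805', 'S806']
```

A known contact without a stored model takes the enrollment branch, while every other call goes through sampling, recognition, and notification.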
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephone Function (AREA)
- Telephonic Communication Services (AREA)
Abstract
Disclosed are a call voice recognition method and apparatus. The method comprises: obtaining a voice sample of the call target taking part in a call; comparing the voice sample with the voices in a voice model library; and recognizing the call voice based on the comparison result. The present invention resolves the problem in the related art that fraud occurs easily because a terminal cannot identify the peer caller through the call voice; it enables the terminal to identify the peer caller through the call voice, and improves security.
Description
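Call voice recognition method and apparatus

TECHNICAL FIELD The present invention relates to the field of mobile applications, and in particular to a call voice recognition method and apparatus.

BACKGROUND Communication technology has developed greatly. As the communications industry grows rapidly, criminal activity that uses these means of communication for fraud is becoming increasingly rampant, and telephone fraud is one form of it. Telephone fraud means using the telephone to carry out fraud. One important technique is for the criminal to impersonate an acquaintance of the victim and call the victim. In many cases the victim cannot immediately tell the identity of the peer caller from the voice, or, out of concern for saving face, does not question the caller's identity in time, which may lead to fraud. No reasonable solution has yet been proposed for the problem in the related art that a terminal cannot identify the peer caller from the call voice, which easily leads to fraud.

SUMMARY The present invention provides a call voice recognition method and apparatus, to at least solve the problem in the related art that a terminal cannot identify the peer caller from the call voice, which easily leads to fraud. According to one aspect of the present invention, a call voice recognition method is provided, including: acquiring a sound sample of the call object taking part in a call; comparing the sound sample with the sounds in a sound model library; and recognizing the call voice according to the comparison result. Before the sound sample is compared with the sounds in the sound model library, the method further includes: sampling and saving the voices of the contacts in the address book of a mobile terminal to build the sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal. Sampling and saving the voices of the contacts in the address book of the mobile terminal includes: performing sound feature extraction on the sampled sound, converting it into a digital vector, and saving the digital vector. Comparing the sound sample with the sounds in the sound model library includes: acquiring the counterpart number of the call; looking up a sound in the sound model library according to the counterpart number; and comparing the sound sample with the sound found.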
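If looking up a sound in the sound model library according to the counterpart number fails, the method further includes: comparing the sound sample with all the sounds in the sound model library. Recognizing the call voice according to the comparison result includes: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to that sound model in the library; when the similarity is less than the threshold, determining that the call object is a stranger. The method further includes: notifying the mobile terminal of the recognition result of the call object.

According to another aspect of the present invention, a call voice recognition apparatus is also provided, including: an acquisition module, configured to acquire a sound sample of the call object taking part in a call on the mobile terminal; a comparison module, configured to compare the sound sample with the sounds in a sound model library; and a recognition module, configured to recognize the call voice according to the comparison result. The apparatus further includes: a saving module, configured to sample and save the voices of the contacts in the address book of the mobile terminal to build a sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal. The saving module includes: an extracting unit, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit, configured to save the digital vector. The comparison module includes: an acquiring unit, configured to acquire the counterpart number of the call; and a comparing unit, configured to look up a sound in the sound model library according to the counterpart number and compare the sound sample with the sound found. The comparison module is further configured to compare the sound sample with all the sounds in the sound model library if the lookup by counterpart number fails. The comparison module and the recognition module are located in the mobile terminal or in a server on the network side. The recognition module is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the library is greater than or equal to a threshold, and to determine that the call object is a stranger when the similarity is less than the threshold. The apparatus further includes: a notification module, configured to notify the mobile terminal of the recognition result of the call object.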
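The threshold rule stated above (similarity greater than or equal to the threshold means a known contact, otherwise a stranger) can be sketched as follows. The patent does not say which similarity measure or threshold value is used, so the cosine similarity, the 0.8 threshold, and the names `cosine_similarity` and `identify` are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(sample, candidates, threshold=0.8):
    """Return the best-matching contact, or "stranger" if below the threshold."""
    best_name, best_score = None, -1.0
    for name, model in candidates.items():
        score = cosine_similarity(sample, model)
        if score > best_score:
            best_name, best_score = name, score
    # Similarity >= threshold: the call object is the matching user.
    if best_name is not None and best_score >= threshold:
        return best_name
    # Similarity < threshold: the call object is treated as a stranger.
    return "stranger"


# Hypothetical stored models and two incoming samples.
models = {"Xiao Ma": [1.0, 0.0, 0.2], "Contact B": [0.0, 1.0, 0.1]}
match = identify([0.9, 0.1, 0.2], models)
no_match = identify([0.5, 0.5, 0.9], models)
```

Here `match` resolves to the first contact because its similarity is well above 0.8, while `no_match` falls below the threshold for every stored model and is reported as a stranger.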
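With the present invention, a sound sample of the call object taking part in a call is acquired, the sound sample is compared with the sounds in the sound model library, and the call voice is recognized according to the comparison result. This solves the problem in the related art that a terminal cannot identify the peer caller from the call voice, which easily leads to fraud; the terminal can identify the peer caller by voice, which improves security.

BRIEF DESCRIPTION OF THE DRAWINGS The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their descriptions explain the invention and do not unduly limit it. In the drawings: FIG. 1 is a flowchart of a call voice recognition method according to an embodiment of the present invention; FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention; FIG. 3 is optional block diagram 1 of the call voice recognition apparatus; FIG. 4 is optional block diagram 2 of the call voice recognition apparatus; FIG. 5 is optional block diagram 3 of the call voice recognition apparatus; FIG. 6 is optional block diagram 4 of the call voice recognition apparatus; FIG. 7 is a module composition diagram of a call voice recognition system according to an embodiment of the present invention; FIG. 8 is a flowchart of the call voice recognition function according to an embodiment of the present invention.

DETAILED DESCRIPTION It should be noted that, where no conflict arises, the embodiments in this application and the features in them may be combined with one another. The invention is described in detail below with reference to the drawings and in conjunction with the embodiments. This embodiment provides a call voice recognition method. FIG. 1 is a flowchart of the call voice recognition method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps: Step S102, acquire a sound sample of the call object taking part in the call; Step S104, compare the sound sample with the sounds in the sound model library; Step S106, recognize the call voice according to the comparison result.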
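Through the above steps, the acquired sound sample of the call object is compared with sounds stored in advance in the sound model library, and the call voice is recognized according to the comparison result. In the prior art, the terminal cannot identify the peer caller from the call voice; with the above steps, the voice at the far end of the call can be recognized and the identity of the peer caller determined, which helps the mobile terminal user judge whether the peer is a stranger. Preferably, the user can decide, based on the result, whether to continue the call or adjust its content, and can also choose to alert the police, which effectively reduces mobile phone fraud and improves security.

In an optional embodiment, the sound model library may be built in advance, before the sound sample is compared with the sounds in the library. The library can be built in several ways. This embodiment provides a preferred one, in which the sound model library is built by sampling and saving the voices of the contacts in the address book of the mobile terminal, and is stored in a remote server and/or in the mobile terminal. For example, sampling may consist of choosing to record each time a call from the contact is received, and obtaining a sound sample of the contact from the recording. For such a recording the user knows the contact's voice, so a more accurate sound sample can be obtained. The sound model library may be per user; for example, user A and user B each have their own sound model library. Alternatively, the sound model library may be shared by multiple users or a group of users; for example, all users of a company or a group share one sound model library, formed by pooling the sound samples that each user records. In addition, as a service that an operator can offer, the operator can treat the sound samples obtained from all users as one large sound model library, which can provide users with more comprehensive voice recognition.

Sampling and saving a contact's voice can be implemented in several ways. This embodiment provides a preferred one: sound feature extraction is performed on the sampled sound, the result is converted into a digital vector, and the digital vector is saved. This completes the sampling and saving of the voices of the contacts in the address book of the mobile terminal.

In another optional embodiment, there are many ways to determine the calling party. A relatively direct one is to acquire the counterpart number of the call, look up a sound in the sound model library according to that number, and compare the sound sample with the sound found. When the counterpart number is in the address book of the mobile terminal, and the sound model library was built by sampling and saving the voices of the contacts in that address book, the sound for the counterpart number is looked up directly in the library and the sound sample is compared with it. When the counterpart number is not in the address book, the library is checked for a sound corresponding to the number; if there is one, the sound sample is compared with it. Optionally, if the lookup by counterpart number fails, the sound sample can be compared with all the sounds in the sound model library.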
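The lookup strategy just described (try the counterpart number first, and fall back to every stored sound if that lookup fails) can be sketched with a small in-memory library. The class name `SoundModelLibrary`, its methods, and the sample phone numbers are hypothetical; the patent does not prescribe a storage layout.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SoundModelLibrary:
    """Hypothetical in-memory library: phone number -> stored feature vector."""

    models: dict[str, list[float]] = field(default_factory=dict)

    def enroll(self, number: str, vector: list[float]) -> None:
        # Save the digital vector obtained from a contact's sampled voice.
        self.models[number] = list(vector)

    def candidates(self, number: str) -> list[tuple[str, list[float]]]:
        # Preferred path: look up the model by the counterpart number.
        if number in self.models:
            return [(number, self.models[number])]
        # Fallback: the lookup failed, so compare against every stored model.
        return list(self.models.items())


library = SoundModelLibrary()
library.enroll("13800000001", [0.9, 0.1, 0.3])
library.enroll("13800000002", [0.2, 0.8, 0.5])

known = library.candidates("13800000001")    # one candidate, found by number
unknown = library.candidates("13900000009")  # all stored models
```

Keying the library by number keeps the common case to a single comparison; the full scan is only paid when the number is unknown, which is also the case most relevant to impersonation from an unfamiliar number.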
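Optionally, similarity can be used to recognize the sound: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to the threshold, the call object is identified as the user corresponding to that sound model in the library; when the similarity is less than the threshold, the call object is determined to be a stranger. Optionally, the recognition result of the call object may also be notified to the mobile terminal.

This embodiment also provides a call voice recognition apparatus to implement the method described above; what has already been described is not repeated here. The names of the modules in the apparatus should not be understood as limiting the modules. For example, the acquisition module, configured to acquire a sound sample of the call object taking part in a call, may also be described as "a module for acquiring a sound sample of the call object taking part in a call". The functions of the modules described below can be implemented by a processor. FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: an acquisition module 22, a comparison module 24, and a recognition module 26. Optionally, the acquisition module 22 is configured to acquire a sound sample of the call object taking part in the call; the comparison module 24 is configured to compare the sound sample with the sounds in the sound model library; and the recognition module 26 is configured to recognize the call voice according to the comparison result. Optionally, the comparison module 24 and the recognition module 26 may be located in the mobile terminal or in a server on the network side.

FIG. 3 is optional block diagram 1 of the call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes: a saving module 32, configured to sample and save the voices of the contacts in the address book of the mobile terminal to build a sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal. FIG. 4 is optional block diagram 2 of the apparatus. As shown in FIG. 4, the saving module 32 includes: an extracting unit 42, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit 44, configured to save the digital vector. FIG. 5 is optional block diagram 3 of the apparatus. As shown in FIG. 5, the comparison module 24 includes: an acquiring unit 52, configured to acquire the counterpart number of the call; and a comparing unit 54, configured to look up a sound in the sound model library according to the counterpart number and compare the sound sample with the sound found. Optionally, the comparison module 24 is further configured to compare the sound sample with all the sounds in the sound model library if the lookup by counterpart number fails. Optionally, the recognition module 26 is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the library is greater than or equal to the threshold, and to determine that the call object is a stranger when the similarity is less than the threshold. FIG. 6 is optional block diagram 4 of the apparatus. As shown in FIG. 6, the apparatus further includes: a notification module 62, configured to notify the mobile terminal of the recognition result of the call object.

An optional embodiment is described below. This optional embodiment proposes a mobile terminal and a call recognition method that can identify the speaker from the call voice, to prevent criminals from committing fraud by impersonating an acquaintance of the mobile phone user and calling the victim. It also provides a voice analysis apparatus for a mobile terminal. The apparatus first samples the voices of the contacts in the phone's address book, builds a sound model library, and stores it in a remote server or in the mobile terminal. During a call, the voice of the incoming caller is first sampled, and the sound sample is then uploaded to the remote server or the mobile terminal, which matches the sample against the sound model library, or applies pattern classification or similar means, to reach a conclusion about voice similarity and thereby identify the peer caller.

The apparatus in this optional embodiment includes two subsystems: a front-end subsystem and a back-end subsystem. The front-end subsystem can include four modules: 1. user interface module; 2. sound sampling module; 3. sound feature extraction module; 4. communication interface module. The back-end subsystem includes five modules: 1. user configuration management module; 2. sound feature extraction module; 3. sound model creation module; 4. voice recognition module; 5. communication interface module. The voice recognition module implements the functions of the comparison module 24 and the recognition module 26 described above. These modules are described below.

Sound sampling module: captures the voice of the other party during the call and hands it to the sound feature extraction module of the front-end subsystem.
Sound feature extraction module: extracts features from the captured sound and converts them into a digital vector.
Sound model creation module: builds a sound model from the feature-extracted digital vector.
Voice recognition module: identifies the caller based on the voice.
User configuration management module: the portal through which the user configures the back-end subsystem; it is used to set the parameters for sound model creation.
User interface module: the user's operating interface.
Communication interface module: maintains the communication link between the front-end subsystem and the back-end subsystem, and can support Wi-Fi, 3G networks, in-system internal communication, and similar modes.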
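As a rough illustration of what the sound feature extraction and sound model creation modules do, the sketch below turns raw samples into a short numeric vector and averages several such vectors into one model. The patent does not name the features or the modelling technique, so the two features used here (mean frame energy and zero-crossing rate), the averaging step, and the function names are stand-in assumptions; a practical system would likely use richer speaker features.

```python
def extract_features(samples, frame_size=4):
    """Toy feature extractor returning [mean frame energy, zero-crossing rate]."""
    if len(samples) < max(frame_size, 2):
        raise ValueError("not enough samples for one frame")
    # Split the signal into non-overlapping frames of frame_size samples.
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    energies = [sum(x * x for x in frame) / frame_size for frame in frames]
    mean_energy = sum(energies) / len(energies)
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    zero_crossing_rate = crossings / (len(samples) - 1)
    return [mean_energy, zero_crossing_rate]


def create_model(feature_vectors):
    """Average the feature vectors from several calls into one sound model."""
    count = len(feature_vectors)
    return [sum(column) / count for column in zip(*feature_vectors)]


# Two hypothetical recordings of the same contact.
call_1 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
call_2 = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
model = create_model([extract_features(call_1), extract_features(call_2)])
```

Averaging over several calls is one simple way to make the stored model less sensitive to a single noisy recording; it is shown here only to make the division of work between the two modules concrete.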
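FIG. 7 is a module composition diagram of a call voice recognition system according to an embodiment of the present invention. As shown in FIG. 7, the front-end subsystem includes: a user interface module, a sound sampling module, a sound feature extraction module, and a communication interface module. The back-end subsystem includes: a user configuration management module, a sound feature extraction module, a voice recognition module, a sound model creation module, and a communication interface module. The front-end subsystem can be deployed on the user's smartphone, and the back-end subsystem can be deployed either on the user's smartphone or on a back-end server. If the back-end subsystem is deployed on the smartphone, the two subsystems communicate through the internal communication mechanism of the phone's operating system; if the back-end subsystem is deployed on a back-end server, the two subsystems communicate over Wi-Fi or a 3G network. The back-end subsystem creates and stores the sound models of the contacts in the address book for the mobile phone user. The front-end subsystem samples the voice of the peer speaker during a call and uploads the sampled, feature-extracted sound sample to the back-end subsystem, which identifies the peer speaker according to the sound model library.

A typical application scenario is as follows. Xiao Ming installs this system on his newly bought phone. After the installation, Xiao Ming's friend Xiao Ma talks with Xiao Ming on the phone, and the system stores Xiao Ma's sound model. Some days later, someone claiming to be Xiao Ma calls Xiao Ming from a number that is not Xiao Ma's number in the address book. The caller's voice is matched against, or classified with, the system's sound model library, and the system then tells Xiao Ming the identity of the caller.

FIG. 8 is a flowchart of the call voice recognition function according to an embodiment of the present invention. As shown in FIG. 8, the flow includes the following steps:

FIELD OF THE INVENTION The present invention relates to the field of mobile applications, and in particular to a call voice recognition method and apparatus.

BACKGROUND OF THE INVENTION Communication technology has developed greatly. As the communications industry grows rapidly, criminal activity that uses these means of communication for fraud is becoming increasingly rampant, and telephone fraud is one form of it. Telephone fraud means using the telephone to carry out fraud. One important technique is for the criminal to impersonate an acquaintance of the victim and call the victim. In many cases the victim cannot immediately tell the identity of the peer caller from the voice, or, out of concern for saving face, does not question the caller's identity in time, which may lead to fraud. No reasonable solution has yet been proposed for the problem in the related art that a terminal cannot identify the peer caller from the call voice, which easily leads to fraud.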
SUMMARY OF THE INVENTION The present invention provides a call voice recognition method and apparatus, to at least solve the problem in the related art that a terminal cannot identify the peer caller from the call voice, which easily leads to fraud. According to one aspect of the present invention, a call voice recognition method is provided, including: acquiring a sound sample of the call object taking part in a call; comparing the sound sample with the sounds in a sound model library; and recognizing the call voice according to the comparison result. Before the sound sample is compared with the sounds in the sound model library, the method further includes: sampling and saving the voices of the contacts in the address book of the mobile terminal to build a sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal. Sampling and saving the voices of the contacts in the address book of the mobile terminal includes: performing sound feature extraction on the sampled sound, converting it into a digital vector, and saving the digital vector. Comparing the sound sample with the sounds in the sound model library includes: acquiring the counterpart number of the call; looking up a sound in the sound model library according to the counterpart number; and comparing the sound sample with the sound found. If the lookup by counterpart number fails, the method further includes: comparing the sound sample with all the sounds in the sound model library.
Recognizing the call voice according to the comparison result includes: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to that sound model in the library; when the similarity is less than the threshold, determining that the call object is a stranger. The method further includes: notifying the mobile terminal of the recognition result of the call object. According to another aspect of the present invention, a call voice recognition apparatus is further provided, including: an acquisition module, configured to acquire a sound sample of the call object taking part in a call on the mobile terminal; a comparison module, configured to compare the sound sample with the sounds in a sound model library; and a recognition module, configured to recognize the call voice according to the comparison result. The apparatus further includes: a saving module, configured to sample and save the voices of the contacts in the address book of the mobile terminal to build a sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal. The saving module includes: an extracting unit, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit, configured to save the digital vector. The comparison module includes: an acquiring unit, configured to acquire the counterpart number of the call; and a comparing unit, configured to look up a sound in the sound model library according to the counterpart number and compare the sound sample with the sound found.
The comparison module is further configured to compare the sound sample with all the sounds in the sound model library if the search for a sound in the sound model library according to the counterpart number fails.

The comparison module and the identification module are located in the mobile terminal or in a server on the network side.

The identification module is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, and to confirm that the call object is a stranger when that similarity is less than the threshold.

The apparatus further includes: a notification module configured to notify the mobile terminal of the recognition result for the call object.

According to the present invention, a sound sample of the call object with whom a call is being made is acquired, the sound sample is compared with the sounds in the sound model library, and the call voice is recognized according to the comparison result. This solves the problem in the related art that fraud can easily occur because the terminal cannot determine the identity of the far-end caller from the call voice: the terminal can now identify the far-end caller by the call voice, which improves security.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate embodiments of the present invention. In the drawings:

FIG. 1 is a flowchart of a call voice recognition method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 3 is an optional block diagram 1 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 4 is an optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 5 is an optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 6 is an optional block diagram 4 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 7 is a block diagram of the modules of a call voice recognition system according to an embodiment of the present invention;
FIG. 8 is a flowchart of a call voice recognition function according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

It should be noted that the embodiments in the present application and the features in the embodiments may be combined with each other provided they do not conflict. The invention is described in detail below with reference to the drawings and in conjunction with the embodiments.

In this embodiment, a call voice recognition method is provided. FIG. 1 is a flowchart of a call voice recognition method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:

Step S102: acquire a sound sample of the call object with whom a call is being made;
Step S104: compare the sound sample with the sounds in the sound model library;
Step S106: recognize the call voice according to the comparison result.

Through the above steps, the acquired sound sample of the call object is compared with the sounds stored in advance in the sound model library, and the call voice is recognized according to the comparison result. In the prior art, the terminal cannot determine the identity of the far-end caller from the call voice. Through the above steps, the voice at the far end of the call can be recognized and the identity of the far-end caller thereby determined, so that the mobile terminal user can tell whether the far end of the call is a stranger.
More preferably, the user can decide according to the result whether to continue the call or to adjust the content of the call, and can also choose to raise an alarm, which effectively reduces mobile phone fraud and improves security.

In an optional embodiment, the sound model library may be established before the sound sample is compared with the sounds in the sound model library. The sound model library can be established in various ways. This embodiment provides a preferred way: the sound model library is established by sampling and saving the voices of the contacts in the address book of the mobile terminal, where the sound model library is stored in a remote server and/or in the mobile terminal. For example, the sampling may consist of choosing to record each time a call from a contact is received, thereby obtaining a sound sample of that contact. Because the user knows the contact's voice in this case, a more accurate sound sample can be obtained.

A sound model library may correspond to an individual user; for example, user A and user B each have their own sound model library. Alternatively, a sound model library can be shared by multiple users or by a group of users; for example, all users in a company or group share one sound model library, which can be formed by pooling the sound samples that each user has recorded. In addition, as a service that an operator can provide, the operator can use the sound samples obtained from all users as one large sound model library, which can provide users with more comprehensive voice recognition.

Sampling and saving the voice of a contact may be implemented in various ways. This embodiment provides a preferred way: sound feature extraction is performed on the sampled sound, the result is converted into a digital vector, and the digital vector is saved. The voices of the contacts in the address book of the mobile terminal are thereby sampled and saved.

In another optional embodiment, the comparison can be performed in many ways. A relatively straightforward way is to acquire the counterpart number of the call, search the sound model library for a sound according to the counterpart number, and compare the sound sample with the sound found. When the counterpart number exists in the address book of the mobile terminal, and the sound model library was built by sampling and saving the voices of the contacts in that address book, the sound corresponding to the counterpart number is looked up directly in the sound model library and the sound sample is compared with the sound found. When the counterpart number is not in the address book of the mobile terminal, the sound model library is checked for a sound corresponding to the counterpart number; if one exists, the sound sample is compared with it. Further optionally, if the search for a sound in the sound model library according to the counterpart number fails, the sound sample can be compared with all the sounds in the sound model library.

Optionally, the sound may be recognized by a similarity determination. When the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, the call object is identified as the user corresponding to that sound model in the sound model library; when the similarity is less than the threshold, the call object is confirmed to be a stranger.

Optionally, the recognition result for the call object may also be notified to the mobile terminal.
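The enrolment, lookup, fallback, and threshold logic described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the source does not specify the feature extractor, the similarity measure, or the threshold value, so the sketch assumes that feature extraction has already produced fixed-length digital vectors, uses cosine similarity as a stand-in measure, and picks a hypothetical threshold of 0.8. The names `SoundModelLibrary`, `enroll`, and `identify_caller` are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
STRANGER = "stranger"


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity measure; the source does not name one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SoundModelLibrary:
    """Maps a contact's number to (contact name, saved digital vector)."""

    def __init__(self) -> None:
        self._models: Dict[str, Tuple[str, Vector]] = {}

    def enroll(self, number: str, name: str, vector: Vector) -> None:
        # Save the digital vector obtained from the contact's sampled voice.
        self._models[number] = (name, vector)

    def find(self, number: str) -> Optional[Tuple[str, Vector]]:
        return self._models.get(number)

    def all_models(self) -> List[Tuple[str, Vector]]:
        return list(self._models.values())


def identify_caller(library: SoundModelLibrary, number: str,
                    sample: Vector, threshold: float = 0.8) -> str:
    """Return the matching contact's name, or STRANGER."""
    found = library.find(number)
    # Search by counterpart number first; if that fails, compare
    # the sample with every sound in the library.
    candidates = [found] if found is not None else library.all_models()
    best_name, best_score = STRANGER, -1.0
    for name, vector in candidates:
        score = cosine_similarity(sample, vector)
        if score > best_score:
            best_name, best_score = name, score
    # Similarity >= threshold: the enrolled user; otherwise: a stranger.
    return best_name if best_score >= threshold else STRANGER
```

Note that a call from an enrolled number whose voice does not reach the threshold is still reported as a stranger, which is the case the method relies on to expose a borrowed or spoofed number.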
This embodiment also provides a call voice recognition apparatus. The apparatus is used to implement the foregoing method, and what has already been described is not repeated here. The names of the modules in the apparatus should not be understood as limiting the modules. For example, the acquisition module, configured to acquire a sound sample of the call object with whom a call is being made, may also be expressed as "a module for acquiring a sound sample of the call object with whom a call is being made". The functions of the modules described below can be implemented by a processor.

FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes an acquisition module 22, a comparison module 24, and an identification module 26. The acquisition module 22 is configured to acquire a sound sample of the call object with whom a call is being made; the comparison module 24 is configured to compare the sound sample with the sounds in the sound model library; and the identification module 26 is configured to recognize the call voice according to the comparison result. Optionally, the comparison module 24 and the identification module 26 may be located in the mobile terminal or in a server on the network side.

FIG. 3 is an optional block diagram 1 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes a saving module 32 configured to sample and save the voices of the contacts in the address book of the mobile terminal to establish the sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal.

FIG. 4 is an optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 4, the saving module 32 includes an extraction unit 42 configured to perform sound feature extraction on the sampled sound and convert it into a digital vector, and a saving unit 44 configured to save the digital vector.

FIG. 5 is an optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 5, the comparison module 24 includes an acquisition unit 52 configured to acquire the counterpart number of the call, and a comparison unit 54 configured to search the sound model library for a sound according to the counterpart number and to compare the sound sample with the sound found.

Optionally, the comparison module 24 is further configured to compare the sound sample with all the sounds in the sound model library if the search for a sound in the sound model library according to the counterpart number fails.

Optionally, the identification module 26 is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to the threshold, and to confirm that the call object is a stranger when that similarity is less than the threshold.

FIG. 6 is an optional block diagram 4 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus further includes a notification module 62 configured to notify the mobile terminal of the recognition result for the call object.

The following description is given in conjunction with optional embodiments.

This optional embodiment proposes a mobile terminal and a call identification method capable of determining a speaker's identity from the call voice, in order to prevent criminals from defrauding a mobile phone user by posing as an acquaintance of that user.
A sound analysis apparatus for the mobile terminal is also provided. The apparatus first samples the voices of the contacts in the mobile phone address book, establishes a sound model library, and stores it in a remote server or in the mobile terminal. During a call, the voice of the incoming caller is first sampled, and the sound sample is then passed to the remote server or the mobile terminal, which compares the sound sample against the sound model library, or performs pattern classification, to reach a conclusion on sound similarity and thereby identify the far-end caller.

The apparatus in this optional embodiment includes two subsystems: a front-end subsystem and a back-end subsystem.

The front-end subsystem includes four modules: 1. user interface module; 2. sound sampling module; 3. sound feature extraction module; 4. communication interface module.

The back-end subsystem includes five modules: 1. user configuration management module; 2. sound feature extraction module; 3. sound model creation module; 4. voice recognition module; 5. communication interface module. The voice recognition module implements the functions of the comparison module 24 and the identification module 26 described above.

These modules are described below.

Sound sampling module: captures the voice of the far-end speaker during the call and hands it to the sound feature extraction module of the front-end subsystem.
Sound feature extraction module: extracts features from the captured sound and converts them into a digital vector.
Sound model creation module: builds a sound model from the digital vector produced by feature extraction.
Voice recognition module: identifies the caller based on the voice.
User configuration management module: the entry point through which the user configures the back-end subsystem; used to set the parameters for sound model creation.
User interface module: the user interface.
Communication interface module: maintains the communication link between the front-end subsystem and the back-end subsystem; it can support Wi-Fi, 3G networks, and communication internal to the system.

FIG. 7 is a block diagram of the modules of a call voice recognition system according to an embodiment of the present invention. As shown in FIG. 7, the front-end subsystem includes a user interface module, a sound sampling module, a sound feature extraction module, and a communication interface module. The back-end subsystem includes a user configuration management module, a sound feature extraction module, a voice recognition module, a sound model creation module, and a communication interface module.

The front-end subsystem of the apparatus can be deployed on the user's smartphone, and the back-end subsystem can be deployed either on the user's smartphone or on a back-end server. If the back-end subsystem is deployed on the smartphone, the two subsystems communicate through the internal communication mechanism of the mobile phone operating system. If the back-end subsystem is deployed on a back-end server, the two subsystems communicate over Wi-Fi or a 3G network.

The back-end subsystem is responsible for creating and storing, for the mobile phone user, the sound models of the contacts in the address book. The front-end subsystem is responsible for sampling the voice of the far-end speaker during a mobile phone call and then uploading the sampled, feature-extracted sound sample to the back-end subsystem, which identifies the far-end speaker based on the sound model library.
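One way to picture the two deployment options is to let the front end talk to the back end through an interchangeable link object, so that the same front-end code works whether the back end runs on the phone or on a server. The sketch below is an assumption-laden illustration: the message format (`op`, `number`, `features`), the class names, and the use of a JSON round trip as a stand-in for the Wi-Fi or 3G transport are all hypothetical and are not taken from the source.

```python
import json
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Handler = Callable[[Message], Message]


class InProcessLink:
    """Back end on the same phone: hand the request over directly."""

    def __init__(self, handler: Handler) -> None:
        self._handler = handler

    def send(self, request: Message) -> Message:
        return self._handler(request)


class SerializedLink:
    """Back end on a server: requests must cross a network.

    The real Wi-Fi/3G transport is out of scope here; a JSON round trip
    stands in for putting the message on the wire.
    """

    def __init__(self, handler: Handler) -> None:
        self._handler = handler

    def send(self, request: Message) -> Message:
        wire_request = json.dumps(request)
        wire_reply = json.dumps(self._handler(json.loads(wire_request)))
        return json.loads(wire_reply)


class BackEnd:
    """Stores one feature vector per number (hypothetical message format)."""

    def __init__(self) -> None:
        self.library: Dict[str, List[float]] = {}

    def handle(self, request: Message) -> Message:
        number = str(request["number"])
        if request["op"] == "store":
            self.library[number] = list(request["features"])
            return {"status": "stored"}
        if request["op"] == "has_model":
            return {"status": "ok", "known": number in self.library}
        return {"status": "unsupported"}


class FrontEnd:
    """Talks to the back end only through the link it is given."""

    def __init__(self, link: Any) -> None:
        self._link = link

    def upload_features(self, number: str, features: List[float]) -> str:
        reply = self._link.send(
            {"op": "store", "number": number, "features": features})
        return str(reply["status"])

    def has_model(self, number: str) -> bool:
        reply = self._link.send({"op": "has_model", "number": number})
        return bool(reply["known"])
```

Constructing `FrontEnd(InProcessLink(backend.handle))` corresponds to the on-phone deployment and `FrontEnd(SerializedLink(backend.handle))` to the server deployment; the front end itself does not change.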
A typical application scenario is as follows. Xiao Ming installs the system on his newly purchased mobile phone. After the system is installed, Xiao Ming's friend Xiao Ma talks with Xiao Ming by phone, and Xiao Ma's sound model is stored by the system. A few days later, a person claiming to be Xiao Ma calls Xiao Ming from a mobile phone number that is not Xiao Ma's number in the address book. The caller's voice is matched, or pattern-classified, against the sound model library of the system, and the system then tells Xiao Ming the identity of this caller.

FIG. 8 is a flowchart of a call voice recognition function according to an embodiment of the present invention. As shown in FIG. 8, the process includes the following steps:
Step S801: The mobile phone receives an incoming call.

Step S802: The front-end subsystem of the apparatus checks the phone's address book to confirm whether the incoming number is an existing number in the address book. If it is, proceed to S803; if it is not, proceed to S804.

Step S803: If the incoming number is an existing number in the address book, the front-end subsystem queries the user's address book to confirm whether this number already has a sound model in the sound model library. If it does, proceed to S804; otherwise, proceed to S807.

Step S804: If the number already has a sound model, the sound feature extraction module of the front-end subsystem samples the voice of the far-end caller in this call, performs feature extraction, and proceeds to S805.

Step S805: The front-end subsystem passes the sound features extracted by the sound feature extraction module in S804 as input to the voice recognition module of the back-end subsystem, which identifies the far-end caller of this call according to the sound models in the sound model library.

Step S806: The user interface module notifies the mobile phone user of the identification result for the far-end speaker.

Step S807: If the incoming number does not yet have a sound model in the sound model library, the sound sampling module of the front-end subsystem uploads the sampled sound to the back-end subsystem through the communication interface module, the sound feature extraction module of the back-end subsystem performs feature extraction on this sound sample, and the process proceeds to S808.
Step S808: The sound model creation module of the back-end subsystem builds a sound model from the feature-extracted sound sample and stores it in the sound model library.

Unlike earlier approaches, which relied solely on human judgment, the method or apparatus of this optional embodiment identifies the voice in a mobile phone call by non-manual means, which can effectively prevent mobile phone users from being deceived in telephone fraud.

Obviously, those skilled in the art should understand that the above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; or they may each be fabricated as individual integrated circuit modules; or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above are only optional embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, various modifications and changes can be made to the present invention. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present invention are intended to be included within the scope of protection of the present invention.
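The branching of the FIG. 8 flow (S801 to S808) can be sketched as a single function that records which steps were taken. This is a hedged illustration only: sampling and feature extraction are assumed to have already produced a `features` vector, the sound model is simplified to that vector itself, and `recognize` and `notify` are hypothetical callbacks standing in for the voice recognition module and the user interface module.

```python
from typing import Callable, Dict, List, Sequence, Set

Features = Sequence[float]
Library = Dict[str, List[float]]


def handle_incoming_call(
    number: str,
    address_book: Set[str],
    model_library: Library,
    features: Features,
    recognize: Callable[[Features, Library], str],
    notify: Callable[[str], None],
) -> List[str]:
    """Walk the FIG. 8 flow and return the list of steps taken."""
    trace = ["S801"]  # incoming call received
    trace.append("S802")  # is the number in the address book?
    if number in address_book:
        trace.append("S803")  # does the number already have a model?
        if number not in model_library:
            trace.append("S807")  # upload sample, extract features
            # Simplification: the stored "model" is the feature vector.
            model_library[number] = list(features)
            trace.append("S808")  # build the model and store it
            return trace
    trace.append("S804")  # sample the far-end voice, extract features
    trace.append("S805")  # recognition against the model library
    identity = recognize(features, model_library)
    trace.append("S806")  # tell the user who is calling
    notify(identity)
    return trace
```

As in the source flow, a number that is not in the address book goes straight from S802 to S804, so an unknown caller is always checked against the library, while a known contact without a model is enrolled instead.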
Industrial Applicability: The present invention relates to the field of mobile applications. A sound sample of the call object with whom a call is being made is acquired, the sound sample is compared with the sounds in the sound model library, and the call voice is recognized according to the comparison result. This solves the problem in the related art that fraud can easily occur because the terminal cannot determine the identity of the far-end caller from the call voice: the terminal can identify the far-end caller by the call voice, which improves security.
Claims
1. A call voice recognition method, comprising:
acquiring a sound sample of a call object with whom a call is being made;
comparing the sound sample with sounds in a sound model library; and
recognizing the call voice according to the comparison result.

2. The method according to claim 1, wherein, before the sound sample is compared with the sounds in the sound model library, the method further comprises:
sampling and saving the voices of contacts in an address book of a mobile terminal to establish the sound model library, wherein the sound model library is stored in a remote server and/or in the mobile terminal.

3. The method according to claim 2, wherein sampling and saving the voices of the contacts in the address book of the mobile terminal comprises:
performing sound feature extraction on the sampled sound, converting it into a digital vector, and saving the digital vector.

4. The method according to any one of claims 1, wherein comparing the sound sample with the sounds in the sound model library comprises:
acquiring a counterpart number of the call; and
searching the sound model library for a sound according to the counterpart number, and comparing the sound sample with the sound found.

5. The method according to claim 4, wherein, in the case that searching the sound model library for a sound according to the counterpart number fails, the method further comprises:
comparing the sound sample with all the sounds in the sound model library.

6. The method according to any one of claims 1 to 5, wherein recognizing the call voice according to the comparison result comprises:
when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to the sound model in the sound model library; and
when the similarity between the sound sample and the sound found in the sound model library is less than the threshold, confirming that the call object is a stranger.

7. The method according to claim 6, wherein the method further comprises:
notifying the mobile terminal of the recognition result for the call object.

8. A call voice recognition apparatus, comprising:
an acquisition module configured to acquire a sound sample of a call object with whom a mobile terminal is in a call;
a comparison module configured to compare the sound sample with sounds in a sound model library; and
an identification module configured to recognize the call voice according to the comparison result.

9. The apparatus according to claim 8, wherein the apparatus further comprises:
a saving module configured to sample and save the voices of contacts in an address book of the mobile terminal to establish the sound model library, wherein the sound model library is stored in a remote server and/or in the mobile terminal.

10. The apparatus according to claim 9, wherein the saving module comprises:
an extraction unit configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and
a saving unit configured to save the digital vector.

11. The apparatus according to claim 8, wherein the comparison module comprises:
an acquisition unit configured to acquire a counterpart number of the call; and
a comparison unit configured to search the sound model library for a sound according to the counterpart number and to compare the sound sample with the sound found.

12. The apparatus according to claim 11, wherein the comparison module is further configured to compare the sound sample with all the sounds in the sound model library in the case that searching the sound model library for a sound according to the counterpart number fails.

13. The apparatus according to claim 11, wherein the comparison module and the identification module are located in the mobile terminal or in a server on the network side.

14. The apparatus according to claim 8, wherein the identification module is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, and to confirm that the call object is a stranger when the similarity between the sound sample and the sound found in the sound model library is less than the threshold.

15. The apparatus according to claim 13, wherein the apparatus further comprises:
a notification module configured to notify the mobile terminal of the recognition result for the call object.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310728622.1 | 2013-12-25 | ||
CN201310728622.1A CN104751848A (en) | 2013-12-25 | 2013-12-25 | Call voice recognition method and call voice recognition device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015096429A1 true WO2015096429A1 (en) | 2015-07-02 |
Family
ID=53477465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/080661 WO2015096429A1 (en) | 2013-12-25 | 2014-06-24 | Call voice recognition method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104751848A (en) |
WO (1) | WO2015096429A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113225327A (en) * | 2021-04-29 | 2021-08-06 | 心动网络股份有限公司 | Login client supervision method, device, equipment and medium based on voice recognition |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106790949A (en) * | 2015-11-20 | 2017-05-31 | 北京奇虎科技有限公司 | The collocation method and device in the phonetic feature storehouse of malicious call |
CN105590632B (en) * | 2015-12-16 | 2019-01-29 | 广东德诚科教有限公司 | A kind of S-T teaching process analysis method based on phonetic similarity identification |
WO2018170816A1 (en) * | 2017-03-23 | 2018-09-27 | 李卓希 | Call control processing method, and mobile terminal |
CN108122555B (en) * | 2017-12-18 | 2021-07-23 | 北京百度网讯科技有限公司 | Communication method, voice recognition device and terminal device |
CN107846493B (en) * | 2017-12-21 | 2019-10-25 | Oppo广东移动通信有限公司 | Call contact person control method, device and storage medium and mobile terminal |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1852560A (en) * | 2005-07-22 | 2006-10-25 | 华为技术有限公司 | Subscriber identy identifying method and calling control method and system |
US20080159488A1 (en) * | 2006-12-27 | 2008-07-03 | Chander Raja | Voice based caller identification and screening |
CN102576530A (en) * | 2009-10-15 | 2012-07-11 | 索尼爱立信移动通讯有限公司 | Voice pattern tagged contacts |
CN102780819A (en) * | 2012-07-27 | 2012-11-14 | 广东欧珀移动通信有限公司 | A method for voice recognition contacts of a mobile terminal |
CN103377652A (en) * | 2012-04-25 | 2013-10-30 | 上海智臻网络科技有限公司 | Method, device and equipment for carrying out voice recognition |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101442579A (en) * | 2007-11-23 | 2009-05-27 | 中兴通讯股份有限公司 | Mobile terminal with speech recognition calling subscriber information |
JP2011119953A (en) * | 2009-12-03 | 2011-06-16 | Hitachi Ltd | Speech recording system using function of call control and speech recording |
CN202142288U (en) * | 2011-07-07 | 2012-02-08 | 龙旗科技(上海)有限公司 | Safety voice communication apparatus of portable terminal |
CN103281425A (en) * | 2013-04-25 | 2013-09-04 | 广东欧珀移动通信有限公司 | Method and device for analyzing contacts through call voice |
CN103313249B (en) * | 2013-05-07 | 2017-05-10 | 百度在线网络技术(北京)有限公司 | Reminding method and reminding system for terminal and server |
- 2013-12-25: CN application CN201310728622.1A (publication CN104751848A), status: not active, withdrawn
- 2014-06-24: WO application PCT/CN2014/080661 (publication WO2015096429A1), status: active, application filing
Also Published As
Publication number | Publication date |
---|---|
CN104751848A (en) | 2015-07-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113794805B (en) | | A detection method and detection system for GOIP fraud calls |
US9607621B2 (en) | | Customer identification through voice biometrics |
WO2015096429A1 (en) | | Call voice recognition method and apparatus |
CN105306657B (en) | | Personal identification method, device and communicating terminal |
KR101881058B1 (en) | | Method, apparatus and system for voice verification |
CN104537746A (en) | | Intelligent electronic door control method, system and equipment |
WO2016169095A1 (en) | | Terminal alarm method and apparatus |
CN105554223A (en) | | A method for establishing connection and mobile terminal |
US20180013869A1 (en) | | Integration of voip phone services with intelligent cloud voice recognition |
CN107995381B (en) | | Alarm terminal, cloud, alarm processing method of cloud and storage medium |
CN204990444U (en) | | Intelligent security control gear |
WO2017201874A1 (en) | | Method and apparatus for prompting loss of terminal |
US8483672B2 (en) | | System and method for selective monitoring of mobile communication terminals based on speech key-phrases |
CN109039509A (en) | | A kind of method and broadcasting equipment of voice control broadcasting equipment |
WO2017059679A1 (en) | | Account processing method and apparatus |
CN112333709B (en) | | Cross-network fraud association analysis method and system and computer storage medium |
CN107707754A (en) | | A kind of intelligent terminal recovers method and apparatus |
WO2018166367A1 (en) | | Real-time prompt method and device in real-time conversation, storage medium, and electronic device |
JP2016149636A (en) | | Authentication apparatus, telephone terminal, authentication method and authentication program |
EP2723036A1 (en) | | System and method for user-privacy-aware communication monitoring and analysis |
JP2016071068A (en) | | Call analysis device, call analysis method, and call analysis program |
US20180343342A1 (en) | | Controlled environment communication system for detecting unauthorized employee communications |
CN107820251A (en) | | The method, apparatus and system of a kind of network insertion |
CN106886697A (en) | | Authentication method, authentication platform, user terminal and Verification System |
US20160028724A1 (en) | | Identity Reputation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14873352; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 14873352; Country of ref document: EP; Kind code of ref document: A1 |