
WO2015096429A1 - Call voice recognition method and apparatus - Google Patents

Call voice recognition method and apparatus

Info

Publication number
WO2015096429A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
call
model library
sample
sound model
Prior art date
Application number
PCT/CN2014/080661
Other languages
English (en)
French (fr)
Inventor
雷杨
华国栋
王勿英
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2015096429A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/08 Use of distortion metrics or a particular distance between probe pattern and reference templates

Definitions

  • the present invention relates to the field of mobile applications, and in particular to a call voice recognition method and apparatus.
  • BACKGROUND OF THE INVENTION: at present, communication technology has developed greatly. Alongside the rapid growth of the communication industry, criminal activity that uses these means of communication for fraud is becoming increasingly rampant, and telephone fraud is one example. In telephone fraud, an important technique used by criminals is to call the victim while impersonating one of the victim's acquaintances. In many cases the victim cannot immediately tell the caller's identity from the voice, or, to avoid embarrassment, does not promptly challenge the other party's identity, which may lead to fraud.
  • a call voice recognition method is provided, including: acquiring a sound sample of the call object in a call; comparing the sound sample with the sounds in a sound model library; and recognizing the call voice according to the comparison result.
  • before the comparison, the method further includes: sampling and saving the voices of the contacts in the address book of the mobile terminal to build the sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal.
  • sampling and saving the voices of the contacts in the address book of the mobile terminal includes: performing sound feature extraction on the sampled sound, converting it into a digital vector, and saving the digital vector.
  • comparing the sound sample with the sounds in the sound model library includes: acquiring the counterpart number of the call; searching the sound model library for a sound according to the counterpart number; and comparing the sound sample with the sound found.
  • when the search by the counterpart number fails, the method further includes: comparing the sound sample with all the sounds in the sound model library.
  • recognizing the call voice according to the comparison result includes: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to that sound model in the library; when the similarity is less than the threshold, confirming that the call object is a stranger.
  • the method further includes: notifying the mobile terminal of the recognition result for the call object.
  • a call voice recognition apparatus is provided, including: an acquisition module, configured to acquire a sound sample of the call object in a call of the mobile terminal; a comparison module, configured to compare the sound sample with the sounds in a sound model library; and a recognition module, configured to recognize the call voice according to the comparison result.
  • the apparatus further includes: a saving module, configured to sample and save the voices of the contacts in the address book of the mobile terminal to build the sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal.
  • the saving module includes: an extraction unit, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit, configured to save the digital vector.
  • the comparison module includes: an acquisition unit, configured to acquire the counterpart number of the call; and a comparison unit, configured to search the sound model library for a sound according to the counterpart number and compare the sound sample with the sound found.
  • the comparison module is further configured to compare the sound sample with all the sounds in the sound model library when the search by the counterpart number fails.
  • the comparison module and the recognition module are located in the mobile terminal or in a server on the network side.
  • the recognition module is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the library is greater than or equal to a threshold, and to confirm that the call object is a stranger when the similarity is less than the threshold.
  • the apparatus further includes: a notification module, configured to notify the mobile terminal of the recognition result for the call object.
  • according to the present invention, a sound sample of the call object is acquired, compared with the sounds in the sound model library, and the call voice is recognized according to the comparison result. This solves the problem in the related art that the terminal cannot identify the other party by the call voice, which easily leads to fraud, and enables the terminal to identify the other party by the call voice, improving security.
  • FIG. 1 is a flowchart of a call voice recognition method according to an embodiment of the present invention;
  • FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention;
  • FIG. 3 is optional block diagram 1 of a call voice recognition apparatus according to an embodiment of the present invention;
  • FIG. 4 is optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention;
  • FIG. 5 is optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention;
  • FIG. 6 is optional block diagram 4 of a call voice recognition apparatus according to an embodiment of the present invention;
  • FIG. 7 is a diagram of the module composition of a call voice recognition system according to an embodiment of the present invention;
  • FIG. 8 is a flowchart of the call voice recognition function according to an embodiment of the present invention.
  • Step S102: acquire a sound sample of the call object in a call.
  • Step S104: compare the sound sample with the sounds in the sound model library.
  • Step S106: recognize the call voice according to the comparison result.
  • the acquired sound sample of the call object is compared with the sounds stored in advance in the sound model library, and the call voice is recognized according to the comparison result.
  • whereas in the related art the terminal cannot identify the other party by the call voice, these steps recognize the voice at the other end of the call and thereby identify the other party, so that the mobile terminal user can judge whether the other party is a stranger. Further, the user can decide, according to the result, whether to continue the call or adjust what is said, and can also choose to call the police, which effectively reduces mobile phone fraud and improves security.
  • the sound model library may be built in advance, before the sound sample is compared with the sounds in it. The library can be built in various ways; this embodiment provides a preferred one, in which the library is built from the address book of the mobile terminal:
  • the voices of the contacts are sampled and saved to build the library, which is stored in a remote server and/or in the mobile terminal.
  • the sampling may consist of choosing to record each time a call from the contact is received, thereby obtaining a sound sample of the contact.
  • in such a recording the user knows the contact's voice, so a relatively accurate sound sample can be obtained.
  • the sound model library may correspond to each user.
  • for example, user A and user B each have their own sound model library.
  • the sound model library may also be shared by multiple users or a group of users. For example, all users of a company or organization share one library, which may be formed by pooling the sound samples each user records.
  • the operator can combine the sound samples obtained from all users into one large sound model library, which can provide users with more comprehensive voice recognition.
  • the sampling and saving of contacts' voices may be implemented in various ways.
  • a preferred implementation is provided:
  • sound feature extraction is performed on the sampled sound, the result is converted into a digital vector, and the digital vector is saved, thereby sampling and saving the voices of the contacts in the address book of the mobile terminal.
  • there are many ways to identify the calling party. A relatively direct way is to acquire the counterpart number of the call, search the sound model library for a sound according to that number, and compare the sound sample with the sound found.
  • when the counterpart number exists in the address book of the mobile terminal and the sound model library was built by sampling and saving the voices of the contacts in that address book, the sound for the counterpart number is looked up directly in the library and the sound sample is compared with it; when the counterpart number is not in the address book, the library is searched for a sound corresponding to the number, and if one exists, the sound sample is compared with it.
  • when the search by the counterpart number fails, the sound sample can be compared with all the sounds in the sound model library.
  • a similarity criterion may be used to recognize the voice.
  • when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to the threshold, the call object is identified as the user corresponding to that sound model in the library; when the similarity is less than the threshold, the call object is confirmed to be a stranger.
  • the recognition result for the call object may also be notified to the mobile terminal.
  • a call voice recognition apparatus is also provided, which is used to implement the foregoing embodiments; what has already been described is not repeated here.
  • the names of the modules in the apparatus should not be understood as limiting the modules. For example, the acquisition module, configured to acquire a sound sample of the call object in a call, may also be expressed as "a module for acquiring a sound sample of the call object in a call". The functions of the modules described below may be implemented by a processor.
  • FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: an acquisition module 22, a comparison module 24, and a recognition module 26.
  • the acquisition module 22 is configured to acquire a sound sample of the call object in a call; the comparison module 24 is configured to compare the sound sample with the sounds in the sound model library; and the recognition module 26 is configured to recognize the call voice according to the comparison result.
  • the comparison module 24 and the recognition module 26 may be located in the mobile terminal or in a server on the network side.
  • FIG. 3 is optional block diagram 1 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes: a saving module 32, configured to sample and save the voices of the contacts in the address book of the mobile terminal to build the sound model library, where the library is stored in a remote server and/or in the mobile terminal.
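  • FIG. 4 is optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 4, the saving module 32 includes: an extraction unit 42, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit 44, configured to save the digital vector.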
  • FIG. 5 is optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 5, the comparison module 24 includes: an acquisition unit 52, configured to acquire the counterpart number of a call; and a comparison unit 54, configured to search the sound model library for a sound according to the counterpart number and compare the sound sample with the sound found.
  • the comparison module 24 is further configured to compare the sound sample with all the sounds in the sound model library when the search by the counterpart number fails.
  • the recognition module 26 is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the library is greater than or equal to the threshold, and to confirm that the call object is a stranger when the similarity is less than the threshold.
  • FIG. 6 is optional block diagram 4 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus further includes: a notification module 62, configured to notify the mobile terminal of the recognition result for the call object.
  • the apparatus in this optional embodiment includes two subsystems: a front-end subsystem and a back-end subsystem.
  • the front-end subsystem may include four modules: 1. a user interface module; 2. a sound sampling module; 3. a sound feature extraction module; 4. a communication interface module.
  • the back-end subsystem includes five modules: 1. a user configuration management module; 2. a sound feature extraction module; 3. a sound model creation module; 4. a voice recognition module; 5. a communication interface module.
  • the voice recognition module implements the functions of the comparison module 24 and the recognition module 26 described above. These modules are described below.
  • Sound sampling module: captures the voice of the other party during the call and passes it to the sound feature extraction module of the front-end subsystem.
  • Sound feature extraction module: extracts features from the acquired sound and converts them into a digital vector.
  • Sound model creation module: builds a sound model from the digital vector obtained by feature extraction.
  • Voice recognition module: identifies the caller from the voice.
  • FIG. 7 is a block diagram of a call voice recognition system module according to an embodiment of the present invention.
  • the front-end subsystem includes: a user interface module, a sound sampling module, a sound feature extraction module, and a communication interface module.
  • the back-end subsystem includes: a user configuration management module, a sound feature extraction module, a voice recognition module, a sound model creation module, and a communication interface module.
  • the front-end subsystem of the apparatus can be deployed on the user's smartphone, and the back-end subsystem can be deployed either on the user's smartphone or on a back-end server. If the back-end subsystem is deployed on the smartphone, the two subsystems communicate through the phone operating system's internal communication mechanism; if it is deployed on a back-end server, they communicate over Wi-Fi or a 3G network.
  • the back-end subsystem is responsible for creating and storing voice models of the contacts in the address book for the phone user, and the front-end subsystem samples the voice of the other party during a call and uploads the sampled, feature-extracted sound samples to the back-end subsystem.
  • FIG. 8 is a flowchart of a call voice recognition function according to an embodiment of the present invention. As shown in FIG. 8, the process includes the following steps:
  • S801: the phone receives an incoming call.
  • S802: the front-end subsystem of the apparatus matches the calling number against the phone's address book to confirm whether it is an existing number in the address book. If it is, go to S803; if it is not, go to S804.
  • S803: the front-end subsystem queries the user's address book to confirm whether the number already has a sound model in the sound model library. If it does, go to S804; otherwise, go to S807.
  • S804: the sound feature extraction module of the front-end subsystem samples the voice of the other party in this call and performs feature extraction, then goes to S805.
  • S805: the front-end subsystem passes the sound features extracted in S804 as input to the voice recognition module of the back-end subsystem, which identifies the other party of this call according to the sound models in the sound model library.
  • S806: the user interface module notifies the phone user of the other party's identity.
  • S807: the sound sampling module of the front-end subsystem uploads the sampled sound sample to the back-end subsystem through the communication module, and the sound feature extraction module of the back-end subsystem performs feature extraction on the sample, then goes to S808.
  • S808: the sound model creation module of the back-end subsystem builds a sound model from the feature-extracted sound sample and stores it in the sound model library.
  • unlike earlier approaches that rely only on human judgment, the method or apparatus of this optional embodiment identifies the voice in a phone call by non-manual means, which can effectively prevent mobile phone users from being deceived by telephone fraud.
  • the modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network of multiple computing devices.
  • they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module.
  • the invention is not limited to any specific combination of hardware and software.
  • the above are only optional embodiments of the present invention and are not intended to limit it; the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within its protection scope.
  • the present invention relates to the field of mobile applications. It acquires a sound sample of the call object, compares the sound sample with the sounds in the sound model library, and recognizes the call voice according to the comparison result. This solves the problem in the related art that the terminal cannot identify the other party by the call voice, which easily leads to fraud, and enables the terminal to identify the other party by the call voice, thereby improving security.
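The comparison and recognition logic summarized above (look up the counterpart number, fall back to the whole library if that fails, then apply a similarity threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the patent specifies neither the feature representation nor the similarity measure, so cosine similarity over plain float vectors, the threshold value 0.8, the dictionary layout of the sound model library, the second contact "Xiao Li", and the phone numbers are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: counterpart number -> (contact name, stored feature vector).
SoundModelLibrary = Dict[str, Tuple[str, List[float]]]

SIMILARITY_THRESHOLD = 0.8  # assumed value; the patent leaves the threshold open


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_caller(sample: List[float], counterpart_number: str,
                    library: SoundModelLibrary,
                    threshold: float = SIMILARITY_THRESHOLD) -> Optional[str]:
    """Return the matched contact's name, or None if the caller is a stranger."""
    entry = library.get(counterpart_number)
    if entry is not None:
        # Lookup by counterpart number succeeded: compare with that sound only.
        name, model = entry
        return name if cosine_similarity(sample, model) >= threshold else None
    # Lookup failed: compare the sample with every sound in the library.
    best_name: Optional[str] = None
    best_score = -1.0
    for name, model in library.values():
        score = cosine_similarity(sample, model)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Usage with made-up three-element feature vectors.
library: SoundModelLibrary = {
    "13800000001": ("Xiao Ma", [1.0, 0.0, 0.0]),
    "13800000002": ("Xiao Li", [0.0, 1.0, 0.0]),
}
print(identify_caller([0.9, 0.1, 0.0], "13800000001", library))  # Xiao Ma
print(identify_caller([0.9, 0.1, 0.0], "13999999999", library))  # Xiao Ma (fallback search)
print(identify_caller([0.0, 0.0, 1.0], "13999999999", library))  # None (stranger)
```

Read literally, the threshold rule means a caller whose number is found but whose voice scores below the threshold is reported as a stranger even if the voice would match another entry; the whole-library comparison runs only when the number lookup fails.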

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

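The present invention discloses a call voice recognition method and apparatus. The method includes: acquiring a sound sample of the call object in a call; comparing the sound sample with the sounds in a sound model library; and recognizing the call voice according to the comparison result. The invention solves the problem in the related art that a terminal cannot identify the other party in a call by the call voice, which easily leads to fraud; it enables the terminal to identify the other party by the call voice and improves security.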

Description

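Call voice recognition method and apparatus

Technical Field
The present invention relates to the field of mobile applications, and in particular to a call voice recognition method and apparatus.

Background
At present, communication technology has developed greatly. Alongside the rapid growth of the communication industry, criminal activity that uses these means of communication for fraud has become increasingly rampant, and telephone fraud is one such form. In telephone fraud, that is, fraud carried out over the phone, an important technique used by criminals is to call the victim while impersonating one of the victim's acquaintances. In many cases the victim cannot immediately tell the identity of the other party from the voice, or, to avoid embarrassment, does not promptly question the other party's identity, which may lead to fraud.
For the problem in the related art that a terminal cannot identify the other party in a call by the call voice, which easily leads to fraud, no reasonable solution has yet been proposed.

Summary
The present invention provides a call voice recognition method and apparatus, so as to at least solve the problem in the related art that a terminal cannot identify the other party in a call by the call voice, which easily leads to fraud.
According to one aspect of the present invention, a call voice recognition method is provided, including: acquiring a sound sample of the call object in a call; comparing the sound sample with the sounds in a sound model library; and recognizing the call voice according to the comparison result.
Before the sound sample is compared with the sounds in the sound model library, the method further includes: sampling and saving the voices of the contacts in the address book of a mobile terminal to build the sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal.
Sampling and saving the voices of the contacts in the address book of the mobile terminal includes: performing sound feature extraction on the sampled sound, converting it into a digital vector, and saving the digital vector.
Comparing the sound sample with the sounds in the sound model library includes: acquiring the counterpart number of the call; searching the sound model library for a sound according to the counterpart number; and comparing the sound sample with the sound found.
When the search of the sound model library by the counterpart number fails, the method further includes: comparing the sound sample with all the sounds in the sound model library.
Recognizing the call voice according to the comparison result includes: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to that sound model in the library; when the similarity is less than the threshold, confirming that the call object is a stranger.
The method further includes: notifying the mobile terminal of the recognition result for the call object.
According to another aspect of the present invention, a call voice recognition apparatus is further provided, including: an acquisition module, configured to acquire a sound sample of the call object in a call of a mobile terminal; a comparison module, configured to compare the sound sample with the sounds in a sound model library; and a recognition module, configured to recognize the call voice according to the comparison result.
The apparatus further includes: a saving module, configured to sample and save the voices of the contacts in the address book of the mobile terminal to build the sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal.
The saving module includes: an extraction unit, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit, configured to save the digital vector.
The comparison module includes: an acquisition unit, configured to acquire the counterpart number of the call; and a comparison unit, configured to search the sound model library for a sound according to the counterpart number and compare the sound sample with the sound found.
The comparison module is further configured to compare the sound sample with all the sounds in the sound model library when the search by the counterpart number fails.
The comparison module and the recognition module are located in the mobile terminal or in a server on the network side.
The recognition module is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the library is greater than or equal to a threshold, and to confirm that the call object is a stranger when the similarity is less than the threshold.
The apparatus further includes: a notification module, configured to notify the mobile terminal of the recognition result for the call object.
With the present invention, a sound sample of the call object is acquired and compared with the sounds in a sound model library, and the call voice is recognized according to the comparison result. This solves the problem in the related art that a terminal cannot identify the other party by the call voice, which easily leads to fraud, enables the terminal to identify the other party by the call voice, and improves security.

Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present invention and form part of this application. The exemplary embodiments of the present invention and their description are used to explain the present invention and do not constitute an improper limitation on it. In the drawings:
FIG. 1 is a flowchart of a call voice recognition method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 3 is optional block diagram 1 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 4 is optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 5 is optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 6 is optional block diagram 4 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 7 is a diagram of the module composition of a call voice recognition system according to an embodiment of the present invention;
FIG. 8 is a flowchart of the call voice recognition function according to an embodiment of the present invention.

Detailed Description
It should be noted that, where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with each other. The present invention is described in detail below with reference to the drawings and in combination with the embodiments.
This embodiment provides a call voice recognition method. FIG. 1 is a flowchart of the call voice recognition method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
Step S102: acquire a sound sample of the call object in a call;
Step S104: compare the sound sample with the sounds in the sound model library;
Step S106: recognize the call voice according to the comparison result.
Through the above steps, the acquired sound sample of the call object is compared with the sounds stored in advance in the sound model library, and the call voice is recognized according to the comparison result. Whereas in the prior art the terminal cannot identify the other party by the call voice, these steps recognize the voice at the other end of the call and thereby identify the other party, making it easy for the mobile terminal user to judge whether the other party is a stranger. Further, the user can decide according to the result whether to continue the call or adjust what is said, and can also choose to call the police, which can effectively reduce mobile phone fraud and improve security.
In an optional embodiment, the sound model library may be built in advance, before the sound sample is compared with the sounds in it. The library can be built in multiple ways; this embodiment provides a preferred one, in which the library is built by sampling and saving the voices of the contacts in the address book of the mobile terminal, and is stored in a remote server and/or in the mobile terminal. For example, the sampling may consist of choosing to record each time a call from the contact is received, thereby obtaining a sound sample of the contact. In a recording made this way the user knows the contact's voice, so a relatively accurate sound sample can be obtained. The sound model library may correspond to each user; for example, user A and user B each have their own sound model library. Alternatively, the sound database may be shared by multiple users or a group of users; for example, all users of a company or organization share one sound model library, which may be formed by pooling the sound samples each user records. In addition, as a service it can offer, an operator may combine all the user sound samples it obtains into one large sound model library, through which more comprehensive voice recognition can be provided to users.
Sampling and saving the voice of a contact can be implemented in multiple ways. This embodiment provides a preferred implementation, in which sound feature extraction is performed on the sampled sound, the result is converted into a digital vector, and the digital vector is saved; this accomplishes the sampling and saving of the voices of the contacts in the address book of the mobile terminal.
In another optional embodiment, there are many ways to identify the calling party. A relatively direct way is to acquire the counterpart number of the call, search the sound model library for a sound according to the counterpart number, and compare the sound sample with the sound found. When the counterpart number is in the address book of the mobile terminal and the library was built by sampling and saving the voices of the contacts in this address book, the sound for that number is looked up directly in the library and the sound sample is compared with it; when the counterpart number is not in the address book, the library is searched for a sound corresponding to the number, and if one exists, the sound sample is compared with it. More optionally, when the search of the library by the counterpart number fails, the sound sample may be compared with all the sounds in the library.
Optionally, a similarity criterion may be used to recognize the voice: when the similarity between the sound sample and the sound found in the library is greater than or equal to a threshold, the call object is identified as the user corresponding to that sound model in the library; when the similarity is less than the threshold, the call object is confirmed to be a stranger. Optionally, the recognition result for the call object may also be notified to the mobile terminal.
This embodiment further provides a call voice recognition apparatus, which is used to implement the above embodiments; what has already been described is not repeated here. The names of the modules in the apparatus below should not be understood as limiting the modules; for example, the acquisition module, configured to acquire a sound sample of the call object in a call, may also be expressed as "a module for acquiring a sound sample of the call object in a call". The functions of the modules described below may be implemented by a processor. FIG. 2 is a block diagram of the call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: an acquisition module 22, a comparison module 24, and a recognition module 26.
Optionally, the acquisition module 22 is configured to acquire a sound sample of the call object in a call; the comparison module 24 is configured to compare the sound sample with the sounds in the sound model library; and the recognition module 26 is configured to recognize the call voice according to the comparison result. Optionally, the comparison module 24 and the recognition module 26 may be located in the mobile terminal or in a server on the network side.
FIG. 3 is optional block diagram 1 of the call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes: a saving module 32, configured to sample and save the voices of the contacts in the address book of the mobile terminal to build the sound model library, where the library is stored in a remote server and/or in the mobile terminal.
FIG. 4 is optional block diagram 2 of the call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 4, the saving module 32 includes: an extraction unit 42, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit 44, configured to save the digital vector.
FIG. 5 is optional block diagram 3 of the call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 5, the comparison module 24 includes: an acquisition unit 52, configured to acquire the counterpart number of the call; and a comparison unit 54, configured to search the library for a sound according to the counterpart number and compare the sound sample with the sound found.
Optionally, the comparison module 24 is further configured to compare the sound sample with all the sounds in the library when the search by the counterpart number fails.
Optionally, the recognition module 26 is configured to identify the call object as the user corresponding to the sound model in the library when the similarity between the sound sample and the sound found in the library is greater than or equal to the threshold, and to confirm that the call object is a stranger when the similarity is less than the threshold.
FIG. 6 is optional block diagram 4 of the call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus further includes: a notification module 62, configured to notify the mobile terminal of the recognition result for the call object.
The following description is given with reference to an optional embodiment.
This optional embodiment proposes a mobile terminal and a call recognition method that can identify a speaker by the call voice, in order to prevent criminals from defrauding victims by calling them while impersonating acquaintances of the mobile phone user. It also provides a sound analysis apparatus for a mobile terminal. The apparatus first samples the voices of the contacts in the phone's address book, builds a sound model library, and stores it in a remote server or in the mobile terminal. When the user is on a call, the apparatus first samples the incoming voice and then uploads the sound sample to the remote server or mobile terminal, which reaches a conclusion about voice similarity by matching the sample against the sound model library or by pattern classification, thereby identifying the other party.
The apparatus in this optional embodiment includes two subsystems: a front-end subsystem and a back-end subsystem. The front-end subsystem may include four modules: 1. a user interface module; 2. a sound sampling module; 3. a sound feature extraction module; 4. a communication interface module. The back-end subsystem includes five modules: 1. a user configuration management module; 2. a sound feature extraction module; 3. a sound model creation module; 4. a voice recognition module; 5. a communication interface module. The voice recognition module implements the functions of the comparison module 24 and the recognition module 26 described above. These modules are described below.
Sound sampling module: captures the voice of the other party during the call and passes it to the sound feature extraction module of the front-end subsystem.
Sound feature extraction module: extracts features from the acquired sound and converts them into a digital vector.
Sound model creation module: builds a sound model from the digital vector obtained by feature extraction.
Voice recognition module: identifies the caller from the voice.
User configuration management module: the user's portal for configuring the back-end subsystem, used to set the parameters for sound model creation.
User interface module: the user's operation interface.
Communication interface module: maintains the communication link between the front-end and back-end subsystems and can support Wi-Fi, 3G networks, communication internal to the system, and so on.
FIG. 7 is a diagram of the module composition of the call voice recognition system according to an embodiment of the present invention. As shown in FIG. 7, the front-end subsystem includes: the user interface module, the sound sampling module, the sound feature extraction module, and the communication interface module. The back-end subsystem includes: the user configuration management module, the sound feature extraction module, the voice recognition module, the sound model creation module, and the communication interface module. The front-end subsystem of the apparatus may be deployed on the user's smartphone, and the back-end subsystem may be deployed either on the user's smartphone or on a back-end server. If the back-end subsystem is deployed on the smartphone, the two subsystems communicate through the phone operating system's internal communication mechanism; if it is deployed on a back-end server, they communicate over Wi-Fi or a 3G network. The back-end subsystem is responsible for creating and storing voice models of the contacts in the address book for the phone user, while the front-end subsystem samples the voice of the other party during a call and uploads the sampled, feature-extracted sound samples to the back-end subsystem, which identifies the other party according to the sound model library.
A typical application scenario is as follows:
Xiao Ming installs the system on his newly bought phone. After installation, when his friend Xiao Ma talks with him on the phone, Xiao Ma's voice model is stored by the system. Several days later, someone claiming to be Xiao Ma calls Xiao Ming from a number other than the one saved for Xiao Ma in the address book. This caller's voice is matched or pattern-classified against the system's sound model library, and the system then informs Xiao Ming of the caller's identity.
FIG. 8 is a flowchart of the call voice recognition function according to an embodiment of the present invention. As shown in FIG. 8, the process includes the following steps: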
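S801: The phone receives an incoming call.
S802: The front-end subsystem of the apparatus matches the calling number against the phone's address book to confirm whether it is an existing number in the address book. If it is, go to S803; if it is not, go to S804.
S803: If the calling number is an existing number in the address book, the front-end subsystem of the apparatus queries the user's address book to confirm whether the number already has a sound model in the sound model library. If it does, go to S804; otherwise, go to S807.
S804: If the number already has a sound model, the sound feature extraction module of the front-end subsystem samples the voice of the other party in this call and performs feature extraction, then goes to S805.
S805: The front-end subsystem passes the sound features extracted by the sound feature extraction module in S804 as input parameters to the voice recognition module of the back-end subsystem, which identifies the other party of this call according to the sound models in the sound model library.
S806: The user interface module notifies the phone user of the result of identifying the other party.
S807: If the calling number does not yet have a sound model in the sound model library, the sound sampling module of the front-end subsystem uploads the sampled sound sample to the back-end subsystem through the communication module, and the sound feature extraction module of the back-end subsystem performs feature extraction on the sample, then goes to S808.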
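S808: The sound model creation module of the back-end subsystem constructs a sound model from the feature-extracted sound sample and stores it in the sound model library.
Unlike earlier approaches that rely only on human judgment, the method or apparatus of this optional embodiment identifies the voice in a phone call by non-manual means, which can effectively prevent mobile phone users from being deceived by telephone fraud.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only optional embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Industrial applicability: the present invention relates to the field of mobile applications. By acquiring a sound sample of the call object, comparing the sound sample with the sounds in a sound model library, and recognizing the call voice according to the comparison result, it solves the problem in the related art that a terminal cannot identify the other party by the call voice, which easily leads to fraud; it enables the terminal to identify the other party by the call voice and improves security.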
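The FIG. 8 flow (S801 to S808) can be traced with the following hypothetical sketch. It assumes features have already been extracted into a short float vector, uses Euclidean distance with an assumed acceptance radius `MAX_DISTANCE` in place of the unspecified recognition technique, and stores the raw feature vector itself as the "sound model". The helper names and phone numbers are illustrative only; the usage lines replay the Xiao Ming scenario.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]

MAX_DISTANCE = 0.5  # assumed acceptance radius; the patent does not give one


def nearest_number(features: Vector, models: Dict[str, Vector]) -> Optional[str]:
    """Stand-in for the back-end voice recognition module (S805)."""
    best: Optional[str] = None
    best_dist = math.inf
    for number, model in models.items():
        dist = math.dist(features, model)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best, best_dist = number, dist
    return best if best_dist <= MAX_DISTANCE else None


def handle_incoming_call(number: str, features: Vector,
                         address_book: Dict[str, str],
                         models: Dict[str, Vector]) -> str:
    """Trace S801-S808 for one incoming call (S801: the call has arrived)."""
    # S802 + S803: a contact in the address book with no model yet goes to S807.
    if number in address_book and number not in models:
        # S807: features are assumed to be extracted from the uploaded sample already.
        # S808: build the "sound model" (here just the vector) and store it.
        models[number] = list(features)
        return f"enrolled {address_book[number]}"
    # S804 + S805: match the extracted features against the sound model library.
    match = nearest_number(features, models)
    # S806: the result that the user interface module would show.
    if match is None:
        return "stranger"
    return f"identified as {address_book.get(match, match)}"


# Usage: the Xiao Ming scenario with made-up two-element feature vectors.
book = {"13800000001": "Xiao Ma"}
models: Dict[str, Vector] = {}
print(handle_incoming_call("13800000001", [0.2, 0.7], book, models))    # enrolled Xiao Ma
print(handle_incoming_call("13999999999", [0.25, 0.68], book, models))  # identified as Xiao Ma
print(handle_incoming_call("13999999999", [0.9, 0.1], book, models))    # stranger
```

As in the flowchart, a number outside the address book is never enrolled; it is only matched against the models already in the library.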

Claims

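1. A call voice recognition method, comprising: acquiring a sound sample of a call object in a call;
comparing the sound sample with sounds in a sound model library;
recognizing the call voice according to a comparison result.
2. The method according to claim 1, wherein, before the sound sample is compared with the sounds in the sound model library, the method further comprises: sampling and saving voices of contacts in an address book of a mobile terminal to build the sound model library, wherein the sound model library is stored in a remote server and/or in the mobile terminal.
3. The method according to claim 2, wherein sampling and saving the voices of the contacts in the address book of the mobile terminal comprises:
performing sound feature extraction on the sampled sound, converting it into a digital vector, and saving the digital vector.
4. The method according to claim 1, wherein comparing the sound sample with the sounds in the sound model library comprises:
acquiring a counterpart number of the call; searching the sound model library for a sound according to the counterpart number, and comparing the sound sample with the sound found.
5. The method according to claim 4, wherein, when searching the sound model library for a sound according to the counterpart number fails, the method further comprises:
comparing the sound sample with all sounds in the sound model library.
6. The method according to any one of claims 1 to 5, wherein recognizing the call voice according to the comparison result comprises: when a similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to the sound model in the sound model library; when the similarity between the sound sample and the sound found in the sound model library is less than the threshold, confirming that the call object is a stranger.
7. The method according to claim 6, wherein the method further comprises: notifying the mobile terminal of a recognition result for the call object.
8. A call voice recognition apparatus, comprising: an acquisition module, configured to acquire a sound sample of a call object in a call of a mobile terminal; a comparison module, configured to compare the sound sample with sounds in a sound model library; and a recognition module, configured to recognize the call voice according to a comparison result.
9. The apparatus according to claim 8, wherein the apparatus further comprises: a saving module, configured to sample and save voices of contacts in an address book of the mobile terminal to build the sound model library, wherein the sound model library is stored in a remote server and/or in the mobile terminal.
10. The apparatus according to claim 9, wherein the saving module comprises: an extraction unit, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector;
a saving unit, configured to save the digital vector.
11. The apparatus according to claim 8, wherein the comparison module comprises: an acquisition unit, configured to acquire a counterpart number of the call;
a comparison unit, configured to search the sound model library for a sound according to the counterpart number and compare the sound sample with the sound found.
12. The apparatus according to claim 11, wherein the comparison module is further configured to compare the sound sample with all sounds in the sound model library when searching the sound model library for a sound according to the counterpart number fails.
13. The apparatus according to claim 11, wherein the comparison module and the recognition module are located in the mobile terminal or in a server on the network side.
14. The apparatus according to claim 8, wherein the recognition module is configured to: when a similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identify the call object as the user corresponding to the sound model in the sound model library; when the similarity between the sound sample and the sound found in the sound model library is less than the threshold, confirm that the call object is a stranger.
15. The apparatus according to claim 13, wherein the apparatus further comprises:
a notification module, configured to notify the mobile terminal of a recognition result for the call object.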
PCT/CN2014/080661 2013-12-25 2014-06-24 Call voice recognition method and apparatus WO2015096429A1 (zh)
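Claims 3 and 10 require only that the sampled sound be converted into a digital vector and saved; the feature type is left open. The sketch below shows one possible, deliberately simple choice, assuming mono PCM samples given as floats in [-1, 1]: per-frame log energy and zero-crossing rate (ZCR), summarized by their mean and standard deviation. A practical speaker recognition system would typically use richer spectral features (for example MFCCs); the function names, the 160-sample frame length, the dictionary store, and the phone number are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _mean_std(values: List[float]) -> Tuple[float, float]:
    """Population mean and standard deviation of a non-empty list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def extract_feature_vector(samples: Sequence[float], frame_len: int = 160) -> List[float]:
    """Extraction unit (hypothetical): PCM samples -> 4-element digital vector.

    Returns [mean log energy, std log energy, mean ZCR, std ZCR] over
    non-overlapping frames of `frame_len` samples (160 = 20 ms at 8 kHz).
    """
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    log_energies: List[float] = []
    zcrs: List[float] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energies.append(math.log(energy + 1e-10))  # epsilon avoids log(0) on silence
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zcrs.append(crossings / (frame_len - 1))
    if not log_energies:
        raise ValueError("need at least one full frame of audio")
    e_mean, e_std = _mean_std(log_energies)
    z_mean, z_std = _mean_std(zcrs)
    return [e_mean, e_std, z_mean, z_std]


def save_vector(store: Dict[str, List[float]], number: str, vector: List[float]) -> None:
    """Saving unit (hypothetical): keep the digital vector under the contact's number."""
    store[number] = list(vector)


# Usage: one second of a 400 Hz tone at 8 kHz stands in for a recorded contact.
tone = [0.5 * math.sin(2 * math.pi * 400 * n / 8000) for n in range(8000)]
library: Dict[str, List[float]] = {}
save_vector(library, "13800000001", extract_feature_vector(tone))
print(library["13800000001"])
```

Summarizing per-frame values into a fixed-length vector keeps the stored "digital vector" the same size regardless of recording length, which is what allows a simple vector comparison at call time.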

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310728622.1 2013-12-25
CN201310728622.1A CN104751848A (zh) 2013-12-25 2013-12-25 Call voice recognition method and apparatus

Publications (1)

Publication Number Publication Date
WO2015096429A1 true WO2015096429A1 (zh) 2015-07-02

Family

ID=53477465

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/080661 WO2015096429A1 (zh) 2013-12-25 2014-06-24 Call voice recognition method and apparatus

Country Status (2)

Country Link
CN (1) CN104751848A (zh)
WO (1) WO2015096429A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113225327A (zh) * 2021-04-29 2021-08-06 心动网络股份有限公司 Voice recognition-based login customer supervision method, apparatus, device, and medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
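CN106790949A (zh) * 2015-11-20 2017-05-31 北京奇虎科技有限公司 Method and apparatus for configuring a voice feature library for malicious calls
CN105590632B (zh) * 2015-12-16 2019-01-29 广东德诚科教有限公司 S-T teaching process analysis method based on voice similarity recognition
WO2018170816A1 (zh) * 2017-03-23 2018-09-27 李卓希 Call control processing method and mobile terminal
CN108122555B (zh) * 2017-12-18 2021-07-23 北京百度网讯科技有限公司 Communication method, voice recognition device, and terminal device
CN107846493B (zh) * 2017-12-21 2019-10-25 Oppo广东移动通信有限公司 Call contact control method and apparatus, storage medium, and mobile terminal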

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1852560A (zh) * 2005-07-22 2006-10-25 华为技术有限公司 User identity recognition method, and call control method and system
US20080159488A1 (en) * 2006-12-27 2008-07-03 Chander Raja Voice based caller identification and screening
CN102576530A (zh) * 2009-10-15 2012-07-11 索尼爱立信移动通讯有限公司 Contacts tagged with voice patterns
CN102780819A (zh) * 2012-07-27 2012-11-14 广东欧珀移动通信有限公司 Method for a mobile terminal to recognize contacts by voice
CN103377652A (zh) * 2012-04-25 2013-10-30 上海智臻网络科技有限公司 Method, apparatus, and device for speech recognition

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
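CN101442579A (zh) * 2007-11-23 2009-05-27 中兴通讯股份有限公司 Mobile terminal with voice recognition of calling-user information
JP2011119953A (ja) * 2009-12-03 2011-06-16 Hitachi Ltd Call recording system using call control and call recording functions
CN202142288U (zh) * 2011-07-07 2012-02-08 龙旗科技(上海)有限公司 Secure voice communication device for a portable terminal
CN103281425A (zh) * 2013-04-25 2013-09-04 广东欧珀移动通信有限公司 Method and apparatus for analyzing contacts by call voice
CN103313249B (zh) * 2013-05-07 2017-05-10 百度在线网络技术(北京)有限公司 Reminder method, system, and server for a terminal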

Also Published As

Publication number Publication date
CN104751848A (zh) 2015-07-01

Similar Documents

Publication Publication Date Title
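CN113794805B (zh) Detection method and detection system for GoIP fraud calls
US9607621B2 (en) Customer identification through voice biometrics
WO2015096429A1 (zh) Call voice recognition method and apparatus
CN105306657B (zh) Identity recognition method and apparatus, and communication terminal
KR101881058B1 (ko) Voice verification method, apparatus, and system
CN104537746A (zh) Smart electronic door control method, system, and device
WO2016169095A1 (zh) Alarm method and apparatus for a terminal
CN105554223A (zh) Method for establishing a connection, and mobile terminal
US20180013869A1 (en) Integration of voip phone services with intelligent cloud voice recognition
CN107995381B (zh) Alarm terminal, cloud, alarm processing method thereof, and storage medium
CN204990444U (zh) Smart security control device
WO2017201874A1 (zh) Terminal loss prompting method and apparatus
US8483672B2 (en) System and method for selective monitoring of mobile communication terminals based on speech key-phrases
CN109039509A (zh) Method for voice control of broadcast equipment, and broadcast equipment
WO2017059679A1 (zh) Account processing method and apparatus
CN112333709B (zh) Cross-network fraud association analysis method, system, and computer storage medium
CN107707754A (zh) Smart terminal retrieval method and apparatus
WO2018166367A1 (zh) Real-time reminder method in a real-time conversation, apparatus, storage medium, and electronic device
JP2016149636A (ja) Authentication device, telephone terminal, authentication method, and authentication program
EP2723036A1 (en) System and method for user-privacy-aware communication monitoring and analysis
JP2016071068A (ja) Call analysis device, call analysis method, and call analysis program
US20180343342A1 (en) Controlled environment communication system for detecting unauthorized employee communications
CN107820251A (zh) Network access method, apparatus, and system
CN106886697A (zh) Authentication method, authentication platform, user terminal, and authentication system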
US20160028724A1 (en) Identity Reputation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14873352

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14873352

Country of ref document: EP

Kind code of ref document: A1