WO2015096429A1

WO2015096429A1 - Call voice recognition method and apparatus

Info

Publication number: WO2015096429A1
Application number: PCT/CN2014/080661
Authority: WO
Inventors: Lei Yang (雷杨); Hua Guodong (华国栋); Wang Wuying (王勿英)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2013-12-25
Filing date: 2014-06-24
Publication date: 2015-07-02
Also published as: CN104751848A

Abstract

Disclosed are a call voice recognition method and apparatus. The method comprises: obtaining a voice sample of a call target in a call; comparing the voice sample with the voices in a voice model library; and recognizing the call voice based on the comparison result. The present invention solves the problem in the related art that fraud occurs easily because a terminal cannot distinguish the identity of the peer-end caller by the call voice. It enables the terminal to distinguish the identity of the peer-end caller by the call voice and thereby improves security.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the field of mobile applications, and in particular to a call voice recognition method and apparatus.

2. Description of the Related Art

Communication technology has developed greatly in recent years. Alongside the rapid growth of the communication industry, criminal use of these means of communication for fraud has become increasingly rampant, and telephone fraud is one example. In telephone fraud, a common tactic is for the criminal to impersonate an acquaintance of the victim when calling the victim. In many cases the victim cannot immediately determine the caller's identity from the voice, or, to save face, does not promptly challenge the caller's identity, and may therefore be defrauded.

For the problem in the related art that fraud occurs easily because a terminal cannot identify the peer caller by the call voice, no effective solution has yet been proposed.

SUMMARY OF THE INVENTION

The present invention provides a call voice recognition method and apparatus, so as to at least solve the problem that fraud occurs easily because a terminal cannot discriminate the identity of the peer caller by the call voice.

According to one aspect of the present invention, a call voice recognition method is provided, including: acquiring a sound sample of a call object in a call; comparing the sound sample with the sounds in a sound model library; and identifying the call voice according to the comparison result.
Before the sound sample is compared with the sounds in the sound model library, the method further includes: sampling and saving the voices of the contacts in the address book of the mobile terminal to establish the sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal. Sampling and saving the voice of a contact in the address book of the mobile terminal includes: performing sound feature extraction on the sampled sound, converting it into a digital vector, and saving the digital vector. Comparing the sound sample with the sounds in the sound model library includes: acquiring the counterpart number of the call, searching the sound model library for a sound according to the counterpart number, and comparing the sound sample with the sound found. When the search of the sound model library by the counterpart number fails, the method further includes: comparing the sound sample with all the sounds in the sound model library. Identifying the call voice according to the comparison result includes: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to that sound model in the sound model library; when the similarity is less than the threshold, confirming that the call object is a stranger. The method further includes: notifying the mobile terminal of the recognition result for the call object.
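The patent does not specify how similarity is measured or what threshold is used. As a minimal sketch, assuming each sound is stored as a digital feature vector and that cosine similarity is the measure (both assumptions), the threshold rule above could look like this. `identify` returns `None` to mean "stranger", and the default threshold of 0.8 is an arbitrary placeholder.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # a zero vector is treated as matching nothing


def identify(similarity: float, model_owner: str, threshold: float = 0.8) -> Optional[str]:
    """Threshold rule: the model's owner if similarity >= threshold, else None (stranger)."""
    return model_owner if similarity >= threshold else None
```

Because the comparison uses `>=`, a similarity exactly equal to the threshold is still attributed to the known user, as the text specifies.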
According to another aspect of the present invention, a call voice recognition apparatus is further provided, including: an acquisition module, configured to acquire a sound sample of a call object in a call of a mobile terminal; a comparison module, configured to compare the sound sample with the sounds in a sound model library; and an identification module, configured to recognize the call voice according to the comparison result. The apparatus further includes a saving module, configured to sample and save the voices of the contacts in the address book of the mobile terminal to establish the sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal. The saving module includes: an extracting unit, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and a saving unit, configured to save the digital vector. The comparison module includes: an obtaining unit, configured to acquire the counterpart number of the call; and a comparing unit, configured to search the sound model library for a sound according to the counterpart number and compare the sound sample with the sound found. The comparison module is further configured to compare the sound sample with all the sounds in the sound model library when the search of the sound model library by the counterpart number fails. The comparison module and the identification module are located in the mobile terminal or in a server on the network side.
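The "remote server and/or mobile terminal" storage option of the saving module can be sketched as follows. The class `SoundModelLibrary`, its methods, and the use of plain dictionaries for the two stores are hypothetical; a real deployment would use on-device storage and a network service. The library is keyed by phone number so that it supports the lookup by counterpart number described above.

```python
from typing import Dict, List, Optional


class SoundModelLibrary:
    """Hypothetical store mapping a contact's phone number to a saved digital vector.

    `local` stands in for storage on the mobile terminal and `remote` for the
    remote server; either may be omitted, but not both.
    """

    def __init__(self,
                 local: Optional[Dict[str, List[float]]] = None,
                 remote: Optional[Dict[str, List[float]]] = None) -> None:
        if local is None and remote is None:
            raise ValueError("at least one storage location is required")
        self.local = local
        self.remote = remote

    def save(self, number: str, vector: List[float]) -> None:
        # Write the digital vector to every configured location.
        for store in (self.local, self.remote):
            if store is not None:
                store[number] = list(vector)

    def find(self, number: str) -> Optional[List[float]]:
        # Prefer the on-device copy, then fall back to the server copy.
        for store in (self.local, self.remote):
            if store is not None and number in store:
                return store[number]
        return None
```

Checking `is not None` rather than truthiness matters here: an empty dictionary is a valid, configured store.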
The identification module is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, and to confirm that the call object is a stranger when that similarity is less than the threshold. The apparatus further includes a notification module, configured to notify the mobile terminal of the recognition result for the call object.

According to the present invention, a sound sample of a call object in a call is acquired, the sound sample is compared with the sounds in a sound model library, and the call voice is recognized according to the comparison result. This solves the problem in the related art that fraud occurs easily because a terminal cannot identify the peer caller by the call voice, enables the terminal to identify the peer caller by voice, and improves security.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are provided to illustrate the present invention. In the drawings:

FIG. 1 is a flowchart of a call voice recognition method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 3 is optional block diagram 1 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 4 is optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 5 is optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 6 is a further optional block diagram of a call voice recognition apparatus according to an embodiment of the present invention;
FIG. 7 is a block diagram of the modules of a call voice recognition system according to an embodiment of the present invention;
FIG. 8 is a flowchart of the call voice recognition function according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

It should be noted that, where there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.

This embodiment provides a call voice recognition method. FIG. 1 is a flowchart of a call voice recognition method according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:

Step S102: acquire a sound sample of the call object in a call;
Step S104: compare the sound sample with the sounds in the sound model library;
Step S106: identify the call voice according to the comparison result.

Through the above steps, the acquired sound sample of the call object is compared with the sounds stored in advance in the sound model library, and the call voice is recognized according to the comparison result. In the prior art, the terminal cannot distinguish the identity of the peer caller by the call voice; with the above steps, the voice at the peer end of the call can be recognized and hence the identity of the peer caller determined, so that the mobile terminal user can judge whether the peer caller is a stranger. Preferably, the user can then choose, based on this judgment, whether to continue the call or adjust what is said, or can raise an alarm, which effectively reduces mobile phone fraud and improves security. In an optional embodiment, the sound model library may be established in advance, before the sound sample is compared with the sounds in it.
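Steps S102 to S106 can be sketched as a small pipeline in which each step is a pluggable callable, which also reflects that comparison and identification may run either on the terminal or on a server. The function name, the `(owner, similarity)` return shape of `compare`, and the 0.8 threshold are illustrative assumptions, not taken from the patent.

```python
from typing import Callable, List, Optional, Tuple

FeatureVector = List[float]


def recognize_call_voice(
    acquire: Callable[[], FeatureVector],
    compare: Callable[[FeatureVector], Tuple[Optional[str], float]],
    threshold: float = 0.8,
) -> Optional[str]:
    """Run S102 to S106 once; return the identified user, or None for a stranger."""
    sample = acquire()                    # S102: sound sample of the call object
    owner, similarity = compare(sample)   # S104: compare with the sound model library
    # S106: identify the call voice from the comparison result.
    if owner is not None and similarity >= threshold:
        return owner
    return None
```

Keeping `acquire` and `compare` as parameters lets the same control flow be reused whether the comparison is a local function or a call to a back-end server.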
The sound model library can be established in various ways. This embodiment provides a preferred implementation in which the library is built by sampling and saving the voices of the contacts in the address book of the mobile terminal, where the sound model library is stored in a remote server and/or in the mobile terminal. For example, the sampling may record the call and obtain a sound sample of the contact each time a call from that contact is received; because the user knows the contact's voice at that point, a more accurate sound sample can be obtained. The sound model library may belong to an individual user; for example, user A and user B each have their own sound model library. Alternatively, the library can be shared by multiple users or a group of users; for example, all users of a company or group share one sound model library, formed by pooling the sound samples each user records. In addition, as a service, an operator can combine the sound samples it obtains from all users into a large sound model library that provides users with more comprehensive voice recognition.

The sampling and saving of a contact's voice may also be implemented in various ways. This embodiment provides a preferred implementation in which feature extraction is performed on the sampled sound, the result is converted into a digital vector, and the digital vector is saved; in this way the voices of the contacts in the address book of the mobile terminal are sampled and saved.

In another optional embodiment, the sound to compare against can be found in many ways. A relatively direct way is to obtain the counterpart number of the call, search the sound model library for a sound according to the counterpart number, and compare the sound sample with the sound found. When the counterpart number is in the address book of the mobile terminal and the sound model library was built by sampling and saving the voices of the address-book contacts, the sound corresponding to the counterpart number is looked up directly in the sound model library and the sound sample is compared with it. When the counterpart number is not in the address book, the device checks whether the counterpart number has a corresponding sound in the sound model library and, if so, compares the sound sample with that sound. Further optionally, when the search of the sound model library by the counterpart number fails, the sound sample can be compared with all the sounds in the sound model library.

Optionally, the sound can be recognized by a similarity judgment: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to the threshold, the call object is identified as the user corresponding to that sound model in the sound model library; when the similarity is less than the threshold, the call object is confirmed to be a stranger. Optionally, the recognition result for the call object may also be notified to the mobile terminal.

This embodiment also provides a call voice recognition apparatus, which is used to implement the foregoing method; what has already been described is not repeated here. The module names used in the apparatus should not be understood as limiting the modules: for example, the acquisition module, configured to acquire a sound sample of a call object in a call, may also be expressed as "a module for acquiring a sound sample of a call object in a call". The functions of the modules described below can be implemented by a processor.
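The lookup-then-fallback comparison described above can be sketched as below, assuming the library is a dictionary from phone number to feature vector and that cosine similarity is the comparison measure (neither is specified in the patent). When no model is stored under the counterpart number, every model is compared and the best match is kept, so a caller using an unknown number can still be matched to a known voice. `compare_with_library` is a hypothetical name.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compare_with_library(
    sample: Sequence[float],
    counterpart_number: str,
    library: Dict[str, List[float]],
) -> Tuple[Optional[str], float]:
    """Return (matched library key, similarity); (None, -inf) if the library is empty."""
    model = library.get(counterpart_number)
    if model is not None:
        # Direct lookup by the counterpart number succeeded.
        return counterpart_number, cosine_similarity(sample, model)
    # Lookup failed: compare with every sound in the library and keep the best match.
    best_key: Optional[str] = None
    best_similarity = -math.inf
    for key, vector in library.items():
        similarity = cosine_similarity(sample, vector)
        if similarity > best_similarity:
            best_key, best_similarity = key, similarity
    return best_key, best_similarity
```

The returned similarity would still go through the threshold test, so the best match over the whole library is reported as a stranger if it scores below the threshold.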
FIG. 2 is a block diagram of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes an acquisition module 22, a comparison module 24, and an identification module 26. The acquisition module 22 is configured to acquire a sound sample of the call object in a call; the comparison module 24 is configured to compare the sound sample with the sounds in the sound model library; and the identification module 26 is configured to recognize the call voice according to the comparison result. Optionally, the comparison module 24 and the identification module 26 may be located in the mobile terminal or in a server on the network side.

FIG. 3 is optional block diagram 1 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus further includes a saving module 32, configured to sample and save the voices of the contacts in the address book of the mobile terminal to establish the sound model library, where the sound model library is stored in a remote server and/or in the mobile terminal.

FIG. 4 is optional block diagram 2 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 4, the saving module 32 includes an extracting unit 42, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector, and a saving unit 44, configured to save the digital vector.

FIG. 5 is optional block diagram 3 of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 5, the comparison module 24 includes an acquisition unit 52, configured to acquire the counterpart number of the call, and a comparison unit 54, configured to search the sound model library for a sound according to the counterpart number and compare the sound sample with the sound found.
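The patent leaves the feature extraction performed by extracting unit 42 unspecified. The sketch below is a deliberately simple stand-in, not a real speaker-recognition feature: it computes the power spectrum of one frame with a naive DFT, sums it into equal-width frequency bands, and L2-normalizes the band energies into a digital vector. Practical systems typically use features such as MFCCs or learned speaker embeddings instead; the function name and the `n_bands` parameter are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def extract_feature_vector(samples: Sequence[float], n_bands: int = 8) -> List[float]:
    """Toy extractor: L2-normalized band energies of one frame's power spectrum."""
    n = len(samples)
    if n_bands < 1 or n < 2 * n_bands:
        raise ValueError("need n_bands >= 1 and at least 2 * n_bands samples")
    half = n // 2
    # Power spectrum of bins 0..half-1 via a naive O(n^2) DFT (kept simple for clarity).
    power = [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(half)
    ]
    # Sum the power into n_bands equal-width bands; leftover high bins are dropped.
    width = half // n_bands
    bands = [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]
    # Normalize to unit length so vectors can be compared by cosine similarity.
    norm = math.sqrt(sum(e * e for e in bands)) or 1.0
    return [e / norm for e in bands]
```

A pure tone concentrates almost all of its energy in the single band containing its frequency, which makes the output easy to sanity-check.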
Optionally, the comparison module 24 is further configured to compare the sound sample with all the sounds in the sound model library when the search of the sound model library by the counterpart number fails. Optionally, the identification module 26 is configured to identify the call object as the user corresponding to the sound model in the sound model library when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to the threshold, and to confirm that the call object is a stranger when that similarity is less than the threshold. FIG. 6 is a further optional block diagram of a call voice recognition apparatus according to an embodiment of the present invention. As shown in FIG. 6, the apparatus further includes a notification module 62, configured to notify the mobile terminal of the recognition result for the call object.

The following description is given in conjunction with an optional embodiment. This optional embodiment proposes a mobile terminal and a call identification method that can discriminate the speaker's identity by the call voice, in order to prevent criminals from defrauding a mobile phone user by impersonating one of the user's acquaintances. It also provides a sound analysis device for the mobile terminal. The device first samples the voices of the contacts in the mobile phone address book, establishes a sound model library, and stores it in a remote server or in the mobile terminal. During a call, the device samples the caller's voice and uploads the sound sample to the remote server or the mobile terminal, which compares the sound sample with the sound model library, or performs pattern classification, to reach a conclusion about sound similarity and thereby identify the peer caller.
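When the recognition side runs on a remote server, the device has to serialize each extracted sample before uploading it. A minimal sketch of such an upload message, assuming JSON as the encoding; the function names and the field names `counterpart_number` and `features` are hypothetical and not defined in the patent.

```python
import json
from typing import List, Tuple


def encode_upload(counterpart_number: str, features: List[float]) -> bytes:
    """Hypothetical front-end to back-end message carrying one extracted sample."""
    payload = {"counterpart_number": counterpart_number, "features": features}
    return json.dumps(payload).encode("utf-8")


def decode_upload(message: bytes) -> Tuple[str, List[float]]:
    """Back-end side: recover the counterpart number and the feature vector."""
    payload = json.loads(message.decode("utf-8"))
    return payload["counterpart_number"], [float(x) for x in payload["features"]]
```

If the recognition side is instead on the mobile terminal itself, the same data could be passed in-process without this serialization step.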
The apparatus in this optional embodiment includes two subsystems: a front-end subsystem and a back-end subsystem. The front-end subsystem may include four modules: 1. a user interface module; 2. a sound sampling module; 3. a sound feature extraction module; 4. a communication interface module. The back-end subsystem includes five modules: 1. a user configuration management module; 2. a sound feature extraction module; 3. a sound model creation module; 4. a voice recognition module; 5. a communication interface module. The voice recognition module implements the functions of the comparison module 24 and the identification module 26 described above. These modules are described below.

Sound sampling module: captures the voice of the peer speaker during the call and passes it to the sound feature extraction module of the front-end subsystem.
Sound feature extraction module: extracts features from the captured sound and converts them into a digital vector.
Sound model creation module: builds a sound model from the digital vector produced by feature extraction.
Voice recognition module: identifies the caller from the voice.
User configuration management module: the entry point through which the user configures the back-end subsystem; it is used to set the parameters for sound model creation.
User interface module: provides the user interface.
Communication interface module: maintains the communication link between the front-end and back-end subsystems; it can support Wi-Fi, 3G networks, and communication internal to the system.

FIG. 7 is a block diagram of the modules of a call voice recognition system according to an embodiment of the present invention. As shown in FIG. 7, the front-end subsystem includes a user interface module, a sound sampling module, a sound feature extraction module, and a communication interface module.
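The sound model creation module is described only as building a sound model from the extracted digital vectors. One simple possibility, assumed here purely for illustration, is to average several enrollment vectors for a contact and normalize the result so that the model can be compared by cosine similarity; a real system might instead train a statistical model such as a Gaussian mixture model. `create_sound_model` is a hypothetical name.

```python
import math
from typing import List, Sequence


def create_sound_model(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical model: the L2-normalized mean of a contact's enrollment vectors."""
    if not vectors:
        raise ValueError("at least one feature vector is required")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all feature vectors must have the same length")
    # Element-wise mean across all enrollment vectors.
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    # Normalize to unit length; an all-zero mean is returned unchanged.
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]
```

Averaging lets the model improve as more calls from the same contact are sampled, which fits the idea of recording a sample each time that contact calls.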
The back-end subsystem includes a user configuration management module, a sound feature extraction module, a voice recognition module, a sound model creation module, and a communication interface module. The front-end subsystem of the device can be deployed on the user's smartphone, and the back-end subsystem can be deployed either on the user's smartphone or on a back-end server. If the back-end subsystem is deployed on the smartphone, the front-end and back-end subsystems communicate through the internal communication mechanisms of the phone's operating system; if it is deployed on a back-end server, they communicate over Wi-Fi or a 3G network. The back-end subsystem is responsible for creating and storing sound models of the address-book contacts for the mobile phone user. The front-end subsystem is responsible for sampling the peer speaker's voice during a call and uploading the sampled, feature-extracted sound samples to the back-end subsystem, which identifies the peer speaker based on the sound model library.

A typical application scenario is as follows. Xiao Ming installs the system on his newly purchased mobile phone. After installation, Xiao Ming's friend Xiao Ma phones Xiao Ming, and the system stores Xiao Ma's sound model. A few days later, someone claiming to be Xiao Ma calls Xiao Ming from a number other than Xiao Ma's number in the address book. The caller's voice is matched, or pattern-classified, against the system's sound model library, and the system then tells Xiao Ming the identity of the caller.

FIG. 8 is a flowchart of the call voice recognition function according to an embodiment of the present invention. As shown in FIG. 8, the process includes the following steps:

S801: The phone receives an incoming call.

S802: The front-end subsystem of the device matches the caller number against the phone address book to check whether it is an existing number in the address book. If it is, go to S803; if not, go to S804.

S803: If the caller number is an existing number in the address book, the front-end subsystem of the device queries whether that number has a sound model in the sound model library. If it does, go to S804; otherwise, go to S807.

S804: If the number has a sound model, the sound feature extraction module of the device's front-end subsystem picks up and samples the voice of the peer caller in the call, performs feature extraction, and then proceeds to S805.

S805: The front-end subsystem sends the sound features extracted by its sound feature extraction module in S804 to the back-end subsystem as input to the voice recognition module, which identifies the peer caller according to the sound models in the sound model library.

S806: The user interface module notifies the mobile phone user of the identity of the peer speaker.

S807: If there is no sound model in the sound model library, the sound sampling module of the device's front-end subsystem uploads the sampled sound sample to the back-end subsystem through the communication interface module, the sound feature extraction module of the back-end subsystem performs feature extraction on this sample, and the process goes to S808.

S808: The sound model creation module of the back-end subsystem builds a sound model from the feature-extracted sound sample and stores it in the sound model library.

Unlike the earlier approach of relying on human judgment, the method or device of this optional embodiment discriminates the caller's voice by non-manual means, which can effectively prevent mobile phone users from being deceived in telephone fraud.

Obviously, those skilled in the art should understand that the above modules or steps of the present invention can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may be made into individual integrated circuit modules; or multiple modules or steps among them may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

The above is only an optional embodiment of the present invention and is not intended to limit it; various modifications and changes can be made to the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.
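Steps S801 to S808 described above can be sketched as a single call handler. The address book and model library are plain Python containers, and `sample_voice`, `recognize`, and `create_model` are hypothetical callables standing in for the sampling and feature extraction, voice recognition, and sound model creation modules. For brevity `sample_voice` returns an already extracted feature vector, whereas in S807 the source performs that extraction in the back-end subsystem. The returned string stands in for the notification of S806.

```python
from typing import Callable, Dict, List, Optional, Set


def handle_incoming_call(
    caller_number: str,
    address_book: Set[str],
    model_library: Dict[str, List[float]],
    sample_voice: Callable[[], List[float]],
    recognize: Callable[[List[float]], Optional[str]],
    create_model: Callable[[List[float]], List[float]],
) -> str:
    """Hypothetical handler for one incoming call (S801)."""
    in_book = caller_number in address_book        # S802: existing address-book number?
    has_model = caller_number in model_library      # S803: does it already have a model?
    if in_book and not has_model:
        # S807/S808: known contact without a model, so enrol the voice from this call.
        model_library[caller_number] = create_model(sample_voice())
        return "sound model created"
    # S804/S805: sample the peer's voice and identify it against the library.
    identity = recognize(sample_voice())
    # S806: notify the user of the result.
    if identity is None:
        return "caller is a stranger"
    return f"caller identified as {identity}"
```

Under these assumptions, the Xiao Ma scenario maps onto two calls: the first from Xiao Ma's address-book number enrols the model, and the later call from an unknown number goes straight to recognition.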
Industrial Applicability: The present invention relates to the field of mobile applications. By acquiring a sound sample of a call object in a call, comparing the sound sample with the sounds in a sound model library, and recognizing the call voice according to the comparison result, it solves the problem in the related art that fraud occurs easily because a terminal cannot identify the peer caller by the call voice, enables the terminal to identify the peer caller by the call voice, and thereby improves security.

Claims


1. A call voice recognition method, comprising: acquiring a sound sample of a call object in a call;

comparing the sound sample with sounds in a sound model library; and

identifying the call voice according to the comparison result.

2. The method according to claim 1, wherein, before comparing the sound sample with the sounds in the sound model library, the method further comprises: sampling and saving the voices of contacts in the address book of a mobile terminal to establish the sound model library, wherein the sound model library is stored in a remote server and/or in the mobile terminal.

3. The method according to claim 2, wherein sampling and saving the voices of the contacts in the address book of the mobile terminal comprises:

performing sound feature extraction on the sampled sound, converting it into a digital vector, and saving the digital vector.

4. The method according to any of claims 1, wherein comparing the sound sample with the sounds in the sound model library comprises:

obtaining the counterpart number of the call; and searching the sound model library for a sound according to the counterpart number, and comparing the sound sample with the sound found.

5. The method according to claim 4, wherein, when the search of the sound model library for a sound according to the counterpart number fails, the method further comprises:

comparing the sound sample with all the sounds in the sound model library.

6. The method according to any one of claims 1 to 5, wherein identifying the call voice according to the comparison result comprises: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identifying the call object as the user corresponding to the sound model in the sound model library; and when the similarity between the sound sample and the sound found in the sound model library is less than the threshold, confirming that the call object is a stranger.

7. The method according to claim 6, further comprising: notifying the mobile terminal of the recognition result for the call object.

8. A call voice recognition apparatus, comprising: an acquisition module, configured to acquire a sound sample of a call object in a call of a mobile terminal; a comparison module, configured to compare the sound sample with sounds in a sound model library; and an identification module, configured to identify the call voice according to the comparison result.

9. The apparatus according to claim 8, further comprising: a saving module, configured to sample and save the voices of contacts in the address book of the mobile terminal to establish the sound model library, wherein the sound model library is stored in a remote server and/or in the mobile terminal.

10. The apparatus according to claim 9, wherein the saving module comprises: an extracting unit, configured to perform sound feature extraction on the sampled sound and convert it into a digital vector; and

a saving unit, configured to save the digital vector.

11. The apparatus according to claim 8, wherein the comparison module comprises: an obtaining unit, configured to acquire the counterpart number of the call; and

a comparing unit, configured to search the sound model library for a sound according to the counterpart number and compare the sound sample with the sound found.

12. The apparatus according to claim 11, wherein the comparison module is further configured to compare the sound sample with all the sounds in the sound model library when the search of the sound model library for a sound according to the counterpart number fails.

13. The apparatus of claim 11, wherein the comparison module and the identification module are located in the mobile terminal or in a server on the network side.

14. The apparatus according to claim 8, wherein the identification module is configured to: when the similarity between the sound sample and the sound found in the sound model library is greater than or equal to a threshold, identify the call object as the user corresponding to the sound model in the sound model library; and when the similarity between the sound sample and the sound found in the sound model library is less than the threshold, confirm that the call object is a stranger.

15. The apparatus according to claim 13, further comprising:

a notification module, configured to notify the mobile terminal of the recognition result for the call object.