CN111933152B

CN111933152B - Method and device for detecting validity of registered audio and electronic equipment

Info

Publication number: CN111933152B
Application number: CN202011081502.3A
Authority: CN
Inventors: 李健; 邢启洲; 武卫东; 陈明
Original assignee: Beijing Sinovoice Technology Co Ltd
Current assignee: Beijing Sinovoice Technology Co Ltd
Priority date: 2020-10-12
Filing date: 2020-10-12
Publication date: 2021-01-08
Anticipated expiration: 2040-10-12
Also published as: WO2022077918A1; CN111933152A

Abstract

The application provides a method and a device for detecting the validity of registered audio, and electronic equipment. The detection method comprises the following steps: acquiring N segments of registered audio, wherein the duration of each segment of registered audio is greater than a predetermined duration; selecting M segments of registered audio from the N segments of registered audio to obtain a plurality of modeling groups, and establishing first voiceprint recognition models in one-to-one correspondence with the modeling groups, wherein 1 < M < N and M is an integer; performing similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, wherein the detection group is formed by the registered audio other than the modeling group among the N segments of registered audio, and the registered audio in the detection group is the detection audio; and determining that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number. Compared with the existing simple pairwise verification, the method reduces the cases in which the genuine speaker is judged not to be the speaker, so that the user experience is improved.

Description

Method and device for detecting validity of registered audio and electronic equipment

Technical Field

The present application relates to the field of voiceprint recognition, and in particular, to a method and an apparatus for detecting validity of a registered audio, a computer-readable storage medium, a processor, and an electronic device.

Background

Voiceprint recognition comprises two steps: voiceprint registration and voiceprint verification. Voiceprint registration refers to extracting the voiceprint features from the registered audio and establishing a corresponding voiceprint user model. Voiceprint verification refers to extracting the voiceprint features from the audio to be verified, establishing a corresponding feature model, and comparing the feature model with the voiceprint user model to verify their similarity.

However, if the voice of a person other than the voiceprint owner is mixed in during registration, the registered voiceprint user model contains the voiceprint features of both the owner and the other person. In that case, the other person who took part in the registration can also pass authentication by the voiceprint system, so the security of the voiceprint authentication system cannot be guaranteed. To avoid this, the validity of the voiceprint registration audio needs to be verified, so that voiceprint registration is not performed on non-compliant registration audio.

The prior art generally provides the following method for verifying the validity of voiceprint registration audio: obtaining the effective voice when a user registers; dividing the effective voice equally into an integer number of parts; extracting the voiceprint features of each part; and comparing and verifying the voiceprint features of the parts in pairs. However, because each audio segment is too short, the extracted features can hardly cover all the voiceprint features of one speaker. Therefore, this simple pairwise comparison method for verifying the validity of registered audio has a high false rejection rate.

The above information disclosed in this background section is only for enhancement of understanding of the background of the technology described herein and, therefore, certain information may be included in the background that does not form the prior art that is already known in this country to a person of ordinary skill in the art.

Disclosure of Invention

The present application mainly aims to provide a method, an apparatus, a computer-readable storage medium, a processor, and an electronic device for detecting validity of a registration audio, so as to solve the problem in the prior art that a false rejection rate of a scheme for verifying validity of a voiceprint registration audio is high.

According to an aspect of the embodiments of the present invention, there is provided a method for detecting validity of a registration audio, including: acquiring N sections of registered audio, wherein the duration of each section of registered audio is greater than the preset duration; selecting M sections of the registered audios in the N sections of the registered audios to obtain a plurality of modeling groups, and establishing a first voiceprint recognition model in a one-to-one correspondence mode according to the modeling groups, wherein M is more than 1 and less than N, and M is an integer; performing similarity matching on each detection audio in a detection group and the corresponding first voiceprint recognition model, wherein the detection group is formed by the registration audio except the modeling group in the N sections of registration audio, and the registration audio in the detection group is the detection audio; determining that the registered audio is invalid when the number of detected audio that does not match the corresponding first voiceprint recognition model similarity is greater than or equal to a predetermined number.

Optionally, N/2 ≤ M ≤ N/2 + 1.

Optionally, selecting M segments of the registered audio from the N segments of the registered audio to obtain a plurality of modeling groups, and establishing first voiceprint recognition models in one-to-one correspondence with the modeling groups, includes: selecting M segments of the registered audio from the N segments of the registered audio to obtain C(N, M) modeling groups, and establishing C(N, M) voiceprint recognition models in one-to-one correspondence with the C(N, M) modeling groups, where C(N, M) = N!/(M!(N - M)!) is the number of ways of choosing M segments from N. Performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model includes: performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model, for a total of C(N, M) × (N - M) matches.

Optionally, performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model includes: establishing a second voiceprint recognition model according to each detection audio in the detection group; performing similarity matching between the second voiceprint recognition model and the first voiceprint recognition model; determining that the second voiceprint recognition model matches the first voiceprint recognition model when the similarity between them is greater than or equal to a predetermined threshold; and determining that the second voiceprint recognition model does not match the first voiceprint recognition model when the similarity between them is smaller than the predetermined threshold.

Optionally, in a case where the number of detected audios that do not match with the corresponding first voiceprint recognition model similarity is greater than or equal to a predetermined number, determining that the registered audio is invalid includes: determining that the registered audio is invalid when the number of detected audio that does not match the corresponding first voiceprint recognition model similarity is greater than or equal to 1.

Optionally, acquiring the N segments of registered audio includes: acquiring a voice audio; extracting the effective voice of the voice audio; and cutting and slicing the effective voice to obtain the N segments of registered audio.

According to another aspect of the embodiments of the present invention, there is also provided a device for detecting validity of a registration audio, including an obtaining unit, a modeling unit, a matching unit, and a determining unit, where the obtaining unit is configured to obtain N segments of registration audio, and a duration of each segment of registration audio is greater than a predetermined duration; the modeling unit is used for selecting M sections of the registered audios in the N sections of the registered audios to obtain a plurality of modeling groups, and establishing a first voiceprint recognition model in a one-to-one correspondence mode according to the modeling groups, wherein M is more than 1 and less than N, and M is an integer; the matching unit is used for performing similarity matching on each detection audio in a detection group and the corresponding first voiceprint recognition model, wherein the detection group is formed by the registration audios except the modeling group in the N sections of registration audios, and the registration audios in the detection group are the detection audios; the determining unit is used for determining that the registered audio is invalid when the number of the detected audio which does not match with the corresponding similarity of the first voiceprint recognition model is larger than or equal to a preset number.

According to still another aspect of embodiments of the present invention, there is provided a computer-readable storage medium including a stored program, wherein the program performs any one of the methods.

According to a further aspect of the embodiments of the present invention, there is provided a processor for executing a program, wherein the program executes to perform any one of the methods.

According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including: one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods.

In the embodiments of the present invention, the method for detecting the validity of registered audio divides the N segments of registered audio into two parts to obtain a modeling group and a detection group. The modeling group comprises M segments of registered audio, that is, at least two segments, which are used to form the first voiceprint recognition model. Each detection audio of the detection group is then matched for similarity against the corresponding first voiceprint recognition model to determine whether the registered audio is valid. Compared with the existing simple pairwise verification, this scheme reduces the cases in which the genuine speaker is judged not to be the speaker, that is, it alleviates the problem of a high false rejection rate, so that the user experience is better.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:

fig. 1 shows a schematic flowchart of a method for detecting validity of registered audio according to an embodiment of the present application; and

fig. 2 is a schematic block diagram of a device for detecting validity of registered audio according to an embodiment of the present application.

Wherein the figures include the following reference numerals:

10. an acquisition unit; 20. a modeling unit; 30. a matching unit; 40. a determination unit.

Detailed Description

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" another element, it can be directly on the other element or intervening elements may also be present. Also, in the specification and claims, when an element is described as being "connected" to another element, the element may be "directly connected" to the other element or "connected" to the other element through a third element.

For convenience of description, some terms or expressions referred to in the embodiments of the present application are explained below:

False Acceptance Rate (FAR): the proportion of cases in which a person who is not the genuine speaker is recognized as the genuine speaker; the lower the FAR, the better the security of the system;

False Rejection Rate (FRR): the proportion of cases in which the genuine speaker is judged by the system not to be the genuine speaker; the lower the FRR, the better the user experience.
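
As an illustration of the two rates defined above, the following minimal sketch computes FAR and FRR from a list of labeled verification trials. The function name `far_frr` and the `(is_genuine, accepted)` trial format are assumptions made for this example and do not appear in the application.

```python
def far_frr(trials):
    """Compute (FAR, FRR) from trials given as (is_genuine, accepted) pairs.

    is_genuine: True if the trial audio really comes from the enrolled speaker.
    accepted:   True if the system accepted the trial as the enrolled speaker.
    The trial format is hypothetical and chosen only for illustration.
    """
    impostor_results = [accepted for is_genuine, accepted in trials if not is_genuine]
    genuine_results = [accepted for is_genuine, accepted in trials if is_genuine]

    # FAR: share of impostor trials that were (wrongly) accepted.
    far = sum(impostor_results) / len(impostor_results) if impostor_results else 0.0
    # FRR: share of genuine trials that were (wrongly) rejected.
    frr = (sum(not a for a in genuine_results) / len(genuine_results)
           if genuine_results else 0.0)
    return far, frr


trials = [
    (True, True), (True, False), (True, True),      # genuine speaker trials
    (False, False), (False, True), (False, False),  # impostor trials
]
far, frr = far_frr(trials)
```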

As mentioned in the background, the prior-art scheme for verifying the validity of voiceprint registration audio has a high false rejection rate. In order to solve this problem, an exemplary embodiment of the present application provides a method, an apparatus, a computer-readable storage medium, a processor, and an electronic device for detecting the validity of registered audio.

According to an embodiment of the present application, there is provided a method of detecting validity of a registered audio.

Fig. 1 is a flowchart of a method for detecting validity of registered audio according to an embodiment of the present application. As shown in fig. 1, the above method comprises the following steps:

step S101, obtaining N sections of registered audios, wherein the duration of each section of registered audio is greater than a preset duration;

step S102, selecting M sections of the registered audios in N sections of the registered audios to obtain a plurality of modeling groups, and establishing a first voiceprint recognition model in a one-to-one correspondence mode according to the modeling groups, wherein M is more than 1 and less than N, and M is an integer;

step S103, performing similarity matching on each detection audio in a detection group with the corresponding first voiceprint recognition model, wherein the detection group is formed by the registration audio except the modeling group in the N sections of registration audio, and the registration audio in the detection group is the detection audio;

step S104, determining that the registered audio is invalid when the number of the detected audio that does not match the corresponding similarity of the first voiceprint recognition model is greater than or equal to a predetermined number.
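
Steps S101 to S104 can be sketched as the following control flow. This is a minimal sketch under stated assumptions, not the implementation of the application: the function name `registration_is_invalid`, the pluggable `build_model` and `similarity` callables, and the choice to use every M-element subset as a modeling group are all illustrative. The duration check of step S101 is assumed to have been done before the segments are passed in.

```python
from itertools import combinations


def registration_is_invalid(segments, m, build_model, similarity,
                            threshold, predetermined_number=1):
    """Return True if the N registered segments should be judged invalid.

    segments:    the N registered audio segments (any representation).
    build_model: hypothetical callable, list of segments -> first model (S102).
    similarity:  hypothetical callable, (segment, model) -> score (S103).
    """
    n = len(segments)
    if not 1 < m < n:
        raise ValueError("M must be an integer with 1 < M < N")

    mismatches = 0
    # S102: each M-element subset is one modeling group with its own model.
    for modeling_idx in combinations(range(n), m):
        model = build_model([segments[i] for i in modeling_idx])
        # S103: the remaining N - M segments form the detection group.
        for j in range(n):
            if j in modeling_idx:
                continue
            if similarity(segments[j], model) < threshold:
                mismatches += 1
    # S104: invalid when the mismatch count reaches the predetermined number.
    return mismatches >= predetermined_number


# Toy usage: each "segment" is a single number standing in for a feature.
mean = lambda group: sum(group) / len(group)
closeness = lambda seg, model: -abs(seg - model)  # higher means more similar

same_speaker = [1.0, 1.1, 0.9, 1.0]
mixed_speakers = [1.0, 1.1, 5.0, 0.9]
clean_result = registration_is_invalid(same_speaker, 2, mean, closeness, -0.5)
mixed_result = registration_is_invalid(mixed_speakers, 2, mean, closeness, -0.5)
```

Passing the model builder and the scoring function as parameters keeps the sketch independent of any particular voiceprint modeling technique.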

The method for detecting the validity of registered audio divides the N segments of registered audio into two parts to obtain a modeling group and a detection group. The modeling group comprises M segments of registered audio, that is, at least two segments, which are used to form the first voiceprint recognition model. Each detection audio of the detection group is then matched for similarity against the corresponding first voiceprint recognition model to determine whether the registered audio is valid. Compared with the pairwise verification of the prior art, this scheme reduces the cases in which the genuine speaker is judged not to be the speaker, that is, it alleviates the problem of a high false rejection rate, so that the user experience is better.

Specifically, the above method for detecting the validity of the registered audio may be applied to a numeric string mode or a text mode.

In a typical embodiment of the application, N/2 ≤ M ≤ N/2 + 1. When N is larger, the value of M is correspondingly larger, that is, more segments of registered audio are used to establish the first voiceprint recognition model, so that the first voiceprint recognition model is more accurate, the false acceptance rate and the false rejection rate can be reduced, and the security and the user experience of the voiceprint recognition process are ensured. In addition, because the duration of each segment of registered audio is greater than the predetermined duration, when the registered audio contains a plurality of speakers, there are one or more cases in which the voice of the non-registered speaker is not contained in the detection group at all, that is, one or more cases in which a detection audio fails the similarity matching, which further ensures the security of voiceprint recognition.
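
The range N/2 ≤ M ≤ N/2 + 1, combined with the general constraint 1 < M < N, leaves only one or two integer choices of M for a given N. The following sketch lists them; the function name `candidate_m` is an assumption made for this example.

```python
def candidate_m(n):
    """Return the integers M with 1 < M < N and N/2 <= M <= N/2 + 1.

    The comparison is done on 2*M to avoid floating-point division.
    """
    return [m for m in range(2, n) if n <= 2 * m <= n + 2]


choices = {n: candidate_m(n) for n in range(2, 8)}
```

For example, N = 4 allows M = 2 or M = 3, while N = 5 allows only M = 3.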

According to another exemplary embodiment of the present application, selecting M segments of the registered audio from the N segments of the registered audio to obtain a plurality of modeling groups, and establishing first voiceprint recognition models in one-to-one correspondence with the modeling groups, includes: selecting M segments of the registered audio from the N segments of the registered audio to obtain C(N, M) modeling groups, and establishing C(N, M) voiceprint recognition models in one-to-one correspondence with the C(N, M) modeling groups, where C(N, M) = N!/(M!(N - M)!) is the number of ways of choosing M segments from N. Performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model includes: performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model, for a total of C(N, M) × (N - M) matches. By establishing a plurality of voiceprint recognition models, the method makes the first voiceprint recognition models more accurate; by matching each detection audio in the detection group against the corresponding first voiceprint recognition model, it ensures that every segment of registered audio is matched, which further ensures a low false acceptance rate and a low false rejection rate, and thus the security and the user experience of voiceprint recognition. In addition, establishing the voiceprint recognition models and performing the similarity matching can be implemented with a typical vector method, which saves time and labor and is highly practicable.
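
Assuming that every M-element subset of the N segments is used as a modeling group, the number of modeling groups is the binomial coefficient C(N, M), and each group leaves N - M detection audios, giving C(N, M) × (N - M) matches in total. The sketch below enumerates the groups by index and checks that count; the names `split_groups` and `total_matches` are illustrative and not taken from the application.

```python
from itertools import combinations
from math import comb


def split_groups(n, m):
    """Yield (modeling_indices, detection_indices) for every M-subset of N."""
    indices = range(n)
    for modeling in combinations(indices, m):
        detection = tuple(i for i in indices if i not in modeling)
        yield modeling, detection


def total_matches(n, m):
    """Number of (detection audio, model) comparisons: C(N, M) * (N - M)."""
    return comb(n, m) * (n - m)


groups = list(split_groups(4, 2))
```

With N = 4 and M = 2 this yields 6 modeling groups and 12 comparisons, so the cost grows quickly with N and is one practical reason to keep N small.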

According to still another exemplary embodiment of the present application, performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model includes: establishing a second voiceprint recognition model according to each detection audio in the detection group; performing similarity matching between the second voiceprint recognition model and the first voiceprint recognition model; determining that the second voiceprint recognition model matches the first voiceprint recognition model when the similarity between them is greater than or equal to a predetermined threshold; and determining that the second voiceprint recognition model does not match the first voiceprint recognition model when the similarity between them is smaller than the predetermined threshold. By computing the similarity between the second voiceprint recognition model and the first voiceprint recognition model and comparing it with the predetermined threshold, the method further avoids a high proportion of cases in which a person who is not the genuine speaker is recognized as the genuine speaker, or in which the genuine speaker is judged not to be the genuine speaker. That is, it ensures a low false acceptance rate and a low false rejection rate, and thus high security and a good user experience for the voiceprint recognition process.
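
The threshold comparison can be illustrated as follows. The application does not specify how the similarity is computed; this sketch assumes that both the first and the second voiceprint recognition model are represented as fixed-length vectors and scores them with cosine similarity, which is a common choice for vector-based speaker models. The names `cosine_similarity` and `is_match` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_match(second_model, first_model, threshold):
    """Match when the similarity is greater than or equal to the threshold."""
    return cosine_similarity(second_model, first_model) >= threshold


same = is_match([1.0, 0.0], [1.0, 0.0], threshold=1.0)       # boundary case
different = is_match([1.0, 0.0], [0.0, 1.0], threshold=0.5)  # orthogonal vectors
```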

According to a specific embodiment of the present application, determining that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number includes: determining that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to 1. Therefore, the probability that a person other than the genuine speaker is recognized as the genuine speaker is further reduced, and the security of the voiceprint recognition process is further ensured. The predetermined number can be set differently according to the security requirements of voiceprint recognition in different scenarios.

According to another specific embodiment of the present application, acquiring the N segments of registered audio includes: acquiring a voice audio; extracting the effective voice of the voice audio; and cutting and slicing the effective voice to obtain the N segments of registered audio. By acquiring one voice audio, extracting its effective voice, and cutting and slicing the effective voice into N segments of registered audio, the method ensures that each segment of registered audio contains enough voiceprint features, which further ensures the security and the user experience of the detection.
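
The acquisition, extraction and slicing steps can be sketched as below. The application does not state how effective voice is extracted; this sketch assumes a simple frame-energy threshold as a stand-in for voice activity detection, treats audio as a plain list of samples, and measures duration in samples. The names `extract_effective_voice` and `slice_into_segments` and all parameters are hypothetical.

```python
def extract_effective_voice(samples, frame_len, energy_threshold):
    """Keep only frames whose mean energy reaches the threshold (assumed VAD)."""
    voiced = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= energy_threshold:
            voiced.extend(frame)
    return voiced


def slice_into_segments(samples, n, min_len):
    """Cut the samples into N equal segments, each longer than min_len samples.

    Trailing samples that do not fill a whole segment are dropped.
    """
    seg_len = len(samples) // n
    if seg_len <= min_len:
        raise ValueError("each segment must be longer than the preset duration")
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n)]


# Toy signal: silence, speech, silence, speech.
audio = [0.0] * 4 + [1.0] * 8 + [0.0] * 4 + [1.0] * 8
voiced = extract_effective_voice(audio, frame_len=4, energy_threshold=0.5)
segments = slice_into_segments(voiced, n=4, min_len=3)
```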

Of course, in practical applications, when the acquired registration audio is short in duration, a segment of registered audio may be formed by splicing.
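
One possible way to splice short recordings is sketched below. The application only states that splicing may be used; the greedy strategy, the function name `splice_short_clips`, and the handling of leftover audio are assumptions made for this example. Clips are plain lists of samples and duration is measured in samples.

```python
def splice_short_clips(clips, min_len):
    """Concatenate consecutive clips until each segment is longer than min_len.

    Leftover audio that is too short to form its own segment is appended to
    the last completed segment; if no segment was completed, it is discarded.
    """
    segments = []
    current = []
    for clip in clips:
        current.extend(clip)
        if len(current) > min_len:
            segments.append(current)
            current = []
    if current and segments:
        segments[-1].extend(current)
    return segments


clips = [[1, 1], [2, 2], [3, 3], [4, 4, 4, 4, 4], [5]]
spliced = splice_short_clips(clips, min_len=3)
```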

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.

The embodiment of the present application further provides a device for detecting validity of a registration audio, and it should be noted that the device for detecting validity of a registration audio in the embodiment of the present application may be used to execute the method for detecting validity of a registration audio provided in the embodiment of the present application. The following describes an apparatus for detecting validity of registered audio provided in an embodiment of the present application.

Fig. 2 is a schematic structural diagram of an apparatus for detecting validity of registered audio according to an embodiment of the present application. As shown in fig. 2, the apparatus includes an acquisition unit 10, a modeling unit 20, a matching unit 30 and a determination unit 40. The acquisition unit 10 is configured to acquire N segments of registered audio, wherein the duration of each segment of registered audio is greater than a predetermined duration; the modeling unit 20 is configured to select M segments of the registered audio from the N segments of the registered audio to obtain a plurality of modeling groups, and to establish first voiceprint recognition models in one-to-one correspondence with the modeling groups, where 1 < M < N and M is an integer; the matching unit 30 is configured to perform similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, where the detection group is formed by the registered audio other than the modeling group among the N segments of registered audio, and the registered audio in the detection group is the detection audio; the determination unit 40 is configured to determine that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number.

In the device for detecting the validity of registered audio, the acquisition unit acquires N segments of registered audio whose duration is greater than the predetermined duration, the modeling unit selects M segments of the registered audio and establishes the first voiceprint recognition model, the matching unit performs similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model, and the determination unit determines whether the registered audio is valid according to the matching result. Matching each detection audio of the detection group against the corresponding first voiceprint recognition model to determine whether the registered audio is valid ensures both a low proportion of cases in which a person who is not the genuine speaker is recognized as the genuine speaker and a low proportion of cases in which the genuine speaker is judged not to be the genuine speaker. That is, a low false acceptance rate and a low false rejection rate are ensured, so that both the security and the user experience of the voiceprint recognition process are taken into account.

Specifically, the above-described detection apparatus for the validity of the registered audio may be applied to a numeric string mode as well as a text mode.

In a typical embodiment of the application, N/2 ≤ M ≤ N/2 + 1. When N is larger, the value of M is correspondingly larger, that is, more segments of registered audio are used to establish the first voiceprint recognition model, so that the first voiceprint recognition model is more accurate, the false acceptance rate and the false rejection rate can be reduced, and the security and the user experience of the voiceprint recognition process are ensured.

According to another exemplary embodiment of the present application, the modeling unit 20 includes a modeling module, and the modeling module is configured to select M segments of the registered audio from the N segments of the registered audio to obtain C(N, M) modeling groups, and to establish C(N, M) voiceprint recognition models in one-to-one correspondence with the C(N, M) modeling groups, where C(N, M) = N!/(M!(N - M)!) is the number of ways of choosing M segments from N. The matching unit 30 includes a matching module, which is configured to perform similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model, for a total of C(N, M) × (N - M) matches. By establishing a plurality of voiceprint recognition models, the device makes the first voiceprint recognition models more accurate; by matching each detection audio in the detection group against the corresponding first voiceprint recognition model, it ensures that every segment of registered audio is matched, which further ensures a low false acceptance rate and a low false rejection rate, and thus the security and the user experience of voiceprint recognition. In addition, establishing the voiceprint recognition models and performing the similarity matching can be implemented with a typical vector method, which saves time and labor and is highly practicable.

According to another exemplary embodiment of the present application, the matching module includes: the system comprises an establishing submodule, a matching submodule, a first determining submodule and a second determining submodule, wherein the establishing submodule is used for establishing a second voiceprint recognition model according to each detection audio in the detection group; the matching submodule is used for carrying out similarity matching on the second voiceprint recognition model and the first voiceprint recognition model; the first determining submodule is configured to determine that the second voiceprint recognition model matches the first voiceprint recognition model when the similarity between the second voiceprint recognition model and the first voiceprint recognition model is greater than or equal to a predetermined threshold; (ii) a The second determining submodule is configured to determine that the similarity between the second voiceprint recognition model and the first voiceprint recognition model does not match when the similarity between the second voiceprint recognition model and the first voiceprint recognition model is smaller than a predetermined threshold. The device determines whether the similarity between the second voiceprint recognition model and the first voiceprint recognition model is matched by matching the similarity between the second voiceprint recognition model and the first voiceprint recognition model and comparing the similarity with the preset threshold, so that the situation that the person is not identified but is not identified and the person is judged not to be the person is further avoided, namely, the device guarantees low false acceptance rate and false rejection rate, and further guarantees high safety and good user experience of the voiceprint recognition process.

In a specific embodiment of the present application, the determining unit 40 includes a determining module configured to determine that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to 1. This further reduces the probability that a person other than the user is identified as the user, and thus further ensures security in the voiceprint recognition process. The predetermined number can be set differently according to the security requirements of voiceprint recognition in different scenarios.
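The decision rule can be illustrated with a short sketch. The function name registration_is_invalid and the representation of matching results as a list of booleans are assumptions made only for this example.

```python
def registration_is_invalid(match_results, predetermined_number=1):
    """Return True when the count of non-matching detection audios
    reaches the predetermined number."""
    mismatches = sum(1 for matched in match_results if not matched)
    return mismatches >= predetermined_number

results = [True, True, False, True]  # one detection audio failed to match
print(registration_is_invalid(results))                          # True
print(registration_is_invalid(results, predetermined_number=2))  # False
```

A predetermined number of 1 is the strictest setting; a larger value tolerates some mismatches and could suit scenarios with lower security requirements.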

In another specific embodiment of the present application, the obtaining unit 10 includes an acquisition module, an extraction module and a cutting module. The acquisition module is configured to acquire a voice audio; the extraction module is configured to extract the effective voice from the voice audio; the cutting module is configured to cut and slice the effective voice to obtain the N sections of registered audio. By acquiring the voice audio, extracting the effective voice, and cutting and slicing it into N sections of registered audio, the device ensures that each registered audio contains sufficient voiceprint features, thereby ensuring security and user experience when the method is used for detection.

The device for detecting the validity of the registered audio comprises a processor and a memory, wherein the acquiring unit 10, the modeling unit 20, the matching unit 30, the determining unit 40 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.

The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the problem of the high false rejection rate of prior-art schemes for verifying the validity of voiceprint registration audio is solved by adjusting kernel parameters.

The memory may include, among computer-readable media, volatile memory such as random access memory (RAM) and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.

An embodiment of the present invention provides a storage medium, on which a program is stored, and the program, when executed by a processor, implements the method for detecting validity of a registered audio.

The embodiment of the invention provides a processor, which is used for running a program, wherein the program executes the method for detecting the validity of the registered audio when running.

An embodiment of the invention provides a device comprising a processor, a memory, and a program that is stored in the memory and can run on the processor; when the processor executes the program, at least the steps of the above method for detecting the validity of registered audio are implemented.

The device herein may be a server, a PC, a PAD, a mobile phone, etc.

The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with at least the steps of the above method for detecting the validity of registered audio.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the above methods according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

In order to make the technical solutions of the present application more clearly understood by those skilled in the art, the following description will be given with reference to specific embodiments.

Example 1

Using the above method for detecting the validity of registered audio, the validity of registered audio in the text mode is detected as follows:

first, the effective voice is extracted from the original voice audio;

then, the effective voice is cut and sliced to obtain N sections of registered audio;

finally, the validity of the registered audio is detected with the above method to determine whether the registered audio is valid.

The effective voice can be extracted by methods such as endpoint detection and semantic clause segmentation, and on that basis can be further refined by speaker separation using methods such as voice clustering. The effective voice may be sliced into registered audios of 2 s each, so that each registered audio contains sufficient voiceprint features, or into registered audios of other durations.
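The extraction and slicing steps of Example 1 can be sketched as below. This is only an illustration under simplifying assumptions: endpoint detection is approximated by a per-frame mean absolute amplitude threshold, a trailing remainder shorter than one segment is discarded, and the function names, frame length and threshold values are hypothetical.

```python
def extract_effective_voice(samples, frame_len, energy_threshold):
    """Keep only frames whose mean absolute amplitude reaches the threshold
    (a crude stand-in for endpoint detection)."""
    voiced = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if sum(abs(s) for s in frame) / frame_len >= energy_threshold:
            voiced.extend(frame)
    return voiced

def slice_segments(voiced, sample_rate, segment_seconds=2.0):
    """Cut the effective voice into equal segments; drop a short remainder."""
    seg_len = int(sample_rate * segment_seconds)
    return [voiced[i:i + seg_len]
            for i in range(0, len(voiced) - seg_len + 1, seg_len)]

# Toy signal at 10 samples per second: 5 s speech, 3 s silence, 6 s speech.
sample_rate = 10
signal = [0.5] * 50 + [0.0] * 30 + [0.4] * 60
voiced = extract_effective_voice(signal, frame_len=10, energy_threshold=0.1)
segments = slice_segments(voiced, sample_rate)
print(len(voiced))    # 110
print(len(segments))  # 5
```

Here 11 s of effective voice yields N = 5 registered audios of 2 s each, and the last second is dropped because it is shorter than one segment.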

Example 2

The method for detecting the validity of registered audio was tested on 3 test sets, each containing 50 groups of registered audio. In test set 1, all registered audios in each group come from the same speaker; in test set 2, each group contains 4 segments from the same person and 1 segment from another person; in test set 3, each group contains 3 segments from the same person and 2 segments from another person. Three different schemes were used to obtain the detection results in the following tables, where the number in each cell is the number of misjudgments.

Scheme 1: the first voiceprint recognition model contains only 1 section of registered audio. Four second voiceprint recognition models, built from the 4 sections of detection audio, are each similarity-matched with the first voiceprint recognition model; when the number of detection audios whose similarity does not match the registered audio in the corresponding first voiceprint recognition model is greater than or equal to 1, the registered audio is determined to be invalid. The results are shown in Table 1:

TABLE 1

Scheme 2: the first voiceprint recognition model contains 4 sections of registered audio. A second voiceprint recognition model, built from the 1 section of detection audio, is similarity-matched with the first voiceprint recognition model; when the number of detection audios whose similarity does not match the registered audio in the corresponding first voiceprint recognition model is greater than or equal to 1, the registered audio is determined to be invalid. The results are shown in Table 2:

TABLE 2

Scheme 3: when the number of detection audios whose similarity does not match the registered audio in the corresponding first voiceprint recognition model is greater than or equal to 1, the registered audio is determined to be invalid. The results are shown in Table 3:

TABLE 3

The test results of the first and second schemes are not ideal. In the first scheme, the registered audio used for modeling is too short, so the effect is poor. In the second scheme, when 2 segments come from another person, the registered audio in the model always contains the speaker of the detection audio, so effective detection is not possible. The third scheme avoids both situations and, when the threshold is low, takes both security and user experience into account.
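The weakness of a 4-segment model when 2 of 5 segments come from another person can be illustrated with an idealised toy. The sketch assumes orthogonal two-dimensional embeddings for the two speakers, mean-vector models, cosine similarity, exhaustive enumeration of modeling groups, and a low threshold of 0.3; all of these are illustrative assumptions, and the numbers are not the test data of Example 2.

```python
from itertools import combinations
import math

def mean_vector(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def count_mismatches(embeddings, m, threshold):
    """Total non-matching detection audios over all m-segment modeling groups."""
    mismatches = 0
    for group in combinations(range(len(embeddings)), m):
        model = mean_vector([embeddings[i] for i in group])
        for i in range(len(embeddings)):
            if i not in group and cosine(model, embeddings[i]) < threshold:
                mismatches += 1
    return mismatches

A, B = [1.0, 0.0], [0.0, 1.0]  # idealised embeddings of two speakers
mixed_group = [A, A, A, B, B]  # 3 segments from one person, 2 from another
threshold = 0.3                # a deliberately low threshold

print(count_mismatches(mixed_group, 4, threshold))  # 0
print(count_mismatches(mixed_group, 3, threshold))  # 2
```

In this toy, every 4-segment model already contains the second speaker, so no detection audio falls below the low threshold and the mixed group passes. With 3-segment modeling groups, the group made only of the first speaker's segments rejects both remaining segments, so the mixed registration is flagged as invalid.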

From the above description, it can be seen that the above-described embodiments of the present application achieve the following technical effects:

1) The application provides a method for detecting the validity of registered audio, in which the N sections of registered audio are divided into two parts to obtain the modeling group and the detection group. The modeling group includes M sections of registered audio, that is, at least two registered audios; the M sections of registered audio of the modeling group are used to form the first voiceprint recognition model, and each detection audio of the detection group is similarity-matched with the registered audio in the corresponding first voiceprint recognition model to determine whether the registered audio is valid.

2) The application provides a device for detecting the validity of registered audio. The obtaining unit obtains N sections of registered audio whose duration is longer than the preset duration; the modeling unit selects M sections of registered audio and establishes the first voiceprint recognition model; the matching unit performs similarity matching between each detection audio in the detection group and the registered audio in the first voiceprint recognition model; finally, the determining unit determines whether the registered audio is valid according to the matching result. Because each detection audio of the detection group is similarity-matched with the registered audio in the corresponding first voiceprint recognition model to determine whether the registered audio is valid, the proportion of cases in which a person other than the user is recognized as the user is kept low, and the proportion of cases in which the user is judged not to be the user is also kept low. That is, a low false acceptance rate and a low false rejection rate are both ensured, so that the security and the user experience of the voiceprint recognition process are both taken into account.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for detecting validity of registered audio, comprising:

acquiring N sections of registered audio, wherein the duration of each section of registered audio is greater than the preset duration;

selecting M sections of the registered audios in the N sections of the registered audios to obtain a plurality of modeling groups, and establishing a first voiceprint recognition model in a one-to-one correspondence mode according to the modeling groups, wherein M is more than 1 and less than N, and M is an integer;

performing similarity matching on each detection audio in a detection group and the corresponding first voiceprint recognition model, wherein the detection group is formed by the registration audio except the modeling group in the N sections of registration audio, and the registration audio in the detection group is the detection audio;

determining that the registered audio is invalid when the number of detected audio that does not match the corresponding first voiceprint recognition model similarity is greater than or equal to a predetermined number.

2. The method of claim 1, wherein N/2 ≤ M ≤ N/2 + 1.

3. The method of claim 2, wherein

selecting M sections of the registered audios in the N sections of the registered audios to obtain a plurality of modeling groups, and establishing a voiceprint recognition model according to the one-to-one correspondence of the modeling groups, comprises:

selecting M sections of the registered audios in the N sections of the registered audios to obtain C(N, M) modeling groups, and establishing C(N, M) voiceprint recognition models in one-to-one correspondence with the C(N, M) modeling groups, where C(N, M) is the number of combinations of M sections chosen from N sections,

and respectively carrying out similarity matching on each detection audio in the detection group and the corresponding first voiceprint recognition model comprises:

carrying out similarity matching on each detection audio in the detection group and the corresponding first voiceprint recognition model respectively, each modeling group being matched N − M times.

4. The method of claim 1, wherein similarity matching each detected audio in a detection group with the corresponding first voiceprint recognition model comprises:

establishing a second voiceprint recognition model according to each detection audio in the detection group;

similarity matching is carried out on the second voiceprint recognition model and the first voiceprint recognition model;

determining that the similarity of the second voiceprint recognition model and the first voiceprint recognition model is matched under the condition that the similarity of the second voiceprint recognition model and the first voiceprint recognition model is greater than or equal to a preset threshold value;

determining that the similarity of the second voiceprint recognition model and the first voiceprint recognition model does not match in the case that the similarity of the second voiceprint recognition model and the first voiceprint recognition model is smaller than a predetermined threshold.

5. The method according to claim 1, wherein determining that the registered audio is invalid in a case where the number of detected audios that do not match the corresponding first voiceprint recognition model similarity is greater than or equal to a predetermined number comprises:

determining that the registered audio is invalid when the number of detected audio that does not match the corresponding first voiceprint recognition model similarity is greater than or equal to 1.

6. The method according to any one of claims 1 to 5, wherein obtaining N pieces of registered audio comprises:

acquiring a voice audio;

extracting effective voice of the voice audio;

and cutting and slicing the effective voice to obtain N sections of the registered audio.

7. An apparatus for detecting validity of registered audio, comprising:

an acquisition unit, configured to acquire N sections of registered audio, wherein the duration of each section of registered audio is greater than the preset duration;

a modeling unit, configured to select M sections of the registered audios in the N sections of the registered audios to obtain a plurality of modeling groups, and to establish a first voiceprint recognition model according to the modeling groups in a one-to-one correspondence mode, wherein M is more than 1 and less than N, and M is an integer;

a matching unit, configured to perform similarity matching on each detection audio in a detection group and the corresponding first voiceprint recognition model, where the detection group is formed by the registration audios, except for the modeling group, in the N segments of registration audios, and the registration audios in the detection group are the detection audios;

a determining unit configured to determine that the registered audio is invalid when the number of detected audios that do not match the corresponding first voiceprint recognition model similarity is greater than or equal to a predetermined number.

8. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein the program performs the method of any one of claims 1 to 6.

9. A detection processor for detecting the validity of registered audio, the processor being configured to run a program, wherein the program is configured to perform the method of any one of claims 1 to 6 when running.

10. An electronic device for detecting validity of registered audio, comprising: one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any of claims 1-6.