
CN111933152B - Method and device for detecting validity of registered audio and electronic equipment - Google Patents


Info

Publication number
CN111933152B
Authority
CN
China
Prior art keywords
audio
recognition model
registered
voiceprint recognition
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011081502.3A
Other languages
Chinese (zh)
Other versions
CN111933152A (en)
Inventor
李健
邢启洲
武卫东
陈明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd
Priority to CN202011081502.3A
Publication of CN111933152A
Application granted
Publication of CN111933152B
Priority to PCT/CN2021/096835 (WO2022077918A1)
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a method, a device and electronic equipment for detecting the validity of registered audio. The detection method comprises: acquiring N segments of registered audio, each segment being longer than a preset duration; selecting M of the N segments of registered audio to obtain a plurality of modeling groups, and establishing a first voiceprint recognition model for each modeling group in one-to-one correspondence, where 1 < M < N and M is an integer; performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model, where the detection group consists of the segments of registered audio that are not in the modeling group, and these segments serve as the detection audio; and determining that the registered audio is invalid when the number of detection audios that do not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number. Compared with conventional simple pairwise verification, the method reduces the cases in which a genuine user is judged not to be the user, giving a better user experience.

Description

Method and device for detecting validity of registered audio and electronic equipment
Technical Field
The present application relates to the field of voiceprint recognition, and in particular to a method and an apparatus for detecting the validity of registered audio, a computer-readable storage medium, a processor, and an electronic device.
Background
Voiceprint recognition comprises two steps: voiceprint registration and voiceprint verification. Voiceprint registration extracts the voiceprint features from the registered audio and establishes a corresponding voiceprint user model; voiceprint verification extracts the voiceprint features from the audio to be verified, establishes a corresponding feature model, and compares that feature model with the voiceprint user model to verify their similarity.
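As a rough illustration of the two steps, the following Python sketch assumes that each audio segment has already been converted into a fixed-length feature vector by some extractor that the text does not specify. Averaging the vectors into a user model, cosine similarity, and the 0.7 threshold are common illustrative choices, not details taken from this application.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product of the two unit-length vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def enroll(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Registration (assumed): average the per-segment features into a user model."""
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    return normalize(mean)


def verify(user_model: Sequence[float], test_embedding: Sequence[float],
           threshold: float = 0.7) -> bool:
    """Verification (assumed): accept when the similarity reaches the threshold."""
    return cosine(user_model, test_embedding) >= threshold


model = enroll([[1.0, 0.1], [0.9, 0.0]])
print(verify(model, [1.0, 0.05]))  # True
print(verify(model, [0.0, 1.0]))   # False
```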
However, if the voice of someone other than the enrolling speaker is mixed into the registration audio, the registered voiceprint user model contains the voiceprint features of both the speaker and that other person. Anyone other than the speaker who took part in the registration can then also pass authentication by the voiceprint system, so the security of the voiceprint authentication system itself cannot be guaranteed. To avoid this, the validity of the voiceprint registration audio must be verified so that voiceprint registration is not performed on non-compliant registration audio.
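The risk can be seen with toy numbers. In this sketch the two-dimensional vectors for the enrolling speaker and the other person are hypothetical, the user model is assumed to be the average of the registration features, and the 0.6 threshold is arbitrary. A model built from mixed audio sits between the two voices and scores the other person exactly as highly as the genuine speaker.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


speaker = [1.0, 0.0]  # hypothetical features of the enrolling speaker
other = [0.0, 1.0]    # hypothetical features of another person

# Registration audio mixing both voices gives a model halfway between them.
mixed_model = [(s + o) / 2 for s, o in zip(speaker, other)]

THRESHOLD = 0.6  # arbitrary acceptance threshold
print(round(cosine(mixed_model, speaker), 3))   # 0.707
print(round(cosine(mixed_model, other), 3))     # 0.707
print(cosine(mixed_model, other) >= THRESHOLD)  # True: the other person passes
```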
The prior art generally verifies the validity of voiceprint registration audio as follows: obtaining the effective speech when the user registers; dividing the effective speech equally into several segments; extracting the voiceprint features of each segment; and comparing and verifying the voiceprint features of the segments in pairs. However, because each segment is short, the extracted features can hardly cover all the voiceprint characteristics of a speaker. This simple pairwise comparison therefore has a high false rejection rate when verifying the validity of registered audio.
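A minimal sketch of the pairwise check described above, assuming that each segment is represented by a feature vector and that segments are compared by cosine similarity (both assumptions). The hypothetical third vector stands for a short segment of the same speaker whose features happen to be atypical. A single such segment is enough for the pairwise check to reject the whole registration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def pairwise_valid(segment_features, threshold):
    """Prior-art style check: every pair of segments must match each other."""
    return all(cosine(a, b) >= threshold
               for a, b in combinations(segment_features, 2))


# Hypothetical features of three short segments from the same speaker;
# the third segment captured a less typical part of the voice.
segments = [[1.0, 0.2], [0.9, 0.3], [0.6, 0.8]]
print(pairwise_valid(segments[:2], 0.8))  # True
print(pairwise_valid(segments, 0.8))      # False: one atypical segment rejects all
```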
The above information disclosed in this background section is only for enhancement of understanding of the background of the technology described herein and, therefore, certain information may be included in the background that does not form the prior art that is already known in this country to a person of ordinary skill in the art.
Disclosure of Invention
The main purpose of the present application is to provide a method, an apparatus, a computer-readable storage medium, a processor, and an electronic device for detecting the validity of registered audio, so as to solve the problem that prior-art schemes for verifying the validity of voiceprint registration audio have a high false rejection rate.
According to one aspect of the embodiments of the present invention, a method for detecting the validity of registered audio is provided, including: acquiring N segments of registered audio, each segment being longer than a preset duration; selecting M of the N segments of registered audio to obtain a plurality of modeling groups, and establishing a first voiceprint recognition model for each modeling group in one-to-one correspondence, where 1 < M < N and M is an integer; performing similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, where the detection group consists of the segments of registered audio that are not in the modeling group, and these segments serve as the detection audio; and determining that the registered audio is invalid when the number of detection audios that do not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number.
Optionally, $N/2 \le M \le N/2 + 1$.
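Combining this range with the constraint 1 < M < N leaves only a few admissible integers for each N. The helper name `valid_m_values` is hypothetical; it simply enumerates the integers that satisfy both conditions.

```python
import math
from typing import List


def valid_m_values(n: int) -> List[int]:
    """Integers M with N/2 <= M <= N/2 + 1 that also satisfy 1 < M < N."""
    low = math.ceil(n / 2)
    high = math.floor(n / 2 + 1)
    return [m for m in range(low, high + 1) if 1 < m < n]


for n in range(2, 7):
    print(n, valid_m_values(n))
# 2 []
# 3 [2]
# 4 [2, 3]
# 5 [3]
# 6 [3, 4]
```

For N = 2 no M qualifies, so under both conditions the scheme needs at least three segments of registered audio.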
Optionally, selecting M of the N segments of registered audio to obtain a plurality of modeling groups and establishing voiceprint recognition models in one-to-one correspondence with the modeling groups includes: selecting M of the N segments of registered audio to obtain $C_N^M$ modeling groups, and establishing $C_N^M$ voiceprint recognition models in one-to-one correspondence with the $C_N^M$ modeling groups. Performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model includes: matching each detection audio in the detection group against the corresponding first voiceprint recognition model, so that the $C_N^M$ modeling groups are matched $C_N^M \times (N-M)$ times in total.
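To make the counts concrete: every choice of M of the N segment indices is one modeling group, and the remaining N - M segments form its detection group, so there are $C_N^M$ groups and $C_N^M \times (N-M)$ matches in total. The Python sketch below (the function name is hypothetical) enumerates the splits for N = 5 and M = 3.

```python
from itertools import combinations
from math import comb


def split_groups(n, m):
    """Yield every (modeling group, detection group) split of segment indices 0..N-1."""
    indices = range(n)
    for modeling in combinations(indices, m):
        detection = tuple(i for i in indices if i not in modeling)
        yield modeling, detection


n, m = 5, 3
splits = list(split_groups(n, m))
print(len(splits), comb(n, m))                               # 10 10
print(sum(len(d) for _, d in splits), comb(n, m) * (n - m))  # 20 20
print(splits[0])                                             # ((0, 1, 2), (3, 4))
```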
Optionally, performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model includes: establishing a second voiceprint recognition model from each detection audio in the detection group; performing similarity matching between the second voiceprint recognition model and the first voiceprint recognition model; determining that the two models match when their similarity is greater than or equal to a predetermined threshold; and determining that they do not match when their similarity is smaller than the predetermined threshold.
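The threshold rule can be sketched as follows. The sketch assumes that a voiceprint recognition model is the unit-length average of one or more feature vectors, so the second model of a single detection audio is simply its normalized vector and the similarity is a dot product; these modeling choices and the threshold values are assumptions. A similarity exactly equal to the threshold counts as a match, as in the text.

```python
import math


def unit_mean_model(features):
    """Assumed model: unit-length mean of one or more feature vectors."""
    dim = len(features[0])
    mean = [sum(f[i] for f in features) / len(features) for i in range(dim)]
    norm = math.hypot(*mean)
    return [x / norm for x in mean]


def models_match(first_model, detection_features, threshold):
    """Build the second model from one detection audio and compare it with the first."""
    second_model = unit_mean_model([detection_features])
    similarity = sum(a * b for a, b in zip(first_model, second_model))
    return similarity >= threshold  # equal to the threshold still counts as a match


first_model = unit_mean_model([[1.0, 0.1], [0.9, 0.2]])
print(models_match(first_model, [1.0, 0.15], 0.8))  # True
print(models_match(first_model, [0.1, 1.0], 0.8))   # False
```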
Optionally, determining that the registered audio is invalid when the number of detection audios that do not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number includes: determining that the registered audio is invalid when that number is greater than or equal to 1.
Optionally, acquiring the N segments of registered audio includes: acquiring a voice audio; extracting the effective speech from the voice audio; and cutting the effective speech into the N segments of registered audio.
According to another aspect of the embodiments of the present invention, a device for detecting the validity of registered audio is further provided, including an acquisition unit, a modeling unit, a matching unit and a determination unit. The acquisition unit is configured to acquire N segments of registered audio, each segment being longer than a preset duration; the modeling unit is configured to select M of the N segments of registered audio to obtain a plurality of modeling groups, and to establish a first voiceprint recognition model for each modeling group in one-to-one correspondence, where 1 < M < N and M is an integer; the matching unit is configured to perform similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, where the detection group consists of the segments of registered audio that are not in the modeling group, and these segments serve as the detection audio; and the determination unit is configured to determine that the registered audio is invalid when the number of detection audios that do not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number.
According to still another aspect of the embodiments of the present invention, a computer-readable storage medium is provided, including a stored program, wherein the program, when run, performs any one of the above methods.
According to a further aspect of the embodiments of the present invention, a processor for running a program is provided, wherein the program, when run, performs any one of the above methods.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including: one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods.
In the embodiments of the invention, the method for detecting the validity of registered audio divides the N segments of registered audio into two parts, a modeling group and a detection group. The modeling group contains M segments of registered audio, i.e. at least two segments, which are used to build the first voiceprint recognition model; each detection audio in the detection group is then matched for similarity against the corresponding first voiceprint recognition model to determine whether the registered audio is valid. Compared with existing simple pairwise verification, this scheme reduces the cases in which a genuine user is judged not to be the user, i.e. it alleviates the problem of a high false rejection rate, and therefore improves the user experience.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
Fig. 1 is a schematic flowchart of a method for detecting the validity of registered audio according to an embodiment of the present application; and
Fig. 2 is a schematic block diagram of a device for detecting the validity of registered audio according to an embodiment of the present application.
Wherein the figures include the following reference numerals:
10. an acquisition unit; 20. a modeling unit; 30. a matching unit; 40. a determination unit.
Detailed Description
It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other provided there is no conflict. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" another element, it can be directly on the other element or intervening elements may also be present. Also, in the specification and claims, when an element is described as being "connected" to another element, the element may be "directly connected" to the other element or "connected" to the other element through a third element.
For convenience of description, some terms or expressions referred to in the embodiments of the present application are explained below:
False Acceptance Rate (FAR): the proportion of cases in which someone who is not the user is recognized as the user; the lower the FAR, the more secure the system;
False Rejection Rate (FRR): the proportion of cases in which the genuine user is judged by the system not to be the user; the lower the FRR, the better the user experience.
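Both rates can be computed from scored verification trials at a given threshold. In this sketch the scores and thresholds are hypothetical; it shows that raising the threshold lowers the FAR but raises the FRR, which is why the two rates have to be balanced.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR: share of impostor trials accepted; FRR: share of genuine trials rejected."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


genuine = [0.91, 0.85, 0.62, 0.88]   # hypothetical scores of the real user
impostor = [0.30, 0.72, 0.41, 0.25]  # hypothetical scores of other speakers
print(far_frr(genuine, impostor, 0.7))  # (0.25, 0.25)
print(far_frr(genuine, impostor, 0.9))  # (0.0, 0.75)
```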
As mentioned in the Background, prior-art schemes for verifying the validity of voiceprint registration audio have a high false rejection rate. To solve this problem, an exemplary embodiment of the present application provides a method, an apparatus, a computer-readable storage medium, a processor, and an electronic device for detecting the validity of registered audio.
According to an embodiment of the present application, there is provided a method of detecting validity of a registered audio.
Fig. 1 is a flowchart of a method for detecting the validity of registered audio according to an embodiment of the present application. As shown in Fig. 1, the method comprises the following steps:
step S101: acquiring N segments of registered audio, each segment being longer than a preset duration;
step S102: selecting M of the N segments of registered audio to obtain a plurality of modeling groups, and establishing a first voiceprint recognition model for each modeling group in one-to-one correspondence, where 1 < M < N and M is an integer;
step S103: performing similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, where the detection group consists of the segments of registered audio that are not in the modeling group, and these segments serve as the detection audio;
step S104: determining that the registered audio is invalid when the number of detection audios that do not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number.
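Steps S102 to S104 can be sketched in Python as below. Several points are assumptions rather than details of this application: step S101 is taken as already done, with each segment represented by a feature vector; a first voiceprint recognition model is the normalized mean of its modeling group's vectors; the second model of a detection audio is its normalized vector; and every failed (modeling group, detection audio) comparison counts once toward the predetermined number.

```python
import math
from itertools import combinations


def unit(v):
    """Scale a vector to unit length."""
    norm = math.hypot(*v)
    return [x / norm for x in v]


def mean_model(features):
    """Assumed first model: unit-length mean of the modeling group's features."""
    dim = len(features[0])
    return unit([sum(f[i] for f in features) / len(features) for i in range(dim)])


def registered_audio_is_valid(segments, m, threshold, predetermined_number=1):
    """Steps S102-S104 over the features of N registered audio segments (S101 done)."""
    n = len(segments)
    if not 1 < m < n:
        raise ValueError("M must satisfy 1 < M < N")
    mismatches = 0
    for modeling in combinations(range(n), m):           # S102: one model per group
        first_model = mean_model([segments[i] for i in modeling])
        for j in range(n):
            if j in modeling:
                continue                                  # only the detection group
            second_model = unit(segments[j])              # S103: second model
            similarity = sum(a * b for a, b in zip(first_model, second_model))
            if similarity < threshold:
                mismatches += 1
    return mismatches < predetermined_number              # S104: invalid if >= number


genuine = [[1.0, 0.1], [0.95, 0.2], [0.9, 0.05], [1.0, 0.25]]
mixed = genuine[:3] + [[0.1, 1.0]]  # last segment replaced by another voice
print(registered_audio_is_valid(genuine, m=2, threshold=0.8))  # True
print(registered_audio_is_valid(mixed, m=2, threshold=0.8))    # False
```

With the toy data, every modeling group that excludes the replaced segment rejects it as detection audio, so the contaminated registration is judged invalid.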
The method obtains a modeling group and a detection group by dividing the N segments of registered audio into two parts. The modeling group contains M segments of registered audio, i.e. at least two, which are used to build the first voiceprint recognition model; each detection audio in the detection group is matched for similarity against the corresponding first voiceprint recognition model to determine whether the registered audio is valid. This reduces the cases in which a genuine user is rejected under prior-art pairwise verification, i.e. it addresses the high false rejection rate and improves the user experience.
Specifically, the above method for detecting the validity of the registered audio may be applied to a numeric string mode or a text mode.
In a typical embodiment of the application, $N/2 \le M \le N/2 + 1$. The larger N is, the larger M is, i.e. the more segments of registered audio are used to build each first voiceprint recognition model, so the first voiceprint recognition model is more accurate, the false acceptance rate and the false rejection rate can be reduced, and both the security and the user experience of voiceprint recognition are ensured. In addition, requiring each segment of registered audio to be longer than the preset duration ensures that, when the registered audio contains several speakers, there is at least one split in which the modeling group contains no speech from the non-registered speaker, so that at least one detection audio fails the similarity matching; this further ensures the security of voiceprint recognition.
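The role of the split can be checked by enumeration. Under the reading that a useful split is one whose modeling group contains none of the non-registered speaker's segments, the sketch below lists such splits; the function name and segment indices are hypothetical. With N = 5 and M = 3, one contaminated segment leaves four clean modeling groups, whereas contamination in more than N - M segments leaves none, so the argument presumes the other voice is confined to a few segments.

```python
from itertools import combinations


def clean_splits(n, m, impostor_segments):
    """Modeling groups of size M that contain none of the contaminated segments."""
    bad = set(impostor_segments)
    return [g for g in combinations(range(n), m) if not bad & set(g)]


# N = 5 segments, M = 3; segment 4 holds another speaker's voice.
print(clean_splits(5, 3, [4]))        # [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
# Contamination in 3 > N - M segments: no clean modeling group exists.
print(clean_splits(5, 3, [2, 3, 4]))  # []
```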
According to another exemplary embodiment of the present application, selecting M of the N segments of registered audio to obtain a plurality of modeling groups and establishing voiceprint recognition models in one-to-one correspondence with the modeling groups includes: selecting M of the N segments of registered audio to obtain $C_N^M$ modeling groups, and establishing $C_N^M$ voiceprint recognition models in one-to-one correspondence with the $C_N^M$ modeling groups. Performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model includes: matching each detection audio in the detection group against the corresponding first voiceprint recognition model, so that the $C_N^M$ modeling groups are matched $C_N^M \times (N-M)$ times in total. By establishing a plurality of voiceprint recognition models, the method makes the first voiceprint recognition models more accurate, and by matching each detection audio against its corresponding first voiceprint recognition model it ensures that every segment of registered audio is checked; this keeps both the false acceptance rate and the false rejection rate low and thus safeguards the security and the user experience of voiceprint recognition. The model building and similarity matching can be implemented with a typical vector method, which saves time and effort and is easy to implement.
According to still another exemplary embodiment of the present application, performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model includes: establishing a second voiceprint recognition model from each detection audio in the detection group; performing similarity matching between the second voiceprint recognition model and the first voiceprint recognition model; determining that the two models match when their similarity is greater than or equal to a predetermined threshold; and determining that they do not match when their similarity is smaller than the predetermined threshold. By comparing the similarity between the second and the first voiceprint recognition models with the predetermined threshold, the method keeps low both the proportion of non-users recognized as the user and the proportion of genuine users judged not to be the user, i.e. it ensures a low false acceptance rate and a low false rejection rate, and thus high security and a good user experience for voiceprint recognition.
According to a specific embodiment of the present application, determining that the registered audio is invalid when the number of detection audios that do not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number includes: determining that the registered audio is invalid when that number is greater than or equal to 1. This further reduces the probability that someone who is not the user is recognized as the user, and further ensures security during voiceprint recognition. The predetermined number can also be set differently according to the security requirements of voiceprint recognition in different scenarios.
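The text leaves the per-scenario values open. The mapping below is purely hypothetical and only shows how a scenario-dependent predetermined number plugs into the decision of step S104.

```python
# Hypothetical scenario settings; the application only says the number may vary.
PREDETERMINED_NUMBER = {"payment": 1, "door_access": 2, "smart_speaker": 3}


def is_invalid(mismatch_count, scenario):
    """Step S104: invalid once the mismatch count reaches the scenario's number."""
    return mismatch_count >= PREDETERMINED_NUMBER[scenario]


print(is_invalid(1, "payment"))        # True: the strictest setting
print(is_invalid(1, "smart_speaker"))  # False
print(is_invalid(3, "smart_speaker"))  # True
```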
According to another specific embodiment of the present application, acquiring the N segments of registered audio includes: acquiring a voice audio; extracting the effective speech from the voice audio; and cutting the effective speech into the N segments of registered audio. Obtaining the N segments by segmenting the effective speech of a single voice audio ensures that each segment of registered audio contains enough voiceprint features, which further ensures the security and the user experience of the detection.
Of course, in practice, when the acquired registration audio clips are short, several of them can be spliced together to form one segment of registered audio.
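Acquisition of the N segments, including splicing of short clips, can be sketched as follows. The energy-based voice activity detection, the equal-length slicing, the helper names, and measuring durations in samples are all assumptions for illustration; the application does not specify how the effective speech is extracted.

```python
def effective_speech(samples, frame_len, energy_threshold):
    """Crude energy-based VAD (assumed): keep frames whose mean energy exceeds a threshold."""
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if sum(x * x for x in frame) / frame_len > energy_threshold:
            kept.extend(frame)
    return kept


def slice_into_segments(speech, n, min_len):
    """Cut effective speech into N equal segments, each longer than the preset duration."""
    seg_len = len(speech) // n
    if seg_len <= min_len:
        raise ValueError("not enough effective speech for N segments of the preset duration")
    return [speech[i * seg_len:(i + 1) * seg_len] for i in range(n)]


def splice_short_clips(clips, min_len):
    """Concatenate short clips until each spliced segment exceeds the preset duration."""
    segments, current = [], []
    for clip in clips:
        current.extend(clip)
        if len(current) > min_len:
            segments.append(current)
            current = []
    return segments  # a trailing remainder that is still too short is dropped


samples = [0.0] * 100 + [0.5] * 400 + [0.0] * 100  # silence, speech, silence
speech = effective_speech(samples, frame_len=50, energy_threshold=0.01)
segments = slice_into_segments(speech, n=4, min_len=80)
print(len(speech), [len(s) for s in segments])  # 400 [100, 100, 100, 100]
print([len(s) for s in splice_short_clips([[0.1] * 30] * 5, min_len=50)])  # [60, 60]
```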
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one given here.
An embodiment of the present application further provides a device for detecting the validity of registered audio. It should be noted that this device may be used to execute the method for detecting the validity of registered audio provided in the embodiments of the present application. The device is described below.
Fig. 2 is a schematic structural diagram of an apparatus for detecting the validity of registered audio according to an embodiment of the present application. As shown in Fig. 2, the apparatus includes an acquisition unit 10, a modeling unit 20, a matching unit 30 and a determination unit 40. The acquisition unit 10 is configured to acquire N segments of registered audio, each segment being longer than a preset duration; the modeling unit 20 is configured to select M of the N segments of registered audio to obtain a plurality of modeling groups, and to build a first voiceprint recognition model for each modeling group in one-to-one correspondence, where 1 < M < N and M is an integer; the matching unit 30 is configured to perform similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, where the detection group consists of the segments of registered audio that are not in the modeling group, and these segments serve as the detection audio; and the determination unit 40 is configured to determine that the registered audio is invalid when the number of detection audios that do not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number.
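One way to map the four units of Fig. 2 onto code is sketched below. The class and method names are hypothetical, the model builder and the similarity function are injected as assumptions (a plain mean and cosine similarity in the usage lines), and a mismatch is counted once per (modeling group, detection audio) comparison.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]


@dataclass
class RegisteredAudioValidityDetector:
    """Hypothetical mapping of units 10-40 in Fig. 2 onto one class."""
    build_model: Callable[[List[Features]], Features]  # used by modeling unit 20
    similarity: Callable[[Features, Features], float]  # used by matching unit 30
    m: int
    threshold: float
    predetermined_number: int = 1

    def acquire(self, segments: Sequence[Features]) -> List[Features]:
        """Acquisition unit 10: collect the N segments (feature extraction assumed done)."""
        return list(segments)

    def model(self, segments: List[Features]) -> List[Tuple[Tuple[int, ...], Features]]:
        """Modeling unit 20: one first model per modeling group of M segments."""
        return [(group, self.build_model([segments[i] for i in group]))
                for group in combinations(range(len(segments)), self.m)]

    def match(self, segments, models) -> List[bool]:
        """Matching unit 30: compare each detection audio with its group's model."""
        return [self.similarity(first_model, segments[j]) >= self.threshold
                for group, first_model in models
                for j in range(len(segments)) if j not in group]

    def determine(self, results: List[bool]) -> bool:
        """Determination unit 40: valid unless mismatches reach the predetermined number."""
        return results.count(False) < self.predetermined_number

    def run(self, segments: Sequence[Features]) -> bool:
        acquired = self.acquire(segments)
        return self.determine(self.match(acquired, self.model(acquired)))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


detector = RegisteredAudioValidityDetector(mean, cosine, m=2, threshold=0.8)
print(detector.run([[1.0, 0.1], [0.95, 0.2], [0.9, 0.05], [1.0, 0.25]]))  # True
print(detector.run([[1.0, 0.1], [0.95, 0.2], [0.9, 0.05], [0.1, 1.0]]))   # False
```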
The device acquires, through the acquisition unit, N segments of registered audio each longer than the preset duration; selects M segments through the modeling unit to build the first voiceprint recognition model; performs, through the matching unit, similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model; and finally determines, through the determination unit, whether the registered audio is valid according to the matching results. Because each detection audio in the detection group is matched against the corresponding first voiceprint recognition model, the proportion of non-users recognized as the user and the proportion of genuine users judged not to be the user are both kept low, i.e. both the false acceptance rate and the false rejection rate are low, balancing the security and the user experience of voiceprint recognition.
Specifically, the above-described detection apparatus for the validity of the registered audio may be applied to a numeric string mode as well as a text mode.
In a typical embodiment of the application, $N/2 \le M \le N/2 + 1$. The larger N is, the larger M is, i.e. the more segments of registered audio are used to build the first voiceprint recognition model, so the model is more accurate, the false acceptance rate and the false rejection rate can be reduced, and both the security and the user experience of voiceprint recognition are ensured.
According to another exemplary embodiment of the present application, the modeling unit 20 includes a modeling module configured to select M segments of registered audio from the N segments to obtain $C_N^M$ modeling groups (the number of ways of choosing M of the N segments), and to build $C_N^M$ voiceprint recognition models in one-to-one correspondence with the $C_N^M$ modeling groups. The matching unit 30 includes a matching module configured to match each detection audio in the detection group for similarity against the corresponding first voiceprint recognition model, with [formula not recoverable from the source] modeling groups matched [formula not recoverable from the source] times in total. By building multiple voiceprint recognition models, the device makes the first voiceprint recognition model more accurate; by matching each detection audio in the detection group against the corresponding first voiceprint recognition model, it ensures that every registered audio takes part in matching. This further keeps the false acceptance rate and false rejection rate low and further safeguards the security and user experience of voiceprint recognition. Both building the voiceprint recognition models and the similarity matching can be implemented with a typical vector method, which saves time and effort and is readily implementable.
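Assuming that every M-segment subset of the N registered segments forms one modeling group (which gives $C_N^M$ groups), and that each group's detection group is made up of the remaining N − M segments, the grouping can be enumerated with Python's `itertools.combinations`. This is an illustrative sketch only; the function name `modeling_and_detection_groups` is hypothetical.

```python
from itertools import combinations
from math import comb


def modeling_and_detection_groups(n: int, m: int):
    """Yield (modeling_group, detection_group) index tuples, one per M-subset."""
    if not 1 < m < n:
        raise ValueError("M must satisfy 1 < M < N")
    for modeling in combinations(range(n), m):
        # Detection group: every registered segment not used for modeling.
        detection = tuple(i for i in range(n) if i not in modeling)
        yield modeling, detection


pairs = list(modeling_and_detection_groups(5, 3))
print(len(pairs), comb(5, 3))             # 10 10
print(sum(len(det) for _, det in pairs))  # 20 comparisons in total
print(pairs[0])                           # ((0, 1, 2), (3, 4))
```

For N = 5 and M = 3 this yields 10 modeling groups, each matched against 2 detection audios (20 comparisons in total), and every segment serves as a detection audio in at least one group.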
According to another exemplary embodiment of the present application, the matching module includes an establishing submodule, a matching submodule, a first determining submodule and a second determining submodule. The establishing submodule builds a second voiceprint recognition model from each detection audio in the detection group. The matching submodule performs similarity matching between the second voiceprint recognition model and the first voiceprint recognition model. The first determining submodule determines that the two models match when their similarity is greater than or equal to a predetermined threshold, and the second determining submodule determines that they do not match when their similarity is smaller than the predetermined threshold. By comparing the similarity between the second and first voiceprint recognition models against the predetermined threshold, the device further avoids both accepting someone who is not the user and rejecting the user. That is, it keeps the false acceptance rate and the false rejection rate low, which further ensures high security and a good user experience in voiceprint recognition.
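The patent does not specify what a voiceprint recognition model is or which similarity measure is used. Assuming, purely for illustration, that each model is a fixed-length embedding vector (here the element-wise mean of per-segment embeddings) and that similarity is cosine similarity, the threshold rule of the two determining submodules could look like this. All names and the threshold value 0.7 are hypothetical.

```python
import math
from typing import Sequence


def mean_embedding(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical model: the element-wise mean of segment embeddings."""
    if not vectors:
        raise ValueError("need at least one embedding")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def models_match(second_model, first_model, threshold: float) -> bool:
    # First determining submodule: similarity >= threshold means a match.
    # Second determining submodule: similarity < threshold means no match.
    return cosine_similarity(second_model, first_model) >= threshold


first_model = mean_embedding([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
print(models_match([1.0, 0.0], first_model, threshold=0.7))  # True
print(models_match([0.0, 1.0], first_model, threshold=0.7))  # False
```

Any other embedding extractor or scoring back end could be swapped in; only the "greater than or equal to the threshold" comparison comes from the text.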
In a specific embodiment of the present application, the determining unit 40 includes a determining module configured to determine that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to 1. This further reduces the probability that someone other than the user is recognized as the user, and further secures the voiceprint recognition process. In general, the predetermined number can be set differently according to the security requirements of voiceprint recognition in different scenarios.
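The decision rule of the determining module reduces to counting failed matches. Below is a minimal sketch with the hypothetical name `registration_is_invalid`. It takes one boolean per comparison, and its default of 1 follows this embodiment. Other scenarios may pass a larger predetermined number.

```python
from typing import Iterable


def registration_is_invalid(match_results: Iterable[bool], predetermined_number: int = 1) -> bool:
    """Return True when at least `predetermined_number` matches failed."""
    if predetermined_number < 1:
        raise ValueError("predetermined_number must be at least 1")
    mismatches = sum(1 for matched in match_results if not matched)
    return mismatches >= predetermined_number


print(registration_is_invalid([True, True, True]))                           # False
print(registration_is_invalid([True, False, True]))                          # True
print(registration_is_invalid([True, False, True], predetermined_number=2))  # False
```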
In another specific embodiment of the present application, the obtaining unit 10 includes an acquisition module, an extraction module and a cutting module. The acquisition module acquires a speech recording, the extraction module extracts the effective speech from it, and the cutting module cuts the effective speech into slices to obtain the N segments of registered audio. Acquiring the recording, extracting the effective speech and slicing it into N segments ensures that each registered audio contains enough voiceprint features. This in turn preserves security and user experience when the method is used for detection.
Of course, in practical applications, when the acquired registration audio is short, several pieces can be spliced together to form one segment of registration audio.
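The slicing and splicing described above can be sketched on raw sample lists. This is a minimal sketch under stated assumptions. Samples are plain Python numbers. Short clips are concatenated end to end before slicing. A trailing remainder shorter than one segment is discarded, so every returned segment has the full length. The function name and the 2 s default (the segment length used in Example 1) are illustrative, not prescribed by the patent.

```python
from typing import Sequence


def slice_registration_audio(
    clips: Sequence[Sequence[float]],
    sample_rate: int,
    segment_seconds: float = 2.0,
) -> list[list[float]]:
    """Splice short clips together, then cut them into equal-length segments."""
    seg_len = int(round(segment_seconds * sample_rate))
    if seg_len <= 0:
        raise ValueError("segment length must be positive")
    # Splicing: join the effective-speech clips end to end.
    spliced = [sample for clip in clips for sample in clip]
    # Slicing: keep only full-length segments; a short tail is dropped (assumption).
    n_full = len(spliced) // seg_len
    return [spliced[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]


# Toy example: sample_rate=2 Hz, so one 2 s segment is 4 samples.
segments = slice_registration_audio([[1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11, 12, 13]], sample_rate=2)
print(segments)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
```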
The device for detecting the validity of the registered audio comprises a processor and a memory, wherein the acquiring unit 10, the modeling unit 20, the matching unit 30, the determining unit 40 and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes one or more cores, and a core calls the corresponding program unit from the memory. By adjusting the core parameters, the problem of the high false rejection rate in prior-art schemes for verifying the validity of voiceprint registration audio is solved.
The memory may include computer-readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium, on which a program is stored, and the program, when executed by a processor, implements the method for detecting validity of a registered audio.
The embodiment of the invention provides a processor, which is used for running a program, wherein the program executes the method for detecting the validity of the registered audio when running.
An embodiment of the invention provides a device comprising a processor, a memory, and a program stored in the memory and executable on the processor. When the processor executes the program, at least the following steps are implemented:
step S101, obtaining N segments of registered audio, each longer than a preset duration;
step S102, selecting M of the N segments of registered audio to obtain a plurality of modeling groups, and establishing first voiceprint recognition models in one-to-one correspondence with the modeling groups, where M is an integer and 1 < M < N;
step S103, performing similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, where the detection group consists of the registered audio among the N segments that is not in the modeling group, and the registered audio in the detection group serves as the detection audio;
step S104, determining that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number.
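Steps S101 to S104 can be strung together in one function. The sketch below is illustrative only and rests on several assumptions that the patent does not make:

- each segment is represented by a precomputed embedding vector (how embeddings are obtained is out of scope);
- a first voiceprint recognition model is the mean embedding of its modeling group;
- a second model is the embedding of a single detection audio;
- similarity is cosine similarity;
- every M-subset of the N segments is used as a modeling group.

Mismatches are counted per comparison. With a predetermined number of 1, this gives the same decision as counting distinct mismatched detection audios. All names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence

Embedding = Sequence[float]


def _mean(vectors: Sequence[Embedding]) -> list[float]:
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(len(vectors[0]))]


def _cosine(a: Embedding, b: Embedding) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def registration_is_valid(
    segments: Sequence[Embedding],
    durations: Sequence[float],
    preset_duration: float,
    m: int,
    threshold: float,
    predetermined_number: int = 1,
) -> bool:
    n = len(segments)
    # S101: every registered segment must be longer than the preset duration.
    if len(durations) != n or any(d <= preset_duration for d in durations):
        raise ValueError("each registered segment must exceed the preset duration")
    if not 1 < m < n:
        raise ValueError("M must satisfy 1 < M < N")
    mismatches = 0
    # S102: one first model per M-subset (modeling group).
    for modeling in combinations(range(n), m):
        first_model = _mean([segments[i] for i in modeling])
        # S103: the remaining N - M segments form the detection group.
        for j in range(n):
            if j in modeling:
                continue
            second_model = segments[j]  # model built from a single detection audio
            if _cosine(second_model, first_model) < threshold:
                mismatches += 1
    # S104: invalid once the mismatch count reaches the predetermined number.
    return mismatches < predetermined_number


A, B = [1.0, 0.0], [0.0, 1.0]  # toy embeddings for two different speakers
durations = [2.0] * 5
print(registration_is_valid([A] * 5, durations, 1.5, m=3, threshold=0.8))          # True
print(registration_is_valid([A, A, A, B, B], durations, 1.5, m=3, threshold=0.8))  # False
```

With these toy embeddings, five segments from one speaker pass. Three segments from one speaker plus two from another are rejected. In that case, the modeling group made of the first speaker's three segments fails to match either of the other speaker's segments.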
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with at least the following method steps:
step S101, obtaining N segments of registered audio, each longer than a preset duration;
step S102, selecting M of the N segments of registered audio to obtain a plurality of modeling groups, and establishing first voiceprint recognition models in one-to-one correspondence with the modeling groups, where M is an integer and 1 < M < N;
step S103, performing similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, where the detection group consists of the registered audio among the N segments that is not in the modeling group, and the registered audio in the detection group serves as the detection audio;
step S104, determining that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium. The product includes instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic or optical disk, and other media capable of storing program code.
In order to make the technical solutions of the present application more clearly understood by those skilled in the art, the following description will be given with reference to specific embodiments.
Example 1
Validity detection of registration audio in text mode is performed with the method for detecting the validity of registration audio as follows:
first, effective speech is extracted from the original speech recording;
then, the effective speech is cut into slices to obtain N segments of registration audio;
finally, validity detection is performed on the registration audio with the method for detecting the validity of registration audio, to determine whether the registration audio is valid.
The effective speech can be extracted with methods such as endpoint detection and semantic sentence segmentation. On top of these, speaker separation can also be performed with methods such as speech clustering. The effective speech may be sliced into 2 s segments of registration audio so that each segment contains sufficient voiceprint features, or into segments of other durations.
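The text names endpoint detection as one way to extract effective speech but gives no algorithm. One simple, common form is a frame-energy detector: split the signal into fixed-length frames and keep only frames whose mean energy reaches a threshold. The sketch below shows that simplified technique and is not the patent's own method. The frame length (400 samples, e.g. 25 ms at an assumed 16 kHz) and the threshold are hypothetical and would need tuning on real audio. The sketch does not perform semantic segmentation or speaker separation.

```python
from typing import Sequence


def extract_effective_speech(
    samples: Sequence[float],
    frame_len: int = 400,
    energy_threshold: float = 1e-3,
) -> list[float]:
    """Keep only frames whose mean-square energy reaches the threshold."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    kept: list[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            kept.extend(frame)
    return kept


silence, speech = [0.0] * 4, [0.5, -0.5, 0.5, -0.5]
print(extract_effective_speech(silence + speech + silence, frame_len=4, energy_threshold=0.01))
# [0.5, -0.5, 0.5, -0.5]
```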
Example 2
Tests were run with the method for detecting the validity of registration audio on 3 test sets, each containing 50 groups of registration audio. In test set 1, every group of registration audio comes from a single speaker. In test set 2, each group contains 4 segments from one person and 1 segment from another person. In test set 3, each group contains 3 segments from one person and 2 segments from another person. Three different schemes were used, giving the detection results in the tables below; each cell gives the number of misjudged cases.
Scheme I: the first voiceprint recognition model contains only 1 segment of registered audio. Four second voiceprint recognition models, built from the 4 detection audios, are each matched for similarity against the first voiceprint recognition model. The registration is determined to be invalid when the number of detection audios whose similarity does not match the registered audio in the corresponding first voiceprint recognition model is greater than or equal to 1. The results are shown in Table 1:
TABLE 1
Scheme II: the first voiceprint recognition model contains 4 segments of registered audio. A second voiceprint recognition model built from the 1 detection audio is matched for similarity against the first voiceprint recognition model. The registration is determined to be invalid when the number of detection audios whose similarity does not match the registered audio in the corresponding first voiceprint recognition model is greater than or equal to 1. The results are shown in Table 2:
TABLE 2
Scheme III: the registration is determined to be invalid when the number of detection audios whose similarity does not match the registered audio in the corresponding first voiceprint recognition model is greater than or equal to 1. The results are shown in Table 3:
TABLE 3
The test results of Schemes I and II are not ideal. In Scheme I, the registered audio in the first voiceprint recognition model is too short, so the effect is poor. In Scheme II, when 2 segments come from another person, the 4-segment first voiceprint recognition model always contains the voiceprint of the speaker of the detection audio, so the contamination cannot be detected. Scheme III avoids both problems and, at a low threshold, balances security and user experience.
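The combinatorial argument behind Schemes II and III can be checked with an idealized model. The sketch assumes an oracle matcher, purely for illustration: a detection audio matches a modeling group exactly when its speaker's voice is present in that group, and every M-subset of the segments is used as a modeling group. It ignores duration effects, so it does not reproduce Scheme I's failure (which the text attributes to too-short registration audio), and it does not reproduce the measured table values. The function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def flags_contamination(speakers: Sequence[str], m: int) -> bool:
    """True if some detection audio fails to match its modeling group (oracle matcher)."""
    n = len(speakers)
    for modeling in combinations(range(n), m):
        voices_in_model = {speakers[i] for i in modeling}
        for j in range(n):
            # Oracle assumption: a match happens iff the speaker is already in the model.
            if j not in modeling and speakers[j] not in voices_in_model:
                return True
    return False


test_set_3 = ["A", "A", "A", "B", "B"]  # 3 segments from one person, 2 from another
print(flags_contamination(test_set_3, 4))  # False: every 4-segment group contains a B
print(flags_contamination(test_set_3, 3))  # True: the all-A group leaves both B segments unmatched
```

With the test set 3 composition, M = 4 never exposes the second speaker. Every 4-segment group contains one of that speaker's segments, so the remaining detection audio always matches. M = 3 produces the all-first-speaker group, whose two detection audios both fail.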
From the above description, it can be seen that the above-described embodiments of the present application achieve the following technical effects:
1) The application provides a method for detecting the validity of registered audio. The N segments of registered audio are divided into two parts, a modeling group and a detection group. The modeling group contains M segments of registered audio, i.e., at least two, which are used to build the first voiceprint recognition model. Each detection audio in the detection group is then matched for similarity against the registered audio in the corresponding first voiceprint recognition model to determine whether the registered audio is valid.
2) The application provides a device for detecting the validity of registered audio. An acquisition unit acquires N segments of registered audio, each longer than a preset duration. A modeling unit selects M of those segments and builds the first voiceprint recognition model. A matching unit matches each detection audio in the detection group for similarity against the registered audio in the first voiceprint recognition model. Finally, a determining unit determines whether the registered audio is valid according to the matching results. Each detection audio in the detection group is matched separately against the registered audio in the corresponding first voiceprint recognition model. This keeps low both the proportion of detection audio that comes from someone other than the user but is accepted as the user, and the proportion that comes from the user but is rejected. In other words, both the false acceptance rate and the false rejection rate stay low, so the voiceprint recognition process balances security with user experience.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for detecting validity of registered audio, comprising:
acquiring N segments of registered audio, wherein the duration of each segment of registered audio is greater than a preset duration;
selecting M segments of the registered audio from the N segments of the registered audio to obtain a plurality of modeling groups, and establishing first voiceprint recognition models in one-to-one correspondence with the modeling groups, wherein M is an integer and 1 < M < N;
performing similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, wherein the detection group is formed by the registered audio, among the N segments of registered audio, that is not in the modeling group, and the registered audio in the detection group is the detection audio;
determining that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number.
2. The method of claim 1, wherein $N/2 \le M \le N/2 + 1$.
3. The method of claim 2, wherein
selecting M segments of the registered audio from the N segments of the registered audio to obtain a plurality of modeling groups, and establishing voiceprint recognition models in one-to-one correspondence with the modeling groups, comprises:
selecting M segments of the registered audio from the N segments of the registered audio to obtain $C_N^M$ modeling groups, and establishing $C_N^M$ voiceprint recognition models in one-to-one correspondence with the $C_N^M$ modeling groups;
and performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model comprises:
performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model ([formula not recoverable from the source]), each modeling group being matched [formula not recoverable from the source] times.
4. The method of claim 1, wherein performing similarity matching between each detection audio in the detection group and the corresponding first voiceprint recognition model comprises:
establishing a second voiceprint recognition model from each detection audio in the detection group;
performing similarity matching between the second voiceprint recognition model and the first voiceprint recognition model;
determining that the second voiceprint recognition model matches the first voiceprint recognition model when their similarity is greater than or equal to a predetermined threshold;
determining that the second voiceprint recognition model does not match the first voiceprint recognition model when their similarity is smaller than the predetermined threshold.
5. The method according to claim 1, wherein determining that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number comprises:
determining that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to 1.
6. The method according to any one of claims 1 to 5, wherein obtaining N segments of registered audio comprises:
acquiring a speech recording;
extracting effective speech from the speech recording;
and cutting the effective speech into slices to obtain the N segments of registered audio.
7. An apparatus for detecting validity of registered audio, comprising:
an acquisition unit, configured to acquire N segments of registered audio, wherein the duration of each segment of registered audio is greater than a preset duration;
a modeling unit, configured to select M segments of the registered audio from the N segments of the registered audio to obtain a plurality of modeling groups, and to establish first voiceprint recognition models in one-to-one correspondence with the modeling groups, wherein M is an integer and 1 < M < N;
a matching unit, configured to perform similarity matching between each detection audio in a detection group and the corresponding first voiceprint recognition model, wherein the detection group is formed by the registered audio, among the N segments of registered audio, that is not in the modeling group, and the registered audio in the detection group is the detection audio;
a determining unit, configured to determine that the registered audio is invalid when the number of detection audios whose similarity does not match the corresponding first voiceprint recognition model is greater than or equal to a predetermined number.
8. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein the program performs the method of any one of claims 1 to 6.
9. A detection processor for detecting the validity of registered audio, the processor being configured to run a program, wherein the program is configured to perform the method of any one of claims 1 to 6 when running.
10. An electronic device for detecting validity of registered audio, comprising: one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any of claims 1-6.
CN202011081502.3A 2020-10-12 2020-10-12 Method and device for detecting validity of registered audio and electronic equipment Active CN111933152B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011081502.3A CN111933152B (en) 2020-10-12 2020-10-12 Method and device for detecting validity of registered audio and electronic equipment
PCT/CN2021/096835 WO2022077918A1 (en) 2020-10-12 2021-05-28 Method for detecting validity of registered audio, detection apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011081502.3A CN111933152B (en) 2020-10-12 2020-10-12 Method and device for detecting validity of registered audio and electronic equipment

Publications (2)

Publication Number Publication Date
CN111933152A CN111933152A (en) 2020-11-13
CN111933152B true CN111933152B (en) 2021-01-08

Family

ID=73334367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011081502.3A Active CN111933152B (en) 2020-10-12 2020-10-12 Method and device for detecting validity of registered audio and electronic equipment

Country Status (2)

Country Link
CN (1) CN111933152B (en)
WO (1) WO2022077918A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111933152B (en) * 2020-10-12 2021-01-08 北京捷通华声科技股份有限公司 Method and device for detecting validity of registered audio and electronic equipment

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
CN1963917A (en) * 2005-11-11 2007-05-16 株式会社东芝 Method for estimating distinguish of voice, registering and validating authentication of speaker and apparatus thereof
CN102402985A (en) * 2010-09-14 2012-04-04 盛乐信息技术(上海)有限公司 Voiceprint authentication system for improving voiceprint identification security and implementation method thereof
US9978374B2 (en) * 2015-09-04 2018-05-22 Google Llc Neural networks for speaker verification
CN106971727A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of verification method of Application on Voiceprint Recognition
CN108172230A (en) * 2018-01-03 2018-06-15 平安科技(深圳)有限公司 Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model
CN108417217B (en) * 2018-01-11 2021-07-13 思必驰科技股份有限公司 Speaker recognition network model training method, speaker recognition method and system
CN108694950B (en) * 2018-05-16 2021-10-01 清华大学 A Speaker Confirmation Method Based on Deep Mixture Model
CN109065028B (en) * 2018-06-11 2022-12-30 平安科技(深圳)有限公司 Speaker clustering method, speaker clustering device, computer equipment and storage medium
CN108962284B (en) * 2018-07-04 2021-06-08 科大讯飞股份有限公司 Voice recording method and device
CN111933152B (en) * 2020-10-12 2021-01-08 北京捷通华声科技股份有限公司 Method and device for detecting validity of registered audio and electronic equipment

Also Published As

Publication number Publication date
WO2022077918A1 (en) 2022-04-21
CN111933152A (en) 2020-11-13

Similar Documents

Publication Publication Date Title
EP3611895B1 (en) Method and device for user registration, and electronic device
CN111312256B (en) Voice identification method and device and computer equipment
EP2965249B1 (en) Method and system for distinguishing humans from machines
CN108447471B (en) Speech recognition method and speech recognition device
CN108335695B (en) Voice control method, device, computer equipment and storage medium
WO2017162053A1 (en) Identity authentication method and device
EP2605182A2 (en) High-security biometric authentication system
CN111312286A (en) Age identification method, age identification device, age identification equipment and computer readable storage medium
CN107533598B (en) Input method and device of login password of application program and terminal
CN105991593A (en) Method and device for identifying risk of user
CN111785291A (en) Voice separation method and voice separation device
CN109065051B (en) Voice recognition processing method and device
CN109117622B (en) Identity authentication method based on audio fingerprints
WO2021072893A1 (en) Voiceprint clustering method and apparatus, processing device and computer storage medium
CN107766868A (en) A kind of classifier training method and device
EP3816996B1 (en) Information processing device, control method, and program
CN111933152B (en) Method and device for detecting validity of registered audio and electronic equipment
KR20180041016A (en) Method for authenticating user using hybrid biometrics information
CN113593579B (en) Voiceprint recognition method and device and electronic equipment
CN112182520B (en) Identification method and device of illegal account number, readable medium and electronic equipment
CN111653283A (en) Cross-scene voiceprint comparison method, device, equipment and storage medium
CN111930885B (en) Text topic extraction method and device and computer equipment
CN113870865A (en) Voiceprint feature updating method and device, electronic equipment and storage medium
CN113257254B (en) Voiceprint recognition method and device, electronic equipment and storage medium
CN108985035B (en) Control method, device, storage medium and electronic device for user operation authority

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant