CN109192216A - Simulation-based acquisition method and acquisition device for a voiceprint recognition training dataset - Google Patents

Info

Publication number
CN109192216A
Authority
CN
China
Prior art keywords
voice
data
emulation
noise
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810895193.XA
Other languages
Chinese (zh)
Inventor
刘晓鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianzhi Technology (tianjin) Co Ltd
Original Assignee
Lianzhi Technology (tianjin) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lianzhi Technology (tianjin) Co Ltd
Priority to CN201810895193.XA
Publication of CN109192216A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/08 Use of distortion metrics or a particular distance between probe pattern and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3226 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
    • H04L9/3231 Biological data, e.g. fingerprint, voice or retina
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3263 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving certificates, e.g. public key certificate [PKC] or attribute certificate [AC]; Public key infrastructure [PKI] arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention belongs to the technical field of voiceprint acquisition and voiceprint information, and discloses a simulation-based acquisition method for a voiceprint recognition training dataset and a corresponding acquisition device. The acquisition method comprises a channel coding method, an environmental noise simulation method, a communication mode simulation method, a data decoding method and a voiceprint database construction method. The acquisition device comprises a voice acquisition module, a voice coding module, an environmental noise simulation module, a communication mode scrambling simulation module, a voice decoding module and a voiceprint database module, connected in sequence. The invention can achieve a good recognition effect, can perform noise processing, can be applied directly in the data training process, can improve the environmental noise robustness and channel robustness of voiceprint recognition, and can be used for simulation-based acquisition of voiceprint recognition training datasets under different communication modes.

Description

Simulation-based acquisition method and acquisition device for a voiceprint recognition training dataset
Technical field
The invention belongs to the technical field of voiceprint acquisition and voiceprint information, and in particular relates to a simulation-based acquisition method for a voiceprint recognition training dataset and an acquisition device therefor.
Background
Voiceprint recognition, also called speaker recognition, is a biometric identification technology in which a computer automatically determines a speaker's identity from speech. Voiceprint recognition technology can be classified in several ways depending on the application scenario: according to whether the speech content is known, it can be divided into text-dependent and text-independent recognition; according to the recognition task, it can be divided into speaker identification and speaker verification. Voiceprint recognition technology is mainly applied in fields such as public security, criminal investigation and justice, and finance.
In recent years, mainstream text-independent speaker identification (hereinafter referred to as speaker recognition) technology has been based on the Gaussian mixture model-universal background model (GMM-UBM) speaker recognition system proposed by Douglas A. Reynolds in 2000. From the perspective of speaker recognition, the GMM-UBM system provided a theoretical framework and an implementation method for measuring the similarity of two speech segments, and is of landmark significance.
Products in different scenarios are subject to different environmental noise, and even the same product is used in different background environments. A smart speaker, for example, encounters different environmental noise at home and in an office. Before voiceprint recognition is deployed, the environmental robustness of the product therefore needs to be evaluated. This index indicates how well the technology adapts to different environmental noise, and evaluating it avoids the situation in which everything works well during in-house debugging but performs poorly in the user's environment.
Voice communication refers to communication in which speech is carried over a transmission medium; landline calls, mobile phone calls, walkie-talkie calls and voice chat over a network are all voice communication. Common voice communication modes at present include fixed-line telephony, GSM (Global System for Mobile Communications) mobile telephony, China Mobile TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), China Unicom WCDMA (Wideband Code Division Multiple Access), China Telecom CDMA (Code Division Multiple Access), LTE (Long Term Evolution) and 5G, which is still under development.
The PSTN (Public Switched Telephone Network) is the telephone network commonly used in daily life. The PSTN is a circuit-switched network based on analog technology, and the speech coding algorithm it uses is G.711 in A-law or μ-law mode. VoIP (Voice over Internet Protocol, Internet telephony) commonly uses the G.723 standard of the International Telecommunication Union, specifically ACELP (Algebraic Code Excited Linear Prediction) coding. WeChat calls and QQ calls use narrowband adaptive multi-rate coding, AMR-NB (Adaptive Multi-Rate Narrow Band). 2G communication, i.e. second-generation mobile communication technology, includes the GSM communication system and the CDMA1x communication system: the 2G networks of China Mobile and China Unicom use the GSM standard, while China Telecom's 2G network uses the CDMA1x standard. GSM speech coding is regular pulse excitation long-term prediction (RPE-LTP) coding. 3G communication includes TD-SCDMA, independently formulated by China Mobile, WCDMA of China Unicom and CDMA2000 of China Telecom. TD-SCDMA and WCDMA use adaptive multi-rate coding, AMR-NB or AMR-WB (Adaptive Multi-Rate Wide Band). China Telecom's 2G and 3G networks use the enhanced variable rate codec EVRC or QCELP coding. For 4G communication, China Mobile uses the TD-LTE (Time Division Long Term Evolution) standard, while China Unicom and China Telecom use the FDD-LTE (Frequency Division Duplex Long Term Evolution) standard. 4G communication uses high-definition voice, VoLTE, whose speech coding is adaptive multi-rate coding, AMR.
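As an illustration of the μ-law mode mentioned above, the following is a minimal sketch of μ-law companding with 8-bit quantization. It uses the continuous companding formula with μ = 255 and is not a bit-exact G.711 encoder (G.711 uses a segmented approximation); the function names are hypothetical.

```python
import math

MU = 255  # companding constant of the mu-law curve


def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] to its companded value in [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    y = max(-1.0, min(1.0, y))
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(samples):
    """Compand each sample and quantize it to one byte (0..255)."""
    return bytes(round((mulaw_compress(s) + 1.0) / 2.0 * 255) for s in samples)


def decode_8bit(codes):
    """Map each byte back to a float sample in [-1, 1]."""
    return [mulaw_expand(c / 255 * 2.0 - 1.0) for c in codes]
```

Because the curve is logarithmic, small amplitudes are quantized more finely than large ones, which is why one byte per sample is acceptable for telephone speech.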
Owing to the widespread use of digital voice communication systems, sound passes through many stages between capture by the microphone and arrival at the voiceprint recognition system, including different microphone types, different audio codecs and different transmission channels, all of which affect voiceprint features. Taking the smart speaker again as an example: if a user enrolls through a mobile app but verifies by speaking directly to the speaker, the phone microphone and the speaker microphone are two different channels. This situation, known technically as channel mismatch, may reduce verification accuracy. Besides avoiding it at the product level, the performance of voiceprint recognition technology across different channels must therefore also be considered. In a real environment, the training speech and test speech available to a voiceprint recognition system are often coded differently, so voiceprint recognition faces a speech channel mismatch caused by the different coding of training and test speech, which greatly affects system performance. Solving the channel mismatch problem is one of the keys to improving speaker recognition performance and making the system more practical.
There are two technical approaches to the channel mismatch problem in voiceprint recognition. One is to study cross-channel voiceprint modeling algorithms. The main modeling algorithms of this kind are NAP (Nuisance Attribute Projection), JFA (Joint Factor Analysis), the i-vector (identity vector) speaker recognition modeling method, and speaker recognition modeling methods that combine a speech recognition DNN (Deep Neural Network) acoustic model with the i-vector model.
NAP and JFA are subspace models proposed for the channel mismatch problem. NAP directly estimates a channel subspace and then removes that subspace from the GMM (Gaussian Mixture Model) mean supervector space, reducing the interference of channel information with speaker recognition. JFA assumes that the high-dimensional space of GMM mean supervectors contains two subspaces, one holding speaker information and the other channel information; modeling the two subspaces jointly separates speaker information from channel information in speech more effectively and thus improves speaker recognition performance under complex channels. However, the channel subspace in the JFA model still contains a considerable amount of speaker information, so modeling speaker and channel separately damages speaker information. In 2010, Dehak et al. proposed the i-vector model on the basis of JFA. The i-vector model defines only one subspace, called the total variability space, which contains both speaker information and channel information. Each speech segment is then represented as a low-dimensional vector in this subspace, the i-vector. Finally, channel compensation is applied at the i-vector level to weaken the influence of the channel on speaker recognition performance. Compared with the JFA model, the i-vector model is much less complex, its channel compensation in the subspace is more flexible, and it shows better speaker recognition performance, which has made the i-vector model the most mainstream and advanced speaker recognition modeling method.
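The NAP idea of removing a channel subspace from a supervector can be sketched as an orthogonal projection, x' = x - U(Uᵀx). The sketch below assumes the channel subspace is already given as a list of orthonormal basis vectors; estimating that basis from data is not shown, and the function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nap_project(supervector, channel_basis):
    """Remove the part of `supervector` that lies in the channel subspace.

    `channel_basis` is assumed to be a list of orthonormal vectors
    spanning the (already estimated) channel subspace.
    """
    projected = list(supervector)
    for u in channel_basis:
        coeff = dot(u, supervector)  # component along this channel direction
        projected = [p - coeff * ui for p, ui in zip(projected, u)]
    return projected
```

After projection, the result has no component along any of the channel directions, so two recordings of one speaker over different channels differ less in the projected space.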
In 2014, Lei, Kenny et al. proposed a speaker recognition modeling method that combines a speech recognition DNN acoustic model with the i-vector model: when estimating the sufficient statistics of the i-vector model, the frame posterior probabilities are computed with a DNN acoustic model that classifies phoneme states in speech recognition, in place of the traditional UBM. This method reduces system modeling complexity and clearly improves recognition.
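The role of the frame posteriors can be illustrated with the zeroth-order and first-order sufficient statistics, N_c = Σ_t γ_t(c) and F_c = Σ_t γ_t(c) x_t. In the minimal sketch below, the posteriors γ are simply passed in, so they could come from a UBM or from a DNN; how they are computed is not shown, and the function name is hypothetical.

```python
def sufficient_stats(frames, posteriors):
    """Accumulate zeroth- and first-order statistics for one utterance.

    frames:     list of T feature vectors (lists of floats)
    posteriors: list of T lists, one posterior per class
                (UBM component or DNN phoneme state)
    """
    num_classes = len(posteriors[0])
    dim = len(frames[0])
    zeroth = [0.0] * num_classes
    first = [[0.0] * dim for _ in range(num_classes)]
    for x, gamma in zip(frames, posteriors):
        for c, g in enumerate(gamma):
            zeroth[c] += g
            for d in range(dim):
                first[c][d] += g * x[d]
    return zeroth, first
```

Swapping the UBM for a DNN changes only where `posteriors` comes from; the accumulation itself stays the same.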
The voiceprint recognition methods above can all be used to address channel mismatch, but each has its own problems. JFA, for example, requires a very large amount of training data and a large amount of computation at test time, and often fails to achieve good recognition results in practice. As for environmental noise, apart from noise processing, the prior art offers no technique that is applied directly in the data training process.
Summary of the invention
In view of the problems in the prior art, the present invention provides a simulation-based acquisition method for a voiceprint recognition training dataset, and an acquisition device therefor, that can solve the channel mismatch problem, achieve a good recognition effect, perform noise processing, be applied directly in the data training process, improve the environmental noise robustness and channel robustness of voiceprint recognition, and be used under different communication modes.
The invention is realized as follows. One aspect of the invention provides a simulation-based acquisition method for a voiceprint recognition training dataset, comprising a channel coding method, an environmental noise simulation method, a communication mode simulation method, a data decoding method and a voiceprint database construction method.
The channel coding method comprises the following steps:
Reading the original voice data;
Removing the file header from the voiceprint data of the original voice data according to the format standard to obtain a pure speech data block;
Selecting the voice communication mode to be simulated;
Encoding the obtained pure speech data block according to the speech coding standard corresponding to the voice communication mode to obtain compressed voice data.
The environmental noise simulation method comprises the following steps:
Selecting the environmental noise mode to be simulated;
Mixing the selected environmental noise, at the selected sound pressure level, in the same channel with the compressed voice data obtained by the channel coding method to obtain voice data containing environmental noise.
The communication mode simulation method comprises the following steps:
Obtaining the scrambling parameters of voice channel transmission under different relative amplitude and signal-to-noise ratio conditions;
Selecting scrambling parameters and performing a channel scrambling operation on the coded, noise-mixed voice data containing environmental noise from the environmental noise simulation method to obtain mixed communication mode simulated voice data containing environmental noise.
The data decoding method comprises the following steps:
Selecting the corresponding speech decoding algorithm according to the voice communication mode;
Decoding the mixed communication mode simulated voice data containing environmental noise with the corresponding speech decoding algorithm and adding the corresponding audio file header to obtain training speech with the same channel conditions as the test speech.
The voiceprint database construction method comprises the following steps:
Establishing a data sample library according to the voice and feature model data items;
Establishing a data sample library according to the voice information data items;
Establishing a database interface according to the voiceprint database interface specification.
In the simulation-based acquisition method of the invention, the original voice data is acquired first, and environmental noise simulation is then applied to the voiceprint data of the original voice data. Next, in communication mode simulation, i.e. soft simulation of the communication mode, the voice data containing environmental noise produced by the environmental noise simulation method is first speech-coded according to the communication mode to be simulated; the coded voice data is then subjected to a communication mode scrambling simulation operation according to the channel simulation scrambling parameters of that communication mode; finally, the scrambled voice data is speech-decoded to obtain the speech under that communication mode. By applying communication mode simulation to the originally acquired speech, the method gives the training speech the same channel conditions as the test speech and thus resolves channel mismatch. It can effectively perform soft channel simulation of voice communication, simulating voice communication processes such as fixed-line, 2G, 3G, 4G, VoIP virtual telephony and Internet chat, so as to obtain training speech with the same channel conditions as the test speech. It thereby effectively solves the channel mismatch problem and suits practical applications.
Another aspect of the invention provides an acquisition device for the simulation-based acquisition method described above. The acquisition device comprises a voice acquisition module, a voice coding module, an environmental noise simulation module, a communication mode scrambling simulation module, a voice decoding module and a voiceprint database module, connected in sequence in that order.
In the acquisition device of the invention, the voice acquisition module acquires the voiceprint data of the speaker's original voice data. The voice coding module speech-codes the voiceprint data of the original voice data to obtain compressed voice data under the corresponding communication mode, i.e. coded voice data. The environmental noise simulation module mixes the selected environmental noise data into the coded voice data to obtain voice data containing environmental noise, i.e. coded voice data under different operating environments. The communication mode scrambling simulation module selects the corresponding channel according to the communication mode and performs communication mode simulation on the coded voice data. The voice decoding module applies the corresponding decoding algorithm to the compressed data that has undergone environmental noise simulation and channel simulation, yielding the required output speech. The voiceprint database module builds training sample repositories from the obtained simulated voice data, one according to the voice and feature model data items and one according to the voice information data items, and provides access through a standardized data interface.
Beneficial effects of the invention:
The simulation-based acquisition method and acquisition device of the invention address the problems of the prior art, namely that a very large amount of training data is required, that the computation at test time is heavy, that good recognition results are often hard to obtain in practice, and that nothing can be applied directly in the data training process. The invention can achieve a good recognition effect, perform noise processing, be applied directly in the data training process, improve the environmental noise robustness and channel robustness of voiceprint recognition, and be used for simulation-based acquisition of voiceprint recognition training datasets under different communication modes. Specifically, the method and device have the following two advantages:
Compared with traditional voiceprint recognition methods, the invention applies environmental noise simulation and channel simulation to voiceprint recognition. Only the original voice data used for training needs to undergo environmental noise mixing and communication mode simulation to yield training speech with the same channel conditions as the test speech, which addresses the poor environmental noise robustness and channel robustness of traditional voiceprint recognition methods.
Compared with cross-channel voiceprint recognition modeling algorithms, the invention only requires environmental noise simulation and communication mode simulation of the original training speech samples and does not require changing the voiceprint recognition modeling algorithm, which reduces the complexity of the recognition algorithm. Its recognition effect and environmental adaptability are also better than those of cross-channel voiceprint recognition modeling algorithms. It is therefore better suited to building voiceprint recognition training sample libraries and meets the needs of engineering applications.
Brief description of the drawings
Fig. 1 is a flow diagram of the simulation-based acquisition method for a voiceprint recognition training dataset of the invention.
Fig. 2 is a structural block diagram of the simulation-based acquisition device for a voiceprint recognition training dataset of the invention.
Detailed description of the embodiments
Specific embodiments of the invention are described below with reference to the drawings and examples:
Embodiment 1:
A simulation-based acquisition method for a voiceprint recognition training dataset. Referring to Fig. 1, the acquisition method comprises a channel coding method, an environmental noise simulation method, a communication mode simulation method, a data decoding method and a voiceprint database construction method.
The channel coding method comprises the following steps:
Reading the original voice data;
Removing the file header from the voiceprint data of the original voice data according to the format standard to obtain a pure speech data block;
Selecting the voice communication mode to be simulated;
Encoding the obtained pure speech data block according to the speech coding standard corresponding to the voice communication mode to obtain compressed voice data.
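The header-removal and codec-selection steps above can be sketched as follows, assuming WAV input and using only the Python standard library. The mode-to-codec table is a hypothetical lookup assembled from the codecs named in the background section; the function names are illustrative, and the actual encoding step would require a real codec implementation that is not shown here.

```python
import io
import wave

# Hypothetical lookup: communication mode -> codec named in the background section.
CODEC_FOR_MODE = {
    "PSTN": "G.711",
    "VoIP": "G.723",
    "GSM": "RPE-LTP",
    "WCDMA": "AMR-NB",
    "VoLTE": "AMR",
}


def strip_wav_header(wav_bytes: bytes):
    """Return (pcm_bytes, sample_rate, sample_width, channels) without the header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        pcm = wf.readframes(wf.getnframes())
        return pcm, wf.getframerate(), wf.getsampwidth(), wf.getnchannels()


def make_wav(pcm: bytes, rate: int = 8000) -> bytes:
    """Helper for the example: wrap mono 16-bit PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def select_codec(mode: str) -> str:
    """Look up the codec to simulate; raises KeyError for an unknown mode."""
    return CODEC_FOR_MODE[mode]
```

For example, `strip_wav_header(make_wav(pcm))` returns the original `pcm` bytes together with the format parameters a codec would need.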
The environmental noise simulation method comprises the following steps:
Selecting the environmental noise mode to be simulated;
Mixing the selected environmental noise, at the selected sound pressure level, in the same channel with the compressed voice data obtained by the channel coding method to obtain voice data containing environmental noise.
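One common way to realize same-channel noise mixing is to scale the noise to a target signal-to-noise ratio and add it sample by sample. The sketch below makes two assumptions that go beyond the text: it controls the noise level through an SNR in dB rather than an absolute sound pressure level, and it operates on uncompressed floating-point samples rather than on compressed data. The function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so the speech-to-noise RMS ratio is `snr_db`.

    The noise is looped if it is shorter than the speech.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(tiled)
    if noise_rms == 0:
        return list(speech)  # silent noise: nothing to mix
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, tiled)]
```

Running the same speech through several noise types and SNR values yields one training sample per condition.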
The communication mode simulation method comprises the following steps:
Obtaining the scrambling parameters of voice channel transmission under different relative amplitude and signal-to-noise ratio conditions;
Selecting scrambling parameters and performing a channel scrambling operation on the coded, noise-mixed voice data containing environmental noise from the environmental noise simulation method to obtain mixed communication mode simulated voice data containing environmental noise.
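The text does not define the scrambling parameters in detail. As one possible reading, the sketch below treats the scrambling parameter as a bit error rate and flips bits of the coded byte stream at random; both the parameter and the function name are assumptions made for illustration, not the patented procedure.

```python
import random


def scramble(encoded: bytes, bit_error_rate: float, seed: int = 0) -> bytes:
    """Flip each bit of `encoded` independently with probability `bit_error_rate`.

    A fixed seed keeps the simulated channel reproducible.
    """
    rng = random.Random(seed)
    out = bytearray(encoded)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < bit_error_rate:
                out[i] ^= 1 << bit
    return bytes(out)
```

A lower signal-to-noise condition would map to a higher `bit_error_rate`; frame loss or burst errors could be modeled in the same style.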
The data decoding method comprises the following steps:
Selecting the corresponding speech decoding algorithm according to the voice communication mode;
Decoding the mixed communication mode simulated voice data containing environmental noise with the corresponding speech decoding algorithm and adding the corresponding audio file header to obtain training speech with the same channel conditions as the test speech.
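The decode-and-add-header step can be sketched as follows, assuming WAV output. The decoder is passed in as a callable; `linear8_to_pcm16` is only a toy stand-in (unsigned 8-bit linear codes to 16-bit samples), not one of the codecs named in the text, and all names are hypothetical.

```python
import io
import struct
import wave


def linear8_to_pcm16(encoded: bytes):
    """Toy stand-in decoder: unsigned 8-bit codes -> signed 16-bit samples."""
    return [(b - 128) << 8 for b in encoded]


def decode_and_wrap(encoded: bytes, decoder, sample_rate: int = 8000) -> bytes:
    """Decode `encoded` with `decoder` and prepend a mono 16-bit WAV header."""
    samples = decoder(encoded)
    pcm = struct.pack("<%dh" % len(samples), *samples)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()
```

Passing the decoder as an argument keeps the header handling independent of the communication mode being simulated.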
The voiceprint database construction method comprises the following steps:
Establishing a data sample library according to the voice and feature model data items;
Establishing a data sample library according to the voice information data items;
Establishing a database interface according to the voiceprint database interface specification.
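A minimal sketch of such a database, using SQLite, is given below. The schema is hypothetical: one table holds the voice information items (speaker, communication mode, noise type), another holds the voice and feature model items (audio and optional features), and two functions stand in for the database interface. The text does not specify the actual data items or interface specification.

```python
import sqlite3


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two sample tables (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS voice_info (
            id INTEGER PRIMARY KEY,
            speaker_id TEXT NOT NULL,
            comm_mode TEXT NOT NULL,
            noise_type TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS voice_data (
            info_id INTEGER PRIMARY KEY REFERENCES voice_info(id),
            audio BLOB NOT NULL,
            features BLOB
        );
        """
    )
    return conn


def add_sample(conn, speaker_id, comm_mode, noise_type, audio, features=None):
    """Store one simulated sample and return its row id."""
    cur = conn.execute(
        "INSERT INTO voice_info (speaker_id, comm_mode, noise_type) VALUES (?, ?, ?)",
        (speaker_id, comm_mode, noise_type),
    )
    conn.execute(
        "INSERT INTO voice_data (info_id, audio, features) VALUES (?, ?, ?)",
        (cur.lastrowid, audio, features),
    )
    conn.commit()
    return cur.lastrowid


def samples_for(conn, comm_mode):
    """Return (speaker_id, audio) pairs recorded under one communication mode."""
    rows = conn.execute(
        "SELECT i.speaker_id, d.audio FROM voice_info i "
        "JOIN voice_data d ON d.info_id = i.id "
        "WHERE i.comm_mode = ? ORDER BY i.id",
        (comm_mode,),
    )
    return rows.fetchall()
```

A training job could then call `samples_for(conn, "GSM")` to fetch only the speech simulated for the channel it will be tested on.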
In the simulation-based acquisition method of this embodiment, the original voice data is acquired first, and environmental noise simulation is then applied to the voiceprint data of the original voice data. Next, in communication mode simulation, i.e. soft simulation of the communication mode, the voice data containing environmental noise produced by the environmental noise simulation method is first speech-coded according to the communication mode to be simulated; the coded voice data is then subjected to a communication mode scrambling simulation operation according to the channel simulation scrambling parameters of that communication mode; finally, the scrambled voice data is speech-decoded to obtain the speech under that communication mode. By applying communication mode simulation to the originally acquired speech, the method gives the training speech the same channel conditions as the test speech and thus resolves channel mismatch. It can effectively perform soft channel simulation of voice communication, simulating voice communication processes such as fixed-line, 2G, 3G, 4G, VoIP virtual telephony and Internet chat, so as to obtain training speech with the same channel conditions as the test speech. It thereby effectively solves the channel mismatch problem and suits practical applications.
Preferably, the speech samples of the original voice data are in WAV, MP3 or AAC format.
Preferably, in the channel coding method, the voice communication mode to be simulated includes fixed-line telephone, mobile phone GSM, China Mobile TD-SCDMA, China Unicom WCDMA, China Telecom CDMA, LTE, 5G, voice recorder or Internet virtual phone.
Preferably, in the environmental noise simulation method, the environmental noise mode to be simulated includes indoor environmental noise.
Preferably, in the environmental noise simulation method, the indoor environmental noise is machine operation noise.
Preferably, in the environmental noise simulation method, the environmental noise mode to be simulated includes outdoor environmental noise.
Preferably, in the environmental noise simulation method, the outdoor environmental noise includes wind noise, rain noise, vehicle noise or machine operation noise. Here, machine operation noise refers to the running noise generated by mechanical equipment other than vehicles.
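Mixing a selected noise recording into speech at a controlled level can be sketched as follows. The patent speaks of selecting noise types and sound pressure levels; scaling the noise to a target signal-to-noise ratio, as done here, is an assumed simplification, and the function names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of float samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the speech-to-noise ratio equals snr_db.

    The noise clip is looped if it is shorter than the speech.
    """
    looped = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(looped)
    if noise_rms == 0:
        raise ValueError("noise clip is silent")
    # Choose the gain so that 20*log10(rms(speech) / rms(gain*noise)) == snr_db.
    gain = rms(speech) / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, looped)]
```

Calling `mix_at_snr(speech, wind_clip, 10.0)` with different clips (wind, rain, vehicle, machine) and different levels would yield one training variant per noise condition.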
Embodiment 2:
An acquisition device for the training dataset simulation acquisition method for voiceprint recognition described in Embodiment 1. The acquisition device comprises a voice acquisition module, a voice coding module, an environmental noise simulation module, a communication mode scrambling simulation module, a voice decoding module and a voiceprint database module. The voice acquisition module, the voice coding module, the environmental noise simulation module, the communication mode scrambling simulation module, the voice decoding module and the voiceprint database module are connected in sequence.
In the training dataset simulation acquisition device of this embodiment, the voice acquisition module acquires the voiceprint data of the speaker's original voice data. The voice coding module speech-encodes the voiceprint data of the original voice data to obtain the compressed voice data under the corresponding communication mode, that is, the coded voice data. The environmental noise simulation module mixes the selected environmental noise data into the coded voice data to obtain voice data containing environmental noise, that is, coded voice data under different working environments. The communication mode scrambling simulation module selects the corresponding channel according to the communication mode and performs communication mode simulation on the coded voice data. The voice decoding module applies the corresponding decoding algorithm to the compressed data that has passed through environmental noise simulation and channel simulation, to obtain the required output voice. The voiceprint database module builds training sample repositories from the simulated voice data obtained, organized by voice and feature model data items and by voice information data items respectively, and makes them available for calls through a standardized data interface.
It should be noted that the voice acquisition module, the voice coding module, the environmental noise simulation module, the communication mode scrambling simulation module, the voice decoding module and the voiceprint database module are connected in sequence by data lines.
Preferably, the voice acquisition module is a voice recorder.
Preferably, the environmental noise simulation module is a loudspeaker.
It should be noted that the voice coding module can use pulse code modulation, that is, PCM encoding. PCM converts a continuously varying analog signal into a digital code through three steps: sampling, quantization and encoding.
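The three PCM steps can be illustrated with a minimal sketch. The 8000 Hz sample rate and 16-bit little-endian word size are assumptions chosen for the example, not values given in the patent, and `pcm_encode` is a hypothetical name.

```python
import math
import struct

def pcm_encode(signal, duration_s, sample_rate=8000):
    """Turn a continuous signal into 16-bit linear PCM bytes.

    signal: function mapping time in seconds to an amplitude in [-1, 1].
    """
    n = round(duration_s * sample_rate)
    max_code = 2 ** 15 - 1
    codes = []
    for i in range(n):
        x = signal(i / sample_rate)             # sampling: read the signal at t = i / fs
        x = max(-1.0, min(1.0, x))              # clip to the representable range
        codes.append(int(round(x * max_code)))  # quantization: nearest integer level
    return struct.pack("<%dh" % n, *codes)      # encoding: signed 16-bit little-endian words

def tone(t):
    """A 440 Hz test tone at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)
```

For example, `pcm_encode(tone, 0.01)` produces 80 samples, or 160 bytes, of raw PCM data with no file header.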
For the communication mode scrambling simulation module, see "Communication System Simulation Based on MATLAB", Beijing University of Aeronautics and Astronautics Press, published 1 September 2007.
The voice decoding module can use an audio chip to acquire and process the voice signal, with the audio encoding and decoding algorithms integrated in hardware, for example an MP3 codec chip or a speech synthesis and analysis chip. Alternatively, an A/D acquisition card plus a computer can form the hardware platform, with the audio encoding and decoding algorithms implemented in software on the computer. As a further alternative, an A/D acquisition chip can acquire the voice signal, a programmable chip with strong data processing capability can run the speech signal processing algorithms, and an ARM (Advanced RISC Machine) processor can provide control.
For the voiceprint database module, see "Voiceprint Database Construction and Application", in the selected papers of the first national technical conference on audio-visual material examination and identification, pp. 609-611.
It should be noted that the following terms are used:
Voiceprint (voice feature and model): the general name for the speech features contained in the voice that can characterize and identify the speaker, and for the speech models built on these features (parameters).
Collected person (recording target object): the single natural person whose voice is recorded.
Voiceprint acquisition (voice recording): the process of using professional voiceprint acquisition equipment, following a defined operating procedure, to collect voice data that meets certain technical requirements.
Voice data (speech data): the voice of the collected person obtained during voiceprint acquisition, together with the other related data generated.
Effective voice (valid recorded speech): the voice in the voice data that belongs to the collected person and meets the technical parameters.
Background noise: the part of the voice data, other than silence, that is not effective voice.
For each collected person, the effective voice duration acquired in the different speaking styles should meet the following conditions: the effective duration of narrative speech is no less than 60 seconds, and the effective duration of read speech is no less than 30 seconds. Reverberation time: the reverberation time of the acquisition site is ≤ 0.4 seconds. Noise: the ambient noise of the acquisition site is ≤ 35 dB.
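The acquisition conditions above lend themselves to an automated check. The following is a minimal sketch; the `Recording` fields and the function name are hypothetical, and only the four thresholds come from the text.

```python
from dataclasses import dataclass

@dataclass
class Recording:
    narrative_speech_s: float    # effective narrative speech, seconds
    read_speech_s: float         # effective read speech, seconds
    reverberation_time_s: float  # reverberation time of the site, seconds
    ambient_noise_db: float      # ambient noise of the site, dB

def acquisition_problems(rec: Recording) -> list:
    """Return a list of the conditions that the recording fails to meet."""
    problems = []
    if rec.narrative_speech_s < 60:
        problems.append("narrative speech shorter than 60 s")
    if rec.read_speech_s < 30:
        problems.append("read speech shorter than 30 s")
    if rec.reverberation_time_s > 0.4:
        problems.append("reverberation time above 0.4 s")
    if rec.ambient_noise_db > 35:
        problems.append("ambient noise above 35 dB")
    return problems
```

An empty list means the recording meets all four conditions; otherwise each entry names a condition to fix before the sample is stored.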
The voiceprint data can be of the following three types:
(1) Basic data: the voice and feature model data prestored in the system, covering multiple languages, multiple regions and multiple channels.
(2) Questioned-sample data: the voice and feature model data of unknown speakers, submitted for storage by public security organs at all levels through query and comparison.
(3) Reference-sample data: the voice and feature model data of known speakers, submitted for storage by public security organs at all levels through acquisition.
A user submits a voice file to be checked. The system processes it into voiceprint information and runs a collision comparison in the voiceprint database. The comparison results fall into four types:
(1) Questioned sample against reference samples: the questioned audio is compared with the reference sample library. This type of result can establish identity: the voiceprints that score highest against the audio are found, and the person they belong to is then determined.
(2) Questioned sample against questioned samples: the questioned audio, whose speaker has not been identified, is compared with the set of voiceprints in the questioned-sample library whose identity has not yet been established. The highest-scoring voiceprints match voices of unknown identity that were registered earlier and carry case information. This shows whether the speaker in the audio has a case history, and allows cases to be linked so as to narrow the scope of investigation.
(3) Reference sample against questioned samples: audio whose speaker has been identified is compared with the voiceprints in the questioned-sample library whose identity has not yet been established. The highest-scoring voiceprints show whether the speaker in the audio has a case history.
(4) Reference sample against reference samples: audio whose speaker has been identified is compared with the set of identified voiceprints in the reference sample library, to determine whether the person has multiple identities and a case history.
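The four comparison types share one operation: score a query voiceprint against every entry of a library and keep the best matches. A minimal sketch follows, assuming that voiceprints are fixed-length feature vectors scored by cosine similarity. The patent does not specify the comparison engine, so the scoring function, the names and the library layout are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_matches(query, library, k=3):
    """Return the k best (voiceprint_id, score) pairs, highest score first.

    library: dict mapping a voiceprint id to its feature vector.
    """
    scored = [(vid, cosine(query, vec)) for vid, vec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# (query kind, library kind) -> what the result is used for
COMPARISON_PURPOSE = {
    ("questioned", "reference"): "establish the identity of an unknown speaker",
    ("questioned", "questioned"): "link cases that involve the same unknown speaker",
    ("reference", "questioned"): "check whether a known person has a case history",
    ("reference", "reference"): "check for multiple identities and case history",
}
```

The same `top_matches` call serves all four types; only the library passed in and the interpretation of the result change.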
(1) Significance of building the voiceprint database
Using advanced technical means and scientific management methods, the voice information of specific speakers is acquired, processed, managed and applied, so as to provide evidence for solving criminal cases and fighting crime, and to provide information support for work such as public security management and technical prevention.
(2) Logical data structure of the voiceprint database
A data resource management architecture and a data support system are established as the basis on which the whole voiceprint library system runs. The voiceprint database should include the following sub-libraries. The data structure of each library should comply with the "Voiceprint Library Data Structure Specification" and with the constraints of the other database design specifications published for the national public security field.
(3) Personnel sample library of the voiceprint database
This library contains the personal information and sample voiceprints collected by the acquisition system, and the comparison sample data submitted by users. The data in the personnel sample library corresponds to specific personal information and is organized and stored along multiple dimensions such as personnel category and personnel attributes.
(4) On-site questioned-sample library of the voiceprint database
The on-site questioned-sample library stores the case-related questioned-sample data submitted by users, including the voice involved in the case and the related background information of the case. On-site questioned-sample data should be categorized according to case characteristics.
(5) Thematic and special-project libraries of the voiceprint database
To meet the needs of business departments that set up special projects and special topics when handling all kinds of cases, multiple case-category libraries can be established in the voiceprint database. Case-related questioned samples and reference samples are stored together in the case-category library, which divides the data into sub-libraries by business characteristics and enables precise comparison over a small range of data. Examples include an anti-terrorism thematic library, a telephone fraud thematic library, an anti-drug thematic library and the 75 special-project library.
(6) Basic voiceprint library of the voiceprint database
Basic voiceprints are data prestored in the system, covering the voiceprint features of multiple languages, multiple regions and multiple channels. They help voiceprint analysis experts carry out technical research and study, and can also be used for the self-optimization of the voiceprint comparison engine.
(7) System functional structure of the voiceprint database
On top of the rich data in the core data layer, a corresponding service system needs to be established to analyze, use and manage the data in the voiceprint database. The voiceprint database system comprises a core service layer, an integrated application layer and a data exchange interface. For the system architecture diagram, see Appendix A.
(8) Core service layer of the voiceprint database
(1) Voiceprint registration service
Provides a registration engine that registers questioned-sample or reference-sample voice files as voiceprints, called by upper-layer business systems as a service;
(2) Voiceprint comparison service
Provides a voiceprint comparison service for upper-layer business systems to call. An automated voiceprint comparison engine checks data in the voiceprint database and returns comparison results that narrow the data range. The comparison results cover questioned sample against reference sample, questioned sample against questioned sample, reference sample against questioned sample, and reference sample against reference sample;
(3) Voiceprint management service
Provides management services for the data stored in the library, including modifying, deleting and querying data.
Integrated application layer
(9) User management of the voiceprint database
The voiceprint database system should use a login mode that combines a user name and password with PKI verification. Every user who logs in to the system has a corresponding user account, and strict permission control is applied to the account by role.
(10) Rights management of the voiceprint database
Users of the voiceprint database involve various roles and application terminals, so permissions should be strictly separated across system functions and data processing. The functions and data of the system can be used only when permission verification passes, and each user's operations are logged in the system automatically.
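Role-based permission checks with automatic operation logging, as described above, can be sketched as follows. The role names, the permission sets and the log record layout are hypothetical; the patent only requires that access is granted by role and that operations are logged.

```python
from datetime import datetime, timezone

# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "collector": {"register"},
    "analyst": {"register", "compare", "query"},
    "admin": {"register", "compare", "query", "modify", "delete"},
}

operation_log = []  # every attempt is recorded, whether allowed or not

def perform(user: str, role: str, action: str) -> str:
    """Check the role's permission, log the attempt, then run the action."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    operation_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"role {role!r} may not {action}")
    return f"{action} done"
```

Logging before the permission decision is enforced means that refused attempts also leave a trace, which is usually what an audit requires.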
(1) Operation monitoring
Operation monitoring includes real-time display of the data volume of the current system and the running status of tasks. Through the monitoring system, the operating status of the storage, computing and other server nodes in the system can be followed in real time.
(2) Data exchange interface
The voiceprint database of the national public security organs should be built as a two-level architecture, at the ministry level and the provincial (municipal) level. The provincial (municipal) voiceprint databases should comply with the "Voiceprint Database Access Specification" and be linked with the national library through the data exchange interface. The data exchanged should include voiceprint reporting and the issuing of comparison tasks.
(11) Voiceprint acquisition terminal of the voiceprint database
The sample acquisition system of the voiceprint database should comply with the "Voiceprint Acquisition Technical Specification". Voiceprint acquisition equipment is connected through the public security network to acquire and report voiceprint data.
(12) Voiceprint library server side of the voiceprint database
By function, the voiceprint database server side is divided into a registration subsystem, a storage subsystem and a comparison subsystem. Given the characteristics of voiceprint and voice data, each subsystem should use distributed storage and a distributed computing platform that can scale horizontally and vertically as the data processing volume requires, and that supports single-point failover and hot data backup, so that the system keeps running normally when a single server fails.
(13) Construction principles of the voiceprint database
(1) Network environment construction principles
In its functional design and planning, the voiceprint database system of the national public security organs should support a two-level architecture centered on the Ministry of Public Security, with libraries built at the provincial (municipal) level. Both the Ministry of Public Security platform and the provincial (municipal) platforms can be accessed from the terminals of business units at all levels. For the hierarchical structure diagram of the system, see Appendix B.
(2) The voiceprint database system of the national public security organs is installed and deployed on the public security internal network, and all application terminals access the application system through the public security intranet.
(14) Planning and design principles of the voiceprint database
The construction of the voiceprint library should follow these general principles:
(1) Reliability: the system design should use mature technology and equipment to reduce technical risk;
(2) Scalability: the system design should have a degree of scalability, so as to meet the needs of future business development and allow new technology and new equipment to be introduced continuously to improve overall system performance, while avoiding excessive one-off investment;
(3) Security: the overall system design should give full consideration to the security of the system, preventing and resolving technical risks;
(4) Combination of advancement and rationality: the overall system design must take into account the development and current state of the related fields of information system technology and combine advancement with rationality. It should eliminate bottlenecks in use, spend limited funds on the key aspects and avoid unnecessary waste;
(5) Openness: system construction should follow the relevant international standards. When network equipment is selected, it should be confirmed that the equipment has broad vendor and standards support, follows the main trend of network technology development, and comes with good technical support.
The preferred embodiments of the present invention have been explained in detail above with reference to the accompanying drawings, but the present invention is not limited to these embodiments. Within the knowledge of a person skilled in the art, various changes can also be made without departing from the purpose of the present invention.
Many other changes and modifications can be made without departing from the spirit and scope of the present invention. It should be understood that the present invention is not limited to the specific embodiments, and that its scope is defined by the appended claims.

Claims (10)

1. A training dataset simulation acquisition method for voiceprint recognition, the acquisition method comprising a channel coding method, an environmental noise simulation method, a communication mode simulation method, a data decoding method and a voiceprint database construction method, characterized in that:
The channel coding method comprises the following steps:
Reading the original voice data;
Removing the file header from the voiceprint data of the original voice data according to the format standard, to obtain a pure voice data block;
Selecting the voice communication mode to be simulated;
Encoding the pure voice data block according to the speech coding standard corresponding to the voice communication mode, to obtain compressed voice data;
The environmental noise simulation method comprises the following steps:
Selecting the environmental noise mode to be simulated;
Mixing, in the same channel, the selected environmental noise at the selected sound pressure level with the compressed voice data obtained by the channel coding method, to obtain voice data containing environmental noise;
The communication mode simulation method comprises the following steps:
Obtaining the scrambling parameters of voice channel transmission under different relative amplitude and signal-to-noise ratio conditions;
Selecting scrambling parameters and performing a channel scrambling operation on the voice data containing environmental noise that was obtained by encoding and noise mixing in the environmental noise simulation method, to obtain mixed communication mode simulated voice data containing environmental noise;
The data decoding method comprises the following steps:
Selecting the corresponding speech decoding algorithm according to the voice communication mode;
Decoding the mixed communication mode simulated voice data containing environmental noise with the corresponding speech decoding algorithm and adding the corresponding audio file header, to obtain training voice with the same channel conditions as the test voice;
The voiceprint database construction method comprises the following steps:
Establishing a data sample library according to the voice and feature model data items;
Establishing a data sample library according to the voice information data items;
Establishing a database interface according to the voiceprint database interface specification.
2. The training dataset simulation acquisition method for voiceprint recognition according to claim 1, characterized in that, in the channel coding method, the speech samples of the original voice data are in WAV, MP3 or AAC format.
3. The training dataset simulation acquisition method for voiceprint recognition according to claim 1, characterized in that, in the channel coding method, the voice communication mode to be simulated includes fixed-line telephone, mobile phone GSM, China Mobile TD-SCDMA, China Unicom WCDMA, China Telecom CDMA, LTE, 5G, voice recorder or Internet virtual phone.
4. The training dataset simulation acquisition method for voiceprint recognition according to claim 1, characterized in that, in the environmental noise simulation method, the environmental noise mode to be simulated includes indoor environmental noise.
5. The training dataset simulation acquisition method for voiceprint recognition according to claim 4, characterized in that, in the environmental noise simulation method, the indoor environmental noise is machine operation noise.
6. The training dataset simulation acquisition method for voiceprint recognition according to claim 1, characterized in that, in the environmental noise simulation method, the environmental noise mode to be simulated includes outdoor environmental noise.
7. The training dataset simulation acquisition method for voiceprint recognition according to claim 6, characterized in that, in the environmental noise simulation method, the outdoor environmental noise includes wind noise, rain noise, vehicle noise or machine operation noise.
8. An acquisition device for the training dataset simulation acquisition method for voiceprint recognition according to any one of claims 1 to 7, the acquisition device comprising a voice acquisition module, a voice coding module, an environmental noise simulation module, a communication mode scrambling simulation module, a voice decoding module and a voiceprint database module, characterized in that the voice acquisition module, the voice coding module, the environmental noise simulation module, the communication mode scrambling simulation module, the voice decoding module and the voiceprint database module are connected in sequence.
9. The acquisition device according to claim 8, characterized in that the voice acquisition module is a voice recorder.
10. The acquisition device according to claim 8, characterized in that the environmental noise simulation module is a loudspeaker.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810895193.XA CN109192216A (en) 2018-08-08 2018-08-08 A kind of Application on Voiceprint Recognition training dataset emulation acquisition methods and its acquisition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810895193.XA CN109192216A (en) 2018-08-08 2018-08-08 A kind of Application on Voiceprint Recognition training dataset emulation acquisition methods and its acquisition device

Publications (1)

Publication Number Publication Date
CN109192216A true CN109192216A (en) 2019-01-11

Family

ID=64920502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810895193.XA Pending CN109192216A (en) 2018-08-08 2018-08-08 A kind of Application on Voiceprint Recognition training dataset emulation acquisition methods and its acquisition device

Country Status (1)

Country Link
CN (1) CN109192216A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109920435A (en) * 2019-04-09 2019-06-21 厦门快商通信息咨询有限公司 A kind of method for recognizing sound-groove and voice print identification device
CN110390937A (en) * 2019-06-10 2019-10-29 南京硅基智能科技有限公司 A kind of across channel method for recognizing sound-groove based on ArcFace loss algorithm
CN110544469A (en) * 2019-09-04 2019-12-06 秒针信息技术有限公司 Training method and device of voice recognition model, storage medium and electronic device
CN110970035A (en) * 2019-12-06 2020-04-07 广州国音智能科技有限公司 Stand-alone speech recognition method, device and computer-readable storage medium
CN111341323A (en) * 2020-02-10 2020-06-26 厦门快商通科技股份有限公司 Voiceprint recognition training data amplification method and system, mobile terminal and storage medium
CN112802482A (en) * 2021-04-15 2021-05-14 北京远鉴信息技术有限公司 Voiceprint serial-parallel identification method, individual soldier system and storage medium
CN113160834A (en) * 2021-04-27 2021-07-23 河南能创电子科技有限公司 Low-voltage centralized reading, operation and maintenance implementation method based on AI intelligent voice recognition technology
CN113611328A (en) * 2021-06-30 2021-11-05 公安部第一研究所 Voiceprint recognition voice evaluation method and device
CN114070441A (en) * 2021-12-27 2022-02-18 北京中安智能信息科技有限公司 Underwater PCM signal receiving simulation system based on m-sequence coding
CN114783447A (en) * 2022-04-21 2022-07-22 浙江大学 Physical domain identity camouflage system and method based on adversarial samples for voiceprint recognition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101321387A (en) * 2008-07-10 2008-12-10 中国移动通信集团广东有限公司 Voiceprint recognition method and system based on communication system
CN105580071A (en) * 2013-05-06 2016-05-11 谷歌技术控股有限责任公司 Method and apparatus for training a voice recognition model database
CN106384588A (en) * 2016-09-08 2017-02-08 河海大学 Additive noise and short time reverberation combined compensation method based on vector Taylor series
CN106531155A (en) * 2015-09-10 2017-03-22 三星电子株式会社 Apparatus and method for generating acoustic model, and apparatus and method for speech recognition
CN106782565A (en) * 2016-11-29 2017-05-31 重庆重智机器人研究院有限公司 A kind of vocal print feature recognition methods and system
CN107481723A (en) * 2017-08-28 2017-12-15 清华大学 A channel matching method and device for voiceprint recognition

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109920435A (en) * 2019-04-09 2019-06-21 厦门快商通信息咨询有限公司 A kind of method for recognizing sound-groove and voice print identification device
CN110390937B (en) * 2019-06-10 2021-12-24 南京硅基智能科技有限公司 Cross-channel voiceprint recognition method based on ArcFace loss algorithm
CN110390937A (en) * 2019-06-10 2019-10-29 南京硅基智能科技有限公司 A kind of across channel method for recognizing sound-groove based on ArcFace loss algorithm
CN110544469A (en) * 2019-09-04 2019-12-06 秒针信息技术有限公司 Training method and device of voice recognition model, storage medium and electronic device
CN110544469B (en) * 2019-09-04 2022-04-19 秒针信息技术有限公司 Training method and device of voice recognition model, storage medium and electronic device
CN110970035A (en) * 2019-12-06 2020-04-07 广州国音智能科技有限公司 Stand-alone speech recognition method, device and computer-readable storage medium
CN111341323A (en) * 2020-02-10 2020-06-26 厦门快商通科技股份有限公司 Voiceprint recognition training data amplification method and system, mobile terminal and storage medium
CN112802482A (en) * 2021-04-15 2021-05-14 北京远鉴信息技术有限公司 Voiceprint serial-parallel identification method, individual soldier system and storage medium
CN113160834A (en) * 2021-04-27 2021-07-23 河南能创电子科技有限公司 Low-voltage centralized reading, operation and maintenance implementation method based on AI intelligent voice recognition technology
CN113611328A (en) * 2021-06-30 2021-11-05 公安部第一研究所 Voiceprint recognition voice evaluation method and device
CN114070441A (en) * 2021-12-27 2022-02-18 北京中安智能信息科技有限公司 Underwater PCM signal receiving simulation system based on m-sequence coding
CN114070441B (en) * 2021-12-27 2024-07-30 北京中安智能信息科技有限公司 Underwater PCM signal receiving simulation system based on m-sequence coding
CN114783447A (en) * 2022-04-21 2022-07-22 浙江大学 Physical domain identity camouflage system and method based on adversarial samples for voiceprint recognition

Similar Documents

Publication Publication Date Title
CN109192216A (en) A kind of Application on Voiceprint Recognition training dataset emulation acquisition methods and its acquisition device
CN107222865B (en) Communication swindle real-time detection method and system based on suspicious actions identification
CN111951823B (en) Audio processing method, device, equipment and medium
US7716048B2 (en) Method and apparatus for segmentation of audio interactions
US20120303369A1 (en) Energy-Efficient Unobtrusive Identification of a Speaker
CN109473108A (en) Auth method, device, equipment and storage medium based on Application on Voiceprint Recognition
CN103377651B (en) The automatic synthesizer of voice and method
CN108833722A (en) Audio recognition method, device, computer equipment and storage medium
CN110232932A (en) Method for identifying speaker, device, equipment and medium based on residual error time-delay network
CN103078995A (en) Customizable individualized response method and system used in mobile terminal
CN109873907A (en) Call processing method, device, computer equipment and storage medium
CN107481723A (en) A channel matching method and device for voiceprint recognition
CN113539232B (en) Voice synthesis method based on lesson-admiring voice data set
CN112037772A (en) Multi-mode-based response obligation detection method, system and device
Yi et al. Scenefake: An initial dataset and benchmarks for scene fake audio detection
CN117037772A (en) Voice audio segmentation method, device, computer equipment and storage medium
CN106710591A (en) Voice customer service system for power terminal
KR102389995B1 (en) Method for generating spontaneous speech, and computer program recorded on record-medium for executing method therefor
CN103474062A (en) Voice identification method
Yi et al. ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild
CN117351948A (en) Training method of voice recognition model, voice recognition method, device and equipment
CN110298150A (en) A kind of auth method and system based on speech recognition
KR102395399B1 (en) Voice data disassemble method for speech recognition learning, and computer program recorded on record-medium for executing method therefor
KR20130073643A (en) Group mapping data building server, sound recognition server and method thereof by using personalized phoneme
CN113990288A (en) Method and system for automatically generating and deploying speech synthesis model by speech customer service

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190111