WO2015012680A2 - A method for speech watermarking in speaker verification - Google Patents
A method for speech watermarking in speaker verification
- Publication number
- WO2015012680A2 (PCT/MY2014/000138)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- speaker
- speech
- speech signal
- watermarking
- voice
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/018—Audio watermarking, i.e. embedding inaudible data in the audio signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/20—Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
- Editing Of Facsimile Originals (AREA)
Abstract
The present invention relates to a method for speech watermarking in speaker verification, comprising the steps of: embedding watermark data into speech signal at a transmitter; and extracting watermark data from the speech signal at a receiver; characterised by the steps of: selecting frames having least speaker-specific information from the speech signal to carry watermark data; detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and embedding watermark data into the selected frames of the speech signal according to the presence or absence of the speaker's voice.
Description
A METHOD FOR SPEECH WATERMARKING IN SPEAKER VERIFICATION
Background of the Invention
Field of the Invention
This invention relates to a method for speech watermarking to provide a secure communication system in speaker verification, and more particularly to a method for speech watermarking by taking into account speaker-specific information and characteristics of speech features.
Description of Related Arts
Speaker verification is a process of verifying the speaker identity in a speech signal to provide secure access in a communication system, particularly in a distance communication system involving critical subject matter such as telephone banking and air traffic control. In order to establish a secure communication system, the speaker verification process is essential and needs to be employed before any further action is taken. Conventional speaker verification techniques are exposed to two possible vulnerable points: firstly, the speech could be manipulated while it is recorded, before being transmitted; and secondly, the speech signal could be manipulated as it passes through the communication channel.
There are various techniques for performing speaker verification, and one of the most well-known is speech watermarking. Speech watermarking improves the security of conventional speaker verification by embedding a watermark inside the speech signal at the transmitter side and extracting it at the receiver side. Apart from the security issues, selecting proper features is another concern for conventional speaker verification, owing to requirements on discriminant ability, reliability and robustness. Speaker recognition based on speech features has several common problems, such as long-term effects due to physiological changes, the emotional state of the speaker, illness, time of day, fatigue or tiredness, and auditory accommodation.
This is because the speaker-specific features have different concentrations in each frame of the speech signal. Other problems of feature-based speaker verification are the time and cost of training, the amount of data required for training, the level of security to be achieved, and whether to develop a text-dependent or text-independent system. Furthermore, noise in the speech signal is a major contributor to the mismatch between the training and testing phases, which can degrade speaker verification performance. Many researchers have tried to combat the effect of undesired features as well as to develop speaker modelling techniques that improve accuracy.
One prior art reference, US patent 6,892,175 B1, discloses a method for encoding a watermark in a digital message such as a speech signal. The cited patent generates a spread spectrum signal representative of the digital information and then embeds the spread spectrum signal in the speech signal. A drawback of the cited patent is that the spread spectrum signal of the watermark is embedded in all frames of the speech signal. Because a speech signal has less bandwidth than an audio signal, it can carry fewer watermark bits, which leads to lower watermark capacity. Furthermore, embedding the watermark in all frames of the speech signal may degrade the accuracy of the speaker verification while consuming more time.
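For orientation only, the sketch below illustrates the general spread-spectrum idea referred to above. It is a minimal toy example and not the algorithm of the cited patent: the chip rate, the embedding amplitude `alpha`, the shared pseudo-random sequence, and the absence of perceptual shaping and synchronization are all simplifying assumptions.

```python
import numpy as np

def spread_spectrum_embed(speech, bits, chip_rate=64, alpha=0.005, seed=0):
    """Toy spread-spectrum embedding: each watermark bit is spread over
    `chip_rate` samples with a pseudo-random +/-1 sequence and added to
    the host speech at low amplitude."""
    rng = np.random.default_rng(seed)
    pn = rng.choice([-1.0, 1.0], size=chip_rate)          # shared PN sequence
    out = np.asarray(speech, dtype=float).copy()
    for i, b in enumerate(bits):
        start, end = i * chip_rate, (i + 1) * chip_rate
        if end > len(out):
            break
        out[start:end] += alpha * (1.0 if b else -1.0) * pn  # spread the bit
    return out

def spread_spectrum_detect(watermarked, n_bits, chip_rate=64, seed=0):
    """Detect bits by correlating each segment with the same PN sequence."""
    rng = np.random.default_rng(seed)
    pn = rng.choice([-1.0, 1.0], size=chip_rate)
    bits = []
    for i in range(n_bits):
        seg = watermarked[i * chip_rate:(i + 1) * chip_rate]
        bits.append(int(np.dot(seg, pn) > 0))
    return bits
```

Because every frame carries watermark chips in this scheme, the drawbacks noted above (limited capacity in band-limited speech and processing of all frames) follow directly.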
The paper by Marcos Faundez-Zanuy et al. discloses speech watermarking that combines the spread spectrum approach with simplified frequency masking. However, that paper also does not consider the speaker-specific features when embedding the watermark data. Therefore, many challenges and opportunities in the robustness, accuracy and efficiency of speech watermarking methods are yet to be explored, particularly in distance speaker verification. Accordingly, it can be seen from the prior art that there exists a need for a speech watermarking method that is more secure while efficiently considering the speaker-specific features of the speech signal in the speaker verification process. The speech watermarking method should be robust under unintentional attacks (e.g. background noise, compression, amplitude scaling) and fragile under intentional attacks (e.g. copying, cutting or removing). The speech watermarking method must also provide enough capacity to transmit verification data through the speech signal. There is also a trade-off between capacity, inaudibility and robustness that should be considered when designing a speech watermarking method.
References
- Marcos Faundez-Zanuy et al., Pattern Recognition Journal, Elsevier, vol. 40, pp. 3027-3034, February 2007.
- Faundez-Zanuy, Marcos, Jose J. Lucena-Molina, and Martin Hagmüller, "Speech Watermarking: An Approach for the Forensic Analysis of Digital Telephonic Recordings," Journal of Forensic Sciences, vol. 55, no. 4, 2010, pp. 1080-1087.
Summary of Invention
It is an objective of the present invention to provide a robust, efficient and accurate speech watermarking method in speaker verification technique.
It is also an objective of the present invention to provide a speech watermarking method that makes use of the least speaker-specific features.
It is yet another objective of the present invention to provide a speech watermarking method that selects the frames with the least speaker-specific features to carry the watermark data.
It is a further objective of the present invention to provide an efficient speech watermarking method for a genuine distance speaker verification technique.
Accordingly, these objectives may be achieved by following the teachings of the present invention. The present invention relates to a method for speech
watermarking in speaker verification, comprising the steps of: embedding watermark data into speech signal at a transmitter; and extracting watermark data from the speech signal at a receiver; characterised by the steps of: selecting frames having least speaker-specific information from the speech signal to carry watermark data; detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and embedding watermark data into the selected frames of the speech signal according to the presence or absence of the speaker's voice.
Brief Description of the Drawings
The features of the invention will be more readily understood and appreciated from the following detailed description when read in conjunction with the accompanying drawings of the preferred embodiment of the present invention, in which:
Fig. 1 is a flow chart of a method for embedding speech watermarking in speaker verification of the present invention.
Fig. 2 is a schematic diagram of a method for speech watermarking in speaker verification of the present invention.
Fig. 3 is a flow chart of frame selection in the method of the speech watermarking in the present invention.
Fig. 4 shows a step of detecting voice activity for separating voice and non-voice frames.
Fig. 5 is a schematic diagram for a method of embedding the speech watermarking in speaker verification in the present invention.
Fig. 6 is a schematic diagram for a method of extracting speech watermarking in speaker verification in the present invention.
Detailed Description of the Invention
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely
exemplary of the invention, which may be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting but merely as a basis for the claims. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the scope of the present invention as defined by the appended claims. As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words "include," "including," and "includes" mean including, but not limited to. Further, the words "a" or "an" mean "at least one" and the word "plurality" means one or more, unless otherwise mentioned. Where abbreviations or technical terms are used, these indicate the commonly accepted meanings as known in the technical field. For ease of reference, common reference numerals will be used throughout the figures when referring to the same or similar features common to the figures. The present invention will now be described with reference to Figs. 1-6.
The present invention provides a method for speech watermarking in speaker verification, comprising the steps of:
embedding watermark data into speech signal at a transmitter; and extracting watermark data from the speech signal at a receiver; characterised by the steps of:
selecting frames having least speaker-specific information from the speech signal to carry watermark data;
detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and
embedding watermark data into the selected frames of the speech signal according to the presence or absence of the speaker's voice.
Referring to Fig. 1, the method for speech watermarking in speaker verification of the present invention comprises embedding watermark data into the speech signal.
The watermark embedding process is employed at the transmitter side, so that only the watermarked speech signal is available at the receiver. The watermarked speech signal is then transmitted over a communication channel to the receiver, where it goes through the watermark extraction method shown in Fig. 2 before being further processed.
As shown in Fig. 3, the speech signal first undergoes frame selection to prioritize the frames of the speech signal that will carry the watermark data. This is because the speaker-specific information is not uniformly distributed across all frames of the speech signal. In a preferred embodiment of the method for speech watermarking in speaker verification, the speaker-specific information depends on the system noise, the fundamental frequencies, and the system and source features of the speaker-specific information. In a preferred embodiment, the system features relate to the structure of the speaker's vocal folds, while the source features depend on the manner and vibration of the speaker's vocal cords.
Fig. 3 shows a preferred embodiment of the step of selecting frames of the speech signal. In a preferred embodiment, the fundamental frequency used for the frame selection is estimated using Linear Predictive Coding (LPC). LPC is applied to each frame of the speech signal to calculate a number of predicted dominant frequencies and the prediction error used for extracting the glottal closure instant (GCI) in the next step. In a preferred embodiment, most speaker-discriminant frequencies are located in the low frequencies below 600 Hz and the high frequencies above 3500 Hz. Some frequencies are located in the mid-frequency region of 500 Hz to 3500 Hz, which is most important for phonetic speech verification. In the preferred embodiment, phonetic speaker verification shows that stops, fricatives, nasals, diphthongs and vowels carry increasingly important speaker-specific information, in that ascending order. Said frequencies are then weighted for comparison between frames of the speech signal.
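As a concrete illustration of this step, the sketch below computes the LPC coefficients and prediction error of one frame, derives dominant frequencies from the LPC polynomial roots, and weights them by the low/high frequency bands mentioned above. It is a minimal sketch under stated assumptions, not the patented procedure: the LPC order, sampling rate and band weights are illustrative choices.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coeffs(frame, order=12):
    """Autocorrelation-method LPC for one speech frame.

    Returns the polynomial A(z) = [1, -a_1, ..., -a_p] and the residual
    prediction-error power (usable, e.g., for GCI-related processing)."""
    frame = np.asarray(frame, dtype=float)
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    a = solve_toeplitz(r[:-1], r[1:])            # predictor coefficients a_1..a_p
    err = r[0] - np.dot(a, r[1:])                # prediction-error power
    return np.concatenate(([1.0], -a)), err

def dominant_frequencies(A, fs=8000):
    """Dominant spectral frequencies from the angles of the LPC roots."""
    roots = np.roots(A)
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

def discriminant_score(freqs, lo=600.0, hi=3500.0):
    """Illustrative weighting: frequencies below `lo` Hz or above `hi` Hz get a
    larger weight (more speaker-discriminant). The weights 1.0/0.3 are
    placeholder values, not taken from the patent."""
    weights = np.where((freqs < lo) | (freqs > hi), 1.0, 0.3)
    return float(np.sum(weights))
```

A frame with a low discriminant score would then be a candidate to carry watermark data, since altering it disturbs less of the speaker-specific information.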
In the preferred embodiment of the present invention, higher-order spectral analysis (HOS), which is also used in speech processing tasks such as speech enhancement, channel selection and blind source separation, is applied to each frame to assess the Gaussianity of the speech signal. In the preferred embodiment, the variance, skewness and kurtosis are applied to select the noisiest frame among the frames of the speech signal. The noisiest frame is preferred because noise is known to be the main source of mismatch between the enrolment (training) and testing sets in speaker verification systems. In addition, noise does not carry much speaker-specific information. In a preferred embodiment of the method for speech watermarking in speaker verification, the frames with the least speaker-specific information are selected to carry the watermark data. Therefore, the embedded watermark cannot change the noisy frame severely, and the watermark will be imperceptible and inaudible.

When the frames with the least speaker-specific features have been selected, voice activity detection is applied to the selected frames to detect the presence or absence of the speaker's voice in the speech signal. The step of detecting voice activity in the speech signal categorizes the selected frames into voice and non-voice frames. In a preferred embodiment, the Magnitude Sum Function (MSF), the pitch period and the Zero Crossing Rate (ZCR) are utilized to determine the voiced and non-voiced frames. Fig. 4 shows a preferred embodiment of the voice activity detection for separating voice and non-voice frames. The ZCR counts the number of times the speech signal crosses the X (zero) axis. In a preferred embodiment, a non-voice frame has a higher ZCR than a voice frame because of its high-frequency character. The MSF, on the other hand, reflects the energy of the speech signal; in the preferred embodiment, a voice frame has more energy than a non-voice frame because of its lower frequency content. Fig. 4 also shows that the pitch period in a voice frame is higher than in a non-voice frame.
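The frame scoring and voice-activity decisions described above might be sketched as follows. The heuristics are assumptions for illustration only: the patent does not prescribe this exact combination of moments for "noisiness", the ZCR/MSF thresholds are placeholders, and pitch-period estimation is omitted.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def frame_noisiness(frame):
    """Illustrative higher-order-statistics score: a frame whose samples look
    closest to Gaussian (small |skewness| and small excess kurtosis) is
    treated as the noisiest, i.e. the least speaker-specific."""
    return 1.0 / (1e-9 + abs(skew(frame)) + abs(kurtosis(frame)))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs where the signal changes sign."""
    signs = np.sign(frame)
    return float(np.mean(signs[:-1] != signs[1:]))

def magnitude_sum(frame):
    """Magnitude Sum Function: mean absolute amplitude, an energy proxy."""
    return float(np.mean(np.abs(frame)))

def is_voiced(frame, zcr_thresh=0.3, msf_thresh=0.01):
    """Toy voice-activity decision: voiced frames tend to have low ZCR and
    high energy; the thresholds are placeholders that would need tuning."""
    return zero_crossing_rate(frame) < zcr_thresh and magnitude_sum(frame) > msf_thresh
```

In a full system, `frame_noisiness` would rank candidate frames for carrying the watermark and `is_voiced` would decide which embedding variant (voice or non-voice) applies to each selected frame.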
In a preferred embodiment of the method for speech watermarking in speaker verification, the step of embedding watermark data comprises modifying the probability distribution function of the Linear Predictive Coding (LPC) coefficients. However, it may be difficult to modify each LPC individually because the LPCs vary during the embedding and extraction process even without a speech manipulation attack. In another preferred embodiment, constants may be applied to shape the probability density function of the LPCs in the method for embedding the speech watermarking. This is done by multiplying all LPCs by one constant and adding another constant to all LPCs. These constants change the variance and the mean of the LPCs. Therefore, all LPCs of the speech frames are embedded with the watermark to increase robustness, instead of embedding the watermark in just one LPC.

Fig. 5 shows a preferred embodiment of a schematic diagram for a method of embedding the speech watermarking in speaker verification. In the preferred embodiment, the schematic diagram depicts how the probability density function is shaped by constants named alpha and beta. Fig. 6 shows a preferred embodiment of a schematic diagram for a method of extracting the speech watermarking in speaker verification. The preferred embodiment in Fig. 6 shows how the watermark may be detected by using the mean and standard deviation.
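A minimal sketch of this distribution-shaping idea is given below: one constant scales the LPCs (changing their variance) and another shifts them (changing their mean), and detection compares the observed mean and standard deviation with reference statistics. The alpha/beta values and the reference-statistics scheme are illustrative assumptions, not the constants or detector defined in the patent.

```python
import numpy as np

def embed_bit_in_lpcs(lpcs, bit, alpha=1.05, beta=0.02):
    """Shape the distribution of a frame's LPC coefficients to carry one bit:
    multiplying by `alpha` changes the variance, adding `beta` shifts the
    mean. All LPCs of the frame are modified, not just one coefficient."""
    lpcs = np.asarray(lpcs, dtype=float)
    return alpha * lpcs + beta if bit else lpcs.copy()

def detect_bit_from_lpcs(lpcs, ref_mean, ref_std, alpha=1.05, beta=0.02):
    """Decide the bit from the mean and standard deviation of the received
    LPCs, compared against reference statistics of unwatermarked LPCs."""
    m, s = float(np.mean(lpcs)), float(np.std(lpcs))
    d0 = abs(m - ref_mean) + abs(s - ref_std)                           # bit 0 hypothesis
    d1 = abs(m - (alpha * ref_mean + beta)) + abs(s - alpha * ref_std)  # bit 1 hypothesis
    return int(d1 < d0)
```

Because scaling by alpha and shifting by beta maps the mean to alpha·mean + beta and the standard deviation to alpha·std, the detector only needs these two statistics rather than the individual coefficient values, which is what makes the scheme tolerant to small per-coefficient variations.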
In a preferred embodiment of the method for speech watermarking in speaker verification, the step of extracting watermark data from the speech signal comprises the steps of:
performing synchronization of a decoder to the speech signal;
detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and
extracting watermark data from the speech signal according to the presence or absence of the speaker's voice.
In the preferred embodiment of the present invention as shown in Fig. 2, the step of extracting watermark data from the speech signal is performed on the receiver side of the communication system. When the receiver receives the watermarked speech signal, the watermarked frames must be distinguished from the non-watermarked frames. Therefore, synchronization is performed to arrange the received speech signals. The step of performing synchronization may also improve the timing and robustness between the transmitter and the receiver. Besides that, through the step of synchronization, other information such as metadata, parity, cyclic redundancy check (CRC) and watermark information may also be sent from the transmitter to the receiver.
In the step of performing synchronization, firstly, synchronization is performed for timing between the transmitter and the receiver. Secondly, based on the synchronization information, the watermarked speech signal is segmented into frames. Thirdly, Voice Activity Detection (VAD) is applied to each frame to distinguish voice from non-voice speech. Fourthly, based on the VAD decision, the type of watermarking method is identified and the LPCs are extracted from the frame. Finally, the watermark is detected based on the shape of the probability density function of the LPCs used in the method of embedding the speech watermarking.

According to the method for speech watermarking of the present invention, considering the speaker-specific information and embedding the watermark data according to the speech characteristics of the voice and non-voice frames provide a solution to the security issues in speaker verification as well as improving the efficacy and accuracy of the speaker verification. Therefore, the frames with the least speaker-specific information are selected to carry the watermark data, so as to preserve the performance of the features carrying the speaker-specific information in the speaker verification. The method of the present invention may stand alone as a method for speech watermarking, and may also be used in conventional speaker verification to solve security problems over channels without any degradation of performance, accuracy or efficiency.
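Putting these extraction steps in order, a receiver-side pipeline might be sketched as below, reusing the hypothetical helpers `lpc_coeffs`, `is_voiced` and `detect_bit_from_lpcs` from the earlier sketches. Representing synchronization as a simple sample offset, using a fixed frame length, and using one set of reference statistics for both voice and non-voice frames are all simplifying assumptions.

```python
import numpy as np

def extract_watermark(watermarked, frame_len, sync_offset, n_bits, ref_mean, ref_std):
    """Receiver-side sketch: synchronize, segment into frames, run VAD,
    extract LPCs, and detect each bit from the LPC distribution."""
    signal = np.asarray(watermarked, dtype=float)[sync_offset:]    # 1) synchronization
    bits = []
    n_frames = min(n_bits, len(signal) // frame_len)
    for i in range(n_frames):                                      # 2) segmentation
        frame = signal[i * frame_len:(i + 1) * frame_len]
        voiced = is_voiced(frame)                                  # 3) VAD decision
        A, _ = lpc_coeffs(frame)                                   # 4) LPC extraction
        # 5) a full system would switch between voice/non-voice embedding
        #    variants based on `voiced`; here one detector serves both cases
        bits.append(detect_bit_from_lpcs(A[1:], ref_mean, ref_std))
    return bits
```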
Although the present invention has been described with reference to specific embodiments, also shown in the appended figures, it will be apparent for those skilled in the art that many variations and modifications can be done within the scope of the invention as described in the specification and defined in the following claims.
Claims
A method for speech watermarking in speaker verification, comprising the steps of:
embedding watermark data into speech signal at a transmitter; and extracting watermark data from the speech signal at a receiver; characterised by the steps of:
selecting frames having least speaker-specific information from the speech signal to carry watermark data;
detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and
embedding watermark data into the selected frames of the speech signal according to the presence or absence of the speaker's voice.
A method for speech watermarking in speaker verification according to claim 1, wherein speaker-specific information depends on system noise, fundamental frequencies, system features and source features of the speaker-specific information.
A method for speech watermarking in speaker verification according to claim 1, wherein the step of extracting watermark data from the speech signal comprises the steps of:
performing synchronization of a decoder to the speech signal; detecting voice activity to detect presence or absence of speaker's voice in the speech signal; and
extracting watermark data from the speech signal according to the presence or absence of the speaker's voice.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
MYPI2013701280A MY180944A (en) | 2012-09-14 | 2013-07-22 | A method for speech watermarking in speaker verification |
MYPI2013701280 | 2013-07-22 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2015012680A2 true WO2015012680A2 (en) | 2015-01-29 |
WO2015012680A3 WO2015012680A3 (en) | 2015-03-26 |
Family
ID=51542420
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/MY2014/000138 WO2015012680A2 (en) | 2013-07-22 | 2014-05-29 | A method for speech watermarking in speaker verification |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2015012680A2 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2552722A (en) * | 2016-08-03 | 2018-02-07 | Cirrus Logic Int Semiconductor Ltd | Speaker recognition |
US10950245B2 (en) | 2016-08-03 | 2021-03-16 | Cirrus Logic, Inc. | Generating prompts for user vocalisation for biometric speaker recognition |
CN113113021A (en) * | 2021-04-13 | 2021-07-13 | 效生软件科技(上海)有限公司 | Voice biological recognition authentication real-time detection method and system |
US11269976B2 (en) | 2019-03-20 | 2022-03-08 | Saudi Arabian Oil Company | Apparatus and method for watermarking a call signal |
CN114999502A (en) * | 2022-05-19 | 2022-09-02 | 贵州财经大学 | Adaptive word framing based voice content watermark generation and embedding method and voice content integrity authentication and tampering positioning method |
CN118398021A (en) * | 2024-05-15 | 2024-07-26 | 北京和人广智科技有限公司 | Audio watermark processing method and device |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106531176B (en) * | 2016-10-27 | 2019-09-24 | 天津大学 | The digital watermarking algorithm of audio signal tampering detection and recovery |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6892175B1 (en) | 2000-11-02 | 2005-05-10 | International Business Machines Corporation | Spread spectrum signaling for speech watermarking |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0896712A4 (en) * | 1997-01-31 | 2000-01-26 | T Netix Inc | System and method for detecting a recorded voice |
-
2014
- 2014-05-29 WO PCT/MY2014/000138 patent/WO2015012680A2/en active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6892175B1 (en) | 2000-11-02 | 2005-05-10 | International Business Machines Corporation | Spread spectrum signaling for speech watermarking |
Non-Patent Citations (2)
Title |
---|
FAUNDEZ-ZANUY; MARCOS; JOSE J. LUCENA-MOLINA; MARTIN HAGMULLER.: "Speech Watermarking: An Approach for the Forensic Analysis of Digital Telephonic Recordings", JOURNAL OF FORENSIC SCIENCES, vol. 55.4, 2010, pages 1080 - 1087, XP055159377, DOI: doi:10.1111/j.1556-4029.2010.01395.x |
MARCOS FAUNDEZ-ZANUY ET AL.: "Pattern Recognition Journal", vol. 40, February 2007, ELSEVIER, pages: 3027 - 3034 |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2552722A (en) * | 2016-08-03 | 2018-02-07 | Cirrus Logic Int Semiconductor Ltd | Speaker recognition |
WO2018025024A1 (en) * | 2016-08-03 | 2018-02-08 | Cirrus Logic International Semiconductor Limited | Speaker recognition |
GB2567339A (en) * | 2016-08-03 | 2019-04-10 | Cirrus Logic Int Semiconductor Ltd | Speaker recognition |
US10726849B2 (en) | 2016-08-03 | 2020-07-28 | Cirrus Logic, Inc. | Speaker recognition with assessment of audio frame contribution |
US10950245B2 (en) | 2016-08-03 | 2021-03-16 | Cirrus Logic, Inc. | Generating prompts for user vocalisation for biometric speaker recognition |
GB2567339B (en) * | 2016-08-03 | 2022-04-06 | Cirrus Logic Int Semiconductor Ltd | Speaker recognition |
US11735191B2 (en) | 2016-08-03 | 2023-08-22 | Cirrus Logic, Inc. | Speaker recognition with assessment of audio frame contribution |
US11269976B2 (en) | 2019-03-20 | 2022-03-08 | Saudi Arabian Oil Company | Apparatus and method for watermarking a call signal |
CN113113021A (en) * | 2021-04-13 | 2021-07-13 | 效生软件科技(上海)有限公司 | Voice biological recognition authentication real-time detection method and system |
CN114999502A (en) * | 2022-05-19 | 2022-09-02 | 贵州财经大学 | Adaptive word framing based voice content watermark generation and embedding method and voice content integrity authentication and tampering positioning method |
CN114999502B (en) * | 2022-05-19 | 2023-01-06 | 贵州财经大学 | Voice content watermark generation, embedding method, voice content integrity authentication and tampering location method based on adaptive word framing |
CN118398021A (en) * | 2024-05-15 | 2024-07-26 | 北京和人广智科技有限公司 | Audio watermark processing method and device |
Also Published As
Publication number | Publication date |
---|---|
WO2015012680A3 (en) | 2015-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015012680A2 (en) | A method for speech watermarking in speaker verification | |
Mak et al. | A study of voice activity detection techniques for NIST speaker recognition evaluations | |
Cooke | A glimpsing model of speech perception in noise | |
EP2224433B1 (en) | An apparatus for processing an audio signal and method thereof | |
Nematollahi et al. | An overview of digital speech watermarking | |
Hu et al. | Segregation of unvoiced speech from nonspeech interference | |
RU2680352C1 (en) | Encoding mode determining method and device, the audio signals encoding method and device and the audio signals decoding method and device | |
Kakouros et al. | Evaluation of spectral tilt measures for sentence prominence under different noise conditions | |
Celik et al. | Pitch and duration modification for speech watermarking | |
Hummersone | A psychoacoustic engineering approach to machine sound source separation in reverberant environments | |
Wang et al. | Tampering Detection Scheme for Speech Signals using Formant Enhancement based Watermarking. | |
Li et al. | Cross-domain audio deepfake detection: Dataset and analysis | |
Ahmadi et al. | Sparse coding of the modulation spectrum for noise-robust automatic speech recognition | |
Ijitona et al. | Improved silence-unvoiced-voiced (SUV) segmentation for dysarthric speech signals using linear prediction error variance | |
Srinivasan et al. | A model for multitalker speech perception | |
Nematollahi et al. | Semifragile speech watermarking based on least significant bit replacement of line spectral frequencies | |
JP2002169579A (en) | Device for embedding additional data in audio signal and device for reproducing additional data from audio signal | |
Joglekar et al. | DeepComboSAD: Spectro-Temporal Correlation Based Speech Activity Detection for Naturalistic Audio Streams | |
Wang et al. | Watermarking of speech signals based on formant enhancement | |
Nishimura | Reversible audio data hiding based on variable error-expansion of linear prediction for segmental audio and G. 711 speech | |
Wang et al. | Speech Watermarking Based on Source-filter Model of Speech Production. | |
Mawalim et al. | Improving security in McAdams coefficient-based speaker anonymization by watermarking method | |
Patel et al. | Security Issues In Speech Watermarking For Information Transmission | |
Nematollahi et al. | Research Article Semifragile Speech Watermarking Based on Least Significant Bit Replacement of Line Spectral Frequencies | |
Heckmann et al. | Speaker independent voiced-unvoiced detection evaluated in different speaking styles. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14766550 Country of ref document: EP Kind code of ref document: A2 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14766550 Country of ref document: EP Kind code of ref document: A2 |