EP1730728A1 - Method and system for fast conversion of a voice signal - Google Patents
Method and system for fast conversion of a voice signal
- Publication number
- EP1730728A1 (application EP05735426A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- acoustic characteristics
- model
- speaker
- converted
- transformation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 238000000034 method Methods 0.000 title claims abstract description 56
- 238000006243 chemical reaction Methods 0.000 title description 14
- 230000009466 transformation Effects 0.000 claims abstract description 83
- 230000001131 transforming effect Effects 0.000 claims abstract description 16
- 239000000203 mixture Substances 0.000 claims description 12
- 230000015572 biosynthetic process Effects 0.000 claims description 6
- 238000003786 synthesis reaction Methods 0.000 claims description 6
- 238000007476 Maximum Likelihood Methods 0.000 claims description 3
- 230000003111 delayed effect Effects 0.000 claims description 3
- 238000010606 normalization Methods 0.000 claims description 3
- 230000002123 temporal effect Effects 0.000 claims description 3
- 230000006870 function Effects 0.000 description 53
- 230000003595 spectral effect Effects 0.000 description 22
- 239000013598 vector Substances 0.000 description 17
- 238000004422 calculation algorithm Methods 0.000 description 3
- 238000004364 calculation method Methods 0.000 description 3
- 238000000354 decomposition reaction Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 230000009467 reduction Effects 0.000 description 3
- 230000001755 vocal effect Effects 0.000 description 3
- 230000000737 periodic effect Effects 0.000 description 2
- 238000013139 quantization Methods 0.000 description 2
- 230000001360 synchronised effect Effects 0.000 description 2
- 238000004590 computer program Methods 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 230000001186 cumulative effect Effects 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000005236 sound signal Effects 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 210000001260 vocal cord Anatomy 0.000 description 1
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Definitions
- the present invention relates to a method for converting a voice signal pronounced by a source speaker into a converted voice signal whose acoustic characteristics resemble those of a target speaker and to a system implementing such a method.
- the invention applies in particular to voice conversion applications such as voice services, human-machine spoken dialogue applications or text-to-speech synthesis.
- in such applications, the auditory result is essential and, in order to obtain acceptable quality, it is necessary to master the parameters related to the prosody of voice signals.
- the main acoustic or prosodic parameters modified during voice conversion are those relating to the spectral envelope and/or, for voiced sounds involving the vibration of the vocal cords, those relating to the periodic structure, that is, the fundamental period, whose inverse is called the fundamental frequency or "pitch".
- Conventional voice conversion methods generally comprise the determination of at least one function for transforming the acoustic characteristics of the source speaker into acoustic characteristics close to those of the target speaker, and the transformation of a voice signal to be converted by applying this or these functions. This transformation is a long operation, costly in computation time.
- transformation functions are conventionally considered as linear combinations of a large finite number of transformation elements applied to elements representative of the speech signal to be converted.
- the object of the invention is to solve these problems by defining a method and a system for fast, good-quality voice signal conversion.
- the subject of the present invention is a method of converting a voice signal pronounced by a source speaker into a converted voice signal whose acoustic characteristics resemble those of a target speaker, comprising: - determining at least one function for transforming the acoustic characteristics of the source speaker into acoustic characteristics close to those of the target speaker, from voice samples of the source and target speakers; and - transforming acoustic characteristics of the voice signal to be converted from the source speaker, by applying said at least one transformation function, characterized in that said transformation comprises a step of applying only a determined part of at least one transformation function to said signal to be converted.
- the method of the invention thus makes it possible to reduce the computation time necessary for its implementation, thanks to the application of only a determined part of at least one transformation function.
- at least the determination of one transformation function comprises a step of determining a model representing in a weighted manner the common acoustic characteristics of the voice samples of the target speaker and the source speaker over a finite set of model components, and said transformation comprises: a step of analyzing the voice signal to be converted, grouped into frames, to obtain, for each frame of samples, information relating to the acoustic characteristics; a step of determining a correspondence index between the frames to be converted and each component of said model; and a step of selecting a determined part of said components of said model as a function of said correspondence indices, said step of applying only a determined part of at least one transformation function comprising the application to said frames to be converted of only the part of said at least one transformation function corresponding to said selected components of the model; the method further includes a step of normalizing each of said correspondence indices.
- the invention also relates to a system for converting a voice signal pronounced by a source speaker into a converted voice signal whose acoustic characteristics resemble those of a target speaker, comprising: means for determining at least one function for transforming the acoustic characteristics of the source speaker into acoustic characteristics close to those of the target speaker, from voice samples of the source and target speakers; and means for transforming the acoustic characteristics of the voice signal to be converted from the source speaker by applying said at least one transformation function, characterized in that said transformation means are suitable for applying only a determined part of at least one transformation function to said signal to be converted.
- said determination means are suitable for determining at least one transformation function using a model representing in a weighted manner the common acoustic characteristics of the voice samples of the source and target speakers over a finite set of components, and the system comprises: - means for analyzing said signal to be converted, grouped into frames, to obtain, for each frame of samples, information relating to the acoustic characteristics; means for determining a correspondence index between the frames to be converted and each component of said model; and means for selecting a determined part of said components of said model as a function of said correspondence indices, said application means being adapted to apply only the determined part of said at least one transformation function corresponding to said selected components of the model.
- Voice conversion involves modifying the voice signal of a reference speaker called the source speaker, so that the signal produced seems to have been spoken by another speaker, called the target speaker.
- Such a method firstly comprises the determination of functions for transforming the acoustic or prosodic characteristics of the voice signals of the source speaker into acoustic characteristics close to those of the voice signals of the target speaker, from voice samples pronounced by the source speaker and the target speaker.
- the determination 1 of transformation functions is carried out on databases of vocal samples corresponding to the acoustic realization of the same phonetic sequences, pronounced respectively by the source and target speakers.
- This determination is designated in FIG. 1A by the general reference numeral 1 and is also commonly called “learning”.
- the method then comprises a transformation of the acoustic characteristics of a voice signal to be converted pronounced by the source speaker using the function or functions previously determined.
- This transformation is designated by the general reference numeral 2 in FIG. 1B.
- different acoustic characteristics are transformed, such as spectral envelope and/or fundamental frequency characteristics.
- the process begins with steps 4X and 4Y of analyzing voice samples spoken by the source and target speakers respectively.
- the analysis steps 4X and 4Y are based on the use of a sound signal model in the form of a sum of a harmonic signal and a noise signal, according to a model commonly called "HNM" (Harmonic plus Noise Model).
- the HNM model includes the modeling of each voice signal frame as a harmonic part representing the periodic component of the signal, consisting of a sum of L harmonic sinusoids of amplitude A_l and phase φ_l, and a noisy part representing the friction noise and the variation of the glottal excitation.
- Steps 4X and 4Y include sub-steps 8X and 8Y for estimating, for each frame, the fundamental frequency, for example by means of an autocorrelation method.
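As an illustration of sub-steps 8X and 8Y, the sketch below shows a minimal autocorrelation-based fundamental frequency estimator; the search band, windowing and function names are assumptions for illustration, not details taken from the patent.

```python
# Minimal sketch of autocorrelation-based F0 estimation (sub-steps 8X/8Y).
# The 60-400 Hz search band and the function name are illustrative assumptions.
import numpy as np

def estimate_f0(frame: np.ndarray, fs: float,
                f_min: float = 60.0, f_max: float = 400.0) -> float:
    """Return an estimate of the fundamental frequency (Hz) of one voiced frame."""
    frame = frame - frame.mean()
    # Autocorrelation of the frame, keeping non-negative lags only.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f_max)                   # shortest admissible period
    lag_max = min(int(fs / f_min), len(r) - 1)  # longest admissible period
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag
```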
- Sub-steps 8X and 8Y are each followed by a sub-step 10X and 10Y of analysis of each frame, synchronized on its fundamental frequency, which makes it possible to estimate the parameters of the harmonic part as well as the parameters of the noise part of the signal, and in particular the maximum voicing frequency.
- this frequency can be arbitrarily fixed or be estimated by other known means.
- this synchronized analysis corresponds to the determination of the parameters of the harmonics by minimization of a weighted least squares criterion between the complete signal and its harmonic decomposition, the residual corresponding, in the embodiment described, to the estimated noise signal.
- the criterion, noted E, is equal to E = Σ_n w²(n) (s(n) − h(n))², where the sum runs over the analysis window. In this equation, w(n) is the analysis window, s(n) the signal, h(n) its harmonic reconstruction, and T_j is the fundamental period of the current frame.
- the analysis window is centered on the pitch mark of the fundamental period and has a duration of twice this period.
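The pitch-synchronous weighted least squares estimation of the harmonic amplitudes and phases (sub-steps 10X and 10Y) can be sketched as follows; the Hann window and the cosine/sine parameterization are assumptions, since the text above does not specify them.

```python
# Minimal sketch of the weighted least-squares harmonic estimation of
# sub-steps 10X/10Y: minimise sum_n w(n)^2 (s(n) - h(n))^2 over one frame
# centred on a pitch mark, with a duration of two fundamental periods.
import numpy as np

def estimate_harmonics(frame: np.ndarray, fs: float, f0: float, L: int):
    """Return amplitudes A_l and phases phi_l of the L harmonics of one frame."""
    n = np.arange(len(frame)) - len(frame) // 2
    w = np.hanning(len(frame))             # analysis window w(n) (assumed Hann)
    omega0 = 2.0 * np.pi * f0 / fs
    # Design matrix of cos/sin pairs for each harmonic l = 1..L.
    B = np.hstack([np.column_stack((np.cos(l * omega0 * n),
                                    np.sin(l * omega0 * n)))
                   for l in range(1, L + 1)])
    # Weighted least squares: weight both sides by w(n).
    coef, *_ = np.linalg.lstsq(B * w[:, None], frame * w, rcond=None)
    a, b = coef[0::2], coef[1::2]          # cosine and sine coefficients
    return np.hypot(a, b), np.arctan2(-b, a)   # A_l and phi_l
```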
- for the unvoiced parts of the signal, these analyses are made asynchronously, with a fixed analysis step and a window of fixed size.
- the analysis steps 4X and 4Y finally include sub-steps 12X and 12Y of estimating the parameters of the spectral envelope of the signals, using for example a regularized discrete cepstrum method and a Bark-scale transformation to reproduce as faithfully as possible the properties of the human ear.
- the analysis steps 4X and 4Y respectively deliver, for the voice samples pronounced by the source and target speakers and for each frame of rank n of samples of the speech signals, a scalar denoted F_n representing the fundamental frequency and a vector denoted c_n comprising spectral envelope information in the form of a sequence of cepstral coefficients.
- the method of calculating cepstral coefficients corresponds to a procedure known from the state of the art and, for this reason, will not be described in more detail.
- the method of the invention therefore makes it possible to define, for each frame n of the source speaker, a vector denoted x_n comprising the cepstral coefficients c_x(n) and the fundamental frequency, and likewise, for each frame of the target speaker, a vector denoted y_n.
- Steps 4X and 4Y are followed by a step 18 of alignment between the source vectors x_n and the target vectors y_n, so as to form a pairing between these vectors, obtained by a conventional dynamic time alignment algorithm known as "DTW" (Dynamic Time Warping).
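A minimal sketch of the DTW pairing of step 18 is given below; the Euclidean local cost and the basic three-move recursion are common choices assumed here, not prescribed by the patent.

```python
# Minimal sketch of dynamic time warping (alignment step 18) pairing
# source frames x_n with target frames y_n.
import numpy as np

def dtw_align(X: np.ndarray, Y: np.ndarray):
    """X, Y: (frames, P) source/target cepstral vectors; returns index pairs."""
    nx, ny = len(X), len(Y)
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)  # local costs
    D = np.full((nx + 1, ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nx + 1):                                 # cumulative costs
        for j in range(1, ny + 1):
            D[i, j] = d[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, i, j = [], nx, ny                                    # backtrack
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin((D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])))
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return path[::-1]
```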
- the alignment step 18 is followed by a step 20 of determining a model representing in a weighted manner the common acoustic characteristics of the source speaker and the target speaker on a finite set of model components.
- in the embodiment described, the common acoustic characteristics of the target speaker and the source speaker are represented by a probabilistic model denoted "GMM" (Gaussian Mixture Model), a mixture of components formed of Gaussian densities.
- the parameters of the components are estimated from the source and target vectors containing, for each speaker, the discrete cepstrum.
- the probability density of a random variable z, denoted in general p(z), according to a GMM model of mixtures of Gaussian probability densities, is written mathematically as follows: p(z) = Σ_{i=1..Q} α_i N(z; μ_i, Σ_i), where Q is the number of components, the α_i are the mixture weights satisfying Σ_i α_i = 1, and N(z; μ_i, Σ_i) denotes the Gaussian density of mean μ_i and covariance matrix Σ_i.
- step 20 of determining the model includes a sub-step 22 of modeling the joint density p(z) of the source vectors, denoted x, and the target vectors, denoted y, so that z = [x, y] is the concatenation of each paired source and target vector and p(z) = p(x, y).
- Step 20 then includes a sub-step 24 of estimating the GMM parameters (α, μ, Σ) of the density p(z).
- This estimation can be carried out, for example, using a conventional algorithm of the so-called "EM" type (Expectation-Maximization), an iterative method leading to a maximum likelihood estimator of the model parameters given the speech sample data.
- the initial parameters of the GMM model are determined using a standard vector quantization technique.
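As a sketch of learning step 20, the joint density can be fitted as follows, assuming scikit-learn's GaussianMixture as the EM implementation and its k-means initialization as a stand-in for the vector quantization mentioned above; the value of Q is illustrative.

```python
# Minimal sketch of step 20: fit a GMM on joint vectors z_n = [x_n, y_n]
# by EM, with a k-means (VQ-like) initialisation. Q = 64 is an assumption.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_joint_gmm(X: np.ndarray, Y: np.ndarray, Q: int = 64) -> GaussianMixture:
    """X, Y: DTW-aligned (frames, P) source and target cepstral vectors."""
    Z = np.hstack((X, Y))                 # joint source-target vectors
    gmm = GaussianMixture(n_components=Q, covariance_type="full",
                          init_params="kmeans", max_iter=100)
    return gmm.fit(Z)                     # EM estimation of (alpha, mu, Sigma)
```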
- the model determination step 20 thus delivers the parameters of a mixture of Gaussian densities representative of the common acoustic characteristics of the voice samples of the source speaker and the target speaker.
- the model thus defined therefore forms a weighted representation of acoustic characteristics of the spectral envelope common to the voice samples of the target speaker and the source speaker over the finite set of components of the model.
- the method then comprises a step 30 of determining, from the model and the voice samples, a function for transforming the spectral envelope of the signal from the source speaker to the target speaker. This transformation function is determined from an estimator of the realization of the acoustic characteristics of the target speaker given the acoustic characteristics of the source speaker, formed, in the embodiment described, by the conditional expectation.
- step 30 includes a sub-step 32 for determining the conditional expectation of the acoustic characteristics of the target speaker knowing the acoustic characteristic information of the source speaker.
- the conditional expectation is noted F(x) and is determined from the following formulas: F(x) = E[y | x] = Σ_{i=1..Q} h_i(x) [μ_i^y + Σ_i^{yx} (Σ_i^{xx})^{-1} (x − μ_i^x)], with h_i(x) = α_i N(x; μ_i^x, Σ_i^{xx}) / Σ_{j=1..Q} α_j N(x; μ_j^x, Σ_j^{xx}), where μ_i^x and μ_i^y are the source and target sub-vectors of the mean μ_i, Σ_i^{xx} and Σ_i^{yx} are the corresponding sub-blocks of the covariance matrix Σ_i, and h_i(x) is the posterior probability that the source vector x is generated by the i-th component of the model.
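A direct transcription of these formulas, with the full sum over the Q components as in the state of the art, might look like the sketch below; array layouts and names are assumptions for illustration.

```python
# Minimal sketch of sub-step 32:
# F(x) = sum_i h_i(x) (mu_i^y + S_i^yx (S_i^xx)^-1 (x - mu_i^x)),
# evaluated over all Q components. P is the dimension of the cepstral vectors.
import numpy as np
from scipy.stats import multivariate_normal

def convert_frame(x, weights, means, covs, P):
    """x: (P,) source vector; means: (Q, 2P); covs: (Q, 2P, 2P); weights: (Q,)."""
    Q = len(weights)
    # Posterior probabilities h_i(x) of each component given the source vector.
    h = np.array([weights[i] * multivariate_normal.pdf(x, means[i, :P], covs[i, :P, :P])
                  for i in range(Q)])
    h /= h.sum()
    y = np.zeros(P)
    for i in range(Q):
        Sxx, Syx = covs[i, :P, :P], covs[i, P:, :P]
        y += h[i] * (means[i, P:] + Syx @ np.linalg.solve(Sxx, x - means[i, :P]))
    return y
```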
- Step 30 also includes a sub-step 34 for determining a function for transforming the fundamental frequency by scaling the fundamental frequency of the source speaker to the fundamental frequency of the target speaker.
- This step 34 is carried out in a conventional manner and can take place at any point in the process after sub-steps 8X and 8Y of estimating the fundamental frequency.
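The patent text above does not spell out the scaling; a common choice, assumed here purely for illustration, is a mean/variance alignment of the log fundamental frequencies of the two speakers.

```python
# Minimal sketch of a possible F0 scaling function for sub-step 34
# (log-domain mean/variance alignment; an assumption, not the patent's formula).
import numpy as np

def make_f0_transform(f0_source: np.ndarray, f0_target: np.ndarray):
    """Build a scaling function from voiced-frame F0 values of both speakers."""
    mx, sx = np.log(f0_source).mean(), np.log(f0_source).std()
    my, sy = np.log(f0_target).mean(), np.log(f0_target).std()
    return lambda f0: np.exp(my + (sy / sx) * (np.log(f0) - mx))
```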
- the conversion method then comprises the transformation 2 of a voice signal to be converted pronounced by the source speaker, which signal to be converted may be different from the voice signals used previously.
- This transformation 2 begins with an analysis step 36 carried out, in the embodiment described, using a decomposition according to the HNM model similar to those carried out in steps 4X and 4Y described previously.
- This step 36 makes it possible to deliver spectral envelope information in the form of cepstral coefficients, fundamental frequency information, as well as phase information and the maximum voicing frequency.
- This analysis step 36 is followed by a step 38 of determining a correspondence index between the vector to be converted and each component of the model.
- each of these indices corresponds to the posterior probability of the realization of the vector to be converted by each of the different components of the model, i.e. to the term h_j(x) defined above.
- the method then comprises a step 40 of selecting a restricted number of components of the model as a function of the correspondence indices determined in the preceding step, which restricted set is denoted S(x).
- This selection step 40 is implemented by an iterative procedure making it possible to retain a minimal set of components, these components being selected, in decreasing order of correspondence index, as long as the cumulative sum of their correspondence indices is less than a predetermined threshold.
- In a variant, this selection step comprises the selection of a fixed number of components whose correspondence indices are the highest.
- the selection step 40 is followed by a step 42 of normalizing the correspondence indices of the selected components of the model. This normalization is achieved by dividing each selected index by the sum of all the selected indices, as sketched below.
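Steps 40 and 42 can be sketched as follows: the components are kept in decreasing order of correspondence index until their cumulative sum reaches a threshold, and the kept indices are renormalized to sum to one. The threshold value is an illustrative assumption.

```python
# Minimal sketch of selection step 40 and normalisation step 42.
import numpy as np

def select_components(h: np.ndarray, threshold: float = 0.95):
    """h: (Q,) correspondence indices of one frame -> (S(x), normalised indices)."""
    order = np.argsort(h)[::-1]                 # decreasing correspondence index
    cum = np.cumsum(h[order])
    n_kept = int(np.searchsorted(cum, threshold)) + 1  # smallest set reaching it
    selected = order[:n_kept]
    return selected, h[selected] / h[selected].sum()
```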
- the method then comprises a step 43 of storing the selected model components as well as the associated normalized correspondence indices. Such a memorization step 43 is particularly useful when the analysis is carried out offline, ahead of the rest of the transformation 2, which makes it possible to prepare a subsequent conversion efficiently.
- the method then comprises a step 44 of partially applying the spectral envelope transformation function, by applying only the transformation elements corresponding to the selected model components. Only these selected transformation elements are applied to the frames of the signal to be converted, in order to reduce the time necessary for this transformation.
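Combined with the previous sketches, the partial application of step 44 evaluates the transformation elements only for the selected components S(x); this is a sketch under the same illustrative assumptions, not the patent's exact implementation.

```python
# Minimal sketch of step 44: the conversion restricted to the N selected
# components, with N typically much smaller than Q.
import numpy as np

def convert_frame_partial(x, h_norm, selected, means, covs, P):
    """Apply only the transformation elements of the selected components."""
    y = np.zeros(P)
    for h_i, i in zip(h_norm, selected):
        Sxx, Syx = covs[i, :P, :P], covs[i, P:, :P]
        y += h_i * (means[i, P:] + Syx @ np.linalg.solve(Sxx, x - means[i, :P]))
    return y
```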
- step 44 of partial application of the transformation function requires only N(P² + 1) multiplications, which are added to the Q(P² + 1) multiplications needed to determine the correspondence indices, against 2Q(P² + 1) in the state of the art. Consequently, the reduction in complexity obtained is at least of the order of Q/(Q + N).
- considered alone, step 44 of applying the transformation function requires N(P² + 1) operations against 2Q(P² + 1) in the state of the art, so that, for this step 44, the reduction in computation time is of the order of 2Q/N.
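For illustration only (the patent gives no numerical values): with Q = 64 components, P = 20 cepstral coefficients (so P² + 1 = 401) and N = 8 selected components, the conventional transformation costs 2 × 64 × 401 = 51,328 multiplications per frame, whereas the method costs (64 + 8) × 401 = 28,872, an overall gain of about 1.8; for step 44 taken alone, the gain is 2 × 64 / 8 = 16.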
- the quality of the transformation is however preserved by the application of the components having a high correspondence index with the signal to be converted.
- the method then comprises a step 46 of transforming the fundamental frequency characteristics of the speech signal to be converted, using the scaling transformation function determined in step 34 and carried out according to conventional techniques.
- the conversion method then includes a step 48 of synthesizing the output signal, carried out, in the example described, by an HNM-type synthesis which directly delivers the converted voice signal from the spectral envelope information transformed in step 44 and the fundamental frequency information delivered by step 46.
- This step 48 also uses phase and maximum voicing frequency information delivered by step 36.
- the conversion method of the invention thus makes it possible to perform a high quality conversion with low complexity and therefore a significant saving in computation time.
- FIG. 2 shows a block diagram of a voice conversion system implementing the method described with reference to FIGS. 1A and 1B. This system uses as input a database 50 of voice samples spoken by the source speaker and a database 52 containing at least the same voice samples spoken by the target speaker.
- the system comprises a module 54 for determining functions for transforming the acoustic characteristics of the source speaker into acoustic characteristics of the target speaker.
- This module 54 is suitable for implementing the learning step 1 described with reference to FIG. 1A and therefore allows the determination of at least one transformation function, in particular the spectral envelope transformation function and the fundamental frequency transformation function.
- The module 54 is suitable for determining the spectral envelope transformation function from a model representing in a weighted manner the common acoustic characteristics of the voice samples of the target speaker and the source speaker, over a finite set of model components.
- the voice conversion system receives as input a voice signal 60 corresponding to a speech signal spoken by the source speaker and intended to be converted.
- the signal 60 is introduced into an analysis module 62 implementing, for example, an HNM-type decomposition making it possible to extract spectral envelope information from the signal 60 in the form of cepstral coefficients, as well as fundamental frequency information.
- the module 62 also delivers information on phase and maximum voicing frequency obtained by the application of the HNM model.
- the module 62 therefore implements step 36 of the method as described above.
- the module 62 is implemented beforehand and the information is stored for later use.
- the system then comprises a module 64 for determining the correspondence indices between the voice signal to be converted 60 and each component of the model.
- the module 64 receives the parameters of the model determined by the module 54.
- the module 64 therefore implements step 38 of the method as described above.
- the system then comprises a module 65 for selecting components of the model, implementing the method step 40 described above and allowing the selection of components having a correspondence index reflecting a strong connection with the voice signal to be converted.
- this module 65 also performs the normalization of the correspondence indices of the selected components with respect to their sum, by implementing step 42.
- the system then comprises a module 66 for partial application of the spectral envelope transformation function determined by the module 54, by applying only the transformation elements selected by the module 65 according to the correspondence indices.
- this module 66 is suitable for implementing step 44 of partial application of the transformation function, so as to deliver as output acoustic information of the source speaker transformed by only the selected elements of the transformation function, i.e. by the components of the model having a high correspondence index with the frames of the signal to be converted 60.
- This module therefore allows rapid transformation of the voice signal to be converted, thanks to the partial application of the transformation function. The quality of the transformation is preserved by the selection of the components of the model having a high index of correspondence with the signal to be converted.
- the module 66 is also suitable for transforming the fundamental frequency characteristics, in a conventional manner, by applying the scaling transformation function according to step 46.
- the system then comprises a synthesis module 68 receiving as input the spectral envelope and fundamental frequency information transformed and delivered by the module 66, as well as the phase and maximum voicing frequency information delivered by the analysis module 62.
- the module 68 thus implements the synthesis step 48 of the method described with reference to FIG. 1B and delivers a signal 70, corresponding to the voice signal 60 of the source speaker but whose spectral envelope and fundamental frequency characteristics have been modified to be similar to those of the target speaker.
- the system described can be implemented in various ways and in particular using suitable computer programs connected to hardware sound acquisition means. This system can also be applied to specific databases in order to form databases of converted signals ready to be used.
- this system can be implemented in a first operating phase in order to deliver, for a signal database, information relating to the selected model components as well as to their respective correspondence indices, this information then being stored.
- the modules 66 and 68 of the system are then implemented later, on demand, to generate a synthetic voice signal using the voice signals to be converted and the stored information relating to the selected components and their correspondence indices, in order to obtain a maximum reduction in computation time.
- the method of the invention and the corresponding system can also be implemented in real time.
- the method of the invention and the corresponding system are suitable for the determination of several transformation functions.
- a first and a second function are determined for transforming, respectively, the spectral envelope parameters and the fundamental frequency parameters of voiced frames, and a third function is determined for transforming unvoiced frames.
- a step of separating, in the voice signal to be converted, voiced and unvoiced frames, and one or more steps of transforming each of these sets of frames, are therefore provided.
- in a variant, only one or more of the transformation functions are partially applied in order to decrease the processing time.
- the voice conversion is carried out by transforming the spectral envelope characteristics and the fundamental frequency characteristics separately, only the spectral envelope transformation function being partially applied.
- the system is suitable for the implementation of all the steps of the method described with reference to FIGS. 1A and 1B.
- the HNM and GMM models can be replaced by other techniques and models known to those skilled in the art.
- in variants, the analysis is carried out using techniques called LPC (Linear Predictive Coding), sinusoidal models or MBE (Multi-Band Excited), and the spectral parameters are parameters called LSF (Line Spectral Frequencies), or even parameters linked to formants or to a glottal signal.
- in another variant, the GMM model is replaced by a fuzzy vector quantization (Fuzzy VQ).
- the estimator implemented during step 30 can be a maximum a posteriori criterion, called "MAP", corresponding to carrying out the expectation calculation only for the model component best representing the source-target vector pair.
- the determination of a transformation function is carried out using a so-called least squares technique instead of the estimation of the joint density described.
- in this variant, the determination of a transformation function comprises modeling the probability density of the source vectors using a GMM model and then determining the parameters of the model using an EM algorithm. The modeling thus takes into account the speech segments of the source speaker for which no corresponding utterances by the target speaker are available. The determination then includes the minimization of a least squares criterion between target and source parameters to obtain the transformation function. It should be noted that the estimator of this function is always expressed in the same way, but the parameters are estimated differently and additional data are taken into account.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Stereophonic System (AREA)
- Circuit For Audible Band Transducer (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0403405A FR2868587A1 (fr) | 2004-03-31 | 2004-03-31 | Procede et systeme de conversion rapides d'un signal vocal |
PCT/FR2005/000607 WO2005106853A1 (fr) | 2004-03-31 | 2005-03-14 | Procede et systeme de conversion rapides d'un signal vocal |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1730728A1 true EP1730728A1 (fr) | 2006-12-13 |
Family
ID=34944345
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP05735426A Withdrawn EP1730728A1 (fr) | 2004-03-31 | 2005-03-14 | Procede et systeme de conversion rapides d'un signal vocal |
Country Status (4)
Country | Link |
---|---|
US (1) | US7792672B2 (fr) |
EP (1) | EP1730728A1 (fr) |
FR (1) | FR2868587A1 (fr) |
WO (1) | WO2005106853A1 (fr) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006099467A2 (fr) * | 2005-03-14 | 2006-09-21 | Voxonic, Inc. | Systeme et procede de selection et de classement automatique de donneur pour la conversion vocale |
CN101351841B (zh) * | 2005-12-02 | 2011-11-16 | 旭化成株式会社 | 音质转换系统 |
US20070213987A1 (en) * | 2006-03-08 | 2007-09-13 | Voxonic, Inc. | Codebook-less speech conversion method and system |
JP4966048B2 (ja) * | 2007-02-20 | 2012-07-04 | 株式会社東芝 | 声質変換装置及び音声合成装置 |
EP1970894A1 (fr) * | 2007-03-12 | 2008-09-17 | France Télécom | Procédé et dispositif de modification d'un signal audio |
ES2895268T3 (es) * | 2008-03-20 | 2022-02-18 | Fraunhofer Ges Forschung | Aparato y método para modificar una representación parametrizada |
JP5038995B2 (ja) * | 2008-08-25 | 2012-10-03 | 株式会社東芝 | 声質変換装置及び方法、音声合成装置及び方法 |
CN102257566A (zh) * | 2008-12-19 | 2011-11-23 | 皇家飞利浦电子股份有限公司 | 用于适配通信的方法和系统 |
TWI391876B (zh) * | 2009-02-16 | 2013-04-01 | Inst Information Industry | 利用多重模組混合圖形切割之前景偵測方法、系統以及電腦程式產品 |
DE102009013020A1 (de) * | 2009-03-16 | 2010-09-23 | Hayo Becks | Vorrichtung und Verfahren zur Anpassung von Klangbildern |
US8321209B2 (en) * | 2009-11-10 | 2012-11-27 | Research In Motion Limited | System and method for low overhead frequency domain voice authentication |
JP5961950B2 (ja) * | 2010-09-15 | 2016-08-03 | ヤマハ株式会社 | 音声処理装置 |
US8620646B2 (en) * | 2011-08-08 | 2013-12-31 | The Intellisis Corporation | System and method for tracking sound pitch across an audio signal using harmonic envelope |
US9520138B2 (en) * | 2013-03-15 | 2016-12-13 | Broadcom Corporation | Adaptive modulation filtering for spectral feature enhancement |
WO2016042626A1 (fr) | 2014-09-17 | 2016-03-24 | 株式会社東芝 | Appareil de traitement de la parole, procédé de traitement de la parole, et programme |
US20190019500A1 (en) * | 2017-07-13 | 2019-01-17 | Electronics And Telecommunications Research Institute | Apparatus for deep learning based text-to-speech synthesizing by using multi-speaker data and method for the same |
US20190362737A1 (en) * | 2018-05-25 | 2019-11-28 | i2x GmbH | Modifying voice data of a conversation to achieve a desired outcome |
US11380345B2 (en) * | 2020-10-15 | 2022-07-05 | Agora Lab, Inc. | Real-time voice timbre style transform |
CN112750446B (zh) * | 2020-12-30 | 2024-05-24 | 标贝(青岛)科技有限公司 | 语音转换方法、装置和系统及存储介质 |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1993018505A1 (fr) * | 1992-03-02 | 1993-09-16 | The Walt Disney Company | Systeme de transformation vocale |
US5572624A (en) * | 1994-01-24 | 1996-11-05 | Kurzweil Applied Intelligence, Inc. | Speech recognition system accommodating different sources |
ATE277405T1 (de) * | 1997-01-27 | 2004-10-15 | Microsoft Corp | Stimmumwandlung |
US6029124A (en) * | 1997-02-21 | 2000-02-22 | Dragon Systems, Inc. | Sequential, nonparametric speech recognition and speaker identification |
US6336092B1 (en) * | 1997-04-28 | 2002-01-01 | Ivl Technologies Ltd | Targeted vocal transformation |
US6317710B1 (en) * | 1998-08-13 | 2001-11-13 | At&T Corp. | Multimedia search apparatus and method for searching multimedia content using speaker detection by audio data |
US6879952B2 (en) * | 2000-04-26 | 2005-04-12 | Microsoft Corporation | Sound source separation using convolutional mixing and a priori sound source knowledge |
WO2002067245A1 (fr) * | 2001-02-16 | 2002-08-29 | Imagination Technologies Limited | Verification de haut-parleurs |
US7412377B2 (en) * | 2003-12-19 | 2008-08-12 | International Business Machines Corporation | Voice model for speech processing based on ordered average ranks of spectral features |
2004
- 2004-03-31 FR FR0403405A patent/FR2868587A1/fr active Pending
2005
- 2005-03-14 EP EP05735426A patent/EP1730728A1/fr not_active Withdrawn
- 2005-03-14 WO PCT/FR2005/000607 patent/WO2005106853A1/fr not_active Application Discontinuation
- 2005-03-14 US US10/591,599 patent/US7792672B2/en not_active Expired - Fee Related
Non-Patent Citations (2)
Title |
---|
None * |
See also references of WO2005106853A1 * |
Also Published As
Publication number | Publication date |
---|---|
US7792672B2 (en) | 2010-09-07 |
WO2005106853A1 (fr) | 2005-11-10 |
US20070192100A1 (en) | 2007-08-16 |
FR2868587A1 (fr) | 2005-10-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1730729A1 (fr) | Procede et systeme ameliores de conversion d'un signal vocal | |
WO2005106853A1 (fr) | Procede et systeme de conversion rapides d'un signal vocal | |
Ye et al. | Quality-enhanced voice morphing using maximum likelihood transformations | |
EP1606792B1 (fr) | Procede d analyse d informations de frequence fondament ale et procede et systeme de conversion de voix mettant en oeuvre un tel procede d analyse | |
Mowlaee et al. | Interspeech 2014 special session: Phase importance in speech processing applications | |
EP2202723B1 (fr) | Procédé et système pour authentifier un locuteur | |
RU2568278C2 (ru) | Расширение полосы пропускания звукового сигнала нижней полосы | |
JPH075892A (ja) | 音声認識方法 | |
US7505950B2 (en) | Soft alignment based on a probability of time alignment | |
Guglani et al. | Automatic speech recognition system with pitch dependent features for Punjabi language on KALDI toolkit | |
EP1526508A1 (fr) | Procédé de sélection d'unités de synthèse | |
EP1275109B2 (fr) | Méthode et dispositif d'enrichissement spectral | |
EP2795618A1 (fr) | Procédé de détection d'une bande de fréquence prédéterminée dans un signal de données audio, dispositif de détection et programme d'ordinateur correspondant | |
EP1846918B1 (fr) | Procede d'estimation d'une fonction de conversion de voix | |
EP1895433A1 (fr) | Procédé d'estimation de phase pour la modélisation sinusoidale d'un signal numérique | |
Berisha et al. | Bandwidth extension of speech using perceptual criteria | |
En-Najjary et al. | Fast GMM-based voice conversion for text-to-speech synthesis systems. | |
EP1194923B1 (fr) | Procedes et dispositifs d'analyse et de synthese audio | |
WO2008081141A2 (fr) | Codage d'unites acoustiques par interpolation | |
EP1605440A1 (fr) | Procédé de séparation de signaux sources à partir d'un signal issu du mélange | |
Grekas | On Speaker Interpolation and Speech Conversion for parallel corpora. | |
Collen | Bandwidth extension tools for audio digital signals | |
Mohammadi et al. | Nearest neighbor approach in speaker adaptation for HMM-based speech synthesis | |
WO2001003119A1 (fr) | Codage et decodage audio incluant des composantes non harmoniques du signal | |
Petrinovic | Harmonic weighting for all-pole modeling of the voiced speech. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20060823 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU MC NL PL PT RO SE SI SK TR |
|
DAX | Request for extension of the european patent (deleted) | ||
17Q | First examination report despatched |
Effective date: 20110520 |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: ORANGE |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: GRANT OF PATENT IS INTENDED |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: G10L 21/00 20130101ALI20170621BHEP Ipc: G10L 21/013 20130101AFI20170621BHEP |
|
INTG | Intention to grant announced |
Effective date: 20170724 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20171205 |