EP0396141A2 - System for and method of synthesizing singing in real time - Google Patents
System for and method of synthesizing singing in real time
- Publication number
- EP0396141A2 EP0396141A2 EP19900108393 EP90108393A EP0396141A2 EP 0396141 A2 EP0396141 A2 EP 0396141A2 EP 19900108393 EP19900108393 EP 19900108393 EP 90108393 A EP90108393 A EP 90108393A EP 0396141 A2 EP0396141 A2 EP 0396141A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- real time
- electrical signals
- midi
- generating
- varying
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 230000002194 synthesizing effect Effects 0.000 title claims abstract description 10
- 238000000034 method Methods 0.000 title claims description 10
- 230000001755 vocal effect Effects 0.000 claims description 8
- 230000005236 sound signal Effects 0.000 claims description 3
- 239000011295 pitch Substances 0.000 description 12
- 230000008859 change Effects 0.000 description 7
- 239000000872 buffer Substances 0.000 description 4
- 238000010586 diagram Methods 0.000 description 3
- 230000015572 biosynthetic process Effects 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 2
- 230000005284 excitation Effects 0.000 description 2
- 210000000056 organ Anatomy 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 238000012545 processing Methods 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 238000013519 translation Methods 0.000 description 2
- 238000004040 coloring Methods 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000013500 data storage Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 230000000994 depressogenic effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 230000033764 rhythmic process Effects 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H5/00—Instruments in which the tones are generated by means of electronic generators
- G10H5/005—Voice controlled instruments
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/315—Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/455—Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
Definitions
- the present invention relates to voice synthesizing and in particular to a system for the synthesizing of singing in real time.
- Speech synthesizing systems are known wherein the singing of a song is simulated.
- two systems of this type are shown in US-A-4,731,847 and US-A-4,527,274.
- the main object of the present invention is to provide a system for and method of synthesizing the singing of a song in real time.
- MIDI musical instrument digital interface hardware
- speech processor integrated circuit By making use of musical instrument digital interface hardware (MIDI) and a speech processor integrated circuit, it is possible to create a singing voice directly from a keyboard input in real time, that is, with the duration and frequency at which the keyboard keys are actuated controlling the time, loudness and pitch of the singing voice.
- any speech synthesizing device can be used in the system according to the present invention.
- the sounds can be played in real time.
- the duration of the sounds is dependent upon how long a key on the keyboard is depressed and is not determined by preprogrammed duration as described in US-A-4,527,274.
- the present invention enables one to do so.
- the quality of the sounds can be manipulated in real time by changing the vocal tract length; the dynamic values can be altered in real time; the interpolation timing or transfer function between one sound and the following sound can be altered in real time; the pitch of the sounds can be changed in real time independently of the above-mentioned parameters; and the envelope of the sounds, which includes the attack and decay, is programmable.
- the sound and pitch range of the synthesizer can be further enhanced and improved by inserting an external sound source into the chip and replacing the internal excitation function.
- the external sound can be easily taken from any sound source allowing one to feed different wave forms or noise to simulate whispering. Also chords or any composed sound could be fed into the speech synthesizer filters.
- both the sound characteristics of the speech synthesizer and the external sound sources can be drastically changed.
- the expressive subtleties of singing, such as vibrato and portamento, are also obtainable, which is not possible with the prior art devices.
- the present invention also enables one to store the MIDI-generated data in any MIDI memory device, from which it can be loaded and transferred onto storage media because of the international MIDI standard. Thus no specific computer software or hardware is needed beyond the ability to understand MIDI codes.
- the speech synthesizer and MIDI combination can be used to generate a wide variety of electronic sounds and organ-like timbres.
- the combination can also be used to generate spectra with vocal qualities resembling a voice choir, and additionally one can program sequences in such a specific way that speech-like sounds or even intelligible speech can be generated.
- the attributes of the MIDI commands allow high-resolution synchronization with the MIDI time code, and this means that one can perfectly synchronize one or more MIDI speech synthesizers with one another and with existing or prerecorded music.
- a cappella music or a synthetic vocal orchestra can be realized.
- synthesized lyrics can be programmed and edited so that a synthetic voice sings along with the music, synchronized by the MIDI time code. Because the synchronization relates proportionally to the speed of the music, the strings of speech sounds can be speeded up or slowed down in relation to the rhythm, and the result is 100% accurate timing which can hardly be achieved by a human singer.
- a phoneme editor compiler can be utilized to easily and accurately synchronize the system.
- the system comprises the combination of the MIDI interface and speech synthesis in a musical context.
- the sounds are configured on a MIDI keyboard according to the sounding properties of the speech sounds, from dark to bright vowels, followed by the voiced consonants and voiceless consonants and finally plosives, so that a great variety of musically useful timbres like formants, noise bands, stationary sounds and percussive sounds can be generated to create electronic sounds covering a broad spectrum.
- the speech synthesizer vocal tract length or filter frequency and interpolation speed can be controlled by the MIDI pitch bend and modulation wheel or any other MIDI controller.
- envelope parameters (in this case attack time and decay time) for all sounds except the plosives can be determined by specific MIDI control change sequences.
- the MIDI implementation is such that keys 36-93 control speech sounds on channels 1, 3, 5 and 7, whereas the pitch is controlled on channels 2, 4, 6 and 8 (see the routing sketch below).
- the pitch bend wheel is used to control vocal tract length, the modulation wheel controls speed of filter interpolation and the velocity controls dynamic loudness control.
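- As an illustration of the key and channel layout just described, the minimal C sketch below routes an incoming note either to a phoneme selection or to a pitch selection. It is illustrative only: the pairing of one synthesizer voice per channel pair (1/2, 3/4, 5/6, 7/8), the function names and the application of the 36-93 key limit to both event kinds are assumptions, not part of the specification.

```c
/* Minimal routing sketch for the described MIDI layout (assumed:
 * one synthesizer voice per channel pair 1/2, 3/4, 5/6, 7/8). */
#include <stdio.h>

#define KEY_LOW  36   /* lowest key mapped in Fig. 2  */
#define KEY_HIGH 93   /* highest key mapped in Fig. 2 */

typedef enum { EV_PHONEME, EV_PITCH, EV_IGNORED } EventKind;

/* Channels are numbered 1..16 as in the text: odd channels 1, 3, 5, 7
 * carry the speech-sound keys, even channels 2, 4, 6, 8 the sung pitch. */
static EventKind classify(int channel, int key, int *voice)
{
    if (channel < 1 || channel > 8 || key < KEY_LOW || key > KEY_HIGH)
        return EV_IGNORED;
    *voice = (channel - 1) / 2;            /* 0..3: which synthesizer voice */
    return (channel % 2) ? EV_PHONEME      /* 1, 3, 5, 7 -> speech sound    */
                         : EV_PITCH;       /* 2, 4, 6, 8 -> pitch control   */
}

int main(void)
{
    int voice;
    EventKind k = classify(3, 60, &voice);  /* key 60 on channel 3 */
    printf("kind=%d voice=%d\n", k, voice); /* prints kind=0 (phoneme), voice=1 */
    return 0;
}
```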
- a phoneme speech synthesizer 17 now produced by Artic, formerly by the Votrax Division of the Federal Screw Works, Troy, Mich, USA, specifically, the Votrax SC-02 speech synthesizer and the data sheet for that synthesizer is incorporated herein by reference. Also incorporated by reference is the MIDI specification 1.0 by the International MIDI Association, Sun Valley, CA, USA, which is incorporated in the MIDI interface 15.
- the system also includes a processing unit that consists of CPU 10, which may be a 6502 CPU, a ROM 12 for data storage purposes, which may be a 6116, and Address Decoding Logic 13. These parts are connected via a common computer bus 14 containing address, data and command lines. Timing is controlled by a clock generator 18.
- MIDI interface 15 Also connected to the computer bus are MIDI interface 15, mainly an ACIA 6850, and a buffer register 16, consisting of two 74LS245.
- One or more speech processor chips 17 are connected to the buffer 16. These chips are provided with additional audio circuitry: the audio output 19, which buffers the audio output of the SC-02 and makes it suitable for audio amplifiers, and the audio input 20 at pins 3 and 5, which allows an external audio signal, e.g. from a synthesizer, to be fed into the SC-02, thus replacing the internal tone generator.
- the tone source can be selected using an appropriate switch 22.
- the main task of the processing unit is to receive MIDI data from interface 15, to translate it in several ways into SC-02 specific data and to output these data via the buffer 16 to the SC-02 17.
- the SC-02 provides 8 registers, of which 5 are used in the singing process: Phoneme, Inflection, Articulation, Amplitude and Filter Frequency.
- how the received MIDI data are interpreted and translated into data for these registers can depend on the switch position of the mode selection 21 or can be controlled by specific MIDI data, e.g. Program Change events.
- the MIDI Note On events received on a MIDI channel N from keyboard 1 shown in Fig. 2 turn on a specific phoneme by writing the number of the phoneme into the Phoneme register of the SC-02.
- the phoneme number is generated by a translation table which translates the Note Number into a phoneme number.
- the Velocity of the Note On event is used to affect the Amplitude register of the SC-02.
- Note Off events received on channel N are used to turn off the phoneme by writing the code for the phoneme 'Pause' into the phoneme register.
- Note On events received on channel N + 1 are used to select the appropriate singing frequency.
- the MIDI Note On events received on a MIDI channel N are used to select the appropriate singing frequency.
- the Note Numbers of these events are translated into values for the Inflection register as above.
- the velocity of the Note On event is used to affect the Amplitude register of the SC-02.
- Note Off events received on channel N are used to turn off the phoneme by writing the value 0 into the Amplitude register.
- Program Change events turn on a specific phoneme by writing the number of the phoneme into the Phoneme register of the SC-02.
- the phoneme number is generated by a translation table which translates the Program Number into a phoneme number.
- Pitch Wheel Change events are translated into values for the Filter Frequency and Continuous Controller events for Modulation Wheel are translated into values for the Articulation register. This way of interpretation allows the user to play the SC-02 using MIDI compatible keyboard 1 in a way similar to a common expander device with special voice-like sounds.
- Two preferred implementations of the invention include using a single SC-02 with 2 modes of interpreting the MIDI events, and using four SC-02 chips with 6 modes of interpreting the MIDI events.
- a MIDI-compatible sequencer-program is designed for the system using a phoneme editor-compiler synchronizing syllables of speech to MIDI-timecode.
- Any text-editor can be used to write a "phonetic score" which determines all parameters for the artificial singer like pitch, dynamic, timing, etc.
- the digitally controlled analogue chip uses the concept of phoneme synthesis, which makes it relatively easy to program for real-time applications.
- the sounding elements of speech or singing are defined as vowels, voiced and unvoiced consonants and plosives, so that a variety of timbres like formants, noise bands and percussive sounds are generated.
- the 54 phonemes of the chip are configured to match 54 MIDI notes on any MIDI keyboard, from 36 to 93, in groups of vowels, consonants etc. as shown in Fig. 2.
- the speech synthesizer functions like a MIDI expander, to be "played" with a keyboard or driven by the "speech-sequencer" program.
- the chip is suitable as a multi-bandpass filter to process external sound sources.
- Any text to be uttered or sung is a potential "phonetic score”.
- First a text is analyzed by its phonemic structure, through a text to speech conversion either automatically by rules, or by manual input.
- syllables, or phoneme strings, have to be divided or chopped into more or less equal frames based on the 24 pulses of the MIDI clock.
- the program calculates a timebase of 24 divided by the number of phonemes, so that one syllable consisting of 7 and another consisting of 2 phonemes have the same duration. If the automatic conversion does not sound satisfying, subtle variations are possible through editing.
- the next parameter is the dynamic value "L": loudness.
- Values range from 0 to 9, configured either to a linear or logarithmic equivalent of the 1-127 velocity scale.
- the last parameter is pitch or tone "T"
- filter frequency and interpolation rate are controlled in real time by the MIDI pitch bend and modulation wheels.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
A system for synthesizing singing in real time comprises a plurality of manually actuatable keys (1) for producing different first electrical signals corresponding to each of the plurality of keys actuated and varying in real time with duration and frequency of actuation, a musical instrument digital interface (MIDI) (15) receptive of the electrical signals from the plurality of manually actuatable keys (1) for generating standard data output signals in real time for musical notes corresponding to the first electrical signals, a phoneme speech synthesizer (17) receptive of phoneme codes for generating real time analog signals corresponding to a singing voice, and a translator (10, 12, 13) receptive of the MIDI output signals for converting same in real time to phoneme codes and for applying same to the speech synthesizer (17) in real time.
Description
- The present invention relates to voice synthesizing and in particular to a system for the synthesizing of singing in real time.
- Speech synthesizing systems are known wherein the singing of a song is simulated. For example, two systems of this type are shown in US-A-4,731,847 and US-A-4,527,274.
- These known systems, while capable of synthesizing the singing of a song, are not capable of doing so in real time, wherein the frequency, duration, pitch, etc. of the synthesized voice can be varied in response to a real-time manually actuatable input from, for example, a keyboard.
- The main object of the present invention is to provide a system for and method of synthesizing the singing of a song in real time.
- The solution of this problem is apparent from claims 1 and 6, respectively, whereas further developments of the invention can be taken from claims 2 to 5 and 7 to 10.
- By making use of musical instrument digital interface hardware (MIDI) and a speech processor integrated circuit, it is possible to create a singing voice directly from a keyboard input in real time, that is, with the duration and frequency at which the keyboard keys are actuated controlling the time, loudness and pitch of the singing voice.
- By using the MIDI device to generate the codes from a keyboard input, any speech synthesizing device can be used in the system according to the present invention.
- The way that the sounds of the speech synthesizer are activated by the MIDI commands is totally open in terms of input procedure and independent of the type of hardware or manufacturer, as long as the MIDI protocol is used.
- As opposed to the conventional systems, the sounds can be played in real time. Thus the duration of the sounds depends upon how long a key on the keyboard is depressed and is not determined by a preprogrammed duration as described in US-A-4,527,274. Whereas prior art systems did not make it possible to play along with music, the present invention enables one to do so. In fact, the quality of the sounds can be manipulated in real time by changing the vocal tract length; the dynamic values can be altered in real time; the interpolation timing or transfer function between one sound and the following sound can be altered in real time; the pitch of the sounds can be changed in real time independently of the above-mentioned parameters; and the envelope of the sounds, which includes the attack and decay, is programmable.
- The sound and pitch range of the synthesizer can be further enhanced and improved by inserting an external sound source into the chip and replacing the internal excitation function. The external sound can easily be taken from any sound source, allowing one to feed in different wave forms or noise to simulate whispering. Also, chords or any composed sound could be fed into the speech synthesizer filters. Thus both the sound characteristics of the speech synthesizer and the external sound sources can be drastically changed. The expressive subtleties of singing, such as vibrato and portamento, are also obtainable, which is not possible with the prior art devices.
- The present invention also enables one to store the MIDI-generated data in any MIDI memory device, from which it can be loaded and transferred onto storage media because of the international MIDI standard. Thus no specific computer software or hardware is needed beyond the ability to understand MIDI codes.
- The speech synthesizer and MIDI combination can be used to generate a wide variety of electronic sounds and organ-like timbres. The combination can also be used to generate spectra with vocal qualities resembling a voice choir, and additionally one can program sequences in such a specific way that speech-like sounds or even intelligible speech can be generated.
- The attributes of the MIDI commands allow high-resolution synchronization with the MIDI time code, which means that one can perfectly synchronize one or more MIDI speech synthesizers with one another and with existing or prerecorded music. In addition, with multitrack recording, a cappella music or a synthetic vocal orchestra can be realized.
- With any sequencer, synthesized lyrics can be programmed and edited so that a synthetic voice sings along with the music, synchronized by the MIDI time code. Because the synchronization relates proportionally to the speed of the music, the strings of speech sounds can be speeded up or slowed down in relation to the rhythm, and the result is 100% accurate timing which can hardly be achieved by a human singer.
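- The proportional relation between the MIDI clock and absolute time can be pictured with the small sketch below; the 24 pulses per quarter note are standard MIDI, while the function name and the example tempos are merely illustrative.

```c
/* Duration of n MIDI clock pulses at a given tempo. MIDI transmits
 * 24 clock pulses per quarter note, so one pulse lasts
 * 60 / (bpm * 24) seconds. */
#include <stdio.h>

static double clocks_to_seconds(int clocks, double bpm)
{
    return (double)clocks * 60.0 / (bpm * 24.0);
}

int main(void)
{
    /* One 24-pulse frame: 0.5 s at 120 bpm, 1.0 s at 60 bpm, i.e. the
     * same phoneme string simply follows the tempo of the music. */
    printf("%.3f s\n", clocks_to_seconds(24, 120.0));
    printf("%.3f s\n", clocks_to_seconds(24, 60.0));
    return 0;
}
```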
- Moreover, a phoneme editor compiler can be utilized to easily and accurately synchronize the system.
- In accordance with the invention, the system comprises the combination of the MIDI interface and speech synthesis in a musical context.
- The sounds are configured on a MIDI keyboard according to the sounding properties of the speech sounds, from dark to bright vowels, followed by the voiced consonants and voiceless consonants and finally plosives, so that a great variety of musically useful timbres like formants, noise bands, stationary sounds and percussive sounds can be generated to create electronic sounds covering a broad spectrum.
- Different vocal qualities can be obtained, and the speech synthesizer's array of narrow band-pass filters allows a wide range from subtle coloring to extreme distortion of sound sources.
- By using more than one speech synthesizer at a time, choir or organ-like effects can be played.
- According to the invention, five modes of operation are possible:
- 1. Sequencer mode, wherein the speech sounds can be played by a keyboard, or up to N identical or N different sounds can be called up by any MIDI sequencer.
- 2. Polyphonic mode, wherein a keyboard generates N identical speech sounds which can be played at different pitches and selected by MIDI program change sequences.
- 3. Monophonic mode, wherein N different speech sounds can be combined like an organ register and played by a single key at a time.
- 4. Filter mode wherein the filters of the speech synthesizer are in stationary mode for the applied external sound sources and the filter parameters are controlled by the MIDI program change.
- 5. Split mode wherein a combination of sequencer mode and N-voice polyphonic mode is achieved.
- In addition to this, the speech synthesizer vocal tract length or filter frequency and interpolation speed can be controlled by the MIDI pitch bend and modulation wheel or any other MIDI controller. Additionally, envelope parameters (in this case attack time and decay time) for all sounds except the plosives can be determined by specific MIDI control change sequences. By using an external sound source instead of the internal excitation signal, the quality of the speech sounds can be altered, enhanced and improved.
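- A minimal sketch of how such control change sequences might set the envelope is given below; the controller numbers 72 and 73 and the millisecond scaling are assumptions chosen for illustration, since the specification does not define them.

```c
/* Envelope control via MIDI Control Change, sketched with assumed
 * controller numbers: CC 73 sets the attack time and CC 72 the decay
 * time for all non-plosive sounds. Scaling factors are also assumed. */
#include <stdio.h>

static int attack_ms = 10, decay_ms = 80;     /* current envelope settings */

void control_change(int controller, int value)   /* value: 0..127 */
{
    if (controller == 73)
        attack_ms = value * 4;                /* 0..508 ms, assumed range  */
    else if (controller == 72)
        decay_ms = value * 8;                 /* 0..1016 ms, assumed range */
}

int main(void)
{
    control_change(73, 25);
    control_change(72, 50);
    printf("attack %d ms, decay %d ms\n", attack_ms, decay_ms);
    return 0;
}
```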
- In accordance with the invention, the MIDI implementation is such that keys 36-93 control speech sounds on channels 1, 3, 5 and 7, whereas the pitch is controlled on channels 2, 4, 6 and 8. The pitch bend wheel is used to control the vocal tract length, the modulation wheel controls the speed of the filter interpolation, and the velocity controls the dynamic loudness.
- These and other features and advantages of the present invention will be seen from the following detailed description in conjunction with the attached referenced drawings, wherein:
- Fig. 1 is a block diagram of a system according to the invention;
- Fig. 2 is an illustration of a MIDI keyboard and the voice sounds associated with each key;
- Fig. 3 is a circuit diagram of an analog circuit of a speech synthesizer; and
- Fig. 4 is a circuit diagram of a digital circuit of a speech synthesizer.
- Referring to Fig. 1, it is to be noted that the system employs a phoneme speech synthesizer 17 now produced by Artic, formerly by the Votrax Division of the Federal Screw Works, Troy, Mich., USA, specifically the Votrax SC-02 speech synthesizer, and the data sheet for that synthesizer is incorporated herein by reference. Also incorporated by reference is the MIDI specification 1.0 by the International MIDI Association, Sun Valley, CA, USA, which is incorporated in the MIDI interface 15.
- The system also includes a processing unit that consists of
CPU 10, which may be a 6502 CPU, a ROM 12 for data storage purposes, which may be a 6116, and Address Decoding Logic 13. These parts are connected via a common computer bus 14 containing address, data and command lines. Timing is controlled by a clock generator 18.
- Also connected to the computer bus are
MIDI interface 15, mainly an ACIA 6850, and a buffer register 16, consisting of two 74LS245.
- One or more
speech processor chips 17 are connected to the buffer 16. These chips are provided with additional audio circuitry: the audio output 19, which buffers the audio output of the SC-02 and makes it suitable for audio amplifiers, and the audio input 20 at pins 3 and 5, which allows an external audio signal, e.g. from a synthesizer, to be fed into the SC-02, thus replacing the internal tone generator. The tone source can be selected using an appropriate switch 22.
- The main task of the processing unit is to receive MIDI data from
interface 15, to translate it in several ways into SC-02 specific data, and to output these data via the buffer 16 to the SC-02 17. The SC-02 provides 8 registers, of which 5 are used in the singing process: Phoneme, Inflection, Articulation, Amplitude and Filter Frequency. How the received MIDI data are interpreted and translated into data for these registers can depend on the switch position of the mode selection 21 or can be controlled by specific MIDI data, e.g. Program Change events.
- In one embodiment, the MIDI Note On events received on a MIDI channel N from keyboard 1 shown in Fig. 2 turn on a specific phoneme by writing the number of the phoneme into the Phoneme register of the SC-02. The phoneme number is generated by a translation table which translates the Note Number into a phoneme number. The Velocity of the Note On event is used to affect the Amplitude register of the SC-02. Note Off events received on channel N are used to turn off the phoneme by writing the code for the phoneme 'Pause' into the Phoneme register. Note On events received on channel N + 1 are used to select the appropriate singing frequency. The Note Numbers of these events are translated into values for the Inflection register of the SC-02, which enables the SC-02 to produce singing sounds over a wide range based on a tuning frequency of 440 Hz for a′. Pitch Wheel Change events are translated into values for the Filter Frequency register, and Continuous Controller events for the Modulation Wheel are translated into values for the Articulation register. This way of interpretation allows the user to make the device sing with full control of the relevant parameters. For convenience, the user could prerecord the events using a MIDI-compatible sequencer or special event editing software on a MIDI-compatible computer.
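- A compact C sketch of this first interpretation is given below. It is illustrative only: the register codes, the 'Pause' value, the note-to-phoneme table and the compression of the 440 Hz-based frequency into an Inflection byte are assumptions, not values from the SC-02 data sheet.

```c
/* Sketch of the first embodiment: events on channel N drive the
 * Phoneme/Amplitude registers, events on channel N+1 the Inflection
 * (pitch) register. All register codes and scalings are assumed. */
#include <stdio.h>
#include <math.h>

enum { REG_PHONEME, REG_INFLECTION, REG_ARTICULATION,
       REG_AMPLITUDE, REG_FILTER_FREQ };    /* the five registers used  */

#define PHONEME_PAUSE 0x00                  /* assumed code for 'Pause' */

static void write_reg(int reg, int value)   /* stand-in for the bus     */
{                                           /* write to chip 17         */
    printf("reg %d <- 0x%02X\n", reg, value & 0xFF);
}

static int note_to_phoneme(int note)        /* stand-in for the real      */
{                                           /* translation table (Fig. 2) */
    return note - 36;
}

static int note_to_inflection(int note)
{
    /* Equal temperament around a' = 440 Hz (MIDI note 69); squeezing
     * the frequency into one byte is a purely illustrative choice. */
    double hz = 440.0 * pow(2.0, (note - 69) / 12.0);
    int code = (int)(hz / 4.0);
    return code > 255 ? 255 : code;
}

void note_on(int channel, int n, int note, int velocity)
{
    if (channel == n) {                              /* phoneme key     */
        write_reg(REG_PHONEME, note_to_phoneme(note));
        write_reg(REG_AMPLITUDE, velocity >> 3);     /* 0..127 -> 0..15 */
    } else if (channel == n + 1) {                   /* sung pitch      */
        write_reg(REG_INFLECTION, note_to_inflection(note));
    }
}

void note_off(int channel, int n)
{
    if (channel == n)
        write_reg(REG_PHONEME, PHONEME_PAUSE);       /* stop the phoneme */
}

int main(void)
{
    note_on(1, 1, 45, 100);    /* phoneme key on channel N = 1   */
    note_on(2, 1, 69, 100);    /* sing it at a' on channel N + 1 */
    note_off(1, 1);
    return 0;
}
```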
- In a second embodiment, the MIDI Note On events received on a MIDI channel N are used to select the appropriate singing frequency. The Note Numbers of these events are translated into values for the Inflection register as above. The Velocity of the Note On event is used to affect the Amplitude register of the SC-02. Note Off events received on channel N are used to turn off the phoneme by writing the value 0 into the Amplitude register. Program Change events turn on a specific phoneme by writing the number of the phoneme into the Phoneme register of the SC-02. The phoneme number is generated by a translation table which translates the Program Number into a phoneme number. Pitch Wheel Change events are translated into values for the Filter Frequency register, and Continuous Controller events for the Modulation Wheel are translated into values for the Articulation register. This way of interpretation allows the user to play the SC-02 using the MIDI-compatible keyboard 1 in a way similar to a common expander device with special voice-like sounds.
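- The second interpretation differs mainly in which event selects the phoneme and in how a note is released; a minimal sketch of those handlers follows, again with assumed register codes, an identity mapping standing in for the Program Number translation table, and a stand-in register write.

```c
/* Second embodiment, sketched: Program Change picks the phoneme,
 * Note On picks the sung pitch and loudness, Note Off silences the
 * voice by zeroing the Amplitude register. Codes are assumptions. */
#include <stdio.h>

enum { REG_PHONEME, REG_INFLECTION, REG_AMPLITUDE };

static void write_reg(int reg, int value)
{
    printf("reg %d <- %d\n", reg, value);
}

void program_change(int program)
{
    /* the real system uses a translation table; identity for brevity */
    write_reg(REG_PHONEME, program);
}

void note_on(int note, int velocity)
{
    write_reg(REG_INFLECTION, note);   /* placeholder pitch translation */
    write_reg(REG_AMPLITUDE, velocity);
}

void note_off(void)
{
    write_reg(REG_AMPLITUDE, 0);       /* value 0 turns the sound off   */
}

int main(void)
{
    program_change(12);                /* select some phoneme */
    note_on(69, 100);                  /* sing it at a'       */
    note_off();
    return 0;
}
```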
- Other ways of interpreting MIDI data are possible and useful, especially if an implementation of the invention employs more than one SC-02 sound processor, which is easily possible and enables the user to play as on a polyphonic keyboard, or to produce choir-like singing.
- Two preferred implementations of the invention include using a single SC-02 with 2 modes of interpreting the MIDI events, and using four SC-02 chips with 6 modes of interpreting the MIDI events.
- A MIDI-compatible sequencer program is designed for the system, using a phoneme editor-compiler that synchronizes syllables of speech to the MIDI time code.
- Any text editor can be used to write a "phonetic score" which determines all parameters for the artificial singer, such as pitch, dynamics, timing, etc.
- The digitally controlled analogue chip uses the concept of phoneme synthesis, which makes it relatively easy to program for real-time applications.
- The sounding elements of speech or singing are defined as vowels, voiced and unvoiced consonants and plosives, so that a variety of timbres like formants, noise bands and percussive sounds are generated.
- The 54 phonemes of the chip are configured to match 54 MIDI notes on any MIDI keyboard, from 36 to 93, in groups of vowels, consonants etc. as shown in Fig. 2. The speech synthesizer functions like a MIDI expander, to be "played" with a keyboard or driven by the "speech-sequencer" program. In addition to this, the chip is suitable as a multi-bandpass filter to process external sound sources.
- Any text to be uttered or sung is a potential "phonetic score". First, a text is analyzed by its phonemic structure, through a text-to-speech conversion either automatically by rules, or by manual input.
- The words, syllables, or phoneme strings have to be divided or chopped into more or less equal frames based on the 24 pulses of the MIDI clock.
- The "Frames" are marked by bars or slashes //// and funtion as sync marks.
- The program calculates a timebase of 24 divided by the number of phonemes, so that one syllable consisting of 7 and another consisting of 2 phonemes have the same duration. If the automatic conversion does not sound satisfying, subtle variations are possible through editing.
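- A small sketch of this frame division, assuming the per-phoneme timebase may be fractional:

```c
/* Each syllable occupies one 24-pulse MIDI-clock frame; the per-phoneme
 * timebase is 24 divided by the number of phonemes, so syllables with
 * different phoneme counts still take the same total time. */
#include <stdio.h>

static double phoneme_clocks(int phonemes_in_syllable)
{
    return 24.0 / (double)phonemes_in_syllable;
}

int main(void)
{
    int counts[] = { 7, 2 };
    for (int i = 0; i < 2; i++) {
        double per = phoneme_clocks(counts[i]);
        printf("%d phonemes: %.2f clocks each, %.1f clocks total\n",
               counts[i], per, per * counts[i]);
    }
    return 0;
}
```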
- The next parameter is the dynamic value "L": loudness.
- Values range from 0 to 9, configured either to a linear or logarithmic equivalent of the 1-127 velocity scale.
- The last parameter is pitch or tone "T".
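- The mapping of the 1-127 velocity scale onto the 0-9 dynamic value can be sketched as below, in a linear and a logarithmic variant; the exact curves are not specified, so both are merely illustrative.

```c
/* Mapping MIDI velocity (1-127) onto the 0-9 dynamic value "L",
 * either linearly or on a logarithmic curve (assumed curves). */
#include <stdio.h>
#include <math.h>

static int loudness_linear(int velocity)
{
    return (velocity * 9) / 127;                      /* 1..127 -> 0..9 */
}

static int loudness_log(int velocity)
{
    return (int)(9.0 * log((double)velocity) / log(127.0));
}

int main(void)
{
    int v[] = { 1, 32, 64, 127 };
    for (int i = 0; i < 4; i++)
        printf("vel %3d -> linear %d, log %d\n",
               v[i], loudness_linear(v[i]), loudness_log(v[i]));
    return 0;
}
```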
- Additionally, the filter frequency and interpolation rate are controlled in real time by the MIDI pitch bend and modulation wheels.
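- The scaling of these controllers to register-sized values can be sketched as follows; MIDI pitch bend is a 14-bit quantity (0-16383) and the modulation wheel a 7-bit one (0-127), while the 8-bit register widths are assumed.

```c
/* Scaling the real-time controllers down to assumed 8-bit register
 * values for filter frequency and interpolation (articulation). */
#include <stdio.h>

static int bend_to_filter_freq(int bend14)     /* 0..16383 -> 0..255 */
{
    return bend14 >> 6;
}

static int wheel_to_interpolation(int mod7)    /* 0..127 -> 0..254 */
{
    return mod7 << 1;
}

int main(void)
{
    printf("centre bend -> %d\n", bend_to_filter_freq(8192));   /* 128 */
    printf("full wheel  -> %d\n", wheel_to_interpolation(127)); /* 254 */
    return 0;
}
```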
- It will be appreciated that the instant specification, example and claims are set forth by way of illustration and not limitation, and that various modifications, and changes may be made without departing from the spirit and scope of the present invention.
Claims (10)
1. A system for synthesizing singing in real time, comprising:
a plurality of manually actuatable means for producing different first electrical signals corresponding to each of the plurality of means actuated and varying in real time with duration and frequency of actuation;
a musical instrument digital interface receptive of the electrical signals from the plurality of manually actuatable means for generating standard data output signals in real time for musical notes corresponding to the first electrical signals;
at least one phoneme speech synthesizer receptive of phoneme codes for generating real time analog signals corresponding to a singing voice; and
means receptive of the interface output signals for converting same in real time to phoneme codes and for applying same to the speech synthesizer in real time.
2. The system according to claim 1, wherein the plurality of actuatable means includes means for varying the resulting vocal tract length in real time.
3. The system according to claim 1, wherein the plurality of actuatable means includes means for varying the resulting interpolation timing in real time.
4. The system according to claim 1, wherein the plurality of actuatable means includes means for varying the resulting pitch in real time.
5. The system according to claim 1, comprising a plurality of speech synthesizers, each corresponding to a different singing voice.
6. A method of synthesizing singing in real time, comprising:
producing different first electrical signals corresponding to each of a plurality of manually actuatable means actuated and varying the signals in real time with duration and frequency of actuation;
generating standard musical instrument digital interface (MIDI) data output signals in real time for musical notes corresponding to the first electrical signals;
converting the MIDI output signals in real time to phoneme codes; and
generating real time analog audio signals corresponding to a singing voice in a phoneme speech synthesizer from the phoneme codes.
7. The method according to claim 6, wherein the generating of first electrical signals includes varying the electrical signals in real time to vary the resulting vocal tract length in real time.
8. The method according to claim 6, wherein the generating of first electrical signals includes varying the electrical signals in real time to vary the resulting interpolation timing in real time.
9. The method according to claim 6, wherein the generating of first electrical signals includes varying the electrical signals in real time to vary the resulting pitch in real time.
10. The method according to claim 6, wherein the step of generating audio signals comprises providing a plurality of speech synthesizers, each corresponding to a different singing voice.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US34736789A | 1989-05-04 | 1989-05-04 | |
US347367 | 1989-05-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
EP0396141A2 true EP0396141A2 (en) | 1990-11-07 |
Family
ID=23363411
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP19900108393 Withdrawn EP0396141A2 (en) | 1989-05-04 | 1990-05-03 | System for and method of synthesizing singing in real time |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP0396141A2 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0723256A3 (en) * | 1995-01-17 | 1996-11-13 | Yamaha Corp | Karaoke apparatus modifying live singing voice by model voice |
US5955693A (en) * | 1995-01-17 | 1999-09-21 | Yamaha Corporation | Karaoke apparatus modifying live singing voice by model voice |
EP0729130A3 (en) * | 1995-02-27 | 1997-01-08 | Yamaha Corp | Karaoke apparatus synthetic harmony voice over actual singing voice |
US5857171A (en) * | 1995-02-27 | 1999-01-05 | Yamaha Corporation | Karaoke apparatus using frequency of actual singing voice to synthesize harmony voice from stored voice information |
EP2733696A1 (en) * | 2012-11-14 | 2014-05-21 | Yamaha Corporation | Voice synthesizing method and voice synthesizing apparatus |
US10002604B2 (en) | 2012-11-14 | 2018-06-19 | Yamaha Corporation | Voice synthesizing method and voice synthesizing apparatus |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP3333022B2 (en) | Singing voice synthesizer | |
US6191349B1 (en) | Musical instrument digital interface with speech capability | |
US5857171A (en) | Karaoke apparatus using frequency of actual singing voice to synthesize harmony voice from stored voice information | |
JP7673786B2 (en) | Electronic musical instrument, method and program | |
US7124084B2 (en) | Singing voice-synthesizing method and apparatus and storage medium | |
JP3144273B2 (en) | Automatic singing device | |
EP1512140B1 (en) | Musical notation system | |
US11417312B2 (en) | Keyboard instrument and method performed by computer of keyboard instrument | |
JPH10105169A (en) | Harmony data generating device and karaoke (sing along machine) device | |
Lindemann | Music synthesis with reconstructive phrase modeling | |
JPH11184490A (en) | Singing voice synthesis method using regular speech synthesis | |
WO2020217801A1 (en) | Audio information playback method and device, audio information generation method and device, and program | |
JP3518253B2 (en) | Data editing device | |
EP0396141A2 (en) | System for and method of synthesizing singing in real time | |
JP3307283B2 (en) | Singing sound synthesizer | |
JPH11126083A (en) | Karaoke playback device | |
JPH11282483A (en) | Karaoke device | |
JP7276292B2 (en) | Electronic musical instrument, electronic musical instrument control method, and program | |
JP3233036B2 (en) | Singing sound synthesizer | |
JP2002221978A (en) | Vocal data forming device, vocal data forming method and singing tone synthesizer | |
JPH0895588A (en) | Speech synthesizing device | |
JP3265995B2 (en) | Singing voice synthesis apparatus and method | |
JP2904045B2 (en) | Karaoke equipment | |
JPH1031496A (en) | Tone generator | |
JPS6183600A (en) | Singing voice synthesizer/performer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): DE FR GB |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 19921202 |