US5943648A - Speech signal distribution system providing supplemental parameter associated data - Google Patents
- Publication number
- US5943648A (application US08/638,061)
- Authority
- US
- United States
- Prior art keywords
- stream
- data stream
- parameters
- data
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Definitions
- the present invention relates generally to systems for transmitting voice messages in encoded form via a transmission media, and particularly to a system and method for converting text into an encoded voice message that includes both voice reproduction information as well as semantic and contextual information to enable a receiving system to produce audio signals in units of full sentences, to generate animated pictures of a person speaking simultaneously with the production of the corresponding audio signals, and to override voice settings selected by the transmitting system.
- voice or audio messages are also transmitted via computer networks, including the Internet and the part of the Internet known as the World Wide Web.
- voice messages are transmitted in a digital, compressed, encoded form.
- linear predictive coding (LPC) techniques, including adaptive LPC, are used to compress voice signals from a raw data rate of 8 to 10 kilobytes per second down to data rates in the range of 1 to 3 kilobytes per second.
- Voice quality is usually rather poor for voice signals compressed using LPC techniques down to data rates under 1.5 kilobytes per second.
- Text is enormously more efficient in its use of bandwidth than voice, at least in terms of the amount of data required to transmit a given amount of information. While text transmission (including the transmission of various binary document files) is fine for recipients who have the facilities and inclination to read the transmitted text, there are many contexts in which it is either essential or desirable for recipients to have information communicated to them orally. In such contexts, the transmission of text to the recipient is feasible only if the receiving system includes text to speech conversion apparatus or software.
- Text to speech conversion is the process by which raw text, such as the words in a memorandum or other document or file, is converted into audio signals. There are a number of competing approaches to text to speech conversion.
- the text to speech conversion methodology used by the present invention is described in some detail in U.S. Pat. No. 4,979,216.
- the present invention addresses another problem associated with real time distribution of digitized voice messages via computer network connections.
- it is common for data transmission between a network server, such as a World Wide Web (hereinafter Web) server, and a client computer to experience periods during which the rate of transmission is highly variable, often including periods of one or more seconds in which the data rate is zero.
- Yet another problem with existing speech message transmission systems is that there is very little the receiving system can do with the received message other than "play it" as an audio signal. That is, the receiving system generally cannot determine what is being said, cannot modify the voice characteristics of received signals except in very primitive ways (e.g., with a graphic band equalizer), and cannot perform any actions, such as generating a corresponding animation of a speaking person, that would require information about the words or phonemes being spoken.
- Another object of the present invention is to transmit a high quality speech signal to receiving systems using a bandwidth of less than 1.5 kilobytes per second.
- Another object of the present invention is to transmit a speech signal to receiving systems with sentence boundary data embedded in the speech signal so as to enable the receiving systems to present audio speech signals as full, uninterrupted sentences, despite any interruptions in the transmission of said speech signal.
- Yet another object of the present invention is to transmit a speech signal to receiving systems with lip position data embedded in the speech signal so as to enable the receiving systems to generate an animated mouth-like image that moves in accordance with the lip position data in the received data stream.
- Still another object of the present invention is to transmit a speech signal to receiving systems with voice setting data (e.g., indicating special effects to be applied to the speech signal) embedded in the speech signal so as to enable the receiving systems to control the generation of audio speech signals in accordance with the voice setting data in the received data stream.
- the present invention is a speech signal distribution system that includes a transmitting subsystem and one or more receiving subsystems.
- the transmitting subsystem has a text to speech converter for converting text into a data stream of formant parameters.
- a supplemental parameter generator inserts into the data stream supplemental data, including linguistic boundary data indicating which parameters in the stream of formant parameters are associated with predefined linguistic boundaries in the text.
- the boundary data indicates which formant parameters in the data stream are associated with sentence boundaries.
- the supplemental parameter generator optionally inserts the text, lip position data corresponding to phonemes in the text, and voice setting data into the data stream. The resulting data stream is compressed and transmitted to the receiving subsystems.
- the receiving subsystem receives the transmitted compressed data stream, decompresses it to regenerate the full data stream, and splits off the supplemental data.
- the formant data is buffered until boundary data is received indicating that a full sentence, or other linguistic unit, has been received.
- the formant data received before the boundary data is processed by an audio signal generator that converts the formant parameters into an audio speech signal in accordance with a vocal tract model.
- Voice settings in the supplemental data are passed to the audio signal generator, which modifies audio signal generation accordingly.
- Text in the supplemental data may be processed by a closed captioning program for simultaneously displaying text while the text is being spoken, or by a text translation program for translating the text being spoken into another language.
- Lip position data in the supplemental data may be processed by an animation program to generate animated pictures of a person speaking simultaneously with the production of the corresponding audio signals.
- the user of the receiving subsystem may optionally apply voice settings to the audio signal generator to either supplement or override the voice settings provided by the transmitting subsystem.
- FIG. 1 is a block diagram of a speech signal distribution system in accordance with a preferred embodiment of the present invention.
- FIG. 2 is a block diagram of a computer system incorporating a transmitting subsystem in a speech signal distribution system.
- FIG. 3 is a block diagram of a computer system incorporating a receiving subsystem in a speech signal distribution system.
- FIG. 4 is a block diagram of a second speech signal distribution system, that is compatible with the receiving subsystems of the system in FIG. 1, in accordance with a preferred embodiment of the present invention.
- the transmitter subsystem 102 is an information server, such as a (World Wide) Web server or interactive voice response (IVR) system that has a control application 110 that dispenses information from an information database 112 to end users using the receiving subsystems 104.
- the receiving subsystem 104 will also typically include a control application 114, such as a Web browser or an IVR client application, that receives information from the information server and passes it to a speech generator 116 and other procedures.
- the transmitting and receiving subsystems preferably each have memory (both RAM and nonvolatile memory) 105 for storing programs and data, a central processing unit (CPU) 106, a user interface 107, a communications interface 108 for exchanging data with other computers, and an operating system 109 that provides the basic environment in which other programs are executed.
- the control application 110 and the associated information database 112 output raw text either in response to a user's information request or as part of some other information dispensing task (such as an "electronic mail" event or a scheduled information dispensing task).
- Raw text can also be received from other sources, such as another application program, or from the user via the transmitting subsystem's user interface 107A.
- a modified text-to-speech (TTS) converter 120 converts the raw text into a time varying parameter stream that is then transmitted via a communications interface 108A and then a communications network 124 (such as the telephone network, the Internet, or a private communications network) to one or more receiving subsystems 104.
- the TTS converter 120 is a modified version of Centigram Communications Corporation's TruVoice product (TruVoice is a registered trademark of Centigram Communications Corporation).
- the text to speech conversion methodology used by the present invention is described in some detail in U.S. Pat. No. 4,979,216.
- the TruVoice product has been modified primarily to (A) insert additional information parameters not normally used during speech synthesis, and (B) perform data compression for more efficient speech signal transmission.
- the "conventional" aspects of the TTS converter 120 include a text normalizer 126 and those aspects of a linguistic analyzer and formant parameter generator 128 that are directed to generating "formant data" for use by a formant synthesizer.
- the text normalizer 126 expands abbreviations, numbers, ordinals, dates and the like into full words.
- the linguistic analyzer and formant parameter generator 128 converts words into phonemes using word to phoneme rules supplemented by a look up dictionary, adds word level stress assignments, and assigns allophones to represent vowel sounds based on the neighboring phonemes to produce a phoneme string (including allophones) with stress assignments.
- the phoneme string is converted into formant parameters, in conjunction with the application of sentence level prosodic rules that determine the duration and fundamental frequency pattern of the words to be spoken (so as to give sentences a semblance of the rhythm and melody of a human speaker).
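- purely as an illustration of that front-end flow (normalize, then phonemes with stress, then formant frames with prosody), a sketch follows; the dictionary entries, prosody values and frame fields are assumptions, not the TruVoice internals:

```python
# Illustrative sketch of the TTS front end described above; the data
# structures and rules here are invented, not the TruVoice internals.

ABBREVIATIONS = {"Dr.": "Doctor", "No.": "Number"}

def normalize_text(raw: str) -> str:
    """Expand abbreviations, numbers, dates and the like into full words."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in raw.split())

def words_to_phonemes(text: str) -> list:
    """Word-to-phoneme rules supplemented by a look-up dictionary (toy version)."""
    dictionary = {"hello": ["HH", "AH0", "L", "OW1"], "world": ["W", "ER1", "L", "D"]}
    return [p for w in text.lower().split() for p in dictionary.get(w, [])]

def phonemes_to_formant_frames(phonemes: list) -> list:
    """Attach duration and fundamental frequency per phoneme (toy prosody)."""
    return [{"phoneme": p, "f0_hz": 120.0, "duration_ms": 80} for p in phonemes]

frames = phonemes_to_formant_frames(words_to_phonemes(normalize_text("hello world")))
```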
- the non-conventional aspects of the TTS converter 120 include facilities for passing four types of parameters to a data insertion procedure 130:
- text, i.e., the words associated with the generated speech parameters, for use by receiving systems with closed captioning or translation features;
- voice settings, some of which are derived by the text normalizer 126, such as a voice setting to distinguish text in quotes from other text, and some of which are provided by the control application 110, such as instructions to raise or lower the pitch of all the speech generated;
- lip position data, which is derived by the modified linguistic analyzer from the phoneme string (i.e., a speaker's lip position is, in general, a function of the phoneme being spoken as well as the immediately preceding and following phonemes);
- stop frame data, which indicates linguistic boundaries (such as sentence boundaries or phrase boundaries) in the speech.
- while all four types of supplemental parameters can be inserted into the generated data stream, in many applications of the present invention only a subset of these parameters will be used. In alternate embodiments other types of supplemental data may be added to the formant data stream.
- a sentence boundary indication is always inserted into the data stream immediately after the last data frame of formant data for a sentence.
- boundary data representing other linguistic boundaries, such as phrases or words, could be inserted in the data stream.
- the boundary data is used to control the flow of speech production so as to avoid unnatural sounding pauses in the middle of words, phrases and sentences.
- the text associated with the generated speech parameters is inserted in the data stream immediately prior to those speech parameters.
- the text data is useful for systems having a "closed captioning" program (i.e., for simultaneously displaying text while the text is being spoken), as well as receiving systems having features such as text translation programs 162 for translating the text being spoken into another language.
- Lip position data is inserted in the generated data stream immediately prior to the speech data for the associated phonemes so as to allow receiving systems that have an animation program 164 to generate animated pictures of a person speaking simultaneously with the production of the corresponding audio signals. That is, the lip synchronization data allows video animation of a speaker that is synchronized with the generation of audio speech signals.
- Voice settings are inserted in the generated data stream immediately prior to the first speech data to which those voice settings are applicable. Voice settings are usually changed relatively infrequently.
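- taken together, these ordering rules (voice settings ahead of the speech they govern, text ahead of its speech parameters, lip data just ahead of its phonemes, and a boundary marker after the sentence's last frame) could be realized along the following lines; the tuple-based frame representation and function shape are hypothetical stand-ins, not the patent's data insertion procedure 130:

```python
def build_sentence_stream(text, formant_frames, lip_positions, voice_settings=None):
    """Interleave supplemental frames with formant frames for one sentence,
    following the insertion ordering described above (illustrative only;
    assumes one lip-position entry per formant frame for simplicity)."""
    stream = []
    if voice_settings is not None:
        stream.append(("VOICE", voice_settings))   # settings precede affected speech
    stream.append(("TEXT", text))                  # text precedes its speech parameters
    for lip, frame in zip(lip_positions, formant_frames):
        stream.append(("LIP", lip))                # lip data just before its phoneme's frame
        stream.append(("FORMANT", frame))
    stream.append(("BOUNDARY", "sentence"))        # immediately after the last data frame
    return stream
```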
- the general form of the data stream passed to the data compressor 132 consists of speech data frames interleaved with supplemental data frames.
- the speech data frames, also called formant data frames, include "full frames" that contain a full set of formant data, as well as shorter frames, such as a special one-byte frame that represents one sample period of silence, a one-byte frame that indicates a repeat of the previous formant data frame, and a short frame format for changing formant frequencies without changing formant amplitude settings.
- the supplemental data frames include separate data frames for lip position data, text data, various voice settings, and linguistic boundary data.
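- a byte-oriented layout consistent with this frame inventory might look like the following sketch; the tag values and payload sizes are invented for illustration, since the patent names the frame kinds but does not publish a wire format:

```python
# Hypothetical tag bytes for the frame kinds named above.
SILENCE, REPEAT, FULL, FREQS, SUPPLEMENTAL = 0x00, 0x01, 0x02, 0x03, 0x10

def read_frames(buf: bytes):
    """Yield (tag, payload) pairs. Assumed sizes: full formant frames carry
    12 payload bytes, short frequency-only frames 4, and supplemental frames
    (lip, text, voice settings, boundaries) a one-byte length prefix."""
    i = 0
    while i < len(buf):
        tag = buf[i]; i += 1
        if tag in (SILENCE, REPEAT):          # the one-byte frames
            yield tag, b""
        elif tag == FULL:
            yield tag, buf[i:i + 12]; i += 12
        elif tag == FREQS:
            yield tag, buf[i:i + 4]; i += 4
        else:                                 # supplemental frames
            n = buf[i]; i += 1
            yield tag, buf[i:i + n]; i += n
```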
- the data compressor 132 compresses the data stream so as to reduce the bandwidth used by the data stream transmitted to the receiver subsystems.
- the resulting data stream generally uses a bandwidth of less than 1.5 kilobytes per second and in the preferred embodiment generates a data stream having a bandwidth of less than 1.0 kilobytes per second.
- the resulting speech generated by the receiving system is comparable to the quality of speech generated by adaptive LPC systems using data rates of approximately 2 to 3 kilobytes per second.
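- a back-of-the-envelope check shows how such rates can arise; the frame period and average frame size below are assumed values, not figures from the patent:

```python
frame_period_ms = 10       # assumed synthesis frame period
avg_bytes_per_frame = 9    # assumed average after silence/repeat compression
rate_kb_per_s = (1000 / frame_period_ms) * avg_bytes_per_frame / 1000
print(rate_kb_per_s)       # 0.9 kilobytes per second, under the 1.0 kB/s figure
```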
- the linguistic analyzer and formant parameter generator 128 can include a plurality of predefined voice profiles 134, such as separate profiles for a man and a woman, or separate profiles for a set of specific individuals.
- the control procedure 110 indicates the voice profile to be used by providing a voice selection indication to the linguistic analyzer and formant parameter generator 128.
- the "information database" 112 may consist of a set of text files, rather than data in a database management system.
- the compressed data stream generated by the data compressor 132 may be stored in a storage device, such as a magnetic disk, prior to sending it to one or more receiving subsystems.
- a storage device such as a magnetic disk
- Such storage of compressed message data is needed if the transmitting subsystem works in a batch mode (e.g., storing messages over time and then sending all of them at a scheduled time), and may also be required for efficiency if the same message is to be transmitted multiple times to different receiving subsystems.
- the receiving subsystem 104 includes the aforementioned communications interface 108 for sending requests to the transmitting subsystem 102 and for receiving the resulting data stream.
- the received data stream is routed to a speech generator 116, and in particular to a data decompressor 150 that decompresses the received data stream into the full data stream, and then a data splitter procedure 152 that splits off the supplemental data from the formant parameters.
- the formant data is buffered by a speech frame buffering program 154 until boundary data is received indicating that a full sentence, or other linguistic unit, has been received. Then the buffering program releases the formant data received prior to the boundary data for processing by an audio signal generator 156, also known as a formant synthesizer, that converts the formant data into an audio speech signal in accordance with a vocal tract model.
- the buffering program 154 prevents the received speech data from being converted into an audio speech signal until all the data for a sentence or phrase has been received. This buffering of the speech data until the receipt of boundary data indicating a linguistic boundary avoids the generation of speech that stops and restarts mid-word or mid-phrase with silent periods of unpredictable length.
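- the release policy reduces to "hold formant frames, release on boundary"; a minimal sketch, reusing the hypothetical frame tags from the sketches above:

```python
class SentenceBuffer:
    """Hold formant frames until a linguistic-boundary frame arrives, then
    release the complete sentence to the synthesizer (illustrative only)."""
    def __init__(self, synthesize):
        self.pending = []
        self.synthesize = synthesize      # the formant-synthesizer entry point

    def feed(self, tag, payload):
        if tag == "FORMANT":
            self.pending.append(payload)  # sentence not yet complete: wait
        elif tag == "BOUNDARY":
            self.synthesize(self.pending) # full sentence received: speak it
            self.pending = []

buf = SentenceBuffer(lambda frames: print(f"speaking {len(frames)} frames"))
for event in [("FORMANT", 1), ("FORMANT", 2), ("BOUNDARY", None)]:
    buf.feed(*event)
```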
- the voice settings in the supplemental data are passed to the audio signal generator 156, which modifies audio signal generation accordingly.
- the resulting audio speech signal is converted into audio sound energy by an audio speaker 158.
- the audio speaker 158 is typically driven by a sound card, and thus the audio speech signal generated by the audio signal generator 156 must typically be processed by a device driver program associated with the sound card, and then the sound card, before the audio speech signal is actually converted into audio sound energy by the audio speaker 158.
- Text in the supplemental data may be processed by a closed captioning program 160 for simultaneously displaying text on a television or computer monitor 161 while the text is being "spoken" by the speech generator, or by a text translation program 162 for translating the text being spoken into another language.
- Lip position data in the supplemental data may be processed by an animation program 164 to generate animated pictures (on monitor 161) of a person speaking simultaneously with the production of the corresponding audio signals.
- the animation program 164 uses the lip position data to control the mouth position (and a portion of the facial expressions) of a person in an animated image.
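- one plausible shape for that control loop is sketched below; the lip-position codes, mouth shapes and timing are invented for illustration:

```python
# Hypothetical lip-position codes mapped to mouth shapes for the animation.
MOUTH_SHAPES = {0: "closed", 1: "slightly open", 2: "open", 3: "rounded"}

def animate_mouth(lip_events, frame_period_ms=10):
    """Set a mouth shape for each lip-position event, in step with the
    audio frame clock (printing stands in for drawing to the monitor)."""
    for frame_index, code in lip_events:
        t_ms = frame_index * frame_period_ms
        print(f"{t_ms} ms: mouth {MOUTH_SHAPES.get(code, 'neutral')}")

animate_mouth([(0, 0), (8, 2), (16, 3)])
```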
- the control program 114 of the receiving subsystem may optionally include instructions for enabling a user of the receiving subsystem to apply voice settings to the audio signal generator 156 to either supplement or override the voice settings provided by the transmitting subsystem.
- the receiving subsystem 104 may further include storage 159 for storing one or more received messages, including both the speech parameters as well as the supplemental parameters of those messages. This allows the control application 114 to perform "tape recorder" functions such as replaying portions of a message. Since the message stored by the receiving subsystem has sentence boundary information embedded in the message, the control application 114 enables the user to "jump backward" and "jump forward" a whole sentence at a time, instead of a fixed number of seconds like a normal tape recorder.
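- with boundary frames preserved in storage, sentence-level seeking amounts to scanning for boundary markers; a sketch, again using the hypothetical frame tags from above:

```python
def sentence_starts(frames):
    """Indices where sentences begin, derived from embedded boundary frames."""
    starts = [0]
    for i, (tag, _) in enumerate(frames):
        if tag == "BOUNDARY" and i + 1 < len(frames):
            starts.append(i + 1)
    return starts

def jump_sentence(frames, position, direction):
    """Return the frame index one whole sentence backward (-1) or forward (+1)."""
    starts = sentence_starts(frames)
    current = max(i for i, s in enumerate(starts) if s <= position)
    target = min(max(current + direction, 0), len(starts) - 1)
    return starts[target]
```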
- FIG. 4 shows a system 200 in which the receiving subsystem 104 is the same as shown in FIGS. 1 and 3, but uses a different transmitting subsystem 202 that accepts voice input 204 and outputs a formant data stream similar to that produced by the transmitting subsystem 102 described above with reference to FIGS. 1 and 2.
- the voice input is processed by emphasis filters 206, a pitch and formant analyzer 208, a parameter generator 210 for generating a stream of formant parameters, and a data compressor 212.
- the transmitting subsystem 202 may optionally include a speech recognition subsystem (not shown) for generating text corresponding to the voice input; supplemental procedures for generating lip position data corresponding to the phonemes in the generated text, voice setting data representing various characteristics of the voice input, and boundary data representing sentence or other linguistic boundaries in the voice input; and a data insertion procedure for inserting the text, lip position data, voice setting data and boundary data into the data stream processed by the data compressor 212.
- the receiving subsystems 104 are compatible with transmitting subsystems 102 that convert text into a stream of speech parameters as well as transmitting subsystems 202 that convert voice input into a stream of speech parameters.
- when the data stream includes supplemental data representing facial expressions, mood or gestures, the receiving subsystem's animation program 164 may be enhanced to generate animated pictures that show the facial expressions, mood and gestures represented by this supplemental data.
- the animation program 164 may be used to drive devices other than a computer monitor, such as an LCD screen or other media suitable for displaying animated figures or images.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/638,061 US5943648A (en) | 1996-04-25 | 1996-04-25 | Speech signal distribution system providing supplemental parameter associated data |
Publications (1)
Publication Number | Publication Date |
---|---|
US5943648A true US5943648A (en) | 1999-08-24 |
Family
ID=24558485
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08/638,061 Expired - Lifetime US5943648A (en) | 1996-04-25 | 1996-04-25 | Speech signal distribution system providing supplemental parameter associated data |
Country Status (1)
Country | Link |
---|---|
US (1) | US5943648A (en) |
Patent Citations (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4913539A (en) * | 1988-04-04 | 1990-04-03 | New York Institute Of Technology | Apparatus and method for lip-synching animation |
US5208745A (en) * | 1988-07-25 | 1993-05-04 | Electric Power Research Institute | Multimedia interface and method for computer system |
US5231492A (en) * | 1989-03-16 | 1993-07-27 | Fujitsu Limited | Video and audio multiplex transmission system |
US5111409A (en) * | 1989-07-21 | 1992-05-05 | Elon Gasper | Authoring and use systems for sound synchronized animation |
US5164980A (en) * | 1990-02-21 | 1992-11-17 | Alkanox Corporation | Video telephone system |
US5278943A (en) * | 1990-03-23 | 1994-01-11 | Bright Star Technology, Inc. | Speech animation and inflection system |
US5630017A (en) * | 1991-02-19 | 1997-05-13 | Bright Star Technology, Inc. | Advanced tools for speech synchronized animation |
US5613056A (en) * | 1991-02-19 | 1997-03-18 | Bright Star Technology, Inc. | Advanced tools for speech synchronized animation |
US5241619A (en) * | 1991-06-25 | 1993-08-31 | Bolt Beranek And Newman Inc. | Word dependent N-best search method |
US5577165A (en) * | 1991-11-18 | 1996-11-19 | Kabushiki Kaisha Toshiba | Speech dialogue system for facilitating improved human-computer interaction |
US5357596A (en) * | 1991-11-18 | 1994-10-18 | Kabushiki Kaisha Toshiba | Speech dialogue system for facilitating improved human-computer interaction |
US5644355A (en) * | 1992-02-24 | 1997-07-01 | Intelligent Instruments Corporation | Adaptive video subscriber system and methods for its use |
US5623690A (en) * | 1992-06-03 | 1997-04-22 | Digital Equipment Corporation | Audio/video storage and retrieval for multimedia workstations by interleaving audio and video data in data file |
US5367454A (en) * | 1992-06-26 | 1994-11-22 | Fuji Xerox Co., Ltd. | Interactive man-machine interface for simulating human emotions |
US5652828A (en) * | 1993-03-19 | 1997-07-29 | Nynex Science & Technology, Inc. | Automated voice synthesis employing enhanced prosodic treatment of text, spelling of text and rate of annunciation |
US5732395A (en) * | 1993-03-19 | 1998-03-24 | Nynex Science & Technology | Methods for controlling the generation of speech from text representing names and addresses |
US5751906A (en) * | 1993-03-19 | 1998-05-12 | Nynex Science & Technology | Method for synthesizing speech from text and for spelling all or portions of the text by analogy |
US5832435A (en) * | 1993-03-19 | 1998-11-03 | Nynex Science & Technology Inc. | Methods for controlling the generation of speech from text representing one or more names |
US5347306A (en) * | 1993-12-17 | 1994-09-13 | Mitsubishi Electric Research Laboratories, Inc. | Animated electronic meeting place |
US5608839A (en) * | 1994-03-18 | 1997-03-04 | Lucent Technologies Inc. | Sound-synchronized video system |
US5822727A (en) * | 1995-03-30 | 1998-10-13 | At&T Corp | Method for automatic speech recognition in telephony |
Non-Patent Citations (2)
Title |
---|
Edmund X. Dejesus, "How the Internet Will Replace Broadcasting", Feb. 1996, BYTE, pp. 51-64. *
Newton's Telecom Dictionary, p. 113, definition of audio signal, 1996. *
Cited By (116)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070038779A1 (en) * | 1996-05-01 | 2007-02-15 | Hickman Paul L | Method and apparatus for accessing a wide area network |
US20070206737A1 (en) * | 1996-05-01 | 2007-09-06 | Hickman Paul L | Method and apparatus for accessing a wide area network |
US20120259623A1 (en) * | 1997-04-14 | 2012-10-11 | AT&T Intellectual Properties II, L.P. | System and Method of Providing Generated Speech Via A Network |
US9065914B2 (en) * | 1997-04-14 | 2015-06-23 | At&T Intellectual Property Ii, L.P. | System and method of providing generated speech via a network |
US6404872B1 (en) * | 1997-09-25 | 2002-06-11 | At&T Corp. | Method and apparatus for altering a speech signal during a telephone call |
US7027568B1 (en) * | 1997-10-10 | 2006-04-11 | Verizon Services Corp. | Personal message service with enhanced text to speech synthesis |
US6411687B1 (en) * | 1997-11-11 | 2002-06-25 | Mitel Knowledge Corporation | Call routing based on the caller's mood |
US7076426B1 (en) * | 1998-01-30 | 2006-07-11 | At&T Corp. | Advance TTS for facial animation |
US6493428B1 (en) * | 1998-08-18 | 2002-12-10 | Siemens Information & Communication Networks, Inc | Text-enhanced voice menu system |
US6412011B1 (en) * | 1998-09-14 | 2002-06-25 | At&T Corp. | Method and apparatus to enhance a multicast information stream in a communication network |
US6564186B1 (en) * | 1998-10-01 | 2003-05-13 | Mindmaker, Inc. | Method of displaying information to a user in multiple windows |
US6324511B1 (en) * | 1998-10-01 | 2001-11-27 | Mindmaker, Inc. | Method of and apparatus for multi-modal information presentation to computer users with dyslexia, reading disabilities or visual impairment |
US20090287477A1 (en) * | 1998-10-02 | 2009-11-19 | Maes Stephane H | System and method for providing network coordinated conversational services |
US20060111909A1 (en) * | 1998-10-02 | 2006-05-25 | Maes Stephane H | System and method for providing network coordinated conversational services |
US8332227B2 (en) * | 1998-10-02 | 2012-12-11 | Nuance Communications, Inc. | System and method for providing network coordinated conversational services |
US7003463B1 (en) * | 1998-10-02 | 2006-02-21 | International Business Machines Corporation | System and method for providing network coordinated conversational services |
US8868425B2 (en) | 1998-10-02 | 2014-10-21 | Nuance Communications, Inc. | System and method for providing network coordinated conversational services |
US7519536B2 (en) * | 1998-10-02 | 2009-04-14 | Nuance Communications, Inc. | System and method for providing network coordinated conversational services |
US9761241B2 (en) | 1998-10-02 | 2017-09-12 | Nuance Communications, Inc. | System and method for providing network coordinated conversational services |
US6587822B2 (en) * | 1998-10-06 | 2003-07-01 | Lucent Technologies Inc. | Web-based platform for interactive voice response (IVR) |
US6199067B1 (en) * | 1999-01-20 | 2001-03-06 | Mightiest Logicon Unisearch, Inc. | System and method for generating personalized user profiles and for utilizing the generated user profiles to perform adaptive internet searches |
US20010048735A1 (en) * | 1999-01-29 | 2001-12-06 | O'neal Stephen C. | Apparatus and method for channel-transparent multimedia broadcast messaging |
US7035383B2 (en) | 1999-01-29 | 2006-04-25 | Microsoft Corporation | Apparatus and method for channel-transparent multimedia broadcast messaging |
US7649983B2 (en) | 1999-01-29 | 2010-01-19 | Microsoft Corporation | Apparatus and method for channel-transparent multimedia broadcast messaging |
US20060171514A1 (en) * | 1999-01-29 | 2006-08-03 | Microsoft Corporation | Apparatus and method for channel-transparent multimedia broadcast messaging |
US6990094B1 (en) * | 1999-01-29 | 2006-01-24 | Microsoft Corporation | Method and apparatus for network independent initiation of telephony |
US8396710B2 (en) | 1999-04-12 | 2013-03-12 | Ben Franklin Patent Holding Llc | Distributed voice user interface |
US8762155B2 (en) | 1999-04-12 | 2014-06-24 | Intellectual Ventures I Llc | Voice integration platform |
US20050091057A1 (en) * | 1999-04-12 | 2005-04-28 | General Magic, Inc. | Voice application development methodology |
US7769591B2 (en) | 1999-04-12 | 2010-08-03 | White George M | Distributed voice user interface |
US20050261907A1 (en) * | 1999-04-12 | 2005-11-24 | Ben Franklin Patent Holding Llc | Voice integration platform |
US8078469B2 (en) * | 1999-04-12 | 2011-12-13 | White George M | Distributed voice user interface |
US8036897B2 (en) | 1999-04-12 | 2011-10-11 | Smolenski Andrew G | Voice integration platform |
US20020072918A1 (en) * | 1999-04-12 | 2002-06-13 | White George M. | Distributed voice user interface |
US20060287854A1 (en) * | 1999-04-12 | 2006-12-21 | Ben Franklin Patent Holding Llc | Voice integration platform |
US20060293897A1 (en) * | 1999-04-12 | 2006-12-28 | Ben Franklin Patent Holding Llc | Distributed voice user interface |
US6757657B1 (en) * | 1999-09-03 | 2004-06-29 | Sony Corporation | Information processing apparatus, information processing method and program storage medium |
US20020184036A1 (en) * | 1999-12-29 | 2002-12-05 | Nachshon Margaliot | Apparatus and method for visible indication of speech |
US6810379B1 (en) * | 2000-04-24 | 2004-10-26 | Sensory, Inc. | Client/server architecture for text-to-speech synthesis |
USRE45884E1 (en) | 2000-06-30 | 2016-02-09 | Immersion Corporation | Chat interface with haptic feedback functionality |
US7159008B1 (en) * | 2000-06-30 | 2007-01-02 | Immersion Corporation | Chat interface with haptic feedback functionality |
US7747434B2 (en) | 2000-10-24 | 2010-06-29 | Speech Conversion Technologies, Inc. | Integrated speech recognition, closed captioning, and translation system and method |
US7130790B1 (en) | 2000-10-24 | 2006-10-31 | Global Translations, Inc. | System and method for closed caption data translation |
US20080052069A1 (en) * | 2000-10-24 | 2008-02-28 | Global Translation, Inc. | Integrated speech recognition, closed captioning, and translation system and method |
US20080140510A1 (en) * | 2000-10-31 | 2008-06-12 | Contextweb, Inc. | Internet contextual communication system |
US20110137725A1 (en) * | 2000-10-31 | 2011-06-09 | Anand Subramanian | Internet Contextual Communication System |
US20080140761A1 (en) * | 2000-10-31 | 2008-06-12 | Contextweb, Inc. | Internet contextual communication system |
US7945476B2 (en) | 2000-10-31 | 2011-05-17 | Context Web, Inc. | Internet contextual advertisement delivery system |
US20020123912A1 (en) * | 2000-10-31 | 2002-09-05 | Contextweb | Internet contextual communication system |
US20080281614A1 (en) * | 2000-10-31 | 2008-11-13 | Contextweb, Inc. | Internet contextual communication system |
US7912752B2 (en) | 2000-10-31 | 2011-03-22 | Context Web, Inc. | Internet contextual communication system |
US20040078265A1 (en) * | 2000-10-31 | 2004-04-22 | Anand Subramanian | Internet contextual communication system |
US9965765B2 (en) | 2000-10-31 | 2018-05-08 | Pulsepoint, Inc. | Internet contextual communication system |
US20080114774A1 (en) * | 2000-10-31 | 2008-05-15 | Contextweb, Inc. | Internet contextual communication system |
WO2002047067A2 (en) * | 2000-12-04 | 2002-06-13 | Sisbit Ltd. | Improved speech transformation system and apparatus |
WO2002047067A3 (en) * | 2000-12-04 | 2002-09-06 | Sisbit Ltd | Improved speech transformation system and apparatus |
WO2002075719A1 (en) * | 2001-03-15 | 2002-09-26 | Lips, Inc. | Methods and systems of simulating movement accompanying speech |
US20100049521A1 (en) * | 2001-06-15 | 2010-02-25 | Nuance Communications, Inc. | Selective enablement of speech recognition grammars |
US9196252B2 (en) | 2001-06-15 | 2015-11-24 | Nuance Communications, Inc. | Selective enablement of speech recognition grammars |
US20030135375A1 (en) * | 2002-01-14 | 2003-07-17 | Bloomstein Richard W. | Encoding speech segments for economical transmission and automatic playing at remote computers |
US20040015361A1 (en) * | 2002-07-22 | 2004-01-22 | Bloomstein Richard W. | Encoding media data for decompression at remote computers employing automatic decoding options |
US7558732B2 (en) * | 2002-09-23 | 2009-07-07 | Infineon Technologies Ag | Method and system for computer-aided speech synthesis |
US20050216267A1 (en) * | 2002-09-23 | 2005-09-29 | Infineon Technologies Ag | Method and system for computer-aided speech synthesis |
US7593842B2 (en) * | 2002-12-10 | 2009-09-22 | Leslie Rousseau | Device and method for translating language |
US20040122678A1 (en) * | 2002-12-10 | 2004-06-24 | Leslie Rousseau | Device and method for translating language |
US20060195323A1 (en) * | 2003-03-25 | 2006-08-31 | Jean Monne | Distributed speech recognition system |
US7627478B2 (en) | 2003-04-25 | 2009-12-01 | At&T Intellectual Property Ii, L.P. | System for low-latency animation of talking heads |
US20100076750A1 (en) * | 2003-04-25 | 2010-03-25 | At&T Corp. | System for Low-Latency Animation of Talking Heads |
US20040215460A1 (en) * | 2003-04-25 | 2004-10-28 | Eric Cosatto | System for low-latency animation of talking heads |
US8086464B2 (en) | 2003-04-25 | 2011-12-27 | At&T Intellectual Property Ii, L.P. | System for low-latency animation of talking heads |
US7260539B2 (en) * | 2003-04-25 | 2007-08-21 | At&T Corp. | System for low-latency animation of talking heads |
US20080015861A1 (en) * | 2003-04-25 | 2008-01-17 | At&T Corp. | System for low-latency animation of talking heads |
US20080126491A1 (en) * | 2004-05-14 | 2008-05-29 | Koninklijke Philips Electronics, N.V. | Method for Transmitting Messages from a Sender to a Recipient, a Messaging System and Message Converting Means |
US20060122836A1 (en) * | 2004-12-08 | 2006-06-08 | International Business Machines Corporation | Dynamic switching between local and remote speech rendering |
US8024194B2 (en) | 2004-12-08 | 2011-09-20 | Nuance Communications, Inc. | Dynamic switching between local and remote speech rendering |
US20080195386A1 (en) * | 2005-05-31 | 2008-08-14 | Koninklijke Philips Electronics, N.V. | Method and a Device For Performing an Automatic Dubbing on a Multimedia Signal |
US8751302B2 (en) | 2005-08-11 | 2014-06-10 | Pulsepoint, Inc. | Method and system for placement and pricing of internet-based advertisements or services |
US10672039B2 (en) | 2005-08-11 | 2020-06-02 | Pulsepoint, Inc. | Assembling internet display pages with content provided from multiple servers after failure of one server |
US20070055569A1 (en) * | 2005-08-11 | 2007-03-08 | Contextweb | Method and system for placement and pricing of internet-based advertisements or services |
US10453078B2 (en) | 2006-01-26 | 2019-10-22 | Pulsepoint, Inc. | Open insertion order system to interface with an exchange for internet ad media |
US20090012903A1 (en) * | 2006-01-26 | 2009-01-08 | Contextweb, Inc. | Online exchange for internet ad media |
US20100023396A1 (en) * | 2006-01-26 | 2010-01-28 | ContextWeb,Inc. | New open insertion order system to interface with an exchange for internet ad media |
US20080287147A1 (en) * | 2007-05-18 | 2008-11-20 | Immersion Corporation | Haptically Enabled Messaging |
US8315652B2 (en) | 2007-05-18 | 2012-11-20 | Immersion Corporation | Haptically enabled messaging |
US9197735B2 (en) | 2007-05-18 | 2015-11-24 | Immersion Corporation | Haptically enabled messaging |
DE102007043264A1 (en) * | 2007-09-11 | 2009-03-12 | Siemens Ag | Speech signal e.g. traffic information, outputting device for e.g. navigation car radio, has signal store storing speech signal, and output unit outputting speech signal upon recognized sentence limitation of speech signal |
US8612228B2 (en) * | 2009-03-31 | 2013-12-17 | Namco Bandai Games Inc. | Character mouth shape control method |
US20100250256A1 (en) * | 2009-03-31 | 2010-09-30 | Namco Bandai Games Inc. | Character mouth shape control method |
US10032455B2 (en) | 2011-01-07 | 2018-07-24 | Nuance Communications, Inc. | Configurable speech recognition system using a pronunciation alignment between multiple recognizers |
US9953653B2 (en) | 2011-01-07 | 2018-04-24 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
US8930194B2 (en) | 2011-01-07 | 2015-01-06 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
US8898065B2 (en) | 2011-01-07 | 2014-11-25 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
US10049669B2 (en) | 2011-01-07 | 2018-08-14 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
WO2012148369A1 (en) * | 2011-04-27 | 2012-11-01 | Echostar Ukraine L.L.C. | Content receiver system and method for providing supplemental content in translated and/or audio form |
US9826270B2 (en) | 2011-04-27 | 2017-11-21 | Echostar Ukraine Llc | Content receiver system and method for providing supplemental content in translated and/or audio form |
US9876911B2 (en) | 2011-10-17 | 2018-01-23 | At&T Intellectual Property I, L.P. | System and method for augmenting features of visual voice mail |
US9596351B2 (en) | 2011-10-17 | 2017-03-14 | At&T Intellectual Property I, L.P. | System and method for augmenting features of visual voice mail |
US9628627B2 (en) | 2011-10-17 | 2017-04-18 | AT&T Intellectual Property I, L.P. | System and method for visual voice mail in a multi-screen environment |
US9444941B2 (en) | 2011-10-17 | 2016-09-13 | At&T Intellectual Property I, L.P. | Delivery of visual voice mail |
US9769316B2 (en) | 2011-10-17 | 2017-09-19 | At&T Intellectual Property I, L.P. | System and method for callee-caller specific greetings for voice mail |
US10735595B2 (en) | 2011-10-17 | 2020-08-04 | At&T Intellectual Property I, L.P. | Visual voice mail delivery mechanisms |
US9282185B2 (en) | 2011-10-17 | 2016-03-08 | At&T Intellectual Property I, L.P. | System and method for callee-caller specific greetings for voice mail |
US9258683B2 (en) | 2011-10-17 | 2016-02-09 | At&T Intellectual Property I, L.P. | Delivery of visual voice mail |
US9584666B2 (en) | 2011-10-17 | 2017-02-28 | At&T Intellectual Property I, L.P. | Visual voice mail delivery mechanisms |
US9042527B2 (en) | 2011-10-17 | 2015-05-26 | At&T Intellectual Property I, L.P. | Visual voice mail delivery mechanisms |
US9025739B2 (en) | 2011-10-20 | 2015-05-05 | At&T Intellectual Property I, L.P. | System and method for visual voice mail in a multi-screen environment |
US8515029B2 (en) | 2011-11-02 | 2013-08-20 | At&T Intellectual Property I, L.P. | System and method for visual voice mail in an LTE environment |
US20130122871A1 (en) * | 2011-11-16 | 2013-05-16 | At & T Intellectual Property I, L.P. | System And Method For Augmenting Features Of Visual Voice Mail |
US8489075B2 (en) * | 2011-11-16 | 2013-07-16 | At&T Intellectual Property I, L.P. | System and method for augmenting features of visual voice mail |
US20130339007A1 (en) * | 2012-06-18 | 2013-12-19 | International Business Machines Corporation | Enhancing comprehension in voice communications |
US9824695B2 (en) * | 2012-06-18 | 2017-11-21 | International Business Machines Corporation | Enhancing comprehension in voice communications |
US9886944B2 (en) | 2012-10-04 | 2018-02-06 | Nuance Communications, Inc. | Hybrid controller for ASR |
US10971157B2 (en) | 2017-01-11 | 2021-04-06 | Nuance Communications, Inc. | Methods and apparatus for hybrid speech recognition processing |
US11990135B2 (en) | 2017-01-11 | 2024-05-21 | Microsoft Technology Licensing, Llc | Methods and apparatus for hybrid speech recognition processing |
WO2018132721A1 (en) * | 2017-01-12 | 2018-07-19 | The Regents Of The University Of Colorado, A Body Corporate | Method and system for implementing three-dimensional facial modeling and visual speech synthesis |
US11145100B2 (en) * | 2017-01-12 | 2021-10-12 | The Regents Of The University Of Colorado, A Body Corporate | Method and system for implementing three-dimensional facial modeling and visual speech synthesis |
Similar Documents
Publication | Title
---|---
US5943648A (en) | Speech signal distribution system providing supplemental parameter associated data
US12033611B2 (en) | Generating expressive speech audio from text data
US9318100B2 (en) | Supplementing audio recorded in a media file
US6510413B1 (en) | Distributed synthetic speech generation
US9196241B2 (en) | Asynchronous communications using messages recorded on handheld devices
US5696879A (en) | Method and apparatus for improved voice transmission
US5774854A (en) | Text to speech system
US6098041A (en) | Speech synthesis system
US5911129A (en) | Audio font used for capture and rendering
US6463412B1 (en) | High performance voice transformation apparatus and method
Syrdal et al. | Applied speech technology
JP2003521750A (en) | Speech system
EP0458859A4 (en) | Text to speech synthesis system and method using context dependent vowell allophones
WO2023276539A1 (en) | Voice conversion device, voice conversion method, program, and recording medium
US20080162559A1 (en) | Asynchronous communications regarding the subject matter of a media file stored on a handheld recording device
US11915689B1 (en) | Generating audio using auto-regressive generative neural networks
JP4884212B2 (en) | Speech synthesizer
US7778833B2 (en) | Method and apparatus for using computer generated voice
JP2005215888A (en) | Display device for text sentence
Westall et al. | Speech technology for telecommunications
US8219402B2 (en) | Asynchronous receipt of information from a user
Henton | Challenges and rewards in using parametric or concatenative speech synthesis
CN114724540A (en) | Model processing method and device, emotion voice synthesis method and device
JP7572388B2 (en) | Data processing device, data processing method and program
JP3830200B2 (en) | Human image synthesizer
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: CENTIGRAM COMMUNICATIONS CORPORATION, CALIFORNIA
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TEL, MICHAEL P.;REEL/FRAME:008090/0179
Effective date: 19960424
|
AS | Assignment |
Owner name: LERNOUT & HAUSPIE SPEECH PRODUCTS N.V., A BELGIAN
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CENTIGRAM COMMUNICATIONS CORPORATION, A DELAWARE CORPORATION;REEL/FRAME:008621/0636
Effective date: 19970630
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: MICROSOFT CORPORATION, WASHINGTON
Free format text: PATENT LICENSE AGREEMENT;ASSIGNOR:LERNOUT & HAUSPIE SPEECH PRODUCTS;REEL/FRAME:012539/0977
Effective date: 19970910
|
AS | Assignment |
Owner name: SCANSOFT, INC., MASSACHUSETTS
Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:LERNOUT & HAUSPIE SPEECH PRODUCTS, N.V.;REEL/FRAME:012775/0308
Effective date: 20011212
|
FEPP | Fee payment procedure |
Free format text: PAT HOLDER NO LONGER CLAIMS SMALL ENTITY STATUS, ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: STOL); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
REFU | Refund |
Free format text: REFUND - SURCHARGE, PETITION TO ACCEPT PYMT AFTER EXP, UNINTENTIONAL (ORIGINAL EVENT CODE: R2551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
AS | Assignment |
Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS
Free format text: MERGER AND CHANGE OF NAME TO NUANCE COMMUNICATIONS, INC.;ASSIGNOR:SCANSOFT, INC.;REEL/FRAME:016914/0975
Effective date: 20051017
|
AS | Assignment |
Owner name: USB AG, STAMFORD BRANCH, CONNECTICUT
Free format text: SECURITY AGREEMENT;ASSIGNOR:NUANCE COMMUNICATIONS, INC.;REEL/FRAME:017435/0199
Effective date: 20060331
|
AS | Assignment |
Owner name: USB AG. STAMFORD BRANCH, CONNECTICUT
Free format text: SECURITY AGREEMENT;ASSIGNOR:NUANCE COMMUNICATIONS, INC.;REEL/FRAME:018160/0909
Effective date: 20060331
|
FPAY | Fee payment |
Year of fee payment: 8 |
|
FPAY | Fee payment |
Year of fee payment: 12 |
|
AS | Assignment |
Owner name: NORTHROP GRUMMAN CORPORATION, A DELAWARE CORPORATI
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: SPEECHWORKS INTERNATIONAL, INC., A DELAWARE CORPOR
Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824
Effective date: 20160520

Owner name: NUANCE COMMUNICATIONS, INC., AS GRANTOR, MASSACHUS
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: TELELOGUE, INC., A DELAWARE CORPORATION, AS GRANTO
Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824
Effective date: 20160520

Owner name: DSP, INC., D/B/A DIAMOND EQUIPMENT, A MAINE CORPOR
Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824
Effective date: 20160520

Owner name: NOKIA CORPORATION, AS GRANTOR, FINLAND
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: SCANSOFT, INC., A DELAWARE CORPORATION, AS GRANTOR
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: STRYKER LEIBINGER GMBH & CO., KG, AS GRANTOR, GERM
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: SPEECHWORKS INTERNATIONAL, INC., A DELAWARE CORPOR
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: MITSUBISH DENKI KABUSHIKI KAISHA, AS GRANTOR, JAPA
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: DICTAPHONE CORPORATION, A DELAWARE CORPORATION, AS
Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824
Effective date: 20160520

Owner name: ART ADVANCED RECOGNITION TECHNOLOGIES, INC., A DEL
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: DICTAPHONE CORPORATION, A DELAWARE CORPORATION, AS
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: DSP, INC., D/B/A DIAMOND EQUIPMENT, A MAINE CORPOR
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: HUMAN CAPITAL RESOURCES, INC., A DELAWARE CORPORAT
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: INSTITIT KATALIZA IMENI G.K. BORESKOVA SIBIRSKOGO
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: TELELOGUE, INC., A DELAWARE CORPORATION, AS GRANTO
Free format text: PATENT RELEASE (REEL:018160/FRAME:0909);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0869
Effective date: 20160520

Owner name: NUANCE COMMUNICATIONS, INC., AS GRANTOR, MASSACHUS
Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824
Effective date: 20160520

Owner name: ART ADVANCED RECOGNITION TECHNOLOGIES, INC., A DEL
Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824
Effective date: 20160520

Owner name: SCANSOFT, INC., A DELAWARE CORPORATION, AS GRANTOR
Free format text: PATENT RELEASE (REEL:017435/FRAME:0199);ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS ADMINISTRATIVE AGENT;REEL/FRAME:038770/0824
Effective date: 20160520