Web Speech API
The Web Speech API aims to enable web developers to provide, in a web
browser, speech-input and text-to-speech output features that are typically not
available when using standard speech-recognition or screen-reader software.
The API itself is agnostic of the underlying speech recognition and synthesis
implementation and can support both server-based and
client-based/embedded recognition and synthesis. The API is designed to
enable both brief (one-shot) speech input and continuous speech input.
Speech recognition results are provided to the web page as a list of
hypotheses, along with other relevant information for each hypothesis.
Speech recognition
Speech recognition involves receiving speech through a device's
microphone, which is then checked by a speech recognition
service against a list of grammars (basically, the vocabulary you
want to have recognized in a particular app). When a word or
phrase is successfully recognized, it is returned as a result (or
list of results) as a text string, and further actions can be
initiated.
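The following markup, from a simple color-changer demo, provides a
heading, brief instructions, and an output paragraph for diagnostic
messages: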
<h1>Speech color changer</h1>
<p>Tap/click then say a color to change the background color of
the app.</p>
<div>
<p class="output"><em>…diagnostic messages</em></p>
</div>
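The accompanying script begins by grabbing references to the
interfaces it needs, falling back to prefixed versions because some
browsers only expose the API with a webkit prefix: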
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;
const SpeechGrammarList =
  window.SpeechGrammarList || window.webkitSpeechGrammarList;
const SpeechRecognitionEvent =
  window.SpeechRecognitionEvent || window.webkitSpeechRecognitionEvent;
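From there, the demo can proceed roughly as follows. This is a
minimal sketch: the grammar string, the .output selector, and the
click-to-start wiring are illustrative choices, not requirements of
the API.

const colors = ["red", "green", "blue"];
const grammar = `#JSGF V1.0; grammar colors; public <color> = ${colors.join(" | ")} ;`;

const recognition = new SpeechRecognition();
const speechRecognitionList = new SpeechGrammarList();
speechRecognitionList.addFromString(grammar, 1); // weight between 0 and 1
recognition.grammars = speechRecognitionList;
recognition.continuous = false;
recognition.lang = "en-US";
recognition.interimResults = false;
recognition.maxAlternatives = 1;

const output = document.querySelector(".output");

// Start listening on tap/click, matching the instructions in the markup.
document.body.onclick = () => recognition.start();

recognition.onresult = (event) => {
  // The first alternative of the first result carries the best transcript.
  const color = event.results[0][0].transcript;
  output.textContent = `Result received: ${color}`;
  document.body.style.backgroundColor = color;
};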
SpeechGrammar
The SpeechGrammar interface of the Web Speech
API represents a set of words or patterns of words that we
want the recognition service to recognize.
Grammar is defined using JSpeech Grammar Format (JSGF).
Other formats may also be supported in the future.
SpeechRecognitionAlternative
The SpeechRecognitionAlternative interface of the Web
Speech API represents a single word that has been recognized
by the speech recognition service.
Instance properties
SpeechRecognitionAlternative.transcript Read only
Returns a string containing the transcript of the recognized
word.
SpeechRecognitionAlternative.confidence Read only
Returns a numeric estimate between 0 and 1 of how
confident the speech recognition system is that the
recognition is correct.
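For instance, a handler along these lines reads both properties from
the top-ranked alternative (assuming a recognition object has been
set up as above):

recognition.onresult = (event) => {
  const alternative = event.results[0][0]; // a SpeechRecognitionAlternative
  console.log(`Transcript: ${alternative.transcript}`);
  console.log(`Confidence: ${alternative.confidence}`); // estimate between 0 and 1
};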
SpeechRecognitionErrorEvent
The SpeechRecognitionErrorEvent interface of the Web Speech
API represents error messages from the recognition service.
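A handler might log the event's error code; the sketch below assumes
the recognition object from earlier:

recognition.onerror = (event) => {
  // event.error holds a code such as "no-speech" or "not-allowed"
  console.error(`Speech recognition error: ${event.error}`);
};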
SpeechSynthesisVoice
The SpeechSynthesisVoice interface of the Web Speech
API represents a voice that the system supports.
Every SpeechSynthesisVoice is provided by an underlying speech
service and exposes information about its language, name, and URI.
Instance properties
SpeechSynthesisVoice.default Read only
A boolean value indicating whether the voice is the default
voice for the current app language (true) or not (false).
SpeechSynthesisVoice.lang Read only
Returns a BCP 47 language tag indicating the language of
the voice.
SpeechSynthesisVoice.localService Read only
A boolean value indicating whether the voice is supplied
by a local speech synthesizer service (true) or a remote
speech synthesizer service (false).
SpeechSynthesisVoice.name Read only
Returns a human-readable name that represents the
voice.
SpeechSynthesisVoice.voiceURI Read only
Returns a URI identifying the location of the speech
synthesis service for this voice.
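As an illustrative sketch, the loop below prints each available voice
together with these properties (it assumes the voice list has already
loaded):

for (const voice of speechSynthesis.getVoices()) {
  const origin = voice.localService ? "local" : "remote";
  const marker = voice.default ? " (default)" : "";
  console.log(`${voice.name} [${voice.lang}, ${origin}]${marker}`);
}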
SpeechSynthesis
The SpeechSynthesis interface of the Web Speech API is the
controller interface for the speech service; this can be used to
retrieve information about the synthesis voices available on the
device, start and pause speech, and other commands besides.
Instance properties
SpeechSynthesis also inherits properties from its parent
interface, EventTarget.
SpeechSynthesis.paused Read only
A boolean value that returns true if
the SpeechSynthesis object is in a paused state.
SpeechSynthesis.pending Read only
A boolean value that returns true if the utterance queue
contains as-yet-unspoken utterances.
SpeechSynthesis.speaking Read only
A boolean value that returns true if an utterance is
currently in the process of being spoken — even
if SpeechSynthesis is in a paused state.
Instance methods
SpeechSynthesis also inherits methods from its parent
interface, EventTarget.
SpeechSynthesis.cancel()
Removes all utterances from the utterance queue.
SpeechSynthesis.getVoices()
Returns a list of SpeechSynthesisVoice objects
representing all the available voices on the current device.
SpeechSynthesis.pause()
Puts the SpeechSynthesis object into a paused state.
SpeechSynthesis.resume()
Puts the SpeechSynthesis object into a non-paused state:
resumes it if it was already paused.
SpeechSynthesis.speak()
Adds an utterance to the utterance queue; it will be
spoken when any other utterances queued before it have
been spoken.
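A short sketch of these methods against the utterance queue (the
utterance text is arbitrary):

const utterance = new SpeechSynthesisUtterance("Hello from the Web Speech API.");
speechSynthesis.speak(utterance); // queued, then spoken
console.log(speechSynthesis.speaking, speechSynthesis.pending);
speechSynthesis.pause(); // speaking remains true while paused
speechSynthesis.resume();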
Events
Listen to this event using addEventListener() or by assigning an
event listener to the oneventname property of this interface.
voiceschanged
Fired when the list of SpeechSynthesisVoice objects that
would be returned by
the SpeechSynthesis.getVoices() method has changed.
Also available via the onvoiceschanged property.
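Because getVoices() may return an empty list before the voices have
loaded, a common pattern, sketched here, is to query again inside the
handler:

speechSynthesis.addEventListener("voiceschanged", () => {
  const voices = speechSynthesis.getVoices();
  console.log(`${voices.length} voices are now available`);
});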
Speech recognition is accessed via
the SpeechRecognition interface, which provides the ability to
recognize voice context from an audio input (normally via the
device's default speech recognition service) and respond
appropriately. Generally you'll use the interface's constructor to
create a new SpeechRecognition object, which has a number of
event handlers available for detecting when speech is input
through the device's microphone. The SpeechGrammar interface
represents a container for a particular set of grammar rules that
your app should recognize. Grammar is defined using JSpeech
Grammar Format (JSGF).
Speech synthesis is accessed via the SpeechSynthesis interface, a
text-to-speech component that allows programs to read out their
text content (normally via the device's default speech
synthesizer). Different voice types are represented
by SpeechSynthesisVoice objects, and different parts of text that
you want to be spoken are represented
by SpeechSynthesisUtterance objects. You can get these spoken
by passing them to the SpeechSynthesis.speak() method.
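For example, a minimal sketch along these lines (the text, pitch,
rate, and voice choice are illustrative):

const msg = new SpeechSynthesisUtterance("Welcome!");
msg.lang = "en-US";
msg.pitch = 1.2;
msg.rate = 0.9;
// getVoices() may be empty until the voiceschanged event has fired.
msg.voice = speechSynthesis.getVoices().find((v) => v.lang.startsWith("en")) ?? null;
speechSynthesis.speak(msg);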
For more details on using these features, see Using the Web Speech
API.
Web Speech API Interfaces
Speech recognition
SpeechRecognition
The controller interface for the recognition service; this also
handles the SpeechRecognitionEvent sent from the recognition
service.
SpeechRecognitionAlternative
Represents a single word that has been recognized by the speech
recognition service.
SpeechRecognitionErrorEvent
Represents error messages from the recognition service.
SpeechRecognitionEvent
The event object for the result and nomatch events. It contains
all the data associated with an interim or final speech recognition
result.
SpeechGrammar
The words or patterns of words that we want the recognition
service to recognize.
SpeechGrammarList
Represents a list of SpeechGrammar objects.
SpeechRecognitionResult
Represents a single recognition match, which may contain
multiple SpeechRecognitionAlternative objects.
SpeechRecognitionResultList
Represents a list of SpeechRecognitionResult objects, or a single
one if results are being captured in continuous mode.
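To show how the list types nest, a sketch that iterates results in
continuous mode (assuming a recognition object as before):

recognition.continuous = true;
recognition.interimResults = true;
recognition.onresult = (event) => {
  for (let i = 0; i < event.results.length; i++) {
    const result = event.results[i]; // a SpeechRecognitionResult
    const best = result[0]; // its highest-ranked SpeechRecognitionAlternative
    console.log(result.isFinal ? "final:" : "interim:", best.transcript);
  }
};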
Speech synthesis
SpeechSynthesis
The controller interface for the speech service; this can be used to
retrieve information about the synthesis voices available on the
device, start and pause speech, and other commands besides.
SpeechSynthesisErrorEvent
Contains information about any errors that occur while
processing SpeechSynthesisUtterance objects in the speech
service.
SpeechSynthesisEvent
Contains information about the current state
of SpeechSynthesisUtterance objects that have been processed in
the speech service; utterance event handlers appear in the sketch
at the end of this section.
SpeechSynthesisUtterance
Represents a speech request. It contains the content the speech
service should read and information about how to read it (e.g.
language, pitch, and volume).
SpeechSynthesisVoice
Represents a voice that the system supports.
Every SpeechSynthesisVoice is provided by an underlying speech
service and exposes information about its language, name, and URI.
Window.speechSynthesis
Specified as part of a [NoInterfaceObject] interface
called SpeechSynthesisGetter, and implemented by
the Window object, the speechSynthesis property provides access
to the SpeechSynthesis controller, and therefore the entry point
to speech synthesis functionality.
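Tying these pieces together, a short sketch that uses
window.speechSynthesis as the entry point and listens for utterance
events (the spoken text is arbitrary):

const synth = window.speechSynthesis;
const greeting = new SpeechSynthesisUtterance("Ready.");
greeting.onend = (event) => {
  // A SpeechSynthesisEvent fired when speaking finishes.
  console.log(`Finished speaking "${event.utterance.text}"`);
};
greeting.onerror = (event) => {
  // A SpeechSynthesisErrorEvent carrying a code such as "canceled".
  console.error(`Synthesis error: ${event.error}`);
};
synth.speak(greeting);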