Improved TTS Integration documentation (#39207) · home-assistant/home-assistant.io@b9999c0 · GitHub

Commit b9999c0

lukakama, coderabbitai[bot], and Copilot authored
Improved TTS Integration documentation (#39207)
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 77def4b commit b9999c0

1 file changed: +37 -4 lines changed

source/_integrations/tts.markdown

Lines changed: 37 additions & 4 deletions
@@ -46,7 +46,7 @@ Screenshot showing the state of a text-to-speech entity in the developer tools.
4646

4747
Modern platforms will create entities under the `tts` domain, where each entity represents one text-to-speech service provider. These entities may be used as targets for the `tts.speak` action.
4848

49-
the `tts.speak` action supports `language` and on some platforms also `options` for settings, e.g., _voice, motion, speed, etc_. The text that should be spoken is set with `message`, and the media player that should output the sound is selected with `media_player_entity_id`.
49+
The `tts.speak` action supports `message`, `language`, `cache`, `media_player_entity_id` and `options` options. The text that should be spoken is set with `message`, and the media player that should output the sound is selected with `media_player_entity_id`. The language can be set with `language`, using the format required by the target entity platform (refer to specific platform documentation). See [cache section](#cache) for information on `cache` option. Additional settings can be specified with the `options` option, which include preferred audio settings (see [preferred audio settings](#preferred-audio-settings) section for more info) and further settings of the target entity platform, e.g., _voice, motion, speed, etc._ (refer to specific platform documentation for any supported settings).
5050

5151
```yaml
5252
action: tts.speak
@@ -59,7 +59,7 @@ data:
5959
6060
### Action say (legacy)
6161
62-
The `say` action supports `language` and on some platforms also `options` for settings, e.g., _voice, motion, speed, etc_. The text that should be spoken is set with `message`. Since release 0.92, action name can be defined in configuration `service_name` option.
62+
The `say` action supports `message`, `language`, `cache` and `options` options. The text that should be spoken is set with `message`. The language can be set with `language`, using the format required by the platform (refer to specific platform documentation). See [cache section](#cache) for information on `cache` option. Additional settings can be specified with the `options` option, which include preferred audio settings (see [preferred audio settings](#preferred-audio-settings) section for more info) and further settings of the target platform, e.g., _voice, motion, speed, etc._ (refer to specific platform documentation for any supported settings). Since release 0.92, action name can be defined in configuration `service_name` option.
6363

6464
Say to all `media_player` entities:
6565

@@ -105,13 +105,40 @@ data:
105105

106106
## Cache
107107

108-
The integration cache can be controlled with the `cache` option in the action to `speak` or `say`. A long time cache will be located on the file system. The in-memory cache for fast responses to media players will be auto-cleaned after a short period.
108+
The integration cache can be controlled with the `cache` option in the action to `speak` or `say`, setting it to `True` to enable it (default), or `False` to disable it. A long time cache will be located on the file system. The in-memory cache for fast responses to media players will be auto-cleaned after a short period.
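
As an illustration of the `cache` option described above, the following sketch disables caching for a single `tts.speak` call. It is not part of the documented change; the entity names `tts.example` and `media_player.kitchen` are placeholders reused from the example elsewhere in this file, and the message text is made up.

```yaml
# Minimal sketch: speak a one-off message without caching the generated audio.
# Entity names are placeholders; replace them with your own entities.
action: tts.speak
target:
  entity_id: tts.example
data:
  media_player_entity_id: media_player.kitchen
  message: "The washing machine finished at ten past nine."
  cache: false  # the default is true, per the paragraph above
```

Disabling the cache is likely most useful for messages that change on every call, since each distinct message would otherwise leave its own cached file.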
109+
110+
## Preferred audio settings
111+
112+
Each TTS platform produces audio samples in a different format, and these formats are not always compatible with every media player. The TTS integration building block lets you configure a preferred target audio format through the `options` option of the `speak` or `say` actions.
113+
114+
The TTS integration building block uses the [FFmpeg integration](/integrations/ffmpeg) to transcode audio when the target entity platform does not support one or more of the specified preferred audio format settings (refer to the specific platform documentation for the supported settings and their supported values).
115+
116+
Available preferred audio settings, all optional, are:
117+
118+
- `preferred_format`: Set the audio format. When not supported by the target entity platform, the value is a file extension like `wav`, `mp3`, `ogg`, etc., among ones supported by FFmpeg tool for output files.
119+
- `preferred_sample_rate`: Set the sample rate. When not supported by the target entity platform, the value is in Hz as a number, among ones supported by the `-ar` parameter of FFmpeg tool.
120+
- `preferred_sample_channels`: Set the number of audio channels. When not supported by the target entity platform, the value is a number among ones supported by the `-ac` parameter of FFmpeg tool.
121+
- `preferred_sample_bytes`: Set the audio bit sampling. When not supported by the target entity platform, can only be set to `2` to use 16-bit audio sampling (any other value is ignored).
122+
123+
Example to produce an MP3 audio at 22050Hz:
124+
125+
```yaml
126+
action: tts.speak
127+
target:
128+
  entity_id: tts.example
129+
data:
130+
  media_player_entity_id: media_player.kitchen
131+
  message: "May the force be with you."
132+
  options:
133+
    preferred_format: mp3
134+
    preferred_sample_rate: 22050
135+
```
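
For comparison, here is a hedged sketch that sets all four preferred audio settings listed above at once. It is an illustration rather than part of the documented change. The specific values (16000 Hz, mono WAV) are assumptions chosen for the example, and whether a given media player accepts them depends on the device.

```yaml
# Sketch: request 16-bit mono WAV audio at 16000 Hz.
# Entity names are placeholders; the chosen values are illustrative assumptions.
action: tts.speak
target:
  entity_id: tts.example
data:
  media_player_entity_id: media_player.kitchen
  message: "May the force be with you."
  options:
    preferred_format: wav
    preferred_sample_rate: 16000
    preferred_sample_channels: 1
    preferred_sample_bytes: 2  # 2 bytes = 16-bit; other values are ignored
```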
109136

110137
## REST API
111138

112139
### POST `/api/tts_get_url`
113140

114-
Returns a URL to the generated TTS file. The `engine_id` or `platform` parameter together with `message` are required.
141+
Returns a URL to the generated TTS file. The `engine_id` (which is the entity id) or `platform` parameter together with `message` are required. Additional parameters `cache`, `language` and `options` are supported, as JSON attributes, as described for `speak` action.
115142

116143
```json
117144
{
@@ -166,3 +193,9 @@ These requirements present the following problems, all of which create problems
166193
- If you are using SSL (e.g., `https://yourhost.example.org/...`) then you _must_ use the hostname in the certificate (e.g., `external_url: https://yourhost.example.org`). You cannot use an IP address since the certificate won't be valid for the IP address, and the cast device will refuse the connection.
167194

168195
The recommended way to overcome these obstacles is to not manually configure a local Home Assistant URL.
196+
197+
### Partial, corrupted or no audio
198+
199+
Some media players may play only partial or corrupted audio, or no audio at all, when the audio format is not fully supported. In such cases, experiment with different combinations of audio format, channels, sample rate, and sample bytes using the [preferred audio settings](#preferred-audio-settings) options.
200+
201+
For example, some Google Cast devices skip the initial part of the audio when it is sampled at 22050Hz. To fix the problem, set the `preferred_sample_rate` setting in the `options` option to `44100`.
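
The Google Cast workaround above could look like the following sketch. It is an illustration, not part of the documented change; `media_player.living_room_speaker` is a hypothetical entity name standing in for a Cast device, and `tts.example` is a placeholder.

```yaml
# Sketch: request 44100 Hz audio so the start of the message is not skipped.
# Both entity names are placeholders for your own entities.
action: tts.speak
target:
  entity_id: tts.example
data:
  media_player_entity_id: media_player.living_room_speaker
  message: "May the force be with you."
  options:
    preferred_sample_rate: 44100
```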
