


Speechify's Text-to-Speech API is designed to integrate cutting-edge TTS capabilities into your digital offerings and give your users a high-quality listening experience. It is well suited to automating customer calls, educational platforms, media and entertainment, accessibility, gaming, and more.

Our API is built upon a cutting-edge, proprietary AI model developed in-house by our team of researchers. This model has been behind Speechify's Reader Apps – the world's largest text-to-speech consumer apps, with a user base of over 23 million people. For more than two years, it has been powering not only our reader apps but also the text-to-speech experiences of Medium.com, Artifact, Walmart, Quadrant, Carnegie Learning, and hundreds of other products. We are thrilled to now open our technology to the world, enabling any business or developer to harness the power of our state-of-the-art AI model and elevate their audio experiences.

Getting Started

The Speechify API is an HTTP service. Here are the quick steps to get you started:

  1. First, you need an account. Go to https://console.sws.speechify.com/ and create one.
  2. Navigate to the API Keys section and create a key, giving it a descriptive name. Copy this key; we will refer to it as YOUR_API_KEY throughout this tutorial.
  3. We encourage you to experiment with the Playground: try the different voices and generate audio for some sample texts. We hope you love what you hear!
  4. To start talking to our API from code, you will need:
    1. the API base URL: https://api.sws.speechify.com/
    2. YOUR_API_KEY
    3. an HTTP client of your choice (e.g. curl for shell scripts, fetch for Node.js)
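
As a minimal sketch of step 4, the snippet below gathers the ingredients in Python using only the standard library. The environment variable name SPEECHIFY_API_KEY and the helper names are assumptions made for illustration; only the base URL and the /v1/audio/speech path come from this guide.

```python
import os
from urllib.parse import urljoin

# Base URL from this guide.
API_BASE_URL = "https://api.sws.speechify.com/"


def endpoint_url(path: str, base_url: str = API_BASE_URL) -> str:
    """Join the base URL and an endpoint path, tolerating a leading slash."""
    return urljoin(base_url, path.lstrip("/"))


def load_api_key(env=os.environ) -> str:
    """Read the API key from the environment.

    SPEECHIFY_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = env.get("SPEECHIFY_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set SPEECHIFY_API_KEY to the key created in the console")
    return key


print(endpoint_url("/v1/audio/speech"))
# prints: https://api.sws.speechify.com/v1/audio/speech
```

Reading the key from the environment rather than hard-coding it keeps it out of source control.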

Check the Recipes section for examples that demonstrate various uses of the API.

Please refer to our detailed documentation for the full list of available API endpoints, their parameters, and their return types.

Secure API Access

To ensure the security and integrity of your interactions with the API, we employ API key authentication. Start by obtaining your unique API key (see the previous section).

Include your API key in the Authorization header of each request:

Authorization: Bearer YOUR_API_KEY
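
A minimal sketch of attaching that header in Python, using the standard library's urllib.request. The helper name authorized_request is hypothetical; the request object is only built here, not sent.

```python
import urllib.request


def authorized_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a request carrying the API key as a Bearer token."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
    )


req = authorized_request(
    "https://api.sws.speechify.com/v1/audio/speech", "YOUR_API_KEY"
)
print(req.get_header("Authorization"))
# prints: Bearer YOUR_API_KEY
```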

Requests without a valid API key are rejected with a 401 Unauthorized status, so your data and interactions remain protected.
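
On the client side, it can help to turn a 401 into an explicit error instead of retrying. The sketch below is one way to do that with Python's urllib; the send helper and its opener parameter are assumptions for illustration, and the format of the error response body is not covered here.

```python
import urllib.error
import urllib.request


def send(req, opener=urllib.request.urlopen):
    """Send a request and return the raw response body as bytes.

    `opener` exists so the function can be exercised without network access.
    """
    try:
        with opener(req, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 401:
            # Missing, mistyped, or revoked key: retrying will not help.
            raise PermissionError(
                "401 Unauthorized: check the Authorization header and API key"
            ) from err
        raise  # other HTTP errors are left for the caller to handle
```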

Crafting Your input with SSML

The input parameter of the audio generation endpoint (/v1/audio/speech) accepts either plain text or SSML.

For the most trivial use cases, you can send it as plain text. This works, but it gives you no fine-grained control over how the speech is synthesized.

For anything beyond trivial, we recommend wrapping your input in Speech Synthesis Markup Language (SSML). While it may look like an unnecessary complication, SSML offers you precise control over how your text is spoken.

SSML, an XML-based markup language, empowers you to enrich your audio content with nuances such as tone, emphasis, and emotional delivery, using tags like <prosody>, <break>, and <emphasis>.

For the simplest use cases, your text should be wrapped in the <speak> tag:

<speak>Your content to be synthesized here</speak>
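
Because SSML is XML, characters such as & and < in your text need escaping before the text is wrapped. A minimal sketch in Python follows; the helper name to_ssml is hypothetical.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap plain text in <speak>, escaping &, < and > so the XML stays valid."""
    return f"<speak>{escape(text)}</speak>"


print(to_ssml("Terms & conditions apply"))
# prints: <speak>Terms &amp; conditions apply</speak>
```

The resulting string is what you would pass as the input parameter.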

For an in-depth exploration of how SSML can transform your content and to stay updated on future enhancements, visit our SSML documentation.

Example: changing the speed of the voice

Depending on your specific use case, you may want the speech to go slower or faster than the default. This is a common request, and a great example of where SSML is worth the trouble. Please check the <prosody> tag documentation for how to adjust not only the speed (rate), but also the voice pitch and volume.
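
As a rough sketch, the helper below builds SSML that sets the speaking rate with <prosody>. The rate values shown ("slow", "fast") are labels defined by the W3C SSML specification; which values this API accepts is an assumption here, so confirm them in the <prosody> tag documentation. The helper name with_rate is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def with_rate(text: str, rate: str) -> str:
    """Wrap text in <speak><prosody rate=...> with safe escaping."""
    return (
        f"<speak><prosody rate={quoteattr(rate)}>"
        f"{escape(text)}"
        "</prosody></speak>"
    )


print(with_rate("Please listen carefully.", "slow"))
# prints: <speak><prosody rate="slow">Please listen carefully.</prosody></speak>
```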

Custom (Cloned) Voices

Not only does Speechify provide an extensive list of standard voices, both male and female, it also lets you create a digitized version of any human voice, for example, your own.

Please note that this is an advanced feature available only to paying customers.

You can start experimenting with cloned voices right from your browser. Upload or record a sample of your voice, and the new entry will appear in the voice selector.

You can also create a custom voice via an API call and use its voice ID for speech synthesis.


Happy building with Speechify's Text-to-Speech API!