The Voice API provides real-time voice transcription and translation services using WebSocket streaming.
DeepL API for speech to text is now generally available through the v3 API endpoint for customers with a DeepL API Pro subscription. This documentation page covers the supported scope of the speech to text functionality. The existing provisions of your DeepL API Pro subscription also apply to DeepL API for speech to text, together with the applicable additions to the Terms and Conditions, the Service Specification, and the Data Processing Agreement (the latter because a new sub-processor has been added to serve specific languages for speech to text).

Overview

The Voice API provides a way to open WebSocket streaming connections to transcribe and translate audio data. With each streaming connection, you can:
  • Send a single audio stream
  • Receive transcriptions in the source language
  • Receive translations in multiple target languages
The API uses a two-step flow:
  1. Request a streaming URL via a POST request
  2. Stream audio via WebSocket

Getting Started

To start using the Voice API:
  1. Ensure you have a DeepL API Pro account with Voice API access
  2. Review the Request Stream documentation
  3. Review the WebSocket Streaming documentation
  4. Choose your audio format and configuration
  5. Implement the two-step flow in your application

Supported Languages

All source languages can be translated into any target language.
Source languages
Chinese
Czech
Dutch
English
French
German
Indonesian
Italian
Japanese
Korean
Polish
Portuguese
Romanian
Russian
Spanish
Swedish
Turkish
Ukrainian
Target languages
Arabic
Bulgarian
Chinese (Simplified)
Chinese (Traditional)
Czech
Danish
Dutch
English (American)
English (British)
Estonian
Finnish
French
German
Greek
Hebrew
Hungarian
Indonesian
Italian
Japanese
Korean
Latvian
Lithuanian
Norwegian Bokmål
Polish
Portuguese (Brazil)
Portuguese (Portugal)
Romanian
Russian
Slovak
Slovenian
Spanish
Swedish
Turkish
Ukrainian
Vietnamese

Supported Audio Formats

The API supports various common combinations of streaming codecs and containers with a single channel (mono) audio stream. For a detailed list, please refer to Source Media Content Type.
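| Audio codec | Audio container        | Recommended bitrate                              |
|-------------|------------------------|--------------------------------------------------|
| PCM         | none                   | 256 kbps (16 kHz), default recommendation        |
| OPUS        | Matroska / Ogg / WebM  | 32 kbps, recommended for low-bandwidth scenarios |
| AAC         | Matroska               | 96 kbps                                          |
| FLAC        | FLAC / Matroska / Ogg  | 256 kbps (16 kHz)                                |
| MP3         | Matroska / MPEG        | 128 kbps                                         |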
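The PCM row implies 16-bit samples: 16,000 samples per second × 16 bits × 1 channel = 256,000 bits per second. A small sketch of that arithmetic, assuming signed 16-bit mono PCM at 16 kHz (the sample width is inferred from the bitrate, not stated on this page); the helper names are illustrative only.

```python
# Sketch: byte sizes for raw PCM chunks. Assumption: 16-bit mono samples
# at 16 kHz, i.e. 16,000 samples/s x 16 bits = 256,000 bits/s (256 kbps).

SAMPLE_RATE_HZ = 16_000   # 16 kHz, as in the PCM row of the table
BYTES_PER_SAMPLE = 2      # assumption: 16-bit samples
CHANNELS = 1              # the API expects a single (mono) channel


def pcm_bitrate_kbps(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     channels: int = CHANNELS) -> float:
    """Bitrate of an uncompressed PCM stream in kilobits per second."""
    return sample_rate_hz * bytes_per_sample * 8 * channels / 1000


def pcm_chunk_bytes(duration_ms: int,
                    sample_rate_hz: int = SAMPLE_RATE_HZ,
                    bytes_per_sample: int = BYTES_PER_SAMPLE,
                    channels: int = CHANNELS) -> int:
    """Number of bytes that hold `duration_ms` milliseconds of PCM audio."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * bytes_per_sample * channels


if __name__ == "__main__":
    print(pcm_bitrate_kbps())
    print(pcm_chunk_bytes(100))
```

At these settings, one second of audio is 32,000 bytes and a 100 ms chunk is 3,200 bytes, both within the chunk-size limits listed under Limitations and Constraints.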

Two-Step API Flow

The Voice API uses a two-step flow to initiate streaming.

Step 1: Request Stream

Make a POST request to v3/voice/realtime to obtain an ephemeral streaming URL and authentication token. The response looks like this:
{
  "streaming_url": "wss://api.deepl.com/v3/voice/realtime/connect",
  "token": "<secure access token>"
}
This step handles:
  • Authentication and authorization
  • Main configuration options (audio format, languages, glossaries, etc.)
The URL and token are valid for one-time use only.
See the Request Stream documentation for details.
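A sketch of step 1 using only the Python standard library: it builds the POST request and parses a response like the one shown above. Assumptions not confirmed on this page: the full endpoint URL https://api.deepl.com/v3/voice/realtime (host taken from the streaming URL), the DeepL-Auth-Key authorization header used by other DeepL API endpoints, and a JSON request body. The configuration fields are left to the caller because their names are defined in the Request Stream documentation; build_stream_request and parse_stream_response are hypothetical helpers.

```python
import json
import urllib.request
from typing import Tuple

# Assumption: full endpoint URL and auth scheme are inferred, not documented here.
REQUEST_STREAM_URL = "https://api.deepl.com/v3/voice/realtime"


def build_stream_request(auth_key: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks for a streaming URL.

    `config` is serialized unchanged; its field names (audio format,
    languages, glossaries, ...) come from the Request Stream documentation.
    """
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        REQUEST_STREAM_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"DeepL-Auth-Key {auth_key}",  # assumed scheme
            "Content-Type": "application/json",
        },
    )


def parse_stream_response(raw: bytes) -> Tuple[str, str]:
    """Extract the one-time streaming URL and token from the response body."""
    payload = json.loads(raw)
    return payload["streaming_url"], payload["token"]
```

Sending the request with urllib.request.urlopen(req) and passing the response body to parse_stream_response yields the URL and token for step 2. Because both are single-use, request a fresh pair for every connection.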

Step 2: Streaming Audio and Text (WebSocket)

Use the received URL and token to establish a WebSocket connection, passing the token as a query parameter: wss://api.deepl.com/v3/voice/realtime/connect?token=<secure access token>. This step handles the exchange of JSON messages over the WebSocket connection:
  • Sending audio data
  • Receiving transcriptions and translations in real time
Once the WebSocket connection is established, you must start sending audio data within 30 seconds; otherwise, the connection is closed.
See the WebSocket Streaming documentation for details.
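Step 2 connects to the returned streaming_url with the token appended as the token query parameter. A small sketch of building that URL; connect_url is a hypothetical helper, and percent-encoding the token is a defensive choice rather than something this page requires.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def connect_url(streaming_url: str, token: str) -> str:
    """Return streaming_url with the one-time token added as ?token=...

    urlencode percent-encodes characters such as '+', '/', or '=' so a
    token containing them cannot corrupt the query string.
    """
    parts = urlsplit(streaming_url)
    # Keep any query parameters the URL may already carry.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The resulting URL can be passed to any WebSocket client, for example the third-party websockets package (async with websockets.connect(url) as ws: ...). Start sending audio promptly, since the connection closes if no audio arrives within 30 seconds.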
[Figure: sequence diagram of the message flow. In the diagram, par denotes parallel execution and loop denotes looped execution.]
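The par and loop structure of the message flow maps onto two concurrent tasks: one loops over audio chunks and sends them, the other loops over incoming transcription and translation messages. The sketch below uses asyncio with an in-memory fake server in place of a WebSocket, so it runs without network access. The callable-based transport, the None end-of-stream marker, and all function names are assumptions made for illustration; the real JSON message format is not modelled.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Optional

# Transport abstracted as two coroutine functions; with a real WebSocket
# client they would wrap its send/receive calls. None as end-of-stream
# marker is a hypothetical convention of this sketch only.
Send = Callable[[Optional[bytes]], Awaitable[None]]
Receive = Callable[[], Awaitable[Optional[str]]]


async def send_loop(chunks: Iterable[bytes], send: Send, interval_s: float) -> None:
    """loop: send each audio chunk, pausing between chunks to pace the stream."""
    for chunk in chunks:
        await send(chunk)
        await asyncio.sleep(interval_s)
    await send(None)  # hypothetical end-of-stream signal


async def receive_loop(receive: Receive, on_message: Callable[[str], None]) -> None:
    """loop: pass every incoming message to a callback until the stream ends."""
    while True:
        message = await receive()
        if message is None:
            break
        on_message(message)


async def run_session(chunks: Iterable[bytes], send: Send, receive: Receive,
                      on_message: Callable[[str], None],
                      interval_s: float = 0.1) -> None:
    """par: run the send and receive loops concurrently until both finish."""
    await asyncio.gather(send_loop(chunks, send, interval_s),
                         receive_loop(receive, on_message))


async def demo() -> List[str]:
    """Drive the loops against an in-memory fake server instead of a socket."""
    outbox: "asyncio.Queue[Optional[str]]" = asyncio.Queue()

    async def fake_send(chunk: Optional[bytes]) -> None:
        # The fake server answers each chunk with a placeholder "transcript".
        await outbox.put(None if chunk is None else f"heard {len(chunk)} bytes")

    received: List[str] = []
    await run_session([bytes(3200)] * 3, fake_send, outbox.get,
                      received.append, interval_s=0.0)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

asyncio.gather keeps receiving while audio is still being sent, which is what allows results to arrive during the upload rather than after it.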

Limitations and Constraints

  • Maximum 5 target languages per stream
  • Maximum streaming connection duration: 3 hours
  • Audio chunk size: should not exceed 100 kilobytes or 1 second of audio
  • Recommended chunk duration: 50-250 milliseconds for low latency
  • Audio streaming speed: at most 2x real time
  • Timeout: if no data is received for 30 seconds, the session is terminated
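These limits can be enforced on the client before and during streaming. A minimal sketch, assuming constant-bitrate audio such as 16 kHz 16-bit mono PCM (32,000 bytes per second) and reading "100 kilobytes" conservatively as 100,000 bytes; check_target_languages and paced_chunks are hypothetical helpers, not part of the API.

```python
from typing import List, Tuple

# Limits from this page. Reading "100 kilobytes" as 100,000 bytes is an
# assumption (the conservative interpretation).
MAX_TARGET_LANGUAGES = 5
MAX_CHUNK_BYTES = 100_000
MAX_CHUNK_SECONDS = 1.0
MAX_SPEED_FACTOR = 2.0


def check_target_languages(targets: List[str]) -> None:
    """Reject requests for more target languages than one stream allows."""
    if len(targets) > MAX_TARGET_LANGUAGES:
        raise ValueError(f"{len(targets)} target languages requested; "
                         f"at most {MAX_TARGET_LANGUAGES} per stream")


def paced_chunks(audio: bytes, bytes_per_second: int, chunk_ms: int = 100,
                 speed: float = 1.0, align: int = 1) -> List[Tuple[float, bytes]]:
    """Split constant-bitrate audio into (send_at_seconds, chunk) pairs.

    send_at_seconds is the earliest offset from stream start at which a
    chunk can be sent without streaming faster than `speed` x real time.
    `align` rounds the chunk size down to whole frames (2 for 16-bit mono PCM).
    """
    if not 0 < speed <= MAX_SPEED_FACTOR:
        raise ValueError(f"speed must be in (0, {MAX_SPEED_FACTOR}]")
    chunk_bytes = bytes_per_second * chunk_ms // 1000
    chunk_bytes -= chunk_bytes % align
    if chunk_bytes <= 0:
        raise ValueError("chunk_ms is too small for this bitrate")
    if chunk_ms > MAX_CHUNK_SECONDS * 1000 or chunk_bytes > MAX_CHUNK_BYTES:
        raise ValueError("chunk exceeds the 1 second / 100 KB limit")
    # Offsets are derived from the byte position, so aligned chunks stay exact.
    return [(start / bytes_per_second / speed, audio[start:start + chunk_bytes])
            for start in range(0, len(audio), chunk_bytes)]


if __name__ == "__main__":
    check_target_languages(["German", "French", "Japanese"])
    one_second = bytes(32_000)  # 1 s of silent 16 kHz, 16-bit mono PCM
    plan = paced_chunks(one_second, 32_000, chunk_ms=100, speed=2.0, align=2)
    for send_at, chunk in plan[:3]:
        print(f"{send_at:.2f}s: {len(chunk)} bytes")
```

A sender would wait until each send_at offset (for example with asyncio.sleep) before transmitting the chunk. Live microphone input naturally streams at 1x real time, so the speed factor mainly matters when replaying recorded files. This sketch only models constant-bitrate PCM; framing for compressed codecs is not covered here.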