Live Captions

Use the Live Captions API to transcribe audio streams and generate real-time captions for your application.

The Vonage Video Live Captions API lets you show live captions to end users in a Vonage Video session, using a transcription service; AWS Transcribe is the transcription provider. Because Live Captions captures audio from the Media Router, it can also provide captions for the audio of SIP dial-in participants.

Live Captions is enabled by default for all projects, and it is a usage-based product. Live Captions usage is charged based on the number of audio streams of participants (or stream IDs) that are sent to the transcription service. For more information, see Live Captions API pricing rules.

The Live Captions feature is only supported in routed sessions (sessions that use the Media Router). You can send up to 50 audio streams from a single Vonage session at a time to the transcription service for captions.

Live caption illustration

Steps to enable Live Captions

Use the REST API to enable captioning for a session.
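For example, a sketch of the Start Captions request (the endpoint shape, header, and body fields follow the Vonage Video REST API; all values below are placeholders):

```javascript
// Sketch of a Start Captions REST call. The apiKey, sessionId, and token
// values are placeholders -- substitute your own project credentials.
const apiKey = "12345678"; // hypothetical project API key
const url = `https://api.opentok.com/v2/project/${apiKey}/captions`;

const startCaptionsBody = {
  sessionId: "<sessionId>",   // a routed session's ID
  token: "<token>",           // a valid token for the session
  languageCode: "en-US",      // see Supported languages
  maxDuration: 14400,         // seconds; the default maximum is 4 hours
  partialCaptions: true,      // deliver partial (in-progress) captions
};

// The request would be sent with a project JWT, e.g.:
// fetch(url, {
//   method: "POST",
//   headers: { "X-OPENTOK-AUTH": jwt, "Content-Type": "application/json" },
//   body: JSON.stringify(startCaptionsBody),
// });
```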

In publishing clients, use the client SDK method to publish audio to the captions service. See Implementing live captions.

In subscribing clients, call the respective client SDK method for a subscriber to subscribe to captions for a stream.

Upon starting live captioning, the platform securely streams audio to a third-party audio transcription service (Amazon Transcribe).

Use the captioning API in the client SDKs to enable or disable receiving live captions in your application:

Starting or stopping receiving live captions in one client does not affect the captions received by other clients connected to the session.

Supported languages

Live Captions supports 11 languages, including three dialects of English. Pass the desired language as the languageCode option when enabling live captions with the REST API:

  • "en-US" — English, US
  • "en-AU" — English, Australia
  • "en-GB" — English, UK
  • "es-US" — Spanish, US
  • "zh-CN" — Chinese, Simplified
  • "fr-FR" — French
  • "fr-CA" — French, Canadian
  • "de-DE" — German
  • "hi-IN" — Hindi, Indian
  • "it-IT" — Italian
  • "ja-JP" — Japanese
  • "ko-KR" — Korean
  • "pt-BR" — Portuguese, Brazilian
  • "th-TH" — Thai
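If you accept a language choice from your users, you can validate it against the supported codes before calling the REST API; a minimal sketch (the helper names are illustrative, not part of the SDK):

```javascript
// The supported languageCode values listed above, as a checked lookup.
const SUPPORTED_CAPTION_LANGUAGES = new Set([
  "en-US", "en-AU", "en-GB", "es-US", "zh-CN", "fr-FR", "fr-CA",
  "de-DE", "hi-IN", "it-IT", "ja-JP", "ko-KR", "pt-BR", "th-TH",
]);

// Returns true when the code can be passed as the languageCode option.
function isSupportedCaptionLanguage(code) {
  return SUPPORTED_CAPTION_LANGUAGES.has(code);
}
```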

Use cases

Live captions can improve an application's user experience and user engagement. Captioning improves the accessibility of your application, enabling participation by individuals with hearing disabilities. Some laws worldwide require applications to provide captioning.

Captioning can also improve speaker comprehension in noisy or uncontrolled surroundings, thereby improving user engagement.

Live captions are only available for routed sessions (sessions that use the Media Router).

Upon enabling the Live Captions feature:

  • Use the client SDK audio captioning API to start audio captioning for each published stream.
  • The audio stream is sent to a third-party audio transcription service (AWS Transcribe).
  • Use the client SDK audio captioning API to subscribe to the live captions for each published stream.
  • An individual subscriber choosing not to receive captions does not affect the captions received by other subscribers in other clients connected to the session.
  • When the session is over (when all clients have stopped publishing streams to the session), you can explicitly stop captioning using the Stop Captions API. Otherwise, audio captioning automatically stops after the maximum duration (specified when calling the Start Captions API) has expired.

Notes


The default maximum allowed captioning duration for each session is 4 hours. You can set this to another maximum duration when you call the Start Captions API. Upon expiration, the audio captioning will stop without any effect on the ongoing session.

Note that in the current phase, this feature is available only through the REST API and the supported client SDKs.

Live caption status updates

You can set up a webhook to receive events when live captions start, stop, and fail for a session.

  1. Go to your Video API account and select the project from the list of projects in the left-hand menu.
  2. Under Project settings, find Live Captions Monitoring and click Configure.
  3. Submit the URL for callbacks to be sent to.

Secure callbacks: set a signature secret to have webhook callback requests signed with it. See Secure callbacks.

When the status of live captions changes, an HTTP POST request is delivered to the configured callback URL. If no callback URL is configured, no status update is delivered. The raw data of the HTTP request is a JSON-encoded message of the following form:

{
  "captionsId": "<captionsId>",
  "projectId": "<apiKey>",
  "sessionId": "<sessionId>",
  "status": "stopped",
  "createdAt": 1651253477,
  "updatedAt": 1651253837,
  "duration": 360,
  "languageCode": "en-US",
  "reason": "Maximum duration exceeds.",
  "provider": "aws-transcribe",
  "group": "captions"
}

The JSON object includes the following properties:

  • captionsId The unique ID for the audio captioning session.
  • projectId The API key of the project.
  • sessionId The ID of the session for which audio captioning started.
  • status Current status of the live captions:
    • "started" The Vonage Video API platform has successfully allocated the necessary resources to send audio streams for captioning.
    • "transcribing" The transcription service has started (and captioning is in progress).
    • "stopped" Captioning has stopped and all the resources have been deleted.
    • "failed" Captioning failed to allocate the necessary resources or failed to send streams for captioning.
  • createdAt The Unix timestamp (Epoch) at which audio captioning started.
  • updatedAt The Unix timestamp (Epoch) at which audio captioning was last updated. If the status is "stopped", updatedAt indicates the time at which captioning stopped.
  • duration The total duration of the audio captioning, in seconds.
  • languageCode The BCP-47 language code used.
  • reason Additional information about the status change, such as an error message.
  • provider The third-party service provider used for the audio captioning:
    • "aws-transcribe" Amazon Transcribe
  • group The type of the event, which is always set to "captions" for audio caption API events.
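A callback handler can dispatch on these fields; a framework-agnostic sketch (the function name and return strings are illustrative):

```javascript
// Sketch of handling a Live Captions status callback payload.
// The payload fields are those documented above.
function handleCaptionsStatus(payload) {
  if (payload.group !== "captions") {
    return "ignored"; // not a Live Captions event
  }
  switch (payload.status) {
    case "started":
      return `captioning resources allocated for session ${payload.sessionId}`;
    case "transcribing":
      return "captioning in progress";
    case "stopped":
      return `captioning stopped after ${payload.duration}s (${payload.reason})`;
    case "failed":
      return `captioning failed: ${payload.reason}`;
    default:
      return `unknown status: ${payload.status}`;
  }
}
```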

Implementing live captions

Use the live captions API to enable real-time audio captioning of the publishers and subscribers connected to a session.

Live captioning must be enabled at the session level via the REST API.

Live captioning is only supported in routed sessions.

Publishing live captions

To enable live captions, initialize the `OTPublisher` component with the optional boolean `publishCaptions` property of the `properties` prop set to true:
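For example, a sketch assuming the opentok-react-native `OTPublisher` component (the helper function is illustrative):

```javascript
// Sketch of the properties passed to OTPublisher; publishCaptions is the
// only captions-specific field (it defaults to false).
// JSX usage (illustrative):
//   <OTPublisher properties={makePublisherProperties(true)} />
function makePublisherProperties(captionsEnabled) {
  return {
    publishAudio: true,
    publishVideo: true,
    publishCaptions: captionsEnabled,
  };
}
```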

This setting is false by default.

You can dynamically change this property (based on a React state change) to toggle captions on and off for the published stream.

Subscribing to live captions

To start receiving captions, set the `subscribeToCaptions` property of the `properties` prop of the `OTSubscriber` component:
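A sketch assuming the opentok-react-native `OTSubscriber` component, with a `captionReceived` event handler that stores the latest caption (the variable names are illustrative):

```javascript
// Sketch of OTSubscriber props: subscribeToCaptions opts in to captions,
// and the captionReceived handler delivers each caption event.
// JSX usage (illustrative):
//   <OTSubscriber properties={subscriberProperties}
//                 eventHandlers={subscriberEventHandlers} />
const subscriberProperties = {
  subscribeToCaptions: true,
};

let latestCaption = null;
const subscriberEventHandlers = {
  captionReceived: (event) => {
    latestCaption = event; // { text, isFinal }
  },
};
```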

You can set the subscribeToCaptions property to true regardless of whether the client publishing the stream is currently publishing live captions. The subscriber will start receiving captions data once the publisher begins publishing captions.

Subscribers receive captions via the captionReceived event handler.

The captionReceived event object has two properties:

  • text — The text of the caption (a string)
  • isFinal — Whether the caption text is final for a phrase (true) or partial (false).

The React Native SDK does not render the caption text itself. You can create your own UI element to render the caption text based on caption events.
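A minimal sketch of accumulating captions for display based on isFinal (the state shape and helper names are illustrative): partial captions overwrite the current line, while final captions are committed to a transcript.

```javascript
// Sketch of caption accumulation for a custom captions UI.
function createCaptionState() {
  return { current: "", transcript: [] };
}

// Apply one captionReceived event ({ text, isFinal }) to the state.
function applyCaption(state, event) {
  if (event.isFinal) {
    state.transcript.push(event.text); // commit the finished phrase
    state.current = "";                // clear the in-progress line
  } else {
    state.current = event.text;        // partial caption replaces the line
  }
  return state;
}
```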

Receiving your own live captions

The Vonage client SDK does not support a publisher receiving events for its own captions. To render captions for a stream published by the local client, create a hidden subscriber to the local publisher's stream and listen for its caption events: set the subscribeToSelf property of the OTSubscriber to true, do not render the subscriber's video (set its width and height to 0), and do not subscribe to its audio (set subscribeToAudio to false, to avoid echo).

You can add the captions to the UI as you would for other streams' captions. See Custom rendering of subscribers.
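A sketch of the hidden self-subscriber's properties (the field names follow the description above; treat exact prop support as SDK-version dependent):

```javascript
// Hidden subscriber used only to receive the local publisher's captions:
// no audio (avoids echo), zero-size video (nothing rendered).
const selfCaptionSubscriberProperties = {
  subscribeToSelf: true,     // subscribe to the local publisher's stream
  subscribeToAudio: false,   // avoid echo
  subscribeToCaptions: true, // the only thing we want from this subscriber
  width: 0,
  height: 0,                 // do not render the video
};
```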

Known issues

  • When a participant is muted for more than 15 seconds, the connection to the third-party transcription provider is closed to reduce costs. After the participant unmutes, it might take 2-5 seconds to reconnect and for captions to resume.

More information

See this Vonage API Support article for more technical specifications and FAQs.