A Smarter, Safer Video API
Real-time video can be challenging. Participants join from different devices, on different networks, in different parts of the world. Conditions shift mid-call: a mobile device may switch from Wi-Fi to cellular data (e.g., 5G to 4G), a corporate firewall may block certain UDP paths, and a low-end laptop may struggle under CPU load.
The Vonage Video API is built around a set of smart, complementary features to continuously adapt the session to changing conditions, and those features are organized into three layers, each addressing a different level of the stack:
- Topology-level: Session infrastructure, routing optimizations, and server lifecycle. These affect all participants in a session and require no per-stream configuration.
- Peer-connection-level: How individual clients connect and negotiate with the Media Router. These settings apply once per client, but there may be multiple clients within the same end-user device.
- Stream-level: Per-stream quality knobs: codec, resolution, bitrate, and more. These can be adjusted independently for each publisher and subscriber.
The Building Blocks
The features are organized into three concentric layers. Outer layers affect all participants; inner layers can be tuned per-stream.
╔══════════════════════════════════════════════════════════════════════╗
║  TOPOLOGY LAYER                                                      ║
║  Routed vs. Relayed · Adaptive media routing · Media Mesh            ║
║  Session migration                                                   ║
║  ┌──────────────────────────────────────────────────────────────┐    ║
║  │  PEER-CONNECTION LAYER                                       │    ║
║  │  Single peer connection (SPC) · Codec negotiation            │    ║
║  │  ┌────────────────────────────────────────────────────┐      │    ║
║  │  │  STREAM LAYER                                      │      │    ║
║  │  │  Scalable video · Bitrate presets                  │      │    ║
║  │  │  Publisher resolution/frame rate                   │      │    ║
║  │  │  Subscriber preferred resolution/frame rate        │      │    ║
║  │  │  Audio fallback · FEC / NACK / DTX                 │      │    ║
║  │  └────────────────────────────────────────────────────┘      │    ║
║  └──────────────────────────────────────────────────────────────┘    ║
║  Monitoring: client observability · sender-side stats · MOS          ║
╚══════════════════════════════════════════════════════════════════════╝
The monitoring layer spans all three, providing visibility into quality at every level.
Under the Hood: The WebRTC Stack
Many of the reliability guarantees come from features baked into the WebRTC stack that require no explicit API call. Understanding them helps you reason about quality behaviour and use the higher-level APIs more effectively.
Congestion Control and Rate Adaptation
WebRTC implementations use Google Congestion Control (GCC) by default. GCC continuously probes available bandwidth, estimates the current bottleneck, and signals the encoder to raise or lower its target bitrate. This is the primary mechanism that keeps a session alive through transient congestion. Any Vonage Video API setting that affects bitrate acts as a ceiling on top of GCC, and GCC continues to dynamically adapt below that ceiling.
Forward Error Correction (FEC)
Audio streams try to negotiate Opus FEC by default. The Opus codec embeds a lower-quality copy of each audio frame inside the next frame. If the first frame is lost in transit, the receiver reconstructs it from the copy in the following packet, at the cost of a small bandwidth overhead. This is particularly effective for random packet loss, the kind you see on congested Wi-Fi or cellular networks.
For video, all supported codecs can use RTP RED FEC when it is negotiated. FEC is transparent to your application: it adds redundancy to the media stream so the receiver can recover lost or corrupted packets without waiting for a retransmission. For more information, see RFC 8854.
NACK and RTX (Retransmission)
When a video packet is lost, the receiver sends a Negative ACK (NACK) to request retransmission. NACK/RTX adds latency proportional to the round-trip time (typically between 50 and 200 ms).
Discontinuous Transmission (DTX)
Opus DTX detects silence on the microphone and stops sending audio packets during quiet periods, reducing audio bandwidth to near zero. You can activate DTX through the enableDtx setting of your SDK of choice when initializing the publisher.
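As a sketch (Web SDK), the option named above can be passed when initialising the publisher. The target element ID is a placeholder; check your SDK version's reference for the exact option name:

```javascript
// Publisher options enabling Opus DTX: stop sending audio packets while
// the microphone is silent, saving bandwidth during quiet periods.
const publisherOptions = {
  enableDtx: true,
};

// Guarded so this sketch is inert outside a browser with the SDK loaded.
if (typeof OT !== 'undefined') {
  const publisher = OT.initPublisher('publisher-element', publisherOptions);
}
```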
DTLS-SRTP and Secure Real-time Transport Protocol
All media is encrypted by default using DTLS-SRTP. The negotiation of encryption keys happens as part of the WebRTC handshake, before any media flows. For enhanced security, the AES-256 add-on feature provides 256-bit encryption. These transport-layer encryption methods (DTLS-SRTP and AES-256) work alongside the optional End-to-End Encryption (E2EE) feature, which adds an additional layer of application-level encryption. See the End-to-End Encryption guide for more details.
ICE and TURN
Interactive Connectivity Establishment (ICE) tries multiple paths between endpoints (direct UDP, STUN-reflexive, TURN relay) and picks the best one. If the best path changes mid-call (common when a mobile device switches networks), ICE can restart and find a new path without dropping the call. For sessions traversing restrictive firewalls, see the Configurable TURN Servers guide and the Restricted Networks guide.
Topology-level Optimizations
Topology-level features determine the infrastructure through which media flows. They are set when creating a session or connecting to it, and they affect all participants equally. Getting the topology right is the foundation everything else builds on.
Media Mode: Routed vs. Relayed
The most fundamental topology decision is media mode.
- Relayed sessions: Clients attempt to send audio-video streams directly to each other (peer-to-peer), falling back to TURN relay if a firewall blocks the direct path. Relayed sessions have lower latency for small groups but cannot use scalable video, subscriber audio fallback, single peer connection, or sender-side statistics.
- Routed sessions: Media flows through the Vonage Video Media Router. This unlocks the full smart-feature stack: scalable video, subscriber audio fallback, Single Peer Connection (SPC), sender-side statistics, live-streaming broadcasts, archiving, SIP interconnect, and more.
For sessions with three or more participants, always use a routed session. See the Creating a session guide.
Adaptive Media Routing
In routed sessions (from SDK v2.24.7+ for Web and v2.27.0+ for native), the platform automatically uses adaptive media routing to optimize traffic between participants. When publishers have only one subscriber and no routed-only features (archiving, live streaming, SIP, etc.) are active, the media is routed directly between publishers and subscribers, without declaring a relayed session upfront. As soon as a publisher gets more than one subscriber, or a routed-only feature is activated, the Media Router seamlessly takes over routing. For example, in conversational use cases (all participants publish and subscribe at the same time) without routed-only features, as soon as the third participant joins, the session will be seamlessly moved to the Media Router.
In practice, this means you can declare a routed session in all cases and still get near-relayed latency for 1:1 calls. Adaptive media routing is enabled by default and requires no configuration. See the Creating a session guide.
Media Mesh
The platform uses Media Mesh to automatically connect each participant to the Media Router datacenter nearest to them (see the list of available datacenters). For a geographically distributed session (for example, participants in New York, Frankfurt, and Sydney), Media Mesh ensures each client routes media through its nearest regional server rather than traversing a single, potentially distant, datacenter. The result is lower latency and better quality for all participants, especially in large international sessions.
Media Mesh is enabled by default and requires no configuration. If you need more control over where media is routed, you have two personalization options:
Location hint (best-effort): you can provide a preferred signaling datacenter as a hint. The platform will attempt to use the requested location when possible, but it is not guaranteed and may fall back to another datacenter based on availability, capacity, or other operational considerations. See the Creating a session guide for more information.
Regional Media Zone (enforced): if you must keep media routing within a specific region (for data residency or compliance requirements), configure a Regional Media Zone. This setting is enforced: signaling and media will be routed through the selected zone rather than automatically choosing the nearest datacenter. See the Regional Media Zones guide.
Session Migration
What it does: Transparently transfers all participants in a session to a new Media Router server when a planned server rotation occurs, without disconnecting anyone. No application-side handling is required (i.e., apps don't need to implement their own transfer/reconnect logic; the platform performs it for them).
Why it matters: Cloud infrastructure is never static. Servers are patched, scaled in and out, and replaced. Without session migration, a server rotation forces every participant to disconnect and reconnect, which is a disruptive experience, particularly for sessions that exceed 8 hours. With session migration enabled (SDK v2.30.0+), the transition is invisible: streams continue, no reconnection events fire, and existing audio/video is not interrupted.
What still requires a restart: Recordings (archives), live streaming broadcasts, and Experience Composer instances end when a session migrates and must be restarted via the session notification callback. Automatic archives restart automatically.
How to enable it: Pass sessionMigration: true in the session initialisation options for each client. By default, this is false. See the Server rotation and session migration guide for full SDK-specific syntax, triggering it manually, and for details on handling the session notification callback.
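As a sketch (Web SDK), using the option named above with placeholder credentials; see the Server rotation and session migration guide for the exact per-SDK syntax:

```javascript
// Opt in to session migration when initialising the session.
// Off by default; each client in the session should set it.
const sessionOptions = {
  sessionMigration: true,
};

// Guarded so this sketch is inert outside a browser with the SDK loaded.
if (typeof OT !== 'undefined') {
  const session = OT.initSession('APP_ID', 'SESSION_ID', sessionOptions);
}
```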
Automatic Reconnection
What it does: When a client unexpectedly loses its connection to a session, the SDK automatically attempts to reconnect without requiring any application code.
Why it matters: Without automatic reconnection, a brief network hiccup would force the user to manually rejoin the session. With automatic reconnection, the SDK handles the recovery transparently. Streams that were being subscribed to will pause and resume automatically when the connection is restored.
How it works: No configuration is required. When the connection is dropped and the client tries to reconnect:
- The Session object dispatches a sessionReconnecting event.
- If the connection is restored, the Session object dispatches a sessionReconnected event.
- If the connection cannot be restored, the client disconnects from the session and the sessionDisconnected event is fired.
Your application can optionally listen for these events to display status indicators to the user:
session.on({
sessionReconnecting: function() {
// Display a "reconnecting..." indicator
},
sessionReconnected: function() {
// Hide the indicator
},
sessionDisconnected: function() {
// Handle permanent disconnection
}
});
Signals sent while temporarily disconnected are queued and delivered once the connection is restored. See the Joining a session guide for full examples and SDK-specific event names.
Peer-Connection-Level Strategies
Peer-connection-level features govern how a client's WebRTC connections to the Media Router are structured and what codecs they negotiate. These settings apply once per client (not per stream) and have a broad effect on resource consumption and compatibility.
Single Peer Connection (SPC)
What it does: Bundles all subscriber streams for a client into a single WebRTC peer connection to the Media Router, instead of one connection per stream.
Why it matters: Each additional peer connection adds overhead: separate ICE candidates, DTLS handshakes, and OS-level socket state. On a mobile device subscribing to ten streams, ten peer connections can stress the device and consume significant power. SPC reduces this to one connection, cutting resource consumption and enabling larger sessions on native mobile clients.
Additional benefit — rate control: With a single connection, the WebRTC congestion controller sees all incoming streams as a single bundle and can make better-informed rate adaptation decisions. In the multi-connection model, each connection independently probes and adapts, which can cause oscillation and sub-optimal bandwidth sharing.
How to enable it: SPC is off by default. Set singlePeerConnection: true when initialising the session. For the web SDK, pass it as a property in the OT.initSession() options object. For other SDKs:
- Android: Session.Builder.setSinglePeerConnection(true)
- iOS: OTSessionSettings.singlePeerConnection = YES
- Windows: the SinglePeerConnection property on Session.Builder
- Linux/macOS: otc_session_settings_set_single_peer_connection()
- React Native: enableSinglePeerConnection in the OTSession options prop
See the Creating a session guide for full examples.
When to use it: Enable SPC whenever you have more than a handful of subscribers in a routed session — especially on mobile platforms. The benefits increase with the number of streams.
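For the Web SDK, the property mentioned above is passed in the OT.initSession() options object; the credentials below are placeholders:

```javascript
// Enable Single Peer Connection for this client: all subscribed streams
// arrive over one bundled WebRTC connection to the Media Router.
const sessionOptions = {
  singlePeerConnection: true,
};

// Guarded so this sketch is inert outside a browser with the SDK loaded.
if (typeof OT !== 'undefined') {
  const session = OT.initSession('APP_ID', 'SESSION_ID', sessionOptions);
}
```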
Codec Selection
What it does: Selects the video codec used between each publisher–subscriber pair.
Why it matters: Codec choice affects quality at a given bitrate, CPU usage, hardware acceleration availability, and compatibility with scalable video. VP8 is the safest default: universally supported, hardware-accelerated on most platforms, and compatible with scalable video. VP9 delivers better quality at the same bitrate but at higher CPU cost. H.264 benefits from hardware encoders on iOS and some Android devices, reducing battery drain, but does not support scalable video.
How it works automatically: The Vonage platform negotiates the codec for each publisher–subscriber pair, honoring the project-level preference while falling back to VP8 if the preferred codec is not supported by both endpoints.
When to intervene: See the Video codecs guide for full decision criteria, the VP9 Scalable Video guide for VP9-specific behaviour, and the SDK codec preference API to override per-publisher.
End-to-End Encryption (E2EE)
What it does: Encrypts media payloads at the client so they remain encrypted throughout the Media Router and can only be decrypted by other participants in the same session. This provides an additional encryption layer on top of the standard DTLS-SRTP transport encryption.
Why it matters: In standard routed sessions, the Media Router can access unencrypted media (which is required for features such as archiving, transcoding, and scalable video layer selection). E2EE prevents the Media Router from accessing the media content, which is required for use cases where strict data privacy must be maintained end-to-end.
Important constraints: When E2EE is enabled, features that require media decoding at the Media Router are unavailable: archiving, live streaming broadcasts, Experience Composer, Audio Connector, and SIP interconnect. Ensure you account for these limitations before enabling E2EE.
How to enable it: E2EE is an add-on feature that must first be enabled for your Vonage Video account. Once enabled, create the session with e2ee: true using the server-side Vonage Video API, and set the encryption secret when initialising the session in the client:
vonage.video.createSession({ mediaMode: "routed", e2ee: true });
All participants in the session must use the same encryption secret to receive intelligible media. The secret can be rotated on the fly once the session is connected. See the End-to-End Encryption guide for full setup instructions and SDK-specific examples.
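On the client side, a sketch of supplying and rotating the secret with the Web SDK. The option name `encryptionSecret` and the `setEncryptionSecret()` method are assumptions here; verify both against the End-to-End Encryption guide:

```javascript
// Client side: supply the shared secret when initialising the session.
// Every participant must use the same secret to decrypt media.
const sessionOptions = {
  encryptionSecret: 'a-strong-shared-secret',
};

// Guarded so this sketch is inert outside a browser with the SDK loaded.
if (typeof OT !== 'undefined') {
  const session = OT.initSession('APP_ID', 'SESSION_ID', sessionOptions);
  // Rotate the secret mid-session once connected (all clients must follow):
  // session.setEncryptionSecret('a-new-shared-secret');
}
```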
Stream-level Quality Controls
Stream-level features are the most granular knobs available. They can be set independently for each publisher and subscriber, and many can be changed dynamically mid-call. This is the layer where your application actively participates in quality management.
Scalable Video
What it does: The publisher encodes multiple spatial and temporal layers of the same stream. Each subscriber receives the layer that matches its available bandwidth and display requirements, without the publisher sending duplicate full-resolution streams.
Why it matters: In a session with ten subscribers, sending ten independent full-HD streams from the publisher is wasteful, while adapting a single stream to the most constrained subscriber is far from optimal. Scalable video sends one stream with multiple quality layers; the Media Router selects and forwards the appropriate layer to each subscriber. As a subscriber's network degrades, the Media Router downgrades its layer in real time, while the publisher remains unaffected and continues to stream all its layers for the Media Router to choose from.
How it works automatically: In routed sessions with more than two participants, the Media Router enables scalable video automatically (the Auto project setting). Each publisher's SDK negotiates the scalability structure (VP8 simulcast uses L1T1/L2T1/L3T1 layers; VP9 SVC can use spatial layers too).
When to intervene: For screen-sharing streams, enable scalable video explicitly when you want the Media Router to downscale them for lower-bandwidth subscribers. See the Scalable video guide.
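A Web SDK sketch of the screen-sharing case above. The option name `scalableScreenshare` is an assumption; confirm it in the Scalable video guide:

```javascript
// Screen-share publisher with scalable video explicitly enabled, so the
// Media Router can downscale the share for lower-bandwidth subscribers.
const screenShareOptions = {
  videoSource: 'screen',
  scalableScreenshare: true, // assumed option name; verify in the guide
};

// Guarded so this sketch is inert outside a browser with the SDK loaded.
if (typeof OT !== 'undefined') {
  const screenPublisher = OT.initPublisher('screen-element', screenShareOptions);
}
```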
Bitrate Presets and Publisher Max Bitrate
What it does: Allows you to cap the maximum bitrate a publisher uses for camera video, using named presets (DEFAULT, BW_SAVER, EXTRA_BW_SAVER) or raw values.
Why it matters: On a metered or congested connection, an unconstrained publisher will compete for bandwidth with every other application on the device. Setting a lower preset reduces your video footprint, leaving more headroom for other applications on the same device. Crucially, when you use VP8 with scalable video enabled, the preset also controls which encoding layers are active, so BW_SAVER effectively limits the stream to two spatial layers (low and medium), and EXTRA_BW_SAVER limits it to one (low).
Interaction with rate control: Google Congestion Control (GCC) still adapts dynamically below the preset ceiling. Setting BW_SAVER is a ceiling, not a floor.
Do not use bitrate presets for screen sharing. Screen-sharing encoders operate differently; applying a bitrate cap to a screen share can produce blurry, low frame-rate output without the expected bandwidth saving. See the Publisher Max Bitrate guide.
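To make the ceiling-not-floor behaviour concrete, here is an illustrative sketch. The preset names come from the text above, but the numeric caps are made-up placeholders and `maxVideoBitrate` is a hypothetical option name; consult the Publisher Max Bitrate guide for the real per-SDK API and values:

```javascript
// Hypothetical mapping from preset name to a bitrate ceiling (bits/second).
// The numbers below are placeholders, not the platform's actual values.
const BITRATE_PRESETS = {
  DEFAULT: null,           // no cap beyond the platform default
  BW_SAVER: 750_000,       // placeholder cap
  EXTRA_BW_SAVER: 300_000, // placeholder cap
};

function publisherOptionsFor(preset) {
  const cap = BITRATE_PRESETS[preset];
  // GCC still adapts dynamically below this value: a ceiling, not a floor.
  return cap === null ? {} : { maxVideoBitrate: cap };
}
```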
Publisher Resolution and Frame Rate Controls
Beyond bitrate presets, you can directly control the resolution and frame rate at which a publisher encodes video. There are two approaches: setting them at publish time, or adjusting them dynamically after publishing has started.
Setting resolution and frame rate at publish time (Web SDK):
const publisher = OT.initPublisher(targetElement, {
resolution: '1280x720', // '1920x1080', '1280x720', '640x480', '320x240'
frameRate: 15, // 30, 15, 7, or 1
});
Always initialise the publisher at the maximum resolution and frame rate you might ever need (the SDK can only reduce from the initial value, not increase beyond it).
Dynamic resolution and frame rate adjustment (Web SDK):
After publishing has started, you can change the publisher's preferred resolution and frame rate without restarting the stream:
// Reduce resolution
await publisher.setPreferredResolution({ width: 320, height: 180 });
// Reduce frame rate
await publisher.setPreferredFrameRate(7);
Common scenarios where dynamic adjustment helps:
- Responding to qualityLimitationReason: "cpu" from publisher stats: reduce frame rate to relieve CPU pressure.
- Responding to a sudden bandwidth drop before audio fallback triggers.
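The first scenario above can be sketched as follows. This assumes your stats accessor yields the underlying WebRTC outbound-rtp entries (where qualityLimitationReason lives); check the observability guide for the exact accessor in your SDK:

```javascript
// Decide from raw stats entries whether the encoder is CPU-limited.
function isCpuLimited(statsEntries) {
  return statsEntries.some(
    (s) => s.type === 'outbound-rtp' && s.qualityLimitationReason === 'cpu'
  );
}

// React by trading smoothness for CPU headroom on the publisher.
async function relieveCpuPressure(publisher, statsEntries) {
  if (isCpuLimited(statsEntries)) {
    await publisher.setPreferredFrameRate(7);
  }
}
```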
Video content hints (Web SDK): For screen-sharing streams, set the content hint to guide the browser's encoding strategy:
publisher.setVideoContentHint("text"); // Prioritises sharp text/static content
publisher.setVideoContentHint("motion"); // Prioritises smooth motion (default camera behaviour)
publisher.setVideoContentHint("detail"); // Prioritises fine detail at lower frame rates
See the Publisher Video Constraints guide and the Publishing a stream — Web guide.
Publisher Video Degradation Preference
What it does: Controls how the video engine prioritises between reducing resolution and reducing frame rate when bandwidth or CPU resources are constrained.
Why it matters: When the network or device cannot sustain the current video quality, the encoder must degrade something. By default, the video engine decides autonomously. The degradation preference lets you express your application's priority: for example, a screen-sharing session benefits from maintaining resolution (sharp text matters more than smooth motion), while a camera feed benefits from maintaining frame rate (smooth motion is more natural than blocky stills).
Relationship to Content Hint: Video content hints and degradation preference serve different but related purposes. A content hint describes the type of content being transmitted (for example, "text" for screen sharing), while the degradation preference explicitly controls the encoding strategy. When you set a content hint, the video engine automatically selects an appropriate degradation preference: "text", for example, selects maintaining resolution. An explicitly set degradation preference overrides the automatic selection from the content hint.
Relationship to scalable video: When scalable video is active, the video engine may decide to remove a temporal or spatial layer rather than degrading the resolution of the remaining stream. The degradation preference still applies as a guide within the active layers.
SDK-specific APIs:
See the platform-specific publish guides for full examples: Android, iOS, iOS (Swift), Linux, Windows.
Subscriber Preferred Resolution and Frame Rate
When subscribing to a scalable video stream, you can hint to the Media Router which quality layer to deliver to each subscriber. This is the primary tool for building adaptive layouts: for example, a large conference grid where thumbnails receive a low-resolution layer and the active speaker receives full resolution.
Setting preferred resolution at subscribe time (Web SDK):
session.subscribe(stream, targetElement, {
preferredResolution: 'auto', // Recommended: SDK picks the layer based on element size
// Or specify explicitly:
// preferredResolution: { width: 320, height: 240 },
// preferredFrameRate: 7,
});
The "auto" setting is recommended for most cases: the Web SDK automatically derives the preferred resolution from the rendered size of the subscriber's video element and requests the most appropriate scalable-video layer. Be aware that "auto" is layout-sensitive. If your application frequently changes the size of subscriber tiles (for example, active-speaker switching, pin/unpin flows, responsive grid relayouts, or animated transitions), the SDK may repeatedly update the preferred resolution. That can trigger more frequent layer switches and bandwidth ramp-up/ramp-down cycles on the Media Router, which may increase network churn and reduce visual stability. For highly dynamic layouts, consider setting an explicit preferredResolution (and updating it only when the UI settles), or rate-limit resolution changes to avoid rapid oscillation.
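One way to rate-limit those updates is a plain debounce around the subscriber call, as a sketch (the 500 ms delay is an arbitrary placeholder; tune it to your UI's settle time):

```javascript
// Generic debounce: only the last call within the delay window fires.
function debounce(fn, delayMs) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Hypothetical wiring: call the returned updater from your layout code on
// every tile resize; only the settled size reaches the Media Router.
function makeResolutionUpdater(subscriber, delayMs = 500) {
  return debounce((width, height) => {
    subscriber.setPreferredResolution({ width, height });
  }, delayMs);
}
```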
Dynamic adjustment after subscribing:
// Downgrade a tile when moving it to a thumbnail position
subscriber.setPreferredResolution({ width: 320, height: 240 });
subscriber.setPreferredFrameRate(7);
// Upgrade when the subscriber becomes the active speaker
subscriber.setPreferredResolution({ width: 1280, height: 720 });
subscriber.setPreferredFrameRate(30);
Frame rate restriction (Web SDK): subscriber.restrictFrameRate(true) caps the subscriber to one frame per second or less. It may be useful for non-active participants in large sessions to save CPU and bandwidth while still showing a "presence" tile. Call restrictFrameRate(false) to restore normal frame rate.
For cross-SDK APIs, see the Scalable video guide and the Subscribe quality guide.
Audio Fallback
What it does: When a publisher's or subscriber's network can no longer sustain video, the SDK automatically disables video for that stream, keeping audio alive, and re-enables it when conditions improve.
Why it matters: A frozen video frame is jarring; a dropped audio stream breaks the conversation entirely. Audio fallback ensures that when bandwidth runs out, participants can still hear each other. Combined with Opus FEC and DTX, audio remains intelligible even at very low bitrates.
Two complementary modes:
- Publisher audio fallback: Triggered by the publishing client's own network assessment. When the publisher's congestion metrics show video is unsustainable, the publisher disables its video track. Available in both relayed and routed sessions.
- Subscriber audio fallback: Triggered by the Media Router on behalf of a specific subscriber. Only the affected subscriber loses video; other subscribers continue to receive it. Available in routed sessions only.
Both features are engaged automatically. For more information, see the Audio fallback guide.
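Although no configuration is needed, you may want to surface fallback transitions in the UI. A Web SDK sketch, assuming the subscriber's videoDisabled/videoEnabled events carry a reason of "quality" for fallback (verify the event and reason names in the Audio fallback guide):

```javascript
// Show or hide an "audio only" badge when a subscriber's video is dropped
// or restored due to network quality.
function wireAudioFallbackIndicator(subscriber, showAudioOnlyBadge) {
  subscriber.on('videoDisabled', (event) => {
    if (event.reason === 'quality') showAudioOnlyBadge(true);
  });
  subscriber.on('videoEnabled', (event) => {
    if (event.reason === 'quality') showAudioOnlyBadge(false);
  });
}
```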
Monitoring and Observability
The monitoring layer spans all three stack layers, providing visibility into quality at every level, from the infrastructure topology down to individual stream metrics.
Quality Monitoring and Client Observability
What it does: The client observability API provides a continuous stream of per-stream statistics - packet loss, bitrate, frame rate, decoded resolution, freeze counts, and more - for both publishers and subscribers. The web SDK additionally provides a video Mean Opinion Score (MOS), a quality score modeled on the familiar audio MOS scale, and CPU performance monitoring.
Why it matters: Without visibility into what is happening at each endpoint, you cannot distinguish "the user's network is bad" from "the user's device is overloaded". The statistics let your application make smart decisions: warn the user before quality degrades further, adjust layouts, or trigger policy-based actions.
Key capabilities:
- High-level statistics API: Aggregated per-publisher or per-subscriber metrics, normalised across peer connection transitions. Use this for production monitoring and adaptive UX.
- Video quality changed events: Fired when quality changes with a reason code (bandwidth, CPU, codec change, resolution change). Use these to drive UI state without polling.
- Mean Opinion Score (MOS): An industry-standard 1–5 quality score (web only). Integrate it with your observability stack to track call quality trends over time.
- CPU performance monitoring: Detects device-side CPU stress before it causes dropped frames (web only). Respond by disabling CPU-intensive features like background blur.
- Pre-call testing: Use the Vonage Video Network Test library to estimate MOS and check audio/video publishability before the user joins a session.
See the Client observability guide for the full statistics reference and SDK-specific code examples.
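As a small sketch of policy-based action on these statistics, the following computes a packet-loss ratio from subscriber stats and warns past a threshold. The field shapes assume the Web SDK's cumulative subscriber.getStats() counters; verify names in the statistics reference:

```javascript
// Ratio of lost video packets to all video packets seen so far.
function packetLossRatio(stats) {
  const lost = stats.video.packetsLost;
  const received = stats.video.packetsReceived;
  const total = lost + received;
  return total === 0 ? 0 : lost / total;
}

// Poll once and warn if loss exceeds the threshold (5% by default).
function warnOnPoorNetwork(subscriber, threshold = 0.05, onWarn = console.warn) {
  subscriber.getStats((error, stats) => {
    if (!error && packetLossRatio(stats) > threshold) {
      onWarn('High packet loss detected for this subscriber');
    }
  });
}
```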
Further Reading
- Video codecs guide
- Scalable video guide
- Audio fallback guide
- Creating a session (including media mode, SPC, adaptive media routing, Media Mesh)
- Publisher Video Constraints guide
- Publishing a stream — settings guide
- Subscribe quality guide
- Client observability guide
- Publisher max bitrate guide
- Server rotation and session migration guide
- Joining a session — automatic reconnection
- Regional Media Zones guide
- Restricted networks guide
- End-to-End Encryption guide
- Publisher video degradation preference
- Insights API guide