As a Solutions Architect I spend much of my time chatting to customers who are currently implementing, or are about to implement, the Vonage Video API. I listen carefully to their use case and requirements and advise them on best practices with our APIs as well as helping them build the best user experience possible. Here I’ve tried to pull together some of the tips and tricks that I’ve gleaned from my 4 years of working with Vonage customers on the video API.
Visit the Video API account portal to set up your new account; it’s free, and your account will automatically be credited with ten US dollars ($10) of testing credit, valid for one year.
Where to get more help?
Detailed developer documentation on the Vonage Video API is publicly available on our Video API Developer Site.
Here you will find answers to most questions you might have, along with sample code and release notes. There is also a great selection of knowledge base articles located here: https://api.support.vonage.com/hc/en-us
Some of this info has been contributed by my peers, because as a team we’ve handled pretty much any use case you can think of!
To help us better assist you, please feel free to provide us your feedback and questions via email at support@api.vonage.com
Get to know your video options
Video API Platform
Vonage Video uses WebRTC for audio-video communications and consists of client libraries for web, iOS, Android, React Native, Windows, macOS and Linux, as well as server SDKs and REST APIs. More information can be found here.
The Video API platform uses tokens for authorisation, meaning that you don’t have to worry about creating users on the platform; if needed, they can be created by the application.
Key Terms:
Session: A session is a logical group of connections and streams. Connections within the same session can exchange messages. Think of a session as the “virtual room” where participants can interact with each other. Sessions should not be reused as it makes troubleshooting difficult and your implementation potentially less secure.
Connection: an endpoint that participates in a session and is capable of sending and receiving messages. A connection has a presence: it is either connected and can receive messages, or it is disconnected.
Stream: a media stream between two connections. This means that actual bytes containing media are being exchanged. Media can consist of audio only, or audio and video. You can also create screenshare and custom streams.
Publisher: a client publishing a media stream.
Subscriber: a client receiving media streams.
Video Express
Vonage, with a decade of video development expertise, has created a simple high-level API called Video Express to accelerate the development and integration of multiparty video into web applications.
Video Express removes the complexity of designing a video application and can be used for 1:1 and multiparty applications on web or mobile.
Out of the box it delivers the following:
Scalable to 25 participants, automatically!
Built-in pre-call preview and settings
Intelligent bandwidth management
Adaptive layouts
Automatic prioritization of screen sharing
Mobile optimized out of the box
Up to 1080p resolution
Background blur
We also have a sample application which adds a number of features, including:
Pre-call room
Participant List
Chat
Find more information on the Vonage Video Express page.
For additional resources, check out these blog posts:
Introducing Vonage Video Express: Accelerating Your Video Collaboration Projects
Create a Multiparty Video App With the New Vonage Video Express
Project and Server Best Practices
Setting up your environment
When designing a video application, consider having two environments: one for testing and one for production. To test simple items, or to reproduce issues, you can also use our playground or the OpenTok command-line tool (CLI).
Create a project key for each environment (lab and production)
OpenTok CLI - https://www.npmjs.com/package/opentok-cli
Playground - https://tokbox.com/developer/tools/playground_doc/
Understanding API (and SDK) versioning
For Enterprise environment customers, it is important to note that newly added API keys will be using the Standard environment by default. If you need to switch an API key’s environment from Standard to Enterprise, you can do so on your Video API Account Portal.
To make sure your application loads the Enterprise JS SDK, and that you receive long-term support on your SDKs, load it from https://enterprise.opentok.com/v2/js/opentok.min.js.
For more info, visit the Enterprise environment guide.
Best Practice for setting up API key/secret, Tokens and Session IDs
Next, you need to set up your account credentials.
API Key and secret
Keep the API key and secret private; do NOT commit them to public repos.
Do NOT store the key or secret in client-side code or compiled mobile apps.
Use HTTPS only to make REST calls.
Session ID
Always generate a new sessionId for every new session created.
Sessions’ quality scores and data are indexed by sessionId. If there are multiple conversations (meetings) per sessionId, it will be difficult to debug using Vonage’s inspector tool, because reused sessionIds tend to report lower aggregate quality scores than the actual call quality experienced by end users.
Tokens
Your server that generates tokens must be placed behind a secured/authenticated endpoint.
Always generate new tokens for each participant.
Do not store or reuse tokens.
By default, tokens expire after 24 hours; expiration is checked at connection time. Adjust the expiration as needed, depending on your use case and application.
Add information to tokens (using the data parameter), such as usernames or other details that identify participants, but NEVER use personal information.
Set roles when applicable, such as moderator, publisher, and subscriber.
For more information about tokens, check out the Tokens Creation Overview page.
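The token guidelines above can be sketched as a small server-side helper that prepares the options for each token request. This is a minimal sketch under stated assumptions, not the SDK's own API: `buildTokenOptions`, `TokenRole` and `TokenOptions` are hypothetical names, and the option shape (`role`, `expireTime` as a Unix timestamp in seconds, `data`) is assumed to match what your server SDK's token-generation call accepts, so check it against the SDK you use. The "@" check is only a crude illustration of refusing personal data such as email addresses.

```typescript
// Roles mentioned in the guidance above.
type TokenRole = "subscriber" | "publisher" | "moderator";

// Hypothetical option shape; verify the field names against your server SDK.
interface TokenOptions {
  role: TokenRole;
  expireTime: number; // Unix timestamp in seconds (assumed unit)
  data: string; // connection data attached to the token
}

// Hypothetical helper: builds fresh options for ONE participant.
// Call it once per participant and never cache the resulting token.
function buildTokenOptions(
  role: TokenRole,
  ttlSeconds: number,
  participantRef: string, // an opaque internal reference, not a name or email
  nowMs: number = Date.now()
): TokenOptions {
  if (ttlSeconds <= 0) {
    throw new Error("ttlSeconds must be positive");
  }
  // Crude guard: reject values that look like an email address.
  if (participantRef.indexOf("@") !== -1) {
    throw new Error("Do not put personal information in token data");
  }
  return {
    role: role,
    expireTime: Math.floor(nowMs / 1000) + ttlSeconds,
    data: "ref=" + participantRef,
  };
}
```

A short-lived publisher token for a one-hour meeting would then use something like `buildTokenOptions("publisher", 3600, "user-8f3a")`, with the result passed to your SDK's token-generation call.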
Understanding Media Router and Media Modes
When you create a session, you specify how clients in the session will send audio-video streams, known as the media mode. There are two options:
Relayed Mode - this media mode does not use Vonage media servers, but instead attempts to create a direct media connection between the participants. Consider relayed mode when:
Platform features such as archiving (recording), SIP integration, live streaming and Experience Composer are not needed
The use case is limited to one-to-one or three-party sessions
Direct media between participants is preferred
End-to-end media encryption is required
Note that media quality is not managed in relayed mode, because media is exchanged directly between clients. Therefore, setting the subscriber’s frame rate and/or resolution will not work. For more information about scalable video, check out the Scalable Video page.
Routed Mode - this media mode uses Vonage media servers. Consider routed mode when the session:
Has three or more participants
May need to be archived
Needs media quality control (audio fallback and video recovery)
May need SIP Interconnect
May need interactive or live-streaming broadcast
Note that end-to-end encryption in routed mode requires an add-on subscription and is not supported on all SDKs
For more information about the media modes, check out the Session Creation Overview.
For more information about end-to-end encryption in Routed Mode, check out the End-to-End Encryption page.
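The relayed-versus-routed checklist above can be expressed as a small decision function. This is only a sketch of the guidance in this section: `SessionNeeds` and `chooseMediaMode` are hypothetical names, and because the guidance overlaps at exactly three participants, the sketch treats three-party sessions as still eligible for relayed mode, which is one possible reading.

```typescript
type MediaMode = "relayed" | "routed";

// Hypothetical description of what a session is expected to need.
interface SessionNeeds {
  maxParticipants: number;
  archiving: boolean;
  sipInterconnect: boolean;
  broadcast: boolean; // interactive or live-streaming broadcast
  experienceComposer: boolean;
  qualityControl: boolean; // audio fallback, video recovery, subscriber frame rate/resolution
}

function chooseMediaMode(needs: SessionNeeds): MediaMode {
  // Any of these features relies on the media servers, so relayed mode is ruled out.
  const needsMediaServer =
    needs.archiving ||
    needs.sipInterconnect ||
    needs.broadcast ||
    needs.experienceComposer ||
    needs.qualityControl;

  // Assumption: up to three participants can stay relayed; larger sessions are routed.
  if (needsMediaServer || needs.maxParticipants > 3) {
    return "routed";
  }
  return "relayed";
}
```

The result would then be passed as the media mode when your server creates the session.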
Adaptive Media Routing - Beginning with OpenTok.js v2.24.7, routed sessions are optimised to use adaptive media routing, if possible. Adaptive media routing determines if media can be relayed without the OpenTok Media Router for one-on-one video streaming, to optimize the media performance between two participants. The routed session automatically adapts media routing to use the OpenTok Media Router when required — for sessions with three or more participants, archiving, live-streaming broadcasts, SIP interconnect, Experience Composer, and Audio Connector.
Audio Fallback - In routed mode, the Vonage SDK automatically falls back to audio-only mode if the bandwidth is too low to support video. However, you can override this behavior by setting audioFallback to false in OT.initPublisher().
getStats Method - In addition to implementing the custom audio fallback mentioned above, you can poll getStats() to capture information about the quality of the connection, both to display real-time information to the user and for troubleshooting purposes. See this example.
Vonage Inspector Tool - The Vonage Inspector Tool can be used to understand the media performance during the session, as well as which codecs, modes (relayed or routed), events and advanced features were used during the call.
Report Issue Method - It is possible to flag errors for later review in the inspector. The Report Issue ID can also be used to search inspector without knowing the sessionId.
Broadcast
Video API gives users two options for publishing live videos to wider audiences - an interactive broadcast and a live-streaming broadcast.
Interactive Broadcast
This type of broadcast allows clients to interact with each other by subscribing to each other's streams. Note that this type of broadcast can support up to 15,000 subscribers at up to Full HD resolution. Below are things to consider when using this broadcast:
Visit Vonage Scalable Video Simulcast to learn more about Simulcast. By default, Simulcast will kick in after the third connection joins the call (this is done to avoid Simulcast in one-to-one calls).
To override the default and disable scalable video for publishers in a routed session, you can use the scalableVideo option in the OT.initPublisher() method. Keep in mind that the maximum number of subscribers decreases as the number of publishers increases. To find the maximum number of subscribers, consult the table in the “Live interactive video broadcasts” guide.
To ensure stability in large sessions, suppress connection events; see the “Suppressing connection events” guide.
Larger WebRTC sessions are possible when using the Experience Composer.
Live Streaming
This type of broadcast allows for more than 15,000 subscribers to subscribe to streams. There are two types of protocols available to broadcast video, RTMP (Real Time Messaging Protocol) and HLS (HTTP Live Streaming). Regardless of which one you choose, limit the number of publishers for a better viewer experience.
HLS vs RTMP
HLS supports an unlimited number of subscribers, whereas RTMP is limited by the RTMP delivery platform.
HLS is delayed by 15-20 seconds, whereas RTMP (from Vonage’s platform) is delayed by five seconds. This does not include any delay added by the RTMP delivery platform, which will introduce its own latency depending on how it processes video.
Low Latency HLS (LL-HLS) is delayed by 4-6 seconds.
HLS playback is not supported on all browsers, but there are plugins you can use, such as flowplayer. Playback allows users to scrub through the video (rewind/fast forward), from the beginning of the live stream up to the current live position.
DVR mode can be activated when you create an HLS session. This is an Apple standard that allows users to play, pause and resume live HLS within a window of 2 hours.
HLS/RTMP has a default max duration of four hours. If the broadcast needs to run longer, change the max duration property (the maximum is 10 hours).
HLS/RTMP streams automatically stop sixty seconds after the last client disconnects from the session.
To learn more about live streaming, such as layouts, max duration, and how to start/stop live streaming, visit the Live Streaming Broadcasts guide.
You can also read more on this topic in this blog post: “Video API: Making Interactive Broadcasts and Recordings Better for You” and in this sample code.
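The broadcast limits quoted in this section (15,000 subscribers for interactive broadcasts, a four-hour default and ten-hour maximum duration for HLS/RTMP) can be captured in a small planning helper. This is a sketch only: `planBroadcast` and `BroadcastPlan` are hypothetical names, the choice is based purely on audience size (the number of publishers also matters, as noted above), and expressing the duration in seconds is an assumption about how the max duration property is set.

```typescript
type BroadcastType = "interactive" | "live-streaming";

interface BroadcastPlan {
  type: BroadcastType;
  maxDurationSeconds: number; // assumed unit for the max duration property
}

// Figures taken from the text above.
const INTERACTIVE_SUBSCRIBER_LIMIT = 15000;
const DEFAULT_MAX_DURATION_HOURS = 4;
const MAX_DURATION_HOURS = 10;

function planBroadcast(
  expectedViewers: number,
  durationHours: number = DEFAULT_MAX_DURATION_HOURS
): BroadcastPlan {
  if (durationHours <= 0 || durationHours > MAX_DURATION_HOURS) {
    throw new Error("Duration must be between 0 and " + MAX_DURATION_HOURS + " hours");
  }
  // Audiences beyond the interactive limit need HLS or RTMP live streaming.
  const type: BroadcastType =
    expectedViewers > INTERACTIVE_SUBSCRIBER_LIMIT ? "live-streaming" : "interactive";
  return { type: type, maxDurationSeconds: Math.round(durationHours * 3600) };
}
```

For example, `planBroadcast(50000, 6)` suggests a live-streaming broadcast with the max duration raised to six hours.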
User Interface and User Experience Best Practices
In general, we recommend reading the UI Customization documentation (Web, iOS, Android, Windows) and following the sections that are relevant to your application.
Ensuring a Good User Experience
Precall Test - add a precall test in which the user’s device and connection undergo network and hardware tests before joining a session. Remember to generate a new sessionId for every test and let the test run for at least 30 seconds for more accurate results.
The general Vonage Precall Test Tool can be used by you and your customers for generic connectivity tests to the Video API.
If you would like to integrate your own PreCall test and gather all the test data, there are several resources available to do so:
You can also see how a Precall test can be embedded in a complete application by trying our Live Meeting Demo and inspecting the relevant source code.
Publishing/Subscribing video streams - include handlers
Completion Handlers can give you feedback when you try to connect, publish, subscribe or send signals to a video API session. They are described here:
You can also listen for exception events on the OT object, which will throw exception events for more general errors that are described here: Exception Events
When the connection has been established, you would usually publish audio and video and also subscribe to other participants' streams. When managing the Publishers and Subscribers in regards to UI, you can make use of the respective events of the publisher and subscriber instances, which can help you display useful information to users when specific events or exceptions occur. Publisher and Subscriber events can be different and are described here:
For Android and iOS, please see “Exception Handling” above
Audio Fallback - our media server constantly checks network conditions. If it detects an issue with an end user’s connection and packet loss is greater than 15%, it automatically drops the video and continues with audio only. An event is sent when this happens (e.g. for iOS: subscriberVideoDisabled:reason: and subscriberVideoEnabled:reason:). We recommend surfacing this event in the UI to alert impacted users that the quality of their connection has dropped and the session is switching to audio only. The threshold for switching to audio only is not configurable. More information can be found in these examples:
https://tokbox.com/developer/sdks/js/reference/Subscriber.html#event:videoDisableWarning
Audio fallback is enabled by default; however, it can be disabled with the audioFallbackEnabled parameter. See here
Reconnecting to session - when a participant suddenly drops from a session due to network-related issues, the client will attempt to reconnect to the session. For a better user experience, we recommend capturing these events and showing them in the UI, so the user knows that the application is attempting to reconnect. More information can be found here
Active speaker - for audio-only sessions, try adding an audio level meter so that participants can see who is currently speaking. For video, try changing the layout so that the active speaker gets more screen real estate. You can use the audioLevelUpdated event, which is sent periodically, to make UI adjustments.
Loudness detector - It is good practice to implement a loudness detector to identify when a muted user is trying to speak. While the user is muted, the audioLevelUpdated event fires with audioLevel set to 0, so it cannot be used to detect speech. Instead, use an AudioContext to measure the microphone level directly. For reference, see "How to Create a Loudness Detector Using Vonage Video API".
Controlling resolution/frame rate - The Subscriber object provides methods to lower the received resolution and/or frame rate. This is useful for saving bandwidth and CPU resources if you are displaying a large number of participants (e.g. more than 4 on mobile or more than 8 on desktop).
Avoiding audio issues - On large calls, participants can inadvertently introduce noise or echo into the call. The more participants there are, the more likely this is to cause problems, so consider auto-mute-on-join logic and/or a "mute all" moderator button for larger sessions.
Report Issue API - https://tokbox.com/developer/guides/debugging/js/#report-issue. This allows the end user of the application to generate a unique issue ID from the client side. You can store this issue ID and quote it when raising a ticket with support. The issue ID helps to identify the connection that reported the problem and focuses the support investigation.
Location hint - https://tokbox.com/developer/guides/create-session/. This allows the developer to set a preferred region where the video call will be hosted. This can be useful for large sessions where you know most users will join from a specific region. Note that this does not guarantee that a specific datacenter will be used; for that, please see Regional Media Zones in this document.
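The active-speaker idea above can be sketched as a small tracker that smooths the audio levels reported for each stream and picks the loudest one. This is a minimal sketch, not part of the SDK: `ActiveSpeakerTracker` is a hypothetical class, the smoothing factor and threshold are illustrative values to tune, and it assumes you feed it the `audioLevel` value (between 0 and 1) from each subscriber's audioLevelUpdated event.

```typescript
class ActiveSpeakerTracker {
  // Smoothed audio level per stream ID.
  private levels: { [streamId: string]: number | undefined } = {};

  constructor(
    private smoothing: number = 0.7, // weight given to the previous smoothed value
    private threshold: number = 0.2 // levels at or below this count as silence
  ) {}

  // Call this from each subscriber's audioLevelUpdated handler.
  update(streamId: string, audioLevel: number): void {
    const previous = this.levels[streamId];
    this.levels[streamId] =
      previous === undefined
        ? audioLevel
        : this.smoothing * previous + (1 - this.smoothing) * audioLevel;
  }

  // Call this when a stream is destroyed.
  remove(streamId: string): void {
    delete this.levels[streamId];
  }

  // Returns the loudest stream above the threshold, or null if everyone is quiet.
  activeSpeaker(): string | null {
    let best: string | null = null;
    let bestLevel = this.threshold;
    for (const id of Object.keys(this.levels)) {
      const level = this.levels[id];
      if (level !== undefined && level > bestLevel) {
        best = id;
        bestLevel = level;
      }
    }
    return best;
  }
}
```

In a web client you might call `tracker.update(stream.streamId, event.audioLevel)` inside the handler and re-render the layout whenever `tracker.activeSpeaker()` changes; smoothing avoids the layout flickering on short noises.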
Enhancing In-Call Experience
Background Blur/Replacement - The JS SDK gives a simple method to blur the background or replace it with an image. For the latter please make sure the background image has the same aspect ratio as the published stream and be aware that the background may be cropped depending on the fitMode setting described here. Note that background blur and replacement is only supported in recent versions of Chrome, Electron, Opera, and Edge. It is not supported in other (non-Chromium-based) browsers or on iOS.
Media Processor - Use the Vonage Media Processor library to apply custom transformations to published video on desktop and mobile, including natively using our mobile SDKs. This supports background blur and replacement, but also more advanced use cases such as augmented reality and spatial audio.
Live Captions API - use this simple API to transcribe audio streams and generate real-time captions for your application. The Live Captions API lets you show live captions to end-users in a Vonage Video session, using a transcription service.
Audio Connector - using Audio Connector, you can send raw audio (PCM 16 kHz/16-bit) streams from a live Vonage Video session to external services such as AWS, GCP, Azure, etc., individually or mixed. You can also identify the speaker by sending the audio streams individually, opening one WebSocket connection per stream. Customers use this feature to build use cases such as medical transcriptions and real-time translation.
Engagement Features
Chat (text messaging) - you can send messages using Vonage’s signaling, but note that messages are not stored on Vonage’s video platform. When adding text messaging functionality, keep in mind that some users may arrive/join a session after text messages were sent; latecomers will be unable to view messages that were sent. Additionally, should you decide to record a session, text messages will not be captured, unless you implement Experience Composer.
Screen-share
- Consider hiding the publisher that shares their screen to avoid the hall-of-mirrors effect.
- ContentHint: allows you to optimise the screen-share for detail (such as slideware) or motion (such as a video). This flag can, and should, be set in versions 2.20 and later.
Archiving
There are two recording options: composed and individual stream. The differences between the two, and things to consider, are outlined below.
Composed:
Can record up to 16 video streams, plus an additional 34 streams where only the audio will be recorded, totaling 50 streams
Single MP4 file containing all media streams
Customizable layout - https://tokbox.com/developer/guides/archiving/layout-control.html
Screen recording and advanced layouts like custom overlays, montages and watermarks (see Experience Composer)
Can be started automatically (240 minutes max per file; if recording is not stopped, archiving continues in a new file)
It is possible to prioritize certain streams to be included in the recording by assigning different layout classes. For example, screen-share streams - https://tokbox.com/developer/guides/archive-broadcast-layout/#stream-prioritization-rules
Supports SD, HD and FHD formats
Individual Stream:
Can record up to 50 streams, both audio and video
Multiple individual streams/files saved in a zip folder
Intended for use with a post-processing tool to produce customised content
Cannot be started automatically
Experience Composer
We also provide tools for you to create a highly customised composed layout. This allows you to create Web applications to build rich UI/UX experiences for end users.
Create a web page which will be rendered on Vonage platform
Mix video content with any other web content, such as
Chat window
Whiteboard
Advanced content and layouts
Content can be dynamic and change throughout the session
Composed content can be published into other OpenTok sessions, in order to
Record
Archive
Broadcast
while keeping the rich UI/UX experiences delivered to users intact.
For more information see here: https://tokbox.com/developer/guides/experience-composer/
Storing archives
Vonage retains archives for 72 hours if cloud storage has not been configured, or if uploading fails and storage fallback has not been disabled. Keep in mind that if you disable upload fallback and uploading fails for whatever reason, the archives will not be recoverable.
AWS S3: Visit this site https://tokbox.com/developer/guides/archiving/using-s3.html for instructions on how to upload archive files to AWS.
Azure: Visit this site https://tokbox.com/developer/guides/archiving/using-azure.html for instructions on how to upload archive files to Azure.
Google: Use S3-compatible mode and follow the S3 instructions here https://tokbox.com/developer/guides/archiving/using-s3.html
Archiving FAQs:
1. Are archives encrypted?
Not by default, but you can add encryption for composed archives. To learn more, visit https://tokbox.com/developer/guides/archiving/opentok-encryption.html
2. Can you record just the audio or just the video?
Yes. Using the REST API, set hasVideo and hasAudio to true or false as needed - https://tokbox.com/developer/rest/#start_archive
3. Can I name archives so that I can identify them by name?
Yes. Using the REST API, set the name property (a string) to the desired identifier - https://tokbox.com/developer/rest/#start_archive
4. How can I check archives’ status?
Use the archive inspector. A great article can be found here https://api.support.vonage.com/hc/en-us/articles/6646228878236-Archiving-FAQ
5. Can I record certain streams from a session?
Yes, please see https://tokbox.com/developer/guides/archiving/#manual-stream-mode. You can also change this on the fly https://tokbox.com/developer/rest/#selecting-archive-streams
6. Can I record in different formats at the same time?
Yes, you can initiate archiving more than once for a given session. This means for example you could record a composed layout at the same time as having a separate recording per participant (i.e. Individual archiving)
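Several of the answers above come down to the properties sent when starting an archive over REST. The sketch below builds such a request body; it is illustrative only. `buildStartArchiveBody` is a hypothetical helper, `hasAudio`, `hasVideo` and `name` are the properties mentioned in the FAQ, and `outputMode` with the values "composed" and "individual" is an assumption to verify against the REST reference linked above.

```typescript
type ArchiveOutputMode = "composed" | "individual";

interface StartArchiveBody {
  sessionId: string;
  name: string;
  hasAudio: boolean;
  hasVideo: boolean;
  outputMode: ArchiveOutputMode; // assumed property name; check the REST reference
}

// Hypothetical helper that validates and assembles the JSON body.
function buildStartArchiveBody(
  sessionId: string,
  name: string,
  hasAudio: boolean = true,
  hasVideo: boolean = true,
  outputMode: ArchiveOutputMode = "composed"
): StartArchiveBody {
  if (!hasAudio && !hasVideo) {
    throw new Error("An archive needs audio, video, or both");
  }
  return {
    sessionId: sessionId,
    name: name,
    hasAudio: hasAudio,
    hasVideo: hasVideo,
    outputMode: outputMode,
  };
}
```

To record in two formats at once, as described in the last answer, you could build one body with "composed" and another with "individual" for the same sessionId and send each in its own start request.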
Quality, Performance and Compatibility
Devices - for multi-party sessions, try to limit the number of participants, as more participants require more processing power.
See below the number of participants that we recommend:
Mobile = 4 (Engineering officially supports a maximum of 8)
Laptop = 10
Desktop = 25
Controlling resolution/frame rate - The Subscriber object provides methods to lower the received resolution and/or frame rate. This is useful for saving bandwidth and CPU resources if you are displaying a large number of participants (e.g. more than 4 on mobile or more than 8 on desktop).
Bandwidth requirements - see "What is the minimum bandwidth requirement to use OpenTok?"
Proxy - if users can only access the internet through a proxy, make sure that it is a “transparent” proxy; otherwise, it must be configured in the browser for HTTPS connections, given that WebRTC does not work well with proxies requiring authentication. Check out our network check flow - https://tokbox.com/developer/guides/restricted-networks/
Firewall - at minimum, below are the ports and domains that we recommend be included in firewall rules:
TCP 443
FQDN: *.tokbox.com
FQDN: *.opentok.com
STUN/TURN: UDP 3478
If allowed, try opening the following port range: UDP 1025 - 65535. This will provide users with the best experience possible. This also eliminates the need for TURN; not relaying media through such network elements decreases latency.
Further information can be found in this article: "What are the Vonage Video API network connectivity requirements?".
Codec - see the codec compatibility guide: https://tokbox.com/developer/guides/codecs/. Vonage supports VP9, VP8 and H.264 codecs; however, VP9 is only available in relayed media mode, in sessions where ALL participants are using Chrome.
Difference between VP8 and H.264:
VP8 is a software codec, more mature and can handle lower bitrates. Additionally, it supports scalable/simulcast video.
H.264 is available through software and hardware, depending on the device. It does not support scalable video or simulcast.
By default, the codec is set to VP8. If you need to change the assigned codec for a particular project key, log in to your portal to make the change.
Session Monitoring
Visit our dev page - https://tokbox.com/developer/guides/session-monitoring/
Session monitoring allows you to register a webhook URL.
Use this feature to monitor sessions and streams. One example is limiting the number of participants in a session, which is often done alongside the forceDisconnect function for JS - https://tokbox.com/developer/guides/moderation/js/#force_disconnect. A moderator can also call an action on your server and have it make a REST call to force a disconnect - https://tokbox.com/developer/guides/moderation/rest/
Can be used to track usage (for better usage tracking, use Advanced Insights - https://tokbox.com/developer/guides/insights/#obtaining-session-data-advanced-insights-).
If within 30 minutes there are more than 50 event delivery failures (in which we don't receive a 200 success response when sending an HTTP request to your callback URL), we will disable session monitoring event forwarding. We will send an email if this occurs. You can re-enable session monitoring in your TokBox account page.
For extra security you can sign your webhooks
ConnectionCreated: you can use ConnectionData to identify users' connections. For example, you can pass the user ID, name, or other data describing the client (Do not use personal information in token data)
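The participant-limit use case above can be sketched as the decision logic behind a webhook handler. This is a minimal sketch under stated assumptions: `ParticipantLimiter` is a hypothetical class, the event payload is reduced to an assumed shape (`sessionId`, an `event` name of "connectionCreated" or "connectionDestroyed", and a `connection.id`), and the actual force-disconnect REST call is left to your server.

```typescript
// Assumed, simplified shape of a session monitoring callback payload.
interface MonitoringEvent {
  sessionId: string;
  event: string;
  connection?: { id: string };
}

class ParticipantLimiter {
  // Connection IDs currently admitted, per session.
  private admitted: { [sessionId: string]: string[] | undefined } = {};

  constructor(private maxParticipants: number) {}

  // Returns the connection ID to force-disconnect, or null if no action is needed.
  handle(evt: MonitoringEvent): string | null {
    if (!evt.connection) {
      return null;
    }
    const id = evt.connection.id;
    const list = this.admitted[evt.sessionId] || (this.admitted[evt.sessionId] = []);

    if (evt.event === "connectionCreated") {
      if (list.indexOf(id) !== -1) {
        return null; // duplicate delivery of the same event
      }
      if (list.length >= this.maxParticipants) {
        return id; // session is full: ask the caller to disconnect this connection
      }
      list.push(id);
      return null;
    }

    if (evt.event === "connectionDestroyed") {
      const index = list.indexOf(id);
      if (index !== -1) {
        list.splice(index, 1);
      }
    }
    return null;
  }
}
```

Your webhook endpoint would parse each callback, pass it to `handle`, respond with a 200 promptly (so event forwarding is not disabled), and issue the force-disconnect call whenever a connection ID is returned.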
Add-ons
Most customers can purchase (or remove) add-ons with a single click via the self-service tool.
SIP Interconnect
Get Started: https://tokbox.com/developer/guides/sip/
How to build a Phone Dial in via SIP Interconnect: https://developer.vonage.com/en/blog/connecting-webrtc-and-pstn-with-opentok-and-nexmo-dr
Configurable TURN
IP Proxy
Get Started: https://tokbox.com/developer/guides/ip-proxy/
How to host on AWS: https://api.support.vonage.com/hc/en-us/articles/6646184200348-Install-and-Configure-a-Test-Proxy-Server-in-AWS
Regional Media Zones
This is a geofencing feature used for compliance reasons. For more information, see “How do I configure Regional Media Zones for my project?”
Security and Privacy
Vonage Video API can be customized to meet the highest security standards. Our platform is GDPR compliant and we are HIPAA compliant. For European customers, we are offering extended add-ons that make it possible to comply with additional local certifications and standards, such as KBV certification (Germany) or other privacy laws that aim for better data ownership & protection (Europe-wide).
Read more about:
On request and under NDA, we can provide further reports such as the Data Transfer Impact Assessment, SOC2 Type 2 audit report, and External Pen Tests that prove the high-security standards of our Video platform.
Sample code:
Git Repository:
Session Monitoring
Vonage text chat
Post-processing tool sample code for processing individual stream archive
Calculating monthly usage / Video API tiered pricing -
How do I estimate my OpenTok monthly usage - Per Participant Minutes (PPM) - default pricing model
How do I estimate my OpenTok monthly usage - Session Subscriber Minutes (SSM) - Legacy pricing model
Further Reading
Video API Basics
Explore Vonage Video API Capabilities with Vonage Video API Playground
Generate and Assign Token Roles using the Video API | One Dev Minute
Conferencing and Video Calling Samples
Video API Advanced Capabilities
Track Users Connections Using Video API and Session Monitoring
Waiting Room and Pre-Call Best Practices With Vonage Video API
Build a Breakout Room Application in JavaScript with Vonage Video API
Video Express
Broadcast/Recording
Create a Personal Twitch with Vonage Video API and Web Components
Capture the entire experience of your web application with Experience Composer
Auto Zoom and Center Published Video Calls with Vonage Video API
Tripling interactive participants to 15,000, introducing Experience Composer, LL-HLS and Full HD
Video + AI
Media Processor
Apply ML transformers to achieve video and audio effects in live video streams.
Use Media Processor and AI Video Transformers with Vonage Video API
Blurring for Clarity: Protect Participants Privacy and Aid Attention
Audio Connector
Extract audio from live video calls to take advantage of natural language processing engines.
Troubleshooting and Analytics
Simon is a Customer Solutions Architect at Vonage and brings along 20 years of communications experience. His global career has included working in rural Africa and India bringing communications to remote locations. Today he's passionate about using communications APIs to improve user experience and efficiency and is focused on the video WebRTC space.