How to Build an AI Voice Agent with Vonage Voice API and Deepgram
Introduction
This guide outlines the process of building a real-time AI voice agent using the Vonage Voice API and Deepgram's Voice Agent platform. You will create an intelligent voice assistant that answers phone calls, listens to users via Automatic Speech Recognition (ASR), processes requests with a Large Language Model (LLM), and responds with natural-sounding text-to-speech, all in real time. Additionally, the setup supports conversation interruption, also known as barge-in.
Prerequisites
Before you begin, ensure you have:
- A Vonage API account. Sign up for free.
- Node.js version 18 or higher installed on your machine.
- A Deepgram account with an API key.
- ngrok installed on your machine.
Set Up Your Local Environment
Create a new directory for your project and install the required dependencies:
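The server code later in this guide relies on Express with express-ws, the ws client, body-parser, dotenv, and the Vonage server SDK. A minimal setup, assuming npm and the folder name used later in this guide, might look like:

```shell
mkdir vonage-deepgram-voice-agent && cd vonage-deepgram-voice-agent
npm init -y
npm install express express-ws ws body-parser dotenv @vonage/server-sdk @vonage/auth
```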
Expose Your Local Server
Vonage needs to send webhooks to your local machine. Use ngrok to expose your server:
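Assuming your server will listen on port 3000 (the default used later in server.js):

```shell
ngrok http 3000
```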
Note: Keep this terminal open and copy your ngrok URL. You'll need it in the next steps.
Provision Your Vonage Resources
Log in to the Vonage Dashboard to start.
Create a Vonage Application
Generate your credentials via the Dashboard and save them to the folder you just created.
- Go to Applications > Create a new application.
- Give your application a name.
- Authentication: Click Generate public and private keys.
- A file named private.key will download.
- Move this private.key file from your Downloads folder into your vonage-deepgram-voice-agent folder.
- Under Capabilities, enable Voice.
- In the Voice settings, set the following webhooks:
- Answer URL: https://{ngrok-url}/answer (Method: GET)
- Event URL: https://{ngrok-url}/event (Method: POST)
- Click Generate new application at the bottom.
Link a Number
- Go to Phone Numbers > Buy Numbers and purchase a voice-enabled number.
- Go to Applications, select the application you just created, and click Edit.
- Under the Numbers tab, click Link next to your newly purchased number.
Configure Environment Variables
Create a .env file in your project directory with the following variables:
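Based on the variables referenced in server.js, the .env file needs entries along these lines. All values below are placeholders; confirm the credentials in your Vonage Dashboard, and check Deepgram's Voice Agent documentation for the current endpoint and voice model names:

```
API_KEY=your-vonage-api-key
API_SECRET=your-vonage-api-secret
APP_ID=your-vonage-application-id
SERVICE_PHONE_NUMBER=your-vonage-number
DEEPGRAM_API_KEY=your-deepgram-api-key
DEEPGRAM_VOICE_AGENT_ENDPOINT=agent.deepgram.com/v1/agent/converse
DEEPGRAM_AGENT_SPEAK=aura-2-thalia-en
MAX_CALL_DURATION=300
PORT=3000
```

Note that the code prepends wss:// to DEEPGRAM_VOICE_AGENT_ENDPOINT, so store it without a scheme.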
Important: Store your API keys in environment variables rather than hardcoding them in your source code for security.
Build the Voice Agent Connector
Create a file named server.js and add the following code. This application acts as a connector between Vonage Voice API and Deepgram Voice Agent.
'use strict';
require('dotenv').config();
const express = require('express');
const bodyParser = require('body-parser');
const app = express();
require('express-ws')(app);
const webSocket = require('ws');
app.use(bodyParser.json());
//---- CORS policy ----
app.use(function (req, res, next) {
res.header("Access-Control-Allow-Origin", "*");
res.header("Access-Control-Allow-Headers", "Origin, X-Requested-With, Content-Type, Accept");
res.header("Access-Control-Allow-Methods", "OPTIONS,GET,POST,PUT,DELETE");
next();
});
//---- Configuration ----
const servicePhoneNumber = process.env.SERVICE_PHONE_NUMBER;
//---- Vonage API Setup ----
const { Auth } = require('@vonage/auth');
const credentials = new Auth({
apiKey: process.env.API_KEY,
apiSecret: process.env.API_SECRET,
applicationId: process.env.APP_ID,
privateKey: './private.key'
});
const apiBaseUrl = "https://api.nexmo.com";
const options = { apiHost: apiBaseUrl };
const { Vonage } = require('@vonage/server-sdk');
const vonage = new Vonage(credentials, options);
//---- Deepgram Voice Agent Configuration ----
const dgApiKey = process.env.DEEPGRAM_API_KEY;
const dgVoiceAgentEndpoint = process.env.DEEPGRAM_VOICE_AGENT_ENDPOINT;
const dgVoiceAgentSettings = {
"type": "Settings",
"audio": {
"input": { "encoding": "linear16", "sample_rate": 8000 },
"output": { "encoding": "linear16", "sample_rate": 8000, "container": "none" }
},
"agent": {
"listen": { "provider": { "type": "deepgram", "model": "nova-3" } },
"think": {
"provider": { "type": "anthropic", "model": "claude-sonnet-4-20250514" },
"prompt": "You are a helpful AI assistant on a live phone call. Keep responses concise and natural for spoken conversation."
},
"speak": {
"provider": {
"type": "deepgram",
"model": process.env.DEEPGRAM_AGENT_SPEAK
}
}
}
};
//---- Handle incoming PSTN calls ----
app.get('/answer', async (req, res) => {
const hostName = req.hostname;
const uuid = req.query.uuid;
// For local development with ngrok, use your ngrok URL directly
// const publicUrl = 'https://your-ngrok-url.ngrok.io';
const wsUri = `wss://${hostName}/socket?original_uuid=${uuid}`;
const nccoResponse = [
{
"action": "talk",
"text": "Hello, please wait while we're connecting your call!",
"language": "en-US",
"style": 11
},
{
"action": "connect",
"eventType": "synchronous",
"eventUrl": [`https://${hostName}/ws_event`],
"from": req.query.from,
"endpoint": [
{
"type": "websocket",
"uri": wsUri,
"content-type": "audio/l16;rate=8000",
"headers": {}
}
]
}
];
res.status(200).json(nccoResponse);
});
//---- Event webhook for call status ----
app.post('/event', async (req, res) => {
res.status(200).send('Ok');
});
//---- WebSocket event handler ----
app.post('/ws_event', async (req, res) => {
res.status(200).send('Ok');
// Trigger a greeting when WebSocket is connected
setTimeout(() => {
if (req.body.status === 'answered') {
vonage.voice.playTTS(req.body.uuid, {
text: "Hello",
language: 'en-US',
style: 11
})
.then(() => console.log("Initial greeting sent"))
.catch(err => console.error("Failed to play TTS:", err));
}
}, 1500);
});
//---- Start server ----
const port = process.env.PORT || 3000;
app.listen(port, () => {
console.log(`Voice Agent application listening on port ${port}`);
console.log(`Make sure ngrok is forwarding to this port!`);
});
Note: When running locally with ngrok, the req.hostname may not match your public tunnel URL. If webhooks fail, set your ngrok base URL as an environment variable and use it to build the eventUrl and wsUri instead.
Add the WebSocket Connector Logic
Now add the core connector logic that bridges Vonage Voice API with Deepgram Voice Agent. Append this to your server.js:
//---- WebSocket Connector ----
app.ws('/socket', async (ws, req) => {
let wsDgOpen = false; // Deepgram WebSocket ready?
const originalUuid = req.query.original_uuid;
console.log('WebSocket connected for call UUID:', originalUuid);
//---- Connect to Deepgram Voice Agent ----
console.log('Opening connection to Deepgram Voice Agent');
const wsDg = new webSocket(`wss://${dgVoiceAgentEndpoint}`, {
headers: { authorization: `Token ${dgApiKey}` }
});
wsDg.on('error', async (event) => {
console.log('WebSocket to Deepgram error:', event);
});
wsDg.on('open', () => {
console.log('WebSocket to Deepgram opened');
// Send configuration to Deepgram Voice Agent
wsDg.send(JSON.stringify(dgVoiceAgentSettings));
wsDgOpen = true;
});
//---- Handle messages from Deepgram ----
wsDg.on('message', async (msg, isBinary) => {
if (isBinary) {
// Audio data from agent - send directly to Vonage
ws.send(msg);
} else {
// Text messages (transcripts, events, etc.)
const message = JSON.parse(msg.toString('utf8'));
console.log(`Message from Deepgram:`, message);
// Handle barge-in: clear Vonage's audio buffer when user starts speaking
if (message.type === "UserStartedSpeaking") {
ws.send(JSON.stringify({ action: "clear" }));
console.log('Sent CLEAR command to Vonage');
}
}
});
wsDg.on('close', async () => {
wsDgOpen = false;
console.log("Deepgram WebSocket closed");
});
//---- Handle messages from Vonage (user audio) ----
ws.on('message', async (msg) => {
if (typeof msg === "string") {
const event = JSON.parse(msg);
console.log("Vonage event:", event.event);
// The first message from Vonage is always websocket:connected
if (event.event === "websocket:connected") {
console.log('Vonage WebSocket established:', event['content-type']);
}
// Handle Vonage control message confirmations
if (event.event === "websocket:cleared") {
console.log('Vonage audio buffer cleared');
}
} else {
// Binary audio data from caller - forward to Deepgram
if (wsDgOpen) {
wsDg.send(msg);
}
}
});
//---- Clean up on disconnect ----
ws.on('close', async () => {
wsDgOpen = false;
wsDg.close();
console.log("Vonage WebSocket closed");
});
});
How It Works
Simplified Audio Streaming: Audio from Deepgram is sent directly to Vonage as binary messages. No manual buffering or timing is needed—Vonage handles the internal buffering automatically.
Clear Buffer Control Message: When Deepgram detects that the user has started speaking (UserStartedSpeaking event), the application sends a CLEAR control message to Vonage: {"action": "clear"}. This instructs the Vonage Voice API to immediately discard any buffered audio frames, creating instant barge-in functionality without manual buffer management.
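The binary/text branching and the CLEAR control message can be factored into a small pure function to make the routing logic explicit. This is an illustrative sketch, not part of the tutorial's server.js, and the function name is hypothetical:

```javascript
// Decide what to do with one message from Deepgram.
// Returns { forwardAudio } for agent audio, { control } when Vonage's
// buffer should be cleared, or {} for transcripts and other events.
function routeDeepgramMessage(raw, isBinary) {
  if (isBinary) {
    // Agent audio: forward to the Vonage WebSocket unchanged.
    return { forwardAudio: raw };
  }
  const message = JSON.parse(raw.toString('utf8'));
  if (message.type === 'UserStartedSpeaking') {
    // Barge-in: tell Vonage to discard any queued agent audio.
    return { control: JSON.stringify({ action: 'clear' }) };
  }
  return {};
}

// Example: a UserStartedSpeaking event produces the CLEAR control message.
const out = routeDeepgramMessage(
  Buffer.from(JSON.stringify({ type: 'UserStartedSpeaking' })),
  false
);
console.log(out.control); // {"action":"clear"}
```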
Event Confirmation: Vonage responds with a websocket:cleared event to confirm the buffer was cleared successfully. This allows you to track when interruptions occur.
Bidirectional Communication: User audio flows from Vonage → Deepgram as binary WebSocket messages, while agent audio and transcripts flow from Deepgram → Vonage in real-time.
Real-time Transcripts: Deepgram sends JSON messages containing transcripts of both user speech and agent responses, which you can log or process for analytics and quality assurance.
Test the Application
- Make sure your private.key file is in the project directory.
- Start ngrok in one terminal.
- Run your server in another terminal.
- Call your Vonage phone number from your mobile phone.
- The voice agent will greet you and respond to your questions using AI-powered conversation.
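The ngrok and server steps above come down to two commands in two terminals (assuming port 3000):

```shell
# Terminal 1: expose the local port
ngrok http 3000

# Terminal 2: start the connector
node server.js
```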
Add Outbound Calling Capability
To enable your application to make outbound calls, add this endpoint to your server.js:
//---- Trigger outbound PSTN calls ----
app.get('/call', async (req, res) => {
if (req.query.callee == null) {
res.status(400).send('"callee" number missing as query parameter');
} else {
res.status(200).send('Ok');
const hostName = req.hostname;
vonage.voice.createOutboundCall({
to: [{
type: 'phone',
number: req.query.callee
}],
from: {
type: 'phone',
number: servicePhoneNumber
},
limit: process.env.MAX_CALL_DURATION,
answer_url: [`https://${hostName}/answer`],
answer_method: 'GET',
event_url: [`https://${hostName}/event`],
event_method: 'POST'
})
.then(response => console.log("Outgoing PSTN call status:", response))
.catch(err => console.error("Outgoing PSTN call error:", err));
}
});
To trigger an outbound call, open your browser and navigate to:
https://your-ngrok-url.ngrok.io/call?callee=15551234567
Replace 15551234567 with the phone number you want to call (in E.164 format without the + sign).
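If you build the /call URL programmatically, a small helper can strip formatting down to the digits-only form the endpoint expects. This helper is hypothetical, not part of the tutorial code:

```javascript
// Strip everything except digits, so a pasted number like
// "+1 (555) 123-4567" becomes the bare E.164 digits "15551234567".
function toCalleeParam(number) {
  return number.replace(/\D/g, '');
}

console.log(toCalleeParam('+1 (555) 123-4567')); // 15551234567
```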
Customize Your Voice Agent
You can customize various aspects of the voice agent by modifying the dgVoiceAgentSettings object:
Change the AI Model
"think": {
"provider": { "type": "open_ai", "model": "gpt-4o-mini" },
"prompt": "You are a helpful AI assistant on a live phone call. Keep responses concise and natural for spoken conversation."
}
Change the Voice
Update the DEEPGRAM_AGENT_SPEAK variable in your .env file. See Deepgram's TTS models documentation for available voice options.
Customize the System Prompt
Modify the prompt field in the think section to change your agent's personality and behavior:
"prompt": "You are a friendly customer service representative for Acme Corp. Help users with their inquiries about our products and services. Be professional but warm."
Next Steps
- Explore WebSocket documentation for advanced audio streaming patterns.
- Add call recording and transcription for audit purposes and quality control.