How to Build an AI Voice Agent with Vonage Voice API and Deepgram
Introduction
This guide outlines the process of building a real-time AI voice agent using the Vonage Voice API and Deepgram's Voice Agent platform. You will create an intelligent voice assistant that answers phone calls, listens to users via Automatic Speech Recognition (ASR), processes requests with a Large Language Model (LLM), and responds with natural-sounding text-to-speech, all in real time. Additionally, the setup supports conversation interruption, also known as barge-in.
Prerequisites
Before you begin, ensure you have:
- A Vonage API account. Sign up for free.
- Node.js version 18 or higher installed on your machine.
- A Deepgram account with an API key.
- ngrok installed on your machine.
Set Up Your Local Environment
Create a new directory for your project and install the required dependencies:
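The server code later in this guide relies on Express with express-ws, the ws client, body-parser, dotenv, and the Vonage server SDK. A minimal setup, assuming npm and the folder name used later in this guide, might look like:

```shell
mkdir vonage-deepgram-voice-agent && cd vonage-deepgram-voice-agent
npm init -y
npm install express express-ws ws body-parser dotenv @vonage/server-sdk @vonage/auth
```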
Expose Your Local Server
Vonage needs to send webhooks to your local machine. Use ngrok to expose your server:
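Assuming your server will listen on port 3000 (the default used later in server.js):

```shell
ngrok http 3000
```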
Note: Keep this terminal open and copy your ngrok URL. You'll need it in the next steps.
Provision Your Vonage Resources
Log in to the Vonage Dashboard to start.
Create a Vonage Application
Generate your credentials via the Dashboard and save them to the folder you just created.
- Go to Applications > Create a new application.
- Give your application a name.
- Authentication: Click Generate public and private keys.
- A file named private.key will download.
- Move this private.key file from your Downloads folder into your vonage-deepgram-voice-agent folder.
- Under Capabilities, enable Voice.
- In the Voice settings, set the following webhooks:
- Answer URL: https://{ngrok-url}/answer (Method: GET)
- Event URL: https://{ngrok-url}/event (Method: POST)
- Click Generate new application at the bottom.
Link a Number
- Go to Phone Numbers > Buy Numbers and purchase a voice-enabled number.
- Go to Applications, select the application you just created, and click Edit.
- Under the Numbers tab, click Link next to your newly purchased number.
Configure Environment Variables
Create a .env file in your project directory with the following variables:
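Based on the variables referenced in server.js, the .env file needs entries along these lines. All values below are placeholders; confirm the credentials in your Vonage Dashboard, and check Deepgram's Voice Agent documentation for the current endpoint and voice model names:

```
API_KEY=your-vonage-api-key
API_SECRET=your-vonage-api-secret
APP_ID=your-vonage-application-id
SERVICE_PHONE_NUMBER=your-vonage-number
DEEPGRAM_API_KEY=your-deepgram-api-key
DEEPGRAM_VOICE_AGENT_ENDPOINT=agent.deepgram.com/v1/agent/converse
DEEPGRAM_AGENT_SPEAK=aura-2-thalia-en
MAX_CALL_DURATION=300
PORT=3000
```

Note that the code prepends wss:// to DEEPGRAM_VOICE_AGENT_ENDPOINT, so store it without a scheme.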
Important: Store your API keys in environment variables rather than hardcoding them in your source code for security.
Build the Voice Agent Connector
Create a file named server.js and add the following code. This application acts as a connector between Vonage Voice API and Deepgram Voice Agent.
'use strict';
require('dotenv').config();
const express = require('express');
const bodyParser = require('body-parser');
const app = express();
require('express-ws')(app);
const webSocket = require('ws');
app.use(bodyParser.json());
//---- CORS policy ----
app.use(function (req, res, next) {
res.header("Access-Control-Allow-Origin", "*");
res.header("Access-Control-Allow-Headers", "Origin, X-Requested-With, Content-Type, Accept");
res.header("Access-Control-Allow-Methods", "OPTIONS,GET,POST,PUT,DELETE");
next();
});
//---- Configuration ----
const servicePhoneNumber = process.env.SERVICE_PHONE_NUMBER;
//---- Vonage API Setup ----
const { Auth } = require('@vonage/auth');
const credentials = new Auth({
apiKey: process.env.API_KEY,
apiSecret: process.env.API_SECRET,
applicationId: process.env.APP_ID,
privateKey: './private.key'
});
const apiBaseUrl = "https://api.nexmo.com";
const options = { apiHost: apiBaseUrl };
const { Vonage } = require('@vonage/server-sdk');
const vonage = new Vonage(credentials, options);
//---- Deepgram Voice Agent Configuration ----
const dgApiKey = process.env.DEEPGRAM_API_KEY;
const dgVoiceAgentEndpoint = process.env.DEEPGRAM_VOICE_AGENT_ENDPOINT;
const dgVoiceAgentSettings = {
"type": "Settings",
"audio": {
"input": { "encoding": "linear16", "sample_rate": 8000 },
"output": { "encoding": "linear16", "sample_rate": 8000, "container": "none" }
},
"agent": {
"listen": { "provider": { "type": "deepgram", "model": "nova-3" } },
"think": {
"provider": { "type": "anthropic", "model": "claude-sonnet-4-20250514" },
"prompt": "You are a helpful AI assistant on a live phone call. Keep responses concise and natural for spoken conversation."
},
"speak": {
"provider": {
"type": "deepgram",
"model": process.env.DEEPGRAM_AGENT_SPEAK
}
}
}
};
//---- Handle incoming PSTN calls ----
app.get('/answer', async (req, res) => {
const hostName = req.hostname;
const uuid = req.query.uuid;
// For local development with ngrok, use your ngrok URL directly
// const publicUrl = 'https://your-ngrok-url.ngrok.io';
const wsUri = `wss://${hostName}/socket?original_uuid=${uuid}`;
const nccoResponse = [
{
"action": "talk",
"text": "Hello, please wait while we're connecting your call!",
"language": "en-US",
"style": 11
},
{
"action": "connect",
"eventType": "synchronous",
"eventUrl": [`https://${hostName}/ws_event`],
"from": req.query.from,
"endpoint": [
{
"type": "websocket",
"uri": wsUri,
"content-type": "audio/l16;rate=8000",
"headers": {}
}
]
}
];
res.status(200).json(nccoResponse);
});
//---- Event webhook for call status ----
app.post('/event', async (req, res) => {
res.status(200).send('Ok');
});
//---- WebSocket event handler ----
app.post('/ws_event', async (req, res) => {
res.status(200).send('Ok');
// Trigger a greeting when WebSocket is connected
setTimeout(() => {
if (req.body.status === 'answered') {
vonage.voice.playTTS(req.body.uuid, {
text: "Hello",
language: 'en-US',
style: 11
})
.then(() => console.log("Initial greeting sent"))
.catch(err => console.error("Failed to play TTS:", err));
}
}, 1500);
});
//---- Start server ----
const port = process.env.PORT || 3000;
app.listen(port, () => {
console.log(`Voice Agent application listening on port ${port}`);
console.log(`Make sure ngrok is forwarding to this port!`);
});
Note: When running locally with ngrok, the req.hostname may not match your public tunnel URL. If webhooks fail, set your ngrok base URL as an environment variable and use it to build the eventUrl and wsUri instead.
Add the WebSocket Connector Logic
Now add the core connector logic that bridges Vonage Voice API with Deepgram Voice Agent. Append this to your server.js:
//---- WebSocket Connector ----
app.ws('/socket', async (ws, req) => {
let wsDgOpen = false; // Deepgram WebSocket ready?
const originalUuid = req.query.original_uuid;
console.log('WebSocket connected for call UUID:', originalUuid);
//---- Connect to Deepgram Voice Agent ----
console.log('Opening connection to Deepgram Voice Agent');
const wsDg = new webSocket(`wss://${dgVoiceAgentEndpoint}`, {
headers: { authorization: `Token ${dgApiKey}` }
});
wsDg.on('error', async (event) => {
console.log('WebSocket to Deepgram error:', event);
});
wsDg.on('open', () => {
console.log('WebSocket to Deepgram opened');
// Send configuration to Deepgram Voice Agent
wsDg.send(JSON.stringify(dgVoiceAgentSettings));
wsDgOpen = true;
});
//---- Handle messages from Deepgram ----
wsDg.on('message', async (msg, isBinary) => {
if (isBinary) {
// Audio data from agent - send directly to Vonage
ws.send(msg);
} else {
// Text messages (transcripts, events, etc.)
const message = JSON.parse(msg.toString('utf8'));
console.log(`Message from Deepgram:`, message);
// Handle barge-in: clear Vonage's audio buffer when user starts speaking
if (message.type === "UserStartedSpeaking") {
ws.send(JSON.stringify({ action: "clear" }));
console.log('Sent CLEAR command to Vonage');
}
}
});
wsDg.on('close', async () => {
wsDgOpen = false;
console.log("Deepgram WebSocket closed");
});
//---- Handle messages from Vonage (user audio) ----
ws.on('message', async (msg) => {
if (typeof msg === "string") {
const event = JSON.parse(msg);
console.log("Vonage event:", event.event);
// The first message from Vonage is always websocket:connected
if (event.event === "websocket:connected") {
console.log('Vonage WebSocket established:', event['content-type']);
}
// Handle Vonage control message confirmations
if (event.event === "websocket:cleared") {
console.log('Vonage audio buffer cleared');
}
} else {
// Binary audio data from caller - forward to Deepgram
if (wsDgOpen) {
wsDg.send(msg);
}
}
});
//---- Clean up on disconnect ----
ws.on('close', async () => {
wsDgOpen = false;
wsDg.close();
console.log("Vonage WebSocket closed");
});
});
How It Works
Simplified Audio Streaming: Audio from Deepgram is sent directly to Vonage as binary messages. No manual buffering or timing is needed—Vonage handles the internal buffering automatically.
Clear Buffer Control Message: When Deepgram detects that the user has started speaking (UserStartedSpeaking event), the application sends a CLEAR control message to Vonage: {"action": "clear"}. This instructs the Vonage Voice API to immediately discard any buffered audio frames, creating instant barge-in functionality without manual buffer management.
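The binary/text branching and the CLEAR control message can be factored into a small pure function to make the routing logic explicit. This is an illustrative sketch, not part of the tutorial's server.js, and the function name is hypothetical:

```javascript
// Decide what to do with one message from Deepgram.
// Returns { forwardAudio } for agent audio, { control } when Vonage's
// buffer should be cleared, or {} for transcripts and other events.
function routeDeepgramMessage(raw, isBinary) {
  if (isBinary) {
    // Agent audio: forward to the Vonage WebSocket unchanged.
    return { forwardAudio: raw };
  }
  const message = JSON.parse(raw.toString('utf8'));
  if (message.type === 'UserStartedSpeaking') {
    // Barge-in: tell Vonage to discard any queued agent audio.
    return { control: JSON.stringify({ action: 'clear' }) };
  }
  return {};
}

// Example: a UserStartedSpeaking event produces the CLEAR control message.
const out = routeDeepgramMessage(
  Buffer.from(JSON.stringify({ type: 'UserStartedSpeaking' })),
  false
);
console.log(out.control); // {"action":"clear"}
```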
Event Confirmation: Vonage responds with a websocket:cleared event to confirm the buffer was cleared successfully. This allows you to track when interruptions occur.
Bidirectional Communication: User audio flows from Vonage → Deepgram as binary WebSocket messages, while agent audio and transcripts flow from Deepgram → Vonage in real-time.
Real-time Transcripts: Deepgram sends JSON messages containing transcripts of both user speech and agent responses, which you can log or process for analytics and quality assurance.
Test the Application
- Make sure your private.key file is in the project directory.
- Start ngrok in one terminal.
- Run your server in another terminal.
- Call your Vonage phone number from your mobile phone.
- The voice agent will greet you and respond to your questions using AI-powered conversation.
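The ngrok and server steps above come down to two commands in two terminals (assuming port 3000):

```shell
# Terminal 1: expose the local port
ngrok http 3000

# Terminal 2: start the connector
node server.js
```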
Add Outbound Calling Capability
To enable your application to make outbound calls, add this endpoint to your server.js:
//---- Trigger outbound PSTN calls ----
app.get('/call', async (req, res) => {
if (req.query.callee == null) {
res.status(400).send('"callee" number missing as query parameter');
} else {
res.status(200).send('Ok');
const hostName = req.hostname;
vonage.voice.createOutboundCall({
to: [{
type: 'phone',
number: req.query.callee
}],
from: {
type: 'phone',
number: servicePhoneNumber
},
limit: process.env.MAX_CALL_DURATION,
answer_url: [`https://${hostName}/answer`],
answer_method: 'GET',
event_url: [`https://${hostName}/event`],
event_method: 'POST'
})
.then(response => console.log("Outgoing PSTN call status:", response))
.catch(err => console.error("Outgoing PSTN call error:", err));
}
});
To trigger an outbound call, open your browser and navigate to:
https://your-ngrok-url.ngrok.io/call?callee=15551234567
Replace 15551234567 with the phone number you want to call (in E.164 format without the + sign).
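If you build the /call URL programmatically, a small helper can strip formatting down to the digits-only form the endpoint expects. This helper is hypothetical, not part of the tutorial code:

```javascript
// Strip everything except digits, so a pasted number like
// "+1 (555) 123-4567" becomes the bare E.164 digits "15551234567".
function toCalleeParam(number) {
  return number.replace(/\D/g, '');
}

console.log(toCalleeParam('+1 (555) 123-4567')); // 15551234567
```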
Customize Your Voice Agent
You can customize various aspects of the voice agent by modifying the dgVoiceAgentSettings object:
Change the AI Model
"think": {
"provider": { "type": "open_ai", "model": "gpt-4o-mini" },
"prompt": "You are a helpful AI assistant on a live phone call. Keep responses concise and natural for spoken conversation."
}
Change the Voice
Update the DEEPGRAM_AGENT_SPEAK variable in your .env file. See Deepgram's TTS models documentation for available voice options.
Customize the System Prompt
Modify the prompt field in the think section to change your agent's personality and behavior:
"prompt": "You are a friendly customer service representative for Acme Corp. Help users with their inquiries about our products and services. Be professional but warm."
Next Steps
- Explore WebSocket documentation for advanced audio streaming patterns.
- Add call recording and transcription for audit purposes and quality control.