How to Build an Advanced IVR / Voice Bot
This guide demonstrates how to build a voice-based AI agent using the Vonage Voice API and OpenAI. You will create a Voice Bot that answers inbound calls, listens to a user's question using Automatic Speech Recognition (ASR), and responds with an intelligent answer generated by an LLM.
Prerequisites
Before you begin, ensure you have:
- A Vonage API account.
- Node.js installed on your machine.
- An OpenAI API Key.
- ngrok installed on your machine.
Set Up Your Local Environment
Create a new directory for your project and install the required dependencies:
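For example, assuming a fresh project that uses Express for the web server and the official OpenAI Node.js SDK:

```shell
mkdir voice-bot && cd voice-bot
npm init -y
npm install express openai
```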
Expose Your Local Server
Vonage needs to send webhooks to your local machine. Use ngrok to expose your server:
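With ngrok installed, start a tunnel to the port your server will listen on (3000 in this guide):

```shell
ngrok http 3000
```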
ngrok forwards your local port 3000 (the port your server listens on) to a public URL such as https://{random id}.ngrok.app.
Keep this terminal open.
Provision Your Vonage Resources
Log in to the Vonage Dashboard to start.
Create a Voice Application
- Navigate to Applications > Create a new application.
- Give it a name (e.g., Voice AI Bot).
- Under Capabilities, enable Voice.
- In the Answer URL field, enter your Base URL followed by /webhooks/answer (e.g., https://{random id}.ngrok.app/webhooks/answer). Set the method to GET.
- In the Event URL field, enter your Base URL followed by /webhooks/events. Set the method to POST.
- Click Generate public and private key. Save the private.key file in your project folder (though we won't use it for this basic ASR flow, it's required for app creation).
- Click Save changes.
Link a Number
- Go to Numbers > Buy Numbers and purchase a voice-enabled number.
- Go to Your applications, select your bot application, and click Edit.
- Under the Numbers tab, click Link next to your newly purchased number.
Build the Voice Bot
Create a file named index.js and add the following code. Replace YOUR_OPENAI_API_KEY with your actual key.
Note: When running locally with ngrok, req.protocol/req.get('host') may not match your public tunnel URL. If webhooks fail, set your tunnel base URL in config (for example an env var) and build eventUrl from that instead.
const express = require('express');
const { OpenAI } = require('openai');
const app = express();
app.use(express.json());
const openai = new OpenAI({ apiKey: 'YOUR_OPENAI_API_KEY' });
// 1. Handle the initial call
app.get('/webhooks/answer', (req, res) => {
const ncco = [
{
action: 'talk',
text: 'Hi, I am your AI assistant. How can I help you today?'
},
{
action: 'input',
eventUrl: [`${req.protocol}://${req.get('host')}/webhooks/asr`],
type: ['speech'],
speech: {
language: 'en-us',
endOnSilence: 1
}
}
];
res.json(ncco);
});
// 2. Process the Speech-to-Text result and query OpenAI
app.post('/webhooks/asr', async (req, res) => {
const speechResults = req.body.speech?.results;
if (!speechResults || speechResults.length === 0) {
return res.json([{ action: 'talk', text: 'I am sorry, I didn\'t catch that. Goodbye.' }]);
}
const userText = speechResults[0].text;
console.log(`User said: ${userText}`);
try {
// Request a completion from OpenAI
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{ role: "system", content: "You are a helpful assistant on a phone call. Keep answers concise." },
{ role: "user", content: userText }
],
});
const aiResponse = completion.choices[0].message.content;
// Respond back to the user
res.json([{ action: 'talk', text: aiResponse }]);
} catch (error) {
console.error("OpenAI Error:", error);
res.json([{ action: 'talk', text: 'I encountered an error processing your request.' }]);
}
});
// 3. Log call events
app.post('/webhooks/events', (req, res) => {
console.log('Event:', req.body.status);
res.sendStatus(200);
});
app.listen(3000, () => console.log('Server running on port 3000'));
Test the Application
Run your server:
node index.js
Dial your Vonage number from your phone.
When prompted, ask a question (e.g., Why is the sky blue? or Tell me a joke).
The bot will capture your speech, send it to OpenAI, and read the response back to you using Text-to-Speech.
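Before dialing, you can also sanity-check the answer webhook from another terminal (assuming the server is running locally); it should return the NCCO as a JSON array containing the talk and input actions:

```shell
curl http://localhost:3000/webhooks/answer
```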
Enable Contextual Conversation
To make the conversation feel natural, we'll modify the app to remember previous exchanges and re-prompt the user for more input.
Update your index.js with this stateful logic:
// 1. Add a Map to store conversation history by Call UUID
const sessions = new Map();
// Helper to generate a NCCO that "loops" back to ASR
const getConversationalNCCO = (text, host) => [
{ action: 'talk', text: text },
{
action: 'input',
eventUrl: [`https://${host}/webhooks/asr`],
type: ['speech'],
speech: { language: 'en-us', endOnSilence: 1 }
}
];
app.get('/webhooks/answer', (req, res) => {
const uuid = req.query.uuid;
// Initialize history for this specific caller
sessions.set(uuid, [{ role: "system", content: "You are a helpful, concise assistant." }]);
res.json(getConversationalNCCO('Hello! What is on your mind?', req.get('host')));
});
app.post('/webhooks/asr', async (req, res) => {
const { uuid, speech } = req.body;
const userText = speech?.results?.[0]?.text;
if (!userText) {
sessions.delete(uuid);
return res.json([{ action: 'talk', text: 'Goodbye!' }]);
}
// Retrieve history and append the new question
let history = sessions.get(uuid) || [];
history.push({ role: "user", content: userText });
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: history,
});
const aiResponse = completion.choices[0].message.content;
history.push({ role: "assistant", content: aiResponse });
sessions.set(uuid, history);
// Return the AI response AND listen for the next question
res.json(getConversationalNCCO(aiResponse, req.get('host')));
});
// Clean up memory when the call ends
app.post('/webhooks/events', (req, res) => {
if (req.body.status === 'completed') sessions.delete(req.body.uuid);
res.sendStatus(200);
});
What Changed
- The Session Map: We use the uuid to keep different callers' histories separate.
- Recursive NCCO: Instead of a simple talk action, we now return a talk followed by an input action. This keeps the line open.
- Memory: By passing the entire history array to OpenAI, the bot now understands follow-up questions like Tell me more about that.
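The per-call bookkeeping can be exercised on its own; this minimal sketch (no Vonage or OpenAI calls, with hypothetical startSession/addTurn helpers) mimics two concurrent callers with separate histories:

```javascript
// Standalone sketch of the session store keyed by call UUID.
const sessions = new Map();

// Start a fresh history for a new call
const startSession = (uuid) =>
  sessions.set(uuid, [{ role: 'system', content: 'You are a helpful, concise assistant.' }]);

// Append one conversational turn to an existing call's history
const addTurn = (uuid, role, content) => {
  const history = sessions.get(uuid) || [];
  history.push({ role, content });
  sessions.set(uuid, history);
};

startSession('call-a');
startSession('call-b');
addTurn('call-a', 'user', 'Why is the sky blue?');
addTurn('call-b', 'user', 'Tell me a joke');

// Each caller's history stays separate
console.log(sessions.get('call-a').length); // 2 (system + user)
console.log(sessions.get('call-b')[1].content); // Tell me a joke
```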
Try the updated application by restarting your server and dialing the Vonage number linked to your application, as in the Test the Application step.
Add "Connect to Human" Tool
This step adds a tool definition for the LLM and a branch in your ASR logic that returns the Vonage connect action.
Update index.js
Add the new tool definition and modify the asr webhook to handle the transfer:
// Define the transfer tool
const tools = [
{
type: "function",
function: {
name: "connect_to_human",
description: "Call this when the user wants to speak to a real person or a human agent.",
parameters: { type: "object", properties: {} } // No arguments needed
}
}
];
const HUMAN_AGENT_NUMBER = '15551234567'; // Replace with your phone number
app.post('/webhooks/asr', async (req, res) => {
const { uuid, speech } = req.body;
const userText = speech?.results?.[0]?.text;
  if (!userText) {
    sessions.delete(uuid); // free the session when the caller goes silent
    return res.json([{ action: 'talk', text: 'Goodbye.' }]);
  }
let history = sessions.get(uuid) || [];
history.push({ role: "user", content: userText });
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages: history,
tools: tools // Provide the tool to the LLM
});
const message = response.choices[0].message;
// Check if the AI wants to transfer the call
if (message.tool_calls && message.tool_calls[0].function.name === 'connect_to_human') {
console.log(`Transferring call ${uuid} to human agent...`);
// Clean up session since the AI is leaving the call
sessions.delete(uuid);
// Return the "connect" NCCO
return res.json([
{
action: 'talk',
text: 'Please hold while I connect you to a human representative.'
},
{
action: 'connect',
from: 'YOUR_VONAGE_NUMBER', // Your linked Vonage number
endpoint: [{ type: 'phone', number: HUMAN_AGENT_NUMBER }]
}
]);
}
// Regular conversational flow
const aiResponse = message.content;
history.push({ role: "assistant", content: aiResponse });
sessions.set(uuid, history);
res.json(getConversationalNCCO(aiResponse, req.get('host')));
});
How It Works
- The Intent: When the user says I want to speak to a manager or Help me, this is too hard, the LLM recognizes the intent and triggers the connect_to_human function.
- The Hand-off: Your server stops the ASR loop and sends the connect action to Vonage.
- The Connection: Vonage creates a new outbound leg to the HUMAN_AGENT_NUMBER and bridges the two calls together. The AI is no longer "listening" once the connection is made.
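The hand-off response can be factored into a small helper for testing; this sketch (transferNCCO is a hypothetical helper, and both numbers are placeholders) builds the same two actions the webhook returns:

```javascript
// Sketch: build the talk + connect NCCO used for the hand-off.
// 'from' must be a Vonage number linked to your application;
// both numbers below are placeholders.
const transferNCCO = (from, humanNumber) => [
  {
    action: 'talk',
    text: 'Please hold while I connect you to a human representative.'
  },
  {
    action: 'connect',
    from,
    endpoint: [{ type: 'phone', number: humanNumber }]
  }
];

const ncco = transferNCCO('14155550100', '15551234567');
console.log(ncco[1].endpoint[0].number); // 15551234567
```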
Restart your server and call your application's Vonage number. When you ask the bot to speak to a human, it should say the phrase Please hold while I connect you to a human representative, and then connect you to the phone number you set as HUMAN_AGENT_NUMBER.
Next steps
- Custom Voices: Change the voice name in the talk action for a more branded experience.
- WebSocket Streaming: For lower latency, use WebSockets to stream audio in real-time.
- Endpoints: Connect to your PBX or Contact Center via SIP, or build your own web interface for a human agent with the Client SDK.
- .NET version: See the same IVR/voice-bot scenario implemented in .NET in this blog post.
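For the custom-voices idea above, the talk action accepts optional language and style fields; this sketch shows the shape (the specific values are illustrative only — consult the Vonage NCCO reference for supported combinations):

```javascript
// Sketch: a 'talk' action with an explicit voice language and style.
// The language/style values here are examples, not recommendations.
const brandedTalk = (text) => ({
  action: 'talk',
  text,
  language: 'en-GB',
  style: 2
});

console.log(JSON.stringify(brandedTalk('Welcome back!')));
```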