How to Build an Advanced IVR / Voice Bot
This guide demonstrates how to build a voice-based AI agent using the Vonage Voice API and OpenAI. You will create a Voice Bot that answers inbound calls, listens to a user's question using Automatic Speech Recognition (ASR), and responds with an intelligent answer generated by an LLM.
Prerequisites
Before you begin, ensure you have:
- A Vonage API account.
- Node.js installed on your machine.
- An OpenAI API Key.
- ngrok installed on your machine.
Set Up Your Local Environment
Create a new directory for your project and install the required dependencies:
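For example, assuming a fresh project that uses Express for the web server and the official OpenAI Node.js SDK:

```shell
mkdir voice-bot && cd voice-bot
npm init -y
npm install express openai
```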
Expose Your Local Server
Vonage needs to send webhooks to your local machine. Use ngrok to expose your server:
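With ngrok installed, start a tunnel to the port your server will listen on (3000 in this guide):

```shell
ngrok http 3000
```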
ngrok forwards your local port 3000 (the port your server listens on) to a public URL such as https://{random id}.ngrok.app.
Keep this terminal open.
Provision Your Vonage Resources
Log in to the Vonage Dashboard to start.
Create a Voice Application
- Navigate to Applications > Create a new application.
- Give it a name (e.g., Voice AI Bot).
- Under Capabilities, enable Voice.
- In the Answer URL field, enter your Base URL followed by /webhooks/answer (e.g., https://{random id}.ngrok.app/webhooks/answer). Set the method to GET.
- In the Event URL field, enter your Base URL followed by /webhooks/events. Set the method to POST.
- Click Generate public and private key. Save the private.key file in your project folder (though we won't use it for this basic ASR flow, it's required for app creation).
- Click Save changes.
Link a Number
- Go to Numbers > Buy Numbers and purchase a voice-enabled number.
- Go to Your applications, select your bot application, and click Edit.
- Under the Numbers tab, click Link next to your newly purchased number.
Build the Voice Bot
Create a file named index.js and add the following code. Replace YOUR_OPENAI_API_KEY with your actual key.
Note: When running locally with ngrok, req.protocol/req.get('host') may not match your public tunnel URL. If webhooks fail, set your tunnel base URL in config (for example an env var) and build eventUrl from that instead.
const express = require('express');
const { OpenAI } = require('openai');
const app = express();
app.use(express.json());
const openai = new OpenAI({ apiKey: 'YOUR_OPENAI_API_KEY' });
// 1. Handle the initial call
app.get('/webhooks/answer', (req, res) => {
const ncco = [
{
action: 'talk',
text: 'Hi, I am your AI assistant. How can I help you today?'
},
{
action: 'input',
eventUrl: [`${req.protocol}://${req.get('host')}/webhooks/asr`],
type: ['speech'],
speech: {
language: 'en-us',
endOnSilence: 1
}
}
];
res.json(ncco);
});
// 2. Process the Speech-to-Text result and query OpenAI
app.post('/webhooks/asr', async (req, res) => {
const speechResults = req.body.speech?.results;
if (!speechResults || speechResults.length === 0) {
return res.json([{ action: 'talk', text: 'I am sorry, I didn\'t catch that. Goodbye.' }]);
}
const userText = speechResults[0].text;
console.log(`User said: ${userText}`);
try {
// Request a completion from OpenAI
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: [
{ role: "system", content: "You are a helpful assistant on a phone call. Keep answers concise." },
{ role: "user", content: userText }
],
});
const aiResponse = completion.choices[0].message.content;
// Respond back to the user
res.json([{ action: 'talk', text: aiResponse }]);
} catch (error) {
console.error("OpenAI Error:", error);
res.json([{ action: 'talk', text: 'I encountered an error processing your request.' }]);
}
});
// 3. Log call events
app.post('/webhooks/events', (req, res) => {
console.log('Event:', req.body.status);
res.sendStatus(200);
});
app.listen(3000, () => console.log('Server running on port 3000'));
Test the Application
Run your server:
node index.js
Dial your Vonage number from your phone.
When prompted, ask a question (e.g., Why is the sky blue? or Tell me a joke).
The bot will capture your speech, send it to OpenAI, and read the response back to you using Text-to-Speech.
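Before dialing, you can also sanity-check the answer webhook from another terminal (assuming the server is running locally); it should return the NCCO as a JSON array containing the talk and input actions:

```shell
curl http://localhost:3000/webhooks/answer
```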
Enable Contextual Conversation
To make the conversation feel natural, we'll modify the app to remember previous exchanges and re-prompt the user for more input.
Update your index.js with this stateful logic:
// 1. Add a Map to store conversation history by Call UUID
const sessions = new Map();
// Helper to generate a NCCO that "loops" back to ASR
const getConversationalNCCO = (text, host) => [
{ action: 'talk', text: text },
{
action: 'input',
eventUrl: [`https://${host}/webhooks/asr`],
type: ['speech'],
speech: { language: 'en-us', endOnSilence: 1 }
}
];
app.get('/webhooks/answer', (req, res) => {
const uuid = req.query.uuid;
// Initialize history for this specific caller
sessions.set(uuid, [{ role: "system", content: "You are a helpful, concise assistant." }]);
res.json(getConversationalNCCO('Hello! What is on your mind?', req.get('host')));
});
app.post('/webhooks/asr', async (req, res) => {
const { uuid, speech } = req.body;
const userText = speech?.results?.[0]?.text;
if (!userText) {
sessions.delete(uuid);
return res.json([{ action: 'talk', text: 'Goodbye!' }]);
}
// Retrieve history and append the new question
let history = sessions.get(uuid) || [];
history.push({ role: "user", content: userText });
const completion = await openai.chat.completions.create({
model: "gpt-4o",
messages: history,
});
const aiResponse = completion.choices[0].message.content;
history.push({ role: "assistant", content: aiResponse });
sessions.set(uuid, history);
// Return the AI response AND listen for the next question
res.json(getConversationalNCCO(aiResponse, req.get('host')));
});
// Clean up memory when the call ends
app.post('/webhooks/events', (req, res) => {
if (req.body.status === 'completed') sessions.delete(req.body.uuid);
res.sendStatus(200);
});
What Changed
- The Session Map: We use the uuid to keep different callers' histories separate.
- Recursive NCCO: Instead of a simple talk action, we now return a talk followed by an input action. This keeps the line open.
- Memory: By passing the entire history array to OpenAI, the bot now understands follow-up questions like Tell me more about that.
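The per-call bookkeeping can be exercised on its own; this minimal sketch (no Vonage or OpenAI calls, with hypothetical startSession/addTurn helpers) mimics two concurrent callers with separate histories:

```javascript
// Standalone sketch of the session store keyed by call UUID.
const sessions = new Map();

// Start a fresh history for a new call
const startSession = (uuid) =>
  sessions.set(uuid, [{ role: 'system', content: 'You are a helpful, concise assistant.' }]);

// Append one conversational turn to an existing call's history
const addTurn = (uuid, role, content) => {
  const history = sessions.get(uuid) || [];
  history.push({ role, content });
  sessions.set(uuid, history);
};

startSession('call-a');
startSession('call-b');
addTurn('call-a', 'user', 'Why is the sky blue?');
addTurn('call-b', 'user', 'Tell me a joke');

// Each caller's history stays separate
console.log(sessions.get('call-a').length); // 2 (system + user)
console.log(sessions.get('call-b')[1].content); // Tell me a joke
```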
Try the updated application by restarting your server and dialing the Vonage number linked to your application, as in the Test the Application step.
Add "Connect to Human" Tool
This step adds a tool definition for the LLM and a branch in your ASR logic that returns the Vonage connect action.
Update index.js
Add the new tool definition and modify the asr webhook to handle the transfer:
// Define the transfer tool
const tools = [
{
type: "function",
function: {
name: "connect_to_human",
description: "Call this when the user wants to speak to a real person or a human agent.",
parameters: { type: "object", properties: {} } // No arguments needed
}
}
];
const HUMAN_AGENT_NUMBER = '15551234567'; // Replace with your phone number
app.post('/webhooks/asr', async (req, res) => {
const { uuid, speech } = req.body;
const userText = speech?.results?.[0]?.text;
  if (!userText) {
    sessions.delete(uuid); // free the session when the caller goes silent
    return res.json([{ action: 'talk', text: 'Goodbye.' }]);
  }
let history = sessions.get(uuid) || [];
history.push({ role: "user", content: userText });
const response = await openai.chat.completions.create({
model: "gpt-4o",
messages: history,
tools: tools // Provide the tool to the LLM
});
const message = response.choices[0].message;
// Check if the AI wants to transfer the call
if (message.tool_calls && message.tool_calls[0].function.name === 'connect_to_human') {
console.log(`Transferring call ${uuid} to human agent...`);
// Clean up session since the AI is leaving the call
sessions.delete(uuid);
// Return the "connect" NCCO
return res.json([
{
action: 'talk',
text: 'Please hold while I connect you to a human representative.'
},
{
action: 'connect',
from: 'YOUR_VONAGE_NUMBER', // Your linked Vonage number
endpoint: [{ type: 'phone', number: HUMAN_AGENT_NUMBER }]
}
]);
}
// Regular conversational flow
const aiResponse = message.content;
history.push({ role: "assistant", content: aiResponse });
sessions.set(uuid, history);
res.json(getConversationalNCCO(aiResponse, req.get('host')));
});
How It Works
- The Intent: When the user says I want to speak to a manager or Help me, this is too hard, the LLM recognizes the intent and triggers the connect_to_human function.
- The Hand-off: Your server stops the ASR loop and sends the connect action to Vonage.
- The Connection: Vonage creates a new outbound leg to the HUMAN_AGENT_NUMBER and bridges the two calls together. The AI is no longer "listening" once the connection is made.
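The hand-off response can be factored into a small helper for testing; this sketch (transferNCCO is a hypothetical helper, and both numbers are placeholders) builds the same two actions the webhook returns:

```javascript
// Sketch: build the talk + connect NCCO used for the hand-off.
// 'from' must be a Vonage number linked to your application;
// both numbers below are placeholders.
const transferNCCO = (from, humanNumber) => [
  {
    action: 'talk',
    text: 'Please hold while I connect you to a human representative.'
  },
  {
    action: 'connect',
    from,
    endpoint: [{ type: 'phone', number: humanNumber }]
  }
];

const ncco = transferNCCO('14155550100', '15551234567');
console.log(ncco[1].endpoint[0].number); // 15551234567
```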
Restart your server and call your application's Vonage number. When you ask the bot to speak to a human, it should say the phrase Please hold while I connect you to a human representative, and then connect you to the phone number you set as HUMAN_AGENT_NUMBER.
Next steps
- Custom Voices: Change the voice name in the talk action for a more branded experience.
- WebSocket Streaming: For lower latency, use WebSockets to stream audio in real-time.
- Endpoints: Connect to your PBX or Contact Center via SIP, or build your own web interface for a human agent with the Client SDK.
- .NET version: See the same IVR/voice-bot scenario implemented in .NET in this blog post.
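For the custom-voices idea above, the talk action accepts optional language and style fields; this sketch shows the shape (the specific values are illustrative only — consult the Vonage NCCO reference for supported combinations):

```javascript
// Sketch: a 'talk' action with an explicit voice language and style.
// The language/style values here are examples, not recommendations.
const brandedTalk = (text) => ({
  action: 'talk',
  text,
  language: 'en-GB',
  style: 2
});

console.log(JSON.stringify(brandedTalk('Welcome back!')));
```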