How to Build a Simple IVR with Speech and Touch-Tone Input

An Interactive Voice Response (IVR) system automates phone interactions by presenting callers with a menu of options. While traditional IVRs rely on keypad (DTMF) input, modern systems often add automatic speech recognition (ASR) for a more natural user experience. For more details, refer to the Advanced IVR guide.

In this guide, you will build a Node.js application that answers a call and prompts the user to either press a key or speak. The application will then repeat that input back to the caller.

Prerequisites

Initialize Your Project Folder

Before configuring your Vonage resources, create a home for your code. This ensures you have a destination for your security keys later.

mkdir simple-ivr && cd simple-ivr
npm init -y
npm install express body-parser

Provision Your Vonage Resources

Configure your environment using the Vonage API Dashboard.

Create a Voice Application

  1. Navigate to Applications > Create a new application.
  2. Give your application a name (e.g., Simple-IVR-Speech-DTMF).
  3. Under Capabilities, enable Voice.
  4. For now, leave the Answer URL and Event URL blank. We will fill these in once we have our ngrok address in the Expose and Connect Your App step.
  5. Click Generate public and private key.
  6. Move the downloaded private.key file into your simple-ivr project folder.
  7. Click Save changes.

Rent and Link a Number

  1. Navigate to Numbers > Buy Numbers and rent a number with Voice capabilities.
  2. Go back to Your Applications, select your IVR application, and link the new number to it.

Handle the Inbound Call

When a user calls your number, Vonage requests an NCCO (Call Control Object) from your Answer URL. Create a file named index.js and add the following code:

const express = require('express');
const bodyParser = require('body-parser');

const app = express();
app.use(bodyParser.json());

// 1. Initial greeting and input request
app.get('/webhooks/answer', (req, res) => {
  const ncco = [
    {
      action: 'talk',
      text: 'Hello. Please enter a digit or say something.',
      bargeIn: true
    },
    {
      action: 'input',
      type: ['dtmf', 'speech'],
      dtmf: { maxDigits: 1 },
      speech: { language: 'en-us' },
      eventUrl: [`${req.protocol}://${req.get('host')}/webhooks/input`]
    }
  ];

  res.json(ncco);
});
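
To make the shape of the NCCO easier to inspect, the same array can be built by a plain function and printed as JSON. This is only a sketch: buildAnswerNcco is a hypothetical helper name, and the example.ngrok.app URL is a placeholder, neither of which is part of the guide's code.

```javascript
// Hypothetical helper: returns the same NCCO the answer webhook sends.
// inputUrl is the absolute URL Vonage should POST the caller's input to.
function buildAnswerNcco(inputUrl) {
  return [
    {
      action: 'talk',
      text: 'Hello. Please enter a digit or say something.',
      bargeIn: true // let the caller interrupt the prompt
    },
    {
      action: 'input',
      type: ['dtmf', 'speech'],
      dtmf: { maxDigits: 1 },
      speech: { language: 'en-us' },
      eventUrl: [inputUrl]
    }
  ];
}

// Print the JSON body that would be returned from /webhooks/answer.
// The URL below is a placeholder, not a real endpoint.
const exampleNcco = buildAnswerNcco('https://example.ngrok.app/webhooks/input');
console.log(JSON.stringify(exampleNcco, null, 2));
```

Keeping the NCCO in a function like this also lets you check it without placing a real call.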

Process Speech and DTMF Input

Add the input handler to your index.js to process the payload Vonage sends back once the user interacts with the menu.

// 2. Handle the user's response
app.post('/webhooks/input', (req, res) => {
  let responseText = "I'm sorry, I didn't catch that.";
  
  if (req.body.dtmf && req.body.dtmf.digits) {
    responseText = `You pressed ${req.body.dtmf.digits}.`;
  }
  else if (req.body.speech && Array.isArray(req.body.speech.results) && req.body.speech.results.length > 0) {
    const transcript = req.body.speech.results[0].text;
    responseText = `You said: ${transcript}.`;
  }
  
  res.json([{ action: 'talk', text: responseText }]);
});

// 3. Log call events
app.post('/webhooks/events', (req, res) => {
  console.log('Call event:', req.body);
  res.sendStatus(200);
});

app.listen(3000, () => console.log(`IVR server running on port 3000`));
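
The branching in the input handler can be tried out without a phone call by moving it into a plain function and feeding it sample payloads. The sketch below assumes the payload shape used by the handler (dtmf.digits and speech.results[0].text); the describeInput name and the extra fields timed_out and confidence in the samples are illustrative assumptions, so compare them against a real request logged by your server.

```javascript
// Hypothetical helper mirroring the /webhooks/input branching.
function describeInput(body) {
  const dtmf = body && body.dtmf;
  const speech = body && body.speech;

  // Keypad input takes priority when digits are present.
  if (dtmf && dtmf.digits) {
    return `You pressed ${dtmf.digits}.`;
  }
  // Use the first speech result, if there is one.
  if (speech && Array.isArray(speech.results) && speech.results.length > 0) {
    return `You said: ${speech.results[0].text}.`;
  }
  // Neither input type produced anything usable.
  return "I'm sorry, I didn't catch that.";
}

// Assumed sample payloads: keypad press, speech, and no input at all.
const samplePayloads = [
  { dtmf: { digits: '1', timed_out: false }, speech: { results: [] } },
  { dtmf: { digits: '', timed_out: true }, speech: { results: [{ text: 'hello', confidence: '0.9' }] } },
  { dtmf: { digits: '', timed_out: true }, speech: { results: [] } }
];

for (const payload of samplePayloads) {
  console.log(describeInput(payload));
}
```

The third sample shows why the length check matters: an empty results array would otherwise cause an error when reading the first element.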

Expose and Connect Your App

Because Vonage needs to reach your local server, you must expose it to the internet and update your Application settings.

  1. Start your server: node index.js.
  2. Start ngrok: In a new terminal, run ngrok http 3000 (the port must match the one index.js listens on).
  3. Update Webhooks:
    • Copy the URL provided by ngrok (e.g., https://{random id}.ngrok.app).
    • Return to your Application in the Vonage Dashboard.
    • Paste the URL into the Answer URL (appending /webhooks/answer) and Event URL (appending /webhooks/events).
    • Click Save changes.
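
If you would rather not assemble the two webhook URLs by hand, a small script can derive them from the ngrok address. This is a minimal sketch: webhookUrls is a hypothetical helper and the example.ngrok.app address is a placeholder for whatever URL ngrok prints.

```javascript
// Hypothetical helper: derives the dashboard URLs from the public ngrok base URL.
function webhookUrls(publicBaseUrl) {
  return {
    // new URL(path, base) resolves the path against the base and
    // handles a trailing slash on the base correctly.
    answerUrl: new URL('/webhooks/answer', publicBaseUrl).href,
    eventUrl: new URL('/webhooks/events', publicBaseUrl).href
  };
}

// Placeholder address; substitute the URL shown in your ngrok terminal.
console.log(webhookUrls('https://example.ngrok.app'));
```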

Test the IVR

Call your Vonage number:

  • Keypad: Press 1. The IVR should say, "You pressed 1."
  • Speech: Say "Hello." The IVR should say, "You said: Hello."

Next Steps

  1. Call Transfer: Add a connect action to route the caller to the relevant department based on their input. The destination can be a phone number, a SIP endpoint, or your web application using the Client SDK.
  2. Call Recording: Record and transcribe the call with the record action.
  3. Voice AI: Pass user input to your AI agent to provide the caller with valuable information.
  4. Customization: Change the text-to-speech voice, or use the stream action in your NCCO to play pre-recorded MP3 files instead of text-to-speech.
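
As a starting point for the call transfer idea, the sketch below maps a pressed digit to a department and returns an NCCO with a connect action to a phone endpoint. Everything specific here is an assumption for illustration: the buildTransferNcco name, the department map, and the phone numbers are placeholders, so check the connect action's options in the NCCO reference before using it.

```javascript
// Hypothetical digit-to-department map; the numbers are placeholders.
const DEPARTMENT_NUMBERS = new Map([
  ['1', '14155550100'], // e.g. sales
  ['2', '14155550101']  // e.g. support
]);

// Hypothetical helper: returns an NCCO that transfers the caller,
// or a spoken fallback when the digit has no department.
// fromNumber stands in for your Vonage number.
function buildTransferNcco(digit, fromNumber) {
  const target = DEPARTMENT_NUMBERS.get(digit);
  if (!target) {
    return [{ action: 'talk', text: 'Sorry, that option is not available.' }];
  }
  return [
    { action: 'talk', text: 'Connecting you now.' },
    {
      action: 'connect',
      from: fromNumber,
      endpoint: [{ type: 'phone', number: target }]
    }
  ];
}

console.log(JSON.stringify(buildTransferNcco('1', '14155550199'), null, 2));
```

In the input handler, you could return the result of this function with res.json instead of the talk action that repeats the caller's input.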