How to Build a Simple IVR with Speech and Touch-Tone Input

An Interactive Voice Response (IVR) system automates phone interactions by presenting callers with a menu of options. Traditional IVRs rely on keypad (DTMF) input, while modern systems often add automatic speech recognition (ASR) so callers can simply say what they want. See the Advanced IVR guide for more details.

In this guide, you will build a Node.js application that answers a call and prompts the user to either press a key or speak. The application will then repeat that input back to the caller.

Prerequisites

Initialize Your Project Folder

Before configuring your Vonage resources, create a home for your code. This ensures you have a destination for your security keys later.

mkdir simple-ivr && cd simple-ivr
npm init -y
npm install express body-parser

Expose Your Local Server

Vonage needs to send webhooks to your local machine. Use ngrok to expose your server:

ngrok http 3000

Note: Keep this terminal open and copy your ngrok URL. You'll need it in the next steps.

Provision Your Vonage Resources

Configure your environment using the Vonage API Dashboard.

Create a Voice Application

  1. Navigate to Applications > Create a new application.
  2. Give your application a name (e.g., Simple-IVR-Speech-DTMF).
  3. Click Generate public and private key.
  4. Move the downloaded private.key file into your simple-ivr project folder.
  5. Under Capabilities, enable Voice.
  6. Set the Answer URL to your ngrok URL with /webhooks/answer appended. Example: https://{random-id}.ngrok.app/webhooks/answer. Set the method to GET.
  7. Set the Event URL to your ngrok URL with /webhooks/events appended. Example: https://{random-id}.ngrok.app/webhooks/events. Set the method to POST.
  8. Click Generate new application at the bottom.

Rent and Link a Number

  1. Navigate to Phone Numbers > Buy Numbers and rent a number with Voice capabilities.
  2. Go back to Applications, select your IVR application, and link the new number to it.

Handle the Inbound Call

When a user calls your number, Vonage requests an NCCO (Call Control Object) from your Answer URL. Create a file named index.js and add the following code:

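const express = require('express');
const bodyParser = require('body-parser');

const app = express();
// Parse the JSON payloads Vonage sends to the POST webhooks
app.use(bodyParser.json());
// Behind ngrok, trust the forwarded headers so req.protocol reports https
app.set('trust proxy', true);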

// 1. Initial greeting and input request
app.get('/webhooks/answer', (req, res) => {
  const ncco = [
    {
      action: 'talk',
      text: 'Hello. Please enter a digit or say something.',
      bargeIn: true
    },
    {
      action: 'input',
      type: ['dtmf', 'speech'],
      dtmf: { maxDigits: 1 },
      speech: { language: 'en-us' },
      eventUrl: [`${req.protocol}://${req.get('host')}/webhooks/input`]
    }
  ];

  res.json(ncco);
});
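
If you want to tune or test the menu without placing a call, one option is to move the NCCO into a plain function. The sketch below is only an illustration: buildAnswerNcco, its option names, the chosen timeout values, and the example.ngrok.app URL are all hypothetical. It assumes the input action's dtmf.timeOut and speech.endOnSilence settings (both in seconds) behave as described in the NCCO reference.

```javascript
// Hypothetical helper: builds the greeting + input NCCO for a given public base URL.
// The default values here are illustrative, not recommendations.
function buildAnswerNcco(baseUrl, { maxDigits = 1, dtmfTimeout = 5, endOnSilence = 1.5 } = {}) {
  return [
    {
      action: 'talk',
      text: 'Hello. Please enter a digit or say something.',
      bargeIn: true // let the caller interrupt the prompt
    },
    {
      action: 'input',
      type: ['dtmf', 'speech'],
      dtmf: { maxDigits, timeOut: dtmfTimeout },     // seconds to wait after the last key press
      speech: { language: 'en-us', endOnSilence },   // seconds of silence that end the utterance
      eventUrl: [`${baseUrl}/webhooks/input`]
    }
  ];
}

// Placeholder base URL; substitute your own ngrok URL.
const ncco = buildAnswerNcco('https://example.ngrok.app');
console.log(JSON.stringify(ncco, null, 2));
```

Inside the Express route you could then call res.json(buildAnswerNcco(`${req.protocol}://${req.get('host')}`)) and keep the handler itself to a single line.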

Process Speech and DTMF Input

Add the input handler to your index.js to process the payload Vonage sends back once the user interacts with the menu.

// 2. Handle the user's response
app.post('/webhooks/input', (req, res) => {
  let responseText = "I'm sorry, I didn't catch that.";

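  const speechResults = req.body.speech && req.body.speech.results;

  if (req.body.dtmf && req.body.dtmf.digits) {
    responseText = `You pressed ${req.body.dtmf.digits}.`;
  } else if (Array.isArray(speechResults) && speechResults.length > 0) {
    // Guard against an empty results array before reading the first transcript
    const transcript = speechResults[0].text;
    responseText = `You said: ${transcript}.`;
  }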

  res.json([{ action: 'talk', text: responseText }]);
});

// 3. Log call events
app.post('/webhooks/events', (req, res) => {
  console.log('Call event:', req.body);
  res.sendStatus(200);
});

app.listen(3000, () => console.log(`IVR server running on port 3000`));
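
To check the input logic without a live call, the same decision can be expressed as a pure function and run against sample payloads. This is a minimal sketch: describeInput is a hypothetical name, and the sample bodies only mirror the fields the handler above reads (dtmf.digits and speech.results[].text). Real webhook payloads carry additional fields.

```javascript
// Hypothetical helper: turns an input webhook body into the sentence the IVR speaks back.
function describeInput(body = {}) {
  const digits = body.dtmf && body.dtmf.digits;
  if (digits) {
    return `You pressed ${digits}.`;
  }

  const results = (body.speech && body.speech.results) || [];
  if (results.length > 0 && results[0].text) {
    return `You said: ${results[0].text}.`;
  }

  // Neither a key press nor a usable transcript was received
  return "I'm sorry, I didn't catch that.";
}

// Simplified sample payloads (assumed shapes, for illustration only)
const samples = [
  { dtmf: { digits: '1' }, speech: { results: [] } },
  { dtmf: { digits: null }, speech: { results: [{ text: 'Hello' }] } },
  { dtmf: { digits: null }, speech: { results: [] } }
];

for (const sample of samples) {
  console.log(describeInput(sample));
}
```

Keeping this logic separate from the Express handler also makes it straightforward to add new branches later, such as a re-prompt when no input arrives.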

Test the IVR

  1. Start your server: node index.js.
  2. Call your Vonage number:
  • Keypad: Press 1. The IVR should say, You pressed 1.
  • Speech: Say Hello. The IVR should say, You said: Hello.

Next Steps

  1. Call Transfer: Add a connect action to route the caller to the relevant department based on their input. The destination can be a phone number, a SIP endpoint, or your web application using the Client SDK.
  2. Call Recording: Record and transcribe the call with the record action.
  3. Voice AI: Pass the caller's input to your AI agent so it can answer with relevant information.
  4. Customization: Change the text-to-speech voice, or use the stream action in your NCCO to play pre-recorded MP3 files instead of text-to-speech.
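
As a starting point for the call transfer idea, the sketch below maps a digit or a spoken keyword to a connect action with a phone endpoint. Everything specific in it is a placeholder: routeCall, the DEPARTMENTS table, the menu choices, and all phone numbers are hypothetical and would need to be replaced with your own values.

```javascript
// Placeholder numbers; replace with your Vonage number and real destinations.
const VONAGE_NUMBER = '14155550100';
const DEPARTMENTS = {
  sales: { digit: '1', number: '14155550101' },
  support: { digit: '2', number: '14155550102' }
};

// Hypothetical router: returns an NCCO that transfers the caller,
// or a spoken fallback when the input matches no department.
function routeCall({ digits, transcript } = {}) {
  const spoken = (transcript || '').trim().toLowerCase();

  for (const [name, dept] of Object.entries(DEPARTMENTS)) {
    if (digits === dept.digit || spoken === name) {
      return [
        { action: 'talk', text: `Connecting you to ${name}.` },
        {
          action: 'connect',
          from: VONAGE_NUMBER,
          endpoint: [{ type: 'phone', number: dept.number }]
        }
      ];
    }
  }

  return [{ action: 'talk', text: 'Sorry, that option is not available.' }];
}

console.log(JSON.stringify(routeCall({ digits: '1' }), null, 2));
console.log(JSON.stringify(routeCall({ transcript: 'Support' }), null, 2));
```

In the input webhook, you would pass the received digits and the first transcript to routeCall and return its result with res.json instead of echoing the input back.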