How to Build a Simple IVR with Speech and Touch-Tone Input
An Interactive Voice Response (IVR) system lets you automate phone interactions by presenting callers with a menu of options. While traditional IVRs rely on keypad (DTMF) input, modern systems often add automatic speech recognition (ASR) for a more natural user experience. For more details, refer to the Advanced IVR guide.
In this guide, you will build a Node.js application that answers a call and prompts the user to either press a key or speak. The application will then repeat that input back to the caller.
Prerequisites
- A Vonage API account. Sign up for free.
- Node.js installed on your machine.
- ngrok installed on your machine.
Initialize Your Project Folder
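Before configuring your Vonage resources, create a home for your code. Make a project folder (this guide uses simple-ivr), initialize it with npm init -y, and install the two packages the code in this guide requires with npm install express body-parser. Creating the folder first ensures you have a destination for your security keys later.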
Provision Your Vonage Resources
Configure your environment using the Vonage API Dashboard.
Create a Voice Application
- Navigate to Applications > Create a new application.
- Give your application a name (e.g., Simple-IVR-Speech-DTMF).
- Under Capabilities, enable Voice.
- For now, leave the Answer URL and Event URL blank. We will fill these in once we have our ngrok address, in the Expose and Connect Your App step.
- Click Generate public and private key.
- Move the downloaded private.key file into your simple-ivr project folder.
- Click Save changes.
Link a Virtual Number
- Navigate to Numbers > Buy Numbers and rent a number with Voice capabilities.
- Go back to Your Applications, select your IVR application, and link the new number to it.
Handle the Inbound Call
When a user calls your number, Vonage requests an NCCO (Call Control Object) from your Answer URL. Create a file named index.js and add the following code:
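const express = require('express');
const bodyParser = require('body-parser');

const app = express();

// Parse JSON request bodies so webhook payloads are available on req.body
app.use(bodyParser.json());

// 1. Initial greeting and input request
app.get('/webhooks/answer', (req, res) => {
  const ncco = [
    {
      // Speak the prompt; bargeIn lets the caller respond before it finishes
      action: 'talk',
      text: 'Hello. Please enter a digit or say something.',
      bargeIn: true
    },
    {
      // Accept either a single keypad digit or spoken input
      action: 'input',
      type: ['dtmf', 'speech'],
      dtmf: { maxDigits: 1 },
      speech: { language: 'en-us' },
      // Build the callback URL from the host of the incoming request
      eventUrl: [`${req.protocol}://${req.get('host')}/webhooks/input`]
    }
  ];
  res.json(ncco);
});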
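One way to inspect the NCCO without placing a call is to build it in a plain function and print the result. The sketch below assumes a hypothetical helper named buildAnswerNcco and a placeholder base URL; neither is part of the original guide, and the NCCO fields are copied from the answer handler above.

```javascript
// Hypothetical helper (an assumption, not part of the guide): returns the same
// NCCO as the /webhooks/answer route for a given public base URL.
function buildAnswerNcco(baseUrl) {
  return [
    {
      action: 'talk',
      text: 'Hello. Please enter a digit or say something.',
      bargeIn: true
    },
    {
      action: 'input',
      type: ['dtmf', 'speech'],
      dtmf: { maxDigits: 1 },
      speech: { language: 'en-us' },
      eventUrl: [`${baseUrl}/webhooks/input`]
    }
  ];
}

// Placeholder URL for illustration only; substitute your own ngrok address.
const ncco = buildAnswerNcco('https://example.ngrok.app');
console.log(JSON.stringify(ncco, null, 2));
```

Keeping the NCCO in a function like this makes it possible to check the JSON by eye, or in a unit test, before wiring it into the Express route.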
Process Speech and DTMF Input
Add the input handler to your index.js to process the payload Vonage sends back once the user interacts with the menu.
// 2. Handle the user's response
app.post('/webhooks/input', (req, res) => {
  // Fallback used when neither a digit nor a transcript is present
  let responseText = "I'm sorry, I didn't catch that.";

  if (req.body.dtmf && req.body.dtmf.digits) {
    responseText = `You pressed ${req.body.dtmf.digits}.`;
  } else if (
    req.body.speech &&
    Array.isArray(req.body.speech.results) &&
    req.body.speech.results.length > 0
  ) {
    // Guard against an empty results list before reading the first entry
    const transcript = req.body.speech.results[0].text;
    responseText = `You said: ${transcript}.`;
  }

  res.json([{ action: 'talk', text: responseText }]);
});

// 3. Log call events
app.post('/webhooks/events', (req, res) => {
  console.log(req.body);
  res.sendStatus(200);
});

app.listen(3000, () => console.log('IVR server running on port 3000'));
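To see how the handler chooses between keypad and speech input, the branching logic can be pulled into a standalone function and run against sample payloads. This is a minimal sketch: buildResponseText is a hypothetical name, and the sample objects are simplified assumptions that include only the fields the handler reads (dtmf.digits and speech.results[0].text), not the full payload Vonage sends.

```javascript
// Hypothetical helper (an assumption, not part of the guide): mirrors the
// decision logic of the /webhooks/input handler as a pure function.
function buildResponseText(body) {
  const digits = body && body.dtmf && body.dtmf.digits;
  const results = body && body.speech && body.speech.results;

  // Keypad input takes priority, as in the handler
  if (digits) {
    return `You pressed ${digits}.`;
  }
  // Only read the first transcript when the results list is non-empty
  if (Array.isArray(results) && results.length > 0 && results[0].text) {
    return `You said: ${results[0].text}.`;
  }
  return "I'm sorry, I didn't catch that.";
}

// Simplified sample payloads (assumed shapes, for illustration only)
const samples = [
  { dtmf: { digits: '1' } },
  { speech: { results: [{ text: 'hello' }] } },
  { speech: { results: [] } },
  {}
];

for (const sample of samples) {
  console.log(buildResponseText(sample));
}
```

The last two samples fall through to the fallback message, which is the case the empty-results guard protects against.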
Expose and Connect Your App
Because Vonage needs to reach your local server, you must expose it to the internet and update your Application settings.
- Start your server: node index.js.
- Start ngrok: In a new terminal, run ngrok http 3000. The port must match the one passed to app.listen in index.js.
- Update Webhooks:
  - Copy the URL provided by ngrok (e.g., https://{random id}.ngrok.app).
  - Return to your Application in the Vonage Dashboard.
  - Paste the URL into the Answer URL (appending /webhooks/answer) and Event URL (appending /webhooks/events).
  - Click Save changes.
Test the IVR
Call your Vonage number:
- Keypad: Press 1. The IVR should say, "You pressed 1."
- Speech: Say "Hello". The IVR should say, "You said: Hello."
Next Steps
- Call Transfer: Add a connect action to connect the caller to the relevant department based on user input, via phone number, SIP endpoint, or your web application using the Client SDK.
- Call Recording: Record and transcribe the call with the record action.
- Voice AI: Pass user input to your AI agent to provide the caller with valuable information.
- Customization: Change the text-to-speech voice, or use the stream action in your NCCO to play pre-recorded MP3 files instead of text-to-speech.
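As a starting point for the Call Transfer idea, the sketch below maps a pressed digit to a department and returns an NCCO with a connect action to a phone endpoint. Everything specific here is an assumption for illustration: the function name buildRoutingNcco, the DEPARTMENTS table, and the phone numbers are placeholders, and the connect action shape should be checked against the NCCO reference before use.

```javascript
// Hypothetical routing table (placeholder names and numbers, not real ones)
const DEPARTMENTS = {
  '1': { name: 'sales', number: '14155550100' },
  '2': { name: 'support', number: '14155550101' }
};

// Hypothetical helper: returns an NCCO that transfers the caller based on
// the digit pressed, or an apology when the digit has no department.
function buildRoutingNcco(digits, fromNumber) {
  const known = Object.prototype.hasOwnProperty.call(DEPARTMENTS, digits);
  if (!known) {
    return [{ action: 'talk', text: "I'm sorry, that is not a valid option." }];
  }
  const department = DEPARTMENTS[digits];
  return [
    { action: 'talk', text: `Connecting you to ${department.name}.` },
    {
      action: 'connect',
      from: fromNumber, // assumed to be your linked Vonage number
      endpoint: [{ type: 'phone', number: department.number }]
    }
  ];
}

// Placeholder caller ID for illustration only
console.log(JSON.stringify(buildRoutingNcco('1', '14155550199'), null, 2));
```

In the input handler, the returned array could be passed to res.json in place of the talk action that repeats the caller's input.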