How to Build a Simple IVR with Speech and Touch-Tone Input
An Interactive Voice Response (IVR) system automates phone interactions by presenting callers with a menu of options. Traditional IVRs rely on keypad (DTMF) input, while modern systems often add automatic speech recognition (ASR) so callers can simply speak their choice. See the Advanced IVR guide for more details.
In this guide, you will build a Node.js application that answers a call and prompts the user to either press a key or speak. The application will then repeat that input back to the caller.
Prerequisites
- A Vonage API account. Sign up for free.
- Node.js installed on your machine.
- ngrok installed on your machine.
Initialize Your Project Folder
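Before configuring your Vonage resources, create a project folder named simple-ivr for your code. Inside it, initialize a Node.js project with npm init -y and install the two dependencies the code uses with npm install express body-parser. Later, you will save your application's private key to this folder.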
Expose Your Local Server
Vonage must be able to reach your local server to deliver webhooks. In a new terminal window, run ngrok http 3000 to expose your server:
ngrok will forward your local port 3000 to a public URL like https://{random-id}.ngrok.app.
Copy this URL. You’ll need it when configuring your Vonage application in the next step.
Important: Keep this terminal window open while developing and testing. If you close ngrok, you’ll need to update your webhook URLs with the new address.
Note: The free version of ngrok generates a new random URL each time you restart it. For a consistent URL during development, consider using ngrok reserved domains or upgrading to a paid plan.
Provision Your Vonage Resources
Configure your environment using the Vonage API Dashboard.
Create a Voice Application
- Navigate to Applications > Create a new application.
- Give your application a name (e.g., Simple-IVR-Speech-DTMF).
- Under Capabilities, enable Voice.
- Set the Answer URL to your ngrok URL with /webhooks/answer appended, for example https://{random-id}.ngrok.app/webhooks/answer. Set the method to GET.
- Set the Event URL to your ngrok URL with /webhooks/events appended, for example https://{random-id}.ngrok.app/webhooks/events. Set the method to POST.
- Click Generate public and private key.
- Move the downloaded private.key file into your simple-ivr project folder.
- Click Save changes.
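Both webhook URLs are just the ngrok base URL with a path appended. If you script this step, or simply want to avoid a doubled or missing slash, Node's built-in WHATWG URL class can compose them. A minimal sketch: `webhookUrls` is a hypothetical helper (not part of any Vonage SDK) and the base URL is a placeholder for the one ngrok prints.

```javascript
// Hypothetical helper: derive the Answer and Event URLs from an ngrok base URL.
// Uses only the WHATWG URL class that ships with Node.js.
function webhookUrls(baseUrl) {
  const base = new URL(baseUrl);
  return {
    // An absolute path ("/...") replaces any path already on the base URL.
    answerUrl: new URL('/webhooks/answer', base).href,
    eventUrl: new URL('/webhooks/events', base).href,
  };
}

// Placeholder: substitute the forwarding URL from your ngrok terminal.
const urls = webhookUrls('https://example-id.ngrok.app/');
console.log(urls.answerUrl); // https://example-id.ngrok.app/webhooks/answer
console.log(urls.eventUrl);  // https://example-id.ngrok.app/webhooks/events
```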
Link a Virtual Number
- Navigate to Numbers > Buy Numbers and rent a number with Voice capabilities.
- Go back to Your Applications, select your IVR application, and link the new number to it.
Handle the Inbound Call
When a user calls your number, Vonage sends a GET request to your Answer URL and expects an NCCO (Nexmo Call Control Object) in response: a JSON array of actions that controls the call. Create a file named index.js and add the following code:
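const express = require('express');
const bodyParser = require('body-parser');
const app = express();
// ngrok terminates HTTPS and forwards plain HTTP to this server. Trusting the
// proxy makes req.protocol read X-Forwarded-Proto ("https") instead of "http".
app.set('trust proxy', true);
app.use(bodyParser.json());
// 1. Initial greeting and input request
app.get('/webhooks/answer', (req, res) => {
  const ncco = [
    {
      action: 'talk',
      text: 'Hello. Please enter a digit or say something.',
      bargeIn: true // let the caller respond before the prompt finishes
    },
    {
      action: 'input',
      type: ['dtmf', 'speech'],      // accept keypad or voice input
      dtmf: { maxDigits: 1 },        // submit after a single keypress
      speech: { language: 'en-US' },
      // Vonage POSTs the caller's input to this URL
      eventUrl: [`${req.protocol}://${req.get('host')}/webhooks/input`]
    }
  ];
  res.json(ncco);
});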
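The input action accepts further per-type options. As a sketch, the answer NCCO could come from a plain function that also sets a DTMF timeout, a speech end-of-silence threshold, and recognition hints. The option names `timeOut`, `endOnSilence`, and `context` are taken from the Vonage NCCO reference; verify them and their allowed ranges against the current reference before relying on them. `buildAnswerNcco`, the numeric values, and the hint words are all hypothetical.

```javascript
// Hypothetical helper: build the answer NCCO as a plain array so it can be
// inspected or unit-tested without running Express.
function buildAnswerNcco(baseUrl) {
  return [
    {
      action: 'talk',
      text: 'Hello. Please enter a digit or say something.',
      bargeIn: true, // let callers respond before the prompt finishes
    },
    {
      action: 'input',
      type: ['dtmf', 'speech'],
      dtmf: {
        maxDigits: 1, // submit as soon as one key is pressed
        timeOut: 5,   // seconds to wait for a keypress (assumed value)
      },
      speech: {
        language: 'en-US',
        endOnSilence: 1.5,             // seconds of silence that end the utterance (assumed value)
        context: ['sales', 'support'], // hypothetical hint words for the recognizer
      },
      eventUrl: [new URL('/webhooks/input', baseUrl).href],
    },
  ];
}

const ncco = buildAnswerNcco('https://example-id.ngrok.app');
console.log(JSON.stringify(ncco, null, 2));
```

Inside the answer route you would then return `res.json(buildAnswerNcco(`${req.protocol}://${req.get('host')}`))`. Keeping the NCCO in a pure function makes it easy to check its shape in a unit test.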
Process Speech and DTMF Input
Add the following code to index.js. The input handler processes the payload Vonage sends once the caller presses a key or speaks, the event handler logs call status updates, and the last line starts the server.
// 2. Handle the user's response
app.post('/webhooks/input', (req, res) => {
  const { dtmf, speech } = req.body;
  let responseText = "I'm sorry, I didn't catch that.";
  if (dtmf && dtmf.digits) {
    responseText = `You pressed ${dtmf.digits}.`;
  } else if (speech && speech.results && speech.results.length > 0) {
    // results is empty when nothing was recognized; use the first candidate
    const transcript = speech.results[0].text;
    responseText = `You said: ${transcript}.`;
  }
  res.json([{ action: 'talk', text: responseText }]);
});
// 3. Log call events
app.post('/webhooks/events', (req, res) => {
  console.log('Call event:', JSON.stringify(req.body));
  res.sendStatus(200);
});
app.listen(3000, () => console.log('IVR server running on port 3000'));
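The input handler's branching can be pulled into a pure function so it is testable without a phone call. A minimal sketch, assuming the payload shape the handler above reads (`dtmf.digits`, `speech.results[].text`) and that `confidence` may arrive as a string. `describeInput` is a hypothetical name, and unlike the handler it picks the highest-confidence candidate rather than assuming the first one is best.

```javascript
// Hypothetical helper: turn an input webhook body into the sentence the IVR
// speaks back. No Express or network access needed.
function describeInput(body = {}) {
  const digits = body.dtmf && body.dtmf.digits;
  if (digits) {
    return `You pressed ${digits}.`;
  }

  const results = (body.speech && body.speech.results) || [];
  if (results.length > 0) {
    // Choose the candidate with the highest confidence. Convert with Number()
    // in case the value is a string.
    const best = results.reduce((a, b) =>
      Number(b.confidence) > Number(a.confidence) ? b : a
    );
    return `You said: ${best.text}.`;
  }

  return "I'm sorry, I didn't catch that.";
}

// Made-up sample payloads shaped like the fields the handler reads.
console.log(describeInput({ dtmf: { digits: '1' } })); // You pressed 1.
console.log(describeInput({ speech: { results: [{ text: 'Hello', confidence: '0.92' }] } })); // You said: Hello.
console.log(describeInput({ speech: { results: [] } })); // I'm sorry, I didn't catch that.
```

With this helper, the route body reduces to `res.json([{ action: 'talk', text: describeInput(req.body) }])`.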
Test the IVR
- Start your server: node index.js
- Call your Vonage number:
  - Keypad: Press 1. The IVR should say, "You pressed 1."
  - Speech: Say "Hello." The IVR should say, "You said: Hello."
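Between test calls, it can be quicker to exercise the input route directly. The sketch below posts a simplified fake payload to the local server, assuming Node.js 18+ (for the global `fetch`) and that node index.js is running on port 3000. The payloads contain only the fields the handler reads, not a full Vonage webhook body, so a real call remains the end-to-end check. `sampleInputPayload` and `simulateInput` are hypothetical helpers.

```javascript
// Hypothetical test helpers: fake an input webhook so /webhooks/input can be
// exercised without placing a call. Requires Node.js 18+ for global fetch.
function sampleInputPayload(kind, value) {
  if (kind === 'dtmf') {
    return { dtmf: { digits: String(value) } };
  }
  if (kind === 'speech') {
    return { speech: { results: [{ text: String(value), confidence: '0.9' }] } };
  }
  throw new Error(`Unknown input kind: ${kind}`);
}

// POST the payload to the local server and resolve with the returned NCCO.
async function simulateInput(payload, baseUrl = 'http://localhost:3000') {
  const res = await fetch(new URL('/webhooks/input', baseUrl), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return res.json();
}

// Usage, with the server running:
// simulateInput(sampleInputPayload('dtmf', 1)).then((ncco) => console.log(ncco));
```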
Next Steps
- Call Transfer: Add a connect action to route the caller to the relevant department based on their input, using a phone number, a SIP endpoint, or your web application via the Client SDK.
- Call Recording: Record and transcribe the call with the record action.
- Voice AI: Pass user input to your AI agent so it can give the caller useful information.
- Customization: Change the text-to-speech voice, or use the stream action in your NCCO to play pre-recorded MP3 files instead of text-to-speech.
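For the call transfer step, one possible shape is a lookup from the pressed digit to a department, returning a talk action followed by a connect action to a phone endpoint. This is a sketch only: `routeByDigit` and `DEPARTMENTS` are hypothetical, the numbers are placeholders in E.164 format without the leading "+", and the `from` value is assumed to be the Vonage number linked to your application.

```javascript
// Hypothetical routing table: pressed digit -> department and placeholder number.
const DEPARTMENTS = {
  '1': { name: 'sales', number: '447700900001' },
  '2': { name: 'support', number: '447700900002' },
};

// Build an NCCO that announces the transfer and then connects the caller.
function routeByDigit(digits, vonageNumber) {
  const dept = DEPARTMENTS[digits];
  if (!dept) {
    return [{ action: 'talk', text: 'Sorry, that is not a valid option. Goodbye.' }];
  }
  return [
    { action: 'talk', text: `Connecting you to ${dept.name}.` },
    {
      action: 'connect',
      from: vonageNumber, // caller ID shown to the department
      endpoint: [{ type: 'phone', number: dept.number }],
    },
  ];
}

console.log(JSON.stringify(routeByDigit('1', '447700900000'), null, 2));
```

In the input handler, a keypad response could then be answered with `res.json(routeByDigit(req.body.dtmf.digits, VONAGE_NUMBER))`, where `VONAGE_NUMBER` is a hypothetical constant holding your linked number.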