Voicemail Detection (Automatic Machine Detection)

When using the Vonage Voice API to create an outbound voice experience, it is sometimes necessary to take into account who, or in some cases what, is answering the call. Maybe the experience that needs to be built is for humans only, in which case the call should be disconnected when answered by a voice mail system. Or, in the case a voice mail system answers the call, it could be necessary to leave a custom message after the beep.

Vonage offers two different features for machine detection use cases:

  • Standard Machine Detection - provides basic capability of machine detection with options to drop the call automatically or just inform the application of the fact the call was likely answered by voicemail. Free of extra charges.
  • Advanced Machine Detection - provides more accurate detection and full flexibility to build any call flow. It also provides beep detection, and an asynchronous mode of functioning that enables better call control. Advanced Machine Detection is a chargeable feature; exact rates can be found on the Voice API Pricing page under 'Programmable Features'.

Note: If you try to use both methods in the same call, the Advanced Machine Detection will take precedence and the standard option will not be used.

Call Establishment and Machine Detection

Regardless of the type of machine detection (standard or advanced) used, the same principles are applicable. When the Vonage Voice API is instructed to make a call to a phone number, either by a POST request or an NCCO connect action, and that request includes machine detection activation, the process of call establishment goes through the same states as described in Call Flow.

A successful call will always go through the following states before any machine detection:

  • Started
  • Ringing
  • Answered

Once the call has reached the answered state, machine detection starts. Detection can only begin once the call is established and audio is flowing to Vonage, because Vonage analyzes this audio to determine whether a human or a machine answered the call.

Vonage will always send the results to the event webhook in the status field of a Human / Machine type webhook.

With this in mind, let’s discuss the sequence of events involved in detection and call processing, before we dive in on how to use each type of machine detection.

Synchronous and Asynchronous Implementations

With regard to call processing and NCCO actions, there are two options for implementation: synchronous and asynchronous.

In synchronous mode, all activities related to machine detection are performed first, and only after they have finished does Vonage start processing the NCCO instructions.

What this means is that the following happens sequentially after a call has been answered:

  • Answered event
  • Human / Machine event(s) (including beep event when applicable)
  • Process first NCCO instruction
  • (Optional) events related to the NCCO instruction (when applicable)
  • Further NCCO instructions

Identifying the remote party as machine or human can take 4-5 seconds, and Vonage’s side of the call is silent until the identification is complete.

When no machine detection is done, the NCCO is fetched as soon as the call is answered and processing starts immediately. This means that when a human answers the call, there is no silence at the start of the call, which gives a better end user experience. To bridge the gap between no machine detection and machine detection, Vonage developed asynchronous mode in the new advanced machine detection offering.

In asynchronous mode, the machine detection steps happen in parallel with the NCCO processing. So after the call is answered this is how applications can expect events to arrive and be processed:

[Figure: Asynchronous Machine Detection Flow]

In this case, the NCCO's first action starts after the answer event, while machine detection is being performed in the background. If this action is a talk action, this means no silence in the channel while detection is happening. Furthermore, if the first action is finished, Vonage will continue processing further actions from the NCCO while detection is occurring.

Standard Machine Detection

When standard machine detection is activated, an application can know whether a voice mail / machine or a human answered the call. Developers must define one of two behaviors for when a machine is detected: continue or hangup.

For example, you may want to create a call with machine detection that hangs up if a machine answers, or fetches an NCCO from the answer_url otherwise. In both situations, the application will receive events with the results of the detection, so you know whether a person answered the call.

To achieve this, you would make a POST request to https://api.nexmo.com/v1/calls/, using a JWT as the authorization method.

The body of the request should contain something like this:
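A minimal sketch of such a body, built as a Python dictionary and printed as JSON. The phone numbers and the answer URL are placeholders, the helper name is hypothetical, and the `machine_detection` field name and the `to` / `from` shapes are assumptions based on the Voice API create-call format; check them against the API reference before use.

```python
import json


def build_standard_md_call(to_number, from_number, answer_url, behavior="hangup"):
    """Return a create-call request body with standard machine detection."""
    if behavior not in ("continue", "hangup"):
        raise ValueError("behavior must be 'continue' or 'hangup'")
    return {
        "to": [{"type": "phone", "number": to_number}],
        "from": {"type": "phone", "number": from_number},
        "answer_url": [answer_url],
        # Assumed field name for standard machine detection.
        "machine_detection": behavior,
    }


body = build_standard_md_call(
    "14155550100", "14155550101", "https://example.com/webhooks/answer"
)
print(json.dumps(body, indent=2))
```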

If instead of hangup, continue was used, then the NCCO would be fetched, but your application would also receive the result event through a Human / Machine callback to your event URL.

It is important to note that standard machine detection happens synchronously, which again means that the detection is done before fetching and processing the first NCCO action. While detection is happening, it is impossible to interact with the user receiving the call.

For more examples of how to make a call using the Voice API, see this code snippet.

Machine detection is also available from an NCCO connect action, with the same options of hangup and continue:
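As a rough illustration, the NCCO below is built in Python and printed as JSON. The numbers are placeholders, the helper name is hypothetical, and the camelCase `machineDetection` key is an assumption about how the connect action names this option.

```python
import json


def connect_with_machine_detection(from_number, to_number, behavior="hangup"):
    """Return an NCCO with one connect action that enables machine detection."""
    if behavior not in ("continue", "hangup"):
        raise ValueError("behavior must be 'continue' or 'hangup'")
    return [
        {
            "action": "connect",
            "from": from_number,
            "endpoint": [{"type": "phone", "number": to_number}],
            # Assumed key name for standard machine detection in an NCCO.
            "machineDetection": behavior,
        }
    ]


ncco = connect_with_machine_detection("14155550101", "14155550100")
print(json.dumps(ncco, indent=2))
```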

Advanced Machine Detection

Vonage’s Advanced Machine Detection feature was built to improve both the accuracy of detection and the voice experience that Vonage is able to provide.

Note: This is a premium feature, with an extra charge of €0.0070 for each call on which Advanced Machine Detection is activated.

To use advanced machine detection when making a new call, you would make a POST request to https://api.nexmo.com/v1/calls/, using a JWT as the authorization method.

The body of the request must contain the advanced_machine_detection configuration:
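A sketch of such a body as a Python dictionary. The numbers and answer URL are placeholders, and the `beep_timeout` value of 45 (assumed to be in seconds) is purely illustrative; the three keys inside `advanced_machine_detection` are the ones this guide describes.

```python
import json

amd_call_body = {
    "to": [{"type": "phone", "number": "14155550100"}],
    "from": {"type": "phone", "number": "14155550101"},
    "answer_url": ["https://example.com/webhooks/answer"],
    "advanced_machine_detection": {
        "behavior": "continue",  # or "hangup"
        "mode": "detect",        # one of the modes described in this guide
        "beep_timeout": 45,      # illustrative value, assumed to be seconds
    },
}

print(json.dumps(amd_call_body, indent=2))
```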

This can also be done using an NCCO connect action, with the same options of hangup and continue:
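The following NCCO, expressed in Python, sketches how that might look. The numbers are placeholders and the `beep_timeout` value is illustrative; the exact shape of the connect action is an assumption to verify against the NCCO reference.

```python
import json

amd_connect_ncco = [
    {
        "action": "connect",
        "from": "14155550101",
        "endpoint": [{"type": "phone", "number": "14155550100"}],
        "advancedMachineDetection": {
            "behavior": "hangup",  # or "continue"
            "mode": "detect",
            "beep_timeout": 45,    # illustrative value, assumed to be seconds
        },
    }
]

print(json.dumps(amd_connect_ncco, indent=2))
```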

As seen in the examples, advanced_machine_detection (or advancedMachineDetection) has three configurable parameters:

  • behavior - this defines what the Vonage platform should do when encountering a machine. The possible values are:
    • continue - If machine is encountered, continue with the call.
    • hangup - If machine is encountered, terminate the call.
  • beep_timeout - this allows you to define how long to wait for the voicemail beep and, if it isn’t received in that time, generate a corresponding webhook event.
  • mode - besides the main behavior of hangup versus continue when handling calls, it is also possible to define one of three modes. These are discussed in more detail below.
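An application may want to check this configuration before sending it. The helper below is a hypothetical client-side sketch, not part of any Vonage SDK. It only checks the behavior and mode values named in this guide, and it assumes `beep_timeout` is a positive whole number; the platform may enforce tighter limits.

```python
VALID_BEHAVIORS = ("continue", "hangup")
VALID_MODES = ("default", "detect", "detect_beep")


def validate_amd_config(config):
    """Return a list of problems found in an AMD config dict (empty if none)."""
    problems = []
    if "behavior" in config and config["behavior"] not in VALID_BEHAVIORS:
        problems.append("behavior must be one of: " + ", ".join(VALID_BEHAVIORS))
    if "mode" in config and config["mode"] not in VALID_MODES:
        problems.append("mode must be one of: " + ", ".join(VALID_MODES))
    if "beep_timeout" in config:
        timeout = config["beep_timeout"]
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(timeout, int) and not isinstance(timeout, bool)
        if not is_int or timeout <= 0:
            problems.append("beep_timeout must be a positive integer")
    return problems


print(validate_amd_config({"behavior": "continue", "mode": "detect_beep", "beep_timeout": 45}))
print(validate_amd_config({"behavior": "drop", "mode": "fast", "beep_timeout": 0}))
```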

default Mode

The default mode provides the highest level of control to the call. This mode works asynchronously, which means Vonage starts processing NCCO actions during the detection phase.

As an example, assume this NCCO was returned to Vonage in response to the NCCO fetch from the answer_url:
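A sketch of such an NCCO, written as a Python list and serialized to JSON. The message text is invented for illustration; the `loop` value of 3 matches the three repetitions described in this section.

```python
import json

answer_ncco = [
    {
        "action": "talk",
        "text": "Hello, this is an automated call from Example Corp.",
        "loop": 3,  # play the message three times
    }
]

print(json.dumps(answer_ncco, indent=2))
```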

Once the call is answered, this message will immediately start to be played and will repeat three times while machine detection algorithms are processing the remote party audio in parallel.

During this talk action, a machine detection result may occur, in which case a webhook is sent to the event_url.

The webhook will look like this (with either human or machine as status):
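As an illustration only, the payload below shows the kind of JSON an application might receive, together with a small Python helper that reads the result. The `status` field comes from this guide; every other field name and all values are placeholders assumed for the sketch.

```python
import json

# Illustrative payload: only "status" is taken from this guide.
sample_webhook = """
{
  "call_uuid": "00000000-0000-0000-0000-000000000000",
  "conversation_uuid": "CON-00000000-0000-0000-0000-000000000000",
  "from": "14155550101",
  "to": "14155550100",
  "status": "machine",
  "timestamp": "2024-01-01T00:00:00.000Z"
}
"""


def detection_status(event):
    """Return 'human' or 'machine', or None for any other event."""
    status = event.get("status")
    return status if status in ("human", "machine") else None


event = json.loads(sample_webhook)
print(detection_status(event))
```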

Whether a human or a machine event is sent, the application can respond to the webhook in one of two ways: return a 204 to continue with the current NCCO, or return a 200 with a new NCCO, which replaces the previous one.

This means that an application can differentiate behavior. If human, the call could continue with the same NCCO and complete further actions, such as requesting and processing DTMF input.

If it’s a machine, the application could wait for the beep and then leave a customized voicemail message. A machine event with sub_state beep_start indicates the beginning of the voicemail beep. By contrast, a beep_timeout sub_state indicates that the beep wasn't received within the designated period. The application can also return a 200 OK with a new NCCO containing the message that should be left in the voicemail.

The following snippet of code, running in an application, would return a new talk action when a machine is detected, stating that the platform should wait for the beep (this is for illustrative purposes; in a real application a 204 could be used instead). If sub_state is beep_start, it leaves the actual voicemail message after the beep.
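A minimal Python sketch of this logic, written as a plain function that maps a parsed webhook body to an HTTP status code and an optional NCCO, so it can be plugged into any web framework. The function name and message texts are hypothetical, and returning 204 on beep_timeout is an assumption made for this sketch.

```python
def handle_detection_event(event):
    """Map a Human / Machine webhook body to (http_status, ncco_or_None)."""
    status = event.get("status")
    sub_state = event.get("sub_state")

    if status == "machine":
        if sub_state == "beep_start":
            # The beep has started: replace the NCCO with the voicemail message.
            return 200, [{"action": "talk", "text": "This is the message we are leaving after the beep."}]
        if sub_state == "beep_timeout":
            # Assumption: no beep arrived, so keep the current NCCO.
            return 204, None
        # Machine detected, beep not yet heard. Illustrative only: a 204
        # could be returned here instead.
        return 200, [{"action": "talk", "text": "Machine detected, waiting for the beep."}]

    # Human (or any other event): continue with the current NCCO.
    return 204, None


print(handle_detection_event({"status": "machine", "sub_state": "beep_start"}))
```

In a web handler, the status code would become the HTTP response code, and the NCCO, when present, the JSON response body.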

With default mode, it is possible to create voice experiences where there are no silences at the beginning of the call, with the possibility to leave voice mail messages right after the beep, avoiding the message being truncated.

detect Mode

The detect mode has the same behavior as Standard Machine Detection with a more accurate identification of machines. It allows developers who want to migrate their old voice application, built on top of Standard Machine Detection, to do so with minimal code changes.

Typically, this mode should be used when the desired behavior is to hang up the call in the presence of a machine. It is synchronous, so if continue is used instead of hangup, it waits until a human or machine has been detected before it starts to process the NCCO from the answer_url (for a new call created with a POST request) or the next action in an NCCO (after a connect action with detection).

detect_beep Mode

The detect_beep mode builds upon the detect mode by allowing a voicemail message to be deposited right after the beep. This mode should be used when a continue behavior is desired. If a human is detected, it behaves exactly like detect mode, in that it runs synchronously and waits for machine detection to finish before processing the NCCO or the next NCCO action.

The main difference in this mode is that when a machine is detected, Vonage will wait for the beep event (or beep_timeout) before processing the NCCO or next NCCO action. This effectively allows Vonage to deliver a message to a human, or leave it in a voicemail message after a beep.
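The timing differences between the three modes can be summarized in a small Python model. This is a conceptual sketch of the behavior described in this guide with the continue behavior, not a Vonage API; the function name and the returned labels are hypothetical.

```python
MODES = ("default", "detect", "detect_beep")


def ncco_processing_starts_after(mode, detected):
    """Return which event precedes NCCO processing (continue behavior)."""
    if mode not in MODES:
        raise ValueError("unknown mode: " + repr(mode))
    if detected not in ("human", "machine"):
        raise ValueError("detected must be 'human' or 'machine'")

    if mode == "default":
        # Asynchronous: the NCCO runs while detection happens in parallel.
        return "answered"
    if mode == "detect_beep" and detected == "machine":
        # Synchronous, and additionally waits for the beep or its timeout.
        return "beep_start or beep_timeout"
    # detect mode, or detect_beep with a human: wait for the detection result.
    return "detection result"


for mode in MODES:
    for detected in ("human", "machine"):
        print(mode, detected, "->", ncco_processing_starts_after(mode, detected))
```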