How to Build an AI Voice Agent with the Vonage Voice API and Deepgram

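Introduction

This guide walks through building a real-time AI voice agent with the Vonage Voice API and Deepgram's Voice Agent platform. You will create an intelligent voice assistant that answers phone calls, listens to the caller with automatic speech recognition (ASR), processes requests with a large language model (LLM), and responds with natural-sounding speech synthesis, all in real time. The setup also supports interrupting the agent mid-response, known as barge-in.

For an overview of voice automation concepts and a comparison of three implementation approaches, see Understanding Voice Automation.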

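Prerequisites

Before you begin, make sure you have:

A Vonage API account. You can sign up for free.
Node.js version 18 or later installed on your machine.
A Deepgram account with an API key.
ngrok installed on your machine.

Set Up Your Local Environment

Create a new directory for your project and install the required dependencies: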

mkdir vonage-deepgram-voice-agent
cd vonage-deepgram-voice-agent
npm init -y
npm install @vonage/server-sdk @vonage/auth express express-ws body-parser dotenv ws

Expose Your Local Server

Vonage needs to send webhooks to your local machine. Use ngrok to expose your server:

ngrok http 3000

Note: Keep this terminal open and copy your ngrok URL. You'll need it in the next steps.

Provision Vonage Resources

Log in to the Vonage Dashboard to get started.

Create a Vonage Application

Create your credentials from the dashboard and save them in the folder you created.

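Go to Applications > Create a new application.
Give your application a name.
Under Authentication, click Generate public and private key.
- A file named private.key is downloaded.
- Move this private.key file from your downloads folder to the vonage-deepgram-voice-agent folder.
Under Capabilities, enable Voice.
In the Voice settings, configure the following webhooks:
- Answer URL: https://{ngrok-url}/answer (method: GET)
- Event URL: https://{ngrok-url}/event (method: POST)
Click Create a new application at the bottom.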

Link a Number

Go to Phone Numbers > Buy Numbers and purchase a voice-enabled number.
Go to Applications, select your bot application, and click Edit.
Under the Numbers tab, click Link next to your newly purchased number.

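Configure Environment Variables

Create a .env file and add the variables that server.js reads: API_KEY, API_SECRET, APP_ID, SERVICE_PHONE_NUMBER, DEEPGRAM_API_KEY, DEEPGRAM_VOICE_AGENT_ENDPOINT (host and path only, because the code prepends wss://), and DEEPGRAM_AGENT_SPEAK. PORT is optional and defaults to 3000, and MAX_CALL_DURATION is used only by the outbound call endpoint.

Important: For security, store your API keys in environment variables rather than hardcoding them in your source code.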
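
A missing variable usually surfaces only when a call fails, so a startup check can help. The following is a minimal sketch, not part of the original application: the variable names are the ones server.js reads through process.env, while the helper name missingEnvVars and the placeholder values are hypothetical.

```javascript
// Names taken from the process.env lookups in server.js.
const REQUIRED_ENV_VARS = [
  'API_KEY',
  'API_SECRET',
  'APP_ID',
  'SERVICE_PHONE_NUMBER',
  'DEEPGRAM_API_KEY',
  'DEEPGRAM_VOICE_AGENT_ENDPOINT',
  'DEEPGRAM_AGENT_SPEAK',
];

// Hypothetical helper: returns the names that are unset or blank.
function missingEnvVars(env, required = REQUIRED_ENV_VARS) {
  return required.filter(
    (name) => !env[name] || String(env[name]).trim() === ''
  );
}

// Example with a partial environment (placeholder values).
const missing = missingEnvVars({ API_KEY: 'abc', APP_ID: 'my-app-id' });
console.log('Missing:', missing.join(', '));
```

In server.js, one option is to call missingEnvVars(process.env) right after require('dotenv').config() and exit with an error message if the returned list is not empty.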

Build the Voice Agent Connector

Create a file named server.js and add the following code. This application acts as a connector between the Vonage Voice API and the Deepgram Voice Agent.

'use strict'

require('dotenv').config();

const express = require('express');
const bodyParser = require('body-parser');
const app = express();
require('express-ws')(app);
const webSocket = require('ws');

app.use(bodyParser.json());

//---- CORS policy ----
app.use(function (req, res, next) {
  res.header("Access-Control-Allow-Origin", "*");
  res.header("Access-Control-Allow-Headers", "Origin, X-Requested-With, Content-Type, Accept");
  res.header("Access-Control-Allow-Methods", "OPTIONS,GET,POST,PUT,DELETE");
  next();
});

//---- Configuration ----
const servicePhoneNumber = process.env.SERVICE_PHONE_NUMBER;

//---- Vonage API Setup ----
const { Auth } = require('@vonage/auth');
const credentials = new Auth({
  apiKey: process.env.API_KEY,
  apiSecret: process.env.API_SECRET,
  applicationId: process.env.APP_ID,
  privateKey: './private.key'
});

const apiBaseUrl = "https://api.nexmo.com";
const options = { apiHost: apiBaseUrl };

const { Vonage } = require('@vonage/server-sdk');
const vonage = new Vonage(credentials, options);

//---- Deepgram Voice Agent Configuration ----
const dgApiKey = process.env.DEEPGRAM_API_KEY;
const dgVoiceAgentEndpoint = process.env.DEEPGRAM_VOICE_AGENT_ENDPOINT;
const dgVoiceAgentSettings = {
  "type": "Settings",
  "audio": {
    "input": { "encoding": "linear16", "sample_rate": 8000 },
    "output": { "encoding": "linear16", "sample_rate": 8000, "container": "none" }
  },
  "agent": {
    "listen": { "provider": { "type": "deepgram", "model": "nova-3" } },
    "think": {
      "provider": { "type": "anthropic", "model": "claude-sonnet-4-20250514" },
      "prompt": "You are a helpful AI assistant on a live phone call. Keep responses concise and natural for spoken conversation."
    },
    "speak": { 
      "provider": { 
        "type": "deepgram", 
        "model": process.env.DEEPGRAM_AGENT_SPEAK 
      } 
    }
  }
};

//---- Handle incoming PSTN calls ----
app.get('/answer', async (req, res) => {
  const hostName = req.hostname;
  const uuid = req.query.uuid;
  
  // For local development with ngrok, use your ngrok URL directly
  // const publicUrl = 'https://your-ngrok-url.ngrok.io';
  const wsUri = `wss://${hostName}/socket?original_uuid=${uuid}`;
  
  const nccoResponse = [
    {
      "action": "talk",
      "text": "Hello, please wait while we're connecting your call!",
      "language": "en-US",
      "style": 11
    },
    {
      "action": "connect",
      "eventType": "synchronous",
      "eventUrl": [`https://${hostName}/ws_event`],
      "from": req.query.from,
      "endpoint": [
        {
          "type": "websocket",
          "uri": wsUri,
          "content-type": "audio/l16;rate=8000",
          "headers": {}
        }
      ]
    }
  ];
  
  res.status(200).json(nccoResponse);
});

//---- Event webhook for call status ----
app.post('/event', async (req, res) => {
  res.status(200).send('Ok');
});

//---- WebSocket event handler ----
app.post('/ws_event', async (req, res) => {
  res.status(200).send('Ok');
  
  // Trigger a greeting when WebSocket is connected
  setTimeout(() => {
    if (req.body.status === 'answered') {
      vonage.voice.playTTS(req.body.uuid, {
        text: "Hello",
        language: 'en-US',
        style: 11
      })
      .then(() => console.log("Initial greeting sent"))
      .catch(err => console.error("Failed to play TTS:", err));
    }
  }, 1500);
});

//---- Start server ----
const port = process.env.PORT || 3000;
app.listen(port, () => {
  console.log(`Voice Agent application listening on port ${port}`);
  console.log(`Make sure ngrok is forwarding to this port!`);
});

Note: When you run ngrok locally, req.hostname may not match your public tunnel URL. If the webhooks fail, set your ngrok base URL as an environment variable and use it to build eventUrl and wsUri instead.
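
One way to apply this note is to build both URLs from a configured host and fall back to req.hostname only when none is set. This is a sketch under assumptions: the helper name buildPublicUrls and the idea of a PUBLIC_HOST variable are hypothetical, and the /ws_event and /socket paths are the ones used in the code above.

```javascript
// Hypothetical helper. publicHost would come from an environment variable
// such as PUBLIC_HOST (an assumed name); fallbackHost would be req.hostname.
function buildPublicUrls(publicHost, fallbackHost, uuid) {
  // Strip an accidental scheme or trailing slash so both URLs are well formed.
  const host = (publicHost || fallbackHost)
    .replace(/^[a-z]+:\/\//i, '')
    .replace(/\/+$/, '');
  return {
    eventUrl: `https://${host}/ws_event`,
    wsUri: `wss://${host}/socket?original_uuid=${encodeURIComponent(uuid)}`,
  };
}

const urls = buildPublicUrls('https://abc123.ngrok.io/', 'localhost', 'call-uuid-1');
console.log(urls.eventUrl); // https://abc123.ngrok.io/ws_event
console.log(urls.wsUri);    // wss://abc123.ngrok.io/socket?original_uuid=call-uuid-1
```

Inside the /answer handler this could replace the two template strings, for example buildPublicUrls(process.env.PUBLIC_HOST, req.hostname, req.query.uuid).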

Add the WebSocket Connector Logic

Next, add the core connector logic that bridges the Vonage Voice API and the Deepgram Voice Agent. Add this to server.js:

//---- WebSocket Connector ----
app.ws('/socket', async (ws, req) => {
  let wsDgOpen = false; // Deepgram WebSocket ready?
  const originalUuid = req.query.original_uuid;
  
  console.log('WebSocket connected for call UUID:', originalUuid);
  
  //---- Connect to Deepgram Voice Agent ----
  console.log('Opening connection to Deepgram Voice Agent');
  const wsDg = new webSocket(`wss://${dgVoiceAgentEndpoint}`, {
    headers: { authorization: `token ${dgApiKey}` }
  });
  
  wsDg.on('error', async (event) => {
    console.log('WebSocket to Deepgram error:', event);
  });
  
  wsDg.on('open', () => {
    console.log('WebSocket to Deepgram opened');
    // Send configuration to Deepgram Voice Agent
    wsDg.send(JSON.stringify(dgVoiceAgentSettings));
    wsDgOpen = true;
  });
  
  //---- Handle messages from Deepgram ----
  wsDg.on('message', async (msg, isBinary) => {
    if (isBinary) {
      // Audio data from agent - send directly to Vonage
      ws.send(msg);
    } else {
      // Text messages (transcripts, events, etc.)
      const message = JSON.parse(msg.toString('utf8'));
      console.log(`Message from Deepgram:`, message);
      
      // Handle barge-in: clear Vonage's audio buffer when user starts speaking
      if (message.type === "UserStartedSpeaking") {
        ws.send(JSON.stringify({ action: "clear" }));
        console.log('Sent CLEAR command to Vonage');
      }
    }
  });
  
  wsDg.on('close', async () => {
    wsDgOpen = false;
    console.log("Deepgram WebSocket closed");
  });
  
  //---- Handle messages from Vonage (user audio) ----
  ws.on('message', async (msg) => {
    if (typeof msg === "string") {
      const event = JSON.parse(msg);
      console.log("Vonage event:", event.event);
      
      // The first message from Vonage is always websocket:connected
      if (event.event === "websocket:connected") {
        console.log('Vonage WebSocket established:', event['content-type']);
      }
      
      // Handle Vonage control message confirmations
      if (event.event === "websocket:cleared") {
        console.log('Vonage audio buffer cleared');
      }
    } else {
      // Binary audio data from caller - forward to Deepgram
      if (wsDgOpen) {
        wsDg.send(msg);
      }
    }
  });
  
  //---- Clean up on disconnect ----
  ws.on('close', async () => {
    wsDgOpen = false;
    wsDg.close();
    console.log("Vonage WebSocket closed");
  });
});

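How It Works

Simplified audio streaming: Audio from Deepgram is sent directly to Vonage as binary messages. No manual buffering or timing is required; Vonage handles internal buffering automatically.

Clear-buffer control message: When Deepgram detects that the user has started speaking (the UserStartedSpeaking event), the application sends a clear control message to Vonage: {"action": "clear"}. This instructs the Vonage Voice API to discard its buffered audio frames immediately, which enables instant barge-in without manual buffer management.

Event confirmation: Vonage confirms that the buffer was cleared with a websocket:cleared event. This lets you track when interruptions occur.

Bidirectional communication: User audio flows from Vonage to Deepgram as binary WebSocket messages, and agent audio flows from Deepgram back to Vonage in real time. Transcripts and other events arrive at the connector as text messages.

Real-time transcripts: Deepgram sends JSON messages containing transcripts of both the user's speech and the agent's responses.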
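
The routing rules described above can be isolated into a small pure function, which makes them easy to unit test without opening any sockets. This is a sketch rather than the connector's actual code: the function name routeDeepgramMessage and its return shape are hypothetical, while the UserStartedSpeaking event and the {"action": "clear"} message come from the connector above.

```javascript
// Hypothetical pure router: given one message from Deepgram, decide what
// (if anything) should be sent on to the Vonage WebSocket.
function routeDeepgramMessage(data, isBinary) {
  if (isBinary) {
    // Agent audio: pass through unchanged.
    return { sendToVonage: data, event: null };
  }

  let message;
  try {
    message = JSON.parse(data.toString('utf8'));
  } catch (err) {
    // Ignore malformed text frames rather than crashing the call.
    return { sendToVonage: null, event: null };
  }

  if (message?.type === 'UserStartedSpeaking') {
    // Barge-in: ask Vonage to drop any agent audio it has buffered.
    return {
      sendToVonage: JSON.stringify({ action: 'clear' }),
      event: message.type,
    };
  }

  // Transcripts and other events are not forwarded, only reported.
  return { sendToVonage: null, event: message?.type ?? null };
}

const bargeIn = routeDeepgramMessage(
  Buffer.from('{"type":"UserStartedSpeaking"}'),
  false
);
console.log(bargeIn.sendToVonage); // {"action":"clear"}
```

In the wsDg 'message' handler, the result could be applied with a single ws.send(result.sendToVonage) when sendToVonage is not null, keeping the logging and the forwarding decisions separate.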

Test Your Application

Make sure the private.key file is in your project directory.
Start ngrok in one terminal:

ngrok http 3000

Run the server in another terminal:

node server.js

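Call your Vonage number from your phone.
The voice agent greets you and answers your questions in an AI-powered conversation.

Add Outbound Calling

To let your application place outbound calls, add this endpoint to server.js: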

//---- Trigger outbound PSTN calls ----
app.get('/call', async (req, res) => {
  if (req.query.callee == null) {
    res.status(400).send('"callee" number missing as query parameter');
  } else {
    res.status(200).send('Ok');
    const hostName = req.hostname;
    
    vonage.voice.createOutboundCall({
      to: [{
        type: 'phone',
        number: req.query.callee
      }],
      from: {
        type: 'phone',
        number: servicePhoneNumber
      },
      limit: process.env.MAX_CALL_DURATION,
      answer_url: [`https://${hostName}/answer`],
      answer_method: 'GET',
      event_url: [`https://${hostName}/event`],
      event_method: 'POST'
    })
    .then(result => console.log("Outgoing PSTN call status:", result))
    .catch(err => console.error("Outgoing PSTN call error:", err));
  }
});

To place an outbound call, open your browser and go to:

https://your-ngrok-url.ngrok.io/call?callee=15551234567

Replace 15551234567 with the phone number you want to call, in E.164 format without the + sign.
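
Because the callee value comes straight from the query string, it can help to normalize it before passing it to the Voice API. The sketch below is illustrative only: normalizeCallee is a hypothetical helper, and the 7-digit lower bound is an assumption chosen to reject obviously short input (E.164 itself only fixes the 15-digit maximum).

```javascript
// Hypothetical helper: reduce user input to digits-only E.164 (no "+").
// Returns null when the result does not look like a dialable number.
function normalizeCallee(input) {
  const digits = String(input).replace(/\D/g, '');
  // At most 15 digits, no leading zero; the minimum of 7 is an assumption.
  if (!/^[1-9]\d{6,14}$/.test(digits)) {
    return null;
  }
  return digits;
}

console.log(normalizeCallee('+1 (555) 123-4567')); // 15551234567
console.log(normalizeCallee('0123'));              // null
```

In the /call handler, a null result could be answered with a 400 status in the same way as a missing callee parameter.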

Customize the Voice Agent

You can customize various aspects of your voice agent by modifying the dgVoiceAgentSettings object:

Change the AI Model

"think": {
  "provider": { "type": "open_ai", "model": "gpt-4o-mini" },
  "prompt": "You are a helpful AI assistant on a live phone call. Keep responses concise and natural for spoken conversation."
}
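
Instead of editing dgVoiceAgentSettings in place, a variant can be derived from it, which keeps the default configuration intact if different calls need different models or prompts. This is a sketch under assumptions: withThink is a hypothetical helper, and baseSettings is a trimmed stand-in for the full settings object shown earlier.

```javascript
// Trimmed stand-in for dgVoiceAgentSettings; only the fields used here.
const baseSettings = {
  type: 'Settings',
  agent: {
    think: {
      provider: { type: 'anthropic', model: 'claude-sonnet-4-20250514' },
      prompt: 'You are a helpful AI assistant on a live phone call.',
    },
  },
};

// Hypothetical helper: returns a copy of the settings with the think
// provider and/or prompt replaced, leaving the original untouched.
function withThink(settings, overrides = {}) {
  const current = settings.agent.think;
  return {
    ...settings,
    agent: {
      ...settings.agent,
      think: {
        ...current,
        provider: overrides.provider ?? current.provider,
        prompt: overrides.prompt ?? current.prompt,
      },
    },
  };
}

const openAiSettings = withThink(baseSettings, {
  provider: { type: 'open_ai', model: 'gpt-4o-mini' },
});
console.log(openAiSettings.agent.think.provider.model); // gpt-4o-mini
console.log(baseSettings.agent.think.provider.model);   // claude-sonnet-4-20250514
```

The derived object can then be serialized with JSON.stringify and sent to Deepgram when the WebSocket opens, in place of the default settings.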

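Change the Voice

Update the DEEPGRAM_AGENT_SPEAK variable in your .env file. See Deepgram's TTS models documentation for the available voices.

Customize the System Prompt

Modify the prompt field in the think section to change the agent's personality and behavior: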

"prompt": "You are a friendly customer service representative for Acme Corp. Help users with their inquiries about our products and services. Be professional but warm."

Next Steps

Explore the WebSocket documentation for advanced audio streaming patterns.
Add call recording and transcription for auditing and quality assurance.
