Vonage Audio Serializer for Pipecat

Overview

Pipecat is an open-source framework for building voice and multimodal conversational AI applications. It orchestrates AI services—such as speech-to-text, language models, and text-to-speech—alongside network transports and audio/video processing to produce low-latency, natural-sounding conversations.

The Vonage Audio Serializer for Pipecat is a transport component that bridges a Pipecat pipeline to the Vonage platform. It handles the audio format conversion and WebSocket connectivity required to receive audio from a Vonage Voice or Video session and send processed audio back in real time.

How It Works

Vonage routes audio to external services via managed WebSocket connections. The Vonage Audio Serializer acts as the protocol adapter between that WebSocket stream and Pipecat's internal audio pipeline:

1. Your Pipecat application starts a WebSocket server using the Vonage serializer as its transport layer.
2. Vonage opens a WebSocket connection to your server, either from a Video session via Audio Connector or from a Voice call via an NCCO connect action.
3. The serializer converts the incoming Vonage audio format into the PCM frames Pipecat expects and feeds them into your pipeline.
4. Your pipeline processes the audio through its configured AI services and returns a response.
5. The serializer converts the output audio back to the format Vonage expects and sends it over the WebSocket, where it is played back to session participants.
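The flow above can be illustrated with a minimal, standard-library-only sketch. It is not the serializer's actual implementation. It assumes a Voice call connected through an NCCO connect action to a WebSocket endpoint, carrying 16-bit linear PCM in 20 ms frames; the function names and the example URI are hypothetical, and the frame duration and sample rate should be checked against the Vonage documentation for your account.

```python
from __future__ import annotations

BYTES_PER_SAMPLE = 2  # assumption: 16-bit linear PCM (audio/l16)
FRAME_MS = 20         # assumption: one frame carries 20 ms of audio


def build_connect_ncco(ws_uri: str, sample_rate: int = 16000) -> list[dict]:
    """Build an NCCO that connects a Voice call to a WebSocket endpoint.

    `ws_uri` is a placeholder for the address of your own server.
    """
    return [
        {
            "action": "connect",
            "endpoint": [
                {
                    "type": "websocket",
                    "uri": ws_uri,
                    "content-type": f"audio/l16;rate={sample_rate}",
                }
            ],
        }
    ]


def frame_size_bytes(sample_rate: int) -> int:
    """Size in bytes of one frame at the given sample rate."""
    samples_per_frame = sample_rate * FRAME_MS // 1000
    return samples_per_frame * BYTES_PER_SAMPLE


def split_into_frames(pcm: bytes, sample_rate: int = 16000) -> tuple[list[bytes], bytes]:
    """Split a PCM buffer into whole frames plus any leftover bytes.

    The leftover is returned so the caller can prepend it to the next buffer.
    """
    size = frame_size_bytes(sample_rate)
    whole = len(pcm) - len(pcm) % size
    frames = [pcm[i:i + size] for i in range(0, whole, size)]
    return frames, pcm[whole:]
```

In this sketch, an answer webhook would return the output of build_connect_ncco("wss://example.com/ws") as JSON, and audio produced by the pipeline would be passed through split_into_frames before being written to the socket as binary messages. Returning the leftover bytes, rather than padding them, avoids inserting silence between consecutive buffers.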

Relationship to Other Vonage Pipecat Integrations

Vonage offers two separate Pipecat integrations that address different use cases:

Integration	Transport	Use case
Vonage Audio Serializer	Audio WebSocket (Audio Connector / Voice NCCO)	Audio-only AI pipelines for Voice or Video sessions
Video Connector Pipecat Integration	WebRTC (Video Connector)	Pipelines that also process or generate video, such as video avatars

Use the Audio Serializer when your pipeline only needs to process and return audio. Use the Video Connector transport when your pipeline also needs to work with video frames.

When to Use the Vonage Audio Serializer

Real-time voice AI assistants: Deploy an LLM-backed voice bot inside a Vonage Video session or on an inbound phone call.
Live transcription and translation: Pipe session audio through a transcription service and return translated speech to participants.
Call recording and analysis: Capture and analyze conversation content from Voice or Video calls in real time.
Audio effects processing: Apply filtering, noise reduction, or other transformations to audio before it reaches participants.
Automated moderation: Detect and act on non-compliant or inappropriate speech as it occurs.
