Vonage Audio Serializer for Pipecat
Overview
Pipecat is an open-source framework for building voice and multimodal conversational AI applications. It orchestrates AI services—such as speech-to-text, language models, and text-to-speech—alongside network transports and audio/video processing to produce low-latency, natural-sounding conversations.
The Vonage Audio Serializer for Pipecat is a transport component that bridges a Pipecat pipeline to the Vonage platform. It handles the audio format conversion and WebSocket connectivity required to receive audio from a Vonage Voice or Video session and send processed audio back in real time.
How It Works
Vonage routes audio to external services via managed WebSocket connections. The Vonage Audio Serializer acts as the protocol adapter between that WebSocket stream and Pipecat's internal audio pipeline:
- Your Pipecat application starts a WebSocket server using the Vonage serializer as its transport layer.
- Vonage opens a WebSocket connection to your server, either from a Video session via Audio Connector or from a Voice call via an NCCO `connect` action.
- The serializer converts the incoming Vonage audio format into the PCM frames Pipecat expects, and feeds them into your pipeline.
- Your pipeline processes the audio through its configured AI services and returns a response.
- The serializer converts the output audio back to the format Vonage expects and sends it over the WebSocket, where it is played back to session participants.
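As a rough illustration of the Voice path in the steps above, the following Python sketch builds an NCCO `connect` action that points a call at a WebSocket server, and splits outgoing PCM into fixed-size frames of the kind a serializer would send back. It is not the serializer's actual implementation. The helper names (`build_connect_ncco`, `frame_size_bytes`, `split_pcm_frames`) and the `wss://example.com/ws` URI are hypothetical, and the audio format (16-bit linear PCM, 16 kHz, 20 ms per message) is an assumption that should be checked against the current Vonage WebSocket documentation.

```python
import json

SAMPLE_WIDTH_BYTES = 2  # assumption: 16-bit linear PCM samples
FRAME_MS = 20           # assumption: each WebSocket message carries 20 ms of audio


def build_connect_ncco(ws_uri: str, sample_rate: int = 16000) -> list:
    """Return an NCCO whose connect action targets a WebSocket endpoint."""
    return [
        {
            "action": "connect",
            "endpoint": [
                {
                    "type": "websocket",
                    "uri": ws_uri,  # hypothetical server address
                    "content-type": f"audio/l16;rate={sample_rate}",
                }
            ],
        }
    ]


def frame_size_bytes(sample_rate: int) -> int:
    """Bytes in one frame: samples per frame times bytes per sample."""
    return sample_rate * FRAME_MS // 1000 * SAMPLE_WIDTH_BYTES


def split_pcm_frames(pcm: bytes, sample_rate: int = 16000) -> list:
    """Split raw PCM into equal frames, padding the last one with silence."""
    size = frame_size_bytes(sample_rate)
    frames = []
    for start in range(0, len(pcm), size):
        chunk = pcm[start:start + size]
        frames.append(chunk.ljust(size, b"\x00"))  # zero bytes are PCM silence
    return frames


if __name__ == "__main__":
    print(json.dumps(build_connect_ncco("wss://example.com/ws"), indent=2))
    frames = split_pcm_frames(bytes(1600))
    print(len(frames), "frames of", len(frames[0]), "bytes")
```

In a real deployment the serializer performs this framing and any format conversion for you; the sketch only makes the data shapes concrete. For a Video session, the WebSocket connection is started through Audio Connector rather than an NCCO.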
Relationship to Other Vonage Pipecat Integrations
Vonage offers two separate Pipecat integrations that address different use cases:
| Integration | Transport | Use case |
|---|---|---|
| Vonage Audio Serializer | Audio WebSocket (Audio Connector / Voice NCCO) | Audio-only AI pipelines for Voice or Video sessions |
| Video Connector Pipecat Integration | WebRTC (Video Connector) | Pipelines that also process or generate video, such as video avatars |
Use the Audio Serializer when your pipeline only needs to process and return audio. Use the Video Connector transport when your pipeline also needs to work with video frames.
When to Use the Vonage Audio Serializer
- Real-time voice AI assistants: Deploy an LLM-backed voice bot inside a Vonage Video session or on an inbound phone call.
- Live transcription and translation: Pipe session audio through a transcription service and return translated speech to participants.
- Call recording and analysis: Capture and analyze conversation content from Voice or Video calls in real time.
- Audio effects processing: Apply filtering, noise reduction, or other transformations to audio before it reaches participants.
- Automated moderation: Detect and act on non-compliant or inappropriate speech as it occurs.
See Also
- Connect Pipecat to a Vonage session — Step-by-step how-to guide for Video and Voice sessions
- Audio Connector — How Audio Connector streams audio from a Video session to a WebSocket
- Audio Connector Server SDK — Build your own custom WebSocket audio processing server without Pipecat
- Video Connector Pipecat Integration — Pipecat integration for pipelines that process video as well as audio
- Pipecat documentation