
What is VoIP?

Last updated February 17, 2026

Reading time: 9 minutes

VoIP, or Voice over Internet Protocol, is a technology that allows voice calls to be made over the internet instead of traditional phone networks. By converting sound into digital data packets, VoIP enables real-time voice and video communication across IP-based networks—powering everything from business phone systems to video conferencing apps and mobile networks.

Originally demonstrated in the 1970s, VoIP did not become commercially viable until the late 1990s and early 2000s, when widespread internet access and improved networking technologies made large-scale adoption possible. By 2020, VoIP had become essential infrastructure, forming the backbone of modern communication for businesses and consumers across nearly every industry.

Today, voice and video calls are so common that the technology behind them often goes unnoticed. But what does it actually mean to transmit voice calls as data packets? What technical challenges does VoIP solve, and why has it become the foundation of modern communications? In this post, we’ll answer those questions, explore how VoIP works, and show you how to try it yourself using the Vonage Voice API.

This blog post is structured to accommodate different reading and learning styles. Each section stands on its own, so feel free to read them in whatever order works best for you.

From Phone Lines to the Internet: How Voice Went Digital

VoIP, also known as IP telephony, is a suite of technologies used primarily for voice communication over Internet Protocol (IP) networks. In other words, VoIP makes it possible to conduct a phone call over the internet instead of the traditional public switched telephone network (PSTN), also called plain old telephone service (POTS).

To understand why VoIP was so revolutionary, it helps to know how phone calls originally worked.

How Phone Calls Worked Before the Internet

For most of the 20th century, phone calls traveled over physical wires. When you picked up the phone and dialed a number, your call was connected to the other person through a dedicated path made of copper cables on a circuit-switched network. The sound waves of your voice were converted into electrical signals and transmitted back and forth along this path. That connection stayed open for the entire duration of the call, even during moments of silence.

This system worked, but it had limits. It required a lot of expensive infrastructure, didn’t scale easily, and wasn’t very flexible. For as long as a call lasted, its circuit was reserved and couldn’t be used for anything else. And what happens if part of that infrastructure fails? Could there be a way to make these connections more resilient and reliable?

In the 1960s, a new technology was emerging as an answer to that question: packet-switched networking, the foundation of what would become the internet. As computers and the internet became more common, engineers started asking a simple question: What if voice could travel the same way emails and web pages do?

Breaking Voice Into Data

The internet doesn’t send information as a single, continuous stream. Instead, it breaks everything into small chunks called packets, a process known as packet switching. Each packet has a header and a payload. The header tells the network where the packet needs to go; once the packet reaches its destination, the payload is extracted and handled by the operating system, application software, or higher-layer protocols. In other words, a packet is like an envelope: the address on the outside tells the network where to deliver it, and the letter inside carries the data. This is what’s going on under the hood every time you send an email, access a website, stream a video online, or make a call with a virtual number.
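To make the envelope analogy concrete, here is a minimal sketch of a packet with a header and a payload. The 8-byte header layout (sequence number, destination port, payload length) and the function names are invented for illustration; real IP, UDP, and RTP headers carry different and more numerous fields.

```python
import struct
from typing import Tuple

# Hypothetical header layout, in network byte order:
# uint32 sequence number, uint16 destination port, uint16 payload length.
HEADER_FORMAT = "!IHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes


def build_packet(seq: int, dest_port: int, payload: bytes) -> bytes:
    """Prepend the header (the envelope) to the payload (the letter)."""
    header = struct.pack(HEADER_FORMAT, seq, dest_port, len(payload))
    return header + payload


def parse_packet(packet: bytes) -> Tuple[int, int, bytes]:
    """Read the header, then use its length field to extract the payload."""
    seq, dest_port, length = struct.unpack(HEADER_FORMAT, packet[:HEADER_SIZE])
    payload = packet[HEADER_SIZE:HEADER_SIZE + length]
    return seq, dest_port, payload


if __name__ == "__main__":
    packet = build_packet(seq=1, dest_port=5060, payload=b"hello")
    print(parse_packet(packet))
```

The receiver never needs to understand the payload in order to route the packet; it only reads the header, which is what lets the same network carry email, web pages, and voice.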

Packet switching solves how data can move efficiently across networks—but voice presents a unique challenge. Human speech is continuous and analog, while computer networks operate using discrete digital data. To send voice over the internet, sound waves first need to be converted into a digital form that can be compressed, transmitted, and reconstructed on the other end.

This is where linear predictive coding (LPC) comes in.

LPC is a technique used in speech processing to represent spoken audio in a compact digital format. Instead of transmitting every detail of a voice signal, LPC analyzes short segments of speech and identifies patterns that can be used to predict upcoming sounds. Only the essential information needed to recreate the voice is sent across the network.
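The idea of predicting upcoming samples from recent ones can be sketched in a few lines. The example below estimates predictor coefficients for one short frame using the autocorrelation method and the Levinson-Durbin recursion, then measures how little is left over once the prediction is subtracted. It is a simplified illustration with made-up function names, using a pure tone as a stand-in for speech; production speech codecs add windowing, quantization, and excitation modeling on top of this.

```python
import math


def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum of frame[i] * frame[i + k]."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc_coefficients(frame, order):
    """Estimate predictor coefficients with the Levinson-Durbin recursion.

    Returns [a1, ..., a_order] such that frame[n] is approximated by
    a1 * frame[n-1] + ... + a_order * frame[n-order].
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]             # prediction error energy so far
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]


def predict(frame, coeffs):
    """Predict each sample from the previous len(coeffs) samples."""
    order = len(coeffs)
    return [
        sum(coeffs[k] * frame[n - 1 - k] for k in range(order))
        for n in range(order, len(frame))
    ]


if __name__ == "__main__":
    # A 200 Hz tone sampled at 8 kHz stands in for a 50 ms frame of speech.
    frame = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
    coeffs = lpc_coefficients(frame, order=2)
    residual = [x - p for x, p in zip(frame[2:], predict(frame, coeffs))]
    print("coefficients:", coeffs)
    print("signal energy:", sum(x * x for x in frame))
    print("residual energy:", sum(e * e for e in residual))
```

In this sketch, two coefficients describe 400 samples almost completely, which is the sense in which only the essential information needs to cross the network.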

So to summarize, when you speak into a VoIP-enabled device like a smartphone or laptop, your voice is:

  1. Captured by a microphone

  2. Digitized and compressed using LPC or a similar technique

  3. Split into small packets

  4. Routed across the internet

  5. Reassembled and played as sound on the other end

All of this happens in milliseconds, so the conversation feels natural and real-time.
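Steps 3 through 5 above can be sketched as follows. This is a toy model, not how a production VoIP stack works: the function names are illustrative, the "network" is simulated by shuffling a list, and every packet is assumed to arrive. Real-time systems typically use a short jitter buffer and play on without packets that arrive too late.

```python
import random
from typing import List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload)


def packetize(audio: bytes, payload_size: int) -> List[Packet]:
    """Split encoded audio into numbered chunks of at most payload_size bytes."""
    return [
        (seq, audio[start:start + payload_size])
        for seq, start in enumerate(range(0, len(audio), payload_size))
    ]


def reassemble(packets: List[Packet]) -> bytes:
    """Put packets back in sequence order and join their payloads."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(payload for _, payload in ordered)


if __name__ == "__main__":
    audio = bytes(range(256)) * 2           # stand-in for encoded voice data
    packets = packetize(audio, payload_size=160)
    random.Random(0).shuffle(packets)       # packets may arrive out of order
    print(reassemble(packets) == audio)
```

The sequence number is what makes reordering possible: each packet can take its own route, and the receiver still knows where its payload belongs.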

Why This Was a Big Deal

Turning voice into internet data changed everything.

Because VoIP uses the same networks as websites, apps, and video streaming:

  • Calls are cheaper—especially long-distance and international ones

  • You don’t need special phone lines

  • Voice, video, and messaging can all work together in one app

This shift laid the foundation for modern communication tools like video calls, virtual phone numbers, and cloud-based contact centers.

In short, VoIP works because the internet learned how to transmit human conversation—not just text and images. And once voice became just another form of data, communication became faster, cheaper, and far more flexible.

What Is VoIP and How Is It Used?

Voice over Internet Protocol is a method of transmitting voice as compressed digital data packets over IP networks such as the internet. Each packet carries a header, which routers use to direct it, and a payload, which holds the encoded audio. Once the packets reach their destination, they are reassembled and decoded back into sound using a codec. A codec (a portmanteau of coder/decoder) is a piece of hardware or software that encodes or decodes a data stream or signal.
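As a rough illustration of what a codec does, the sketch below compresses audio samples into 8-bit codes and expands them again using the mu-law companding curve, the idea behind the long-established G.711 telephony codec. It applies the continuous formula with mu = 255 and is not a bit-exact G.711 implementation; the function names are illustrative.

```python
import math

MU = 255  # companding constant used by mu-law


def mu_law_encode(sample: float) -> int:
    """Compress a sample in [-1.0, 1.0] into an 8-bit code (0..255)."""
    sample = max(-1.0, min(1.0, sample))
    compressed = math.copysign(
        math.log1p(MU * abs(sample)) / math.log1p(MU), sample
    )
    return int(round((compressed + 1.0) / 2.0 * 255))


def mu_law_decode(code: int) -> float:
    """Expand an 8-bit code back into an approximate sample."""
    compressed = code / 255 * 2.0 - 1.0
    return math.copysign(
        math.expm1(abs(compressed) * math.log1p(MU)) / MU, compressed
    )


if __name__ == "__main__":
    for sample in (-0.5, -0.01, 0.0, 0.01, 0.5):
        code = mu_law_encode(sample)
        print(sample, "->", code, "->", mu_law_decode(code))
```

The curve spends more of its 256 codes on quiet sounds than on loud ones, so the decoded audio is not identical to the input but stays close where the ear is most sensitive.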

In order to work, VoIP usually requires the following:

  1. An internet connection

  2. A VoIP-enabled device, such as a smartphone with a VoIP app installed

  3. A VoIP service provider

  4. A Session Border Controller (SBC) to manage call routing and security

Session Protocols and Protections

VoIP commonly uses the Session Initiation Protocol (SIP) to establish, maintain, and terminate voice and video sessions. SIP is a text-based protocol inspired by the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP). Much as IP addresses identify machines, SIP addresses identify the participants in a session so that communication can be directed between VoIP-enabled devices. Once a session is initiated, other protocols are responsible for encoding, transmitting, and decoding the data packets.

Session Border Controllers (SBCs) provide security for these sessions. They are also responsible for connectivity, service quality, regulatory enforcement, and statistics and billing information. If a session is the connection between two VoIP-enabled devices, then the SBC is the supervisor who makes sure everything is protected and in order.
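Because SIP is text-based, a request looks a lot like an HTTP request. The sketch below builds a simplified INVITE, the message that asks another party to join a session, and parses it back. The addresses, tag, branch value, and function names are placeholders, and a real INVITE normally also carries a body describing the media to be exchanged, so treat this as an illustration of the message shape rather than a working SIP client.

```python
from typing import Dict, Tuple


def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Build a minimal, body-less SIP INVITE request as text."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; a blank line separates headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_request(message: str) -> Tuple[str, str, str, Dict[str, str]]:
    """Split a request into method, URI, version, and a header dictionary."""
    head, _, _body = message.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, uri, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, uri, version, headers


if __name__ == "__main__":
    invite = build_invite("alice@example.com", "bob@example.com", "a84b4c76e66710")
    print(invite)
    print(parse_request(invite))
```

The request line and the `Name: value` headers mirror HTTP, which is why SIP traffic is comparatively easy to read and debug by eye.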

How VoIP Powers the Tech We Use Every Day

While VoIP was initially concerned with voice transmission, the concept of communication over the internet forms the foundation of what is known as IP telephony. IP telephony includes voice as well as text and fax communication. Applications like WhatsApp and Signal use IP telephony. These days, VoIP and IP telephony are integral to modern mobile infrastructure, with 4G and 5G networks relying on IP-based voice technologies.

Today VoIP isn’t just about replacing traditional phone calls. VoIP and IP telephony lie at the foundation of many of our essential modern communication tools.

Video Conferencing: More Than Just Video

When you join a video call on platforms like Zoom, Google Meet, or Microsoft Teams, VoIP is working behind the scenes. While your camera handles the video, your voice is still being captured, turned into digital data, and sent over the internet using VoIP principles.

As you’ve probably noticed in your own video conferencing experience, audio is usually the most important part of a video call. Even if the video freezes or drops in quality, clear audio preserves the conversation. VoIP makes it possible to adjust, compress, and prioritize voice data so conversations feel natural even when internet connections aren’t perfect.
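A little arithmetic shows why compression matters on imperfect connections. The sketch below estimates the bandwidth of one direction of a call from the codec bitrate and the packet interval. It assumes 40 bytes of headers per packet (20 for IPv4, 8 for UDP, 12 for RTP, a common transport for real-time media) and ignores link-layer overhead; the function name and the 8 kbit/s low-rate codec are illustrative.

```python
def call_bandwidth_bps(codec_bitrate_bps: float,
                       packet_interval_ms: float,
                       header_bytes: int = 40) -> float:
    """Estimate one-way bandwidth, counting per-packet header overhead."""
    packets_per_second = 1000 / packet_interval_ms
    payload_bytes = codec_bitrate_bps / 8 / packets_per_second
    return (payload_bytes + header_bytes) * 8 * packets_per_second


if __name__ == "__main__":
    # 64 kbit/s is the rate of uncompressed telephone-quality audio (G.711).
    print("64 kbit/s codec:", call_bandwidth_bps(64_000, 20), "bit/s")
    # A hypothetical low-bitrate codec running at 8 kbit/s.
    print(" 8 kbit/s codec:", call_bandwidth_bps(8_000, 20), "bit/s")
```

With small payloads, the fixed headers account for a large share of each packet, which is one reason the packet interval is a tuning decision as well as the codec.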

Messaging Apps and Voice Notes

Messaging apps like WhatsApp, Slack, and Discord rely heavily on VoIP technology. When you send a voice note, start a voice chat, or make an in-app call, you’re not using a traditional phone network. Instead, your voice is traveling as internet data—just like text messages and images.

This is why these apps can offer:

  • Free or low-cost voice and video calls

  • Group calls across countries

  • Seamless switching between text, voice, and video

From the user’s perspective, it all feels like one simple app. Under the hood, VoIP is what makes that flexibility possible.

Mobile Networks and Wi-Fi Calling

Mobile networks have followed the same path. Many smartphones now support Wi-Fi calling, which lets you make calls over a wireless internet connection instead of relying solely on cellular towers.

Newer mobile standards also use internet-based voice technologies to deliver clearer calls and faster connection times. In other words, even “regular” phone calls are increasingly powered by VoIP-style systems.

VoIP is a testament to the human drive for communication. From the humble telephone to the technological achievements of packet switching, codecs, and protocols, our need to talk to each other continues to inspire innovation.


Give VoIP a Try!

The following code and a Vonage account are all you really need to use the Vonage Voice API to make an outbound text-to-speech call using VoIP:

# Assumes $JWT, $VOICE_TO_NUMBER, and $VONAGE_VIRTUAL_NUMBER are set in your shell.
curl -X POST https://api.nexmo.com/v1/calls \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{"to": [{"type": "phone", "number": "'"$VOICE_TO_NUMBER"'"}],
      "from": {"type": "phone", "number": "'"$VONAGE_VIRTUAL_NUMBER"'"},
      "ncco": [{"action": "talk",
                "text": "This is a text to speech call from Vonage"}]}'

Want to build a full Vonage Voice application? Learn how to handle an inbound phone call with Python.


In Summary

VoIP is the technology that enables voice communication over the internet rather than traditional phone networks. In this post, we explored how VoIP works, tracing the evolution of voice transmission from early analog telephones to packet-switched digital networks. We looked at the core technologies that make VoIP possible, including packet switching, codecs, and IP-based communication, and saw how VoIP underpins modern tools like video conferencing, messaging apps, and mobile networks. 

Finally, we saw how easy it is to experience VoIP firsthand using the Vonage Voice API, connecting historical context with practical, real-world application.

Liz Acosta, Developer Advocate

Liz Acosta is a Developer Advocate at Vonage. Her career path from film student to marketer to engineer to developer advocate may seem unconventional, but it is quite common in developer relations! She loves pizza, plants, pugs, and Python.