How to Adopt iOS Picture in Picture for Vonage Video Calls
Published on August 27, 2024

With the release of iOS 15, Apple extended Picture in Picture to video calls on iPhone and iPad, a significant improvement for multitasking. Picture in Picture lets users keep a video call on screen in a small floating window while they use other apps, offering great flexibility and convenience and making this feature an important part of modern communication.

In this article, we'll explore how developers can use iOS Picture in Picture to integrate Vonage video calls into their apps effectively.

Vonage API Account

To complete this tutorial, you will need a Vonage API account. If you don’t have one already, you can sign up today and start building with free credit. Once you have an account, you can find your API Key and API Secret at the top of the Vonage API Dashboard.

Pre-requisites

Vonage Custom Video Rendering

Before integrating Vonage video calls with iOS Picture in Picture, developers need to get and process raw video data through a custom video renderer. This data is essential for accessing the video stream directly within your app. Let's start by building out the foundation of our application!

Follow the tutorial to create a basic video chat app (est. completion time: 25 minutes). If you would like to avoid starting from scratch, you can clone our Swift sample app repo on GitHub.

If you cloned the repo, make sure you replace the App ID, session ID, and token values with your own:

  • App ID is the Vonage API key shown on your dashboard for the application you created.

  • Session ID is created when you choose “Create a video session” on your dashboard.

  • A token can be generated with one of the Vonage Server SDKs: choose a language and adapt the code snippet under “Generating Tokens” with your own values.

Assuming you have CocoaPods installed, open your terminal, cd to your project directory and type pod install. Reopen your project in Xcode using the .xcworkspace file.

In the ViewController.swift file, replace the following empty strings with the corresponding API key, session ID, and token values:

// *** Fill the following variables using your own Project info  ***
// ***             https://tokbox.com/account/#/                 ***

// Replace with your OpenTok API key
let kApiKey = ""

// Replace with your generated session ID
let kSessionId = ""

// Replace with your generated token
let kToken = ""

Build and run the application.

In the sample application ViewController.swift file, you’ve already created a subscriber using OTSubscriber:

  subscriber = OTSubscriber(stream: stream, delegate: self)

The subscriber has a property called videoRender, to which you can assign a custom renderer for the subscribed stream. Assign the subscriber's videoRender property to an instance of the ExampleVideoRender class, a custom class that implements the OTVideoRender protocol (defined in the iOS SDK). This protocol lets you define a custom video renderer to be used by a Vonage Video publisher or subscriber.

   let videoRender = ExampleVideoRender()
   subscriber?.videoRender = videoRender

For continuous video streaming, you will need to remove the subscriber's willResignActiveNotification observer. This allows the app to keep the video stream running even when it moves to the background.

NotificationCenter.default.removeObserver(subscriber, name: UIApplication.willResignActiveNotification, object: nil)

The ExampleVideoRender class handles and processes incoming video stream data, ensuring that the subscriber's video frames can be accessed and adjusted within your app. The renderVideoFrame method is called when the subscriber renders a video frame to the video renderer. The frame is an OTVideoFrame object (defined by the iOS SDK), which has video frame information such as metadata and plane data.

Since OTVideoFrame only provides YUV plane data, we need to convert it to a CMSampleBuffer before it can be displayed in a UIView and in Picture in Picture. We will go through the conversion process later in the article.
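To see how the pieces fit together, here is a simplified sketch of what a custom renderer like ExampleVideoRender might look like. The actual class in the sample repo does more work (the vImage-based conversion and thread handling); convertFrameToPixelBuffer(_:) below is just a placeholder for the conversion covered in the next sections, and createSampleBufferFrom(pixelBuffer:) is the helper defined later in this article.

import UIKit
import AVFoundation
import OpenTok

// Simplified sketch of a custom renderer; not the full ExampleVideoRender from the sample repo.
class ExampleVideoRender: UIView, OTVideoRender {
    let bufferDisplayLayer = AVSampleBufferDisplayLayer()

    // Called by the SDK whenever the subscriber has a new frame to draw.
    func renderVideoFrame(_ frame: OTVideoFrame) {
        // Convert the YUV planes into a CVPixelBuffer, wrap it in a CMSampleBuffer,
        // then enqueue it on the display layer.
        guard let pixelBuffer = convertFrameToPixelBuffer(frame),
              let sampleBuffer = createSampleBufferFrom(pixelBuffer: pixelBuffer) else {
            return
        }
        DispatchQueue.main.async {
            self.bufferDisplayLayer.enqueue(sampleBuffer)
        }
    }

    // Placeholder: the YUV-to-BGRA conversion is shown in the next section.
    private func convertFrameToPixelBuffer(_ frame: OTVideoFrame) -> CVPixelBuffer? {
        return nil
    }
}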

With the CMSampleBuffer ready, you can retrieve and display it in a designated UIView.

let bufferDisplayLayer = videoRender.bufferDisplayLayer
bufferDisplayLayer.frame = frame
videoContainerView.layer.addSublayer(bufferDisplayLayer)

Converting YUV Data to Sample Buffer

YUV (YCbCr) is a common color space for video, representing luminance (Y) and chrominance (UV) separately. By extracting YUV plane data, developers can manipulate and convert these components into a format compatible with iOS's sample buffer. This is necessary for displaying video frames in iOS Picture in Picture mode.

As mentioned above, you can get the OTVideoFrame under the renderVideoFrame method, which is called when the subscriber renders a video frame. This frame includes YUV plane data, which developers can use to create a CVPixelBuffer. This step organizes and maps the YUV components into the pixel buffer correctly. 

let pixelAttributes: NSDictionary = [kCVPixelBufferIOSurfacePropertiesKey as String: [:]]
var pixelBuffer: CVPixelBuffer?
let result = CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32BGRA, pixelAttributes as CFDictionary, &pixelBuffer)
_ = accel.convertFrameVImageYUV(frame, to: pixelBuffer)

Next, create a CMSampleBuffer from the CVPixelBuffer, including the necessary metadata such as the timestamp and format description. These steps are crucial for preparing video frames for display in iOS Picture in Picture mode, ensuring smooth playback and compliance with iOS standards.

func createSampleBufferFrom(pixelBuffer: CVPixelBuffer) -> CMSampleBuffer? {
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    // Unlock with the same flags used to lock, even on an early return.
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    var sampleBuffer: CMSampleBuffer?

    // Timing info for the frame: use the current media time as the presentation timestamp.
    let now = CMTimeMakeWithSeconds(CACurrentMediaTime(), preferredTimescale: 1000)
    var timingInfo = CMSampleTimingInfo(duration: CMTimeMakeWithSeconds(1, preferredTimescale: 1000), presentationTimeStamp: now, decodeTimeStamp: now)

    // Describe the pixel buffer so the sample buffer carries the correct format.
    var formatDescription: CMFormatDescription? = nil
    CMVideoFormatDescriptionCreateForImageBuffer(allocator: kCFAllocatorDefault, imageBuffer: pixelBuffer, formatDescriptionOut: &formatDescription)

    guard let formatDescription = formatDescription else {
        print("Cannot create format description")
        return nil
    }

    let osStatus = CMSampleBufferCreateReadyWithImageBuffer(
        allocator: kCFAllocatorDefault,
        imageBuffer: pixelBuffer,
        formatDescription: formatDescription,
        sampleTiming: &timingInfo,
        sampleBufferOut: &sampleBuffer
    )

    if osStatus != noErr {
        print("osStatus error: \(osStatusToString(status: osStatus))")
    }

    guard let buffer = sampleBuffer else {
        print("Cannot create sample buffer")
        return nil
    }

    return buffer
}

Implementing the Picture in Picture Component

Now that the CMSampleBuffer is ready, follow these steps to add Picture in Picture to your video calls:

1. Create a source view to display inside the video-call view controller.

class SampleBufferVideoCallView: UIView {
    override class var layerClass: AnyClass {
        AVSampleBufferDisplayLayer.self
    }

    var sampleBufferDisplayLayer: AVSampleBufferDisplayLayer {
        layer as! AVSampleBufferDisplayLayer
    }
}

2. Create an AVPictureInPictureVideoCallViewController and add the source view as a subview so it can be displayed.

let pipVideoCallViewController = AVPictureInPictureVideoCallViewController()
pipVideoCallViewController.preferredContentSize = CGSize(width: 640, height: 480)
pipVideoCallViewController.view.addSubview(sampleBufferVideoCallView)
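The snippet above only adds the view as a subview; it also needs to be sized to fill the Picture in Picture view controller. One way to do that (an assumption, not taken verbatim from the sample) is with Auto Layout constraints:

// Pin the sample-buffer view to the PiP view controller's edges (not part of the original snippet).
sampleBufferVideoCallView.translatesAutoresizingMaskIntoConstraints = false
NSLayoutConstraint.activate([
    sampleBufferVideoCallView.leadingAnchor.constraint(equalTo: pipVideoCallViewController.view.leadingAnchor),
    sampleBufferVideoCallView.trailingAnchor.constraint(equalTo: pipVideoCallViewController.view.trailingAnchor),
    sampleBufferVideoCallView.topAnchor.constraint(equalTo: pipVideoCallViewController.view.topAnchor),
    sampleBufferVideoCallView.bottomAnchor.constraint(equalTo: pipVideoCallViewController.view.bottomAnchor)
])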

3. In the ViewController.swift file, create an AVPictureInPictureController.ContentSource that represents the source of the content the system displays.

let contentSource = AVPictureInPictureController.ContentSource(
    activeVideoCallSourceView: videoContainerView,
    contentViewController: pipVideoCallViewController)

4. Initialize AVPictureInPictureController and set canStartPictureInPictureAutomaticallyFromInline to true.

pipController = AVPictureInPictureController(contentSource: contentSource)
pipController.canStartPictureInPictureAutomaticallyFromInline = true
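Not every device supports Picture in Picture, so it is worth wrapping the setup above in a support check. The sketch below is an assumption rather than part of the sample, and the delegate assignment presumes your view controller adopts AVPictureInPictureControllerDelegate:

if AVPictureInPictureController.isPictureInPictureSupported() {
    pipController = AVPictureInPictureController(contentSource: contentSource)
    pipController.canStartPictureInPictureAutomaticallyFromInline = true
    // Assumes the hosting view controller conforms to AVPictureInPictureControllerDelegate.
    pipController.delegate = self
} else {
    print("Picture in Picture is not supported on this device")
}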

With the above setup, your application should be ready for Picture in Picture mode. The floating video window should appear when the user sends the app to the background during a call. If you don't see it, check your iPhone/iPad settings to make sure Picture in Picture is enabled.
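If the window still does not appear, the AVPictureInPictureControllerDelegate callbacks are a convenient place to add logging. This sketch assumes the delegate wiring shown above and is not part of the original sample:

extension ViewController: AVPictureInPictureControllerDelegate {
    // Called just before the system presents the floating window.
    func pictureInPictureControllerWillStartPictureInPicture(_ pictureInPictureController: AVPictureInPictureController) {
        print("Picture in Picture will start")
    }

    // Called when the system could not start Picture in Picture; the error explains why.
    func pictureInPictureController(_ pictureInPictureController: AVPictureInPictureController,
                                    failedToStartPictureInPictureWithError error: Error) {
        print("Picture in Picture failed to start: \(error)")
    }
}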

Screen recording: opening the video app, minimizing it, and still seeing the call in a floating window while using other applications.

Summary of the Integration Process 

Integrating Vonage's video calling with iOS Picture in Picture enhances multitasking, allowing users to continue video calls uninterrupted while using other apps. This capability boosts productivity, especially for professional meetings and collaborative work.

Further details and the full code for the Picture in Picture Sample App are available on our GitHub repo.

If you have any questions, join our Community Slack or message us on X, formerly known as Twitter.

Additional Resources

Iu Jie Lim

Iu Jie is a Software Engineer who is constantly seeking innovative ways to solve a problem. She is passionate about new technology, especially relating to cloud and AI. Out of work, she likes to spend her time hunting for tasty food with family.
