Custom Video Capturing

Overview

The Vonage Video API allows for modifications to be made to the video capturer to be used in Android and iOS applications.

This how-to will go over:

Make modifications to the video capturer in your Vonage Video Android application
Make modifications to the video capturer in your Vonage Video iOS application

Android

Before you start

The code for this section is available in the Basic-Video-Capturer-Camera-2-Java project of the opentok-android-sdk-samples repo. If you haven't already, you'll need to clone the repo into a local directory. On the command line, run:

git clone git@github.com:opentok/opentok-android-sdk-samples.git

Open the Basic-Video-Capturer-Camera-2-Java project in Android Studio to follow along.

Exploring the Code

In this example, the app uses a custom video capturer to mirror a video image. This is done to illustrate the basic principals of setting up a custom video capturer.

MirrorVideoCapturer is a custom class that extends the BaseVideoCapturer class (defined in the Android SDK). The BaseVideoCapturer class lets you define a custom video capturer to be used by a Vonage Video publisher:

publisher = new Publisher.Builder(MainActivity.this)
    .capturer(new MirrorVideoCapturer(MainActivity.this))
    .build();

The getCaptureSettings() method provides settings used by the custom video capturer:

@Override
public synchronized CaptureSettings getCaptureSettings() {
    CaptureSettings captureSettings = new CaptureSettings();
    captureSettings.fps = desiredFps;
    captureSettings.width = (null != cameraFrame) ? cameraFrame.getWidth() : 0;
    captureSettings.height = (null != cameraFrame) ? cameraFrame.getHeight() : 0;
    captureSettings.format = BaseVideoCapturer.NV21;
    captureSettings.expectedDelay = 0;
    return captureSettings;
}

The BaseVideoCapturer.CaptureSetting class (which defines the capturerSettings property) is defined by the Android SDK. In this sample code, the format of the video capturer is set to use NV21 as the pixel format, with a specific number of frames per second, a specific height, and a specific width.

The BaseVideoCapturer startCapture() method is called when a publisher starts capturing video to be sent as a stream to the session. This will occur after the Session.publish(publisher) method is called:

@Override
public synchronized int startCapture() {
    Log.d(TAG,"startCapture enter (cameraState: "+ cameraState +")");

    if (null != camera && CameraState.OPEN == cameraState) {
        return startCameraCapture();
    } else if (CameraState.SETUP == cameraState) {
        Log.d(TAG,"camera not yet ready, queuing the start until camera is opened");
        executeAfterCameraOpened = () -> startCameraCapture();
    } else {
        throw new Camera2Exception("Start Capture called before init successfully completed");
    }

    Log.d(TAG,"startCapture exit");

    return 0;
}

iOS

Before you start

The code for this section is in the Basic Video Capturer project of the opentok-ios-sdk-samples repo, so if you haven't already, you'll need to clone the repo into a local directory — this can be done using the command line:

git clone https://github.com/opentok/opentok-ios-sdk-samples

Change directory to the Basic Video Capturer project:

cd opentok-ios-sdk-samples/Basic-Video-Capturer

Then install the Vonage Video dependency:

pod install

Exploring the Code

This project shows you how to make minor modifications to the video capturer used by the OTPublisher class. Open the project in Xcode to follow along.

In this example, the app uses a custom video capturer to publish random pixels (white noise). This is done simply to illustrate the basic principals of setting up a custom video capturer. (For a more practical example, see the Camera Video Capturer and Screen Video Capturer examples, described in the sections that follow.)

In the main ViewController, after calling [_session publish:_publisher error:&error] to initiate publishing of an audio-video stream, the videoCapture property of the OTPublisher object is set to an instance of OTKBasicVideoCapturer:

_publisher.videoCapture = [[OTKBasicVideoCapturer alloc] init];

OTKBasicVideoCapturer is a custom class that implements the OTVideoCapture protocol (defined in the Vonage Video iOS SDK). This protocol lets you define a custom video capturer to be used by an Vonage Video publisher.

The [OTVideoCapture initCapture:] method initializes capture settings to be used by the custom video capturer. In this sample's custom implementation of OTVideoCapture (OTKBasicVideoCapturer) the initCapture method sets properties of the format property of the OTVideoCapture instance:

- (void)initCapture
{
    self.format = [[OTVideoFormat alloc] init];
    self.format.pixelFormat = OTPixelFormatARGB;
    self.format.bytesPerRow = [@[@(kImageWidth * 4)] mutableCopy];
    self.format.imageHeight = kImageHeight;
    self.format.imageWidth = kImageWidth;
}

The OTVideoFormat class (which defines this format property) is defined by the Vonage Video iOS SDK. In this sample code, the format of the video capturer is set to use ARGB as the pixel format, with a specific number of bytes per row, a specific height, and a specific width.

The [OTVideoCapture setVideoCaptureConsumer] sets an OTVideoCaptureConsumer object (defined by the Vonage Video iOS SDK) the the video consumer uses to transmit video frames to the publisher's stream. In the OTKBasicVideoCapturer, this method sets a local OTVideoCaptureConsumer instance as the consumer:

- (void)setVideoCaptureConsumer:(id<OTVideoCaptureConsumer>)videoCaptureConsumer
{
    // Save consumer instance in order to use it to send frames to the session
    self.consumer = videoCaptureConsumer;
}

The [OTVideoCapture startCapture:] method is called when a publisher starts capturing video to send as a stream to the Vonage Video session. This will occur after the [Session publish: error:] method is called. In the OTKBasicVideoCapturer of this method, the [self produceFrame] method is called on a background queue after a set interval:

- (int32_t)startCapture
{
    self.captureStarted = YES;
    dispatch_after(kTimerInterval,
                   dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0),
                   ^{
                       [self produceFrame];
                   });

    return 0;
}

The [self produceFrame] method generates an OTVideoFrame object (defined by the Vonage Video iOS SDK) that represents a frame of video. In this case, the frame contains random pixels filling the defined height and width for the sample video format:

- (void)produceFrame
{
     OTVideoFrame *frame = [[OTVideoFrame alloc] initWithFormat:self.format];

    // Generate a image with random pixels
    u_int8_t *imageData[1];
    imageData[0] = malloc(sizeof(uint8_t) * kImageHeight * kImageWidth * 4);
    for (int i = 0; i < kImageWidth * kImageHeight * 4; i+=4) {
        imageData[0][i] = rand() % 255;   // A
        imageData[0][i+1] = rand() % 255; // R
        imageData[0][i+2] = rand() % 255; // G
        imageData[0][i+3] = rand() % 255; // B
    }

    [frame setPlanesWithPointers:imageData numPlanes:1];
    [self.consumer consumeFrame:frame];

    free(imageData[0]);

    if (self.captureStarted) {
        dispatch_after(kTimerInterval,
                       dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0),
                       ^{
                           [self produceFrame];
                       });
    }
}

The method passes the frame to the [consumeFrame] method of the instance of the OTVideoCaptureConsumer used by this video capturer (described above). This causes the publisher to send the frame of data to the video stream in the session.

The code for this sample is also included in the Basic Video Capturer project of the opentok-ios-sdk-samples repo. To use it, uncomment the following line:

_publisher.videoCapture = [[OTKBasicVideoCapturerCamera alloc] initWithPreset:AVCaptureSessionPreset352x288 andDesiredFrameRate:30];

Then comment out the line from part 1:

// _publisher.videoCapture = [[OTKBasicVideoCapturer alloc] init];

This project shows you how to use a custom video capturer using the device camera as the video source.

This sample code uses the Apple AVFoundation framework to capture video from a camera and publish it to a connected session. The ViewController class creates a session, instantiates subscribers, and sets up the publisher. The captureOutput method creates a frame, captures a screenshot, tags the frame with a timestamp and saves it in an instance of consumer. The publisher accesses the consumer to obtain the video frame.

Note that because this sample needs to access the device's camera, you must test it on an iOS device. You cannot test it in the iOS simulator.

The [OTKBasicVideoCapturerCamera initWithPreset: andDesiredFrameRate:] method is an initializer for the OTKBasicVideoCapturerCamera class. It calls the sizeFromAVCapturePreset method to set the resolution of the image. The image size and frame rate are also set here. A separate queue is created for capturing images, so as not to affect the UI queue.

- (id)initWithPreset:(NSString *)preset andDesiredFrameRate:(NSUInteger)frameRate
{
    self = [super init];
    if (self) {
        self.sessionPreset = preset;
        CGSize imageSize = [self sizeFromAVCapturePreset:self.sessionPreset];
        _imageHeight = imageSize.height;
        _imageWidth = imageSize.width;
        _desiredFrameRate = frameRate;

        _captureQueue = dispatch_queue_create("com.tokbox.OTKBasicVideoCapturer",
          DISPATCH_QUEUE_SERIAL);
    }
    return self;
}

The sizeFromAVCapturePreset method identifies the string value of the image resolution in the iOS AVFoundation framework and returns a CGSize representation.

The implementation of the [OTVideoCapture initCapture] method uses the AVFoundation framework to set the camera to capture images. In the first part of the method an instance of the AVCaptureVideoDataOutput is used to produce image frames:

- (void)initCapture
{
    NSError *error;
    self.captureSession = [[AVCaptureSession alloc] init];

   [self.captureSession beginConfiguration];

    // Set device capture
    self.captureSession.sessionPreset = self.sessionPreset;
    AVCaptureDevice *videoDevice =
      [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
    self.inputDevice =
      [AVCaptureDeviceInput deviceInputWithDevice:videoDevice error:&error];
    [self.captureSession addInput:self.inputDevice];

    AVCaptureVideoDataOutput *outputDevice = [[AVCaptureVideoDataOutput alloc] init];
    outputDevice.alwaysDiscardsLateVideoFrames = YES;
    outputDevice.videoSettings =
      @{(NSString *)kCVPixelBufferPixelFormatTypeKey:
        @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
      )};

    [outputDevice setSampleBufferDelegate:self queue:self.captureQueue];

    [self.captureSession addOutput:outputDevice];

    // See the next section ...
}

The frames captured with this method are accessed with the [AVCaptureVideoDataOutputSampleBufferDelegate captureOutput:didOutputSampleBuffer:fromConnection:] delegate method. The AVCaptureDevice object represents the camera and its properties. It provides captured images to an AVCaptureSession object.

The second part of the initCapture method calls the bestFrameRateForDevice method to obtain the best frame rate for image capture:

- (void)initCapture
{
    // See previous section ...

    // Set framerate
    double bestFrameRate = [self bestFrameRateForDevice];

    CMTime desiredMinFrameDuration = CMTimeMake(1, bestFrameRate);
    CMTime desiredMaxFrameDuration = CMTimeMake(1, bestFrameRate);

    [self.inputDevice.device lockForConfiguration:&error];
    self.inputDevice.device.activeVideoMaxFrameDuration = desiredMaxFrameDuration;
    self.inputDevice.device.activeVideoMinFrameDuration = desiredMinFrameDuration;

    [self.captureSession commitConfiguration];

    self.format = [OTVideoFormat videoFormatNV12WithWidth:self.imageWidth
                                                   height:self.imageHeight];
}

The [self bestFrameRateForDevice] method returns the best frame rate for the capturing device:

- (double)bestFrameRateForDevice
{
    double bestFrameRate = 0;
    for (AVFrameRateRange* range in
         self.inputDevice.device.activeFormat.videoSupportedFrameRateRanges)
    {
        CMTime currentDuration = range.minFrameDuration;
        double currentFrameRate = currentDuration.timescale / currentDuration.value;
        if (currentFrameRate > bestFrameRate && currentFrameRate < self.desiredFrameRate) {
            bestFrameRate = currentFrameRate;
        }
    }
    return bestFrameRate;
}

The AVFoundation framework requires a minimum and maximum range of frame rates to optimize the quality of an image capture. This range is set in the bestFrameRate object. For simplicity, the minimum and maximum frame rate is set as the same number but you may want to set your own minimum and maximum frame rates to obtain better image quality based on the speed of your network. In this application, the frame rate and resolution are fixed.

This method sets the video capture consumer, defined by the OTVideoCaptureConsumer protocol.

- (void)setVideoCaptureConsumer:(id<OTVideoCaptureConsumer>)videoCaptureConsumer
{
    self.consumer = videoCaptureConsumer;
}

The [OTVideoCapture captureSettings] method sets the pixel format and size of the image used by the video capturer, by setting properties of the OTVideoFormat object.

The [OTVideoCapture currentDeviceOrientation] method queries the orientation of the image in AVFoundation framework and returns its equivalent defined by the OTVideoOrientation enum in Vonage Video iOS SDK.

The implementation of the [OTVideoCapture startCapture] method is called when the publisher starts capturing video to publish. It calls the [AVCaptureSession startRunning] method of the AVCaptureSession object:

- (int32_t)startCapture
{
    self.captureStarted = YES;
    [self.captureSession startRunning];

    return 0;
}

The [AVCaptureVideoDataOutputSampleBufferDelegate captureOutput:didOutputSampleBuffer:fromConnection:] delegate method is called when a new video frame is available from the camera.

- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    if (!self.captureStarted)
        return;

    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    OTVideoFrame *frame = [[OTVideoFrame alloc] initWithFormat:self.format];

    NSUInteger planeCount = CVPixelBufferGetPlaneCount(imageBuffer);

    uint8_t *buffer = malloc(sizeof(uint8_t) * CVPixelBufferGetDataSize(imageBuffer));
    uint8_t *dst = buffer;
    uint8_t *planes[planeCount];

    CVPixelBufferLockBaseAddress(imageBuffer, 0);
    for (int i = 0; i < planeCount; i++) {
        size_t planeSize = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, i)
          * CVPixelBufferGetHeightOfPlane(imageBuffer, i);

        planes[i] = dst;
        dst += planeSize;

        memcpy(planes[i],
                CVPixelBufferGetBaseAddressOfPlane(imageBuffer, i),
                planeSize);
    }

    CMTime minFrameDuration = self.inputDevice.device.activeVideoMinFrameDuration;
    frame.format.estimatedFramesPerSecond = minFrameDuration.timescale / minFrameDuration.value;
    frame.format.estimatedCaptureDelay = 100;
    frame.orientation = [self currentDeviceOrientation];

    CMTime time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
    frame.timestamp = time;
    [frame setPlanesWithPointers:planes numPlanes:planeCount];

    [self.consumer consumeFrame:frame];

    free(buffer);
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
}

This method does the following:

Creates an OTVideoFrame instance to define the new video frame.
Saves an image buffer of memory based on the size of the image.
Writes image data from two planes into one memory buffer. Since the image is an NV12, its data is distributed over two planes. There is a plane for Y data and a plane for UV data. A for loop is executed to iterate through both planes and write their data into one memory buffer.
Creates a timestamp to tag a captured image. Every image is tagged with a timestamp so both publisher and subscriber are able to create the same timeline and reference the frames in the same order.
Calls the [OTVideoCaptureConsumer consumeFrame:] method, passing in the OTVideoFrame object. This causes the publisher to send the frame in the stream it publishes.

The implementation of the [AVCaptureVideoDataOutputSampleBufferDelegate captureOutput:didDropSampleBuffer:fromConnection] method is called whenever there is a delay in receiving frames. It drops frames to keep publishing to the session without interruption:

- (void)captureOutput:(AVCaptureOutput *)captureOutput
  didDropSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    NSLog(@"Frame dropped");
}

Custom Video Capturing

Overview

Android

Before you start

Exploring the Code

iOS

Before you start

Exploring the Code

See also