As infrastructure is upgraded and capacity expands, 4K signals are becoming increasingly common across sectors. This poses a new challenge for developers who render and encode video, since a 4K frame (3840x2160) holds four times the data of a Full HD frame (1920x1080). The rapid development of GPU technology in recent years has made the GPU the top choice among developers for video encoding. If a 4K@60Hz signal can be processed by the GPU, the CPU is relieved of that load, which improves product stability and reduces overall cost. This article focuses on how to use the Magewell MWCapture SDK to capture, render, and encode one channel of 4K@60Hz video using the GPU on a Mac computer.
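To put the fourfold figure in concrete terms, the following minimal sketch computes the size of one uncompressed frame and the data rate at 60 frames per second. It assumes that 4K means 3840x2160 and uses 12 bits per pixel (NV12) and 32 bits per pixel (BGRA) purely as example formats. The function names are hypothetical and are not part of the MWCapture SDK.

```cpp
#include <cstdint>

// Bytes needed for one uncompressed frame.
// bitsPerPixel: 12 for NV12 (8-bit 4:2:0), 32 for BGRA. Both are example
// formats assumed for illustration, not values taken from the SDK.
constexpr std::uint64_t frameBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bitsPerPixel) {
    return width * height * bitsPerPixel / 8;
}

// Bytes per second of uncompressed video at the given frame rate.
constexpr std::uint64_t bytesPerSecond(std::uint64_t width, std::uint64_t height,
                                       std::uint64_t bitsPerPixel,
                                       std::uint64_t framesPerSecond) {
    return frameBytes(width, height, bitsPerPixel) * framesPerSecond;
}

// A 3840x2160 frame holds exactly four times the data of a 1920x1080 frame.
static_assert(frameBytes(3840, 2160, 12) == 4 * frameBytes(1920, 1080, 12),
              "4K holds four times the data of Full HD");
```

Under these assumptions, one 4K NV12 frame occupies 12,441,600 bytes, so 60 frames per second amounts to roughly 746 MB/s, and the same stream in BGRA is close to 2 GB/s. Every additional copy of a frame adds that much memory traffic again.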
MWCapture SDK is a set of software libraries and example programs for implementing the capture, encoding, and rendering functions of Magewell I/O products. It is intended to help developers build and optimize applications quickly. The software libraries provide Magewell's custom APIs for integrating the functions of capture devices into your applications, such as capturing A/V, obtaining information about the input signal, and setting the signal source. You can use the example programs directly or refer to them when building your own application. The functions available depend on the model of your capture card, its firmware version, the driver, and your hardware configuration.
Depending on the compatibility of the capture device you use, the library functions can be one of the following:
When the encoding format is H.264:
Mac mini (2018)
CPU: Quad-Core Intel Core i3
Memory: 8GB
GPU: Intel UHD Graphics 630 1536MB
iMac (2019)
CPU: 6-Core Intel Core i5
Memory: 8GB
GPU: AMD Radeon Pro 570X
When the encoding format is H.265:
iMac Pro (2017)
CPU: Intel Xeon 8 Core Processor
Memory: 32GB
GPU: AMD Radeon Vega 56
We recommend using the AVCapture example program in the MWCapture SDK for testing. Magewell has tested it in detail and confirmed that it makes full use of GPU performance to achieve the expected functions.
Development environment:
macOS: 10.11 or above
Xcode: compatible with the installed macOS
Video capture is implemented using Magewell's proprietary capture APIs. The steps are as follows:
Note: The capture APIs provided by MWCapture SDK are consistent across platforms (Windows, Linux, and macOS), so the video capture code in the AVCapture example program can be ported to any of these platforms. However, the video rendering and encoding APIs mentioned in this document are macOS-specific and are not compatible with Windows or Linux.
The following is the major part of the code for capturing one frame of video:
while (self.running) {
    // Advance the deadline by one frame period.
    llExpireTime = llExpireTime + dwFrameDuration;
    LONGLONG llCurrentTime = 0LL;
    xr = MWGetDeviceTime(self.hChannel, &llCurrentTime);
    if (xr != MW_SUCCEEDED) {
        llExpireTime = 0LL;
        usleep(10000);
        continue;
    }
    // If the deadline has already passed, resynchronize to the device time.
    if (llExpireTime < llCurrentTime) {
        llExpireTime = llCurrentTime;
    }
    // Schedule the timer for the deadline and wait for it to fire.
    xr = MWScheduleTimer(self.hChannel, hTimerNotify, llExpireTime);
    if (xr != MW_SUCCEEDED) {
        llExpireTime = llCurrentTime;
        continue;
    }
    DWORD dwRet = MWWaitEvent(hTimerEvent, 1000);
    if (dwRet <= 0) {
        continue;
    }
    ........
    if (frame->pixelBuffer) {
        ........
        // Capture the newest buffered frame into byBuffer.
        xr = MWCaptureVideoFrameToVirtualAddressEx(self.hChannel,
                                                   MWCAP_VIDEO_FRAME_ID_NEWEST_BUFFERED,
                                                   byBuffer,
                                                   dwFrameSize,
                                                   cbStride,
                                                   FALSE,
                                                   (MWCAP_PTR64)pixelBuffer,
                                                   self.fourcc,
                                                   self.width,
                                                   self.height,
                                                   0,
                                                   0,
                                                   NULL,
                                                   NULL,
                                                   0,
                                                   100,
                                                   0,
                                                   100,
                                                   0,
                                                   MWCAP_VIDEO_DEINTERLACE_BLEND,
                                                   MWCAP_VIDEO_ASPECT_RATIO_IGNORE,
                                                   &rcSrc,
                                                   NULL,
                                                   0,
                                                   0,
                                                   MWCAP_VIDEO_COLOR_FORMAT_UNKNOWN,
                                                   MWCAP_VIDEO_QUANTIZATION_UNKNOWN,
                                                   MWCAP_VIDEO_SATURATION_UNKNOWN);
        // Wait for the capture event before unlocking the pixel buffer.
        MWWaitEvent(hCaptureEvent, -1);
        CVPixelBufferUnlockBaseAddress(frame->pixelBuffer, 0);
        ........
    }
}
........
} while (FALSE);
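The pacing logic at the top of the capture loop can be isolated as a small pure function. The sketch below is only an illustration of that logic: `DeviceTime` and `nextDeadline` are hypothetical names, not SDK symbols, and the tick unit is treated as opaque (if the device clock counts 100 ns ticks, which is an assumption here, one 60 Hz frame lasts about 166,667 ticks).

```cpp
#include <cstdint>

// Hypothetical alias for the 64-bit device time used by the capture loop.
using DeviceTime = std::int64_t;

// Advance the previous deadline by one frame period. If that deadline has
// already passed, resynchronize to the current device time instead, so a
// stalled loop does not schedule timers that are already overdue.
constexpr DeviceTime nextDeadline(DeviceTime previousDeadline,
                                  DeviceTime frameDuration,
                                  DeviceTime now) {
    return (previousDeadline + frameDuration < now)
               ? now
               : previousDeadline + frameDuration;
}
```

Resynchronizing to the current time means a late loop skips the missed deadlines rather than trying to catch up, which suits live capture, where only the newest buffered frame is requested.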
The AVCapture example program optimizes video rendering by wrapping each captured pixel buffer in a CMSampleBufferRef and passing it to the renderer. The CMSampleBufferRef only references the original video data, so no data is copied during the process.
if (self.viewEnable) {
    CMSampleTimingInfo timing = {kCMTimeInvalid, kCMTimeInvalid, kCMTimeInvalid};
    CMVideoFormatDescriptionRef videoInfo = NULL;
    // Describe the pixel buffer's format; the pixel data itself is not copied.
    OSStatus result = CMVideoFormatDescriptionCreateForImageBuffer(NULL, frame->pixelBuffer, &videoInfo);
    CMSampleBufferRef sampleBuffer = NULL;
    if (result == noErr) {
        // Wrap the pixel buffer in a sample buffer that references it.
        result = CMSampleBufferCreateForImageBuffer(kCFAllocatorDefault, frame->pixelBuffer, true, NULL, NULL, videoInfo, &timing, &sampleBuffer);
        CFRelease(videoInfo);
    }
    if (result == noErr && sampleBuffer) {
        // Ask the layer to display the frame as soon as it is enqueued.
        CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, YES);
        CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
        CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);
        if (self.videoLayer) {
            [self.videoLayer enqueueSampleBuffer:(CMSampleBufferRef)sampleBuffer];
        }
        CFRelease(sampleBuffer);
    }
}
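The idea that a wrapper references the video data rather than copying it can be shown outside Core Media with standard C++ types. This is only an analogy under stated assumptions: `PixelStorage`, `SampleView`, and `wrapFrame` are hypothetical stand-ins and do not correspond to any Core Media or MWCapture SDK API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a captured frame's pixel memory.
using PixelStorage = std::vector<std::uint8_t>;

// Hypothetical stand-in for a sample wrapper: it holds a reference to the
// pixel storage plus a timestamp and never duplicates the pixels.
struct SampleView {
    std::shared_ptr<PixelStorage> pixels;
    std::int64_t timestamp;
};

// Wrap an existing frame. Only the reference count changes; the bytes stay
// where the capture step wrote them.
inline SampleView wrapFrame(const std::shared_ptr<PixelStorage>& pixels,
                            std::int64_t timestamp) {
    assert(pixels != nullptr);
    return SampleView{pixels, timestamp};
}
```

After wrapping, the view and the original owner point at the same memory, so handing the view to a renderer costs one reference-count increment regardless of the frame size.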
AVCapture optimizes video encoding as follows. Because encoding can involve a large amount of computation, it starts a dedicated thread for video encoding.
if (self.videoEncodeThreadId == 0) {
    pthread_t tid = 0;
    if (0 == pthread_create(&tid, NULL, onVideoEncodeThreadProc, (__bridge void*)(self))) {
        self.videoEncodeThreadId = tid;
    }
}
After capturing a frame, the capture thread pushes the frame pointer onto a queue. The encoding thread takes the pointer from the queue and sends it to the encoder. Because only pointers are transferred, no data is copied in this process either.
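// Capture thread: push the frame pointer, then drop the oldest frames if the queue is over its limit.
[self.vtEncLock lock];
if (self.vtEnc) {
    ((std::queue<std::shared_ptr</* frame type */> > *)self.encPixelFrameQueue)->push(frame);
    while (((std::queue<std::shared_ptr</* frame type */> > *)self.encPixelFrameQueue)->size() > MAX_VIDEO_ENCODE_BUFFER_FRAMES) {
        ((std::queue<std::shared_ptr</* frame type */> > *)self.encPixelFrameQueue)->pop();
    }
}
[self.vtEncLock unlock];

// Encoding thread: take the frame at the front of the queue and send it to the encoder.
std::queue<std::shared_ptr</* frame type */> > *encQueue = (std::queue<std::shared_ptr</* frame type */> > *)self.encPixelFrameQueue;
while (self.encoding) {
    std::shared_ptr</* frame type */> frame;
    [self.vtEncLock lock];
    if (!encQueue->empty()) {
        frame = encQueue->front();
    }
    [self.vtEncLock unlock];
    if (frame != NULL && frame->pixelBuffer) {
        if (self.vtEnc) {
            //printf("put video frame:%lld\n", frame->timestamp);
            mw_venc_put_imagebuffer(self.vtEnc, frame->pixelBuffer, frame->timestamp);
        }
        [self.vtEncLock lock];
        encQueue->pop();
        [self.vtEncLock unlock];
        ........
    } else {
        // Queue is empty: sleep briefly before polling again.
        usleep(5000);
    }
}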
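The hand-off between the capture thread and the encoding thread can be sketched as a small bounded queue of shared pointers. This is a minimal sketch and not the AVCapture implementation: `Frame` and `BoundedFrameQueue` are hypothetical names, and `std::mutex` stands in for the lock object used in the listing. It keeps the drop-oldest policy, but removes the front frame in the same critical section that reads it, a design choice made here so that a concurrent drop on the producer side cannot change which frame gets popped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical frame record; only the timestamp is modeled here.
struct Frame {
    std::int64_t timestamp;
};

class BoundedFrameQueue {
public:
    explicit BoundedFrameQueue(std::size_t maxFrames) : maxFrames_(maxFrames) {
        assert(maxFrames > 0);
    }

    // Producer side: enqueue the newest frame pointer, then drop the oldest
    // entries while the queue exceeds its limit. No pixel data is copied.
    void push(std::shared_ptr<Frame> frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        frames_.push(std::move(frame));
        while (frames_.size() > maxFrames_) {
            frames_.pop();
        }
    }

    // Consumer side: take and remove the oldest frame in one critical
    // section. Returns nullptr when the queue is empty.
    std::shared_ptr<Frame> tryPop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) {
            return nullptr;
        }
        std::shared_ptr<Frame> frame = std::move(frames_.front());
        frames_.pop();
        return frame;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return frames_.size();
    }

private:
    mutable std::mutex mutex_;
    std::queue<std::shared_ptr<Frame>> frames_;
    const std::size_t maxFrames_;
};
```

An encoding thread would call `tryPop()` in its loop, pass the returned frame to the encoder, and sleep briefly when the result is `nullptr`. Because the shared pointer keeps the frame alive until the encoder is done with it, dropping an entry from the queue never invalidates a frame that is being encoded.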
In summary, to preview and record 4K@60Hz video simultaneously, it is essential to minimize video data copying, ideally achieving zero-copy, so that performance is maximized.