Using Metal Performance Shaders with Core Image

Jan 18, 2018

Core Image is a great Apple technology for manipulating images on the GPU. I use it extensively in Colorcast, my video post-processing tool for macOS. With Colorcast you can quickly color correct and color grade footage you've recorded with a video camera. When doing this, it is very helpful to have video scopes. Video scopes let you look at the image in a more analytical way. For example, you can see directly whether the image is correctly exposed or whether the white balance and the skin tones are correct. There are multiple types of video scopes, and some of them are already integrated in Colorcast.

Prior to version 0.3.1, these video scopes were calculated on the CPU, since Core Image kernels can only calculate image content on a per-pixel basis, which is not ideal for something like this, as I'll explain later in this post. But in macOS 10.12 and iOS 10.0, Apple added a special kernel type (CIImageProcessorKernel) to Core Image, which makes it possible to integrate Core Image with other image rendering technologies, like Metal Performance Shaders (MPS). Metal Performance Shaders offer a lot more flexibility than plain Core Image kernels.

Let's take an RGB Parade as an example and explain what is needed to calculate an image like that.

RGB Parade

An RGB Parade is a video scope that renders a waveform of the red, green and blue channels side by side. Every pixel in an image has an X and a Y coordinate. The waveform diagram maps the X position of a pixel to the X axis and the pixel value to the Y axis. The intensity of a diagram pixel indicates how many source pixels in that column have that particular value. Cinema5D has a good blog post that explains how to use these scopes. Since rendering a pixel at (X, Y) in the waveform involves counting the pixels in column X whose value maps to Y, a per-pixel kernel has to scan an entire image column for every output pixel, which is quite time consuming. For an image with 512x512 pixels, that is 512 reads per output pixel, so it would take roughly 512 times longer to render than a normal color filter.
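
To make the cost of the per-pixel approach concrete, here is a minimal CPU sketch in C++ of the naive method for a single channel. It is not the Colorcast code: the `Channel` struct and the functions `rowForValue` and `naiveWaveform` are hypothetical names invented for this illustration, and values are assumed to be normalized to the range 0 to 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical single-channel image: row-major values, assumed to be in [0, 1].
struct Channel {
    int width;
    int height;
    std::vector<float> values;
};

// Map a normalized value to a waveform row in [0, height - 1].
inline int rowForValue(float v, int height) {
    const float hmax = static_cast<float>(height - 1);
    float scaled = v * hmax;
    if (scaled < 0.0f) scaled = 0.0f;
    if (scaled > hmax) scaled = hmax;
    return static_cast<int>(scaled);
}

// Naive waveform: every output cell (x, y) rescans the whole source column x.
// `reads` receives the number of source pixel reads that were needed.
std::vector<uint32_t> naiveWaveform(const Channel& src, uint64_t& reads) {
    assert(src.values.size() == static_cast<std::size_t>(src.width) * src.height);
    std::vector<uint32_t> counts(src.values.size(), 0);
    reads = 0;
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            for (int sy = 0; sy < src.height; ++sy) {
                ++reads;
                if (rowForValue(src.values[sy * src.width + x], src.height) == y) {
                    ++counts[y * src.width + x];
                }
            }
        }
    }
    return counts;
}
```

The innermost loop is the problem: it runs once per source row for every output pixel, so the number of reads grows with width times height squared.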

And this is where Metal Performance Shaders come into play. You can pass an integer buffer to the Metal shader that holds one counter per image pixel.

// The function name is illustrative; this excerpt shows only the signature.
kernel void waveformAccumulate(
    texture2d<float, access::sample> inTexture [[texture(0)]],
    volatile device atomic_uint* columnDataRed [[buffer(0)]],
    sampler wrapSampler [[sampler(0)]],
    uint2 gid [[thread_position_in_grid]])

For every pixel of the source image, you increment the integer at the corresponding position in the buffer by one. Make sure to do that atomically, since the shader function runs in parallel on the GPU for every pixel and several threads may hit the same counter.

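ushort w = inTexture.get_width();
ushort h = inTexture.get_height();
ushort hmax = h-1;
// Sample the source pixel that belongs to this thread.
float4 srcPx  = inTexture.sample(wrapSampler, float2(gid));

// Map the red value to a waveform row; clamp takes the value first, then min and max.
ushort y = (ushort)(clamp((float)((srcPx.r) * hmax), 0.0, (float)hmax));
// Atomically increment the counter for column gid.x, row y.
atomic_fetch_add_explicit(columnDataRed + ((y * w) + gid.x), 1, memory_order_relaxed);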
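
As a rough CPU stand-in for this first pass, the following C++ sketch performs the same accumulation for the red channel. It is only an illustration under stated assumptions: `accumulateColumnCounts` is a hypothetical name, the image is a plain array of normalized floats rather than a Metal texture, and the loops run sequentially, where on the GPU each loop iteration would be its own thread.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// CPU stand-in for the accumulation pass (hypothetical helper, red channel only).
// Each iteration of the inner loop corresponds to one GPU thread at (gx, gy).
void accumulateColumnCounts(const std::vector<float>& red, int width, int height,
                            std::vector<std::atomic<uint32_t>>& columnData) {
    assert(red.size() == static_cast<std::size_t>(width) * height);
    assert(columnData.size() == red.size());
    const float hmax = static_cast<float>(height - 1);
    for (int gy = 0; gy < height; ++gy) {
        for (int gx = 0; gx < width; ++gx) {
            // Scale the value to a waveform row and clamp it to [0, hmax].
            float scaled = red[gy * width + gx] * hmax;
            scaled = std::min(std::max(scaled, 0.0f), hmax);
            const int y = static_cast<int>(scaled);
            // Relaxed ordering is enough here: only the final counts matter,
            // not the order in which the increments happen.
            columnData[y * width + gx].fetch_add(1, std::memory_order_relaxed);
        }
    }
}
```

Each source pixel is read exactly once, and the counter it touches always lies in its own column, so contention can only occur between pixels of the same column that share a value.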

In a second render pass, you take all those integer values and write the correct color information into the texture of the waveform diagram.

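ushort w = inTexture.get_width();
ushort h = inTexture.get_height();
ushort hmax = h-1;

// Flip vertically: output row gid.y reads counter row hmax - gid.y.
uint y = (uint)(clamp((float)(hmax-gid.y), 0.0, (float)hmax));
uint cid = (y * w) + gid.x;
uint red = atomic_load_explicit( columnDataRed + cid, memory_order_relaxed );
// green and blue are loaded the same way from their own buffers (not shown in this excerpt).
// Five or more hits saturate a channel; the alpha value of 1.0 is an assumption.
float4 out = float4(clamp(red / 5.0, 0.0, 1.0),
                    clamp(green / 5.0, 0.0, 1.0),
                    clamp(blue / 5.0, 0.0, 1.0),
                    1.0);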
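
The second pass can be sketched on the CPU in the same spirit. The C++ function below, with the hypothetical name `resolveWaveform`, turns the counters of one channel into intensities. It assumes the same divisor of 5 as the shader excerpt and the same vertical flip; it is an illustration, not the shipped shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// CPU stand-in for the second pass (hypothetical helper, one channel only).
// Converts per-cell hit counts into intensities in [0, 1].
std::vector<float> resolveWaveform(const std::vector<uint32_t>& counts,
                                   int width, int height) {
    assert(counts.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> out(counts.size());
    const int hmax = height - 1;
    for (int gy = 0; gy < height; ++gy) {
        for (int gx = 0; gx < width; ++gx) {
            // Flip vertically: output row gy reads counter row hmax - gy.
            const int y = hmax - gy;
            const uint32_t hits = counts[y * width + gx];
            // Five or more hits give full intensity, as in the shader excerpt.
            out[gy * width + gx] = std::min(hits / 5.0f, 1.0f);
        }
    }
    return out;
}
```

The divisor acts as a gain control: a smaller value makes sparse regions of the waveform brighter, at the cost of saturating dense regions sooner.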

This method takes only about twice as long as a normal color filter, since it needs two passes over the image, which is not that bad.
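
A back-of-the-envelope comparison shows where these factors come from. It counts only pixel or counter accesses for an image of width $w$ and height $h$, and it ignores atomic contention and other overhead, so treat it as an estimate rather than a measurement:

```latex
\begin{align*}
  \text{normal color filter:} \quad & w \cdot h \\
  \text{per-pixel waveform:}  \quad & (w \cdot h) \cdot h = w \cdot h^{2}
      && \Rightarrow \text{factor } h = 512 \text{ for } 512 \times 512 \\
  \text{two-pass waveform:}   \quad & w \cdot h + w \cdot h = 2 \cdot w \cdot h
      && \Rightarrow \text{factor } 2
\end{align*}
```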

The complete code on GitHub includes the Core Image filter, the CIImageProcessorKernel subclass that applies the shader to the Metal texture, and the shader code itself. The Core Image filter can be used like any other filter. Make sure to create the image from a Metal texture and to render the CIImage inside an MTKView subclass.