Using Metal Performance Shaders with Core Image

Jan 18, 2018

Core Image is a great Apple technology for manipulating images on the GPU. I use it extensively in Colorcast, my video post-processing tool for macOS. With Colorcast you can quickly color correct and color grade footage you've recorded with a video camera. For this kind of work it is very helpful to have video scopes, which let you look at the image in a more analytical way. You can, for example, see directly whether the image is correctly exposed or whether the white balance and the skin tones are correct. There are multiple types of video scopes, and some of them are already integrated in Colorcast.

Prior to version 0.3.1, these video scopes were calculated on the CPU. With Core Image kernels you can only calculate image content on a per-pixel basis, which is not ideal for scopes, as I'll explain later in this post. But in macOS 10.12 and iOS 10.0, Apple added a special kernel type (CIImageProcessorKernel) to Core Image that makes it possible to integrate Core Image with other image rendering technologies, like Metal Performance Shaders (MPS). Metal Performance Shaders offer a lot more flexibility than plain Core Image kernels.

Let's take an RGB Parade as an example and explain what is needed to calculate an image like that.

RGB Parade

An RGB Parade is a video scope that renders a waveform of the red, the green and the blue channel side by side. Every pixel in an image has an X and a Y coordinate. The waveform diagram plots the X position of the pixel on the X axis and the pixel value on the Y axis. The intensity of a diagram pixel indicates how many source pixels in that column have that particular value. Cinema5D has a good blog post that explains how to use these scopes. Since rendering a pixel at (X, Y) in the waveform involves counting the pixels in column X that have Y as their value, doing this for every pixel of the diagram is quite time consuming. For a 512x512 image, every diagram pixel has to inspect all 512 pixels of its column, so the scope would take about 512 times longer to render than a normal color filter.
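To make the cost concrete, here is a minimal CPU sketch in C++ of what a per-pixel kernel would have to do for a single diagram pixel. The function name `countForScopePixel` and the single-channel, row-major float layout are assumptions for illustration, not code from Colorcast.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: counts how many pixels in column `x` of a
// single-channel image map to waveform row `y`.
// `channel` holds finite values, nominally in [0, 1], in row-major order.
std::uint32_t countForScopePixel(const std::vector<float>& channel,
                                 std::size_t width, std::size_t height,
                                 std::size_t x, std::size_t y) {
    assert(channel.size() == width * height);
    assert(x < width && y < height);
    const float hmax = static_cast<float>(height - 1);
    std::uint32_t count = 0;
    // Every diagram pixel has to visit the whole source column.
    for (std::size_t row = 0; row < height; ++row) {
        const float v = channel[row * width + x];
        const float scaled = std::min(std::max(v * hmax, 0.0f), hmax);
        if (static_cast<std::size_t>(scaled) == y) {
            ++count;
        }
    }
    return count;
}
```

Calling this once per diagram pixel reads `height` source pixels each time, which is where the factor of 512 for a 512x512 image comes from.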

And this is where Metal Performance Shaders come into play. You can pass an integer buffer to the Metal shader that has one entry per image pixel.

kernel void scopeWaveformCompute(   // function name is a placeholder; the original snippet omits it
    texture2d<float, access::sample> inTexture [[texture(0)]],
    volatile device atomic_uint* columnDataRed [[buffer(0)]],
    sampler wrapSampler [[sampler(0)]],
    uint2 gid [[thread_position_in_grid]])

For every pixel of the source image, you increase the integer at the correct position in the buffer by one, but make sure to do that atomically, since the shader function runs in parallel on the GPU for every pixel.

ushort w = inTexture.get_width();
ushort h = inTexture.get_height();
ushort hmax = h - 1;
float4 srcPx = inTexture.sample(wrapSampler, float2(gid));

// Map the red value (0...1) to a row of the diagram; clamp takes (value, min, max).
ushort y = (ushort)clamp(srcPx.r * (float)hmax, 0.0f, (float)hmax);
// Atomically count this pixel in the cell for its column and value.
atomic_fetch_add_explicit(columnDataRed + ((y * w) + gid.x), 1, memory_order_relaxed);
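For readers who want to try the idea without a GPU, the following is a rough CPU analogue of this first pass in C++. It assumes a single channel stored as row-major floats with finite values, and uses a handful of threads in place of GPU threads; the name `accumulateColumnCounts` and the `threadCount` parameter are made up for this sketch.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Every source pixel increments exactly one counter.
// Returns width * height counters, indexed as (y * width) + x.
std::vector<std::uint32_t> accumulateColumnCounts(const std::vector<float>& channel,
                                                  std::size_t width, std::size_t height,
                                                  unsigned threadCount = 4) {
    assert(channel.size() == width * height && threadCount > 0);
    std::vector<std::atomic<std::uint32_t>> counters(width * height);
    for (auto& c : counters) c.store(0, std::memory_order_relaxed);

    const float hmax = static_cast<float>(height - 1);
    auto worker = [&](std::size_t firstRow, std::size_t lastRow) {
        for (std::size_t row = firstRow; row < lastRow; ++row) {
            for (std::size_t x = 0; x < width; ++x) {
                const float v = channel[row * width + x];
                const auto y = static_cast<std::size_t>(
                    std::min(std::max(v * hmax, 0.0f), hmax));
                // Threads working on different rows can hit the same cell,
                // so the increment has to be atomic.
                counters[y * width + x].fetch_add(1, std::memory_order_relaxed);
            }
        }
    };

    // Split the image into bands of rows, one band per thread.
    std::vector<std::thread> threads;
    const std::size_t rowsPerThread = (height + threadCount - 1) / threadCount;
    for (std::size_t first = 0; first < height; first += rowsPerThread) {
        threads.emplace_back(worker, first, std::min(first + rowsPerThread, height));
    }
    for (auto& t : threads) t.join();

    std::vector<std::uint32_t> result(counters.size());
    for (std::size_t i = 0; i < counters.size(); ++i) {
        result[i] = counters[i].load(std::memory_order_relaxed);
    }
    return result;
}
```

Relaxed ordering is enough here because only the final totals matter, and they are read after all threads have been joined.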

In a second render pass, you take all those integer values and write the correct color information into the texture of the waveform diagram.

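ushort w = inTexture.get_width();
ushort h = inTexture.get_height();
ushort hmax = h - 1;

// Flip vertically so that high values end up at the top of the diagram.
uint y = (uint)(clamp((float)(hmax - gid.y), 0.0f, (float)hmax));
uint cid = (y * w) + gid.x;
uint red = atomic_load_explicit(columnDataRed + cid, memory_order_relaxed);
// green and blue are read the same way from their own counter buffers (not shown).
float4 out = float4(clamp(red / 5.0f, 0.0f, 1.0f),
                    clamp(green / 5.0f, 0.0f, 1.0f),
                    clamp(blue / 5.0f, 0.0f, 1.0f),
                    1.0f); // alpha of 1.0 is an assumption; the original snippet is cut off here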
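The mapping from counts to brightness in this second pass can be sketched on the CPU as well. The C++ function below handles one channel; `countsToIntensity` and `saturationCount` are names invented for this example, and the default of 5 simply mirrors the divisor in the shader snippet above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// `counts` is indexed as (y * width) + x, with y = 0 holding the lowest values.
// Returns one intensity in [0, 1] per diagram pixel, in row-major order.
std::vector<float> countsToIntensity(const std::vector<std::uint32_t>& counts,
                                     std::size_t width, std::size_t height,
                                     float saturationCount = 5.0f) {
    assert(counts.size() == width * height && saturationCount > 0.0f);
    std::vector<float> out(counts.size());
    for (std::size_t row = 0; row < height; ++row) {
        // Output row 0 reads the counters for the highest value (vertical flip).
        const std::size_t y = (height - 1) - row;
        for (std::size_t x = 0; x < width; ++x) {
            const float scaled =
                static_cast<float>(counts[y * width + x]) / saturationCount;
            // Cells hit `saturationCount` times or more are drawn at full intensity.
            out[row * width + x] = std::min(std::max(scaled, 0.0f), 1.0f);
        }
    }
    return out;
}
```

A fixed divisor keeps the pass trivial, but it also means the diagram saturates quickly on large images; scaling by image height would be one alternative to experiment with.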

This method takes only about twice as long as a normal color filter, which is not that bad.

The complete code on GitHub includes the Core Image filter, the CIImageProcessorKernel subclass that applies the shader to the Metal texture, and the shader code itself. The Core Image filter can be used like any other filter. Make sure to create the image from a Metal texture and to render the CIImage inside an MTKView subclass.
