Cython 3.0 was finally released on July 17, 2023 and it comes with many features. There are many exciting features such as a Pure Python Mode and Initial support for Python's Limited API. In this blog post, we use Cython 3.0's @cython.ufunc decorator to quickly generate a NumPy Universal function (UFunc) for fused multiply-add (fma). This blog post is runnable as a notebook on Google Colab.

Fused Multiply-Add and UFuncs¶

Fused multiply-add computes (x * y) + z as if to infinite precision and rounded once to fit the return type. As of August 15, 2023, fma is not a part of NumPy, but it is part of the C99 standard. With Cython 3.0, we can quickly generate a NumPy UFunc for fma:

import numpy as np
np_include = np.get_include()

%%cython -I$np_include
cimport cython
from libc.math cimport fma

@cython.ufunc
cdef double fma_ufunc(double x, double y, double z) nogil:
    return fma(x, y, z)

With only a @cython.ufunc decorator, we get a function that operates on NumPy arrays:

import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.5, 8.0])
c = np.array([-2.0, 3.0])
fma_ufunc(a, b, c)

Out[5]:

array([ 2.5, 19. ])

Since fma_ufunc is a UFunc, it has broadcasting out of the box! In the following example, a has shape (2,) and b_column has shape (1, 2), so the resulting matrix follows NumPy's broadcasting rules to return an array of shape (2, 2).

b_column = np.array([[4.0], [-3.0]])
fma_ufunc(a, b_column, -4.5)

Out[6]:

array([[ -0.5,   3.5],
       [ -7.5, -10.5]])

Moreover, NumPy's UFunc allows us to specific an output array to place the results in with the out keyword argument:

out = np.empty((2, 2))
fma_ufunc(a, b_column, -4.5, out=out)
out

Out[7]:

array([[ -0.5,   3.5],
       [ -7.5, -10.5]])

Depending on the use case, the out keyword argument can help with reducing the peak memory usage of your program.

Supporting Floats and Doubles¶

C99 defines a fused multiple-add for single and double precision floating point numbers. We use Cython's Fused Types to extend our UFunc to single precision floats:

%%cython -I$np_include
cimport cython
from cython cimport floating
from libc.math cimport fma, fmaf

@cython.ufunc
cdef floating fma_fused_ufunc(floating x, floating y, floating z) nogil:
    if floating is double:
        return fma(x, y, z)
    else:  # floating is float in this case
        return fmaf(x, y, z)

The floating fused type consist of single and double precision types, which we use to call the corresponding version of fused multiply-add from C. When all the input arrays are float32, then the resulting array is also float32:

a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([4.5, 8.0], dtype=np.float32)
c = np.array([-2.0, 3.0], dtype=np.float32)

fma_fused_ufunc(a, b, c)

Out[11]:

array([ 2.5, 19. ], dtype=float32)

Conclusion¶

An alternative approach is Numba's @vectorize decorator, which provides a Just in Time compiled UFunc. On the other hand, Cython provides a Ahead of Time compiled solution using the @cython.ufunc decorator to create a NumPy UFunc. In either case, you can make use of many UFunc's features such as array broadcasting or specifying an output array with the out keyword argument 😊. This blog post runnable as a notebook on Google Colab.

Quick NumPy UFuncs with Cython 3.0

Fused Multiply-Add and UFuncs¶

Supporting Floats and Doubles¶

Conclusion¶

Similar Posts