Cython 3.0 was finally released on July 17, 2023 and it comes with many features. There are many exciting features such as a Pure Python Mode and Initial support for Python's Limited API. In this blog post, we use Cython 3.0's @cython.ufunc
decorator to quickly generate a NumPy Universal function (UFunc) for fused multiply-add (fma). This blog post is runnable as a notebook on Google Colab.
Fused Multiply-Add and UFuncs¶
Fused multiply-add computes (x * y) + z
as if to infinite precision and rounded once to fit the return type. As of August 15, 2023, fma
is not a part of NumPy, but it is part of the C99 standard. With Cython 3.0, we can quickly generate a NumPy UFunc for fma
:
import numpy as np
np_include = np.get_include()
%%cython -I$np_include
cimport cython
from libc.math cimport fma
@cython.ufunc
cdef double fma_ufunc(double x, double y, double z) nogil:
return fma(x, y, z)
With only a @cython.ufunc
decorator, we get a function that operates on NumPy arrays:
import numpy as np
a = np.array([1.0, 2.0])
b = np.array([4.5, 8.0])
c = np.array([-2.0, 3.0])
fma_ufunc(a, b, c)
Since fma_ufunc
is a UFunc, it has broadcasting out of the box! In the following example, a
has shape (2,)
and b_column
has shape (1, 2)
, so the resulting matrix follows NumPy's broadcasting rules to return an array of shape (2, 2)
.
b_column = np.array([[4.0], [-3.0]])
fma_ufunc(a, b_column, -4.5)
Moreover, NumPy's UFunc allows us to specific an output array to place the results in with the out
keyword argument:
out = np.empty((2, 2))
fma_ufunc(a, b_column, -4.5, out=out)
out
Depending on the use case, the out
keyword argument can help with reducing the peak memory usage of your program.
Supporting Floats and Doubles¶
C99
defines a fused multiple-add for single and double precision floating point numbers. We use Cython's Fused Types to extend our UFunc to single precision floats:
%%cython -I$np_include
cimport cython
from cython cimport floating
from libc.math cimport fma, fmaf
@cython.ufunc
cdef floating fma_fused_ufunc(floating x, floating y, floating z) nogil:
if floating is double:
return fma(x, y, z)
else: # floating is float in this case
return fmaf(x, y, z)
The floating
fused type consist of single and double precision types, which we use to call the corresponding version of fused multiply-add from C
. When all the input arrays are float32
, then the resulting array is also float32
:
a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([4.5, 8.0], dtype=np.float32)
c = np.array([-2.0, 3.0], dtype=np.float32)
fma_fused_ufunc(a, b, c)
Conclusion¶
An alternative approach is Numba's @vectorize decorator, which provides a Just in Time compiled UFunc. On the other hand, Cython provides a Ahead of Time compiled solution using the @cython.ufunc
decorator to create a NumPy UFunc. In either case, you can make use of many UFunc's features such as array broadcasting or specifying an output array with the out
keyword argument 😊. This blog post runnable as a notebook on Google Colab.