Cython 3.0 was finally released on July 17, 2023 and it comes with many features. There are many exciting features such as a Pure Python Mode and Initial support for Python's Limited API. In this blog post, we use Cython 3.0's `@cython.ufunc`

decorator to quickly generate a NumPy Universal function (UFunc) for fused multiply-add (fma). This blog post runnable as a notebook on Google Colab.

## Fused Multiply-Add and UFuncs¶

Fused multiply-add computes `(x * y) + z`

as if to infinite precision and rounded once to fit the return type. As of August 15, 2023, `fma`

is not a part of NumPy, but it is part of the C99 standard. With Cython 3.0, we can quickly generate a NumPy UFunc for `fma`

:

```
import numpy as np
np_include = np.get_include()
```

```
%%cython -I$np_include
cimport cython
from libc.math cimport fma
@cython.ufunc
cdef double fma_ufunc(double x, double y, double z) nogil:
return fma(x, y, z)
```

With only a `@cython.ufunc`

decorator, we get a function that operates on NumPy arrays:

```
import numpy as np
a = np.array([1.0, 2.0])
b = np.array([4.5, 8.0])
c = np.array([-2.0, 3.0])
fma_ufunc(a, b, c)
```

Since `fma_ufunc`

is a UFunc, it has broadcasting out of the box! In the following example, `a`

has shape `(2,)`

and `b_column`

has shape `(1, 2)`

, so the resulting matrix follows NumPy's broadcasting rules to return an array of shape `(2, 2)`

.

```
b_column = np.array([[4.0], [-3.0]])
fma_ufunc(a, b_column, -4.5)
```

Moreover, NumPy's UFunc allows us to specific an output array to place the results in with the `out`

keyword argument:

```
out = np.empty((2, 2))
fma_ufunc(a, b_column, -4.5, out=out)
out
```

Depending on the use case, the `out`

keyword argument can help with reducing the peak memory usage of your program.

## Supporting Floats and Doubles¶

`C99`

defines a fused multiple-add for single and double precision floating point numbers. We use Cython's Fused Types to extend our UFunc to single precision floats:

```
%%cython -I$np_include
cimport cython
from cython cimport floating
from libc.math cimport fma, fmaf
@cython.ufunc
cdef floating fma_fused_ufunc(floating x, floating y, floating z) nogil:
if floating is double:
return fma(x, y, z)
else: # floating is float in this case
return fmaf(x, y, z)
```

The `floating`

fused type consist of single and double precision types, which we use to call the corresponding version of fused multiply-add from `C`

. When all the input arrays are `float32`

, then the resulting array is also `float32`

:

```
a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([4.5, 8.0], dtype=np.float32)
c = np.array([-2.0, 3.0], dtype=np.float32)
fma_fused_ufunc(a, b, c)
```

## Conclusion¶

An alternative approach is Numba's @vectorize decorator, which provides a Just in Time compiled UFunc. On the other hand, Cython provides a Ahead of Time compiled solution using the `@cython.ufunc`

decorator to create a NumPy UFunc. In either case, you can make use of many UFunc's features such as array broadcasting or specifying an output array with the `out`

keyword argument 😊. This blog post runnable as a notebook on Google Colab.