# Quick NumPy UFuncs with Cython 3.0

Cython 3.0 was finally released on July 17, 2023 and it comes with many features. There are many exciting features such as a Pure Python Mode and Initial support for Python's Limited API. In this blog post, we use Cython 3.0's `@cython.ufunc` decorator to quickly generate a NumPy Universal function (UFunc) for fused multiply-add (fma). This blog post runnable as a notebook on Google Colab.

Fused multiply-add computes `(x * y) + z` as if to infinite precision and rounded once to fit the return type. As of August 15, 2023, `fma` is not a part of NumPy, but it is part of the C99 standard. With Cython 3.0, we can quickly generate a NumPy UFunc for `fma`:

```import numpy as np
np_include = np.get_include()
```
```%%cython -I\$np_include
cimport cython
from libc.math cimport fma

@cython.ufunc
cdef double fma_ufunc(double x, double y, double z) nogil:
return fma(x, y, z)
```

With only a `@cython.ufunc` decorator, we get a function that operates on NumPy arrays:

```import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.5, 8.0])
c = np.array([-2.0, 3.0])
fma_ufunc(a, b, c)
```
Out:
`array([ 2.5, 19. ])`

Since `fma_ufunc` is a UFunc, it has broadcasting out of the box! In the following example, `a` has shape `(2,)` and `b_column` has shape `(1, 2)`, so the resulting matrix follows NumPy's broadcasting rules to return an array of shape `(2, 2)`.

```b_column = np.array([[4.0], [-3.0]])
fma_ufunc(a, b_column, -4.5)
```
Out:
```array([[ -0.5,   3.5],
[ -7.5, -10.5]])```

Moreover, NumPy's UFunc allows us to specific an output array to place the results in with the `out` keyword argument:

```out = np.empty((2, 2))
fma_ufunc(a, b_column, -4.5, out=out)
out
```
Out:
```array([[ -0.5,   3.5],
[ -7.5, -10.5]])```

Depending on the use case, the `out` keyword argument can help with reducing the peak memory usage of your program.

## Supporting Floats and Doubles¶

`C99` defines a fused multiple-add for single and double precision floating point numbers. We use Cython's Fused Types to extend our UFunc to single precision floats:

```%%cython -I\$np_include
cimport cython
from cython cimport floating
from libc.math cimport fma, fmaf

@cython.ufunc
cdef floating fma_fused_ufunc(floating x, floating y, floating z) nogil:
if floating is double:
return fma(x, y, z)
else:  # floating is float in this case
return fmaf(x, y, z)
```

The `floating` fused type consist of single and double precision types, which we use to call the corresponding version of fused multiply-add from `C`. When all the input arrays are `float32`, then the resulting array is also `float32`:

```a = np.array([1.0, 2.0], dtype=np.float32)
b = np.array([4.5, 8.0], dtype=np.float32)
c = np.array([-2.0, 3.0], dtype=np.float32)

fma_fused_ufunc(a, b, c)
```
Out:
`array([ 2.5, 19. ], dtype=float32)`

## Conclusion¶

An alternative approach is Numba's @vectorize decorator, which provides a Just in Time compiled UFunc. On the other hand, Cython provides a Ahead of Time compiled solution using the `@cython.ufunc` decorator to create a NumPy UFunc. In either case, you can make use of many UFunc's features such as array broadcasting or specifying an output array with the `out` keyword argument 😊. This blog post runnable as a notebook on Google Colab.

### Similar Posts

05/14/23
Accessing Data from Python's DataFrame Interchange Protocol
09/12/18
Survival Regression Analysis on Customer Churn
07/31/18
Nuclei Image Segmentation Tutorial
08/28/17
Rodents Of NYC
09/12/15
Bayesian Coin Flips