Sunday, March 3, 2024

The Array API Customary in SciPy


On this weblog put up I focus on how SciPy
is harnessing the facility of array-agnostic code to develop into extra interoperable, and why that issues.


Hello 👋 I am Lucas Colley, a Laptop Science & Philosophy scholar
from the UK.
This summer season, I’ve had the superb alternative of working beneath the mentorship of
Ralf Gommers, Pamphile Roy and
Irwin Zaid in my Open Supply internship at Quansight Labs.
My mission was to make progress in the direction of SciPy’s purpose of adopting help for the array API normal,
which is able to permit customers to make use of SciPy features with CuPy arrays, PyTorch tensors and probably
many different array/tensor libraries sooner or later.

This put up stands on the shoulders of giants, together with
Anirudh Dagar‘s weblog put up
“Array Libraries Interoperability”
(from which my illustration takes a lot inspiration!) and Aaron Meurer‘s
SciPy 2023 speak “Towards Array Interoperability within the Scientific Python Ecosystem”;
for particulars of wider interoperability efforts and the array API normal,
I urge you to take a look at these hyperlinks!

Right here, I method the subject from the targeted perspective of a SciPy contributor.
I hope that this put up can be useful for anybody who needs to contribute to SciPy’s efforts to help
the array API normal.


SciPy and the Scientific Python Ecosystem

SciPy is one a part of a large assortment of Open Supply instruments, written in Python,
that are elementary to a spread of labor which depends on manipulating numbers with computer systems.
This assortment is named the Scientific Python Ecosystem – a time period which broadly
encompasses Open Supply Python initiatives that are neighborhood pushed and used for science
(see scientific-python.org for extra info).

NumPy is the n-dimensional array library which SciPy is
constructed to work with – it’s SciPy’s solely non-optional runtime dependency.
On the time of writing, the overwhelming majority of SciPy features solely work when offering NumPy arrays,
regardless of the assorted interoperability efforts up to now
(see Anirudh’s weblog put up).
The primary drawback with that is that “NumPy supplies an array implementation that’s in-memory,
CPU-only and single-threaded.”
The potential efficiency enhancements that might be received with parallel algorithms, help for distributed
arrays and help for GPUs (and different {hardware} accelerators) are large, and thus “SciPy has had ‘help
for distributed and GPU arrays’ on its roadmap” for 5 years now.
For extra info, see
this web page on the use circumstances of the array API,
and the weblog put up: “A imaginative and prescient for extensibility to GPU & distributed help for SciPy, scikit-learn, scikit-image and past”
by Ivan Yashchuk and Ralf Gommers.

As a consequence of these efficiency needs, many alternative ‘array supplier’ libraries, comparable to CuPy and
PyTorch, have been developed, with help for a number of of those {hardware} use-cases in thoughts.
Nevertheless, as a consequence of the truth that ‘array shopper’ libraries are sometimes constructed to be appropriate with
just one array supplier library, many features have been reimplemented in numerous shopper libraries,
simply to work with a special supplier library.
In excessive circumstances, this has led to efforts to reimplement total shopper libraries,
for instance cupyx.scipy.
This fixed reimplementation of the identical performance has led many to see the Scientific Python Ecosystem
as quite being fractured into many smaller ecosystems, one constructed round every array library, and to dream for a
world the place the ecosystems are unified and all the things works collectively!

Whereas it might definitely be a really formidable purpose to purpose for all the things working collectively, the purpose
of accelerating interoperability between array libraries is one with widespread help.
There are at present many hurdles which customers can run into when navigating this fractured ecosystem.
They might discover that they wish to convert between totally different array supplier libraries of their initiatives, as a way to
make use of the complete vary of instruments which have been developed.
Nevertheless, for a lot of initiatives, frequent conversion between supplier libraries is a whole no-go: cross-device copying
(e.g. from GPU to CPU when changing from CuPy to NumPy) is usually a large efficiency hit, and different options, comparable to
computerized differentiation, will be misplaced in conversion.

For customers who resolve to make use of a special supplier library to benefit from efficiency advantages, they’ll
discover themselves needing to be taught new APIs to perform the identical duties, which are sometimes related however frustratingly
totally different to these which they’re used to.
There are additionally more likely to be gaps within the performance provided by the libraries, which may lead again to the issue
of changing between libraries, or require the person to search out new methods to finish the identical activity.
For big present initiatives, this makes the prospect of switching to a special supplier library a really daunting one.

On the developer aspect of issues, it looks as if there’s lots of wasted time in reimplementing the identical performance
for a lot of totally different supplier libraries.
Developer time may be very invaluable since these initiatives are nonetheless largely maintained by volunteers (and therefore giant
duties like including GPU help can keep on the roadmap for a few years), so work to scale back the general upkeep
load is vital to accelerating progress.
If each bit of performance solely wanted to be carried out as soon as and labored with any array library, then there would
be many fewer initiatives to keep up, and way more time to work on implementing model new performance.
Thankfully, there was work on an ordinary which is able to assist push us in the direction of this imagined future.


The Array API Customary

The Consortium for Python Knowledge API Requirements has created
the Python array API normal, which is a “specification for what it means for a
library to have an ‘array API’” (see Aaron’s speak).
The Consortium was shaped in 2020 by a bunch of maintainers and trade stakeholders – see
the paper accompanying Aaron’s speak for
extra info.
Array shopper libraries can write ‘array-agnostic’ code which is appropriate with this specification,
quite than any explicit supplier library’s API, and it’ll work with any supplier library which complies with the
normal.
NumPy, CuPy and PyTorch are all planning to be absolutely compliant – with this being certainly one of
the targets for NumPy 2.0 – and JAX and Dask even have implementations
in progress.
The consortium has additionally developed array-api-compat,
a small wrapper round every array library (at present NumPy, CuPy and PyTorch), which, for sensible functions,
is absolutely compliant with the usual.

Which means array shopper libraries are in a position to begin writing and testing array-agnostic code at present.
For a NumPy-consuming library like SciPy, this has opened the chance to implement help for CuPy and
PyTorch by including help for the usual through array-api-compat.

SciPy help for the array API

Earlier than these interoperability efforts began, SciPy didn’t work properly with supplier libraries apart from
NumPy in any respect (as expressed by
this superb illustration
from Anirudh’s weblog put up).
For instance, when making an attempt to make use of scipy.fft.fft with a CuPy array, customers can be hit with this error message:

>>> scipy.fft.fft(cp.exp(2j * cp.pi * cp.arange(8) / 8))

KeyError: 'ALIGNED shouldn't be outlined for cupy.ndarray.flags'

Adopting the usual is step one in the direction of a future the place customers who’ve packages made up of
SciPy features can merely change the supplier library used for his or her enter information, and have the remainder of the
program nonetheless work the identical, whereas benefiting from any advantages of the brand new supplier library.

To kick off the method of this work, the request-for-comment (RFC) problem
“SciPy array varieties & libraries help” was opened.
This set out the essential design precept of “container sort in == container sort out“, in addition to a
detailed plan for the best way to deal with totally different array inputs, and the event technique.
To summarise, the scope of the work is to deal with all array inputs as specified within the RFC and to transform all
present pure Python + NumPy code to be array API appropriate.
Out of scope is changing the C/C++/Cython/Fortran code which is used inside SciPy.
Behind the scenes, SciPy supplies Python APIs for low degree features that are written in different languages and for the
CPU solely.
These APIs are made to work with NumPy arrays – they may positively not work for arrays on a special system to the CPU,
and usually are not assured to work with different varieties of array on the CPU.
For these features which use compiled code, some dispatching mechanism looks as if the fitting answer, however no explicit
implementation or improvement technique has been settled on but – see the dialogue of uarray in
Anirudh’s weblog put up and
the RFC for extra info.

Close to the beginning of my internship,
the primary PR to transform a SciPy submodule, by Pamphile Roy, certainly one of my mentors,
was merged, with scipy.cluster being lined (modulo a number of minor follow-ups).
This PR added lots of equipment (which I’ll discover in additional element later) to allow conversion and testing,
whereas retaining the brand new behaviour behind an experimental surroundings variable.

My fundamental purpose was to transform scipy.fft, SciPy’s Quick Fourier Remodel module.
fft is an
array API normal extension module,
which signifies that array supplier libraries supporting the usual can select to implement it or not.
scipy.fft‘s backend is generally written in C++, however due to the extension module, we have been in a position to convert all
of the usual extension features to be appropriate with all libraries which:

  1. adjust to the usual
  2. implement the fft extension

After the primary PR merged, the flexibility to make use of CuPy arrays and PyTorch tensors
with the usual extension features was enabled when the experimental surroundings variable SCIPY_ARRAY_API is about to 1.
For instance, the code within the following benchmark – the duty of smoothing a picture (taking a 2-D picture,
utilizing scipy.fftn, zero-ing out the best frequency elements, and utilizing scipy.ifftn) utilizing CuPy
arrays – now works:

cpface = cp.asarray(face)

f = scipy.fft.fftn(cpface)

fshift = scipy.fft.fftshift(f)

unique = cp.copy(fshift)

fshift[354:414, 482:532] = 0

f_ishift = scipy.fft.ifftshift(unique - fshift)

x = scipy.fft.ifftn(f_ishift)

On my machine (AMD Ryzen 5 2600 & NVIDIA GeForce GTX 1060 6GB), this demonstrated a ~15x efficiency enchancment over
utilizing NumPy arrays in the identical features.
You possibly can see the complete benchmark on the PR.

Finishing this PR wasn’t with out issue, as a number of SciPy’s model new array API testing infrastructure
wanted enhancements within the course of.
I contributed fairly a number of PRs to assist make this testing infrastructure extra strong and squash some bugs;
yow will discover all of my merged contributions regarding array varieties
on this filtered view of SciPy’s PR tracker.
This was additionally my first ever massive contribution to Open Supply, so there was a big (however very rewarding!) studying
course of with Git, GitHub and collaboration with reviewers.
Within the subsequent part, I am going to dive into the method of changing scipy.fft and share among the classes realized
for changing different submodules.


Tips on how to convert a submodule

I am going to begin by displaying what the mandatory adjustments seem like, earlier than explaining in additional element how among the
equipment works.
Manufacturing features will be divided into three classes, every with totally different conversion steps:
these which simply use pure Python and NumPy, these which use compiled code, and people which implement features
present in normal extension modules.

Pure Python + NumPy code

For features that are made up of pure Python and NumPy, the conversion course of is admittedly fairly easy!
It includes evaluating the at present used features/strategies with these in
the usual API specification
and mimicking the present behaviour.
The important thing step is the addition of xp = array_namespace(x, y), the place x and y (and so forth for extra inputs)
are the enter arrays.
This line is used at the beginning of each transformed operate, and permits us to make use of the features/strategies which
are appropriate with the arrays with out re-importing something!
The remainder of the adjustments are principally changing makes use of of np with xp, alongside making changes the place the
normal API differs from the NumPy API.

For instance, listed below are the modifications made to scipy.fft.fht:

def fht(a, dln, mu, offset=0.0, bias=0.0):

+ xp = array_namespace(a)

+ n = a.form[-1] # instance of a change the place the usual affords just one means of doing issues, whereas NumPy has extra methods

# a_q(r) = a(r) (r/r_c)^{-q}

- a = a * np.exp(-bias*(j - j_c)*dln)

+ j = xp.arange(n, dtype=xp.float64)

+ a = a * xp.exp(-bias*(j - j_c)*dln)

# compute FHT coefficients

- u = fhtcoeff(n, dln, mu, offset=offset, bias=bias)

+ u = xp.asarray(fhtcoeff(n, dln, mu, offset=offset, bias=bias))

# A(ok) = A_q(ok) (ok/k_c)^{-q} (k_c r_c)^{-q}

- A *= np.exp(-bias*((j - j_c)*dln + offset))

+ A *= xp.exp(-bias*((j - j_c)*dln + offset))

Compiled code

For features which hit compiled code, the usual is not sufficient anymore.
For now, we try and convert to NumPy utilizing np.asarray earlier than the compiled code, then convert again to our
array’s namespace utilizing xp.asarray after it.
This may increase exceptions for arrays on a special system to the CPU, as justified in
the RFC (tldr: we wish to keep away from silent system transfers).

Right here is an instance from scipy.fft‘s Discrete Sine and Cosine Transforms, the place _pocketfft is the backend
which SciPy makes use of for NumPy arrays:

def _execute(pocketfft_func, x, sort, s, axes, norm,

overwrite_x, staff, orthogonalize):

y = pocketfft_func(x, sort, s, axes, norm,

overwrite_x=overwrite_x, staff=staff,

orthogonalize=orthogonalize)

def dctn(x, sort=2, s=None, axes=None, norm=None,

overwrite_x=False, staff=None, *, orthogonalize=None):

return _execute(_pocketfft.dctn, x, sort, s, axes, norm,

overwrite_x, staff, orthogonalize)

As a aspect word, there has additionally been work to help the calling of some particular backends as a substitute of hitting
compiled code in scipy.particular (in this PR),
which allows GPU help for a minimum of some supplier libraries.
I’ll preserve this weblog put up targeted on simply array-agnostic code since this work in particular ought to finally be
changed by a extra strong dispatching mechanism, whereas the array-agnostic code is right here to remain!

Utilizing normal extension modules

Regardless of this limitation on GPU help, as talked about above, fft is an ordinary extension module, so the usual can nonetheless assist us with some
compiled code!
For the features within the extension, we are able to verify whether or not the array’s namespace implements the extension with
if hasattr(xp, 'fft'), and use xp.fft if that’s the case.
Since NumPy has an fft submodule, we must be cautious to nonetheless use SciPy’s implementation for NumPy arrays,
for which we’ve got the is_numpy helper.

Right here is how that appears for scipy.fft.fft:

def _execute_1D(func_str, pocketfft_func, x, n, axis, norm, overwrite_x, staff, plan):

return pocketfft_func(x, n=n, axis=axis, norm=norm,

overwrite_x=overwrite_x, staff=staff, plan=plan)

norm = _validate_fft_args(staff, plan, norm)

xp_func = getattr(xp.fft, func_str)

return xp_func(x, n=n, axis=axis, norm=norm)

y = pocketfft_func(x, n=n, axis=axis, norm=norm)

def fft(x, n=None, axis=-1, norm=None,

overwrite_x=False, staff=None, *, plan=None):

return _execute_1D('fft', _pocketfft.fft, x, n=n, axis=axis, norm=norm,

overwrite_x=overwrite_x, staff=staff, plan=plan)

Assessments

After changing the manufacturing features, we additionally have to convert the exams.

We’ve the @array_api_compatible decorator, which parametrizes xp with the
out there backends that we’ve got for testing.
We’ve additionally written three new array-agnostic assertions – xp_assert_close, xp_assert_equal and xp_assert_less
which incorporate checks for matching namespaces and dtypes.
If any exams use outcomes or enter information from NumPy, we convert all the things to the xp namespace earlier than utilizing
these assertions.
The assertions make use of features from xp the place doable which helps to minimise the quantity of system transfers.

Apart from that, we make related adjustments to these within the manufacturing features so that each check works
with all the backends with which we check.
Lastly, for exams of features which hit compiled code, we add the @skip_if_array_api_gpu decorator.

Here’s what that appears like for one of many Discrete Sine and Cosine Transforms exams:

@pytest.mark.parametrize("func", ['dct', 'dst', 'dctn', 'dstn'])

@pytest.mark.parametrize("sort", [1, 2, 3, 4])

@pytest.mark.parametrize("norm", [None, 'backward', 'ortho', 'forward'])

- def test_fftpack_equivalience(func, sort, norm):

+ def test_fftpack_equivalience(func, sort, norm, xp):

x = np.random.rand(8, 16)

+ fftpack_res = xp.asarray(getattr(fftpack, func)(x, sort, norm=norm))

fft_res = getattr(fft, func)(x, sort, norm=norm)

- fftpack_res = getattr(fftpack, func)(x, sort, norm=norm)

- assert_allclose(fft_res, fftpack_res)

+ xp_assert_close(fft_res, fftpack_res)

Enabling and testing with non-NumPy arrays

I am going to speak a bit about implementation particulars now – be at liberty to skip this part in case you are simply right here for
an summary!

After we name array_namespace, just a little extra is finished than simply returning the namespace.
If the surroundings variable SCIPY_ARRAY_API shouldn’t be set, array_api_compat.numpy
(the appropriate wrapper for numpy) is returned, no matter the kind of the enter array.
This permits us to retain all pre-array API behaviour until the person has set the surroundings variable,
for the reason that arrays are caught by our if is_numpy(xp) code path.
If it is not set, we increase exceptions for known-bad subclasses
(as set out in the RFC),
like NumPy MaskedArrays and object arrays.
After that, we retrieve the namespace utilizing array_api_compat.array_namespace.

The present testing infrastructure permits us to check with PyTorch (CPU & GPU) and CuPy by means of the command line.
For instance, python dev.py check -b pytorch exams with PyTorch, and python dev.py check -b all exams with
NumPy, numpy.array_api, CuPy and PyTorch.
numpy.array_api is a strict minimal implementation of the
normal, which permits us to ensure that our array-agnostic code is actually array-agnostic –
if our exams move for numpy.array_api, then they may move for any absolutely compliant library.
Testing with PyTorch on totally different gadgets will be enabled by setting the surroundings variable SCIPY_DEVICE to
totally different values: 'cuda' for Nvidia GPUs or 'mps' for MacOS gadgets with the Metallic framework.

For extra info on the implementation particulars, see
the SciPy “Help for Array API” documentation.


Standing

On the time of writing, SciPy has two of its submodules, cluster and fft, transformed
(within the sense I’ve described above).
linalg and sign have partial protection in progress, and among the compiled-code features in particular
have been lined, with further calling-out to CuPy, PyTorch and JAX.
The tracker problem
must be stored updated with progress as extra submodules are transformed.

We at present help CuPy and PyTorch by means of array-api-compat, with Dask and JAX on the horizon.
The nice factor is that every time a supplier library complies with the usual, it can work with our
transformed submodules, with out SciPy even needing to know that it exists!

Trying to the long run

I’ve opened the primary PR for scipy.linalg
which covers the features that are a part of
the linalg normal extension module.
Within the brief time period, we hope to see PRs masking the remainder of the submodules and continued refinement of our
testing infrastructure.
Wanting additional forward for SciPy, as soon as all libraries which we wish to help adjust to the usual, we are able to drop
array-api-compat and get the namespaces straight from the enter arrays (through the __array_namespace__ attribute).

As talked about above, a common dispatching mechanism will hopefully materialise down the road to permit full
help for all features which use compiled code. Nonetheless, even with out it, quite a lot of interoperability
is able to be achieved now!


Lastly, I might wish to say some phrases about my expertise as an intern at Quansight Labs and as a member of the
SciPy neighborhood.

Going from having by no means contributed to Open Supply to feeling proficient as a contributor, and having
vital contributions merged in the direction of a bleeding-edge mission, has been a difficult however very particular
expertise!
Common conferences with my mentors and the neighborhood meant that I used to be all the time in a position to ask questions once I was
confused, and be taught in regards to the array varieties mission and SciPy as an entire.
At Quansight Labs, I used to be a part of a cohort of interns and we additionally met often to debate our achievements and
clarify the place we have been caught.
Typically, simply placing your difficulties into phrases may help you work issues out from a special perspective!

After becoming a member of the SciPy neighborhood as a part of my internship, I’m eager to contribute voluntarily sooner or later
and assist to maintain science on this means.
Understanding that my code can be utilized in a package deal as extensively relied on as SciPy is an enormous motivation, and I’m
invested within the success of the array API normal now!

Acknowledgements

I wish to thank my mentors Ralf Gommers,
Pamphile Roy and Irwin Zaid for his or her
unimaginable help and steerage throughout my internship!
I really feel very fortunate to have had the chance to be taught from them in such a sensible means.
I’d additionally wish to thank the SciPy neighborhood, particularly
Tyler Reddy and Matt Haberland
who’ve been tremendously concerned within the overview strategy of lots of my PRs, and
Melissa Weber Mendonça who hosted lots of the neighborhood conferences which I
attended (in addition to coordinating the internship program at Quansight Labs!).

References

Related Articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Latest Articles