ENH: asarray for Array API support #24

Closed
wants to merge 90 commits into from

Conversation

@tupui tupui commented Apr 21, 2023

Add helper functions to support Array API.

I've taken some inspirations from the open PR we had on SciPy and what is done in sklearn.

  • namespace_from_arrays is responsible for getting the namespace. As we want to use this alongside asarray, it wraps array_api_compat.array_namespace, which would otherwise raise for array-like objects such as Python lists or scalars.
x, y = [0, 1, 2], np.arange(3)
xp = namespace_from_arrays(x, y)
xp.__name__
# 'array_api_compat.numpy'
  • asarray is a drop-in replacement that we could use throughout our code base. We use things like order and dtype; sklearn's version seems to handle those well and also adds copy.
x_, y_ = asarray([0, 1, 2], xp=xp), asarray(np.arange(3), xp=xp)
x_, y_
# (array([0, 1, 2]), array([0, 1, 2]))

asarray_namespace is a first attempt at a function that does it all. It seems to be working for now, but needs work to make it efficient. (We could also decide that it is not needed at all.)

x, y, xp = asarray_namespace(x, y)
xp.__name__
# 'array_api_compat.numpy'
x, y
# (array([0, 1, 2]), array([0, 1, 2]))
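The fallback behaviour described above could be sketched roughly like this. This is a hypothetical illustration, not the PR's actual code; it assumes array_api_compat is available and falls back to a NumPy stub when it is not:

```python
import numpy as np

try:
    from array_api_compat import array_namespace as _array_namespace
except ImportError:
    # Stub so the sketch runs without array_api_compat installed.
    def _array_namespace(*arrays):
        if all(isinstance(a, np.ndarray) for a in arrays):
            return np
        raise TypeError("unrecognized array input")


def namespace_from_arrays(*arrays):
    """Return the arrays' common namespace, defaulting to NumPy.

    array_namespace raises TypeError for array-likes such as lists or
    scalars; here we catch that and fall back to NumPy instead.
    """
    try:
        return _array_namespace(*arrays)
    except TypeError:
        return np


def asarray_namespace(*arrays):
    """Convert all inputs with the namespace's asarray; return arrays + xp."""
    xp = namespace_from_arrays(*arrays)
    return *(xp.asarray(a) for a in arrays), xp


x, y, xp = asarray_namespace([0, 1, 2], np.arange(3))
```

Because a plain list is among the inputs, the wrapped array_namespace raises and the sketch falls back to NumPy, matching the behaviour shown in the snippets above.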

I still need to go through the RFC and see how to handle all cases.

There might be other things to borrow from sklearn (e.g. they have _ArrayAPIWrapper). For now I would not bother with things like their _NumPyAPIWrapper, as this will not get into this release, and by the time we do have something here we might already require NumPy >= 1.22. (EDIT: well, we might need something like that to have concat and the like.)

cc @rgommers

@rgommers

Looks like a good start! I'd try adding masked array validation and the temporary global switch next - that will give a pretty good idea of the usage pattern already.

@tupui

tupui commented Apr 21, 2023

Ok I will do that 👍

namespaces = set()
for array in arrays:
    try:
        namespaces.add(array_api_compat.array_namespace(array))

array_api_compat.array_namespace already accepts multiple arrays. Is the issue that there isn't a way to specify numpy as a default?

Owner Author

Yes, I would need the function to not raise but return NumPy. This is because we want to accept things like Python lists or scalars.

I think we can add a flag to array_namespace like default=numpy that makes it do this. That function's not part of the spec, so we can adjust it in whatever way makes it most useful.

Owner Author

That would help with our use case 😃

Quick question, should this really fail?

array_api_compat.array_namespace(np.array(1), numpy.array_api.asarray(1))

This is why I have the following to work around it and convert numpy.array_api to array_api_compat.numpy:

  if numpy.array_api in namespaces:
      namespaces.remove(numpy.array_api)
      namespaces.add(array_api_compat.numpy)

And in general any compat code that's useful to more than one library can go in the compat library (assuming it's not too complex and pure Python).

Owner Author

Ok, I can do a PR for that if you want.

I wouldn't try mixing numpy and numpy.array_api. numpy.array_api should only be used for testing purposes. It implements a strict version of the standard, so you can use it to check that you are staying within the spec. It shouldn't be used for actual user code.

The whole purpose of the compat library is to provide sufficient wrappers around numpy itself to make it array API compatible. Using numpy.array_api for user code was found to be too challenging because it uses a different array class from NumPy, which is not what most users want.

Owner Author

Mmm ok, so when I test, I should consider numpy.array_api to be something different from numpy, such as, say, CuPy. Makes sense to me 👍 Thanks for the explanations Aaron 😃

@asmeurer

I think some of this stuff can be upstreamed to array-api-compat.

@tupui

tupui commented Apr 26, 2023

I added a functional global switch, a to_numpy for Cython and some error logic for masked arrays and matrix.

Next I was planning on looking at scipy.cluster and how to actually use this. Unless there is something else I would do first?

xp_name = xp.__name__

if xp_name in {"array_api_compat.torch", "torch"}:
    return array.cpu().numpy()

You don't want to do this. Just using np.asarray(array) is better. As a rule, never do silent device transfers from GPU to CPU or vice versa. This is true for CuPy too - that should simply error.

  • This is especially useful to pass arrays to Cython.

This conversation was specifically about PyTorch CPU tensors. There may be other CPU array libraries where this works. But basically, this function doesn't seem needed; in any code where you planned to use it, I think you want np.asarray instead.
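The rule above could be sketched as a small guard (an illustrative sketch, not SciPy's code; the hypothetical name and the assumption that GPU arrays raise TypeError or RuntimeError from np.asarray are mine):

```python
import numpy as np

def to_numpy_strict(array):
    """Convert to a NumPy array without any silent device transfer.

    np.asarray handles CPU arrays (including CPU torch tensors, via the
    __array__ protocol); GPU arrays are expected to raise instead of
    being copied across devices implicitly.
    """
    try:
        return np.asarray(array)
    except (TypeError, RuntimeError) as exc:
        raise TypeError(
            "refusing to convert a device array to NumPy implicitly; "
            "move it to the host explicitly first"
        ) from exc
```

The point is that the error path stays an error: the caller is forced to do the device transfer explicitly rather than having the helper hide it.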

Owner Author

Ok I will remove 👍 This was in sklearn so I thought this was more "optimal" to do the conversion using that.

Hmm interesting. @thomasjpfan I thought you preferred exceptions?

@thomasjpfan thomasjpfan Apr 27, 2023

Scikit-learn has a function to do device transfers strictly for testing purposes; it is not part of the public API. Scikit-learn checks that the GPU implementation gives similar results to the CPU implementation.

This type of testing does not have full coverage, but it's nice to have some checks to make sure the GPU code paths do something reasonable.

Ah yes, for testing it makes sense indeed. As long as we make sure not to accidentally introduce it into the code base outside of testing.

I added some utilities for testing back on the host from the device in my branch; my approach was quite different though. Perhaps I can try to make a PR to this branch.

@rgommers rgommers left a comment

Thanks Pamphile. I added a few comments about the behavior, those should not be hard to address I think - overall this seems in pretty good shape already.

  • Next I was planning on looking at scipy.cluster and how to actually use this

Yes, that seems like a good next step.

@tupui tupui left a comment

Thanks Ralf. I will make the changes and then move on to cluster 🤞


    """
    if xp is None:
        xp = array_namespace(array)
    if xp.__name__ in {"numpy", "array_api_compat.numpy", "numpy.array_api"}:

No one's actually doing it yet as far as I know, but this wouldn't work if someone vendors array_api_compat and tries to call a scipy function.

Also I don't know if it makes sense to list numpy.array_api here. That namespace is designed to only support a strict implementation of the standard, which doesn't include order.

For scikit-learn, we wanted the performance with numpy.array_api to be the same as numpy. When one explicitly sets the order, there is usually a performance reason for doing so. For example:

def scipy_func(X):
    xp = array_namespace(X)
    
    # switch order for performance reasons
    X_f = asarray(X, xp, order="F")
    
    # Do some operations that prefer F-ordered data.
    return xp.sum(X_f, axis=0)

By the way something like

_X = numpy.asarray(..., order="F")
X = numpy.array_api.asarray(_X)

will also work. That's maybe a little more "spec compliant" in the sense that converting arrays from one library to another with asarray is supported. In this case it's a trivial zero-copy wrapping but in general it will use DLPack (although I don't know how DLPack handles order, so maybe someone could confirm whether this would actually work in a more general setting).


Actually that's wrong. I thought asarray in the spec used dlpack, but it's only numpy.asarray that does. In the spec you have to use from_dlpack (I'm not sure why they are separate).
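For NumPy-to-NumPy the hand-off is zero-copy either way; a minimal illustration of the spec-style route (np.from_dlpack requires NumPy >= 1.22):

```python
import numpy as np

x = np.arange(3)
y = np.from_dlpack(x)  # spec-style hand-off between array libraries

# DLPack exports a view rather than a copy, so y shares memory with x
x[0] = 7
```

Since DLPack carries strides, an F-ordered array round-trips with its layout intact in the NumPy-to-NumPy case; whether every consumer library preserves that is a separate question.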

@tupui

tupui commented May 1, 2023

What should we do with _asarray_validated? It's used 69 times throughout 6 modules.

I think the simplest would be to call asarray_namespace within _asarray_validated. It would behave like asarray_namespace when the flag is on; otherwise it would respect the current behaviour (with the addition that it would also return the namespace).

Alternatively, we could add the checks from _asarray_validated into asarray_namespace. But that would be bulky, so a wrapper would probably look better. Still not optimal, I think.

What do you think?

@thomasjpfan

  • I think the simplest would be to call asarray_namespace within _asarray_validated.

I would go with this option to keep asarray_namespace simple. In this case, _asarray_validated needs to be updated to work with Array API. For example, np.asarray_chkfinite needs to be re-implemented using Array API.
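A rough sketch of what re-implementing the check with the array API could look like. This is illustrative only, not SciPy's actual code, and the function name is hypothetical; the idea is to replace np.asarray_chkfinite with the namespace's asarray plus an explicit finiteness check:

```python
import numpy as np

def asarray_chkfinite_xp(a, xp=np):
    """asarray plus a finiteness check, written against the array API.

    Uses only functions in the standard (asarray, isfinite, all), so any
    conforming namespace can be passed as xp; NumPy is the default here.
    """
    a = xp.asarray(a)
    if not xp.all(xp.isfinite(a)):
        raise ValueError("array must not contain infs or NaNs")
    return a
```

Unlike np.asarray_chkfinite this makes an extra pass over the data, so the places among the 69 call sites that are performance-sensitive might want to skip the check when check_finite is off, as _asarray_validated already does.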

@tupui

tupui commented Jun 13, 2023

  • Changes of assert_equal to assert_allclose: the latter uses a default relative tolerance of 1e-7, which is quite loose. I'd expect that if exact equality is dropped, this would use an rtol in the 1e-12 to 1e-15 range in most cases.

I just changed that because otherwise there was an issue about boolean comparison. I can try to change rtol to be stricter 👍

  • The if xp.__name__ in {"array_api_compat.torch", "torch"}: special-casing: is it not possible to use xp.matmul unconditionally (also, it should be used as a function, not a method)?

Actually yes, just using xp.matmul did the trick and no specialisation is needed 😃

Ok I will make the PR now. Thank you all for helping out and see you on the other side 😉

@rgommers

  • Ok I will make the PR now. Thank you all for helping out and see you on the other side 😉

🎉 🚀

@tupui

tupui commented Jun 22, 2023

Just closing this on my fork for bookkeeping on my side.

@tupui tupui closed this Jun 22, 2023
5 participants