-
Notifications
You must be signed in to change notification settings - Fork 3
ENH: asarray
for Array API support
#24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Looks like a good start! I'd try adding masked array validation and the temporary global switch next - that will give a pretty good idea of the usage pattern already. |
Ok I will do that 👍 |
scipy/_lib/_array_api.py
Outdated
namespaces = set() | ||
for array in arrays: | ||
try: | ||
namespaces.add(array_api_compat.array_namespace(array)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
array_api_compat.array_namespace already accepts multiple arrays. Is the issue that there isn't a way to specify numpy as a default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I would need the function to not raise but return NumPy. This is because we want to accept things like Python lists or scalar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can add a flag to array_namespace
like default=numpy
that makes it do this. That function's not part of the spec so we can adjust it however makes it most useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would help with our use case 😃
Quick question, should this really fail?
array_api_compat.array_namespace(np.array(1), numpy.array_api.asarray(1))
This is why I have the following to go around and convert numpy.array_api
to array_api_compat.numpy
if numpy.array_api in namespaces:
namespaces.remove(numpy.array_api)
namespaces.add(array_api_compat.numpy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And in general any compat code that's useful to more than one library can go in the compat library (assuming it's not too complex and pure Python).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I can do a PR for that if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't try mixing numpy and numpy.array_api. numpy.array_api should only be used for testing purposes. It implements a strict version of the standard so you can use it to check that you are saying within the spec. It shouldn't be used for actual user code.
The whole purpose of the compat library is to provide sufficient wrappers around numpy itself to make it array API compatible. Using numpy.array_api for user code was found to be too challenging because it uses a different array class from NumPy, which is not what most users want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm ok so when I test, I should consider that numpy.array_api
is something different than numpy
such as lets say cupy
. Makes sense to me 👍 Thanks for the explanations Aaron 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asmeurer I opened data-apis/array-api-compat#39
I think some of this stuff can be upstreamed to array-api-compat. |
I added a functional global switch, a Next I was planning on looking at |
scipy/_lib/_array_api.py
Outdated
xp_name = xp.__name__ | ||
|
||
if xp_name in {"array_api_compat.torch", "torch"}: | ||
return array.cpu().numpy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't want to do this. Just using np.asarray(array)
is better. As a rule, never do silent device transfers from GPU to CPU or vice verse. This is true for CuPy too - that should simply error.
This is specially useful to pass arrays to Cython.
This conversation was specifically about PyTorch CPU tensors. There may be other cpu array libraries where this works. But basically, this function doesn't seem needed, in any code where you planned to use this I think you want np.asarray
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will remove 👍 This was in sklearn so I thought this was more "optimal" to do the conversion using that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting. @thomasjpfan I thought you preferred exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scikit-learn has a function to do device transfer strictly for testing purposes and is not part of public API. Scikit-learn checks that the GPU implementation gives similar results as the CPU implementation.
This type of testing does not have full coverage, but it's nice to have some checks to make sure the GPU code paths does something reasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, for testing it makes sense indeed. As long as we figure out how to make sure to not accidentally introduce it into the code base outside of testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some utilities for testing back on the host from the device in my branch.. my approach was quite different though.. perhaps I can try to make a PR to this branch but...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Pamphile. I added a few comments about the behavior, those should not be hard to address I think - overall this seems in pretty good shape already.
Next I was planning on looking at
scipy.cluster
and how to actually use this
Yes, that seems like a good next step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ralf. I will make the changes and then move on to cluster
🤞
scipy/_lib/_array_api.py
Outdated
xp_name = xp.__name__ | ||
|
||
if xp_name in {"array_api_compat.torch", "torch"}: | ||
return array.cpu().numpy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I will remove 👍 This was in sklearn so I thought this was more "optimal" to do the conversion using that.
""" | ||
if xp is None: | ||
xp = array_namespace(array) | ||
if xp.__name__ in {"numpy", "array_api_compat.numpy", "numpy.array_api"}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No one's actually doing it yet as far as I know, but this wouldn't work if someone vendors array_api_compat and tries to call a scipy function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I don't know if it makes sense to list numpy.array_api
here. That namespace is designed to only support a strict implementation of the standard, which doesn't include order
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For scikit-learn, we wanted the performance with numpy.array_api
to be the same as numpy
. When one explicitly sets the order, there is usually a performance reason for doing so. For example:
def scipy_func(X):
xp = array_namespace(X)
# switch order for performance reasons
X_f = asarray(X, xp, order="F")
# Do some operations that require prefer F ordered.
return xp.sum(X_f, axis=0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way something like
_X = numpy.asarray(..., order="F")
X = numpy.array_api.asarray(X)
will also work. That's maybe a little more "spec compliant" in the sense that converting arrays from one library to another with asarray is supported. In this case it's a trivial zero-copy wrapping but in general it will use DLPack (although I don't know how DLPack handles order, so maybe someone could confirm whether this would actually work in a more general setting).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that's wrong. I thought asarray in the spec used dlpack, but it's only numpy.asarray that does. In the spec you have to use from_dlpack (I'm not sure why they are separate).
What should we do with I think the simplest would be to call Alternatively, we could add the check What do you think? |
I would go with this option to keep |
I just changed that because otherwise there was an issue about boolean comparison. I can try to change
Actually yes, just using Ok I will make the PR now. Thank you all for helping out and see you on the other side 😉 |
🎉 🚀 |
[skip cirrus] [skip circle]
This reverts commit 4787f50.
[skip cirrus] [skip circle]
[skip cirrus] [skip circle]
[skip circle] [skip cirrus]
[skip ci]
[skip cirrus] [skip circle]
[skip cirrus] [skip circle]
Just closing this on my fork for book keeping on my side. |
Add helper functions to support Array API.
I've taken some inspirations from the open PR we had on SciPy and what is done in sklearn.
namespace_from_arrays
is responsible of getting the namespace. As we want to use this alongasarray
, it wrapsarray_api_compat.array_namespace
which would otherwise raise for array like objects such as Python lists or scalars.asarray
is a drop-in replacement that we could use throughout our code-base. We are using things likeorder
anddtype
, sklearn's version seems good at taking care of that and also addscopy
.asarray_namespace
is a first attempt at having a function which does it all. It seems to be working for now, but needs work to make it efficient. (We can also say that this is not needed at all.)I still need to go through the RFC and add see how to handle all cases.
There might be other things to borrow from sklearn (e.g. they have
_ArrayAPIWrapper
, for now I would not bother with things like their_NumPyAPIWrapper
as this will not get into this release and by the time we do have something here, we might just already have NumPy >= 1.22. EDIT: well might need something like that to haveconcat
and alike.)cc @rgommers