-
Notifications
You must be signed in to change notification settings - Fork 34.5k
Transfer notebook output as VSBuffer #130452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@mjbvz this looks really promising! I played around with it a bit with a 17MB ipynb file (one single text output) and found that the deserizeRequest regressed compared to previous impl, not sure if we miss anything there. It made me think avoiding string/byte transformation is worth a try. |
@rebornix Thanks taking a look. Can you please share the notebook you were using for testing? I tested using the GitHub Issues notebook with a large query. Here's what I see before applying the change: And after using the same query: This was collected by profiling the renderer for a request that includes two buffers: a 6MB buffer and an 2MB buffer |
7590b6a
to
3cd6e5a
Compare
@mjbvz the time for parsing buffer on IPC is almost gone now, this is a huge improvement to the notebook file opening, thanks and great work! |
3cd6e5a
to
c167d33
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first implementation of the RPC was marshalling and unmarshalling URIs and regular expressions transparently. This was done by always using a replacer with JSON.stringify
and a recursive walk over the object after JSON.parse
, exactly as this proposed change does to transfer buffers. This turned out to be a huge perf bottle-neck, showing up and blocking the UI thread when transferring very many small objects, e.g. for outline of a very large file, or a very large list of completions, etc.
@jrieken and I spent a great deal of effort to move away from always using a replacer and a recursive walk over all objects transferred via RPC (see #40169 ) and the previous code (as you have seen) also went to great lengths to avoid using a replacer / recursive walk by using e.g. MixedArgs
vs JSONArgs
, etc. I suggest we try to make an implementation where only the calls which want to transfer deep array buffers nested inside objects need to pay the perf cost of walking the objects to marshall/unmarshall the array buffers.
IMHO the current proposed change of always walking all objects is not acceptable.
c167d33
to
5547288
Compare
Thanks for taking a look and for providing some background @alexdima! I've reworked the rpc part to only enable the object walking if the caller passes in a special flag. I've only enabled this flag for notebook requests so all other requests should behave the same as before Right now, extracting buffers is enabled on a per-proxy level. Ideally it would instead be per-call since not all calls on an object will need to pass buffers. If we think changing this to be per-call instead is worth investigating, I have two ideas on implementing it:
Will continue polishing this up but let me know if you think this is the right direction |
I really like the idea to make this explicit, pay the perf cost only in the specific call sites, and using TS to compile and check that everything is OK. e.g. on possible usage: // declaration side
interface IMyObjectWithBuffers {
key1: VSBuffer;
key2: number;
}
interface MainThreadX {
$someRpcFuncThatTakesBuffers(obj: Serialized<IMyObjectWithBuffers>);
}
// caller side:
const obj: IMyObjectWithBuffers;
myProxy.$someRpcFuncThatTakesBuffers(serialize(obj));
// implementation side:
public $someRpcFuncThatTakesBuffers(_obj: Serialized<IMyObjectWithBuffers>) {
const obj: IMyObjectWithBuffers = deserialize(_obj);
} e.g. a possible implementation: // a recursive replacer of all values of type VSBuffer with `{ $$ref: string; }`
type SerializedBuffers<T> = { [K in keyof T]: T[K] extends VSBuffer ? { $$ref: string; } : SerializedBuffers<T[K]> };
class Serialized<T> {
readonly objWithArrayBufferMarkers: SerializedBuffers<T>;
readonly buffers: VSBuffer[];
}
function serialize<T>(obj: T): Serialized<T> {
// walk and extract buffers, replacing them with `$$ref`, etc.
}
function deserialize<T>(obj: Serialized<T>): T {
// walk and replace `$$ref` with the actual buffers.
}
// the rpcProtocol could then loop over arguments and identify the ones which are instanceof Serialized and transport them specially. |
@alexdima Thanks! That's actually very similar to what we do for webview.postMessage today so I think it should be possible. I know that we can support buffers in the rpc arguments pretty easily with the existing rpc implementation but we will need to come up with an approach for when a response object contains a mix of normal objects and buffers I'll investigate moving this logic up a layer instead of having it in the core rpc protocol |
f857aed
to
92a8f27
Compare
This PR transfers notebook output as an VSBuffer instead of as a array of numbers. This significantly reduces the message size as well as the serialize/deserialize times To accomplish this, I've introduced a new `SerializableObjectWithBuffers` type. This specially marks rpc objects that contain VSBuffers. This is needed as our current RPC implementation can only efficently transfer VSBuffers that appears as top level arguments. The rpcProtocol now can also identify top level `SerializableObjectWithBuffers` arguments and ensure these are serialized more efficently. This is easier than trying to extract the `VSBuffer` proeprties to top level arguments. It also lets us easily support return types that may contain buffers
92a8f27
to
7f129fa
Compare
Updated the PR with a new implementation that lets us only pay the cost of extracting buffers for calls that deal with buffers. This is done by introducing a new
I'll see if I can get typescript to enforce that all VSBuffers must either be passed as top level arguments or wrapped in an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mjbvz. This looks very good to me and thanks for tackling this.
fyi - I can help with the test integration test failures |
fyi - pushed 2f66e67 to use SerializableObjectWithBuffers also when using the NotebookController-API |
Sorry for the slow response from my side, as I'm reading Notifications from oldest to newest and prioritizing PRs waiting for my review. (I still have 305 notifications). But it looks that a PR, once reviewed, no longer appears in the Review Requested section on GH (https://github.com/pulls/review-requested), nor in the "Waiting For My Review" category in the GHPR extension, even if more commits were pushed. I think pressing this button makes it show up in those places again, otherwise only a simple notification will be generated: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant!
Thanks everyone! |
Fixes #125819
This PR lets us transfer the notebook output as a VSBuffer instead of as an array of numbers. This should significantly reduce the rpc message size when talking about notebook outputs and also speed up serialization and deserialization of notebook outputs.
Problem
Right now, our RPC implementation can only serialize
VSBuffers
that appear as top level arguments. The issue is that the notebook outputs appear inside deeply nested objects, and we cannot extract them to top level arguments without adding a lot of complexity to rpc calls involving outputs.Fix
This PR adds a new
SerializableObjectWithBuffers
class that indicates to our rpc implementation that an object should have its buffers extracted and efficiently serialized for transfer. This lets us keep anyVSBuffers
that appear inside the arguments as binary data, and also lets us avoid overhead of parsing them as JSON.SerializableObjectWithBuffers
can appear both rpc arguments and be used as an rpc return value.