Skip to content

Transfer notebook output as VSBuffer #130452

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 17, 2021
Merged

Transfer notebook output as VSBuffer #130452

merged 2 commits into from
Aug 17, 2021

Conversation

mjbvz
Copy link
Collaborator

@mjbvz mjbvz commented Aug 10, 2021

Fixes #125819

This PR lets us transfer the notebook output as a VSBuffer instead of as an array of numbers. This should significantly reduce the rpc message size when talking about notebook outputs and also speed up serialization and deserialization of notebook outputs.

Problem

Right now, our RPC implementation can only serialize VSBuffers that appear as top level arguments. The issue is that the notebook outputs appear inside deeply nested objects, and we cannot extract them to top level arguments without adding a lot of complexity to rpc calls involving outputs.

Fix

This PR adds a new SerializableObjectWithBuffers class that indicates to our rpc implementation that an object should have its buffers extracted and efficiently serialized for transfer. This lets us keep any VSBuffers that appear inside the arguments as binary data, and also lets us avoid overhead of parsing them as JSON.

SerializableObjectWithBuffers can appear both rpc arguments and be used as an rpc return value.

@rebornix
Copy link
Member

@mjbvz this looks really promising!

I played around with it a bit with a 17MB ipynb file (one single text output) and found that the deserizeRequest regressed compared to previous impl, not sure if we miss anything there. It made me think avoiding string/byte transformation is worth a try.

image

@mjbvz
Copy link
Collaborator Author

mjbvz commented Aug 10, 2021

@rebornix Thanks taking a look. Can you please share the notebook you were using for testing? I tested using the GitHub Issues notebook with a large query. Here's what I see before applying the change:

Screen Shot 2021-08-09 at 8 37 18 PM

And after using the same query:

Screen Shot 2021-08-09 at 8 40 07 PM

This was collected by profiling the renderer for a request that includes two buffers: a 6MB buffer and an 2MB buffer

@mjbvz
Copy link
Collaborator Author

mjbvz commented Aug 10, 2021

Just pushed a fix to also handle VSBuffers inside of response object. Here's opening a notebook with large output before this PR:

Screen Shot 2021-08-10 at 11 49 58 AM

And after:

Screen Shot 2021-08-10 at 11 54 16 AM

Opening the notebook feels faster too!

@mjbvz mjbvz force-pushed the dev/mjbvz/serialize-notebook branch from 7590b6a to 3cd6e5a Compare August 11, 2021 03:37
@rebornix
Copy link
Member

rebornix commented Aug 11, 2021

@mjbvz the time for parsing buffer on IPC is almost gone now, this is a huge improvement to the notebook file opening, thanks and great work!

@mjbvz mjbvz added this to the August 2021 milestone Aug 11, 2021
@mjbvz mjbvz force-pushed the dev/mjbvz/serialize-notebook branch from 3cd6e5a to c167d33 Compare August 11, 2021 21:20
Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first implementation of the RPC was marshalling and unmarshalling URIs and regular expressions transparently. This was done by always using a replacer with JSON.stringify and a recursive walk over the object after JSON.parse, exactly as this proposed change does to transfer buffers. This turned out to be a huge perf bottle-neck, showing up and blocking the UI thread when transferring very many small objects, e.g. for outline of a very large file, or a very large list of completions, etc.

@jrieken and I spent a great deal of effort to move away from always using a replacer and a recursive walk over all objects transferred via RPC (see #40169 ) and the previous code (as you have seen) also went to great lengths to avoid using a replacer / recursive walk by using e.g. MixedArgs vs JSONArgs, etc. I suggest we try to make an implementation where only the calls which want to transfer deep array buffers nested inside objects need to pay the perf cost of walking the objects to marshall/unmarshall the array buffers.

IMHO the current proposed change of always walking all objects is not acceptable.

@mjbvz mjbvz force-pushed the dev/mjbvz/serialize-notebook branch from c167d33 to 5547288 Compare August 12, 2021 22:13
@mjbvz
Copy link
Collaborator Author

mjbvz commented Aug 12, 2021

Thanks for taking a look and for providing some background @alexdima!

I've reworked the rpc part to only enable the object walking if the caller passes in a special flag. I've only enabled this flag for notebook requests so all other requests should behave the same as before

Right now, extracting buffers is enabled on a per-proxy level. Ideally it would instead be per-call since not all calls on an object will need to pass buffers. If we think changing this to be per-call instead is worth investigating, I have two ideas on implementing it:

  • Use a method names to indicate that buffers may be included in the arguments. Perhaps prefixing them with $$, such as $$someRpcFuncThatTakesBuffers
  • Pass in a special extractBuffers argument to the function call: $someRpcFuncThatTakesBuffers(Rpc.extractBuffers, ...args)

Will continue polishing this up but let me know if you think this is the right direction

@alexdima
Copy link
Member

alexdima commented Aug 13, 2021

I really like the idea to make this explicit, pay the perf cost only in the specific call sites, and using TS to compile and check that everything is OK. e.g. on possible usage:

// declaration side
interface IMyObjectWithBuffers {
  key1: VSBuffer;
  key2: number;
}
interface MainThreadX {
   $someRpcFuncThatTakesBuffers(obj: Serialized<IMyObjectWithBuffers>);
}

// caller side:
const obj: IMyObjectWithBuffers;
myProxy.$someRpcFuncThatTakesBuffers(serialize(obj));

// implementation side:
public $someRpcFuncThatTakesBuffers(_obj: Serialized<IMyObjectWithBuffers>) {
  const obj: IMyObjectWithBuffers = deserialize(_obj);
}

e.g. a possible implementation:

// a recursive replacer of all values of type VSBuffer with `{ $$ref: string; }`
type SerializedBuffers<T> = { [K in keyof T]: T[K] extends VSBuffer ? { $$ref: string; } : SerializedBuffers<T[K]> };
class Serialized<T> {
  readonly objWithArrayBufferMarkers: SerializedBuffers<T>;
  readonly buffers: VSBuffer[];
}

function serialize<T>(obj: T): Serialized<T> {
  // walk and extract buffers, replacing them with `$$ref`, etc.
}

function deserialize<T>(obj: Serialized<T>): T {
  // walk and replace `$$ref` with the actual buffers.
}

// the rpcProtocol could then loop over arguments and identify the ones which are instanceof Serialized and transport them specially.

@mjbvz
Copy link
Collaborator Author

mjbvz commented Aug 13, 2021

@alexdima Thanks! That's actually very similar to what we do for webview.postMessage today so I think it should be possible.

I know that we can support buffers in the rpc arguments pretty easily with the existing rpc implementation but we will need to come up with an approach for when a response object contains a mix of normal objects and buffers

I'll investigate moving this logic up a layer instead of having it in the core rpc protocol

@mjbvz mjbvz force-pushed the dev/mjbvz/serialize-notebook branch from f857aed to 92a8f27 Compare August 14, 2021 00:13
This PR transfers notebook output as an VSBuffer instead of as a array of numbers. This significantly reduces the message size as well as the serialize/deserialize times

To accomplish this, I've introduced a new `SerializableObjectWithBuffers` type. This specially marks rpc objects that contain VSBuffers. This is needed as our current RPC implementation can only efficently transfer VSBuffers that appears as top level arguments. The rpcProtocol now can also identify top level `SerializableObjectWithBuffers` arguments and ensure these are serialized more efficently.

This is easier than trying to extract the `VSBuffer` proeprties to top level arguments. It also lets us easily support return types that may contain buffers
@mjbvz mjbvz force-pushed the dev/mjbvz/serialize-notebook branch from 92a8f27 to 7f129fa Compare August 14, 2021 00:18
@mjbvz
Copy link
Collaborator Author

mjbvz commented Aug 14, 2021

Updated the PR with a new implementation that lets us only pay the cost of extracting buffers for calls that deal with buffers. This is done by introducing a new SerializableObjectWithBuffers class that indicates to our rpc implementation that an object should have its buffers extracted and efficiently serialized for transfer.

SerializableObjectWithBuffers can appear both rpc arguments and be used as an rpc return value.

I'll see if I can get typescript to enforce that all VSBuffers must either be passed as top level arguments or wrapped in an SerializableObjectWithBuffers

Copy link
Member

@jrieken jrieken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mjbvz. This looks very good to me and thanks for tackling this.

@jrieken
Copy link
Member

jrieken commented Aug 16, 2021

fyi - I can help with the test integration test failures

@jrieken
Copy link
Member

jrieken commented Aug 16, 2021

fyi - pushed 2f66e67 to use SerializableObjectWithBuffers also when using the NotebookController-API

@alexdima
Copy link
Member

Sorry for the slow response from my side, as I'm reading Notifications from oldest to newest and prioritizing PRs waiting for my review. (I still have 305 notifications). But it looks that a PR, once reviewed, no longer appears in the Review Requested section on GH (https://github.com/pulls/review-requested), nor in the "Waiting For My Review" category in the GHPR extension, even if more commits were pushed.

I think pressing this button makes it show up in those places again, otherwise only a simple notification will be generated:

image

@alexdima alexdima self-requested a review August 17, 2021 12:24
Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant!

@mjbvz mjbvz merged commit 8b6547a into main Aug 17, 2021
@mjbvz mjbvz deleted the dev/mjbvz/serialize-notebook branch August 17, 2021 16:57
@mjbvz
Copy link
Collaborator Author

mjbvz commented Aug 17, 2021

Thanks everyone!

@github-actions github-actions bot locked and limited conversation to collaborators Oct 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Send output item data as VSBuffer
4 participants