Skip to content

Don't instrument methods with [Intrinsic] #81983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 12, 2023

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Feb 10, 2023

Let's assume all methods with [Intrinsic] don't need any instrumentation because they're imported as JIT intrinsics in Tier1 and their actual profile is never taken. There are few cases where we might not expand an intrinsic in Tier1 but so far they don't look like profile is a big deal for them.

The main problem this PR fixes are block counters inside those intrinsics and the cache contention they cause. As a bonus - we create less compilations now (slight decrease in working set).

Plaintex benchmark on aspnet-citrine-lin machine
with Tier0-instrumented (forced to stay in that tier forever):

| load                   |        Main |          PR |         |
| ---------------------- | ----------- | ----------- | ------- |
| CPU Usage (%)          |          11 |          12 |  +9.09% |
| Cores usage (%)        |         300 |         337 | +12.33% |
| Working Set (MB)       |         121 |         121 |   0.00% |
| Private Memory (MB)    |         358 |         358 |   0.00% |
| Start Time (ms)        |           0 |           0 |         |
| First Request (ms)     |         204 |         204 |   0.00% |
| Requests/sec           |     489,719 |     551,725 | +12.66% |
| Requests               |   7,380,118 |   8,317,673 | +12.70% |
| Mean latency (ms)      |        4.77 |        4.27 | -10.48% |
| Max latency (ms)       |       54.49 |       54.86 |  +0.68% |
| Bad responses          |           0 |           0 |         |
| Socket errors          |           0 |           0 |         |
| Read throughput (MB/s) |       76.13 |       85.77 | +12.66% |
| Latency 50th (ms)      |        4.94 |        4.35 | -11.94% |
| Latency 75th (ms)      |        7.17 |        6.33 | -11.72% |
| Latency 90th (ms)      |        8.91 |        7.90 | -11.34% |

NOTE: such methods are still call-counted, it's just that they will jump directly to Tier1 after R2R (or Tier0) after 30 calls.

cc @AndyAyersMS

@ghost ghost added the area-VM-coreclr label Feb 10, 2023
@ghost ghost assigned EgorBo Feb 10, 2023
Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be a jit policy, not a runtime policy?

@EgorBo
Copy link
Member Author

EgorBo commented Feb 11, 2023

Seems like this should be a jit policy, not a runtime policy?

That will require new JIT-VM interface to notify VM it doesn't need to introduce instrumentation tier, so +200 LOC changes

@EgorBo
Copy link
Member Author

EgorBo commented Feb 11, 2023

Additionally we can introduce a sort of hack:

        PTR_MethodTable pMT = pMethodDesc->GetMethodTable();
        if (pMT->HasSameTypeDefAs(CoreLibBinder::GetClass(CLASS__SPAN)) ||
            pMT->HasSameTypeDefAs(CoreLibBinder::GetClass(CLASS__READONLY_SPAN)))
        {
            return NativeCodeVersion::OptimizationTier0;
        }

to disable instrumentation for all span members (they indeed don't need that). That gives up to 5% more RPS. Mostly because of Slice method (it's top hottest in the traces). And again, mostly because of BB instrumentation that it has (cache ping-pong again)

@AndyAyersMS
Copy link
Member

Seems like this should be a jit policy, not a runtime policy?

That will require new JIT-VM interface to notify VM it doesn't need to introduce instrumentation tier, so +200 LOC changes

I don't follow. The jit can decide whether or not to instrument, no matter what the runtime says.

@EgorBo EgorBo force-pushed the dont-instrument-intrinsics branch from 2eb477a to 4089eaa Compare February 11, 2023 12:42
@EgorBo
Copy link
Member Author

EgorBo commented Feb 11, 2023

Seems like this should be a jit policy, not a runtime policy?

That will require new JIT-VM interface to notify VM it doesn't need to introduce instrumentation tier, so +200 LOC changes

I don't follow. The jit can decide whether or not to instrument, no matter what the runtime says.

The problem that currently our workflow looks like this:

[Tier0] --30 calls--> [Tier 0 instrumented] --30 calls--> [Tier1]

If the instrumented tier decides it doesn't need an instrumentation it means we basically compiling Tier0 we already had. I've addressed your feedback as I generally agree with it. And will think about the ^ problem separately. Maybe it's actually for good because currently we end up with a lot of redundant Tier1 compilations for small methods (problem described here) so at least in this case we end up with non-optimized code and didn't waste time on optimizations

@EgorBo
Copy link
Member Author

EgorBo commented Feb 11, 2023

@AndyAyersMS anything else on your mind to do here? 🙂

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I realize we may do a redundant tier0 compile right now, but once we get around to handling some of these as intrinsics in Tier0 we actually should rarely be jitting these methods.

@EgorBo EgorBo merged commit 6826a7f into dotnet:main Feb 12, 2023
@EgorBo EgorBo deleted the dont-instrument-intrinsics branch February 12, 2023 09:02
@ghost ghost locked as resolved and limited conversation to collaborators Mar 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants