Page MenuHomePhabricator

Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r=nbp,lth
ClosedPublic

Authored by bbouvier on Aug 27 2020, 10:18 AM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Feb 18, 10:37 AM
Unknown Object (File)
Jan 31 2024, 12:50 AM
Unknown Object (File)
Jan 30 2024, 8:47 AM
Unknown Object (File)
Jan 26 2024, 4:15 AM
Unknown Object (File)
Jan 2 2024, 11:38 AM
Unknown Object (File)
Jun 24 2022, 6:27 PM
Unknown Object (File)
Jun 24 2022, 4:20 PM
Unknown Object (File)
Jun 24 2022, 4:13 PM
Subscribers

Details

Summary

On real hardware, when a background thread finishes compilation, it must signal
to the other executing threads that they need to reload a new version of the
code. Ideally, each executing thread would run an ISB instruction to do so. We
hereby use a system call membarrier that interrupts every other running thread,
and will cause the same effect as a local ISB would. It is heavyweight, so we
make sure to only run it in the case where we're on a background thread.

In the simulator, pending icache flushing requests were never satisfied before
this patch, when the request was emitted from a thread other than the main
thread. Similar behavior as above is emulated.

Diff Detail

Repository
rMOZILLACENTRAL mozilla-central
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

phab-bot changed the visibility from "Custom Policy" to "Public (No Login Required)".
phab-bot changed the edit policy from "Custom Policy" to "Restricted Project (Project)".
phab-bot removed a project: secure-revision.
nbp requested changes to this revision.Aug 27 2020, 10:37 AM
nbp added inline comments.
js/src/jit/arm64/vixl/MozSimulator-vixl.cpp
906–909

This code would not work if we have worker threads, which are additional threads with their own simulator, and their own caching decoder which is not shared.
Having a global atomic for all simulators, might get this number exhausted as soon as the first one is visited.

You might want to add this flag as part of https://searchfox.org/mozilla-central/source/js/src/jit/arm64/vixl/Simulator-vixl.h#2518 , and maybe have it register the length of the records vector, such that only the necessary records are flushed.

This revision now requires changes to proceed.Aug 27 2020, 10:37 AM
bbouvier edited the summary of this revision. (Show Details)
js/src/jit/arm64/vixl/MozSimulator-vixl.cpp
906–909

Good catch, i didn't think about multiple simulators.

You might want to add this flag as part of https://searchfox.org/mozilla-central/source/js/src/jit/arm64/vixl/Simulator-vixl.h#2518 , and maybe have it register the length of the records vector, such that only the necessary records are flushed.

I can't do this, since we basically need a data edge from the simulator to the number of pending requests (and while we have this edge in practice, it'd require taking the lock, which is precisely what we want to avoid here). So instead, I put the atomic on the simulator directly.

nbp requested changes to this revision.Aug 27 2020, 5:51 PM
nbp added inline comments.
js/src/jit/arm64/vixl/MozSimulator-vixl.cpp
213

Add a comment that this is to emulate what membarrier is doing, which is equivalent to run "isb" instruction on all cores.

979

Create a dedicated function SimulatorProcess::membarrier() which will once more loop over every Simulator to set the pendingCacheRequests.
Then call this new membarrier function from the CPU::EnsureIAndDCacheCoherency function, in order to mirror what the flushing code does.
Ideally, add this SimulatorProcess::membarrier() at the same time as the membarrier call is added to the CPU::EnsureIAndDCacheCoherency.

js/src/jit/arm64/vixl/Simulator-vixl.h
506

Any reasons not to use a boolean?

This revision now requires changes to proceed.Aug 27 2020, 5:51 PM
js/src/jit/arm64/vixl/Simulator-vixl.h
506

I thought about it, and i had in mind the case where two background compilations (for two different wasm modules) would end up around the same time. However, since the function getting the cache flush requests loops on the requests array while holding the lock (while the lock has also to be hold to append new requests onto the array), it should be fine to use a boolean here.

bbouvier retitled this revision from Bug 1661016: Flush pending icache invalidation request when using the caching decoder; r?nbp to Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r?nbp.
bbouvier edited the summary of this revision. (Show Details)

I think this implementation does not reflect perfectly what cache lines would have to be synchronized before the execution, such as having 2 execution threads compiling and sharing code between each others, while having a background compilation thread flushing caches.
However this case is not used in SpiderMonkey at the moment, and if it were to be used in the future, it is likely that we would still catch it in cases when we have no background compilation.

So, this patch sounds good to me!
Thanks for taking time to investigate this issue and fix it properly!

js/src/jit/arm64/vixl/MozSimulator-vixl.cpp
212–213

comment nit: reading this sentence while ignoring what is enclosed in parenthesis does not sounds right.

984

nit: oomUnsafe is unused, thus not needed here.

This revision is now accepted and ready to land.Aug 28 2020, 9:25 AM
bbouvier retitled this revision from Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r?nbp to Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r=nbp.
bbouvier edited the summary of this revision. (Show Details)

I've added enough to disable Cranelift when membarrier isn't available, but it's too drastic; will change the patch later.

This patch sounds good to me.

js/src/jit/arm64/Architecture-arm64.cpp
89

nit: s/On/From/

bbouvier retitled this revision from Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r=nbp to Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r?nbp,lth.
This revision is now accepted and ready to land.Aug 31 2020, 10:43 AM

Adding Lars as a reviewer for the tiered compilation decision. I checked and no other places seem to control whether tiering is done, but a second pair of eyes looking at this would surely be nice to have.

I also incorporated a suggestion from Glandium: try to run the system call in the function that will need it, so we know very eagerly if it'll run or not. This allows us to crash in the unlikely event where the actual membarrier did fail, which seems safe (I'll make sure this actually works before landing anything).

The test in CanFlushICacheFromBackgroundThreads is pretty expensive (uname, strcmp, sscanf). I can't agree with myself whether I think this matters, since cache flushing can be expensive, but would it be worth caching this value in a static in that function? The answer will always be the same.

Tiering decision looks OK, I can't find any other spots that need updating either. You could MOZ_ASSERT(IsICacheSafe()) in Module::startTier2(), just to be on the safe side.

bbouvier retitled this revision from Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r?nbp,lth to Bug 1661016: aarch64: Invalidate icache when compiling on a background thread; r=nbp,lth.

Code analysis found 1 defect in the diff 334603:

  • 1 defect found by clang-tidy

You can run this analysis locally with:

  • ./mach static-analysis check --outgoing (C/C++)

If you see a problem in this automated review, please report it here.

You can view these defects on the code-review frontend and on Treeherder.

Code analysis found 1 defect in the diff 335058:

  • 1 defect found by clang-tidy

You can run this analysis locally with:

  • ./mach static-analysis check --outgoing (C/C++)

If you see a problem in this automated review, please report it here.

You can view these defects on the code-review frontend and on Treeherder.