x86 does not require an icache flush because it has a unified cache. Rosetta emulates this correctly, which means it must be able to invalidate its code without encountering such an instruction.
The region is RWX, and code is put into it and then executed without a cache flush. This requires careful setup by the runtime, and here's how Rosetta does it, line by line:
1. buffer is created and marked as RW-, since the next thing you do with a RWX buffer is obviously going to be to write code into it.
2. buffer is written to directly, without any traps.
3. The indirect function call is compiled to go through an indirect branch trampoline. It notices that this is a call into a RWX region and creates a native JIT entry for it. buffer is marked as R-X (although it is not actually executed from, the JIT entry is.)
4. The write to buffer traps because the memory is read-only. The Rosetta exception server catches this and maps the memory back RW- and allows the write through.
5. Repeat of step 3. (Amusingly, a fresh JIT entry is allocated even though the code is the same…)
As you can see, this allows for pretty acceptable performance for most JITs that are effectively W^X even if they don't signal their intent specifically to the processor/kernel. The first write to the RWX region "signals" (heh) an intent to do further writes to it, then the indirect branch instrumentation lets the runtime know when it's time to do a translation.
Writing to an address would invalidate all JIT code associated with it, not just code that starts at that address. Lookup is done on the indirect branch, not on write, so if a new entry would be generated once execution runs through it.
> How do you think it detects a change to executable memory without a permissions change or a flush?
One way how this could be implemented was the way mentioned above: By making sure all x86-executable pages are marked r/o (in the real page tables, not from "the x86 API"). Whenever any code writes into it, the resulting page fault can flush out the existing translation and transparently return back to the x86 program, which can proceed to write into the region without taking a write fault (the kernel will actually mark them as writable in the page tables now).
When the x86 program then jumps into the modified code, no translation exists anymore, and the resulting page fault from trying to execute can trigger the translation of the newly modified pages. The (real, not-pretend) writable bit is removed from the x86 code pages again.
To the x86 code, the pages still look like they are writable, but in the actual page tables they are not. So the x86 code does not (need to) change the permission of the pages.
I don't know if that's exactly how it is implemented, but it is a way.