Exactly. Most will specialize at install-time. Code without an install step will specialize at first execution; the IR is designed for specialization speed.
The specializer can also be run free-standing, to create ROMs or where installations want to ensure distribution uniformity.
Specialization at install time is, in a lot of ways, the most reasonable answer. (It completely sidesteps the permissioning issues involved in write permission on a global cache at first execution, for example.)
However, hacking an existing package manager to do this job may not be entirely trivial --- and, in typical server environments these days, there will often be more than one package manager to hack. Language-based package managers like Rubygems and npm, for example, build binary extensions for Ruby and Node/Javascript as part of their job. Updating any one of them to deal with an additional "specialization" step may not be a big deal --- but dealing with all of them might be. (Particularly if you have sysadmins who like to, say, run something like debsums to verify that the installed files are exactly the same as the ones in the package --- which might fail on a "fattened" binary.)
If you've already had coffee with the maintainers of, say, apt (or rpm), npm (or Rubygems, or PIP), and a few others, hashed out the issues, and have worked out that it's no big deal, that's great! But if not --- management of servers these days is complicated in ways that you'd never guess from just looking at the hardware, and a few of those coffee chats might be enlightening.
I don't think it's that difficult for package managers to deal with. All of them have a way to hook post-install actions to run an arbitrary script. You might wind up adding a step to the scripts used to produce the package that automatically generates this hook; the checksums might be slightly tricky (but then storing specialised binaries separately isn't hard).
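To make the idea concrete, a post-install hook could be as simple as a script that walks the installed tree and feeds each IR module to the specializer. This is a toy sketch only: the ".genasm" suffix, the "specialize" command name, and its flags are all invented here, not anything the Mill toolchain actually defines.

```python
import pathlib
import subprocess
import tempfile

def specialize_tree(root):
    """Run the (hypothetical) specializer on every IR module under root."""
    for ir in pathlib.Path(root).rglob("*.genasm"):
        out = ir.with_suffix(".bin")
        # The specialized binary is written alongside the IR rather than
        # replacing it, so package checksums (e.g. debsums) still pass.
        subprocess.run(["specialize", str(ir), "-o", str(out)], check=True)

# demo on an empty temp tree (no IR files, so nothing is invoked)
specialize_tree(tempfile.mkdtemp())
```

Generating a hook like this mechanically, as part of the packaging toolchain, is what would keep the per-package burden low.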
I guess the biggest infrastructure change might be running the specialisation on a dedicated machine and hosting your own packages. This way you could also checksum and sign the specialised binaries.
I'm not sure what you're suggesting here. Adding the "specialization" hooks to the post-install script in every single package is genuinely hard --- Debian, for example, has literally thousands of packages, maintained by hundreds of packagers, some of which squirrel away .so files and the like in all sorts of places you might not expect them without detailed knowledge of the package in question. And that's before you even consider software installed on Debian boxes by other systems entirely, like npm or Rubygems.
Running the specialization on build machines is easier, if you have compilers there that produce binaries directly, and not IR. But that's exactly what the Mill crew is trying to avoid by producing the IR! (Though it may actually be a better fit to an infrastructure which is already set up to produce distinct binaries for different processor architectures (x86, x86-64, ARM, and several others), and which is not set up to expect the "specialization" hook as a necessary post-install step which each package must separately provide for...)
The compiler makes the first branch prediction table. (It's actually smaller than a branch prediction table, because it only needs one exit point for each entry point.) The table can be updated (in memory) during execution if needed. The on-disk version can be modified for optimization purposes for later runs, but the mechanism for that would obviously be software-dependent and not reside on the chip.
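The "one exit per entry point" property means the table can be modeled as a simple map, updated when a prediction misses. A toy illustration (the addresses and function names here are invented, and the real hardware structure is of course nothing like a Python dict):

```python
# entry address of an EBB -> predicted exit target
exit_table = {0x100: 0x180}

def predict(entry):
    """Look up the single predicted exit for this entry point."""
    return exit_table.get(entry)

def update(entry, actual_exit):
    """On a mispredict, replace the entry's exit (in-memory update)."""
    if exit_table.get(entry) != actual_exit:
        exit_table[entry] = actual_exit

update(0x100, 0x1c0)
print(hex(predict(0x100)))  # 0x1c0
```

Writing the updated table back to disk for later runs is the part that would have to be done in software, per the comment above.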
It seems like the specializer contains a lot of a compiler back end. (I am not a compiler guy, so my understanding is probably wrong.)
Different family members have different functional unit timings and different numbers and orders of functional units. ("orders" -- the order they drop results onto the belt. What's the right term?) Therefore the specializer has to schedule instructions.
Different family members have different belt lengths. Therefore the specializer has to insert belt spills, which seems to be analogous to register allocation/spills.
Different family members have different encodings, so the specializer has to determine the size of each basic block and link-edit them together. (Probably fast, just a lot of bookkeeping.)
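The three jobs listed above --- scheduling against member timings, spilling when the belt overflows, and sizing/encoding for link-editing --- can be sketched as passes. Everything below is a toy invented for illustration: the Member parameters, the naive in-order scheduler, and the fake fixed 8-byte encoding bear no relation to the real specializer.

```python
from dataclasses import dataclass

@dataclass
class Member:
    belt_len: int   # how many results the belt holds on this member
    latency: dict   # opcode -> cycles on this member

@dataclass
class Op:
    opcode: str

def schedule(ops, member):
    """Pass 1: place ops using this member's latencies (naive, in order);
    record the cycle each result drops onto the belt."""
    cycle, drops = 0, []
    for op in ops:
        cycle += member.latency.get(op.opcode, 1)
        drops.append((cycle, op))
    return drops

def insert_spills(drops, member):
    """Pass 2: spill results that would fall off the belt's end ---
    the analogue of register spilling."""
    live, out = [], []
    for _cycle, op in drops:
        live.append(op)
        if len(live) > member.belt_len:
            out.append(Op("spill"))   # oldest result falls off; save it
            live.pop(0)
        out.append(op)
    return out

def encode(ops):
    """Pass 3: fake fixed-width encoding, so block sizes are known and
    branch offsets can be link-edited."""
    return b"".join(op.opcode.encode().ljust(8, b"\0") for op in ops)

mem = Member(belt_len=8, latency={"mul": 3, "add": 1})
code = encode(insert_spills(schedule([Op("add"), Op("mul")], mem), mem))
print(len(code))  # 16
```

The point of the sketch is just that each pass is parameterised by the member description, which is why one IR can serve the whole family.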
It looks like there's a lot of work between the Mill IR and the actual machine code.
It transforms one language into another, so by definition it's a compiler. But what is so scary about that? I would be willing to bet that most people already run JIT'd code every day, in the form of Java, C#, Javascript, etc.
My question is, how slow is the specializer? It is portrayed as very fast, but it looks like it's doing a significant fraction of a real compiler's work.
Then again, if it's mostly used at program installation time, who cares?