I think the advantage the Mill gives here is more room to compute things speculatively on the critical path. You can happily load from an invalid address and only realise later that it was wrong, without causing an exception in the CPU; the same goes for FP arithmetic etc. This lets you parallelise more sequential code than you can on, say, x86. I think the other related advantage is that they've done as much as they can to remove any notion of global core state (comparison flags and so on) so that more operations can run in parallel: you can do several comparisons at once and then process all the results together.
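To make both points concrete, here's a toy Python model. It is not how the Mill actually encodes anything, and `Poison`, `spec_load`, `add`, `select`, `commit`, `guarded_sum` and `classify` are all made-up names. The idea loosely follows what the Mill talks call a NaR ("Not a Result"): a bad speculative load produces a poison value instead of trapping, the poison flows through arithmetic, and the fault only fires if the value is actually used. The second half shows comparisons producing independent values rather than writing a single shared flags register. Python runs it all sequentially, of course; the point is only that nothing in the data dependencies forces an order.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Poison:
    """Hypothetical stand-in for a deferred fault (roughly the Mill's NaR)."""
    reason: str


Value = Union[int, Poison]

# Toy address space: any address not listed here counts as "invalid".
MEMORY = {0x10: 7, 0x14: 3}


def spec_load(addr: int) -> Value:
    # Speculative load: an invalid address yields poison instead of trapping.
    if addr in MEMORY:
        return MEMORY[addr]
    return Poison(f"bad address {addr:#x}")


def add(a: Value, b: Value) -> Value:
    # Poison propagates through arithmetic rather than faulting.
    if isinstance(a, Poison):
        return a
    if isinstance(b, Poison):
        return b
    return a + b


def select(cond: bool, if_true: Value, if_false: Value) -> Value:
    # Branch-free pick: the unused arm is simply discarded, poison and all.
    return if_true if cond else if_false


def commit(v: Value) -> int:
    # Only when a result is really needed does the deferred fault fire.
    if isinstance(v, Poison):
        raise MemoryError(v.reason)
    return v


def guarded_sum(p: int, valid: bool) -> int:
    # Both loads are hoisted above the check that decides whether they are legal.
    a = spec_load(p)
    b = spec_load(p + 4)
    s = add(a, b)
    return commit(select(valid, s, 0))


def classify(x: int, lo: int, hi: int) -> int:
    # On x86 each CMP overwrites the one architectural FLAGS register, so each
    # result has to be consumed (branch or SETcc) before the next compare.
    # Here every comparison yields its own value, independent of the others...
    is_neg = x < 0
    too_low = x < lo
    too_high = x > hi
    # ...and they are only combined at the end.
    return (is_neg << 2) | (too_low << 1) | too_high
```

So `guarded_sum(0x99, False)` quietly returns 0 even though both loads hit bad addresses, while `guarded_sum(0x99, True)` raises, because that's the only case where the poisoned value is actually used.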