Hacker News | igodard's comments

Much of the Mill is not new; we don't bother filing those parts. Much that is not new is very good, and was abandoned for reasons unrelated to the actual merits. The first compiler I ever wrote was for the Burroughs B6500, which in 1970 had better security than any current commercial architecture. That compiler is still in use.

Security in architecture is the history of a race to the bottom, driven by newbie customers not knowing there was such a thing and the economics of chip-making. We may hope that there are fewer newbies now. That leaves economics. To a large extent the Mill has been an effort to make old ideas economically viable to today's customers.


The grant model requires you to grant each object individually that you want to pass. That is annoying if you have many objects. In both the caps and grant models you can cut the overhead by thinking of the whole graph as "the object". A typical approach is to allocate graph nodes in an arena and pass the whole arena.
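The arena trick can be sketched in plain C. Everything here is illustrative — there is no Mill API in it; the point is just the property the grant model exploits: every node comes from one contiguous block, so a single base/length grant covers the whole graph.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A trivial bump-pointer arena: all nodes live in one contiguous
 * range, so one (base, length) grant covers the entire graph. */
typedef struct {
    uint8_t *base;
    size_t   used;
    size_t   cap;
} Arena;

typedef struct Node {
    int          value;
    struct Node *next;
} Node;

static void *arena_alloc(Arena *a, size_t n) {
    /* Round up to pointer alignment; return NULL when full. */
    n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    if (a->used + n > a->cap) return NULL;
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* The property a single grant relies on: every node the callee can
 * reach lies inside the one granted range. */
static int node_in_arena(const Arena *a, const Node *n) {
    const uint8_t *p = (const uint8_t *)n;
    return p >= a->base && p + sizeof(Node) <= a->base + a->cap;
}
```

Passing "the graph" then means issuing one grant for `[base, base+cap)` instead of one per node.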

Fine granularity is expensive, which is why monolithic systems offer only process granularity. If you have 100,000 graph nodes and want to pass all of them except this one then you will have to pay for the privilege, in any protection model. The Mill lets you pay less.


There are dedicated Well-Known Regions (WKRs) for code, stack, globals and TLS that catch the great majority of memory references. The PLB is only consulted when we miss in the WKRs. What's not in the WKRs? MAP_SHARED mmap()s. How many of those are in your program? How often do you access one for the first time, or after long enough that the entry has been evicted from the PLB?
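The lookup order can be sketched like this — a hypothetical software model, not Mill hardware: four well-known regions checked with cheap bounds compares, with only a miss falling through to the slower PLB path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the check order: well-known regions first,
 * PLB only on a miss.  All names here are invented for the sketch. */
typedef struct { uint64_t lo, hi; } Region;

typedef struct {
    Region wkr[4];                 /* code, stack, globals, TLS */
    bool (*plb_lookup)(uint64_t);  /* slower path, consulted on WKR miss */
} Checker;

static bool check_access(const Checker *c, uint64_t addr) {
    for (int i = 0; i < 4; i++)
        if (addr >= c->wkr[i].lo && addr < c->wkr[i].hi)
            return true;           /* hit: the great majority of references */
    return c->plb_lookup(addr);    /* miss: e.g. a MAP_SHARED mapping */
}

/* Example PLB stand-in: one shared mapping at [0x9000, 0xA000). */
static bool demo_plb(uint64_t addr) {
    return addr >= 0x9000 && addr < 0xA000;
}
```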

Like any cache, the optimal PLB size is determined by the working set. In the typical code we are seeing, the program has a couple of open files, half a dozen mmaps where the heap grew itself, and portal blocks for assorted libraries. The working sets are much smaller than a conventional TLB, and with SAS we have several cycles available in parallel with the caches.

The upshot is that a PLB can be large, cool, and slow. As for the range compares, the PLB permits the same sort of address subsetting as is done in mixed-size TLBs. Think about how many bits in the typical address range differ between the lower and upper bounds.
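To make the subsetting point concrete: for a region that is aligned to its (power-of-two) size, the lower and upper bounds differ only in the low bits, so the full range check collapses to one masked equality on the high bits. A minimal sketch, assuming such aligned regions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* For a size-aligned, power-of-two-sized region, the check
 * (base <= addr < base + size) reduces to a single masked compare
 * on the high bits -- the same trick mixed-size TLBs use. */
static bool in_aligned_region(uint64_t addr, uint64_t base, uint64_t size) {
    /* Requires: size is a power of two and base is size-aligned. */
    return (addr & ~(size - 1)) == base;
}
```

Only the bits above log2(size) participate in the compare; the low bits never need comparator hardware at all.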


The tool chain does hoisting and if-conversion with wild abandon. That code becomes {x = cond ? a+b : a*b}, and both expressions are evaluated in parallel. The conversion is a heuristic; if you have tracing data for the branch then it might not convert. However, a mispredict is a lot more expensive than a multiply, so the tracing has to be pretty skewed for the branch to be worth it.

The conversion does increase the latency of getting the value of x. If there's nothing else to do then the tool chain will insert explicit nops to wait for the expression. The same stalls will exist on other architectures for the same code, just not visibly in the code. It happens that making the nops explicit is faster than a stall; you can idle through a nop with no added overhead, but you can't restart a stall instantaneously.
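The two forms compute the same value, which is what makes the conversion safe. A sketch, not Mill codegen: the if-converted version evaluates both side-effect-free arms unconditionally and selects, trading one wasted operation for the risk of a mispredict.

```c
#include <assert.h>

/* Branchy form: only one arm executes, but a mispredicted branch
 * costs far more than the multiply it would have saved. */
static int branchy(int cond, int a, int b) {
    int x;
    if (cond) x = a + b;
    else      x = a * b;
    return x;
}

/* If-converted form: both arms are evaluated unconditionally (they
 * are side-effect-free, so speculation is safe) and the result is
 * selected.  Roughly what if-conversion amounts to in source terms. */
static int converted(int cond, int a, int b) {
    int sum  = a + b;   /* both can issue in parallel on a wide machine */
    int prod = a * b;
    return cond ? sum : prod;
}
```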


We have focused on the core and less on the uncore, which is why there have been no talks on I/O. The goal is for a smart peripheral to be indistinguishable from just another regular core; the Mill design is big on regularity. That implies that it has its own PLB and TLB, responds to HEYU, and supports the same IPC mechanisms, both those in the talk and those NYF.

Of course, modern peripherals don't look like that, so there will be adaptors. IBM 360 channels and CDC6600 PPs also haven't been architecturally revisited in a while.


That's really more the VC model; a bootstrap is different.


Caution is always warranted when you aren't getting cash on the barrel. The sweat equity documents are available - ask on the site (and now that I think about it, I suppose we should just put them on the site directly). There's no "owner": we all work on the same deal, me included. As it happens I have the largest chunk of equity. You can call that a scam after you have worked full time for over a decade with no paycheck :-)

And yes, Mill Computing, Inc. is not how real companies are run. Is that a bug or a feature?


So would we. See millcomputing.com -> About -> Invest in Us.


All sound points.

Most of what you'd like to see are things we'd like to see too. At the beginning we decided to bootstrap rather than follow the usual funding model, at least to the point at which we could demonstrate what we had to people who would understand it in detail. We chose bootstrap in large part because most of us were old enough to have had actual experience with other business models. Yes, it has taken far longer to get this far than we wanted, but we have gotten this far.

About evaluation: it has been our experience that the more senior/skilled a hardware (and software) guy is, the more they fall in love with the Mill. You don't hear much of that - we want the tech to be judged on its merits, not on some luminary's say-so. And of course those senior guys tend to work for potential competitors and don't want to say much publicly.

But you are right: the proof will be running code, and we're starting to do that. We'll be doing more talks like the switches talk, with actual code comparisons. Eventually we will put our tool chain and sim on the cloud for you to play with. Patience, waiting is.


The Mill grant-based model is semantically quite similar to capabilities, but it associates protection with the accessor (thread/turf) rather than the access (pointer/capability). This lets us preserve the size of a pointer, which no one knows how to do efficiently with capabilities.
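One way to see the size point — an illustrative layout, not any real capability or Mill format: a capability must carry its bounds and rights along with the address, while the grant model keeps the pointer a plain machine word and looks rights up by accessor.

```c
#include <assert.h>
#include <stdint.h>

/* A capability carries its protection with the pointer, so it is fat:
 * address plus bounds plus rights (CHERI, for instance, uses 128-bit
 * capabilities).  Field layout here is invented for the sketch. */
typedef struct {
    uint64_t addr;
    uint64_t bounds_and_rights;  /* compressed bounds + permission bits */
} Capability;

/* In the grant model the pointer stays a plain machine word; rights
 * live in state keyed by the accessor (thread/turf), not in the
 * pointer itself. */
typedef uint64_t GrantPointer;
```

Keeping the pointer at native size means existing data structures and ABIs need no fattening to become protected.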

The difference between the two models is visible when you pass a graph structure across a protection boundary. With caps it is easy to pass the whole graph and hard to pass only one node; with grants it is vice versa.

