nfachan's comments

nfachan · on July 11, 2024

We use our own small wrappers for these syscalls, built on top of Rust's libc crate. All our wrappers live here:

https://github.com/maelstrom-software/maelstrom/blob/main/cr...

For bind mounts, you want to look at open_tree and move_mount. For "regular" mounts, you want to look at fsopen, fsconfig, fsmount, and move_mount.

I found this video very useful: https://www.youtube.com/watch?v=gMWKFPnmJSc

nfachan · on July 10, 2024

If you're interested, the main part of the container implementation is here: https://github.com/maelstrom-software/maelstrom/blob/main/cr...

For each test we run, we clone the worker process, then make a bunch of Linux syscalls to set everything up for the container, then exec the test. We use the trick of having the child process share the virtual memory of the parent until the test is exec'ed.

We also use a technique where we build up a "program" of simple operations (each operation more or less maps to a syscall) in the parent before cloning, then evaluate the program in the child. This gives us the same performance benefits as using posix_spawn or vfork, but lets us configure all of the namespace stuff while we're spawning.
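The "build a program in the parent, interpret it in the child" idea can be sketched roughly as below. All names here (`Op`, `Syscalls`, `run_program`, `DryRun`) are hypothetical, not Maelstrom's actual types, and the real syscalls are hidden behind a trait so the sketch runs without privileges. One likely reason for the split (an inference, not stated in the comment) is that a child sharing the parent's memory, as after `vfork` or `clone` with `CLONE_VM`, generally shouldn't allocate, so everything that needs allocating is prepared beforehand and the child only walks a slice.

```rust
/// Hypothetical setup step; in a real implementation each variant would map
/// to roughly one syscall.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op<'a> {
    BindMount { source: &'a str, target: &'a str },
    ChangeDir(&'a str),
    SetHostname(&'a str),
    Exec(&'a str),
}

/// Hypothetical seam over the real syscalls. Errors are raw errno values.
/// (A real `exec` only returns on failure.)
trait Syscalls {
    fn bind_mount(&mut self, source: &str, target: &str) -> Result<(), i32>;
    fn chdir(&mut self, path: &str) -> Result<(), i32>;
    fn sethostname(&mut self, name: &str) -> Result<(), i32>;
    fn exec(&mut self, path: &str) -> Result<(), i32>;
}

/// Runs in the child: walks a program the parent already built. `run_program`
/// itself does not allocate. On failure it reports the index of the failing
/// step and its errno, so the parent can produce a useful error message.
fn run_program<S: Syscalls>(program: &[Op<'_>], sys: &mut S) -> Result<(), (usize, i32)> {
    for (index, op) in program.iter().enumerate() {
        let result = match *op {
            Op::BindMount { source, target } => sys.bind_mount(source, target),
            Op::ChangeDir(path) => sys.chdir(path),
            Op::SetHostname(name) => sys.sethostname(name),
            Op::Exec(path) => sys.exec(path),
        };
        result.map_err(|errno| (index, errno))?;
    }
    Ok(())
}

/// Test double: records calls instead of making them, and fails with
/// EPERM (1) on step `fail_at`, if set.
#[derive(Default)]
struct DryRun {
    calls: Vec<String>,
    fail_at: Option<usize>,
}

impl DryRun {
    fn record(&mut self, call: String) -> Result<(), i32> {
        if self.fail_at == Some(self.calls.len()) {
            return Err(1);
        }
        self.calls.push(call);
        Ok(())
    }
}

impl Syscalls for DryRun {
    fn bind_mount(&mut self, source: &str, target: &str) -> Result<(), i32> {
        self.record(format!("bind_mount {source} -> {target}"))
    }
    fn chdir(&mut self, path: &str) -> Result<(), i32> {
        self.record(format!("chdir {path}"))
    }
    fn sethostname(&mut self, name: &str) -> Result<(), i32> {
        self.record(format!("sethostname {name}"))
    }
    fn exec(&mut self, path: &str) -> Result<(), i32> {
        self.record(format!("exec {path}"))
    }
}
```

A real `Syscalls` implementation would call into libc directly. Keeping the interpreter generic also means a program can be unit-tested with the dry-run double before any process is cloned.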

The code that's run in the child can be found here: https://github.com/maelstrom-software/maelstrom/blob/main/cr...

nfachan · on July 10, 2024

Our desire is to provide a general test-running and job-running framework, not to build the world's next test runner. We think nextest is great, and we were inspired by some of the things they did.

We've designed Maelstrom to be usable as a library. So you can build your own test runner or job runner. We've been in contact with Rain, the primary developer of nextest, regarding how we can make it so that nextest can use Maelstrom. We'd love nothing more than to have nextest be Maelstrom-ized (Maelstrom-ified?).

We definitely have a little bit of work to do, but we plan to make big improvements to the API in the next release. Currently, the client library doesn't give per-test updates until the test finishes. This means that you don't know how long a test is taking to run until it's completed (though we do provide a timeout feature). This is fine for our currently limited UI, but is probably insufficient for nextest.

Maelstrom in standalone mode running on a single machine is usually a bit slower than nextest. Maelstrom and nextest are similar in that they both run each test in its own process, and they both do a good job of running enough test processes in parallel to keep the machine busy. Maelstrom has to do a little bit more work each time it starts a new process to set up the namespaces, so it's always going to be a bit slower than nextest, but not by much.

One thing that Maelstrom does that I don't think nextest does is to use Longest Processing Time First (LPT) scheduling (https://en.wikipedia.org/wiki/Longest-processing-time-first_...). When test runtimes vary a lot within a project, using LPT can result in big wins and more predictable runtimes. Maelstrom itself actually has some pretty long-running integration tests, and once we added LPT, running Maelstrom tests on Maelstrom is usually faster than running them on nextest. But again, we're not talking about huge differences in single-machine cases.
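LPT itself fits in a few lines. Below is a minimal static sketch, assuming the expected duration of each test is already known (for example, from earlier runs; how Maelstrom actually obtains its estimates isn't stated here). `lpt_schedule` and `makespan` are hypothetical names, not Maelstrom's API.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Longest Processing Time First: order jobs by expected duration, longest
/// first, then give each one to whichever worker currently has the least
/// total work. Returns, for each worker, the indices of the jobs it runs.
fn lpt_schedule(durations: &[u64], workers: usize) -> Vec<Vec<usize>> {
    assert!(workers > 0, "need at least one worker");
    let mut order: Vec<usize> = (0..durations.len()).collect();
    // Longest first; ties broken by index so the result is deterministic.
    order.sort_by_key(|&i| (Reverse(durations[i]), i));

    // Min-heap of (current load, worker index).
    let mut heap: BinaryHeap<Reverse<(u64, usize)>> =
        (0..workers).map(|w| Reverse((0, w))).collect();
    let mut assignment = vec![Vec::new(); workers];
    for job in order {
        let Reverse((load, worker)) = heap.pop().expect("one heap entry per worker");
        assignment[worker].push(job);
        heap.push(Reverse((load + durations[job], worker)));
    }
    assignment
}

/// Makespan: the finishing time of the busiest worker.
fn makespan(durations: &[u64], assignment: &[Vec<usize>]) -> u64 {
    assignment
        .iter()
        .map(|jobs| jobs.iter().map(|&j| durations[j]).sum::<u64>())
        .max()
        .unwrap_or(0)
}
```

With durations `[1, 1, 1, 1, 4]` on two workers, handing jobs out in their original order leaves the 4-unit job for last and finishes at time 6, while LPT starts it first and finishes at time 4. LPT is a greedy heuristic rather than an optimal scheduler; Graham showed its makespan is at most 4/3 − 1/(3m) times the optimum for m workers.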

I think cargo test is usually slower than both Maelstrom and nextest for the reasons described in the nextest documentation: cargo test doesn't always keep enough test threads running to keep the machine busy. However, if you have a lot of really small tests all in a single crate, then cargo test can and does outperform both Maelstrom and nextest. The clap project (https://github.com/clap-rs/clap) is a good example of this.

I think Maelstrom does most of the performance things that nextest does. However, nextest obviously has a lot more features and integrations than Maelstrom.

nfachan · on July 10, 2024

I forgot to answer your two other questions.

Maelstrom is open source and we plan to keep it that way. We may look at ways of selling access to a hosted cluster as a service. Test running is very elastic, and could benefit from having an elastic service to support it.

Maelstrom is completely root-less, so it'll work inside of Docker just fine. We regularly test Maelstrom within Maelstrom.

nfachan · on July 10, 2024

At my previous company, we had a lot of tests. They were a mix of C and Python. Running all of them on a single machine took on the order of an hour or more. Even just limiting the tests run to those that could theoretically be affected by your change could take minutes or even tens of minutes.

We ended up building a shared cluster of ~1000 cores that was available to all developers, and that was used by CI. This changed our developers' workflows quite a bit. It was now possible to run large numbers of tests regularly: every few minutes instead of once or twice a day. This in turn encouraged developers to write more tests and do more test-driven development.

On top of that, having the cluster available provided other benefits. If a test was flaky, it was easy to run it tens or even hundreds of thousands of times, making it easy to reproduce and identify the bug. We also occasionally did Monte Carlo simulations, and it was really handy to have a lot of cores available for general developer use.
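On a single machine, the same "just run it a lot" approach to a flaky test can be sketched with scoped threads. This is an in-process illustration, not how the cluster did it (there, each run would presumably be a separate job). `hunt_flake` is a hypothetical helper, and passing the run number to the test is an assumption made so that failures are reproducible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: runs `test` `runs` times, spread over `threads`
/// threads, and returns the sorted run numbers that failed. `test` receives
/// the run number so it could, for example, seed a random number generator.
fn hunt_flake<F>(test: F, runs: usize, threads: usize) -> Vec<usize>
where
    F: Fn(usize) -> bool + Sync,
{
    let counter = AtomicUsize::new(0);
    let (next, test) = (&counter, &test);
    let mut failures = Vec::new();
    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads {
            // Each worker claims the next run number until all runs are taken.
            handles.push(s.spawn(move || {
                let mut failed = Vec::new();
                loop {
                    let run = next.fetch_add(1, Ordering::Relaxed);
                    if run >= runs {
                        break failed;
                    }
                    if !test(run) {
                        failed.push(run);
                    }
                }
            }));
        }
        for handle in handles {
            failures.extend(handle.join().expect("worker thread panicked"));
        }
    });
    failures.sort_unstable();
    failures
}
```

For example, `hunt_flake(|run| run % 1000 != 999, 10_000, 4)` returns the ten failing run numbers 999, 1999, and so on up to 9999. Knowing exactly which runs failed is what makes the bug reproducible afterwards.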

I got used to working that way and I've missed it since I left that company. So this project is an attempt to make a more general-purpose implementation of that system. I hope others will find similar workflows that make them more productive using this system or something like it.

As for the container-per-test idea, it really comes about because it's the obvious way to package up jobs to submit them to a cluster. Plus, it makes tests reproducible for all developers in a project, and between developer machines and CI. Using Linux namespaces, the overhead of running tests in individual containers isn't much more than running tests in individual processes.

nfachan · on May 1, 2024

Hi everyone,

Maelstrom is a Rust test runner, built on top of a general-purpose clustered job runner. Maelstrom packages your Rust tests into hermetic micro-containers, then distributes them to be run on an arbitrarily large cluster of test-runners, or locally on your machine. You might use Maelstrom to run your tests because:

  * It's easy. Maelstrom functions as a drop-in replacement for cargo test, so in most cases, it just works.
  * It's reliable. Maelstrom runs every test hermetically in its own lightweight container, eliminating confusing errors caused by inter-test or implicit test-environment dependencies.
  * It's scalable. Maelstrom can be run as a cluster. You can add more worker machines to linearly increase test throughput.
  * It's fast. In most cases, Maelstrom is faster than cargo test, even without using clustering.
  * It's clean. Maelstrom has a from-scratch, rootless container implementation (not relying on Docker or RunC), optimized to be low-overhead and start quickly.
  * It's Rusty. The whole project is written in Rust.

We started with a Rust test runner, but Maelstrom's underlying job execution system is general-purpose. We will add support for other languages' test frameworks in the near future. We have also provided tools for adventurous users to run arbitrary jobs, either using a command-line tool or a gRPC-based SDK.

Feedback and questions are welcome! Thanks for giving it a whirl.