jfrisby's comments on Hacker News

Since this is HN, I'm gonna pick a nit.

> A clever and witty bash script running on a unix server somewhere is also not utilitarian coding, no human ever directly benefited from it.

Back around 2010, my friend Mat was doing cloud consulting. He wrote some code to screen-scrape the AWS billing and usage page for an account to determine how much had been spent day-over-day. This was, of course, all orchestrated via a bash script that iterated through clients and emailed the results to them (triggered by cron, of course).

He realized he had a startup on his hands when something broke and clients started emailing him asking where their email was. Cloudability was born out of that.

I'd say that both the Ruby and bash code involved count as pretty utilitarian despite running on a server and not having a direct user interface.


I'm gonna up the nit.

Several years ago, I was the sysadmin/devops of an on-premises lab whose uplink to the rest of the company (and the proxy by extension) was melting under the CICD load.

When that became so unbearable that it escalated all the way to the top of my oversaturated backlog, I took thirty minutes out of my hectic day to whip up a Git proxy/cache written in a hundred lines of Bash.
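The core idea sketches out roughly like this -- to be clear, this is a hypothetical reconstruction, not the production script, and the cache path and locking details are illustrative:

```shell
# Hypothetical sketch of a git mirror cache: the first request for a repo
# populates a local --mirror clone; later requests just refresh it, so the
# uplink only carries incremental fetches. CI jobs clone from the local path.
cache_repo() {
  local url="$1" cache_dir="${2:-/var/cache/git}"
  local name mirror
  name=$(basename "$url" .git)
  mirror="$cache_dir/$name.git"
  if [ -d "$mirror" ]; then
    git -C "$mirror" fetch --prune --quiet    # refresh the existing mirror
  else
    git clone --mirror --quiet "$url" "$mirror"
  fi
  echo "$mirror"                              # callers clone/pull from this path
}
```

In practice you'd also want locking around the fetch and periodic pruning of stale mirrors, which is presumably where the rest of the hundred lines go.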

That single-handedly brought back the uplink from being pegged at the redline, cut down time spent cloning/pulling repositories in the CICD pipelines by over two-thirds and improved the workday of over 40 software developers.

That hackjob is still in production right now, years after I left that position. They tried to decommission it at some point thinking that the newly installed fiber uplink was up to the task, only to instantly run into GitHub rate limiting.

It's still load-bearing and strangely enough is the most reliable piece of software I've ever written. It's clever and witty, because it's both easy to understand and hard to come up with. The team would strongly disagree with the statement that they didn't directly benefit from it.


If you are good enough at the task of "finding a cofounder" that you can offer useful advice on it, you are... bad at some very critical things.

This is not a task anyone should be experienced enough to be good at.


I saw a headline about the book on Reddit and made a joke to a friend of mine: "How much you wanna bet it uses mmap?" Well...

It uses mmap.


Because employers _want_ people to be fungible. The quest to quantify and systematize hiring is a quixotic effort at solving real challenges, but it persists because nobody has come up with an actual solution.


People solved that a long time ago; it's called a university degree.

Also, in the US employment is at-will. Not happy? Fire the person.


Almost, but not quite the audience for your question.

I was going to use Heroku, but the security requirements for my app made that a non-option. Instead, I went with simple Terraform that spins up CoreOS nodes, using cloud-init to launch Docker containers. A CI process built the Docker images, and deploying was just a matter of pointing an ECR tag at the new image and then cycling instances. Not quite as simple as Heroku, but pretty close.

I moved onto AWS via this: https://registry.terraform.io/namespaces/GoCarrot

No k8s. No Docker. Beautifully clean system. Blue/green deploys with automatic rollback. Continuous deployment (there's a CircleCI orb as well). Tightly buttoned-down system configuration. Debian built from scratch with an eye towards supply chain security. (There's a root image factory and base image factory to handle the layering of image build processes involved.) Log aggregation and configuration management baked in. Security that can only be described as "insanely paranoid, yet oddly pragmatic." Cascaded image builds, so I can update the entirety of my infrastructure by pushing one button to kick things off then clicking a few buttons to approve deployment of the various services.

1. About a week, although I had the assistance of the author. I'm now rebuilding my personal infra on Omat and it's going much more quickly. Probably 3 days total, with no assistance.

2. More experience with DevOps stuff than most, but not a DevOps person by any stretch.

3. Very, very, very instructive.

4. Given what I was coming from, cost-neutral. Compared to Heroku? Notably cheaper.

5. At present, we're in alpha so traffic is negligible and back-end workload is fairly minimal (tens of thousands of jobs per day). The author of the tools, however, is CTO of a mid-tier SaaS that handles a quite significant (millions of transactions/day, IIRC) amount of traffic, and he is super aggressive about not being hobbled on performance needs -- but also being cost-efficient.

6. Avoiding the k8s iceberg while having all the modern amenities, in a system I actually have a hope of understanding top-to-bottom (modulo my hesitation around reading the systemd source), is nice. This system is an object lesson in "loosely coupled, highly cohesive" design. I haven't felt at any point that I may be stuck in a You Can't Get There From Here situation. Not layering a second tier of software-defined networking (Docker/k8s) on top of AWS's own software-defined networking means I neatly avoid the single biggest source of chronic issues (and, via the workarounds needed, system complexity) that I've experienced with "modern" (Docker/k8s) DevOps approaches.


My home folder is a git repo (with a very extensive .gitignore, so I'm only versioning dotfiles, ~/bin, etc), with a branch per machine. Copying over the .git folder to a new machine is step 1. From there, I selectively restore what I need.
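A minimal sketch of that bootstrap, with an illustrative whitelist (my real .gitignore is far more extensive, and the file names here are examples):

```shell
# Illustrative setup: ignore everything in the home directory by default,
# then whitelist the specific files/dirs that should be versioned.
setup_home_repo() {
  cd "$1" || return 1
  git init -q
  git checkout -q -b "machine-$(uname -n)"   # one branch per machine
  printf '%s\n' '/*' '!/.gitignore' '!/.zshrc' '!/bin' > .gitignore
  git add .gitignore
}
```

The `/*` pattern ignores everything at the top level; each `!`-prefixed line punches a hole in that for one file or directory.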

I have a Brewfile/Brewfile.lock, so I can go from setting up Homebrew to having most of the core tools I need very easily.

I have some scripts in ~/bin for helping with setup using the defaults command.

Other handy tools include chflags, scutil, possibly pmset, etc.

My config setup script looks approximately as follows -- hope you find some of this useful!

https://gist.github.com/MrJoy/20accc3b463e75ce5eecbbd0cff841...


Rails + graphql-ruby + ActiveAdmin + Devise + SideKiq/Faktory + Postgres remains a ridiculously productive combination for me.

I'm leaning more towards Faktory these days because it makes it possible to move individual jobs from Ruby to Go if/when performance becomes an issue, and the much simpler client-side logic makes performance-related adjustments easier and less risky.

For the front-end, I've been dabbling with Next.js, and using Tailwind heavily. Deployed to S3/CloudFront. I'm not completely sold on Next.js yet, but I've liked it better than the other options I've tried so far.

For deployment of the backend / admin tooling, Docker + ECR + EC2 + CoreOS, although I'm looking at changing out CoreOS for Debian now that it's effectively DOA. CI does a Docker image build and pushes to ECR, tagging with the git hash. Deployment consists of changing an environment-specific tag to the desired image, and replacing servers. For blue/green deployment you just use 2 different tags (e.g. prod-blue / prod-green).
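The tag-move step can be sketched with the AWS CLI -- the helper below is hypothetical rather than my actual tooling, and the repository/tag names are illustrative:

```shell
# Point an environment tag (e.g. prod-blue) at the image built for a given
# git SHA. This happens entirely server-side in ECR: fetch the manifest for
# the SHA-tagged image, then put it back under the environment tag.
retag() {
  local repo="$1" git_sha="$2" env_tag="$3"
  local manifest
  manifest=$(aws ecr batch-get-image \
      --repository-name "$repo" \
      --image-ids imageTag="$git_sha" \
      --query 'images[0].imageManifest' --output text)
  aws ecr put-image \
      --repository-name "$repo" \
      --image-tag "$env_tag" \
      --image-manifest "$manifest"
}
```

Once the tag moves, cycling the instances picks up the new image; blue/green is just two of these environment tags.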

I keep a docker-compose config for developers who work on only one piece of the stack at a time and don't want to manage the local development environment for the rest, though I don't personally use it in development.

For context: I'm a serial technical founder, so time to market is usually the biggest priority for me. Of course, different requirements / career paths are often going to lead to other options being more suitable.


The workaround I've found: Select the relevant line(s), but before releasing the mouse button, cmd-c (or the equivalent shortcut in your local computing environment).

But yes, it's incredibly annoying. Especially for those of us with ADHD who highlight parts of text as an anchor so we can get back on-track faster if we get distracted.


Well, for starters, it's simply untrue that every company will (barring bankruptcy, of course) eventually need extreme scale, and equally untrue is the implicit assumption that the up-front cost of implementing such scalability is necessarily worthwhile. That may be true in Startuplandia, but the industry is a _lot_ bigger than the world of Silicon Valley-style startups.

For example: My current company is a small lifestyle biz. Sure, it's a "tech company", has an event-processing pipeline and the like, but if we were 100x more successful than our wildest ambitions, we still wouldn't need anywhere near 25.6k IDs/sec/server. And, given the objectives of the company, it would make more sense to turn customers away than to grow the team to accommodate that sort of demand.
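(For reference, the 25.6k figure falls straight out of Sonyflake's documented layout: an 8-bit sequence number per 10 ms time unit.)

```shell
seq_bits=8                                 # Sonyflake's sequence-number width
ticks_per_sec=100                          # one tick per 10 ms
ids_per_tick=$(( 1 << seq_bits ))          # 256 IDs per tick
echo $(( ids_per_tick * ticks_per_sec ))   # 25600 IDs/sec/machine, the hard ceiling
```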

The simple fact is that in a great many situations extreme scale isn't needed. And, either way, the cost of implementing extreme scalability can inflict its own harm. Highly scalable systems usually come with more operational complexity, and steeper learning curves. When you have a small team, this can impair -- or even cripple -- product development. If your company hasn't found product/market fit yet, this attitude of sacrificing the present for an imagined future can materially reduce the likelihood of that future coming to pass. Of course, the opposite is sometimes true as well. Companies have, in fact, failed because they found product market fit but took so many shortcuts they couldn't adapt and grow into their success. But the point is that determining how much to invest in future-proofing is a complex and nuanced problem not amenable to sweeping generalizations.

Now, this clearly isn't such an extreme case, but frankly Sonyflake (the original, not the Rust implementation) seems operationally simpler than Snowflake while offering perfectly reasonable tradeoffs. The Rust implementation might prove quite useful to any number of organizations _based on their needs_. If they have a Rust codebase with a single process per machine, this could easily be a simpler and more robust option than Sonyflake.

The kind of arrogant dismissiveness based on one's own personal (and highly specialized) experience that's shown in the downvoted comment tends to leave a bad taste in peoples' mouths. Thus the downvoting.


> And, either way, the cost of implementing extreme scalability can inflict its own harm. Highly scalable systems usually come with more operational complexity, and steeper learning curves

I think you're overestimating the cost factor of future-proofing an architecture for extreme scale (and 26K/sec/server isn't actually that 'extreme'). And instead of downvoting people who've walked that walk, perhaps engaging with them would make that clear.

Also, I didn't read any 'arrogant dismissiveness' in the post. Each to their own! Happy New Year!


As I touched on, the cost can vary a lot depending on the particulars of the situation.

For example, one company I came into was using Redshift and fighting a lot of problems with it. The problems stemmed from a combination of not knowing how to use it effectively (e.g. batching writes), it being the wrong tool for the job (they were using it for a combination of OLAP and OLTP(!!) workloads), and so forth. The long and short is that for both workloads, a basic RDS Postgres instance -- or a pair, one for OLAP, one for OLTP -- would've been considerably cheaper (they'd had to scale up a couple notches from the minimum cluster size/type because of performance), avoided correctness problems (e.g. UNIQUE constraints actually doing the thing), been several orders of magnitude more performant at the scale they were at, etc. They simply didn't understand the tool and tried to apply it anyway. They basically thought Redshift was like any RDBMS, but faster "because it stores data in a columnar format."

Had they understood it, designing a data pipeline that could make use of such a tool would have required considerably more engineering person-hours than the pipeline they actually built.

Obviously this is a terribly extreme example, but the learning curve for tools -- including the time/effort needed to discover unknown unknowns -- is a cost that must be factored in.

And, even if your org has the expertise and experience already, more-scalable solutions often have a lot more moving parts than simpler solutions. Another organization I was at wanted to set up a data pipeline. They decided to go with Kafka (self-managed because of quirks/limitations of AWS' offering at the time), and Avro. After 2 months that _team_ (4 sr + 1 jr engineers, IIRC) had accomplished depressingly little. Both in terms of functionality, and performance. Even considering only the _devops_ workload of Terraforming the setup and management of the various pieces (Avro server, Zookeeper cluster, Kafka cluster, IAM objects to control access between things...), it was a vastly more complicated project than the pipeline it was meant to replace (SQS-based). Yes, that's a bit of an apples-to-oranges comparison but the project's goal was to replace SQS with Kafka for both future throughput needs and desired capabilities (replaying old data into other targets / playing incoming data into multiple targets without coupling the receiving code).

By the time I left, that project: 1. Had not shipped. 2. Was not feature-complete. 3. Was still experiencing correctness issues. 4. Had Terraform code the CTO considered to be of unacceptably poor quality. 5. Had monitoring and observability that were, let's say, a "work in progress." Some of that is for sure the Second System Effect, but it is not at all clear to me that they would have been better off if they'd gone with Kafka from day 1.

Given that we could pretty easily have extracted another two orders of magnitude of throughput out of SQS, there's a real discussion to be had about whether a better approach might've been a Go consumer that consumed data more efficiently and shunted it to multiple destinations -- including S3 to allow for replay. That would've been a 1-2 week project for a single engineer. Kafka is 100% the right tool for the job _beyond a certain scale_ (both of throughput and DAG complexity), but the company was something like 4 years in when I got there, and had been using SQS productively for quite some time.

And no, 26k/sec/server isn't especially huge. I was referring to the fact that the downvoted commenter was making sweeping generalizations. Sweeping generalizations tend to shut discussion down, not prompt more nuanced discussion. Other threads on this post have seen very interesting and productive discussions emerge, but note that the downvoted commenter's post hasn't really drawn anything other than people being sucked into the very discussion we're having now. It's counter-productive.


I think you're crossing the streams a bit here.

Twitter designed _Snow_flake. _Sony_flake is a reimplementation that changes the allocation of bits, and the author acknowledges that the maximum ID generation throughput ceiling is lower than that of Snowflake.

Snowflake uses a millisecond-precision timestamp and a 12-bit sequence number, so 4,096,000 IDs per node-second. At that point, the bottleneck won't be the format of the IDs but the performance of the code and the IPC mechanism. Which, in this case, is non-trivial, since Snowflake uses a socket-based approach to communication. Lower-overhead native IPC mechanisms are certainly possible via JNI but would probably take some doing to implement. For Sonyflake, I don't imagine the socket overhead is much of an issue given the low throughput its bit allocations allow.
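The 4,096,000 figure is just the sequence width times the tick rate:

```shell
seq_bits=12                        # Snowflake's sequence-number width
ids_per_ms=$(( 1 << seq_bits ))    # 4096 IDs per millisecond per node
echo $(( ids_per_ms * 1000 ))      # 4096000 IDs per node-second
```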

Were I to design something like this again[1], I might start with something like Sonyflake (the self-assignment of host ID w/out needing to coordinate via Zookeeper is nice), shave a couple bits from the top of the timestamp, maybe a couple from the top of the host ID, and pack the remainder at the top, leaving a few zero bits at the bottom. That would essentially mean the value returned was the start of a _range_, and anything needing to generate IDs in large quantities can keep its own in-process counter. Only one API call per N IDs generated by a given thread/process. And, of course, unix-domain socket or other lower-overhead approach to communication for when the API calls are needed.
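A rough sketch of that range scheme -- the bit widths and values below are made up purely for illustration, and the counter logic is a hypothetical client, not a real implementation:

```shell
# The server hands back a base ID with the low `range_bits` zeroed; the client
# fills those bits from a local counter, so it needs only one API round-trip
# per 2^range_bits IDs generated by a given thread/process.
range_bits=6
base=$(( 12345 << range_bits ))    # pretend server response: timestamp|host|seq, shifted up
count=0
next_id() {
  id=$(( base | count ))           # result lands in $id
  count=$(( count + 1 ))
  # once count reaches 2^range_bits, a real client would fetch a fresh base
}
```

With 6 low bits, one round-trip yields 64 IDs, cutting API traffic by the same factor.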

[1] - A decade prior to Snowflake's release, I wound up taking a very similar approach at one of my first startups, albeit much more crude/inelegant and without nice properties like k-ordering.

