Hacker News | Deadron's comments

Just do a normal merge, then squash all your commits into one using rebase; then a rebase onto a branch is easy.
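A runnable sketch of that flow in a throwaway repo (branch names are made up, and `reset --soft` stands in for an interactive-rebase squash):

```shell
# Self-contained demo: merge once, squash to one commit, then rebase.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q && git checkout -qb main
git config user.email demo@example.com && git config user.name demo
echo a > f.txt && git add f.txt && git commit -qm base
git checkout -qb feature
echo b >> f.txt && git commit -qam "feature work 1"
echo c >> f.txt && git commit -qam "feature work 2"
git checkout -q main
echo d > g.txt && git add g.txt && git commit -qm upstream
git checkout -q feature

git merge -qm "merge main" main                      # 1) one normal merge: conflicts handled once
git reset -q --soft "$(git merge-base feature main)" # 2) squash: everything becomes staged changes
git commit -qm "feature, squashed"                   #    ...collapsed into a single commit
git rebase -q main                                   # 3) rebasing a single commit is easy
git log --oneline main..feature | wc -l              # feature is exactly one commit ahead
```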


For when you inevitably need to expose the IDs to the public, UUIDs prevent a number of attacks that sequential numbers are vulnerable to. In theory they can also be faster/more convenient in certain respects, since you can generate a UUID without something like a central index to coordinate creation. They can also be treated as globally unique, which is useful in certain contexts. I don't think anyone would argue that their overall performance is better than serial/bigserial, though, as they take up more space in indexes.
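A quick illustration of the no-coordination point, as a Python sketch: a v4 UUID is 122 random bits, so any node can mint them independently and collisions are negligible in practice.

```python
import uuid

# Each call draws 122 random bits; no sequence, lock, or central
# index is consulted, so any node can generate IDs independently.
ids = {uuid.uuid4() for _ in range(100_000)}

# With 2^122 possible values, 100k independent draws are
# all distinct for any practical purpose.
print(len(ids))  # 100000
```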


People really overthink this. You can safely expose internal IDs by running them through a symmetric cipher, like a Feistel cipher. Even sequential IDs will appear random.
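For the curious, a minimal Python sketch of the idea (the round function and key here are made up for illustration): a balanced Feistel network is a bijection on 32-bit values, so every sequential ID maps to a unique scrambled value, and running the rounds in reverse recovers the original.

```python
MASK16 = 0xFFFF

def _round(half: int, key: int, i: int) -> int:
    # Toy keyed round function; any deterministic mix of the
    # half-block, the key, and the round index will do.
    return (half * 0x9E3779B1 ^ key ^ i) & MASK16

def encode_id(n: int, rounds: int = 4, key: int = 0xC0FFEE) -> int:
    """Scramble a 32-bit sequential ID so it looks random."""
    left, right = n >> 16, n & MASK16
    for i in range(rounds):
        left, right = right, left ^ _round(right, key, i)
    return (left << 16) | right

def decode_id(m: int, rounds: int = 4, key: int = 0xC0FFEE) -> int:
    """Invert encode_id by running the rounds backwards."""
    left, right = m >> 16, m & MASK16
    for i in reversed(range(rounds)):
        left, right = right ^ _round(left, key, i), left
    return (left << 16) | right

print([encode_id(n) for n in range(3)])  # sequential in, scrambled-looking out
```

As the sibling comment notes, key rotation is the hard part: IDs issued under an old key stop decoding once the key changes, so in practice you'd have to version the IDs or try multiple keys.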


Looks easy on the surface, but the problem is key rotation.


I didn't know about this problem, but I was already thinking it sounds even harder. And the resulting IDs are probably quite large.


But these are internal IDs only, and the public ones should be a separate column. Being able to generate uuid7 without a central index is useful in distributed systems, but this is already a Postgres DB.

Now, the index on the public IDs would be faster with a uuid7 than a uuid4, but you have an info-leak risk similar to the one the article mentions.


"Distributed systems" doesn't have to mean some fancy, purpose-built thing. Just correlating between two Postgres databases might be a thing you need to do. Or a database and a flat text file.


I usually just have a uuid4 secondary for those correlations, with a serial primary. I've done a straight uuid4 PK before; things got slow on not-very-large data because it affected every single join.


PgBouncer has always left me confused in the world of application-level connection pooling. I have never quite understood the value of it if we are already using connection pools in our applications. I don't want to pool connections to another pool.


Application-level connection pools are not enough if you're using something like k8s and have your "application" running across hundreds of pods, each with its own application-level connection pool. pgBouncer helps tremendously in that situation because all those pods will use a single pool. By doing that we cut average open connections dramatically, from over 1000 to less than 400.


Also IoT


I would hope you are not allowing IoT devices direct access to your database. There is no saving that haha.


Unfortunately some of my customers are. Hopefully they're setting up isolated roles which can only access stored procedures to log readings so that at worst they're opening themselves up to DDoS


This still doesn't really make sense to me. You can't scale an application that relies heavily on a database to this level, because your fundamental constraint IS the database. If you are already hitting your max number of connections with a small number of application instances, there are no benefits to further horizontal scaling. You are just passing the buck around, because only a limited number can hit the database at any one time.


The constraint is less often the database in reality than in theory. Especially when you consider that many large-scale applications are doing complex things: a normal request trace might spend only 30% or 40% of its time in the DB call. When you consider that a single Postgres database can clear 200k QPS, you start to get to a world where you have thousands of hosts. If you tried to tune the in-application connection pool to suit deployments at various scales, you would quickly find that having a proxy is both simpler and safer.

I would confidently state that most large-scale applications have run into the situation where they scaled up their application worker fleet and then crashed their database with too many connections. Coordinating the size of the worker connection pool in a world of elastic application workers is enough of a task that deploying a proxy is really the simplest and best solution.


Most large scale web applications spend their time reading and writing data, both to/from clients and to/from other remote services such as databases. You don't need thousands of hosts. Stackoverflow famously ran 9 server instances in 2016 with 7 dedicated to the primary sites.

Unlike Postgres, Oracle and SQL Server can support thousands of connections, but they see performance degradation at a certain point. So I have never seen them crash from too many connections (although they definitely get slower).


Stackoverflow is very much the exception, not the rule. Most top-tier software companies have server fleets that scale well past the tens-of-thousands-of-nodes level, and for container-based workloads I don't think it's uncommon for even medium-sized companies to run 100k+ containers.


Tldr; aim to be the exception!


The real problem is working with postgres’ “each connection is a process” model. Pgbouncer puts an evented pool in front of postgres to deal with this. Apps that are aggressive with DB will not benefit from having an evented pool in front. However, web apps (think rails) will have connections checked out even if they don’t need them. Pgbouncer helps here. If your app recycles DB connections when not in use, that leads to connection thrashing and higher latency, which pgbouncer can help with. But you’re right that at some point, the DB is the bottleneck. For most people, it’s the number of connections, because postgres makes a process for each connection.


> You can't scale an application that relies on a database heavily to this level because your fundamental constraint IS the database.

I pool ~10,000 connections down to under 500. I can't do application-level pooling because the application is just thousands of individual processes; you often see this with Python, PHP, or NodeJS applications.

500 open connections is way less overhead on the Postgres server than 10,000. I'm very happy to "pass the buck" of connection pooling to separate machines with pgBouncer.


The point is that if you have 50 k8s pods that each have their individual connection pool, some of them will be holding idle connections while others are hitting their max connection limit. A single pool is much more flexible.

Additionally, the "transaction" mode of PgBouncer can increase the utilization of connections further by making them available to a different application when one application holds a connection while doing something different (e.g. waiting for a call to an external service).
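For reference, switching to transaction mode is a small change in `pgbouncer.ini` (host and database names below are made up):

```ini
[databases]
appdb = host=db.internal port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432

; Return each server connection to the pool as soon as a
; transaction ends, so merely idle clients hold nothing.
pool_mode = transaction

; Accept many client connections...
max_client_conn = 10000

; ...but open at most this many real Postgres connections
; per database/user pair.
default_pool_size = 100
```

One caveat worth checking first: transaction pooling breaks session-level state such as `SET` variables and advisory locks, since consecutive transactions from one client may run on different server connections.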


If an application has CPU load for 50 pods, it is not a usual project, I guess? Its average load should be thousands of requests per second, I believe.

So the question is: how many applications have CPU load for 50 pods but still cannot saturate a single database, so that sharding is not yet considered? My gut feeling is that they are few, more like exceptions.

From my personal experience, pgBouncer is used a lot when the framework/library does not support application-level pooling in the first place, and not so much when it does.


This. For the web service workloads most of us run, I’ve always run out of postgres connections before exceeding the vertical limit of a postgres server.


Is this because your default approach is to scale horizontally, or because you have tried other options? Logically, vertically scaling a single server with pooled connections should work better than horizontally scaling to multiple servers that share connections through a proxy. The proxy doesn't magically increase the number of connections available to the database; it does add overhead to all your database interactions, though.


Vertically scaling has a cost component. If you're able to use pgbouncer with transaction pooling to better utilize, but not exceed, your existing vertical Postgres limit, then you're set. If you do exceed it, then you must scale up. Pgbouncer can help you lengthen your runway. Tuning your pgbouncer pool sizes can help minimize the overhead, but as with anything we do, it's about weighing the trade-offs for your situation.


Don't spin up 50 pods. You have outscaled your database. You can't make IO operations faster by throwing more workers at them, and you can only have so many connections working at once. As a side note, if your application is a typical IO-bound web app, it's very unlikely you can process enough transactions to effectively use 50 workers in a single region.


We're not necessarily talking about 50 pods for the same application, could also be a zoo of smaller low traffic applications sharing a Postgres instance.


In those cases, does one keep the application-level pool in each pod in addition to the "communal" PgBouncer pool, or does that not offer any advantage?


If you limit yourself to a subset of Postgres' features, connections can become your bottleneck. I work with a production system where the major scaling constraints are 1) the VM capacity of the cloud region it runs in and 2) available connections to Postgres.


To be clear, pgbouncer does not add connections to Postgres or remove the connection bottleneck; it's still there under the covers. If you are saturating your connections, it will not be able to improve throughput. It sounds like you need a different architecture that allows for queueing work. The approach pgbouncer takes may actually reduce performance overall, as it will intermix work on the PG instance which, if you are already saturating the database, will slow down other operations.


Yup, one of the things we're doing is moving parts of the system away from Postgres into queues.


If you have 10 application instances each with a pool of 20 connections (half of which are idle), you have 100 active connections and 100 idle connections.

If you have a single "real" connection pool, the idle pool is effectively shared among all application instances. You can have 100 active connections and 10 (or maybe 20) idle.

I have run into this problem, but I solved it with careful tuning of automatic instance scaling parameters, connection pool size and timeout. But this only gets you so far; a single real connection pool would be more effective (at the cost of added complexity).
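The arithmetic above, spelled out (numbers from the comment; the shared idle buffer is the comment's "10 or maybe 20" estimate):

```python
instances = 10
pool_size = 20           # connections each instance holds
busy_fraction = 0.5      # half of each pool is active at any moment

# Per-instance pools: idle connections are stranded inside each pool.
total_per_instance = instances * pool_size
active = int(total_per_instance * busy_fraction)
idle_stranded = total_per_instance - active

# One shared pool: the same active load plus one small shared buffer.
shared_buffer = 20
total_shared = active + shared_buffer

print(active, idle_stranded)  # 100 active, 100 idle with per-instance pools
print(total_shared)           # 120 total connections with one shared pool
```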


I think it's an issue with a microservice setup. TFA suggests ~300 connections as optimal for Postgres, and I've seen microservice setups with more than 300 processes, which means even limiting each process to a single DB connection might not be enough.

But yeah, for less distributed applications, just have N worker threads and don't close the DB connection after each job.


> You can't scale an application that relies on a database heavily to this level because your fundamental constraint IS the database.

That's a big assumption. If your app is just an HTTP wrapper for SQL queries against a poorly-optimized database, sure.

But there are plenty of applications that spend the majority of their time doing other stuff (HTTP requests, FFI calls, gpu/cpu-intensive processing, etc.) where the database interaction is a small part of the overall performance profile, and certainly not the bottleneck, even with hundreds of concurrent clients.

In those cases, rather than artificially throttling concurrency to stay within your global max_connections, you would ideally run as many clients as you can (i.e. fully utilize your infrastructure) and grab database connections from an external pool when you need them.


it lets me create a load balancer in front of multiple database instances without having to do a bunch of application-level BS

additionally with microservices, managing connection pooling can be difficult across legacy software, service versions, teams, etc.

PGBouncer lets me have a front-end and manage that at an infra level.


Lambdas


If you are running AWS Lambdas, I believe you should be using the RDS Proxy product (it is a very similar product, though).


Super annoying that RDS Proxy doesn’t support IAM auth against the DB.

We moved all our DB users to use IAM auth based on instance roles and then found out that RDS proxy doesn’t support it.


Poorly vs well structured code.


It depends. They come and go based on legislation and current fixed interest rates. They were very popular for a bit, but as fixed rates kept dipping they seemed to mostly vanish. They only really seem to be popular when financial institutions can easily resell them, as they tend to target the lower income brackets.


Auto-generated client code is nice in theory. In practice I find it only really useful as a starting point. There are enough choices to be made in writing even a simple HTTP API that it's unlikely a generic tool will generate useful code for a given application. These choices include library usage (HTTP client, serialization, logging, DI integration), async vs sync, logging requirements, and tooling support for generated code. If you are in a language with established standard libraries and patterns this is less of a problem, but in something like Java, which has evolved in all these areas over the years, it can be a real problem.


Thrift seems to have a really nice solution of providing only types that eventually break down into scalars, and only for exchanging data; it doesn't try to prescribe the rest. I liked working with it.


The tooling is far from simple to set up, and the available options can be overwhelming. It's all the pain of the JS stack, but with less readily available help, and tooling that produces less helpful error messages.


I think the tooling nowadays is pretty simple to set up, but the information out there doesn't reflect that new, simpler way, so everyone starting out follows something that pushes them to outdated tooling.

Install Java:

    brew tap homebrew/cask-versions
    brew install --cask temurin17
Install Clojure:

    brew install clojure/tools/clojure
Type `clj` at the command line and play with Clojure!

Now install VSCode and get the Calva plugin for Clojure from the marketplace.

That's it. You'll have autocompletion, jump to definition, code formatting and highlighting, linting, support for editor integrated REPL, debugger, etc.

Then you can run:

    clojure -Ttools install io.github.seancorfield/deps-new '{:git/tag "v0.4.9"}' :as new
And now you can create new projects from various templates using:

    clojure -Tnew app :name myusername/mynewapp
This creates a new basic application project for example. Open it in VSCode and you can connect a REPL to it and start working.


The errors are bad, but understandable in 99% of cases with a little experience (like 2-3 weeks).

And the stuff about the tooling is flat out wrong. The CLJS tooling is light years ahead of JS tooling in my experience. Maybe that was different back in the day, but shadow-cljs is seamless. Vastly superior to working with package.json/babelrc/postcssrc/webpack.config.js/etc/etc/etc


If you are going to compare modern CLJS tooling against JS - you should also compare against modern JS tooling like esbuild/vitejs which operate at light-speed.


I don’t think it’s that

    $ npx nbb -e '(println "Hello, world")'
    Need to install the following packages:
      nbb
    Ok to proceed? (y)
    Hello, world
Or more comprehensively: https://clojurescript.org/guides/quick-start


Your example is missing anything actually related to rendering a webpage.


Unclear what "rendering a webpage" entails exactly.

If you want to do frontend development, you can give shadow-cljs a try, the quickstart is pretty quick: https://github.com/thheller/shadow-cljs#quick-start

If you want to just render server-side HTML, something like compojure (HTTP routing) and hiccup (Clojure data -> HTML) is pretty easy and quick to get started with (https://gist.github.com/zehnpaard/2071c3f55ed319aa8528d54d90...).

If you want to generate HTML files to serve with nginx/whatever, you can just use hiccup and `(spit)` the resulting HTML to files on disk.


I felt literal pain reading this.


Surprisingly, or maybe not, slf4j requires an implementation to be provided at runtime to log anything.


You can use Redis to distribute messages via PUB/SUB. The sockets subscribe to the events that are relevant to them. It can handle thousands of messages a second in a local environment and probably more in a dedicated hosting environment.


Seconding this approach. I've used PUB/SUB for this exact purpose.

When you need to send something through the websocket, publish a message via Redis and have handlers in the code subscribed to that channel, so every server can check whether it holds the websocket connection, and whichever one has it sends the data through.
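The routing described above, sketched in Python with an in-process stand-in for the broker (class names are made up; real code would use a Redis client's SUBSCRIBE/PUBLISH instead of the `Broker` stub):

```python
from collections import defaultdict

class Broker:
    """In-process stand-in for Redis PUB/SUB, just to show the pattern."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        for cb in self.subscribers[channel]:  # fan out to every server
            cb(message)

class Server:
    """One web server holding a subset of the websocket connections."""
    def __init__(self, broker):
        self.sockets = {}                     # user_id -> fake socket (a list)
        broker.subscribe("ws-events", self.on_event)

    def on_event(self, event):
        user_id, payload = event
        sock = self.sockets.get(user_id)
        if sock is not None:                  # only the holder forwards it
            sock.append(payload)

broker = Broker()
a, b = Server(broker), Server(broker)
a.sockets["alice"] = []
b.sockets["bob"] = []

# Every server receives the event; only the one holding bob's socket sends.
broker.publish("ws-events", ("bob", "hello"))
print(a.sockets["alice"], b.sockets["bob"])  # [] ['hello']
```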


That mostly sounds great, although TBH my first instinct would be to stick all the socket stuff in a microservice. I don't normally advocate for them, but this seems like a perfect situation: very well defined layer that contains no business logic.

