
Experimental format to help readability of a long rant:

1.

According to the OP, there's a "terrifying tale of VACUUM in PostgreSQL," dating back to "a historical artifact that traces its roots back to the Berkeley Postgres project." (1986?)

2.

Maybe the whole idea of "use X, it has been battle-tested for [TIME], is robust, all the bugs have been and keep being fixed," etc., should not really be that attractive or realistic for at least a large subset of projects.

3.

In the case of Postgres, on top of piles of "historic code" and cruft, there's the fact that each user of Postgres installs and runs a huge software artifact with hundreds or even thousands of features and dependencies, of which every particular user may only use a tiny subset.

4.

In Kleppmann's DDIA [1], after explaining why the declarative SQL language is "better," he writes: "in databases, declarative query languages like SQL turned out to be much better than imperative query APIs." I find the footnote to that paragraph a bit ironic: "IMS and CODASYL both used imperative query APIs. Applications typically used COBOL code to iterate over records in the database, one record at a time." So, SQL was better than CODASYL and COBOL in a number of ways... big surprise?

Postgres' own PL/pgSQL [2] is a language that (I imagine) most people would rather NOT use: hence a bunch of alternatives, including PL/v8, on its own a huge mass of additional complexity. SQL is definitely "COBOLESQUE" itself.

5.

Could we come up with something more minimal than SQL that looks less like COBOL (hopefully also getting rid of ORMs in the process)? Also, I have found it inspiring to see some people creating databases for themselves. Perhaps not a bad idea for small applications? For instance, I found BuntDB [3], which the developer seems to be using to run his own business [4]. Also, HYTRADBOI? :-) [5].

6.

A common objection to using anything other than an established relational DB is "creating a database is too difficult for the average programmer." How about debugging PostgreSQL issues, developing new storage engines for it, or even building expertise on how to set up the instances properly and keep them alive and performant? Is that easier?

I personally feel more capable of implementing a small, well-tested, problem-specific B-Tree than of learning how to develop Postgres extensions, becoming an expert in its configuration and internals, or debugging its many issues.

Another common opinion is "SQL is easy to use for non-programmers." But every person who knows SQL had to learn it somehow. I'm 100% confident that anyone able to learn SQL should be able to learn a simple, domain-specific programming language designed for querying DBs. And how many of the people who are not able to program imperatively would be able to read a SQL EXPLAIN output and fix deficient queries? If they can, that supports even more the idea that they should be able to learn something other than SQL.
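To make the EXPLAIN point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in (an assumption for illustration: PostgreSQL's EXPLAIN output looks different and is far more detailed). The orders table, the index name, and the query_plan helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [(f"customer-{i % 100}", float(i)) for i in range(1000)],
)

def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM orders WHERE customer = ?"

# Without an index, the only option is to scan the whole table.
plan_before = query_plan(conn, query, ("customer-7",))

# The "fix" for the deficient query: give the planner an index to use.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_after = query_plan(conn, query, ("customer-7",))

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of orders and the second a search using idx_orders_customer. Reading that difference and acting on it is the skill in question.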

----

1: https://dataintensive.net/

2: https://www.postgresql.org/docs/7.3/plpgsql-examples.html

3: https://github.com/tidwall/buntdb

4: https://tile38.com/

5: https://www.hytradboi.com/



> I personally feel more capable of implementing a small, well-tested, problem-specific B-Tree than of learning how to develop Postgres extensions, becoming an expert in its configuration and internals, or debugging its many issues.

It gets harder as you delve into high concurrency and ensuring ACID guarantees: if you are using an established database, these are simply problems you don't have to deal with (or, more truthfully, there are known ways to deal with them, like issuing an "UPDATE ... SET x = x + 1" instead of fetching x and then setting it to x+1).

Still, writing an application that expects the datastore to ensure consistency and actually ensuring that consistency are different problems requiring different mindsets (you are already thinking about the hard problems of your business logic; do you also want to think about the hard problems common to DB engines at the same time?).
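A minimal sketch of that increment example, using Python's built-in sqlite3 module (an assumption for illustration; the counters table and the read_x helper are hypothetical). The two "clients" are simulated deterministically on one connection rather than with real threads:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, x INTEGER NOT NULL)")
conn.execute("INSERT INTO counters VALUES ('hits', 0)")

def read_x(conn, name):
    """Fetch the current counter value (hypothetical helper)."""
    return conn.execute("SELECT x FROM counters WHERE name = ?", (name,)).fetchone()[0]

# Fetch-then-set: two clients both read x before either one writes.
seen_by_a = read_x(conn, "hits")
seen_by_b = read_x(conn, "hits")
conn.execute("UPDATE counters SET x = ? WHERE name = ?", (seen_by_a + 1, "hits"))
conn.execute("UPDATE counters SET x = ? WHERE name = ?", (seen_by_b + 1, "hits"))
lost_update_result = read_x(conn, "hits")  # one of the two increments is lost

# Reset, then let the database do the arithmetic inside the UPDATE itself.
conn.execute("UPDATE counters SET x = 0 WHERE name = ?", ("hits",))
for _ in range(2):
    conn.execute("UPDATE counters SET x = x + 1 WHERE name = ?", ("hits",))
atomic_result = read_x(conn, "hits")  # both increments are applied

print(lost_update_result, atomic_result)
```

With the read and the write folded into one statement, there is no window in which another client can read a stale value, which is the kind of guarantee a home-grown store would have to provide on its own.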

> But every person that knows SQL had to learn it somehow. I'm 100% confident that anyone able to learn SQL should be able to learn a simple, domain-specific, programming language designed for querying DBs.

The benefit of languages as ubiquitous as SQL is that once you need something that you did not think of, SQL already enables it. But plenty of non-relational databases provide their own non-SQL APIs already (ElasticSearch, Redis, MongoDB, DynamoDB...), and as you suggest, developers cope with them just fine.

However, people used to the expressiveness of SQL (even if we all know it's imperfect) always miss what they can achieve with a single query, which moves performance (and some correctness) considerations to the database. The idea is as old as programming: transfer responsibility for accessing data performantly to whatever is managing that data, even if we know that there are always cases where it's an uphill battle.

It's that combination of good-enough performance, good-enough expressiveness, and impressive consistency and correctness that makes relational databases (and SQL) a great choice for most applications today.
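As a rough illustration of the single-query point, here is the same aggregation written twice with Python's built-in sqlite3 module (an assumption for illustration; the orders table and its rows are made up). The first version iterates record by record in application code, the second hands the whole job to the database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 20)],
)

# Imperative style: pull every row out and aggregate in application code.
totals = {}
for customer, total in conn.execute("SELECT customer, total FROM orders"):
    totals[customer] = totals.get(customer, 0) + total
imperative_result = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

# Declarative style: one query; grouping and ordering are the database's job.
declarative_result = conn.execute(
    "SELECT customer, SUM(total) AS spent FROM orders "
    "GROUP BY customer ORDER BY spent DESC, customer"
).fetchall()

print(imperative_result)
print(declarative_result)
```

Both versions produce the same ranking, but only the second leaves the engine free to choose how to compute it (using an index, for instance) without any change to the application.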


The ACID and concurrency aspects are definitely harder to deal with, but it also depends on what you need. I wonder if many people would find a nice perf increase by running a simpler, well-designed DB, written in a compiled language, that runs in a single process on a beefy modern computer. In any case, writing any multithreaded or multiprocess code is hard, and I doubt a multi-million LoC codebase makes it any easier.

> you are thinking of hard problems of your business logic, but you also have to think of hard problems common to db engines at the same time?

YES! Everyone is complaining these days about slow software on our beefy machines. I guess the core of my rant is that it feels like all of us programmers should start caring a lot more about data organization, code size, minimizing dependencies, data-oriented design and "mechanical sympathy". Advances in languages, tooling and accessibility of information should demystify the how-to of managing our own application data ourselves.


I sympathize with your last point! And I agree that great developers should understand how to build a sufficiently performant database for their app, even if they won't build one.

However, I think our applications are slow not because of database access, but because of one too many layers of indirection elsewhere: e.g., even ORMs usually introduce a huge performance and complexity cost.

Just like we are trying to come up with better and less error prone concurrency models in code (async/await, coroutines...), I get that you are trying to come up with better tooling support for data access, and we should.

But we also need to be aware that some people simply want to solve a problem more efficiently, not most efficiently (look at most ML code and you can barf at it, yet it still makes huge progress in the one area its authors care about).


> A usual objection to use anything other than an established relational DB is "creating a database is too difficult for the average programmer." How about debugging PostgreSQL issues, developing new storage engines for it

That's exactly what the OP's company is doing: they are building a storage engine for Postgres.


I doubt this initiative is gonna make Postgres easier to use, or any smaller in terms of dependencies, codebase complexity or resource usage.


Regarding resource usage, the benchmarks in the article show reduced IO usage. Are you doubting the validity of those benchmarks?


It will unlock new performance improvement scenarios.



