The topic of huge queries on tiny databases makes me think of this recent discus...

jiggawatts · on Nov 20, 2024

> SQLite

Well... there's your problem. SQLite is not a general-purpose RDBMS; it is marketed as a replacement for "fopen()", a purpose at which it excels.

A similar product is Microsoft's Extensible Storage Engine (ESE, also known as JET Blue), used in products such as Microsoft Exchange and Active Directory. Queries have to be more-or-less manually optimised by the developer, but they run faster and more consistently than they would with a general-purpose query engine designed for ad-hoc queries.

cerved · on Nov 21, 2024

I hate Jet with a vengeance

recursive · on Nov 20, 2024

It's not obviously true at all. Optimizing out `'' = 'x'` can be done for a fixed cost regardless of record count.

lovasoa · on Nov 20, 2024

Optimizing out static expressions can be done in linear time at best. So if the number of clauses in WHERE is huge and the size of the underlying table is tiny (such as in the examples shown in the article we are commenting on), it will be better not to run the optimization.

But of course, in normal life, outside of the world of people having fun with homomorphisms, queries are much smaller than databases.

recursive · on Nov 20, 2024

Parsing the expression in the first place is already linear time.

thaumasiotes · on Nov 21, 2024

True, but that doesn't mean doing additional work during the parse is free. Optimizing out static expressions will take additional time, and in general that additional time will be linear in the query size.
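To make the "linear in the query size" point concrete, here is a minimal sketch of a constant-folding pass over a toy WHERE-clause tree. The `Const`, `Column`, and `BinOp` node types and the `fold` function are hypothetical names made up for illustration; this is not how SQLite or any real engine represents expressions. The pass visits each node exactly once, so its cost grows with the number of clauses and is independent of the row count.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: object  # a literal; None stands in for SQL NULL


@dataclass(frozen=True)
class Column:
    name: str  # a column reference, unknown until a row is read


@dataclass(frozen=True)
class BinOp:
    op: str  # "=", "AND" or "OR" in this toy grammar
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Column, BinOp]


def is_const(expr: Expr, value: bool) -> bool:
    """True only for the exact boolean literal `value` (so 0 is not False)."""
    return isinstance(expr, Const) and expr.value is value


def fold(expr: Expr) -> Expr:
    """Bottom-up constant folding; each node is visited exactly once.

    Recursive for brevity, so a really huge query would need an
    explicit stack to stay under Python's recursion limit.
    """
    if not isinstance(expr, BinOp):
        return expr
    left, right = fold(expr.left), fold(expr.right)

    if expr.op == "=" and isinstance(left, Const) and isinstance(right, Const):
        # Leave NULL comparisons alone: NULL = NULL is NULL, not TRUE.
        if left.value is not None and right.value is not None:
            return Const(left.value == right.value)

    if expr.op == "AND":
        if is_const(left, False) or is_const(right, False):
            return Const(False)  # x AND FALSE is FALSE, even for NULL x
        if is_const(left, True):
            return right
        if is_const(right, True):
            return left

    if expr.op == "OR":
        if is_const(left, True) or is_const(right, True):
            return Const(True)  # x OR TRUE is TRUE, even for NULL x
        if is_const(left, False):
            return right
        if is_const(right, False):
            return left

    return BinOp(expr.op, left, right)


# ('' = 'x') OR (name = 'bob')  folds to  name = 'bob'
where = BinOp(
    "OR",
    BinOp("=", Const(""), Const("x")),
    BinOp("=", Column("name"), Const("bob")),
)
folded = fold(where)
```

Without the pass, the `'' = 'x'` comparison would be re-evaluated for every row scanned; with it, the extra work is paid once per query node.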

recursive · on Nov 21, 2024

My argument is that, on average, it will more than pay for itself.

The only losing case, if there are any measurable ones, is where you have long queries and short data. I'd call that a case of "doing it wrong". Wrong tool for the job.

hinkley · on Nov 20, 2024

Why would it be too expensive to optimize out static subexpressions?

jjice · on Nov 20, 2024

My guess is that the expense can be tricky to calculate, since the additional optimization prior to executing the query may take longer than simply running the query as written (depending on the dataset, of course). I wonder if it's too expensive to calculate a heuristic as well, so the engine just executes the query as-is.

Just a guess.
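For what it's worth, a heuristic like that could be very cheap under a simple cost model. The sketch below is a toy, not anything SQLite does: `worth_folding` and all of its parameters and default constants are assumptions chosen for illustration. It charges one unit of work per query node for folding, and assumes some fixed fraction of the nodes are static and would otherwise be re-evaluated once per row.

```python
def worth_folding(
    query_nodes: int,
    estimated_rows: int,
    static_fraction: float = 0.1,  # assumed share of nodes that are constant
    fold_cost: float = 1.0,        # assumed cost to visit one node while folding
    eval_cost: float = 1.0,        # assumed cost to evaluate one node for one row
) -> bool:
    """Return True if folding is expected to save more work than it costs."""
    fold_total = fold_cost * query_nodes
    saved = eval_cost * static_fraction * query_nodes * estimated_rows
    return saved > fold_total


# Huge query, tiny table: folding loses under these assumed constants.
huge_query_tiny_table = worth_folding(query_nodes=1_000_000, estimated_rows=3)

# Ordinary query, large table: folding wins easily.
small_query_big_table = worth_folding(query_nodes=20, estimated_rows=1_000_000)
```

Under these assumptions `query_nodes` appears on both sides of the comparison and cancels, so the decision reduces to checking the row estimate against a fixed threshold. A real engine's costs are unlikely to be this uniform, so treat the numbers as placeholders.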