
The topic of huge queries on tiny databases makes me think of this recent discussion on the SQLite forum: https://sqlite.org/forum/forumpost/0d18320369

Someone ran into an issue because SQLite failed to optimize the following query:

    select * from t where x = 'x' or '' = 'x'
Someone said that SQLite could not optimize out the "or '' = 'x'" because doing so would be too expensive. That is obviously true only for huge queries on tiny datasets.
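A quick way to see that the extra term is safe to drop is to run both forms against a throwaway table with Python's built-in `sqlite3` module. This is a minimal sketch: the three rows are made up, and it only shows that the two queries are equivalent, not whether a particular SQLite version actually eliminates the term (`EXPLAIN QUERY PLAN` is where to check that, and its output differs between versions).

```python
import sqlite3

# Throwaway in-memory database. Table "t" and column "x" come from the
# forum post; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [("x",), ("y",), ("z",)])

# The extra OR term compares two string literals, so it is false for every row.
constant_term = conn.execute("SELECT '' = 'x'").fetchone()[0]

# Hence the query returns the same rows with or without that term.
with_term = conn.execute("SELECT * FROM t WHERE x = 'x' OR '' = 'x'").fetchall()
without_term = conn.execute("SELECT * FROM t WHERE x = 'x'").fetchall()

print(constant_term)               # 0
print(with_term == without_term)   # True
```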


> SQLite

Well... there's your problem. SQLite is not a general-purpose RDBMS; it is marketed as a replacement for "fopen()", a purpose at which it excels.

A similar product is the Microsoft Jet database engine (specifically the Extensible Storage Engine, also known as "Jet Blue"), used in products such as Microsoft Exchange and Active Directory. Queries have to be more or less manually optimised by the developer, but they run faster and more consistently than they would on a general-purpose query engine designed for ad hoc queries.


I hate Jet with a vengeance


It's not obviously true at all. Optimizing out `'' = 'x'` can be done for a fixed cost regardless of record count.
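To illustrate the fixed-cost point, here is a toy constant-folding pass over a hypothetical expression tree. The classes `Col`, `Lit`, `Eq`, `Or` and the helpers `fold`, `evaluate`, `scan` are invented for this sketch and are not SQLite internals. Folding runs once, before the scan; afterwards each non-matching row costs one comparison instead of two.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical mini expression tree for a WHERE clause (not SQLite internals).
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: object


@dataclass(frozen=True)
class Eq:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Lit, Eq, Or]


def fold(e: Expr) -> Expr:
    """Simplify constant subexpressions once, before any row is read."""
    if isinstance(e, Eq):
        left, right = fold(e.left), fold(e.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value == right.value)  # e.g. '' = 'x' -> False
        return Eq(left, right)
    if isinstance(e, Or):
        left, right = fold(e.left), fold(e.right)
        for keep, other in ((left, right), (right, left)):
            if isinstance(other, Lit) and other.value is False:
                return keep  # a OR FALSE -> a
            if isinstance(other, Lit) and other.value is True:
                return Lit(True)  # a OR TRUE -> TRUE
        return Or(left, right)
    return e


def evaluate(e: Expr, row: dict, stats: dict) -> object:
    """Evaluate e against one row, counting equality comparisons."""
    if isinstance(e, Col):
        return row[e.name]
    if isinstance(e, Lit):
        return e.value
    if isinstance(e, Eq):
        stats["comparisons"] += 1
        return evaluate(e.left, row, stats) == evaluate(e.right, row, stats)
    # Or, evaluated with short-circuiting in this sketch
    return evaluate(e.left, row, stats) or evaluate(e.right, row, stats)


def scan(e: Expr, rows: list) -> tuple:
    """Full table scan: return matching rows and total comparisons made."""
    stats = {"comparisons": 0}
    matches = [row for row in rows if evaluate(e, row, stats)]
    return matches, stats["comparisons"]


where = Or(Eq(Col("x"), Lit("x")), Eq(Lit(""), Lit("x")))
rows = [{"x": "x"}, {"x": "y"}, {"x": "z"}]
print(fold(where))              # Eq(left=Col(name='x'), right=Lit(value='x'))
print(scan(where, rows))        # ([{'x': 'x'}], 5)
print(scan(fold(where), rows))  # ([{'x': 'x'}], 3)
```

The folding work depends only on the shape of the expression, while the saving is paid back on every row that fails the first comparison.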


Optimizing out static expressions takes, at best, time linear in the size of the query. So if the WHERE clause has a huge number of terms and the underlying table is tiny (as in the examples shown in the article we are commenting on), it is better not to run the optimization.
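The counterpoint can be sketched the same way: a folding pass has to inspect every term, so its cost grows with the size of the query regardless of the size of the table. The flat `(operand, operand)` term representation and `fold_disjunction` below are hypothetical, and literals are assumed to be non-NULL so that a literal-vs-literal equality is plainly true or false.

```python
from typing import List, Tuple, Union

# Hypothetical flat form of "term1 OR term2 OR ... OR termN": each term is an
# equality between two operands, and an operand is ("col", name) or
# ("lit", value). Literals are assumed to be non-NULL.
Operand = Tuple[str, object]
Term = Tuple[Operand, Operand]


def fold_disjunction(terms: List[Term]) -> Tuple[Union[List[Term], bool], int]:
    """Drop always-false terms from an OR chain in one pass.

    Returns (result, work): result is the list of surviving terms, or True if
    some term is always true; work counts the terms inspected.
    """
    kept: List[Term] = []
    work = 0
    for left, right in terms:
        work += 1
        if left[0] == "lit" and right[0] == "lit":
            if left[1] == right[1]:
                return True, work  # "... OR 'a' = 'a'": whole clause is true
            continue  # "... OR '' = 'x'": always false, drop it
        kept.append((left, right))
    return kept, work


# One real predicate padded with many constant-false terms.
n = 100_000
terms = [(("col", "x"), ("lit", "x"))] + [(("lit", ""), ("lit", "x"))] * n
kept, work = fold_disjunction(terms)
print(len(kept), work)  # 1 100001
```

With 100,000 padding terms the pass performs 100,001 steps even if the table holds only a handful of rows, which is the huge-query, tiny-table situation described above.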

But of course, in normal life, outside of the world of people having fun with Homomorphisms, queries are much smaller than databases.


Parsing the expression in the first place is already linear time.


True, but that doesn't mean doing additional work during the parse is free. Optimizing out static expressions will take additional time, and in general that additional time will be linear in the query size.


My argument is that, on average, it will more than pay for itself.

The only losing case, if there are any measurable ones, is where you have long queries and short data. I'd call that a case of "doing it wrong". Wrong tool for the job.
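The trade-off in this exchange can be written as a toy cost model: folding costs roughly one unit per query term, paid once, and each removed term saves at most one evaluation per row. The unit costs and the function `fold_might_pay_off` are assumptions for illustration, not how SQLite's planner reasons.

```python
def fold_might_pay_off(query_terms: int, removable_terms: int, rows: int,
                       fold_cost_per_term: float = 1.0,
                       eval_cost_per_term: float = 1.0) -> bool:
    """Toy break-even check (hypothetical unit costs, not SQLite's planner).

    Folding is one pass over the query; each removed term saves at most one
    evaluation per row, so max_savings is an upper bound. A False result means
    folding cannot pay off under this model; True means it might.
    """
    fold_cost = fold_cost_per_term * query_terms
    max_savings = eval_cost_per_term * removable_terms * rows
    return max_savings > fold_cost


# Typical case: short query, big table.
print(fold_might_pay_off(query_terms=2, removable_terms=1, rows=1_000_000))  # True
# "Long queries and short data": a huge query over a 3-row table.
print(fold_might_pay_off(query_terms=100_001, removable_terms=1, rows=3))    # False
```

Under this model the losing case needs a large query with few removable terms and few rows, and the check itself is constant time once the term counts are known.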


Why would it be too expensive to optimize out static subexpressions?


My guess is that the expense is tricky to estimate, since the extra optimization pass before executing the query may take longer than simply running the query (depending on the dataset, of course). I wonder whether even computing a heuristic is too expensive, so it just lets the query run.

Just a guess.



