The thing is, there's not much documentation on where each index type would be applicable, what their upsides and downsides are in particular situations.
For example, the hash index is (currently) not WAL-logged, which means it's not crash safe and not replicated. It also may have some concurrency issues, the resources are sparse on this one.
The BRIN index should be ideal for situations where there is a huge storage of data (TBs) inserted in sequential order and rarely queried - a BRIN index for such a thing would be only a few MB in size. I say should because when I tried it on 9.5.0, the query planner kept ignoring the index.
Both GIN and GIST indexes can be used for full text search (tsearch2), but GIN is faster for queries while GIST is smaller and faster for insertion.
The SP-GIST index is so underdocumented that, when I wanted to try it out in a project, I could not even understand how it works and does it work correctly as a radix tree for strings.
What's also somewhat underdocumented are the "operator classes" and which are and aren't implemented for different index types. I.e. a table of which index type can be used with which data type (and what exactly does such combination produce) would be great.
> The BRIN index should be ideal for situations where there is a huge storage of data (TBs) inserted in sequential order and rarely queried - a BRIN index for such a thing would be only a few MB in size. I say should because when I tried it on 9.5.0, the query planner kept ignoring the index.
A BRIN index only keeps summary information for a range of pages. Because of this, a BRIN can only be used for a bitmap scan and cannot be used for a normal index scan. This makes the set of queries where the planner will use the BRIN index really small. It is also possible, since it is such a common mistake, that you forgot to run ANALYZE and so the planner had no clue the data was in sequential order.
To all the upvoters, did you actually read the linked slides or are you just upvoting because it has "PostgreSQL" in the name[1]? Interesting to see this many votes with zero comments.
It is a great set of slides by the way. The btree piece would be old news to anyone familiar with database related data structures but the detail on GIN, GIST, and particularly BRIN make for a good read. Seeing the physical layout for each of those gives you a much better understanding of the cost of maintenance and what situations would they'd be useful.
Also, if you happened to follow the recent Postgres/Uber thread, this gives a lot more context as to how updates to a row would cascade to many writes for each index.
[1]: ... which isn't a bad idea anyway because PostgreSQL is awesome!
At least for me, I have two different voting style:
- Yeah, this is pretty cool. Upvote!
- I've got to save this for later when I can really read it
PostgreSQL stuff tends to be pretty reliably good, and usually with an educational bent. So I'll tend to use the upvote as a bookmark to go back to later.
I gotta say, I don't really like slides, but I've yet to be disappointed by a PostgreSQL post. And usually, a high vote/comment ratio is a nice indicator of quality.
You know, I actually never noticed that. Is that another cool whizbang feature on the site? Somewhere along the line I got it into my head that upvotes were just how you favorited things.
Well. Time to start indexing my upvotes... (later...)
From viewing my profile it appears that they are public. I've never actually used it, just noticed the feature a while back. I've been using Instapaper for ages for my "read later" links.
Let me tell you one open secret on HN. If you see high upvote article on the front page and it stays there for a somewhat longer time ,chances are it is of great quality stuff.Most folks here can relatively separate the wheat from the chaff.
Because there are many different implementations, with different trade-offs and design decisions. Also, the implementation-specific documentation is typically kept fairly close to the code, as the docs need to be kept in sync.
So for example if you need to know how indexes in PostgreSQL work, look into the READMEs in the proper directory:
If what you are looking is a general book (not necessarily just for Postgres) I can't recommend the "Use The Index, Luke"[1] tutorial, and the accompanying book "SQL Performance Explained" by Markus Winand, enough.
It doesn't go into a lot of relational theory, but it does provide a somewhat low-level description on how indices and database data are stored, and why you want to use different kind of indices in different scenarios.
It's useful (probably more useful for most of us than any kind of deep dive into internals, at least for a first read), but I don't remember it going that deep into implementation decisions.
For example, the hash index is (currently) not WAL-logged, which means it's not crash safe and not replicated. It also may have some concurrency issues, the resources are sparse on this one.
The BRIN index should be ideal for situations where there is a huge storage of data (TBs) inserted in sequential order and rarely queried - a BRIN index for such a thing would be only a few MB in size. I say should because when I tried it on 9.5.0, the query planner kept ignoring the index.
Both GIN and GIST indexes can be used for full text search (tsearch2), but GIN is faster for queries while GIST is smaller and faster for insertion.
The SP-GIST index is so underdocumented that, when I wanted to try it out in a project, I could not even understand how it works and does it work correctly as a radix tree for strings.
What's also somewhat underdocumented are the "operator classes" and which are and aren't implemented for different index types. I.e. a table of which index type can be used with which data type (and what exactly does such combination produce) would be great.