Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
PostgreSQL Index Internals (pgcon.org)
211 points by snaga on Aug 9, 2016 | hide | past | favorite | 21 comments


The thing is, there's not much documentation on where each index type would be applicable, what their upsides and downsides are in particular situations.

For example, the hash index is (currently) not WAL-logged, which means it's not crash safe and not replicated. It also may have some concurrency issues, the resources are sparse on this one.

The BRIN index should be ideal for situations where there is a huge storage of data (TBs) inserted in sequential order and rarely queried - a BRIN index for such a thing would be only a few MB in size. I say should because when I tried it on 9.5.0, the query planner kept ignoring the index.

Both GIN and GIST indexes can be used for full text search (tsearch2), but GIN is faster for queries while GIST is smaller and faster for insertion.

The SP-GIST index is so underdocumented that, when I wanted to try it out in a project, I could not even understand how it works and does it work correctly as a radix tree for strings.

What's also somewhat underdocumented are the "operator classes" and which are and aren't implemented for different index types. I.e. a table of which index type can be used with which data type (and what exactly does such combination produce) would be great.


> The BRIN index should be ideal for situations where there is a huge storage of data (TBs) inserted in sequential order and rarely queried - a BRIN index for such a thing would be only a few MB in size. I say should because when I tried it on 9.5.0, the query planner kept ignoring the index.

A BRIN index only keeps summary information for a range of pages. Because of this, a BRIN can only be used for a bitmap scan and cannot be used for a normal index scan. This makes the set of queries where the planner will use the BRIN index really small. It is also possible, since it is such a common mistake, that you forgot to run ANALYZE and so the planner had no clue the data was in sequential order.


Agreed. "Somewhat underdocumented" for, in places, some very large values of somewhat.


To all the upvoters, did you actually read the linked slides or are you just upvoting because it has "PostgreSQL" in the name[1]? Interesting to see this many votes with zero comments.

It is a great set of slides by the way. The btree piece would be old news to anyone familiar with database related data structures but the detail on GIN, GIST, and particularly BRIN make for a good read. Seeing the physical layout for each of those gives you a much better understanding of the cost of maintenance and what situations would they'd be useful.

Also, if you happened to follow the recent Postgres/Uber thread, this gives a lot more context as to how updates to a row would cascade to many writes for each index.

[1]: ... which isn't a bad idea anyway because PostgreSQL is awesome!


At least for me, I have two different voting style:

- Yeah, this is pretty cool. Upvote!

- I've got to save this for later when I can really read it

PostgreSQL stuff tends to be pretty reliably good, and usually with an educational bent. So I'll tend to use the upvote as a bookmark to go back to later.

I gotta say, I don't really like slides, but I've yet to be disappointed by a PostgreSQL post. And usually, a high vote/comment ratio is a nice indicator of quality.

EDIT: formatting


You might be interested in the "favorite" link under each submission title.


You know, I actually never noticed that. Is that another cool whizbang feature on the site? Somewhere along the line I got it into my head that upvotes were just how you favorited things.

Well. Time to start indexing my upvotes... (later...)


But I think the "favorites" are publicly view-able, isn't it? Is there any way to make them private?


From viewing my profile it appears that they are public. I've never actually used it, just noticed the feature a while back. I've been using Instapaper for ages for my "read later" links.


Let me tell you one open secret on HN. If you see high upvote article on the front page and it stays there for a somewhat longer time ,chances are it is of great quality stuff.Most folks here can relatively separate the wheat from the chaff.


Same question could be asked of anything "written in Go"


> Same question could be asked of anything "written in Go"

Haha. I 've noticed that as well.

It does beg the question, would a Postgres driver written in Go cause the HN front page to sink into a black hole of upvotes?


Only if it was written by PG...but last I heard he's a die hard LISP:er so we're probably safe..


> did you actually read the linked slides?

Valid question. Yes, I did.

I wish the final slide (which summarizes what each index type is good for) had been at the beginning also.


I watched the video, instead of the slides.


is there a book/documents about DB internals ? I have found some books about theory but not about an actual implementation.


Because there are many different implementations, with different trade-offs and design decisions. Also, the implementation-specific documentation is typically kept fairly close to the code, as the docs need to be kept in sync.

So for example if you need to know how indexes in PostgreSQL work, look into the READMEs in the proper directory:

b-tree indexes: https://github.com/postgres/postgres/blob/master/src/backend...

gin: https://github.com/postgres/postgres/blob/master/src/backend...

brin: https://github.com/postgres/postgres/blob/master/src/backend...

etc. The READMEs also include links to related papers etc.


hey thanks!, tho I agree with this, I was thinking, if there are books about kernel internals there may be some about db internals.

but Thanks for your links !


This is a good summary paper on the database internals and major architectural decisions: http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf


If what you are looking is a general book (not necessarily just for Postgres) I can't recommend the "Use The Index, Luke"[1] tutorial, and the accompanying book "SQL Performance Explained" by Markus Winand, enough.

It doesn't go into a lot of relational theory, but it does provide a somewhat low-level description on how indices and database data are stored, and why you want to use different kind of indices in different scenarios.

[1] http://use-the-index-luke.com


It's useful (probably more useful for most of us than any kind of deep dive into internals, at least for a first read), but I don't remember it going that deep into implementation decisions.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: