
The most interesting thing about zstandard is how easy it makes training custom compression dictionaries, which can provide massive improvements on certain types of data.

I'd like to see how well a custom dictionary trained on a few hundred npm packages would perform on arbitrary other npm packages. My hunch is that there are a lot of patterns - both in the JavaScript code and in the JSON and README conventions used in those packages - that could help achieve much better compression.



We had billions of Protobufs to store in Cassandra as byte blobs; using a zstd dictionary dramatically reduced storage size and improved latency over the built-in compression. The complexity overhead of managing these dictionaries and making sure the client always has access to the right dictionary for decompression was non-trivial but well worth it.

We looked at Brotli as well, but decompression speed at an acceptable ratio was the most important factor for us; that, plus zstd's far superior docs and evangelism, sealed the deal.


What kind of additional gains (% wise) did you see with custom dictionaries compared to vanilla zstd?


Depends on how much shared redundancy your data has. You could test this by compressing all your content into one stream (which acts like a shared dictionary) versus compressing it into separate streams (no shared dictionary) and comparing total sizes.


We think similarly. :) I posted the same idea while you were typing yours: https://news.ycombinator.com/item?id=37755005



