SQL is better than 99% of the nosql alternatives out there. But one thing it fal...

3dbrows · on May 3, 2020

Have you seen TimescaleDB? https://www.timescale.com/

snidane · on May 3, 2020

I've seen it.

Most of these 'time series databases' are for processing of structured logs and metrics to be plugged into you favorite system for monitoring site latency.

Asof join is still an open issue in their issue tracker so it is not usable as a time series database.

https://github.com/timescale/timescaledb/issues/271

claytonjy · on May 3, 2020

If AS OF is a dealbreaker for you, so be it, but I think TimescaleDB is trying hard to be more than a "simple timeseries" aggregator like so many competitors. We're using it to store hundreds of millions of complex IIOT data packets, where a fully normalized packet adds dozens of rows across ~4 tables.

Having it be mostly plain-ol-postgres has been huge for designing and evolving the data model in a way that I'd hate to do without "real" SQL.

snidane · on May 3, 2020

I'm sure your product is great for many customers in their domains.

The asof and other time based joins are a necessity for just slightly more complicated data processing than just a trivial aggregation at the tail of the stream which is majority of use cases in IOT and real time monitoring.

Without that I can pipe my data to /dev/null giving me even better write performance while also not being able to solve this common financial industry use case.

claytonjy · on May 4, 2020

Sorry, didn't mean to imply I work there, I don't; my employer uses Timescale and I designed our schema.

Timescale has been quite performant for us fetching arbitrary past time windows, but I'll stop the cheerleading there as I think mfreed covered the details far better.

I'd certainly still like to see AS OF support!

mfreed · on May 3, 2020

(Timescale person here)

Just an observation that the parent post was the unordered "set" nature of tables, and not strictly about ASOF joins.

I should point out there is an alternative reason for this in TimescaleDB (and many other databases): to support loose-time-order writes (or even data backfill), where records don't always arrive in perfect timestamp order.

So some of these engineering decisions are to balance tradeoffs across insert vs. query rates.

In TimescaleDB, for example, we write data to time intervals in arrival order, not timestamp order, in order to achieve higher writes rates. On the other hand, we support the automated asynchronous rewriting of data to reorder data after it reaches a certain age, according to arbitrary columns.

This can be used to reorder data precisely in timestamp order. It's also highly useful and commonly used for composite reorderings, such that you can then reorder on (some_id, timestamp DESC), such that, within a time interval (chunk in TimescaleDB), data belonging to each some_id is collocated on disk, and within each id, it's sorted on disk again by timestamp. (Here, some_id might be a unique hostname, device_id, metric_id, stock ticker symbol, etc.)

These allow you to take much faster queries for SELECT * from table WHERE some_id = foo and time > X and time < Y.

We also take a similar approach when performing asynchronous columnar-style compression, where we "segment" together many records belonging to the same id into a single internal row, then order those records in array style within each row, sorted by some column (typically timestamp) before applying a type-specific compression algorithm to then. Again, all metrics then belonging to some_id are collocated, ordered by timestamp as an array on out-of-line pages so that only the requested columns need to be fetched from disk. And this gets hidden behind a SQL view that makes it look like you are still interacting with a standard row-based table.

Anyway, totally appreciate that we haven't prioritized ASOF JOINs, as our experience is that these typically come up in more specific financial use cases, rather than the vast majority of time-series use cases.

And we also find that our "gapfill/LOCF" functionality can be used to bucket-aggregate data with different ids for related behavior. That is, it's a data modeling choice to store different ticker symbols in different tables (and then ASOF JOIN), and not necessarily the only way to achieve the desired goals.

https://docs.timescale.com/latest/api#add_reorder_policy https://docs.timescale.com/latest/using-timescaledb/compress... https://docs.timescale.com/latest/api#time_bucket_gapfill

And thanks clatonjy for your use!