1. On ClickBench, make sure you're doing an apples-to-apples comparison by comparing scores from the same instance type. We used c6a.4xlarge, the most commonly used instance. While a few databases like DuckDB rank higher, the performance of DataFusion (our underlying query engine) is constantly improving, and pg_analytics inherits those improvements.
Then again, people only care about performance and benchmarks up to a certain threshold. The goal of pg_analytics is not to displace something like StarRocks, but to enable analytical workloads that require both row- and column-oriented data or Postgres transactions.
2. We're working on TPC-H benchmarks. They're good for demonstrating JOIN performance and we'll have them published early next week.
Most of the time, all that matters in terms of performance is the user's tolerance. Once that threshold is met, operational complexity becomes a lot more important. We use raw Postgres for analytics, knowing that projects like these and cloud offerings like AlloyDB will make our lives easier (in terms of performance) as time goes on.
pg_bm25 looks awesome too! Next up, take FDW to the level of Trino/Drill, and we won't need anything other than Postgres and its extensions!