R2 SQL - R2 SQL now supports approximate aggregation functions

R2 SQL now supports five approximate aggregation functions for fast analysis of large datasets. These functions trade minor precision for improved performance on high-cardinality data. New functions APPROX_PERCENTILE_CONT(column, percentile) — Returns the approximate value at a given percentile (0.0 to 1.0). Works on integer and decimal columns. APPROX_PERCENTILE_CONT_WITH_WEIGHT(column, weight, percentile) — Weighted percentile calculation where each row contributes proportionally to its weight column value. APPROX_MEDIAN(column) — Returns the approximate median. Equivalent to APPROX_PERCENTILE_CONT(column, 0.5) . APPROX_DISTINCT(column) — Returns the approximate number of distinct values. Works on any column type. APPROX_TOP_K(column, k) — Returns the k most frequent values with their counts as a JSON array. All functions support WHERE filters. All except APPROX_TOP_K support GROUP BY . Examples -- Percentile analysis on revenue data SELECT approx_percentile_cont (total_amount, 0 . 25 ), approx_percentile_cont (total_amount, 0 . 5 ), approx_percentile_cont (total_amount, 0 . 75 ) FROM my_namespace.sales_data -- Median per department SELECT department, approx_median(total_amount) FROM my_namespace.sales_data GROUP BY department -- Approximate distinct customers by region SELECT region, approx_distinct(customer_id) FROM my_namespace.sales_data GROUP BY region -- Top 5 most frequent departments SELECT approx_top_k(department, 5 ) FROM my_namespace.sales_data -- Combine approximate and standard aggregations SELECT COUNT ( * ), AVG (total_amount), approx_percentile_cont (total_amount, 0 . 5 ), approx_distinct(customer_id) FROM my_namespace.sales_data WHERE region = 'North' For the full syntax and additional examples, refer to the SQL reference .

R2 SQL - R2 SQL now supports approximate aggregation functions

Source

You might also wanna read

Version 18.2

Technical Discussion: Distributed SQL Engine Requirements for Ultra-Wide Tables in ML and Multi-Omics Data

How rqlite Addresses the CAP Theorem in Distributed Databases

ClickHouse AggregatingMergeTree: Accelerating Analytics with Pre-Aggregated Data

Reevaluating Sixth Normal Form (6NF) in Relational Database Modeling

SQLite Query Optimization: Using Uncorrelated Subqueries to Skip Correlated Subqueries for Performance Gains

Comments