Kore: A New High-Performance Columnar File Format for Big Data Analytics
By arunkatherashala
Summary
Kore is a new high-performance binary file format for analytical workloads. Its authors claim a 38% compression ratio (vs 63% for Parquet), a 131x query speedup from column pruning and predicate pushdown, zero data loss in verification tests covering 400K+ cells, and native Spark integration. The project is hosted on GitHub under the name "Kore — Killer Optimized Record Exchange" and is currently at version 0.1.0, with a Rust library available for reading and writing data.
Key quotes
KORE is a high-performance binary file format optimized for analytical workloads.
38% compression ratio (vs 63% for Parquet)
131x query speedup with column pruning & predicate pushdown
Zero data loss verification (400K+ cells tested)
Native Spark integration — read/write with PySpark
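For readers unfamiliar with the two techniques behind the quoted speedup, here is a minimal sketch of column pruning and predicate pushdown on a toy in-memory columnar layout. It is not Kore's API or on-disk format: `RowGroup`, `make_row_group`, and `scan` are hypothetical names invented for illustration, and the per-group min/max statistics are an assumption borrowed from how columnar formats commonly work.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical row group: values stored per column, plus min/max stats."""
    columns: dict  # column name -> list of values
    stats: dict    # column name -> (min, max)


def make_row_group(columns):
    # Compute per-column min/max once, at write time.
    stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
    return RowGroup(columns, stats)


def scan(row_groups, select, filter_col, lo, hi):
    """Return rows holding only the `select` columns where lo <= filter_col <= hi."""
    out = []
    groups_read = 0
    for rg in row_groups:
        gmin, gmax = rg.stats[filter_col]
        # Predicate pushdown: skip the whole group if its stats rule out a match.
        if gmax < lo or gmin > hi:
            continue
        groups_read += 1
        keep = [i for i, v in enumerate(rg.columns[filter_col]) if lo <= v <= hi]
        # Column pruning: only the requested columns are ever touched.
        out.extend(tuple(rg.columns[c][i] for c in select) for i in keep)
    return out, groups_read


groups = [
    make_row_group({"id": [1, 2, 3], "price": [10, 12, 15], "note": ["a", "b", "c"]}),
    make_row_group({"id": [4, 5, 6], "price": [100, 120, 150], "note": ["d", "e", "f"]}),
]
rows, groups_read = scan(groups, select=["id"], filter_col="price", lo=100, hi=130)
print(rows, groups_read)
```

In this toy run the first row group is skipped entirely because its maximum price is below the filter range, and the `note` column is never read. The savings in a real file format would depend on data layout and statistics granularity.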
