
hist-rs: High-Performance Command-Line Tool for Counting Unique Lines with 25x Speed Improvement

By noamteyssier

7mo ago · 3 min read · en · Code

Summary

hist-rs is a high-throughput command-line tool for counting unique lines in text input. It is a standalone tool with functionality equivalent to the Unix pipeline `sort | uniq -c | sort -n`, and its author reports a 25x speedup over `sort | uniq -c`. It can also deduplicate an input stream, printing only unique lines. The article covers installation, usage examples, and benchmarks of several tools, run with hyperfine on a 1M-line FASTQ file generated with nucgen.
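To make the comparison with `sort | uniq -c | sort -n` concrete, here is a minimal sketch of hash-based line counting in Rust. It is not taken from hist-rs; the function name `count_unique_lines` and the tie-breaking rule are assumptions made for illustration. The idea is that a hash map tallies each line in one pass, so only the distinct lines need sorting at the end rather than the whole input.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Hypothetical helper (not the hist-rs API): tally every distinct line,
/// then order the tallies by ascending count, like the final `sort -n`.
/// Ties are broken by line content so the output is deterministic.
fn count_unique_lines<R: BufRead>(reader: R) -> io::Result<Vec<(String, usize)>> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for line in reader.lines() {
        // One hash lookup per input line; no sort of the full input.
        *counts.entry(line?).or_insert(0) += 1;
    }
    // Only the distinct lines are sorted.
    let mut rows: Vec<(String, usize)> = counts.into_iter().collect();
    rows.sort_by(|a, b| a.1.cmp(&b.1).then_with(|| a.0.cmp(&b.0)));
    Ok(rows)
}
```

Calling `count_unique_lines(std::io::stdin().lock())` would apply the same tally to standard input. Whether hist-rs uses this exact approach is not stated in the summary above.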

Key quotes

A high-throughput CLI to count unique lines.
This is a standalone tool with equivalent functionality to sort | uniq -c | sort -n.
There is also support for deduplicating an input stream (i.e. only printing unique lines).
I am measuring the performance of equivalent sort <file | uniq -c | sort -n functionality.
An efficient unique-line counter (25x over `sort | uniq -c`).
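The deduplication mode mentioned in the quotes can be sketched as a streaming filter. This is an illustration only, assuming that "only printing unique lines" means emitting each distinct line once, in first-seen order; the function name `dedup_stream` is hypothetical and not part of hist-rs.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead, Write};

/// Hypothetical helper (not the hist-rs API): copy each distinct line to
/// `writer` the first time it appears and skip every later repeat.
fn dedup_stream<R: BufRead, W: Write>(reader: R, mut writer: W) -> io::Result<()> {
    let mut seen: HashSet<String> = HashSet::new();
    for line in reader.lines() {
        let line = line?;
        if !seen.contains(&line) {
            writeln!(writer, "{}", line)?;
            // Remember the line so later duplicates are dropped.
            seen.insert(line);
        }
    }
    Ok(())
}
```

Unlike `sort | uniq`, a filter of this shape preserves input order and can start emitting output before the input ends, at the cost of holding every distinct line in memory.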
Snippet from the RSS feed
An efficient unique-line counter (25x over `sort | uniq -c`) - noamteyssier/hist-rs
