An Algorithm for Computing Optimal Tokenizers in Practice

By mcyc

9h ago · 11 min read · en · Insight

Summary

This article presents an algorithm that was able to compute an optimal tokenizer in some settings, even though optimal tokenization is theoretically intractable. The author compares this to results on the Traveling Salesman Problem (TSP), where even difficult instances can be solved optimally using cutting-plane techniques. The broader point is that a problem that is hard in theory can still be solvable in practice.
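The summary does not spell out the author's algorithm. To make "optimal tokenizer" concrete, here is a minimal brute-force sketch under one common formalization, which is an assumption here: every single character is always in the vocabulary, we may add `k` extra multi-character tokens, each word is segmented into the fewest possible tokens, and the goal is to minimize the total token count over a word-frequency corpus. This is not the author's cutting-plane method. It enumerates every candidate vocabulary, which is exactly the exponential search that becomes intractable at realistic sizes. The function names `min_tokens`, `corpus_cost`, and `optimal_vocab` are hypothetical.

```python
from itertools import combinations


def min_tokens(word: str, vocab: set[str]) -> int:
    """Fewest tokens needed to segment `word` with `vocab` (dynamic programming).

    Assumes every single character of `word` is in `vocab`, so a segmentation exists.
    """
    n = len(word)
    best = [0] + [n + 1] * n  # best[i] = fewest tokens covering word[:i]; n + 1 acts as "unreachable"
    for i in range(1, n + 1):
        for j in range(i):
            if word[j:i] in vocab and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
    return best[n]


def corpus_cost(corpus: dict[str, int], vocab: set[str]) -> int:
    """Total tokens needed to encode every word occurrence in the corpus."""
    return sum(count * min_tokens(word, vocab) for word, count in corpus.items())


def optimal_vocab(corpus: dict[str, int], k: int) -> tuple[int, set[str]]:
    """Exhaustively choose k extra tokens (substrings of length >= 2) minimizing corpus_cost."""
    alphabet = {ch for word in corpus for ch in word}
    candidates = sorted(
        {w[i:j] for w in corpus for i in range(len(w)) for j in range(i + 2, len(w) + 1)}
    )
    best_cost, best_extra = corpus_cost(corpus, alphabet), ()
    # Exponential in k: this enumeration is what makes exact search hopeless at scale.
    for extra in combinations(candidates, min(k, len(candidates))):
        cost = corpus_cost(corpus, alphabet | set(extra))
        if cost < best_cost:
            best_cost, best_extra = cost, extra
    return best_cost, set(best_extra)


print(optimal_vocab({"abab": 3, "ab": 1}, 1))  # (5, {'abab'})
```

On this toy corpus the most frequent pair is `ab`, yet the optimal single addition is `abab` (cost 5 versus 7 for `ab`), a small example of why greedy merging need not be optimal and why exact methods are interesting.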

Key quotes

In this post, I will present an algorithm that was able to compute an optimal tokenizer in some settings.
This result is cool because optimal tokenization is theoretically intractable, but seems to be solvable in practice.
My finding is very similar to various results on the Traveling Salesman Problem (TSP), where even difficult instances can be solved optimally using cutting-plane techniques.
