New Generation LLMs Show Improved Character-Level Text Manipulation Capabilities
By curioussquirrel
Summary
The article discusses how the latest generation of large language models (LLMs), such as GPT-5 and Claude 4.5, shows significant improvements on character-level text manipulation tasks. These include counting characters, manipulating characters within sentences, and decoding encodings and ciphers. This is a notable advance over previous LLM generations, which struggled with such fine-grained operations because of tokenization: text is encoded as tokens that represent clusters of characters or whole words rather than individual characters.
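The article does not reproduce its test prompts, so the following is only a hypothetical illustration of the three task families it names. The Python sketch computes reference answers deterministically. The function names (`count_char`, `reverse_each_word`, `rot13`, `to_base64`) are made up for this example, and ROT13 and Base64 are assumed stand-ins for "encodings and ciphers", not necessarily the ones the author tested.

```python
import base64
import codecs


def count_char(text: str, ch: str) -> int:
    """Character counting: how many times does `ch` occur in `text`?"""
    return text.count(ch)


def reverse_each_word(sentence: str) -> str:
    """Character manipulation in a sentence: reverse the letters of every word."""
    return " ".join(word[::-1] for word in sentence.split(" "))


def rot13(text: str) -> str:
    """A simple cipher (assumed example): rotate each Latin letter by 13 places."""
    return codecs.encode(text, "rot_13")


def to_base64(text: str) -> str:
    """An encoding task (assumed example): Base64-encode the UTF-8 bytes of `text`."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(count_char("strawberry", "r"))     # 3
    print(reverse_each_word("hello world"))  # olleh dlrow
    print(rot13("Hello"))                    # Uryyb
    print(to_base64("Hi"))                   # SGk=
```

For a program these answers are trivial because it operates on individual characters. The difficulty for an LLM is that it never sees those characters directly, only tokens.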
Key quotes
Surprisingly, the newest models were able to solve these kinds of tasks, unlike previous generations of LLMs.
LLMs handle individual characters poorly. This is due to all text being encoded as tokens via the LLM tokenizer and its vocabulary.
Individual tokens typically represent clusters of characters, sometimes even full words (especially in English and other common languages in the training dataset).
This makes any considerations on a more granular level than tokens fairly difficult, although LLMs have been capable of certain simple tasks (such as spelling out individual characters in a word) for a while.
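To make the tokenization point concrete, here is a minimal Python sketch under clearly stated assumptions. The vocabulary and its IDs are invented, and segmentation uses greedy longest-match rather than the learned merge rules of real byte-pair-encoding tokenizers. It shows that "strawberry" reaches the model as two opaque IDs, and that counting its r's requires knowing how each token is spelled.

```python
# Hypothetical toy vocabulary: real tokenizers learn tens of thousands of
# entries; these pieces and IDs are invented purely for illustration.
TOY_VOCAB = {
    "straw": 101, "berry": 102, "ber": 103, "ry": 104,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}
ID_TO_PIECE = {token_id: piece for piece, token_id in TOY_VOCAB.items()}


def tokenize(text: str) -> list[int]:
    """Greedy longest-match segmentation (a simplification, not real BPE)."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest vocabulary entry that matches at this position.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos]!r}")
    return ids


def count_char_from_ids(ids: list[int], ch: str) -> int:
    """Counting a letter from IDs only works if each token's spelling is known."""
    return sum(ID_TO_PIECE[token_id].count(ch) for token_id in ids)


if __name__ == "__main__":
    ids = tokenize("strawberry")
    print(ids)                            # [101, 102]
    print([ID_TO_PIECE[i] for i in ids])  # ['straw', 'berry']
    print(count_char_from_ids(ids, "r"))  # 3
```

The model itself only receives a sequence like `[101, 102]`; the lookup table `ID_TO_PIECE` is something it would have to have internalized during training. Real tokenizers may also split the same word differently depending on a leading space or capitalization, which is one more reason character-level answers cannot simply be read off the token sequence.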