
The Invalid Surrogate Pair Bug: A Software Engineer's Tale of Emoji and Encoding

By meysamazad · 15d ago · 9 min read

Summary

A software engineer recounts their favorite bug story involving invalid surrogate pairs in Unicode/emoji handling. While migrating a legacy editor to a real-time collaborative experience using TipTap, ProseMirror, and Yjs CRDT, they encountered a bug where two emoji characters would cause the editor to break when entered together. The article explores the technical underpinnings of surrogate pairs in UTF-16 encoding, how emoji are represented, and the fascinating edge case that led to data loss. An interactive tool is provided for readers to explore the concepts.
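As background for the summary above, here is a minimal TypeScript sketch of how UTF-16 represents a code point above U+FFFF as a surrogate pair, and how a string can be checked for a surrogate that has lost its partner. It is not the article's code: the function names `toSurrogatePair` and `hasLoneSurrogate` are hypothetical, and the summary does not say where in the TipTap, ProseMirror, or Yjs stack the invalid pair was produced.

```typescript
// Hypothetical helper: split a supplementary-plane code point
// (U+10000..U+10FFFF) into its UTF-16 high and low surrogates.
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("code point is not in a supplementary plane");
  }
  const offset = codePoint - 0x10000;     // 20 significant bits remain
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

// Hypothetical helper: report whether a string contains a surrogate
// code unit that is not part of a valid high+low pair.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low surrogate
        continue;
      }
      return true; // high surrogate with no low surrogate after it
    }
    if (isLow) {
      return true; // low surrogate with no high surrogate before it
    }
  }
  return false;
}

// U+1F600 (grinning face) is stored as two UTF-16 code units.
const [high, low] = toSurrogatePair(0x1f600);
const emoji = String.fromCharCode(high, low);

// Cutting the string between the two code units leaves a lone surrogate.
const broken = emoji.slice(0, 1);
```

Because JavaScript string lengths and indices count UTF-16 code units, `emoji.length` is 2, and any operation that splits at index 1 yields the kind of invalid string `hasLoneSurrogate` flags.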

Key quotes
If you're in the business of building things that run on computers long enough, I think you will eventually acquire a favorite bug story.
The bug: two emoji enter, none leave.
TipTap on top (itself a wrapper around ProseMirror), Yjs underneath handling the CRDT magic for real-time syncing. It worked well! Mostly.
Snippet from the RSS feed
In which I revisit one of my favorite bugs, the invalid surrogate pair.
