DeepTagger: A No-Code Platform for Extracting Structured Data from Documents Using Interactive Labeling
By Talshyn Nova
Summary
DeepTagger is a no-code platform that grew out of a PhD project: extracting structured data from the Enron Email dataset. The task was to split long email chains into individual emails, and the creators found that custom parsers, RegEx, and traditional ML tools such as spaCy Prodigy and Label Studio could not handle it. They built their own annotation tool instead. It treats a user's annotations as examples and uses them to extract the same kind of information from new documents, which makes human judgment scalable. The platform is accessible through an API.
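To illustrate why a RegEx approach to splitting email chains tends to be brittle, here is a minimal Python sketch. It is not the creators' code and does not use DeepTagger; the separator pattern, the `split_chain` helper, and the sample chain are all assumptions made up for this example. The pattern handles a plain "Original Message" separator but misses the same separator once a mail client has quoted it with a `> ` prefix.

```python
import re

# Hypothetical separator pattern (an assumption for illustration).
# Real email threads use many more variants than these two.
SEPARATOR = re.compile(
    r"^[ \t]*-{2,}[ \t]*(?:Original Message|Forwarded by[^\n]*?)[ \t]*-{2,}[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def split_chain(text: str) -> list[str]:
    """Split an email chain on known separator lines and drop empty pieces."""
    parts = SEPARATOR.split(text)
    return [p.strip() for p in parts if p.strip()]


# Made-up chain containing three messages. The third one is quoted,
# so its separator line starts with "> " and the pattern does not match it.
chain = """Sounds good, see you Monday.

-----Original Message-----
From: A
Can we meet Monday?

> -----Original Message-----
> From: B
> Are you free next week?
"""

emails = split_chain(chain)
print(len(emails))  # number of pieces the pattern actually found
```

The chain holds three messages, but the split finds only two pieces because the quoted separator goes undetected. Each new quoting or forwarding style needs another pattern, which is the kind of maintenance burden that example-driven extraction aims to avoid.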
Key quotes
This product was born out of real-life problems.
Custom Python parsers failed. RegEx broke.
Traditional ML tools, such as spaCy Prodigy, or Label Studio, couldn't handle the complexity.
Doing it manually would have meant admitting defeat.
So we built our own annotation tool that could handle nested
