How Mindbox replaced PySpark with YAML-based pipelines using dlt, dbt, and Trino
By Kiril Kazlou
3d ago · 11 min read
Summary
Data engineer Kiril Kazlou describes how Mindbox replaced PySpark-based data pipelines with a stack built on dlt, dbt, and Trino, configured through just four YAML files. The change lets analysts with no Python experience build and deploy a data pipeline in a single day, down from the three weeks it took when every pipeline required a developer. The article details the technical architecture, the reasoning behind each tool choice, and the organizational impact of letting non-engineers own data workflows.
Key quotes
It took us three weeks to ship a single data pipeline. Today, an analyst with zero Python experience does it in a day.
You can't really work with PySpark without Python experience. Every new pipeline required a developer. And that meant waiting — sometimes for weeks.
We replaced Python pipelines with dlt, dbt, and Trino — and cut delivery time from weeks to one day.
How we replaced Python pipelines with dlt, dbt, and Trino — and cut delivery time from weeks to one day.
