Efficiently mounting tar.gz archives in WebAssembly using file index metadata

By datajeroen

1mo ago · 4 min read · en

Summary

This article describes a technique for mounting tar.gz archives directly into the Emscripten virtual filesystem without extracting them. Instead of downloading, decompressing, and extracting files from a tarball, the author proposes generating a small index file that lists the size and offset of each file in the tar archive. This metadata allows the tar blob to be mounted directly via Emscripten's WORKERFS, eliminating the need for copying files. The approach is particularly useful for WebAssembly applications that need to access data from tarballs efficiently.
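As a rough illustration of the indexing idea, the sketch below uses Python's standard `tarfile` module to record the data offset and size of each regular file in an uncompressed tar blob, then reads one file back by slicing the blob instead of extracting it. The JSON layout (`files`, `filename`, `offset`, `size`) and the helper names `build_tar_index`, `read_member`, and `make_demo_tar` are assumptions made for this example; the article's actual index format and tooling may differ. The offsets refer to the decompressed tar, so a .tar.gz would still need to be gunzipped before they apply.

```python
import io
import json
import tarfile


def build_tar_index(tar_bytes: bytes) -> dict:
    """Build a hypothetical index: one entry per regular file in the tar."""
    files = []
    # mode "r:" opens an uncompressed tar only; offsets are byte positions in it.
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                files.append({
                    "filename": member.name,
                    "offset": member.offset_data,  # where the file's data starts
                    "size": member.size,           # length of the data in bytes
                })
    return {"files": files}


def read_member(tar_bytes: bytes, entry: dict) -> bytes:
    """Fetch a file's contents by slicing the tar blob, without extraction."""
    start = entry["offset"]
    return tar_bytes[start:start + entry["size"]]


def make_demo_tar() -> bytes:
    """Create a tiny in-memory tar so the example is self-contained."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in [("data/a.txt", b"hello"), ("data/b.txt", b"tarball")]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    blob = make_demo_tar()
    index = build_tar_index(blob)
    print(json.dumps(index, indent=2))
    print(read_member(blob, index["files"][0]))
```

Because each entry is just a byte range, the same metadata could in principle back a virtual filesystem that serves slices of the blob on demand, which is the role the summary assigns to Emscripten's WORKERFS.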

Key quotes

Lots of data on the internet lives in tarballs, often distributed as gzipped .tar.gz files.
To get to this data, we have to download the entire .tar.gz file, decompress it, and then iterate through the blob from beginning to end to make copies of the files we need. This is expensive.
Instead of extracting a .tar.gz archive, we can generate a small index file which lists the size and offset of each file in the tar, and use this metadata to mount the tar blob directly via Emscripten's WORKERFS without any copying.
Snippet from the RSS feed
How to use a file index to mount a tar.gz archive directly into the Emscripten virtual filesystem without extracting it
