Methodology
How we build and maintain the archive
Data sourcing
All files in this archive originate from the Department of Justice Epstein Files Portal. We download files directly from the DOJ's official data set pages for each of the twelve data sets. No files from unofficial or third-party sources are included.
File conversion
The DOJ releases files primarily as PDF documents. We convert each PDF page into an optimized WebP image for faster loading and consistent display across devices. The conversion process:
- Rasterizes each PDF page at a configurable DPI (default: 150) for readable detail
- Encodes each page as WebP at a quality setting chosen to balance clarity and file size
- Preserves multi-page documents as linked page sequences, so you can navigate through an entire document
- Processes image files (JPEG, PNG, GIF) by converting them to WebP without altering content
No content is added, removed, or altered during conversion. Apart from the minor rendering differences described under Accuracy and limitations, the visual content of each page matches the DOJ original.
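To make the DPI setting concrete, here is a minimal sketch of the arithmetic behind rasterization and of how a multi-page document might be laid out as a page sequence. It is illustrative only: the function names and the `page-0001.webp` naming scheme are assumptions, not the archive's actual code, and a real rasterizer may round dimensions slightly differently.

```python
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points (1/72 inch)
DEFAULT_DPI = 150     # the default stated in the conversion steps above


def raster_size(width_pt: float, height_pt: float, dpi: int = DEFAULT_DPI) -> tuple[int, int]:
    """Return the pixel dimensions of a PDF page rasterized at the given DPI."""
    # Multiply before dividing so common page sizes come out exact.
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return (width_px, height_px)


def page_keys(file_key: str, page_count: int) -> list[str]:
    """Build one image key per page (hypothetical naming scheme).

    Zero-padded page numbers keep the sequence in order when sorted as text.
    """
    return [f"{file_key}/page-{n:04d}.webp" for n in range(1, page_count + 1)]


if __name__ == "__main__":
    # A US Letter page is 612 x 792 points.
    print(raster_size(612, 792))        # (1275, 1650)
    print(page_keys("EXAMPLE-DOC", 3))  # placeholder file key
```

At the default of 150 DPI a US Letter page becomes a 1275 x 1650 pixel image. Raising the DPI increases legibility of small print at the cost of larger files.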
Indexing and metadata
After conversion, each file is uploaded to cloud storage and indexed. The index records:
- File key (derived from the original DOJ file name)
- File size
- Upload timestamp
- Data set identifier
- Source URL (link back to the DOJ data set page)
- Page count and page sequence for multi-page PDF documents
The index is cached and updated when new files are ingested. Search operates against file keys, allowing lookup by document identifiers and naming patterns.
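The index fields and key-based search described above can be sketched as follows. This is an assumption-laden illustration rather than the production code: the `IndexEntry` class, its field names, the `search` function, and the treatment of `*` and `?` as wildcards are all hypothetical, and the sample entries are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class IndexEntry:
    file_key: str          # derived from the original DOJ file name
    size_bytes: int
    uploaded_at: datetime
    data_set: int          # data set identifier
    source_url: str        # link back to the DOJ data set page
    page_count: int = 1    # greater than 1 for multi-page PDFs


def search(index: list[IndexEntry], query: str) -> list[IndexEntry]:
    """Match against file keys only; document text is never searched.

    Queries containing * or ? are treated as glob patterns (an assumption);
    anything else is a case-insensitive substring match.
    """
    q = query.strip().lower()
    if not q:
        return []
    if any(ch in q for ch in "*?"):
        return [e for e in index if fnmatchcase(e.file_key.lower(), q)]
    return [e for e in index if q in e.file_key.lower()]


if __name__ == "__main__":
    now = datetime(2025, 1, 1, tzinfo=timezone.utc)
    index = [
        IndexEntry("SAMPLE-000001.pdf", 2048, now, 1, "https://example.invalid/ds1", 3),
        IndexEntry("SAMPLE-000002.pdf", 4096, now, 2, "https://example.invalid/ds2"),
    ]
    print([e.file_key for e in search(index, "000002")])
```

Because matching runs only over `file_key`, a query for a name or phrase that appears inside a document returns nothing unless it also appears in the file name.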
Hosting and delivery
Images are stored on Cloudflare R2 and served via Cloudflare's global CDN. The web application runs on Cloudflare Workers. Because images are immutable once uploaded, they can be cached aggressively, which keeps access fast worldwide.
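As a rough illustration of what aggressive caching for immutable images can look like, the sketch below picks response headers by file type. The `cache_headers` function, the one-year lifetime, and the `no-cache` fallback for other paths are assumptions made for this example; the site's actual header values are not documented here, and the live application runs on Cloudflare Workers rather than Python.

```python
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds


def cache_headers(path: str) -> dict[str, str]:
    """Choose response headers for a requested path (hypothetical policy)."""
    if path.lower().endswith(".webp"):
        # Converted images never change after upload, so browsers and the CDN
        # may reuse them without revalidating.
        return {
            "Content-Type": "image/webp",
            "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable",
        }
    # Anything else (for example, index-backed pages) is revalidated before reuse.
    return {"Cache-Control": "no-cache"}


if __name__ == "__main__":
    print(cache_headers("EXAMPLE-DOC/page-0001.webp"))
    print(cache_headers("/search"))
```

The `immutable` directive tells clients that the response will not change during its freshness lifetime, which is only safe because a given image key always maps to the same bytes.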
Accuracy and limitations
We make reasonable efforts to ensure all indexed files match the DOJ originals, but this archive has limitations:
- PDF-to-image conversion may introduce minor rendering differences (font smoothing, color profiles) compared to viewing the original PDF
- The archive may not include every file from every data set if DOJ pages were updated after our last ingestion
- Search is limited to file name matching; full-text search of document content is not currently supported
For authoritative use, always verify against the original DOJ source.