Methodology

How we build and maintain the archive

Data sourcing

All files in this archive originate from the Department of Justice Epstein Files Portal. We download files directly from the DOJ's official data set pages for each of the twelve data sets. No files from unofficial or third-party sources are included.

File conversion

The DOJ releases files primarily as PDF documents. We convert each PDF page into an optimized WebP image for faster loading and consistent display across devices. The conversion process:

Rasterizes each PDF page at a configurable DPI (default: 150) for readable detail
Encodes each page as WebP at a quality setting chosen to balance clarity against file size
Preserves multi-page documents as linked page sequences, so you can navigate through an entire document
Processes image files (JPEG, PNG, GIF) by converting them to WebP without altering content
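
To make the rasterization and page-sequence steps concrete, here is a minimal illustrative sketch, not the archive's actual conversion code. It relies on the PDF convention of 72 points per inch to work out how large a page becomes at a given DPI, and it shows one way to link pages so a reader can step forward and back. The names raster_size, PageLink, and page_sequence, and the page-NNNN.webp key layout, are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

POINTS_PER_INCH = 72  # default PDF user-space unit: 1 point = 1/72 inch


def raster_size(width_pt: float, height_pt: float, dpi: int = 150) -> Tuple[int, int]:
    """Return the pixel dimensions of a PDF page rasterized at `dpi`."""
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


@dataclass
class PageLink:
    key: str                  # hypothetical storage key for this page image
    page: int                 # 1-based page number
    prev_key: Optional[str]   # None on the first page
    next_key: Optional[str]   # None on the last page


def page_sequence(doc_key: str, page_count: int) -> List[PageLink]:
    """Build a linked sequence of page entries for one document."""
    # Hypothetical key layout: <document key>/page-0001.webp, page-0002.webp, ...
    keys = [f"{doc_key}/page-{n:04d}.webp" for n in range(1, page_count + 1)]
    links = []
    for i, key in enumerate(keys):
        links.append(
            PageLink(
                key=key,
                page=i + 1,
                prev_key=keys[i - 1] if i > 0 else None,
                next_key=keys[i + 1] if i + 1 < len(keys) else None,
            )
        )
    return links
```

For a US Letter page (612 x 792 points), raster_size(612, 792) gives (1275, 1650) pixels at the default 150 DPI.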

No content is added, removed, or altered during conversion. Apart from the minor rendering differences described under Accuracy and limitations, the visual content of each page matches the DOJ original.

Indexing and metadata

After conversion, each file is uploaded to cloud storage and indexed. The index records:

File key (derived from the original DOJ file name)
File size
Upload timestamp
Data set identifier
Source URL (link back to the DOJ data set page)
Page count and page sequence for multi-page PDF documents
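
As a rough illustration of the fields listed above, an index entry could be modeled as a small record type. This is a sketch under stated assumptions, not the archive's actual schema: the field names, the rule for deriving a key (dropping the directory and extension), the page key layout, and the placeholder file name and URL used in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass(frozen=True)
class IndexRecord:
    file_key: str         # derived from the original DOJ file name
    size_bytes: int       # file size
    uploaded_at: str      # upload timestamp, ISO 8601 in UTC
    data_set: int         # data set identifier
    source_url: str       # link back to the DOJ data set page
    page_count: int       # number of pages (1 for single images)
    page_keys: List[str]  # ordered page image keys


def derive_file_key(doj_file_name: str) -> str:
    """Hypothetical derivation: keep the file name without directory or extension."""
    return PurePosixPath(doj_file_name).stem


def build_record(
    doj_file_name: str,
    size_bytes: int,
    data_set: int,
    source_url: str,
    page_count: int,
    uploaded_at: Optional[datetime] = None,
) -> IndexRecord:
    """Assemble one index entry; the page key layout is an assumption."""
    key = derive_file_key(doj_file_name)
    timestamp = (uploaded_at or datetime.now(timezone.utc)).isoformat()
    page_keys = [f"{key}/page-{n:04d}.webp" for n in range(1, page_count + 1)]
    return IndexRecord(key, size_bytes, timestamp, data_set, source_url, page_count, page_keys)
```

For example, build_record("EXAMPLE-00000001.pdf", 2048, 1, "https://example.invalid/data-set-1", 2) would yield a record whose file_key is "EXAMPLE-00000001" with two page keys. Both the file name and the URL here are placeholders.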

The index is cached and refreshed whenever new files are ingested. Search runs against file keys only, so you can look up documents by identifier or by naming pattern.
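
Key-based search of this kind can be sketched in a few lines. The example below is illustrative only and assumes two behaviors that the text above does not specify: plain queries match as case-insensitive substrings, and queries containing wildcard characters (*, ?, [) are treated as glob patterns. The function name search_keys and the sample keys are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def search_keys(keys: Iterable[str], query: str) -> List[str]:
    """Return matching file keys, sorted, for a substring or glob query."""
    q = query.strip().lower()
    if not q:
        return []
    if any(ch in q for ch in "*?["):
        # Glob-style lookup by naming pattern, e.g. "sample-00*"
        return sorted(k for k in keys if fnmatchcase(k.lower(), q))
    # Plain lookup by document identifier fragment
    return sorted(k for k in keys if q in k.lower())


# Placeholder keys, not real archive entries
sample_keys = ["SAMPLE-0001", "SAMPLE-0002", "OTHER-0001"]
by_identifier = search_keys(sample_keys, "sample")
by_pattern = search_keys(sample_keys, "*-0001")
```

Because only keys are searched, a query for a word that appears inside a document but not in its file name returns nothing.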

Hosting and delivery

Images are stored on Cloudflare R2 and served via Cloudflare's global CDN. The web application runs on Cloudflare Workers. Because images are immutable once uploaded, they can be cached aggressively at the edge and in browsers, which keeps access fast worldwide.
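
Immutability is what makes long-lived caching safe: a page image never changes under the same key, so caches never need to revalidate it. The sketch below shows, in Python for illustration only, the kind of response headers this implies. The one-year lifetime, the no-cache policy for non-image responses, and the function name cache_headers are assumptions, and the real application runs on Cloudflare Workers rather than Python.

```python
from typing import Dict

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds


def cache_headers(path: str) -> Dict[str, str]:
    """Pick response headers based on whether the asset is an immutable image."""
    if path.lower().endswith(".webp"):
        return {
            "Content-Type": "image/webp",
            # "immutable" tells browsers not to revalidate during max-age
            "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}, immutable",
        }
    # Assumed policy for everything else (e.g. the index): revalidate before reuse
    return {"Cache-Control": "no-cache"}
```

The trade-off of this approach is that a corrected image must be published under a new key, since caches may hold the old one for the full lifetime.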

Accuracy and limitations

We make reasonable efforts to ensure all indexed files match the DOJ originals, but this archive has limitations:

PDF-to-image conversion may introduce minor rendering differences (font smoothing, color profiles) compared to viewing the original PDF
The archive may not include every file from every data set if DOJ pages were updated after our last ingestion
Search is limited to file name matching; full-text search of document content is not currently supported

For authoritative use, always verify against the original DOJ source.