Most people assume a black box on a PDF means the data is gone. We tested 100 real redacted PDFs and found that 68 of them still had the original text fully recoverable.
In December 2024, court documents from the Jeffrey Epstein case were released with heavy redaction. Within hours, researchers online had recovered names and details from beneath the black boxes using nothing more than a PDF reader and copy-paste. The redaction had been applied as a visual overlay — a rectangle drawn on top — leaving the original text intact in the file.
This was not an isolated incident. Paul Manafort's court documents, filed under seal and then improperly released, exposed sensitive case strategy because the redaction was a colored shape, not a content removal. A Meta legal filing in 2021 revealed internal revenue figures the company had intended to withhold — again, fake redaction.
These incidents share the same root cause: the people applying redaction did not understand that drawing a box over text in a PDF does not delete the text.
We decided to quantify how widespread this problem actually is.
We collected 100 publicly available PDF documents that contained visible redaction (black or white rectangles applied over text) from three categories: court filings (42), government disclosures (31), and corporate documents (27).
All documents were obtained through public channels: PACER, government FOIA portals, SEC EDGAR, and direct links from news coverage. No documents were obtained through hacking or unauthorized access.
The detection method is straightforward. A genuine redaction removes content from the PDF's internal data structures. A fake redaction adds a visual element on top while leaving the data untouched.
Our checker tool works in three passes:
Pass 1 — Text extraction below annotations. Using PyMuPDF, we extract text from rectangles on each page that are covered by black or white annotation objects. If text is returned, the content is recoverable.
Pass 2 — Annotation layer inspection. We enumerate every annotation in the document. Redact-type annotations (/Redact) that have been applied via a compliant redaction workflow leave specific markers. Rectangles drawn as overlays are ordinary /Square annotations and leave a different signature (/Rect, by contrast, is only the bounding-box entry that every annotation carries). Any /FreeText or /Square annotation covering a significant area is flagged as a probable fake redaction.
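The area-based flagging rule can be sketched without any PDF library. The `Annotation` record, the `flag_probable_overlay` name, and the 0.5% page-area threshold below are all assumptions chosen for illustration; the text above does not say what counts as a "significant area". The subtype and rectangle would come from whichever library enumerates the annotations.

```python
from dataclasses import dataclass

# Annotation subtypes that can be used to draw an opaque box over text.
OVERLAY_SUBTYPES = {"Square", "FreeText"}


@dataclass
class Annotation:
    subtype: str                              # value of /Subtype, e.g. "Square"
    rect: tuple[float, float, float, float]   # /Rect as (x0, y0, x1, y1) in points


def area(rect):
    x0, y0, x1, y1 = rect
    return abs(x1 - x0) * abs(y1 - y0)


def flag_probable_overlay(annot, page_rect, min_fraction=0.005):
    """Flag overlay-style annotations that cover a noticeable share of the page."""
    if annot.subtype not in OVERLAY_SUBTYPES:
        return False
    return area(annot.rect) >= min_fraction * area(page_rect)


letter_page = (0, 0, 612, 792)
box = Annotation("Square", (72, 700, 300, 715))
print(flag_probable_overlay(box, letter_page))
```

A threshold relative to page area is only one possible choice; an absolute minimum size, or a check that the box overlaps a text line, would serve the same purpose.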
Pass 3 — Content stream diffing. For documents with multiple revisions (incremental updates), we compare the content stream from each revision. A real redaction modifies the content stream directly. A visual overlay only adds to the annotation structure while the content stream remains unchanged.
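A rough sketch of how the revisions might be separated, using only the standard library. It relies on the fact that each incremental update appends to the file and ends with its own `%%EOF` marker, so every prefix ending at a marker is an earlier version of the document. The function names are hypothetical, and the approach is deliberately naive: linearized files or a stray marker inside a stream can produce false splits.

```python
EOF_MARKER = b"%%EOF"


def split_revisions(data: bytes) -> list[bytes]:
    """Return one byte prefix per %%EOF marker, oldest revision first."""
    revisions = []
    start = 0
    while True:
        idx = data.find(EOF_MARKER, start)
        if idx == -1:
            break
        end = idx + len(EOF_MARKER)
        revisions.append(data[:end])  # everything up to and including this marker
        start = end
    return revisions


def stream_untouched(streams_by_revision: list[bytes]) -> bool:
    """True when a page's content stream is byte-identical in every revision."""
    return len(set(streams_by_revision)) == 1
```

Each prefix can then be opened with a PDF library to read the page content streams (PyMuPDF's `Page.read_contents()` is one option). If `stream_untouched` returns `True` for a page that gained a black box between revisions, the box was added as an overlay and the text beneath it was never removed.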
Documents that failed any of these passes were classified as having recoverable content.
Of the 100 documents tested, 68 had fully recoverable text, 14 had partially recoverable text, and 18 were safely redacted.
That means 82 out of 100 redacted documents had at least some issue: either fully fake redaction or inconsistent application.
The breakdown by document category showed clear differences:
| Category | Tested | Recoverable | Partial | Safe |
|---|---|---|---|---|
| Court filings | 42 | 31 (74%) | 7 (17%) | 4 (10%) |
| Government disclosures | 31 | 22 (71%) | 5 (16%) | 4 (13%) |
| Corporate documents | 27 | 15 (56%) | 2 (7%) | 10 (37%) |
Corporate documents fared better, likely because legal and compliance teams in large organizations have more rigorous review processes or use Adobe Acrobat's built-in redaction feature more consistently. Court filings had the worst rate — a problem given that legal documents often contain the most sensitive personal and strategic information.
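The table's arithmetic can be checked in a few lines. The figures below are copied from the table; the variable names are arbitrary.

```python
# (tested, recoverable, partial, safe) per category, copied from the table.
results = {
    "Court filings": (42, 31, 7, 4),
    "Government disclosures": (31, 22, 5, 4),
    "Corporate documents": (27, 15, 2, 10),
}

# Column totals across all three categories.
totals = [sum(col) for col in zip(*results.values())]
tested, recoverable, partial, safe = totals
print(f"tested={tested} recoverable={recoverable} partial={partial} safe={safe}")
print(f"with some issue: {recoverable + partial}")

for name, (n, rec, par, ok) in results.items():
    assert rec + par + ok == n  # each row must add up to its "Tested" count
    print(f"{name}: {rec / n:.0%} recoverable, {par / n:.0%} partial, {ok / n:.0%} safe")
```

The column totals come to 68 recoverable, 14 partial, and 18 safe out of 100, which is where the figure of 82 documents with at least some issue comes from.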
The pattern is consistent across every category of document. Someone needed to redact a PDF. They opened it in whatever tool they had — often a browser PDF viewer, a free online editor, or the annotation tools in Microsoft Word's PDF export. They drew rectangles. They exported. The document looked redacted.
The disconnect is that PDF rendering hides the internal structure from users. A black rectangle over text looks identical whether the text is gone or still present. There is no visual signal that content is recoverable. Users have no reason to suspect their redaction failed.
The tools bear significant responsibility here. Many applications that allow you to draw on PDFs do not distinguish between annotation (which is a layer on top) and editing (which modifies the content). They make it easy to make a document look redacted while making it impossible to know whether it actually is.
Some tools that advertise "PDF redaction" features use annotation-based approaches internally. The marketing uses the right words. The implementation does not do what the words imply.
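True PDF redaction requires two operations, not one: first, mark the region to be redacted; second, remove the underlying content inside that region from the file itself.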
The second step is the one that most annotation-based tools skip entirely. The mark is created. The removal never happens.
Compliant redaction workflows include Adobe Acrobat's "Mark for Redaction" + "Apply Redactions" sequence, PyMuPDF's add_redact_annot() + apply_redactions() pair, and a handful of purpose-built tools. Most general-purpose PDF editors do not implement the second step.
After applying real redaction, the text does not exist anywhere in the file. It cannot be recovered by any tool because the data is not there. The content stream contains the removal markers, not the original content.
If you have published or plan to publish a redacted PDF, you can verify whether the redaction is real before sending it.
Our free checker tool accepts any PDF and runs the same three-pass analysis we used in this study. It returns a per-page report showing which zones have recoverable content and which are clean.
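One way such a per-page report could be structured is sketched below. The `ZoneResult` record and the `summarize` function are hypothetical, and the rule that mixed results count as "partial" is an assumption borrowed from the column names in the results table rather than a description of the tool's internals.

```python
from dataclasses import dataclass


@dataclass
class ZoneResult:
    page: int
    rect: tuple[float, float, float, float]
    recovered_text: str = ""   # empty string means nothing could be extracted

    @property
    def recoverable(self):
        return bool(self.recovered_text.strip())


def summarize(zones):
    """Group zones by page and derive a document-level verdict."""
    by_page = {}
    for zone in zones:
        by_page.setdefault(zone.page, []).append(zone)
    leaking = sum(z.recoverable for z in zones)
    if leaking == 0:
        verdict = "safe"
    elif leaking == len(zones):
        verdict = "recoverable"
    else:
        verdict = "partial"
    return by_page, verdict


zones = [
    ZoneResult(0, (72, 700, 300, 715), "Jane Roe"),
    ZoneResult(1, (72, 500, 300, 515), ""),
]
by_page, verdict = summarize(zones)
print(verdict)
```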
If the checker finds recoverable content, you need to re-redact using a tool that actually removes the content rather than covering it.
The fix is straightforward once you know the problem exists. The dangerous part is not knowing that it does.