Most PDF redaction tools only apply a visual overlay. True redaction permanently removes the selected content from the PDF file itself, so none of the redacted text or image data survives in the file.
Most people assume that adding a black box over text in a PDF permanently hides it. This assumption is dangerously wrong. Visual overlays leave the original text intact in the PDF's content streams, where anyone can recover it in seconds by selecting and copying it or by running a text extraction tool such as pdftotext. When the streams are stored uncompressed, a text editor or the strings command is enough.
True PDF redaction means removing the underlying content from the file, not just covering it up. The distinction matters because the consequences of getting it wrong are not theoretical — they have played out in public, repeatedly, in high-stakes legal and government contexts.
PDF files are not images. They are structured data containers. When a PDF viewer renders a page, it reads a content stream — a sequence of operators that describe text runs, images, and drawing commands. The visual result you see on screen is a representation of that underlying data, not the data itself.
When a tool adds a black rectangle annotation to cover sensitive content, it adds a new drawing instruction on top of the existing content stream. The original text objects remain in the file. They are unchanged. The black box is a layer — like placing a sticky note over a word on a printed page. The word is still there. Anyone who lifts the note reads it.
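To make the layering concrete, here is a minimal sketch in Python using only the standard library. It assumes a heavily simplified, uncompressed content stream with a single text run; the account number, coordinates, and the `text_runs` helper are illustrative and do not come from any real file or tool.

```python
import re

# A simplified, uncompressed page content stream containing one text run.
original_stream = (
    b"BT\n"
    b"/F1 12 Tf\n"
    b"72 700 Td\n"
    b"(Account number: 12345678) Tj\n"
    b"ET\n"
)

# What an overlay effectively does: append a filled black rectangle.
overlay = (
    b"q\n"
    b"0 0 0 rg\n"          # set the fill colour to black
    b"70 695 220 16 re\n"  # rectangle: x y width height
    b"f\n"                 # fill the rectangle
    b"Q\n"
)
overlaid_stream = original_stream + overlay


def text_runs(stream: bytes) -> list[str]:
    """Return literal strings shown by Tj operators (simplified: no escapes)."""
    return [m.decode("latin-1") for m in re.findall(rb"\(([^()]*)\)\s*Tj", stream)]


print(text_runs(original_stream))   # text before the overlay
print(text_runs(overlaid_stream))   # the same text is still present afterwards
print(len(overlaid_stream) > len(original_stream))  # the stream only grew
```

The rectangle changes what is painted on screen, but the `Tj` operand is byte-for-byte identical in both streams.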
True redaction operates at the content stream level. A compliant tool must:
After this process, the data does not exist anywhere in the file. There is no recovery path because there is nothing to recover.
The mechanics are worth understanding in detail because the failure is not obvious until you know what to look for.
When software adds a visual overlay (a black rectangle annotation, a drawing shape, a whiteout fill), it either adds an annotation object to the page's annotation array (/Annots) or appends a new drawing command to the content stream. The existing text objects underneath are untouched. The file size often does not decrease at all; it increases slightly because something was added.
Recovering the hidden content requires no specialized tools:
The overlay is concealment, not removal. Concealment is always reversible.
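As a rough illustration of how little effort recovery takes, the sketch below pulls text out of raw PDF bytes with only Python's standard library. It is a simplification under stated assumptions: it handles only Flate-compressed or uncompressed streams and literal strings shown with `Tj`, while real files may also use `TJ` arrays, hex strings, or custom font encodings. The `recover_text` name and the demo bytes (a fragment, not a complete PDF) are hypothetical.

```python
import re
import zlib

# Capture everything between "stream" and "endstream" (including the trailing EOL).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Literal strings followed by the Tj (show text) operator.
TEXT_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def recover_text(pdf_bytes: bytes) -> list[str]:
    """Return literal text strings found in the file's content streams."""
    found = []
    for raw in STREAM_RE.findall(pdf_bytes):
        try:
            # decompressobj tolerates the EOL bytes left after the compressed data.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate-compressed; scan the bytes as they are
        found += [m.decode("latin-1") for m in TEXT_RE.findall(data)]
    return found


# Demo: a compressed stream holding a text run plus a black rectangle "redaction".
content = (
    b"BT /F1 12 Tf 72 700 Td (Witness: Jane Roe) Tj ET\n"
    b"q 0 0 0 rg 70 695 150 16 re f Q\n"
)
compressed = zlib.compress(content)
demo_pdf = (
    b"%PDF-1.7\n1 0 obj\n<< /Length " + str(len(compressed)).encode()
    + b" /Filter /FlateDecode >>\nstream\n" + compressed + b"\nendstream\nendobj\n"
)

print(recover_text(demo_pdf))
```

Compression does not protect the text: the stream is inflated first and the covered string is read directly.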
The gap between apparent and actual redaction has caused documented, public failures:
Paul Manafort court filing (2019): Lawyers filed a court document with sensitive passages covered by black text box overlays — a standard annotation approach. Within hours of publication, reporters copied and pasted the text from the PDF and published the content that was supposed to be hidden. The redaction had done nothing. The underlying text was fully intact and selectable.
NSA intelligence report (2017): A leaked NSA document contained the name of an intelligence officer, nominally redacted with a colored shape layer applied digitally. The name was fully selectable in the published PDF. Journalists reported it within the same news cycle. The "redaction" was purely cosmetic.
These are not edge cases caused by inexperienced users. They represent a predictable outcome of applying visual overlays to a format where the visual rendering layer and the underlying data are independent. For more on the pattern, see why PDF redaction matters and the broader history of visual masking vs real redaction.
| Method | Content recoverable? | Common in free tools | File size after |
|---|---|---|---|
| Black rectangle annotation | Yes — select-all exposes it | Very common | Slightly larger (annotation added) |
| Whiteout fill overlay | Yes — text extractor ignores fill color | Common | Slightly larger |
| Image editor + re-export | Usually no — but text is destroyed | Uncommon | Much larger (image-based PDF) |
| True content stream redaction | No — content objects deleted | Rare in free tools | Smaller (content removed) |
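The file size change is a practical first check, not proof: if your redaction tool returns a file that is the same size or larger than the original, suspect an overlay and confirm with text extraction. The exception is the image re-export route in the table above, which produces a larger file even though the text is gone.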
Visual overlays are fine for informal contexts — blocking out a price in a screenshot you are sharing in a chat, covering a non-sensitive reference for readability. For any of the following contexts, only true content stream redaction is appropriate:
Legal filings: Court documents containing client names, privileged strategy notes, witness identities, or sealed information. Improper redaction in legal filings has resulted in sanctions, privilege waiver, and published exposure of protected content.
HIPAA-regulated healthcare documents: Patient records, referral packets, insurance claim files, and documents produced in response to subpoenas. HIPAA violations for negligent disclosure carry civil and criminal penalties.
FOIA responses: Government agencies and organizations responding to public records requests must protect informant identities, ongoing investigation details, and personal information of uninvolved individuals. Overlay redaction in FOIA documents has repeatedly exposed protected individuals in published records.
Financial due diligence: Loan applications, account statements, and M&A data rooms contain Social Security numbers, account numbers, and income data. Overlay redaction during deal review has exposed this to adverse parties.
HR and personnel records: Compensation data, performance improvement plans, termination documentation, and investigation reports shared with legal counsel or regulators with certain fields nominally blocked.
Customer data for support and compliance: Ticket exports, CRM records, and log files that must be shared with vendors or regulators with PII stripped.
For a direct comparison of how this plays out between tools, see pdf-redaction-vs-whiteout.
MyPDFBoy uses PyMuPDF to perform content stream redaction. The process:
Files are processed on our server in memory and discarded immediately after the response — no copy is written to disk, no logs contain file content, no persistent storage. No account is required.
The tool supports multi-page documents, scanned PDFs (image-based), and documents in multiple languages. For image-based PDFs, redaction removes the image pixel data within the zone boundaries rather than text stream objects.
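For readers who want to see what content stream redaction looks like with PyMuPDF, here is a minimal sketch. It is not MyPDFBoy's actual code. The `redact_terms` function and its search-by-term approach are assumptions made for illustration, and it works on bytes to mirror the in-memory processing described above.

```python
def redact_terms(pdf_bytes: bytes, terms: list[str]) -> bytes:
    """Remove every occurrence of the given terms and return the new PDF bytes."""
    # PyMuPDF is imported lazily so this sketch loads even where it is not installed.
    import fitz  # PyMuPDF

    doc = fitz.open(stream=pdf_bytes, filetype="pdf")
    for page in doc:
        for term in terms:
            # search_for returns one rectangle per occurrence on the page.
            for rect in page.search_for(term):
                # Mark the area; nothing is removed until apply_redactions runs.
                page.add_redact_annot(rect, fill=(0, 0, 0))
        # Deletes the text under the marked areas and paints the fill colour.
        page.apply_redactions()
    # garbage=4 drops unreferenced objects; deflate=True compresses streams.
    out = doc.tobytes(garbage=4, deflate=True)
    doc.close()
    return out
```

Marking and applying are separate steps in PyMuPDF, so a tool that only adds the redaction annotations without calling `apply_redactions` would leave the underlying text in place.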
After downloading, verify with a text extraction tool:
pdftotext your-redacted-file.pdf -
If true redaction was applied, the redacted text will not appear in the output. If you see the content that should have been removed, the tool applied an overlay rather than performing content removal.
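For additional verification, open the redacted file in a PDF viewer, select all, copy, and paste into a plain text editor; then compare the file size with the original.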
If all three checks pass — no extracted text, nothing appears on paste, file is smaller — the redaction is genuine. The content bytes are not in the file.
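The extraction and file size checks can be scripted. The sketch below assumes poppler's `pdftotext` is installed and on the PATH; the function names and the idea of passing a list of sensitive terms are illustrative choices, not part of any tool mentioned here. The copy-and-paste check still has to be done by hand in a viewer.

```python
import os
import subprocess


def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return everything it can extract from the file."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def leaked_terms(extracted_text: str, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms that still appear in the extracted text."""
    # Collapse whitespace and case so line breaks do not hide a match.
    normalized = " ".join(extracted_text.split()).lower()
    return [t for t in sensitive_terms if " ".join(t.split()).lower() in normalized]


def check_redaction(original_path: str, redacted_path: str, terms: list[str]) -> dict:
    """Combine the extraction check and the file size check into one report."""
    return {
        "leaked": leaked_terms(extract_text(redacted_path), terms),
        "smaller": os.path.getsize(redacted_path) < os.path.getsize(original_path),
    }
```

An empty `leaked` list together with `smaller` being true matches the passing outcome described above. A non-empty `leaked` list means the content is still in the file regardless of what the page looks like.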