Black boxes and white rectangles look like redaction but leave every character intact in the PDF file. Here's how to tell the difference.
There is a version of redaction that feels right, looks right in every PDF viewer, and fails completely the moment anyone knows what to look for. It is called whiteout — or visual masking — and it is the default output of most general-purpose PDF editors and online tools.
True redaction and whiteout look identical on screen. The difference is what happens in the file underneath. With whiteout, the original content is still there, word for word. With true redaction, it is gone — not hidden, deleted. There is no underlying layer to uncover, no annotation to remove, no text to copy out.
Understanding the distinction is not technical trivia. It is the difference between a document that is safe to share and one that is a liability waiting to surface.
When a PDF editor draws a white or black rectangle over your sensitive text, it is doing exactly that: drawing a rectangle. The tool adds a new visual object — a shape, an annotation, a highlight layer — positioned above the existing content. The original text, images, and metadata beneath that shape are untouched.
At the PDF file level, whiteout means the document now contains two things it did not before: the original content you wanted to remove, and a colored rectangle sitting on top of it in the rendering layer.
This is roughly equivalent to taping a piece of paper over words in a printed document. The words are still on the page. You just cannot see them without lifting the tape.
PDF structure makes this even more exploitable than the physical analogy suggests. PDF files separate visual rendering from content storage. Text is stored in content streams as character sequences with positioning data. The viewer reads those streams and renders them to screen. When you add a covering shape, you add it to the rendering layer — but you never touch the content streams where the actual characters live.
A PDF parser does not care about the rendering layer. It reads the content streams directly. The covering rectangle is irrelevant.
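To make that separation concrete, here is a minimal sketch in Python. It assumes a simplified, uncompressed content stream; the sample text, font name, and coordinates are invented for illustration. It mimics what a parser does: it pulls the string operands of the Tj (show text) operator out of the stream and never consults the rectangle painted afterwards.

```python
from __future__ import annotations

import re

# A simplified, uncompressed page content stream (invented for illustration).
# The text is drawn first, then a filled black rectangle is painted over it.
CONTENT_STREAM = b"""BT
/F1 12 Tf
72 700 Td
(Account Balance: 12345) Tj
ET
0 0 0 rg
70 695 220 16 re
f
"""

# Literal string operand followed by Tj, the "show text" operator.
# Assumption: the sample string has no escaped or nested parentheses.
TJ_PATTERN = re.compile(rb"\(([^()\\]*)\)\s*Tj")

# Rectangle path operator (re) followed directly by a fill operator (f).
RECT_FILL_PATTERN = re.compile(rb"\bre\s+f\b")


def extract_text_runs(stream: bytes) -> list[str]:
    """Return every literal string shown with Tj, ignoring drawing operators."""
    return [m.group(1).decode("latin-1") for m in TJ_PATTERN.finditer(stream)]


def has_filled_rectangle(stream: bytes) -> bool:
    """Report whether the stream paints a filled rectangle."""
    return RECT_FILL_PATTERN.search(stream) is not None


if __name__ == "__main__":
    print(extract_text_runs(CONTENT_STREAM))
    print(has_filled_rectangle(CONTENT_STREAM))
```

The rectangle is present in the stream and the text comes out anyway. Real files add compression, font encodings, and escaped characters, so a production parser is more involved, but the principle is the same.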
True redaction operates at the content stream level. Instead of adding something to cover the content, it removes the content itself.
The process works in several steps. First, the tool identifies every text object and image data block whose bounding box intersects with the redaction zone. Then it removes those objects from the content stream — the bytes are gone from the file. A solid rectangle is written in their place, not as a covering annotation but as the actual content of that region. Finally, the file's internal cross-reference table is rebuilt so the document remains a valid PDF.
After true redaction, the original content does not exist anywhere in the file. There is no layer to peel back, no annotation to delete, no stream to parse. The data has been removed, not rearranged.
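The contrast between the two operations can be sketched on a toy page model in Python. This is an illustration under stated assumptions, not how any particular library implements it: Rect, TextObject, FilledBox, whiteout, and redact are hypothetical names, a page is reduced to a list of objects with bounding boxes, and the cross-reference rebuild is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        """True when the two rectangles overlap with non-zero area."""
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass(frozen=True)
class TextObject:
    text: str
    bbox: Rect


@dataclass(frozen=True)
class FilledBox:
    bbox: Rect


PageObject = Union[TextObject, FilledBox]


def whiteout(objects: List[PageObject], zone: Rect) -> List[PageObject]:
    """Visual masking: keep everything and add a box on top."""
    return list(objects) + [FilledBox(zone)]


def redact(objects: List[PageObject], zone: Rect) -> List[PageObject]:
    """True redaction: drop text that intersects the zone, then add the box."""
    kept = [
        obj for obj in objects
        if not (isinstance(obj, TextObject) and obj.bbox.intersects(zone))
    ]
    return kept + [FilledBox(zone)]


if __name__ == "__main__":
    page = [
        TextObject("Name: Jane Doe", Rect(72, 700, 200, 712)),
        TextObject("Public paragraph", Rect(72, 650, 300, 662)),
    ]
    zone = Rect(70, 698, 210, 714)
    print([o.text for o in whiteout(page, zone) if isinstance(o, TextObject)])
    print([o.text for o in redact(page, zone) if isinstance(o, TextObject)])
```

Both functions end with the same box in the same place, which is why the results look alike on screen. Only redact returns a page that no longer holds the covered text.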
This is what tools like MyPDFBoy's redaction tool do — they use PDF libraries that modify content streams directly, not annotation APIs that sit on top of existing content.
The trap is straightforward: in a standard PDF viewer, whiteout and true redaction look identical. Both show a black or white box. Both prevent the reader from seeing the underlying text during normal use. Neither approach leaves an obvious visual signal that one is permanent and one is reversible.
Most people who create PDFs are not PDF engineers. They see a black box and conclude that redaction happened. The tool presented a redaction interface, they drew zones, the result looks redacted — the mental model feels complete even though the underlying operation was purely cosmetic.
This is compounded by the fact that some PDF viewers are stricter about text selection in covered areas than others. In certain configurations, trying to click and drag over a whiteout box will not let you select text, reinforcing the illusion that the content is gone. But this is viewer behavior, not file behavior. The bytes in the content stream have not moved.
Anyone who reads the file with a tool that bypasses the viewer — a command-line extractor, a PDF parsing library, a raw text editor — sees everything immediately.
The deeper issue is that the people most likely to look are the ones you least want finding the content: opposing counsel, investigative journalists, security researchers, and regulators with document forensics experience. As covered in why PDF redaction matters, the failures that make headlines follow a consistent pattern — someone with basic PDF tooling looks at a file that an organization believed was clean.
Recovery requires no special skills and no paid software. Here are the most common methods, in order of how quickly they work.
Select-all and copy. Open the PDF in any viewer. Press Cmd+A (or Ctrl+A on Windows) to select all content. Copy. Paste into a text editor. In many cases, the text under the covering rectangle is selected along with everything else and pastes in full. This works because the viewer selects text from the content stream, not from the visual layer.
pdftotext. This is a standard command-line tool, distributed with Poppler and Xpdf and available for every major operating system. It extracts all text from a PDF's content streams and writes it to a file or the terminal, completely ignoring any visual annotations or covering shapes.
    pdftotext document-whiteout.pdf -

If the output contains Social Security Number: 847-22-1931 or Account Balance: $247,000 or whatever you covered, the whiteout failed. The - flag prints to stdout; redirect to a file if you want to save it.
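strings command. More blunt than pdftotext, and usually preinstalled on macOS and Linux. The strings command reads raw bytes from any file and outputs sequences that look like text. Where a PDF's content streams are stored uncompressed, text objects appear as legible strings even in raw byte output. Many PDFs compress their streams, in which case this method shows little until the file is decompressed with a free tool such as qpdf.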
    strings document-whiteout.pdf | grep "SSN\|Account\|Confidential"

Annotation removal in a PDF editor. Open the file in Adobe Acrobat, PDF-XChange, or any editor that exposes annotation layers. Look for the covering rectangle in the annotation list. Delete it. The original document is now visible, exactly as it was before whiteout was applied.
Opening in a text editor. PDF files are partly human-readable. Open a PDF in any plain text editor, scroll through the content stream sections, and look for text runs. Where the streams are stored uncompressed, they appear in formats like (Social Security) or (confidential) within the stream data, surrounded by PDF syntax. No tools required beyond something that opens files.
None of these methods require specialized forensics software. A journalist with a laptop, a paralegal who googled "how to read PDF text," or a curious opposing party can do all of this in under five minutes.
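The raw-bytes approach can be sketched in a few lines of Python using only the standard library. This is a rough illustration rather than a real PDF parser: recover_text_literals is a hypothetical helper, it assumes streams are either uncompressed or Flate-compressed, and it ignores hex strings, escaped parentheses, and font encodings, so treat what it finds as a lower bound on what is recoverable.

```python
from __future__ import annotations

import re
import zlib

# Bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_PATTERN = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

# Literal strings such as (Social Security); escapes are not handled here.
LITERAL_PATTERN = re.compile(rb"\(([^()\\]*)\)")


def recover_text_literals(pdf_bytes: bytes) -> list[str]:
    """Collect literal strings from every stream, inflating when possible."""
    found: list[str] = []
    for match in STREAM_PATTERN.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates trailing or slightly truncated bytes.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # stored uncompressed, or uses a filter this sketch skips
        found.extend(
            m.group(1).decode("latin-1") for m in LITERAL_PATTERN.finditer(data)
        )
    return found


if __name__ == "__main__":
    content = b"BT /F1 12 Tf 72 700 Td (Case number: 0000) Tj ET\n0 0 0 rg 70 695 220 16 re f"
    sample = (b"%PDF-1.4\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
              + zlib.compress(content) + b"\nendstream\nendobj\n")
    print(recover_text_literals(sample))
```

The sample bytes are a hand-built fragment, not a complete PDF. To try it on a real file, read the file in binary mode and pass the bytes to the function.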
These are not hypothetical risks. The pattern of visual masking misidentified as redaction has produced documented failures at the highest levels of government and law.
Paul Manafort Court Filing, 2019. Lawyers for Paul Manafort filed a court document with passages blacked out using text box overlays — the standard whiteout approach. Reporters immediately copied and pasted the content from the filed PDF and published the hidden material within hours of the filing becoming public. The hidden content included information about Manafort's cooperation with investigators. The redaction had done nothing to the content streams. The black rectangles were purely visual. The filing had to be corrected, and the information was already public.
NSA Intelligence Report, 2017. A document linked to a leaked NSA intelligence report contained the name of a covert officer, covered only by a digital color layer. When journalists opened the file, the name was fully selectable: clicking and dragging directly over the "redacted" area highlighted the text. The redaction was a visual overlay and nothing more. The name was published. The case became a reference point for how digital redaction failures differ from physical ones: a physical black marker destroys the ink; a digital overlay does not touch the underlying data at all.
British Columbia Government FOIA Releases. Multiple British Columbia government responses to freedom of information requests produced documents with white rectangles covering portions of text. The background of the source documents was also white, which made the "redacted" areas invisible to the naked eye on screen. Researchers and journalists who ran the documents through standard text extraction tools found the covered names, dates, and policy details fully intact. The failures were not isolated incidents but a systemic pattern across multiple releases, ultimately requiring the government to audit its redaction procedures.
Each of these cases follows the same structure: someone applied what they believed was redaction, the visual result looked correct, and the underlying content was trivially recoverable. The how to redact a PDF guide covers the correct process in detail — starting with a tool that actually modifies content streams rather than adding covering layers.
| Method | What it does | Recoverable? | File size changes | Common in | Safe for sensitive data? |
|---|---|---|---|---|---|
| Whiteout / visual overlay | Adds a covering shape on top of existing content | Yes — trivially | Slightly larger (new object added) | Most online PDF tools, drawing tools, annotation tools | No |
| True redaction | Removes content objects from the content stream | No — content deleted | Smaller (data removed) | Dedicated redaction tools, PyMuPDF-based tools | Yes |
The file size difference is a useful diagnostic, though not proof on its own: if your PDF is the same size or larger after "redaction," that is a signal to investigate. True redaction removes bytes from the file, so the output is usually smaller than the input, although re-saving or recompression can shift the size in either direction. Whiteout adds a new object, so the output is typically the same size or marginally larger.
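The size comparison is easy to script as a first-pass check. A minimal sketch in Python follows; the function names and the returned labels are invented for illustration. A size difference is only a hint, so follow up with text extraction whichever label comes back.

```python
import os


def classify_size_change(original_size: int, output_size: int) -> str:
    """Label the size change between the input and the 'redacted' output."""
    if output_size >= original_size:
        # Same size or larger: something was probably added, not removed.
        return "investigate"
    return "consistent with removal"


def check_files(original_path: str, output_path: str) -> str:
    """Compare two files on disk by byte size."""
    return classify_size_change(
        os.path.getsize(original_path), os.path.getsize(output_path)
    )


if __name__ == "__main__":
    print(classify_size_change(120_000, 121_500))
    print(classify_size_change(120_000, 98_000))
```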
Not everything needs true redaction, and it would be misleading to suggest otherwise.
Whiteout is appropriate when the goal is visual markup rather than data removal. If you are annotating a draft document for a colleague and want to visually mark a section as outdated, a covering rectangle communicates that clearly without creating a security concern — because the document is not being shared with anyone who should not see the underlying content.
Printed documents are another context. If you print a PDF and then apply physical whiteout to a printed copy before handing it to someone, no PDF content stream exists anymore. The physical document's content is what it is. Physical whiteout on a printed page is a different class of operation entirely.
Review workflows often use visual markups specifically because they need to be reversible. A legal team reviewing a draft for production may add red boxes to flag text that might need redaction, pending a final decision. That annotation is intentionally temporary. True redaction would be premature at that stage.
The distinction to carry: if the document will be shared digitally and the recipient should not be able to access the covered content under any circumstances, whiteout is the wrong tool. If the goal is visual communication with people who already have access to the full document, whiteout is fine.
Before trusting any tool with sensitive content, verify its claims. Many tools describe their output as "redacted" in marketing copy while applying visual masking under the hood. The visual masking vs real redaction post covers this taxonomy in full, but here is a practical verification checklist:
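One check that belongs on any such list can be automated: extract the text from the tool's output and search it for the terms you meant to remove. The sketch below is in Python and assumes you already have the extracted text as a string (for example, the output of pdftotext). normalize and find_leaks are hypothetical helper names, and the sample values are invented.

```python
from __future__ import annotations

import re


def normalize(text: str) -> str:
    """Collapse whitespace and case so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def find_leaks(extracted_text: str, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms that still appear in the extracted text."""
    haystack = normalize(extracted_text)
    return [
        term for term in sensitive_terms
        if normalize(term) and normalize(term) in haystack
    ]


if __name__ == "__main__":
    extracted = "Employee:  Jane\nDoe\nDepartment: Finance"
    print(find_leaks(extracted, ["Jane Doe", "Payroll"]))
```

An empty result is necessary but not sufficient. It says nothing about images, metadata, or text stored in an encoding the extractor could not decode.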
MyPDFBoy sends your file to a processing server, handles it in memory, performs content stream redaction using PyMuPDF, returns the output, and discards the file. Nothing is written to disk, nothing is retained. That pipeline is what content stream redaction requires — it cannot happen in a browser tab.
The visual appearance of redaction and the reality of redaction have been confused long enough that the confusion has a body count: court cases compromised, intelligence sources exposed, protected individuals identified. The gap between "looks redacted" and "is redacted" is the entire security surface. Close it by using a tool that operates on the file, not on the view.