A black box on a PDF is not redaction unless the content underneath is gone. Most free tools skip the hard part. Here is how to tell the difference and why it matters.
When organizations need to share a document with sensitive information removed, most reach for whatever PDF editor they already have. They draw a black box. The box looks convincing. They send the file.
This pattern has caused real-world data breaches. Lawyers have exposed litigation strategy. Government agencies have leaked intelligence sources. Medical providers have published patient identifiers. In almost every case, the "redacted" PDF contained the original content verbatim, hidden under a rectangle that anyone could move, delete, or bypass.
The root cause is a misunderstanding of how PDF files work internally, and most free tools paper over this gap rather than closing it.
A PDF is not like a photograph. It is a document format that stores text and images as structured data objects inside content streams. When a PDF viewer renders a page, it reads those streams and draws each element at the coordinates specified.
Text is stored as character codes, font references, and position data. Images are stored as compressed pixel data. Both are embedded in the file structure with clear boundaries that any compliant PDF library can parse.
When you open a PDF in a viewer, you see the rendered result — but the underlying data is still there, accessible through any number of tools.
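To make this concrete, here is a toy Python sketch (standard library only) that reads a single, uncompressed page content stream. The stream contents, the font name /F1, and the `text_runs` helper are illustrative assumptions. Real files usually compress these streams and may encode characters through font-specific tables, so a real extractor needs a full PDF parser.

```python
import re

# Hypothetical, uncompressed page content stream. Each text object sets a
# font and size (Tf), moves to a position (Td), and shows a string (Tj).
content_stream = b"""BT
/F1 12 Tf
72 700 Td
(Patient: Jane Doe) Tj
ET
BT
/F1 12 Tf
72 680 Td
(SSN 123-45-6789) Tj
ET"""

# Matches only this simple "font, position, string" layout.
TEXT_RUN = re.compile(
    rb"/(\w+)\s+(\d+)\s+Tf\s+(\d+)\s+(\d+)\s+Td\s+\((.*?)\)\s*Tj"
)


def text_runs(stream: bytes) -> list[tuple[str, int, int, int, str]]:
    """Return (font, size, x, y, text) for each simple text run in the stream."""
    return [
        (font.decode(), int(size), int(x), int(y), text.decode("latin-1"))
        for font, size, x, y, text in TEXT_RUN.findall(stream)
    ]


for run in text_runs(content_stream):
    print(run)
```

Running it prints both runs with their fonts and coordinates. Nothing about the rendered page is needed to recover the text.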
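Visual masking tools, which include the drawing layers in Foxit Reader, PDF Annotator, most browser-based PDF editors, and the rectangle annotation tools in many "free PDF redaction" services, work like this: you draw a black rectangle over the sensitive area, and the tool saves that rectangle as an annotation or drawing object on top of the page. The text and images underneath are never touched.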
The file now contains two things in the same region: the original content, and a black rectangle floating above it.
The annotation layer can be removed. In many viewers, you can simply click the rectangle and press Delete. In others, an annotation-editing mode exposes every overlay. More fundamentally, extracting the text does not require removing anything: tools that read content streams directly, including pdftotext and Python's PyMuPDF library, ignore the annotation layer entirely and return the original text, and the strings command skips PDF structure altogether by reading the raw file bytes.
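A toy Python model of visual masking shows why the overlay protects nothing. It keeps the page content stream and the page's annotations as separate values, mirroring how annotations live in a page's /Annots array rather than in its content. The stream, coordinates, and helper names are hypothetical, and `extract_text` is a deliberately naive stand-in for a content-stream reader such as pdftotext.

```python
import re

# Hypothetical page: the content stream holds the text; annotations are separate.
page_content = b"BT /F1 12 Tf 72 680 Td (SSN 123-45-6789) Tj ET"
page_annots: list[bytes] = []


def black_square_annotation(x0: int, y0: int, x1: int, y1: int) -> bytes:
    """Build a Square annotation dictionary with black border (/C) and fill (/IC)."""
    return (
        f"<< /Type /Annot /Subtype /Square /Rect [{x0} {y0} {x1} {y1}] "
        "/C [0 0 0] /IC [0 0 0] >>"
    ).encode()


def extract_text(content: bytes) -> list[str]:
    """Naive extractor: read the string operands of Tj from the content stream only."""
    return [s.decode("latin-1") for s in re.findall(rb"\((.*?)\)\s*Tj", content)]


# "Redact" by masking: append an overlay and leave the content stream untouched.
page_annots.append(black_square_annotation(70, 675, 200, 695))

print(extract_text(page_content))  # the overlay is never consulted

# Deleting the overlay only removes it from the list; the content was never modified.
page_annots.clear()
```

The extractor never looks at `page_annots`, so adding or deleting the black square changes nothing about what it returns.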
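Real redaction modifies the content stream itself. The tool locates every text run and image inside the marked area, deletes those operators and pixel data from the page's content stream, optionally paints a solid box in their place as part of the page content, and strips related metadata before saving.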
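After this process, the information is gone from the file. There is no annotation layer to remove, no hidden text to extract, no metadata remnant. The bytes that encoded the sensitive characters are no longer part of the document structure. The file must also be written out in full rather than saved as an incremental update, because an incremental update appends changes and leaves the earlier version of the page in the file.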
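The difference is easiest to see in a toy version of the content-stream step, again in standard-library Python. `redact_region` is a hypothetical helper: it drops every simple text object whose starting position falls inside a rectangle, then paints a black box into the content stream itself using the PDF fill operators `re` and `f`. It only checks where each run starts and ignores images, TJ arrays, and text matrices; a real tool measures glyph bounding boxes and rewrites the whole file.

```python
import re

page_content = (
    b"BT /F1 12 Tf 72 700 Td (Patient: Jane Doe) Tj ET\n"
    b"BT /F1 12 Tf 72 680 Td (SSN 123-45-6789) Tj ET\n"
)

# One simple text object per line: BT /Font size Tf x y Td (string) Tj ET
TEXT_OBJECT = re.compile(rb"BT /\w+ \d+ Tf (\d+) (\d+) Td \((.*?)\) Tj ET\n")


def redact_region(content: bytes, x0: int, y0: int, x1: int, y1: int) -> bytes:
    """Remove text objects that start inside the rectangle, then draw a box."""

    def drop_if_inside(match: re.Match) -> bytes:
        x, y = int(match.group(1)), int(match.group(2))
        if x0 <= x <= x1 and y0 <= y <= y1:
            return b""  # the operators and string bytes leave the stream
        return match.group(0)

    kept = TEXT_OBJECT.sub(drop_if_inside, content)
    # q/Q isolate the graphics state; "0 g" selects black fill; "re f" fills a rectangle.
    box = f"q 0 g {x0} {y0} {x1 - x0} {y1 - y0} re f Q\n".encode()
    return kept + box


redacted = redact_region(page_content, 60, 670, 300, 690)
print(redacted.decode("latin-1"))
```

The sensitive bytes are absent from the output, so no viewer, extractor, or strings search can recover them from this stream. The black box is part of the page content, not an overlay that can be deleted.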
You do not need special tools to verify this. All of the following tests can be performed on any computer.
Open the redacted PDF. Click and drag your cursor directly over one of the black rectangles. In a visually masked file, you will see a text selection highlight appear — the cursor is selecting the text underneath. Copy and paste into any text editor.
If text appears, the redaction failed.
In Adobe Acrobat Reader (free), open the document. In many cases, right-clicking a masking rectangle shows a Delete option. If the black box disappears and reveals original content, it was an annotation.
On Mac or Linux, with pdftotext installed (it ships with the Poppler utilities), run:
pdftotext yourfile.pdf - | grep "sensitive text"
On Windows, install the Xpdf command-line tools, which include pdftotext, or use an online PDF text extractor. If your sensitive content appears in the output, it was never removed from the content stream.
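strings yourfile.pdf | grep -E "SSN|Account|Password"
The strings command extracts all printable character sequences from the raw file bytes. This bypasses all PDF structure and annotation logic. If your sensitive text appears here, it exists in the file regardless of what the viewer shows. The reverse does not hold: most PDFs compress their content streams, and many fonts use custom character encodings, so a clean strings result does not prove the text was removed. Use this test alongside the select-and-copy and pdftotext tests, not instead of them.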
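The compression caveat is easy to demonstrate. The Python sketch below approximates strings with a regular expression (`printable_strings` and `stream_object` are hypothetical helpers, not part of any library) and runs it over two minimal stream objects holding the same text: one stored raw, one compressed with zlib, the Deflate encoding PDF uses for /FlateDecode. These fragments are not complete, valid PDF files.

```python
import re
import zlib

content = b"BT /F1 12 Tf 72 680 Td (SSN 123-45-6789) Tj ET"


def stream_object(data: bytes, flate: bool) -> bytes:
    """Wrap data in a minimal PDF stream object (a fragment, not a whole file)."""
    if flate:
        data = zlib.compress(data)
        header = b"<< /Length %d /Filter /FlateDecode >>" % len(data)
    else:
        header = b"<< /Length %d >>" % len(data)
    return header + b"\nstream\n" + data + b"\nendstream"


def printable_strings(raw: bytes, min_len: int = 4) -> list[str]:
    """Rough equivalent of `strings`: runs of printable ASCII of min_len or more."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [s.decode("ascii") for s in re.findall(pattern, raw)]


for flate in (False, True):
    hits = [s for s in printable_strings(stream_object(content, flate)) if "SSN" in s]
    print("FlateDecode" if flate else "uncompressed", hits)
```

The uncompressed fragment yields the full text line; the compressed one yields no match, even though the text is fully recoverable by decompressing it. For a meaningful byte-level search, decompress the file first, for example with `qpdf --qdf --object-streams=disable in.pdf out.pdf`, and run strings on the output.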
Building a visual masking tool is straightforward: add a drawing layer, store a rectangle annotation, and you are done. The PDF standard supports annotations natively, and mainstream PDF libraries can write them in a few lines of code.
True redaction is harder. You need a PDF parser that understands content stream operators, can locate text runs by position, and can safely remove them without corrupting the surrounding stream structure. Font subsetting requires additional work to avoid breaking character rendering elsewhere in the document.
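A small example shows why locating text by position needs a real parser. In this hypothetical content stream, the visible string "SSN 123-45-6789" is split across a TJ array with kerning adjustments, and its position is set by a relative Td move. A plain byte search misses it, while a sketch that joins TJ fragments and accumulates Td offsets recovers both the text and its location. The helpers assume no Tm operator, an identity transformation matrix, and no escaped parentheses; production parsers must handle all of these.

```python
import re

# Hypothetical stream: one text object, two lines. The second line is a TJ
# array whose numbers are kerning adjustments between string fragments.
content = (
    b"BT /F1 12 Tf 72 700 Td (Name) Tj "
    b"0 -20 Td [(SS) 30 (N 12) -15 (3-45-6789)] TJ ET"
)

NUMBER = rb"(-?\d+(?:\.\d+)?)"


def tj_array_texts(stream: bytes) -> list[str]:
    """Join the string fragments of each TJ array, dropping kerning numbers."""
    return [
        b"".join(re.findall(rb"\((.*?)\)", array)).decode("latin-1")
        for array in re.findall(rb"\[(.*?)\]\s*TJ", stream)
    ]


def line_origins(stream: bytes) -> list[tuple[float, float]]:
    """Accumulate relative Td moves (valid only inside a single BT...ET block)."""
    x = y = 0.0
    origins = []
    for dx, dy in re.findall(NUMBER + rb"\s+" + NUMBER + rb"\s+Td", stream):
        x, y = x + float(dx), y + float(dy)
        origins.append((x, y))
    return origins


print(b"SSN 123-45-6789" in content)  # plain byte search
print(tj_array_texts(content))
print(line_origins(content))
```

Removing only part of such an array, for example the digits but not the label, would also require each glyph's advance width from the font, which is where the font handling described above comes in.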
This complexity explains why most free tools — and even many paid ones — use masking. It is not necessarily bad faith; it is the easier implementation. But the result is a tool that looks like it removes information while leaving the data intact.
Tools that perform genuine content stream removal:
For non-technical users who need to redact documents with genuine security requirements, the options narrow quickly. Adobe is reliable but expensive. MyPDFBoy provides the same true removal for free, without an account or any file retention.
Documented cases where visual masking produced a data breach:
2005, DoD report: A PDF with black rectangles over names of intelligence sources. Copy-paste in Microsoft Word revealed all names. Published in major newspapers.
2009, TSA security manual: Posted to government websites with whiteout overlays. Aviation journalists extracted the full text in minutes.
2019, Manafort court filing: Defense attorneys used Foxit to black out strategy details. PDF text selection exposed the complete underlying text. Became widely cited in legal practice guides as an example of failed redaction.
These are not edge cases or unusual errors. They are exactly what happens when visual masking tools are used for redaction tasks.
If the content needs to be removed — not hidden, removed — use a tool that operates on content streams. Before sharing any redacted document, run at least the select-and-copy test. If text is selectable behind the black rectangle, start over with a different tool.
The standard for "this information is protected" is not "it looks covered." It is "the bytes are gone."