When organizations need to share a document with sensitive information removed, most reach for whatever PDF editor they already have. They draw a black box. The box looks convincing. They send the file.

This pattern has caused real-world data breaches. Lawyers have exposed litigation strategy. Government agencies have leaked intelligence sources. Medical providers have published patient identifiers. In almost every case, the "redacted" PDF contained the original content verbatim, hidden under a rectangle that anyone could move, delete, or bypass.

The root cause is a misunderstanding about how PDF files work internally, and most free tools leave that gap open rather than closing it.

How PDF Files Store Content

A PDF is not like a photograph. It is a document format that stores text and images as structured data objects inside content streams. When a PDF viewer renders a page, it reads those streams and draws each element at the coordinates specified.

Text is stored as character codes, font references, and position data. Images are stored as compressed pixel data. Both are embedded in the file structure with clear boundaries that any compliant PDF library can parse.
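To make this concrete, here is a minimal Python sketch of what a text-bearing content stream can look like and how little it takes to read it. The stream is a hypothetical, uncompressed example, and text_runs is an illustrative helper rather than part of any library. Real streams are usually compressed and use many more operators than the few handled here.

```python
import re

# Hypothetical, uncompressed content stream for one page (illustration only).
content_stream = b"""
BT
/F1 12 Tf
72 720 Td
(Patient: Jane Doe) Tj
0 -18 Td
(SSN: 123-45-6789) Tj
ET
"""

# A token is either a literal string in parentheses or a run of other characters.
TOKEN = re.compile(rb"\((?:\\.|[^\\()])*\)|[^\s()]+")

def text_runs(stream: bytes) -> list[tuple[float, float, str]]:
    """Return (x, y, text) for every string shown with the Tj operator."""
    x = y = 0.0
    operands: list[bytes] = []
    runs = []
    for tok in TOKEN.findall(stream):
        if tok == b"BT":                      # begin text object: reset position
            x = y = 0.0
            operands.clear()
        elif tok == b"Td":                    # move relative to the current line start
            x += float(operands[-2].decode("ascii"))
            y += float(operands[-1].decode("ascii"))
            operands.clear()
        elif tok == b"Tj":                    # show the string operand
            runs.append((x, y, operands[-1][1:-1].decode("latin-1")))
            operands.clear()
        elif tok in (b"Tf", b"ET"):           # font selection / end of text object
            operands.clear()
        else:                                 # anything else is an operand
            operands.append(tok)
    return runs

for run in text_runs(content_stream):
    print(run)
# (72.0, 720.0, 'Patient: Jane Doe')
# (72.0, 702.0, 'SSN: 123-45-6789')
```

Tf, Td, and Tj are standard PDF text operators. The point of the sketch is that the characters and their coordinates sit in the stream as plain data, which is exactly what extraction tools read.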

When you open a PDF in a viewer, you see the rendered result — but the underlying data is still there, accessible through any number of tools.

What Visual Masking Actually Does

Visual masking tools — which include the drawing layers in Foxit Reader, PDF Annotator, most browser-based PDF editors, and the rectangle annotation tools in many "free PDF redaction" services — work like this:

1. You draw a shape (typically a filled black rectangle) over the content you want to hide
2. The tool stores that shape as an annotation layer in the PDF file
3. The PDF renderer draws the annotation on top of the existing content when displaying the page

The file now contains two things in the same region: the original content, and a black rectangle floating above it.

The annotation layer can be removed. In many viewers, you can simply click the rectangle and press Delete. In others, the "Edit Annotations" mode exposes every overlay. At minimum, any tool that reads the page content directly, such as pdftotext or Python's PyMuPDF library, ignores the annotation layer entirely and returns the original text. Even strings can recover it from the raw bytes when the content stream is stored uncompressed. OCR is the one exception: it works on the rendered image, so it sees only the black box.
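A simplified model shows why the overlay protects nothing. The page dictionary below is a hypothetical stand-in for a real PDF page object (it is not a valid PDF), and extract_text and delete_annotations are illustrative helpers. The keys Contents, Annots, Subtype, Rect, and IC mirror the names the PDF format uses for page content, annotations, and an annotation's interior color.

```python
import re

# Simplified page: one content stream plus one masking annotation.
CONTENT_STREAM = b"BT /F1 12 Tf 72 720 Td (Account: 00123456) Tj ET"
MASK_ANNOTATION = {
    "Subtype": "Square",           # rectangle annotation
    "Rect": (70, 715, 260, 735),   # covers the text drawn at (72, 720)
    "IC": (0, 0, 0),               # interior color: black
}
page = {"Contents": CONTENT_STREAM, "Annots": [MASK_ANNOTATION]}

def extract_text(page: dict) -> list[str]:
    """Read strings shown with Tj. Only the content stream is consulted."""
    found = re.findall(rb"\(([^()]*)\)\s*Tj", page["Contents"])
    return [s.decode("latin-1") for s in found]

def delete_annotations(page: dict) -> dict:
    """Roughly what 'click the rectangle and press Delete' does to the file."""
    return {**page, "Annots": []}

print(extract_text(page))                      # the mask does not affect extraction
print(extract_text(delete_annotations(page)))  # removing the mask changes nothing either
```

Both calls print the same account number, because the annotation and the content live in separate parts of the page and the extractor never looks at the annotation.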

What Real Redaction Does

Real redaction modifies the content stream itself. The process:

1. Parse the PDF content stream to identify all text runs and image objects within the specified region
2. Delete those objects from the stream data
3. Write a new content object, typically a filled rectangle, in the same position
4. Optionally rewrite the font subset to exclude any glyphs that only appeared in removed text

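After this process, the information is gone from the page content. There is no annotation layer to remove and no hidden text to extract. The bytes that encoded the sensitive characters are no longer part of the document structure, provided the file is rewritten in full: an incremental save can leave the old objects in the file. Document metadata, bookmarks, and attachments are stored separately from page content and need their own check.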
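The core of that process can be sketched on a toy content stream. This is an illustration under heavy assumptions, not a production redactor: redact_stream is a hypothetical helper that understands only five operators, expects an uncompressed stream, and decides what to remove from each text run's starting point rather than from real glyph bounding boxes. It does not handle images or font subsets.

```python
import re

TOKEN = re.compile(rb"\((?:\\.|[^\\()])*\)|[^\s()]+")
OPERATORS = {b"BT", b"ET", b"Tf", b"Td", b"Tj"}  # the only operators this sketch understands

def redact_stream(stream: bytes, box: tuple[float, float, float, float]) -> bytes:
    """Drop Tj runs that start inside box = (x0, y0, x1, y1), then paint a black rectangle."""
    x0, y0, x1, y1 = box
    x = y = 0.0
    kept: list[bytes] = []      # tokens that survive into the new stream
    operands: list[bytes] = []  # operands waiting for their operator
    for tok in TOKEN.findall(stream):
        if tok not in OPERATORS:
            operands.append(tok)
            continue
        if tok == b"BT":        # new text object: reset position
            x = y = 0.0
        elif tok == b"Td":      # move text position
            x += float(operands[-2].decode("ascii"))
            y += float(operands[-1].decode("ascii"))
        inside = tok == b"Tj" and x0 <= x <= x1 and y0 <= y <= y1
        if not inside:          # keep everything except redacted text runs
            kept.extend(operands)
            kept.append(tok)
        operands = []
    # New content object: set fill color to black, define a rectangle, fill it.
    cover = f"0 0 0 rg {x0:g} {y0:g} {x1 - x0:g} {y1 - y0:g} re f".encode("ascii")
    return b" ".join(kept) + b" " + cover

original = (b"BT /F1 12 Tf 72 720 Td (Patient: Jane Doe) Tj "
            b"0 -18 Td (SSN: 123-45-6789) Tj ET")
redacted = redact_stream(original, (70, 698, 300, 714))
print(redacted)
# b'BT /F1 12 Tf 72 720 Td (Patient: Jane Doe) Tj 0 -18 Td ET 0 0 0 rg 70 698 230 16 re f'
```

Unlike an annotation, the black rectangle here is written into the same stream the text was deleted from, so there is nothing underneath it. For real documents, a library that handles compression, encodings, and glyph geometry is the safer route; PyMuPDF, for example, exposes this as page.add_redact_annot() followed by page.apply_redactions().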

How to Test Whether Redaction Actually Worked

Verifying this takes no specialized software. The first two tests need only a PDF viewer, and the last two use free command-line tools.

Test 1: Select and copy

Open the redacted PDF. Click and drag your cursor directly over one of the black rectangles. In a visually masked file, you will see a text selection highlight appear — the cursor is selecting the text underneath. Copy and paste into any text editor.

If text appears, the redaction failed.

Test 2: Delete the annotation

In Adobe Acrobat Reader (free), open the document. In many cases, right-clicking a masking rectangle shows a Delete option. If the black box disappears and reveals original content, it was an annotation.

Test 3: Text extraction

On macOS or Linux, with pdftotext installed (it ships with the Poppler and Xpdf command-line tools), run:

pdftotext yourfile.pdf - | grep "sensitive text"

On Windows, install the Xpdf command-line tools and run pdftotext locally; uploading a sensitive document to an online text extractor would defeat the purpose. If your sensitive content appears in the output, it was never removed from the content stream.

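Test 4: Raw bytes

strings yourfile.pdf | grep -E "SSN|Account|Password"

The strings command extracts printable character sequences from the raw file bytes, ignoring all PDF structure and annotation logic. If your sensitive text appears here, it exists in the file regardless of what the viewer shows. A clean result is weaker evidence: most PDFs compress their content streams, and compressed text is invisible to strings. To look inside compressed streams, first write an uncompressed copy with qpdf --qdf --object-streams=disable yourfile.pdf expanded.pdf and run strings on that copy, or rely on Test 3.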
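Where strings and grep are not available, the same raw-byte check can be approximated in a few lines of Python. The sketch below is an assumption-laden helper, not a standard tool: find_in_pdf_bytes is a hypothetical name, and it tries to inflate only streams compressed with the common Flate method. Text stored with a font-specific encoding or behind another filter will not match, so an empty result is not proof of removal.

```python
import re
import zlib

# Everything between a "stream" keyword and the next "endstream" keyword.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def find_in_pdf_bytes(data: bytes, needles: list[bytes]) -> set[bytes]:
    """Return the needles found in the raw bytes or in any stream that inflates with zlib."""
    haystacks = [data]
    for match in STREAM.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes that follow the stream data
            haystacks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate data; the raw bytes are already in the search list
    return {n for n in needles if any(n in h for h in haystacks)}

# Usage sketch with a hypothetical file name:
# with open("yourfile.pdf", "rb") as fh:
#     print(find_in_pdf_bytes(fh.read(), [b"SSN", b"Account", b"Password"]))
```

Any needle this returns is present in the file, which makes it a useful tripwire before sharing, alongside the text-extraction test rather than instead of it.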

Why Most Free Tools Use Masking Instead of True Removal

Building a visual masking tool is straightforward: add a drawing layer and store a rectangle annotation. The PDF standard supports annotations natively, and most PDF libraries can write them.

True redaction is harder. You need a PDF parser that understands content stream operators, can locate text runs by position, and can safely remove them without corrupting the surrounding stream structure. Font subsetting requires additional work to avoid breaking character rendering elsewhere in the document.

This complexity explains why most free tools — and even many paid ones — use masking. It is not necessarily bad faith; it is the easier implementation. But the result is a tool that looks like it removes information while leaving the data intact.

What Actually Provides Real Redaction

Tools that perform genuine content stream removal:

- Adobe Acrobat Pro ($19.99/month): the original standard, reliable
- MyPDFBoy (free, browser-based): uses PyMuPDF on the backend for true stream removal
- LibreOffice with Draw macros (free, desktop): possible with manual stream editing, not GUI-driven
- qpdf (free, command-line): requires manual stream editing, developer-only

For non-technical users who need to redact documents with genuine security requirements, the options narrow quickly. Adobe is reliable but expensive. MyPDFBoy provides the same true removal for free, without an account or any file retention.

The Stakes Are Not Theoretical

Documented cases where visual masking produced a data breach:

2005, U.S. military report on the shooting of Nicola Calipari in Iraq: A PDF with black rectangles over classified passages, including names. Copying the text and pasting it into a word processor revealed everything. Published in major newspapers.

2009, TSA screening procedures manual: Posted on a federal procurement website with boxes drawn over the sensitive passages. Bloggers and journalists extracted the full text soon after.

2019, Manafort court filing: Defense attorneys used Foxit to black out strategy details. PDF text selection exposed the complete underlying text. Became widely cited in legal practice guides as an example of failed redaction.

These are not edge cases or unusual errors. They are exactly what happens when visual masking tools are used for redaction tasks.

The Practical Conclusion

If the content needs to be removed — not hidden, removed — use a tool that operates on content streams. Before sharing any redacted document, run at least the select-and-copy test. If text is selectable behind the black rectangle, start over with a different tool.

The standard for "this information is protected" is not "it looks covered." It is "the bytes are gone."


Visual Masking vs Real Redaction: Why Most Free Tools Don't Actually Protect Your Data