Every PDF you create contains hidden metadata — author name, company, software version, and often revision history. Here is how to find it and remove it before you share.
Every PDF you create carries invisible cargo. When you save a document from Microsoft Word, export from LibreOffice, or scan a paper form, the resulting PDF embeds a set of data fields that describe the file itself — not its visible content, but information about how, when, and by whom it was created. This is PDF metadata, and it is disclosed to anyone who receives the file unless you actively remove it.
The problem is not that metadata exists. The problem is that most people do not know it is there. You proofread the visible text, confirm the layout looks right, then hit send — never realizing the file also contains your Windows username, your company's internal name, the exact software version you used, and a timestamp showing when you last edited it three days before the official "final" date you told the client.
This guide explains what PDF metadata contains, why it matters before sharing documents, and how to strip it using both command-line and GUI tools.
PDF metadata lives in two separate places inside every PDF file.
1. Document Information Dictionary (DocInfo)
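This is the original PDF metadata format, part of the PDF specification since version 1.0. Its standard fields are Title, Author, Subject, Keywords, Creator (the authoring application), Producer (the software that wrote the PDF), CreationDate, and ModDate.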
2. XMP Packet
XMP (Extensible Metadata Platform) is an XML-based metadata format embedded in the PDF as a metadata stream. It typically mirrors the DocInfo fields in XML and can include additional namespaces with arbitrary custom properties. Some Adobe applications write extensive XMP metadata, including document history in some configurations.
Both stores can be present in the same file and may contain different values if the document was converted between formats or edited by multiple applications.
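As a rough illustration of the two stores, the sketch below scans a file's raw bytes for the marker each one usually leaves: an /Info reference for DocInfo and an x:xmpmeta element for XMP. The function name pdf_metadata_stores is hypothetical, and the check is only a heuristic: an encrypted or compressed metadata stream will not be detected this way, so treat "not found" as inconclusive.

```shell
# Heuristic sketch: report which metadata stores appear in a PDF's raw bytes.
# pdf_metadata_stores is a hypothetical helper name, not part of any tool.
pdf_metadata_stores() {
  local file="$1"

  # The trailer (or cross-reference stream dictionary) points at the
  # DocInfo dictionary with an entry such as "/Info 12 0 R".
  if grep -aqE '/Info[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+R' -- "$file"; then
    echo "DocInfo: present"
  else
    echo "DocInfo: not found"
  fi

  # An XMP packet is XML wrapped in an <x:xmpmeta> element.
  if grep -aq '<x:xmpmeta' -- "$file"; then
    echo "XMP: present"
  else
    echo "XMP: not found"
  fi
}
```

Calling pdf_metadata_stores document.pdf prints one line per store. A dedicated metadata reader remains the more dependable way to see the actual values.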
What common tools embed by default:
Before removing anything, verify what is actually present in your file.
Method 1: exiftool (command line)
ExifTool is the authoritative tool for reading and writing metadata in virtually every file format, including PDF. Install it via your system package manager, then run:
    exiftool document.pdf
This outputs all fields it can read. A typical Word-generated PDF looks like:
ExifTool Version Number : 12.70
File Type : PDF
PDF Version : 1.7
Author : Jane Smith
Creator : Microsoft Word
Producer : macOS Quartz PDFContext
Create Date : 2026-03-15 14:32:41
Modify Date : 2026-03-20 09:17:05
Company : Acme Corporation
The Company field is particularly notable — it comes from Word's organisational settings and is rarely visible to the author because it is set once at install time and never displayed in normal editing.
Method 2: Adobe Acrobat
Open the file in Acrobat, then go to File → Properties → Description tab. This shows the DocInfo fields only. For a complete view including XMP, use the Acrobat Pro Tools → Redact → Sanitize Document dialog, which lists what it will remove before acting.
Method 3: Browser
Chrome and Firefox can display basic DocInfo when you open a PDF. Open the built-in viewer's toolbar menu and look for Document Properties. This shows the basic fields but not the XMP packet or embedded content.
Metadata disclosure is rarely catastrophic in isolation. The risk is context. When paired with the visible content of a document, metadata fields can reveal information you had no intention of disclosing.
Court filings and legal documents
Several high-profile cases have involved metadata in publicly filed documents exposing law firm internal information: drafting attorney names on filings intended to be signed only by lead counsel, revision timestamps showing a "final" brief was edited after the stated submission deadline, and comment threads embedded in documents produced during discovery that contained frank internal assessments of the case's weaknesses. Courts in the US and UK have issued guidance specifically on metadata hygiene in litigation documents.
Whistleblower and source protection
A document leaked to a journalist carries its Author field unless the source explicitly removes it. In cases where a document was created by one of a small group of people with access, even a partial name or Windows username can narrow the list to a single person. This applies to any context where the sender needs to conceal their identity — regulatory complaints, anonymous tips to auditors, and internal ethics reports.
Vendor and client confidentiality
Proposal PDFs sometimes include a Company metadata field revealing the sender's employer. That is telling if the sender is ghostwriting for a competitor, acting as a subcontractor, or reusing a document template from a previous engagement. A client receiving a "custom" proposal with a different firm's name in the metadata has grounds for concern regardless of whether the content is genuinely original.
ExifTool is free, scriptable, and handles both the DocInfo dictionary and the XMP packet.
    exiftool -all= document.pdf
The -all= flag deletes all writable metadata fields. ExifTool keeps a backup of the original as document.pdf_original before writing. Once you have verified the output, delete the backup:
    rm document.pdf_original
To remove metadata without creating a backup:
    exiftool -overwrite_original -all= document.pdf
One caveat matters for sensitive documents: ExifTool edits PDFs by appending an incremental update, so the old values remain inside the file and can be recovered. The ExifTool documentation recommends rewriting the file afterwards with qpdf to discard them permanently:
    qpdf --linearize document.pdf clean.pdf
To verify the result:
    exiftool clean.pdf | grep -E "Author|Company|Creator|Producer|Create Date|Modify Date"
If any of those fields still have values, list every tag with its group using exiftool -a -G1 -s clean.pdf to see whether the leftover comes from DocInfo, XMP, or a non-standard XMP namespace.
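The strip-then-verify sequence can be wrapped in one function. This is a minimal sketch, assuming both exiftool and qpdf are installed and on PATH; scrub_pdf is a hypothetical name. It adds a qpdf --linearize pass because ExifTool writes PDF changes as an incremental update, which leaves the old values recoverable until the file is rewritten.

```shell
# Sketch: strip metadata, rewrite the file, then check that nothing survived.
# Assumes `exiftool` and `qpdf` are on PATH; scrub_pdf is a hypothetical name.
scrub_pdf() {
  local in="$1" out="$2" tmp
  tmp="$(mktemp -d)" || return 1
  cp -- "$in" "$tmp/work.pdf" || { rm -rf "$tmp"; return 1; }

  # Step 1: delete every writable metadata field (appended as an update).
  exiftool -q -overwrite_original -all= "$tmp/work.pdf" || { rm -rf "$tmp"; return 1; }

  # Step 2: rewrite the file so the superseded objects are dropped.
  # Any non-zero qpdf exit, including warnings, is treated as a failure here.
  qpdf --linearize "$tmp/work.pdf" "$out" || { rm -rf "$tmp"; return 1; }
  rm -rf "$tmp"

  # Step 3: exiftool prints only tags that exist, so any output means a leftover.
  if exiftool -s -Author -Creator -Company -CreateDate -ModifyDate "$out" | grep -q .; then
    echo "metadata still present in $out" >&2
    return 2
  fi
}
```

Usage would look like scrub_pdf report.pdf report-clean.pdf, leaving the original untouched. Working on a temporary copy is a deliberate choice so a failed run never damages the source file.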
Acrobat Pro includes a Sanitize Document function that removes metadata, hidden layers, embedded content, JavaScript, and form data in a single operation.
Go to Tools → Redact → Sanitize Document. Acrobat lists what it found and what it will remove before saving. This is the most thorough GUI option and covers edge cases that manual field clearing misses.
For manual field clearing only: File → Properties → Description tab — clear each field, then save. This does not remove XMP metadata, only DocInfo, so it is incomplete for documents that need thorough cleaning.
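Every platform that can print to a virtual PDF driver effectively re-renders the document and discards most metadata. Open the PDF in any viewer, choose Print, and select the PDF output option: Microsoft Print to PDF on Windows, Save as PDF in the macOS print dialog, or Print to File on most Linux desktops.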
The result is a flat PDF containing only what is visually rendered. You lose interactive elements (hyperlinks become non-clickable text, form fields become static), but the file will have minimal metadata — typically just the Producer field identifying the print driver.
This method is appropriate for documents where you need a "clean copy" and do not need to preserve interactivity.
LibreOffice lets you control what metadata is embedded at export time.
File → Export as PDF → General tab — uncheck Export document information as metadata. This prevents the application from writing the Author, Title, and Subject fields to the output. LibreOffice does not give you control over the Producer field (it always identifies itself), but the user-identifying fields are omitted.
Note: none of these tools are affiliated with MyPDFBoy. ExifTool is maintained independently at exiftool.org.
Metadata removal cleans the file's header fields — the invisible structured data describing the document. It does nothing about the visible content of the PDF itself.
If the document contains sensitive visible information — names, account numbers, signatures, medical records, legal identifiers — those must be redacted separately. Redaction removes content from the PDF's content streams, the data that the viewer renders on screen. A metadata-clean PDF that still contains unredacted names and numbers in its body text is not a secure document.
Even after a complete metadata strip, some information can remain in a PDF that a careful reader could use to identify the document's origin.
Embedded fonts
Font names are preserved in the PDF because the viewer needs them for rendering. A specialized or licensed font that is only used by a specific organization can narrow the pool of possible sources. Font subsetting (embedding only the glyphs used) is standard practice but does not fully anonymize the font data.
Thumbnail images
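Some PDF creation tools embed a small raster thumbnail of the first page. If ExifTool reports a ThumbnailImage tag for the file, try clearing it specifically:
    exiftool -ThumbnailImage= document.pdf
Re-run exiftool afterwards to confirm the tag is gone. If it persists, re-rendering the file through a PDF printer discards it.
Embedded file attachments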
PDFs can embed other files as attachments — Word documents, images, spreadsheets. These attachments carry their own metadata independently of the parent PDF. Check File → Attachments in Acrobat or use ExifTool to identify embedded files before sharing.
Comments and form data
Annotations, comments, and form field values are content, not metadata. They are stored as annotation and form objects attached to the pages, separate from the DocInfo dictionary and the XMP packet. Removing them requires flattening (converting annotations to static content) or redaction, not metadata stripping. ExifTool's -all= flag will not touch these.
Creation software fingerprints
Some applications embed software-specific binary structures or object stream patterns that can identify the creating application even after metadata stripping. This is rarely relevant except in forensic analysis contexts.
Apply this checklist to any PDF that will leave your direct control:
- Inspect the metadata: exiftool document.pdf or Acrobat → File → Properties
- Remove any embedded thumbnail (exiftool -ThumbnailImage=)
- Check for embedded file attachments (-EmbeddedFile)
PDF metadata is a disclosure risk that most document workflows ignore entirely. The fields are invisible in normal viewing, they are populated automatically by the tools you use every day, and they survive every step of review and proofreading because no one is looking at them.
The fix is a two-minute step: run ExifTool before sending, confirm the fields are empty, and proceed. For organizations that regularly share externally, adding metadata removal to the final step of any document workflow — alongside spell-check and formatting review — eliminates a class of inadvertent disclosure that causes disproportionate damage relative to how simple it is to prevent.
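For a team workflow, the final check can be automated as a gate that refuses to pass any outgoing PDF still carrying identifying fields. The following is a sketch under assumptions: exiftool is on PATH, check_outgoing_pdfs is a hypothetical helper name, and the five tags listed are only a starting set that an organization would likely extend.

```shell
# Sketch of a pre-send gate: returns non-zero if any PDF passed in still
# carries identifying fields. Assumes `exiftool` is on PATH;
# check_outgoing_pdfs is a hypothetical helper name.
check_outgoing_pdfs() {
  local status=0 file found
  for file in "$@"; do
    # -s prints short tag names; tags that do not exist print nothing.
    found="$(exiftool -s -Author -Creator -Company -CreateDate -ModifyDate "$file")"
    if [ -n "$found" ]; then
      echo "FAIL $file"
      status=1
    else
      echo "OK   $file"
    fi
  done
  return "$status"
}
```

Running check_outgoing_pdfs outgoing/*.pdf before sending prints one line per file, and the return status can block a release script when any file fails.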