MyPDFBoy

How to Remove PDF Metadata Before Sharing Documents

Every PDF you create contains hidden metadata — author name, company, software version, and often revision history. Here is how to find it and remove it before you share.

Tags: document-security, privacy, pdf-redaction

Every PDF you create carries invisible cargo. When you save a document from Microsoft Word, export from LibreOffice, or scan a paper form, the resulting PDF embeds a set of data fields that describe the file itself — not its visible content, but information about how, when, and by whom it was created. This is PDF metadata, and it is disclosed to anyone who receives the file unless you actively remove it.

The problem is not that metadata exists. The problem is that most people do not know it is there. You proofread the visible text, confirm the layout looks right, then hit send — never realizing the file also contains your Windows username, your company's internal name, the exact software version you used, and a timestamp showing when you last edited it three days before the official "final" date you told the client.

This guide explains what PDF metadata contains, why it matters before sharing documents, and how to strip it using both command-line and GUI tools.

What Metadata a PDF Stores

PDF metadata lives in two separate places inside every PDF file.

1. Document Information Dictionary (DocInfo)

This is the original PDF metadata format, part of the PDF specification since version 1.0. It stores the following fields:

  • Title — the document title (often different from the filename)
  • Author — the person who created the document, typically the OS username
  • Subject — a free-text subject field
  • Keywords — indexing keywords set by the authoring application
  • Creator — the application that originally created the document (e.g., "Microsoft Word 2021")
  • Producer — the PDF library used to produce the final file (e.g., "macOS Quartz PDFContext")
  • CreationDate — when the file was first created
  • ModDate — when the file was last modified
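
The two date fields are stored as PDF date strings of the form D:YYYYMMDDHHmmSS followed by a UTC offset, so they can reveal the author's time zone as well as the edit time. Here is a minimal Python sketch of decoding one. The function name parse_pdf_date is illustrative and not part of any library, and the pattern only covers the common spellings of the format:

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS, then Z or +HH'mm' / -HH'mm'. Everything after the year is optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(raw: str) -> datetime:
    """Decode a PDF date string such as D:20260315143241+01'00'."""
    match = _PDF_DATE.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognised PDF date string: {raw!r}")

    # Missing month/day default to 1, missing time parts default to 0.
    defaults = (0, 1, 1, 0, 0, 0)
    year, month, day, hour, minute, second = (
        int(value) if value is not None else default
        for value, default in zip(match.groups()[:6], defaults)
    )

    zulu, sign, off_hours, off_minutes = match.groups()[6:]
    if zulu:
        tz = timezone.utc
    elif sign:
        delta = timedelta(hours=int(off_hours), minutes=int(off_minutes or 0))
        tz = timezone(delta if sign == "+" else -delta)
    else:
        tz = None  # no offset recorded, so the relation to UTC is unknown

    return datetime(year, month, day, hour, minute, second, tzinfo=tz)
```

Calling parse_pdf_date("D:20260315143241+01'00'") yields a datetime carrying a +01:00 offset, which is exactly the kind of detail a recipient can read back out of CreationDate or ModDate.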

2. XMP Packet

XMP (Extensible Metadata Platform) is an XML-based metadata format stored in a metadata stream inside the PDF, referenced from the document catalog. It replicates the DocInfo fields in XML and can include additional custom namespaces with arbitrary properties. Adobe applications write extensive XMP metadata including document history in some configurations.

Both stores can be present in the same file and may contain different values if the document was converted between formats or edited by multiple applications.
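
As a rough illustration of the two stores, the Python sketch below scans the raw bytes of a file for the trailer's /Info reference and for the usual XMP packet markers. It is a heuristic under stated assumptions, not a PDF parser: the name find_metadata_stores is hypothetical, and the check can miss metadata in encrypted files or in XMP packets that are compressed or not UTF-8 encoded.

```python
import re

# The trailer (or cross-reference stream dictionary) points at DocInfo
# with an indirect reference such as "/Info 12 0 R".
INFO_REF = re.compile(rb"/Info\s+\d+\s+\d+\s+R")

# XMP packets conventionally start with one of these markers.
XMP_MARKERS = (b"<x:xmpmeta", b"<?xpacket begin")


def find_metadata_stores(pdf_bytes: bytes) -> dict:
    """Report which of the two metadata stores appear to be present."""
    return {
        "docinfo": INFO_REF.search(pdf_bytes) is not None,
        "xmp": any(marker in pdf_bytes for marker in XMP_MARKERS),
    }
```

To try it on a real file, pass in the file's bytes, for example find_metadata_stores(open("document.pdf", "rb").read()). A result of False for both is a hint, not proof, that the file is clean.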

What common tools embed by default:

  • Microsoft Word — Author (taken from Windows account name), Company name from Office settings, last-edit user, and revision count
  • Adobe Acrobat — full document history in some licensing configurations, named destinations, embedded color profiles
  • Scanned documents — GPS coordinates if the scanning device or smartphone had location services enabled
  • LibreOffice — author, organisation, edit duration (total time the document was open for editing), and template name
  • macOS Preview — typically strips most metadata on export, but retains Creator and Producer fields

How to View the Metadata in a PDF

Before removing anything, verify what is actually present in your file.

Method 1: exiftool (command line)

ExifTool is a widely used command-line tool for reading and writing metadata in many file formats, including PDF. Install it via your system package manager, then run:

exiftool document.pdf

This outputs all fields it can read. A typical Word-generated PDF looks like:

ExifTool Version Number : 12.70
File Type               : PDF
PDF Version             : 1.7
Author                  : Jane Smith
Creator                 : Microsoft Word
Producer                : macOS Quartz PDFContext
Create Date             : 2026:03:15 14:32:41
Modify Date             : 2026:03:20 09:17:05
Company                 : Acme Corporation

The Company field is particularly notable — it comes from Word's organisational settings and is rarely visible to the author because it is set once at install time and never displayed in normal editing.
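
If you check files regularly, the listing above is easy to process in a script. The Python sketch below turns ExifTool's "Name : value" text into a dictionary and picks out the fields that identify a person or organisation. The function names and the IDENTIFYING tuple are illustrative choices, not part of ExifTool.

```python
# Field names treated as identifying in this sketch (an assumption; extend as needed).
IDENTIFYING = ("Author", "Company", "Creator", "Producer")


def parse_exiftool_output(text: str) -> dict:
    """Parse ExifTool's default 'Tag Name : value' listing into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, because values such as dates contain colons.
        name, separator, value = line.partition(":")
        if separator:
            fields[name.strip()] = value.strip()
    return fields


def identifying_fields(fields: dict) -> dict:
    """Return only the non-empty fields that point at a person or organisation."""
    return {
        name: value
        for name, value in fields.items()
        if name in IDENTIFYING and value
    }
```

For anything beyond a quick check, ExifTool's -json option produces machine-readable output directly and is the sturdier choice.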

Method 2: Adobe Acrobat

Open the file in Acrobat, then go to File → Properties → Description tab. This shows the DocInfo fields only. For a complete view including XMP, use the Acrobat Pro Tools → Redact → Sanitize Document dialog, which lists what it will remove before acting.

Method 3: Browser

Chrome and Firefox can display basic DocInfo when you open a PDF. In the built-in viewer's toolbar, open the overflow menu (the ⋮ button in Chrome, the » button in Firefox) and choose Document Properties. This shows the basic fields but not the XMP packet or embedded content.

Why This Matters Before Sharing

Metadata disclosure is rarely catastrophic in isolation. The risk is context. When paired with the visible content of a document, metadata fields can reveal information you had no intention of disclosing.

Court filings and legal documents

Several high-profile cases have involved metadata in publicly filed documents exposing law firm internal information: drafting attorney names on filings intended to be signed only by lead counsel, revision timestamps showing a "final" brief was edited after the stated submission deadline, and comment threads embedded in documents produced during discovery that contained frank internal assessments of the case's weaknesses. Courts in the US and UK have issued guidance specifically on metadata hygiene in litigation documents.

Whistleblower and source protection

A document leaked to a journalist carries its Author field unless the source explicitly removes it. In cases where a document was created by one of a small group of people with access, even a partial name or Windows username can narrow the list to a single person. This applies to any context where the sender needs to conceal their identity — regulatory complaints, anonymous tips to auditors, and internal ethics reports.

Vendor and client confidentiality

Proposal PDFs sometimes include a Company metadata field revealing the sender's employer — useful information if the sender is ghostwriting for a competitor, acting as a subcontractor, or using a shared document template from a previous engagement. A client receiving a "custom" proposal with a different firm's name in the metadata has grounds for concern regardless of whether the content is genuinely original.

How to Remove PDF Metadata

Method 1: ExifTool (command line, all platforms)

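ExifTool is free and scriptable, and it handles both the DocInfo dictionary and the XMP packet. One caveat matters for PDFs: ExifTool writes its changes as an incremental update appended to the file, so the original values are still physically present and can be recovered. Treat the commands below as the first step and finish with a full rewrite of the file, for example qpdf --linearize document.pdf clean.pdf, which drops the superseded objects.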

exiftool -all= document.pdf

The -all= flag deletes every writable metadata tag. ExifTool creates a backup of the original as document.pdf_original before writing. The backup still contains all the original metadata, so never send it by mistake. Once you have verified the output, delete the backup:

rm document.pdf_original

To remove metadata without creating a backup:

exiftool -overwrite_original -all= document.pdf

To verify the result immediately after stripping:

exiftool document.pdf | grep -E "Author|Company|Creator|Producer|Create Date|Modify Date"

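If any of those fields still have values, list every tag with its group to see where the value lives: exiftool -a -G1 -s document.pdf. Values in the PDF group come from DocInfo, and values in XMP groups come from the XMP packet. The -overwrite_original flag only controls whether a backup is kept; it does not change what gets removed.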
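
ExifTool's own documentation notes that its PDF edits are reversible, because the old values stay in the file beneath an appended update. A commonly recommended follow-up is to rewrite the file with qpdf. The Python sketch below chains the two steps. It assumes exiftool and qpdf are both on your PATH; the function names, the "stripped-" intermediate filename, and the runner parameter are illustrative choices.

```python
import subprocess
from pathlib import Path


def build_commands(src: str, dst: str):
    """Return the two command lines plus the intermediate file they share."""
    dst_path = Path(dst)
    stripped = str(dst_path.with_name("stripped-" + dst_path.name))
    commands = [
        # -o writes the edited copy to a new file and leaves src untouched.
        # ExifTool refuses to overwrite an existing output file.
        ["exiftool", "-all=", "-o", stripped, src],
        # Rewriting the whole file discards the old, now unreferenced
        # metadata objects that the incremental update left behind.
        ["qpdf", "--linearize", stripped, dst],
    ]
    return commands, stripped


def strip_and_rewrite(src: str, dst: str, runner=subprocess.run) -> None:
    """Strip metadata from src and write a fully rewritten copy to dst."""
    commands, stripped = build_commands(src, dst)
    for argv in commands:
        runner(argv, check=True)  # raises CalledProcessError if a tool fails
    # The intermediate file still contains the recoverable metadata.
    Path(stripped).unlink(missing_ok=True)
```

A call such as strip_and_rewrite("document.pdf", "clean.pdf") leaves the original untouched and produces clean.pdf. Run exiftool on the result to confirm the fields are gone.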

Method 2: Adobe Acrobat Pro

Acrobat Pro includes a Sanitize Document function that removes metadata, hidden layers, embedded content, JavaScript, and form data in a single operation.

Go to Tools → Redact → Sanitize Document. Acrobat will display a list of what it found and will remove before saving. This is the most thorough GUI option and covers edge cases that manual field clearing misses.

For manual field clearing only: File → Properties → Description tab — clear each field, then save. This does not remove XMP metadata, only DocInfo, so it is incomplete for documents that need thorough cleaning.

Method 3: Print to PDF

Every platform that can print to a virtual PDF driver effectively re-renders the document and discards most metadata. Open the PDF in any viewer, then:

  • macOS: File → Print → PDF (bottom-left dropdown) → Save as PDF
  • Windows: File → Print → Microsoft Print to PDF
  • Linux: File → Print → Print to File (PDF)

The result is a flat PDF containing only what is visually rendered. You lose interactive elements (hyperlinks become non-clickable text, form fields become static), but the file will have minimal metadata — typically just the Producer field identifying the print driver.

This method is appropriate for documents where you need a "clean copy" and do not need to preserve interactivity.

Method 4: LibreOffice

LibreOffice lets you control what metadata is embedded at export time.

File → Export as PDF → General tab — uncheck Export document information as metadata. This prevents the application from writing the Author, Title, and Subject fields to the output. LibreOffice does not give you control over the Producer field (it always identifies itself), but the user-identifying fields are omitted.

Note: none of these tools are affiliated with MyPDFBoy. ExifTool is maintained independently at exiftool.org.

Redacting Content Is a Separate Step

Metadata removal cleans the file's descriptive fields: the invisible structured data describing the document. It does nothing about the visible content of the PDF itself.

If the document contains sensitive visible information — names, account numbers, signatures, medical records, legal identifiers — those must be redacted separately. Redaction removes content from the PDF's content streams, the data that the viewer renders on screen. A metadata-clean PDF that still contains unredacted names and numbers in its body text is not a secure document.

What Metadata Removal Does Not Cover

Even after a complete metadata strip, some information can remain in a PDF that a careful reader could use to identify the document's origin.

Embedded fonts

Font names are preserved in the PDF because the viewer needs them for rendering. A specialized or licensed font that is only used by a specific organization can narrow the pool of possible sources. Font subsetting (embedding only the glyphs used) is standard practice but does not fully anonymize the font data.

Thumbnail images

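Some PDF creation tools embed a small raster thumbnail of each page as a separate image stream (the page's /Thumb entry). ExifTool's PDF writing covers only the DocInfo dictionary and the XMP packet, so a command like exiftool -ThumbnailImage= document.pdf, which targets the thumbnail tag used in image formats, does not remove PDF page thumbnails. Re-rendering the file with the Print to PDF method described above discards them.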

Embedded file attachments

PDFs can embed other files as attachments: Word documents, images, spreadsheets. These attachments carry their own metadata independently of the parent PDF. Open the Attachments pane in Acrobat (View → Show/Hide → Navigation Panes → Attachments) or use ExifTool to identify embedded files before sharing.

Comments and form data

Annotations, comments, and form field values are content, not metadata. They are stored as annotation and form objects attached to the pages, separate from the DocInfo dictionary and the XMP packet. Removing them requires flattening (converting annotations to static content) or redaction, not metadata stripping. ExifTool's -all= flag will not touch these.

Creation software fingerprints

Some applications embed software-specific binary structures or object stream patterns that can identify the creating application even after metadata stripping. This is rarely relevant except in forensic analysis contexts.
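
To get a quick first impression of what a metadata strip leaves behind, you can search the raw bytes for the PDF names associated with the items above. The Python sketch below is a heuristic only: scan_residue and the CHECKS table are hypothetical, and anything stored inside compressed object streams or an encrypted file will be missed, so a clean result does not prove the file is clean.

```python
import re

# PDF names that signal leftovers a metadata strip does not address.
CHECKS = {
    "page thumbnails": rb"/Thumb\b",
    "file attachments": rb"/EmbeddedFiles?\b",
    "annotations or comments": rb"/Annots\b",
    "form fields": rb"/AcroForm\b",
    "JavaScript": rb"/(?:JavaScript|JS)\b",
}

# Font dictionaries name their typeface with /BaseFont /SomeName.
FONT_NAME = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")

# Subset fonts carry a six-letter tag and a plus sign before the real name.
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def scan_residue(pdf_bytes: bytes):
    """Return (labels of leftovers found, sorted list of font names)."""
    found = [
        label for label, pattern in CHECKS.items()
        if re.search(pattern, pdf_bytes)
    ]
    fonts = sorted({
        SUBSET_PREFIX.sub("", name.decode("latin-1"))
        for name in FONT_NAME.findall(pdf_bytes)
    })
    return found, fonts
```

The font list is worth a glance on its own: a typeface name unique to one organisation survives subsetting, because only the six-letter prefix changes.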

Checklist: Before Sharing Any PDF

Apply this checklist to any PDF that will leave your direct control:

  • View metadata with exiftool document.pdf or Acrobat → File → Properties
  • Strip Author, Company, Creator, and Producer fields using ExifTool (followed by a full rewrite such as qpdf --linearize, because ExifTool's PDF edits are reversible) or Acrobat Sanitize
  • Check for embedded page thumbnails and remove them by re-rendering with Print to PDF if present
  • Check for file attachments (Acrobat's Attachments pane or ExifTool)
  • Redact any sensitive visible content from the document body
  • Re-verify after stripping — run ExifTool again on the cleaned file to confirm fields are empty
  • If full anonymization is needed, use Print to PDF to reset all identifying metadata

Summary

PDF metadata is a disclosure risk that most document workflows ignore entirely. The fields are invisible in normal viewing, they are populated automatically by the tools you use every day, and they survive every step of review and proofreading because no one is looking at them.

The fix is a two-minute step: run ExifTool before sending, rewrite the file so the old values are physically gone, confirm the fields are empty, and proceed. For organizations that regularly share externally, adding metadata removal to the final step of any document workflow, alongside spell-check and formatting review, eliminates a class of inadvertent disclosure that causes disproportionate damage relative to how simple it is to prevent.
