Format Guides · May 6, 2026
Lucas Martín·LazyPDF

PDF Metadata Explained: How to View, Edit, and Remove Hidden Document Data

<p>PDF metadata is hidden information embedded in every PDF file — including the author's name, the organization that created it, the software used, creation and modification dates, revision history, and sometimes the computer username of the person who saved the file. When you share a PDF, all of this data travels with it unless you explicitly remove it. A contract sent to a client, a report shared with a partner organization, or an article submitted to a journal can reveal details about your identity, software environment, and revision history that you never intended to disclose.</p><p>Understanding PDF metadata matters for three reasons. First, privacy: metadata can reveal your name, employer, or internal document workflow to external recipients. Second, professional presentation: a document that exposes "Revision 14" or the username "jsmith-laptop" in its properties looks unpolished. Third, compliance: regulated industries like healthcare and legal often require that documents shared externally contain only approved, verified metadata — or none at all.</p><p>This guide covers what PDF metadata is, where it comes from, how to inspect it on any platform, what the risks are, and the most practical methods for removing or sanitizing it before you share a document. You'll also learn how common PDF operations — compression, conversion, and re-saving — affect metadata as a side effect.</p>

What Is PDF Metadata and What Information Does It Contain?

<p>A PDF file contains two distinct layers. The visible layer is the content you see: text, images, and layout. The invisible layer is metadata: structured data that describes the document rather than displaying to readers. It is stored in dedicated objects inside the file (an information dictionary referenced from the file's trailer and an XML stream referenced from the document catalog), not in the file header. This metadata persists through email, cloud storage, and downloads, remaining readable to anyone who knows how to look.</p><p>The PDF specification defines two metadata systems. The older system, called the Document Information Dictionary (DocInfo), has eight commonly used standard fields: Title, Author, Subject, Keywords, Creator (the application that created the source document), Producer (the PDF engine that generated the PDF), CreationDate, and ModDate (last modification date). The specification also defines a rarely used Trapped field and allows custom keys. These fields are set automatically by most authoring software and are rarely visible to the document creator during normal workflow.</p><p>The second system, XMP (Extensible Metadata Platform), was developed by Adobe and adopted as an international standard under ISO 16684-1 in 2012. XMP is far more expansive: it supports more than 100 property types across multiple namespaces, including Dublin Core (basic bibliographic data), PDF-specific properties, rights management information, and custom enterprise properties. A Word document converted to PDF through Adobe Acrobat can contain over 20 distinct metadata values in the XMP stream, most of them populated automatically without the author's awareness.</p><p>Practically speaking, the fields that most often contain sensitive or surprising information are: Author (typically the Windows account name or Microsoft 365 profile name), Company (a custom property pulled from the Office installation's organization setting), Creator (the source application, revealing which version of Word, InDesign, or other software was used), and the revision or version counters some applications embed. 
A contract prepared by a law firm's paralegal using a 5-year-old version of Word, revised 11 times before finalization, can expose all of this through a 30-second metadata inspection by the opposing party.</p><p>For context on how different PDF variants handle metadata requirements, see our guide on <a href='/en/blog/pdf-format-types-pdf-a-pdf-x-pdf-ua-explained'>PDF/A, PDF/X, and PDF/UA format types</a>. PDF/A in particular mandates specific XMP metadata for archival compliance, so metadata removal is inappropriate for archive-bound documents.</p>

  1. In Adobe Acrobat Reader (free), open the PDF and go to File → Properties → Description tab. All DocInfo fields are displayed here: Title, Author, Subject, Keywords, Creator, and Producer.
  2. In any Chromium-based browser (Chrome, Edge, Brave), open the PDF and choose Document Properties from the built-in viewer's menu (in recent Chrome versions this sits under the ⋮ More actions menu in the PDF toolbar). This shows the basic DocInfo fields without installing any software.
  3. On macOS, open the PDF in Preview, then go to Tools → Show Inspector (or press Cmd+I). The Info panel displays the available metadata fields, including creation and modification timestamps.
  4. For a complete XMP metadata dump including all namespaces and custom fields, use the free ExifTool command-line utility: run exiftool filename.pdf in a terminal. This reveals the metadata properties the file contains, including fields invisible in standard PDF viewers.
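
<p>For batch checks, the same DocInfo fields can be read with a short script. The following is a minimal sketch in Python using only the standard library, not a full PDF parser: the function name <code>read_docinfo</code> is hypothetical, and it assumes the DocInfo dictionary is stored uncompressed, unencrypted, and as plain literal strings. Files that keep their objects in compressed object streams will return nothing here, so an empty result does not prove the file is clean.</p>

```python
import re

# The eight commonly used DocInfo keys described in this guide.
DOCINFO_KEYS = ("Title", "Author", "Subject", "Keywords",
                "Creator", "Producer", "CreationDate", "ModDate")


def read_docinfo(pdf_bytes):
    """Best-effort read of DocInfo entries stored as plain literal strings."""
    # The trailer points at the dictionary with an entry like "/Info 12 0 R".
    refs = re.findall(rb"/Info\s+(\d+)\s+(\d+)\s+R", pdf_bytes)
    if not refs:
        return {}
    number, generation = refs[-1]  # the last trailer reflects the latest save
    bodies = re.findall(
        rb"(?<!\d)" + number + rb"\s+" + generation + rb"\s+obj\b(.*?)endobj",
        pdf_bytes, re.DOTALL)
    if not bodies:
        return {}  # e.g. the object lives inside a compressed object stream
    found = {}
    for key in DOCINFO_KEYS:
        # Matches "/Author (Jane Smith)", allowing backslash-escaped characters.
        match = re.search(
            rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)",
            bodies[-1], re.DOTALL)
        if match:
            # Undo the escapes for parentheses and backslashes.
            raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
            if raw.startswith(b"\xfe\xff"):  # UTF-16BE text with a byte order mark
                found[key] = raw[2:].decode("utf-16-be", errors="replace")
            else:
                found[key] = raw.decode("latin-1")
    return found
```

<p>Usage would look like <code>read_docinfo(open("filename.pdf", "rb").read())</code>. For anything beyond a quick check, ExifTool remains the more complete option because it also reads XMP and compressed objects.</p>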

How Word, Excel, and PowerPoint Embed Metadata When Converted to PDF

<p>The majority of PDFs created in professional environments originate as Microsoft Office documents. When a Word document is converted to PDF (through Word's Save As PDF function, by printing to a PDF printer, or by using an online converter like LazyPDF's <a href='/en/word-to-pdf'>word-to-pdf tool</a>), the conversion engine transfers metadata from the source file into the PDF's metadata fields automatically.</p><p>Word stores its own metadata in the document's Properties panel (File → Info → Properties in Word 2016+). The Author field is populated with the name associated with the Microsoft 365 account or Windows user profile. The Company field is set during Office installation from the organization's Office deployment configuration. These values propagate into the resulting PDF without any prompt or warning: the author name lands in the PDF's Author field, and depending on the export path the company name can travel along as a custom property. If the Word document was created by one person (say, a contractor) and the PDF is exported by another (an internal employee), both names may appear in the metadata under different fields.</p><p>Excel spreadsheets converted to PDF carry the same Author and Company metadata, plus additional fields relevant to spreadsheets: the last-modified username, the calculation version, and the application version string (e.g., "Microsoft Excel for Microsoft 365 MSO (16.0.17830.20166) 64-bit"). This application version string in the Producer field can reveal your organization's Office update schedule, an information security consideration in contexts where software version disclosure is sensitive.</p><p>PowerPoint presentations add slide count data and theme information to the metadata. More significantly, PowerPoint's built-in "Document Inspector" (File → Info → Check for Issues → Inspect Document) can find and remove comments, revision history, and hidden slides, but it must be run explicitly before PDF export. 
Most users skip this step entirely, with the result that the exported PDF may carry comment threads, speaker notes in some export modes, and the full revision history of who edited the deck and when.</p><p>Google Docs, Sheets, and Slides exported as PDF through Google's built-in export use Google's own PDF engine (Producer: "Skia/PDF"). Such exports may also carry account-identifying details, for example the Google account email address of the person who triggered the export in the Author field, so inspect the file before submitting a document to a vendor or regulatory body under a company identity rather than a personal Gmail address.</p>
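
<p>To see what an Office file will hand to the PDF converter, you can read its properties directly. A .docx, .xlsx, or .pptx file is a ZIP package whose document properties live in <code>docProps/core.xml</code> and <code>docProps/app.xml</code>. The sketch below, in Python with the standard library only, pulls out the fields discussed above; the function name <code>read_office_properties</code> is hypothetical, and it does not cover legacy .doc/.xls/.ppt files or custom properties.</p>

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}


def read_office_properties(file_obj):
    """Return author, revision, and company details from an OOXML package."""
    props = {}
    with zipfile.ZipFile(file_obj) as package:
        names = set(package.namelist())
        if "docProps/core.xml" in names:
            core = ET.fromstring(package.read("docProps/core.xml"))
            props["creator"] = core.findtext("dc:creator", namespaces=NS)
            props["lastModifiedBy"] = core.findtext("cp:lastModifiedBy", namespaces=NS)
            props["revision"] = core.findtext("cp:revision", namespaces=NS)
        if "docProps/app.xml" in names:
            app = ET.fromstring(package.read("docProps/app.xml"))
            props["Company"] = app.findtext("ep:Company", namespaces=NS)
            props["Application"] = app.findtext("ep:Application", namespaces=NS)
            props["AppVersion"] = app.findtext("ep:AppVersion", namespaces=NS)
    return props  # missing elements come back as None
```

<p>Calling <code>read_office_properties("proposal.docx")</code> (a placeholder filename) before export shows whether a contractor's name, a revision counter, or a previous client's company name is about to be carried into the PDF.</p>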

Why PDF Metadata Matters: Privacy Risks in Shared Documents

<p>Metadata exposure follows predictable patterns depending on how and where documents are shared. Understanding the specific risks helps you decide which documents need metadata cleaning and which don't.</p><p>In legal and contract contexts, metadata reveals negotiation history. A contract PDF that exposes "Revision 11" as its version counter tells the opposing party that your draft went through 10 previous rounds — a signal that the final terms may have been significantly different from your original position. Law firms discovered this problem publicly in the early 2000s when multiple high-profile cases involved accidental disclosure of internal revision notes through document metadata. The American Bar Association issued guidance on metadata removal as early as 2006, and it remains standard practice at compliant firms to scrub all outgoing client documents. For a broader look at PDF security tools used in legal workflows, see our guide on <a href='/en/blog/best-pdf-tools-for-lawyers-2026'>best PDF tools for lawyers in 2026</a>.</p><p>In academic contexts, metadata can compromise blind review. Journals and conferences that use double-blind peer review explicitly require authors to remove their names and institutional affiliations from submission PDFs. The metadata fields are as identifying as the author byline in the visible document — an Author field showing "Maria Chen, Stanford University" defeats the blind review process entirely, even if the visible document header has been anonymized. Many journals now list metadata removal in their submission guidelines.</p><p>In business and procurement contexts, metadata from proposal PDFs can reveal the contractor who prepared the document (and thus their software environment, revision count, and timeline), the client's internal organization if a template was used, and whether the proposal was prepared fresh or adapted from a previous client's document. 
A company field showing a different client's name in a repurposed proposal is a memorable mistake.</p><p>From a pure privacy perspective, documents shared publicly online — annual reports, public comments, filed disclosures — may contain personal information in metadata fields that the organization didn't intend to publish. The UK's Information Commissioner's Office has referenced metadata leakage in guidance on data minimization under GDPR, noting that unnecessary personal data in file metadata constitutes a data protection risk in the same way that unnecessary data in a form field does.</p>

How to Remove or Sanitize PDF Metadata Before Sharing

<p>Removing metadata from a PDF means producing a clean copy of the document whose properties contain only what you intend to disclose, or nothing at all. Several approaches achieve this with different levels of completeness, and the right choice depends on how thoroughly you need to clean the document and what tools you have available.</p><p>One practical approach is re-processing the PDF through a conversion pipeline that resets metadata. LazyPDF's <a href='/en/compress'>compress tool</a>, which uses Ghostscript under the hood, rewrites the PDF's internal structure during compression and, in doing so, replaces the original Author, Creator, and custom XMP metadata with generic values. The Author field in the output is typically blank or set to the empty string, the Creator field becomes "LazyPDF", and custom XMP namespaces from the original authoring software are stripped. A 20 MB Word-to-PDF document compressed even at the lightest setting exits the process with substantially fewer metadata fields than it entered with. Because this is a side effect of reprocessing rather than a dedicated cleaning step, always inspect the output before sharing it.</p><p>For documents where file size isn't a concern but metadata removal is, the same approach applies: run the PDF through a compression pass specifically to reset the metadata, not to reduce file size. Choose the lowest compression preset to minimize any quality impact on the visible content while still triggering the Ghostscript reprocessing that clears the metadata. The output will differ slightly in file size from the input, but should look the same.</p><p>A complementary approach is using Microsoft Office's Document Inspector before converting to PDF. In Word, Excel, or PowerPoint, go to File → Info → Check for Issues → Inspect Document. The Inspector lists all metadata categories found in the file: comments, revisions, personal information, custom XML data, and hidden content. 
Clicking Remove All next to each category strips those fields before you export to PDF — meaning the source document's metadata is clean before the conversion ever happens. This is the most reliable method for Office-originated documents.</p><p>For scanned PDFs that started as images (with no Office origin), there is no application metadata to strip — the Author and Creator fields will typically already be blank or set to the scanning software. The main metadata concern with scanned PDFs is the creation date and scanning device information embedded by the scanner's firmware. Running the scanned PDF through LazyPDF's compress tool removes the scanner-origin metadata just as it does for Office-converted PDFs.</p>

  1. Open LazyPDF's Compress PDF tool at /en/compress and upload the document you want to sanitize. You're using the compression tool specifically for its metadata-resetting side effect, not primarily for file size reduction.
  2. Select the Minimum or Low compression preset to preserve maximum visual quality. Even at the lowest setting, Ghostscript rewrites the PDF structure and clears Author, custom XMP fields, and Creator metadata.
  3. Download the output PDF and inspect its metadata using File → Properties in Adobe Reader or your browser's built-in PDF properties panel. Verify that the Author, Company, and Subject fields are blank or contain only approved information.
  4. If you need to set specific metadata values (such as a title and company name for a published report), do so in the PDF reader's document properties after inspection, then save. Note that LazyPDF does not currently offer a metadata editor; use Adobe Acrobat, Preview on macOS, or command-line tools like ExifTool for targeted field editing.
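
<p>The verification in step 3 can be backed up with a blunt automated check: search the raw bytes of the output file for the specific names you do not want to disclose. The sketch below is a Python illustration under stated assumptions. The function name <code>find_leaks</code> is hypothetical, matching is case-sensitive, and strings inside compressed or encrypted objects will not be found, so an empty result is a good sign rather than proof.</p>

```python
def _needles(term):
    """Byte sequences a term could appear as inside PDF strings or XMP."""
    needles = {term.encode("utf-8"), term.encode("utf-16-be")}
    try:
        needles.add(term.encode("latin-1"))
    except UnicodeEncodeError:
        pass  # the term has characters outside Latin-1
    return needles


def find_leaks(pdf_bytes, terms):
    """Return the terms that still occur somewhere in the raw file bytes."""
    leaked = []
    for term in terms:
        if not term:
            continue  # an empty term would match everything
        if any(needle in pdf_bytes for needle in _needles(term)):
            leaked.append(term)
    return leaked
```

<p>A typical call passes the author's name, the machine username, and the company name, for example <code>find_leaks(data, ["Jane Smith", "jsmith-laptop", "Example Corp"])</code>, where all three values are placeholders for your own.</p>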

DocInfo Dictionary vs. XMP: The Two Metadata Systems Inside Every PDF

<p>Most guides to PDF metadata treat it as a single thing, but PDFs can contain two separate, independent metadata systems that hold different values simultaneously, and many PDF viewers only show one of them. Understanding both is necessary if you're trying to confirm that a document is truly clean before sharing.</p><p>The Document Information Dictionary (DocInfo) is the older of the two systems, dating back to the earliest versions of the PDF specification. It uses a simple key-value store format with eight commonly used standard fields: Title, Author, Subject, Keywords, Creator, Producer, CreationDate, and ModDate (later versions of the specification added a rarely used Trapped key, and custom keys are permitted). Most PDFs produced since the early 1990s carry this dictionary, and it's what you see in the Properties panel of most PDF viewers. DocInfo dates are stored in a specific PDF date format (D:YYYYMMDDHHmmss) and can include timezone offsets.</p><p>XMP (Extensible Metadata Platform), introduced by Adobe with PDF 1.4 in 2001 and standardized as ISO 16684-1 in 2012, is stored as an embedded XML stream within the PDF file. XMP metadata is organized into namespaces: the Dublin Core namespace (dc:) handles title, creator, and description; the PDF namespace (pdf:) adds PDF-specific properties like keywords and PDF version; the XMP basic namespace (xmp:) adds creation date, modification date, and the creating tool name; and proprietary namespaces from Microsoft (msip:), Adobe (xmpTPg:), and other vendors can add arbitrarily many additional fields.</p><p>When a modern PDF viewer shows you Author: "Jane Smith", it may be reading from DocInfo, from XMP dc:creator, or from both, and those fields can contain different values if the document has been through multiple editing and conversion steps. A document cleaned by removing only the DocInfo Author field but leaving the XMP dc:creator field intact still exposes the author's name to anyone who inspects the raw XMP stream. 
For full sanitization you need to clean both systems.</p><p>The PDF/A standard for archival documents (ISO 19005) requires conforming XMP metadata: specifically, the DocInfo dictionary and the XMP stream must be synchronized, meaning they must contain the same values for overlapping fields. Removing metadata from a PDF/A document can break this synchronization and invalidate its PDF/A conformance. For more on when metadata preservation is required rather than discouraged, see our guide on <a href='/en/blog/pdf-format-types-pdf-a-pdf-x-pdf-ua-explained'>PDF/A, PDF/X, and PDF/UA format types explained</a>.</p>
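
<p>The half-cleaned case described above (DocInfo Author removed, XMP dc:creator left behind) can be detected by reading both systems and comparing them. The following Python sketch uses only the standard library. The function names are hypothetical, it inspects only the first XMP packet it finds, and it assumes both the XMP stream and the DocInfo dictionary are stored uncompressed and unencrypted.</p>

```python
import re
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def xmp_creators(pdf_bytes):
    """Return the names listed under dc:creator in the first XMP packet."""
    match = re.search(rb"<x:xmpmeta\b.*?</x:xmpmeta>", pdf_bytes, re.DOTALL)
    if not match:
        return []
    root = ET.fromstring(match.group(0))  # raises ParseError on malformed XML
    names = []
    # dc:creator is an ordered list: dc:creator > rdf:Seq > rdf:li
    for creator in root.iter(DC + "creator"):
        for item in creator.iter(RDF + "li"):
            if item.text and item.text.strip():
                names.append(item.text.strip())
    return names


def docinfo_author(pdf_bytes):
    """Return the first literal-string /Author value, or "" if none is found."""
    match = re.search(rb"/Author\s*\(((?:\\.|[^\\()])*)\)", pdf_bytes, re.DOTALL)
    return match.group(1).decode("latin-1") if match else ""


def unsynced_xmp_creators(pdf_bytes):
    """Names present in XMP dc:creator that DocInfo Author does not show."""
    author = docinfo_author(pdf_bytes)
    return [name for name in xmp_creators(pdf_bytes) if name != author]
```

<p>A non-empty result from <code>unsynced_xmp_creators</code> means a viewer's Properties panel may look clean while the XMP stream still names someone. For a PDF/A file the same mismatch signals a synchronization problem rather than a privacy one.</p>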

Metadata in Compressed, Scanned, and Digitally Native PDFs

<p>The metadata profile of a PDF depends significantly on how the document was created. Digitally native PDFs, exported directly from Word, InDesign, or Google Docs, carry rich metadata populated by the authoring software. Scanned PDFs, created from physical documents by a scanner or mobile app, carry minimal metadata from the imaging device. Compressed PDFs, processed through Ghostscript or other PDF engines, often have their metadata partially or fully reset as a side effect of reprocessing. Understanding these differences helps you know what to expect when inspecting a document's properties.</p><p>A digitally native PDF from Microsoft Word 2021 or Microsoft 365 typically contains: the Author field (full name from Office account), the Company field (from Office installation settings), the Creator field ("Microsoft Word"), the Producer field ("Microsoft Word" or the macOS PDF subsystem if printed to PDF on Mac), creation and modification timestamps, a revision number, and XMP metadata synchronized with the DocInfo values. This is the richest metadata profile and the one most likely to require cleaning before external sharing.</p><p>A scanned PDF from a Canon or Epson scanner typically contains: a blank or device-name Author field, the Producer field set to the scanner's firmware name and version, a creation timestamp corresponding to the scan date, and nothing else. The content itself is an image, not text, which is why OCR is necessary to make it searchable. See our guide on the <a href='/en/blog/scanned-vs-digital-pdf-file-size'>difference between scanned and digital PDF file sizes</a> for more on how scanning parameters affect the resulting document.</p><p>A PDF produced by LazyPDF's compress tool will have its Creator and Producer fields set to LazyPDF and Ghostscript respectively, a fresh creation timestamp (the time of processing), and no Author or Company field. 
This is among the cleanest metadata profiles available for a processed document, and it's one reason that running an Office-originated PDF through the LazyPDF compression tool before sharing is an effective metadata sanitization step even when file size reduction isn't the primary goal.</p><p>For accessibility-conscious document workflows, metadata plays a different role: screen reader software relies on the document title (stored in XMP as dc:title) and on the document language (declared in the document catalog rather than in XMP) to identify and read the document correctly. A PDF with a blank title means the screen reader reads out the filename instead. Our <a href='/en/blog/pdf-accessibility-checklist-2026'>PDF accessibility checklist for 2026</a> covers which metadata fields are required for WCAG 2.1 and PDF/UA compliance.</p>
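
<p>The creation and modification timestamps compared in this section are stored in the PDF date format rather than a conventional ISO string, which makes them awkward to compare by eye. The sketch below converts such a value to a Python <code>datetime</code>; the function name <code>parse_pdf_date</code> is hypothetical, and the pattern covers the common forms (partial dates, <code>Z</code>, and <code>+HH'mm'</code> offsets) rather than every variant producers emit.</p>

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional Z or +HH'mm' / -HH'mm' offset.
# Everything after the year is optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)(?:00'?00'?)?|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(value):
    """Convert a PDF date string into a datetime (naive if no offset given)."""
    match = _PDF_DATE.match(value.strip())
    if not match:
        raise ValueError(f"not a recognised PDF date: {value!r}")
    year = int(match.group(1))
    # Missing month and day default to 1; missing time parts default to 0.
    month, day = (int(g) if g else 1 for g in match.group(2, 3))
    hour, minute, second = (int(g) if g else 0 for g in match.group(4, 5, 6))
    tz = None
    if match.group(7):
        tz = timezone.utc
    elif match.group(8):
        offset = timedelta(hours=int(match.group(9)),
                           minutes=int(match.group(10) or 0))
        tz = timezone(offset if match.group(8) == "+" else -offset)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)
```

<p>Comparing the parsed CreationDate and ModDate is a quick way to tell whether a file was reprocessed after it was authored: a reprocessed file, as described above, carries the time of processing rather than the time of writing.</p>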

Frequently Asked Questions

What personal information is typically hidden in a PDF's metadata?

Most PDFs created from Office documents contain the author's full name (from the Windows account or Microsoft 365 profile), the company or organization name (from the Office installation), the software used, creation and modification dates, and sometimes a revision count. These fields are set automatically and are not visible in the main document view.

How do I view the metadata in a PDF without any special software?

In any Chromium browser (Chrome, Edge, Brave), open the PDF and choose Document Properties from the built-in viewer's menu (in recent Chrome versions, under the ⋮ More actions menu in the PDF toolbar). On macOS, open the PDF in Preview and press Cmd+I to open the Info panel. In Adobe Reader (free), go to File → Properties → Description tab. All methods display the DocInfo fields including Author, Creator, and timestamps.

Does compressing a PDF remove its metadata?

Partially. Ghostscript-based compression (used by LazyPDF's compress tool) resets the Creator and Producer fields and clears custom XMP namespaces, typically leaving the Author field blank in the output. It is not a dedicated metadata removal tool, but it produces a significantly cleaner metadata profile than the original Office-converted document as a side effect of PDF reprocessing.

What is the difference between DocInfo and XMP metadata in a PDF?

DocInfo is the original PDF metadata system, storing 8 fixed fields (Author, Title, Creator, Producer, dates). XMP is a newer XML-based system supporting hundreds of properties across multiple namespaces. Both exist in modern PDFs and can hold different values. Full metadata sanitization requires cleaning both — removing only DocInfo leaves personal data visible in the XMP stream.

Should I remove metadata from a PDF before submitting to a journal or conference?

Yes, if the submission requires double-blind peer review. Most double-blind journals specify metadata removal in their author guidelines because the Author and Company fields in PDF metadata identify you and your institution just as clearly as a byline in the visible document. Run the PDF through the compression or re-export step and verify the Author field is blank before submitting.

Process your PDFs through LazyPDF to reset metadata, compress file sizes, and protect documents — free, no signup, nothing stored.
