Lawyer's Guide to PDF Discovery Document Management
Modern litigation generates document volumes that would have been unimaginable to attorneys of a previous generation. A single commercial dispute can produce hundreds of thousands of pages: emails, contracts, financial records, internal memos, and communications spanning years of business activity. Managing this volume efficiently is not just an organizational preference; it is a professional obligation. Attorneys who cannot search, review, and produce discovery documents quickly are at a material disadvantage in motion practice, depositions, and trial preparation.
PDF has become the universal standard format for e-discovery. Courts accept it, opposing counsel expects it, and document review platforms are built around it. But simply having documents in PDF format is not enough. Scanned PDFs without text layers are functionally unsearchable. Uncompressed productions can exceed court e-filing size limits. Unprotected PDFs containing privileged communications expose firms to serious ethical liability. And disorganized productions, with documents dumped without logical structure, slow review and raise spoliation risks.
This guide covers the four core PDF workflows that litigation attorneys must master: making discovery documents searchable via OCR, compressing large productions for court e-filing, protecting sensitive attorney-client privileged materials, and organizing document bundles into coherent, court-ready packages. Each workflow uses accessible browser-based tools that require no expensive software license and no IT support, just a browser and a clear process.
OCR Scanning Discovery Documents for Full-Text Search
The single most consequential PDF task in discovery management is OCR (optical character recognition). When opposing counsel produces documents, a substantial portion often arrives as scanned images: paper files that were photocopied and converted to PDF without a text layer. These files look readable to the human eye but are invisible to search tools. You cannot use Ctrl+F to find a smoking-gun phrase, and document review platforms cannot flag the files as responsive to keyword searches.
OCR processing converts scanned image PDFs into text-searchable documents. Modern OCR achieves very high accuracy on cleanly typed, photocopied documents, well above 99% for standard business correspondence and financial records. Accuracy drops for handwritten documents, faded copies, or heavily redacted pages, but even imperfect OCR dramatically improves review efficiency compared to page-by-page manual reading without any search capability.
For litigation, OCR also makes Bates-numbered productions searchable. Once sequential page identification numbers have been applied to a production, you need to search the text of specific document ranges instantly, and OCR makes this possible without opening each file individually.
LazyPDF's OCR tool processes scanned PDFs directly in your browser, returning a text-searchable PDF you can download immediately. For opposing counsel productions, run each PDF through OCR before importing it into your case management system. The upfront time investment, even across large productions, pays dividends when you can run instant keyword searches rather than review pages manually throughout the matter.
- Step 1: Collect all scanned PDFs from the opposing counsel production into a single folder, separated from native-format documents that already have text layers
- Step 2: Upload each scanned PDF to LazyPDF's OCR tool and process it; the tool adds a full text layer while preserving the original visual appearance of every page
- Step 3: Download the text-searchable PDF and verify OCR quality on a sample page by running a Ctrl+F search for known text that should appear
- Step 4: Rename each OCR-processed file using a consistent convention: CaseNumber_BatesRange_DocumentType.pdf (e.g., 2024CV1234_DEF0001-DEF0150_EmailChain.pdf)
- Step 5: Organize OCR-processed documents into case sub-folders by document category (Contracts, Emails, Financial Records, Internal Memos) before uploading to your document review platform
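
The naming convention in Step 4 is easy to apply inconsistently by hand, so a small script can help. The sketch below is one possible way to build and check names of the form CaseNumber_BatesRange_DocumentType.pdf. The function names, the regular expression, and the default four-digit padding are illustrative assumptions, not part of any LazyPDF feature.

```python
import re

# Hypothetical pattern for CaseNumber_BatesRange_DocumentType.pdf.
# The same Bates prefix must appear on both ends of the range.
FILENAME_RE = re.compile(
    r"^(?P<case>[A-Za-z0-9]+)_"
    r"(?P<prefix>[A-Z]+)(?P<start>\d+)-(?P=prefix)(?P<end>\d+)_"
    r"(?P<doctype>[A-Za-z0-9]+)\.pdf$"
)


def build_filename(case_number, prefix, first_page, last_page, doc_type, width=4):
    """Build a filename with both Bates numbers zero-padded to the same width."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("Bates range must be ascending and start at 1 or higher")
    start = f"{prefix}{first_page:0{width}d}"
    end = f"{prefix}{last_page:0{width}d}"
    return f"{case_number}_{start}-{end}_{doc_type}.pdf"


def follows_convention(filename):
    """Return True if the name matches the pattern and the range is consistent."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return False
    same_width = len(match["start"]) == len(match["end"])
    ascending = int(match["start"]) <= int(match["end"])
    return same_width and ascending


print(build_filename("2024CV1234", "DEF", 1, 150, "EmailChain"))
```

Padding both ends of the range to the same width keeps files in Bates order when a folder is sorted alphabetically, which is the main reason to enforce it in the check.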
Compressing Large Discovery Files for Court E-Filing
Federal courts impose file size limits on electronic submissions through CM/ECF (Case Management/Electronic Case Filing). The standard limit in most district courts is 35 MB per filing event, though limits vary by court and some are as low as 10 MB per individual document. A discovery exhibit package containing scanned financial records, photographs, or high-resolution contract images can easily exceed these limits, sometimes by an order of magnitude.
Large file sizes arise from several sources: high-resolution scanning of photographs or engineering drawings, embedded color images in financial statements, redundant metadata embedded by scanning software, and uncompressed font data. A scanned bank statement at 300 DPI in color can run 400–600 KB per page, so fifty pages would be 20–30 MB before you add other exhibits to the same filing.
PDF compression reduces file size by optimizing image resolution, consolidating embedded fonts, and stripping unnecessary metadata, all without making the document hard to read. For legal exhibits, the goal is the minimum file size that still renders clearly on screen and prints legibly. A 35 MB exhibit package can typically be compressed to 5–8 MB with no perceptible quality loss for typed text documents and clean scans.
Critically: always retain the original uncompressed PDF in your case files. Submit the compressed version to court, but maintain the full-fidelity original. If opposing counsel challenges the authenticity or completeness of a submitted exhibit, your original demonstrates that the compressed version is a faithful reproduction.
For productions that cannot be compressed below filing limits, courts generally permit splitting filings into multiple attachments, each clearly labeled as Part 1 of 3, Part 2 of 3, and so on. Plan your filing strategy days before the deadline, not the night before.
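
The size arithmetic above can be checked with a few lines of code before a filing deadline. This is a rough planning sketch only: the function names are hypothetical, it uses decimal megabytes (1 MB = 1,000 KB) to match the round figures in the text, and it assumes a package can be divided into parts of roughly equal size, which real exhibits may not allow.

```python
import math

KB_PER_MB = 1000  # decimal megabytes, matching the rough figures in the text


def estimate_mb(pages, kb_per_page):
    """Estimate the size of a scanned document in MB from a per-page average."""
    return pages * kb_per_page / KB_PER_MB


def parts_needed(total_mb, limit_mb):
    """Number of attachments needed if the package splits into equal-sized parts."""
    if limit_mb <= 0:
        raise ValueError("limit_mb must be positive")
    return max(1, math.ceil(total_mb / limit_mb))


def part_labels(exhibit, parts):
    """Labels in the 'Part 1 of 3' style described above."""
    return [f"{exhibit} Part {i} of {parts}" for i in range(1, parts + 1)]


# Fifty pages at 400-600 KB per page, as in the bank statement example.
print(estimate_mb(50, 400), estimate_mb(50, 600))

# A hypothetical 90 MB package against a 35 MB limit.
print(part_labels("Exhibit A", parts_needed(90, 35)))
```

Always confirm the actual limit in the court's local rules; the 35 MB figure here is only the example value used in this guide.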
Protecting Confidential Attorney-Client Privileged PDFs
Attorney-client privilege is one of the most foundational protections in legal practice, and inadvertent waiver through document mishandling is a real and recurring problem. When privileged communications exist in PDF form (email threads with clients, legal strategy memoranda, draft pleadings shared for review), those documents require active protection, not just a folder label reading 'Privileged.'
Password protection for privileged PDFs creates a meaningful barrier against inadvertent disclosure. If a file is accidentally attached to the wrong email, sent to opposing counsel instead of co-counsel, or accessed by an unauthorized person, password protection prevents the content from being immediately read. Password protection can also help show that reasonable steps were taken to prevent disclosure, which matters for clawback arguments under Federal Rule of Evidence 502(b).
For highly sensitive documents, such as grand jury matters, criminal referral analysis, or M&A communications in securities litigation, restrict not just document opening but also printing and text-copying permissions. This makes it harder for a recipient to extract substantive content even if they somehow obtain access, although permission restrictions depend on the PDF reader honoring them.
Watermarking adds a second layer of protection that survives even if password protection is removed. Apply a 'PRIVILEGED AND CONFIDENTIAL - ATTORNEY-CLIENT COMMUNICATION' watermark to PDFs before sharing with co-counsel, consulting experts, or clients reviewing draft materials. This creates a persistent, visible record of the document's protected status and reinforces clawback arguments if a document is inadvertently produced in discovery.
When sharing protected PDFs with co-counsel at other firms, use a secure transfer channel and communicate the password through a separate message, never in the same email thread as the attachment.
Organizing Discovery Productions into Logical PDF Bundles
Disorganized document production is both a tactical weakness and a professional risk. Judges notice when exhibits lack consistent numbering. Opposing counsel notices when productions arrive with random file names and no apparent organization. And your own team wastes hours searching for documents that were never systematically filed. A disciplined PDF organization approach centers on three practices: consistent naming conventions, logical document bundling, and sequential page numbering before filing.
For naming, every document should carry enough information in its filename to be identified without opening it. Include the case number, a document type descriptor, and the date span of the document. For productions to opposing counsel, include the Bates number range in each filename.
For bundling, merge related documents into single PDF files at logical boundaries. All contracts related to a single transaction should be one PDF. All emails from a specific custodian over a defined date range should be one PDF. This reduces the total number of files to track while keeping each file focused and manageable. LazyPDF's merge tool handles this: upload component PDFs in the correct order and download a single organized bundle.
For page numbering, every multi-page PDF submitted to court should carry sequential page numbers. This is essential for citing specific pages in briefs ('See Exhibit A at p. 14') and for deposition questioning ('Let's look at page 47 of Exhibit 12'). Add page numbers before filing, because retrofitting numbering to exhibits already cited in a filed brief requires an errata or corrected filing.
For large productions you receive from opposing counsel, use split tools to divide oversized PDF dumps into logical document units. A 2,000-page undifferentiated PDF is an obstacle to review, not a production. Breaking it into 50–100 page segments organized by document type makes review tractable and allows reviewers to work on one category at a time without losing their place.
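
Before splitting a large PDF, it can help to list the page ranges you intend to extract. The sketch below computes fixed-size ranges for that purpose. It is a simplification: the function name and the 100-page default are illustrative assumptions, and in practice you would move each cut point to the nearest document boundary so that no document is divided mid-way.

```python
def plan_segments(total_pages, max_pages=100):
    """Return 1-based, inclusive (first_page, last_page) ranges covering the file."""
    if total_pages < 1 or max_pages < 1:
        raise ValueError("total_pages and max_pages must be at least 1")
    segments = []
    start = 1
    while start <= total_pages:
        # The final segment may be shorter than max_pages.
        end = min(start + max_pages - 1, total_pages)
        segments.append((start, end))
        start = end + 1
    return segments


# A 2,000-page production cut into segments of at most 100 pages.
ranges = plan_segments(2000, max_pages=100)
print(len(ranges), ranges[0], ranges[-1])
```

The resulting ranges can be entered into whatever split tool you use, one range per output file.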
Frequently Asked Questions
Can I use free PDF tools for attorney-client privileged documents?
The critical security factor is where processing occurs. Browser-based PDF tools that process documents client-side — meaning the file is handled locally using your computer's resources and is never uploaded to a remote server — are appropriate for privileged materials. LazyPDF's OCR and other tools process documents in your browser without sending files to external servers. Always verify any tool's privacy policy before uploading privileged documents. Tools that upload to cloud servers and retain copies raise privilege concerns and may constitute disclosure to a third party. For highly sensitive matters — ongoing criminal investigations, M&A communications in active transactions — consider whether even client-side browser tools are appropriate, or whether locally installed desktop software provides greater assurance. The ABA's ethics guidance on cloud computing requires reasonable measures to safeguard client information, which means understanding exactly where document processing occurs.
What file size limits do federal courts impose for e-filing?
The standard CM/ECF limit in most federal district courts is 35 MB per filing event, but this varies significantly by court. The Southern District of New York permits 50 MB for some filing types. Some courts apply per-document limits rather than per-event limits, which can be as low as 10–15 MB per attachment. State courts often impose lower limits — 10–15 MB is common in state electronic filing systems. Always check the local rules of the specific court before filing large exhibit packages. When exhibits genuinely exceed limits after compression, courts typically permit filing multiple attachments with sequential labels (Exhibit A Part 1 of 3), or filing an exhibit list with a request to submit physical copies. Contact the clerk's office well before a deadline when you anticipate size issues — they are generally accommodating with advance notice.
How do I add Bates numbers to PDF documents?
Bates numbering applies a sequential identifier to each page of a document production, creating a permanent, unique reference for every page across potentially millions of documents. The standard format uses a party-identifier prefix followed by a zero-padded sequential number (e.g., PLAINTIFF000001 through PLAINTIFF004237, or DEF-SMITH-00001). LazyPDF's page-numbers tool adds sequential numbers across documents, which you can configure to match Bates-style formatting. For large productions requiring bulk Bates stamping across thousands of files simultaneously, specialized e-discovery platforms like Relativity or Everlaw automate this with prefix management. For smaller productions under a few hundred documents, applying sequential page numbers with LazyPDF and incorporating the Bates range in each filename is a practical, court-accepted approach.
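
The prefix-plus-padded-number format described above can be generated programmatically, for example to prepare a Bates index or to check filenames. This is a minimal sketch under stated assumptions: the function names and the default six-digit width are illustrative, and the code only produces label strings. It does not stamp them onto PDF pages.

```python
def bates_label(prefix, number, width=6):
    """Format one Bates label: a prefix followed by a zero-padded page number."""
    if number < 1:
        raise ValueError("Bates numbers start at 1")
    digits = f"{number:0{width}d}"
    if len(digits) > width:
        raise ValueError("number does not fit in the chosen width")
    return f"{prefix}{digits}"


def bates_range(prefix, first, page_count, width=6):
    """Labels for page_count consecutive pages starting at number first."""
    return [bates_label(prefix, n, width) for n in range(first, first + page_count)]


print(bates_label("PLAINTIFF", 1), bates_label("PLAINTIFF", 4237))
print(bates_label("DEF-SMITH-", 1, width=5))
```

Choosing the width up front matters: if a production later grows past the largest number the width can hold, earlier labels cannot be re-padded without renumbering pages that may already have been cited.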
Should discovery PDFs be password protected when sharing with opposing counsel?
Generally, no. Documents produced in discovery are by definition not privileged: you are producing them because they are responsive and unprotected. Password-protecting produced documents is unusual, would draw objections as imposing undue burden, and has no legal basis absent a specific protective order provision. Password protection is appropriate for documents shared with your own client, communications with co-counsel at other firms, expert witness engagement packages, and draft pleadings under attorney review. For confidential business documents produced under a protective order, the order itself, not a PDF password, is the legal mechanism governing use restrictions. Mark such documents per the order's designation protocol (typically a 'CONFIDENTIAL' watermark or header) rather than password-protecting the file.