Paralegal's Step-by-Step Guide to Discovery PDF Processing

Discovery is the phase of litigation that generates the most documents and demands the most rigorous document management discipline. When opposing counsel produces a set of 5,000 PDFs in response to document requests, the paralegal's ability to rapidly process, organize, and make that production searchable directly determines the speed and quality of the attorney's subsequent review. Inefficient discovery processing costs money, creates errors, and can result in important evidence being missed simply because it was buried in an unorganized archive.

Modern discovery PDF processing has become a core competency for litigation paralegals. Unlike the paper discovery era, digital productions require technical skills: applying OCR to make image PDFs searchable, splitting combined productions into individual documents, applying consistent naming and filing conventions, and compressing files for efficient storage and transmission. These tasks may seem administrative, but they are analytically important: the quality of the paralegal's processing work determines how effectively the legal team can find and use the evidence.

This guide walks through the complete discovery PDF processing workflow, from the moment a production arrives through the creation of a fully organized, searchable discovery archive. It covers both plaintiff and defense perspectives, addresses the specific challenges of large-volume productions, and provides practical techniques that experienced litigation paralegals rely on in high-pressure case environments.

Receiving and Inventorying a Document Production

The first step when a discovery production arrives is to create a complete inventory before processing begins. How many files did you receive? What is the total size of the production? Are all the documents PDFs, or is it a mixed-format production that includes native files (Excel, Word, email files)? Does the production include a privilege log identifying documents withheld? Answering these questions before you start processing gives you a baseline for verifying completeness and helps you catch early problems, such as receiving fewer files than the production letter claimed to include.

Create a production receipt memo in your case file documenting: the date received, the producing party's name, the number of files received, the total production size, the production number or Bates range (if applicable), the format of the production, and any accompanying cover letter or privilege log. This memo becomes part of the case's permanent record and can be important if there are later disputes about what was actually produced.

For very large productions, run a quick quality check before full processing: sample several documents from the beginning, middle, and end of the production to confirm that they open correctly, that the image quality is adequate for OCR processing, and that the documents are what the production letter represents them to be. Identifying production quality problems early (corrupt files, intentionally degraded scans, missing pages) lets you raise those issues with opposing counsel promptly rather than discovering them weeks later during review.

1. Count and log all files received, confirming they match the production letter's representations.
2. Create a production receipt memo in the case file with date, party, file count, and format.
3. Sample documents from the beginning, middle, and end to verify quality before processing.
4. Confirm that the production includes a privilege log if documents were withheld on privilege grounds.
5. Log any discrepancies or quality issues for potential follow-up with opposing counsel.
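
The counting step above can be sketched in a few lines of Python. This is only an illustration, assuming the production has been copied to a local folder; the function name inventory_production and the expected_count parameter (the file count stated in the production letter) are hypothetical names chosen for this example.

```python
from collections import Counter
from pathlib import Path


def inventory_production(folder, expected_count=None):
    """Summarize a production folder: file count, total bytes, and formats."""
    files = sorted(p for p in Path(folder).rglob("*") if p.is_file())
    summary = {
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        # Group by lowercase extension so ".PDF" and ".pdf" count together.
        "by_extension": Counter(p.suffix.lower() or "(none)" for p in files),
        # Zero-byte files are a common sign of a failed copy or download.
        "empty_files": [p.name for p in files if p.stat().st_size == 0],
    }
    if expected_count is not None:
        # Positive shortfall: fewer files received than the letter claimed.
        summary["shortfall"] = expected_count - len(files)
    return summary
```

The resulting numbers can be copied straight into the production receipt memo. A nonzero shortfall or any entry in empty_files is worth logging as a discrepancy before processing continues.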

OCR Processing and Converting Image PDFs to Text-Searchable Files

The single highest-value step in discovery PDF processing is OCR (Optical Character Recognition). Scanned documents, photographed records, and image-based PDFs are invisible to search tools until OCR processing converts their image content into machine-readable text. In a production that includes scanned paper records, which is common for any document production that involves older records, unprocessed image PDFs can contain critical evidence that will never be found if the review team has to read every page without the ability to search.

Apply OCR to every PDF in the production that is image-based rather than native-digital. Modern PDFs created directly from digital systems (emails, word processor documents, spreadsheet exports) are already text-searchable and do not need OCR processing. Image PDFs created by scanning physical documents do. To identify which is which, try selecting text in the PDF. If you can highlight text with your cursor, the file already contains a text layer; if you cannot, it needs OCR processing.

After OCR processing, verify the quality of the text recognition on a sample of documents. OCR accuracy depends heavily on scan quality, handwriting legibility, and document clarity. For typed documents scanned at adequate resolution, OCR accuracy is typically very high, above 99% character accuracy for clean scans. For older documents, carbon copies, or documents with handwritten annotations, accuracy may be lower, and you should note this for the reviewing attorney so they know to read those documents carefully rather than relying solely on keyword searches.

1. Identify all image-based PDFs in the production by attempting text selection.
2. Apply OCR to all image PDFs before organizing or reviewing the production.
3. Verify OCR quality on a random sample of 10-15 documents from the production.
4. Note any documents with poor OCR quality and flag them for manual review.
5. Confirm that the final processed documents are text-searchable using keyword searches on known terms.
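
To make the sampling and accuracy check concrete, here is a minimal Python sketch. It assumes you have already exported the OCR text of a sampled page and typed a short, correct reference passage by hand for comparison. The helper names pick_sample, edit_distance, and character_accuracy are hypothetical, and measuring accuracy by edit distance is one reasonable choice rather than a required method.

```python
import random


def pick_sample(filenames, k=12, seed=0):
    """Pick a reproducible random sample of documents for OCR spot checks."""
    names = sorted(filenames)
    return random.Random(seed).sample(names, min(k, len(names)))


def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_accuracy(ocr_text, reference_text):
    """Approximate share of reference characters read correctly (1.0 = perfect)."""
    if not reference_text:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, reference_text)
    return max(0.0, 1 - errors / len(reference_text))
```

For example, if OCR reads "Invoice 1O45" where the page says "Invoice 1045", one character out of twelve is wrong, so the accuracy for that passage is about 92%. A document scoring well below the rest of the sample is a candidate for the manual-review flag in step 4.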

Organizing and Filing the Processed Production

After OCR processing, the next task is organizing the production into a structure that supports efficient review. The right organization structure depends on the case and the production content, but several principles apply universally: documents should be filed in a logical, consistent location; the filing structure should be intuitive enough for any team member to navigate without a guide; and every document should be findable through either its filing location or a central index.

For productions that arrive with Bates numbers, preserve the Bates numbering in your filing and naming system. Bates numbers are the authoritative identifier for each document in discovery: they appear in deposition transcripts, expert reports, and court filings. Your internal organization system must always allow you to retrieve any document by its Bates number.

For productions without Bates numbers (which is common in many state court matters), create your own production numbering system: 'PROD001-0001.pdf' through 'PROD001-[last]'. This creates a stable identifier for every document that you control, independent of whatever the producing party used in their own system. When documents from this production are cited in work product, they can always be located using the production number reference.

1. Create a discovery production folder for each producing party and production number.
2. Preserve Bates numbers in filenames; create internal numbering if Bates numbers are not provided.
3. File documents in category subfolders based on document type or date range.
4. Build a production index spreadsheet with every document's identifier, filename, type, and date.
5. Cross-reference the index to the production log to verify all produced documents are accounted for.
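
The internal numbering scheme and the index spreadsheet can be generated together. The following Python sketch assumes a production without Bates numbers and follows the 'PROD001-0001.pdf' pattern described above. The function names, the original_name column, and the alphabetical ordering of the originals are assumptions made for illustration.

```python
import csv
import io

INDEX_FIELDS = ["identifier", "filename", "original_name", "type", "date"]


def production_filename(production_no, doc_no):
    """Build a filename such as PROD001-0001.pdf."""
    return f"PROD{production_no:03d}-{doc_no:04d}.pdf"


def build_index(original_names, production_no=1):
    """Pair each original filename with a stable production identifier."""
    rows = []
    for doc_no, original in enumerate(sorted(original_names), start=1):
        filename = production_filename(production_no, doc_no)
        rows.append({
            "identifier": filename[:-4],  # identifier is the name without ".pdf"
            "filename": filename,
            "original_name": original,
            "type": "",  # left blank here, filled in during review
            "date": "",  # left blank here, filled in during review
        })
    return rows


def index_to_csv(rows):
    """Render the index as CSV text that any spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=INDEX_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the producing party's original filename in the index preserves the link back to what was actually received, which helps when cross-referencing the index against the production log.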

Preparing Responsive PDFs for Deposition and Trial

The final stage of discovery PDF work is using the organized production to support deposition preparation and trial readiness. At deposition, the paralegal may need to quickly locate any document in the case file in response to the attorney's requests during breaks. At trial, exhibits must be in precise order, instantly accessible, and displayable on courtroom technology without delay.

For deposition support, maintain a 'Deposition Hot Docs' folder for each scheduled deposition, containing copies of every document likely to be used during that deposition. This eliminates the need to search the entire case file under deposition pressure, because every potential exhibit is already pre-identified and immediately accessible. Update this folder as the deposition outline develops, adding or removing documents as the attorney's preparation evolves.

For trial exhibits, compress all exhibit PDFs to ensure they load and display instantly on presentation software. Test every exhibit on the actual courtroom presentation system before trial day if at all possible. One technical failure with a key exhibit during trial testimony is a professional and logistical crisis that proper PDF preparation prevents. Label every trial exhibit PDF with its exhibit number in both the filename and as a visible notation on the first page of the document.
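
As a rough illustration of keeping a 'Deposition Hot Docs' folder current, the Python sketch below copies a list of documents from the organized archive into a per-witness folder and reports any identifier it could not find. The function name build_hot_docs, the per-witness subfolder, and the assumption that each identifier matches a PDF filename without its extension are all hypothetical.

```python
import shutil
from pathlib import Path


def build_hot_docs(archive_dir, identifiers, witness, dest_root):
    """Copy the named documents into a per-deposition folder; list any not found."""
    dest = Path(dest_root) / "Deposition Hot Docs" / witness
    dest.mkdir(parents=True, exist_ok=True)
    # Map each identifier (filename without ".pdf") to its path in the archive.
    # Note: the "*.pdf" pattern is case-sensitive on some systems.
    available = {p.stem: p for p in Path(archive_dir).rglob("*.pdf")}
    missing = []
    for ident in identifiers:
        source = available.get(ident)
        if source is None:
            missing.append(ident)
        else:
            # Copy rather than move, so the archive itself stays untouched.
            shutil.copy2(source, dest / source.name)
    return dest, missing
```

Rerunning the function with the latest list from the deposition outline adds new documents to the folder. Anything returned in the missing list should be tracked down before the deposition rather than during it.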

Frequently Asked Questions

How do I handle a discovery production that arrives as one massive PDF with all documents combined?

A common production format is a single large PDF combining hundreds of individual documents, often separated by Bates number pages or header lines. The most efficient approach is to split this combined production file into individual documents using a PDF split tool. Identify the natural split points, usually the first page of each document, and split at those page boundaries. For productions with Bates headers, use the stamps to confirm each boundary: a new document begins on the page where its Bates range starts. After splitting, apply OCR to each individual document file and file it in your organized discovery folder structure. This per-document filing makes individual document retrieval far faster than searching a combined 500-page PDF file.
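
Before running the split tool, it helps to turn the list of first pages into explicit page ranges. The Python sketch below does only that arithmetic. It does not read or split a PDF, and the function name split_ranges is a hypothetical name for this example.

```python
def split_ranges(first_pages, total_pages):
    """Turn document start pages (1-based) into inclusive (start, end) ranges."""
    starts = sorted(set(first_pages))
    if not starts or starts[0] != 1:
        raise ValueError("the first document must start on page 1")
    if starts[-1] > total_pages:
        raise ValueError("a start page falls beyond the end of the file")
    # Each document ends on the page just before the next document starts;
    # the last document runs to the final page of the combined file.
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return list(zip(starts, ends))
```

For a 12-page combined file whose documents begin on pages 1, 4, and 10, the ranges are pages 1-3, 4-9, and 10-12. Those ranges can then be entered into whichever PDF split tool you use.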

What should I do when OCR misreads critical numbers or names in a key document?

OCR errors on critical documents — particularly numbers in financial records or names of key witnesses — can cause those documents to be missed during keyword searches. When you identify an important document with OCR errors, manually annotate the document's entry in the production index with the correct terms (noting that OCR read them incorrectly). Also add a note in your review memo flagging these documents for personal attention rather than relying on search tools. For the most critical documents, consider adding a searchable text layer manually through your PDF software's annotation tools, though this should be tracked carefully to preserve the original document's integrity.
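
One way to catch misread names before they cause a missed search hit is to look for near matches instead of exact ones. The Python sketch below uses the standard library's difflib.get_close_matches for this. It assumes you can export a document's OCR text as a plain string; the function name find_ocr_variants and the 0.8 similarity cutoff are illustrative choices, not established thresholds.

```python
import difflib
import re


def find_ocr_variants(term, ocr_text, cutoff=0.8):
    """Return words in the OCR text that closely resemble the search term."""
    words = sorted(set(re.findall(r"[A-Za-z0-9]+", ocr_text)))
    # Compare in lowercase, but remember one original spelling for each word.
    by_lower = {}
    for word in words:
        by_lower.setdefault(word.lower(), word)
    matches = difflib.get_close_matches(
        term.lower(), list(by_lower), n=10, cutoff=cutoff
    )
    # Drop exact matches: only the misread variants are of interest here.
    return [by_lower[m] for m in matches if m != term.lower()]
```

Searching for "Hendricks" in text where OCR produced "Hendr1cks" would surface the misread form, which can then be recorded in the production index next to the correct spelling. Lowering the cutoff finds more variants but also more false positives, so treat the results as leads to check by eye.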

How should paralegals track which discovery documents have been reviewed?

Maintain a review tracking spreadsheet as part of your production index. Add columns for 'Reviewed By', 'Review Date', 'Relevance Rating' (highly relevant / potentially relevant / not relevant), and 'Privilege Flag'. As each document is reviewed, update its row in the index. This tracking enables the supervising attorney to see at a glance how review is progressing, which documents have been flagged as potentially privileged (so they can be reviewed separately and are not inadvertently produced), and which documents have been identified as highly relevant for priority attention.
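
The at-a-glance summary can be computed from the tracking rows themselves. This Python sketch assumes each row is a dictionary keyed by the column names above plus an 'identifier' column, and that the privilege flag is recorded as "Yes". Both conventions, along with the function name review_progress, are assumptions made for the example.

```python
from collections import Counter


def review_progress(index_rows):
    """Summarize review status from index rows (dicts keyed by column name)."""
    reviewed = [r for r in index_rows if r.get("Reviewed By")]
    return {
        "total": len(index_rows),
        "reviewed": len(reviewed),
        "remaining": len(index_rows) - len(reviewed),
        "by_relevance": Counter(r.get("Relevance Rating", "") for r in reviewed),
        # Flagged documents go to the supervising attorney for a decision.
        "privilege_queue": [
            r["identifier"]
            for r in index_rows
            if r.get("Privilege Flag", "").strip().lower() == "yes"
        ],
    }
```

The privilege_queue list is only a routing aid: it shows which documents are waiting for the attorney's determination and does not make that determination.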

Can paralegals independently determine if a document should be withheld as privileged?

No — privilege determinations must be made by or under the supervision of the responsible attorney, not independently by a paralegal. When a paralegal encounters a document that appears to involve attorney-client communications or attorney work product (a communication with counsel, a document prepared at counsel's direction, a legal memorandum), the correct procedure is to flag it in the review tracking system and route it to the supervising attorney for a privilege determination before including it in any production or disclosure. Under professional responsibility rules, paralegals who make independent privilege calls without attorney supervision expose the firm to potential waiver of privilege.

