Scanned vs Digital PDFs: Why Scanned Files Are Larger and How to Fix It
<p>Scanned PDFs are dramatically larger than digitally created PDFs because each page is stored as a full raster image (a pixel grid typically captured at 200–600 DPI), while digital PDFs store text as lightweight vector outlines and embed only actual photographs as image data. A single-page letter scanned at 300 DPI amounts to 8–25 MB of raw image data before compression (8.4 MB in grayscale, 25.2 MB in 24-bit color), whereas the same letter created in Microsoft Word and exported to PDF weighs just 50–120 KB. That is a raw size difference on the order of 100–200x for identical visual content.</p><p>This size gap exists because scanners treat every page like a photograph. A flatbed scanner at 300 DPI captures an 8.5 × 11 inch page as a 2550 × 3300 pixel image: 8.4 million pixels. In 24-bit color, that single uncompressed image contains 25.2 MB of raw data. Even with JPEG compression inside the PDF, the result is typically 1–5 MB per page. Multiply by a 50-page contract and you reach 50–250 MB for a document that would be under 500 KB if typed digitally.</p><p>Digital PDFs avoid this entirely. When you create a PDF from Word, Google Docs, or any text editor, each character is represented by a glyph reference and a coordinate, which costs a few bytes per character. A full page of 12-point text at 250 words consumes roughly 2–4 KB of vector data. Font subsets, page structure metadata, and embedded formatting add overhead, but the total rarely exceeds 50 KB per text-only page. This guide explains the internal mechanics behind this size difference, provides concrete benchmark data, and walks through proven strategies to reduce scanned PDF file sizes by 80–95%.</p>
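<p>The arithmetic behind these figures can be reproduced in a few lines. The following is a minimal sketch; the function name <code>raw_scan_bytes</code> is ours, and sizes use decimal megabytes (1 MB = 1,000,000 bytes) to match the numbers quoted above.</p>

```python
def raw_scan_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=24):
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(dpi * width_in)
    height_px = round(dpi * height_in)
    return width_px * height_px * bits_per_pixel // 8


# One letter-size page at 300 DPI in 24-bit color (2550 x 3300 pixels)
page_bytes = raw_scan_bytes(300)
page_mb = page_bytes / 1_000_000

# 50-page contract using the 1-5 MB per page JPEG range quoted in the text
contract_mb_low, contract_mb_high = 50 * 1, 50 * 5

print(f"raw page: {page_mb:.1f} MB")
print(f"50-page JPEG scan: {contract_mb_low}-{contract_mb_high} MB")
```

<p>Changing <code>bits_per_pixel</code> to 8 or 1 gives the grayscale and monochrome raw sizes for the same page.</p>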
DPI to File Size: The Numbers You Need to Know
<p>The most actionable knowledge in scanned PDF management is how much data each DPI and color setting produces before and after compression. These benchmarks are for a standard 8.5 × 11 inch letter-size page. Raw sizes follow directly from pixel dimensions and bit depth; compressed sizes are typical ranges across common scanner models:</p><table><thead><tr><th>DPI Setting</th><th>Color Mode</th><th>Raw Image Size</th><th>JPEG Compressed (quality ~80)</th><th>CCITT Group 4 (mono)</th><th>Typical Use Case</th></tr></thead><tbody><tr><td>75 DPI</td><td>Color</td><td>1.6 MB</td><td>80–150 KB</td><td>N/A</td><td>Web thumbnails only; not suitable for reading</td></tr><tr><td>150 DPI</td><td>Color</td><td>6.3 MB</td><td>300–600 KB</td><td>N/A</td><td>Minimum for readable text; e-filing portals</td></tr><tr><td>200 DPI</td><td>Grayscale</td><td>3.7 MB</td><td>150–300 KB</td><td>30–60 KB (after 1-bit conversion)</td><td>Internal document storage; acceptable quality</td></tr><tr><td>300 DPI</td><td>Color</td><td>25.2 MB</td><td>1–5 MB</td><td>N/A</td><td>Standard office scanning; most common default</td></tr><tr><td>300 DPI</td><td>Grayscale</td><td>8.4 MB</td><td>400–800 KB</td><td>N/A</td><td>Recommended for text-only documents</td></tr><tr><td>300 DPI</td><td>Monochrome (1-bit)</td><td>1.0 MB</td><td>N/A</td><td>30–80 KB</td><td>Best for pure black-and-white text; smallest file size</td></tr><tr><td>600 DPI</td><td>Color</td><td>100.8 MB</td><td>4–15 MB</td><td>N/A</td><td>Engineering drawings, fine print; rarely justified</td></tr><tr><td>600 DPI</td><td>Monochrome (1-bit)</td><td>4.2 MB</td><td>N/A</td><td>100–250 KB</td><td>Legal archival with fine detail requirements</td></tr></tbody></table><p>The most important insight from this table: a standard office scan at 300 DPI in color generates 1–5 MB per page, while the same page scanned at 300 DPI in monochrome (black and white) compresses to 30–80 KB with CCITT Group 4. That is a 25–65x size difference for visually identical output on black-text documents. 
Most office workers leave their scanner on the color default and are producing files 25x larger than necessary.</p><p>After scanning, compression provides a second opportunity to reduce size. LazyPDF's compression tool applies Ghostscript's optimized presets to scanned PDFs — selecting the right preset can reduce a 300 DPI color scan by 75–90%:</p><table><thead><tr><th>Compression Preset</th><th>Output DPI</th><th>JPEG Quality</th><th>Typical Size Reduction</th><th>Best For</th></tr></thead><tbody><tr><td>Screen</td><td>72 DPI</td><td>65</td><td>90–95%</td><td>On-screen viewing only; not suitable for printing</td></tr><tr><td>Ebook (recommended)</td><td>150 DPI</td><td>75</td><td>75–85%</td><td>Standard business documents, email attachments</td></tr><tr><td>Printer</td><td>300 DPI</td><td>85</td><td>40–60%</td><td>Documents requiring print quality preservation</td></tr><tr><td>Prepress</td><td>300 DPI</td><td>90</td><td>20–40%</td><td>High-quality archival, fine art reproduction</td></tr></tbody></table><p>If your PDF shows blank pages or rendering problems after an aggressive compression pass, our troubleshooting guide on <a href='/en/blog/pdf-shows-blank-pages-fix'>fixing blank pages in PDFs</a> covers the most common causes and solutions. For scanned documents intended for long-term archival, the PDF/A standard imposes specific technical constraints on compression and color profiles — our overview of <a href='/en/blog/pdf-format-types-pdf-a-pdf-x-pdf-ua-explained'>PDF/A, PDF/X, and PDF/UA format types</a> explains how those requirements interact with the compression settings covered in this guide.</p>
- 1. Check your scanner's current DPI and color settings: Open your scanner software and look for 'Resolution' or 'Quality' settings. Most office scanners default to 300 DPI color. If you're scanning text-only documents like contracts, invoices, or forms, change to grayscale (or monochrome for pure black-and-white) to immediately reduce file size by 60–95%.
- 2. Match DPI to your intended use: For email attachments and portal uploads: 150–200 DPI grayscale. For internal document storage: 200–300 DPI grayscale. For legal archival requiring maximum fidelity: 300 DPI color or 600 DPI monochrome. Avoid 600 DPI color unless you need to capture fine color details; it produces files 16x larger than 150 DPI.
- 3. Compress the scanned PDF after scanning: Even with optimal scanner settings, apply a compression pass via /en/compress. The Ebook preset (150 DPI, JPEG quality 75) is the recommended default for most scanned documents. A 50-page color scan at 300 DPI typically compresses from 142 MB to 18–28 MB at Ebook quality, roughly an 80–87% reduction achieved in under 30 seconds.
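<p>For readers who prefer to run a comparable pass locally, the sketch below builds a Ghostscript command line using its standard <code>-dPDFSETTINGS</code> presets. It is an illustration only: the exact options LazyPDF passes are not documented here, the file names are placeholders, and on Windows the executable is usually <code>gswin64c</code> rather than <code>gs</code>.</p>

```python
import shutil
import subprocess

# Ghostscript's standard pdfwrite presets, matching the preset table above.
PRESETS = {"screen", "ebook", "printer", "prepress"}


def ghostscript_args(src, dst, preset="ebook"):
    """Build the argument list for a pdfwrite compression pass."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return [
        "gs",                          # assumed executable name (Linux/macOS)
        "-sDEVICE=pdfwrite",           # write a new PDF
        f"-dPDFSETTINGS=/{preset}",    # downsampling and JPEG quality preset
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def compress_pdf(src, dst, preset="ebook"):
    """Run Ghostscript if it is installed; raise otherwise."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript ('gs') not found on PATH")
    subprocess.run(ghostscript_args(src, dst, preset), check=True)


# Placeholder file names; print the command instead of running it.
print(" ".join(ghostscript_args("scan.pdf", "scan-ebook.pdf")))
```

<p>Calling <code>compress_pdf("scan.pdf", "scan-ebook.pdf")</code> would run the pass; comparing the two file sizes afterwards shows the reduction actually achieved for your document.</p>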
Inside a Digital PDF: How Vector Text Keeps Files Small
<p>A digitally created PDF stores text, lines, shapes, and most graphical elements as vector instructions: mathematical descriptions that define what to draw rather than storing every pixel of the result. Understanding this internal structure explains why digital PDFs stay compact regardless of page count or visual complexity.</p><p>When Microsoft Word exports a document to PDF, each character becomes a glyph identifier pointing into an embedded font subset. The PDF content stream contains instructions like 'move to coordinate (72, 680), show glyph 42 from font F1', which cost a few bytes per character. A full page of body text at 250 words and approximately 1,500 characters requires about 3 KB of content stream data. The font subset itself, containing only the specific glyphs used in the document rather than the full 2,000+ glyph font file, adds 15–40 KB per font. A typical business document uses 2–3 fonts, contributing roughly 30–120 KB total for the entire file regardless of page count, because the font data is shared across all pages.</p><p>Vector graphics follow the same principle. A horizontal rule across the page is described as 'draw line from (72, 400) to (540, 400) with 1-point stroke width', about 30 bytes. A table with 10 rows and 5 columns is defined by a few dozen line segments and coordinate pairs, roughly 2 KB. These elements render at any zoom level or print resolution with perfect sharpness because the viewer recalculates the pixels on the fly from the mathematical description.</p><p>Embedded raster images are the one exception where digital PDFs grow significantly. A photograph inserted into a Word document is typically embedded in the PDF at or near its original resolution. A 12-megapixel smartphone photo inserted into a single-page report can make that PDF 4–8 MB, but this is the photo's contribution, not the text or layout. 
Remove or resize the photo and the file drops back to under 100 KB.</p><p>The practical result: a 100-page text-only contract created in Word and saved as PDF weighs 400–800 KB. The same 100 pages with occasional charts and tables reach 1–2 MB. Only when high-resolution photographs are embedded does the file size climb meaningfully — and the photographs are the exact same raster data that makes scanned PDFs large. The difference is that a digital PDF embeds photos selectively, while a scanned PDF treats every page as a photo.</p>
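<p>To make the "few bytes per character" claim concrete, here is a rough sketch that builds an uncompressed PDF text content stream and measures it. The operators (<code>BT</code>, <code>Tf</code>, <code>TL</code>, <code>Td</code>, <code>Tj</code>, <code>T*</code>, <code>ET</code>) are standard PDF text operators, but the helper name, the font resource name <code>F1</code>, and the sample text are illustrative assumptions. Real exporters emit more elaborate streams and usually Flate-compress them.</p>

```python
def text_content_stream(lines, font="F1", size=12, left=72, top=720, leading=14):
    """Build a minimal, uncompressed PDF text content stream."""
    ops = ["BT", f"/{font} {size} Tf", f"{leading} TL", f"{left} {top} Td"]
    for line in lines:
        # Backslashes and parentheses must be escaped inside PDF string literals.
        escaped = line.replace("\\", r"\\").replace("(", r"\(").replace(")", r"\)")
        ops.append(f"({escaped}) Tj T*")  # show the line, then move down one line
    ops.append("ET")
    return "\n".join(ops).encode("latin-1")


# Roughly one page of body text: 25 lines of 60 characters = 1,500 characters.
page_lines = ["word " * 12] * 25
stream = text_content_stream(page_lines)

chars = sum(len(line) for line in page_lines)
print(f"{chars} characters -> {len(stream)} bytes of content stream")
```

<p>The stream stays within a kilobyte or two for a full page, which is why the page image in a scan, not the text, dominates file size.</p>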
- 1. Identify whether your PDF is digital or scanned: Open the PDF and try to select text. If you can highlight individual words and copy them as editable text, the document stores vector text data, so it is a digital PDF. If selection highlights the entire page as a single block, or if you cannot select text at all, the page is a raster scan.
- 2. Check zoom behavior: Zoom in to 400% or higher. Digital PDF text remains perfectly sharp at extreme magnification, which confirms vector rendering. Scanned PDF text becomes visibly pixelated at high zoom, revealing the underlying raster image. This test takes 10 seconds and definitively identifies the PDF type.
- 3. Review per-page file size distribution: In a mixed PDF, text-only pages contribute 2–5 KB each while pages containing photographs contribute 200 KB–5 MB each. This disparity reveals which pages are driving the total file size. Use /en/split to isolate high-size pages and apply different compression settings to each section.
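<p>The per-page review in the last step can be sketched as a simple threshold check. The page sizes below are made-up sample values (for example, obtained by splitting a PDF into single pages and reading each file's size); the 200 KB threshold comes from the range quoted above, and the function names are ours.</p>

```python
def flag_heavy_pages(page_sizes_kb, threshold_kb=200):
    """Return 1-based page numbers whose size meets or exceeds the threshold."""
    return [n for n, kb in enumerate(page_sizes_kb, start=1) if kb >= threshold_kb]


def share_of_total(page_sizes_kb, pages):
    """Fraction of the total size contributed by the given pages."""
    total = sum(page_sizes_kb)
    heavy = sum(page_sizes_kb[n - 1] for n in pages)
    return heavy / total if total else 0.0


# Hypothetical per-page sizes in KB for a seven-page mixed PDF.
sizes = [3, 4, 850, 2, 5, 2400, 3]
heavy = flag_heavy_pages(sizes)
share = share_of_total(sizes, heavy)

print(f"heavy pages: {heavy}, share of file size: {share:.1%}")
```

<p>In this sample, two pages account for nearly the whole file, so those are the pages worth isolating and compressing separately.</p>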
Inside a Scanned PDF: Why Every Page Is a Multi-Megabyte Image
<p>A scanned PDF is structurally a container holding one large raster image per page, wrapped in PDF metadata. The scanner captures the physical page as a photograph at a specified resolution, compresses it (usually as JPEG or CCITT Group 4), and wraps the result in PDF page objects. No vector text, no font data, no mathematical shape descriptions: just pixels.</p><p>The resolution setting on the scanner determines the raw data volume before compression. At 300 DPI, the standard quality setting on most office scanners, an 8.5 × 11 inch page produces a 2550 × 3300 pixel image. In 24-bit RGB color, that is 25.2 MB of uncompressed data per page. At 600 DPI, the same page becomes 5100 × 6600 pixels and 100.8 MB uncompressed. Even at 200 DPI, near the low end for comfortably readable text, the image is 1700 × 2200 pixels and 11.2 MB uncompressed. Choosing the right color mode (RGB, grayscale, or CMYK) directly affects how large a scanned PDF becomes; our guide to <a href="/en/blog/pdf-color-spaces-rgb-cmyk-guide">PDF color spaces: RGB vs CMYK</a> explains the trade-offs for both screen and print workflows.</p><p>JPEG compression inside the PDF reduces these raw sizes significantly but not enough to match digital PDF efficiency. At typical scanner JPEG quality settings (80–90%), a 300 DPI color scan compresses to 1–5 MB per page depending on content complexity. Pages of text on a plain background usually compress better than pages with photographs or color graphics because large uniform white areas cost JPEG very little, even though sharp text edges themselves are expensive to encode. A 50-page scanned contract at 300 DPI with JPEG compression typically weighs 75–150 MB.</p><p>Some scanners use CCITT Group 4 compression instead of JPEG, particularly for monochrome (black and white) scans. CCITT Group 4 is a lossless fax-standard algorithm optimized for bilevel images. A 300 DPI monochrome scan of a text page compresses to 30–80 KB with CCITT Group 4, dramatically smaller than JPEG color scans. 
However, most modern office scanners default to color scanning even for black-and-white documents, storing three color channels of nearly identical data and missing the 25–65x size advantage that monochrome CCITT compression would provide.</p><p>Color depth is the often-overlooked multiplier. A 24-bit color scan stores 3 bytes per pixel (red, green, blue channels). A grayscale scan stores 1 byte per pixel, immediately 3x smaller before compression. A monochrome (1-bit) scan stores 1 bit per pixel, 24x smaller than color before compression. For black-text-on-white-paper documents, color scanning wastes 95%+ of the stored data on color information that does not exist in the original document.</p><p>For mobile users, our guide on <a href='/en/blog/scan-multiple-pages-to-pdf-mobile'>scanning multiple pages to a single PDF</a> on a smartphone covers how to minimize file size from the scanning stage itself. Choosing the right app settings before you scan can prevent bloated files entirely.</p>
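<p>The color depth multipliers are easy to verify. This short sketch compares raw sizes for the same 300 DPI letter page at each bit depth; the names are illustrative and sizes use decimal megabytes.</p>

```python
PIXELS = 2550 * 3300  # letter-size page at 300 DPI

MODES = {"24-bit color": 24, "8-bit grayscale": 8, "1-bit monochrome": 1}


def raw_megabytes(pixels, bits_per_pixel):
    """Uncompressed size in decimal megabytes."""
    return pixels * bits_per_pixel / 8 / 1_000_000


for mode, bits in MODES.items():
    size = raw_megabytes(PIXELS, bits)
    print(f"{mode:>17}: {size:6.2f} MB raw, color is {24 // bits}x this size")

# Share of a 24-bit scan that a 1-bit scan of the same page would not need.
wasted_by_color = 1 - 1 / 24
print(f"color overhead on a black-and-white page: {wasted_by_color:.1%}")
```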
Head-to-Head Benchmarks: Scanned vs Digital File Size Comparison
<p>Abstract explanations of raster versus vector efficiency become concrete when you compare identical documents in both formats. The following benchmarks compare real documents created digitally in Microsoft Word and then printed and re-scanned at standard office scanner settings.</p><p><strong>One-page business letter (250 words, no images):</strong><br>Digital PDF exported from Word: 68 KB. Scanned at 300 DPI color JPEG: 1.8 MB. Scanned at 300 DPI grayscale JPEG: 620 KB. Scanned at 300 DPI monochrome CCITT: 42 KB. The digital PDF is actually larger than the optimized monochrome scan because of embedded font data, but 26x smaller than the color scan that most office workers would produce by default.</p><p><strong>Ten-page contract (3,200 words, two signature blocks, company letterhead):</strong><br>Digital PDF: 210 KB. Color scan at 300 DPI: 18.4 MB. Grayscale scan at 300 DPI: 6.1 MB. The color scan is 87x larger than the digital original. After LazyPDF compression at Ebook quality (150 DPI), the color scan drops to 2.3 MB — still 11x larger than the digital PDF, but small enough for email attachment.</p><p><strong>Fifty-page employee handbook (15,000 words, 12 photographs, charts on 8 pages):</strong><br>Digital PDF with embedded photos: 4.8 MB. Color scan at 300 DPI: 142 MB. Grayscale scan at 200 DPI: 31 MB. The digital PDF is 29x smaller than the color scan. After compression, the color scan reaches 14.2 MB and the grayscale scan reaches 4.8 MB — matching the digital PDF's size but with inferior text sharpness because the text is still raster pixels rather than vector outlines.</p><p><strong>Single-page form with handwritten entries (IRS W-9, filled by hand):</strong><br>Digital blank form from IRS website: 78 KB. Color scan of completed form: 2.4 MB. This represents the legitimate use case for scanning — capturing handwritten content that cannot exist digitally. After compression at 150 DPI: 380 KB. 
After OCR processing followed by compression: 420 KB (slightly larger because the OCR text layer adds 40 KB, but the document is now searchable).</p><p><strong>Two-hundred-page technical manual (65,000 words, 45 diagrams, 20 photographs):</strong><br>Digital PDF from InDesign: 12.4 MB (already optimized during export). Color scan at 300 DPI: 580 MB. The scan is 47x larger. After compression at Ebook quality: 52 MB. After compression at Screen quality: 18 MB. The digital PDF remains about 1.5x smaller than the Screen-quality scan and about 4x smaller than the Ebook-quality scan, because vector text and diagram data cannot be replicated by raster compression alone.</p><p>These benchmarks reveal a consistent pattern: scanned PDFs are 25–90x larger than their digital equivalents for text-heavy documents, and 5–15x larger for photo-heavy documents where the digital PDF already contains significant raster data. Compression typically closes the gap to 3–10x, but a scanned PDF can never match the efficiency of a natively digital file because the fundamental data representation remains raster pixels rather than vector instructions.</p><p>For a deeper look at how embedded images within digital PDFs behave, including CMYK handling and transparency reconstruction, our guide on <a href='/en/blog/extract-images-from-pdf-high-quality'>extracting images from PDFs in high quality</a> covers the mechanics of how image data is stored and retrieved from both scanned and digital PDF formats.</p>
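<p>The scan-to-digital ratios quoted in these benchmarks can be recomputed from the sizes given. The sketch below uses only the uncompressed color scan figures from this section and assumes 1 MB = 1,000 KB; the dictionary and function names are ours.</p>

```python
# (digital PDF size, 300 DPI color scan size) in KB, from the benchmarks above
BENCHMARKS_KB = {
    "one-page letter": (68, 1_800),
    "ten-page contract": (210, 18_400),
    "fifty-page handbook": (4_800, 142_000),
    "200-page manual": (12_400, 580_000),
}


def size_ratio(digital_kb, scanned_kb):
    """How many times larger the scan is than the digital original."""
    return scanned_kb / digital_kb


ratios = {name: size_ratio(d, s) for name, (d, s) in BENCHMARKS_KB.items()}

for name, ratio in ratios.items():
    print(f"{name}: color scan is {ratio:.1f}x the digital PDF")
```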
How to Shrink Scanned PDFs: Compression and Optimization Strategies
<p>Reducing scanned PDF file size requires a multi-step approach because no single technique addresses all the factors that make scans large. The optimal workflow combines resolution reduction, color space optimization, stream compression, and structural cleanup in a specific order to maximize savings without compounding quality loss.</p><p>The most impactful single step is resolution reduction (downsampling). A 300 DPI scan downsampled to 150 DPI discards 75% of pixel data (half the width times half the height) while maintaining full readability for standard business text at 10–12 point size. LazyPDF's Ebook compression preset applies this 150 DPI target automatically, typically reducing scanned PDF size by 75–85% in a single operation. For a 50-page color scan at 142 MB, this single step produces a file of roughly 21–36 MB. For a detailed guide on choosing the right compression preset and understanding quality trade-offs, see our article on <a href='/en/blog/compress-pdf-without-losing-quality'>compressing PDFs without quality loss</a>.</p><p>Color space conversion provides the second largest gain. If the scanned document is primarily black text on white paper (contracts, invoices, tax forms, legal filings), converting from 24-bit RGB to 8-bit grayscale eliminates two-thirds of the color data. Combined with 150 DPI downsampling, grayscale conversion typically achieves 85–92% total reduction. The 142 MB scan drops to roughly 11–21 MB. For documents that are strictly black and white with no gray tones, monochrome (1-bit) conversion achieves 95%+ reduction.</p><p>JPEG quality parameter tuning offers additional control. The default JPEG quality in most compression tools is 75–85 on a 0–100 scale. Reducing to 65–70 saves an additional 20–30% with minimal visible impact on text readability. Below 60, JPEG blocking artifacts become visible around text edges and fine lines, making the trade-off unacceptable for professional documents. 
LazyPDF's presets are calibrated to avoid this threshold: Ebook uses quality 75, Screen uses quality 65, and Printer uses quality 85.</p><p>For organizations processing high volumes of scanned documents (law firms digitizing case files, accounting departments archiving invoices, healthcare facilities scanning patient records), the cumulative impact is substantial. A law firm scanning 500 pages per day at 300 DPI color generates approximately 1.5 GB of scanned PDF data daily. Applying 150 DPI compression with grayscale conversion reduces daily storage to 150–225 MB, an 85–90% reduction. Assuming about 250 working days, that amounts to roughly 320–340 GB of storage saved per year.</p>
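<p>Two of the calculations in this section are worth seeing worked out: the share of pixels removed by downsampling, and the daily storage footprint after compression. This is a rough sketch; the 3 MB per page figure is our assumption, implied by 1.5 GB for 500 pages.</p>

```python
def fraction_discarded(src_dpi, dst_dpi):
    """Share of pixels removed when both axes are downsampled."""
    return 1 - (dst_dpi / src_dpi) ** 2


def daily_storage_mb(pages_per_day, mb_per_page, reduction):
    """Daily storage in MB after applying a fractional size reduction."""
    return pages_per_day * mb_per_page * (1 - reduction)


discarded = fraction_discarded(300, 150)

# 500 pages per day at an assumed 3 MB per page (1.5 GB per day uncompressed)
best_case = daily_storage_mb(500, 3, 0.90)
worst_case = daily_storage_mb(500, 3, 0.85)

print(f"300 -> 150 DPI discards {discarded:.0%} of pixels")
print(f"daily storage after compression: {best_case:.0f}-{worst_case:.0f} MB")
```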
- 1. Upload your scanned PDF to LazyPDF's compress tool: Go to /en/compress and upload your scanned PDF. Select the Ebook preset (150 DPI) for standard business documents or the Printer preset (300 DPI) for documents requiring maximum detail preservation. For very large files over 100 MB, split the document into 50-page segments first using /en/split, then compress each segment.
- 2. Review the compression results: For scanned documents, expect 75–90% reduction at Ebook quality. If the result exceeds your size target, consider whether Screen quality (72 DPI) is acceptable for your use case. It typically achieves 90–95% reduction, but text may appear soft at high zoom.
- 3. For text-only documents, re-scan in grayscale or monochrome: If possible, re-scan black-and-white text documents in grayscale or monochrome mode. This single scanner setting change reduces source file size by 60–95% before any PDF compression is applied. It is the most effective optimization you can make.
- 4. Add OCR to enable text search: If you need the document to be searchable, run OCR processing at /en/ocr before compressing. OCR adds a lightweight text layer (typically 1–4 KB per page) that enables full-text search, copy-paste, and screen reader accessibility. Do OCR first at full resolution for best accuracy, then compress.
- 5. Merge compressed segments back together: If you split the file before compressing, reassemble the compressed segments using /en/merge. Arrange them in the correct order and combine into a single PDF. The final file will maintain the compression applied to each segment.
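<p>When a file is split into 50-page segments before compression, the page ranges are straightforward to compute up front. The helper below is a sketch with a name of our choosing; it only plans the ranges and does not touch any PDF.</p>

```python
def segment_ranges(total_pages, segment_size=50):
    """Split pages 1..total_pages into consecutive inclusive (start, end) ranges."""
    if total_pages < 0 or segment_size < 1:
        raise ValueError("total_pages must be >= 0 and segment_size >= 1")
    return [
        (start, min(start + segment_size - 1, total_pages))
        for start in range(1, total_pages + 1, segment_size)
    ]


# A hypothetical 120-page scan yields two full segments and one partial segment.
for start, end in segment_ranges(120):
    print(f"pages {start}-{end}")
```

<p>Keeping the ranges in this order also gives the order in which the compressed segments should be merged back together.</p>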
OCR: Making Scanned PDFs Searchable and More Efficient
<p>Optical Character Recognition transforms a scanned PDF from a collection of pixel images into a hybrid document with a searchable text layer overlaid on the original scan. While OCR's primary purpose is enabling text search and copy-paste, it also creates opportunities for more efficient storage and opens the door to converting scanned pages into true digital PDFs.</p><p>The OCR process analyzes each page image, identifies character shapes using pattern matching and neural network classifiers, and embeds the recognized text as an invisible layer positioned precisely over the corresponding characters in the scan image. The visual appearance of the page does not change — the raster image remains the visible layer. But text search, screen readers, copy-paste operations, and automated document processing can now access the underlying text content.</p><p>LazyPDF's OCR engine (tesseract.js v7) runs entirely in the browser — no server upload required. Processing speed depends on page complexity and device hardware, but typical rates are 2–5 seconds per page on a modern laptop. A 20-page scanned contract completes OCR in 40–100 seconds. The OCR text layer adds minimal file size — typically 1–4 KB per page, or 50–200 KB for a 50-page document — because text stored as Unicode characters is inherently compact compared to the megabytes of raster data on each page.</p><p>OCR accuracy depends directly on scan quality. At 300 DPI with clean, high-contrast text, modern OCR engines achieve 98–99.5% character accuracy on standard fonts. At 200 DPI, accuracy drops to 95–98%. Below 150 DPI, accuracy falls sharply for small text (8-point footnotes, fine print) and decorative fonts. This accuracy gradient is why compressing before OCR can be counterproductive — downsampling from 300 to 150 DPI before OCR reduces the pixel data available for character recognition. 
If you need to <a href='/en/blog/ocr-pdf-offline-without-cloud'>OCR your PDF offline without a cloud upload</a>, LazyPDF's browser-based approach keeps all document data on your device.</p><p>The recommended workflow for scanned documents that need both small file size and searchability: OCR first at the original scan resolution (300 DPI), then compress the OCR-processed file. This sequence gives the OCR engine the best possible image data for recognition accuracy, then reduces storage without affecting the text layer. Language support affects OCR accuracy significantly — LazyPDF supports 100+ languages through tesseract.js language packs.</p>
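<p>The time and size figures above scale linearly with page count, so a quick estimate is simple to compute. This sketch uses the per-page ranges quoted in this section (2–5 seconds and 1–4 KB); the function name is ours, and actual results depend on device and page complexity.</p>

```python
def ocr_estimate(pages, secs_per_page=(2, 5), kb_per_page=(1, 4)):
    """Estimate OCR time and text-layer overhead as (low, high) ranges."""
    low_s, high_s = (pages * s for s in secs_per_page)
    low_kb, high_kb = (pages * k for k in kb_per_page)
    return {"seconds": (low_s, high_s), "text_layer_kb": (low_kb, high_kb)}


for pages in (20, 50):
    est = ocr_estimate(pages)
    print(f"{pages} pages: {est['seconds'][0]}-{est['seconds'][1]} s, "
          f"+{est['text_layer_kb'][0]}-{est['text_layer_kb'][1]} KB text layer")
```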
Converting Between Scanned and Digital PDF Formats
<p>Understanding format conversion options helps you choose the right approach based on your specific needs — whether that is reducing file size, enabling text editing, improving accessibility, or preparing documents for long-term archival storage.</p><p><strong>Scanned PDF to compressed scanned PDF (fastest, simplest):</strong> This is the most common operation and requires only a single compression pass. The output remains a raster-based PDF with reduced resolution and optimized JPEG parameters. File size reduction: 75–90%. Text remains as pixel images — not selectable or searchable. Processing time: 2–10 seconds for a typical document. Use case: reducing file size for email, cloud storage, or upload to portals with size limits.</p><p><strong>Scanned PDF to searchable scanned PDF (OCR overlay):</strong> OCR adds a text layer without altering the visual scan. The output is a hybrid: raster images for visual display, vector text for search and accessibility. File size increases slightly (1–4 KB per page for the text layer). Processing time: 2–5 seconds per page. Use case: making scanned archives searchable, meeting accessibility requirements (Section 508, WCAG 2.1). For long-term archival, understanding the distinctions between <a href='/en/blog/pdf-format-types-pdf-a-pdf-x-pdf-ua-explained'>PDF/A, PDF/X, and PDF/UA format types</a> helps you choose the right standard for your use case. Note that both scanned and digital PDFs carry metadata — author, creation date, software origin — that may need to be reviewed or removed before sharing; our guide on <a href="/en/blog/pdf-metadata-how-to-view-edit-remove">how to view and remove PDF metadata</a> covers this in detail.</p><p><strong>Scanned PDF to individual images (extraction):</strong> Using LazyPDF's PDF-to-JPG tool, each scanned page can be extracted as a standalone JPEG or PNG image. 
For PDFs with embedded graphics rather than full-page scans, you can also extract individual embedded images at their original resolution. Our guide on <a href='/en/blog/extract-images-from-pdf-high-quality'>extracting images from PDFs in high quality</a> covers CMYK handling, transparency reconstruction, and resolution expectations. This is useful when individual pages need to be processed through specialized image editing software.</p><p><strong>Images to scanned PDF (assembly):</strong> LazyPDF's Image-to-PDF tool assembles individual scanned images (JPEG, PNG, TIFF) into a multi-page PDF. If you need to scan multiple pages into a single PDF directly from your phone, our guide on <a href='/en/blog/scan-multiple-pages-to-pdf-iphone-android-free'>scanning multiple pages to one PDF on iPhone and Android</a> walks through the built-in iOS and Android workflows in detail. The output is a raster PDF with no text layer unless OCR is applied afterward.</p><p><strong>Choosing the right conversion path depends on three factors:</strong> How much size reduction do you need? A simple compression pass gets you 80% of the way with zero effort. Do you need text searchability? Add OCR for a small processing time investment and only a negligible size increase (1–4 KB per page). Do you need full editability? Plan for significant manual effort or accept imperfect automated conversion. Most users need only compression. The effort-to-benefit ratio of full reconstruction is justified only for documents that will be actively edited and redistributed.</p>
- 1. Determine your primary goal: Identify what you actually need: file size reduction, text searchability, image extraction, or full format conversion. Each goal maps to a different tool and workflow. For size reduction only: use /en/compress. For searchability: use /en/ocr first, then compress. For image extraction: use /en/pdf-to-jpg or /en/extract-images.
- 2. For size reduction, use the Ebook preset: Go to /en/compress and select the Ebook preset (150 DPI). This handles 90% of scanned PDF optimization needs in a single step with no configuration required. For documents requiring print-quality output, use the Printer preset (300 DPI) instead.
- 3. For searchability, OCR before compressing: Run /en/ocr on the original full-resolution scan before compressing. This ensures maximum OCR accuracy from the highest-quality source image. After OCR completes, the resulting file is slightly larger but fully searchable; then compress it to your target size.
- 4. For extracting pages or images: Use /en/pdf-to-jpg to extract pages as images. Select JPEG for photographs or PNG for text and line art to preserve sharp edges. Use /en/extract-images to pull embedded images from within the PDF at their native resolution.
- 5. For assembling multiple scanned images into one PDF: Use /en/image-to-pdf. Arrange pages in the correct order before conversion, as reordering after assembly requires an additional step with the organize tool at /en/organize. After assembly, apply OCR and compression as needed.
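<p>The goal-to-tool mapping in these steps can be summarized as a small lookup table. The tool paths come from the steps above; the goal keys and the <code>plan</code> function are illustrative names, not part of any LazyPDF API.</p>

```python
# Ordered tool paths per goal; the order matters (OCR always precedes compression).
WORKFLOWS = {
    "reduce_size": ["/en/compress"],
    "searchable": ["/en/ocr", "/en/compress"],
    "extract_pages": ["/en/pdf-to-jpg"],
    "extract_images": ["/en/extract-images"],
    "assemble": ["/en/image-to-pdf", "/en/ocr", "/en/compress"],
}


def plan(goal):
    """Return the ordered list of tool paths for a goal."""
    try:
        return WORKFLOWS[goal]
    except KeyError:
        raise ValueError(f"unknown goal: {goal!r}") from None


print(" -> ".join(plan("searchable")))
```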
Frequently Asked Questions
Why is my scanned PDF 50 MB when the original Word document was only 200 KB?
Your scanner captured each page as a full raster image at 300 DPI, storing 8.4 million pixels per page in 24-bit color. The Word PDF stores the same text as vector outlines at roughly 3 KB per page. A 20-page document scanned in color at 300 DPI produces 20–100 MB of JPEG-compressed image data (1–5 MB per page), while the vector equivalent stays under 500 KB because mathematical shape descriptions are inherently far more compact than pixel grids.
Can I make a scanned PDF as small as a digitally created PDF?
Not without fully recreating the document digitally. Compression can reduce a scanned PDF by 80–90%, but the result will still be 3–10x larger than a native digital PDF because raster pixels cannot match the compactness of vector text storage. The only way to achieve true digital PDF file sizes is to OCR the text, paste it into a word processor, and re-export as PDF.
Should I scan in color or grayscale for smaller file sizes?
Scan in grayscale for black-and-white text documents. Grayscale stores 1 byte per pixel versus 3 bytes for color, so the raw image data is about two-thirds smaller before any PDF compression. For documents with no meaningful color content (contracts, invoices, forms, printed text), color scanning wastes storage on three nearly identical color channels where a single channel would carry the same information.
Does OCR increase or decrease scanned PDF file size?
OCR increases file size slightly — typically by 1–4 KB per page — because it adds a text layer on top of the existing scan images. A 50-page document gains roughly 50–200 KB from OCR. However, this small increase enables text search, copy-paste, and accessibility features. Compressing after OCR reduces the total file size well below the original while preserving the searchable text layer.
What DPI should I use when scanning documents I plan to compress later?
Scan at 300 DPI for the best balance between source quality and compression flexibility. Scanning at 600 DPI quadruples the pixel count and raw storage compared with 300 DPI, with no visible benefit after compression to 150 DPI. Scanning below 200 DPI limits future OCR accuracy and produces visibly soft text even before compression. A 300 DPI scan gives OCR and compression enough detail to work with for good output quality.
Why does my scanned PDF look blurry after compression but digital PDFs stay sharp?
Digital PDFs store text as vector outlines that render at any resolution without quality loss — compression only affects embedded photographs. Scanned PDFs store text as pixels, so downsampling from 300 DPI to 150 DPI reduces text pixel density by 75%. At normal viewing zoom the difference is imperceptible, but at 200%+ zoom, compressed scans show visible softness that vector text never exhibits.
What is the fastest way to reduce a scanned PDF for email?
Upload to /en/compress and select the Ebook preset (150 DPI). This reduces most scanned PDFs by 75–85% in under 30 seconds; a 50-page color scan at 142 MB typically compresses to 18–28 MB. If you need to hit a specific size limit like 10 MB or 25 MB, try the Screen preset (72 DPI), which achieves 90–95% reduction at the cost of some text sharpness.
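<p>Whether Ebook or Screen is needed for a given size limit can be estimated in advance from the typical reduction ranges quoted in this guide. The sketch below is a rough planning aid under that assumption; the function names are ours, and real results vary with document content.</p>

```python
# Typical reduction ranges quoted in this guide (fraction of size removed).
PRESET_REDUCTION = {"ebook": (0.75, 0.85), "screen": (0.90, 0.95)}


def estimated_size_mb(original_mb, preset):
    """Return (best case, worst case) output size in MB for a preset."""
    low_reduction, high_reduction = PRESET_REDUCTION[preset]
    return original_mb * (1 - high_reduction), original_mb * (1 - low_reduction)


def pick_preset(original_mb, limit_mb):
    """Pick the gentlest preset whose worst-case estimate fits the limit."""
    for preset in ("ebook", "screen"):
        _, worst_case = estimated_size_mb(original_mb, preset)
        if worst_case <= limit_mb:
            return preset
    return None  # neither preset is expected to fit; split or re-scan instead


# Hypothetical 60 MB scan against a 25 MB email limit.
print(pick_preset(60, 25))
```

<p>A result of <code>None</code> suggests splitting the file or re-scanning in grayscale or monochrome rather than compressing further.</p>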