How to Extract Images from PDF and Keep Their Original Size
When you extract images from a PDF, many tools silently shrink them. The embedded image might be 3000 × 2000 pixels at 300 dpi, but the extracted file comes out at 800 × 533 pixels at 72 dpi — roughly a 93% reduction in pixel count. This happens because the extraction tool is rendering the PDF page and screenshotting it rather than actually extracting the embedded image data. There's an important distinction between two different things people mean by 'extract images from PDF': extracting the actual embedded image files (which can be JPEGs, PNGs, or other formats embedded in the PDF stream), and rendering each PDF page as an image at a specified resolution. These produce very different results. This guide explains both approaches, shows you how to get full-resolution extracted images regardless of which method you need, and compares the tools that do this correctly versus those that silently downsample your output.
Understanding How Images Are Stored in PDFs
PDFs store images in several ways. The most common is JPEG images embedded directly in the PDF's binary data. When you extract these, you get the original JPEG file exactly as it was embedded — original resolution, original compression, original color profile. This is the 'real' extraction approach.

Vector graphics (charts, diagrams, icons) are stored as mathematical drawing instructions in the PDF stream. These have no pixels — they're resolution-independent. To 'extract' a vector element, you must render it at a specific resolution (DPI), which creates pixels from those mathematical descriptions. The higher the DPI, the larger and sharper the output.

Mixed pages contain both raster images and vector overlays (text, borders, annotations). True extraction pulls only the raster images from the PDF stream; page rendering captures everything visible on the page at once — both vectors and rasters.

Knowing which type of content you're extracting determines which approach to use. For photographs embedded in a PDF, true extraction preserves original quality. For charts, diagrams, or full rendered pages, high-DPI rendering is the correct approach.
1. Identify what you want to extract: embedded photographs (use true extraction), or rendered page content (use high-DPI rendering).
2. For true extraction: use LazyPDF Extract Images, pdfimages (poppler), or Adobe Acrobat's export function.
3. For page rendering: use pdftoppm, Ghostscript, or ImageMagick at 300 dpi minimum.
4. After extraction, verify dimensions: right-click → Properties on Windows, or Get Info on Mac.
5. Compare against expected dimensions — a 300 dpi letter page should be 2550 × 3300 pixels.
6. If dimensions are smaller than expected, the tool downsampled — try a different approach.
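The dimension check in steps 4–6 is simple arithmetic: pixels = inches × DPI. A tiny helper (hypothetical, standard library only) makes the expected values explicit:

```python
def expected_pixels(width_in, height_in, dpi):
    """Expected pixel dimensions for a page of the given physical size at a given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

# A US letter page (8.5 x 11 in) at 300 dpi:
print(expected_pixels(8.5, 11, 300))  # -> (2550, 3300)
```

If your extracted file is well below these numbers, the tool rendered at screen resolution instead of extracting the embedded data.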
Tools That Extract at Full Original Resolution
LazyPDF Extract Images extracts the actual image data embedded in the PDF stream — not rendered screenshots. This means extracted JPEGs come out at their original resolution and compression quality, exactly as they were placed in the PDF. For PDFs containing high-resolution photographs, this produces full-quality images regardless of display zoom level.

pdfimages (command-line, part of poppler-utils) is the gold-standard tool for true image extraction. It reads the raw image streams from the PDF binary: `pdfimages -all input.pdf output` extracts all images to numbered files in their original format. The `-j` flag writes JPEG streams out as .jpg files without recompression; `-png` converts all output to PNG. Install on Ubuntu/Debian: `sudo apt install poppler-utils`.

Adobe Acrobat Pro: Tools → Edit PDF → select image → right-click → Export Image. This exports the embedded image at its original embedded resolution. Available in the paid Pro version.

Python with PyMuPDF: `page.get_images()` returns embedded image data; `doc.extract_image(xref)['image']` gives the raw image bytes at original quality. This is the most programmatic approach for batch processing.

Avoid tools that 'extract' by rendering the PDF page as a screenshot — these produce lower resolution output unless you explicitly set a high DPI, and even then the output quality may differ from the original embedded image.
How to Verify Your Extracted Images Are Full Size
After extracting, always verify that image dimensions match your expectations.

Windows: right-click the extracted image → Properties → Details tab. Look for Width and Height in pixels, and Horizontal/Vertical resolution in DPI. For a photograph that was originally 300 dpi at 10 × 8 inches, you expect 3000 × 2400 pixels.

Mac: right-click → Get Info → More Info section shows dimensions. Or open in Preview and check Tools → Show Inspector → General tab for pixel dimensions and DPI.

Linux: `identify image.jpg` (ImageMagick) shows dimensions, color depth, and format. `exiftool image.jpg` shows embedded DPI metadata.

If dimensions are smaller than expected: the tool used page rendering rather than true extraction, or the original embedded image was lower resolution than you thought. Check the original PDF — open it and zoom to 400% on the image in question. If it looks pixelated at high zoom, the embedded image is low-resolution regardless of what DPI you extract at.

If only some images extracted at full size: PDFs often contain images at mixed resolutions — some high-res, some low-res. This is expected behavior, not a tool failure.
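If you would rather script the check than click through Properties dialogs, the pixel dimensions of a PNG can be read straight from its IHDR header with nothing but the standard library. The helper name is ours; the byte layout comes from the PNG specification:

```python
import struct

def png_dimensions(path):
    """Read width/height directly from a PNG's IHDR chunk (no imaging library needed)."""
    with open(path, "rb") as fh:
        header = fh.read(24)
    if header[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    # Bytes 16-23 of a PNG are the IHDR width and height, big-endian 32-bit ints.
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

For JPEGs, `identify` or `exiftool` (above) remain the simplest options, since JPEG dimensions sit inside variable-position SOF markers rather than a fixed header offset.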
Batch Extraction While Preserving Original Size
For extracting all images from multiple PDFs while preserving original sizes, the command-line approach is most reliable.

Using pdfimages for batch extraction:

```bash
for f in *.pdf; do
  mkdir -p "${f%.pdf}_images"
  pdfimages -all "$f" "${f%.pdf}_images/img"
done
```

This creates a folder for each PDF and extracts all embedded images at their original size.

Using PyMuPDF for Python-based batch extraction:

```python
import fitz, os

for pdf_path in os.listdir('.'):
    if not pdf_path.endswith('.pdf'):
        continue
    doc = fitz.open(pdf_path)
    folder = pdf_path.replace('.pdf', '_images')
    os.makedirs(folder, exist_ok=True)
    for page_num, page in enumerate(doc):
        for img_index, img in enumerate(page.get_images()):
            xref = img[0]
            base = doc.extract_image(xref)
            ext = base['ext']
            with open(f'{folder}/page{page_num+1}_img{img_index+1}.{ext}', 'wb') as f:
                f.write(base['image'])
    doc.close()
```

Both approaches extract images at their original embedded resolution — no rendering, no downsampling.
Frequently Asked Questions
Why are my extracted images from PDF smaller than the originals?
Most likely the tool is rendering the PDF page at screen resolution (72–96 dpi) and saving the screenshot as your 'extracted' image — not actually extracting the embedded image data. Use pdfimages (poppler), PyMuPDF, LazyPDF Extract Images, or Adobe Acrobat's true image extraction to get the original embedded images at their actual resolution.
How can I extract images from a PDF at a specific high resolution?
If you need page rendering at a specific DPI (for charts, diagrams, or mixed-content pages): use pdftoppm (`pdftoppm -jpeg -r 300 input.pdf output`), Ghostscript (`gs -dNOPAUSE -dBATCH -sDEVICE=jpeg -r300 -sOutputFile=page%d.jpg input.pdf`), or PyMuPDF with a custom matrix (`mat = fitz.Matrix(300/72, 300/72)`). For extracting actual embedded image data without rendering, use pdfimages — the output resolution is whatever was embedded in the PDF.
Does LazyPDF Extract Images preserve original image quality?
Yes. LazyPDF's Extract Images tool reads the embedded image streams from the PDF and extracts them in their original format and resolution. For a PDF containing a 2400 × 3600 pixel JPEG at 300 dpi, the extracted file will be 2400 × 3600 pixels. This is true extraction, not page rendering.
Can I extract images from a PDF without them being recompressed?
Yes, with true extraction tools like pdfimages (using the `-j` flag for JPEG passthrough) or PyMuPDF's extract_image function. These read the raw compressed image bytes from the PDF stream without decompression or re-encoding — the extracted JPEG is byte-for-byte identical to what was embedded. Recompression only happens with tools that render the page as a new image.