The Complete Linux PDF Toolkit Guide for 2026
Linux is arguably the best platform for serious PDF work. The combination of powerful command-line tools, excellent scripting capabilities, and increasingly capable browser-based alternatives creates a PDF ecosystem that's flexible, free, and highly capable. Whether you need to process a single PDF or automate a workflow involving thousands of documents, Linux has the right tools.

This complete guide is the reference resource for Linux PDF work in 2026. It covers every major PDF operation (viewing, merging, splitting, compressing, converting, OCR, watermarking, protecting, and signing) with the best tool recommendation for each task, installation commands, practical usage examples, and tips for building automated workflows.

The guide is organized by task type rather than by tool. For each operation, we present the best command-line option for power users, the best GUI option for those who prefer visual interfaces, and the best browser-based option for users who want zero setup. All recommended tools are free and open-source (or free to use).

By the end of this guide, you'll have a complete mental map of the Linux PDF ecosystem and know exactly which tool to reach for regardless of what PDF task you're facing.
Setting Up Your Linux PDF Toolkit: Installation Guide
Start by installing the core command-line PDF tools. On Ubuntu or Debian, the first command below installs what you'll need for most PDF tasks; the remaining steps add viewers, converters, and a quick check:
1. Install core PDF tools: `sudo apt install ghostscript pdftk poppler-utils qpdf ocrmypdf tesseract-ocr tesseract-ocr-eng`
2. Install GUI viewers: `sudo apt install okular evince` (Okular for features, Evince for lightweight viewing).
3. Install LibreOffice for document conversion: `sudo apt install libreoffice` (usually pre-installed on Ubuntu).
4. Install ImageMagick for image processing with PDFs: `sudo apt install imagemagick`
5. Verify all installs: `gs --version && pdftk --version && pdfinfo -v && tesseract --version` (poppler's tools report their version with `-v`).
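The verification chain above stops at the first tool that fails, so it reports only one problem at a time. As a minimal sketch, the helper below (`missing_tools` is a hypothetical name, not part of any package) lists every command that is not on `PATH`:

```bash
# Print the name of every listed command that is not found on PATH.
# missing_tools is an illustrative helper, not a standard utility.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: check the command-line tools this guide relies on.
# missing_tools gs pdftk pdfinfo qpdf ocrmypdf tesseract libreoffice
```

An empty result means everything was found; otherwise each missing command is printed on its own line.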
PDF Manipulation Reference: Every Common Operation
Here is a concise reference for the most common PDF operations on Linux, with ready-to-use commands.

**Merge PDFs:**
- pdftk: `pdftk a.pdf b.pdf c.pdf cat output merged.pdf`
- pdfunite: `pdfunite a.pdf b.pdf merged.pdf`

**Split PDFs:**
- Extract pages: `pdftk input.pdf cat 3-7 output section.pdf`
- Burst all pages: `pdftk input.pdf burst output page_%04d.pdf`
- Separate pages: `pdfseparate input.pdf page_%d.pdf`

**Compress PDFs:**
- Standard: `gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -sColorConversionStrategy=RGB -sOutputFile=out.pdf in.pdf`

**Rotate PDFs:**
- Rotate all pages 90° clockwise: `pdftk input.pdf rotate 1-endeast output rotated.pdf`

**Extract text from PDF:**
- Plain text: `pdftotext input.pdf output.txt`
- With layout preservation: `pdftotext -layout input.pdf output.txt`

**Encrypt PDF:**
- `qpdf --encrypt password owner-pass 256 -- input.pdf encrypted.pdf` (the arguments are the user password, the owner password, and the key length in bits)

**Decrypt PDF:**
- `qpdf --decrypt --password=pass input.pdf decrypted.pdf`

**Convert PDF to images:**
- `pdftoppm -jpeg -r 150 input.pdf output_prefix`

**OCR scanned PDF:**
- `ocrmypdf -l eng --deskew --clean input.pdf searchable.pdf` (`--clean` requires the `unpaper` package to be installed)
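Re-encoding a PDF that is already well optimized can make it larger rather than smaller. The sketch below wraps the Ghostscript compression command from this reference so that the original is replaced only when the result is actually smaller. The function names `keep_smaller` and `compress_pdf` are illustrative, not part of Ghostscript.

```bash
# Replace ORIGINAL with CANDIDATE only if CANDIDATE is strictly smaller;
# otherwise delete CANDIDATE and leave ORIGINAL untouched.
keep_smaller() {
  original=$1
  candidate=$2
  orig_size=$(($(wc -c < "$original")))
  cand_size=$(($(wc -c < "$candidate")))
  if [ "$cand_size" -lt "$orig_size" ]; then
    mv -- "$candidate" "$original"
  else
    rm -f -- "$candidate"
  fi
}

# Compress one PDF in place using the /ebook settings shown in this guide.
compress_pdf() {
  # Temporary file in the same directory, so the final mv is a rename.
  tmp=$(mktemp "${1%.pdf}.XXXXXX") || return 1
  if gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook \
        -sColorConversionStrategy=RGB -sOutputFile="$tmp" "$1"; then
    keep_smaller "$1" "$tmp"
  else
    rm -f -- "$tmp"
    return 1
  fi
}
```

Calling `compress_pdf report.pdf` overwrites `report.pdf` when Ghostscript produces a smaller file, so keep a backup if the original quality matters.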
PDF Format Conversion on Linux: Complete Reference
Format conversion is where LibreOffice becomes central to the Linux PDF toolkit. LibreOffice can convert between PDF and Office formats in `--headless` mode, which runs without opening a window.

- **PDF to DOCX (Word):** `libreoffice --headless --infilter='writer_pdf_import' --convert-to docx input.pdf`
- **PDF to XLSX (Excel):** `libreoffice --headless --infilter='writer_pdf_import' --convert-to xlsx input.pdf` (unreliable: `writer_pdf_import` opens the PDF as a text document, and LibreOffice may report that it has no export filter from that document type to XLSX)
- **PDF to PPTX (PowerPoint):** `libreoffice --headless --infilter='impress_pdf_import' --convert-to pptx input.pdf`
- **DOCX to PDF:** `libreoffice --headless --convert-to pdf document.docx`
- **XLSX to PDF:** `libreoffice --headless --convert-to pdf spreadsheet.xlsx`
- **HTML to PDF:** `libreoffice --headless --convert-to pdf page.html`

For batch conversion, add `--outdir /path/to/output/` to specify where converted files go. On multi-core systems, GNU Parallel can run several conversions at once; because concurrent LibreOffice instances share one user profile by default, parallel jobs may each need their own profile, set with `-env:UserInstallation=file:///path/to/profile`.

Note that PDF-to-Office conversion quality depends heavily on the original PDF's structure. Text-based PDFs (created from Office documents) convert much better than scanned PDFs, which require OCR first.
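A sequential batch conversion with `--outdir` could be sketched as follows. It assumes LibreOffice names each output after the input's base name with the new extension, which is what lets the loop skip files that were already converted. `converted_path` and `convert_all_docx` are hypothetical helper names.

```bash
# Path LibreOffice is expected to write for INPUT converted to EXT in OUTDIR:
# the input's base name with its last extension swapped.
# Usage: converted_path INPUT EXT OUTDIR
converted_path() {
  base=$(basename -- "$1")
  printf '%s/%s.%s\n' "$3" "${base%.*}" "$2"
}

# Convert every .docx in SRC to PDF in OUT, skipping files already converted.
convert_all_docx() {
  src=$1
  out=$2
  mkdir -p -- "$out"
  for f in "$src"/*.docx; do
    [ -e "$f" ] || continue            # the glob matched nothing
    target=$(converted_path "$f" pdf "$out")
    [ -e "$target" ] && continue       # output already exists
    libreoffice --headless --convert-to pdf --outdir "$out" "$f"
  done
}
```

For example, `convert_all_docx ~/Documents/reports ~/Documents/reports-pdf` can be re-run safely after adding new files, since existing outputs are left alone.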
Building Automated PDF Workflows on Linux
The ultimate advantage of Linux for PDF work is automation. These tools compose naturally in shell scripts and can be scheduled, chained, and integrated into larger data processing pipelines.

A complete document processing pipeline might look like this: incoming scanned PDFs are dropped in a watched directory, a script detects new files, runs OCRmyPDF to add text layers, compresses the results with Ghostscript, and stores them in an organized archive. Once set up, the entire pipeline runs unattended.

For watching directories and triggering scripts automatically, use inotifywait from the inotify-tools package: `sudo apt install inotify-tools`. The example watcher reacts to `close_write` rather than `create`, so a file is processed only after it has been fully written, and it writes results to a separate directory so that its own output does not trigger it again:

```bash
# Output goes to a separate directory (placeholder path) to avoid re-triggering.
watch_dir=/path/to/watch
out_dir=/path/to/output
inotifywait -m -e close_write -e moved_to --format '%f' "$watch_dir" |
while IFS= read -r file; do
  if [[ "$file" == *.pdf ]]; then
    ocrmypdf "$watch_dir/$file" "$out_dir/ocr_$file"
  fi
done
```

For scheduled batch processing, cron handles nightly or weekly document runs. For more complex workflow logic, tools like Apache Airflow or simple Python scripts with scheduling can orchestrate multi-step PDF processing pipelines.

For users who need quick, one-off operations without the complexity of CLI tools, LazyPDF in Firefox or Chromium handles individual PDF tasks free in the browser. It complements the command-line toolkit for tasks that don't require automation.
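The OCR, compress, and archive pipeline described in this section could be sketched as a single function. This is an illustration under stated assumptions, not a definitive implementation: the `YYYY-MM` archive layout, the temporary file location, and the names `run` and `process_scan` are all choices made for the example. Setting `DRY_RUN=1` prints the commands instead of executing them, which is useful for checking paths before processing real documents.

```bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# OCR one scanned PDF, compress the result, and file it under ARCHIVE/YYYY-MM.
# The monthly folder layout is an assumption made for this sketch.
process_scan() {
  input=$1
  archive=$2
  name=$(basename -- "$input")
  dest="$archive/$(date +%Y-%m)"
  ocr_tmp="${TMPDIR:-/tmp}/ocr_$name"
  run mkdir -p "$dest" &&
  run ocrmypdf -l eng --deskew "$input" "$ocr_tmp" &&
  run gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook \
         -sOutputFile="$dest/$name" "$ocr_tmp" &&
  run rm -f "$ocr_tmp"
}
```

A watcher loop or a cron job can then call `process_scan /path/to/watch/scan.pdf /path/to/archive` for each new file; the `&&` chain stops at the first failing step, so a failed OCR run never reaches the archive.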
Frequently Asked Questions
What's the minimum set of PDF tools to install on a new Linux system?
At minimum: `sudo apt install ghostscript poppler-utils`. Ghostscript handles compression and conversion; poppler-utils gives you pdftoppm (PDF to images), pdftotext, pdfunite, and pdfseparate. Add pdftk for easier merge/split syntax and ocrmypdf for OCR. These four packages cover virtually all common PDF operations.
Can I use Linux PDF tools on a remote server via SSH?
Yes, all command-line tools (Ghostscript, pdftk, poppler-utils, QPDF, Tesseract, OCRmyPDF, LibreOffice headless) work over SSH on remote Linux servers without a graphical session. LibreOffice requires the `--headless` flag. This makes Linux well suited to server-side PDF processing in web applications and document management systems.
How do I convert a scanned PDF to a searchable, editable document on Linux?
Two-step process: first, run OCR with `ocrmypdf -l eng --deskew input.pdf searchable.pdf` to create a searchable PDF. Then convert to DOCX with LibreOffice: `libreoffice --headless --infilter='writer_pdf_import' --convert-to docx searchable.pdf`. The result is an editable Word document you can open in LibreOffice Writer.
Is there a free alternative to Adobe Acrobat Pro on Linux?
No single free tool perfectly replicates all of Acrobat Pro's features on Linux. However, the combination of Ghostscript (compression/conversion), pdftk (manipulation), OCRmyPDF (OCR), QPDF (encryption), LibreOffice (editing and conversion), and Okular (viewing/annotation) together cover most of what Acrobat Pro does — often better for specific tasks.