You scan a stack of receipts, a signed contract, or an old family record, and the scanner asks what to save it as — PDF, TIFF, or JPEG. Pick wrong and you end up with a 200 MB folder of single-page images nobody can open, or a tidy-looking file that turns to mush when you zoom into the fine print, or an “archived” document that won’t render the same way in ten years. For most scanned documents the answer is PDF — it holds every page in one file, opens on any device, and can carry an invisible text layer that makes the scan searchable. But TIFF and JPEG each win specific jobs. This guide covers which format fits which use case, why “searchable” depends on OCR, and the archival-grade PDF/A standard for records you have to keep.
Quick answer: Use PDF for almost every scanned document — it’s multi-page, universally viewable, compact, and can hold an OCR text layer so the scan is searchable. Use TIFF only for a lossless preservation master or professional print work (large files, needs special software to view). Use JPEG only for a single-page photographic scan where small size beats everything — it’s lossy and a poor choice for text. For long-term archives, save PDF/A (ISO 19005), the self-contained archival subset of PDF. You can merge your scans into one PDF here.
Jump to a section
- The three contenders at a glance
- Why PDF is the default for scanned documents
- “Searchable” means an OCR text layer
- PDF/A: the archival standard (ISO 19005)
- TIFF: the lossless preservation master
- JPEG: small, lossy, and wrong for text
- Decision table: which format for which job
- Turn your scans into one PDF
- FAQ
The three contenders at a glance
All three can store a scanned page, but they were built for different things:
- PDF (Portable Document Format) is a document container. One PDF holds many pages in order, renders identically on phones, browsers, and every OS, and can carry an invisible text layer on top of the page images. It’s the format people expect a scanned document to arrive in.
- TIFF (Tagged Image File Format) is a flexible image container. It can hold multiple pages and is usually stored uncompressed or with lossless compression, which makes it the institutional choice for preservation masters. The trade-off is that files are large, and browsers and many apps can’t display it without dedicated software.
- JPEG is a single, lossy photographic image. It’s small and opens everywhere, but it discards detail to save space and produces visible artifacts around the sharp edges of text. There’s no multi-page JPEG.
The split is structural, not a simple “quality” ranking. The right pick depends on what you’re going to do with the scan — share it, search it, print it, or preserve it.
Why PDF is the default for scanned documents
For the vast majority of scanned documents — contracts, invoices, forms, statements, school records — PDF is the right answer for four reasons at once:
- Multi-page in one file. A 30-page agreement becomes a single PDF, in order, not 30 loose image files to name and re-order. (TIFF can also do multi-page; JPEG cannot.)
- Universal viewing. Every browser, phone, and OS opens a PDF without extra software. You can email it, upload it to a portal, or drop it in a chat and be confident it renders. TIFF generally needs a dedicated viewer.
- Compact. A PDF assembled from page scans is typically far smaller than the equivalent multi-page TIFF, and you can compress it further when needed.
- It can be searchable. A PDF can carry a hidden OCR text layer so you can Ctrl+F through a scan, something a flat TIFF or JPEG image can’t do on its own (see the next section).
That combination — portable, multi-page, small, and optionally searchable — is why government portals, courts, and businesses almost universally ask for scanned documents as PDF.
To get there from a pile of page images, combine them in order with the merge images to PDF tool, which accepts JPG, PNG, TIFF, WebP, HEIC and RAW scans and outputs a single PDF.
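Keeping pages "in order" is easy to get wrong when a scanner names files `scan_1.jpg`, `scan_2.jpg`, ... `scan_10.jpg`, because a plain alphabetical sort puts page 10 before page 2. A minimal sketch of a natural sort you could run before merging, assuming that naming pattern (the file names and the `page_order_key` helper are illustrative, not part of any tool mentioned here):

```python
import re


def page_order_key(name):
    """Split a file name into text and number chunks so 2 sorts before 10."""
    parts = re.split(r"(\d+)", name)
    # Odd positions are digit runs (compared as integers); even positions
    # are text (compared case-insensitively).
    return [int(p) if p.isdigit() else p.lower() for p in parts]


# Hypothetical scanner output, listed the way a plain sort would order it.
pages = ["scan_1.jpg", "scan_10.jpg", "scan_2.jpg", "scan_3.jpg"]
ordered = sorted(pages, key=page_order_key)
print(ordered)
```

Zero-padding the page numbers at scan time (`scan_001.jpg`) avoids the problem entirely, if your scanner offers that option.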
“Searchable” means an OCR text layer
A common misconception is that any PDF of a scanned page is searchable. It isn’t. A scan is a picture of text — the computer sees pixels, not words. To make it searchable, software has to run OCR (Optical Character Recognition), which analyzes the shapes of the letters in the image and writes an invisible text layer behind the page image. The visible scan looks identical; underneath, there’s now selectable, searchable text (OCRmyPDF, Litera support).
This matters for picking a format:
- PDF can hold that OCR text layer — it’s the only one of the three formats designed to carry searchable text over page images. That’s a big part of why PDF dominates document workflows.
- TIFF and JPEG store the image only. They have no text-layer concept, so a TIFF or JPEG scan is never searchable by itself.
One honest caveat about turning images into PDF: assembling scans into a PDF is not the same as OCR. xconvert’s merge-images-to-PDF tool embeds your images as images — it produces a clean, shareable, multi-page PDF, but the text inside those images is not automatically made selectable. If you specifically need a searchable PDF, run the file through a dedicated OCR step (Adobe Acrobat, OCRmyPDF, or a similar OCR service) to add the text layer. For most “I just need to send this scan” cases, an image-based PDF is exactly what the recipient expects; OCR is an extra step you add only when search matters.
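As a rough illustration of that separate OCR step, the sketch below builds and runs an OCRmyPDF command line from Python. It assumes the `ocrmypdf` command-line tool is installed and on your PATH; the file names and the two helper functions are hypothetical, and the options shown (`--language`, `--skip-text`, `--output-type`) are only one reasonable combination.

```python
import shutil
import subprocess


def ocr_command(src, dst, language="eng", archival=True):
    """Build an ocrmypdf command that adds a hidden text layer to src."""
    cmd = ["ocrmypdf", "--language", language]
    # Leave pages alone if they already contain text.
    cmd.append("--skip-text")
    # "pdfa" asks for an archival PDF/A file, "pdf" for a regular PDF.
    cmd += ["--output-type", "pdfa" if archival else "pdf"]
    return cmd + [src, dst]


def add_text_layer(src, dst, **options):
    """Run OCR if ocrmypdf is available; return False when it is not."""
    if shutil.which("ocrmypdf") is None:
        return False
    subprocess.run(ocr_command(src, dst, **options), check=True)
    return True


# Hypothetical file names, for illustration only.
print(" ".join(ocr_command("merged-scan.pdf", "merged-scan-searchable.pdf")))
```

The page images in the output look the same as in the input; the difference is the invisible text written behind them.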
PDF/A: the archival standard (ISO 19005)
If you’re keeping a scanned document for the long term — legal records, medical files, anything you might need to open and trust years from now — the right target is PDF/A, the archival subset of PDF defined by ISO 19005. PDF/A-1 (ISO 19005-1) was published on September 28, 2005 (PDF/A — Wikipedia; Library of Congress).
PDF/A constrains regular PDF to guarantee the file renders the same way far into the future. Its core rules:
- Self-contained. Everything needed to display the document — text, images, embedded fonts, and color information — lives inside the file. No external dependencies that could go missing.
- No encryption. PDF/A forbids encryption, so a future reader can’t be locked out of an archival record.
- Defined color. Color must be specified with calibrated/ICC profiles or an embedded output intent, so the document looks consistent across devices and years.
Because PDF/A is still PDF, it remains multi-page and can carry an OCR text layer, so an archived scan can be both faithfully preserved and searchable. Government and regulated archives in many countries mandate or recommend PDF/A for permanent electronic records (Library of Congress, Sustainability of Digital Formats).
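A PDF/A file announces its conformance level in its embedded XMP metadata, using the `pdfaid:part` and `pdfaid:conformance` properties. The sketch below is a quick heuristic for reading that declaration from raw bytes, assuming the XMP packet is stored uncompressed as plain UTF-8 text (an assumption that will not hold for every file). It reports only what the file claims; confirming real conformance takes a dedicated validator such as veraPDF. The helper names and sample bytes are illustrative.

```python
import re

# Matches both XMP spellings: <pdfaid:part>2</pdfaid:part> and pdfaid:part="2".
PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
CONFORMANCE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def declared_pdfa_level(pdf_bytes):
    """Return e.g. 'PDF/A-2b' if the file declares PDF/A, else None."""
    part = PART.search(pdf_bytes)
    if part is None:
        return None
    level = part.group(1).decode()
    conformance = CONFORMANCE.search(pdf_bytes)
    if conformance is not None:
        level += conformance.group(1).decode().lower()
    return "PDF/A-" + level


def mentions_encryption(pdf_bytes):
    """Crude check: an /Encrypt entry conflicts with the no-encryption rule."""
    return b"/Encrypt" in pdf_bytes


# Made-up metadata fragment standing in for a real file's contents.
sample = b"<pdfaid:part>2</pdfaid:part><pdfaid:conformance>B</pdfaid:conformance>"
print(declared_pdfa_level(sample))
```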
TIFF: the lossless preservation master
TIFF still has a clear, narrow role: the lossless preservation master. Libraries, archives, and digitization projects keep an uncompressed or losslessly compressed TIFF of a scan as the authoritative “master” copy because it preserves full bit depth and color fidelity with zero compression loss. TIFF can also hold multiple pages in one file and natively supports CMYK for professional print.
The downsides are exactly why it’s not the everyday sharing format:
- Large files. A lossless or high-bit-depth TIFF is much bigger than a PDF or JPEG of the same scan.
- Limited viewing. Browsers generally don’t display TIFF inline — they download it — and exotic TIFF compression/color variants don’t open in every app, so recipients may need special software.
- Not searchable on its own. Like JPEG, a TIFF is an image; it carries no text layer.
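To put a number on "large", the raw pixel data of a scan is simply width × height × bits per pixel. The sketch below estimates that figure, which approximates an uncompressed TIFF; it ignores file headers and any lossless compression, and the page size and resolution are example values rather than recommendations.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate size of the uncompressed pixel data for one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# US Letter (8.5 x 11 in) at 300 dpi in three common scan modes.
color = raw_scan_bytes(8.5, 11, 300, 24)    # 24-bit color
gray = raw_scan_bytes(8.5, 11, 300, 8)      # 8-bit grayscale
bitonal = raw_scan_bytes(8.5, 11, 300, 1)   # 1-bit black and white

for label, size in [("color", color), ("gray", gray), ("bitonal", bitonal)]:
    print(f"{label:8s} {size / 1_000_000:6.1f} MB per page")

# A 30-page agreement scanned in full color, before any compression.
print(f"30 color pages: {30 * color / 1_000_000:.0f} MB")
```

At these settings a single color page is about 25 MB of raw data, which is why preservation masters are kept separately from the copy you email.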
A common professional pattern is to keep a TIFF preservation master and produce a PDF/A access copy for everyday viewing, search, and sharing. For a deeper comparison of TIFF against another lossless format for scans, see TIFF vs PNG for Scanned Documents.
JPEG: small, lossy, and wrong for text
JPEG (ISO/IEC 10918-1, standardized in 1992) is a lossy compression format: it discards image detail to make files small, and re-saving a JPEG repeatedly degrades it further (JPEG — Wikipedia). That trade-off is fine for photographs, where small losses are invisible, but it’s a poor fit for scanned documents:
- Artifacts on text. JPEG is “not well suited for line drawings and other textual or iconic graphics, where the sharp contrasts between adjacent pixels can cause noticeable artifacts.” Crisp black text on white is exactly the high-contrast edge case JPEG handles worst — fine print can blur or ring.
- Single image only. There’s no multi-page JPEG, so a document means a folder of separate files.
- No text layer. A JPEG can’t be searchable on its own.
JPEG earns a place only for a single-page photographic scan — say, a scanned photo or a one-page color flyer — where small file size matters more than perfect text fidelity. For an actual document, convert those JPEGs into a PDF instead. If a JPEG-heavy PDF ends up too large, compress the PDF by downsampling its images while keeping the text sharp.
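The artifact problem comes from how lossy transform coding treats a hard edge. The sketch below is a deliberately simplified model, not the real JPEG pipeline: it applies a one-dimensional 8-sample DCT with a single uniform quantization step, whereas JPEG uses an 8×8 two-dimensional DCT with per-frequency quantization tables. It is still enough to show that a crisp black-to-white step does not survive coarse quantization.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scaling factor for coefficient k."""
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(samples):
    n = len(samples)
    return [_scale(k, n) * sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, s in enumerate(samples))
            for k in range(n)]


def idct(coeffs):
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]


def lossy_roundtrip(samples, step):
    """Transform, round each coefficient to a multiple of step, transform back."""
    quantized = [round(c / step) * step for c in dct(samples)]
    return [round(v) for v in idct(quantized)]


def max_error(original, decoded):
    return max(abs(a - b) for a, b in zip(original, decoded))


# One row of pixels crossing the edge of a letter: black (0) to white (255).
edge = [0, 0, 0, 0, 255, 255, 255, 255]
fine = lossy_roundtrip(edge, step=1)      # gentle quantization
coarse = lossy_roundtrip(edge, step=100)  # aggressive quantization
print("fine  ", fine, "max error", max_error(edge, fine))
print("coarse", coarse, "max error", max_error(edge, coarse))
```

With the coarse step, the decoded row is no longer a clean two-level edge, which is the one-dimensional analogue of the ringing and blur you see around small text in a heavily compressed JPEG.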
Decision table: which format for which job
| Job | Best format | Why |
|---|---|---|
| Scanned document to share or upload | PDF | Multi-page, universal viewer, compact, can hold OCR text |
| Multi-page document you need to keep together | PDF | One file, in order, opens everywhere |
| Searchable scanned document | PDF (with OCR) | Only format that carries an invisible text layer behind the image |
| Long-term / legal / compliance archive | PDF/A | ISO 19005 — self-contained, embedded fonts, no encryption |
| Lossless preservation master of a scan | TIFF | Zero compression loss, full bit depth (large, needs viewer) |
| Professional CMYK print of a scan | TIFF | Native CMYK + high bit depth |
| Single-page photographic scan, smallest size | JPEG | Small and universal — but lossy, bad for text |
| Lossless single-image scan for the web | PNG / TIFF | See TIFF vs PNG for scans |
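
The same decisions can be written as a small function. This is only a sketch of the table's logic: the parameter names are invented for illustration, the order of the checks is one reasonable reading of the priorities, and the lossless-for-web row is left out for brevity.

```python
def pick_format(*, pages=1, mostly_text=True, searchable=False,
                long_term_archive=False, preservation_master=False,
                cmyk_print=False):
    """Suggest a scan format following the decision table above."""
    if preservation_master or cmyk_print:
        return "TIFF"            # lossless master or professional print
    if long_term_archive:
        return "PDF/A"           # self-contained archival subset of PDF
    if searchable:
        return "PDF (with OCR)"  # needs a separate OCR step
    if pages == 1 and not mostly_text:
        return "JPEG"            # single photographic scan, smallest size
    return "PDF"                 # the default for documents


print(pick_format(pages=30))                        # a contract to share
print(pick_format(long_term_archive=True))          # records you must keep
print(pick_format(pages=1, mostly_text=False))      # a scanned photo
```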
Turn your scans into one PDF
Once you’ve scanned your pages — as JPG, PNG, TIFF, HEIC, or RAW — the practical move is to combine them into a single PDF:
- Merge images to PDF — drop in your page images in order and get one PDF. You can set paper size (A4, US Letter, Legal, and more), margins, orientation, and image quality before merging.
- Compress PDF — if the resulting PDF is too large to email or upload, shrink it. Presets from Screen (smallest) to Prepress (highest quality) downsample embedded images while keeping text sharp.
Your files upload over an encrypted connection, the conversion runs on our servers, and the files are deleted automatically a few hours later — nothing lingers. The result is one tidy PDF anyone can open. Note that merging embeds your scans as images; if you need the text inside them to be selectable, add a separate OCR step afterward to create a searchable PDF.
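To see why merging alone does not make text selectable, it helps to look at what an image-only PDF contains. The sketch below is not how xconvert's tool works internally; it is a minimal standard-library illustration, under simplifying assumptions (baseline or progressive JPEG input, grayscale or RGB only, a fixed scan resolution), of wrapping JPEG scans in PDF pages. Each page's content stream holds a single draw-image instruction and no text operators at all.

```python
import struct


def jpeg_info(data):
    """Return (width, height, components) from a JPEG's start-of-frame segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG")
    i = 2
    while i + 9 < len(data):
        marker = data[i + 1]
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        # SOF markers are 0xC0-0xCF, except 0xC4, 0xC8 and 0xCC.
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height, data[i + 9]
        i += 2 + length
    raise ValueError("no start-of-frame segment found")


def images_to_pdf(jpegs, dpi=300):
    """Build a PDF with one page per JPEG, each page showing only its image."""
    objs = {1: b"<< /Type /Catalog /Pages 2 0 R >>"}
    kids = []
    for k, data in enumerate(jpegs):
        img, content, page = 3 + 3 * k, 4 + 3 * k, 5 + 3 * k
        w, h, comps = jpeg_info(data)
        pw, ph = w * 72 / dpi, h * 72 / dpi  # page size in points
        space = {1: "DeviceGray", 3: "DeviceRGB"}[comps]
        head = (f"<< /Type /XObject /Subtype /Image /Width {w} /Height {h} "
                f"/ColorSpace /{space} /BitsPerComponent 8 "
                f"/Filter /DCTDecode /Length {len(data)} >>\nstream\n")
        objs[img] = head.encode("ascii") + data + b"\nendstream"
        # The whole page: scale the image to the page and paint it. No text.
        draw = f"q {pw:.2f} 0 0 {ph:.2f} 0 0 cm /Im0 Do Q".encode("ascii")
        objs[content] = (b"<< /Length %d >>\nstream\n" % len(draw)
                         + draw + b"\nendstream")
        objs[page] = (f"<< /Type /Page /Parent 2 0 R "
                      f"/MediaBox [0 0 {pw:.2f} {ph:.2f}] "
                      f"/Resources << /XObject << /Im0 {img} 0 R >> >> "
                      f"/Contents {content} 0 R >>").encode("ascii")
        kids.append(f"{page} 0 R")
    objs[2] = (f"<< /Type /Pages /Count {len(kids)} "
               f"/Kids [{' '.join(kids)}] >>").encode("ascii")

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num in sorted(objs):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + objs[num] + b"\nendobj\n"
    xref = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(objs) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objs) + 1, xref))
    return bytes(out)
```

Calling `images_to_pdf` with the bytes of each page scan, in order, returns the bytes of a multi-page PDF. An OCR tool run afterward is what adds the text that search and copy rely on.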
For the related question of which lossless image format to scan to before you build the PDF, see the companion article TIFF vs PNG for Scanned Documents.
FAQ
What is the best file format for scanned documents?
For almost every scanned document, PDF — it holds multiple pages in one file, opens on any device or browser, is far smaller than a multi-page TIFF, and can carry an OCR text layer to become searchable. Use TIFF only when you need a lossless preservation master or CMYK print file, and JPEG only for a single-page photographic scan where small size beats text fidelity. For long-term records, use PDF/A (ISO 19005), the archival subset of PDF.
Is a scanned PDF automatically searchable?
No. A scan is a picture of text, so by default the computer sees pixels, not words. A PDF only becomes searchable when OCR software analyzes the page image and adds an invisible text layer behind it. Many scanning apps and OCR tools (Adobe Acrobat, OCRmyPDF, and others) do this; simply assembling images into a PDF does not — the text stays part of the image until OCR is run.
Does xconvert’s merge-to-PDF tool make the text searchable (OCR)?
No. The merge images to PDF tool embeds your scans as images and produces a clean, multi-page, shareable PDF — but it does not run OCR, so the text inside the images is not automatically selectable. If you need a searchable PDF, run the merged file through a dedicated OCR step afterward to add the text layer.
What is PDF/A and when should I use it?
PDF/A is the archival subset of PDF defined by ISO 19005 (PDF/A-1 published September 28, 2005). It requires the file to be self-contained — embedded fonts, defined color, no external dependencies — and prohibits encryption, so the document renders the same way far into the future. Use it for documents you must keep long-term: legal, medical, financial, or compliance records. Many government archives mandate or recommend PDF/A for permanent records.
Should I scan documents to PDF or TIFF?
For sharing, searching, and everyday use, PDF — it’s portable, multi-page, compact, and can be made searchable. Reach for TIFF only when you need a lossless preservation master (full image fidelity for archiving) or a CMYK print file. A common professional pattern is to keep a TIFF master and a PDF/A access copy. Note TIFF files are large and usually need a dedicated viewer.
Why is JPEG a bad choice for scanned text?
JPEG uses lossy compression and was designed for photographs, not text. The sharp black-on-white edges of printed text are exactly where JPEG produces visible artifacts — ringing and blur around letters — and re-saving degrades it further. JPEG also can’t hold multiple pages or a searchable text layer. For documents, convert your JPEGs into a single PDF instead.
How do I make a scanned PDF smaller?
Use the compress PDF tool, which downsamples the embedded page images while keeping text sharp. Pick a preset from Screen (smallest file) through Ebook, Default, Printer, to Prepress (highest quality), or adjust the image-quality slider. Screen or Ebook is usually plenty for emailing or uploading a scanned document to a portal.
Sources
Last verified 2026-06-18.
- PDF/A — Wikipedia — PDF/A defined by ISO 19005; PDF/A-1 published September 28, 2005; embedded fonts required; encryption prohibited; OCR text layer in PDF/A-2/3.
- PDF/A Family — Library of Congress, Sustainability of Digital Formats — PDF/A as the long-term preservation standard; self-containment; archival mandate.
- OCRmyPDF — GitHub — OCR adds an invisible, searchable text layer behind the scanned page image.
- How an image PDF becomes a text searchable PDF — Litera — searchable PDF = original image plus invisible OCR text layer.
- JPEG — Wikipedia — JPEG is lossy (ISO/IEC 10918-1, 1992); produces artifacts on text/line art; single-image format.
- TIFF — Wikipedia — TIFF is lossless, multi-page capable, native CMYK; institutional preservation-master format.
