Can I run OCR without my file leaving my device?

Yes. Pick "Process in your browser" above the upload zone to switch on browser-mode OCR. The file is read locally, rendered to a canvas with PDF.js, recognised by Tesseract.js, and the result is delivered as a download, all without a single network request during processing. The first use downloads about 8 MB of OCR model and library code, which is cached for later sessions. Browser mode currently supports English only and is capped at 10 MB or 30 pages per file; for other languages or longer documents, pick server mode, which uses a larger model. You can verify the privacy claim yourself: open the Network tab in DevTools and you'll see zero outbound requests while processing runs.
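The browser-mode limits above amount to a simple eligibility rule. This is a hypothetical sketch, not the site's code: the names are invented for illustration, "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, and the language is assumed to arrive as a Tesseract-style code such as "eng".

```typescript
// Browser-mode limits as described in the FAQ. Treating "10 MB" as
// 10 MiB (10 * 1024 * 1024 bytes) is an assumption.
const BROWSER_MAX_BYTES = 10 * 1024 * 1024;
const BROWSER_MAX_PAGES = 30;

// Hypothetical description of an OCR job.
interface OcrJob {
  sizeBytes: number;
  pageCount: number;
  language: string; // assumed Tesseract-style code, e.g. "eng"
}

// Returns every reason the job can't run in browser mode; empty means eligible.
function browserModeBlockers(job: OcrJob): string[] {
  const reasons: string[] = [];
  if (job.language !== "eng") reasons.push("browser mode supports English only");
  if (job.sizeBytes > BROWSER_MAX_BYTES) reasons.push("file is larger than 10 MB");
  if (job.pageCount > BROWSER_MAX_PAGES) reasons.push("file has more than 30 pages");
  return reasons;
}

console.log(browserModeBlockers({ sizeBytes: 4_000_000, pageCount: 12, language: "eng" })); // []
console.log(browserModeBlockers({ sizeBytes: 4_000_000, pageCount: 45, language: "fra" }));
```

Returning reasons rather than silently switching modes is the point of the design: a user who asked for local processing should be told why it isn't possible, not have the file uploaded anyway.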

How accurate is the OCR?

On clean, modern, well-lit scans of Latin-script text: typically 99%+ accurate. On phone photos taken with reasonable lighting: 97–99%. On heavily skewed, low-contrast, or stained scans: it varies. Numbers and short fields are very reliable across all conditions; running paragraphs are where errors accumulate.
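To see why running paragraphs collect errors while short fields survive, here is a small arithmetic sketch. It assumes the percentages are character accuracy and that errors strike characters independently; both are simplifying assumptions, not statements about how the tool measures accuracy, and the 2,000-character page is a hypothetical example.

```typescript
// Assumption: "accuracy" = fraction of characters recognised correctly,
// with errors occurring independently per character (a simplification).

// Expected number of wrong characters in a block of text.
function expectedCharErrors(charCount: number, accuracy: number): number {
  return Math.round(charCount * (1 - accuracy));
}

// Probability that every character in a block is correct.
function probAllCorrect(charCount: number, accuracy: number): number {
  return Math.pow(accuracy, charCount);
}

// Hypothetical 2,000-character page of running text.
console.log(expectedCharErrors(2000, 0.99)); // 20
console.log(expectedCharErrors(2000, 0.97)); // 60

// A 10-character field (e.g. an invoice total) versus a 500-character paragraph.
console.log(probAllCorrect(10, 0.99).toFixed(2)); // 0.90
console.log(probAllCorrect(500, 0.99).toFixed(3)); // 0.007
```

Under these assumptions, even 99% accuracy leaves about 20 wrong characters on a full page, while a short field comes out fully correct roughly nine times in ten.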

Which languages are supported?

In server mode, English, French, Spanish, German, Arabic, and Chinese are explicitly supported with optimised language models (browser mode is English only). Auto-detect picks the right one per page when each page has a single dominant language. Other Latin-script languages (Italian, Portuguese, Dutch) also work via auto-detect, with slightly lower accuracy.

Does the PDF look different after OCR?

No. OCR overlays an invisible text layer on top of the original page images, so the visual appearance is identical. The difference is internal: Cmd-F (Ctrl-F on Windows) now finds text, copy-paste works, and screen readers can read the document.
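One common way to build such a layer, shown here as a general sketch rather than this tool's implementation, is PDF text rendering mode 3 (set with the `Tr` operator), which positions glyphs for search and selection without painting them. The `OcrWord` shape, the `/F1` font resource name, and ASCII-only text are assumptions for illustration; a production layer would also stretch each word to its measured width on the page image.

```typescript
// Hypothetical shape of one recognised word, in PDF user-space units
// (origin at the bottom-left of the page).
interface OcrWord {
  text: string;
  x: number;
  y: number;
  fontSize: number;
}

// Escape a string as a PDF literal string: backslash and parentheses
// must be preceded by a backslash. Assumes ASCII text.
function pdfLiteral(text: string): string {
  return "(" + text.replace(/[\\()]/g, (c) => "\\" + c) + ")";
}

// Build content-stream operators that place each word invisibly.
// "3 Tr" selects text rendering mode 3: neither fill nor stroke.
// Assumes the page's resources define a font named /F1.
function invisibleTextLayer(words: OcrWord[]): string {
  const ops: string[] = ["BT", "3 Tr"];
  for (const w of words) {
    ops.push(`/F1 ${w.fontSize} Tf`);
    ops.push(`1 0 0 1 ${w.x} ${w.y} Tm`); // absolute position via the text matrix
    ops.push(`${pdfLiteral(w.text)} Tj`);
  }
  ops.push("ET");
  return ops.join("\n");
}

console.log(invisibleTextLayer([{ text: "Total (USD)", x: 72, y: 700, fontSize: 11 }]));
```

Because the glyphs exist in the content stream but are never painted, viewers can search, select, and read them aloud while the page image underneath stays unchanged.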

Can I OCR a multi-page PDF?

Yes. In server mode there's no hard page cap, only the upload-size cap on your plan (25 MB free, 500 MB Pro); browser mode is limited to 10 MB or 30 pages per file. Processing time scales with page count: budget about 1–3 seconds per page.
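The per-page figure turns into a quick estimate. This sketch simply multiplies the 1–3 seconds-per-page range quoted above by the page count; real timings depend on scan quality and load, and the helper name is hypothetical.

```typescript
// Per-page timing range quoted in the FAQ (approximate).
const SECONDS_PER_PAGE_MIN = 1;
const SECONDS_PER_PAGE_MAX = 3;

// Hypothetical helper: estimated [fastest, slowest] processing time in seconds.
function estimateOcrSeconds(pageCount: number): [number, number] {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return [pageCount * SECONDS_PER_PAGE_MIN, pageCount * SECONDS_PER_PAGE_MAX];
}

const [fastest, slowest] = estimateOcrSeconds(30);
console.log(`30 pages: about ${fastest}-${slowest} s`); // 30 pages: about 30-90 s
```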

Will OCR work on handwritten notes?

Partially. Block-printed handwriting from a steady hand is recognised with moderate accuracy. Cursive and casual notes are less reliable; typed material works best. Treat OCR output from handwritten content as a rough transcript that needs proofreading.

Does OCR change my original?

No. Your uploaded file is unchanged; the OCR output is a new PDF you download separately. In server mode, both are removed from our servers within one hour. In browser mode, neither ever reaches our servers.

OCR PDF | pdfrun.io

Why this works

Turn a scanned PDF — phone photos of a contract, an office-scanner dump — into a searchable, copy-pasteable, screen-reader-friendly document. Auto-detects English, French, Spanish, German, Arabic and Chinese.

In browser mode, files never leave your device: OCR runs entirely in your browser using PDF.js and Tesseract.js. Open your browser's Network tab while it runs and you'll see zero outbound requests during processing, which is the verifiable basis for the claim. Server processing remains the default, and the right pick for languages other than English and for files over 10 MB or 30 pages.

A scanned PDF is technically an image trapped inside a PDF wrapper. You can see the text but your computer can't: Cmd-F finds nothing, screen readers read nothing, search indexes ignore the content. OCR (optical character recognition) reads the image with a vision model, transcribes it to actual text, and stamps an invisible text layer over the page images. The PDF still looks identical when you open it — but now you can search, select, copy, paste, and have it read aloud.

This tool is the right call for: contracts and receipts you scanned with your phone, statements from a bank that only emails image-based PDFs, archival material from before document workflows went digital, anything coming out of a flatbed or sheet-fed scanner.

Language matters. The default Auto-detect handles a single dominant language per page; when you know the language (English, French, Spanish, German, Arabic, Chinese), select it explicitly for slightly higher accuracy on mixed-language or marginal-quality scans. For very poor scans (low contrast, skewed, dirty originals), consider rescanning at higher contrast: OCR accuracy is bottlenecked more by input quality than by the model.

Accuracy on clean modern scans is typically 99%+ for Latin scripts, lower for stylised fonts, and noticeably lower for handwriting. Numbers in tables are reliably captured; multi-column layouts and footnotes can occasionally come out in the wrong reading order, though the text within each column is usually still recognised correctly.
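The interleaving comes from reading order rather than from recognition itself. This sketch is a simplified general heuristic, not this tool's actual layout analysis: it contrasts a naive top-to-bottom order with a column-aware one. The `OcrLine` shape, the coordinates, and a known column boundary `splitX` are all assumptions.

```typescript
// Hypothetical OCR output line with its top-left position in image
// coordinates (y grows downward).
interface OcrLine {
  text: string;
  x: number;
  y: number;
}

// Naive reading order: top to bottom across the whole page, then left to right.
// On a two-column page this alternates between the columns.
function naiveOrder(lines: OcrLine[]): string[] {
  return [...lines].sort((a, b) => a.y - b.y || a.x - b.x).map((l) => l.text);
}

// Column-aware order: read everything left of splitX, then everything right of it.
// Finding splitX (the gutter between columns) is the hard part; it is assumed known here.
function columnOrder(lines: OcrLine[], splitX: number): string[] {
  const left = lines.filter((l) => l.x < splitX);
  const right = lines.filter((l) => l.x >= splitX);
  return [...naiveOrder(left), ...naiveOrder(right)];
}

const page: OcrLine[] = [
  { text: "Left 1", x: 50, y: 100 },
  { text: "Right 1", x: 350, y: 100 },
  { text: "Left 2", x: 50, y: 120 },
  { text: "Right 2", x: 350, y: 120 },
];
console.log(naiveOrder(page).join(" | ")); // Left 1 | Right 1 | Left 2 | Right 2
console.log(columnOrder(page, 300).join(" | ")); // Left 1 | Left 2 | Right 1 | Right 2
```

Every line's text is identical in both outputs; only the order differs, which is why the words inside each column stay correct even when the page reads out of sequence.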

In the default server mode, files are processed on our servers and removed within one hour. No watermark.

Privacy note: an in-browser OCR mode is available as an opt-in alternative to server processing. When you pick "Process in your browser" above the upload zone, OCR runs entirely locally using Tesseract.js, and your file never reaches our servers. The first use downloads about 8 MB of OCR model and library code, cached after that. Browser mode currently supports English only and is capped at 10 MB or 30 pages per file; for other languages, marginal-quality scans where accuracy matters most, or longer documents, server mode uses a larger model and remains the right pick. Browser mode is opt-in because of that download: privacy is an explicit choice rather than an 8 MB download imposed on every visitor.

How it works

Upload your scanned PDF

Drop your file into the upload box. Both single-image PDFs and multi-page scans work.

Pick the document language

Auto-detect is fine for most jobs. Pick the specific language (English, French, Spanish, German, Arabic, Chinese) for marginal-quality scans where you know the language.

Run OCR

Press Run OCR. Processing takes roughly 1–3 seconds per page, so a 30-page scanned contract finishes in about 30–90 seconds.

Download the searchable PDF

The output looks identical to the input but is now searchable, selectable, and screen-reader friendly. In server mode, files are auto-deleted within one hour.
