OCR PDF
Make scanned PDFs searchable and selectable.
Supports PDF files up to 100 MB.
How to use
- 1. Drop or click to upload your file
- 2. Adjust options if shown
- 3. Click Run Tool
- 4. Download your result instantly
Files are processed securely and permanently deleted within 1 hour. We never store, read, or share your documents.
Why this works
Turn a scanned PDF — phone photos of a contract, an office-scanner dump — into a searchable, copy-pasteable, screen-reader-friendly document. Auto-detects English, French, Spanish, German, Arabic and Chinese.
A scanned PDF is technically an image trapped inside a PDF wrapper. You can see the text but your computer can't: Cmd-F finds nothing, screen readers read nothing, search indexes ignore the content. OCR (optical character recognition) reads the image with a vision model, transcribes it to actual text, and stamps an invisible text layer over the page images. The PDF still looks identical when you open it — but now you can search, select, copy, paste, and have it read aloud.
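To make the "invisible text layer" concrete, here is a minimal sketch of how such a layer can be written into a PDF page's content stream. It relies on the PDF text rendering mode operator (`3 Tr`, meaning neither fill nor stroke), which is a common way for OCR tools to write hidden text; whether this tool uses exactly this mechanism is an assumption. The function names, the font resource name `/F1`, and the sample coordinates are hypothetical, and the sketch only covers text that fits a simple single-byte font encoding.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def invisible_text_ops(words, font="/F1"):
    """Build content-stream operators that place each word invisibly.

    `words` is a list of (text, x, y, font_size) tuples in PDF points,
    with the origin at the bottom-left corner of the page.
    `font` is a hypothetical font resource name defined elsewhere in the PDF.
    """
    lines = ["BT", "3 Tr"]  # begin text object; rendering mode 3 = invisible
    for text, x, y, size in words:
        lines.append(f"{font} {size} Tf")                # select font and size
        lines.append(f"1 0 0 1 {x} {y} Tm")              # move to the word's position
        lines.append(f"({escape_pdf_string(text)}) Tj")  # emit the recognised text
    lines.append("ET")  # end text object
    return "\n".join(lines)


if __name__ == "__main__":
    print(invisible_text_ops([("Invoice", 72, 700, 11), ("Total (USD)", 72, 680, 11)]))
```

A viewer paints the scanned page image as usual and then processes these operators without drawing anything, so the page looks unchanged while search and text selection pick up the hidden words at the positions given.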
This tool is the right call for contracts and receipts you scanned with your phone, statements from a bank that only emails image-based PDFs, archival material from before document workflows went digital, and anything coming out of a flatbed or sheet-fed scanner.
Language matters. The default Auto-detect handles a single dominant language per page; switch to the specific language when you know it (English, French, Spanish, German, Arabic, Chinese) for slightly higher accuracy on mixed or marginal-quality scans. For very poor scans (low contrast, skewed, dirty originals) consider scanning again at higher contrast — OCR accuracy is bottlenecked by input quality more than by the model.
Accuracy on clean modern scans is typically 99%+ for Latin scripts. It is lower for stylised fonts and considerably lower for handwriting. Numbers in tables are reliably captured; multi-column layouts and footnotes can occasionally be interleaved in the wrong order, but the text inside each column remains correct.
Files are processed on our servers and removed within one hour. No watermark.
Privacy note: an in-browser OCR mode is available as an opt-in alternative to server processing. When you pick "Process in your browser" above the upload zone, OCR runs entirely locally using Tesseract.js, and your file never reaches our servers. First use downloads about 8 MB of OCR model and library code, which is cached after that. Browser mode currently supports English only and is capped at 10 MB or 30 pages per file; for other languages, longer documents, or marginal-quality scans where accuracy matters most, server mode uses a larger model and remains the right pick. Browser mode is opt-in rather than the default because of the model-download cost: privacy by explicit choice rather than an 8 MB download imposed on every visitor.
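The browser-mode limits above can be summarised as a small decision rule. The sketch below is illustrative only: the function and constant names are hypothetical, and it reads "10 MB or 30 pages" as meaning that exceeding either limit rules out browser mode, which is an interpretation rather than something the page spells out.

```python
BROWSER_MAX_MB = 10               # stated browser-mode size cap
BROWSER_MAX_PAGES = 30            # stated browser-mode page cap
BROWSER_LANGUAGES = {"english"}   # browser mode is English only for now


def suggest_mode(language: str, size_mb: float, pages: int, wants_local: bool) -> str:
    """Return "browser" only when the user opts in and the file fits every limit."""
    fits = (
        language.lower() in BROWSER_LANGUAGES
        and size_mb <= BROWSER_MAX_MB
        and pages <= BROWSER_MAX_PAGES
    )
    # Server mode is the fallback: it is the default and covers the other cases.
    return "browser" if wants_local and fits else "server"


if __name__ == "__main__":
    print(suggest_mode("English", 4.2, 12, wants_local=True))
    print(suggest_mode("French", 4.2, 12, wants_local=True))
```

Note that the rule never chooses browser mode unless `wants_local` is set, mirroring the opt-in behaviour described above.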
How it works
- 1. Upload your scanned PDF: Drop your file into the upload box. Both single-image PDFs and multi-page scans work.
- 2. Pick the document language: Auto-detect is fine for most jobs. Pick the specific language (English, French, Spanish, German, Arabic, Chinese) for marginal-quality scans where you know the language.
- 3. Run OCR: Press Run OCR. Processing takes roughly 1–3 seconds per page, so a 30-page scanned contract finishes in about 30 to 90 seconds.
- 4. Download the searchable PDF: The output looks identical to the input but is now searchable, selectable, and screen-reader friendly. Files are auto-deleted within one hour.
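For planning larger jobs, the per-page figure translates into a simple range. This is a back-of-the-envelope sketch, assuming processing time scales linearly with page count at 1 to 3 seconds per page; the function name is hypothetical, and real timings will vary with scan quality and server load.

```python
def estimate_seconds(pages: int, low: float = 1.0, high: float = 3.0):
    """Return a (best case, worst case) processing-time estimate in seconds."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * low, pages * high


if __name__ == "__main__":
    best, worst = estimate_seconds(30)
    print(f"30 pages: between {best:.0f} and {worst:.0f} seconds")
```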
Real-world uses
Researchers
Making scanned archival material searchable so quotes can be found by Cmd-F instead of skimming page by page.
Bookkeepers
Receipts photographed with a phone need to be searchable for expense audits — OCR before filing.
Lawyers
Discovery documents come in as image-only PDFs. OCR is the first pass before anything can be reviewed or indexed.
Accessibility teams
A scanned PDF is invisible to screen readers. OCR is a necessary first step toward basic WCAG document compliance.
Common questions
Can I run OCR without my file leaving my device?
Yes — pick "Process in your browser" above the upload zone to switch on browser-mode OCR. The file is read locally, rendered to canvas via PDF.js, recognised by Tesseract.js, and the result is delivered as a download — all without any network request during processing. First use downloads about 8 MB of OCR model and library code (cached for later sessions). Browser mode currently supports English only and is capped at 10 MB or 30 pages per file; for other languages or longer documents, server mode uses a larger model and is the right pick. Network tab inspection in DevTools verifies the privacy claim: zero outbound requests during the actual processing run.
How accurate is the OCR?
On clean, modern, well-lit scans of Latin-script text: typically 99%+ accurate. On phone photos taken with reasonable lighting: 97–99%. On heavily skewed, low-contrast, or stained scans: it varies. Numbers and short fields are very reliable across all conditions; running paragraphs are where errors accumulate.
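A rough way to see why short fields hold up better than running paragraphs: if the accuracy figure is read as a per-character rate and errors are treated as independent (both simplifying assumptions, not something stated here), the chance that a span comes out entirely error-free shrinks with its length. The function name below is hypothetical.

```python
def clean_span_probability(length: int, char_accuracy: float) -> float:
    """Probability that `length` characters are all recognised correctly,
    assuming independent per-character errors."""
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    return char_accuracy ** length


if __name__ == "__main__":
    for label, length in [("8-character amount", 8), ("500-character paragraph", 500)]:
        p = clean_span_probability(length, 0.99)
        print(f"{label}: {p:.1%} chance of zero errors")
```

Under these assumptions an 8-character amount at 99% accuracy is error-free roughly 92% of the time, while a 500-character paragraph is error-free less than 1% of the time, which is why longer passages are worth proofreading.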
Which languages are supported?
English, French, Spanish, German, Arabic, and Chinese are explicitly supported with optimised language models. Auto-detect picks the right one per page when the document has a single dominant language. Other Latin-script languages (Italian, Portuguese, Dutch) also work via auto-detect with slightly lower accuracy.
Does the PDF look different after OCR?
No. OCR overlays an invisible text layer on top of the original page images — the visual appearance is identical. The difference is internal: Cmd-F now finds text, copy-paste now works, screen readers can read the document.
Can I OCR a multi-page PDF?
Yes. Server mode has no hard page cap, only the upload-size cap on your plan (25 MB free, 500 MB Pro); browser mode is capped at 10 MB or 30 pages per file. Processing time scales with page count: budget about 1–3 seconds per page.
Will OCR work on handwritten notes?
Partially. Block-printed handwriting from a steady hand is recognised with moderate accuracy. Cursive handwriting and casual notes are less reliable — typed material works best. For handwritten content consider it a rough transcript that needs proofing.
Does OCR change my original?
No — your uploaded file is unchanged. The OCR output is a new PDF you download separately. Both are removed from our servers within one hour.