Conversion

Scanned PDF to editable Word

A scanned PDF is just a stack of images — copy-paste won't work and Word can't open it cleanly. You need OCR (optical character recognition) to extract the actual text and rebuild a real Word document.

Tool

⚡ Skip the retyping →

Free · No account · Files deleted in 1 hour

OCR + Word export in one step · 100+ languages

Why this works

Our scanned-to-Word pipeline runs Tesseract OCR over each page, preserves paragraph structure, and writes a proper .docx file with editable text, headings and reasonable layout.

The mechanical reality of a scanned PDF. When you scan a paper document, the scanner produces an image of each page, the same way a photograph is an image. That image gets wrapped in a PDF container, so the file extension says .pdf and your PDF reader opens it, but the content inside is purely pixel data. You can see the text on screen because your eyes interpret pixel patterns as letters, but your computer sees only colours, not characters. That's why Cmd-F finds nothing, copy-paste returns blank or garbage, screen readers read silence, and Word's direct "Open" treats the file as embedded images rather than editable content.

OCR (optical character recognition) is the bridge. It runs a computer-vision model over each page image, recognises pixel patterns as letters, words, and paragraphs, and outputs actual text characters. With OCR done, the document becomes "searchable" (a hidden text layer over the page images that Cmd-F can find) and "convertible" (the recognised text can be extracted into Word, Excel, or any other editable format).

What our pipeline does, step by step. Page-level OCR: each page image is processed individually through Tesseract OCR with language hints if you provided them. Layout reconstruction: the recognised text is grouped into paragraphs, headings (detected by font size and weight in the page image), and lists (detected by bullet/number patterns at the start of lines). Table detection: simple grid tables are detected by row/column alignment and reconstructed as Word table objects. Image preservation: any non-text elements (logos, photos, diagrams) are kept as images, placed in their original positions on the Word page. Output assembly: the result is written as a .docx file that opens cleanly in Word, Google Docs, LibreOffice, or Pages.
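The layout-reconstruction step can be sketched in a few lines. This is an illustration only, not the pipeline's actual code: the `OcrLine` record, the `classify_lines` function, and the 1.3x heading threshold are assumptions made for the example, and it looks at font size only (not weight).

```python
import re
from dataclasses import dataclass
from statistics import median


# Hypothetical record for one recognised line. The shape is assumed for
# this sketch; a real OCR engine reports comparable data in its own format.
@dataclass
class OcrLine:
    text: str
    font_size: float  # estimated from glyph height in the page image


# Bullet or number pattern at the start of a line,
# e.g. "• item", "- item", "3. item", "2) item".
LIST_PATTERN = re.compile(r"^\s*(?:[•\-*]|\d+[.)])\s+")


def classify_lines(lines, heading_ratio=1.3):
    """Label each line as 'heading', 'list_item', or 'paragraph'."""
    if not lines:
        return []
    # Treat the median size on the page as the body-text size.
    body_size = median(line.font_size for line in lines)
    labels = []
    for line in lines:
        if LIST_PATTERN.match(line.text):
            labels.append("list_item")
        elif line.font_size >= body_size * heading_ratio:
            labels.append("heading")
        else:
            labels.append("paragraph")
    return labels
```

Using the median as the body size keeps one or two large headings from skewing the baseline, which a plain average would not.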

OCR accuracy expectations by source quality. Clean modern scans (office multifunction at 300 DPI in a well-lit room): typically 99%+ character accuracy for Latin-script text. Phone-app scans (Adobe Scan, CamScanner, Apple Notes scan) in good lighting: 97–99%. Marginal-quality scans (low contrast, skewed pages, faded paper, copy-of-copy degradation): 85–95%, so expect manual proofreading. Handwritten content: depends heavily on handwriting clarity. Firmly printed block letters from a steady hand are recognised reasonably well; casual cursive is unreliable; signatures rarely transcribe cleanly. Multi-language documents: pass the dominant language as a hint; mixed-language pages may need a separate OCR pass per language.
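To make those percentages concrete, here is one common way to measure character accuracy: one minus the edit distance between the OCR output and a hand-checked reference, divided by the reference length. This is a sketch of that general definition, not necessarily how the figures above were measured, and the 2,000-characters-per-page figure used in the note below is an assumed round number.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, ocr_output: str) -> float:
    """Share of reference characters the OCR got right, from 0.0 to 1.0."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    errors = edit_distance(reference, ocr_output)
    return max(0.0, 1.0 - errors / len(reference))


def expected_errors(characters_per_page: int, accuracy: float) -> int:
    """Rough count of wrong characters on a page at a given accuracy."""
    return round(characters_per_page * (1.0 - accuracy))
```

On an assumed 2,000-character page, `expected_errors(2000, 0.99)` gives about 20 characters to fix, while `expected_errors(2000, 0.85)` gives about 300, which is why marginal scans need a full proofread.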

What survives the round-trip well. Body text in standard fonts (Times, Arial, Helvetica equivalents) converts almost perfectly. Headings — detected by relative font size — carry over with appropriate Word heading styles. Numbered and bulleted lists: clean conversion. Simple grid tables: convert to Word tables with editable cells.

What needs touch-up after conversion. Complex multi-column layouts (newsletters, magazine-style pages) may flatten to single-column flow — expect to manually re-establish columns in Word. Tables with merged cells, nested layouts, or rotated text: the data usually arrives intact but the structure may need rebuilding. Footnote sequences: often arrive in the wrong position (alongside the body text rather than at the page bottom) — cut/paste in Word to fix. Handwritten annotations: typically arrive as image artefacts rather than text. Signature blocks: signature images carry as images; if you need editable signatory names, retype them.

A practical workflow tip. Run OCR first on the raw PDF (using our OCR tool) and verify the recognised text looks right before converting to Word. If you skip this verification step and go straight to Scanned PDF to Word, you might spend 20 minutes editing layout issues only to find OCR misread a key paragraph. Five seconds of OCR verification saves the headache.
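If you run Tesseract yourself, part of that verification can be automated. Tesseract's TSV output includes a per-word `conf` column, and the sketch below lists the words that fall under a cut-off. The function name and the threshold of 60 are assumptions chosen for illustration; how our OCR tool presents its results is not shown here.

```python
import csv
import io


def low_confidence_words(tsv_text: str, threshold: float = 60.0):
    """Return (page, word, confidence) for words scored below the threshold.

    Expects Tesseract TSV text with a header row containing at least the
    'page_num', 'conf', and 'text' columns.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # recognised text may contain quote marks
    )
    flagged = []
    for row in reader:
        word = (row.get("text") or "").strip()
        if not word:
            continue  # page, block, and line rows carry no text
        confidence = float(row["conf"])
        if confidence < 0:
            continue  # -1 marks rows that are not recognised words
        if confidence < threshold:
            flagged.append((int(row["page_num"]), word, confidence))
    return flagged
```

A short list of flagged words usually means spot-checking is enough; a long one suggests rescanning before converting to Word.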

How it works

1. Open the OCR-to-Word tool
Tap the orange button above to launch with OCR + Word output pre-selected. Both passes run automatically in sequence.

2. Upload the scan
Drop your scanned PDF in. We support multi-page documents up to 25 MB on the free tier, 1 GB on Pro. Phone-app scans, scanner output, and copy-shop PDFs all work.

3. Pick the language
OCR works best when you tell it which language to expect. English, French, Spanish, German, Arabic, Chinese, and 100+ others supported. Auto-detect handles single-language documents reasonably; multi-language pages need explicit selection.

4. Wait for OCR + conversion
Processing takes 2–4 seconds per page. A 30-page contract finishes in about a minute and a half. You'll see a progress indicator.

5. Download the editable .docx
Open in Word, Google Docs, Pages, or LibreOffice. Text is fully selectable and editable. Expect to spend a few minutes on layout cleanup for complex source documents.
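
The timing in step 4 is simple arithmetic, sketched here so you can estimate larger jobs. The function name is made up for the example, and the 2 and 4 second defaults come straight from the per-page range quoted above; actual times will vary with page complexity.

```python
def estimate_processing_seconds(pages: int,
                                min_per_page: float = 2.0,
                                max_per_page: float = 4.0):
    """Return (fastest, slowest) expected processing time in seconds."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    return pages * min_per_page, pages * max_per_page
```

For a 30-page contract this returns a range of 60 to 120 seconds, and the midpoint of that range is the minute and a half mentioned in step 4.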

Who this is for

Real-world uses

Lawyers

Convert scanned contracts to editable drafts for redlining and counter-proposal work.

Researchers

Pull text out of archival scans (journal articles, historical documents) for citation, quotation, and content analysis.

HR teams

Update scanned policy documents, employee handbooks, and offer letter templates without retyping them.

Translators

Get clean source text from scanned originals for CAT tool ingestion without re-keying.

Accountants

Extract figures from scanned invoices and statements into editable form for spreadsheet entry.

Anyone with paper-only documents

A photographed contract, a receipt, a letter — turn it into something you can edit, search, and reuse.

FAQ

Common questions

How accurate is OCR?

On clean printed scans, typically 99%+ character accuracy for Latin-script text. Phone-app scans in good light: 97–99%. Marginal-quality scans (low contrast, skewed, faded): 85–95%, so always proofread. Handwriting accuracy varies enormously by source clarity.

Will my tables come through?

Simple grid tables convert to Word tables with editable cells. Complex tables with merged cells, nested layouts, or rotated text usually have correct data but may need structural cleanup.

Which languages are supported?

English, French, Spanish, German, Arabic, Chinese, and 100+ others via Tesseract. Pass the document's dominant language as a hint for best accuracy; multi-language pages may need separate passes.
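
For readers running Tesseract directly, a language hint is passed with the `-l` option using Tesseract's own language codes, and several codes can be joined with `+`. The sketch below builds such a command without running it. The lookup table covers only the six languages named above, and the helper names are assumptions for illustration; whether our hosted tool combines languages this way or runs separate passes is not stated here.

```python
# Tesseract language codes for the languages named above.
TESSERACT_CODES = {
    "english": "eng",
    "french": "fra",
    "spanish": "spa",
    "german": "deu",
    "arabic": "ara",
    "chinese (simplified)": "chi_sim",
}


def language_argument(languages):
    """Turn language names into a Tesseract -l value such as 'eng+fra'."""
    codes = []
    for name in languages:
        key = name.strip().lower()
        if key not in TESSERACT_CODES:
            raise ValueError(f"no code known for language: {name!r}")
        code = TESSERACT_CODES[key]
        if code not in codes:  # keep order, drop duplicates
            codes.append(code)
    if not codes:
        raise ValueError("at least one language is required")
    return "+".join(codes)


def tesseract_command(image_path, output_base, languages):
    """Build (but do not run) a Tesseract command line for one page image."""
    return ["tesseract", image_path, output_base, "-l", language_argument(languages)]
```

Listing the dominant language first mirrors the advice above, for example `tesseract_command("page-1.png", "page-1", ["English", "French"])`.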

Will images and logos carry over?

Yes. Non-text elements (logos, photos, diagrams) are preserved as images in the .docx output at their original positions.

What if my scan is skewed or low-contrast?

OCR accuracy drops on heavily skewed or low-contrast scans. Consider rescanning at a flatter angle and in better lighting, or use the Crop tool to remove scanner-bed shadows first.

Should I OCR first separately, then convert to Word?

It's an option. Running our OCR tool first lets you verify recognised text before committing to the full Word conversion. If the OCR output looks wrong, you can re-scan rather than wasting time editing a flawed Word output. For straightforward scans, the combined tool is faster.

Will signatures convert as editable text?

No — signatures are visual marks, not text. They carry over as images on the Word page. If you need editable signatory names, retype them next to the image.
