Copy a table out of a PDF and paste it into Excel, and every column collapses into a single cell, with the values separated only by spaces. That format is useless for analysis. The fix is real table extraction, not copy and paste.
What “table detection” actually means
A good extractor looks for visual cues: vertical alignment of text, consistent row spacing, ruling lines, alternating row backgrounds. From those, it reconstructs a grid of rows and columns and writes each value to its own .xlsx cell. The result is data you can analyse: sortable, filterable, and usable in formulas.
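To make the alignment cue concrete, here is a minimal Python sketch of grid reconstruction from text positions. It is an illustration, not how any particular extractor works. The `Run` type, the `build_grid` function, the 5-point tolerance and the sample coordinates are all assumptions, and it only handles left-aligned columns where each cell arrives as one text run.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """One cell-sized piece of text with its position (hypothetical type)."""
    text: str
    x: float  # left edge on the page
    y: float  # vertical position, increasing down the page


def cluster(values, tolerance):
    """Collapse nearby coordinates into one anchor each (the smallest of the group)."""
    anchors = []
    for v in sorted(values):
        if not anchors or v - anchors[-1] > tolerance:
            anchors.append(v)
    return anchors


def nearest_index(anchors, value):
    """Index of the anchor closest to value."""
    return min(range(len(anchors)), key=lambda i: abs(anchors[i] - value))


def build_grid(runs, tolerance=5.0):
    """Place each run in a row/column grid based on aligned x and y positions."""
    cols = cluster((r.x for r in runs), tolerance)
    rows = cluster((r.y for r in runs), tolerance)
    grid = [["" for _ in cols] for _ in rows]
    for run in sorted(runs, key=lambda r: (r.y, r.x)):
        r, c = nearest_index(rows, run.y), nearest_index(cols, run.x)
        # Join with a space if two runs land in the same cell.
        grid[r][c] = (grid[r][c] + " " + run.text).strip()
    return grid


# Made-up positions with the small jitter typical of PDF text.
sample = [
    Run("Item", 50.0, 100.0), Run("Qty", 200.5, 100.0), Run("Price", 300.0, 100.4),
    Run("Bolt", 50.3, 120.0), Run("12", 201.0, 120.0), Run("0.40", 300.2, 120.0),
]
grid = build_grid(sample)
```

Real extractors combine several cues (ruling lines, right-aligned numbers, row spacing); this sketch uses left-edge alignment alone, which is why the boundary check described later still matters.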
Three patterns to expect
- Cleanly bordered tables (think SEC filings, scientific reports): near-perfect extraction. Column boundaries are unambiguous.
- Borderless tables with consistent alignment (most invoices, bank statements): good extraction, provided you quickly check the column boundaries.
- Multi-page tables: a good tool stitches them back together when row format is consistent. Mixed formats per page need manual intervention.
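The multi-page case can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each page has already been extracted to a list of rows, the first row of the first page is the header, and "consistent row format" is reduced to "same column count". The `stitch_pages` name is hypothetical.

```python
def stitch_pages(pages):
    """Merge per-page tables (each a list of rows) into a single table.

    Raises ValueError when a row's column count differs from the header,
    which is the signal that the page needs manual handling.
    """
    merged = []
    header = None
    for page_no, rows in enumerate(pages, start=1):
        if not rows:
            continue  # nothing extracted from this page
        if header is None:
            header = rows[0]
            merged.append(header)
            body = rows[1:]
        else:
            # Drop a header that the PDF repeats at the top of each page.
            body = rows[1:] if rows[0] == header else rows
        for row in body:
            if len(row) != len(header):
                raise ValueError(
                    f"page {page_no}: expected {len(header)} columns, got {len(row)}"
                )
            merged.append(row)
    return merged


pages = [
    [["Date", "Amount"], ["2024-01-01", "10"]],
    [["Date", "Amount"], ["2024-01-02", "20"]],  # repeated header
    [["2024-01-03", "30"]],                      # continuation without header
]
table = stitch_pages(pages)
```

Failing loudly on a mismatched page is a deliberate choice here: a silent merge of differently shaped rows is harder to spot later than an error at extraction time.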
Three steps that save you most of the cleanup
- Run extraction with auto-detection on. Accept the proposed boundaries unless a proposed column merges two real columns.
- Open the .xlsx and scan the first and last rows of each table. The header row should be styled as a header; the last row should be data, not a page footer.
- Convert numeric-looking strings to actual numbers (Excel’s “Convert to Number” prompt) so totals work.
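If you post-process the extracted cells in a script rather than in Excel, the number conversion in the last step might look like the Python sketch below. It assumes US/UK-style formatting (comma as thousands separator, period as decimal point) and treats accounting-style parentheses as negatives. The `to_number` helper is hypothetical, and anything it does not recognise is returned unchanged.

```python
import re

# Optional currency sign, then either grouped thousands or plain digits,
# then an optional decimal part.
_NUMBER = re.compile(r"[$€£]?(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?")


def to_number(cell):
    """Return a float for numeric-looking text, otherwise the original string."""
    text = cell.strip()
    negative = False
    if text.startswith("(") and text.endswith(")"):
        # Accounting notation: (1,200) means -1200.
        negative, text = True, text[1:-1]
    elif text.startswith("-"):
        negative, text = True, text[1:]
    match = _NUMBER.fullmatch(text)
    if match is None:
        return cell  # dates, labels, "N/A" and similar stay as text
    value = float(match.group(1).replace(",", "") + (match.group(2) or ""))
    return -value if negative else value


row = ["Widget", "1,234.50", "(1,200)", "$45", "2024-01-15"]
converted = [to_number(cell) for cell in row]
```

Requiring the whole cell to match keeps dates and reference numbers such as 2024-01-15 as text instead of turning them into partial numbers.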
What you cannot recover
Formulas don’t exist in the PDF. If the original Excel had =SUM(B2:B12), the PDF only has the displayed sum. You’ll see the value, not the formula behind it. Rebuilding simple formulas from totals is sometimes feasible, but if you need the exact original formulas, get them from the workbook, not the PDF.
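As an illustration of what rebuilding a formula from a total can mean, the Python sketch below checks whether the last value in a column equals the sum of the values above it and, if so, proposes a SUM formula. The `suggest_sum_formula` function, its parameters and the half-cent tolerance are assumptions. A match is only a guess: it does not prove the original workbook used SUM.

```python
import math


def suggest_sum_formula(values, column="B", first_row=2, tolerance=0.005):
    """Propose '=SUM(...)' if the last value equals the sum of those above it.

    values: numbers from one column, top to bottom, with the candidate
    total last. Returns the formula string, or None when there is no match.
    """
    if len(values) < 3:
        return None  # too few rows to call the last one a total
    *items, total = values
    if math.isclose(sum(items), total, abs_tol=tolerance):
        last_row = first_row + len(items) - 1
        return f"=SUM({column}{first_row}:{column}{last_row})"
    return None


formula = suggest_sum_formula([10.0, 20.5, 4.5, 35.0])
no_match = suggest_sum_formula([10.0, 20.0, 31.0])
```

The tolerance allows for totals that were rounded for display in the PDF; widen or narrow it to suit the precision of your data.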
Frequently asked questions
Does PDF-to-Excel work on scanned tables?
Yes, but run OCR first. Direct extraction without OCR will return empty cells because the "text" is just pixels.
My table spans 30 pages. Can it be merged?
If row format is consistent across pages, yes — modern extractors stitch automatically. Inconsistent layouts need a per-page export.