- How accurate is text extraction?
- Very accurate for text-based PDFs, meaning files whose text was generated digitally rather than scanned. Scanned PDFs contain images of pages and need OCR, which this tool doesn't do.
- What about multi-column layouts?
- Reading order is inferred from the PDF's content stream. This is usually correct, but complex layouts such as magazines and newspapers may need manual reordering.
- Are formatting and images preserved?
- No. Output is plain text. For structure-preserving extraction, use PDF-to-Markdown (not yet in this suite) or a tool such as pdftohtml.
- Does it handle tables?
- Tables become plain text, with spaces as column separators. For structured table extraction, use dedicated tools such as Tabula or Camelot, which can export tables to CSV.
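If a table has already come out as space-separated text, a rough recovery is to split each line on runs of two or more whitespace characters and write the pieces as CSV. The sketch below is a hypothetical helper (`spaced_text_to_csv` is not part of this tool) built on the Python standard library. It assumes cells never contain double spaces and that no cell is empty, since an empty cell shifts later values into the wrong column. For anything beyond simple tables, Tabula or Camelot remain the better choice.

```python
import csv
import io
import re


def spaced_text_to_csv(text: str) -> str:
    """Hypothetical helper: convert space-aligned table text to CSV.

    Heuristic only. Assumes cells never contain two consecutive spaces
    and that no cell is empty (an empty cell shifts later columns left).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        # Two or more whitespace characters are treated as a column break
        writer.writerow(re.split(r"\s{2,}", line.strip()))
    return out.getvalue()


sample = "Name   Qty  Price\nWidget  3  4.50\n\nGadget, large  10  12.00\n"
print(spaced_text_to_csv(sample))
```

The `csv` writer quotes cells that contain commas, so a value like `Gadget, large` survives as one field.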
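For the multi-column case, manual reordering usually means sorting text by position instead of by content-stream order. The sketch below assumes you already have positioned text blocks (layout-aware libraries such as pdfminer.six expose bounding boxes for text boxes) and a simple two-column page with a known gutter position. The `Block` type, `reorder_columns` function and `split_x` parameter are hypothetical names for illustration, not part of this tool.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical positioned text fragment (PDF points, origin bottom-left)."""
    x0: float  # left edge
    y0: float  # bottom edge; larger values are higher on the page
    text: str


def reorder_columns(blocks: list[Block], split_x: float) -> list[str]:
    """Read the left column top to bottom, then the right column.

    Assumes exactly two columns separated by a gutter at x = split_x.
    """
    left = [b for b in blocks if b.x0 < split_x]
    right = [b for b in blocks if b.x0 >= split_x]
    ordered: list[str] = []
    for column in (left, right):
        # PDF y coordinates grow upward, so sort by descending y0 to read top-down
        ordered.extend(b.text for b in sorted(column, key=lambda blk: -blk.y0))
    return ordered


# Content-stream order interleaves the columns
blocks = [Block(320, 700, "R1"), Block(72, 700, "L1"),
          Block(72, 680, "L2"), Block(320, 680, "R2")]
print(reorder_columns(blocks, split_x=300))
```

Real magazine layouts often mix column counts, sidebars and captions on one page, so a fixed `split_x` is only a starting point.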