PDF to RAG Chunks

Parse PDFs into RAG-ready Markdown and JSONL chunks with reading order.

About this tool

Prepare PDFs for retrieval-augmented generation. The tool parses documents into clean Markdown and JSONL chunks that preserve reading order, headings, tables, code blocks, and captions, and it applies OCR to scanned pages.

What it does

Parses PDFs into Markdown and JSONL chunks
Preserves reading order and document structure
Keeps headings, tables, code blocks, and captions
Runs OCR on scanned documents
Produces output ready for RAG pipelines
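
To make the idea of heading-aware JSONL chunks concrete, here is a minimal Python sketch. It is not this tool's implementation or output schema: the function names `markdown_to_chunks` and `to_jsonl` and the fields `order`, `heading`, and `text` are hypothetical, chosen only for illustration. It assumes the PDF has already been converted to Markdown and shows one simple way to split that Markdown at headings while keeping reading order and leaving fenced code blocks intact.

```python
import json
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Section".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def _emit(chunks, heading, body):
    """Append a chunk if the collected body has any non-blank text."""
    text = "\n".join(body).strip()
    if text:
        # Field names are hypothetical, not the tool's real schema.
        chunks.append({"order": len(chunks), "heading": heading, "text": text})


def markdown_to_chunks(markdown):
    """Split Markdown into one chunk per heading section, in reading order."""
    chunks = []
    heading, body, in_fence = None, [], False
    for line in markdown.splitlines():
        if line.startswith("```"):
            in_fence = not in_fence  # entering or leaving a fenced code block
        # Lines inside a code fence are never treated as headings.
        match = None if in_fence else HEADING.match(line)
        if match:
            _emit(chunks, heading, body)
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    _emit(chunks, heading, body)
    return chunks


def to_jsonl(chunks):
    """Serialize chunks as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(c, ensure_ascii=False) for c in chunks)


if __name__ == "__main__":
    sample = "# Intro\nHello.\n\n## Usage\n```\n# not a heading\n```\n"
    print(to_jsonl(markdown_to_chunks(sample)))
```

Carrying the heading and an explicit order index on every chunk is one way to let a retriever reassemble neighboring chunks or show where a passage came from. Tables, captions, and OCR are out of scope for this sketch.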

Repository

chayprabs/pdf-to-rag-chunks
Full source code, issues, and releases

Spotted a bug or have an idea?

This tool is built in the open and shaped by feedback. If something feels off, or you want a feature, I read every message.
