PDF to RAG Chunks

Parse PDFs into RAG-ready Markdown and JSONL chunks with reading order.

Prepare PDFs for retrieval-augmented generation (RAG). Parse documents into clean Markdown and JSONL chunks that preserve reading order, headings, tables, code blocks, and captions, with OCR for scanned pages.

  • Parse PDFs into Markdown and JSONL chunks
  • Preserves reading order and document structure
  • Keeps headings, tables, code blocks, and captions
  • OCR for scanned documents
  • Output ready for RAG pipelines
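This page does not document the tool's actual chunk schema. The sketch below is therefore only a minimal illustration of how heading-aware chunking of parsed Markdown into JSONL could work. The field names `id`, `heading_path`, and `text` and the functions `chunk_markdown` and `to_jsonl` are hypothetical and are not this tool's API. The sketch splits at Markdown headings but never inside fenced code blocks. As a result, a `#` comment in a code snippet does not start a new chunk, and tables stay whole within their section.

```python
import json
import re

# Hypothetical chunk schema for illustration: {"id", "heading_path", "text"}.
# The real tool's JSONL fields may differ.
FENCE = "`" * 3  # three backticks, built programmatically
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per heading section.

    Lines inside fenced code blocks are never treated as headings, so a
    '# comment' in a shell snippet does not start a new chunk, and tables
    (which never contain headings) stay whole inside their section.
    """
    chunks: list[dict] = []
    stack: list[tuple[int, str]] = []  # (heading level, heading text)
    buffer: list[str] = []
    in_fence = False

    def flush() -> None:
        # Emit the buffered section, skipping empty ones.
        text = "\n".join(buffer).strip()
        if text:
            chunks.append({
                "id": len(chunks),
                "heading_path": [title for _, title in stack],
                "text": text,
            })
        buffer.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # naive toggle on opening/closing fence
            buffer.append(line)
            continue
        match = None if in_fence else HEADING_RE.match(line)
        if match is None:
            buffer.append(line)
            continue
        flush()
        level = len(match.group(1))
        # Drop headings at the same or deeper level, then push this one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, match.group(2).strip()))
    flush()
    return chunks


def to_jsonl(chunks: list[dict]) -> str:
    """Serialize chunks as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(c, ensure_ascii=False) + "\n" for c in chunks)


SAMPLE = "\n".join([
    "# Guide",
    "Intro text.",
    "## Install",
    FENCE + "bash",
    "# a shell comment, not a heading",
    "pip install example",
    FENCE,
    "## Usage",
    "| a | b |",
    "|---|---|",
    "| 1 | 2 |",
])

if __name__ == "__main__":
    print(to_jsonl(chunk_markdown(SAMPLE)), end="")
```

On the sample, this yields three records: the intro under `["Guide"]`, the whole install snippet under `["Guide", "Install"]`, and the intact table under `["Guide", "Usage"]`. Splitting on headings alone can produce very large chunks. A production pipeline would likely also enforce a size or token limit.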
Full source code, issues, and releases: chayprabs/pdf-to-rag-chunks

Spotted a bug or have an idea?

This tool is built in the open and shaped by feedback. If something feels off, or you want a feature, let me know; I read every message.