PDF to RAG Chunks
Parse PDFs into RAG-ready Markdown and JSONL chunks with reading order.
About this tool
Prepare PDFs for retrieval-augmented generation (RAG). The tool parses documents into clean Markdown and JSONL chunks that preserve reading order, headings, tables, code blocks, and captions, with OCR for scanned pages.
What it does
- Parses PDFs into Markdown and JSONL chunks
- Preserves reading order and document structure
- Keeps headings, tables, code blocks, and captions
- OCR for scanned documents
- Output ready for RAG pipelines
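As a rough illustration of consuming JSONL output downstream, the sketch below reads one JSON object per line and keeps the chunks in file order, which is how reading order would carry through to a RAG pipeline. The field names (`id`, `heading`, `text`, `page`) and the sample records are assumptions made up for this example, not the tool's documented schema; check the repository for the actual chunk format.

```python
import io
import json

# Hypothetical sample data: the field names ("id", "heading", "text", "page")
# are assumptions for illustration, not the tool's documented schema.
SAMPLE_JSONL = """\
{"id": 0, "heading": "Introduction", "text": "First chunk of text.", "page": 1}
{"id": 1, "heading": "Methods", "text": "Second chunk of text.", "page": 2}
"""


def load_chunks(stream):
    """Read one JSON object per non-empty line, preserving file order."""
    chunks = []
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between records
            chunks.append(json.loads(line))
    return chunks


# In practice the stream would be an open file, e.g. open("chunks.jsonl").
chunks = load_chunks(io.StringIO(SAMPLE_JSONL))
for chunk in chunks:
    print(chunk["page"], chunk["heading"], "-", chunk["text"])
```

Reading line by line keeps memory use flat for large documents, and each parsed chunk can then be handed to whatever embedding or indexing step the pipeline uses.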
Repository
chayprabs/pdf-to-rag-chunks
Full source code, issues, and releases
Spotted a bug or have an idea?
This tool is built in the open and shaped by feedback. If something feels off — or you want a feature — I read every message.