
Docling converts messy documents into structured data and simplifies downstream document and AI processing. It detects tables, formulas, and reading order, runs OCR where needed, and much more.

Start

Install Docling as a Python library with your favorite package manager:

pip install docling

Use the CLI directly from your terminal:

docling https://arxiv.org/pdf/2206.01062

Integrate document conversion into your Python application:

from docling.document_converter import DocumentConverter

source = "https://arxiv.org/pdf/2408.09869"
converter = DocumentConverter()
doc = converter.convert(source).document
print(doc.export_to_markdown())
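
In an application you will usually want to keep the result rather than print it. The following is a minimal sketch, assuming you want one Markdown file per converted document. The helper names `save_markdown` and `convert_to_markdown_file` and the output path are hypothetical and not part of Docling; only `DocumentConverter`, `convert`, `.document`, and `export_to_markdown` come from the snippet above.

```python
from pathlib import Path


def save_markdown(doc, out_path):
    """Write a converted document's Markdown export to out_path and return the path.

    Hypothetical helper: `doc` is any object with an export_to_markdown() method,
    such as the `.document` of a Docling conversion result.
    """
    out_path = Path(out_path)
    # Create the target directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(doc.export_to_markdown(), encoding="utf-8")
    return out_path


def convert_to_markdown_file(source, out_path):
    """Convert `source` (a URL or local path) with Docling and save it as Markdown."""
    # Imported lazily so save_markdown stays usable without Docling installed.
    from docling.document_converter import DocumentConverter

    doc = DocumentConverter().convert(source).document
    return save_markdown(doc, out_path)


if __name__ == "__main__":
    # Needs network access to fetch the PDF; the output path is an arbitrary choice.
    path = convert_to_markdown_file(
        "https://arxiv.org/pdf/2408.09869", "out/2408.09869.md"
    )
    print(f"Wrote {path}")
```

Splitting the file-writing step from the conversion step keeps the writer easy to test with a stand-in document object, without downloading or converting anything.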

Explore the examples

Features

Parse many document formats into a unified and structured form.

Export a parsed document to formats that simplify processing and ingestion into AI, RAG, and agentic systems.
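
As a rough illustration of exporting one parsed document to more than one format, the sketch below returns both a Markdown string and a JSON string. It assumes the parsed document offers `export_to_dict()` alongside `export_to_markdown()` and that the dictionary it returns is JSON-serializable; check this against the Docling version you have installed. The function name `export_for_ingestion` is hypothetical.

```python
import json


def export_for_ingestion(doc):
    """Return Markdown and JSON text for a parsed document.

    Hypothetical helper. Assumes `doc` has export_to_markdown() and
    export_to_dict(), and that the dict is JSON-serializable.
    """
    return {
        "markdown": doc.export_to_markdown(),
        "json": json.dumps(doc.export_to_dict(), ensure_ascii=False),
    }


if __name__ == "__main__":
    # Imported here so the helper above can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    doc = DocumentConverter().convert("https://arxiv.org/pdf/2408.09869").document
    exports = export_for_ingestion(doc)
    # Print only sizes; the exports themselves can be long.
    print({name: len(text) for name, text in exports.items()})
```

Markdown tends to suit text chunking for retrieval pipelines, while the JSON form keeps the document structure for code that needs it.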