Taxon is an OCR + AI extraction API for teams that can't send their documents to US-hosted models. Describe the fields you want with JSON Schema; get matching JSON back, with confidences and a tier-trace.
No credit card. 100 extractions/month free. Self-hosted option available.
No vertical "invoice extractor" lock-in — describe the fields you actually want and Taxon's tier router picks the cheapest model that can answer. Switch from Mindee, Reducto, Extend, or LlamaParse in an afternoon.
{
  "type": "object",
  "properties": {
    "invoice_number": { "type": "string" },
    "issue_date": { "type": "string", "format": "date" },
    "total": { "type": "number" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "price": { "type": "number" }
        }
      }
    }
  }
}
# Upload a PDF
FILE_ID=$(curl -sS -X POST https://app.taxon.kfs.hr/v1/files \
  -H "Authorization: Bearer $TAXON_KEY" \
  -F file=@invoice.pdf \
  | jq -r .id)
# Extract against the schema (saved as schema.json).
# The expansions are double-quoted so the shell does not word-split the JSON.
curl -sS -X POST https://app.taxon.kfs.hr/v1/extractions \
  -H "Authorization: Bearer $TAXON_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file_id": "'"$FILE_ID"'",
    "json_schema": '"$(cat schema.json)"'
  }'
{
  "id": "ext_8a3f…",
  "status": "completed",
  "tier": 1,
  "confidence": 0.94,
  "data": {
    "invoice_number": "INV-2026-0481",
    "issue_date": "2026-04-12",
    "total": 1247.50,
    "line_items": [/* … */]
  }
}
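The response carries everything a client needs to decide whether an extraction can be accepted automatically. Here is a minimal Python sketch, assuming the field names shown in the sample (`file_id`, `json_schema`, `status`, `confidence`). The helper names `build_extraction_body` and `triage`, the 0.90 review threshold, and the `"not_completed"` label are illustrative choices, not part of Taxon's API.

```python
import json

# Assumption: a cut-off you choose yourself; Taxon does not prescribe one here.
REVIEW_THRESHOLD = 0.90


def build_extraction_body(file_id: str, schema: dict) -> str:
    """Serialise the POST /v1/extractions body; json.dumps handles all quoting."""
    return json.dumps({"file_id": file_id, "json_schema": schema})


def triage(response: dict, threshold: float = REVIEW_THRESHOLD) -> str:
    """Classify one extraction response as 'not_completed', 'review', or 'accept'."""
    if response.get("status") != "completed":
        # Other status values are not documented in the sample, so we only
        # distinguish "completed" from everything else.
        return "not_completed"
    if response.get("confidence", 0.0) < threshold:
        return "review"
    return "accept"


if __name__ == "__main__":
    sample = {
        "id": "ext_example",  # placeholder id, not a real extraction
        "status": "completed",
        "tier": 1,
        "confidence": 0.94,
        "data": {"invoice_number": "INV-2026-0481", "total": 1247.50},
    }
    print(triage(sample), "via tier", sample["tier"])
```

Building the body with `json.dumps` instead of string concatenation sidesteps shell quoting entirely, and routing low-confidence results to a human is one way to feed the corrections ledger.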
Three things closed-API competitors structurally cannot offer.
Inference on Mistral La Plateforme, Nebius, Scaleway, OVHcloud, or your own self-hosted vLLM. Storage in Hetzner Object Storage. Zero US subprocessors on the document path. Schrems II clean.
Every correction your team makes lands in this workspace's audit + accuracy ledger, only ever for you. The accumulated value is yours, schema versions stay deterministic, and your historical extractions are never silently re-interpreted.
No "invoice extractor" / "receipt extractor" / "ID parser" SKU maze. One API, one schema language. Works for invoices, contracts, ID cards, receipts, lab reports — anything you can describe.
Access export, erasure, rectification, portability — first-class API features, not bolt-ons. PII redaction toggle as a pre-LLM step. Immutable audit log. DPA on file with every subprocessor.
Helm chart for k3s, Postgres-backed job queue, vLLM-friendly provider abstraction. Run it on-prem when your auditors require it; switch back to the SaaS without code changes.
Tier 0 → text-only LLM (cents). Tier 1 → vision LLM (when needed). Tier 2 → docTR + LLM fallback. Pay only for the tier the router actually uses; trace it in every response.
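To make the tier idea concrete, here is a rough Python sketch of cheapest-first escalation: try each tier in order and stop at the first answer that is confident enough. This is an assumption about how such a router could work, not Taxon's actual implementation. The `route` function, the extractor callables, and the 0.9 threshold are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical stand-in: an extractor takes document bytes and returns
# (data, confidence). Real tiers would call a text LLM, a vision LLM, etc.
Extractor = Callable[[bytes], tuple[dict, float]]


def route(
    document: bytes,
    tiers: list[tuple[str, Extractor]],
    min_confidence: float = 0.9,
) -> dict:
    """Try tiers cheapest-first; stop at the first sufficiently confident answer.

    If no tier reaches min_confidence, the most confident attempt is returned.
    """
    trace = []
    best: Optional[dict] = None
    for index, (name, extract) in enumerate(tiers):
        data, confidence = extract(document)
        trace.append({"tier": index, "name": name, "confidence": confidence})
        if best is None or confidence > best["confidence"]:
            best = {"tier": index, "confidence": confidence, "data": data}
        if confidence >= min_confidence:
            break  # later, more expensive tiers are never invoked
    if best is None:
        raise ValueError("at least one tier is required")
    return {**best, "trace": trace}


if __name__ == "__main__":
    tiers = [
        ("text-only LLM", lambda doc: ({"total": None}, 0.41)),
        ("vision LLM", lambda doc: ({"total": 1247.50}, 0.94)),
        ("docTR + LLM fallback", lambda doc: ({"total": 1247.50}, 0.97)),
    ]
    result = route(b"fake document bytes", tiers)
    print(result["tier"], result["confidence"], result["trace"])
```

In this sketch the third tier is never called for the demo document, which is the property that keeps cost proportional to document difficulty.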
Paddle handles VAT MOSS for you. Cancel any time, export your data any time, no minimum commit on the self-serve tiers.
Bursty workloads? Pay-as-you-go is available at €0.012 / extraction with no monthly fee — same dashboard, same API, same EU-only data path. Switch to a subscription plan once your volume stabilises (the dashboard tells you when it pays off).
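The break-even point is simple arithmetic. The sketch below uses the €0.012 pay-as-you-go price from the text; the €49 plan fee is a made-up placeholder, since subscription prices are not stated here, and the calculation ignores included quotas and overage charges.

```python
from decimal import Decimal, ROUND_CEILING

PAYG_PRICE = Decimal("0.012")  # EUR per extraction, taken from the pricing text


def payg_cost(extractions: int) -> Decimal:
    """Monthly pay-as-you-go cost in EUR for a given extraction count."""
    return PAYG_PRICE * extractions


def break_even_volume(plan_monthly_fee: Decimal) -> int:
    """Smallest monthly volume at which pay-as-you-go costs at least the plan fee."""
    volume = (plan_monthly_fee / PAYG_PRICE).to_integral_value(rounding=ROUND_CEILING)
    return int(volume)


if __name__ == "__main__":
    hypothetical_fee = Decimal("49")  # NOT a real Taxon price; illustration only
    print("1000 extractions cost EUR", payg_cost(1000))
    print("break-even volume:", break_even_volume(hypothetical_fee))
```

`Decimal` is used instead of floats so that fractional-cent prices do not pick up binary rounding error.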
No setup, no install. The free tier covers most evaluations end-to-end.
Open the dashboard