
The data layer for document intelligence
Enterprise AI runs on documents. We build the datasets that make it possible.

Production-grade document AI datasets don't exist. Current options are synthetic, dated, and blind to the long tail.

Shipping real-world document AI taught us that the bottleneck is high-quality data. That experience shapes our datasets.

/ DATASETS
Featured Datasets
98% accuracy, with coverage spanning 100+ languages, 20+ domains, and every document type found in the wild. Document Understanding covers layout and parsing; Document Action covers complex workflows.

Parsing
Setting the gold standard for real-world document understanding. End-to-end parsing covering layout, reading order, OCR in 50+ languages, table-to-HTML conversion, forms, formulas, and charts.

/ PRODUCT
Complete, living data products
Our datasets are stress-tested with in-house models and continuously improved. Complete data products built with the same rigor as the models they train.
Core Data
Expert-created and rigorously sourced, with domain expertise applied at every stage from annotation to QA/QC, and continuously refined.
Expansion
Synthetic data developed rigorously on top of the core data, with rich metadata for building your own splits and crafting solid training recipes.
Insights
Interactive reports that show how each dataset was built: sourcing, distributions, and annotation logic, plus ML learnings from in-house training to inform your experiments.
Iterative
Accuracy, schemas, and coverage all continue to improve after delivery, as part of an ongoing partnership built around your evolving needs.

/ CONTACT
Get samples or build a dataset with us
Our library goes beyond what's listed here. Whether you need an off-the-shelf dataset or a custom build, we're ready to help.
How we work with you
Tell us what you need
A short call to understand your exact needs. We identify the best dataset for you and share samples.
Simple licensing
A straightforward data license for your specific use cases, without the procurement headache.
Start training
Get access to production-grade data in days, not months. Your team starts building immediately.

Bespoke Partnerships
For novel tasks where our off-the-shelf data doesn't fit, we partner with labs to create new data, built to the exact recipe you need.
- Custom recipes designed for your specific capability
- Deep collaboration with your researchers
- Scale from pilot to production volume seamlessly
