What this is
CanonicAI is the corpus-to-dataset engine: it turns large bodies of unstructured knowledge (books, research papers, domain documents) into canonical, queryable datasets with provenance, via a structured, repeatable generative-AI pipeline. It is the producer in the portfolio — ingest, extraction, registry, and a federation spine — not a public storefront. The consumable surfaces are downstream (Principia, peopleanalyst.com, the toolbox).
Corpus in. Canonical data out. The moat is thousands of multi-step extractions run reliably, idempotently, with lineage, at controlled cost — a production line, not a prompt.
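To make "idempotently, with lineage" concrete, here is a minimal sketch of a content-addressed step runner. It is not CanonicAI's implementation: `StepStore`, `StepRecord`, and the key recipe (step name + version + input hash) are all hypothetical names and choices made for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape; not CanonicAI's actual schema.
interface StepRecord<T> {
  key: string;        // content-addressed identity of this step run
  step: string;       // name of the extraction step
  inputHash: string;  // hash of the exact input the step saw
  parents: string[];  // keys of upstream records (the lineage)
  output: T;
}

const sha256 = (s: string): string =>
  createHash("sha256").update(s).digest("hex");

// In-memory stand-in for a durable store (an assumption of this sketch).
class StepStore {
  private records = new Map<string, StepRecord<unknown>>();
  calls = 0; // how many times an extractor actually ran

  run<T>(
    step: string,
    version: string,
    input: string,
    parents: string[],
    extract: (input: string) => T,
  ): StepRecord<T> {
    const inputHash = sha256(input);
    const key = sha256([step, version, inputHash].join("\n"));

    // Same step + version + input: reuse the stored result instead of re-running.
    const cached = this.records.get(key);
    if (cached) return cached as StepRecord<T>;

    this.calls += 1;
    const record: StepRecord<T> = { key, step, inputHash, parents, output: extract(input) };
    this.records.set(key, record);
    return record;
  }
}

// Usage: the second call is a cache hit, so the extractor runs only once.
const store = new StepStore();
const countWords = (text: string): number => text.split(/\s+/).filter(Boolean).length;
const first = store.run("word-count", "v1", "corpus in canonical data out", [], countWords);
const second = store.run("word-count", "v1", "corpus in canonical data out", [], countWords);
```

A real extraction step would be an asynchronous model call, not a synchronous function. The sketch stays synchronous so that the caching and lineage bookkeeping are easy to see. Bumping `version` changes the key and forces a re-run, which is one plausible way to keep cost controlled when a prompt changes.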
Who it's for
- Operators running the Book Factory / Article Factory pipelines (largely via DevPlane).
- Downstream consumers (Principia, toolbox, PA-site) that read canonical outputs and the federation-spine contracts.
A note on shape (how this differs from a service repo)
CanonicAI's interface is not a REST/MCP gateway. It is CLI pipelines + canonical-output data contracts + package exports. So this docs surface adapts the standard's "API reference" section into a Data Contracts & Consumers section — the product's real interface. (Standard: ~/.claude/DOCUMENTATION-STANDARD.md — the interface section flexes to the product's actual surface.)
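As an illustration of what "data contracts + package exports" can look like in practice, the sketch below pairs an exported type with a runtime guard that a downstream consumer could call before trusting a record. The field names (`contractVersion`, `book`, `provenance.source`, `payload`) are invented for this example and are not taken from the actual `canonical_outputs` schema.

```typescript
// Hypothetical contract: field names are illustrative, not CanonicAI's schema.
export interface CanonicalRecord {
  contractVersion: number;
  book: string;
  provenance: { source: string };
  payload: unknown;
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Runtime guard: narrows untyped JSON to the contract type, or rejects it.
export function isCanonicalRecord(v: unknown): v is CanonicalRecord {
  if (!isObject(v)) return false;
  const prov = v.provenance;
  return (
    typeof v.contractVersion === "number" &&
    typeof v.book === "string" &&
    isObject(prov) &&
    typeof prov.source === "string" &&
    "payload" in v
  );
}
```

The point of the pattern is that the contract lives in one exported place: producers and consumers compile against the same type, and the guard catches drift at read time without needing a network API.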
Documentation tree
Overview ........................ this page
Concepts ........................ concepts.md (corpus · canonical output · asset registry · provenance · federation spine)
Getting Started ................. getting-started.md (run the pipeline on one book · registry commands)
Architecture .................... architecture.md (collector → organizer → referee → outputs · packages · registry)
Data Contracts & Consumers ...... data-contracts.md (canonical_outputs schema · federation spine · who consumes what)
Trust & Provenance .............. trust-and-provenance.md (lineage · idempotency · the clean store)
Interface reference (generated) . reference-mcp.generated.md (sparse by design; see below)
Source of truth
- Assets: asset-registry.json (+ packages/core/src/asset-registry.ts), ~8,491 assets across 6 domains.
- Capabilities: docs/capability-manifest.json (programs + realization gates).
- Canonical outputs: canonical_outputs/<book>/… (per-chapter + book_level/book_level_sweep.json).
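A consumer locating the book-level sweep might look roughly like the sketch below. It assumes the sweep file sits at `canonical_outputs/<book>/book_level/book_level_sweep.json`, which is an inference from the listing above and not a confirmed layout. `sweepPath` and `readSweep` are hypothetical helpers, and the parsed JSON is returned as `unknown` because no schema is assumed here.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { posix } from "node:path";

// Assumed layout (not confirmed): <root>/<book>/book_level/book_level_sweep.json
export function sweepPath(book: string, root = "canonical_outputs"): string {
  return posix.join(root, book, "book_level", "book_level_sweep.json");
}

// Returns the parsed JSON as `unknown` (no schema assumed),
// or undefined when the file does not exist.
export function readSweep(book: string, root = "canonical_outputs"): unknown {
  const file = sweepPath(book, root);
  if (!existsSync(file)) return undefined;
  return JSON.parse(readFileSync(file, "utf8"));
}
```

Returning `unknown` forces the caller to validate the payload against whatever contract it depends on before using it.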