Methodology

How the AGI Canary Index is built: frameworks, axes, and pipeline

Frameworks

The index consolidates established academic and policy frameworks into a single auditable system.

CHC cognitive domains (Hendrycks et al.): "A Definition of AGI" proposes operationalizing AGI as matching the cognitive versatility and proficiency of a well-educated adult across the Cattell–Horn–Carroll (CHC) domains. The result is an interpretable, jagged capability profile rather than a single score.
Levels of AGI (Morris et al., DeepMind): Frames progress in terms of performance, generality, and autonomy, which lets the index describe progress without prematurely declaring "AGI achieved".
ARC-AGI: Measures abstraction and reasoning beyond the training distribution. It is the canary for generalization versus memorization.
METR autonomy evaluations: Tooling and protocols for measuring agentic autonomy, long-horizon task success, and dangerous capabilities. The backbone of our autonomy and risk track.
OECD AI Capability Indicators: Policy-grade capability levels across domains, referenced to human skills. Cross-walks with CHC to give a public-friendly taxonomy.

9 Cognitive Axes

Capability signals are mapped to nine cognitive axes. Each axis score lies on a -1 to 1 scale and carries uncertainty bounds.
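As an illustration only, an axis score and its bounds might be represented as follows. The `AxisScore` interface and `makeAxisScore` helper are hypothetical names, and clamping out-of-range values into [-1, 1] (rather than rejecting them) is an assumed design choice; the methodology states only the -1 to 1 scale and the presence of uncertainty bounds.

```typescript
// Hypothetical shape for one axis score on the -1..1 scale.
interface AxisScore {
  axis: string;  // axis identifier (placeholder; real axis names are not assumed)
  score: number; // point estimate in [-1, 1]
  lower: number; // lower uncertainty bound, lower <= score
  upper: number; // upper uncertainty bound, score <= upper
}

// Clamp a value into the [-1, 1] scale.
function clampUnit(x: number): number {
  return Math.min(1, Math.max(-1, x));
}

// Build an AxisScore. Assumed behavior: values outside [-1, 1] are clamped,
// while non-finite inputs or bounds that do not bracket the score are rejected.
function makeAxisScore(axis: string, score: number, lower: number, upper: number): AxisScore {
  if (![score, lower, upper].every((v) => isFinite(v))) {
    throw new RangeError("score and bounds must be finite numbers");
  }
  const s = clampUnit(score);
  const lo = clampUnit(lower);
  const hi = clampUnit(upper);
  if (lo > s || s > hi) {
    throw new RangeError(`bounds [${lo}, ${hi}] do not contain score ${s}`);
  }
  return { axis, score: s, lower: lo, upper: hi };
}
```

Keeping the bounds next to the point estimate means any consumer of a score also receives its uncertainty, rather than having to look it up separately.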

Pipeline

Content flows from discovery through extraction to daily snapshots.

Discovery: RSS feeds, curated sources, search APIs, and X. URLs are deduplicated and stored as items.
Acquisition: Firecrawl scrapes the full text; the content is validated (length, paywall check), stored in R2, and linked to documents.
AI extraction: An LLM, accessed through the Vercel AI SDK and OpenRouter, extracts claims, axis impacts, benchmarks, confidence, and citations. Each signal's confidence is adjusted by its source's trust weight, and signals below a confidence threshold are filtered out.
Daily snapshots: Signals for a given date are aggregated into axis scores, each a confidence-weighted average of direction × magnitude. The delta is the day-over-day change in each axis score.
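The trust adjustment, threshold filter, and snapshot math can be sketched in a few lines of TypeScript. This is a minimal sketch under stated assumptions, not the index's implementation: the `Signal` shape, the multiplicative trust adjustment, the 0.5 cutoff, and the ranges assumed for direction ({-1, 0, 1}) and magnitude ([0, 1]) are all hypothetical. Written as a formula, a day's score for an axis is Σ cᵢ·dᵢ·mᵢ / Σ cᵢ over that day's surviving signals (confidence c, direction d, magnitude m), and the delta is today's score minus yesterday's.

```typescript
// Hypothetical signal shape produced by the AI extraction step.
interface Signal {
  axis: string;
  direction: -1 | 0 | 1; // assumed: sign of the claimed capability change
  magnitude: number;     // assumed: strength of the change in [0, 1]
  confidence: number;    // extraction confidence in [0, 1]
}

type AxisMap = { [axis: string]: number };

// Assumed cutoff; the index's actual threshold is not stated.
const MIN_CONFIDENCE = 0.5;

// Scale the confidence of every signal from one source by that source's
// trust weight (assumed multiplicative), then drop signals below the cutoff.
// Returns new objects; the input signals are not mutated.
function adjustAndFilter(signals: Signal[], trustWeight: number, minConfidence: number = MIN_CONFIDENCE): Signal[] {
  return signals
    .map((s) => ({ ...s, confidence: s.confidence * trustWeight }))
    .filter((s) => s.confidence >= minConfidence);
}

// Per-axis confidence-weighted average of direction × magnitude.
// With direction in {-1, 0, 1} and magnitude in [0, 1], results stay in [-1, 1].
function axisScores(signals: Signal[]): AxisMap {
  const weighted: AxisMap = {};
  const weights: AxisMap = {};
  for (const s of signals) {
    weighted[s.axis] = (weighted[s.axis] || 0) + s.confidence * s.direction * s.magnitude;
    weights[s.axis] = (weights[s.axis] || 0) + s.confidence;
  }
  const scores: AxisMap = {};
  for (const axis of Object.keys(weights)) {
    if (weights[axis] > 0) scores[axis] = weighted[axis] / weights[axis];
  }
  return scores;
}

// Day-over-day delta, computed only for axes scored on both dates.
function dailyDeltas(today: AxisMap, yesterday: AxisMap): AxisMap {
  const out: AxisMap = {};
  for (const axis of Object.keys(today)) {
    if (Object.prototype.hasOwnProperty.call(yesterday, axis)) {
      out[axis] = today[axis] - yesterday[axis];
    }
  }
  return out;
}
```

Because each direction × magnitude term lies in [-1, 1] and the confidence weights are non-negative, the weighted average also stays within the -1 to 1 axis scale.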

Audit trail

Every displayed score links to citations and provenance. The app surfaces source attribution, confidence, and uncertainty.
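To make the audit-trail requirement concrete, here is a hypothetical per-score audit record in TypeScript. The `SignalProvenance` and `ScoreAudit` shapes, the `buildAudit` and `isAuditable` helpers, and ordering sources by confidence are illustrative assumptions, not the app's actual data model.

```typescript
// Hypothetical provenance attached to one extracted signal.
interface SignalProvenance {
  axis: string;
  date: string;       // snapshot date, assumed to be an ISO string such as "2025-01-01"
  sourceUrl: string;  // document the claim was extracted from
  citation: string;   // supporting text for the claim
  confidence: number; // trust-adjusted confidence
}

// Hypothetical audit record for one displayed axis score.
interface ScoreAudit {
  axis: string;
  date: string;
  sources: SignalProvenance[]; // highest confidence first
}

// Gather every signal behind an axis score on a given date.
// filter() returns a new array, so sorting does not reorder the caller's records.
function buildAudit(records: SignalProvenance[], axis: string, date: string): ScoreAudit {
  const sources = records
    .filter((r) => r.axis === axis && r.date === date)
    .sort((x, y) => y.confidence - x.confidence);
  return { axis, date, sources };
}

// Assumed rule: a score is auditable only if at least one non-empty citation backs it.
function isAuditable(audit: ScoreAudit): boolean {
  return audit.sources.length > 0 && audit.sources.every((s) => s.citation.trim().length > 0);
}
```

Gating display on a check like `isAuditable` is one way to enforce the rule that every displayed score links to its citations.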