chronohorn

runtime · replay · fleet · frontier control

Track every experiment. Search the frontier.

Chronohorn is a family-agnostic experiment tracker and architecture-search runtime for predictive descendants. SQLite-backed truth, 64 MCP tools for AI-agent integration, multi-backend fleet dispatch across CPU / Metal / CUDA, saturation and forecast analysis, and a plug-in family adapter protocol.

  • Python ≥ 3.11
  • MIT License
  • 64 MCP tools
  • SQLite-backed
  • Family-agnostic

Track everything

Results from any model family land in a single SQLite DB. Legality and trust state stay attached to results — probes, finals, and forecasts share one schema.

Run the search loop

Manifest-driven fleet dispatch with drain, result pull-back, planner placement, and auto-deepen. One control surface across heterogeneous hardware.

Talk to agents

An MCP server exposes 64 tools for frontier analysis, ablation tracking, fleet control, saturation detection, and learning-curve comparison.

Quickstart

Chronohorn depends on the decepticons kernel — pip pulls it automatically.

# install from PyPI
pip install chronohorn

# launch the dashboard against a result directory
chronohorn observe serve --result-dir out/results

# emit a family-owned scan manifest
chronohorn fleet emit-family-matrix --family causal-bank --regime gated-retention

# full daemon: drain + fleet probe + observer + MCP
chronohorn runtime --manifest manifests/frontier_gated_retention.jsonl

# stdio MCP transport (wire into Claude Code, Claude Desktop, etc.)
chronohorn mcp

CLI help: chronohorn --help. Live HTTP dashboard defaults to http://localhost:7878.

Architecture

Chronohorn is the middle layer of a three-repo split. Dependencies flow one direction.

decepticons   →   chronohorn   →   heinrich
   kernel          runtime          forensics

mechanisms    +   training      +   model geometry
substrates    +   tracking      +   activation traces
readouts      +   fleet
              +   MCP
              +   dashboard

Single source of truth

ChronohornDB serializes every mutation through a dedicated writer thread. Reads run on a separate WAL connection. JSON files are archives; the DB is live truth.
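
The single-writer pattern described above can be sketched with the standard library alone. This is an illustration under assumptions, not ChronohornDB itself: the class name SingleWriterDB, the results table, and the per-call read connection are all hypothetical.

```python
import queue
import sqlite3
import threading


class SingleWriterDB:
    """Hypothetical sketch: one thread owns every mutation; reads use their own connection."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._queue: queue.Queue = queue.Queue()
        # Switch the database file to WAL so readers do not block on the writer.
        init = sqlite3.connect(path)
        init.execute("PRAGMA journal_mode=WAL")
        init.execute("CREATE TABLE IF NOT EXISTS results (name TEXT PRIMARY KEY, bpb REAL)")
        init.commit()
        init.close()
        self._thread = threading.Thread(target=self._writer_loop, daemon=True)
        self._thread.start()

    def _writer_loop(self) -> None:
        conn = sqlite3.connect(self._path)  # created and used only on the writer thread
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                break
            sql, params, done, errors = item
            try:
                conn.execute(sql, params)
                conn.commit()
            except sqlite3.Error as exc:
                conn.rollback()
                errors.append(exc)
            finally:
                done.set()
        conn.close()

    def write(self, sql: str, params: tuple = ()) -> None:
        """Queue a mutation and block until the writer thread has committed it."""
        done, errors = threading.Event(), []
        self._queue.put((sql, params, done, errors))
        done.wait()
        if errors:
            raise errors[0]

    def read(self, sql: str, params: tuple = ()) -> list:
        """Run a query on a separate connection that never touches the writer's."""
        conn = sqlite3.connect(self._path)
        try:
            return conn.execute(sql, params).fetchall()
        finally:
            conn.close()

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()
```

Because every mutation passes through one queue, writes are applied in submission order and never contend with each other for the SQLite write lock.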

Family-agnostic core

Family-specific code lives only in families/<name>/. Core infra (db, mcp, serve, runtime) never imports family modules directly — everything goes through the auto-discovered registry.

Unified daemon

chronohorn runtime bundles drain, fleet probe, observer dashboard, and auto-deepen into one long-running process. Use --no-dispatch for monitor-only mode.

Rust replay path

The chronohorn-causal-bank crate owns checkpoint replay, batched readout, and oracle-budgeted artifact builds. CPU-only, Accelerate/BLAS-backed on macOS.

MCP integration

Chronohorn exposes a stateful MCP surface so an agent (Claude Code, Claude Desktop, any compliant client) can read frontier state, launch jobs, and reason over learning curves with the same data the dashboard sees.

Add to .mcp.json in your project:

{
  "mcpServers": {
    "chronohorn": {
      "command": "python",
      "args": ["-m", "chronohorn.mcp_transport"],
      "env": { "PYTHONPATH": "python" }
    }
  }
}

Tool surface

Observation

chronohorn_status
chronohorn_frontier
chronohorn_learning_curves
chronohorn_compare
chronohorn_marginal_rank
chronohorn_ablation_board

Fleet control

chronohorn_fleet_dispatch
chronohorn_fleet_drain_tick
chronohorn_fleet_status
chronohorn_fleet_sync
chronohorn_register_run

Decisions

chronohorn_control_recommend
chronohorn_control_act
chronohorn_auto_deepen
chronohorn_artifact_check
chronohorn_saturation

Live registry: python/chronohorn/mcp.py — 64 tools and counting.

Fleet

One manifest. Three honest backends. The planner places work where it actually fits.

cpu

Linux snapshot jobs for Rust artifact builders, packed-compiler stages, and full-val eval. Heavy on hashing and sparse tables — does not need a GPU.

metal

Local MLX descendants on Apple Silicon. The honest way to use the Mac GPU for causal-bank training and bridge probes.

cuda

Remote Linux Docker jobs for CUDA-native compression work. VRAM-tier-aware so cheap O(n) screens prefer the smallest sufficient lane.

Manifest row, minimum viable

{
  "name": "causal-bank-10k-pilot",
  "backend": "cuda",
  "launcher": "managed_command",
  "command": "python -m chronohorn.train ...",
  "host": "auto",
  "min_gpu_mem_gb": 8,
  "gpu_placement_policy": "smallest_sufficient",
  "work_tokens": 10000000
}

Drain runs the loop unattended:

python -m chronohorn fleet drain \
  --manifest manifests/frontier_gated_retention.jsonl \
  --result-dir out/results \
  --poll-interval 60

Adding a family

Drop a package at python/chronohorn/families/<name>/. The registry auto-discovers it via pkgutil.iter_modules — no manual wiring.

  1. Create python/chronohorn/families/<name>/__init__.py exporting a <UPPER_NAME>_TRAINING_ADAPTER singleton.
  2. Implement adapter.py following the FamilyTrainingAdapter protocol: architecture aliases, illegality detection, config summaries, infer_from_config(), and training entrypoints.
  3. That's it. Core infra never imports your family module directly; it goes through the registry.
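
The discovery step can be sketched as follows. Only pkgutil.iter_modules, the _TRAINING_ADAPTER suffix, and the infer_from_config() name come from the text above; the protocol body, its signature, and the discover_adapters helper are assumptions made for illustration.

```python
import importlib
import pkgutil
from types import ModuleType
from typing import Protocol, runtime_checkable


@runtime_checkable
class FamilyTrainingAdapter(Protocol):
    """Hypothetical subset of the protocol; the real one has more members."""

    def infer_from_config(self, config: dict) -> bool: ...


def discover_adapters(families_pkg: ModuleType) -> dict[str, object]:
    """Import each module under the package and collect its *_TRAINING_ADAPTER singleton."""
    found: dict[str, object] = {}
    for info in pkgutil.iter_modules(families_pkg.__path__):
        module = importlib.import_module(f"{families_pkg.__name__}.{info.name}")
        for attr in dir(module):
            if attr.endswith("_TRAINING_ADAPTER"):
                found[info.name] = getattr(module, attr)
    return found
```

Core code that calls a helper like discover_adapters only ever sees the returned mapping, which is what keeps family imports out of db, mcp, serve, and runtime.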

Currently shipped

  • causal-bank — Decepticons kernel models. Frozen linear substrate, local conv, MLP/expert readout.
  • polyhash — Hash-embedding models with O(1) lookup tables, gated scan, and PKM. Independent of decepticons.
  • transformer — Adapter only. External training pipelines plug in here.

Measurement methodology

bpb (bits per byte)

bpb = bpt × tokens_per_byte, where bpt is bits per token and tokens_per_byte comes from sentencepiece on the actual test data, not shard file bytes. For sp1024: tokens_per_byte ≈ 0.411.
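
As a worked example of the conversion, reading bpt as bits per token: the sketch below assumes the training loss is a per-token cross-entropy in nats, which is an assumption about the training code rather than something stated here. The function name and the loss value of 3.0 are illustrative only.

```python
import math


def bits_per_byte(loss_nats_per_token: float, tokens_per_byte: float) -> float:
    """bpb = bpt * tokens_per_byte, with bpt derived from a natural-log loss."""
    bpt = loss_nats_per_token / math.log(2)  # convert nats to bits
    return bpt * tokens_per_byte


# Illustrative loss; 0.411 is the sp1024 ratio quoted above.
example_bpb = bits_per_byte(loss_nats_per_token=3.0, tokens_per_byte=0.411)
print(f"{example_bpb:.4f}")
```

A loss of 3.0 nats per token is about 4.33 bits per token, which at 0.411 tokens per byte gives roughly 1.78 bpb.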

Probes vs finals

Probes use 2-8 eval batches and exist for monitoring trends only. Final eval uses 200 batches (6.5M tokens) and is the only number safe for claims and comparisons. Probes carry eval_batches so the two never get confused.
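
The 6.5M figure can be checked with simple arithmetic. The sketch assumes eval batches have the same shape as the training config in the result JSON example (batch_size 64, seq_len 512); the eval batch shape is not stated explicitly, so treat those two numbers as an assumption that happens to reproduce the quoted total.

```python
def eval_tokens(eval_batches: int, batch_size: int, seq_len: int) -> int:
    """Tokens consumed by one evaluation pass."""
    return eval_batches * batch_size * seq_len


def is_claim_safe(eval_batches: int, final_eval_batches: int = 200) -> bool:
    """Hypothetical helper: only a full-size eval counts as a final."""
    return eval_batches >= final_eval_batches


final_tokens = eval_tokens(200, batch_size=64, seq_len=512)  # 6_553_600, about 6.5M
probe_tokens = eval_tokens(8, batch_size=64, seq_len=512)    # 262_144
```

Under these assumptions, even the largest probe sees 25 times fewer tokens than a final, which is why probe numbers are only good for trends.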

Eval-stream reset

The test stream resets to position 0 before every eval. Every probe and every final measures the same data — no drift, no progressive contamination.
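
A minimal sketch of the reset rule, assuming a flat token list and fixed-size batches. The EvalStream class and evaluate function are hypothetical names, not Chronohorn APIs.

```python
class EvalStream:
    """Hypothetical test-stream wrapper that hands out consecutive batches."""

    def __init__(self, tokens, batch_tokens: int) -> None:
        self._tokens = list(tokens)
        self._batch_tokens = batch_tokens
        self._pos = 0

    def reset(self) -> None:
        self._pos = 0

    def next_batch(self) -> list:
        batch = self._tokens[self._pos : self._pos + self._batch_tokens]
        self._pos += len(batch)
        return batch


def evaluate(stream: EvalStream, eval_batches: int) -> list:
    """Collect the batches one eval would score, always starting at position 0."""
    stream.reset()
    return [stream.next_batch() for _ in range(eval_batches)]


stream = EvalStream(range(100), batch_tokens=10)
probe_a = evaluate(stream, 2)
probe_b = evaluate(stream, 2)
final = evaluate(stream, 5)
```

In this sketch two probes of the same size see identical batches, and a probe's data is a prefix of the final's data.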

Causality

Verified by feeding two sequences that are identical up to position t and differ after it. If the logits at position t differ between the two, causality is violated. The check lives in decepticons/tests/test_causality.py.
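
The shape of such a check, sketched against toy stand-in models rather than the actual test in decepticons. All names here are hypothetical, and the sketch compares every position up to and including t, which is slightly stricter than comparing position t alone.

```python
def toy_causal_model(tokens: list[int]) -> list[float]:
    """Stand-in model: output at position i depends only on tokens[0..i]."""
    out, total = [], 0
    for i, tok in enumerate(tokens):
        total += tok
        out.append(total / (i + 1))  # running mean of the prefix
    return out


def toy_leaky_model(tokens: list[int]) -> list[float]:
    """Deliberately non-causal: every position sees the whole sequence."""
    mean = sum(tokens) / len(tokens)
    return [mean for _ in tokens]


def is_causal(model, prefix: list[int], suffix_a: list[int], suffix_b: list[int]) -> bool:
    """Outputs up to the end of the shared prefix must not depend on what follows."""
    t = len(prefix) - 1
    out_a = model(prefix + suffix_a)
    out_b = model(prefix + suffix_b)
    return out_a[: t + 1] == out_b[: t + 1]


causal_ok = is_causal(toy_causal_model, [1, 2, 3], [4, 5], [9, 9])
leaky_ok = is_causal(toy_leaky_model, [1, 2, 3], [4, 5], [9, 9])
```

The running-mean model passes because its first three outputs never look past the prefix; the leaky model fails because changing the suffix changes every output.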

Result JSON format

To have a run tracked, write a result file in this shape. test_bpb is required; architecture drives family detection; probes enable curve and saturation analysis.

{
  "model": {
    "test_bpb": 1.75,
    "architecture": "my_model",
    "params": 10000000
  },
  "config": {
    "train": {
      "steps": 10000,
      "seq_len": 512,
      "batch_size": 64,
      "learning_rate": 0.005
    }
  },
  "training": {
    "final_eval_batches": 200,
    "performance": {
      "tokens_per_second": 350000,
      "elapsed_sec": 900,
      "steps_completed": 10000
    },
    "probes": [
      {"step": 100,   "bpb": 2.5,  "eval_batches": 2},
      {"step": 1000,  "bpb": 2.0,  "eval_batches": 8},
      {"step": 10000, "bpb": 1.75, "eval_batches": 8}
    ]
  },
  "dataset": {
    "test_tokens_per_byte": 0.4105,
    "test_bytes_per_token": 2.436
  }
}
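
A small reader for this shape, as a sketch. It enforces only the one rule stated above (test_bpb is required) and treats everything else as optional; the function name summarize_result and the returned keys are hypothetical, not Chronohorn's ingest code.

```python
import json


def summarize_result(text: str) -> dict:
    """Parse a result document and pull out the fields the tracker description mentions."""
    doc = json.loads(text)
    model = doc.get("model", {})
    if "test_bpb" not in model:
        raise ValueError("model.test_bpb is required")
    probes = doc.get("training", {}).get("probes", [])
    # Sort probes by step so the learning curve reads left to right.
    curve = sorted((p["step"], p["bpb"]) for p in probes)
    return {
        "test_bpb": float(model["test_bpb"]),
        "architecture": model.get("architecture"),
        "curve": curve,
    }


sample = json.dumps({
    "model": {"test_bpb": 1.75, "architecture": "my_model"},
    "training": {"probes": [{"step": 1000, "bpb": 2.0}, {"step": 100, "bpb": 2.5}]},
})
summary = summarize_result(sample)
```

A file with no probes still summarizes cleanly in this sketch; it just yields an empty curve.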