runtime · replay · fleet · frontier control
Track every experiment. Search the frontier.
Chronohorn is a family-agnostic experiment tracker and architecture-search runtime for predictive descendants. SQLite-backed truth, 64 MCP tools for AI-agent integration, multi-backend fleet dispatch across CPU / Metal / CUDA, saturation and forecast analysis, and a plug-in family adapter protocol.
- Python ≥ 3.11
- MIT License
- 64 MCP tools
- SQLite-backed
- Family-agnostic
- 3 backends
- ∞ frontier
Track everything
Results from any model family land in a single SQLite DB. Legality and trust state stay attached to results — probes, finals, and forecasts share one schema.
Run the search loop
Manifest-driven fleet dispatch with drain, result pull-back, planner placement, and auto-deepen. One control surface across heterogeneous hardware.
Talk to agents
An MCP server exposes 64 tools for frontier analysis, ablation tracking, fleet control, saturation detection, and learning-curve comparison.
Quickstart
Chronohorn depends on the decepticons kernel — pip pulls it automatically.
# install from PyPI
pip install chronohorn
# launch the dashboard against a result directory
chronohorn observe serve --result-dir out/results
# emit a family-owned scan manifest
chronohorn fleet emit-family-matrix --family causal-bank --regime gated-retention
# full daemon: drain + fleet probe + observer + MCP
chronohorn runtime --manifest manifests/frontier_gated_retention.jsonl
# stdio MCP transport (wire into Claude Code, Claude Desktop, etc.)
chronohorn mcp
CLI help: chronohorn --help. Live HTTP dashboard defaults to http://localhost:7878.
Architecture
Chronohorn is the middle layer of a three-repo split. Dependencies flow one direction.
decepticons  →  chronohorn  →  heinrich
kernel          runtime        forensics
mechanisms      training       model geometry
substrates      tracking       activation traces
readouts        fleet
                MCP
                dashboard
Single source of truth
ChronohornDB serializes every mutation through a dedicated writer thread. Reads run on a separate WAL connection. JSON files are archives; the DB is live truth.
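A minimal sketch of the single-writer pattern described above, using only the standard-library sqlite3 module. The class name SingleWriterDB, the results table, and its columns are hypothetical stand-ins, not ChronohornDB's actual code. Every mutation is queued to one writer thread that owns its own connection. Reads go through a separate connection, and WAL mode lets them proceed while the writer commits.

```python
import queue
import sqlite3
import tempfile
import threading
from pathlib import Path


class SingleWriterDB:
    """Hypothetical single-writer SQLite store (a sketch, not ChronohornDB itself)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._queue: queue.Queue = queue.Queue()
        self._ready = threading.Event()
        # The writer connection is created inside the writer thread, because
        # sqlite3 connections are tied to their creating thread by default.
        self._thread = threading.Thread(target=self._writer_loop, daemon=True)
        self._thread.start()
        self._ready.wait()
        # Reads use a separate connection; WAL lets them run while the writer commits.
        self._reader = sqlite3.connect(path)

    def _writer_loop(self) -> None:
        conn = sqlite3.connect(self.path)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE IF NOT EXISTS results (name TEXT PRIMARY KEY, test_bpb REAL)")
        conn.commit()
        self._ready.set()
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                break
            sql, params, done, errors = item
            try:
                conn.execute(sql, params)
                conn.commit()
            except sqlite3.Error as exc:
                conn.rollback()
                errors.append(exc)  # hand the error back to the caller's thread
            finally:
                done.set()  # always unblock the caller
        conn.close()

    def write(self, sql: str, params: tuple = ()) -> None:
        """Queue a mutation and block until the writer thread has committed it."""
        done, errors = threading.Event(), []
        self._queue.put((sql, params, done, errors))
        done.wait()
        if errors:
            raise errors[0]

    def read(self, sql: str, params: tuple = ()) -> list:
        return self._reader.execute(sql, params).fetchall()

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join()
        self._reader.close()


with tempfile.TemporaryDirectory() as tmp:
    db = SingleWriterDB(str(Path(tmp) / "demo.db"))
    db.write("INSERT INTO results VALUES (?, ?)", ("causal-bank-10k-pilot", 1.75))
    rows = db.read("SELECT name, test_bpb FROM results")
    try:
        # Same primary key again: the writer rejects it and the error surfaces here.
        db.write("INSERT INTO results VALUES (?, ?)", ("causal-bank-10k-pilot", 1.80))
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    db.close()
```

Funnelling writes through one thread sidesteps SQLite's single-writer lock contention entirely, at the cost of making every write a round trip through a queue.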
Family-agnostic core
Family-specific code lives only in families/<name>/. Core infra (db, mcp, serve, runtime) never imports family modules directly — everything goes through the auto-discovered registry.
Unified daemon
chronohorn runtime bundles drain, fleet probe, observer dashboard, and auto-deepen into one long-running process. Use --no-dispatch for monitor-only mode.
Rust replay path
The chronohorn-causal-bank crate owns checkpoint replay, batched readout, and oracle-budgeted artifact builds. CPU-only, Accelerate/BLAS-backed on macOS.
MCP integration
Chronohorn exposes a stateful MCP surface so an agent (Claude Code, Claude Desktop, any compliant client) can read frontier state, launch jobs, and reason over learning curves with the same data the dashboard sees.
Add to .mcp.json in your project:
{
"mcpServers": {
"chronohorn": {
"command": "python",
"args": ["-m", "chronohorn.mcp_transport"],
"env": { "PYTHONPATH": "python" }
}
}
}
Tool surface
Observation
chronohorn_status · chronohorn_frontier · chronohorn_learning_curves · chronohorn_compare · chronohorn_marginal_rank · chronohorn_ablation_board
Fleet control
chronohorn_fleet_dispatch · chronohorn_fleet_drain_tick · chronohorn_fleet_status · chronohorn_fleet_sync · chronohorn_register_run
Decisions
chronohorn_control_recommend · chronohorn_control_act · chronohorn_auto_deepen · chronohorn_artifact_check · chronohorn_saturation
Live registry: python/chronohorn/mcp.py — 64 tools and counting.
Fleet
One manifest. Three honest backends. The planner places work where it actually fits.
cpu
Linux snapshot jobs for Rust artifact builders, packed-compiler stages, and full-val eval. Heavy on hashing and sparse tables — does not need a GPU.
metal
Local MLX descendants on Apple Silicon. The honest way to use the Mac GPU for causal-bank training and bridge probes.
cuda
Remote Linux Docker jobs for CUDA-native compression work. VRAM-tier-aware so cheap O(n) screens prefer the smallest sufficient lane.
Manifest row, minimum viable
{
"name": "causal-bank-10k-pilot",
"backend": "cuda",
"launcher": "managed_command",
"command": "python -m chronohorn.train ...",
"host": "auto",
"min_gpu_mem_gb": 8,
"gpu_placement_policy": "smallest_sufficient",
"work_tokens": 10000000
}
Drain runs the loop unattended:
python -m chronohorn fleet drain \
--manifest manifests/frontier_gated_retention.jsonl \
--result-dir out/results \
--poll-interval 60
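The manifest is a .jsonl file, so on disk each row is a single line, unlike the pretty-printed row shown earlier. Below is a sketch of the bookkeeping one drain tick might do: parse the manifest and find rows that have no result yet. The rule that a row is finished once <result_dir>/<name>.json exists is an assumption made for illustration, not Chronohorn's documented behaviour. A drain loop would repeat this every --poll-interval seconds and dispatch whatever is still pending.

```python
import json
import tempfile
from pathlib import Path


def load_manifest(path: Path) -> list[dict]:
    """Parse a JSONL manifest: one JSON object per non-blank line."""
    rows = []
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{lineno}: {exc.msg}") from exc
    return rows


def pending_rows(rows: list[dict], result_dir: Path) -> list[dict]:
    """Assumed rule: a row is finished once <result_dir>/<name>.json exists."""
    return [r for r in rows if not (result_dir / f"{r['name']}.json").exists()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    manifest = root / "frontier.jsonl"
    manifest.write_text(
        json.dumps({"name": "run-a", "backend": "cpu"}) + "\n"
        + json.dumps({"name": "run-b", "backend": "metal"}) + "\n",
        encoding="utf-8",
    )
    result_dir = root / "results"
    result_dir.mkdir()
    (result_dir / "run-a.json").write_text("{}", encoding="utf-8")  # run-a already done
    todo = [r["name"] for r in pending_rows(load_manifest(manifest), result_dir)]

print(todo)  # ['run-b']
```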
Adding a family
Drop a package at python/chronohorn/families/<name>/. The registry auto-discovers it via pkgutil.iter_modules — no manual wiring.
- Create python/chronohorn/families/<name>/__init__.py exporting a <UPPER_NAME>_TRAINING_ADAPTER singleton.
- Implement adapter.py following the FamilyTrainingAdapter protocol: architecture aliases, illegal detection, config summaries, infer_from_config(), and training entrypoints.
- That's it. Core infra never imports your family module directly; it goes through the registry.
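The discovery step described above can be sketched with pkgutil.iter_modules and importlib. This is an illustration of the convention (a subpackage exporting <UPPER_NAME>_TRAINING_ADAPTER), not the registry's actual code. The demo_families package, the toy_family module, and the ToyAdapter class are throwaway stand-ins; a real adapter would implement the FamilyTrainingAdapter protocol.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def discover_adapters(package: ModuleType) -> dict[str, object]:
    """Import each subpackage and collect its <UPPER_NAME>_TRAINING_ADAPTER, if any."""
    adapters: dict[str, object] = {}
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(f"{package.__name__}.{info.name}")
        adapter = getattr(module, f"{info.name.upper()}_TRAINING_ADAPTER", None)
        if adapter is not None:
            adapters[info.name] = adapter
    return adapters


# Demo: build a throwaway families package on disk with one toy family.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_families"
    (pkg / "toy_family").mkdir(parents=True)
    (pkg / "__init__.py").write_text("", encoding="utf-8")
    (pkg / "toy_family" / "__init__.py").write_text(
        "class ToyAdapter:\n"
        "    family = 'toy_family'\n"
        "\n"
        "TOY_FAMILY_TRAINING_ADAPTER = ToyAdapter()\n",
        encoding="utf-8",
    )
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        found = discover_adapters(importlib.import_module("demo_families"))
    finally:
        sys.path.remove(tmp)

print(sorted(found))  # ['toy_family']
```

Because core code only ever touches the dictionary this returns, adding a family never requires editing db, mcp, serve, or runtime.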
Currently shipped
- causal-bank — Decepticons kernel models. Frozen linear substrate, local conv, MLP/expert readout.
- polyhash — Hash-embedding models with O(1) lookup tables, gated scan, and PKM. Independent of decepticons.
- transformer — Adapter only. External training pipelines plug in here.
Measurement methodology
bpb (bits per byte)
bpb = bpt × tokens_per_byte, where bpt is the model's loss in bits per token and tokens_per_byte comes from sentencepiece on the actual test data, not shard file bytes. For sp1024: tokens_per_byte ≈ 0.411.
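As a worked example of the conversion, the sketch below assumes the training loss is a mean cross-entropy in nats per token (the usual natural-log convention), so bpt is that loss divided by ln 2. The token and byte counts are invented to land on the sp1024 ratio; in practice num_tokens would come from encoding the decoded test text with the sentencepiece model and num_text_bytes from its UTF-8 length.

```python
import math


def tokens_per_byte(num_tokens: int, num_text_bytes: int) -> float:
    """Tokens per UTF-8 byte of the decoded test text (not on-disk shard bytes)."""
    return num_tokens / num_text_bytes


def bits_per_byte(mean_loss_nats: float, tpb: float) -> float:
    """bpb = bpt × tokens_per_byte, with bpt converted from a nats-per-token loss."""
    bits_per_token = mean_loss_nats / math.log(2)
    return bits_per_token * tpb


# Hypothetical counts picked to match the sp1024 ratio quoted above.
tpb = tokens_per_byte(411_000, 1_000_000)  # 0.411
bpb = bits_per_byte(3.0, tpb)              # about 1.779
```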
Probes vs finals
Probes use a small number of eval batches (typically 2-8) and exist only to monitor trends. Final eval uses 200 batches (6.5M tokens) and is the only number safe for claims and comparisons. Every probe records its eval_batches so the two never get confused.
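The 6.5M figure is consistent with batch_size 64 and seq_len 512 from the example result config: 200 × 64 × 512 = 6,553,600 tokens. The sketch below assumes eval batches have that same shape, and shows how small a probe's sample is by comparison. The is_claimable helper and its 200-batch threshold are hypothetical illustrations of the rule stated above.

```python
FINAL_EVAL_BATCHES = 200


def eval_tokens(eval_batches: int, batch_size: int = 64, seq_len: int = 512) -> int:
    """Tokens scored by one evaluation, assuming full batches of batch_size × seq_len."""
    return eval_batches * batch_size * seq_len


def is_claimable(measurement: dict) -> bool:
    """Only final-size evals back a claim; anything smaller is a monitoring probe."""
    return measurement.get("eval_batches", 0) >= FINAL_EVAL_BATCHES


final_tokens = eval_tokens(FINAL_EVAL_BATCHES)  # 6,553,600 (the "6.5M" above)
probe_tokens = eval_tokens(2)                   # 65,536: 1% of a final
```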
Eval-stream reset
The test stream resets to position 0 before every eval. Every probe and every final measures the same data — no drift, no progressive contamination.
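A small sketch of why the reset matters. EvalStream and evaluate are hypothetical names, and wrapping around at the end of the split is an assumption of this sketch. Because each evaluation rewinds to position 0, a 2-batch probe scores exactly the first two batches that a longer final eval scores, no matter what consumed the stream in between.

```python
class EvalStream:
    """Hypothetical fixed test split served in seq_len chunks."""

    def __init__(self, tokens: list[int], seq_len: int) -> None:
        self.tokens = tokens
        self.seq_len = seq_len
        self.pos = 0

    def reset(self) -> None:
        self.pos = 0

    def next_batch(self) -> list[int]:
        # Wrap around at the end of the split (an assumption of this sketch).
        n = len(self.tokens)
        chunk = [self.tokens[(self.pos + i) % n] for i in range(self.seq_len)]
        self.pos = (self.pos + self.seq_len) % n
        return chunk


def evaluate(stream: EvalStream, eval_batches: int) -> list[list[int]]:
    stream.reset()  # every eval starts at position 0
    return [stream.next_batch() for _ in range(eval_batches)]


stream = EvalStream(list(range(100)), seq_len=8)
probe = evaluate(stream, 2)
stream.next_batch()          # something else advances the stream...
final = evaluate(stream, 4)  # ...but the reset undoes that
```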
Causality
Verified by feeding two sequences that are identical up to position t and differ after it. If the logits at any position up to t differ, causality is violated. The check lives in decepticons/tests/test_causality.py.
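The same check can be sketched in a few lines with toy stand-ins. This is not the decepticons test; causal_mean and leaky_mean are hypothetical models that return one scalar per position where a real model would return a logit vector. A real comparison usually allows a small tolerance (atol here) for floating-point noise.

```python
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], list[float]]


def causal_mean(xs: Sequence[float]) -> list[float]:
    """Output at t is the running mean of xs[:t+1], so it is causal."""
    out, total = [], 0.0
    for i, x in enumerate(xs):
        total += x
        out.append(total / (i + 1))
    return out


def leaky_mean(xs: Sequence[float]) -> list[float]:
    """Averages each position with the next one, so it peeks at the future."""
    last = len(xs) - 1
    return [(xs[i] + xs[min(i + 1, last)]) / 2 for i in range(len(xs))]


def is_causal(model: Model, prefix: Sequence[float], tail_a: Sequence[float],
              tail_b: Sequence[float], atol: float = 0.0) -> bool:
    """Feed two sequences that share `prefix` and differ afterwards.

    Outputs at every position up to t = len(prefix) - 1 must agree.
    """
    t = len(prefix) - 1
    out_a = model([*prefix, *tail_a])
    out_b = model([*prefix, *tail_b])
    return all(abs(out_a[i] - out_b[i]) <= atol for i in range(t + 1))
```

With prefix [1.0, 2.0, 3.0] and tails [4.0, 5.0] versus [9.0, 9.0], causal_mean passes, while leaky_mean fails at t = 2 because its output there already depends on the differing token at position 3.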
Result JSON format
Anything you want tracked drops a result file in this shape. test_bpb is required; architecture drives family detection; probes enable curve and saturation analysis.
{
"model": {
"test_bpb": 1.75,
"architecture": "my_model",
"params": 10000000
},
"config": {
"train": {
"steps": 10000,
"seq_len": 512,
"batch_size": 64,
"learning_rate": 0.005
}
},
"training": {
"final_eval_batches": 200,
"performance": {
"tokens_per_second": 350000,
"elapsed_sec": 900,
"steps_completed": 10000
},
"probes": [
{"step": 100, "bpb": 2.5, "eval_batches": 2},
{"step": 1000, "bpb": 2.0, "eval_batches": 8},
{"step": 10000, "bpb": 1.75, "eval_batches": 16}
]
},
"dataset": {
"test_tokens_per_byte": 0.4105,
"test_bytes_per_token": 2.436
}
}
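A sketch of a producer-side check before dropping a file in the result directory. The rules mirror the sentence introducing the format: test_bpb is required, while a missing architecture or a probe without eval_batches only weakens analysis, so those are reported as warnings. check_result, write_result, and the <name>.json filename are hypothetical helpers, not Chronohorn's loader.

```python
import json
import tempfile
from pathlib import Path


def check_result(doc: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a result document in the shape above."""
    errors, warnings = [], []
    model = doc.get("model", {})
    bpb = model.get("test_bpb")
    if isinstance(bpb, bool) or not isinstance(bpb, (int, float)):
        errors.append("model.test_bpb is required and must be a number")
    if "architecture" not in model:
        warnings.append("model.architecture missing: no family detection")
    probes = doc.get("training", {}).get("probes", [])
    for i, probe in enumerate(probes):
        if "eval_batches" not in probe:
            warnings.append(f"training.probes[{i}] has no eval_batches")
    return errors, warnings


def write_result(result_dir: Path, name: str, doc: dict) -> Path:
    """Refuse to write a file that lacks test_bpb; otherwise drop <name>.json."""
    errors, _ = check_result(doc)
    if errors:
        raise ValueError("; ".join(errors))
    path = result_dir / f"{name}.json"
    path.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    return path


doc = {
    "model": {"test_bpb": 1.75, "architecture": "my_model", "params": 10_000_000},
    "training": {"probes": [{"step": 100, "bpb": 2.5, "eval_batches": 2}]},
}
with tempfile.TemporaryDirectory() as tmp:
    written = write_result(Path(tmp), "my_model-run", doc)
    round_trip = json.loads(written.read_text(encoding="utf-8"))
```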
the stack
Two repos. One direction of dependency.
decepticons
Backend-neutral kernel of predictive primitives — substrates, memory, gating, routing, readouts. Reusable mechanisms that downstream systems combine into trained models.
decepticons.win →
chronohorn
Family-agnostic experiment tracker and architecture-search runtime. Training, replay, fleet dispatch, MCP, observation. Imports decepticons; never the reverse.
you are here ✦