Production Data
From 17% to 96%
How policy enforcement matured an autonomous engineering fleet in 8 days.
The Setup
Summit Cognitive runs on autonomous AI agents: Devin writes features, Codex refactors libraries, Jules triages bugs, Dependabot manages dependencies, and human-authored PRs flow through the same pipeline. Six agents. One monorepo. No gatekeeping layer between an agent's decision and production.
Day 1: May 12, 2026
Decision Receipt went live. It evaluated every PR merge event against 9 policy rules, issued a cryptographically signed receipt for each, and blocked any action that failed.
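Below is a minimal sketch of that evaluate-sign-block flow, built on several assumptions. The article does not list the 9 real rules, so `Rule`, the two placeholder rules, and the event fields (`id`, `linked_issue`, `ci_status`) are hypothetical. HMAC-SHA256 from Python's standard library stands in for whatever signature scheme Decision Receipt actually uses. A production system might prefer an asymmetric scheme such as Ed25519, so that third parties can verify receipts without holding the secret. In this sketch a merge would proceed only when the receipt's `decision` is `accepted`.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical policy rule: a name plus a predicate over a merge event."""
    name: str
    check: Callable[[dict], bool]


# Two illustrative placeholder rules; the real 9 rules are not published here.
RULES = [
    Rule("has_linked_issue", lambda e: bool(e.get("linked_issue"))),
    Rule("ci_passed", lambda e: e.get("ci_status") == "success"),
]


def _canonical(body: dict) -> bytes:
    # Sorted keys and fixed separators give identical bytes for identical content.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def issue_receipt(event: dict, rules: list[Rule], key: bytes) -> dict:
    """Evaluate one PR merge event against every rule and return a signed receipt."""
    failed = [r.name for r in rules if not r.check(event)]
    body = {
        "event_id": event["id"],
        "decision": "accepted" if not failed else "blocked",
        "failed_rules": failed,
    }
    # HMAC-SHA256 is a stand-in for the real (unknown) signature scheme.
    body["signature"] = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return body


def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the signature over every field except the signature itself."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])
```

Canonical serialization matters here. The verifier must reproduce exactly the bytes that were signed, so changing any field, such as flipping `decision` from `blocked` to `accepted`, invalidates the receipt.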
Day 1 acceptance rate: 16.7%. Out of 6 events, 5 were blocked. The agents were submitting work that looked fine in a log but failed basic evidence requirements when held to a formal standard.
The Improvement Curve
| Date | Events | Accepted | Blocked | Acceptance rate |
|---|---|---|---|---|
| May 12 | 6 | 1 | 5 | 16.7% |
| May 13 | 5 | 2 | 3 | 40.0% |
| May 14 | 12 | 6 | 6 | 50.0% |
| May 15 | 7 | 3 | 4 | 42.9% |
| May 16 | 35 | 18 | 17 | 51.4% |
| May 17 | 28 | 28 | 0 | 100% |
| May 18 | 26 | 24 | 2 | 92.3% |
| May 19 | 25 | 24 | 1 | 96.0% |
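The rate column is accepted events divided by total events for that day. The short check below transcribes the table's counts and confirms that Accepted plus Blocked equals Events on every row. It also reproduces each published rate and applies the same formula to the pooled totals across all eight days. The `DAILY` list and the `acceptance_rate` helper are introduced here for illustration only.

```python
# (date, events, accepted, blocked), transcribed from the table above.
DAILY = [
    ("May 12", 6, 1, 5),
    ("May 13", 5, 2, 3),
    ("May 14", 12, 6, 6),
    ("May 15", 7, 3, 4),
    ("May 16", 35, 18, 17),
    ("May 17", 28, 28, 0),
    ("May 18", 26, 24, 2),
    ("May 19", 25, 24, 1),
]


def acceptance_rate(accepted: int, events: int) -> float:
    """Percentage of events accepted, rounded to one decimal place."""
    return round(100 * accepted / events, 1)


for date, events, accepted, blocked in DAILY:
    # Every event ends in exactly one outcome, so the counts must sum.
    assert accepted + blocked == events, date
    print(f"{date}: {acceptance_rate(accepted, events)}%")

# Pooled rate over the whole eight-day window.
total_events = sum(row[1] for row in DAILY)
total_accepted = sum(row[2] for row in DAILY)
print(f"Overall: {acceptance_rate(total_accepted, total_events)}%")
```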
Per-Agent Breakdown
The Insight
No agent was retrained. No model was fine-tuned. Policy enforcement created a feedback loop. The 9 rules set a clear, deterministic bar. Agents that could adapt improved within days. Agents that could not (Dependabot) were identified immediately.
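One plausible way to surface an agent that is not adapting is to compute acceptance rates per agent over a recent window of receipts and then flag any agent below a threshold. This is a hypothetical sketch. The `agent` field on each receipt, the 50% default threshold, and the function names are assumptions, not part of the documented Decision Receipt API.

```python
from collections import defaultdict


def per_agent_rates(receipts: list[dict]) -> dict[str, float]:
    """Acceptance rate (percent, one decimal) for each agent in the receipts."""
    totals: dict[str, int] = defaultdict(int)
    accepted: dict[str, int] = defaultdict(int)
    for r in receipts:
        totals[r["agent"]] += 1
        if r["decision"] == "accepted":
            accepted[r["agent"]] += 1
    return {agent: round(100 * accepted[agent] / n, 1) for agent, n in totals.items()}


def flag_stalled(receipts: list[dict], threshold: float = 50.0) -> list[str]:
    """Agents whose acceptance rate in this window falls below the threshold."""
    rates = per_agent_rates(receipts)
    return sorted(agent for agent, rate in rates.items() if rate < threshold)
```

Passing in only the most recent days' receipts lets the check separate agents that have since improved from agents that keep failing the same rules.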
Data from the live Decision Receipt production API. All numbers reflect real enforcement receipts. Last updated: May 19, 2026.