Gotcha: Adaptive concurrency clutch is built and deliberately deferred
Status
Deliberately deferred; tracked as DEF-01 in docs/deferred.md.
The clutch was never accidentally left unwired. We built it in V5
with the intent to wire it after cross-run memory had Run 1+Run 2
data to reason about. Every run since has kept it serial because the
wiring work is bundled with a dashboard rewrite (parallel lanes
instead of a single linear narrative), and stopping momentum on
memory, ordering, or the second skill to spend a weekend on
concurrency + UI never felt like the right trade. Entry 32 promoted
that recurring deferral to the debt registry where the full plan
lives.
This page remains as an orientation marker — if you're reading the code, you'll see the clutch initialized and will want to know why it doesn't seem to do anything.
Symptom
The clutch_initialized event in the run log shows per-category
worker recommendations (authentication=3, kernel=2, audit=1), but
every rule is still processed one at a time. The clutch has no
effect on execution — by design, for now.
What is actually running
ralph.py's outer loop is a simple for loop that processes one
rule per iteration. The clutch is initialized, reads the difficulty
model from prior runs, and logs its recommendations — but
clutch.recommend_workers() is never called and asyncio.gather
is never used to run multiple rules concurrently. The methods are
covered by tests/test_memory_and_clutch.py and exercised by
tools/smoke_memory_e2e.py, so the infrastructure is production-
ready from a correctness standpoint. The gating item is the UI.
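The current state can be pictured with a minimal sketch. Everything here is a hypothetical stand-in, not the code in ralph.py or clutch.py: `SketchClutch`, `snapshot`, `run_serial`, and `process_rule` are invented for illustration, and the sketch assumes the loop body is a coroutine. Only the shape matters: the clutch logs its recommendations once, and the loop then handles one rule at a time without ever consulting it.

```python
import asyncio


class SketchClutch:
    """Hypothetical stand-in for the real Clutch class (assumed interface)."""

    def __init__(self, recommendations):
        self._recommendations = dict(recommendations)
        self.recommend_calls = 0  # counts how often the loop asks for advice

    def recommend_workers(self, category):
        self.recommend_calls += 1
        return self._recommendations.get(category, 1)

    def snapshot(self):
        # What a clutch_initialized event might carry (assumed payload).
        return dict(self._recommendations)


async def process_rule(rule, log):
    # Placeholder for the real per-rule work.
    log.append(("start", rule))
    await asyncio.sleep(0)
    log.append(("end", rule))


async def run_serial(rules, clutch, log):
    # The clutch is initialized and logged...
    log.append(("clutch_initialized", clutch.snapshot()))
    # ...but the outer loop is a plain for loop: one rule per iteration,
    # and recommend_workers() is never called.
    for rule in rules:
        await process_rule(rule, log)
```

Running `asyncio.run(run_serial(rules, clutch, log))` leaves `clutch.recommend_calls` at zero, which is the behaviour described under Symptom.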
What wiring would involve
- Before selecting the next rule, ask the clutch how many workers the current category supports.
- If >1, use asyncio.gather to process multiple rules from that category concurrently.
- Respect resource conflict constraints from the TaskGraph (rules that touch the same files can't run in parallel).
- Rewrite the dashboard's "now processing" ribbon to widen and narrow as the clutch's recommendation changes — the active-queue band design captured in DEF-01.
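The scheduling half of the steps above (everything except the dashboard) could look roughly like the sketch below. It is not the plan recorded in DEF-01. The `Rule` dataclass, `pick_batch`, and `run_with_clutch` are hypothetical names, the TaskGraph conflict check is reduced to "two rules conflict if their file sets overlap", and `recommend_workers` is passed in as a plain callable rather than the real clutch method.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record; the real TaskGraph node may look different."""
    rule_id: str
    category: str
    files: frozenset


def pick_batch(pending, max_workers):
    """Pick up to max_workers rules from the head rule's category
    whose file sets do not overlap (assumed conflict model)."""
    if not pending:
        return []
    category = pending[0].category
    batch, claimed = [], set()
    for rule in pending:
        if len(batch) >= max_workers:
            break
        if rule.category != category:
            continue  # only batch within one category
        if claimed & rule.files:
            continue  # touches a file already claimed by this batch
        batch.append(rule)
        claimed |= rule.files
    return batch


async def run_with_clutch(rules, recommend_workers, process_rule):
    """Run rules in batches sized by the clutch; returns the batch order."""
    pending = list(rules)
    order = []
    while pending:
        # Ask the clutch before selecting the next rule(s); never go below 1.
        workers = max(1, recommend_workers(pending[0].category))
        batch = pick_batch(pending, workers)
        # A batch of one degenerates to today's serial behaviour.
        await asyncio.gather(*(process_rule(rule) for rule in batch))
        done = {rule.rule_id for rule in batch}
        order.append([rule.rule_id for rule in batch])
        pending = [rule for rule in pending if rule.rule_id not in done]
    return order
```

Batching per category keeps the recommendation meaningful, since the clutch reports worker counts per category. The head rule of `pending` is always admitted to its batch, so the loop makes progress even when every other rule conflicts with it.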
Why it stayed deferred
The concurrency work is not trivial and the existing dashboard is built around a single linear narrative. Delivering hidden throughput without the UI to show it would destroy the "watch the edge AI work" demo that is core to what this project is for. The bundle — concurrency wiring + dashboard rewrite — is a weekend of focused work that hasn't been the right trade against V2 memory, the ordering constraint, CVE as a second skill, or the per-family reboot architecture. DEF-01 captures the trigger: when we're ready to trade a demo weekend for throughput, the plan is already written.
Environment
- gemma_forge/harness/clutch.py — ClutchConfig, Clutch class
- gemma_forge/harness/ralph.py — outer loop (lines ~885-900)
- Single vLLM instance, TP=4 — concurrent requests are supported but not tested under parallel agent load