The Stateful Loop Refactor: Fresh Sessions per Iteration¶
The story in one sentence¶
The first working version of the Ralph loop used the ADK
LoopAgent as the outer loop, which accumulated conversation
history across iterations and poisoned later attempts with
earlier context. I refactored the outer loop into a Python-driven
orchestration that creates a fresh ADK session per agent turn,
with run state managed separately from conversation history.
Why this is its own entry¶
This is one of the load-bearing refactors in the project. The original LoopAgent-driven version looked elegant in code but produced subtly wrong behavior as runs got longer — by iteration 3 or 4, the LLM was seeing its own reasoning from iterations 1 and 2 in the conversation history, and its responses were increasingly shaped by what it had already said rather than by what the current rule required. This is a classic context-pollution failure mode, and it's worth naming explicitly.
The original (broken) shape¶
ADK provides a LoopAgent class that wraps a sub-agent and
re-invokes it in a loop. The natural way to use it is:
```python
loop_agent = LoopAgent(
    name="ralph_loop",
    sub_agent=worker_agent,
    max_iterations=50,
)

async for event in runner.run_async(session_id, initial_message):
    ...
```
Under the hood, ADK maintains a single session across all iterations. Each loop cycle appends to the session's conversation history. By design, this lets the sub-agent "remember" what it did on previous iterations — which is exactly what you want for many agentic workflows (research agents, code-editing agents).
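The accumulation behavior can be illustrated with a toy, framework-free sketch (the `run_with_shared_session` and `call_model` names are mine, not ADK's):

```python
# Toy illustration of the accumulation problem (plain Python, not ADK):
# one shared session means iteration N sees everything said in
# iterations 1..N-1, so later attempts are shaped by earlier ones.
def run_with_shared_session(n_iterations, call_model):
    history = []  # a single session for the whole loop
    for i in range(n_iterations):
        history.append(("user", f"attempt {i}"))
        reply = call_model(list(history))  # model sees the full history
        history.append(("model", reply))
    return history
```

Each call to `call_model` sees a strictly longer history than the last, which is the mechanism behind the drift described below.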
For a reflexion loop where each iteration is a fresh attempt at the same rule, this is exactly what you don't want. Each iteration should see the rule as if it were the first time, enriched by a curated, compact summary of prior attempts (the episodic memory), not by the raw conversation of those attempts.
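By contrast, the fresh-attempt shape rebuilds the prompt from scratch each time, from the rule text plus curated summaries. A minimal sketch (the helper name and prompt wording are hypothetical, not the project's actual assembler):

```python
def build_attempt_prompt(rule_text, lesson_summaries):
    # Raw transcripts of earlier attempts are deliberately absent; only
    # short, curated lesson strings are carried forward.
    lessons = "\n".join(f"- {s}" for s in lesson_summaries) or "- none yet"
    return (
        f"Rule to remediate:\n{rule_text}\n\n"
        f"Lessons from prior attempts (summaries, not transcripts):\n{lessons}"
    )
```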
The symptoms¶
Early-run iterations looked fine. The Worker would pick a rule, generate a fix, the fix would fail, the Worker would try again, and the second attempt was usually different from the first — which is what you want.
By iteration 4 or 5, the Worker started producing responses that sounded like continuations of earlier reasoning rather than fresh approaches:
- "As I mentioned earlier..."
- "Building on the previous attempt..."
- Reasoning chains that referenced details only relevant to an earlier rule, not the current one
- A slow drift toward apologetic, hedged language as if the model felt accountable for the whole history
The problem wasn't that the model was stupid. It was that the model was being asked the wrong question. It was seeing its own prior attempts in context and was trying to be consistent with them, when what was needed was for it to see a fresh problem with selected lessons from history.
The refactor¶
Two architectural changes:
1. Python-driven outer loop, not LoopAgent¶
The outer reflexion loop is now an ordinary Python while loop
in run_ralph(). ADK is still used for the individual agent
turns — each _run_agent_turn() call uses ADK's Runner and
tools — but the outer loop is not an ADK construct. This means
the outer loop has full control over what state is passed into
each turn.
```python
while True:
    # Determine what this attempt needs to see
    context = assemble_prompt(...)

    # Fresh session for this turn
    session = InMemorySessionService().create_session(...)

    # One agent turn, one tool call, one text response
    response = await _run_agent_turn(agent, session, context)

    # Evaluate
    ...
```
2. Separate state management from conversation history¶
The run's state — remediated rules, escalated rules, banned
patterns, episodic memory per rule — lives in a plain Python
RunState dataclass. It's updated by the outer loop, not by the
agent. Each agent turn reads from RunState (via the prompt
assembler) but doesn't mutate it directly.
This separation matters because it means the agent's conversation history (which is now one turn long, always) is decoupled from the project's accumulated learning (which is arbitrarily long, summarized, and curated).
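The shape of that dataclass, roughly (field names here are illustrative guesses at the structure described above, not the project's exact code):

```python
from dataclasses import dataclass, field

@dataclass
class RunState:
    remediated_rules: set = field(default_factory=set)
    escalated_rules: set = field(default_factory=set)
    banned_patterns: list = field(default_factory=list)
    # rule id -> compact lesson summaries (the episodic memory)
    episodic_memory: dict = field(default_factory=dict)

    def record_lesson(self, rule_id, lesson):
        # Mutations happen in outer-loop code; agent turns only ever
        # read a rendered view of this state via the prompt assembler.
        self.episodic_memory.setdefault(rule_id, []).append(lesson)
```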
The contract after the refactor¶
Each call to _run_agent_turn(agent, session_service, message)
is guaranteed to:
- Create a fresh InMemorySessionService session
- Feed it exactly one message (the assembled prompt)
- Collect whatever response comes out (text, tool calls, tool results)
- Return a clean response string
- Never retain any state between calls
From the Worker's perspective, every invocation is "the first time you're seeing this problem." The Worker is therefore forced to reason about the current rule on its merits, not in continuation of prior reasoning. Any lessons from prior attempts arrive via the structured episodic memory summary in the assembled prompt, not via conversation history.
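The contract is easy to state in framework-free Python (the real implementation goes through ADK's Runner; `call_model` here is a hypothetical stand-in for that machinery):

```python
def run_fresh_turn(call_model, message):
    # A brand-new, throwaway history exists only for the duration of
    # this call: one message in, one response out, nothing retained.
    session = [("user", message)]
    reply = call_model(session)
    session.append(("model", reply))
    return reply  # session goes out of scope here; no state survives
```

However many times this is called, the model always sees a one-message history.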
What this unlocked¶
Several things became possible or cleaner after this refactor:
- The memory tier architecture. Episodic memory per rule, semantic memory across rules, working memory per attempt — all three are now explicit Python data structures managed by the outer loop. The prompt assembler decides what slice of each one to show to a given turn. Without the stateful-loop refactor, this separation would have been impossible because ADK's session would have been fighting the memory model.
- The token budget assembler. With full control over what goes into each turn's message, the budget-aware prompt assembly (see improvements/03-context-budget-assembly) became straightforward. Before the refactor, enforcing a budget would have meant fighting ADK's session management.
- Architect re-engagement. The re-engagement flow needs the Architect to see a completely new prompt (the re-engagement message) without contamination from its earlier rule-selection prompt. Fresh-session-per-turn makes that trivial.
- Clean observability. Each turn's LLM call is a discrete event with a discrete span in the OpenTelemetry trace. No ambiguity about which tokens belong to which iteration.
- The defensive tool-call cap. The cap described in improvements/02-worker-single-action-enforcement lives in _run_agent_turn() and is a property of the turn, not the loop. This is only possible because the turn is an explicit construct.
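The budget point above can be made concrete. A hedged sketch of budget-aware assembly, using a character budget as a stand-in for a token budget (function and parameter names are mine, not the project's):

```python
def assemble_within_budget(rule_text, lessons, budget_chars):
    # Keep the newest lessons first; stop once the budget would overflow.
    kept = []
    used = len(rule_text)
    for lesson in reversed(lessons):
        if used + len(lesson) + 1 > budget_chars:
            break
        kept.append(lesson)
        used += len(lesson) + 1  # +1 for the joining newline
    return "\n".join([rule_text, *reversed(kept)])
```

The design choice here mirrors the one described above: the outer loop, not the framework, decides which slice of accumulated state a turn gets to see.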
What this refactor did NOT fix¶
Worth noting what this refactor didn't solve, because it's easy to assume it did:
- Tool-call explosion inside a single turn. This refactor ensures each turn starts fresh. It does not prevent the model from making multiple tool calls inside one turn. That separate bug (the Worker's internal retry loop) was discovered later in the overnight run and fixed by the per-turn action cap. See journey/14-overnight-run-findings, Finding 4.
- Prompt overflow at turn assembly. Even with fresh sessions, the prompt handed to each turn could still be too big if the accumulated run state was big. That's what the context budget assembler fixes. Separate problem, separate fix.
The stateful loop refactor is a necessary condition for the rest of the v3 architecture, but it is not sufficient. It had to happen first so the later fixes had clean ground to build on.
What I learned¶
- "Looping" is not the same as "reflexion." ADK's LoopAgent is great for iterative workflows where the agent is meant to build on its own reasoning. For reflexion — where each attempt should be fresh except for curated lessons — it is the wrong abstraction. Use ADK for the individual turn; drive the outer loop yourself.
- State belongs outside the session. Conversation history is not the same thing as application state. Conflating them produces subtle, hard-to-debug issues that compound with run length. Keep them separated, even if the framework tempts you to merge them.
- "Subtly wrong" is more expensive than "clearly broken." The pre-refactor version worked for short runs and only broke at iteration 4+. That made the bug hard to see at first and let me build more architecture on top of a broken foundation before noticing. When a subtle architectural bug is suspected, it's usually worth pausing and fixing it before adding more layers.
Related entries¶
- journey/06-tool-calling — the initial ADK tool-calling integration, before this refactor
- journey/14-overnight-run-findings — the overnight run that surfaced a different context-accumulation bug (tool-call explosion inside a turn), which this refactor did not prevent
- improvements/03-context-budget-assembly — the prompt-assembly work that this refactor enabled
- gotchas/context-window-edge — the atomic note about per-turn ADK session growth