The Stateful Loop Refactor: Fresh Sessions per Iteration

The story in one sentence

The first working version of the Ralph loop used the ADK LoopAgent as the outer loop, which accumulated conversation history across iterations and poisoned later attempts with earlier context. I refactored the outer loop into a Python-driven orchestration that creates a fresh ADK session per agent turn, with run state managed separately from conversation history.

Why this is its own entry

This is one of the load-bearing refactors in the project. The original LoopAgent-driven version looked elegant in code but produced subtly wrong behavior as runs got longer — by iteration 3 or 4, the LLM was seeing its own reasoning from iterations 1 and 2 in the conversation history, and its responses were increasingly shaped by what it had already said rather than by what the current rule required. This is a classic context-pollution failure mode and it's worth naming explicitly.

The original (broken) shape

ADK provides a LoopAgent class that wraps a sub-agent and re-invokes it in a loop. The natural way to use it is:

loop_agent = LoopAgent(
    name="ralph_loop",
    sub_agents=[worker_agent],
    max_iterations=50,
)

async for event in runner.run_async(
    user_id=user_id, session_id=session_id, new_message=initial_message
):
    ...

Under the hood, ADK maintains a single session across all iterations. Each loop cycle appends to the session's conversation history. By design, this lets the sub-agent "remember" what it did on previous iterations — which is exactly what you want for many agentic workflows (research agents, code-editing agents).

For a reflexion loop where each iteration is a fresh attempt at the same rule, this is exactly what you don't want. Each iteration should see the rule as if it were the first time, enriched by a curated, compact summary of prior attempts (the episodic memory), not by the raw conversation of those attempts.
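A minimal sketch of what "curated summary instead of raw conversation" can look like. The `EpisodicMemory` class and its method names here are illustrative stand-ins, not the project's actual types; only `assemble_prompt`'s role matches the text above:

```python
from dataclasses import dataclass, field

@dataclass
class EpisodicMemory:
    """Curated lessons per rule: short summaries, never raw transcripts."""
    lessons: dict[str, list[str]] = field(default_factory=dict)

    def record(self, rule_id: str, lesson: str) -> None:
        self.lessons.setdefault(rule_id, []).append(lesson)

    def summary_for(self, rule_id: str, max_items: int = 3) -> str:
        # Only the most recent, compact lessons make it into the prompt.
        items = self.lessons.get(rule_id, [])[-max_items:]
        if not items:
            return ""
        return "Lessons from prior attempts:\n" + "\n".join(f"- {l}" for l in items)

def assemble_prompt(rule_text: str, memory: EpisodicMemory, rule_id: str) -> str:
    # Each turn sees the rule as if for the first time, enriched only
    # by the curated summary -- no conversation history ever leaks in.
    parts = [f"Fix the following rule violation:\n{rule_text}"]
    summary = memory.summary_for(rule_id)
    if summary:
        parts.append(summary)
    return "\n\n".join(parts)
```

The key property is that the prompt for a first attempt and a fifth attempt have the same shape; only the curated summary grows.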

The symptoms

Early-run iterations looked fine. The Worker would pick a rule, generate a fix, the fix would fail, the Worker would try again, and the second attempt was usually different from the first — which is what you want.

By iteration 4 or 5, the Worker started producing responses that sounded like continuations of earlier reasoning rather than fresh approaches:

  • "As I mentioned earlier..."
  • "Building on the previous attempt..."
  • Reasoning chains that referenced details only relevant to an earlier rule, not the current one
  • A slow drift toward apologetic, hedged language as if the model felt accountable for the whole history

The problem wasn't that the model was stupid. It was that the model was being asked the wrong question. It was seeing its own prior attempts in context and was trying to be consistent with them, when what was needed was for it to see a fresh problem with selected lessons from history.

The refactor

Two architectural changes:

1. Python-driven outer loop, not LoopAgent

The outer reflexion loop is now an ordinary Python while loop in run_ralph(). ADK is still used for the individual agent turns — each _run_agent_turn() call uses ADK's Runner and tools — but the outer loop is not an ADK construct. This means the outer loop has full control over what state is passed into each turn.

while True:
    # Determine what this attempt needs to see
    context = assemble_prompt(...)

    # Fresh session for this turn
    session = await InMemorySessionService().create_session(...)

    # One agent turn, one tool call, one text response
    response = await _run_agent_turn(agent, session, context)

    # Evaluate
    ...

2. Separate state management from conversation history

The run's state — remediated rules, escalated rules, banned patterns, episodic memory per rule — lives in a plain Python RunState dataclass. It's updated by the outer loop, not by the agent. Each agent turn reads from RunState (via the prompt assembler) but doesn't mutate it directly.

This separation matters because it means the agent's conversation history (which is now one turn long, always) is decoupled from the project's accumulated learning (which is arbitrarily long, summarized, and curated).
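A hedged sketch of what such a RunState dataclass might look like. The fields follow the ones named above (remediated rules, escalated rules, banned patterns, episodic memory per rule); the method names are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class RunState:
    """Run state lives outside any ADK session; only the outer loop mutates it."""
    remediated: set[str] = field(default_factory=set)
    escalated: set[str] = field(default_factory=set)
    banned_patterns: list[str] = field(default_factory=list)
    episodic_memory: dict[str, list[str]] = field(default_factory=dict)

    def record_attempt(self, rule_id: str, lesson: str) -> None:
        # Called by the outer loop after evaluating a turn -- never by the agent.
        self.episodic_memory.setdefault(rule_id, []).append(lesson)

    def mark_remediated(self, rule_id: str) -> None:
        self.remediated.add(rule_id)

    def pending(self, all_rules: set[str]) -> set[str]:
        # Rules neither fixed nor escalated are still in play.
        return all_rules - self.remediated - self.escalated
```

The agent never holds a reference to this object; it only sees whatever slice the prompt assembler chooses to render into the next turn's message.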

The contract after the refactor

Each call to _run_agent_turn(agent, session_service, message) is guaranteed to:

  1. Create a fresh InMemorySessionService session
  2. Feed it exactly one message (the assembled prompt)
  3. Collect whatever response comes out (text, tool calls, tool results)
  4. Return a clean response string
  5. Never retain any state between calls

From the Worker's perspective, every invocation is "the first time you're seeing this problem." The Worker is therefore forced to reason about the current rule on its merits, not in continuation of prior reasoning. Any lessons from prior attempts arrive via the structured episodic memory summary in the assembled prompt, not via conversation history.
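The five guarantees can be illustrated framework-free. This sketch substitutes a plain callable and a throwaway `FreshSession` object for ADK's Runner and InMemorySessionService, so every name here is a stand-in, not the project's API:

```python
import asyncio
from typing import Awaitable, Callable

class FreshSession:
    """Hypothetical stand-in for an ADK session: starts empty, dies with the turn."""
    def __init__(self) -> None:
        self.history: list[str] = []

async def run_agent_turn(
    agent: Callable[[FreshSession, str], Awaitable[str]],
    message: str,
) -> str:
    session = FreshSession()                   # 1. fresh session per call
    session.history.append(message)            # 2. exactly one message in
    response = await agent(session, message)   # 3. collect whatever comes out
    return response.strip()                    # 4. return a clean string
    # 5. nothing created here outlives the call, so no state can leak

async def echo_agent(session: FreshSession, message: str) -> str:
    # A toy agent that reports how much history it can see.
    return f"seen {len(session.history)} message(s): {message} "

result = asyncio.run(run_agent_turn(echo_agent, "fix rule R1"))
```

However many times the turn runs, the agent always reports a one-message history, which is the whole point of the contract.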

What this unlocked

Several things became possible or cleaner after this refactor:

  • The memory tier architecture. Episodic memory per rule, semantic memory across rules, working memory per attempt — all three are now explicit Python data structures managed by the outer loop. The prompt assembler decides what slice of each one to show to a given turn. Without the stateful-loop refactor, this separation would have been impossible because ADK's session would have been fighting the memory model.
  • The token budget assembler. With full control over what goes into each turn's message, the budget-aware prompt assembly (see improvements/03-context-budget-assembly) became straightforward. Before the refactor, enforcing a budget would have meant fighting ADK's session management.
  • Architect re-engagement. The re-engagement flow needs the Architect to see a completely new prompt (the re-engagement message) without contamination from its earlier rule-selection prompt. Fresh-session-per-turn makes that trivial.
  • Clean observability. Each turn's LLM call is a discrete event with a discrete span in the OpenTelemetry trace. No ambiguity about which tokens belong to which iteration.
  • The defensive tool-call cap. The cap described in improvements/02-worker-single-action-enforcement lives in _run_agent_turn() and is a property of the turn, not the loop. Only possible because the turn is an explicit construct.

What this refactor did NOT fix

Worth noting what this refactor didn't solve, because it's easy to assume it did:

  • Tool-call explosion inside a single turn. This refactor ensures each turn starts fresh. It does not prevent the model from making multiple tool calls inside one turn. That separate bug (the Worker internal retry loop) was discovered later in the overnight run and fixed by the per-turn action cap. See journey/14-overnight-run-findings Finding 4.
  • Prompt overflow at turn assembly. Even with fresh sessions, the prompt handed to each turn could still be too big if the accumulated run state was big. That's what the context budget assembler fixes. Separate problem, separate fix.

The stateful loop refactor is a necessary condition for the rest of the v3 architecture, but it is not sufficient. It had to happen first so the later fixes had clean ground to build on.

What I learned

  1. "Looping" is not the same as "reflexion." ADK's LoopAgent is great for iterative workflows where the agent is meant to build on its own reasoning. For reflexion — where each attempt should be fresh except for curated lessons — it is the wrong abstraction. Use ADK for the individual turn; drive the outer loop yourself.

  2. State belongs outside the session. Conversation history is not the same thing as application state. Conflating them produces subtle, hard-to-debug issues that compound with run length. Keep them separated, even if the framework tempts you to merge them.

  3. "Subtly wrong" is more expensive than "clearly broken." The pre-refactor version worked for short runs and only broke at iteration 4+. That made the bug hard to see at first and let me build more architecture on top of a broken foundation before noticing. When a subtle architectural bug is suspected, it's usually worth pausing and fixing it before adding more layers.