The Stateful Loop Refactor: Fresh Sessions per Iteration

The story in one sentence

The first working version of the Ralph loop used the ADK LoopAgent as the outer loop, which accumulated conversation history across iterations and poisoned later attempts with earlier context. I refactored the outer loop into a Python-driven orchestration that creates a fresh ADK session per agent turn, with run state managed separately from conversation history.

Why this is its own entry

This is one of the load-bearing refactors in the project. The original LoopAgent-driven version looked elegant in code but produced subtly wrong behavior as runs got longer — by iteration 3 or 4, the LLM was seeing its own reasoning from iterations 1 and 2 in the conversation history, and its responses were increasingly shaped by what it had already said rather than by what the current rule required. This is a classic context-pollution failure mode and it's worth naming explicitly.

The original (broken) shape

ADK provides a LoopAgent class that wraps a sub-agent and re-invokes it in a loop. The natural way to use it is:

loop_agent = LoopAgent(
    name="ralph_loop",
    sub_agents=[worker_agent],
    max_iterations=50,
)

async for event in runner.run_async(
    user_id=user_id, session_id=session_id, new_message=initial_message
):
    ...

Under the hood, ADK maintains a single session across all iterations. Each loop cycle appends to the session's conversation history. By design, this lets the sub-agent "remember" what it did on previous iterations — which is exactly what you want for many agentic workflows (research agents, code-editing agents).

For a reflexion loop where each iteration is a fresh attempt at the same rule, this is exactly what you don't want. Each iteration should see the rule as if it were the first time, enriched by a curated, compact summary of prior attempts (the episodic memory), not by the raw conversation of those attempts.
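A minimal sketch of what "curated summary instead of raw conversation" can look like. The `EpisodicMemory` class and its method names here are illustrative stand-ins, not the project's actual types; only `assemble_prompt`'s role matches the text above:

```python
from dataclasses import dataclass, field

@dataclass
class EpisodicMemory:
    """Curated lessons per rule: short summaries, never raw transcripts."""
    lessons: dict[str, list[str]] = field(default_factory=dict)

    def record(self, rule_id: str, lesson: str) -> None:
        self.lessons.setdefault(rule_id, []).append(lesson)

    def summary_for(self, rule_id: str, max_items: int = 3) -> str:
        # Only the most recent, compact lessons make it into the prompt.
        items = self.lessons.get(rule_id, [])[-max_items:]
        if not items:
            return ""
        return "Lessons from prior attempts:\n" + "\n".join(f"- {l}" for l in items)

def assemble_prompt(rule_text: str, memory: EpisodicMemory, rule_id: str) -> str:
    # Each turn sees the rule as if for the first time, enriched only
    # by the curated summary -- no conversation history ever leaks in.
    parts = [f"Fix the following rule violation:\n{rule_text}"]
    summary = memory.summary_for(rule_id)
    if summary:
        parts.append(summary)
    return "\n\n".join(parts)
```

The key property is that the prompt for a first attempt and a fifth attempt have the same shape; only the curated summary grows.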

The symptoms

Early-run iterations looked fine. The Worker would pick a rule, generate a fix, the fix would fail, the Worker would try again, and the second attempt was usually different from the first — which is what you want.

By iteration 4 or 5, the Worker started producing responses that sounded like continuations of earlier reasoning rather than fresh approaches:

  • "As I mentioned earlier..."
  • "Building on the previous attempt..."
  • Reasoning chains that referenced details only relevant to an earlier rule, not the current one
  • A slow drift toward apologetic, hedged language as if the model felt accountable for the whole history

The problem wasn't that the model was stupid. It was that the model was being asked the wrong question. It was seeing its own prior attempts in context and was trying to be consistent with them, when what was needed was for it to see a fresh problem with selected lessons from history.

The refactor

Two architectural changes:

1. Python-driven outer loop, not LoopAgent

The outer reflexion loop is now an ordinary Python while loop in run_ralph(). ADK is still used for the individual agent turns — each _run_agent_turn() call uses ADK's Runner and tools — but the outer loop is not an ADK construct. This means the outer loop has full control over what state is passed into each turn.

while True:
    # Determine what this attempt needs to see
    context = assemble_prompt(...)

    # Fresh session for this turn
    session = await InMemorySessionService().create_session(...)

    # One agent turn, one tool call, one text response
    response = await _run_agent_turn(agent, session, context)

    # Evaluate
    ...

2. Separate state management from conversation history

The run's state — remediated rules, escalated rules, banned patterns, episodic memory per rule — lives in a plain Python RunState dataclass. It's updated by the outer loop, not by the agent. Each agent turn reads from RunState (via the prompt assembler) but doesn't mutate it directly.

This separation matters because it means the agent's conversation history (which is now one turn long, always) is decoupled from the project's accumulated learning (which is arbitrarily long, summarized, and curated).
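A hedged sketch of what such a RunState dataclass might look like. The fields follow the ones named above (remediated rules, escalated rules, banned patterns, episodic memory per rule); the method names are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class RunState:
    """Run state lives outside any ADK session; only the outer loop mutates it."""
    remediated: set[str] = field(default_factory=set)
    escalated: set[str] = field(default_factory=set)
    banned_patterns: list[str] = field(default_factory=list)
    episodic_memory: dict[str, list[str]] = field(default_factory=dict)

    def record_attempt(self, rule_id: str, lesson: str) -> None:
        # Called by the outer loop after evaluating a turn -- never by the agent.
        self.episodic_memory.setdefault(rule_id, []).append(lesson)

    def mark_remediated(self, rule_id: str) -> None:
        self.remediated.add(rule_id)

    def pending(self, all_rules: set[str]) -> set[str]:
        # Rules neither fixed nor escalated are still in play.
        return all_rules - self.remediated - self.escalated
```

The agent never holds a reference to this object; it only sees whatever slice the prompt assembler chooses to render into the next turn's message.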

The contract after the refactor

Each call to _run_agent_turn(agent, session_service, message) is guaranteed to:

  1. Create a fresh InMemorySessionService session
  2. Feed it exactly one message (the assembled prompt)
  3. Collect whatever response comes out (text, tool calls, tool results)
  4. Return a clean response string
  5. Never retain any state between calls

From the Worker's perspective, every invocation is "the first time you're seeing this problem." The Worker is therefore forced to reason about the current rule on its merits, not in continuation of prior reasoning. Any lessons from prior attempts arrive via the structured episodic memory summary in the assembled prompt, not via conversation history.
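The five guarantees can be illustrated framework-free. This sketch substitutes a plain callable and a throwaway `FreshSession` object for ADK's Runner and InMemorySessionService, so every name here is a stand-in, not the project's API:

```python
import asyncio
from typing import Awaitable, Callable

class FreshSession:
    """Hypothetical stand-in for an ADK session: starts empty, dies with the turn."""
    def __init__(self) -> None:
        self.history: list[str] = []

async def run_agent_turn(
    agent: Callable[[FreshSession, str], Awaitable[str]],
    message: str,
) -> str:
    session = FreshSession()                   # 1. fresh session per call
    session.history.append(message)            # 2. exactly one message in
    response = await agent(session, message)   # 3. collect whatever comes out
    return response.strip()                    # 4. return a clean string
    # 5. nothing created here outlives the call, so no state can leak

async def echo_agent(session: FreshSession, message: str) -> str:
    # A toy agent that reports how much history it can see.
    return f"seen {len(session.history)} message(s): {message} "

result = asyncio.run(run_agent_turn(echo_agent, "fix rule R1"))
```

However many times the turn runs, the agent always reports a one-message history, which is the whole point of the contract.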

What this unlocked

Several things became possible or cleaner after this refactor:

  • The memory tier architecture. Episodic memory per rule, semantic memory across rules, working memory per attempt — all three are now explicit Python data structures managed by the outer loop. The prompt assembler decides what slice of each one to show to a given turn. Without the stateful-loop refactor, this separation would have been impossible because ADK's session would have been fighting the memory model.
  • The token budget assembler. With full control over what goes into each turn's message, the budget-aware prompt assembly (see improvements/03-context-budget-assembly) became straightforward. Before the refactor, enforcing a budget would have meant fighting ADK's session management.
  • Architect re-engagement. The re-engagement flow needs the Architect to see a completely new prompt (the re-engagement message) without contamination from its earlier rule-selection prompt. Fresh-session-per-turn makes that trivial.
  • Clean observability. Each turn's LLM call is a discrete event with a discrete span in the OpenTelemetry trace. No ambiguity about which tokens belong to which iteration.
  • The defensive tool-call cap. The cap described in improvements/02-worker-single-action-enforcement lives in _run_agent_turn() and is a property of the turn, not the loop. Only possible because the turn is an explicit construct.

What this refactor did NOT fix

Worth noting what this refactor didn't solve, because it's easy to assume it did:

  • Tool-call explosion inside a single turn. This refactor ensures each turn starts fresh. It does not prevent the model from making multiple tool calls inside one turn. That separate bug (the Worker internal retry loop) was discovered later in the overnight run and fixed by the per-turn action cap. See journey/14-overnight-run-findings Finding 4.
  • Prompt overflow at turn assembly. Even with fresh sessions, the prompt handed to each turn could still be too big if the accumulated run state was big. That's what the context budget assembler fixes. Separate problem, separate fix.

The stateful loop refactor is a necessary condition for the rest of the v3 architecture, but it is not sufficient. It had to happen first so the later fixes had clean ground to build on.

What I learned

  1. "Looping" is not the same as "reflexion." ADK's LoopAgent is great for iterative workflows where the agent is meant to build on its own reasoning. For reflexion — where each attempt should be fresh except for curated lessons — it is the wrong abstraction. Use ADK for the individual turn; drive the outer loop yourself.

  2. State belongs outside the session. Conversation history is not the same thing as application state. Conflating them produces subtle, hard-to-debug issues that compound with run length. Keep them separated, even if the framework tempts you to merge them.

  3. "Subtly wrong" is more expensive than "clearly broken." The pre-refactor version worked for short runs and only broke at iteration 4+. That made the bug hard to see at first and let me build more architecture on top of a broken foundation before noticing. When a subtle architectural bug is suspected, it's usually worth pausing and fixing it before adding more layers.