# Improvement: Snapshot-Based Revert with Diagnostic Capture
Status: Implemented 2026-04-11 as part of the Fix #5 pass
Related: `journey/14-overnight-run-findings.md` Finding 3 (sudo deadlock)
Files changed: `gemma_forge/harness/tools/ssh.py`, `gemma_forge/harness/ralph.py`
## The problem this solves
The original revert mechanism was script-based: the Worker generated both a
`fix_script` and a `revert_script`, and the harness ran the revert script via
SSH+sudo when the fix failed its evaluation. This rested on three fatal
assumptions:
- The revert script is correct. The Worker sometimes produces reverts that don't cover the same filesystem state the fix modified, or that have bash syntax errors, or that silently no-op on state that was partially applied.
- Sudo still works after the fix. The overnight run hit a rule
  (`sudo_remove_nopasswd`) whose fix ran `sed -i 's/NOPASSWD/...' /etc/sudoers`.
  That fix passed the STIG rule check but broke passwordless sudo. The revert
  script needed sudo to restore sudoers. Sudo was broken. Revert failed
  silently. From that point on, every subsequent attempt for the next ~7 hours
  saw sudo broken and reported "privilege escalation deadlock" in 42
  reflections before the run was stopped.
- The filesystem is still writable. A fix can exhaust disk space, remount
  /etc read-only, remove /bin/bash, or corrupt any number of preconditions the
  revert script implicitly depends on.
None of these assumptions hold under adversarial conditions, and "an agent trying to remediate STIG rules" is exactly adversarial for the revert mechanism.
## The right answer: VM snapshots
libvirt snapshots are authoritative. A restored snapshot is a pristine qcow2 COW state that cannot be corrupted by anything the guest does. Restore is fast (~2–5 seconds on internal snapshots). libvirt is rock-solid at this.
But we don't want to lose accumulated progress. If rule 1 and rule 2 are
successfully remediated and rule 3 fails, rolling back to baseline would
lose rules 1 and 2. The fix is a two-snapshot scheme:
- `baseline` — VM initial state, never modified after first creation. The
  forever fallback when nothing else exists.
- `progress` — rolling snapshot, advanced after each successful remediation.
  Contains all accumulated successful fixes. Replaced (delete + create) after
  each win so the chain stays short.
On failure of an attempt, restore to `progress` if it exists, else
`baseline`. Successful attempts preserve their work by advancing `progress`.
## Diagnostics before revert — the learning layer
When an attempt fails, we could just restore the snapshot and move on. But that throws away the one thing that matters: why did it fail? Without forensics, the Reflector only sees "APPLY_FAILED" and can't reason about root causes.
Before the snapshot restore, we run `gather_environment_diagnostics()`, which
captures a structured post-mortem via SSH (or virsh console fallback):
| Section | What it tells us |
|---|---|
| `SUDO_PROBE` | Did the fix break non-tty sudo? (`sudo -n whoami`) |
| `SERVICE_STATUS` | Which mission-critical services are still running? |
| `MISSION_HEALTHCHECK` | Does the mission app's own healthcheck pass? |
| `RECENT_AUTH_FAILURES` | What's in journalctl for PAM/sudo/auth errors? |
| `SUDOERS_STATE` | Has /etc/sudoers or /etc/sudoers.d/ been modified? |
| `PAM_STATE` | Has /etc/pam.d/sudo been rewritten? |
| `FS_STATE` | Is the filesystem still mounted rw? Can we write to /tmp? |
| `NETWORK_STATE` | What's listening? Did the fix break network? |
| `RECENT_JOURNAL_ERRORS` | Top journal errors in the last 5 minutes |
This gets serialized into a structured `post_mortem` event in the run log,
with boolean flags (`sudo_ok`, `services_ok`, `mission_healthy`) for quick
decisioning and the full parsed sections for the Reflector's context. The
Reflector now receives real facts about why things broke, not just the
fact that they broke.
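A minimal sketch of how the raw sections can be reduced to those boolean flags. The function name and the exact string heuristics are illustrative assumptions, not the harness's real parsing:

```python
def summarize_diagnostics(sections: dict[str, str]) -> dict[str, bool]:
    """Reduce raw diagnostic sections to quick-decision flags.

    `sections` maps section names (SUDO_PROBE, SERVICE_STATUS, ...) to the
    raw command output captured over SSH. Heuristics are illustrative; real
    parsing depends on the exact probe commands used.
    """
    # `sudo -n whoami` should have printed "root" if non-tty sudo works
    sudo_ok = "root" in sections.get("SUDO_PROBE", "")

    # Every probed unit should report active, none inactive or failed
    status = sections.get("SERVICE_STATUS", "")
    services_ok = bool(status) and "inactive" not in status and "failed" not in status

    # Mission healthcheck script prefixes its verdict with HEALTHY
    mission_healthy = sections.get("MISSION_HEALTHCHECK", "").startswith("HEALTHY")

    return {
        "sudo_ok": sudo_ok,
        "services_ok": services_ok,
        "mission_healthy": mission_healthy,
    }
```

A broken-sudo capture (e.g. `SUDO_PROBE` containing "a password is required") then flips `sudo_ok` to `False` without any Worker self-reporting involved.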
## The revert sequence (new)
```python
# 1. Gather forensics (runs via SSH, falls back to virsh console if broken)
diagnostics = await gather_environment_diagnostics(ssh_config)

# 2. Emit structured post_mortem event for the log + Reflector
run_log.log("post_mortem", "harness", {
    "rule_id": ..., "category": ..., "attempt": ...,
    "sudo_ok": diagnostics["sudo_ok"],
    "services_ok": diagnostics["services_ok"],
    "mission_healthy": diagnostics["mission_healthy"],
    "sudo_probe": diagnostics["sudo_probe"],
    "service_status": diagnostics["service_status"],
    "recent_auth_failures": diagnostics["recent_auth_failures"],
    ...
})

# 3. Snapshot-restore to the last known-good state
ok, detail = await snapshot_restore_progress()  # prefers `progress`, falls back to `baseline`

# 4. Verify restore actually worked (lightweight sudo probe)
post_ok, _ = await check_sudo_healthy(ssh_config)
if not post_ok:
    run_log.log("environment_unrecoverable", ...)  # libvirt or snapshot chain is wrong
```
And on successful remediation:

```python
# Advance progress snapshot so future failures revert to this state
snap_ok, snap_detail = await snapshot_save_progress()
```
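Advancing the rolling snapshot is a delete-then-create against libvirt. A sketch of the `virsh` command sequence involved, expressed as a pure helper (the function name is hypothetical; the real harness wraps these calls in async subprocesses with error handling):

```python
def progress_snapshot_commands(domain: str, progress_exists: bool) -> list[list[str]]:
    """Build the virsh argv sequence that advances the rolling `progress`
    snapshot: delete the old one if present, then create a new internal
    snapshot from the current VM state. Keeping exactly one `progress`
    snapshot is what keeps the qcow2 snapshot chain short."""
    cmds = []
    if progress_exists:
        cmds.append(["virsh", "snapshot-delete", domain, "progress"])
    cmds.append(["virsh", "snapshot-create-as", domain, "progress"])
    return cmds
```

On the first successful remediation of a run, only the create step runs; every later win replaces the snapshot.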
## Why this is also going to help with other problems
Ken's instinct — "it will be better in the long run and it will help with other problems" — is exactly right. The snapshot-based revert also buys us:
- Resilience to Worker bugs. A Worker that produces an incorrect revert script no longer corrupts future attempts.
- Resilience to unanticipated fix side effects. A fix that modifies state the Worker didn't think about (log files, systemd runtime state, kernel parameters) is fully rolled back by the snapshot, not by whatever subset the revert_script happened to target.
- Honest attempt counting. Previously a "successful revert" could leave the VM in a subtly different state than before the fix, which meant attempt N+1 on the same rule didn't start from a true known-good baseline. Now every attempt starts from a guaranteed known-good state.
- A path for skill authors to stop writing revert scripts. Future skills
  don't need to produce `revert_script` at all — the snapshot is the revert.
  The `revert_script` field stays as informational metadata (the Worker says
  what it thought it was undoing), useful for post-run analysis but no longer
  load-bearing for correctness.
- Forensic-quality post-mortems. The `post_mortem` event is a rich,
  structured record of exactly what broke. Future iterations of the dashboard
  can surface these as "failure cards" — showing "here's the failure, here's
  exactly what broke, here's the rollback proof" instead of "rule X failed,
  we tried again."
- Decouples Reflector quality from Worker quality. The Reflector can reason
  about real environment facts (from `post_mortem`) rather than the Worker's
  self-reported output. This is a more robust learning loop.
## What stayed
- The `ssh_apply` function still accepts `revert_script` for compatibility
  and logs it as metadata. Future cleanup: the skill manifest can make this
  field optional.
- The virsh-console fallback in `_run_ssh` is still there and is actually
  more important now — when SSH is broken, the diagnostic gather uses console
  to grab the post-mortem before we restore.
- The `baseline` snapshot creation and management is unchanged; it's still a
  one-time setup via `vm-snapshot.sh create baseline`.
## What the run start now does
`run_ralph()` does a preflight snapshot check before any scan or LLM call:

- Verify `baseline` exists; hard-fail if not (with instructions to create it)
- Delete any stale `progress` snapshot from a prior run (non-fatal if absent)
- Emit a `snapshot_preflight` event
This ensures every run starts with a clean revert target and catches libvirt issues early (before the LLM burns any tokens).
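The preflight decision is pure logic over the snapshot names libvirt reports. A hedged sketch (the function name and return shape are illustrative, not the real `run_ralph()` internals):

```python
def snapshot_preflight(existing: set[str]) -> tuple[bool, list[str]]:
    """Decide preflight actions from the snapshot names libvirt reports.

    Returns (ok, actions). Hard-fails (ok=False) when `baseline` is
    missing; otherwise schedules deletion of any stale `progress`
    snapshot left over from a prior run, then the preflight event.
    """
    if "baseline" not in existing:
        # Nothing safe to revert to: refuse to start, tell the operator how to fix it
        return False, ["abort: baseline snapshot missing; create it with "
                       "vm-snapshot.sh create baseline"]
    actions = []
    if "progress" in existing:
        actions.append("delete stale progress snapshot")
    actions.append("emit snapshot_preflight event")
    return True, actions
```

Because this runs before the first scan, a misconfigured libvirt host fails in seconds rather than hours into a run.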
## Tests
Both paths were verified end-to-end against the real VM:
Diagnostic capture against healthy baseline:

```text
sudo_ok: True
services_ok: True
mission_healthy: True
service_status:
  nginx: active
  postgresql: active
  sshd: active
  chronyd: active
mission_healthcheck: HEALTHY: nginx=ok postgres=ok sshd=ok
```
Snapshot save/restore cycle:

```text
1. Create progress snapshot from current VM state → ok
2. Progress exists? True
3. Restore from progress → ok, restored=progress
4. Delete progress → ok
5. Progress exists after cleanup? False
```
Both paths complete in under 5 seconds against the current libvirt setup.
## Known limitations
- Snapshot create time will add ~2s per successful remediation. At 100 remediations per run, that's 200s overhead — negligible against 10+ hours of LLM grinding.
- Snapshot restore time is ~2–5s for internal qcow2 snapshots. A run that has 50 failed attempts would add 100–250s of restore overhead, still negligible.
- libvirt internal snapshots accumulate qcow2 metadata. The two-snapshot scheme (baseline + single rolling progress) keeps this bounded. Do not use this pattern with unbounded snapshot chains.
- The snapshot is VM-wide, so if the fix modifies state outside /etc (e.g., /var/lib/postgresql), the revert rolls that back too. This is actually what we want for correctness.
- Progress snapshot is host-local. Moving the workload to a different libvirt host would lose it. For the demo, this is fine.