How I Let AI Agents Write Code While I Sleep

I got tired of babysitting Claude Code sessions. Every task needed me watching, confirming, waiting. So I built a system to eliminate myself from the loop entirely.

The trick was `claude -p` — Claude Code’s headless mode. Once I realized I could drive it programmatically, the rest fell into place: SQS queue, EC2 daemon, auto-push to GitHub, email me the results. Zero interaction required. $20/month.

Open source: github.com/samuelfrench/claude-autonomous-runner

*Autonomous coding agent command center*

What It Actually Does

The inspiration was Clawdbot (now OpenClaw) — Peter Steinberger’s open-source AI agent that blew up to 247K GitHub stars by showing what happens when you give an LLM real system access. I copied the core concept: pipe a prompt into a headless `claude -p` session, let it edit code and commit, then queue the next task. Clawdbot does personal automation through chat apps. I wanted the same idea applied to autonomous coding — so I built this with SQS, EC2, and bash.

You submit a task from the CLI, a coding agent picks it up, executes it against your repo, pushes any commits, and emails you the results. No babysitting.

```bash
./client/clawd honey-explorer "Fix the broken quiz page" --provider claude
```

*Cloud and local server architecture*

Architecture

The system has three runners, each polling its own SQS queue:

  • Claude runner — EC2 t3a.medium, runs `claude -p --dangerously-skip-permissions`
  • Codex runner — same EC2 instance, runs `codex exec --full-auto`
  • Ollama runner — my local workstation (RTX 4090), runs aider with qwen2.5-coder:32b

All three follow the same lifecycle:

  1. SQS long-poll (20s wait) for tasks
  2. DynamoDB update — mark task as running
  3. Git clone/fetch/checkout the target project
  4. Execute the coding agent with the prompt
  5. Git push if there are new commits
  6. DynamoDB update — mark completed/failed with output
  7. SES email — send results
  8. SQS delete — remove the message
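The lifecycle above boils down to one function. This is my sketch of the control flow, not the real ~300-line script; each helper here just logs what the real runner would do with the aws / git / agent CLIs, so the skeleton is runnable as-is.

```shell
#!/usr/bin/env bash
# Stand-ins for the real aws / git / agent calls (illustration only).
fetch_task()       { echo "task-1"; }                 # 1. SQS long-poll
mark_running()     { echo "dynamodb: $1 running"; }   # 2. DynamoDB update
checkout_project() { echo "git: checkout for $1"; }   # 3. clone/fetch/checkout
execute_agent()    { echo "agent: claude -p ..."; }   # 4. run the coding agent
push_commits()     { echo "git: push"; }              # 5. push new commits
mark_finished()    { echo "dynamodb: $1 $2"; }        # 6. completed/failed
send_email()       { echo "ses: results for $1"; }    # 7. SES email
delete_message()   { echo "sqs: delete $1"; }         # 8. SQS delete

run_once() {
  local msg
  msg=$(fetch_task) || return 0   # empty queue: nothing to do
  mark_running "$msg"
  checkout_project "$msg"
  if execute_agent "$msg"; then
    push_commits
    mark_finished "$msg" completed
  else
    mark_finished "$msg" failed
  fi
  send_email "$msg"
  delete_message "$msg"
}

run_once
```

The real scripts wire each helper to an actual CLI call, then loop.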

The runner scripts are pure bash. No frameworks, no orchestration layers. Each one is ~300 lines.

The Self-Queuing Loop

When autonomous.enabled: true, after each completed task the runner reads TODO.md, picks the highest-leverage item, executes it, updates the file, and commits. Then queues itself again.

I’ve woken up to 6 commits I didn’t write. That’s the feeling. You go to sleep with a half-finished project and come back to forward progress — tests added, a bug fixed, a component refactored. Disorienting in a good way.

```json
{
  "honey-explorer": {
    "repo": "git@github.com:samuelfrench/honey-explorer.git",
    "branch": "main",
    "autonomous": {
      "enabled": true,
      "goal": "Test and fix broken features, then visual polish",
      "cooldown_minutes": 30,
      "effort": "high"
    }
  }
}
```

Failure handling is what makes this safe to leave running. Exponential backoff: cooldown × 2^failures, capped at 60 minutes. This saved me when a bad commit broke a build — instead of hammering GitHub Actions with 50 failing tasks, the loop slowed itself down and I got a single alert email after 5 consecutive failures. Clean stop, no runaway bill.
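That backoff rule is simple enough to sketch in a few lines of bash. This is my reconstruction of the formula, not the repo's exact code:

```shell
# cooldown * 2^failures, capped at 60 minutes
backoff_minutes() {
  local cooldown=$1 failures=$2
  local wait=$(( cooldown * (1 << failures) ))
  (( wait > 60 )) && wait=60
  echo "$wait"
}

backoff_minutes 5 0   # 5
backoff_minutes 5 3   # 40
backoff_minutes 30 4  # 60 (capped)
```

With the configured `cooldown_minutes: 30`, a second consecutive failure already hits the 60-minute ceiling, which is exactly the "slow down, don't hammer" behavior you want overnight.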

*Autonomous coding agent at work*

Local LLM Fallback with Ollama

The third runner runs entirely on my local machine — an RTX 4090 with 24GB VRAM — using aider as the coding agent with Ollama’s qwen2.5-coder:32b model. The Claude runner uses my Max subscription. The Codex runner needs OpenAI API credits. The Ollama runner burns electricity and nothing else.

It also has local image generation via ComfyUI with Stable Diffusion XL. If the prompt contains [IMAGE: description], it generates the image locally before handing the code task to aider.
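Detecting that marker is a one-liner with bash's regex matching. A sketch of the idea (the prompt text is made up, and the ComfyUI call is elided):

```shell
# Pull an [IMAGE: ...] description out of the task prompt, if present.
PROMPT='Add a hero banner [IMAGE: a bee hovering over a sunflower] to the page'
re='\[IMAGE: ([^]]+)\]'
if [[ $PROMPT =~ $re ]]; then
  IMG_DESC="${BASH_REMATCH[1]}"
  echo "generate locally via ComfyUI: $IMG_DESC"
  # ...ComfyUI/SDXL generation would go here, then hand the code task to aider
fi
```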

```bash
# The execution line in ollama-runner.sh
timeout 14400 aider \
  --model "ollama_chat/$MODEL" \
  --yes-always \
  --no-auto-lint \
  --no-stream \
  --message "$PROMPT"
```

The 4-hour timeout is necessary — local inference runs at ~20 tokens/sec.

*Local GPU computing for AI inference*

The Web Dashboard

I didn’t want to manage a build system just to monitor a task queue. The dashboard is a single HTML file — no bundler, no framework, no npm. Drop it in S3, done.

It talks to three Lambda functions behind API Gateway:

  • POST /tasks — submit a task to any provider
  • GET /tasks — list and filter tasks
  • GET /projects — list configured projects

That’s the whole backend. No server to maintain, costs essentially nothing.
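Submitting from the command line looks roughly like this. The base URL is a placeholder and the JSON field names are my guess at the payload shape, not a documented contract:

```shell
# Build the task payload; field names here are assumptions for illustration.
PROJECT="honey-explorer"
TASK="Fix the broken quiz page"
payload=$(printf '{"project":"%s","prompt":"%s","provider":"claude"}' \
  "$PROJECT" "$TASK")
echo "$payload"

# Then POST it through API Gateway (placeholder URL):
# curl -s -X POST "https://<api-id>.execute-api.us-east-1.amazonaws.com/prod/tasks" \
#   -H 'Content-Type: application/json' -d "$payload"
```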

Credential Management

The Claude runner on EC2 needs fresh OAuth credentials. I have a cron job on my local machine that syncs them every 30 minutes:

```bash
# Smart sync — validates before copying
LOCAL_EXPIRES=$(jq -r '.claudeAiOauth.expiresAt // 0' "$CREDS")
NOW_MS=$(($(date +%s) * 1000))
if [ "$LOCAL_EXPIRES" -le "$NOW_MS" ]; then
  log "SKIP: Local credentials expired"
  exit 0
fi

# Compare with remote — only copy if local is newer
REMOTE_EXPIRES=$(ssh $SSH_OPTS "$REMOTE" \
  "jq -r '.claudeAiOauth.expiresAt // 0' $REMOTE_CREDS")
if [ "$LOCAL_EXPIRES" -le "$REMOTE_EXPIRES" ]; then
  log "SKIP: Remote already up-to-date"
  exit 0
fi

scp $SSH_OPTS "$CREDS" "$REMOTE:$REMOTE_CREDS"
```

Validates local credentials, checks EC2 is reachable, compares expiry timestamps. No blind overwrites.
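The schedule itself is a single crontab entry; the script path and log location here are hypothetical:

```shell
# m    h  dom mon dow  command
*/30   *  *   *   *    /home/sam/scripts/sync-claude-creds.sh >> /var/log/creds-sync.log 2>&1
```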

Cost Breakdown

| Component | Monthly Cost |
| --- | --- |
| EC2 t3a.medium (on-demand) | ~$20 |
| SQS, DynamoDB, SES | < $0.10 |
| S3 + CloudFront (dashboard) | < $0.50 |
| Ollama runner (local) | $0 |
| **Total** | **~$21** |

Claude inference is covered by my Max subscription. Ollama is free. The real cost is the EC2 instance.

Getting Started

github.com/samuelfrench/claude-autonomous-runner

  1. Run `infrastructure/setup.sh` to create all AWS resources
  2. SSH into the instance and run `claude auth login`
  3. Configure your projects in `config/projects.json`
  4. Submit tasks with `./client/clawd <project> "your prompt"`

The setup script creates the SQS queues, DynamoDB table, IAM roles, security groups, and EC2 instance in one shot. `infrastructure/teardown.sh` deletes everything.

Lessons from Running an Unsupervised Agent

Running autonomous agents 24/7 for a week taught me things I couldn’t have learned any other way.

  • TODO.md is everything. Without it, the agent spins in circles, re-doing work it already did. I watched it refactor the same component twice before I added explicit task tracking.
  • Backoff or burn. A misconfigured repo caused 47 failed re-queues in 4 minutes. Exponential backoff isn’t optional.
  • Local models earn their keep. qwen2.5-coder:32b through aider is not Claude — but it’s free, and it ships code.
  • No git push = nothing happened. A task can “complete” and do absolutely nothing. Push detection is the only metric that matters.
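That last lesson, push detection, amounts to comparing the remote head before and after a run. A self-contained demo using a throwaway local repo as the "remote":

```shell
#!/usr/bin/env bash
# Demo of push detection: a task only "happened" if the remote ref moved.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git -c user.email=demo@example.com -c user.name=demo \
  commit -q --allow-empty -m "init"
git push -q origin HEAD:main

before=$(git ls-remote origin refs/heads/main | cut -f1)

# Simulate the agent doing real work and pushing it
git -c user.email=demo@example.com -c user.name=demo \
  commit -q --allow-empty -m "agent: fix quiz page"
git push -q origin HEAD:main

after=$(git ls-remote origin refs/heads/main | cut -f1)
if [ "$before" != "$after" ]; then
  echo "push detected: task produced real commits"
else
  echo "no-op: nothing was pushed"
fi
```

The runners do the equivalent check after step 5 of the lifecycle, which is what makes "completed" in DynamoDB mean something.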

honey-explorer has been running itself for a week. Real bugs fixed, real commits pushed. One task per run. Queue the next. Repeat.
