I got tired of babysitting Claude Code sessions. Every task needed me watching, confirming, waiting. So I built a system to eliminate myself from the loop entirely.
The trick was claude -p — Claude Code’s headless mode. Once I realized I could drive it programmatically, the rest fell into place: SQS queue, EC2 daemon, auto-push to GitHub, email me the results. Zero interaction required. $20/month.
Open source: github.com/samuelfrench/claude-autonomous-runner

What It Actually Does
The inspiration was Clawdbot (now OpenClaw) — Peter Steinberger’s open-source AI agent that blew up to 247K GitHub stars by showing what happens when you give an LLM real system access. I copied the core concept: pipe a prompt into a headless claude -p session, let it edit code and commit, then queue the next task. Clawdbot does personal automation through chat apps. I wanted the same idea applied to autonomous coding — so I built this with SQS, EC2, and bash.
You submit a task from the CLI, a coding agent picks it up, executes it against your repo, pushes any commits, and emails you the results. No babysitting.
./client/clawd honey-explorer "Fix the broken quiz page" --provider claude
Architecture
The system has three runners, each polling its own SQS queue:
- Claude runner: EC2 t3a.medium, runs claude -p --dangerously-skip-permissions
- Codex runner: same EC2 instance, runs codex exec --full-auto
- Ollama runner: my local workstation (RTX 4090), runs aider with qwen2.5-coder:32b

All three follow the same lifecycle:
- SQS long-poll (20s wait) for tasks
- DynamoDB update: mark task as running
- Git clone/fetch/checkout the target project
- Execute the coding agent with the prompt
- Git push if there are new commits
- DynamoDB update: mark completed/failed with output
- SES email: send results
- SQS delete: remove the message
The runner scripts are pure bash. No frameworks, no orchestration layers. Each one is ~300 lines.
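To make the ordering concrete, here is a minimal sketch of one pass through that lifecycle for the Claude runner. It is not the repo's actual script: QUEUE_URL, TABLE, EMAIL, the task_id key, and every function name are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch only. QUEUE_URL, TABLE, EMAIL, the task_id key and all function
# names below are hypothetical; the real runner may be organized differently.

set_status() {  # set_status <task_id> <status>
  aws dynamodb update-item --table-name "$TABLE" \
    --key "{\"task_id\":{\"S\":\"$1\"}}" \
    --update-expression 'SET #s = :s' \
    --expression-attribute-names '{"#s":"status"}' \
    --expression-attribute-values "{\":s\":{\"S\":\"$2\"}}"
}

run_agent() {  # run_agent <prompt>
  claude -p --dangerously-skip-permissions "$1"
}

push_if_new() {  # push_if_new <sha_before>: push only if HEAD moved
  [ "$(git rev-parse HEAD)" = "$1" ] || git push
}

notify() {  # notify <task_id> <status> <output>
  aws ses send-email --from "$EMAIL" --to "$EMAIL" \
    --subject "Task $1: $2" --text "$3"
}

ack() {  # ack <receipt_handle>
  aws sqs delete-message --queue-url "$QUEUE_URL" --receipt-handle "$1"
}

process_task() {  # process_task <task_id> <prompt> <receipt_handle>
  local before output status
  before=$(git rev-parse HEAD)
  set_status "$1" running
  if output=$(run_agent "$2" 2>&1); then status=completed; else status=failed; fi
  push_if_new "$before"
  set_status "$1" "$status"
  notify "$1" "$status" "$output"
  ack "$3"  # delete last: an unacknowledged message reappears after its visibility timeout
}
```

The message that feeds process_task would come from a long poll such as aws sqs receive-message --queue-url "$QUEUE_URL" --wait-time-seconds 20 --max-number-of-messages 1. Deleting the message only at the very end means a runner that dies mid-task leaves the work in the queue instead of losing it.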
The Self-Queuing Loop
When autonomous.enabled: true, after each completed task the runner reads TODO.md, picks the highest-leverage item, executes it, updates the file, and commits. Then queues itself again.
I’ve woken up to 6 commits I didn’t write. That’s the feeling. You go to sleep with a half-finished project and come back to forward progress — tests added, a bug fixed, a component refactored. Disorienting in a good way.
{
  "honey-explorer": {
    "repo": "git@github.com:samuelfrench/honey-explorer.git",
    "branch": "main",
    "autonomous": {
      "enabled": true,
      "goal": "Test and fix broken features, then visual polish",
      "cooldown_minutes": 30,
      "effort": "high"
    }
  }
}

Failure handling is what makes this safe to leave running. Exponential backoff: cooldown × 2^failures, capped at 60 minutes. This saved me when a bad commit broke a build: instead of hammering GitHub Actions with 50 failing tasks, the loop slowed itself down and I got a single alert email after 5 consecutive failures. Clean stop, no runaway bill.
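The backoff rule can be sketched in a few lines of shell. The 60-minute cap and the 5-failure alert threshold come from the text; the function and variable names are hypothetical, and the repo may compute this differently.

```bash
# Sketch of cooldown * 2^failures with a cap. Names are illustrative only.
MAX_COOLDOWN=60   # minutes, the cap described above
ALERT_AFTER=5     # consecutive failures before the alert email

backoff_minutes() {  # backoff_minutes <base_cooldown_minutes> <consecutive_failures>
  local delay="$1" n="$2"
  # Double once per failure, stopping early once the cap is reached.
  while [ "$n" -gt 0 ] && [ "$delay" -lt "$MAX_COOLDOWN" ]; do
    delay=$((delay * 2))
    n=$((n - 1))
  done
  if [ "$delay" -gt "$MAX_COOLDOWN" ]; then delay=$MAX_COOLDOWN; fi
  echo "$delay"
}

should_alert() {  # should_alert <consecutive_failures>: succeeds at the threshold
  [ "$1" -ge "$ALERT_AFTER" ]
}
```

With a 30-minute base cooldown, the wait is 30 minutes after a success and hits the 60-minute cap after the first failure. A shorter base, say 5 minutes, would climb 5, 10, 20, 40, then stay at 60.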

Local LLM Fallback with Ollama
The third runner runs entirely on my local machine — an RTX 4090 with 24GB VRAM — using aider as the coding agent with Ollama’s qwen2.5-coder:32b model. The Claude runner uses my Max subscription. The Codex runner needs OpenAI API credits. The Ollama runner burns electricity and nothing else.
It also has local image generation via ComfyUI with Stable Diffusion XL. If the prompt contains [IMAGE: description], it generates the image locally before handing the code task to aider.
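One way to handle the [IMAGE: description] marker is to split the prompt in two before anything runs: the descriptions go to the image generator and the cleaned prompt goes to aider. This is a sketch under that assumption; both function names are hypothetical, and the ComfyUI call itself is left out.

```bash
# Sketch: separate [IMAGE: ...] markers from the coding prompt.
# Function names are illustrative, not taken from ollama-runner.sh.

extract_image_prompts() {  # prints one image description per line
  printf '%s\n' "$1" \
    | grep -oE '\[IMAGE: [^]]+\]' \
    | sed -E 's/^\[IMAGE: //; s/\]$//'
}

strip_image_markers() {  # prints the prompt with the markers removed
  printf '%s\n' "$1" | sed -E 's/ ?\[IMAGE: [^]]+\]//g'
}
```

For the prompt "Add a hero banner [IMAGE: golden honeycomb at sunrise] to the home page", the first function prints the description and the second prints the sentence without the marker.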
# The execution line in ollama-runner.sh
timeout 14400 aider \
  --model "ollama_chat/$MODEL" \
  --yes-always \
  --no-auto-lint \
  --no-stream \
  --message "$PROMPT"

The 4-hour timeout is necessary: local inference runs at ~20 tokens/sec.
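At roughly 20 tokens/sec, a 14400-second ceiling allows about 288,000 generated tokens per task. When GNU timeout kills a command at the limit, it exits with code 124, so the runner can tell a timeout apart from an ordinary failure. A small sketch, where the timed_out status name is an assumption (the text only mentions completed and failed):

```bash
# Sketch: map the agent's exit code to a task status.
# "timed_out" is a hypothetical status; 124 is GNU timeout's exit code on expiry.
classify_exit() {  # classify_exit <exit_code>
  case "$1" in
    0)   echo completed ;;
    124) echo timed_out ;;
    *)   echo failed ;;
  esac
}

# Usage, right after the agent command returns:
#   timeout 14400 aider ... ; status=$(classify_exit $?)
```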

The Web Dashboard
I didn’t want to manage a build system just to monitor a task queue. The dashboard is a single HTML file — no bundler, no framework, no npm. Drop it in S3, done.
It talks to three Lambda functions behind API Gateway:
- POST /tasks: submit a task to any provider
- GET /tasks: list and filter tasks
- GET /projects: list configured projects
That’s the whole backend. No server to maintain, costs essentially nothing.
Credential Management
The Claude runner on EC2 needs fresh OAuth credentials. I have a cron job on my local machine that syncs them every 30 minutes:
# Smart sync: validates before copying
LOCAL_EXPIRES=$(jq -r '.claudeAiOauth.expiresAt // 0' "$CREDS")
NOW_MS=$(($(date +%s) * 1000))
if [ "$LOCAL_EXPIRES" -le "$NOW_MS" ]; then
  log "SKIP: Local credentials expired"
  exit 0
fi

# Compare with remote: only copy if local is newer
REMOTE_EXPIRES=$(ssh $SSH_OPTS "$REMOTE" \
  "jq -r '.claudeAiOauth.expiresAt // 0' $REMOTE_CREDS")
if [ "$LOCAL_EXPIRES" -le "$REMOTE_EXPIRES" ]; then
  log "SKIP: Remote already up-to-date"
  exit 0
fi

scp $SSH_OPTS "$CREDS" "$REMOTE:$REMOTE_CREDS"

Validates local credentials, checks EC2 is reachable, compares expiry timestamps. No blind overwrites.
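The two checks reduce to a comparison of three millisecond timestamps, which is easy to isolate and test on its own. The helper below is a hypothetical restatement of that logic, not code from the repo; the cron path in the comment is likewise a placeholder.

```bash
# Sketch: the sync decision as a pure function of three timestamps (ms).
# sync_decision is a hypothetical helper restating the script's two checks.
sync_decision() {  # sync_decision <local_expires_ms> <remote_expires_ms> <now_ms>
  if [ "$1" -le "$3" ]; then
    echo "skip: local expired"
  elif [ "$1" -le "$2" ]; then
    echo "skip: remote up to date"
  else
    echo "copy"
  fi
}

# A 30-minute schedule in crontab form (script path is a placeholder):
#   */30 * * * * /path/to/sync-credentials.sh
```

Calling it as sync_decision "$LOCAL_EXPIRES" "${REMOTE_EXPIRES:-0}" "$NOW_MS" would also treat an empty remote value, for example from a failed ssh, as "never synced" rather than as a comparison error.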
Cost Breakdown
| Component | Monthly Cost |
|---|---|
| EC2 t3a.medium (on-demand) | ~$20 |
| SQS, DynamoDB, SES | < $0.10 |
| S3 + CloudFront (dashboard) | < $0.50 |
| Ollama runner (local) | $0 |
| Total | ~$21 |
Claude inference is covered by my Max subscription. Ollama is free. The real cost is the EC2 instance.
Getting Started
github.com/samuelfrench/claude-autonomous-runner
- Run infrastructure/setup.sh to create all AWS resources
- SSH into the instance and run claude auth login
- Configure your projects in config/projects.json
- Submit tasks with ./client/clawd <project> "your prompt"
The setup script creates the SQS queues, the DynamoDB table, IAM roles, security groups, and the EC2 instance in one shot. infrastructure/teardown.sh deletes everything.
Lessons from Running an Unsupervised Agent
Running autonomous agents 24/7 for a week taught me things I couldn’t have learned any other way.
- TODO.md is everything. Without it, the agent spins in circles, re-doing work it already did. I watched it refactor the same component twice before I added explicit task tracking.
- Backoff or burn. A misconfigured repo caused 47 failed re-queues in 4 minutes. Exponential backoff isn’t optional.
- Local models earn their keep. qwen2.5-coder:32b through aider is not Claude, but it’s free, and it ships code.
- No git push = nothing happened. A task can “complete” and do absolutely nothing. Push detection is the only metric that matters.
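Push detection can be as simple as recording HEAD before the agent runs and counting commits afterward. The sketch below assumes that approach; the function names and the no-op and shipped labels are hypothetical.

```bash
# Sketch: judge a task by the commits it produced, not just its exit code.
# Names and labels are illustrative; assumes HEAD was recorded before the agent ran.

commits_since() {  # commits_since <sha_before>: number of new commits on HEAD
  git rev-list --count "$1..HEAD"
}

task_outcome() {  # task_outcome <agent_exit_code> <new_commit_count>
  if [ "$1" -ne 0 ]; then
    echo failed
  elif [ "$2" -eq 0 ]; then
    echo no-op      # agent exited cleanly but changed nothing
  else
    echo shipped
  fi
}

# Usage inside a runner:
#   before=$(git rev-parse HEAD); run the agent; rc=$?
#   task_outcome "$rc" "$(commits_since "$before")"
```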
honey-explorer has been running itself for a week. Real bugs fixed, real commits pushed. One task per run. Queue the next. Repeat.






