agentic-toolkit/README.md
Khalim Conn-Kowlessar 3f27d10fb6 feat: add /ralph-loop skill (subscription-based runner)
Subscription-based counterpart to `agentic-toolkit run`. Instead of
sandcastle + Docker + Anthropic API, dispatches each ready ticket to
a fresh Claude Code subagent (general-purpose) — same fresh-context
property as per-container sandcastle runs, but zero infra.

Trade-off: no sandbox isolation. Recommend running on a clean checkout.

Mirrors the CLI runner's project schema, phase logic, branch naming,
status transitions, and idempotency. v1 fails on first error (no retry
state machine yet) — failure-handler.ts parity is future work.

README updated with two-path workflow diagram and comparison table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 11:08:31 +00:00


# agentic-toolkit
Domna's agentic toolkit. Two things in one repo:
1. **A curated, version-pinned skill set** for Claude Code (Matt Pocock's skills + Domna's own), installable into any target repo with one script.
2. **A sandcastle-based runner** that executes a GitHub Project of issues against a target repo, in either per-ticket-PR mode or single-PR mode.
## Quick start (consume in a target repo)
From the root of any Domna repo:
```sh
curl -fsSL https://raw.githubusercontent.com/Hestia-Homes/agentic-toolkit/main/setup.sh | bash
```
This installs the curated skills into the repo and writes `skills-lock.json`. Re-run whenever the toolkit bumps its pinned versions.
After installation, run `/setup-matt-pocock-skills` once per repo to record the issue tracker, triage labels, and domain-doc layout.
## Running the runner
The runner is invoked from inside this repo, pointing at a target repo.
Prerequisites:
- Docker Desktop running on macOS
- `GITHUB_TOKEN` env var with `repo` and `project` scopes
- `ANTHROPIC_API_KEY` env var (the sandcastle path authenticates with the Anthropic API; see "Pick a runner")
- A GitHub Project (v2) created via `/to-project` (or manually with the same Status schema)
```sh
# Clone and build the toolkit
git clone https://github.com/Hestia-Homes/agentic-toolkit.git
cd agentic-toolkit
npm install
npm run build

# Run GitHub Project 7 against a local checkout of Hestia-Homes/assessment-model
GITHUB_TOKEN=ghp_xxx \
GITHUB_VIEWER_LOGIN=KhalimCK \
node bin/run-sandcastle.js run \
  --project 7 \
  --mode per-ticket \
  --owner Hestia-Homes \
  --repo assessment-model \
  --target-repo ~/Documents/hestia/assessment-model
```
Modes:
- `per-ticket` — one PR per issue, phase-gated. Runner exits between phases; re-run after PRs merge.
- `single-pr` — one PR for the whole DAG. Runner halts on any failure.
## Workflow
```
                                                ┌─→ agentic-toolkit run --project N --mode <variant> (Docker + ANTHROPIC_API_KEY)
/grill-me → /to-prd → /to-issues → /to-project ─┤
                                                └─→ /ralph-loop project=N mode=<variant> (Claude Code subscription, no Docker)
```
The `to-project` and `ralph-loop` skills live under `skills/engineering/` and are installed by `setup.sh`.
### Pick a runner
| Path | Auth | Sandbox | Cost | Parallelism |
|------|------|---------|------|-------------|
| `agentic-toolkit run` (sandcastle) | `ANTHROPIC_API_KEY` | Docker container | API metered | Per-container |
| `/ralph-loop` (skill) | Claude Code subscription | None — host repo | Subscription flat | Serial (one ticket at a time per Claude session) |

`/ralph-loop` is the zero-infra path: no Docker, no API key. It dispatches each ticket to a fresh Claude Code subagent (clean context per tick), using the same project schema and phase logic as the CLI runner. The trade-off is that there is no sandbox isolation, so run it on a clean checkout. See `skills/engineering/ralph-loop/SKILL.md`.
## Architecture (modules in `src/modules/`)
| Module | Role |
|------------------------|------------------------------------------------------------------------------------------------|
| `PhaseScheduler` | Pure: topological sort of `Blocked by` → ordered phases. |
| `PromptBuilder` | Pure: build the per-ticket agent prompt. |
| `FailureHandler` | Pure state machine: retry / skip / halt given variant + retry count. |
| `ProjectStateClient` | GitHub Projects v2 + Issues GraphQL (read state, claim, set status, comment). |
| `BranchManager` | Git + `gh` ops in the target repo (push, open PR). |
| `AgentRunner` | Wraps `sandcastle.run()` with Docker provider and Claude Code agent. |
| `LoopOrchestrator` | Wires the above; runs the per-tick algorithm. |
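
To illustrate what the `PhaseScheduler` row describes, here is a minimal sketch of a layered topological sort over `Blocked by` edges. The `TicketNode` shape and the `schedulePhases` name are hypothetical; the real module's API is not shown in this README and may differ.

```typescript
// Hypothetical shape for illustration only; not the toolkit's real type.
interface TicketNode {
  id: number;          // issue number
  blockedBy: number[]; // issue numbers that must be finished first
}

// Group tickets into ordered phases: every ticket in a phase depends only
// on tickets from earlier phases. Blockers that are not in the input list
// are treated as already satisfied (an assumption of this sketch).
function schedulePhases(tickets: TicketNode[]): number[][] {
  let pending = tickets.slice();
  const phases: number[][] = [];

  while (pending.length > 0) {
    const pendingIds = pending.map((t) => t.id);

    // A ticket is ready when none of its blockers are still pending.
    const ready = pending.filter((t) =>
      t.blockedBy.every((b) => pendingIds.indexOf(b) === -1)
    );

    if (ready.length === 0) {
      // Every remaining ticket waits on another remaining ticket.
      throw new Error("Cycle detected in Blocked by relationships");
    }

    const readyIds = ready.map((t) => t.id).sort((a, b) => a - b);
    phases.push(readyIds);
    pending = pending.filter((t) => readyIds.indexOf(t.id) === -1);
  }

  return phases;
}
```

For tickets 1, 2 (blocked by 1), 3 (blocked by 1), and 4 (blocked by 2 and 3), this sketch yields the phases `[[1], [2, 3], [4]]`.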
### Variant differences
| Concern | per-ticket | single-pr |
|-----------------------|---------------------------|-----------------------------|
| Branches | one per issue | one per project, reused |
| PRs | one per issue | one for the whole DAG |
| Phase gates | yes (exit between phases) | no (topological order only) |
| HITL mid-run | issue parked; peers continue | runner halts |
| Failure after retry | skip + continue | halt |
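
The "Failure after retry" row can be read as a small pure decision function. The sketch below is one way to express it, assuming a single retry is allowed (inferred from "failed twice" in the Status options). The names `RunMode`, `FailureAction`, `MAX_RETRIES`, and `decideOnFailure` are hypothetical and are not taken from `FailureHandler` itself.

```typescript
type RunMode = "per-ticket" | "single-pr";
type FailureAction = "retry" | "skip" | "halt";

// Assumption: one retry per ticket, so the second failure is final.
const MAX_RETRIES = 1;

// Decide what to do after a ticket fails, given the mode and how many
// retries have already been spent on that ticket.
function decideOnFailure(mode: RunMode, retriesSoFar: number): FailureAction {
  if (retriesSoFar < MAX_RETRIES) {
    return "retry";
  }
  // Retries exhausted: per-ticket skips the issue and continues with its
  // peers, single-pr stops the whole run.
  return mode === "per-ticket" ? "skip" : "halt";
}
```

Because the function depends only on its arguments, it can be unit-tested without Docker, git, or GitHub access.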
### Project Status field
`/to-project` configures a single-select `Status` field with these options:
- `Backlog` — has unmet blockers.
- `Ready` — runner-pickable: an AFK issue (one that needs no human interaction) with all blockers `Done`.
- `In progress` — being executed by an agent right now.
- `In review` — PR open, waiting for human merge.
- `Needs human` — failed twice, or HITL (human-in-the-loop).
- `Done` — issue closed (set automatically on PR merge by Projects' built-in workflow).
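
The options above can be modelled as a string union, with a helper that derives the status of an issue that has not started yet. This is only a sketch under stated assumptions: `ProjectStatus`, `IssueKind`, and `statusForUnstartedIssue` are hypothetical names, and routing an unblocked HITL issue to `Needs human` is an inference from the list above, not confirmed behaviour.

```typescript
type ProjectStatus =
  | "Backlog"
  | "Ready"
  | "In progress"
  | "In review"
  | "Needs human"
  | "Done";

// Assumption: each issue is marked either AFK or HITL.
type IssueKind = "AFK" | "HITL";

// Derive the status of an issue that has not been picked up yet,
// from its kind and the current statuses of its blockers.
function statusForUnstartedIssue(
  kind: IssueKind,
  blockerStatuses: ProjectStatus[]
): ProjectStatus {
  const allBlockersDone = blockerStatuses.every((s) => s === "Done");
  if (!allBlockersDone) {
    return "Backlog"; // has unmet blockers
  }
  // Unblocked: AFK issues are runner-pickable, HITL issues wait for a person.
  return kind === "AFK" ? "Ready" : "Needs human";
}
```

An issue with no blockers counts as unblocked here, since `every` returns `true` for an empty array.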
## Development
```sh
npm install
npm test
npm run typecheck
```
Pure modules (`PhaseScheduler`, `PromptBuilder`, `FailureHandler`, `ProjectStateClient` parsers) are unit-tested. Integration with sandcastle / git / GraphQL is exercised manually before each release.
## Out of scope (v1)
- Remote / parallel runners across machines (local-first).
- Slack / email failure notifications (issue comments only).
- Stacked PRs and phase branches.
- Cross-repo projects.
- Pinning Matt Pocock skills to a specific commit SHA — `setup.sh` tracks HEAD for now; SHA pinning will land when the upstream `skills` CLI supports `repo#sha`.