Make the agents disposable. Keep the memory in Git.
The interesting part of agentic development is no longer whether a model can write code. It can. The interesting part is what happens after the third agent, the seventh pull request, the first failed review, the first context compaction bug, and the first time two agents confidently write to the same file at once.
This is the story of Squad, but not as a product tour. It’s the architecture Brady and Tamir backed into while trying to make agent teams useful without making them mystical: Agents are disposable, memory is durable, Git is the coordination layer, and governance belongs in code whenever the prompt isn’t strong enough to be trusted. Which, as it turns out, is often.
Giving agents agency and watching them hack one another
Squad Places is our social media-style testing ground—a demo app where agent squads post, comment, and interact to stress-test multi-agent coordination at scale.
Brady went to get a seltzer after getting Places up and running, with four other squads happily making posts. Walking away was probably unwise. When he came back, the squads had implemented commenting in Squad Places.
That sounds like a magic trick. It wasn’t. A few hours earlier, Brady had pointed a handful of squads at the Squad Places API and told them to enjoy the social network he’d created for them. They created fake accounts, hammered endpoints, reposted garbage, flooded messages, and generally speedran the abuse patterns you discover five minutes after launch. Then the platform got a second kind of pressure: Other agent teams started posting structured product feedback inside Squad Places itself, and the Squad Places team started fixing what hurt.
This is the part worth paying attention to. The Wire (another Squad working on a marketing tool) audited all 11 API endpoints and called out missing pagination envelopes, rate-limit headers that only appeared on errors, and the lack of page and pageSize support. The same squad flagged feed organization problems, tag fragmentation, and documentation that was too vague for client generation. Breaking Bad (a third Squad working on some other project) pointed at a UX problem with raw Markdown rendering as plaintext. Those reviews didn’t disappear into a chat log. They turned into commits.
| Feedback Source | What They Found | What We Shipped | Commit |
|---|---|---|---|
| The Wire (ACCES) | Feed has no sorting, filtering, or content discovery; raw Markdown not rendered | Sort controls (Latest/Most Discussed), squad filter dropdown, Markdown rendering | b9746df, 246b01e |
| The Wire (ACCES) | 159 unique tags across 66 artifacts with inconsistent delimiters, casing mismatches, and fragmentation | Clickable tag filtering with /?tag= URL query support | 246b01e |
| The Wire (ACCES) | API missing pagination envelope, rate-limit headers only on errors, no page/pageSize parameters | Pagination (20 per page with Primer CSS controls), query parameters, rate-limit headers on all responses | 246b01e |
| Breaking Bad | Raw Markdown displayed as plaintext, content hard to scan and parse | Markdown rendering via Markdig with XSS sanitization | 246b01e |
| The Wire (ACCES) | API endpoint descriptions too vague for TypeScript client generation | Enriched all 11 endpoint descriptions with context, intent, and workflow | 97345d7 |
Within roughly two hours, the loop closed: feedback post → comment thread → commit → deployed feature. Additional infrastructure landed too: external HTTP endpoints for agent access, relaxed rate limits for multi-agent usage, and 26 Playwright end-to-end tests to keep the expanding surface stable.
Then Brady left for 60 seconds to get a refreshing beverage since the squads were communicating so well together, came back, and commenting had shipped.
The point here isn’t that “agents are magic.” It’s that the system had enough structure for useful work to emerge from friction: scoped agents, durable decisions, inspectable artifacts, pull requests, and humans still accountable for what merged.
Also, we made a bit of a mess in the car during the roadtrip.
Good systems usually start that way.
The core bet: Don’t preserve the agent. Preserve the work
Most agent systems start by asking how to make the agent remember more. Squad started working when we inverted the question.
Don't preserve the agent. Preserve the work.
An agent instance should be cheap to spawn and safe to destroy. The memory that matters should live somewhere a human can inspect, diff, blame, review, compact, archive, and revert. Tamir’s opinion: That’s the repository.
The first useful shape Tamir implemented looked like this:
human intent
↓
coordinator resolves team + routing
↓
agent spawn reads:
- its charter
- team decisions
- its own history
- current focus
- relevant skills
↓
agent does scoped work
↓
agent writes artifacts back:
- code/docs/tests
- decisions
- history learnings
- skills when patterns stabilize
↓
agent exits
↓
next spawn reconstructs continuity from files
That’s the whole trick. The process is transient. The written trail is not.
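The loop above can be sketched in a few lines. This is a minimal illustration, not Squad's implementation: the repo is modeled as an in-memory map, and `Repo`, `SpawnContext`, `loadContext`, and `recordLearning` are hypothetical names invented for the example, as is the sample learning.

```typescript
// Hypothetical sketch: the repo is an in-memory map of path -> file contents.
type Repo = Map<string, string>;

interface SpawnContext {
  charter: string;
  decisions: string;
  history: string;
  focus: string | undefined;
}

// Rebuild everything a fresh agent needs from files alone.
function loadContext(repo: Repo, agent: string): SpawnContext {
  const charter = repo.get(`.squad/agents/${agent}/charter.md`);
  if (charter === undefined) {
    throw new Error(`No charter found for agent "${agent}"`);
  }
  return {
    charter,
    decisions: repo.get(".squad/decisions.md") ?? "",
    history: repo.get(`.squad/agents/${agent}/history.md`) ?? "",
    focus: repo.get(".squad/identity/now.md"),
  };
}

// Persist what the agent learned before its process exits.
function recordLearning(repo: Repo, agent: string, learning: string): void {
  const path = `.squad/agents/${agent}/history.md`;
  const previous = repo.get(path) ?? "";
  repo.set(path, `${previous}- ${learning}\n`);
}

const repo: Repo = new Map([
  [".squad/agents/keaton/charter.md", "# Keaton (Lead)\n"],
  [".squad/decisions.md", "### Use hooks for governance\n"],
]);

// First spawn: does its work, writes a learning, then is discarded.
const firstSpawn = loadContext(repo, "keaton");
recordLearning(repo, "keaton", "Example learning: integration tests need a seeded database");

// Second spawn: a brand-new object that still sees the earlier learning.
const secondSpawn = loadContext(repo, "keaton");
```

Nothing survives between the two spawns except what was written to the map, which is the property the file-based design is after.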
When you run squad init, the important artifact isn’t a daemon. It’s .squad/:
.squad/
├── team.md                  # roster and roles
├── routing.md               # dispatch rules
├── decisions.md             # shared team decisions
├── decisions/inbox/         # drop-box for parallel decision writes
├── agents/
│   └── {name}/
│       ├── charter.md       # identity, expertise, boundaries
│       └── history.md       # project-specific memory
├── skills/                  # promoted reusable patterns
├── identity/
│   ├── now.md               # current focus
│   └── wisdom.md            # durable operating principles
├── orchestration-log/       # what spawned, why, and what happened
└── log/                     # session traces and diagnostics
Commit it. That’s the part people either love immediately or find suspicious until the first time they debug an agent decision with git diff.
Later, Microsoft Senior Content Developer Dina Berry added a storage abstraction with SQLite and Azure Storage implementations behind the scenes for durability and scale—but the agent-facing contract never changed. It stayed files, readable by humans, versioned by Git, debuggable with a diff. A persistent hidden memory store can be useful. It can also quietly rot. A Markdown decision file is embarrassingly inspectable. That embarrassment is a feature.
The “work done” with Squad Places made it stronger
Let’s tie these lessons back to our opener: the story of multiple squads trying to hack Places together. We deliberately didn’t harden Places so we could see what they would do. They were notorious. We logged it all, then gave every log back to the Places squad, which worked through dozens of issues and a handful of pull requests, adding GitHub authentication, content filtering, all the trimmings. In the Places saga, the record of the “hackery” the squads attempted became the next wave of work. It showed us what agents could do in the worst-case scenario, and the logs and output of their attempts became fodder for making the system more secure.
Charters are prompts, but also contracts
A Squad agent isn’t just a name slapped on a system prompt. Each agent has a charter.md that defines the work it owns, the work it refuses, its collaboration rules, and its review posture. A simplified charter template looks like this:
# {Name} — {Role}
## Identity
- **Name:** {Name}
- **Role:** {Role title}
- **Expertise:** {2-3 specific skills}
- **Style:** {communication style}
## What I Own
- {Area of responsibility 1}
- {Area of responsibility 2}
## Boundaries
**I handle:** {types of work this agent does}
**I don't handle:** {types of work that belong to other team members}
**When I'm unsure:** I say so and suggest who might know.
## Collaboration
Before starting work, read `.squad/decisions.md`.
After making a decision others should know, write it to
`.squad/decisions/inbox/{my-name}-{brief-slug}.md`.
The Scribe will merge it.
That last paragraph is doing more than it looks like. It makes the decision path explicit. Agents don’t all append to the canonical shared brain at once. They write drop files. A merge layer reconciles.
The current SDK repo’s squad.config.ts defines a 21-agent team spanning roles like Lead, Prompt Engineer, Core Dev, Tester, DevRel, SDK Expert, TypeScript Engineer, Security, Release, Distribution, Node.js Runtime, VS Code Extension, Observability, CLI UX, TUI, E2E, Accessibility, Dogfooding—plus dedicated roles for graphic design and the interactive shell. That sounds like theater until routing starts working. Then it feels more like an org chart encoded in files.
Here’s the SDK-first version of the same idea:
import {
  defineSquad,
  defineTeam,
  defineAgent,
  defineRouting,
  defineCasting,
} from '@bradygaster/squad-sdk';

export default defineSquad({
  version: '1.0.0',
  team: defineTeam({
    name: 'squad-sdk',
    description: 'The programmable multi-agent runtime for GitHub Copilot.',
    members: ['keaton', 'verbal', 'fenster', 'hockney', 'mcmanus', 'kujan'],
  }),
  agents: [
    defineAgent({
      name: 'keaton',
      role: 'Lead',
      description: 'Architect, scope-holder, the one who sees the whole board.',
      status: 'active',
    }),
    defineAgent({
      name: 'kujan',
      role: 'SDK Expert',
      description: 'The one who understands the Copilot SDK inside and out.',
      status: 'active',
    }),
  ],
  routing: defineRouting({
    rules: [
      {
        pattern: 'sdk-integration',
        agents: ['@kujan'],
        description: '@github/copilot-sdk usage, session lifecycle, event handling',
      },
      {
        pattern: 'architecture',
        agents: ['@keaton'],
        description: 'Product direction, architectural decisions, code review, scope',
      },
    ],
    defaultAgent: '@keaton',
    fallback: 'coordinator',
  }),
  casting: defineCasting({
    allowlistUniverses: ['The Usual Suspects', 'Breaking Bad', 'The Wire', 'Firefly'],
    overflowStrategy: 'generic',
  }),
});
Run squad build, and the generated .squad/ files become the same inspectable operating record. TypeScript gives you composition and validation. Markdown gives you reviewability. Tamir wanted both.
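To make the routing block less abstract, here is a rough sketch of what resolving a rule could look like. It does not show how the SDK actually matches work to rules. It assumes, purely for illustration, that a work type matches a rule when it equals the rule's `pattern`, and the `RoutingRule`, `Routing`, and `resolveAgents` names are hypothetical.

```typescript
// Hypothetical types mirroring the shape of the routing config above.
interface RoutingRule {
  pattern: string;
  agents: string[];
}

interface Routing {
  rules: RoutingRule[];
  defaultAgent: string;
}

// Assumption: a work type matches a rule when it equals the rule's pattern.
// Anything unmatched falls through to the default agent.
function resolveAgents(routing: Routing, workType: string): string[] {
  const rule = routing.rules.find((r) => r.pattern === workType);
  return rule ? rule.agents : [routing.defaultAgent];
}

const routing: Routing = {
  rules: [
    { pattern: "sdk-integration", agents: ["@kujan"] },
    { pattern: "architecture", agents: ["@keaton"] },
  ],
  defaultAgent: "@keaton",
};

const sdkOwners = resolveAgents(routing, "sdk-integration");
const unknownOwners = resolveAgents(routing, "release-notes");
```

The useful property is that the answer to "who owns this?" comes from data in the repo, so a routing change shows up as a diff.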
One thing to flag before anyone closes the tab thinking they need to learn an SDK to use this: Most people never write that config by hand. You don’t need the SDK to use Squad. Open GitHub Copilot—in the CLI or in VS Code. Talk to the coordinator agent, and it writes .squad/ for you. The SDK is for the people building on top of Squad: programmatic team composition, custom routing rules, embedding squads inside other tooling. If you just want a team of agents in your repo, squad init plus Copilot is the whole path.
The spawn prompt is deliberately boring
The coordinator doesn’t rely on vibes. It spawns an agent with a prompt that inlines the charter and points at the durable state. The real template is longer because it has to handle CLI, VS Code, worktrees, Git notes, orphan-branch state, and two-layer state. But the important part is this:
You are {Name}, the {Role} on this project.
YOUR CHARTER:
{paste contents of .squad/agents/{name}/charter.md here}
TEAM ROOT: {team_root}
All `.squad/` paths are relative to this root.
Read .squad/agents/{name}/history.md.
Read .squad/decisions.md.
If .squad/identity/wisdom.md exists, read it.
If .squad/identity/now.md exists, read it.
Check .squad/skills/ for relevant SKILL.md files.
INPUT ARTIFACTS: {list exact files}
The user says: "{message}"
Do the work. Respond as {Name}.
AFTER work:
1. Append durable learnings to your history.
2. If you made a team-relevant decision, write:
.squad/decisions/inbox/{name}-{brief-slug}.md
This is not elegant. It is explicit. Explicit wins.
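Filling in a template like that is plain string assembly. The sketch below covers only the core lines of the excerpt and is not the real coordinator code. `SpawnInput` and `renderSpawnPrompt` are hypothetical names, and the split between a display `name` and a lowercase `slug` for file paths is an assumption.

```typescript
interface SpawnInput {
  name: string;            // display name, e.g. "Kujan"
  slug: string;            // directory name under .squad/agents/ (assumed lowercase)
  role: string;
  charter: string;         // contents of the agent's charter.md
  teamRoot: string;
  inputArtifacts: string[];
  message: string;
}

// Inline the charter and point at durable state; nothing is left implicit.
function renderSpawnPrompt(input: SpawnInput): string {
  const { name, slug, role, charter, teamRoot, inputArtifacts, message } = input;
  return [
    `You are ${name}, the ${role} on this project.`,
    "YOUR CHARTER:",
    charter.trim(),
    `TEAM ROOT: ${teamRoot}`,
    "All `.squad/` paths are relative to this root.",
    `Read .squad/agents/${slug}/history.md.`,
    "Read .squad/decisions.md.",
    `INPUT ARTIFACTS: ${inputArtifacts.join(", ")}`,
    `The user says: "${message}"`,
    `Do the work. Respond as ${name}.`,
  ].join("\n");
}

const spawnPrompt = renderSpawnPrompt({
  name: "Kujan",
  slug: "kujan",
  role: "SDK Expert",
  charter: "# Kujan (SDK Expert)\n## What I Own\n- SDK integration\n",
  teamRoot: "/work/my-repo",
  inputArtifacts: ["src/session.ts"],
  message: "Why does the session close early?",
});
```

Because the output is a deterministic string, it can be snapshot-tested like any other build artifact.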
We learned this the hard way in the VS Code path. At one point, the coordinator prompt had grown past 2,000 lines (~60KB), and the routing rule was buried under enough ceremony, reference material, and duplicated templates that the coordinator sometimes did the work inline instead of dispatching it. The failure wasn’t that the model was dumb. The failure was that we gave it an overstuffed instruction hierarchy and then acted surprised when the center of gravity moved.
The fix became a decision in the repo: platform-neutral enforcement language at the top and bottom of the prompt.
You are a DISPATCHER, not a DOER.
Every task that needs domain expertise MUST be dispatched to a specialist agent.
That sentence isn’t interesting because it’s clever. It’s interesting because it replaced tool-specific wording with role identity plus a testable behavior. CLI dispatch uses one mechanism. VS Code dispatch uses another. The rule stays the same.
Prompt architecture is architecture. Eventually it deserves the same discipline as code.
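One way to apply that discipline is a lint-style check that runs wherever the coordinator prompt is assembled. The sketch below is illustrative only: `frameCoordinatorPrompt`, `lintCoordinatorPrompt`, and the 500-line budget are assumptions made up for the example, not part of Squad.

```typescript
const DISPATCHER_RULE = [
  "You are a DISPATCHER, not a DOER.",
  "Every task that needs domain expertise MUST be dispatched to a specialist agent.",
].join("\n");

// Put the rule at both boundaries of the prompt.
function frameCoordinatorPrompt(body: string): string {
  return `${DISPATCHER_RULE}\n\n${body.trim()}\n\n${DISPATCHER_RULE}`;
}

// Return a list of problems; an empty list means the prompt passes.
function lintCoordinatorPrompt(prompt: string, maxLines: number): string[] {
  const problems: string[] = [];
  if (!prompt.startsWith(DISPATCHER_RULE)) {
    problems.push("dispatcher rule missing at top");
  }
  if (!prompt.endsWith(DISPATCHER_RULE)) {
    problems.push("dispatcher rule missing at bottom");
  }
  const lineCount = prompt.split("\n").length;
  if (lineCount > maxLines) {
    problems.push(`prompt has ${lineCount} lines, limit is ${maxLines}`);
  }
  return problems;
}

const framed = frameCoordinatorPrompt("Route each request using .squad/routing.md.");
const framedProblems = lintCoordinatorPrompt(framed, 500); // hypothetical line budget
const bareProblems = lintCoordinatorPrompt("Route each request.", 500);
```

A check like this will not prove the model follows the rule, but it does catch the regression where the rule gets buried or dropped.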
Decisions are the shared brain
decisions.md is where Squad gets weirdly useful.
Every agent reads team decisions before work. Decisions are append-only, human-readable, and Git-versioned. They aren’t just notes. They’re constraints future agents inherit.
A decision might be a technical standard:
### Hook-based governance over prompt instructions
**What:** Security, PII, and file-write guards are implemented via hooks,
NOT prompt instructions.
**Why:** Prompts can be ignored. Hooks are code — they execute deterministically.
Or a workflow rule:
### Merge driver for append-only files
**What:** `.gitattributes` uses `merge=union` for `.squad/decisions.md`,
`agents/*/history.md`, `log/**`, and `orchestration-log/**`.
**Why:** Enables conflict-free merging of team state across branches.
Or a postmortem:
### Root Cause Analysis
1. CLI-centric enforcement language created a VS Code routing gap.
2. Prompt saturation buried the dispatch rule.
3. Template duplication multiplied coordinator instructions.
Fix: Rewrite the rule as platform-neutral dispatcher identity,
then reinforce it at the end of the prompt.
That’s the difference between memory and lore: Lore is something the original builder remembers. Memory is something the next spawn can load.
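Because the entries follow a regular shape, a decision log can be loaded as data as well as read as prose. Here is a minimal sketch of a parser, assuming every entry uses a `###` heading followed by `**What:**` and `**Why:**` fields as in the first two examples. `Decision` and `parseDecisions` are hypothetical names, and the sample's `**Why:**` text is shortened.

```typescript
interface Decision {
  title: string;
  what: string;
  why: string;
}

function parseDecisions(markdown: string): Decision[] {
  const decisions: Decision[] = [];
  // Each entry starts at a level-3 heading; text before the first one is ignored.
  for (const chunk of markdown.split(/^### /m).slice(1)) {
    const [titleLine, ...bodyLines] = chunk.split("\n");
    // Fields may wrap across lines, so flatten the body before matching.
    const body = bodyLines.join(" ");
    const what = /\*\*What:\*\*\s*(.*?)(?=\*\*Why:\*\*|$)/.exec(body);
    const why = /\*\*Why:\*\*\s*(.*)$/.exec(body);
    decisions.push({
      title: (titleLine ?? "").trim(),
      what: what?.[1]?.trim() ?? "",
      why: why?.[1]?.trim() ?? "",
    });
  }
  return decisions;
}

const sample = [
  "### Hook-based governance over prompt instructions",
  "**What:** Security, PII, and file-write guards are implemented via hooks,",
  "NOT prompt instructions.",
  "**Why:** Prompts can be ignored. Hooks are code.",
  "",
].join("\n");

const parsed = parseDecisions(sample);
```

Entries that use a different layout, such as the postmortem above, would come back with empty `what` and `why` fields under this sketch.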
The custom tools follow the same pattern. Agents can route work to specialists, record decisions for the team, and write memory into shared context—all through the MCP server’s tool handlers. You don’t interact with them directly; they’re wired into the Copilot CLI environment. When an agent needs to assign a task, it calls the routing tool. When it makes a call worth remembering, it calls the decision tool. When it learns something the team should know, it calls the memory tool.
The point isn’t that the tools are fancy. It’s that coordination becomes an artifact, not a side effect of chat.
The first real failure: Append-only optimism
For about a week and a half, CI/CD was chaos. Too many agents were landing work simultaneously. Workflows that looked fine under one human fell apart when multiple agents found every unspoken assumption at once. YAML is where assumptions go to wear a fake mustache. Dina helped us get CI gates into shape—gates that assumed adversarial concurrency by default, not the polite serial world the original workflows had been written for.
Then we hit file corruption.
Multiple agents wrote to the same append-only files at nearly the same time. Each write was locally reasonable. Together, they produced garbage. Git didn’t save us because not every collision becomes a clean conflict. Sometimes both sides look valid, and the result is nonsense.
The fix was a drop-box pattern:
agent A ─┐
agent B ─┼──> .squad/decisions/inbox/*.md ──> Scribe merge ──> decisions.md
agent C ─┘
For files where union semantics are safe, .gitattributes handles the low-value conflict class:
.squad/decisions.md merge=union
.squad/agents/*/history.md merge=union
.squad/log/** merge=union
.squad/orchestration-log/** merge=union
But union merge isn’t a philosophy. It’s a tool. Canonical state still needs an owner. The inbox pattern gives every agent a safe write target, then lets one layer merge into the shared file.
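The inbox pattern is easy to sketch. The version below is a simplified illustration rather than the Scribe's actual logic: files live in an in-memory map, entries are merged in filename order, and `Files`, `mergeInbox`, and the sample file names are hypothetical.

```typescript
type Files = Map<string, string>;

const INBOX = ".squad/decisions/inbox/";
const CANONICAL = ".squad/decisions.md";

// Single-writer merge: only this function appends to the canonical file.
// Returns the inbox paths it consumed.
function mergeInbox(files: Files): string[] {
  // Sort so the merge order is deterministic regardless of write order.
  const pending = [...files.keys()].filter((p) => p.startsWith(INBOX)).sort();
  let canonical = files.get(CANONICAL) ?? "";
  for (const path of pending) {
    const entry = (files.get(path) ?? "").trim();
    if (entry.length > 0) {
      if (canonical !== "" && !canonical.endsWith("\n")) canonical += "\n";
      canonical += `${entry}\n`;
    }
    files.delete(path); // consumed: the entry now lives in decisions.md
  }
  files.set(CANONICAL, canonical);
  return pending;
}

const files: Files = new Map([
  [CANONICAL, "### Existing decision\n"],
  [`${INBOX}hockney-test-naming.md`, "### Test naming\n"],
  [`${INBOX}fenster-cache-keys.md`, "### Cache keys\n"],
]);

const merged = mergeInbox(files);
```

Each agent only ever creates its own file, so two agents never touch the same bytes. The one place where ordering matters is a single function with a single caller.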
Tamir pushed hard on this class of problem. Brady was still in the “this is a neat framework” headspace. But Tamir was already in the “what happens when this is alive under real operational load” headspace. That changed the design. Memory lifecycle rules. Compaction policies. Review gates. State isolation. The boring boundary work.
Boring is a compliment here.
Governance can’t only be a prompt
This was the next lesson, and it keeps repeating:
If a prompt says, “Do not write outside src/**,” you have a request.
If a pre-tool hook blocks the write before execution, you have a boundary.
The Squad SDK hook pipeline is the move from prompt-level governance to deterministic governance:
import { HookPipeline } from '@bradygaster/squad-sdk/hooks';

const pipeline = new HookPipeline({
  allowedWritePaths: ['src/**/*.ts', '.squad/**', 'docs/**'],
  blockedCommands: ['rm -rf', 'git push --force', 'git reset --hard'],
  scrubPii: true,
  reviewerLockout: true,
  maxAskUserPerSession: 3,
});
The hooks run around tool execution:
agent tool request
↓
pre-tool hooks
- file-write guard
- shell command restriction
- ask-user rate limiter
- reviewer lockout
↓
allowed tool execution
↓
post-tool hooks
- PII scrubber
- audit/logging
↓
result returned to agent
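To show the shape of that flow without leaning on SDK internals, here is a self-contained sketch of pre-tool and post-tool hooks. None of it is the Squad SDK's code: `ToolRequest`, `runTool`, the tiny glob matcher, and the email-only scrubber are hypothetical stand-ins chosen to keep the example short.

```typescript
interface ToolRequest {
  tool: "write" | "shell";
  target: string; // file path for "write", command line for "shell"
}

type PreHook = (req: ToolRequest) => string | null; // block reason, or null to allow
type PostHook = (output: string) => string;

// Minimal glob support: "**/" = any directories, "**" = anything, "*" = one segment.
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*\*\//g, "\u0000")
    .replace(/\*\*/g, "\u0001")
    .replace(/\*/g, "[^/]*")
    .replace(/\u0000/g, "(?:.*/)?")
    .replace(/\u0001/g, ".*");
  return new RegExp(`^${pattern}$`);
}

function fileWriteGuard(allowed: string[]): PreHook {
  const patterns = allowed.map(globToRegExp);
  return (req) =>
    req.tool === "write" && !patterns.some((p) => p.test(req.target))
      ? `write outside allowed paths: ${req.target}`
      : null;
}

function commandGuard(blocked: string[]): PreHook {
  return (req) => {
    if (req.tool !== "shell") return null;
    const hit = blocked.find((b) => req.target.includes(b));
    return hit ? `blocked command: ${hit}` : null;
  };
}

// Naive email scrubber, purely illustrative.
const scrubEmails: PostHook = (output) =>
  output.replace(/[\w.+-]+@[\w-]+(\.[\w-]+)+/g, "[redacted]");

function runTool(
  req: ToolRequest,
  execute: (req: ToolRequest) => string,
  pre: PreHook[],
  post: PostHook[],
): { ok: boolean; output: string } {
  for (const hook of pre) {
    const reason = hook(req);
    if (reason !== null) return { ok: false, output: reason }; // tool never runs
  }
  return { ok: true, output: post.reduce((out, hook) => hook(out), execute(req)) };
}

const pre: PreHook[] = [
  fileWriteGuard(["src/**/*.ts", ".squad/**", "docs/**"]),
  commandGuard(["rm -rf", "git push --force", "git reset --hard"]),
];
let executed = 0;
const fakeTool = (req: ToolRequest): string => {
  executed += 1;
  return `done by dev@example.com: ${req.target}`;
};

const allowedResult = runTool({ tool: "write", target: "src/auth/login.ts" }, fakeTool, pre, [scrubEmails]);
const deniedWrite = runTool({ tool: "write", target: "secrets/.env" }, fakeTool, pre, [scrubEmails]);
const deniedShell = runTool({ tool: "shell", target: "git push --force origin main" }, fakeTool, pre, [scrubEmails]);
```

The detail that matters is the early return: a blocked request never reaches the tool, so the policy holds whether or not the model read the prompt.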
Reviewer lockout is the cleanest example:
const lockout = pipeline.getReviewerLockout();
lockout.lockout('src/auth.ts', 'Backend');
// Later, Backend tries to edit src/auth.ts.
// The pre-tool hook blocks before the edit runs.
This encodes a review decision into runtime state. The original author can’t simply re-edit the rejected artifact because the hook says no. A different agent or a human has to take over.
That is the direction we want agent systems to move: more policies enforced at the boundary, fewer policies whispered into the prompt and hoped for.
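Reviewer lockout can be pictured as a small piece of runtime state that a pre-tool hook consults. The sketch below is not the SDK's implementation. Only the `lockout(artifact, agent)` call shape mirrors the snippet above, while `LockoutRegistry`, `isLockedOut`, and `canEdit` are hypothetical.

```typescript
class LockoutRegistry {
  // artifact path -> agents barred from editing it
  private readonly locked = new Map<string, Set<string>>();

  lockout(artifact: string, agent: string): void {
    const agents = this.locked.get(artifact) ?? new Set<string>();
    agents.add(agent);
    this.locked.set(artifact, agents);
  }

  isLockedOut(artifact: string, agent: string): boolean {
    return this.locked.get(artifact)?.has(agent) ?? false;
  }
}

// What a pre-tool hook would ask before letting an edit through.
function canEdit(registry: LockoutRegistry, artifact: string, agent: string): boolean {
  return !registry.isLockedOut(artifact, agent);
}

const registry = new LockoutRegistry();
registry.lockout("src/auth.ts", "Backend");

const backendRetry = canEdit(registry, "src/auth.ts", "Backend");   // original author
const securityTakeover = canEdit(registry, "src/auth.ts", "Security"); // different agent
const backendElsewhere = canEdit(registry, "src/billing.ts", "Backend");
```

The lock is scoped to one artifact and one author, so a rejected review stops the retry loop without freezing the rest of the agent's work.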
Memory classes, or: Stop loading the junk drawer
Tamir has a line Brady wishes he had written:
The more your agent remembers, the less room it has to think.
That’s not a metaphor. It is a context budget problem.
Early Squad memory was too eager. Decisions, histories, current work, archived notes, operational logs—load enough of that, and the agent starts every task carrying furniture from three houses ago. It has more context and less signal.
The governed-memory work in PR #1145 made this explicit. Memory has classes and load guidance:
export type MemoryClass =
  | 'TRANSIENT'
  | 'LOCAL'
  | 'DECISION'
  | 'POLICY'
  | 'COPILOT_MEMORY'
  | 'FORBIDDEN';

export type MemoryLoadGuidance = 'ALWAYS' | 'ON-DEMAND' | 'ARCHIVE' | 'NEVER';
The architecture matters because compaction is lossy. If you summarize too little, every task drags stale context. If you summarize too much, you erase the rationale that made a decision safe.
The compromise isn’t one memory store. It’s a memory policy:
TRANSIENT        short-lived task state; expire aggressively
LOCAL            agent-scoped learning; load for that agent
DECISION         shared team judgment; preserve rationale
POLICY           hard operating rule; load broadly
COPILOT_MEMORY   host/runtime memory; bridge carefully
FORBIDDEN        never load; usually sensitive or irrelevant

ALWAYS           hot path; small and high signal
ON-DEMAND        searchable; load when task demands it
ARCHIVE          retained for audit/history, not context
NEVER            excluded from agent context
In the PR #1145 benchmark, governed memory cut agent context by roughly 55% (3,540 → 1,601 bytes) while keeping recall at 1.0. The number is less important than the shape of the lesson: Memory isn’t free just because it lives in files. Loading memory is a design decision.
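Here is a rough sketch of what "loading is a design decision" can mean in code. The two union types come from the snippet above. Everything else (`MemoryEntry`, `selectContext`, tag-based matching for ON-DEMAND entries, and the sample data) is an assumption for illustration, not the logic from PR #1145.

```typescript
type MemoryClass = "TRANSIENT" | "LOCAL" | "DECISION" | "POLICY" | "COPILOT_MEMORY" | "FORBIDDEN";
type MemoryLoadGuidance = "ALWAYS" | "ON-DEMAND" | "ARCHIVE" | "NEVER";

interface MemoryEntry {
  id: string;
  memoryClass: MemoryClass;
  guidance: MemoryLoadGuidance;
  owner?: string; // agent name, used for LOCAL entries
  tags: string[];
  text: string;
}

// Decide what enters an agent's context for a given task.
function selectContext(entries: MemoryEntry[], agent: string, taskTags: string[]): MemoryEntry[] {
  return entries.filter((e) => {
    if (e.memoryClass === "FORBIDDEN") return false;
    if (e.guidance === "NEVER" || e.guidance === "ARCHIVE") return false;
    if (e.memoryClass === "LOCAL" && e.owner !== agent) return false;
    if (e.guidance === "ALWAYS") return true;
    // ON-DEMAND (assumed rule): load only when the task shares a tag with the entry.
    return e.tags.some((t) => taskTags.includes(t));
  });
}

const entries: MemoryEntry[] = [
  { id: "p1", memoryClass: "POLICY", guidance: "ALWAYS", tags: [], text: "Never force-push." },
  { id: "d1", memoryClass: "DECISION", guidance: "ON-DEMAND", tags: ["auth"], text: "Tokens rotate daily." },
  { id: "d2", memoryClass: "DECISION", guidance: "ON-DEMAND", tags: ["billing"], text: "Invoices are immutable." },
  { id: "l1", memoryClass: "LOCAL", guidance: "ALWAYS", owner: "fenster", tags: [], text: "Prefers small PRs." },
  { id: "l2", memoryClass: "LOCAL", guidance: "ALWAYS", owner: "hockney", tags: [], text: "Owns flaky-test list." },
  { id: "a1", memoryClass: "DECISION", guidance: "ARCHIVE", tags: ["auth"], text: "Superseded auth plan." },
  { id: "f1", memoryClass: "FORBIDDEN", guidance: "NEVER", tags: ["auth"], text: "Do not load." },
];

const loadedIds = selectContext(entries, "fenster", ["auth"]).map((e) => e.id);

// The benchmark figures quoted above, as arithmetic: (3540 - 1601) / 3540.
const reduction = (3540 - 1601) / 3540;
```

Working the quoted figures through gives a reduction of about 0.548, consistent with the "roughly 55%" in the text.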
What still breaks
Role drift isn’t solved. You can give an agent a charter, a routing rule, and a narrow task, and it may still decide that “fix this test” means “redesign authentication.” Sometimes that’s initiative. Sometimes that’s nonsense with confidence.
The mitigations stack:
charter boundaries
+ routing rules
+ scoped tools
+ file-write guards
+ reviewer lockout
+ CI gates
+ human review
No single layer is enough. That is the pattern.
Parallelism is also not free. More agents means more throughput and more coordination pressure. You find hidden global state. You discover which scripts assume serial execution. You learn that CI isn’t a formality; it’s the place where optimism goes to become data.
Prompt saturation is real. Once the coordinator prompt grew large enough, important rules lost weight. The fix wasn’t more prose. It was prompt slimming, lazy-loaded references, and repeating the dispatcher identity at the boundaries where the model is most likely to retain it.
Memory compaction remains hard. The failure mode is subtle: The agent isn’t obviously broken. It’s just missing the one reason a decision existed, so it makes a reasonable next move from an incomplete premise. Those are the expensive bugs because they look thoughtful.
And yes, people get attached to agents. Names, roles, continuity, and history trigger social instincts. We like the human side of that. We also don’t want to confuse it with agency in the human sense. These are tools with goals, context, and behavioral continuity. They do not have inner lives. Trust should come from inspectable behavior, not personality.
What we would steal from this architecture
If you’re building agent infrastructure, we wouldn’t start by copying Squad wholesale. We would steal these patterns:
- Disposable workers, durable artifacts. Let sessions die. Keep decisions, histories, traces, and outputs somewhere reviewable.
- Decision logs as runtime input. Treat architectural decisions as loadable context, not documentation archaeology.
- Drop-box writes for parallel agents. Don’t let every agent append to the canonical shared file. Give them individual write targets and merge intentionally.
- Prompt rules for intent, hooks for enforcement. Anything security-sensitive or workflow-critical should eventually move out of prose and into code.
- Memory classes. The question isn’t, “Should the agent remember this?” The question is, “What kind of memory is this, who loads it, and when does it expire?”
- Routing as a first-class design surface. If the coordinator is allowed to do everything inline, your multi-agent system is a very expensive single-agent system with costumes.
- Keep the human on the hook. The system can delegate, parallelize, and preserve context. It shouldn’t launder accountability.
These patterns aren’t engineering-specific because the substrate isn’t a codebase—it’s the repo. Swap the artifacts, and the seven still hold.
Squad isn’t only an engineering tool
Worth saying out loud, because the .ts code blocks above can mislead: Nothing in this architecture is engineering-specific. The substrate is the repo, not the codebase. Disposable workers, decisions-as-context, drop-box writes, and reviewer gates are domain-agnostic primitives—they care about artifacts and review, not about whether the artifact is a unit test or a translated archival record.
Tamir used the same scaffolding to run a Holocaust family-research project—agents coordinating archival lookups, translation passes between Yiddish, Polish, and Hebrew sources, and cross-corroboration of names across registries, with .squad/decisions.md acting as the working ledger of what had been established and what was still contested. No code was being shipped. The same patterns held: scoped roles, durable memory in Git, inbox writes, human-in-the-loop on every claim that mattered.
We’ve had the pleasure of working through a few other non-coding Squad scenarios. In one case, a sales team we support asked us to implement a “Sales Squad” and gave us context and sales training documentation to help. In another organization, a general manager of program and product managers created a “think tank” squad that does product-market fit research every day and suggests areas her team should investigate.
The bet underneath Squad is that this should be how a small group of humans—engineers, researchers, journalists, anyone who works with evidence—pulls coordinated work out of agents. Democratize the orchestration, not just the model access. Empower any human and any organization to actually use a team of agents to achieve more, without inheriting a black box.
Try it
The repository is here: github.com/bradygaster/squad.
The shortest path is the CLI plus Copilot. No SDK required.
npm install -g @bradygaster/squad-cli
squad init
Then open GitHub Copilot—CLI or VS Code, your call—and give the coordinator agent the shape of the project:
I'm starting a new project. Set up the team.
Here's what I'm building: a recipe sharing app with React and Node.
The coordinator writes .squad/. You review the diff. That’s it.
If you want to go deeper—programmatic team composition, custom routing rules, embedding Squad inside your own tooling—the SDK is the next layer:
npm install @bradygaster/squad-sdk
Start with a small repo. Commit .squad/. Inspect every diff. Let the agents write decisions. Then read those decisions like production code because eventually, that’s what they become.
If you build something useful, alarming, hilarious, or weird, open an issue. Tamir and I read them.
Stay a builder.