My Agentic Engineering Workflow: Shipping Better Code with AI Agents

Over the past six months, I’ve fundamentally changed how I write software. Instead of treating AI as a code completion tool, I now run multiple parallel AI agents — each responsible for a specific task — while I focus on planning, architecture, and validation.

This shift was inspired by engineers like Kun Chen (ex-Meta L8), who demonstrated that with the right orchestration, you can ship 20-40 PRs per day without becoming a bottleneck. The key insight: the bottleneck has shifted from writing code to managing agent workflows and reviewing outputs at scale.

Here’s the system I’ve built, the tools I use, and the exact process I follow.

The Three-Phase Framework

Every feature I build goes through three distinct phases:

[Plan] → [Code] → [Validate]
  ↑                        |
  └────── Iterate ─────────┘

Phase 1: Plan (Human-led, AI-assisted)

This is where I invest the most time. The quality of the plan largely determines how long agents can run autonomously before they need human redirection.

My planning process:

Write a detailed spec: Not a one-liner prompt, but a proper specification with acceptance criteria, edge cases, and non-functional requirements.
Create visual artifacts: I use Lavish to generate interactive HTML mockups of the UI or data flow. Visual artifacts shorten feedback loops dramatically, because I can spot UX issues in seconds that would take paragraphs to describe in text.
Define measurable goals: Instead of “make it faster,” I write “p95 API response time under 200 ms with 1,000 concurrent users.” Agents work better with measurable targets.
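A measurable goal is only useful if something can check it mechanically. As a minimal sketch, assuming latency samples have already been collected in milliseconds by a load test, the 95th-percentile target could be verified like this. The function names and the nearest-rank percentile method are my own choices for illustration, not part of any tool mentioned here.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_goal(latencies_ms, limit_ms=200.0, pct=95):
    """True when the chosen percentile is strictly under the limit."""
    return percentile(latencies_ms, pct) < limit_ms
```

An agent (or a CI step) can call `meets_latency_goal` on the load-test output and get an unambiguous pass or fail, which is the property that makes the goal usable as a stopping condition.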

The output of this phase is a structured prompt that an agent can execute without ambiguity.
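To make "structured prompt" concrete, here is one possible shape for it, sketched under the assumption that a spec is held as plain data and rendered to text before being handed to an agent. The `Spec` class and its field names are hypothetical and simply mirror the elements listed in the planning steps.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical container for the planning output."""
    title: str
    acceptance_criteria: list[str]
    edge_cases: list[str] = field(default_factory=list)
    non_functional: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the spec as a sectioned prompt, skipping empty sections."""
        sections = [
            ("Acceptance criteria", self.acceptance_criteria),
            ("Edge cases", self.edge_cases),
            ("Non-functional requirements", self.non_functional),
        ]
        lines = [f"Task: {self.title}"]
        for heading, items in sections:
            if items:
                lines.append("")
                lines.append(f"{heading}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```

Keeping the spec as data rather than free text also makes it easy to hand the same spec to a separate reviewing agent later.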

Phase 2: Code (Agent-led, human-supervised)

I run multiple agents in parallel across separate branches. The tool that makes this practical is Treehouse, which manages multiple git worktrees so that each agent works on its own branch in its own checkout, without interfering with the others.

┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Agent 1    │    │  Agent 2    │    │  Agent 3    │
│  (Feature A)│    │  (Feature B)│    │  (Bug fix)  │
└──────┬──────┘    └──────┬──────┘    └──────┬──────┘
       │                  │                  │
       └──────────────────┼──────────────────┘
                          ▼
              ┌─────────────────────┐
              │   Validation Gate   │
              │   (No Mistakes)     │
              └─────────────────────┘

Each agent session:

Operates from a detailed spec (from Phase 1)
Works in its own git worktree via Treehouse
Is configured with project-specific context via an AGENTS.md file
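I can't show Treehouse's own interface here, so the sketch below uses plain `git worktree add -b`, the git feature it builds on, to illustrate what "one worktree per agent" means. The helper names and the directory layout are assumptions for illustration only.

```python
import subprocess
from pathlib import Path


def worktree_path(worktrees_dir: Path, branch: str) -> Path:
    """Map a branch name such as 'agent/feature-a' to a flat directory name."""
    return worktrees_dir / branch.replace("/", "-")


def worktree_command(repo: Path, branch: str, target: Path) -> list[str]:
    """Build the git command that creates a new branch checked out at target."""
    # With -C, git treats a relative target as relative to repo, so prefer absolute paths.
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target)]


def add_worktree(repo: Path, branch: str, worktrees_dir: Path) -> Path:
    """Create the worktree and return its path; raises if git reports an error."""
    target = worktree_path(worktrees_dir, branch)
    subprocess.run(worktree_command(repo, branch, target), check=True)
    return target
```

Each agent is then started with its working directory set to the returned path, so two agents never edit the same checkout.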

Why AGENTS.md matters: Default agent behavior produces generic code. I maintain a project-level AGENTS.md that encodes conventions, testing requirements, and architectural decisions. Agents read this before touching any file.
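Agent tools that support the AGENTS.md convention typically read the file themselves. For a hand-rolled harness, a minimal sketch of the same idea might look like the following; the function name and the prompt layout are assumptions, not a description of any particular tool.

```python
from pathlib import Path


def build_agent_prompt(project_root: Path, task: str) -> str:
    """Prepend the project's AGENTS.md conventions to the task, if the file exists."""
    agents_file = project_root / "AGENTS.md"
    if not agents_file.is_file():
        return task  # no conventions file: send the task unchanged
    conventions = agents_file.read_text(encoding="utf-8").strip()
    return f"Project conventions (from AGENTS.md):\n{conventions}\n\nTask:\n{task}"
```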

Phase 3: Validate (Fully automated)

I do not review the first-pass code from agents. Instead, I use an automated validation pipeline called No Mistakes that runs a fresh-context agent to scrutinize the change.

This is critical: reviewing code in the same context as the agent that wrote it introduces confirmation bias. A fresh agent, given only the diff and the spec, catches bugs the original agent missed.

The pipeline runs:

# .no-mistakes/gates.yml (simplified)
steps:
  - intent:     Verify code matches the spec
  - rebase:     Rebase onto latest main
  - review:     Fresh-context agent reviews every line
  - test:       Run full test suite
  - document:   Update docs if signatures changed
  - lint:       Enforce code style
  - push:       Push to remote
  - pr:         Create PR with changelog
  - ci:         Wait for CI to pass
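The ordering of these steps matters: nothing should be pushed unless review, tests, and lint have passed. The sketch below shows that control flow in isolation. It is not how No Mistakes is implemented; it assumes each step can be reduced to a callable that returns True on success, and that the first failure stops the run.

```python
from typing import Callable

# A gate is a (name, check) pair; check returns True when the step passes.
Gate = tuple[str, Callable[[], bool]]


def run_gates(gates: list[Gate]) -> tuple[bool, list[str]]:
    """Run gates in order and stop at the first failure.

    Returns (all_passed, names of the gates that were executed).
    """
    executed = []
    for name, check in gates:
        executed.append(name)
        if not check():
            return False, executed
    return True, executed
```

With this shape, a failing `review` step means `push`, `pr`, and `ci` never run, so a rejected change never leaves the worktree.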

When Kun Chen first built this system, he ran parallel testing — reviewing changes himself alongside the automated pipeline — to calibrate the prompts. Eventually, he reached a point where he “never catches anything the agents don’t catch.” I’ve had the same experience.
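One way to put a number on that calibration is recall: of the issues a human reviewer found, what fraction did the pipeline also report? The sketch below assumes each issue has been given a short identifier by hand so the two sets can be compared; the identifiers are hypothetical.

```python
def validation_recall(human_findings: set[str], pipeline_findings: set[str]) -> float:
    """Fraction of human-found issues that the pipeline also reported."""
    if not human_findings:
        return 1.0  # the human found nothing, so the pipeline missed nothing
    caught = human_findings & pipeline_findings
    return len(caught) / len(human_findings)
```

For example, if I find four issues in a week of parallel review and the pipeline flagged three of them, recall is 0.75. Extra issues that only the pipeline found do not lower this number.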

The Tools I Use

Tool	Purpose	Link
Lavish	Interactive HTML mockups for visual planning	GitHub
Treehouse	Parallel git worktree management for multi-agent sessions	GitHub
No Mistakes	Automated validation pipeline (review, test, lint, push, PR, CI)	GitHub
WezTerm	Terminal emulator for multi-pane agent sessions	wezfurlong.org
Neovim	Editor with oil.nvim and Neogit for file and git management	neovim.io

Running 20-30 Agents in Parallel

The biggest misconception about agentic engineering is that you run one agent at a time. The reality is closer to a distributed system:

5 main sessions running simultaneously (each in its own Treehouse worktree)
Each session can spawn 4-6 sub-agents for exploratory tasks (researching APIs, investigating bugs, drafting tests)
Total: 20-30 concurrent sub-agents, on top of the 5 main sessions that coordinate them

Sub-agents are spawned primarily to prevent context window bloat in the main session. If an agent needs to investigate a database schema or read a large file, it delegates that to a sub-agent rather than loading it into its own context.
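The delegation pattern can be sketched with `asyncio`. Here `sub_agent` is a stand-in for a real agent call, which is an assumption on my part: the point is only that the main session receives a short summary rather than the full material the sub-agent read.

```python
import asyncio


async def sub_agent(task: str) -> str:
    """Stand-in for a sub-agent: does the heavy reading, returns only a short summary."""
    await asyncio.sleep(0)  # placeholder for real agent work
    return f"summary of {task}"


async def main_session(research_tasks: list[str]) -> dict[str, str]:
    """Delegate research tasks concurrently and keep only the summaries."""
    summaries = await asyncio.gather(*(sub_agent(t) for t in research_tasks))
    return dict(zip(research_tasks, summaries))


async def run_all(sessions: dict[str, list[str]]) -> dict[str, dict[str, str]]:
    """Run every main session concurrently; each fans out to its own sub-agents."""
    results = await asyncio.gather(*(main_session(tasks) for tasks in sessions.values()))
    return dict(zip(sessions.keys(), results))
```

Running 5 sessions with 4 tasks each through `asyncio.run(run_all(...))` yields 20 summaries, matching the lower end of the range above.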

What This Means for Engineering Teams

This workflow exposes a structural problem with traditional team processes:

Code review cadences, PR processes, and sprint planning were designed around human coding speed. They break when one engineer opens 10x the usual number of PRs.

Teams adopting agentic workflows need to adapt:

Automate the first-pass review: Human reviewers should see diffs that have already passed automated review, not raw agent output
Batch related changes: A single PR can contain changes from multiple agents if they’re logically related
Shift left on planning: Spend more time on the spec and less on review

My Daily Rhythm

08:00 — Review automated validation results from overnight agents
08:30 — Plan today's features (specs + visual artifacts)
10:00 — Launch agent sessions, switch to deep work (architecture, code review)
12:00 — Review agent output, iterate on specs
14:00 — Launch afternoon agent sessions
16:00 — Review PRs, merge what's green
17:00 — Update AGENTS.md with lessons learned

Getting Started

If you want to adopt this workflow, start small:

Install No Mistakes in one repository — npx no-mistakes init
Write an AGENTS.md — encode your project’s conventions
Try one parallel session — Use Treehouse to run two agents side-by-side
Measure your validation recall — Review alongside the pipeline until you trust it
Scale up — Add more agents as your confidence grows

References

Kun Chen — L8 Principal’s Agentic Engineering Workflow
Peter Yang — How This Ex-Meta L8 Engineer Ships 40 PRs a Day with AI Agents
ByteByteGo — Inside Kun Chen’s Agentic Engineering Workflow
No Mistakes — Automated code validation pipeline
Treehouse — Parallel git worktree management
Lavish — Visual HTML planning artifacts