Learn Claude Code -- Harness Engineering for Real Agents
The Model IS the Agent
Before we talk about code, let's get one thing absolutely straight.
An agent is a model. Not a framework. Not a prompt chain. Not a drag-and-drop workflow.
What an Agent IS
An agent is a neural network -- a Transformer, an RNN, a learned function -- that has been trained, through billions of gradient updates on action-sequence data, to perceive an environment, reason about goals, and take actions to achieve them. The word "agent" in AI has always meant this. Always.
A human is an agent. A biological neural network, shaped by millions of years of evolutionary training, perceiving the world through senses, reasoning through a brain, acting through a body. When DeepMind, OpenAI, or Anthropic say "agent," they mean the same thing the field has meant since its inception: a model that has learned to act.
The proof is written in history:
- 2013 -- DeepMind DQN plays Atari. A single neural network, receiving only raw pixels and game scores, learned to play 7 Atari 2600 games -- outperforming all previous approaches on 6 of them and surpassing a human expert on 3. By 2015, the same architecture scaled to 49 games and matched professional human testers, published in Nature. No game-specific rules. No decision trees. One model, learning from experience. That model was the agent.
- 2019 -- OpenAI Five conquers Dota 2. Five neural networks, having played 45,000 years of Dota 2 against themselves in 10 months, defeated OG -- the reigning TI8 world champions -- 2-0 on a San Francisco livestream. In a subsequent public arena, where 42,729 games were played in total, the AI won 99.4% of its 7,257 competitive games against all comers. No scripted strategies. No meta-programmed team coordination. The models learned teamwork, tactics, and real-time adaptation entirely through self-play.
- 2019 -- DeepMind AlphaStar masters StarCraft II. AlphaStar beat professional players 10-1 in a closed-door match, and later achieved Grandmaster status on European servers -- top 0.15% of 90,000 players. A game with imperfect information, real-time decisions, and a combinatorial action space that dwarfs chess and Go. The agent? A model. Trained. Not scripted.
- 2019 -- Tencent Juewu dominates Honor of Kings. Tencent AI Lab's "Juewu" defeated KPL professional players in a full 5v5 match at the World Champion Cup. In 1v1 mode, pros won only 1 out of 15 games and never survived past 8 minutes. Training intensity: one day equaled 440 human years. By 2021, Juewu surpassed KPL pros across the full hero pool. No handcrafted matchup tables. No scripted compositions. A model that learned the entire game from scratch through self-play.
- 2024-2025 -- LLM agents reshape software engineering. Claude, GPT, Gemini -- large language models trained on vast corpora of human code and reasoning -- are deployed as coding agents. They read codebases, write implementations, debug failures, coordinate in teams. The architecture is identical to every agent before them: a trained model, placed in an environment, given tools to perceive and act. The only difference is the scale of what they've learned and the generality of the tasks they solve.
Every one of these milestones shares the same truth: the "agent" is never the surrounding code. The agent is always the model.
What an Agent Is NOT
The word "agent" has been hijacked by an entire cottage industry of prompt plumbing.
Drag-and-drop workflow builders. No-code "AI agent" platforms. Prompt-chain orchestration libraries. They all share the same delusion: that wiring together LLM API calls with if-else branches, node graphs, and hardcoded routing logic constitutes "building an agent."
It doesn't. What they build is a Rube Goldberg machine -- an over-engineered, brittle pipeline of procedural rules, with an LLM wedged in as a glorified text-completion node. That is not an agent. That is a shell script with delusions of grandeur.
Prompt plumbing "agents" are the fantasy of programmers who don't train models. They attempt to brute-force intelligence by stacking procedural logic -- massive rule trees, node graphs, chain-of-prompt waterfalls -- and praying that enough glue code will somehow emergently produce autonomous behavior. It won't. You cannot engineer your way to agency. Agency is learned, not programmed.
Those systems are dead on arrival: fragile, unscalable, fundamentally incapable of generalization. They are the modern resurrection of GOFAI (Good Old-Fashioned AI) -- the symbolic rule systems the field abandoned decades ago, now spray-painted with an LLM veneer. Different packaging, same dead end.
The Mind Shift: From "Developing Agents" to Developing Harness
When someone says "I'm developing an agent," they can only mean one of two things:
1. Training the model. Adjusting weights through reinforcement learning, fine-tuning, RLHF, or other gradient-based methods. Collecting task-process data -- the actual sequences of perception, reasoning, and action in real domains -- and using it to shape the model's behavior. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do. This is agent development in the truest sense.
2. Building the harness. Writing the code that gives the model an environment to operate in. This is what most of us do, and it is the focus of this repository.
A harness is everything the agent needs to function in a specific domain:
Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions
Tools: file I/O, shell, network, database, browser
Knowledge: product docs, domain references, API specs, style guides
Observation: git diff, error logs, browser state, sensor data
Action: CLI commands, API calls, UI interactions
Permissions: sandboxing, approval workflows, trust boundaries
The model decides. The harness executes. The model reasons. The harness provides context. The model is the driver. The harness is the vehicle.
A coding agent's harness is its IDE, terminal, and filesystem access. A farm agent's harness is its sensor array, irrigation controls, and weather data feeds. A hotel agent's harness is its booking system, guest communication channels, and facility management APIs. The agent -- the intelligence, the decision-maker -- is always the model. The harness changes per domain. The agent generalizes across them.
This repo teaches you to build vehicles. Vehicles for coding. But the design patterns generalize to any domain: farm management, hotel operations, manufacturing, logistics, healthcare, education, scientific research. Anywhere a task needs to be perceived, reasoned about, and acted upon -- an agent needs a harness.
What Harness Engineers Actually Do
If you are reading this repository, you are likely a harness engineer -- and that is a powerful thing to be. Here is your real job:
- Implement tools. Give the agent hands. File read/write, shell execution, API calls, browser control, database queries. Each tool is an action the agent can take in its environment. Design them to be atomic, composable, and well-described.
- Curate knowledge. Give the agent domain expertise. Product documentation, architectural decision records, style guides, regulatory requirements. Load them on-demand (s05), not upfront. The agent should know what's available and pull what it needs.
- Manage context. Give the agent clean memory. Subagent isolation (s04) prevents noise from leaking. Context compression (s06) prevents history from overwhelming. Task systems (s07) persist goals beyond any single conversation.
- Control permissions. Give the agent boundaries. Sandbox file access. Require approval for destructive operations. Enforce trust boundaries between the agent and external systems. This is where safety engineering meets harness engineering.
- Collect task-process data. Every action sequence the agent executes in your harness is training signal. The perception-reasoning-action traces from real deployments are the raw material for fine-tuning the next generation of agent models. Your harness doesn't just serve the agent -- it can help improve the agent.
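To make the "Control permissions" item concrete, here is a minimal sketch of a gate that classifies a shell command before the harness runs it. It is an illustration under stated assumptions, not Claude Code's actual policy: the class name PermissionGate, the Decision enum, and both pattern lists are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

public class PermissionGate {
    /** Outcome of checking a command before the harness executes it. */
    public enum Decision { ALLOW, ASK, DENY }

    // Hypothetical patterns, chosen for illustration only.
    // A real harness needs a far more careful policy than regex matching.
    private static final List<Pattern> DENY = List.of(
            Pattern.compile("\\brm\\s+-rf\\s+/(\\s|$)"),  // wiping the filesystem root
            Pattern.compile("\\bmkfs\\b"));               // formatting a device
    private static final List<Pattern> ASK = List.of(
            Pattern.compile("\\brm\\b"),
            Pattern.compile("\\bgit\\s+push\\b"),
            Pattern.compile("\\bsudo\\b"));

    /** DENY wins over ASK; anything unmatched is allowed. */
    public static Decision check(String command) {
        for (Pattern p : DENY) {
            if (p.matcher(command).find()) return Decision.DENY;
        }
        for (Pattern p : ASK) {
            if (p.matcher(command).find()) return Decision.ASK;
        }
        return Decision.ALLOW;
    }

    public static void main(String[] args) {
        for (String cmd : List.of("ls -la", "rm build/output.txt", "rm -rf /")) {
            System.out.println(cmd + " -> " + check(cmd));
        }
    }
}
```

The point of the design is that the model still decides what to run; the harness only decides whether that action is allowed, needs human approval, or is refused outright.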
You are not writing the intelligence. You are building the world the intelligence inhabits. The quality of that world -- how clearly the agent can perceive, how precisely it can act, how rich its available knowledge is -- directly determines how effectively the intelligence can express itself.
Build great harnesses. The agent will do the rest.
Why Claude Code -- A Masterclass in Harness Engineering
Why does this repository dissect Claude Code specifically?
Because Claude Code is the most elegant and fully-realized agent harness we have seen. Not because of any single clever trick, but because of what it doesn't do: it doesn't try to be the agent. It doesn't impose rigid workflows. It doesn't second-guess the model with elaborate decision trees. It provides the model with tools, knowledge, context management, and permission boundaries -- then gets out of the way.
Look at what Claude Code actually is, stripped to its essence:
Claude Code = one agent loop
+ tools (bash, read, write, edit, glob, grep, browser...)
+ on-demand skill loading
+ context compression
+ subagent spawning
+ task system with dependency graph
+ team coordination with async mailboxes
+ worktree isolation for parallel execution
+ permission governance
That's it. That's the entire architecture. Every component is a harness mechanism -- a piece of the world built for the agent to inhabit. The agent itself? It's Claude. A model. Trained by Anthropic on the full breadth of human reasoning and code. The harness doesn't make Claude smart. Claude is already smart. The harness gives Claude hands, eyes, and a workspace.
This is why Claude Code is the ideal teaching subject: it demonstrates what happens when you trust the model and focus your engineering on the harness. Every session in this repository (s01-s12) reverse-engineers one harness mechanism from Claude Code's architecture. By the end, you understand not just how Claude Code works, but the universal principles of harness engineering that apply to any agent in any domain.
The lesson is not "copy Claude Code." The lesson is: the best agent products are built by engineers who understand that their job is harness, not intelligence.
The Vision: Fill the Universe with Real Agents
This is not just about coding agents.
Every domain where humans perform complex, multi-step, judgment-intensive work is a domain where agents can operate -- given the right harness. The patterns in this repository are universal:
Estate management agent = model + property sensors + maintenance tools + tenant comms
Agricultural agent = model + soil/weather data + irrigation controls + crop knowledge
Hotel operations agent = model + booking system + guest channels + facility APIs
Medical research agent = model + literature search + lab instruments + protocol docs
Manufacturing agent = model + production line sensors + quality controls + logistics
Education agent = model + curriculum knowledge + student progress + assessment tools
The loop is always the same. The tools change. The knowledge changes. The permissions change. The agent -- the model -- generalizes.
Every harness engineer reading this repository is learning patterns that apply far beyond software engineering. You are learning to build the infrastructure for an intelligent, automated future. Every well-designed harness deployed in a real domain is one more place where an agent can perceive, reason, and act.
First we fill the workshops. Then the farms, the hospitals, the factories. Then the cities. Then the planet.
Bash is all you need. Real agents are all the universe needs.
THE AGENT PATTERN
=================
User --> messages[] --> LLM --> response
                                   |
                        More tool call requests?
                           /              \
                         yes               no
                          |                 |
             execute @Tool methods     return text
                return results
                continue loop -----------------> messages[]
That's the minimal loop. Every AI agent needs this loop.
The model decides when to call tools and when to stop.
Spring AI's ChatClient.call() automatically drives this loop.
This repo teaches you to build everything around this loop --
the harness that makes the agent effective in a specific domain.
12 progressive sessions, from a simple loop to isolated autonomous execution. Each session adds one harness mechanism. Each mechanism has one motto.
s01 "One loop & Bash is all you need" — one tool + one loop = an agent
s02 "Adding a tool means adding one handler" -- the loop stays the same; new tools register with @Tool annotation + defaultTools()
s03 "An agent without a plan drifts" -- list the steps first, then execute; completion doubles
s04 "Break big tasks down; each subtask gets a clean context" — subagents use independent messages[], keeping the main conversation clean
s05 "Load knowledge when you need it, not upfront" — inject via tool_result, not the system prompt
s06 "Context will fill up; you need a way to make room" — three-layer compression strategy for infinite sessions
s07 "Break big goals into small tasks, order them, persist to disk" — a file-based task graph with dependencies, laying the foundation for multi-agent collaboration
s08 "Run slow operations in the background; the agent keeps thinking" — daemon threads run commands, inject notifications on completion
s09 "When the task is too big for one, delegate to teammates" — persistent teammates + async mailboxes
s10 "Teammates need shared communication rules" — one request-response pattern drives all negotiation
s11 "Teammates scan the board and claim tasks themselves" — no need for the lead to assign each one
s12 "Each works in its own directory, no interference" — tasks manage goals, worktrees manage directories, bound by ID
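As one example of what a session mechanism looks like in code, the s07 idea of a task graph with dependencies can be sketched as below. This is not the repository's implementation: the class name TaskGraphSketch and its methods are hypothetical, and the sketch keeps tasks in memory, leaving out the persist-to-disk part of the motto.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TaskGraphSketch {
    enum Status { PENDING, COMPLETED }

    /** One task plus the ids of the tasks that must finish first. */
    static final class Task {
        final int id;
        final String subject;
        final Set<Integer> blockedBy;
        Status status = Status.PENDING;

        Task(int id, String subject, Set<Integer> blockedBy) {
            this.id = id;
            this.subject = subject;
            this.blockedBy = blockedBy;
        }
    }

    // LinkedHashMap keeps insertion order, so ready() output is deterministic.
    private final Map<Integer, Task> tasks = new LinkedHashMap<>();

    public void add(int id, String subject, Integer... blockedBy) {
        tasks.put(id, new Task(id, subject, Set.of(blockedBy)));
    }

    /** Marks a task as done; the id is assumed to exist. */
    public void complete(int id) {
        tasks.get(id).status = Status.COMPLETED;
    }

    /** Ids of pending tasks whose dependencies are all completed. */
    public List<Integer> ready() {
        List<Integer> out = new ArrayList<>();
        for (Task t : tasks.values()) {
            boolean unblocked = t.blockedBy.stream().allMatch(dep ->
                    tasks.containsKey(dep) && tasks.get(dep).status == Status.COMPLETED);
            if (t.status == Status.PENDING && unblocked) out.add(t.id);
        }
        return out;
    }

    public static void main(String[] args) {
        TaskGraphSketch graph = new TaskGraphSketch();
        graph.add(1, "design schema");
        graph.add(2, "write migration", 1);
        graph.add(3, "write tests", 1);
        graph.add(4, "release", 2, 3);

        System.out.println(graph.ready()); // [1]
        graph.complete(1);
        System.out.println(graph.ready()); // [2, 3]
        graph.complete(2);
        graph.complete(3);
        System.out.println(graph.ready()); // [4]
    }
}
```

A graph like this is what lets several teammates pick up work independently in later sessions: anything returned by ready() can be claimed without asking a lead.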
The Core Pattern
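// S01AgentLoop.java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

// Spring AI's ChatClient + @Tool annotation implement the agent loop.
// The model decides when to call tools and when to return text; the framework drives the loop.
@SpringBootApplication
public class S01AgentLoop {

    public static void main(String[] args) {
        SpringApplication.run(S01AgentLoop.class, args);
    }

    @Bean
    public CommandLineRunner agentLoop(ChatClient.Builder builder) {
        ChatClient chatClient = builder
                .defaultSystem("You are a helpful assistant with access to tools.")
                .defaultTools(new BashTool()) // register the tool
                .build();
        return args -> {
            // Simplification for this excerpt: take the prompt from the command line.
            String userInput = String.join(" ", args);
            // A single call() runs the whole cycle internally: call the model → detect tool
            // requests → execute tools → return results → call the model again...
            String result = chatClient.prompt()
                    .user(userInput)
                    .call()
                    .content();
            System.out.println(result);
        };
    }
}

// BashTool.java
import org.springframework.ai.tool.annotation.Tool;

// The @Tool annotation makes a method callable by the model.
public class BashTool {
    @Tool(description = "Execute a shell command and return stdout/stderr")
    public String executeBash(String command) throws Exception {
        // Minimal sketch: run the command and return its combined output (no timeout, no sandbox).
        Process process = new ProcessBuilder("bash", "-c", command).redirectErrorStream(true).start();
        String output = new String(process.getInputStream().readAllBytes());
        process.waitFor();
        return output;
    }
}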
Spring AI's ChatClient.call() encapsulates the complete agent loop internally: call the LLM → detect tool call requests → execute @Tool methods → return results to the model → repeat until the model returns text. Each session layers one harness mechanism on top of this loop -- without changing the loop itself. The loop belongs to the agent. The mechanisms belong to the harness.
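For readers who want to see the cycle spelled out, the following is a hand-written equivalent in plain Java. It is a sketch for illustration, not Spring AI's internal code: the Message, Reply, and Model types are hypothetical, and a scripted stand-in replaces the real LLM so the example runs without an API key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ExplicitAgentLoop {
    public record Message(String role, String content) {}

    /** Either a tool request (toolName != null) or a final text answer. */
    public record Reply(String toolName, String toolInput, String text) {
        public static Reply tool(String name, String input) { return new Reply(name, input, null); }
        public static Reply text(String text) { return new Reply(null, null, text); }
    }

    /** Stand-in for the LLM call: sees the history, returns the next step. */
    public interface Model { Reply next(List<Message> messages); }

    public static String run(Model model, Map<String, Function<String, String>> tools,
                             String userInput, int maxTurns) {
        List<Message> messages = new ArrayList<>();
        messages.add(new Message("user", userInput));
        for (int turn = 0; turn < maxTurns; turn++) {
            Reply reply = model.next(messages);              // 1. call the model
            if (reply.toolName() == null) {
                return reply.text();                         // 2. no tool request: done
            }
            Function<String, String> tool = tools.get(reply.toolName());
            String result = (tool == null)
                    ? "error: unknown tool " + reply.toolName()
                    : tool.apply(reply.toolInput());         // 3. execute the tool
            messages.add(new Message("assistant",
                    "tool_call " + reply.toolName() + ": " + reply.toolInput()));
            messages.add(new Message("tool", result));       // 4. feed the result back
        }
        return "stopped: turn limit reached";                // guard against endless loops
    }

    public static void main(String[] args) {
        // Scripted model: ask for one tool call, then answer using its result.
        Model scripted = messages -> {
            Message last = messages.get(messages.size() - 1);
            return last.role().equals("tool")
                    ? Reply.text("The tool said: " + last.content())
                    : Reply.tool("upper", "hello");
        };
        Map<String, Function<String, String>> tools = Map.of("upper", s -> s.toUpperCase());
        System.out.println(run(scripted, tools, "shout hello", 5)); // The tool said: HELLO
    }
}
```

Note that the loop contains no domain logic at all: everything domain-specific lives in the tools map and in what the model chooses to request.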
Scope (Important)
This repository is a 0->1 learning project for harness engineering -- building the environment that surrounds an agent model. It intentionally simplifies or omits several production mechanisms:
- Full event/hook buses (for example PreToolUse, SessionStart/End, ConfigChange). s12 includes only a minimal append-only lifecycle event stream for teaching.
- Rule-based permission governance and trust workflows
- Session lifecycle controls (resume/fork) and advanced worktree lifecycle controls
- Full MCP runtime details (transport/OAuth/resource subscribe/polling)
Treat the team JSONL mailbox protocol in this repo as a teaching implementation, not a claim about any specific production internals.
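To give a feel for what a JSONL mailbox can look like, here is a minimal sketch: one append-only file per teammate, one JSON object per line. The class name JsonlMailbox, the from/content field names, and the file naming are assumptions made for this example and may differ from the repository's code; the sketch also ignores concurrent writers and escapes only the most common special characters.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class JsonlMailbox {
    private final Path dir;

    public JsonlMailbox(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    /** Escapes the characters that would break a one-line JSON string (not a full JSON encoder). */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t");
    }

    /** Appends one message to the recipient's inbox file. */
    public void send(String to, String from, String content) throws IOException {
        String line = "{\"from\":\"" + escape(from) + "\",\"content\":\"" + escape(content) + "\"}\n";
        Files.writeString(dir.resolve(to + ".jsonl"), line,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Returns all unread lines and empties the inbox. Not safe under concurrent writers. */
    public List<String> drain(String name) throws IOException {
        Path inbox = dir.resolve(name + ".jsonl");
        if (!Files.exists(inbox)) return List.of();
        List<String> lines = Files.readAllLines(inbox);
        Files.writeString(inbox, ""); // default options truncate the existing file
        return lines;
    }

    public static void main(String[] args) throws IOException {
        JsonlMailbox mailbox = new JsonlMailbox(Files.createTempDirectory("mailbox"));
        mailbox.send("alice", "lead", "please review task 3");
        mailbox.send("alice", "bob", "tests are green");
        mailbox.drain("alice").forEach(System.out::println);
        System.out.println(mailbox.drain("alice").size()); // 0: the inbox was emptied
    }
}
```

Draining before each model turn and injecting the lines into the conversation is one way a teammate could "receive mail" without the loop itself changing.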
Quick Start
Requirements
- JDK 21+ (recommended: Eclipse Temurin or GraalVM)
- Maven 3.9+
- An OpenAI-compatible LLM API key (DeepSeek, GLM, Qwen, OpenAI, etc.)
Clone & Build
git clone https://github.com/abel533/claude-code learn-claude-code
cd learn-claude-code
mvn compile # compile the project
Set Environment Variables
# Linux / macOS
export AI_API_KEY=your-api-key
export AI_BASE_URL=https://api.deepseek.com # replace with your model provider's base URL
export AI_MODEL=deepseek-chat # replace with the name of the model you use
# Windows PowerShell
$env:AI_API_KEY="your-api-key"
$env:AI_BASE_URL="https://api.deepseek.com"
$env:AI_MODEL="deepseek-chat"
Run Sessions
# Start with the first session
mvn exec:java -Dexec.mainClass=io.mybatis.learn.s01.S01AgentLoop
# End point of the full progression
mvn exec:java -Dexec.mainClass=io.mybatis.learn.s12.S12WorktreeIsolation
# Capstone: all mechanisms combined
mvn exec:java -Dexec.mainClass=io.mybatis.learn.full.SFullAgent
Web Platform
Interactive visualizations, step-through animations, source viewer, and documentation for each session.
cd web && npm install && npm run dev # http://localhost:3000
Java Version Features
This project uses the Java 21 + Spring Boot 3.5.7 + Spring AI 1.0.3 stack. Compared to the original Python version:
- Compatible with multiple LLM providers -- adapts to DeepSeek, GLM, Qwen, Moonshot and other models via the OpenAI protocol, no vendor lock-in
- @Tool annotation handles the tool call loop automatically -- Spring AI completes the full "model call → tool execution → result return" cycle, no hand-written while loop needed
- Java 21 Virtual Threads -- lightweight concurrency for background tasks and multi-agent collaboration, without thread pool management overhead
- Each session is independently runnable -- every session is a @SpringBootApplication + CommandLineRunner, launchable with a single mvn exec:java command
- Type safety -- Java's strong type system catches errors at compile time, with IDE-friendly auto-completion
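The virtual-thread point can be illustrated with a small sketch of the s08 idea: run a slow operation in the background and queue a notification when it finishes. This is an assumption-laden example rather than the repository's code; the class BackgroundTasksSketch and its method names are hypothetical, and it requires Java 21.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

public class BackgroundTasksSketch {
    // Finished tasks report here; the agent loop reads it between model turns.
    private final BlockingQueue<String> notifications = new LinkedBlockingQueue<>();

    /** Starts the work on a virtual thread and enqueues a notification when it ends. */
    public Thread runInBackground(String taskId, Supplier<String> work) {
        return Thread.ofVirtual().name("bg-" + taskId).start(() -> {
            String outcome;
            try {
                outcome = "completed: " + work.get();
            } catch (RuntimeException e) {
                outcome = "failed: " + e.getMessage();
            }
            notifications.add("[" + taskId + "] " + outcome);
        });
    }

    /** Removes and returns every notification queued so far, without blocking. */
    public List<String> drainNotifications() {
        List<String> out = new ArrayList<>();
        notifications.drainTo(out);
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        BackgroundTasksSketch background = new BackgroundTasksSketch();
        Thread worker = background.runInBackground("build", () -> "exit code 0");
        worker.join(); // a real loop would keep working instead of waiting
        System.out.println(background.drainNotifications()); // [[build] completed: exit code 0]
    }
}
```

Because virtual threads are cheap, the harness can start one per background command without sizing or tuning a thread pool.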
Learning Path
Phase 1: THE LOOP                      Phase 2: PLANNING & KNOWLEDGE
==================                     ==============================
s01  The Agent Loop          [1]       s03  TodoWrite               [5]
     ChatClient + @Tool                     TodoManager + nag reminder
     |                                      |
     +-> s02  Tool Use       [4]       s04  Subagents               [5]
              @Tool registers tools         independent ChatClient per child
                                            |
                                       s05  Skills                  [5]
                                            SKILL.md via tool_result
                                            |
                                       s06  Context Compact         [5]
                                            3-layer compression

Phase 3: PERSISTENCE                   Phase 4: TEAMS
==================                     =====================
s07  Tasks                   [8]       s09  Agent Teams             [9]
     file-based CRUD + deps graph           teammates + JSONL mailboxes
     |                                      |
s08  Background Tasks        [6]       s10  Team Protocols          [12]
     virtual threads + notify queue         shutdown + plan approval FSM
                                            |
                                       s11  Autonomous Agents       [14]
                                            idle cycle + auto-claim
                                            |
                                       s12  Worktree Isolation      [16]
                                            task coordination + on-demand isolated execution lanes

[N] = number of tools
Project Structure
learn-claude-code/
|
|-- src/main/java/io/mybatis/learn/ # Java implementation (Spring AI + Spring Boot)
| |-- core/ # shared utilities (AgentRunner, BashTool, EditFileTool, etc.)
| |-- s01/ S01AgentLoop.java # session 01: Agent Loop
| |-- s02/ S02ToolUse.java # session 02: Multi-Tool Registration
| |-- s03/ S03TodoWrite.java # session 03: Plan-Driven Execution
| |-- s04/ S04Subagent.java # session 04: Subagents
| |-- s05/ S05SkillLoading.java # session 05: Skill Loading
| |-- s06/ S06ContextCompact.java # session 06: Context Compression
| |-- s07/ S07TaskSystem.java # session 07: Task System
| |-- s08/ S08BackgroundTasks.java # session 08: Background Tasks
| |-- s09/ S09AgentTeams.java # session 09: Agent Teams
| |-- s10/ S10TeamProtocols.java # session 10: Team Protocols
| |-- s11/ S11AutonomousAgents.java# session 11: Autonomous Agents
| |-- s12/ S12WorktreeIsolation.java# session 12: Worktree Isolation
| +-- full/ SFullAgent.java # capstone: all mechanisms combined
|
|-- agents/ # Python reference implementations (original version, kept for comparison)
|-- docs/{en,zh,ja}/ # Mental-model-first documentation (3 languages)
|-- web/ # Interactive learning platform (Next.js)
|-- skills/ # Skill files for s05
|-- pom.xml # Maven build config (Spring Boot 3.5.7 + Spring AI 1.0.3)
+-- .github/workflows/ci.yml # CI: typecheck + build
Documentation
Mental-model-first: problem, solution, ASCII diagram, minimal code. Available in English | 中文 | 日本語.
| Session | Topic | Motto |
|---|---|---|
| s01 | Agent Loop | One loop & Bash is all you need |
| s02 | Tool Use | Adding a tool means adding one handler |
| s03 | TodoWrite | An agent without a plan drifts |
| s04 | Subagents | Break big tasks down; each subtask gets a clean context |
| s05 | Skills | Load knowledge when you need it, not upfront |
| s06 | Context Compact | Context will fill up; you need a way to make room |
| s07 | Task System | Break big goals into small tasks, order them, persist to disk |
| s08 | Background Tasks | Run slow operations in the background; the agent keeps thinking |
| s09 | Agent Teams | When the task is too big for one, delegate to teammates |
| s10 | Team Protocols | Teammates need shared communication rules |
| s11 | Autonomous Agents | Teammates scan the board and claim tasks themselves |
| s12 | Worktree + Task Isolation | Each works in its own directory, no interference |
What's Next -- from understanding to shipping
After the 12 sessions you understand how harness engineering works inside out. Two ways to put that knowledge to work:
Kode Agent CLI -- Open-Source Coding Agent CLI
npm i -g @shareai-lab/kode
Skill & LSP support, Windows-ready, pluggable with GLM / MiniMax / DeepSeek and other open models. Install and go.
GitHub: shareAI-lab/Kode-cli
Kode Agent SDK -- Embed Agent Capabilities in Your App
The official Claude Code Agent SDK communicates with a full CLI process under the hood -- each concurrent user means a separate terminal process. Kode SDK is a standalone library with no per-user process overhead, embeddable in backends, browser extensions, embedded devices, or any runtime.
GitHub: shareAI-lab/Kode-agent-sdk
Sister Repo: from on-demand sessions to always-on assistant
The harness this repo teaches is use-and-discard -- open a terminal, give the agent a task, close when done, next session starts blank. That is the Claude Code model.
OpenClaw proved another possibility: on top of the same agent core, two harness mechanisms turn the agent from "poke it to make it move" into "it wakes up every 30 seconds to look for work":
- Heartbeat -- every 30s the harness sends the agent a message to check if there is anything to do. Nothing? Go back to sleep. Something? Act immediately.
- Cron -- the agent can schedule its own future tasks, executed automatically when the time comes.
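The heartbeat mechanism can be sketched with a scheduled executor, as below. This covers only the heartbeat, not cron, and it is not OpenClaw's or claw0's implementation: the class HeartbeatSketch, the tick and start methods, and the pendingWork and wakeAgent parameters are hypothetical names. The demo uses a 100 ms period so it finishes quickly; the text above describes a 30 s interval.

```java
import java.util.List;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class HeartbeatSketch {
    /** One tick: look for pending work and wake the agent only if there is some. */
    public static boolean tick(Supplier<Optional<String>> pendingWork, Consumer<String> wakeAgent) {
        Optional<String> work = pendingWork.get();
        work.ifPresent(wakeAgent);
        return work.isPresent();
    }

    /** Runs tick() at a fixed rate on a single scheduler thread. */
    public static ScheduledExecutorService start(long periodMillis,
                                                 Supplier<Optional<String>> pendingWork,
                                                 Consumer<String> wakeAgent) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                tick(pendingWork, wakeAgent);
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later ticks, so log and continue.
                System.err.println("heartbeat tick failed: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<String> inbox = new ConcurrentLinkedQueue<>(List.of("reply to the 09:00 message"));
        ScheduledExecutorService scheduler = start(100,
                () -> Optional.ofNullable(inbox.poll()),
                work -> System.out.println("agent woken for: " + work));
        Thread.sleep(350);       // let a few ticks run; only the first finds work
        scheduler.shutdownNow();
    }
}
```

In a real harness, wakeAgent would append a message to the agent's conversation and run the loop; an empty tick costs nothing beyond the check itself.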
Add multi-channel IM routing (WhatsApp / Telegram / Slack / Discord, 13+ platforms), persistent context memory, and a Soul personality system, and the agent goes from a disposable tool to an always-on personal AI assistant.
claw0 is our companion teaching repo that deconstructs these harness mechanisms from scratch:
claw agent = agent core + heartbeat + cron + IM chat + memory + soul
learn-claude-code                   claw0
(agent harness core:                (proactive always-on harness:
 loop, tools, planning,              heartbeat, cron, IM channels,
 teams, worktree isolation)          memory, soul personality)
License
MIT
The model is the agent. The code is the harness. Build great harnesses. The agent will do the rest.
Bash is all you need. Real agents are all the universe needs.