Published on

AI State of Play - Part 2: The Surface Area

Authors
The surface area, in motion
The surface area, in motion

Part 1 argued that the shift from "AI suggests" to "AI executes" is a category change in how engineers work. This post is about the surface area of the tool you stand in front of when you're doing that work.

Most posts about Claude Code's surface area read as a feature tour: here is plan mode, here is a skill, here is a hook, here is the MCP. That format is cheap to write and not very useful to read. Each primitive is a generic thing in isolation and a specific thing in motion. You learn what they're for by watching them come up in real sessions.

So this post does it the other way around. Three workflows are walked through end-to-end, with the primitives surfacing where they actually matter.

The three sessions are:

  1. A caching bug - 404s appearing intermittently after a deploy.
  2. A small feature with a spec - auto-updating lastmod on save for blog posts.
  3. A production deadlock - transactions hanging in a payment workflow.

If you use a different agentic tool, the names and exact JSON differ but the shapes translate.


Table of contents


A short word on what surface area means

Before the workflows, two paragraphs of orientation, because some readers will not have seen these primitives by name.

Claude Code, like most agentic coding tools, exposes a small set of moving parts that you can configure independently. Plan mode is a flag that asks the agent to form a plan before it changes anything. Skills are markdown files Claude auto-loads when the conversation context matches their description. Hooks are shell commands fired by the harness on lifecycle events - before a tool runs, after a file is edited, when the session stops. MCP servers are external processes Claude calls as tools, exposing capabilities outside the repo (databases, dashboards, ticketing systems, internal APIs). Subagents are isolated agent instances delegated a specific task, run with their own context window. Slash commands are user-invoked saved prompts. Settings.json is where most of this is wired together, at either project (.claude/settings.json) or personal (~/.claude/settings.json) scope.

That's the cast list. The point of the rest of this post is to show what they actually do for you. The point of the cast list is so the names mean something when they show up.


Workflow 1: A 404 after deploy

A small product team ships a new build on a Tuesday afternoon. Within twenty minutes, support flags it: a handful of users are seeing 404s on /products/[id] pages that exist and worked fine yesterday. Most users see the pages correctly. The pattern is intermittent and feels cache-shaped, but nobody is sure where the cache is misbehaving - browser, CDN, edge function, origin?

The on-call engineer opens Claude Code and types something close to:

Intermittent 404s on /products/[id] started showing up about 20 minutes after the latest deploy. Some users see it, most don't. Help me diagnose. Don't change anything yet.

The "don't change anything yet" is doing real work in that prompt. It tells the agent to operate in a planning mood rather than a fixing mood. In Claude Code, this can also be done structurally by entering plan mode before the prompt - the agent will then explicitly form a plan and surface it for approval before any action that writes, runs, or modifies state.

The agent starts by reading. It pulls the recent deploy logs via Bash, looks at the build output for the affected route, and curls the route directly with -I to inspect headers. It notices a cache-control value on the new build that doesn't match the previous deploy. It also notices that the route file in the repo has changed from a static export to a dynamic one - which means the CDN's previous edge cache for the old static URL is now serving 404s for users whose DNS is still pinned to the previous edge.

At this point, an MCP server earns its keep. If the team has wired in a CDN MCP - a small process that exposes their CDN's API as a tool the agent can call - the agent can directly query the cache state for the affected route and propose a selective purge. Without an MCP, the agent has to ask the engineer to run a curl command against the CDN's REST API and paste the response back. Both work. The MCP path is meaningfully tighter.

The agent surfaces a plan: revert the route's rendering mode, redeploy, and selectively purge the CDN cache for the prefix. The engineer reads it, agrees, and authorises action. The agent runs the changes, the redeploy completes, headers look right on the next curl. The 404s stop.

A PostToolUse hook running through this whole session has been quietly logging every Bash command and its output to a session-log.txt. That log becomes the raw material for the incident note the engineer writes ten minutes later.

What surfaced:

  • Plan mode turned a "fix it now" reflex into "diagnose first, act second." The 404s were not actually a code bug; they were a cache-shape bug. Plan mode prevented a wasted patch.
  • MCP (a CDN MCP) collapsed a four-step "agent asks, engineer runs, engineer pastes, agent reads" loop into a single tool call.
  • A hook (PostToolUse, logging Bash) gave the engineer a rough transcript of the session, which made the postmortem write itself.

Workflow 2: Auto-updating lastmod on save

The second workflow is smaller in scope and more representative of day-to-day repo hygiene.

The setup is a generic one. A blog or docs repository with MDX posts under some predictable path - say content/posts/** or data/posts/** - and a frontmatter schema that has a lastmod field that exists but is rarely populated, because the author always forgets. The goal: the agent should set lastmod to today's date whenever an MDX post under that path is edited, automatically. If the field is missing from the frontmatter, it should be added.

This is a feature with a tight spec. The author opens Claude Code and writes the spec out plainly:

When you edit any .mdx file under data/posts/, set the frontmatter lastmod field to today's date in YYYY-MM-DD format. Add the field if it doesn't exist. I want this enforced via a hook, not via your judgement - I forget about it, and I don't trust either of us to remember.

The agent reads the existing settings, looks at a representative post to confirm the frontmatter format, and proposes the following structure:

  1. A small Node script at scripts/update-lastmod.mjs that takes a file path, reads the frontmatter, sets lastmod to today, and writes the file back.
  2. A PostToolUse hook in .claude/settings.json that triggers on Edit|Write, filters to data/posts/**/*.mdx, and runs the script.
  3. Optionally, a skill at .claude/skills/blog-post-mdx/SKILL.md that documents the repo's frontmatter conventions, so future agent edits to posts pick up the right shape automatically.

The agent surfaces all three before writing any of them. The author approves.

The hook config the agent writes looks like this:

.claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | { read fp; case \"$fp\" in data/posts/*.mdx) node scripts/update-lastmod.mjs \"$fp\" ;; esac; }"
          }
        ]
      }
    ]
  }
}

The pattern is canonical. PostToolUse fires after a file-editing tool runs. The matcher narrows it to Edit and Write. The hook receives the tool input as JSON on stdin, jq extracts the path, and a case statement gates the action to the post directory.

The skill the agent writes looks like this:

.claude/skills/blog-post-mdx/SKILL.md
---
name: blog-post-mdx
description: Conventions for editing MDX blog posts in this repository. Use when reading or writing files under the posts directory, when generating new posts, or when discussing post frontmatter or structure.
---

# Blog post conventions

## Frontmatter fields

Required: `title`, `date` (ISO, YYYY-MM-DD), `summary` (one to two sentences,
shown on the index page).

Optional but expected: `tags` (array), `draft` (boolean, defaults false),
`lastmod` (YYYY-MM-DD, auto-updated on save by hook).

## Files and paths

Posts live under the configured posts directory. Multi-part series share
a subfolder and use `1-`, `2-`, ... numeric prefixes on the filename.

## Tone and structure

Long-form posts open with a specific hook - a moment, a number, a named
voice - rather than a definition. They include a Table of Contents after
the intro, a sequence of `##` sections, and a closing section that takes
a position rather than restating the post.

Two important things about that skill. The name and description are the only required fields; the description is what determines when Claude pulls this skill into context. The body is the skill's procedural knowledge, loaded only when the description matches. When the agent later edits a post, it auto-reads this file, sees the conventions, and writes the new frontmatter accordingly. The author never has to remind it.

After the configuration lands, the next edit to a post triggers the hook on save, the script bumps lastmod, and the field stays accurate forever - or at least until the hook breaks. Silent hook failure is a real failure mode in itself; the engineer's habit of glancing at git status after a save catches it cheaply.

What surfaced:

  • Hooks (PostToolUse on Edit|Write) for deterministic, "always happens" enforcement that doesn't depend on the agent remembering.
  • Skills (.claude/skills/<name>/SKILL.md) for repo conventions the agent should pick up automatically when relevant context appears.
  • Settings.json as the wiring file where hooks are declared.

Workflow 3: A production deadlock

The third workflow is forensic. A team running a payment workflow on Postgres is seeing intermittent transaction timeouts under load. The error logs are noisy and full. The team has a rough hypothesis - probably lock-order related, probably touching the orders and payment_attempts tables - but the volume of logs is large enough that nobody wants to spend an afternoon reading them.

The on-call engineer opens Claude Code and types:

Production is throwing transaction timeouts on the payment workflow under load. Logs in prod-logs/ for the last 6 hours. Use the postgres MCP to inspect lock graphs. Spawn a subagent for the log dive - I don't want the full log volume in this context. Goal: identify the deadlock pattern and propose a fix.

A subagent is a separately-invoked Claude instance with its own context window, given a narrow task. The agent here can either pick a generic explore-and-summarise subagent, or - if the team has set one up - a specifically scoped log-triage subagent at .claude/agents/log-triage.md. The subagent gets the log directory, the time window, and a goal: find error patterns, group by transaction signature, and return only the top three candidate failure modes. It runs, it reads thousands of log lines, and it returns roughly two screens of summary. Those two screens enter the main conversation; the underlying log volume does not.

In parallel, the agent uses the Postgres MCP - if one is configured - to query pg_stat_activity and pg_locks directly. Without an MCP, it would have to ask the engineer to run those queries and paste output. With one, it queries them itself, sees a pattern of two transactions acquiring locks on orders and payment_attempts in opposite orders, and identifies the deadlock candidate.

The agent surfaces two hypotheses, with weights. The first - lock-order inversion in a refund retry path - matches both the log triage summary and the lock graph evidence. The second - a long-held SELECT FOR UPDATE in a reconciliation job - is plausible but not currently active. The agent proposes a fix: enforce a canonical lock order in the refund path, add a regression test that exercises the contended path, and leave the reconciliation job for a follow-up.

The engineer agrees. The agent makes the change with the engineer reviewing each diff, runs the new test, confirms it fails on the old code and passes on the new, and stages a commit. The engineer authorises the commit and the deploy.

Then the engineer types:

/postmortem

A slash command is a user-invoked saved prompt - a markdown file at .claude/commands/postmortem.md containing a template prompt the agent fills in. In this case, the slash command pulls the session transcript, the diff, the test names, and the time window, and produces a draft postmortem in the team's standard format. The engineer reviews it, edits two paragraphs, and files it.

What surfaced:

  • Subagents (.claude/agents/<name>.md) for context-heavy tasks (log dives, repo-wide research) that would otherwise pollute the main conversation.
  • MCP (a Postgres MCP) for direct, structured access to the production system, instead of round-tripping queries through the human.
  • Slash commands (.claude/commands/<name>.md) for parameterised workflows that are run repeatedly and benefit from a saved template.

The deadlock workflow also previews Part 5 of this series. Replace "engineer types /postmortem after the fix" with "an autonomous agent fires /postmortem triggered by an alert closure," and you have the loop the next two posts will build toward.


The surface area, summarised

The following table is the index. Each row is a primitive, what it is, where it lives, and the kinds of problems it earns its keep on.

PrimitiveWhat it isLives atEarns its keep on
Plan modeA flag that asks the agent to plan before actingSession toggleMulti-file changes, ambiguous prompts, anything you'd want to review pre-commit
SkillsAuto-loaded markdown that Claude reads when context matches.claude/skills/<name>/SKILL.md (or ~/.claude/)Repo conventions, domain knowledge, anything you don't want to repeat in every prompt
HooksShell commands fired on lifecycle events (PostToolUse, Stop, etc.).claude/settings.json (or ~/.claude/)Deterministic enforcement: formatters, linters, custom validators, audit logging
MCP serversExternal processes exposing tools to the agent.claude/settings.json (or ~/.claude/)Reading and writing systems outside the repo: databases, dashboards, CDNs, ticketing, internal APIs
SubagentsIsolated agent instances delegated a specific task.claude/agents/<name>.md (or ~/.claude/)Long-running research, log dives, anything that would blow your main context window
Slash commandsUser-invoked saved prompts.claude/commands/<name>.md (or ~/.claude/)Repeated session shapes: /postmortem, /release-notes, /explain-this-error, /test-this-pr
Settings.jsonThe wiring file.claude/settings.json (or ~/.claude/)Declaring hooks, registering MCP servers, setting permissions, env, model preferences

A note on personal vs project scope. The personal versions of all of these (~/.claude/...) follow you across every repository on your machine. The project versions (.claude/... at the repo root) are committed and shared with anyone who clones the repo. As a rule of thumb: anything tied to the codebase goes in the project; anything tied to your habits goes in personal. Mixing the two is the most common configuration mistake people make.


An honest take

Most engineers who use Claude Code badly do so because they configure it badly. The two failure modes are mirror images of each other.

The first is under-configuration: running plain claude, no settings file, no skills, no hooks, no MCPs - and concluding that the tool is impressive but unreliable. It is impressive but unreliable in that mode, because the agent has no leverage to apply your conventions and no way to enforce anything. You are paying for a senior contractor with no onboarding.

The second is over-configuration: discovering all of the primitives in this post in a single afternoon, writing seven skills, four hooks, three subagents, and two MCPs over the weekend, and then spending the following week debugging which of those is responsible for the agent acting strangely. Any one of these primitives can introduce a subtle bias in agent behaviour. Configured too aggressively, they compound.

The right approach, in my experience, is genuinely boring. Start with one. Add the next when you've felt the pain of not having it. After three months of regular use, most engineers I've watched converge on roughly the same shape: a project skill or two for repo conventions, one or two hooks for deterministic enforcement (a formatter, maybe a frontmatter rewriter), an MCP for whichever external system they touch most often, and plan mode habitually for anything multi-file. Subagents and slash commands enter the picture later, when specific recurring shapes start to show up in your sessions and reach a threshold of "I would like to not have to re-type this prompt."

The primitives are powerful. They are not yet free. Every one of them is something you have to maintain, and every one of them is something a future you (or a future colleague) has to read. Configure them deliberately, name them well, and delete the ones you stopped using.

The next post is about what goes wrong when you don't.


Sources and further reading: