
A PRACTICAL WORKBOOK FOR BUILDING AN AGENTIC OS

Untangle your AI system.

Build a tool-agnostic Agentic OS in plain text files your agents and AI tools can share, reuse, and improve over time.

One file system. Any AI. Built to last.

A FIRST WORD

You are not behind.

If you’ve already tried to build something useful with AI and it didn’t hold together, you’re not behind.

If you’re just starting, you can avoid the mess early.

AI tools keep changing. Claude improves. Codex improves. ChatGPT adds new capabilities. Work spreads across tools that do not properly connect.

You need a stable foundation underneath the tools.

That foundation can live in plain text files your agents and AI tools can keep using over time. You can read those files, edit them, improve them, and move them between tools as your work changes.

That is what you’re going to build here.

The big idea

The tool is not the system.

Most AI systems become confusing because these four things get blurred together.

Tool
The app or workspace where you use AI. Claude, ChatGPT, Codex, Cursor, and similar products are tools.
Model
The engine inside the tool. GPT 5.5, Opus, Gemini, and similar systems are models.
Agent
AI that can work across more than one step. An agent can read instructions, use tools, check its work, and continue until a job is complete.
Agentic OS
The foundation your agents read before they work. Identity, context, rules, memory, tools, checks, and feedback all live here.

An Agentic OS, in plain English

An Agentic OS is a folder system of plain text files that tells your agents and AI tools:

  • who you are
  • how you work
  • what you care about
  • what context is current
  • what jobs they can repeat
  • what they may remember
  • what they may connect to
  • what they must check
  • how they should learn from corrections

The name sounds technical. The thing itself is simple.

It is a readable foundation your agents can use before they work, and that you can keep improving as your tools change.

The value is not in one perfect prompt. It is in the order, the checks, and the habit of keeping the system current.

Why this matters now

AI tools are converging. They do not all look the same, but they are moving towards the same basic pattern: they read instruction files, they can use folders of context, they can run repeatable skills, they can connect to calendars, email, notes, and code, they can remember things between sessions, and they can run scheduled work.

If all of that lives only inside one tool, you are fragile. When the tool changes, your system changes with it. When a better tool appears, you have to rebuild.

If your foundation lives in a clear folder of plain text files, you are less fragile. The tool can change. The model can improve. Your work still has a stable base.

That is the evergreen part - not that nothing changes, but that the core is portable enough to survive change.

What you will build

One folder. Nine plain text layers.

You don’t build everything at once. Start with what one agent needs to do one job well, then add the layers that make it stable.

your-agent-os/ · the shape
your-agent-os/
├── AGENTS.md        Identity — who the agent works for
├── context/         What you know that the internet does not
├── skills/          Reusable recipes for repeated jobs
├── memory/          What carries between conversations
├── connections/     What real systems it may read or use
├── verification/    Checklists for catching mistakes
├── automations/     Work that runs on its own (later)
└── evals/           Small repeatable tests

Each layer has one job. You'll meet them in the order they should be built. The folder grows with you. The complete version, including feedback memory and migration folders, appears later.

Identity before context. Context before skills. Skills before automations. Verification before trust.
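If you would rather create the empty folder from a script than by hand, here is a minimal Python sketch. It creates only what the tree above shows: an AGENTS.md with a heading, plus the seven layer folders. It never overwrites anything that already exists. The `scaffold` function and `LAYER_FOLDERS` list are hypothetical names for illustration, not part of any tool.

```python
from pathlib import Path

# Folders from the tree above, in the order the guide builds them.
LAYER_FOLDERS = [
    "context",
    "skills",
    "memory",
    "connections",
    "verification",
    "automations",
    "evals",
]


def scaffold(root: str) -> list[Path]:
    """Create the starter folder without overwriting existing files.

    Returns the paths that were newly created.
    """
    base = Path(root)
    created: list[Path] = []
    if not base.exists():
        base.mkdir(parents=True)
        created.append(base)

    # Identity comes first: an almost empty AGENTS.md you fill in by hand.
    agents = base / "AGENTS.md"
    if not agents.exists():
        agents.write_text("# AGENTS.md\n", encoding="utf-8")
        created.append(agents)

    for name in LAYER_FOLDERS:
        folder = base / name
        if not folder.exists():
            folder.mkdir()
            created.append(folder)
    return created
```

Call `scaffold("your-agent-os")` once from the place where you want the folder. Running it again is safe. It returns an empty list because everything is already there.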

Your first step

One job. One file.

Before you build the whole system, choose one useful job for one agent. Not every workflow. Not your whole life. One job you actually repeat.

Examples:

  • Help me plan my week.
  • Turn my notes into a clear summary.
  • Prepare me for an important meeting.
  • Help me keep track of a creative project.
  • Draft a regular update for clients, customers, supporters, or collaborators.
  • Triage my inbox each morning.
  • Help a small team check what needs attention.
  • Review a repeated task before I send, publish, or submit it.

Write the job as one sentence:

I want an agent that helps me with [one repeated job].

Throughout the guide, we’ll use a worked example: a Daily Operations Partner.

Its job is:

Each morning, help me see what needs attention, what can wait, what might be missing, and what should be checked.

You can follow the same steps with your own job. The example is there to make the method visible, not to replace your choice.

Open a fresh document and write one sentence: "I want an agent that helps me with ____." Keep it visible as you build. If you cannot choose, pick the job you have explained to AI most often in the last month. 

When you have your one sentence, take the two-minute audit below to find where to start.

Two-minute audit

Where are you now?

Answer quickly. Do not try to get a good result. This is not a score. It is a map that helps you choose the right next step.

Two short blocks. The first measures build stage (how much you already have in place). The second measures system health (how trustworthy that setup actually is).

Build stage · six questions · how much you already have in place
1. Do you use AI through a browser window with no persistent files between sessions?
2. Do you have an instructions file (AGENTS.md, CLAUDE.md, etc.) the tool reads automatically?
3. Have you written skill files for at least two repeatable jobs?
4. Do you have more than one agent sharing the same identity and context?
5. Do any of your agents run on a schedule with logs and budgets?
6. Do you run evals weekly and capture corrections in a feedback file?

System health · eight questions · how trustworthy the setup is
7. Do you know which file is authoritative when two instructions contradict?
8. Are any agents still reading old or legacy instruction folders?
9. Do multiple instruction files contradict each other?
10. Are any automations running on assumptions that may already be stale?
11. Do you have evals for the behaviours that matter most?
12. Do you capture corrections in a feedback memory file the agent re-reads?
13. Do you have a promotion / rejection log for any old material you've imported?
14. Do you trust the current folder structure as a whole?

The system map

The nine layers.

What you'll build: nine layers, one folder. Each layer gives your agents something useful to read before they work. You do not need every layer on day one. Start with the first useful job, then build the layers that make it clearer, safer, and easier to maintain.

You saw the folder shape in the previous section. The complete starter folder, including feedback memory and migration folders, appears later - once you've seen the layers in practice.

Optional · skip the assembly work

Agent OS Starter Kit · £29

Everything you need to build your first Agentic OS is here on this page, free. The Starter Kit adds the things this page describes but doesn't hand over: the complete folder tree as a labelled drop-in, every file pre-filled with illustrative examples instead of empty placeholders, the Morning Brief skill wired up and ready to schedule, migration templates for anyone moving an existing setup across, and tool-setup variants for Claude Code, Cursor, and Codex. You can build all of this yourself from this guide. The kit saves you the assembly time and shows you what a populated, working setup actually looks like before you write your own.

£29 one-time. 14-day refund. Includes 30 days of Agent OS Updates.

Preparation · before the nine layers

Name the work, then scope it small.

You named your one job in the first step. Now put it into the file every later layer will point to.

Save your one sentence as brief.md. Every later layer points at it.

You already did the thinking in the first step. You have your one-sentence agent, and you have the worked example (the Daily Operations Partner) to compare against as you build.

This step is the formal start of the workbook: take that sentence and write it into a file called brief.md. From here on, every layer you build refers back to it. If a context file, skill, or check does not help that sentence, it does not belong in your system yet.

Create a new file called brief.md in your working folder. Paste your one sentence into it. Save. That is the anchor for everything that follows.

Build mode
What this is
The one sentence that gives your agent a job.
Why it matters
Without it, every other file in the system has nowhere to point.
What to create
A single line at the top of a new document called brief.md.
Layer 1 · Identity

Who the agent is working for.

The first file your agent should read before it works.

AGENTS.md - who the agent works for, how you communicate, your rules.

If you are migrating an old system

Do not copy your old identity file wholesale. Distil it into one current, authoritative file. Most legacy identity files mix who-you-are with skills, formatting tricks, and old rules. Only the rules you still believe today belong here. Date the file. Anything older than six months gets re-read before it gets re-typed.

Why .md files?

.md means Markdown.

Markdown is plain text with simple formatting: headings, lists, links, and short sections. You can open it in almost any text editor, read it yourself, edit it by hand, copy it into AI tools, store it in folders, track changes, and move it between systems.

That makes it a good format for an Agentic OS.

It is readable by humans, easy for agents to parse, and much less fragile than instructions trapped inside one app.

What an identity file is

AGENTS.md is your identity file. It tells the agent who it is working for, how you communicate, what matters to you, and which rules should always apply.

It's not a CV. It's not a biography. It is the short working context a capable assistant would need before helping you properly. A good identity file stops you re-explaining yourself every time you use AI. It also helps different tools behave more consistently, because they are reading the same foundation.

Keep it clear, current, and human-readable.

What to put in it

Three things, kept short.

Who you are. A few sentences about your role, your work, and the situation the agent is helping with. This can be personal, creative, operational, organisational, or all of those things.

How you communicate. The kind of help you want. Direct or gentle. Detailed or concise. Prose or bullets. UK or US English. Whether you want the agent to challenge you when something looks wrong.

Rules. The non-negotiables. These are the instructions that should survive across tools and tasks.

For example:

  • Never send anything externally without showing me first.
  • Never invent a name, date, number, source, or fact.
  • Flag uncertainty clearly.
  • Tell me what I might be missing.
  • If a task touches private, client, financial, medical, or sensitive data, stop and flag it first.

How to create it

Don't start from a blank page.

Open your AI tool of choice and ask it to interview you.

Use this prompt:

"I am building my AI identity file. Ask me fifteen questions, one at a time, about who I am, how I work, how I want you to communicate with me, and what rules I want you to follow. After I answer all fifteen, draft a clear AGENTS.md file for me."

The first draft won't be perfect. That's fine.

Use it for a week. Each time the agent does something wrong, vague, overconfident, too formal, too casual, too risky, or not useful, add one correction to the file.

The file gets better because you use it.

Worked example

AGENTS.md · starter
# AGENTS.md

## Who I am
I run a small consulting practice. Five people including me.
Most of my time is spent on client delivery, with two days
a week given to business development.

## How I communicate
- Direct prose, not bullet points unless I ask.
- UK English.
- Short and decisive over thorough and exploratory.
- I want pushback when you think I'm wrong, with the reason.
- Show your working only when I ask for it.

## Rules
- Never send anything externally without showing me the draft.
- Never invent a name, date, or number. If you don't know,
  say so and ask.
- Always tell me what I might be missing, even when I haven't
  asked.
- If a task touches client data, flag it before proceeding.

## Last updated
2025-11-12
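As the file grows through a week of corrections, it is easy to lose a section without noticing. A small check can confirm the structure is still intact. This is a minimal sketch. It assumes your file uses the same `## ` headings as the example above; if yours differ, edit `REQUIRED`. `missing_sections` is a hypothetical helper.

```python
import re

# Section headings used in the example AGENTS.md above.
REQUIRED = ["Who I am", "How I communicate", "Rules", "Last updated"]

# A level-two Markdown heading: "## " followed by the heading text.
HEADING = re.compile(r"^##[ \t]+(.+)$", re.MULTILINE)


def missing_sections(text: str) -> list[str]:
    """Return the required '## ' headings that do not appear in the file."""
    found = {match.group(1).strip() for match in HEADING.finditer(text)}
    return [name for name in REQUIRED if name not in found]
```

To run it, import `Path` from `pathlib` and call `missing_sections(Path("AGENTS.md").read_text(encoding="utf-8"))`. An empty list means all four sections are present.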

Run the fifteen-question interview now

Open your AI tool and ask it to interview you with fifteen questions about how you work and how you want it to communicate with you. Spend ten minutes on it. Save the result as AGENTS.md.

If you do nothing else this week

Create one file called AGENTS.md with three short sections: who you are, how you communicate, two or three rules. That single file removes most of the friction in every AI conversation you have. Everything else here is improvement on top.

Build mode
What this is
The file your tool reads before every conversation.
Why it matters
Removes the "introduce yourself" tax from every session.
What to create
AGENTS.md at the top of your working folder.
Layer 2 · Context

What you know that the open internet does not.

Context is the difference between advice that sounds plausible and help that fits your actual situation.

3–5 one-page files: about you, people, priorities, constraints, current work.

A general AI model can give you a sensible answer about almost anything. It does not know what is true in your world unless you tell it. It does not know who matters, what is current, what has already been decided, what tone is safe, what work is active, what pressure you are under, or what should be avoided. That is what context files are for.

If you are migrating an old system

Do not paste old context into the new OS as authority. Split it into short topic files. Date each file. Promote only what is still true, current, useful, safe for an agent to read, and short enough to maintain. Old folders are raw material, not the source of truth.

How to structure context files

Do not make one enormous context document. It will become stale, and stale context is worse than missing context.

Instead, write a few short files. Each file should cover one kind of context. Each should be readable in a few minutes. Each should have a last-updated date at the top.

Start with these:

  • 01-about-me-or-us.md - what you do, who the agent is helping, and what kind of situation it is supporting.
  • 02-people.md - the people who matter and what each one needs, prefers, or is responsible for.
  • 03-priorities.md - what matters most right now.
  • 04-constraints.md - limits, risks, deadlines, sensitivities, access issues, budget, energy, time, or capacity.
  • 05-current-work.md - active projects or responsibilities, one short paragraph each.

Worked example

context/02-people.md
# People

## Last updated
2025-11-12

## Team
- **Anna (delivery lead).** Joined 2024. Strong on detail,
  weaker on stakeholder management. Prefers async written
  briefs over verbal handovers. Currently leading the Acme
  rebuild.
- **Joe (operations).** Joined 2023. Highly self-directed.
  Reads everything that crosses his desk. Best person to
  consult on anything financial.

## Key clients
- **Acme Corp.** Eighteen months in. Sponsor is Maria
  (operations director). She is risk-averse and prefers
  weekly written updates. Pet hate: surprises in monthly
  invoices.
- **Beta Industries.** Six months in. Sponsor is Sam.
  Very hands-off. Cares about outcomes, not process.

## My boss / board
- Quarterly review meetings, last Tuesday of each quarter.
  Always wants three-slide format: progress, risks, asks.
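Stale context is worse than missing context, so it helps to know which files have gone quiet. Here is a minimal sketch. It assumes each file carries a `## Last updated` heading followed by an ISO date on the next line, as in the example above. The 90-day threshold is an arbitrary placeholder, not a rule from this guide. `last_updated` and `stale_files` are hypothetical helpers.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# A "## Last updated" heading followed by a YYYY-MM-DD line.
LAST_UPDATED = re.compile(
    r"^##[ \t]+Last updated[ \t]*\n(\d{4}-\d{2}-\d{2})", re.MULTILINE
)


def last_updated(text: str) -> Optional[date]:
    """Return the file's last-updated date, or None if it has none."""
    match = LAST_UPDATED.search(text)
    return date.fromisoformat(match.group(1)) if match else None


def stale_files(folder: str, today: date, max_age_days: int = 90) -> list[str]:
    """List context files that are undated or older than max_age_days."""
    stale = []
    for path in sorted(Path(folder).glob("*.md")):
        updated = last_updated(path.read_text(encoding="utf-8"))
        if updated is None or (today - updated).days > max_age_days:
            stale.append(path.name)
    return stale
```

Running `stale_files("context", date.today())` gives you a short re-read list. An undated file counts as stale on purpose, because nobody can tell whether it is still true.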

Context as a habit, not a project

The biggest mistake people make with context is treating it as a setup task they complete once. It isn't. It is a habit.

Every time you find yourself re-explaining something about your situation to an AI, that thing belongs in a context file. Write it down, add it to the right file, move on.

Open a notebook now

Write the names of five people who matter most to your work and one sentence each about what each of them cares about. That's the seed of 02-people.md.

If you do nothing else this week

Write 02-people.md and 03-priorities.md. One page each, dated. They cover most of what you'll ever need to brief the agent on, and they're the two that change least often.

Build mode
What this is
Three to five one-page files describing your world.
Why it matters
Closes the gap between generic advice and useful advice.
What to create
A context/ folder with the files above. Start with two.
Layer 3 · Skills

The repeatable jobs your agent can do without being re-taught.

Reusable recipe for one repeated job: trigger, sources, process, output, verification.

If you are migrating an old system

Do not promote every old prompt into a skill.

Promote a prompt only if it describes a job you still do, it has worked more than once, and it can be made shorter, safer, and clearer.

If a workflow is stale, unclear, too broad, or tied too tightly to one tool, put it in rejected.md or rewrite it before use.

A skill is a reusable recipe for work your agent does more than once.

Without skills, repeated work becomes repeated prompting. You explain the same format, the same steps, the same sources, and the same cautions again and again.

A skill turns that repeated explanation into a file your agent can read before it works.

A good skill file has five parts:

  • Trigger - when the agent should use it.
  • Sources - what the agent should read first.
  • Process - the steps the agent should follow.
  • Output format - what the finished work should look like.
  • Verification - what the agent should check before it says the job is done.

Worked example

skills/morning-brief/SKILL.md
# SKILL: morning-brief

## Trigger
When the user asks for a daily operations brief, or when
this skill is scheduled to run.

## Sources
- brief.md
- context/03-priorities.md
- context/05-current-work.md
- memory/decisions.md, if available
- Calendar, task list, inbox, notes, or project folder,
  if connected

## Process
1. Identify what needs attention today.
2. Flag anything linked to a deadline, promise, risk,
  or decision.
3. Separate what needs action from what can wait.
4. Note anything missing, unclear, or assumed.
5. Suggest the next practical action for each urgent item.
6. Run the relevant verification checklist before
  presenting the brief.

## Output format
Use short plain headings:

- Needs attention
- Can wait
- Missing or unclear
- Suggested next actions

Keep the brief short enough to read quickly.

Do not send, publish, book, delete, or change anything
without approval.

## Verification
Before finishing, check:

- Have I separated facts from assumptions?
- Have I flagged missing context?
- Have I avoided taking action without permission?
- Have I kept the output short enough to be useful?

Name one repeating job

Choose one job you have explained to AI more than twice. That is your first skill. Do not choose the biggest job. Choose the repeated one.

If you do nothing else this week

Write one skill. Pick the repeated job that creates the most friction when you have to explain it again. Use the five-part format: trigger, sources, process, output, verification.

Build mode
What this is
A reusable recipe for one repeating job.
Why it matters
Stops you re-explaining the same format every Monday.
What to create
skills/<skill-name>/SKILL.md
Layer 4 · Memory

What the agent retains between conversations.

A built-in memory is a black box. A file in a folder is not.

memory/decisions.md - every meaningful decision, dated, with reasoning.

Memory is the fastest-moving layer in the whole stack. Every major tool is improving its memory system month by month, and the specifics will be different by the time you read this. The principles, though, are stable.

If you are migrating an old system

Old memory files often contradict each other - two decisions logs, three relationship notes, none of them current. Resolve contradictions before importing. If you cannot tell which version is right today, neither can the agent: log it in conflicts.md and leave both out.

Three things to do, in order

First, find out what your tool already remembers. Ask it directly: "Explain how your memory works. What do you remember between our sessions? What do you forget? What can I see, edit, or delete?"

Second, be deliberate about important things. Built-in memory does an acceptable job of remembering casual context, but it often misses the things that matter most: a major decision, a change in strategy, a relationship that just shifted. When something important happens, tell the agent explicitly: "Please remember this. It will matter later."

Third, keep your own structured memory files alongside the tool's built-in memory. A built-in memory system is a black box. A file in a folder is not.

Worked example

memory/decisions.md
# Decisions log

## 2025-10-04: Pause Beta Industries expansion
Decided not to pursue the second phase of Beta Industries
work this quarter.

Why: capacity is tight, Acme is mid-rebuild, and Sam (Beta)
is comfortable with current pace. Alternative considered:
hiring a contractor to bridge. Rejected because we'd
spend more on management overhead than we'd recoup.

Revisit: January 2026.
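You might log decisions from a script, or let an agent append them after you approve. Either way, a fixed entry shape keeps the log readable. This is a minimal sketch. It assumes the shape of the example above: a `## YYYY-MM-DD: title` heading, the decision, a `Why:` paragraph and a `Revisit:` line. `format_decision` and `append_decision` are hypothetical helpers, not part of any tool.

```python
from datetime import date
from pathlib import Path


def format_decision(day: date, title: str, decision: str, why: str, revisit: str) -> str:
    """Render one decisions-log entry in the shape used in memory/decisions.md."""
    return (
        f"## {day.isoformat()}: {title}\n"
        f"{decision}\n\n"
        f"Why: {why}\n\n"
        f"Revisit: {revisit}\n"
    )


def append_decision(log_path: str, entry: str) -> None:
    """Append an entry after the existing log, separated by one blank line.

    Earlier entries are kept as they are; only trailing blank lines are tidied.
    """
    path = Path(log_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else "# Decisions log\n"
    path.write_text(existing.rstrip("\n") + "\n\n" + entry, encoding="utf-8")
```

The log is append-only by design. If a decision is reversed, add a new dated entry rather than editing the old one, so the reasoning history survives.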

Open a fresh file

Write today's date and the last meaningful decision you made. That's the first entry in your decisions log. The next one writes itself when the next decision happens.

If you do nothing else this week

One file: memory/decisions.md. Every meaningful decision gets a dated entry with the alternatives you considered and why you chose what you did. This alone changes the agent's quality more than any other single memory file.

Build mode
What this is
A decisions log the agent reads at the start of each session.
Why it matters
Most of your future questions echo decisions you've already made.
What to create
memory/decisions.md
Layer 5 · Connections

Letting the agent reach the real world.

Connections are how an agent reaches the systems around your work: files, folders, calendars, inboxes, notes, project tools, websites, APIs, and other workspaces.

One read-only connection (calendar first) plus privacy.md.

If you are migrating an old system

Do not trust old notes about what is connected. Re-check live permissions in the tool itself. In messy setups, many "connected" integrations turn out to be dead links to systems that no longer authorise the agent. Write down what is actually live today.

Some tools call these connectors. Some use integrations. Some use MCP. The names will change, but the principle is stable: connect slowly, grant the least access needed, and watch what the agent does before trusting it with more.

Pause before granting write access

This is the single most important rule in this entire guide. Do not give an agent permission to write into a system on day one. Read-only first. Watch it for two weeks. The cost of a mistake scales with the agent's capability - a read-only agent that misunderstands something just produces a confused summary; a write-enabled one sends the wrong email.

Shadow mode: the step between read and write

Between read-only access and full write access, there is a safer middle step.

In shadow mode, the agent drafts the action and shows you exactly what it would do, but does not execute it.

For example:

"I am about to send this email to Anna with this subject and this body. Confirm?"

You can build this into your skill files with one simple rule:

"Before executing any write action, show the proposed action and wait for confirmation."
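If you script your own tools rather than relying on a product's built-in approval step, you can enforce the same rule in code. This is a minimal Python sketch. `ProposedAction`, `shadow_run` and the `confirm` callback are hypothetical names. In real use, `confirm` would be a prompt in your tool or terminal.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProposedAction:
    description: str            # what the agent says it is about to do
    execute: Callable[[], str]  # the write action itself, not yet run


def shadow_run(action: ProposedAction, confirm: Callable[[str], bool]) -> Optional[str]:
    """Show the proposed write action and run it only if confirmed.

    Returns the action's result, or None if the human declined.
    """
    if not confirm(f"I am about to: {action.description}. Confirm?"):
        return None  # nothing was executed
    return action.execute()
```

In a terminal, `confirm` could be `lambda msg: input(msg + " [y/N] ").strip().lower() == "y"`. Anything other than an explicit "y" counts as a no, which is the safe default.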

Sensible connection order

  1. Calendar, read only.
  2. Notes, files, or project folder, read only.
  3. Inbox or messages, read only.
  4. Document store, read only.
  5. Draft creation only, with confirmation.
  6. Calendar event creation only, with confirmation.
  7. Anything that sends, publishes, deletes, books, buys, or changes records only after the agent has proved trustworthy.

Private data lives here

If your agent can read anything that includes other people's data, you need a written privacy rule.

That includes email, messages, client documents, customer records, shared folders, family information, medical notes, financial details, or internal team material.

Put the rule in connections/privacy.md. Keep it short and concrete.

The agent is more likely to follow ten clear rules than a hundred vague ones.

Worked example: privacy rules

connections/privacy.md
# Privacy and PII rules

## What the agent may NEVER write into memory or logs
- Customer payment data, card numbers, bank details.
- Full email bodies. Subject lines only, where needed.
- Health information.
- HR matters about named individuals.

## What the agent MUST redact before saving
- Email addresses of third parties.
- Phone numbers.
- Home addresses.

## What requires human approval before being shared externally
- Anything mentioning a client by name.
- Anything quoting from an internal document.
- Any draft going to someone I haven't emailed before.
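The "MUST redact" rules are the easiest ones to back up in code, as a second line of defence behind the written rule. This is a rough sketch that assumes simple regular expressions are good enough for your data. They catch common email formats and phone numbers that start with + or 0, as UK numbers do. They will miss some formats, and they do nothing for home addresses, which still need human review. `redact` is a hypothetical helper.

```python
import re

# Rough patterns: common email formats, and phone-like runs that start
# with + or 0. Both will miss some real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"(?:\+|\b0)\d[\d -]{7,}\d")


def redact(text: str, keep: frozenset = frozenset()) -> str:
    """Mask third-party email addresses and phone numbers before saving.

    Addresses in `keep` (lower-case, for example your own) are left as-is.
    Home addresses are not handled: they still need human review.
    """

    def mask_email(match: re.Match) -> str:
        address = match.group(0)
        return address if address.lower() in keep else "[email]"

    def mask_phone(match: re.Match) -> str:
        # Require 10+ digits so dates and times are not mistaken for phones.
        digits = sum(ch.isdigit() for ch in match.group(0))
        return "[phone]" if digits >= 10 else match.group(0)

    return PHONE.sub(mask_phone, EMAIL.sub(mask_email, text))
```

Run it on anything the agent is about to write into memory or logs, not on what it reads. The agent may still need the real address to do its job. What matters is that the address does not end up stored.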

Check one connector

Check whether your AI tool has a calendar connector. If yes, install it now, with read-only access. Don't add anything else until you've watched what it does with the calendar for a week.

If you do nothing else this week

One connection: read-only access to your calendar. Watch what the agent does with it for a week. Write privacy.md the day before you add a second connection that touches anyone but you.

Build mode
What this is
One read-only connection, plus the privacy rules file.
Why it matters
The cost of a mistake here is much higher than in any earlier stage.
What to create
connections/privacy.md and one read-only connector in your tool.
Layer 6 · Verification

Catching AI mistakes before they cost you.

Visible failure is not the worst thing your agent can do. The worst thing is to succeed confidently and wrongly.

A five-item checklist run against every output, every time.

If you are migrating an old system

Write the verification checklist before trusting any migrated skill. "It used to work" is not a verification. The whole point of a checklist is that it doesn't care how old the skill is.

Verification is the discipline of knowing what to check, before the output matters. It is not glamorous. It is the difference between an agent you can use for real work and an agent you still have to second-guess every time.

Worked example

verification/morning-brief.md
# Verification: morning-brief

Before I accept the brief and act on it, check:

1. The agent has separated facts from assumptions.
2. Anything urgent is linked to a real deadline, promise,
  risk, or decision.
3. The agent has not invented a name, number, date,
  deadline, or quoted phrase.
4. Anything missing, unclear, or assumed has been flagged.
5. Anything sensitive has been marked for human approval
  before action.

Show your work

For important outputs, add one more instruction to the skill:

"Before you finish, list the files you read, the connections you used, and any fact in your output you are less than confident about."

This is one of the simplest protections against confident mistakes.

Write a five-item checklist

For the one skill you have built, write down five things you would check before trusting its output.

That is your first verification checklist.

Run it against every output, and never skip it before anything is sent, published, submitted, booked, deleted, or acted on.

If you do nothing else this week

Write one checklist for one skill.

Five items. Specific enough to catch real mistakes. Short enough to use when you are busy.

Build mode
What this is
A short checklist run against every output, every time.
Why it matters
Catches the confident-and-wrong failure mode that nothing else does.
What to create
verification/<skill-name>.md with five concrete items.
Layer 7 · Automations

Work the agent does without being asked.

Where the system starts to feel genuinely valuable. Also where the risk profile changes sharply.

Nothing automated for two weeks. Then one draft-only schedule.

If you are migrating an old system

Pause every existing automation before you migrate. Do not let yesterday's scheduled job run on tomorrow's authority. Re-test each one against the new OS, by hand, twice. Only then turn it back on.

An automation is a job the agent does on its own, on a schedule or in response to a trigger. The morning brief that lands in your inbox at seven. The end-of-week summary that posts to your team channel on Friday. An agent running at three in the morning with a wrong answer can do damage you don't see until you wake up.

Do not automate this yet

If you have not run the morning brief yourself, by hand, for at least two weeks and been happy with every output, do not put it on a schedule. The temptation to automate too early is the single most expensive mistake at this stage.

Four rules

Only automate work you have run manually enough times to trust.

Default to drafts, not sends. The morning brief should land in your personal inbox for review, not go out to the team.

Log everything. Without logs, you cannot debug a problem you only notice three weeks after it started.

Set budgets and hard stops. Every automation gets a token budget per run and a time limit. Without this, a runaway automation can cost real money overnight.

Set a budget before this runs

An unbounded automation that runs hourly can cost far more than you expect. Treat the budget as a circuit-breaker, not a target. Add it before the first scheduled run, not after.
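What a budget and a hard stop look like in practice depends on how your tool reports usage, so treat this as a shape rather than an integration. This is a minimal Python sketch of a hypothetical `RunBudget` circuit-breaker. After each model call, you pass `charge()` whatever token counts your tool reports. The run stops as soon as it goes over its input, output or time limit. The default limits are placeholders.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised to stop a run that has gone over one of its limits."""


@dataclass
class RunBudget:
    max_input_tokens: int = 20_000
    max_output_tokens: int = 5_000
    max_seconds: float = 90.0
    clock: Callable[[], float] = time.monotonic  # swap in a fake clock to test
    input_used: int = field(default=0, init=False)
    output_used: int = field(default=0, init=False)
    started: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.started = self.clock()

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one model call's usage; raise if any limit is now exceeded."""
        self.input_used += input_tokens
        self.output_used += output_tokens
        if self.input_used > self.max_input_tokens:
            raise BudgetExceeded(f"input tokens {self.input_used} > {self.max_input_tokens}")
        if self.output_used > self.max_output_tokens:
            raise BudgetExceeded(f"output tokens {self.output_used} > {self.max_output_tokens}")
        if self.clock() - self.started > self.max_seconds:
            raise BudgetExceeded("time limit reached")
```

The check runs after each call. It cannot recover the tokens spent by the call that crossed the line, but it stops the next one. If your tool also offers its own spending limit, set that too.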

Worked example

automations/schedule.md
# Automations

## morning-brief
- Runs: 06:30 weekdays.
- Output: my personal email.
- Token budget: 20,000 input, 5,000 output per run.
- Time limit: 90 seconds.
- On failure: silent retry after 5 minutes; second failure
  sends me a notification.

## friday-weekly-review
- Runs: 16:00 Fridays.
- Output: saved as draft in my notes folder, not sent.
- Token budget: 40,000 input, 8,000 output per run.
- Time limit: 3 minutes.
- On failure: notifies me immediately.
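The "On failure" lines are a policy, and a policy is easy to get subtly wrong. Writing it as code makes it explicit. This is a minimal sketch of the morning-brief rule: retry once after five minutes, then notify on the second failure. It logs every attempt, so there is something to debug later. `run_with_retry`, `job` and `notify` are hypothetical. `sleep` can be swapped out, so the sketch can be tested without waiting.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("automations")


def run_with_retry(
    name: str,
    job: Callable[[], str],
    notify: Callable[[str], None],
    retry_delay_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Run a scheduled job; retry once silently, notify on the second failure."""
    for attempt in (1, 2):
        try:
            result = job()
        except Exception as exc:  # log every failure, whatever its type
            log.warning("%s attempt %d failed: %s", name, attempt, exc)
            if attempt == 1:
                sleep(retry_delay_s)  # the silent retry: no notification yet
            continue
        log.info("%s attempt %d succeeded", name, attempt)
        return result
    notify(f"{name} failed twice; see the automation log")
    return None
```

For the friday-weekly-review entry, which notifies immediately, you would call `job()` once and `notify` on the first exception instead.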

Don't automate anything yet

Set a reminder in two weeks to revisit this chapter. Spend the time between now and then running your one skill by hand and getting it right.

If you do nothing else this week

No automations at all for the first two weeks. Once you have one skill you trust completely, schedule it to drop its output into your personal inbox as a draft. Nothing else, for now.

Build mode
What this is
A schedule file describing what runs when, with budgets.
Why it matters
The biggest failure modes here are silent and expensive.
What to create
automations/schedule.md - only when you have something trusted to schedule.
Layer 8 · Evaluation

How you stop your agent getting quietly worse.

In plain language: small repeatable tests that catch regressions the same day they happen.

Three small tests your one skill must keep passing.

If you are migrating an old system

Test promoted skills with evals before using them live again. Migration is the most likely moment for silent regressions: file paths change, names get re-typed, rules get summarised away. The eval catches what your eye won't.

This stage doesn't exist in the popular seven-stage model. It's the single biggest gap in most agentic systems.

An eval is a test case the agent must keep passing. A known input, paired with what good output looks like, with a way to check whether the actual output meets the bar. Every time you edit your identity file, tweak a skill, or add a new context document, you're making changes whose downstream effects you cannot fully predict.

With even five or ten evals running weekly, you catch breakage the same day it happens. You can edit fearlessly. You can update to a new model the day it ships and know within an hour whether anything important broke.

Worked example

evals/morning-brief/02-busy-day.md
# Eval: morning-brief, busy day with conflicts

## Input
Simulated calendar (attached: busy-day.ics).
Simulated inbox (attached: busy-day-inbox.json).

## What good looks like
- All 7 meetings are listed.
- The double-booking at 14:00 is flagged.
- The two messages from Maria (Acme sponsor) are surfaced.
- The "one thing you might be missing" mentions either
  the unprepared 16:00 meeting or the unreplied Friday
  invoice query.
- No invented facts. No phantom meetings.
- Under one page.

## How to run
Run morning-brief skill against the inputs above.
Compare output to "what good looks like".
Score: pass / partial / fail.

## Last passed
2025-11-08.
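Some of the "what good looks like" items can be checked mechanically; others ("no invented facts", "all 7 meetings are listed") still need you to read the output. A minimal Python sketch of the mechanical part follows. The keyword heuristics, reading "under one page" as roughly 500 words, and the rule for partial versus fail are all assumptions of this sketch, not part of the eval file.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    description: str
    passed: Callable[[str], bool]


def score(output: str, checks: List[Check]) -> Tuple[str, List[str]]:
    """Return pass/partial/fail plus the descriptions of any failed checks."""
    failures = [c.description for c in checks if not c.passed(output)]
    if not failures:
        return "pass", failures
    if len(failures) < len(checks):
        return "partial", failures
    return "fail", failures


# Mechanical checks for 02-busy-day.md. Crude keyword tests, meant to be
# read alongside the output, not instead of reading it.
BUSY_DAY_CHECKS = [
    Check("double-booking at 14:00 flagged",
          lambda out: "14:00" in out and "double" in out.lower()),
    Check("Maria's messages surfaced", lambda out: "Maria" in out),
    Check("16:00 meeting or Friday invoice mentioned",
          lambda out: "16:00" in out or "invoice" in out.lower()),
    Check("under one page (taken here as 500 words)",
          lambda out: len(out.split()) <= 500),
]
```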

Describe one good output

Open a fresh file. Describe one input where you already know what the right output should look like. That's your first eval - and the most valuable one you'll write, because you know what "good" means.

If you do nothing else this week

Three evals for your one skill, run weekly. The ones that matter most are the ones where a wrong answer would cost you the most: a brief that misses an angry client email, a meeting prep that gets the attendees wrong, a summary that misstates a number.

Build mode
What this is
Three small test cases your one skill must keep passing.
Why it matters
Without evaluation, every edit you make ships a silent regression.
What to create
Three files under evals/<skill-name>/.
Layer 9 · Feedback memory

Stop correcting the same thing every Monday.

Every correction you give the agent gets written down once, and the mistake quietly stops happening.

memory/feedback.md - each correction logged once, so the mistake quietly stops happening.

If you are migrating an old system

Most old feedback logs are noisy. Import only corrections that still reflect what you want today. A correction you wrote eight months ago about a person who has left, or a tool you no longer use, doesn't help the new agent; it confuses it.

Every time you correct the agent, the correction gets written into a feedback file. The agent reads this file at the start of every session. Without it, you correct the same things over and over for months. With it, each correction is logged once and the mistake quietly stops happening.
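Tools that load files for you handle "read at the start of every session" themselves. If you call a model from your own script instead, one way to do it is to turn the file into a short list of standing rules and put it in front of every session. A minimal Python sketch, assuming the `## YYYY-MM-DD` headings and `- ` bullets used in the worked example below; the function names are illustrative, not part of any tool.

```python
import re
from typing import List, Tuple

DATE_HEADING = re.compile(r"^## (\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def corrections(feedback_md: str) -> List[Tuple[str, str]]:
    """Return (date, correction) pairs; continuation lines join their bullet."""
    # With one capture group, split() gives [preamble, date1, body1, date2, body2, ...].
    parts = DATE_HEADING.split(feedback_md)
    found: List[Tuple[str, str]] = []
    for day, body in zip(parts[1::2], parts[2::2]):
        current: List[str] = []
        for line in body.splitlines():
            line = line.strip()
            if line.startswith("- "):
                if current:
                    found.append((day, " ".join(current)))
                current = [line[2:]]
            elif line and current:
                current.append(line)
        if current:
            found.append((day, " ".join(current)))
    return found


def session_preamble(feedback_md: str) -> str:
    """Text to place before every new session; empty if nothing is logged yet."""
    items = corrections(feedback_md)
    if not items:
        return ""
    lines = ["Standing corrections (follow these before responding):"]
    lines += [f"- {rule} ({day})" for day, rule in items]
    return "\n".join(lines)
```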

Worked example

memory/feedback.md
# Feedback log

## 2025-10-15
- I asked for a brief, you gave me bullets. I want prose.
  Always prose unless I explicitly ask otherwise.

## 2025-10-18
- You called Maria "Mary" in a draft. Always cross-check
  names against 02-people.md before drafting.

## 2025-10-22
- You drafted a reply to Sam that opened "Hi Sam, I hope
  this finds you well." I never write that. Cut the
  pleasantries on internal email.

## 2025-11-04
- You suggested I drop the Acme weekly update to save time.
  Maria specifically needs that weekly update; it is in
  02-people.md. Don't suggest dropping standing
  commitments without checking the people file.

## 2025-11-09
- You confidently stated a number from last quarter's
  report. The number was wrong. If you can't verify a
  number from a primary source in the connections, say
  "I don't have a verified number for this."

This file is the single highest-leverage document in your operating system. It captures the patterns no model will ever learn from training, because they're specific to you. After three months, you have a deeply personal manual that quietly closes most of the gap between "good first draft" and "ready to send".

Create a fresh file called memory/feedback.md. Add today's date as a heading. Leave it ready for the first correction you make today.
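If you log corrections from a script or a shortcut rather than by hand, the only rule to get right is one dated heading per day. A minimal Python sketch; it assumes headings are added in date order, so today's heading, if it exists, is the last one in the file. `add_correction` and `log_correction` are hypothetical helper names.

```python
from datetime import date
from pathlib import Path


def add_correction(feedback_md: str, correction: str, today: date) -> str:
    """Return the file text with `correction` logged under today's heading."""
    heading = f"## {today.isoformat()}"
    text = feedback_md.rstrip("\n")
    if heading not in text.splitlines():
        text += f"\n\n{heading}"  # first correction of the day
    # Headings are in date order, so appending lands under today's heading.
    return f"{text}\n- {correction}\n"


def log_correction(path: Path, correction: str) -> None:
    existing = path.read_text(encoding="utf-8") if path.exists() else "# Feedback log\n"
    path.write_text(add_correction(existing, correction, date.today()), encoding="utf-8")
```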

If you do nothing else this week

One file: memory/feedback.md. Add to it any time you correct the agent. Add one line to AGENTS.md telling the agent to read this file before responding. That's the entire setup.

Build mode
What this is
A dated log of corrections, read at the start of every session.
Why it matters
Closes the loop. Without this, you teach the same lesson every Monday.
What to create
memory/feedback.md

This is the layer Agent OS Updates helps keep current as tools change.

MIGRATION - AN AVAILABLE ROUTE

Migration with judgement.

This chapter is for people with an existing setup that feels messy, stale, contradictory, or hard to trust. Do not move old files into the new Agentic OS. Promote them into it.

The reason migration goes wrong is almost always the same. People copy old prompts, context dumps, and half-remembered instructions into the new folder, hoping a fresh structure will rehabilitate them. It doesn't. The new folder inherits the old confusion and now you have two places to maintain it.

Treat your existing setup as raw material, not authority. The new Agentic OS is the only authority. Old files only get in by passing a written gate.

Method · 10 steps

How to migrate without breaking trust

  1. Freeze the old system. Stop editing it. Whatever lives there is now a snapshot, not a work-in-progress.
  2. Create a new clean authority folder. Empty. Named clearly. This is the only place agents will read from.
  3. Remove old folders from routine use. Do not let normal agents read them. Move them out of the tool's reach, or rename them so nothing auto-loads.
  4. Inventory old files as candidates. Write them down in 90-import-review/candidates.md. Don't review them yet.
  5. Assess each candidate against the promotion gate (below). Be honest. Most things fail.
  6. Promote only what is current, useful, non-contradictory, safe, and short enough to trust.
  7. Rewrite as you promote. Compress. Re-date. Place it in the correct layer of the new OS. Do not paste.
  8. Log what was rejected, and why. Put it in rejected.md. This stops the same debate happening twice.
  9. Retire old authority. Make a one-page retired-authority.md noting where the old system lived and that it no longer governs.
  10. Point agents only at the new OS. Re-check tool settings, MCPs, automations. None should still read the old folders.
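Step 4 is mechanical, so it is safe to script, and scripting it means nothing gets skipped. A minimal Python sketch, assuming the frozen old setup sits in one folder; the checkbox layout of candidates.md is an assumption, not a required format. It only lists files. The judgement in steps 5 to 8 stays with you.

```python
from pathlib import Path
from typing import Iterable, List


def list_old_files(old_root: Path) -> List[str]:
    """Every file under the frozen old setup, as paths relative to its root."""
    return [p.relative_to(old_root).as_posix() for p in old_root.rglob("*") if p.is_file()]


def candidates_markdown(paths: Iterable[str], source_label: str) -> str:
    """Render candidates.md: one unchecked line per file, no review yet."""
    lines = ["# Import candidates", "", f"Source (frozen): {source_label}", ""]
    lines += [f"- [ ] {path}" for path in sorted(paths)]
    return "\n".join(lines) + "\n"


def write_candidates(old_root: Path, os_root: Path) -> Path:
    out = os_root / "90-import-review" / "candidates.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(candidates_markdown(list_old_files(old_root), str(old_root)), encoding="utf-8")
    return out
```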

The promotion gate · every imported file must pass it

  • Is it still true?
  • Is it current?
  • Is it useful to an agent?
  • Is it safe for an agent to read?
  • Is it shorter than the old version?
  • Does it conflict with anything already promoted?
  • Which layer does it belong to?
  • Should agents read this routinely, occasionally, or never?
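The gate is easiest to apply honestly when every answer is written down. A minimal Python sketch of one review record; the field names, and treating "never" as not promoted, are assumptions of this sketch. The reasons it produces are the kind of line worth copying into rejected.md.

```python
from dataclasses import dataclass
from typing import List, Optional

READ_FREQUENCIES = {"routinely", "occasionally", "never"}


@dataclass
class GateReview:
    path: str
    still_true: bool
    current: bool
    useful_to_agent: bool
    safe_for_agent: bool
    shorter_than_old: bool
    conflicts_with_promoted: bool
    layer: Optional[str]   # e.g. "context" or "skills"; None if it fits no layer
    read_frequency: str    # "routinely", "occasionally" or "never"

    def __post_init__(self) -> None:
        if self.read_frequency not in READ_FREQUENCIES:
            raise ValueError(f"unknown read frequency: {self.read_frequency}")

    def failures(self) -> List[str]:
        checks = [
            (self.still_true, "no longer true"),
            (self.current, "not current"),
            (self.useful_to_agent, "not useful to an agent"),
            (self.safe_for_agent, "not safe for an agent to read"),
            (self.shorter_than_old, "not shorter than the old version"),
            (not self.conflicts_with_promoted, "conflicts with promoted material"),
            (self.layer is not None, "belongs to no layer"),
            (self.read_frequency != "never", "agents should never read it"),
        ]
        return [reason for ok, reason in checks if not ok]

    def decision(self) -> str:
        return "promote" if not self.failures() else "reject"

    def rejected_line(self) -> str:
        return f"- {self.path}: " + "; ".join(self.failures())
```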

Where migration work lives in the folder

Two folders are added purely for migration. Neither becomes routine context for any agent. They exist so the work of judgement leaves a trace.

your-agent-os/ · migration folders
your-agent-os/
├── 90-import-review/           migration work-in-progress
│   ├── candidates.md           every old file, listed
│   ├── promotion-log.md        what was promoted, when, into which layer
│   ├── conflicts.md            contradictions found during review
│   └── rejected.md             what was discarded, and why
└── 99-archive-index/           traceability only — not authority
    ├── old-system-map.md       where the old setup lived
    └── retired-authority.md    the note that says "this no longer governs"

These folders are not context

Don't point agents at 90-import-review/ or 99-archive-index/ as routine context. They are workshop benches, not authority. The agent only reads the rest of the OS.
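Step 10 of the migration asks you to confirm nothing still reads the workshop folders. For the plain text parts of the OS, a quick search does that. A minimal Python sketch; it only scans the files you hand it, so tool settings and MCP configs stored elsewhere still need a manual check. Every hit is worth a look, though a line telling agents to ignore these folders is fine.

```python
import re
from pathlib import Path
from typing import Dict, List

WORKSHOP = re.compile(r"\b(90-import-review|99-archive-index)/")
SKIP = {"90-import-review", "99-archive-index"}


def routine_files(os_root: Path) -> Dict[str, str]:
    """Every .md file outside the two migration folders."""
    files = {}
    for path in os_root.rglob("*.md"):
        rel = path.relative_to(os_root)
        if rel.parts[0] not in SKIP:
            files[rel.as_posix()] = path.read_text(encoding="utf-8")
    return files


def workshop_references(files: Dict[str, str]) -> List[str]:
    """'file:line: text' for every line that points at a migration folder."""
    hits = []
    for name, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if WORKSHOP.search(line):
                hits.append(f"{name}:{lineno}: {line.strip()}")
    return hits
```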

If you do nothing else this week

Freeze the old system. Create the new authority folder. Write candidates.md with every file from the old setup listed (no review yet). That alone breaks the cycle of "just add one more prompt on top".

Build mode
What this is
The 90-import-review folder and the promotion gate, written down.
Why it matters
Migration without judgement re-creates the mess in a tidier folder.
What to create
90-import-review/candidates.md and 99-archive-index/retired-authority.md.
Reference

When you have more than three agents.

This section only matters once you have more than three agents. Skip it for now if you don't.

When you have one agent, you talk to it directly. When you have two or three, you remember which one to use for what. When you have four or more, you start to lose track. Agents start to overlap. You ask one to do a job, then realise another would have done it better. You get contradictory outputs because two agents are reading the same context file and reaching different conclusions.

The solution is an orchestration layer. There are two reasonable shapes.

A router agent. A single agent whose job is to read your request and decide which specialist agent should handle it. You always talk to the router. The router talks to the others. Works well if you build it carefully, but adds a step and a model call to every request.

An explicit table. A single file called agents/index.md that lists every agent, what they do, when to use them, and what they share. You read this file (and so does the AI) when you're not sure which agent to call. Simpler, no extra model call, no risk of the router making a wrong choice.

For most people, the table is enough. The router becomes useful at five or six agents, not before.
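The table works because it is plain text that both you and the AI can read. To show how little machinery it needs, here is a minimal Python sketch that reads such a table and suggests which agent fits a request. The column names, the two example agents, and keyword matching are all assumptions; it adds no model call, and an empty result simply means "read the table yourself".

```python
import re
from typing import Dict, List

# A hypothetical agents/index.md. Columns and agents are illustrative only.
INDEX_MD = """\
| Agent | Job | Use when | Shares |
|---|---|---|---|
| daily-ops | Briefs, triage, meeting prep | calendar, inbox, meeting | context/, memory/ |
| writer | Drafts in my voice | draft, reply, post | context/, voice-samples.md |
"""


def parse_index(md: str) -> List[Dict[str, str]]:
    """Turn a Markdown table into one dict per agent, keyed by column name."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in md.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    # Drop the |---|---| separator row.
    body = [r for r in body if not set("".join(r)) <= set("-: ")]
    return [dict(zip(header, r)) for r in body]


def suggest_agents(request: str, agents: List[Dict[str, str]]) -> List[str]:
    """Agents whose 'Use when' keywords appear in the request."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    return [
        a["Agent"]
        for a in agents
        if words & {k.strip() for k in a["Use when"].lower().split(",")}
    ]
```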

Reference · the whole folder

Full folder structure.

A folder structure that works across every major AI tool we know of. Copy it as a starting point and adjust.

your-agent-os/ (complete)
your-agent-os/
├── README.md                   one-page summary of what this is
├── AGENTS.md                   the identity file (Layer 1)
├── context/                    Layer 2
│   ├── 01-about-me-or-us.md
│   ├── 02-people.md
│   ├── 03-priorities.md
│   ├── 04-operating-principles.md
│   └── 05-current-projects.md
├── skills/                     Layer 3
│   ├── morning-brief/
│   │   └── SKILL.md
│   ├── meeting-prep/
│   │   └── SKILL.md
│   ├── reply-triage/
│   │   └── SKILL.md
│   └── voice-match/
│       └── SKILL.md
├── memory/                     Layers 4 and 9
│   ├── decisions.md
│   ├── relationships.md
│   ├── learnings.md
│   └── feedback.md
├── connections/                Layer 5
│   ├── README.md               what is connected, at what permission
│   ├── permissions-log.md      every permission change, dated
│   └── privacy.md              PII and data handling rules
├── verification/               Layer 6
│   ├── checklists.md           one short checklist per skill
│   └── retrospectives.md       monthly audit notes
├── automations/                Layer 7
│   ├── schedule.md             what runs when, where output goes
│   ├── safe-defaults.md        drafts only, log everything, budgets
│   └── logs/                   one file per automation run
├── evals/                      Layer 8
│   ├── morning-brief/
│   ├── meeting-prep/
│   └── voice-match/
├── agents/                     for Level 3 and above
│   └── index.md                map of agents and their jobs
├── 90-import-review/           migration work-in-progress
│   ├── candidates.md           every old file, listed
│   ├── promotion-log.md        what was promoted, when, into which layer
│   ├── conflicts.md            contradictions found during review
│   └── rejected.md             what was discarded, and why
└── 99-archive-index/           traceability only — not authority
    ├── old-system-map.md       where the old setup lived
    └── retired-authority.md    the note that says "this no longer governs"

Save this folder somewhere your tool can reach it. For most people that means a cloud-synced folder (Dropbox, OneDrive, iCloud, Google Drive) so it follows you across machines. Point your AI tool at the top-level folder.
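If you'd rather not create the folders by hand, a short script can lay them out. A minimal Python sketch; it creates only a starting subset (one skill, no migration folders, no agents/ index), writes a stub "Last updated" heading into each new file, and never overwrites a file that already exists. The stub text is an assumption; replace it with the blank templates in the appendix.

```python
from pathlib import Path
from typing import List

# A starting subset of the folder map above: one skill, no migration folders.
FILES = [
    "README.md",
    "AGENTS.md",
    "context/01-about-me-or-us.md",
    "context/02-people.md",
    "context/03-priorities.md",
    "context/04-operating-principles.md",
    "context/05-current-projects.md",
    "skills/morning-brief/SKILL.md",
    "memory/decisions.md",
    "memory/relationships.md",
    "memory/learnings.md",
    "memory/feedback.md",
    "connections/README.md",
    "connections/permissions-log.md",
    "connections/privacy.md",
    "verification/checklists.md",
    "verification/retrospectives.md",
    "automations/schedule.md",
    "automations/safe-defaults.md",
]
EMPTY_DIRS = ["automations/logs", "evals/morning-brief"]


def scaffold(root: Path, dry_run: bool = False) -> List[Path]:
    """Create missing folders and stub files; return the files (to be) created."""
    missing = [root / rel for rel in FILES if not (root / rel).exists()]
    if dry_run:
        return missing
    for rel in EMPTY_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for path in missing:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(f"# {path.stem}\n\n## Last updated\n[YYYY-MM-DD]\n", encoding="utf-8")
    return missing
```

Run it with `dry_run=True` first to see what it would create.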

Everything you need to build this manually is on this page, for free.

Optional · skip the assembly work

Agent OS Starter Kit · £29

If you'd rather skip the assembly, including the migration folders and templates, the Starter Kit packages the whole thing as ready-to-use files.

Starter Kit

£29 one-time

"Give me the files."

  • Complete folder, ready to drop in
  • All blank templates pre-filled and named
  • Worked Daily Operations Partner example
  • Claude, Codex, Cursor setup variants
  • Privacy rules, verification, evals, feedback memory starter
  • Migration templates: candidates.md, promotion-log.md, conflicts.md, rejected.md, old-system-map.md, retired-authority.md, plus the migration first-week plan and full checklist
  • 30 days of Agent OS Updates included
  • Starter Kit buyers keep the kit forever, even if they don't continue Updates.
  • Agent OS Updates can be cancelled any time.
  • 14-day refund on the Starter Kit, no questions.
  • UK VAT is included in the displayed price. International tax handled at checkout.
  • If Updates lapses, you keep the last starter folder you downloaded; new patches stop.

Worked example · Daily Operations Partner

A fully populated version of the folder above, configured for the running example. Drop it into your cloud-synced folder, swap the placeholders for your own people and clients, and you have a working agent on day one.

Daily Operations Partner · file map
daily-operations-partner/
├── AGENTS.md                     your filled-in identity
├── README.md                     one-page setup guide for new tool
├── context/
│   ├── 01-about-me-or-us.md      pre-filled placeholders
│   ├── 02-people.md              12 example entries to adapt
│   ├── 03-priorities.md
│   ├── 04-operating-principles.md
│   └── 05-current-projects.md
├── skills/
│   ├── morning-brief/SKILL.md    ready to schedule
│   ├── meeting-prep/SKILL.md
│   ├── reply-triage/SKILL.md
│   └── voice-match/SKILL.md
├── memory/
│   ├── decisions.md              example seed entries
│   ├── relationships.md
│   ├── learnings.md
│   └── feedback.md               pre-seeded with five common patterns
├── connections/
│   ├── README.md
│   ├── permissions-log.md
│   └── privacy.md                full GDPR-aware starter
├── verification/
│   ├── morning-brief.md
│   ├── meeting-prep.md
│   └── retrospectives.md
├── automations/
│   ├── schedule.md               safe defaults
│   ├── safe-defaults.md
│   └── budgets.md
├── evals/
│   ├── morning-brief/            (4 scenarios)
│   ├── meeting-prep/             (2 scenarios)
│   └── voice-match/              (1 scenario)
└── tool-setup/
    ├── claude-code.md
    ├── cursor.md
    └── codex-cli.md
Tool setup

Claude, Codex, ChatGPT, Cursor.

The folder is the system. The tool is how the folder gets used.

This section ages faster than the rest

Tool details change quickly. Always cross-check against your tool's current documentation before relying on exact file names or setup paths. Everything in earlier chapters should still be accurate years from now; this section is the one most likely to be out of date by next quarter.

Claude & Claude Code (Anthropic)

Claude Code commonly reads CLAUDE.md at the top of any folder you point it at. To stay portable, keep AGENTS.md as your source of truth and either duplicate it into CLAUDE.md or make CLAUDE.md a one-line pointer to it. Skills go in .claude/skills/<skill-name>/SKILL.md. Connections are added through MCP servers configured in ~/.claude.json or via the tool's settings.
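Either option is easy to keep in sync with a few lines of script. A minimal Python sketch; the pointer wording is an assumption, and a plain-language pointer relies on the model following it. Claude Code's documentation also describes importing another file from CLAUDE.md with an `@path` line, which may be a firmer pointer; check the current docs before relying on it.

```python
from pathlib import Path

POINTER = (
    "Read AGENTS.md in this folder before doing anything else. "
    "It is the source of truth; this file only points to it.\n"
)
GENERATED_NOTE = "<!-- Copied from AGENTS.md. Edit AGENTS.md, not this file. -->\n"


def claude_md_text(agents_md: str, mode: str = "pointer") -> str:
    """Contents for CLAUDE.md: a one-line pointer, or a marked copy."""
    if mode == "pointer":
        return POINTER
    if mode == "copy":
        return GENERATED_NOTE + agents_md
    raise ValueError("mode must be 'pointer' or 'copy'")


def sync_claude_md(os_root: Path, mode: str = "pointer") -> None:
    agents_md = (os_root / "AGENTS.md").read_text(encoding="utf-8")
    (os_root / "CLAUDE.md").write_text(claude_md_text(agents_md, mode), encoding="utf-8")
```

If you choose "copy", re-run it whenever AGENTS.md changes, or the two files will drift apart.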

Cursor

Cursor reads AGENTS.md at the top of any project folder. Older setups may use .cursorrules. Keep your main foundation in plain files and only adapt the edge. MCP connections are configured in Cursor's settings.

OpenAI Codex

The Codex CLI reads AGENTS.md in the workspace. Keep tool-specific details minimal and put the durable rules in the shared foundation. Check the current Codex documentation for MCP and external integration support; this area is moving quickly.

ChatGPT

ChatGPT does not work like a local coding workspace that automatically reads AGENTS.md from a folder. To carry the foundation into ChatGPT, use a Project for the work, add project instructions based on your AGENTS.md, upload or attach the relevant plain text files, and use memory deliberately. Custom Instructions can carry your short global preferences; Projects are better for a specific Agentic OS because they keep chats, files, instructions, and project memory together.

  1. Create a ChatGPT Project for the agent or workstream.
  2. Paste the short version of AGENTS.md into project instructions.
  3. Add the relevant context, skill, verification, and feedback files as project files where the interface supports it.
  4. Ask ChatGPT what project files and memories it is using before relying on important output.
  5. Keep your plain text folder as the source of truth. ChatGPT receives a copy or selected export of the foundation, not the authority itself.
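The "selected export" in step 5 can be a single file you upload to the Project. A minimal Python sketch that stitches chosen files into one dated text with clear separators; the separator style and header wording are assumptions, and uploading the files individually works just as well.

```python
from datetime import date
from pathlib import Path
from typing import Dict, Iterable


def read_selected(os_root: Path, selected: Iterable[str]) -> Dict[str, str]:
    """Load only the files you chose, keeping their order."""
    return {rel: (os_root / rel).read_text(encoding="utf-8") for rel in selected}


def bundle(files: Dict[str, str], exported: date) -> str:
    """One upload-ready text: a dated header, then each file under its path."""
    header = (
        f"Agentic OS export, {exported.isoformat()}. "
        "This is a copy; the plain text folder is the source of truth.\n"
    )
    sections = [f"\n===== {name} =====\n{text.rstrip()}\n" for name, text in files.items()]
    return header + "".join(sections)
```

For example: `bundle(read_selected(Path("your-agent-os"), ["AGENTS.md", "context/02-people.md", "memory/feedback.md"]), date.today())`.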
Anything else following AGENTS.md

A growing number of tools agree on AGENTS.md at the root of your working folder. If your tool isn't named above, check its documentation for "AGENTS.md" or "instructions file". If it has one, the structure in this guide works as-is.

A note on portability. If you use more than one tool, point all of them at the same folder. That is the whole reason we are building a foundation rather than a tool-specific configuration.

Optional · keep your setup current

Agent OS Updates.

AI tools change constantly. Most changes do not matter to your foundation.

Agent OS Updates exists to tell you what does matter. It is not AI news. It is a maintenance layer for your Agentic OS. Each update answers five questions:

  1. What changed?
  2. Does it affect the foundation?
  3. What should I ignore?
  4. What should I change?
  5. Which file needs editing?

Keep your Agentic OS current without following AI news every day.

Optional · maintenance, not news

Subscribe to Agent OS Updates.

When a tool change actually affects your operating system, we tell you what changed, what matters, what to ignore, and exactly which files to update. Impact ratings on every alert. Copyable patch snippets.

Cancel any time. A free monthly newsletter is also available.

A pace that works

Your first week.

A pace that works for most people. Adjust to your reality.

  1. Day 1: About 90 minutes. Decide your one-sentence agent. Create the folder structure. Write the first draft of your identity file using the AI-interview method. Don't try to make it perfect.
  2. Day 2: About an hour. Draft your five context files. One page each. Dated. You will hate them and want to keep writing. Stop at one page each.
  3. Day 3: About 90 minutes. Write the first skill your agent needs. Just one. Run it once. Note what went wrong. Edit the skill, not the conversation.
  4. Day 4: About 30 minutes. Add one read-only connection. Watch what the agent does with it for a session. Do nothing else.
  5. Day 5: About an hour. Write a one-page verification checklist for your first skill. Run the skill three times. Use the checklist each time.
  6. Day 6: About an hour. Write your first three evals for the one skill you've built. Run them. Note any that fail and why.
  7. Day 7: Leave it alone. Use it normally. Each correction goes into memory/feedback.md. You will fix them next week.

After week one you have a real, modest, working agentic operating system with all nine layers represented at the minimum useful level. After a month you have something you genuinely trust for a slice of your work. After a quarter you will wonder how you ever worked without it.

Reference

Cost and time.

Money

You'll need a paid subscription to at least one AI tool. The useful tier is in the region of £20/month for a personal account, and rises with usage. If you run automations through an API rather than a personal subscription, plan for between £5 and £50/month per active agent depending on volume, with the higher end covering anything that runs hourly.
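To see where a number in that range comes from, multiply runs per month by the tokens each run uses and by your provider's per-token price. A minimal Python sketch; the prices in the example are made-up round numbers for illustration, not any provider's real rates, so substitute current ones from your provider. It uses the morning-brief budgets (20,000 input and 5,000 output tokens), so the result is a ceiling: a budget is a cap, not a typical run.

```python
def monthly_cost(
    runs_per_month: int,
    input_tokens: int,
    output_tokens: int,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Worst-case monthly spend if every run uses its full token budget."""
    per_run = (
        input_tokens / 1_000_000 * price_in_per_million
        + output_tokens / 1_000_000 * price_out_per_million
    )
    return round(runs_per_month * per_run, 2)


# Hypothetical prices: £1 per million input tokens, £4 per million output tokens.
weekday_brief = monthly_cost(22, 20_000, 5_000, 1.0, 4.0)  # about 22 weekday runs
hourly_job = monthly_cost(24 * 30, 20_000, 5_000, 1.0, 4.0)
```

With these placeholder prices, the weekday brief comes to under £1 a month at most, and the same job run hourly (720 runs) to about £29: the same budget, roughly 33 times as many runs.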

This guide can be used without buying anything from this site. If you want the convenience products, the optional Starter Kit is £29 one-time, and Agent OS Updates is £9/month or £79/year.

Tokens, in plain English

AI tools do not read and write in pages or words. They read and write in tokens.

A token is a small chunk of text. Part of a word, a whole short word, a number, a bit of punctuation, or a space can all count as tokens. You do not need to calculate them exactly. You just need to understand the pattern.

The more text the model has to read, the more tokens you use. The more text it writes back, the more tokens you use.

The expensive parts are usually not the short messages you type. They are the hidden or repeated things around the message: long instructions, large context files, pasted documents, previous conversation history, tool results, search results, logs, and automations that run again and again.

This is why the Agentic OS uses short, separate files instead of one huge instruction document. Give the agent the right files for the job. Do not make every agent read everything every time.
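To find which files cost the most to load, a rough estimate is enough. A minimal Python sketch using the rule of thumb from the glossary (a token is roughly three quarters of a word in English); real tokenizers differ by model, so treat the numbers as relative sizes, not exact counts.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def estimate_tokens(text: str) -> int:
    """Rough count: about 0.75 words per token, so tokens are words / 0.75."""
    return round(len(text.split()) / 0.75)


def heaviest_files(files: Dict[str, str], top: int = 5) -> List[Tuple[str, int]]:
    """The `top` largest files by estimated tokens, biggest first."""
    sized = [(name, estimate_tokens(text)) for name, text in files.items()]
    return sorted(sized, key=lambda item: item[1], reverse=True)[:top]


def load_markdown(folder: Path) -> Dict[str, str]:
    """Every .md file under `folder`, keyed by its relative path."""
    return {p.relative_to(folder).as_posix(): p.read_text(encoding="utf-8")
            for p in folder.rglob("*.md")}
```

For example, `heaviest_files(load_markdown(Path("your-agent-os")))` lists the five files most worth trimming or splitting.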

What burns through tokens fastest

  • Automations that run often.
  • Agents reading large folders when they only need one file.
  • Long identity or context files loaded into every task.
  • Asking for long outputs when a short answer would do.
  • Re-running failed jobs without fixing the cause.
  • Router agents that add an extra model call to every request.
  • Large inboxes, calendars, PDFs, transcripts, logs, or document stores being read too broadly.
  • Using a powerful model for simple formatting, sorting, or extraction work.

How to keep token use under control

  • Keep AGENTS.md short.
  • Split context into small topic files.
  • Tell the agent which files to read for the job.
  • Use cheaper or smaller models for simple tasks where your tool allows it.
  • Use stronger models for judgement, synthesis, risk, strategy, or difficult writing.
  • Put token budgets and time limits on scheduled automations.
  • Run new automations manually before scheduling them.
  • Prefer summaries and indexes over making the agent read everything.
  • Review logs when costs rise. Something is usually reading too much, running too often, or producing too much.

Set a budget before this runs

The hidden cost is the connection layer. Many systems you'll want to connect to have their own subscription tiers and API costs, so price them in before you switch a connection on. During interactive use, the usual cost culprit is excessive context (an oversized identity file or context library being read on every turn). Trim.

Time

The first week is six and a half to eight hours, spread as the plan above suggests (the day-by-day estimates add up to about six and a half). After that, maintenance is roughly 30 minutes a week, plus 30 minutes a month for the retrospective. Adding each new agent takes anywhere from an afternoon to a day, depending on how different it is from your existing ones.

The honest claim: in the first three weeks you'll spend more time on the system than the system saves you. From week four onward, it starts to win back the time. By month three the time savings are real and growing. If you don't see that pattern, something is wrong and the troubleshooting section below covers it.

When it tangles again

Troubleshooting.

When it tangles again, come back here.

If the agent seems to ignore your identity file, there are three likely causes. First, the file is in the wrong place. Check that it's named exactly what your tool expects (AGENTS.md, CLAUDE.md, etc.) and sits at the top of the folder the tool is reading. Second, the file is too long. Most tools cap how much of it they actually pay attention to. If your identity file is over three pages, cut it. Third, your conversation contains an explicit instruction that overrides the identity file. Tools generally let the most recent instruction win.

If the agent invents names, dates, numbers, or quotes, you are facing the most dangerous failure mode and the most common. Three remedies. First, add an explicit rule to the identity file: "Never invent a name, a date, a number, or a quoted phrase. If you don't know, say so and ask." Second, add the "show your work" requirement to every skill, with an uncertainty declaration at the end. Third, build evals that specifically test for this. A morning brief that fabricates a meeting that wasn't on the calendar should fail your eval and be caught the same day.
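One cheap way to make an eval catch a fabricated meeting: every clock time in the brief should also appear on the input calendar. A minimal Python sketch, assuming times are written as 24-hour HH:MM; it will also flag legitimate times that aren't meetings (a 17:00 deadline, say), so treat each hit as something to look at, not an automatic fail.

```python
import re
from typing import Iterable, List

CLOCK_TIME = re.compile(r"\b(?:[01]\d|2[0-3]):[0-5]\d\b")


def phantom_times(brief: str, calendar_times: Iterable[str]) -> List[str]:
    """Times mentioned in the brief that are not on the input calendar."""
    mentioned = set(CLOCK_TIME.findall(brief))
    return sorted(mentioned - set(calendar_times))
```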

If the output doesn't sound like you, the cause is almost always either the identity file or the voice-match skill. Add three or four specific examples of your actual writing to a context file called voice-samples.md and tell the agent to use those as the reference. Tone is much easier to copy from examples than to describe in rules.

If an automation fails and you don't find out, you don't have logging or failure notifications turned on. Go to your automations/schedule.md file and add, for every automation, an "on failure" line that notifies you. Set up a weekly check that confirms each automation has at least run, even if you don't read every output.
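The weekly "did it run?" check can be automated if each run leaves a log file. A minimal Python sketch, assuming log files named `YYYY-MM-DD-<automation>.md` in automations/logs/; that naming scheme is an assumption, so adapt the pattern to whatever your logs actually use.

```python
import re
from datetime import date, timedelta
from typing import Iterable, List

LOG_NAME = re.compile(r"^(\d{4}-\d{2}-\d{2})-(.+)\.md$")


def missing_runs(log_names: Iterable[str], automations: Iterable[str],
                 today: date, days: int = 7) -> List[str]:
    """Automations with no log file dated within the last `days` days."""
    cutoff = today - timedelta(days=days)
    recent = set()
    for name in log_names:
        match = LOG_NAME.match(name)
        if match and date.fromisoformat(match.group(1)) >= cutoff:
            recent.add(match.group(2))
    return [a for a in automations if a not in recent]
```

Feed it the file names from automations/logs/ and the automation names from schedule.md; anything it returns didn't run this week.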

If the agent has stored, logged, or shared something it shouldn't have, you usually need to revisit the privacy rules file in the Connections layer. Revise connections/privacy.md today. Be specific about what may never be written into memory or logs and what requires human approval before sharing externally. Also check whether you gave the agent more permission than it needs. Most agents only need read access; review your connections list and downgrade anything that doesn't have a clear reason for write access.

If the agent seems to have got worse over time, the diagnosis is almost always one of three things. Your context files have gone stale (open them, check the dates, update or delete anything older than a quarter). You don't have evals, so changes have introduced silent regressions (write five evals this week). You don't have a feedback memory file, so you keep correcting the same things (start one today).
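Checking the dates can be automated if every file keeps the "## Last updated" heading used in the blank templates. A minimal Python sketch; 91 days stands in for "a quarter", and a file with no date (or a still-blank [YYYY-MM-DD]) counts as stale.

```python
import re
from datetime import date, timedelta
from typing import Dict, List, Optional

LAST_UPDATED = re.compile(r"^## Last updated\s+(\d{4}-\d{2}-\d{2})", re.MULTILINE)


def last_updated(text: str) -> Optional[date]:
    """The date under '## Last updated', or None if missing or still a placeholder."""
    match = LAST_UPDATED.search(text)
    return date.fromisoformat(match.group(1)) if match else None


def stale_files(files: Dict[str, str], today: date, max_age_days: int = 91) -> List[str]:
    """Files whose 'Last updated' date is missing or older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for name, text in sorted(files.items()):
        updated = last_updated(text)
        if updated is None or updated < cutoff:
            stale.append(name)
    return stale
```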

If you keep making the same corrections, either you don't have a feedback memory file, or you have one but don't make the agent read it at the start of each session. Add a line to the identity file: "Always read memory/feedback.md before responding to me." Build the habit of writing each correction into the file at the moment you make it.

If your agents overlap and you've lost track of which one to use, you have hit Level 3 without an orchestration layer. Write the agents/index.md file described in the orchestration section. A simple table is usually enough. If it isn't, consider a router agent.

If costs are higher than you expected, check your automations first. An unbounded automation that runs hourly can cost ten times what you expect. Add a token budget and a time limit to every automation in safe-defaults.md. If your interactive use is the issue, the usual culprit is excessive context (an oversized identity file or context library being read on every turn). Trim.

If memory has filled up with contradictions or stale entries, it was updated without curation. Once a month, ask the agent to read through memory/decisions.md and memory/learnings.md and surface any contradictions or stale entries. Clean them up. Treat memory the way you treat a filing cabinet: occasional pruning keeps it useful.

If you're tempted to delete everything and start over: you probably shouldn't. But you may well need a new authority folder. Those are two different things.

Bad starting over: deleting everything and pretending the old work had no value. You lose months of accumulated context that was the most valuable part of what you built. You also tend to rebuild the same mistakes, since you didn't write down what was wrong.
Good starting over: create a clean new OS folder. Treat the old setup as raw material. Promote only what passes the promotion gate. Log what you reject and why. Retire the old authority by writing down that it no longer governs.

If that sounds like what you actually want, you are not starting over - you are migrating with judgement. Read that chapter first.

Prevention beats troubleshooting

Get Maintenance Alerts before your setup drifts.

Agent OS Updates tells you what actually affects your operating system, what to ignore, and exactly what to change.

Appendix

Workbook · blank templates.

Copy any of these. Fill them in. Keep them short on purpose. The Starter Kit packages these as ready-to-use files.

Blank AGENTS.md
AGENTS.md
# AGENTS.md

## Who I am
[Three or four sentences about your role and your work.]

## How I communicate
- [Direct or diplomatic.]
- [Prose or bullets.]
- [UK or US English.]
- [Pushback wanted or not.]

## Rules
- [Never do X without showing me.]
- [Always do Y.]
- [Flag Z before proceeding.]

## Last updated
[YYYY-MM-DD]
Blank context file
context/*.md
# [Name of the thing]

## Last updated
[YYYY-MM-DD]

## [Section 1]
[One short paragraph or a few bullets.]

## [Section 2]
[One short paragraph or a few bullets.]
Blank skill file
skills/<skill>/SKILL.md
# SKILL: [skill-name]

## Trigger
[When the agent should use this skill.]

## Sources
[What files, connections, or memory the agent should read.]

## Process
1. [Step one.]
2. [Step two.]
3. [Step three.]

## Output format
[Length, format, where it gets saved.]

## Verification
[Brief reminder to run the matching checklist.]
Blank verification checklist
verification/<skill>.md
# Verification: [skill-name]

Before accepting the output, check:

1. [Specific factual check.]
2. [Specific format check.]
3. [Specific tone check.]
4. [No invented names, numbers, or quotes.]
5. [Anything sensitive flagged for human approval.]
Blank eval
evals/<skill>/01-scenario.md
# Eval: [skill-name], [scenario name]

## Input
[The exact input. Attach files or paste inline.]

## What good looks like
- [Specific item the output must include.]
- [Specific item the output must not include.]
- [Format requirement.]

## How to run
Run [skill] against the input above. Compare to "what
good looks like". Score: pass / partial / fail.

## Last passed
[YYYY-MM-DD]
Blank feedback entry
memory/feedback.md
## [YYYY-MM-DD]
- [What I corrected, in one or two sentences. Include the
  rule the agent should follow next time.]

The free templates above are loose pages. The Starter Kit packages all of them as a complete labelled folder with the worked example wired up.

Appendix

Glossary.

Come back to this whenever a word stops making sense.

Agent
A piece of software that uses a language model to do work across multiple steps, often with the ability to use tools or reach external systems.
Agentic
An adjective meaning "able to take actions, not just produce text". An agentic AI does things in the world; a non-agentic AI just talks.
Agentic OS
The set of files and configurations that any AI tool reads to know who you are and how you work. The subject of this guide.
API
Application Programming Interface. The way one piece of software talks to another. "Using the API" usually means calling a model or service directly from code rather than through a chat window.
Connection
A link between your agent and a real system (calendar, email, CRM) so the agent can read or act on real data.
Context
The information about your specific situation that the model doesn't know from its training. Also: the information loaded into a single conversation so the model can use it.
Context window
The maximum amount of text a model can pay attention to in one go. Measured in tokens.
Eval
Short for evaluation. A test case that compares an agent's actual output to what good output looks like. Run regularly to catch regressions.
Feedback memory
A log of corrections you've given the agent, captured and re-read at the start of each session.
Harness
The wrapper around a model that gives it the ability to use tools, read files, and do work. Claude Code is a harness. Cursor is a harness. The model is the brain; the harness is the body.
Identity file
The single document your AI tool reads before every conversation, describing who you are, how you communicate, and what rules to follow.
LLM
Large Language Model. The kind of AI that powers ChatGPT, Claude, and similar tools.
MCP
Model Context Protocol. An open standard that lets AI tools talk to external systems without bespoke integrations. Think of it as a universal plug socket.
Memory
What the agent retains between conversations. Some tools have built-in memory; some let you maintain your own memory files alongside.
Model
The neural network that does the thinking. Claude, GPT, Gemini are all models. Tools (harnesses) wrap models.
Prompt
The instruction or question you give the model. Used loosely; sometimes means "the message you typed" and sometimes "everything the model is reading at this moment".
RAG
Retrieval-Augmented Generation. A pattern where the agent searches a body of documents for relevant pieces and feeds only those pieces to the model.
Read-only access
Permission for the agent to read but not write or change anything in a connected system. Always your starting point.
Shadow mode
An intermediate trust level between read-only and full write. The agent drafts the action and shows it to you, but doesn't execute until you confirm.
Skill
A reusable recipe for a job the agent does repeatedly. Written once in a file, used forever.
Token
A small chunk of text (roughly three quarters of a word in English). Models read and write in tokens. Costs and budgets are usually measured in tokens.
Tool use
When the model decides to call an external function or service rather than just produce text.
Verification
The discipline of checking the agent's output before acting on it.
Write access
Permission for the agent to create, change, or send things in a connected system. Earn it slowly through trust.
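Three entries above (read-only access, shadow mode, and write access) describe a trust ladder. The Python sketch below is hypothetical and only shows the idea. It decides what happens to a proposed write action at each level. The names `TrustLevel`, `run_action`, and `send_email` are illustrative placeholders, not the API of any real tool or harness. Reading is allowed at every level and is not modelled here. Only actions that change something are gated.

```python
from enum import Enum
from typing import Callable


class TrustLevel(Enum):
    # Hypothetical trust levels, mirroring the glossary terms.
    READ_ONLY = 1  # the agent may read, but every write action is blocked
    SHADOW = 2     # the agent drafts the action; a human must confirm it
    WRITE = 3      # the agent may carry out the action itself


def run_action(
    level: TrustLevel,
    description: str,
    execute: Callable[[], str],
    confirm: Callable[[str], bool],
) -> str:
    """Gate a proposed write action by trust level (illustrative sketch only)."""
    if level is TrustLevel.READ_ONLY:
        # Read-only: never change anything in the connected system.
        return f"blocked: {description}"
    if level is TrustLevel.SHADOW:
        # Shadow mode: show the draft and only execute if the human approves.
        if not confirm(description):
            return f"drafted, not sent: {description}"
    # Write access, or shadow mode with approval: carry out the action.
    return execute()


def send_email() -> str:
    # Placeholder for a real action such as sending an email.
    return "sent"


if __name__ == "__main__":
    print(run_action(TrustLevel.READ_ONLY, "email Sam", send_email, lambda _: True))
    print(run_action(TrustLevel.SHADOW, "email Sam", send_email, lambda _: False))
    print(run_action(TrustLevel.SHADOW, "email Sam", send_email, lambda _: True))
    print(run_action(TrustLevel.WRITE, "email Sam", send_email, lambda _: False))
```

The `confirm` function stands in for you reviewing the draft. In shadow mode nothing happens until it returns `True`. Moving an agent up a level is a single deliberate change, which matches the advice to earn write access slowly.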
Next steps

Where to go from here.

  1. Build your first agent this week, following the day-by-day plan.
  2. Use it for two weeks, in earnest, on real work.
  3. Come back to this guide. The parts that seemed abstract on first reading will be the ones you most want to revisit.
  4. Add your second agent. Notice how much faster it goes.

The tools will keep changing. New ones will arrive every quarter. Models will get better. None of that will require you to start over. Your operating system travels with you, and every new capability that arrives lands on the same patient foundation.

Welcome to the calm side of agentic AI. There is no rush. The work compounds.

If you'd like more than the page

One-time, or ongoing.

Starter Kit

£29 one-time

"Give me the files."

  • The complete folder, worked example, tool variants
  • Includes 30 days of Agent OS Updates
  • Yours to keep, even if you don't continue

A free monthly newsletter is also available. Role-specific packs are in development.