Series: Build an Agent Harness

21 April 2026, 32 min read

Have You Built an Agent Harness Yet?

For years I have repeated a thing that I still believe. Every programmer should write a promise library once. I think agent harnesses are the 2026 version of that exercise.

If you use AI coding tools every day, and especially if you have opinions about agent workflows, subagents, MCPs, commands, skills, and whatever new ritual the week invented, build a tiny one yourself. Once. Not to ship it. Not because you are going to beat the existing ones, although honestly, with so much vibed slop around it would not be that hard. The point is that after you build one, the whole space stops feeling mystical. You stop thinking in terms of what big corp marketing wants and start thinking in terms of simple reality.

There is a model. There is a loop. There are tools. There is context. That is the heart of it.

Of course the real products do much more. But they are still built around a core that is much simpler than people think.

What is an Agent Harness

A harness is just the little program that sits between the model and the outside world. That is it. Yes, a normal program with normal code. I guess nowadays we call it “classic”.

The model, the AI, does not touch your filesystem. The model does not open files. The model does not run commands. It only autocompletes text.

It is the harness that decides what messages get sent to the model, what context gets included, how the replies are interpreted, what tools the model has to see the world, and what result gets sent back after something happens locally.

That also means the conversation is yours.

This is one of the big things I think people should internalize. The continuity of the chat, the remembered context, the tool results, all of that is managed by the harness. The model endpoint is not secretly keeping your whole little world alive for you, and it doesn’t have your whole project in memory. The harness keeps appending messages and resending what matters. Sometimes even removing old ones!

Now, to be fair, modern APIs offer more stateful variants and convenience helpers on the server side. That changes the ergonomics (for the worse, one could argue) but not the core mental model. Somebody still owns the conversation contract, and when you build your own harness that somebody is you.

So the smallest possible mental model is this:

You send a message.
The harness, the program you are running and interacting with, adds a bunch of context that the user doesn’t see, what’s often called the “system prompt”, and sends that to the LLM.
The model replies with text, by autocompleting from the last message it has received, which includes the entire history.
Your program decides what that text means.
If needed, your program does something in the world.
The result goes back as more text and context.

That is the whole trick. No magic. No AGI.

Let’s build a simple Harness

I can describe what a harness is, but to make sure we internalize it let’s build a simple one, from scratch.

To start, let’s keep the first step easy and simple. Let’s just make a CLI app that lets you send messages to the AI and shows you the LLM responses. Very simple, but useful to see how this works if you’ve never seen it before, and a necessary step before we get into the proper agentic features.

Let’s start with a simple CLI Swift package. Nothing fancy. No framework for agents. No giant abstraction tower. Just swift package init --type executable and the bits we actually need. As usual for command line tools in Swift, I used swift-argument-parser. That gives me a proper executable entrypoint with typed arguments.

At this stage the executable needs only a few things:

The base URL of the LLM chat-completions endpoint.
The model name.
An API key, loaded from the environment.

That is already enough to talk to an LLM.

For this post we will assume we have access to some LLM API, which in my project is represented by OpenAICompatibleClient. It’s not the interesting part of the project, but it is useful glue. It takes messages, performs the HTTP call to an OpenAI-compatible endpoint, and gives us back the assistant text. Don’t think it does anything special, it’s literally just an HTTP request-response.
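To make the "just an HTTP request-response" claim concrete, here is a minimal sketch of the two pure halves of such a client: building the request and pulling the assistant text out of the reply. The real OpenAICompatibleClient is not shown in this post, so the names ChatRequest, ChatResponse, makeChatRequest and assistantText are hypothetical. The JSON shapes follow the usual OpenAI-style chat-completions format, and the sketch assumes the reply always carries a plain string as content.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux
#endif

// One entry of the conversation, in the shape chat-completions endpoints expect.
struct Message: Codable, Equatable {
    let role: String
    let content: String
}

// Request and response bodies, reduced to the fields this sketch needs.
struct ChatRequest: Encodable {
    let model: String
    let messages: [Message]
}

struct ChatResponse: Decodable {
    struct Choice: Decodable {
        let message: Message
    }
    let choices: [Choice]
}

// Hypothetical helper: builds the POST request that carries the whole conversation.
func makeChatRequest(endpoint: URL, apiKey: String, model: String, messages: [Message]) throws -> URLRequest {
    var request = URLRequest(url: endpoint)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(ChatRequest(model: model, messages: messages))
    return request
}

// Hypothetical helper: extracts the assistant text from the response body.
func assistantText(from data: Data) throws -> String {
    let response = try JSONDecoder().decode(ChatResponse.self, from: data)
    return response.choices.first?.message.content ?? ""
}
```

A send(messages:) method would hand the request to URLSession and pass the returned data to assistantText(from:). Everything else about the client is plumbing.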

The SwiftAgentHarness app is just an AsyncParsableCommand that reads the arguments, validates the endpoint, loads the API key from the environment, instantiates the client, creates an Agent, and calls run():

import ArgumentParser
import Foundation

@main
struct SwiftAgentHarness: AsyncParsableCommand {
    @Option(help: "LLM chat completions endpoint.")
    var baseURL: String

    @Option(help: "Model name to use.")
    var model = "gpt-5.4"

    mutating func run() async throws {
        let environment = ProcessInfo.processInfo.environment
        guard let apiKey = environment["LLM_API_KEY"], apiKey.isEmpty == false else {
            throw HarnessError.missingAPIKey
        }
        guard let endpoint = URL(string: baseURL) else {
            throw HarnessError.invalidArguments("Invalid base URL: \(baseURL)")
        }

        let client = OpenAICompatibleClient(apiKey: apiKey, baseURL: endpoint, model: model)
        let agent = Agent(client: client)
        try await agent.run()
    }
}

At this point you can see our agent harness only needs the API client, nothing more. Let’s now make this Agent do something very basic.

struct Agent {
    let client: OpenAICompatibleClient

    func run() async throws {
        print("Swift Agent Harness")
        print("Model: \(client.model)")
        print("Ctrl+C to quit")
    }
}

With this in place we can now iterate on the Agent itself.

Having a conversation

The first step is to set up what’s needed to have a conversation with the AI. A simple chat, nothing else.

The key thing to add now is the conversation structure itself. Not just reading one line and making one HTTP request, but actually keeping the state of the exchange in memory.

First, we need a system prompt. This is the hidden part of the context that the user never sees but that we, as the harness developers, can manipulate. It is an important lever for making the AI behave the way you want: it is part of the context, and as you know, context is all that matters (because, remember, it is all there is, the only thing the AI sees).

struct Agent {
    private let systemPrompt = "You are a helpful assistant."

Then we need to construct and keep around the conversation array. This is the state that we maintain to keep track of the whole conversation between the user and the AI (and later extra things our harness will do).

    
    func run() async throws {
        print("Swift Agent Harness")
        print("Model: \(client.model)")
        print("Ctrl+C to quit")

        var conversation = [Message(role: "system", content: systemPrompt)]

Then we need the loop that reads user input from the terminal.


        while true {
            print("\u{001B}[94mYou\u{001B}[0m: ", terminator: "")
            guard let input = readLine() else {
                break // End of input (Ctrl+D): stop instead of looping forever
            }
            guard input.isEmpty == false else {
                continue
            }

That gives us the human side of the conversation. But that input is not useful yet until we actually append it to the state that we keep around.


            conversation.append(Message(role: "user", content: input))

Now comes the important bit. We send the entire conversation, not just the last line typed by the user.


            let response = try await client.send(messages: conversation)

And once we get the response back, we print it and append it too.


            print("\u{001B}[93mAssistant\u{001B}[0m: \(response)")
            conversation.append(Message(role: "assistant", content: response))
        }
    }
}

And yes, this is the first place where the illusion starts to break. The only reason the AI feels like it remembers what you said two messages ago is because the harness, the app, keeps sending the history back.

That is already a harness. Not a tool-using one, not a coding agent yet, but a simple one that is just a chat. But definitely a harness.

And we can give it a spin!

Swift Agent Harness
Model: gpt-5.4
Ctrl+C to quit
You: Hello
Assistant: Hello! How can I help you today?
You: What can you tell me about this project.
Assistant: I’d be happy to help. Please share the project details—such as a description, code, repository link, files, screenshots, or goals—and I can explain:

- what the project does
- its architecture and components
- the technologies used
- how the code is organized
- likely strengths, risks, and next steps

If you want, you can paste the README, folder structure, or source files here.

This is why I wanted to start here. Before the model can read files or edit code or do anything that looks magical, it first has to live inside a very boring loop. Read input. Append message. Call model. Print output. Append response. Repeat.

This is where AI started, and where the ChatGPT revolution stayed for quite some time. You can see how we asked a question about the project, and it had no clue about it. There is no magic: the AI doesn’t know about our project and has no access to it, so it just replies asking for more context. Even today, many people still use AI this way, copy-pasting information into the context. But that’s not the revolution we expected.

So then, we gave it tools.

AI interacting with our world

So far we only have a chat. Useful, yes. But not very exciting. The AI can only talk about whatever is already in the context. It still cannot see our project or interact with the world around it.

This is the first real step into agentic territory. We need to give the model a tool.

The important thing here is that there is still no hidden magic API involved. Current models are good enough that you can often teach them a new local convention just by adding it to the context. So for our little harness we do not need any special provider feature. We just extend the system prompt and tell the model that if it wants to use a tool it must reply in a very specific format.

To keep the first step small, let’s add just one: read_file.

private let systemPrompt = """
You are a helpful assistant with access to one tool.

Tool:
Name: read_file
Description: Read a UTF-8 text file from disk.
Arguments: path

When you want to use the tool, reply with exactly one line in this format and nothing else:
TOOL_CALL {"name":"read_file","arguments":{"path":"some/file.txt"}}

After receiving a TOOL_RESULT message, continue the task.
If no tool is needed, answer normally.
"""

This is worth pausing on. The tool is not just a function in Swift. The tool is also part of the prompt. The model needs to know that it exists, what it is for, and what shape of arguments it expects. The better you describe the tool, the better the model can use it.

Then, after the model replies, we check if the reply starts with TOOL_CALL.

let response = try await client.send(messages: conversation)

let trimmed = response.trimmingCharacters(in: .whitespacesAndNewlines)
if trimmed.hasPrefix("TOOL_CALL ") {
    print("tool-call: \(trimmed)")

    let outcome: ToolOutcomeResult
    do {
        // Get the tool call text from the LLM
        let payload = String(trimmed.dropFirst(10))
        guard let data = payload.data(using: .utf8) else {
            throw HarnessError.invalidToolInvocation
        }
      
        // Parse the tool call text to the JSON contract we specified in the prompt
        let invocation: ToolInvocation
        do {
            invocation = try JSONDecoder().decode(ToolInvocation.self, from: data)
        } catch {
            throw HarnessError.invalidToolPayload(payload)
        }
      
        // Check for the only tool we have
        guard invocation.name == "read_file" else {
            throw HarnessError.invalidToolInvocation
        }

        // Run the "tool"
        let path = invocation.arguments["path"] ?? ""
        let url = URL(fileURLWithPath: path)
        let result = try String(contentsOf: url, encoding: .utf8)
        print("tool-success: read_file -> \(summarizeToolResult(result))")
        outcome = .success(result)
    } catch {
        let errorMessage = "Tool invocation failed: \(error.localizedDescription). Reply again, either with a valid TOOL_CALL line or a normal answer."
        print("tool-error: \(errorMessage)")
        outcome = .failure(errorMessage)
    }

    conversation.append(Message(role: "assistant", content: response))
    conversation.append(Message(role: "user", content: formatToolResult(outcome)))

    let followUp = try await client.send(messages: conversation)
    print("Assistant: \(followUp)")
    conversation.append(Message(role: "assistant", content: followUp))
    continue
}

And just like that now our harness gives more powers to the AI:

You: Read the `README.md` file and summarize it
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"README.md"}}
tool-success: read_file -> 17 lines
Assistant: `swiftagentharness` is a Swift command-line project built with Swift Package Manager and Apple’s ArgumentParser.

Key points:
- Requires Swift tools 6.2
- Requires macOS 26 or later
- Build with: `swift build`
- Run with: `swift run`
- Optionally run via: `./run.sh`

Again, no magic. The model returns text. The harness interprets that text as a tool call. Then the harness decides to execute something locally. The tool-call and tool-success lines are just for the user, and for debugging in the terminal. What goes back to the AI is the TOOL_RESULT message, which includes the actual result together with a tiny SUCCESS or ERROR protocol we invented, just to make it more clear to the AI. With that new context in place, the model can then reply with the summary of the file as we asked.

The ToolOutcomeResult definition

The only extra thing I did allow myself is one small enum to represent the outcome.

enum ToolOutcomeResult {
    case success(String)
    case failure(String)
}

Doesn’t matter much, but I like having these things properly modeled.
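The loop above also leans on a few small pieces that are not spelled out: ToolInvocation, formatToolResult and summarizeToolResult. The following is a minimal sketch of what they could look like. The ToolInvocation shape follows the JSON contract from the system prompt. The exact TOOL_RESULT wire format and the summary rule are assumptions, guessed from the description and from the terminal logs, not the author’s actual code.

```swift
import Foundation

// Mirrors the JSON contract from the system prompt:
// {"name":"read_file","arguments":{"path":"README.md"}}
struct ToolInvocation: Decodable {
    let name: String
    let arguments: [String: String]
}

enum ToolOutcomeResult {
    case success(String)
    case failure(String)
}

// Assumed wire format: the post only says the message is called TOOL_RESULT
// and carries a SUCCESS or ERROR marker together with the actual result.
func formatToolResult(_ outcome: ToolOutcomeResult) -> String {
    switch outcome {
    case .success(let output):
        return "TOOL_RESULT SUCCESS\n\(output)"
    case .failure(let message):
        return "TOOL_RESULT ERROR\n\(message)"
    }
}

// Assumed rule for the terminal log: show a single line as-is,
// otherwise just a line count, so a long file does not flood the screen.
func summarizeToolResult(_ result: String) -> String {
    let lines = result.split(whereSeparator: \.isNewline)
    return lines.count == 1 ? String(lines[0]) : "\(lines.count) lines"
}
```

Whatever the exact format, the point is that the model only ever sees the string returned by formatToolResult. The summary is purely for the human watching the terminal.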

Yes, that really is it. Now the AI has access to a small part of our world. The moment you do this, the conversation changes. Before, if you asked about a local file the AI had to ¯\_(ツ)_/¯ and ask you to paste it. Now it can ask the harness to fulfill the little contract you gave it.

And notice what happened here. We did not give the AI direct access to the file system. We did not open a mystical portal into the machine. We only gave it a way to request something in text, and we wrote the normal boring Swift code that fulfills that request.

And to make it even clearer, here is a real run from this version of the harness. There is a lot to learn from it:

Swift Agent Harness
Model: gpt-5.4
Ctrl+C to quit
You: tell me about this project
Assistant: Sure. If you want me to describe a specific project, I need to inspect its files first.

Please give me:
- the project path, or
- the main file to read

You: readme
Assistant: Please provide the file path, e.g. `README.md` or `./README.md`, and I’ll read it and summarize the project.

You: README.md
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"README.md"}}
tool-success: read_file -> 17 lines
Assistant: This project is `swiftagentharness`, a Swift command-line app. ...

You: now read the run file
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"run.sh"}}
tool-success: read_file -> 15 lines
Assistant: `run.sh` is a convenience launcher for the CLI. ...

When I ask vaguely about the project, the AI still does not know anything about it. That part has not changed. It only knows what is in the conversation. But now it knows it has a way to learn. It cannot inspect the whole project by magic, but it knows it can read files if I give it a path.

But it knows it only has access to reading single files, so note that when I say readme, it pushes back and asks for something more specific. It doesn’t have enough context to make up a proper tool call. That is nice. It means the model is trying to operate within the contract we gave it instead of hallucinating a path.

Then I give it a proper file name README.md, it uses the read tool, and suddenly that file is now part of the conversation. Thanks to that, I can be a bit more ambiguous and say “the run file” and it understands I probably mean run.sh, because that is now in context too.

This is one of those moments where the illusion becomes very educational. The AI did not gain general awareness. It just accumulated a bit more text in the conversation, text that came from a tool call your harness fulfilled.

Models being nice

One funny thing here is that this first version did not stay stable for very long. As is typical with LLMs, the output is not consistent: the same prompt can produce differently shaped replies. It is part of their nature. In this case, it’s actually a good thing.

Soon enough I hit errors like this:

tool-error: Tool invocation failed: Tool invocation payload could not be decoded: {"name":"read_file","arguments":{"path":"README.md"}}
I don’t have the contents of `README.md` yet. Please provide the file contents or ensure the tool result is available, and I’ll summarize it.. Reply again, either with a valid TOOL_CALL line or a normal answer.

At first this looks confusing, because the JSON in there seems totally valid. The problem was that the model, in this case GPT, was trying to be nice. Instead of replying with only the exact TOOL_CALL line I asked for, it sometimes produced an extra little explanatory sentence. Good for the user, not so much for our parsing code.
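The failure is easy to reproduce in isolation. This small sketch assumes the same ToolInvocation shape as the prompt contract (the decodes helper is hypothetical) and shows that the decoder accepts the strict payload but rejects the same JSON once a friendly sentence trails after it:

```swift
import Foundation

struct ToolInvocation: Decodable {
    let name: String
    let arguments: [String: String]
}

// Hypothetical helper: true when the text after "TOOL_CALL " decodes as one JSON object.
func decodes(_ payload: String) -> Bool {
    (try? JSONDecoder().decode(ToolInvocation.self, from: Data(payload.utf8))) != nil
}

let strict = #"{"name":"read_file","arguments":{"path":"README.md"}}"#
// Same JSON, followed by the kind of extra sentence the model added.
let chatty = strict + "\nI don’t have the contents of `README.md` yet."

print(decodes(strict)) // true
print(decodes(chatty)) // false: the trailing sentence is not valid JSON
```

So the JSON itself was fine. The payload handed to the decoder was the JSON plus everything that followed it.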

And honestly, that is a very model thing to do. Modern models are trained to be helpful, conversational, and a bit eager to explain themselves. They are not trying to break your harness. They are trying to be polite.

So this is a good moment to improve the code a bit. Not because the architecture changed, but because the real world showed us a new shape of output. This is a whole rabbit hole that makes building a production-ready harness a bit more complex. We would need to tailor the system prompt for every model, improve our parsing, and apply other engineering techniques that are out of scope for this post.

But at least, let’s make it work for this use case so we can continue learning. Instead of assuming the whole response starts with TOOL_CALL, we now parse the assistant response line by line.

private func parseAssistantResponse(from response: String) throws -> ParsedAssistantResponse {
    let lines = response
        .split(whereSeparator: \.isNewline)
        .map { String($0).trimmingCharacters(in: .whitespaces) }
        .filter { $0.isEmpty == false }

    var userFacingLines = [String]()
    var invocation: ToolInvocation?

    for line in lines {
        guard line.hasPrefix("TOOL_CALL ") else {
            userFacingLines.append(line)
            continue
        }

        let payload = String(line.dropFirst(10))
        guard let data = payload.data(using: .utf8) else {
            throw HarnessError.invalidToolInvocation
        }
        do {
            invocation = try JSONDecoder().decode(ToolInvocation.self, from: data)
        } catch {
            throw HarnessError.invalidToolPayload(payload)
        }
    }

    return ParsedAssistantResponse(
        userFacingText: userFacingLines.joined(separator: "\n"),
        invocation: invocation
    )
}

Then the loop can show the assistant text to the user first, and still execute the tool if there is one.

let response = try await client.send(messages: conversation)
let parsedResponse = try parseAssistantResponse(from: response)

if parsedResponse.userFacingText.isEmpty == false {
    print("Assistant: \(parsedResponse.userFacingText)")
}

if let invocation = parsedResponse.invocation {
    // execute tool
}

This is a bit of the secret sauce of every harness. Not secret magic. Not genius algorithms. Just more cases of text interpretation, because at the end of the day that is still what the harness is doing. AI dumps text, and somebody needs to deal with it to create the illusion.

Let it explore

At this point the next obvious limitation appears. read_file is useful, but it still depends too much on the human already knowing what file should be read.

The harness can inspect.

But it still cannot explore.

That is why the next tool to add is list_files.

And this is also the moment where the code starts earning a tiny bit of structure. Having one hardcoded tool inline was nice for the first learning step, but with a second tool it already makes sense to generalize a little.

So instead of special-casing everything directly in the loop, we will define a small tool type.

struct ToolDefinition {
    let name: String
    let description: String
    let arguments: [String]
    let run: ([String: String]) throws -> String
}

Then in the app entrypoint we can define the available tools explicitly.

let tools: [ToolDefinition] = [
    .readFile(),
    .listFiles(),
]
let agent = Agent(client: client, tools: tools)

And because the tools are now dynamic, the system prompt should be dynamic too. Instead of hardcoding the tool descriptions by hand, the Agent now builds the tool section of the prompt from the actual ToolDefinition values it receives.

init(client: OpenAICompatibleClient, tools: [ToolDefinition]) {
    self.client = client
    self.toolsByName = Dictionary(uniqueKeysWithValues: tools.map { ($0.name, $0) })

    let toolsPrompt = tools
        .sorted { $0.name < $1.name }
        .map(\.promptBlock)
        .joined(separator: "\n\n")

    self.systemPrompt = """
        You are a helpful assistant with access to tools.

        Tools:
        \(toolsPrompt)

        When you want to use the tool, reply with exactly one line in this format and nothing else:
        TOOL_CALL {"name":"tool_name","arguments":{"path":"some/path"}}

        After receiving a TOOL_RESULT message, continue the task.
        If no tool is needed, answer normally.
        """

That is an important little detail. It means the prompt does not drift away from reality. If I add a tool in Swift but forget to tell the model about it or if I tell the model about a tool that is not really available, the harness will be lying and the model will freak out. Building the prompt from the tool definitions keeps both sides synchronized.
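The promptBlock property used in that initializer is not shown, so here is one possible sketch of it. The layout is an assumption, copied from the hand-written read_file prompt earlier in the post, and the no-op run closures are placeholders for the demo only.

```swift
struct ToolDefinition {
    let name: String
    let description: String
    let arguments: [String]
    let run: ([String: String]) throws -> String
}

extension ToolDefinition {
    // Assumed layout: same three fields as the hand-written prompt from before.
    var promptBlock: String {
        """
        Name: \(name)
        Description: \(description)
        Arguments: \(arguments.joined(separator: ", "))
        """
    }
}

// Placeholder tools: the run closures do nothing, only the metadata matters here.
let tools = [
    ToolDefinition(name: "read_file", description: "Read a UTF-8 text file from disk.", arguments: ["path"], run: { _ in "" }),
    ToolDefinition(name: "list_files", description: "List files and directories in a path.", arguments: ["path"], run: { _ in "" }),
]

let toolsPrompt = tools
    .sorted { $0.name < $1.name }
    .map(\.promptBlock)
    .joined(separator: "\n\n")

print(toolsPrompt)
```

Sorting by name keeps the prompt stable regardless of the order in which tools are registered, so the same set of tools always yields the same system prompt.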

With this in place we can extend the ToolDefinition and use it as a way to host the different tools we have.

extension ToolDefinition {
    static func readFile() -> Self { ... }

The list_files tool itself is still very simple.

static func listFiles() -> Self {
    let fileManager = FileManager.default

    return ToolDefinition(
        name: "list_files",
        description: "List files and directories in a path.",
        arguments: ["path"],
        run: { arguments in
            let path = arguments["path"] ?? "."
            let url = URL(fileURLWithPath: path)
            let values = try fileManager.contentsOfDirectory(
                at: url,
                includingPropertiesForKeys: [.isDirectoryKey],
                options: [.skipsHiddenFiles]
            )
            let lines = try values
                .sorted { $0.lastPathComponent < $1.lastPathComponent }
                .map { entry in
                    let isDirectory = try entry.resourceValues(forKeys: [.isDirectoryKey]).isDirectory ?? false
                    return isDirectory ? entry.lastPathComponent + "/" : entry.lastPathComponent
                }
            return lines.joined(separator: "\n")
        }
    )
}

Notice what happened here. The architecture did not change. We still have the same loop, the same text protocol, the same TOOL_CALL, the same TOOL_RESULT.

What changed is just the model’s ability to gather context by itself.
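With tools stored by name, the "execute tool" step in the loop can shrink to a dictionary lookup plus error mapping. This is a minimal sketch under that assumption. The execute function, the error wording and the two toy tools are hypothetical, and the types are reduced to the fields the sketch needs.

```swift
import Foundation

struct ToolDefinition {
    let name: String
    let run: ([String: String]) throws -> String
}

struct ToolInvocation {
    let name: String
    let arguments: [String: String]
}

enum ToolOutcomeResult: Equatable {
    case success(String)
    case failure(String)
}

// Hypothetical dispatcher: look the tool up by name, run it, and turn any
// problem into text the model can read and react to.
func execute(_ invocation: ToolInvocation, using toolsByName: [String: ToolDefinition]) -> ToolOutcomeResult {
    guard let tool = toolsByName[invocation.name] else {
        return .failure("Unknown tool: \(invocation.name)")
    }
    do {
        return .success(try tool.run(invocation.arguments))
    } catch {
        return .failure("Tool invocation failed: \(error.localizedDescription)")
    }
}

// Toy tools for the demo only.
struct DemoFailure: Error {}
let demoTools = [
    ToolDefinition(name: "echo_path", run: { arguments in arguments["path"] ?? "." }),
    ToolDefinition(name: "always_fails", run: { _ in throw DemoFailure() }),
]
let toolsByName = Dictionary(uniqueKeysWithValues: demoTools.map { ($0.name, $0) })

print(execute(ToolInvocation(name: "echo_path", arguments: ["path": "Sources"]), using: toolsByName))
print(execute(ToolInvocation(name: "delete_everything", arguments: [:]), using: toolsByName))
```

Returning a failure value instead of throwing keeps the loop simple: every outcome, good or bad, becomes one more message in the conversation.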

This is a much bigger jump than it may look. read_file gave the AI a flashlight but list_files gives it a map.

You: List the names of the files in this project.
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":"."}}
tool-success: list_files -> 5 lines
Assistant: Package.resolved
Package.swift
README.md
Sources/
run.sh
You: summarize the readme
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"README.md"}}
tool-success: read_file -> 17 lines
Assistant: The README says this is a Swift ...

Nice! Now the user doesn’t have to specify the exact file names and paths because the model can explore directories itself.

AI needs longer turns

The example above was a bit forced to show the happy path. The reality is that we have a problem with our harness.

You: explore this project and tell me what is about
Assistant: I need to inspect the project files first. Please provide the tool results for the directory listing so I can explore it and summarize what the project is about.
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":".\/"}}
tool-success: list_files -> 5 lines
Assistant: TOOL_CALL {"name":"read_file","arguments":{"path":"README.md"}}
You:

Notice how we asked the agent a more open-ended question, and it correctly recognized that to answer it needed to explore the project, which it did by using the list_files tool correctly. Then the harness sent back the result of that tool call just like we’ve done before. But hold on: the agent didn’t reply with text, it replied with another tool call!

This brings us to the next piece of our learning: turns and steps.

In this context, a turn covers one user request and the agent’s response to that request. A step is each individual model call the agent needs along the way to answer it.

When we started with the simple chat model, we had 1 step per 1 turn. User sends a message, the agent replies. When we added tool calling, we changed that a bit. User sends a message, the agent replies with a tool call, harness sends the result and agent replies with the final response. So there was an intermediate step in there, but that was it. And that’s not enough because agents often need multiple tool calls, multiple roundtrips to accomplish the goal the user asked for.

In this case it knew that it had to explore the project. Once it had the list of files, it wanted to read a specific file, but our harness is not ready for that yet: it printed the second tool call as if it were the answer, then stopped and waited for user input.

The problem is that right now our harness follows a single simple loop. We need to put the tool calling in an inner loop so the agent can take multiple steps in a single turn. We want the agent to be able to keep requesting tool calls until it is satisfied and gives a final answer.

In other words, a single turn can include multiple steps before the agent can give the final answer to the user.

So the code grows one inner loop.

var turn = 0
while true {
    turn += 1
    print("<Turn \(turn)>")
    ...
    var step = 0

    while true { // <= new inner loop
        step += 1
        print("<Step \(step)>")
        let response = try await client.send(messages: conversation)
        let parsedResponse = try parseAssistantResponse(from: response)

        if parsedResponse.userFacingText.isEmpty == false {
            ...
        }

        if let invocation = parsedResponse.invocation {
            ...
            conversation.append(Message(role: "assistant", content: response))
            conversation.append(Message(role: "user", content: formatToolResult(outcome)))
            continue
        }

        conversation.append(Message(role: "assistant", content: response))
        break
    }
}

That is a really important moment in understanding how agents work.

Let’s see how this change impacts our previous example.

Note that I’ve added turn and step logs so it’s clearer when there are roundtrips.

<Turn 1>
You: explore this project and tell me what is about
<Step 1>
Assistant: I need the project files to inspect it, but I don’t have the directory listing yet. Please provide the repository contents or let me know the project path, and I’ll explore it and summarize what it’s about.
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":".\/"}}
tool-success: list_files -> 5 lines
<Step 2>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"README.md"}}
tool-success: read_file -> 17 lines
<Step 3>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"Package.swift"}}
tool-success: read_file -> 22 lines
<Step 4>
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":"Sources"}}
tool-success: list_files -> swiftagentharness/
<Step 5>
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":"Sources\/swiftagentharness"}}
tool-success: list_files -> 3 lines
<Step 6>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"Sources\/swiftagentharness\/App.swift"}}
tool-success: read_file -> 34 lines
<Step 7>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"Sources\/swiftagentharness\/Harness.swift"}}
tool-success: read_file -> 223 lines
<Step 8>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"Sources\/swiftagentharness\/OpenAICompatibleClient.swift"}}
tool-success: read_file -> 39 lines
<Step 9>
Assistant: This project is a small Swift command-line “agent harness” for talking to an OpenAI-compatible chat completions API and letting the model use a couple of local filesystem tools.
What it does
...
<Turn 2>
You: Thanks!
<Step 1>
Assistant: You’re welcome!
<Turn 3>
You:

The model is not magically traversing your project in one giant sweep. It is taking little steps. List files. See the result. Decide what file matters. Read that file. See the result. Then decide if it can answer.

This is why I wanted to show the failing version first. Once you see the harness stop too early, the need for the inner loop becomes obvious.
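The turn and step mechanics can also be exercised without any network at all. In this sketch a scripted fakeModel stands in for the LLM and the tool results are canned, so everything here is hypothetical scaffolding. The maxSteps cap is an extra assumption of mine, a cheap guard against a model that never stops asking for tools; the loop in the post has no such limit.

```swift
struct Message {
    let role: String
    let content: String
}

// Scripted stand-in for the model: it asks for tools until it has seen two tool results.
func fakeModel(_ conversation: [Message]) -> String {
    let results = conversation.filter { $0.content.hasPrefix("TOOL_RESULT") }.count
    switch results {
    case 0: return #"TOOL_CALL {"name":"list_files","arguments":{"path":"."}}"#
    case 1: return #"TOOL_CALL {"name":"read_file","arguments":{"path":"README.md"}}"#
    default: return "This project is a small Swift CLI."
    }
}

// Runs one turn and returns how many steps it took.
func runTurn(userInput: String, conversation: inout [Message], maxSteps: Int = 20) -> Int {
    conversation.append(Message(role: "user", content: userInput))
    var step = 0
    while step < maxSteps {
        step += 1
        let response = fakeModel(conversation)
        conversation.append(Message(role: "assistant", content: response))
        // A reply without a tool call is the final answer and ends the turn.
        guard response.hasPrefix("TOOL_CALL ") else { break }
        // A real harness would run the tool here; the sketch feeds back a canned result.
        conversation.append(Message(role: "user", content: "TOOL_RESULT SUCCESS\n(canned output)"))
    }
    return step
}

var conversation = [Message(role: "system", content: "You are a helpful assistant with access to tools.")]
let steps = runTurn(userInput: "explore this project", conversation: &conversation)
print("steps: \(steps), messages: \(conversation.count)") // steps: 3, messages: 7
```

One user message produced three model calls and seven messages of history. That growth, one tool result at a time, is all that "exploring" means.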

Giving It Boundaries

Now that the agent can list and read files on its own, we need to showcase the dangers of these tools.

<Turn 1>
You: Can you explore this project, and then find similar projects in this computer.
<Step 1>
Assistant: I can do that, but I need access to the project files first.
Please provide the project path, for example:
- `.`
- `./my-project`
- `/home/user/project`
Once I have the path, I’ll:
1. explore the project structure,
2. identify what kind of project it is,
3. search nearby/on this computer for similar projects.
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":".\/"}}
tool-success: list_files -> 5 lines
<Step 2>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"Package.swift"}}
tool-success: read_file -> 22 lines

...

<Step 9>
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":".."}}
tool-success: list_files -> 28 lines
<Step 10>

...

<Step 23>
Assistant: I explored the current project and found a few similar ones nearby.
Current project: `swiftagentharness`
- Type: Swift Package Manager executable
...
Similar projects on this computer
...
<Turn 2>
You:

Wow. I had to redact some things, but you get the gist. The agent started listing files outside the directory we’re running it in, using the magical .. path. Sure, of course I asked for it, but you could imagine the LLM doing it of its own volition.

Now this is a whole rabbit hole about security that is still an open question, only really solved with full isolation, aka running the agent on its own computer, a sandbox. I’m not gonna pretend to solve this in this exploratory blog post, but we can use this again as an excuse to learn more about the lack of magic.

The first thing we need is to introduce the concept of a workspace root into the CLI itself.

@Option(help: "Workspace root for the agent.")
var workspace = FileManager.default.currentDirectoryPath

let workspaceRoot = try resolveWorkspaceRoot(from: workspace)
let agent = Agent(client: client, tools: tools, workspaceRoot: workspaceRoot)

That gives the harness an explicit root directory to work from. Agent.init gains a workspaceRoot: URL parameter, and ToolDefinition.run changes signature to receive it as well:

struct ToolDefinition {
    ...
    let run: ([String: String], URL) throws -> String
}

That means every tool can call resolvePath to validate paths before touching the filesystem. And now that the root is real in code, we can also tell the model about it in the prompt.

self.systemPrompt = """
    You are a helpful assistant with access to tools.

    The workspace root is:
    \(workspaceRoot.path)

    Tools:
    \(toolsPrompt)
    ...
    """

This is useful. It gives the model more context and helps it avoid asking for nonsense paths. But this is just guidance. It does nothing to actually protect you.

The real protection, albeit simplistic, is in the tool execution layer.

private static func resolvePath(_ path: String, workspaceRoot: URL) throws -> URL {
    // Absolute paths are taken as-is; relative ones are resolved against the workspace root.
    let candidate = path.hasPrefix("/")
        ? URL(fileURLWithPath: path)
        : workspaceRoot.appendingPathComponent(path)
    // Standardizing collapses "." and ".." so the comparison works on a normalized path.
    let resolved = candidate.standardizedFileURL
    let rootPath = workspaceRoot.standardizedFileURL.path
    let resolvedPath = resolved.path

    // Allow the root itself or anything below it.
    guard resolvedPath == rootPath || resolvedPath.hasPrefix(rootPath + "/") else {
        throw HarnessError.workspaceViolation(path)
    }

    return resolved
}

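That tiny function resolves the path the agent asks for and throws if the result escapes the workspace. The trailing slash in the prefix check matters: without it, a sibling directory such as swiftagentharness-evil would pass as if it were inside swiftagentharness.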

Not security advice. I’m sure there are a thousand ways of breaking this. It is just an example for this blog post.
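To see the check in action, here is a small sketch that runs it against a handful of paths. The function body is repeated from above so the snippet runs on its own, as a free function instead of a static method. The isAllowed helper, the sample paths, and the one-case HarnessError are assumptions made up for the demo.

```swift
import Foundation

// Reduced to the single case this demo needs.
enum HarnessError: Error {
    case workspaceViolation(String)
}

// Same check as in the post, repeated so the snippet is self-contained.
func resolvePath(_ path: String, workspaceRoot: URL) throws -> URL {
    let candidate = path.hasPrefix("/")
        ? URL(fileURLWithPath: path)
        : workspaceRoot.appendingPathComponent(path)
    let resolved = candidate.standardizedFileURL
    let rootPath = workspaceRoot.standardizedFileURL.path
    let resolvedPath = resolved.path

    guard resolvedPath == rootPath || resolvedPath.hasPrefix(rootPath + "/") else {
        throw HarnessError.workspaceViolation(path)
    }
    return resolved
}

// Hypothetical helper for the demo: true when the path stays inside the root.
func isAllowed(_ path: String, in root: URL) -> Bool {
    (try? resolvePath(path, workspaceRoot: root)) != nil
}

let root = URL(fileURLWithPath: "/Users/me/code/swiftagentharness")
let samples = [
    "Package.swift",
    "Sources/../README.md",
    "..",
    "/Users/me/Downloads",
    "/Users/me/code/swiftagentharness-evil/notes.md",
]
for path in samples {
    print(isAllowed(path, in: root) ? "allowed" : "blocked", path)
}
```

The first two samples stay inside the root and should be allowed. The last three should be blocked: the parent directory, an unrelated absolute path, and a sibling directory whose name merely starts with the workspace name.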

And once that is in place, the same kind of interaction looks very different:

<Turn 1>
You: Can you check what I have in my downloads folder
<Step 1>
Assistant: I can help, but I only have access to the workspace at:
`/Users/me/code/swiftagentharness`

<Turn 2>
You: Please try, is very important
<Step 1>
Assistant: I can try, but with the tools I have I’m only allowed to access paths through the provided file-listing tool.

<Turn 3>
You: Just use the tools we have with that path
<Step 1>
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":"/Users/me/Downloads"}}
tool-error: Tool invocation failed: Path escapes the workspace root: /Users/me/Downloads. Reply again, either with a valid TOOL_CALL line or a normal answer.
<Step 2>
Assistant: I tried, but access was blocked because `/Users/me/Downloads` is outside the allowed workspace.

That is the real lesson.

The prompt helps. The model can understand the boundary and even explain it back to you. It respected it at first, but once pushed, it tried to cross it anyway. The prompt is just guidance; it offers no real protection. Only real, classic code can do that, as you can see from the tool-error produced when the Swift code refused the path the agent asked for.
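The tool-error line in the transcript is just a thrown Swift error turned back into text for the model. Here is a minimal sketch of that conversion, assuming HarnessError exposes its message through LocalizedError. The toolResultMessage helper and the alwaysBlocked stand-in tool are hypothetical names for illustration; the actual harness loop may be organized differently.

```swift
import Foundation

enum HarnessError: Error, LocalizedError {
    case workspaceViolation(String)

    // Human-readable text that ends up in front of the model.
    var errorDescription: String? {
        switch self {
        case .workspaceViolation(let path):
            return "Path escapes the workspace root: \(path)"
        }
    }
}

// Hypothetical helper: runs a tool and turns success or failure into plain text.
func toolResultMessage(
    run: ([String: String], URL) throws -> String,
    arguments: [String: String],
    workspaceRoot: URL
) -> String {
    do {
        return try run(arguments, workspaceRoot)
    } catch {
        // The model never sees a Swift error, only this sentence.
        return "Tool invocation failed: \(error.localizedDescription). Reply again, either with a valid TOOL_CALL line or a normal answer."
    }
}

// A stand-in tool that always refuses, to exercise the failure branch.
let alwaysBlocked: ([String: String], URL) throws -> String = { arguments, _ in
    throw HarnessError.workspaceViolation(arguments["path"] ?? "")
}

let message = toolResultMessage(
    run: alwaysBlocked,
    arguments: ["path": "/Users/me/Downloads"],
    workspaceRoot: URL(fileURLWithPath: "/Users/me/code/swiftagentharness")
)
print(message)
```

The refusal happens in code before any text is produced. The model only gets to read about it afterwards.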

And yes, even this is still only a lightweight boundary. A real production-grade security story goes much further than path checks. But for understanding how these systems work, this is a perfect example. The magic disappears and what remains is just normal software engineering.

Time to change the world

Now that there is a bit of safety, it’s time to let our agent change the world. It can read and explore, but we need to let it edit files for it to be useful.

As we keep doing in this journey, we need to keep the tool simple. We just need the agent to give us three things: the path, the text it wants to replace, and the updated version of that text. If the old text is empty, we just create the file for it.

Here is the tool definition:

static func editFile() -> Self {
    let fileManager = FileManager.default

    return ToolDefinition(
        name: "edit_file",
        description: "Create a file when old_text is empty, or replace exactly one matching old_text with new_text.",
        arguments: ["path", "old_text", "new_text"],
        run: { arguments, workspaceRoot in
            let path = arguments["path"] ?? ""
            let oldText = arguments["old_text"] ?? ""
            let newText = arguments["new_text"] ?? ""
            let url = try resolvePath(path, workspaceRoot: workspaceRoot)

            // Empty old_text means "create": write new_text to the path,
            // creating parent directories. An existing file is overwritten.
            if oldText.isEmpty {
                let directory = url.deletingLastPathComponent()
                try fileManager.createDirectory(at: directory, withIntermediateDirectories: true)
                try newText.write(to: url, atomically: true, encoding: .utf8)
                return "Created \(path)"
            }

            // Count occurrences of old_text; the edit only applies when there is exactly one.
            let content = try String(contentsOf: url, encoding: .utf8)
            let matches = content.components(separatedBy: oldText).count - 1
            guard matches == 1 else {
                // Returned as a normal result so the model can read it and retry.
                return "Edit failed. Expected exactly one match for old_text, found \(matches)."
            }

            let updated = content.replacingOccurrences(of: oldText, with: newText)
            try updated.write(to: url, atomically: true, encoding: .utf8)
            return "Edited \(path)"
        }
    )
}

Nothing fancy.

Let’s reiterate the important part. The model is still not editing files directly. It is not dragging a cursor around. The LLM is not an editor. It is not doing anything mystical. It is just proposing a very boring text transformation contract and the harness applies it.
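That contract can be isolated from the filesystem entirely. The sketch below pulls the exactly-one-match rule into a pure function so it is easy to poke at. EditOutcome and applyEdit are hypothetical names for this illustration, and the file-creation branch is left out on purpose.

```swift
import Foundation

// Hypothetical result type for this sketch.
enum EditOutcome: Equatable {
    case edited(String)
    case rejected(matches: Int)
}

// Applies the replacement only when old text appears exactly once.
func applyEdit(to content: String, oldText: String, newText: String) -> EditOutcome {
    // Creation with empty old text is handled by the tool, so it is rejected here.
    guard !oldText.isEmpty else { return .rejected(matches: 0) }

    // Splitting on old text yields one more piece than there are matches.
    let matches = content.components(separatedBy: oldText).count - 1
    guard matches == 1 else { return .rejected(matches: matches) }

    return .edited(content.replacingOccurrences(of: oldText, with: newText))
}

let source = "let name = \"world\"\nprint(name)\n"

// One match: the edit goes through.
print(applyEdit(to: source, oldText: "\"world\"", newText: "\"harness\""))
// Two matches: ambiguous, so it is rejected.
print(applyEdit(to: source, oldText: "name", newText: "title"))
// No match: rejected as well.
print(applyEdit(to: source, oldText: "missing", newText: "x"))
```

Rejecting ambiguous edits forces the model to quote enough surrounding text to pin down a single location, which is the cheapest way to avoid changing the wrong line.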

And now, finally, the harness can do the thing that makes coding agents feel like coding agents. Not just read the world. Change it.

<Turn 1>
You: Can you create a new md file to summarize what this project is about.
<Step 1>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"README.md"}}
tool-success: read_file -> 17 lines
<Step 2>
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":"."}}
tool-success: list_files -> 5 lines
<Step 3>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"Package.swift"}}
tool-success: read_file -> 22 lines
<Step 4>
...
<Step 9>
tool-call: TOOL_CALL {"name":"edit_file","arguments":{"new_text":"# Project Summary\n\n...","old_text":"","path":"PROJECT_SUMMARY.md"}}
tool-success: edit_file -> Created PROJECT_SUMMARY.md
<Step 10>
Assistant: Created `PROJECT_SUMMARY.md` in the project root with a summary of the project, its purpose, architecture, tools, and usage.
<Turn 2>

This is the first moment where the harness stops being just an observer and starts becoming a real tool.

The world is whatever the harness says it is

There is one last exercise I really recommend if you build one of these.

Lie.

I mean it. Go to one of your tools and make it return something false on purpose.

Make read_file always return a joke. Make list_files invent files that do not exist. A fun one is to rewrite extensions and pretend every Swift file is actually a JavaScript file.

The model will believe you.

And yes, of course it will. That is the whole point.

By now it should be obvious, but I think doing this with your own hands makes it click in a completely different way. The model does not have direct access to your machine. It does not have a secret backdoor into reality. It only has the world your harness describes to it through tool results.

That means the world, for the model, is whatever the harness says it is.

Not because the model is stupid. Not because the model is broken. But because that is literally the way this autocomplete machine works.
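The lying exercise takes only a few lines. Here is a minimal sketch, assuming a ToolDefinition with the four fields used in this post. The lying wrapper, the canned list_files output, and its description text are all made up for the demo.

```swift
import Foundation

// Minimal stand-in for the harness type, with only the fields used here.
struct ToolDefinition {
    let name: String
    let description: String
    let arguments: [String]
    let run: ([String: String], URL) throws -> String
}

extension ToolDefinition {
    // Hypothetical wrapper: same name and arguments, but the output is rewritten.
    func lying(_ distort: @escaping (String) -> String) -> ToolDefinition {
        ToolDefinition(
            name: name,
            description: description,
            arguments: arguments,
            run: { arguments, workspaceRoot in
                let honest = try self.run(arguments, workspaceRoot)
                return distort(honest)
            }
        )
    }
}

// An honest tool with a fixed listing, standing in for list_files.
let listFiles = ToolDefinition(
    name: "list_files",
    description: "List files in a directory.",
    arguments: ["path"],
    run: { _, _ in "Package.swift\nSources/main.swift" }
)

// Pretend every Swift file is a JavaScript file.
let liar = listFiles.lying { $0.replacingOccurrences(of: ".swift", with: ".js") }
let root = URL(fileURLWithPath: "/Users/me/code/swiftagentharness")
print(try liar.run(["path": "."], root))
```

From the model's side nothing changed: the tool has the same name and the same arguments. Only the text coming back is different, and that text is all it has.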

And the Illusion is Broken

If you reached this point, I hope you now see past the illusion. Now you know how agentic tools and AI harnesses work. You built one yourself!

No matter what new features you see from marketing, everything boils down to the simple concepts that you learned. The model emits text. The harness interprets that text as a tool call. Your code executes something locally, or pretends to, and then sends more text back.

No mystical illusion. Just tools and code.

Models. Loops. Context. Tools. Turns. Engineering.

Nothing more.

In the next post in the series we will learn how the harness helps the AI pretend. Because AI doesn’t remember your project, Markdown does.
