Series: Build an Agent Harness

2 May 2026, 11 min read

Do We Even Need Multiple Tools?

After teaching the harness to load AGENTS.md, my mind went back to the tools, and I wondered: do we even need multiple tools?

Our tiny Swift harness had the usual little trio: read_file, list_files, and edit_file. Which is fine. It works. It is also exactly the kind of thing that makes agent tooling feel more magical and more special than it actually is.

Because if you squint a bit, those three tools are not really three capabilities. They are just tiny wrappers around what the shell has done for decades.

I’ve heard for a while that LLMs are now very good at just using bash, so I thought, maybe that’s the only tool they need? So… let’s remove them all and just use bash. Let’s see what happens.

Current tools

Before changing anything, I wanted a baseline. Let’s remind ourselves what the current tool usage looks like.

Swift Agent Harness
Model: gpt-5.4
Loaded AGENTS.md (3 lines)
Ctrl+C to quit
<Turn 1>
You: List and read all files in this project and tell me what's most interesting about them.
<Step 1>
Assistant: I can do that, but I need the actual file list first.
...
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"AGENTS.md"}}
tool-success: read_file -> 2 lines
<Step 2>
tool-call: TOOL_CALL {"name":"list_files","arguments":{"path":"."}}
tool-success: list_files -> 7 lines
<Step 3>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"Package.resolved"}}
tool-success: read_file -> 15 lines
<Step 4>
tool-call: TOOL_CALL {"name":"read_file","arguments":{"path":"Package.swift"}}
tool-success: read_file -> 22 lines
...
<Step 13>
Assistant: I read all files that were visible from the project root...

The answer itself is not the interesting part. What I care about is the shape of the interaction. The model is not exploring the project in some rich, one-off way. It’s as if it were playing a point-and-click adventure with the three verbs I gave it. It doesn’t know the environment, in the same way we didn’t know which objects on the screen were clickable, so it tries one call at a time and reassesses at every step, just like we used to click rapidly across the screen until we found an interactable object.

First read_file, because it already knows AGENTS.md exists. Then list_files, then read_file, then read_file again, then more listing, then more reading. Perfectly reasonable.

And so the question is, would reducing this down to a single tool that can do anything make this better or worse? Because with these three explicit tools, the model is forced to keep translating its intent into those wrappers over and over. That is not necessarily bad. There is something nice about how explicit it is. You can inspect every step. You can see exactly what the model asked for. You can keep the capabilities narrow.

But what if the model was able to cook a sequence of bash scripts in a single tool call and get everything it needs all at once? That was enough for me to want the next experiment. Remove the little toolbox. Leave just bash. See if the interaction becomes simpler, or just messier in a different way.

New Bash Tool

Let’s start by making the new bash tool without removing the existing ones yet.

I will use swift-subprocess, because it is a nice showcase of modern Swift tooling, but also because it solves a real problem. If you let an LLM call bash, it is very easy for it to produce commands that dump a lot of output. Directory walks, file listings, find, sed, cat, maybe all of them chained together. That means stdout and stderr handling stops being a boring implementation detail very quickly. It is easy to get into awkward buffering problems or deadlocks if you try to wire process spawning by hand.

swift-subprocess is designed exactly for this kind of thing, and its API makes the “run a process and collect output properly” path feel trivial, not like a pile of edge cases we have to rediscover ourselves.
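To make the buffering problem concrete, here is a minimal sketch of the hand-wired alternative using Foundation's Process and Pipe. This is not what the harness uses, and the helper name runBashByHand is hypothetical. For brevity it merges stderr into the same pipe as stdout. The detail that matters is the ordering: the pipe is drained before waiting for the process to exit. If you wait first, a command that writes more than the pipe buffer can hold may block forever, because nobody is reading the other end.

```swift
import Foundation

// Hypothetical helper, for illustration only: run a bash command by hand
// with Foundation's Process instead of swift-subprocess.
func runBashByHand(_ command: String) throws -> (output: String, exitCode: Int32) {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/bin/bash")
    process.arguments = ["-c", command]

    // One pipe for both streams keeps the sketch short; separate pipes
    // would each need to be drained concurrently.
    let pipe = Pipe()
    process.standardOutput = pipe
    process.standardError = pipe

    try process.run()

    // Read until EOF *before* waiting for exit. Swapping these two lines
    // is the classic way to deadlock on large outputs.
    let data = pipe.fileHandleForReading.readDataToEndOfFile()
    process.waitUntilExit()

    return (String(decoding: data, as: UTF8.self), process.terminationStatus)
}
```

Calling runBashByHand("find . -type f") would return the combined output and the exit code. Keeping stdout and stderr separate, adding output limits, and supporting async callers are the edge cases that a dedicated library takes off your hands.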

So the new tool looks like this:

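// Assumes `import Subprocess` and `import System` (for FilePath) at the top of the file.
static func bash() -> Self {
    ToolDefinition(
        name: "bash",
        description: "Run a bash command with the workspace root as the current directory.",
        arguments: ["command"],
        run: { arguments, workspaceRoot in
            let command = arguments["command"] ?? ""

            // Run the command through bash, capturing up to 1 MiB each of
            // stdout and stderr as strings.
            let result = try await Subprocess.run(
                .path("/bin/bash"),
                arguments: ["-c", command],
                workingDirectory: FilePath(workspaceRoot.path),
                output: .string(limit: 1_048_576),
                error: .string(limit: 1_048_576)
            )

            let stdout = result.standardOutput ?? ""
            let stderr = result.standardError ?? ""

            // terminationStatus is an enum, so pull out the numeric exit code
            // to keep "exit_code" a plain JSON number. -1 marks any
            // termination that was not a normal exit.
            let exitCode: Int
            if case .exited(let code) = result.terminationStatus {
                exitCode = Int(code)
            } else {
                exitCode = -1
            }

            // Encode a Swift string as a quoted, escaped JSON string literal.
            func jsonString(_ value: String) -> String {
                guard let data = try? JSONEncoder().encode(value),
                      let string = String(data: data, encoding: .utf8) else {
                    return "\"\""
                }
                return string
            }

            return """
            {"stdout":\(jsonString(stdout)),"stderr":\(jsonString(stderr)),"exit_code":\(exitCode)}
            """
        }
    )
}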

And then I just added it to the tools array. Nothing more.

let tools: [ToolDefinition] = [
    .bash(),
    .readFile(),
    .listFiles(),
    .editFile(),
]

This was a very deliberate intermediate step. I did not want to jump straight into “bash only” yet. I wanted to see how the model behaved when bash existed, but the old explicit tools were still there too.

And the first answer was: not that differently at all.

When I asked for a quick architecture overview with the previous prompt, the model still preferred the old tools. Even with bash available, it kept thinking in terms of read_file and list_files. Which honestly makes sense. Those tools were still there, still very explicit, still easier to map to the task.

So I had to push it a bit more and ask explicitly for bash.

I asked this:

Use bash tool to give me a quick architecture overview of this project, including file sizes or line counts

And that was enough for the LLM to use bash.

<Step 1>
Assistant: Quick architecture overview based on the current workspace...
...
tool-call: TOOL_CALL {"name":"bash","arguments":{"command":"printf 'Top-level structure:\\n'; find . -maxdepth 2 \\( -path '.\\/ .git' -o -path '.\\/ .build' \\) -prune -o -print | sed 's#^.\\/##' | sort; printf '\\n\\nFile sizes (bytes) and line counts:\\n'; find . -type f \\( -path '.\\/ .git\\/*' -o -path '.\\/ .build\\/*' \\) -prune -o -type f -print | sed 's#^.\\/##' | sort | while read f; do bytes=$(wc -c < \"$f\" | tr -d ' '); lines=$(wc -l < \"$f\" | tr -d ' '); printf '%8s bytes  %6s lines  %s\\n' \"$bytes\" \"$lines\" \"$f\"; done"}}
tool-success: bash -> {"stdout":"Top-level structure:\n.\n.DS_Store\n.gitignore\nAGENTS.md\nPackage...
<Step 2>
Assistant: Here's a quick architecture overview of the project, based on the file layout and line counts.
...

And as expected, the LLM was able to get all the information it needed with a single tool call.

That was useful to see. Adding a more powerful tool does not automatically make the model prefer it. I’m sure we could tweak the system prompt to nudge it that way, but that’s an exercise in LLM research, and we’re here to talk about the harness. So let’s do the next logical step…

One For All

It was time to remove all the other specific tools and see what happened. Luckily this part was trivial: just remove them from the tools list and leave bash.

let tools: [ToolDefinition] = [
    .bash(),
]

And the result was immediately different.

I went back to the original style of prompt, no special instruction this time:

Give me a quick architecture overview of this project, including file sizes or line counts

This time the model had no alternative. If it wanted to inspect the repo, it had to do it through bash. Depending on the run, sometimes it used one bash call and sometimes a couple. But it did not have to follow the old pattern of many tiny file reads anymore. That is what I wanted to see.

<Turn 1>
You: Give me a quick architecture overview of this project, including file sizes or line counts
<Step 1>
...
tool-call: TOOL_CALL {"name":"bash","arguments":{"command":"printf 'Top-level:\\n'; find . -maxdepth 2 -type f | sed 's#^.\\/##' | sort; printf '\\n\\nTree (depth 3):\\n'; find . -maxdepth 3 | sed 's#^.\\/##' | sort"}}
tool-success: bash -> {"stdout":"Top-level:\n.build\/.DS_Store\n.build\/.lock\n.build\/build.db\n.b...
<Step 2>
tool-call: TOOL_CALL {"name":"bash","arguments":{"command":"printf 'Line counts (project files only):\\n'; for f in AGENTS.md Package.swift Package.resolved README.md run.sh share.sh Sources\\/swiftagentharness\\/*.swift; do [ -f \"$f\" ] && wc -l \"$f\"; done | sort -n; printf '\\n\\nPackage.swift:\\n'; sed -n '1,220p' Package.swift; printf '\\n\\nREADME.md:\\n'; sed -n '1,260p' README.md"}}
tool-success: bash -> {"stdout":"Line counts (project files only):\n       2 AGENTS.md\n       9 sh...
<Step 3>
Assistant: Here's a quick architecture overview of the project.
...

But we also need to note what we lose when we remove the specific tools. With read_file and list_files, part of the harness design was encoded in the tools themselves. The model could only ask for fairly narrow things and we had control over it. For example, we could make the list of files not return any hidden files, thus reducing the amount of likely useless data we fed back into the LLM’s context. With bash, that structure is gone. The model has much more freedom, which is exactly why it can gather more in fewer calls, but it also means the quality of the interaction depends much more on how well it shapes the command. As you can see in the exchange above, the LLM got a lot of hidden files that were not necessary and distracted from the goal.
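As an illustration of the kind of control a narrow tool gives the harness, here is a minimal sketch of a directory listing that drops hidden entries before anything reaches the model. The function name listVisibleFiles is hypothetical and is not the harness's actual list_files implementation. It only assumes that "hidden" means a name starting with a dot.

```swift
import Foundation

// Hypothetical sketch: list the entries of a directory, skipping anything
// whose name starts with "." (for example .git, .build, .DS_Store).
func listVisibleFiles(atPath root: String) throws -> [String] {
    try FileManager.default.contentsOfDirectory(atPath: root)
        .filter { !$0.hasPrefix(".") }
        .sorted()
}
```

With a filter like this living in the harness, the model never sees .build or .DS_Store at all. With bash, the equivalent filtering only happens if the model remembers to write it into the command.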

That tradeoff is exactly what I was hoping to understand with this exercise. Bash really can replace the tiny toolbox. But it does not simply make the harness better. It moves complexity away from the tool harness and into the LLM’s command construction.

Performance Impact

Seeing the LLM cook up a single bash call that gathers all the information in one step is eye-opening. It makes you realize the performance implications of tools. We’ve already seen that context is what matters most for getting the most out of AIs, but here we’ve seen that the tools you provide can have a tremendous impact too.

Even in this simple harness we can see how much the AI’s performance depends on the tools we give it. To appreciate that, we need to really internalize that every time the LLM needs more information, it must make a tool call. That means an extra step and an extra back and forth between the LLM and our harness. It also means triggering a new inference run. And remember, even if LLM architectures have caches and fancy tricks, every inference run virtually starts from the beginning of the conversation and autocompletes the next part. So having a lot of back and forth in the same turn just to gather extra data is less than desirable.
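A back-of-the-envelope sketch makes the cost of those round-trips visible. It assumes a deliberately naive model in which every inference run reprocesses the whole conversation so far and caching is ignored. The step counts (13 and 3) match the two transcripts above, but the token numbers are invented purely for illustration, and the function name is hypothetical.

```swift
// Naive cost model: run k (starting at 0) reads the base context plus the
// k tool results that have accumulated in the conversation so far.
func totalPromptTokens(baseContext: Int, tokensPerToolResult: Int, inferenceRuns: Int) -> Int {
    (0..<inferenceRuns).reduce(0) { total, run in
        total + baseContext + run * tokensPerToolResult
    }
}

// Made-up numbers: 13 runs with small results versus 3 runs with larger
// results carrying the same total payload (12 * 300 == 2 * 1800).
let manySmallCalls = totalPromptTokens(baseContext: 2_000, tokensPerToolResult: 300, inferenceRuns: 13)
let fewBashCalls = totalPromptTokens(baseContext: 2_000, tokensPerToolResult: 1_800, inferenceRuns: 3)

print(manySmallCalls) // 49400
print(fewBashCalls)   // 11400
```

Under these assumptions the same amount of gathered data costs roughly four times as many processed tokens when it arrives through many tiny steps. Real providers cache prompt prefixes, so the actual gap will be smaller, but the direction of the effect is the same.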

[Figure: side-by-side comparison of many small tool calls versus a single bash call]

Of course this is simplified: the bash version may still take two calls on some runs, and the multi-tool version may vary too. But the high-level shape is the important part. One approach encourages lots of tiny round-trips. The other makes it possible to gather much more context in one go. Even if it’s up to the LLM to decide, the bash tool gives it the option to reduce the round-trips.

This is where tradeoffs, analysis, engineering and research all meet to tweak a harness implementation and make it the most performant while making it useful and concrete enough for the LLM to know what to do. And this keeps evolving as LLMs evolve. This is one of the things that might make a harness feel better than another.

So, do we?

Can we answer the question posed in the title of this post? As always, it depends. I think there is no real winner here, or at least not as clear a one as I expected. I imagine one would have to run deeper, more rigorous benchmarks to see some sort of trend.

The multiple specific tools were useful mostly because they encoded structure. They constrained the model, made the interaction easier to inspect, and gave the harness more control over what kind of information could flow back into the context. And that is also exactly their limitation. Bash, on the other hand, got surprisingly close to replacing the whole little toolbox precisely because it removed that structure. Once the model had access to it, it could gather broad context much faster and with fewer round-trips. And that flexibility is also exactly the problem, because you lose control and specificity.

And that, for me, was the interesting shift in understanding. The question is not only what tools we can give the model. It is where we want the structure to live. In a carefully designed tool protocol, or in the commands the model writes for itself.

The advantage is that we’re just here to learn how these things work and see past the curtain, and on that I think we won. So for this step of the journey, I will leave it here. Everyone can draw their own conclusion. I might just keep all tools available and add a flag to run the harness in different modes.
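For the curious, one possible shape for such a flag is sketched below. Everything here is hypothetical: the ToolMode type, the --tools= flag name, and the choice to default to all tools are assumptions for illustration, and tool names are plain strings so the sketch stands on its own without the harness's ToolDefinition type.

```swift
// Hypothetical tool modes for the harness.
enum ToolMode: String {
    case explicit // only the narrow file tools
    case bash     // only the bash tool
    case all      // everything at once

    // Tool names enabled in each mode.
    var toolNames: [String] {
        switch self {
        case .explicit: return ["read_file", "list_files", "edit_file"]
        case .bash: return ["bash"]
        case .all: return ["bash", "read_file", "list_files", "edit_file"]
        }
    }

    // Parses an assumed `--tools=<mode>` flag and falls back to `.all`
    // when the flag is missing or its value is not a known mode.
    static func parse(arguments: [String]) -> ToolMode {
        let prefix = "--tools="
        for argument in arguments where argument.hasPrefix(prefix) {
            if let mode = ToolMode(rawValue: String(argument.dropFirst(prefix.count))) {
                return mode
            }
        }
        return .all
    }
}
```

In a real harness, ToolMode.parse(arguments: CommandLine.arguments) could then decide which tool definitions go into the tools array, which makes it cheap to rerun the same prompt in each mode and compare the transcripts.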
