Skills, Agents, and Tools: Why FileMaker AI Agentic Coding Needs a Harness

Ask any agentic coding tool — Claude Code, Codex, Cursor — to build you a FileMaker script or add a field, and the result must get into your file somehow. FileMaker is a closed binary: logic and schema live inside the .fmp12 together with the data, with no text source to edit, no diff, no merge.

There are currently two vehicles for moving externally generated code in, and both use XML, though with different syntax. The first is the clipboard: an fmxmlsnippet payload you paste into the Script Workspace, onto a layout, or anywhere else FileMaker accepts schema via copy-paste. The second is the FileMaker Upgrade Tool (FMUT), which applies a structured, SaXML-based patch to a closed file. Almost every community effort so far has targeted the clipboard; the patch route is less travelled but ultimately superior, since it provides more complete coverage (more schema objects, and more actions: create / update / delete). (And yes, you can create tables and fields through xDBC and the OData API, but those are very limited vehicles.)

If you have not paid much attention to the capabilities of the FileMaker Upgrade Tool, here are some resources:

Back to the work of generating FileMaker code. When you hand the work to the model, what it produces is XML in one of those two shapes — and it produces it the only way a language model can: by predicting the next token, in other words: by guessing.

Most of the time, it gets close. Some of the time it’s exactly right. And some of the time it emits something that looks plausible, pastes or applies without complaint, and is subtly wrong — a missing attribute, an imagined function, a step that references a field ID that doesn’t exist in your file. FileMaker doesn’t always throw an error. It silently ignores the bad parts or accepts a broken object, and you find out later. The XML is unforgiving in exactly the places a language model is weakest: precise and predictable structure, real identifiers, and version-specific shape.

So the interesting question is not “how do we make the model better at writing FileMaker XML.” It’s “why is the model authoring the shape at all?”

And it’s worth being explicit that the XML is incidental to the argument. The clipboard and FMUT are simply the mechanisms we have today; if Claris ships a different route tomorrow — a structured API, a JSON import, something else — the principle is unchanged. The model should never guess at the code shape, whatever that shape happens to be. XML is just the current format for the shape.

That question is the difference between a skill and a harness. The FileMaker community has produced excellent work at both ends of that spectrum, and getting the distinction right matters — because the answer shapes the size of the effort to create this tooling.

A note on the word “harness”

“Harness” has become the standard term for the scaffolding that wraps a language model and turns it into a working AI application — the loops that call the model when needed, the tool dispatching (what tools to call when), the context management (every model has a maximum context size), the guardrails on what is allowed and what not. The usual analogy is that the model is the brain, and the harness is the hands and the exoskeleton that let the brain act. The central finding over the past year is that the harness, not the model, is usually what determines whether an agent is reliable in production — the model is the smallest part of the system.

Your coding tool is already one. Claude Code, Codex, and Cursor are general-purpose agent harnesses; Claude Code’s own codebase, when it leaked, was something like half a million lines, and essentially all of it is harness. So to be precise about what we’re describing here: not a replacement for that big harness, but an addition to it, a domain-specific FileMaker harness that sits on top of the general-purpose one and adds the FileMaker-specific tools, agents, and guardrails the generic harness has no reason to know about. The coding tool gives the model hands. The FileMaker harness makes sure that when those hands reach into a .fmp12 file, what comes out is correct by design rather than by luck.

The rest of this post is about how you build that, and why the engineering effort is mostly around tools and less about skills and agents.

What a skill actually is

The word “skill” has a specific, increasingly standardized meaning. A skill is a folder containing a SKILL.md file: a bit of YAML frontmatter (a name and a description, at minimum) followed by Markdown instructions, optionally bundling templates and/or reference files alongside it.

The key efficiency mechanic that makes skills work well is progressive disclosure: the AI application loads only the short description at startup — a few dozen tokens — and pulls in the full body only when the description matches the task at hand.
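To make progressive disclosure concrete, here is a minimal sketch of how a loader might keep only the name and description in context and fetch the body on demand. The names `SkillEntry`, `parse_skill`, `startup_context`, and `load_body` are invented for illustration, and the frontmatter parser assumes flat `key: value` pairs; no particular tool is claimed to work exactly this way.

```python
from dataclasses import dataclass


@dataclass
class SkillEntry:
    """What stays in context at startup: just the name and description."""
    name: str
    description: str
    raw: str  # full SKILL.md text, kept out of the prompt until needed


def parse_skill(raw: str) -> SkillEntry:
    """Read the frontmatter between the leading '---' fences.

    Assumption: flat 'key: value' pairs only. A real loader would use a
    proper YAML parser.
    """
    lines = raw.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    else:
        raise ValueError("frontmatter fence is never closed")
    return SkillEntry(meta["name"], meta["description"], raw)


def startup_context(skills: list[SkillEntry]) -> str:
    """The cheap part: one short line per skill, loaded for every session."""
    return "\n".join(f"{s.name}: {s.description}" for s in skills)


def load_body(skill: SkillEntry) -> str:
    """The expensive part: the instructions, pulled in only on a match."""
    # Everything after the second '---' fence is the Markdown body.
    return skill.raw.split("---", 2)[2].strip()
```

The point of the split is cost: `startup_context` is paid on every session, while `load_body` is paid only when a task actually matches the description.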

You build a skill once and it travels across projects.

This is no longer an Anthropic-only convention. Agent Skills was released as an open standard in December 2025 and donated to the Agentic AI Foundation under the Linux Foundation. The same SKILL.md format is now read by Claude Code, Codex CLI, Gemini CLI, GitHub Copilot, and Cursor, among others. Each tool extends it in small ways, but the core is portable.

Its companion is AGENTS.md, the de facto operating-context file for coding agents: a plain Markdown file that describes the project itself — what the project is about, how it’s built, how to run tests, what conventions to follow. It’s the project’s README, written for an AI application rather than a human. The thing to be clear about is that AGENTS.md describes a place, not an actor — which makes it a different layer from an agent definition, the kind of file that specifies a particular dispatched sub-agent and its role. A project has a single AGENTS.md but can have as many agent definition files as your project’s architecture needs: the first says where you are, the other one says who’s acting, what their role and specialty is.

AGENTS.md is read natively by more than twenty tools. (Worth knowing: Claude Code reads its own CLAUDE.md rather than AGENTS.md directly; the standard is real, but it still has edges.)

AGENTS.md versus SKILL.md — and the thing they have in common

It’s worth being precise about the division of labor, because it sets up everything that follows.

  • AGENTS.md answers “where am I” — the project context every agent in the repo works against.
  • An agent definition answers “who is acting” — a specific actor’s role.

They’re different files doing related jobs: together they establish the operator and its environment, and neither is task-specific.

A SKILL.md, by contrast, answers how to do a particular kind of task well — an on-demand capability the operator reaches for when the work calls for it, then sets back down.

That distinction is useful. But notice what they have in common: neither one executes anything. Both are instructions given to a probabilistic model. They change what the model attends to and the procedure it tries to follow — and that genuinely helps — but the model still does the actual work entirely by prediction. An AGENTS.md that says “always use ≠ instead of <>” and a SKILL.md that documents the exact shape of a “Set Field” step will both improve your odds. But they will not make a wrong shape impossible. The model can still read the perfect instruction and then generate imperfect XML, because generation is where the uncertainty lives.

A tool is the thing that closes that gap. A tool is real code (i.e. deterministic) with a guarantee: given the inputs, it produces the correct output every time, because it isn’t predicting — it’s executing in a reliable and predictable way. The whole argument of this post is that for a domain like FileMaker, the parts of the work that must have a correct answer should be done by tools, and the model’s job should shrink to the parts that genuinely require judgment.

The FileMaker community is already on this ladder

Three community projects map the spectrum well, and all three deserve credit. Read in order, they climb the ladder.

Our own Mislav Kos created a skill that leverages the newly LLM-friendly Claris help docs.

CadenceUX’s claris-filemaker-pro-skill is a skill in the pure sense, and a good one. It ships reference catalogs — all 360 calculation functions, all 155 script steps, the error codes, the help-center URLs — with a local-first, live-verify strategy and even a version-drift detector that flags when a fetched doc references a newer FileMaker version than the skill was built against. What it fixes is what the model knows: it stops your model from hallucinating a function signature or inventing an error code. That is real value. But it is, by design, a knowledge layer. The model still authors the XML; the skill just makes sure the model is better informed while it does that.

Andrew Kear’s FileMaker Layout XML Skill (from Clockwork Creative Technology, with a companion skill for script XML) is a more sophisticated skill, and it’s worth studying closely because it argues this very thesis in its own README — that there should be a clear boundary between what the AI determines (the layout logic and content) and what must be deterministic (the XML structure). It supplies that deterministic layer as a specification: all 18 layout object types, the decoded flag bits, the element-ordering constraints, and the paste-handler rules that cause FileMaker to silently drop malformed objects. And it was built the hard way — not from documentation, because Claris publishes no formal spec for the clipboard or FMUT XML formats, but by empirical round-trip reverse-engineering (generate, paste, save, copy back out, diff against native) across more than thirty-five production layouts. It is genuinely excellent work. It is also the clearest illustration of a skill’s ceiling: it makes the shape known to the model, but the model still authors the XML. A perfect spec lowers the error rate dramatically; it cannot make a wrong shape impossible, because the model is still doing the typing. Kear names precisely the boundary that the next rung up enforces in code rather than describes in prose.

Matt Petrowsky’s agentic-fm is something else entirely — and it’s important to say so plainly, because it would be easy to lump all community efforts together as “just skills.” agentic-fm is a harness. It has a tools layer (validate_snippet.py, fmparse.sh, an XML exploder, a converter from “Save a Copy as XML” to clipboard snippets), a step catalog it describes as the single source of truth for step XML structure, an agent/skills layer of opt-in workflows, a curated knowledge base of FileMaker gotchas, and a live-context loop that reads the running solution over OData.

That does not mean “skills bad, harness good.” It’s a ladder — a spectrum of how much determinism you’ve built in:

  • Skills alone fix what the model knows. Execution stays fully probabilistic.
  • Skills plus agents add roles, routing, and operating context. Better organized; still probabilistic execution.
  • Skills plus agents plus tools hand the deterministic parts to code. This is where guarantees enter.

And that top rung has its own internal gradient, which is where the most interesting engineering happens.

Tools are where the work is

Here’s the asymmetry that explains why the community has produced many skills and few harnesses: a skill is cheap and a tool is expensive.

A SKILL.md documenting the shape of a value list is a Markdown file you can write with minimal effort. A tool that deterministically constructs a valid value-list patch is a piece of software: it has to extract the real object IDs and UUIDs from your actual solution, mint fresh identifiers that won’t collide, place the element in the right structural slot, produce XML that the official FileMaker tooling will apply without complaint, and then verify that the patch landed properly. It needs a catalog of every object type it covers, verified template fixtures for each, envelope builders that match the format byte-for-byte, identity resolution against a live export, and a validation layer. Multiply that across fields, scripts, layouts, custom functions, value lists, table occurrences, relationships, and the rest, and you are building a compiler back-end for a closed format, not writing documentation.
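As a rough illustration of just one of those pieces, identity resolution and minting, here is a small sketch. The export below is a hypothetical, heavily simplified stand-in (the element and attribute names are assumptions, not the real export format), and "highest existing ID plus one" is only an illustrative allocation rule, not a statement of how FileMaker assigns IDs.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified export. Real exports are far richer and
# use different element names.
SAMPLE_EXPORT = """
<Solution>
  <Table id="130" name="Invoices">
    <Field id="1" name="InvoiceNumber"/>
    <Field id="2" name="Total"/>
    <Field id="7" name="Status"/>
  </Table>
</Solution>
"""


def existing_field_ids(export_xml: str, table_name: str) -> set[int]:
    """Collect the field IDs already taken in one table of the export."""
    root = ET.fromstring(export_xml.strip())
    for table in root.iter("Table"):
        if table.get("name") == table_name:
            return {int(field.get("id")) for field in table.iter("Field")}
    raise LookupError(f"table {table_name!r} not found in export")


def mint_field_id(taken: set[int]) -> int:
    """Pick an ID that cannot collide with any ID in the export.

    Assumption: 'highest existing ID plus one' is an acceptable rule here.
    """
    return max(taken, default=0) + 1


def mint_uuid() -> str:
    """Mint a fresh random UUID, upper-cased for readability."""
    return str(uuid.uuid4()).upper()
```

Even this toy version shows why the work belongs in code: the set of taken IDs comes from the file itself, which is something the model has no way to know.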

That is the real reason a harness is more powerful than a collection of skills, and it’s not a mystery: the power comes from the tools, and the tools are most of the work. Comparatively few will build the deterministic machinery that makes the instruction unnecessary.

So, if tools are the key to accurate code, should you let the model create the code (the schema XML) and have the tools merely validate it, telling the model to try again when it is wrong? Or should the tools pick the right XML and assemble the code themselves?

The strongest rung: the model should never author the shape

This is the design principle worth internalizing, because it inverts the obvious approach. (Today the shape is XML — clipboard snippets and FMUT patches — but as noted earlier, read “shape” for “XML” throughout; the principle outlives the format.)

The obvious approach is generate-then-validate. The model writes the fmxmlsnippet or the patch XML, and a validator checks it afterward: is it well-formed, are the If/End If pairs balanced, do the referenced IDs exist? This catches a lot. But the model is still the author, and validation that runs after generation can only reject what’s already wrong. You’re playing defense against your own engine. You will also run more iterations than an approach that never asks the model to generate the XML, and you will need a model powerful enough to generate XML properly from your guidance.
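A minimal sketch of what such an after-the-fact validator could look like follows. It covers the three checks named above against a simplified snippet shape: flat `Step` elements with a `name` attribute, and `Field` children with an `id` attribute. That shape and the function name `check_snippet` are assumptions for illustration; this is not the validator from any of the projects mentioned in this post.

```python
import xml.etree.ElementTree as ET


def check_snippet(xml_text: str, known_field_ids: set[int]) -> list[str]:
    """Return a list of problems; an empty list means the snippet passed."""
    # Check 1: is it well-formed XML at all?
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    depth = 0
    for step in root.iter("Step"):
        name = step.get("name")
        # Check 2: are the If / End If pairs balanced?
        if name == "If":
            depth += 1
        elif name == "End If":
            depth -= 1
            if depth < 0:
                problems.append("End If without a matching If")
                depth = 0
        # Check 3: does every referenced field ID exist in the target file?
        for field in step.iter("Field"):
            fid = field.get("id")
            if fid is None or not fid.isdigit() or int(fid) not in known_field_ids:
                problems.append(f"step {name!r} references unknown field id {fid!r}")
    if depth > 0:
        problems.append(f"{depth} If step(s) never closed")
    return problems
```

Note what the return value is for: a non-empty list can only be fed back to the model as a request to try again. The validator rejects; it never repairs.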

The stronger approach is assemble-from-templates, and it’s a genuine step change. Instead of asking the model to produce the artifact, you ask it to produce values — a small structured payload of intent: “a Text field named Status, indexed, with this comment.” Deterministic code takes those values and assembles the XML from canonical, verified templates, while pulling every real identifier — the table’s ID, a related field’s UUID — straight out of a fresh export of the target solution. The model is never asked to supply an identity, because the model has no way to know one. For a path like adding a new object, the structural shape is entirely template-derived, which means a wrong shape isn’t caught — it’s impossible by construction.
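Here is a minimal sketch of that division of labor. The template is hypothetical (the `AddField` and `Field` elements are invented placeholders, not real clipboard or patch XML), as are the names `FieldIntent` and `assemble_add_field` and the list of allowed types. What matters is the flow: the model supplies a handful of values, and code supplies everything else.

```python
from dataclasses import dataclass
from string import Template
from xml.sax.saxutils import quoteattr

# Hypothetical template. A real one would be a verified fixture captured
# from FileMaker itself, not hand-typed XML.
FIELD_TEMPLATE = Template(
    '<AddField table_id="$table_id">'
    '<Field id="$field_id" name=$name datatype=$datatype'
    ' indexed="$indexed" comment=$comment/>'
    '</AddField>'
)

# Illustrative subset of field types the assembler accepts.
ALLOWED_TYPES = {"Text", "Number", "Date", "Time", "Timestamp", "Container"}


@dataclass(frozen=True)
class FieldIntent:
    """The only thing the model produces: values, never structure."""
    table: str
    name: str
    datatype: str
    indexed: bool = False
    comment: str = ""


def assemble_add_field(intent: FieldIntent, table_ids: dict[str, int],
                       next_field_id: int) -> str:
    """Build the XML from the template; identities come from the export."""
    if intent.datatype not in ALLOWED_TYPES:
        raise ValueError(f"unsupported data type: {intent.datatype!r}")
    if intent.table not in table_ids:
        raise LookupError(f"table {intent.table!r} is not in the export")
    return FIELD_TEMPLATE.substitute(
        table_id=table_ids[intent.table],      # looked up, never guessed
        field_id=next_field_id,                # minted by the harness
        name=quoteattr(intent.name),           # quoteattr escapes and quotes
        datatype=quoteattr(intent.datatype),
        indexed="True" if intent.indexed else "False",
        comment=quoteattr(intent.comment),
    )
```

A bad value from the model (an unknown type, a table that does not exist) is rejected before any XML is built, and a good value can only land in the slot the template reserves for it.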

The mental model that makes this concrete is to classify every value in the output by who is allowed to supply it:

  • Model-filled — the probabilistic guess based on the user’s intent: the field name, the data type, the one attribute the user wants changed.
  • Harness-filled — deterministically minted: the XML, new IDs, fresh UUIDs.
  • Source-filled — when changing something that already exists, copied verbatim from the current solution’s save-as-xml export.

The accuracy lever is then obvious and ruthless: keep the model-filled set as small as possible. Every value you move out of that bucket is a value the model can no longer get wrong. The fundamental problem — the model guessing at the snippet shape — doesn’t get better under this design. It ceases to exist, because the model isn’t producing the shape at all.
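One way to enforce that classification in code, rather than trusting it to convention, is to give each bucket an explicit whitelist and refuse any value that arrives from the wrong supplier. The key names and the function `build_values` below are assumptions chosen for illustration.

```python
# Which supplier is allowed to provide which value. All key names here are
# illustrative assumptions, not a real schema.
MODEL_FILLED = {"name", "datatype", "comment"}   # intent only
HARNESS_FILLED = {"id", "uuid"}                  # minted by code
SOURCE_FILLED = {"table_id", "table_uuid"}       # copied from the export


def build_values(model: dict, harness: dict, source: dict) -> dict:
    """Merge the three buckets, rejecting any value from the wrong supplier."""
    buckets = (
        ("model", model, MODEL_FILLED),
        ("harness", harness, HARNESS_FILLED),
        ("source", source, SOURCE_FILLED),
    )
    for label, values, allowed in buckets:
        stray = set(values) - allowed
        if stray:
            raise ValueError(f"{label} may not supply: {sorted(stray)}")
    # The whitelists are disjoint, so the merge order cannot hide a conflict.
    return {**source, **harness, **model}
```

If the model tries to pass an `id` along with the field name, the call fails loudly instead of letting a guessed identifier through. Shrinking `MODEL_FILLED` is then a one-line change that removes a whole class of possible errors.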

There’s a second mechanism that pairs with this: guardrails as code, not as instructions. A skill that says “use the custom FileMaker clipboard tools, not pbcopy” is advice the model may or may not follow. (And it matters: native clipboard tools like pbcopy put plain text on the pasteboard, and FileMaker — which only accepts its own private class codes like XMSC for a full script or XMFD for fields — silently ignores the paste, so the user sees nothing happen and assumes the whole thing failed.) A runtime hook that intercepts the command and blocks it before it runs enforces the rule regardless of what the model decides in the moment. The instruction is a suggestion; the hook is a fact. A mature harness uses hooks to make the safe path the only path.
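A minimal sketch of the check such a hook might run is shown below. The function name `check_command` and the blocked-command table are assumptions, and how the check is wired into a given coding tool’s hook mechanism is deliberately left out, since that differs per tool.

```python
import re

# Commands the harness refuses to run, with the reason shown to the model.
# Only pbcopy is listed because it is the example discussed above.
BLOCKED = {
    "pbcopy": "it puts plain text on the pasteboard, which FileMaker ignores",
}


def check_command(command: str) -> tuple[bool, str]:
    """Return (allowed, reason). Intended to run before the shell command does."""
    for tool, why in BLOCKED.items():
        # Match the tool as a whole word, so '/usr/bin/pbcopy' is caught
        # but 'pbcopy_notes.txt' is not.
        pattern = rf"(?<![\w-]){re.escape(tool)}(?![\w-])"
        if re.search(pattern, command):
            return False, f"blocked '{tool}': {why}"
    return True, ""
```

The calling hook would refuse to run the command whenever `allowed` is false and hand the reason back to the model, so the model learns which path is closed without the user ever seeing a silent failure.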

Given that the model only needs to look at intent and generate values to slot-fill into pre-existing XML, this can be done with less powerful models and consumes fewer tokens. So it is cheaper and more sustainable.

What Claris is building, with our help

None of this is hypothetical. Claris has said publicly that it’s making FileMaker a first-class development target inside agentic coding tools like Claude Code, Cursor, and Codex — you describe what you need, and the result deploys directly into your solution, inheriting its existing security and permissions, with developer previews expected this summer. Soliant has written about the same direction, and we’ve been doing the deterministic groundwork in the open: patchlab, our tool for exploring the SaXML format and building, testing, and organizing FMUT patch files, is exactly the kind of plumbing a harness stands on.

We’re not going to reveal the name of the harness yet; that’s for Claris to do. It’s early, and the shape will change.

The reason a first-party effort matters here is that this is a harness constructed by Claris, which has the most control over the core tools and the mechanisms that get the code into FileMaker. Claris can deliver a deeply native and integrated harness in a way that simply isn’t possible from the outside. Recall that Andrew Kear had to reconstruct the clipboard format by round-trip diffing, because Claris publishes no formal spec for it; every community project is in that same position, reverse-engineering shapes from exports and maintaining catalogs by hand. That hand-maintained knowledge can silently drift the moment FileMaker ships a new version and changes a shape. When the templates are authoritative, “wrong shape is impossible by construction” stops being an aspiration and becomes a property you can rely on. That is the difference between a brilliant community workaround and core infrastructure.

When a skill is enough, and when you need a harness

None of this is an argument against skills. Skills are the right tool when the problem is knowledge: the model doesn’t know a function signature, or your team’s naming conventions, or the gotchas of found-set behavior. CadenceUX’s reference skill is genuinely useful, and most teams should have skills like it. If the cost of a model mistake is low and easily spotted, a skill plus a good AGENTS.md is proportionate and cheap.

The distinction comes down to one line. A skill makes the model better informed; a tool makes the model unnecessary for the part that has a right answer. For FileMaker, the leverage is in the second — and building those tools, not writing more instructions, is the work that actually moves the needle.

So the architecture isn’t a detail — it’s the difference between an AI that approximates FileMaker and one you can trust to write accurate code. That’s why the rung of the ladder a tool sits on matters, and why we’d rather build a harness than ship another skill.

And it is why this is a collaboration and not a Soliant side project: Claris brings the source of truth, and we bring the miles that we have already put in on agentic development. The authoritative version can only come from the platform itself. Soliant has been far enough down the agentic-coding road to have formed strong opinions about doing it well, and we’re glad to put them to work helping Claris get FileMaker’s version right. If you’d like to learn more about the agentic development we’re doing for our clients, contact our team to set up a call.
