The AI Journey – A Complete Decision Trajectory

You will be using AI models in your solutions; it is inevitable. Similarly, you will be using AI to create solutions.

If you care – like we do – about security and privacy for your data and intellectual property, you will have to make some decisions on who to trust and where you want those models to be.

This is a journey story that outlines your options and how to make deliberate decisions armed with the facts about what the consequences are of each decision, around who has control over those AI models and the data and intellectual property you are sending to them.

Commercial Providers

Our default stance is to not use commercial AI providers such as OpenAI, Cohere, and Anthropic for deep integrations into your solutions, nor for our own AI use (chat or agentic coding). The reasons have been stated in Sara’s blog post on Ethical AI and our Executive Summary on the FileMaker 22.0 (aka 2025) platform release.

We are not dogmatic about this, and we will certainly lend a hand if you want to go that route, but for reasons of privacy, security, and to a lesser extent cost, sending your data and intellectual property to an outside organization, with no oversight, makes little sense to us. That’s why we will always favor one of the other alternatives.

We are not blind to how easy and convenient integrating with these commercial providers looks.  Especially after you start reading about your alternatives. But convenience is almost always at odds with other important factors, such as security and privacy.

This is a journey story that outlines your options and how to make deliberate decisions, armed with the facts about what the consequences are of each decision.

So what are your choices?

Overall considerations

When you take control over running AI models under your control, it requires basic knowledge on the concepts that impact the model’s performance and implementation cost.

When you start your AI journey and are considering running your own AI Server, the jargon and terminology may be daunting. From hardware (GPU vs CPU, VRAM vs RAM, MLX and CUDA) to the size of the models based on their accuracy (measured in billions of parameters and the level of quantization).

If these terms and acronyms are foreign to you, then the document we wrote for Claris will help tremendously to demystify all of that. I strongly suggest you start with that since it also explains what FileMaker features uses which types of AI models.

One critical thing to understand is that the Claris AI server needs to run on its own hardware. It does NOT make sense to try to run this on your main FileMaker Server because an AI model server requires very different resources than a FileMaker Server does, namely lots of fast GPUs and a lot of VRAM.

Claris FileMaker Server - AI Services

Use the Claris FileMaker Server installer and toggle on the AI Services on a dedicated machine that becomes your Claris AI Server.

The reasons why are explained in a lot of detail in the white paper linked to from the Claris Knowledge Base article mentioned earlier.

On-prem?

You can run AI models on your own hardware. If your solution is based on the FileMaker platform, then you can use Claris’ own AI server, which is fundamentally a Python-based Fast API microservice that conforms to the OpenAI standard for endpoint names and JSON data structures.

We discussed the minimum specs in the Claris Knowledge Base article linked earlier. What has not been covered yet is the cost of such hardware.

A Mac Studio M3 Ultra, which is the best machine in Apple’s line-up for hosting AI models, starts at $4,000 and can go up by another $3,000 if you want more GPU cores (60 to 80) and memory ( 96GB to 256GB ) than the base model.

Equivalent Windows or Linux hardware with NVIDIA GPUs delivering a relevant amount of VRAM (48GB and up, say two or more NVIDIA RTX-4090 GPUs) will easily be in the same ballpark, but very likely will be substantially more. If your organization uses virtual servers, then you would need NVIDIA hardware that can be used as virtual GPUs (vGPUs), which, of course, is more expensive.

In short, suitable hardware is not cheap. Whether it is cost-effective depends on how much the equivalent load on a commercial provider would cost you (if we conveniently forget the other aspects of such a choice), especially over time. One feature in the FileMaker platform that can help with this is the Get(LastTokenUse), which you can use to keep track of the overall tokens used by your functionality. Commercial providers charge by the token usage, so this will help with calculating the cost-effectiveness aspect.

Another consideration is around flexibility, given that we are all at the start of our AI journeys and have no clear idea yet about how taxing our AI use will be. Add to that the fact that the AI landscape is changing at a dizzying speed, so what is required now may look completely different a couple of months from now.

And there is one more consideration: the conundrum of peak vs. base load. Especially when it comes to generating embeddings. As you add data to your Retrieval Augmented Generation (RAG – <link to the help section on RAG?>) database or when you embed your catalog of data (images and/or text), you will generate an initial high load, which is different than the load you will generate when users query against the RAG data or do semantic searches. So which of those do you prioritize, keeping in mind that bulk embeddings will probably need to be redone as you discover new models?  The conundrum here is that you need hardware that performs well in both of these scenarios, which typically means overprovisioning to make sure the peak load is handled efficiently.

And of course, we shouldn’t avoid the obvious. The open models are less capable than the frontier models you get access to with your OpenAI/Anthropic subscriptions. Open models are generally considered to be about 6 months behind in capabilities. Whether that matters largely depends on your use cases. These open models are good, very good, sometimes even. Good enough: that needs to be determined on what you want to use it for.  Often, for business use cases, they are. For agentic coding: maybe less so.

Self-cloud hosted?

Virtual machines

As you think through the capital outlay of an on-prem server, maybe you will consider turning that into a recurring expense by using cloud infrastructure. This would also solve the flexibility part of peak vs. base load and is an easier way to future-proof the deployment, as you can easily switch hardware instance types.

Since this involves a 3rd-party vendor, there is an element of trust here, obviously, as this is not infrastructure you own. But in our opinion, there are sufficient controls available to make a deployment private and secure. We use AWS, as we are an AWS Advanced Tier Partner. Each client has their own Virtual Private Cloud (VPC), which is segmented and separated from any other AWS user, and AWS offers granular access controls, private communication pathways (that do not even traverse the public internet if you do not want that), and all the monitoring and auditing tools to satisfy any oversight requirements you may have.

Every cloud provider has specialized instance types that are suitable for AI workloads. On AWS, those are the P and G families.  Remember that Claris’ AI Server requires NVIDIA hardware for both Linux and Windows.

If your AI workloads can achieve sufficient accuracy with smaller models that fit within 48GB of VRAM constraints, then the cheapest instance will be g6e.xlarge. Its on-demand run rate is $1.8610 per hour. If you run it during 8 business hours, that will be $14.89 per day, and only during workdays, then that will cost you ~$320/month. More, of course, if you have longer business hours.

The cheapest instance with 64GB of VRAM (g4dn.12xlarge) running 8 hours per day during weekdays will set you back $677.56 per month.

Here is a good overview of all AWS instance types with at least one GPU and their associated cost.

So while you have a lot of flexibility in moving up and down in hardware specifications to suit your peak vs. base loads, and you can always switch to new hardware offerings, the run rate for these instances is not cheap. And the same model capability limitations that we touched on for on-prem deployment apply here as well.

There is a better way.

Serverless

The answer is serverless, so that you truly only pay for what you use and gain scalability in the process, while maintaining the privacy and security inherent to your Virtual Private Cloud setup.

In this kind of setup, you completely forgo using the Claris AI Model server and rely on the fact that the FileMaker script steps expect an OpenAI-compatible endpoint and JSON structures for both the request and the response.

This also means that you can use the biggest, most accurate models – models that would be nearly impossible to self-host due to their size.

In the AWS space, serverless AI workloads are handled by their Bedrock service.

The main challenge here is that FileMaker expects AI API endpoints that are OpenAI compatible, and the Bedrock endpoints are not, bedrock also has its own data schema that is different than the OpenAI standard, which means that when FileMaker tries to send something to Bedrock it will not be understood.

In its simplest form, you can set up your own microservice to talk to Bedrock directly using the AWS SDK. These SDKs are available for several programming and scripting languages.

Setting up microservice to talk to Bedrock

The microservice in this case is responsible for taking the OpenAI-formatted request from FileMaker and passing it on to Bedrock and translating the Bedrock response into OpenAI json that FileMaker can understand. You can then use the native FileMaker script steps to interact with the microservice.

The downside of using your own microservice, of course, is scalability and resilience, and there are some secrets management that you’ll have to put in place (the AWS credentials, any API key you want to protect your NodeJS microservice with.

The coding agents like Claude and Codex, do not need a translation layer, those can talk directly to the frontier models hosted by Bedrock.

An architecture like below will address those scalability and resilience concerns, with https endpoints in the OpenAI format exposed for use from inside FileMaker Pro, Go, or WebDirect, and a lambda function to do the translation with AWS Secrets Manager to protect your Bedrock credentials.

Architecture addressing scalability and resilience concerns
(source: adapted from https://github.com/aws-samples/bedrock-access-gateway?tab=readme-ov-file)

If you are wondering why there is an Application Load Balancer (ALB) instead of a simpler API Gateway: API Gateway does not support server-sent events (SSE) for streaming responses, which are essential to having the expected interaction with Text Generation models (think Chat), where it will show you its response as it builds up instead of waiting and then presenting you with the end result.

The AWS documentation provides some alternative architectures, such as using AWS Fargate instead of Lambda for faster startup times, or replacing the Application Load Balancer with a Lambda Function URL and cutting out the cost of the ALB.

Using Bedrock for text generation models is the same cost per million tokens as you would incur when you use the model’s commercial provider.  And while you would also pay for the AWS infrastructure, that cost will be very minimal.

What you gain, however, is significant: privacy and security.

You have full control over your data, any logging you want or not want, and encryption using your own keys.

Update: AWS closed the OpenAI-compatibility gap with Bedrock Mantle (somewhat)

 When I first wrote this post, the situation was clear-cut:

That was accurate at the time, and the microservice pattern or AWS Lambda approach was the right answer. It no longer is — at least not for the part of the problem most people were solving it for.

At re:Invent 2025, AWS introduced Project Mantle, and in March 2026, the OpenAI-compatible API surface on the bedrock-mantle endpoint went generally available. Mantle is a separate, AWS-native endpoint (bedrock-mantle.{region}.api.aws/v1) that speaks the OpenAI Chat Completions and Responses APIs — and the Anthropic Messages API — directly. You point a client at the Mantle base URL, pass a Bedrock API key as the bearer token, and existing OpenAI-shaped tooling works with only a base-URL and key change. No SDK, no IAM signing dance, no schema translation.

 For FileMaker, that maps cleanly onto the Configure AI Account script step: a custom OpenAI-compatible endpoint plus a bearer key is exactly what Mantle expects. The translation microservice — the cube in my original diagram — simply disappears for the chat path.

But it only applies to chat interactions. For Embeddings (which you need for things like semantic search on text or images) you still need to go through a translation layer to Bedrock.

Embednigs going through a translation layer to Bedrock

3rd-party hosted?

If you are not intimately familiar with the architecture tools and services from providers such as AWS, Google Cloud or Azure, then what is outlined above will seem like a complex setup.

That is where we can help.  Our developers can help you navigate through these choices  to securely and privately use AI inside your FileMaker solution, or set up a private and secure agentic coding environment for you.  The Soliant team is ready to help you set up either or both of these environments.

You’ve now seen the full spectrum of options. From commercial providers to on-premise hardware to serverless architectures, each path has real trade-offs, like cost, capability, control, flexibility. Pick what matters most to your business, but understand what you’re giving up and what you are gaining when you do.

What ties these options together is something we care about deeply: keeping your data and intellectual property under your control.

This is where Soliant comes in. Our team has walked this path with clients in healthcare, manufacturing, financial services, and beyond. We understand the technical architecture. We also understand the business constraints, the regulatory requirements, and the budget realities.

We can help you build secure, private AI features directly inside FileMaker, like semantic search, intelligent data retrieval, and workflows that understand context. All of it would be protected because the data stays yours.

We can also help you set up a secure coding environment like the one we use internally. One where your developers harness AI safely, where your intellectual property never leaves your walls, and where you have full visibility into what’s happening.

The difference between knowing these options exist and actually implementing them cleanly is significant. We’ve done this work. We can save you months of research and missteps.

If you want a partner who understands both the technical details and the security implications, reach out to set up a call. We’re happy to sit down and work through what makes sense for your specific situation. This isn’t something you should navigate alone.

Leave a Comment

Your email address will not be published. Required fields are marked *

Close the CTA

GET OUR INSIGHTS DELIVERED

Scroll to Top