From Concept to Production: Lessons from Building a Production AI Assistant

There’s no shortage of AI demos. You’ve probably seen them. Someone types a question, the AI answers, and everyone nods.

What you see less of is an honest account of what it takes to make one of those work reliably in a production environment. With real users, real data, and real questions the system wasn’t designed for.

I’m aiming to share that account with this blog post. We built a production AI assistant for a client, and we learned a lot along the way. Some of it confirmed what we expected. Some of it didn’t. All of it was useful.

The Use Case: Putting Business Data Within Reach

The client’s core problem was straightforward. Their team had data they needed to access regularly, but not everyone who needed it was comfortable navigating the application directly. They wanted a simpler and more modern way in.

The answer was an AI Assistant integrated into Slack and Claude (Desktop and web), a tool their team was already using every day. Instead of logging into a database, opening reports, or asking a developer to pull a number, a user could just send a message:

“How many orders were placed today?”

“Now break it down by sales rep”

And get a real explanation, with context, of how the data was fetched (to confirm the path the AI took with the user), in plain English.

The client later asked about extending the same capability to email, so users could query their data by simply sending a message and having the AI assistant reply. The architecture we built makes that straightforward to add.

The AI assistant helped customer service representatives and executives quickly find important information about their sales goals, production status, and financial progress from application data. Because the assistant responded so quickly, users could focus on more strategic tasks, give their clients better information, and ultimately boost financial, production, and sales results.

The Architecture: What We Built and Why

Good AI assistant development starts with architecture decisions. The ones you make early shape everything downstream. Here’s what we chose and why.

Workflow Orchestration with N8N

We used a self-hosted version of N8N as the orchestration layer rather than building it from scratch for practical reasons. N8N comes with built-in logging, error handling, retry logic, and rate limiting. Those are real infrastructure concerns that take significant time to build and maintain. Using N8N meant we could focus on the assistant logic itself rather than the plumbing around it.

The Agent Router

One of the most important decisions we made was to not build a single monolithic agent.

Different types of questions require different logic, different tools, and different context. A question about a specific order is fundamentally different from a question about aggregate order volumes, or a sales report. Handling all use cases well with one agent is harder and less efficient than it sounds.

So we built an agent router. When the AI assistant receives a message, the first step is deciding which specialized agent should handle it. Each agent has its own system prompt, its own tools, and its own focused context.

The user just sees the AI assistant. In the background, the right specialist is handling their request.

Memory via MongoDB

For an AI assistant to feel natural, it needs to remember context across messages.

We store chat history in MongoDB rather than keeping it in ephemeral memory. This means conversation history persists between sessions. It also means we can audit it, which matters for a production system.

But MongoDB isn’t just holding chat history here. It’s also the data layer the AI queries against.

The client’s application data lives in FileMaker. We continuously replicate that data into MongoDB to support fast AI and analytical workloads. Orders, customers, designs, statuses, shipping, invoicing — all of it flows from FileMaker into MongoDB and stays current. The AI sits on the MongoDB side of that pipeline, not directly against FileMaker.

The architecture looks like this: FileMaker is the system of record. MongoDB is the working layer. The AI reads from MongoDB.

This separation matters for performance. MongoDB handles the kinds of fast, flexible queries that AI workloads require. FileMaker handles what it was built for. Each system does what it’s good at.

Context window management is part of this too. The agent is configured to reference the last several messages rather than the entire conversation history. This keeps the system responsive without overloading it with context it doesn’t need.

The Data Layer

All order data lives in MongoDB alongside the chat history.

When a user asks about an order, the agent calls a tool. Think of it like a script with parameters. The agent passes the relevant identifier, the tool queries the database, and the result comes back as structured data. The agent then turns that into a plain-language response.

This separation matters. The AI isn’t directly querying the database. It’s calling defined tools with defined parameters. That gives us control over what the AI can access and how.

The Details That Make It Trustworthy

Getting an AI assistant to work is one thing. Getting users to trust it is another. These are the details that made the difference.

A Processing Indicator

When the AI assistant receives a message, the workflow immediately adds a reaction emoji to that message in Slack.

This is a small thing that matters a lot. Some complex requests take 30 to 40 seconds to process. Without any acknowledgment, users don’t know if their message was received, if the system is working, or if something broke. The emoji tells them: we got it, we’re working on it.

Transparent Responses

The AI assistant doesn’t just return a number. It returns the methodology.

“I filtered to orders that were shipped last week (data x to date y) and counted the total number of orders (using the “Invoice Net Total” field).”

Users can see how the answer was generated. That transparency is a big part of why the client’s team adopted it. They weren’t being asked to trust a black box. They could verify the logic.

Logging Every Query

Every query and every response is logged.

During the early weeks, especially, we reviewed those logs regularly. We were looking for questions the system wasn’t handling well, for edge cases we hadn’t anticipated, and for answers that looked wrong. Logging isn’t just a debugging tool. It’s how you maintain a trustworthy system over time and capture model drift.

Making It Clear What Shirley Is

The AI assistant identifies herself as an experimental AI agent. Users know what they’re interacting with.

This was a deliberate choice. Users who understand they’re talking to an AI agent are more likely to verify answers that matter and less likely to be blindsided if the system doesn’t handle something perfectly. Honest framing sets the right expectations from the start.

The Prompt Engineering Reality

The system prompts that define each agent’s behavior were not written once and left alone.

Most of them were drafted with the help of an AI assistant, then reviewed and edited by a human, then revised again. That iteration is part of the process. Good prompts don’t come out fully formed.

The system prompt for the order agent, for example, defines Shirley as a helpful assistant specializing in retrieving information and providing concise, well-structured summaries in plain English. It also includes a checklist of how to respond to different types of questions.

The difference between what an agent can do and what it does reliably is often entirely a function of how well the prompt is written. Prompt engineering is real work, and it’s worth the time.

Maximizing Data Protection & Accuracy While Minimizing Hallucinations

We didn’t want to send raw client data directly to LLMs, even with strict zero-retention guarantees and guardrails in place. We also wanted to maximize accuracy and minimize hallucinations.

Instead of having the model answer directly from raw context, we trained it to generate highly specialized queries against MongoDB’s flexible query and aggregation engine. Depending on the request, the system might execute anything from a single query to a chain of 20 coordinated queries.

What This Looks Like at Scale

The multi-agent architecture we built wasn’t just a design choice for this project. It’s a foundation for what comes next.

Adding a new type of query doesn’t mean rebuilding the system. It means adding a new specialized agent, defining its tools, and updating the router to recognize when to use it.

The client’s appetite for what the AI assistant can do has grown as they’ve seen what’s possible. That’s both an opportunity and a design consideration. Building for extensibility from the start means new use cases can be added without disrupting what’s already working.

The same infrastructure can support different interfaces too. Slack today. Email tomorrow. The underlying system doesn’t change.

Because the system uses multiple specialized agents behind a single frontend interface (currently Claude Desktop, Claude web, and Slack), we can choose the most effective model for each task. Our infrastructure combines models from different trusted providers: some are optimized for deep reasoning, others for low latency, and others for cost efficiency on simpler workloads.

The architecture also gives us the flexibility to swap and upgrade quickly, since inference is routed through OpenRouter. That flexibility helped keep operating costs low while still maintaining strong performance. Beyond inference routing, OpenRouter also provided useful infrastructure features, including guardrails, spend tracking, logging, and token usage monitoring.

What We’d Tell Anyone Starting an AI Assistant Development Project

A few things we’d say if someone asked us what we know now that we wish we’d known earlier.

Architecture decisions made early are hard to undo.

Spend the time thinking through how data flows, where memory lives, and how agents are separated before you build anything. The router pattern, in particular, is something we’d recommend to anyone building an AI assistant that handles more than one type of request.

Logging is not optional.

Build it in from the start. Review it regularly. It’s the only reliable way to know what your AI assistant is actually doing in production.

Prompts take longer than you think.

Budget time for iteration. The first version of a system prompt is rarely the version that works in production.

Deterministic code beats AI inference for tasks that require consistency.

Know the difference between what should be handled by an LLM and what should be handled by code that an LLM helped you write.

Trust is built in the details.

The emoji. The methodology in the response. The honest label. The domain-specific agent specialization shaped by application experts. None of these takes too long to implement, but all of them matter to users.

Production AI Is an Engineering Discipline

Building an AI assistant that works reliably in production is not fundamentally different from building any other production system.

It requires clear requirements. Careful architecture. Disciplined testing. Ongoing monitoring. Focus on reducing complexity to make troubleshooting problems easy and fast. And the willingness to be honest when something isn’t working and try a different approach.

The tools have matured. The barrier to entry is lower than it’s ever been. But lower barrier doesn’t mean no barrier. The difference between a demo and a production system is the difference between impressive and trustworthy.

At Soliant, we build tailored AI solutions across FileMaker, Salesforce, AWS, and cloud-native applications. We’ve done this work in production, for real clients, and we know where the hard parts are.

If you’re thinking through an AI assistant development project and want a team that will work through the details with you, we’re happy to talk. Reach out to connect with one of our consultants.

Leave a Comment

Your email address will not be published. Required fields are marked *

Close the CTA

GET OUR INSIGHTS DELIVERED

Scroll to Top