AI Hype vs. AI Reality
Everyone in tech is raving about AI agents and automation, and companies everywhere are rushing to implement ChatGPT integrations and AI assistants. However, there’s a crucial question most organizations skip: Is your data ready for AI?
Your AI Is Only as Good as Your Data
The Data Dream
When most companies start dreaming up ways they can leverage AI in their business, they think of the following:
- Natural language queries to answer questions like, “What’s the status of order 475991?”
 - Automated workflows that can, for example, send tracking numbers when orders ship
 - Intelligent assistants that can answer customer questions in real-time
 - Predictive analytics that optimize production capacity
 
The Data Reality
Reality, however, is a different story. Performant AI systems that return a strong ROI require the following:
- Speed: AI agents need sub-second response times. If your system takes 30 seconds to fetch order data, your AI will struggle to deliver useful results quickly.
- Accessibility: For AI to work, your data must be queryable programmatically, meaning it must be accessible via an API without human intervention. It can’t be locked behind a user interface, trapped inside a CSV file, or manually copied from another system.
- Structure: Your data must be in a format AI can understand. This is a bigger hurdle than it sounds: deeply nested legacy structures create barriers for AI, and in our AI work we’ve spent significant time reformatting data for our clients.
- Freshness: AI needs current data, often real-time data. Manual exports and stale data warehouses won’t give you the results you want from AI.
- Security: AI opens up the potential for loss of data and intellectual property (such as the schema of your solution). Providing data access without compromising your production system is critical.
 
Well-prepped data architecture creates a multiplier effect for your business, enabling not just AI but also BI and automation.
Creating a Successful Custom AI Implementation Starts with Good Data
We recently worked with a large Chicago screen printing company that wanted to add AI capabilities to its FileMaker system. Decades of business data were locked in the application, which had been highly customized not only to their industry but to their specific business. Unfortunately, that same data was also slowing the system down and creating blocks in the team’s operational workflows. Before we could build an AI agent, we had to solve the fundamental data problem first. Otherwise, the same problems would plague the AI implementation, hindering its results and return on investment.
Building a Foundation of Good Data
Our client reached out to us with a request for AI capabilities for its FileMaker-based vertical solution. The on-premises system was outdated, running an old version of FileMaker, and challenged the organization with slow update cycles and limited admin access. Query performance was poor, forcing users to wait on complex reports, a problem further exacerbated by nested data structures. Team members avoided the application and built their own workarounds, magnifying the issue.
Cleaning up the client’s FileMaker system and data infrastructure wasn’t an option at the time. Direct integration would have been painfully slow, adding load to an already-stressed application, and the legacy structure made the system incompatible with modern AI tools and unable to provide real-time responses. Our team clearly couldn’t build AI on top of this infrastructure. We needed to solve the data problem first.
Our Approach with MongoDB
We set out to build a real-time data warehouse for the client that would serve as a performant, accessible data layer with read access separated from production. Our goal was to give our client’s critical data room to scale, not just for AI but also for automation and BI.
We started with near-real-time data replication from FileMaker to MongoDB on a 50-minute schedule, an interval we all agreed was short enough to keep the data relevantly fresh and long enough to avoid putting too much extra load on the aging system. Our team built the change-data-capture process around modification timestamps, and we developed flattened, pre-processed views optimized for different use cases, with a human-readable schema design. A minimal sketch of the timestamp-based approach follows below.
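To make the change-data-capture idea concrete, here is a minimal sketch of a timestamp-watermark sync loop. The collection names, the fetch_changed_records helper, and the order_id key are hypothetical stand-ins, not the client-specific implementation:

```python
from datetime import datetime, timezone
from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
db = client["warehouse"]

def fetch_changed_records(since: datetime) -> list[dict]:
    """Hypothetical helper: ask the FileMaker Data API for records
    whose modification timestamp is newer than the last sync watermark."""
    raise NotImplementedError  # client-specific in practice

def sync_orders() -> None:
    # Read the last successful sync time from a metadata collection.
    meta = db["sync_meta"].find_one({"_id": "orders"}) or {}
    watermark = meta.get("last_sync", datetime(1970, 1, 1, tzinfo=timezone.utc))

    changed = fetch_changed_records(since=watermark)
    if changed:
        # Upsert by primary key so re-running the sync is idempotent.
        db["orders"].bulk_write([
            UpdateOne({"_id": rec["order_id"]}, {"$set": rec}, upsert=True)
            for rec in changed
        ])

    # Advance the watermark only after a successful write.
    db["sync_meta"].update_one(
        {"_id": "orders"},
        {"$set": {"last_sync": datetime.now(timezone.utc)}},
        upsert=True,
    )
```

Because the loop is driven by a persisted watermark and idempotent upserts, a failed or repeated run never duplicates data, which is what makes a 50-minute schedule safe to automate.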
This approach aimed to deliver immediate wins:
- Speed: Pre-processed data enables sub-second queries.
 - Flexibility: One foundational infrastructure supports BI, automation, and AI.
 - Protection: The production FileMaker system remains separate and unaffected.
 - Innovation: MongoDB serves as a safe space to experiment with new technologies.
 - Scale: The implementation can support thousands of users without performance degradation.
 
Technical Details
The Replication Pipeline
Our development team established FileMaker scripts that monitor key tables such as orders, customers, statuses, and shipping. The system tracks modification timestamps to identify changed records and sends them to MongoDB, where they’re stored with metadata. On the FileMaker side, this uses the Execute FileMaker Data API script step; on the MongoDB side, a scalable, containerized HTTPS Replication API receives the records. The result is a read replica of FileMaker records and child records in MongoDB, with new, modified, and deleted records all replicated.
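As an illustration of the receiving side, here is a minimal sketch of what such a replication endpoint could look like. The Flask framework, the /replicate/&lt;table&gt; route, and the payload shape are assumptions for illustration, not the production implementation:

```python
from datetime import datetime, timezone
from flask import Flask, request, jsonify
from pymongo import MongoClient, UpdateOne

app = Flask(__name__)
db = MongoClient("mongodb://localhost:27017")["warehouse"]  # placeholder URI

@app.post("/replicate/<table>")
def replicate(table: str):
    # Assumed payload: {"records": [{"id": "...", "fields": {...}}, ...]}
    batch = request.get_json(force=True)
    received_at = datetime.now(timezone.utc)

    ops = []
    for rec in batch.get("records", []):
        if rec.get("deleted"):
            # Tombstone deleted records instead of dropping history.
            ops.append(UpdateOne(
                {"_id": rec["id"]},
                {"$set": {"deleted": True, "replicated_at": received_at}},
            ))
        else:
            ops.append(UpdateOne(
                {"_id": rec["id"]},
                {"$set": {**rec["fields"],
                          "replicated_at": received_at,
                          "source_table": table}},
                upsert=True,
            ))
    if ops:
        db[table].bulk_write(ops)
    return jsonify({"processed": len(ops)})
```

Storing a replicated_at timestamp and a tombstone flag alongside each record is what the metadata mentioned above enables: the replica can always answer "how fresh is this?" and "was this deleted at the source?".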
Data Transformation
In MongoDB, we flattened nested structures into a queryable format and created purpose-built views for different needs. Our team renamed fields to be human- and AI-friendly, added indexes for common query patterns, and established pre-processed aggregations for different dashboards and workflows. These aggregations are refreshed regularly, allowing the dashboards in the BI portal to load quickly.
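For a sense of what this transformation looks like, here is a minimal sketch of an aggregation pipeline that joins raw order records with their line items, renames cryptic legacy fields, and materializes the result into a read-optimized collection. All field and collection names are hypothetical:

```python
from pymongo import MongoClient, ASCENDING

db = MongoClient("mongodb://localhost:27017")["warehouse"]  # placeholder URI

# Join, flatten, and rename in one pipeline, then materialize the
# result into a pre-processed collection with $merge.
db["orders"].aggregate([
    {"$lookup": {
        "from": "order_lines",
        "localField": "_id",
        "foreignField": "order_id",
        "as": "line_items",
    }},
    {"$project": {
        # Rename legacy field names into human- and AI-friendly ones.
        "order_number": "$ord_no",
        "customer_name": "$cust_nm",
        "status": "$stat_cd",
        "ship_date": "$shp_dt",
        "line_items": 1,
    }},
    {"$merge": {"into": "orders_flat", "whenMatched": "replace"}},
])

# Index the fields the dashboards and agents query most often.
db["orders_flat"].create_index([("order_number", ASCENDING)])
db["orders_flat"].create_index([("status", ASCENDING), ("ship_date", ASCENDING)])
```

Re-running the pipeline on a schedule is what keeps the pre-processed views fresh without any query-time joins.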
The Result: AI-Ready Data
Once the data was in MongoDB, properly flattened, and made accessible, we could deliver sub-second query response times for the client, even on very complex queries and aggregations. The client experienced near-real-time data updates with no impact on production systems.
The Multiplier Effect: How Good Data Architecture Enables Everything
The MongoDB data warehouse we built for AI readiness delivered immediate value across multiple domains.
Business Intelligence Transformation
Through the web portals and MongoDB database we developed, complex reports now load in seconds, and our client’s team has near-real-time visibility into production. They can monitor production output across their various decoration techniques, enabling them to handle higher daily volumes, and they can optimize schedules to maximize production capacity. None of this was possible without a flattened, accessible data structure.
In fact, due to the solution’s immediate success and impact, the client has asked our development team to expand into building experimental dashboards. Their team can explore their data in ways that were previously impossible or incredibly slow to pull off.
Intelligent Automation (The Low-Hanging AI Fruit)
Our work also makes future automation possible. It isn’t quite AI, but it’s the step right before it. Our change-data-capture workflows monitor the FileMaker modification log for changes to specific tables, triggering automations from data events without delays or manual processes.
For example, consider the potential for automated shipping notifications. The system detects when a tracking number is added to an order and automatically sends the tracking info to third-party API endpoints. Zero human intervention is required, delivering an instant improvement in customer and partner satisfaction.
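Here is a minimal sketch of how such an event-driven notification could work on top of the replicated data. The polling approach, the tracking_number and shipment_notified fields, and the partner webhook URL are all illustrative assumptions:

```python
import requests
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["warehouse"]  # placeholder URI
PARTNER_WEBHOOK = "https://partner.example.com/shipments"   # illustrative URL

def notify_new_shipments() -> None:
    # Find orders that gained a tracking number but haven't been announced.
    pending = db["orders_flat"].find({
        "tracking_number": {"$exists": True, "$ne": None},
        "shipment_notified": {"$ne": True},
    })
    for order in pending:
        resp = requests.post(PARTNER_WEBHOOK, json={
            "order_number": order["order_number"],
            "tracking_number": order["tracking_number"],
        }, timeout=10)
        resp.raise_for_status()
        # Flag the order so each notification goes out exactly once.
        db["orders_flat"].update_one(
            {"_id": order["_id"]},
            {"$set": {"shipment_notified": True}},
        )
```

The key design point is that the automation reads the replica, not production FileMaker, so event-driven workflows add zero load to the legacy system.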
Data Archival Strategy
Our client has also been able to establish an easily accessible data archive for when they need it and has successfully archived more than 50% of their production data. This complete historical archive is maintained in MongoDB, alleviating pressure on the legacy FileMaker system while ensuring the client keeps historical data for future AI training.
Foundation for Advanced AI
The sub-second data access we built enables conversational AI through a clean JSON structure well suited to LLM consumption, and near-real-time data access means the AI always works from current information. We leveraged this foundation to implement a Google Chat and Slack AI agent for order queries ranging from simple to complex. For instance, the Slack AI agent retrieves data, transforms it into an Excel spreadsheet, and shares the file directly with the user who made the request, along with a short explanation of how it queried the database.
Implementing Custom Google Chat and Slack AI Agents
With this data foundation in place, we could get to the original focus of our project and the client’s need: an AI solution that streamlined the company’s daily operations. Our development team built a custom Google Chat and Slack AI agent that allows our client’s staff to look up order information by order ID or purchase order (PO) number. The client asked to name the agent Shirley, and she comes with a customized persona developed just for them. Their team can use natural language to ask Shirley questions about an order and get actionable information back in a structured, readable, understandable format.
Now that the data is AI-ready, we are quickly iterating and releasing new features. We added a custom query feature that transforms plain-English questions into MongoDB queries, exports the result as an Excel spreadsheet, and returns the spreadsheet as a file within the Slack conversation; a simplified sketch of this flow follows below. We’re using different models for different tasks to improve the agent’s response times and accuracy.
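To illustrate the shape of that flow, here is a heavily simplified sketch. The OpenAI and Slack SDK calls, the model name, and the prompt are assumptions for illustration; the production agent differs in its guardrails, query validation, and error handling:

```python
import json
import pandas as pd  # Excel export requires openpyxl to be installed
from openai import OpenAI
from pymongo import MongoClient
from slack_sdk import WebClient

db = MongoClient("mongodb://localhost:27017")["warehouse"]  # placeholder URI
llm = OpenAI()                       # assumes OPENAI_API_KEY in the environment
slack = WebClient(token="xoxb-...")  # placeholder bot token

def answer_with_spreadsheet(question: str, channel: str) -> None:
    # 1. Ask the model to translate plain English into a MongoDB filter.
    completion = llm.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content":
                "Translate the user's question into a MongoDB find() filter "
                "for the orders_flat collection. Reply with JSON only."},
            {"role": "user", "content": question},
        ],
        response_format={"type": "json_object"},
    )
    query = json.loads(completion.choices[0].message.content)

    # 2. Run the query against the read-optimized replica, never production.
    rows = list(db["orders_flat"].find(query, {"_id": 0}).limit(500))

    # 3. Export to Excel and attach the file to the Slack conversation.
    pd.DataFrame(rows).to_excel("results.xlsx", index=False)
    slack.files_upload_v2(
        channel=channel,
        file="results.xlsx",
        initial_comment=f"Ran query: `{json.dumps(query)}`",
    )
```

Echoing the generated query back in the Slack message is what gives users the short explanation of how the agent queried the database.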
Because of the search load, Shirley is actually made up of three agents that share the same memory. This lets them read each other’s conversations, so the client’s users never feel like they’re talking to a different agent.
More agents will be added later. Splitting responsibilities across multiple agents keeps each system prompt small, resulting in faster responses and reduced token usage and costs, and it lets us assign different models to different tasks, since some models are better at certain tasks than others and each agent is specialized. An intelligent agent router directs each request to the right agent based on the user’s prompt, as sketched below.
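Here is a minimal sketch of what such a prompt-based router could look like. The agent names, the classification prompt, and the use of a small LLM call for routing are illustrative assumptions:

```python
from openai import OpenAI

llm = OpenAI()  # assumes OPENAI_API_KEY in the environment

# Hypothetical specialist agents, each with its own small system prompt.
AGENTS = {
    "order_lookup": "You answer questions about a single order's status...",
    "custom_query": "You translate questions into MongoDB queries...",
    "file_handling": "You help users upload and file order documents...",
}

def route(prompt: str) -> str:
    """Classify the user's prompt into one of the specialist agents."""
    completion = llm.chat.completions.create(
        model="gpt-4o-mini",  # a small, fast model is enough for routing
        messages=[
            {"role": "system", "content":
                "Classify the request into exactly one of: "
                + ", ".join(AGENTS) + ". Reply with the label only."},
            {"role": "user", "content": prompt},
        ],
    )
    label = completion.choices[0].message.content.strip()
    return label if label in AGENTS else "order_lookup"  # safe default

# Example: route("What's the status of order 475991?") -> "order_lookup"
```

Because the router only classifies, it can run on a small, cheap model, while each specialist agent carries only the system prompt its task actually needs.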
Again, our data foundation in MongoDB makes all of this possible. Shirley, the custom AI chat agent, retrieves near-real-time information from MongoDB, then analyzes and communicates it thanks to our data formatting and preparation, all without taxing the client’s legacy FileMaker system.
Shirley can answer follow-up questions about these orders, such as, “What’s the status of this order?” The AI also remembers context from earlier in the conversation, ensuring delivery of a usable narrative and understanding of an order for staff.
Soon, users will be able to send Shirley a file, like an order PDF, and have it uploaded to the correct location. This shows how the AI can not only provide instant data analysis and order information but also manage tasks for your team through conversations with an AI agent.
To further improve our client’s workflows, we migrated from Google Chat to Slack, their preferred platform. We also made it possible to query the database for orders and then attach the results as an Excel spreadsheet.
Shirley gives them a fast, convenient, and conversational way to look up information and move through their daily operations without having to log into another system and navigate through screens.
We’re still working on adding features and improving this AI agent, but our client loves its functionality and is looking forward to expanding into different AI opportunities and experiments. All of this is now possible with a strong data foundation.
Real Results: The ROI of Data-First AI
Prioritizing a solid data infrastructure before jumping straight into the more exciting AI work set our client up for success. Once this step was complete, our development team created almost immediate value through BI and automation, which paid for the infrastructure investment before the AI work even started.
We followed a straightforward continuous-improvement model to deliver ROI early, avoiding money wasted on AI that couldn’t perform while protecting the legacy FileMaker system.
Even better, we built a safe experimentation environment for new technologies on reusable data infrastructure, without locking the client into any one AI technology, platform, or vendor.
Building Your Data Foundation and AI Strategy
AI may be dominating the headlines, but your data architecture determines AI success or failure. Your first question shouldn’t be, “How can we implement AI?” but “Is our data ready for AI?” Shifting to the mindset of data modernization and preparation reduces risk in an AI investment and encourages scalability.
Our team can assess your current data and help you launch a data architecture that supports your AI implementation goals. We can then ensure you generate a strong ROI early on via BI and automation while building toward AI.
Contact our team of AI architects to learn more about our custom AI readiness assessment services and the AI implementations we’ve built for our clients following our work on their data infrastructure.
Talk With an AI Specialist
More Resources
What Makes Data "AI-Ready"?
Before implementing AI, your data must meet these criteria:
- Speed: Can your system return data in under 2 seconds?
 - Accessibility: Can AI query your data programmatically via API?
 - Structure: Is your data in AI-consumable formats (JSON, flat structures)?
 - Freshness: Do you have real-time or near-real-time data access?
 - Security: Can you provide data access without compromising production systems?
 
If you answered "no" to any of these, you're not ready for AI implementation.
Common Signs Your Data Isn't AI-Ready
- Your data queries take more than 5 seconds.
 - You don't have programmatic API access.
 - Your data is locked in proprietary formats.
 - You rely on manual data exports to analyze your data.
 - Your data is in nested structures that are difficult to query.
 - You can't experiment with your data without affecting your production system.
 
What is a Data Warehouse for AI?
A data warehouse is a centralized data management system that collects, integrates, stores, and manages large volumes of data from multiple sources. It serves to support business intelligence and analytics by organizing current and historical data in a format optimized for querying and reporting.
A good data warehouse delivers:
- Real-time or near-real-time data replication
 - Flattened data structures for fast queries
 - API access for programmatic queries
 - Separated read and write operations
 
What Does "Flattening Data" Mean?
Flattening data means transforming nested, hierarchical data structures into a single-level format where all related information is accessible in one query. Instead of navigating a complex hierarchy, your data is represented in a non-nested structure. This allows for faster queries and is a common requirement for fast, token-efficient AI systems.
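As a concrete illustration, here is a small, self-contained sketch that flattens a nested record into a single-level document. The record shape is hypothetical:

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Collapse nested dictionaries into one level using dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

nested = {
    "order_id": "475991",
    "customer": {"name": "Acme Apparel", "contact": {"email": "po@acme.example"}},
    "shipping": {"carrier": "UPS", "tracking_number": "1Z999..."},
}

print(flatten(nested))
# {'order_id': '475991', 'customer.name': 'Acme Apparel',
#  'customer.contact.email': 'po@acme.example', 'shipping.carrier': 'UPS',
#  'shipping.tracking_number': '1Z999...'}
```

Every field in the flattened version can be filtered, indexed, or handed to an LLM in a single pass, with no traversal of nested layers.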