While there are several good resources for advanced monitoring of FileMaker Server, including guides for integrating it with Zabbix, setting up and configuring the necessary infrastructure is non-trivial.
Additionally, it has always been a challenge to troubleshoot a server or individual processes that have not completely crashed but keep running while becoming unresponsive. Because the process is still running, a check that only tests whether it is running will not report an error. If the FileMaker Script Engine (FMSE) does crash, it restarts on its own and may also leave a dump file in the Logs directory. The first case, where FMSE keeps running but stops responding, is much harder to detect.
When FMSE Becomes Unresponsive
There are various reasons why FMSE might become unresponsive. One scenario is a scheduled script with no timeout set, which can keep running indefinitely if it hangs partway through. Another is an under-provisioned server that becomes unresponsive under too much load, even though all of its processes are still running.
Similarly, the Web Publishing Engine may become unresponsive while trying to process too many requests at once, possibly because of inefficient find requests such as searches on related fields.
For the circumstances outlined above, a different approach may be better suited. It does not require separate monitoring servers or active agents running on the host server.
Dead Man’s Switch
With a dead man’s switch, we do not have to rely on processes running on the server itself to report errors. If you have ever operated a lawnmower, you likely know what a dead man’s switch is: the mower engine shuts off automatically when you release the handle. The idea is that if the operator becomes incapacitated, for whatever reason, an action is activated or deactivated automatically. In our case, the server’s regular check-in plays the role of the hand on the handle. When the check-ins stop, the switch trips.
A simple, low-cost way to configure a dead man’s switch for FileMaker Server is to use a service such as Amazon CloudWatch. Through the AWS API (the PutMetricData action), we can publish a custom metric to CloudWatch at an interval that meets our requirements. For example, FileMaker Server can “PUT” a metric once every minute with a value of “1”, much like a ping.
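The article does not show how the metric is sent. The following is only a sketch of the request shape, assuming Python with the boto3 SDK. The namespace `FileMakerServer/Heartbeat`, the metric name `FMSEHeartbeat`, and the `Server` dimension are hypothetical placeholders. For the switch to reflect the health of FMSE itself, the call should be triggered by something that depends on the engine you want to watch, such as a scheduled FileMaker script. A standalone script run by the operating system would only prove that the host is alive.

```python
"""Sketch: publish a once-a-minute heartbeat metric to CloudWatch (hypothetical names)."""

# Hypothetical namespace and metric name; custom namespaces must not start with "AWS/".
NAMESPACE = "FileMakerServer/Heartbeat"
METRIC_NAME = "FMSEHeartbeat"


def heartbeat_request(server_name: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_data.

    The value is always 1, like a ping: only whether it arrives matters.
    """
    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            {
                "MetricName": METRIC_NAME,
                # Hypothetical dimension so several servers can share one metric name.
                "Dimensions": [{"Name": "Server", "Value": server_name}],
                "Value": 1.0,
                "Unit": "Count",
            }
        ],
    }


def send_heartbeat(server_name: str) -> None:
    """Publish one heartbeat; needs boto3 and credentials allowing cloudwatch:PutMetricData."""
    import boto3  # imported lazily so heartbeat_request works without boto3 installed

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(**heartbeat_request(server_name))
```

A scheduler would call `send_heartbeat("fms-prod-01")` once per minute, where `fms-prod-01` is an example server name. Keeping the request builder separate from the network call makes it easy to check that the names match the alarm defined next.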
Once that is in place, we can create an alarm in CloudWatch. The exact settings will vary. However, the alarm’s evaluation period should fit the interval at which FileMaker Server puts the metric, so that a healthy server always delivers at least one data point per period. As an example, we can configure the alarm to evaluate every five minutes using the Minimum statistic, with a threshold that triggers the alarm when the value is lower than “1”.
Importantly, we also configure the alarm to “treat missing data as bad,” so a period with no data points counts as breaching the threshold.
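Under the same assumptions (Python, boto3, hypothetical names), the alarm described in the two paragraphs above could be expressed as keyword arguments to `put_metric_alarm`. The namespace, metric name, and dimensions must match the heartbeat exactly. Otherwise, the alarm sees only missing data and stays in ALARM. The SNS topic ARN is a placeholder for a topic you create. A single evaluation period is also an assumption, not something the article specifies.

```python
"""Sketch: CloudWatch alarm that fires when the heartbeat stops (hypothetical names)."""

# Must match the heartbeat publisher exactly (namespace, metric name, dimensions).
NAMESPACE = "FileMakerServer/Heartbeat"
METRIC_NAME = "FMSEHeartbeat"


def alarm_request(server_name: str, topic_arn: str, period_seconds: int = 300) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{server_name}-fmse-heartbeat-missing",
        "AlarmDescription": "FileMaker Server heartbeat missing or below 1",
        "Namespace": NAMESPACE,
        "MetricName": METRIC_NAME,
        "Dimensions": [{"Name": "Server", "Value": server_name}],
        "Statistic": "Minimum",           # lowest value seen in each period
        "Period": period_seconds,         # five minutes by default
        "EvaluationPeriods": 1,           # assumption: alarm on a single bad period
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",  # "treat missing data as bad"
        "ActionsEnabled": True,
        "AlarmActions": [topic_arn],      # SNS topic that notifies the email list
    }


def create_alarm(server_name: str, topic_arn: str) -> None:
    """Create or update the alarm; needs boto3 and cloudwatch:PutMetricAlarm permission."""
    import boto3  # imported lazily so alarm_request works without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**alarm_request(server_name, topic_arn))
```

You could raise `EvaluationPeriods` to 2. `DatapointsToAlarm` defaults to the same value, so the alarm would then fire only after two consecutive bad periods. That tolerates a single missed period at the cost of slower detection.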
In effect, we are looking for a negative. Because the heartbeat value is always “1”, the alarm fires only when the metric stops arriving in CloudWatch. The action you take will depend on how far you want to go with automation. In our example, the alarm posts to an Amazon SNS topic that sends a notification to an email list. This is easy to configure in AWS and lets us receive notifications or take some other action as needed.
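To see why missing data has to count as breaching, here is a simplified model of the evaluation. It is not CloudWatch’s actual algorithm, which also deals with details such as delayed data and how far back it looks when data is missing. It assumes the Minimum statistic, a threshold of 1, and one evaluation period per alarm decision. Each inner list holds the heartbeat values received during one five-minute period.

```python
"""Simplified model of a Minimum < 1 alarm with missing data treated as breaching."""
from typing import List, Optional, Sequence

THRESHOLD = 1.0


def period_minimum(datapoints: Sequence[float]) -> Optional[float]:
    """Minimum statistic for one period, or None when no heartbeat arrived."""
    return min(datapoints) if datapoints else None


def is_breaching(minimum: Optional[float], treat_missing_as_breaching: bool = True) -> bool:
    """Missing data follows the setting; otherwise compare against the threshold."""
    if minimum is None:
        return treat_missing_as_breaching
    return minimum < THRESHOLD


def alarm_states(
    periods: Sequence[Sequence[float]], treat_missing_as_breaching: bool = True
) -> List[str]:
    """State after each period, assuming a single breaching period triggers ALARM."""
    return [
        "ALARM" if is_breaching(period_minimum(p), treat_missing_as_breaching) else "OK"
        for p in periods
    ]


# Healthy period, a period where FMSE hung and sent nothing, then recovery.
periods = [[1.0] * 5, [], [1.0] * 5]
print(alarm_states(periods))         # ['OK', 'ALARM', 'OK']
print(alarm_states(periods, False))  # ['OK', 'OK', 'OK']
```

A healthy server only ever reports “1”, so the Minimum never drops below the threshold. If missing data were treated as not breaching, the hung period in the middle would go unnoticed.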
I would be cautious about putting too much automation in place, such as automatically restarting a server or individual services. Even so, being notified when specific services become unresponsive, regardless of whether they are still running, is valuable. An alert like this will more likely call for manual intervention and a review of why services went down or stopped responding. Early notification of a potential issue helps us isolate problems and resolve them quickly.
By using cloud infrastructure to automate external monitoring of our critical resources, we can improve the overall reliability of our deployments without adding significant cost. If you’re interested in this for your FileMaker solution, our team can help. Contact us to get started.