A major new feature in FileMaker Server 18 is Data Restoration. In a nutshell: FileMaker Server keeps a transaction log of all changes (data and schema) made to the hosted databases; if FileMaker Server or the server machine crashes, FileMaker Server uses these logs at the next startup to make sure that all of the logged transactions are applied to the database.
The basic workings of Data Restoration are explained in FileMaker’s white paper. The white paper is a very worthwhile read and I won’t repeat much of what it says. However, I do want to highlight some important practical aspects before I walk you through a crash event, show how the data restoration works, and point out what to watch for.
It’s On By Default
First off: the feature is on by default when you install FileMaker Server 18. You can turn it off only by making an Admin API or CLI call to your server.
If you have not used the Admin API yet, the FileMaker community has released a good set of tools that use it. You can also use an API testing tool like Postman to execute the necessary calls. The data restoration setting is part of the “General Configuration,” and you can check its current state with a GET call to the General Configuration endpoint.
The JSON returned by that call includes two new configuration settings: one that turns the feature on or off, and one that holds the path where the logs are stored.
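A minimal Python sketch of the GET call, using only the standard library. It assumes the FileMaker Server 18 Admin API v2 paths (`/fmi/admin/api/v2/user/auth` to exchange Basic credentials for a bearer token, `/fmi/admin/api/v2/server/config/general` for the General Configuration) and assumes the two settings are named `startupRestorationEnabled` and `startupRestorationLogPath`. Treat all of these as assumptions and check them against the Admin API reference for your server version.

```python
import base64
import json
import urllib.request

# Assumed base path of the FileMaker Server 18 Admin API (v2); verify for your version.
ADMIN_API = "/fmi/admin/api/v2"


def _request(url, method="GET", headers=None, body=None):
    """Send an HTTP request and decode the JSON response body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_token(host, user, password):
    """Log in with Basic auth; the Admin API is assumed to answer with a bearer token."""
    creds = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {creds}", "Content-Type": "application/json"}
    result = _request(f"https://{host}{ADMIN_API}/user/auth", "POST", headers, {})
    return result["response"]["token"]


def get_general_config(host, token):
    """GET the General Configuration, which holds the data restoration settings."""
    headers = {"Authorization": f"Bearer {token}"}
    return _request(f"https://{host}{ADMIN_API}/server/config/general", headers=headers)


def restoration_settings(config_response):
    """Extract the two data restoration fields (field names are assumptions)."""
    general = config_response["response"]
    return {
        "enabled": general.get("startupRestorationEnabled"),
        "log_path": general.get("startupRestorationLogPath"),
    }
```

Usage would be along the lines of `restoration_settings(get_general_config(host, get_token(host, user, password)))`, where `host`, `user`, and `password` are your own server's values. Missing fields come back as `None` rather than raising, so an unexpected response shape shows up as "unknown" instead of crashing a monitoring job.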
A PATCH call to the same endpoint is all you need to toggle the feature on or off, or to change the path where the logs are stored.
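The PATCH call can be sketched the same way. This assumes the same endpoint and field names as above (both unverified assumptions) and a bearer token from the Admin API login call. Only the fields you pass are sent, on the assumption that PATCH leaves the other General Configuration settings untouched.

```python
import json
import urllib.request

# Assumed Admin API (v2) endpoint for the General Configuration; verify for your version.
GENERAL_CONFIG_PATH = "/fmi/admin/api/v2/server/config/general"


def build_restoration_patch(enabled=None, log_path=None):
    """Build a PATCH body that only contains the data restoration settings.

    Arguments left as None are omitted, so no other settings are sent.
    Field names are assumptions.
    """
    body = {}
    if enabled is not None:
        body["startupRestorationEnabled"] = bool(enabled)
    if log_path is not None:
        body["startupRestorationLogPath"] = log_path
    if not body:
        raise ValueError("nothing to change")
    return body


def patch_restoration(host, token, **changes):
    """Send the PATCH call using a bearer token obtained from the Admin API login."""
    body = json.dumps(build_restoration_patch(**changes)).encode("utf-8")
    req = urllib.request.Request(
        f"https://{host}{GENERAL_CONFIG_PATH}",
        data=body,
        method="PATCH",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `patch_restoration(host, token, enabled=False)` would turn the feature off, and `patch_restoration(host, token, log_path=...)` would move the logs; refusing an empty body avoids sending a no-op request by accident.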
Speaking of the path where the logs are stored, there are two fairly immediate performance impacts with this feature:
- The required disk space: FileMaker Server needs 16GB of disk space for this feature, 8GB of which it claims immediately and pre-allocates for its logs. An additional 8GB is used if a data restoration is needed but the server cannot perform it immediately; in that case, it saves the old logs and starts a new set. A typical scenario for this is when the encryption key for the hosted files has not been saved on FileMaker Server and an administrator must enter it manually.
16GB of disk space is sizeable. It is not uncommon for servers to be configured with less than the recommended disk space, especially in virtualized environments, so the extra disk space this feature requires is an important consideration when upgrading to FileMaker Server 18. If it eats up much of the currently available disk space, it may severely affect the server’s performance.
- The required disk activity: if your solution creates, deletes, or edits many records, FileMaker Server has to expend more resources than before, because it writes those transactions to its log before applying them to the database. On machines with marginal specs (slow disk I/O, slow processors, low free disk space, …) the effect will certainly be felt. Consider your current server’s specs when deciding to upgrade to FileMaker Server 18, and favor very fast disks when spending money on upgraded hardware.
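Before upgrading, a quick check of the volume that will hold the logs shows whether the 16GB budget (8GB pre-allocated plus 8GB for a deferred restoration) fits. This Python sketch treats GB as 1024 ** 3 bytes to stay on the conservative side; the path you check is your own choice, so point it at the volume where the logs will live.

```python
import shutil

GB = 1024 ** 3                      # conservative: binary gigabytes
PREALLOCATED = 8 * GB               # claimed immediately for the transaction logs
DEFERRED_RESERVE = 8 * GB           # second log set if a restoration has to wait
REQUIRED = PREALLOCATED + DEFERRED_RESERVE  # 16GB in total


def restoration_headroom(free_bytes, required=REQUIRED):
    """Free space left after setting the restoration budget aside (negative means short)."""
    return free_bytes - required


def check_log_volume(path):
    """Return (fits, headroom_bytes) for the volume containing `path`."""
    usage = shutil.disk_usage(path)
    headroom = restoration_headroom(usage.free)
    return headroom >= 0, headroom
```

Run it before the upgrade: once FileMaker Server 18 is installed, the first 8GB is already allocated and no longer shows up as free space, so the same check would then overstate the requirement.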
The question has already come up in the community: “Can we use those transaction logs?” The answer is no. These logs are strictly for FileMaker Server’s data restoration; if you need transactions in your scripted workflow, you still need to code for that using the existing techniques.
Backup Strategy / Disaster Recovery Strategy / Business Continuity
Where does the new feature fit in with the other tools that help us with a backup and disaster recovery strategy?
It complements the existing tools rather than replacing any of them. Here is how they fit together:
- Data Restoration gives you just one restore point: the database as it was at the time of the crash. The restore time is fast, and the restore is pretty much automatic when FileMaker Server starts again.
- Progressive backups give you two restore points that you can rely on in case the Data Restoration fails. Their age depends on the interval you use, which is typically fairly short (5-15 minutes). The restore time is usually fast but does involve copying the backup set over.
- Backup schedules give you as many restore points as you have configured (hourly, daily, weekly, etc.). Restore time is the same as with progressive backups unless you need to retrieve a backup from an off-machine or off-site location.
Just like progressive backups, Data Restoration does not give you an off-machine or off-site backup. For that, you need to combine the regular backup schedules with some OS-level scripting.
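The OS-level scripting part can be as simple as copying the newest completed backup set to another machine. A minimal Python sketch, assuming each backup set lives in its own subfolder of a backup root and that the destination is a mounted share or second volume; both paths are hypothetical and yours to supply. "Newest" here means most recently modified, and the script should be scheduled to run after the backup schedule has finished, never against the live hosted files.

```python
import shutil
from pathlib import Path


def latest_backup(backup_root):
    """Return the most recently modified subfolder of backup_root, or None if empty."""
    folders = [p for p in Path(backup_root).iterdir() if p.is_dir()]
    return max(folders, key=lambda p: p.stat().st_mtime, default=None)


def copy_off_machine(backup_root, destination_root):
    """Copy the newest backup set into destination_root and return the copy's path."""
    newest = latest_backup(backup_root)
    if newest is None:
        raise FileNotFoundError(f"no backup sets found in {backup_root}")
    target = Path(destination_root) / newest.name
    # dirs_exist_ok (Python 3.8+) lets a re-run refresh a partially copied set.
    shutil.copytree(newest, target, dirs_exist_ok=True)
    return target
```

In practice you would trigger this from Windows Task Scheduler or cron shortly after each scheduled backup, with retention on the destination handled separately.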
If you are going to rely on this feature as part of your overall backup or disaster recovery strategy, it is important to test it as best as you can so that you are familiar with its workings.
Probably the first thing to test for is whether the data restoration is actively running. It may turn itself off at startup when it detects that there is not enough free disk space. You can use the Admin API to perform scheduled testing to verify that the feature is on, and you can also monitor the FileMaker Server event log for messages that the server failed to start the data restoration at startup. Our upcoming DevCon session on server monitoring with Zabbix will include this kind of functionality.
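A scheduled verification could be organized as in this sketch. It is deliberately decoupled from how the data is fetched: `fetch_settings` would wrap an Admin API GET of the General Configuration and return a dict with an `enabled` flag, `fetch_log_lines` would return recent event log lines, and `alert` is whatever notification channel you use. The `failure_marker` text is a hypothetical placeholder; take the real wording from a failed-startup event on your own server.

```python
import itertools
import time
from typing import Callable, List, Optional


def evaluate(settings: dict, log_lines: List[str], failure_marker: str) -> Optional[str]:
    """Return an alert message, or None when data restoration looks healthy."""
    if settings.get("enabled") is not True:
        return "Data Restoration is disabled"
    for line in log_lines:
        # Case-insensitive match on a marker you choose from your own event log.
        if failure_marker.lower() in line.lower():
            return f"Startup reported a restoration problem: {line.strip()}"
    return None


def monitor(fetch_settings: Callable[[], dict],
            fetch_log_lines: Callable[[], List[str]],
            alert: Callable[[str], None],
            failure_marker: str,
            interval_seconds: int = 3600,
            rounds: Optional[int] = 1) -> None:
    """Run the check `rounds` times (forever if None), sleeping between checks."""
    checks = itertools.count() if rounds is None else range(rounds)
    for i in checks:
        problem = evaluate(fetch_settings(), fetch_log_lines(), failure_marker)
        if problem:
            alert(problem)
        if rounds is None or i + 1 < rounds:
            time.sleep(interval_seconds)
```

Keeping the fetch functions injectable makes it easy to test the rule itself and to plug the same rule into an external monitoring system later.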
A second test to perform from time to time would be to actually crash the server and become familiar with the normal course of operation that a Data Restoration goes through. That way you will be able to verify that the data restoration took its normal course and spot anomalies when it did not.
Obviously, you would do this on a test machine with a backup of your solution. One of the challenges in testing features like this is generating enough relevant load so that the test is somewhat representative of a normal situation on your server. After all, you want to be sure that the transaction logging part of the server is engaged so that there is something in those logs that needs to be restored.
During beta testing, here is how we handled this:
- An AWS EC2 Windows instance with FileMaker Server 18 installed
- A hosted file that creates records through PSoS, in a continuous loop, with “wait for result” toggled off. This allows us to spawn as many of these as we want, generate a good load, and pretty much guarantee that FileMaker Server is writing to its transaction log when we induce a crash.
- The Microsoft Sysinternals NotMyFault utility, which lets you crash Windows
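The PSoS loop itself is FileMaker scripting. If you would rather generate the load from outside FileMaker, one alternative (not the approach used in our test) is to create records through the FileMaker Data API from several client threads. This Python sketch assumes the Data API v1 paths (`/fmi/data/v1/databases/{database}/sessions` and `.../layouts/{layout}/records`), an account with the fmrest extended privilege, and a hypothetical layout with hypothetical `Worker` and `Sequence` fields.

```python
import base64
import json
import threading
import urllib.parse
import urllib.request


def _post(url, headers, payload):
    """POST a JSON payload and decode the JSON response."""
    req = urllib.request.Request(url, data=json.dumps(payload).encode("utf-8"),
                                 method="POST", headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def record_payload(worker, n):
    """Field data for one throwaway record; the field names are hypothetical."""
    return {"fieldData": {"Worker": str(worker), "Sequence": str(n)}}


def create_records(host, database, layout, user, password, worker, count, stop):
    """Open a Data API session and create up to `count` records until `stop` is set."""
    base = f"https://{host}/fmi/data/v1/databases/{urllib.parse.quote(database)}"
    creds = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    session = _post(f"{base}/sessions",
                    {"Authorization": f"Basic {creds}", "Content-Type": "application/json"}, {})
    token = session["response"]["token"]
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    records_url = f"{base}/layouts/{urllib.parse.quote(layout)}/records"
    try:
        for n in range(count):
            if stop.is_set():
                break
            _post(records_url, headers, record_payload(worker, n))
    finally:
        # Log out; this fails harmlessly once the server has been crashed.
        try:
            req = urllib.request.Request(f"{base}/sessions/{token}", method="DELETE")
            urllib.request.urlopen(req).close()
        except OSError:
            pass


def start_load(workers, **kwargs):
    """Start `workers` threads; set the returned Event to stop them."""
    stop = threading.Event()
    threads = [threading.Thread(target=create_records,
                                kwargs=dict(kwargs, worker=w, stop=stop), daemon=True)
               for w in range(workers)]
    for t in threads:
        t.start()
    return stop, threads
```

Unlike PSoS, this load travels over the network from a client, so it also exercises the web publishing engine; for a test that mirrors server-side scripting, the PSoS approach above stays closer to reality.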
With the server busy creating many records through the PSoS sessions, I induce a crash with the Microsoft tool.
After the server recovers from the crash and reboots, let’s look at what happened. There is the obvious evidence of the crash, with Windows prompting us to acknowledge the unexpected shutdown:
And an entry in the Windows System log, at the time Windows rebooted itself, confirming that the crash did indeed happen:
In the Application event log, where FileMaker Server also logs its events, we see FMS starting up at 10:21am, at the time Windows restarted:
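To pull the relevant entries out of the Application log without scrolling through Event Viewer, the built-in Windows `wevtutil` tool can query by event ID. A Python sketch that builds and runs such a query: event ID 1070 (data restoration completed) is the one this article relies on, and the optional provider filter is an assumption you would fill in with the source name FileMaker Server uses on your machine, since event IDs alone are not unique across applications.

```python
import subprocess


def wevtutil_query(event_ids, log="Application", count=50, provider=None):
    """Build a wevtutil command returning the newest matching events as text."""
    ids = " or ".join(f"EventID={int(i)}" for i in event_ids)
    condition = f"({ids})"
    if provider:
        # Provider name is an assumption: check the Source column in Event Viewer.
        condition = f"Provider[@Name='{provider}'] and {condition}"
    xpath = f"*[System[{condition}]]"
    # qe = query events, /rd:true = newest first, /f:text = human-readable output
    return ["wevtutil", "qe", log, f"/q:{xpath}", f"/c:{int(count)}", "/rd:true", "/f:text"]


def recent_events(event_ids, **kwargs):
    """Run the query (Windows only) and return wevtutil's text output."""
    result = subprocess.run(wevtutil_query(event_ids, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the command as a list avoids shell quoting problems with the XPath expression.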
As per best practice, I have FileMaker Server configured to start up but not auto-host any files. This allows administrators to inspect the sequence of events that brought down the FileMaker Server machine before deciding whether to continue with the files as they are.
At this point in time, my FileMaker Server is operational, but no files are being hosted.
A little further in the event log, we have confirmation that the Data Restoration feature is enabled:
This is an important event to confirm because without it, we could still open the files, but there would be no attempt to restore them to their pre-crash state!
The next event that we want to see is that the restoration has been applied and completed without error:
Without this event, we cannot safely continue to use this file and would need to revert to a progressive backup or a regular backup.
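The rule above can be written down as a small decision function, which is handy in a runbook or a monitoring script. A sketch, assuming you have already collected the event IDs logged since FileMaker Server started and know whether a progressive backup is usable; it ignores the Encryption-at-Rest case, where the restoration is postponed until the file is opened.

```python
from enum import Enum


class RestoreSource(Enum):
    CURRENT_FILE = "open the restored file"
    PROGRESSIVE_BACKUP = "revert to a progressive backup"
    SCHEDULED_BACKUP = "revert to a scheduled backup"


RESTORATION_COMPLETED = 1070  # data restoration completed, per this article


def choose_restore_source(startup_event_ids, progressive_backup_ok):
    """Pick where to resume from after a crash, most recent restore point first."""
    if RESTORATION_COMPLETED in startup_event_ids:
        return RestoreSource.CURRENT_FILE
    if progressive_backup_ok:
        return RestoreSource.PROGRESSIVE_BACKUP
    return RestoreSource.SCHEDULED_BACKUP
```

The order mirrors the restore points described earlier: the crash-time state first, then the short-interval progressive backups, then the scheduled backups.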
Note that FileMaker Server performed the data restoration when its processes started, not when the file was opened for hosting. Because auto-hosting is disabled, the FileMaker file is at this point not yet hosted or available to users.
The only exception is if the file has Encryption-at-Rest (EAR) enabled and the encryption key is not saved in FileMaker Server. In this scenario, FileMaker Server will not be able to do the data restoration at this point but will postpone it until the files are opened through the admin console and the administrator inputs the encryption key.
In our scenario, the data restoration has been done successfully, and we can proceed to open the file in the admin console. When we do, FileMaker Server logs a warning that the file was improperly closed.
This is normal, albeit somewhat unexpected and a bit confusing: given that the data restoration was successful, we did not expect FileMaker Server to log this warning. That is why it is crucially important to confirm that an event ID 1070 (data restoration completed) precedes this warning event.
If the data restoration was not successful, the same warning about the file having been improperly closed would appear, but the file would not have been restored to its pre-crash state, and per best practice it is not safe to reuse a file in that state.
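The ordering check can be automated too. A sketch, assuming the events since startup are available in chronological order as `(event_id, message)` pairs. This article does not give an event ID for the improperly-closed warning, so the code takes a predicate, `is_close_warning`, that you write based on what your own server logs.

```python
from typing import Callable, Iterable, Tuple

RESTORATION_COMPLETED = 1070  # data restoration completed, per this article


def restoration_precedes_warning(
    events: Iterable[Tuple[int, str]],
    is_close_warning: Callable[[int, str], bool],
) -> bool:
    """True if event 1070 appears before the first improperly-closed warning.

    If no warning is found at all, returns whether 1070 was seen anywhere.
    """
    restored = False
    for event_id, message in events:
        if event_id == RESTORATION_COMPLETED:
            restored = True
        elif is_close_warning(event_id, message):
            return restored
    return restored
```

A `False` result is the signal to stop and fall back to a progressive or scheduled backup rather than keep hosting the file.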
The new Data Restoration feature can play an important role in how we deal with crashes and outages.
However, understanding its behavior is crucial to making sure we can rely on the outcome of a FileMaker Server restart:
- Is the data restoration feature enabled?
- Did the data restoration complete successfully?
As always, feel free to leave questions and comments.