A well-designed production infrastructure allows your development team to sleep at night and relax on weekends, while users benefit from your app 24/7.
So, before you go into production with your FHIR-based application, you definitely need to take care of two critical risks and minimize them:
These tips will help you develop the production-ready and reliable FHIR solution in the cloud. To keep it simple, we will intentionally skip the topics of security, networking, and monitoring. Instead, we will focus on the basics to move forward.
Infrastructure design starts with choosing the right principles. The major principle of building high-availability systems is that there is no single point of failure. In simple words, you will need to duplicate the key components of the system, i.e. make them redundant.
The basic FHIR solution includes two key components that should be duplicated:
If you have more key components, you can apply this principle to all of them in the same way.
Applications and databases require different approaches and technologies when it comes to duplication. Let’s explore how to deal with them in turn.
To avoid a single point of failure on the application layer, you need to have two or more instances of the FHIR server/backend that will serve requests in parallel. You also can’t operate your solution without a mechanism to monitor, relaunch and allocate traffic for your apps if they fail.
The particular implementation depends on your selected cloud and services. Nowadays, public clouds provide many options, so choose wisely based on your budget and desired level of oversight.
We highly recommend looking at the managed Kubernetes (K8s) service: EKS in AWS, GKE in GCP, and AKS in Azure. From our point of view, it’s a win-win and best value for money.
When you run your containerized apps within the Kubernetes cluster, K8s will:
This process is called self-healing.
To configure this Kubernetes-based solution, you need to start the Kubernetes cluster in a cloud, setting up health checks and self-healing/liveness policies. The basic self-healing policy includes the following parameters that you can adjust for your needs:
Find the example of Kubernetes liveness probe here.
Get started with the Aidbox FHIR Server for data storage, integrations, healthcare analytics, and more, or hire our team to support your software development needs.
It is also important to note that Kubernetes is designed so that a single Kubernetes cluster can run across multiple failure zones. It protects your cloud ecosystem against infrastructure and availability zone failures. So, please, take it into account when you configure the cluster.
As a final step, you will need to make sure that your selected FHIR server can serve requests in parallel. For example, the Aidbox FHIR platform has a built-in internal Aidbox cache and a mechanism for cache synchronization between instances. Aidbox instances can share state/common configs between each other, making them interchangeable.
Short summary: As a result, you have a basic high-available application layer with automated app failover based on the K8s self-healing mechanism. This approach helps you to minimize system downtime and will not require manual actions as part of the recovery process.
To avoid a single point of failure on the database layer, we need to have a replica, backups and a failover/recovery process (ideally automated). In our case, we use PostgreSQL and the examples below will be based on it.
PostgreSQL allows you to set up two or more replicas by using the built-in streaming replication feature. We suggest sticking with the following parameters:
This will save your data if you erroneously delete a table from the database, so you can recover it easily.
PostgreSQL backups can be categorized into two types:
To configure a backup pipeline, you will need additional extensions and an app to trigger and execute the process. We recommend looking at the WAL-G extension for PostgreSQL. WAL-G is an archival restoration tool for PostgreSQL, MySQL/MariaDB, and MS SQL Server (beta for MongoDB and Redis). The best approach to storing backups in the cloud is Amazon’s AWS S3 file-storage service or Google Cloud Storage.
Going further, the best practice is having your own documented backup policy based on your solution requirements and usage. This policy will form the basis of your backup configuration. Below is an example of the key variables for this policy that can serve as a reference:
To test and refine your FHIR-based solution before going live, try the free version of Aidbox. It offers a comprehensive environment to develop and validate your solution, providing all necessary tools without any feature limitations.
Well-designed production-ready infrastructure should be reliable and self-healing. This means that if any of the key components fail, the system will re-balance and keep on running smoothly.
In practice, you need to duplicate your components and embed failover mechanisms on the application and database levels separately.
Kubernetes cluster with native health check monitoring and self-healing logic will juggle your FHIR-based containers, while PostgreSQL will be replicated and backed up with the WAL-G extension and built-in replication features.
The next biggest challenge will be to create a smart failover and recovery mechanism for databases. We’ll be sure to explore this topic in future articles.
This checklist is enough to protect your solution from data loss and system downtime. For a turn-key production-ready infrastructure, you will also need to take care of security, networking, and monitoring or delegate it to all-in-one SaaS solutions like the Aidbox FHIR platform in AWS.
Author:
Mike Ryzhikov
COO at Health Samurai
Get in touch with us today!