High availability

Our recommended HA setup for production is: - Galera with at least 3 nodes. Always an odd number of nodes. - Load balance requests using MaxScale as database proxy. - Use dedicated nodes to avoid noisy neighbours. - Define pod disruption budgets.

Refer to the following sections for further detail.

Topologies

[!WARNING]
SemiSync Replication is in alpha stage and not ready for production. Use it at your own risk.

Multi master HA via Galera: All nodes support reads and writes. We have a designated primary where the writes are performed.
Single master HA via SemiSync Replication: The primary node allows both reads and writes, while secondary nodes only allow reads.

Kubernetes Services

In order to address nodes, mariadb-operator provides you with the following Kubernetes Services: - <mariadb-name>: To be used for read requests. It will point to all nodes. - <mariadb-name>-primary: To be used for write requests. It will point to a single node, the primary. - <mariadb-name>-secondary: To be used for read requests. It will point to all nodes, except the primary.

Whenever the primary changes, either by the user or by the operator, both the <mariadb-name>-primary and <mariadb-name>-secondary Services will be automatically updated by the operator to address the right nodes.

The primary may be manually changed by the user at any point by updating the spec.[replication|galera].primary.podIndex field. Alternatively, automatic primary failover can be enabled by setting spec.[replication|galera].primary.automaticFailover, which will make the operator to switch primary whenever the primary Pod goes down.

MaxScale

While Kubernetes Services can be utilized to dynamically address primary and secondary instances, the most robust high availability configuration we recommend relies on MaxScale. Please refer to MaxScale docs for further details.

Pod Anti-Affinity

[!WARNING]
Bear in mind that, when enabling this, you need to have at least as many Nodes available as the replicas specified. Otherwise your Pods will be unscheduled and the cluster won't bootstrap.

To achieve real high availability, we need to run each MariaDB Pod in different Kubernetes Nodes. This practice, known as anti-affinity, helps reducing the blast radius of Nodes being unavailable.

By default, anti-affinity is disabled, which means that multiple Pods may be scheduled in the same Node, something not desired in HA scenarios.

You can selectively enable anti-affinity in all the different Pods managed by the MariaDB resource:

apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  bootstrapFrom:
    restoreJob:
      affinity:
        antiAffinityEnabled: true
  ...
  galera:
    initJob:
      affinity:
        antiAffinityEnabled: true
  ...
  metrics:
    exporter:
      affinity:
        antiAffinityEnabled: true
  ...
  affinity:
    antiAffinityEnabled: true

Anti-affinity may also be enabled in the the resources that have a reference to MariaDB, resulting in their Pods being scheduled in Nodes where MariaDB is not running. For instance, the Backup and Restore processes can run in different Nodes:

apiVersion: k8s.mariadb.com/v1alpha1
kind: Backup
metadata:
  name: backup
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  affinity:
    antiAffinityEnabled: true

apiVersion: k8s.mariadb.com/v1alpha1
kind: Restore
metadata:
  name: restore
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  affinity:
    antiAffinityEnabled: true

In the case of MaxScale, the Pods will also be placed in Nodes isolated in terms of compute, ensuring isolation not only among themselves but also from the MariaDB Pods. For example, if you run a MariaDB and MaxScale with 3 replicas each, you will need 6 Nodes in total:

apiVersion: k8s.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: maxscale-galera
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  metrics:
    exporter:
      affinity:
        antiAffinityEnabled: true
  ...
  affinity:
    antiAffinityEnabled: true

Default anti-affinity rules generated by the operator might not satisfy your needs, but you can always define your own rules. For example, if you want the MaxScale Pods to be in different Nodes, but you want them to share Nodes with MariaDB:

apiVersion: k8s.mariadb.com/v1alpha1
kind: MaxScale
metadata:
  name: maxscale-galera
spec:
  mariaDbRef:
    name: mariadb-galera
  ...
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/instance
            operator: In
            values:
            - maxscale-galera
            # 'mariadb-galera' instance omitted (default anti-affinity rule)
        topologyKey: kubernetes.io/hostname

Dedicated Nodes

If you want to avoid noisy neighbours running in the same Kubernetes Nodes as your MariaDB, you may consider using dedicated Nodes. For achieving this, you will need: - Taint your Nodes and add the counterpart toleration in your Pods.

[!IMPORTANT]
Tainting your Nodes is not covered by this operator, it is something you need to do by yourself beforehand. You may take a look at the Kubernetes documentation to understand how to achieve this. - Select the Nodes to schedule in via a nodeSelector in your Pods. [!NOTE]
Although you can use the default Node labels, you may consider adding more significative labels to your Nodes, as you will have to refer to them in your Pod nodeSelector. Refer to the Kubernetes documentation.

Add podAntiAffinity to your Pods as described in the Pod Anti-Affinity section.

Once you have completed the previous steps, you can configure your MariaDB as follows:

apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  ...
  tolerations:
    - key: "k8s.mariadb.com/ha"
      operator: "Exists"
      effect: "NoSchedule"
  nodeSelector:
    "k8s.mariadb.com/node": "ha" 
  affinity:
    antiAffinityEnabled: true

Pod Disruption Budgets

[!IMPORTANT]
Take a look at the Kubernetes documentation if you are unfamiliar to PodDisruptionBudgets

By defining a PodDisruptionBudget, you are telling Kubernetes how many Pods your database tolerates to be down. This quite important for planned maintenance operations such as Node upgrades.

mariadb-operator creates a default PodDisruptionBudget if you are running in HA, but you are able to define your own by setting:

apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-galera
spec:
  ...
    podDisruptionBudget:
      maxUnavailable: 33%