# High Availability mode

> This document describes the principles, preparations, and server operations of the high availability mode

## 1.Theory

TuGraph provides high availability (HA) mode through multi-machine hot backup. In high availability mode, write operations to the database will be synchronized to all servers (non-witness), so that even if some servers are down, the availability of the service will not be affected.

When the high-availability mode is started, multiple TuGraph servers form a backup group, which is a high-availability cluster. Each backup group consists of three or more TuGraph servers, one of which serves as the `leader` and the other replication group servers as `followers`. Write requests are served by a `leader`, which replicates and synchronizes each request to a `follower` and can only respond to the client after the request has been synchronized to the server. This way, if any server fails, the other servers will still have all the data written so far. If the `leader` server fails, other servers will automatically select a new `leader`.

TuGraph's high-availability mode provides two types of nodes: `replica` nodes and `witness` nodes. Among them, the `replica` node is an ordinary node, has logs and data, and can provide services to the outside world. The `witness` node is a node that only receives heartbeats and logs but does not save data. According to deployment requirements, `leader` nodes and `follower` nodes can be flexibly deployed as `replica` nodes or `witness` nodes. Based on this, there are two deployment methods for TuGraph high-availability mode: one is the ordinary deployment mode, and the other is the simple deployment mode with witness.

For normal deployment mode, `leader` and all `followers` are nodes of type `replica`. Write requests are served by a `leader`, which copies each request to a `follower` and cannot respond to the client until the request has been synchronized to more than half of the servers. This way, if less than half of the servers fail, the other servers will still have all the data written so far. If the `leader` server fails, other servers will automatically elect a new `leader` to ensure data consistency and service availability.

However, when the user server resources are insufficient or a network partition occurs, a normal HA cluster cannot be established. At this time, since the `witness` node has no data and takes up little resources, the `witness` node and the `replica` node can be deployed on one machine. For example, when there are only 2 machines, you can deploy the `replica` node on one machine, and the `replica` node and `witness` node on another machine, which not only saves resources, but also does not require log application To the state machine, there is no need to generate and install snapshots, so the response to requests is very fast, and it can help quickly elect a new leader when the cluster crashes or the network is partitioned. This is the simple deployment mode of the TuGraph HA cluster. Although `witness` nodes have many benefits, since there is no data, the cluster actually adds a node that cannot become `leader`, so the availability will be slightly reduced. To improve the availability of the cluster, you can allow the `witness` node to be the leader temporarily by specifying the `ha_enable_witness_to_leader` parameter as `true`. After the `witness` node synchronizes the new log to other nodes, it will actively switch the leader role to the node with the latest log.

This feature is supported in version 3.6 and above.

## 2.Preparation

To enable high availability mode, users need to:

- Three or more instances of TuGraph servers.
- To enable high availability mode when starting lgraph_server, the 'enable_ha' option can be set to 'true' using a configuration file or the command line.
- Set the correct rpc_port through the configuration file or command line

## 3.Start the initial backup group

After installing TuGraph, you can use the `lgraph_server` command to start a high-availability cluster on different machines. This section mainly explains how to start a high-availability cluster. For cluster status management after startup, see [lgraph_peer tool](../6.utility-tools/5.ha-cluster-management.md)

### 3.1.The initial data is consistent

When the data in all servers is the same or there is no data at startup, the user can
specify `--ha_conf host1:port1,host2:port2` to start the server.
In this way, all prepared TuGraph instances can be added to the initial backup group at one time,
All servers in the backup group elect `leader` according to the RAFT protocol, and other
servers join the backup group with the role of `follower`.

An example command to start an initial backup group is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```

After the first server is started, it will elect itself as the 'leader' and organize a backup group with only itself.

### 3.2.Inconsistent initial data
If there is already data in the first server (imported by the `lgraph_import` tool or transferred from a server in non-high availability mode),
And it has not been used in high-availability mode before, the user should use the boostrap method to start. Start the server with data in bootstrap
mode with the `ha_bootstrap_role` parameter as 1, and specify the machine as the `leader` through the `ha_conf`
parameter. In bootstrap mode, the server will copy its own data to the new server before adding the newly
joined server to the backup group, so that the data in each server is consistent.

An example command to start a data server is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 1
```

Other servers without data need to specify the `ha_bootstrap_role` parameter as 2, and specify the `leader` through the `ha_conf` parameter. The command example is as follows

```bash
**$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 2
```

**You need to pay attention to two points when using bootstrap to start an HA cluster:**
1. You need to wait for the `leader` node to generate a snapshot and start successfully before joining the `follower` node, otherwise the `follower` node may fail to join. When starting the `follower` node, you can configure the `ha_node_join_group_s` parameter to be slightly larger to allow multiple waits and timeout retries when joining the HA cluster.
2. The HA cluster can only use the bootstrap mode when it is started for the first time. It can only be started in the normal mode (see Section 3.1) when it is started later. In particular, multiple nodes of the same cluster cannot be started in the bootstrap mode, otherwise it may cause Data inconsistency

## 4.Start witness node

### 4.1. Witness nodes are not allowed to become leader

The startup method of `witness` node is the same as that of ordinary nodes. You only need to set the `ha_is_witness` parameter to `true`. Note that the number of witness nodes should be less than half of the total number of cluster nodes.

An example command to start the witness node server is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1
```

Note: By default, the `witness` node is not allowed to become the `leader` node, which can improve the performance of the cluster, but will reduce the availability of the cluster when the `leader` node crashes.

### 4.1. Allow witness nodes to become leaders

You can specify the `ha_enable_witness_to_leader` parameter as `true`, so that the `witness` node can temporarily become the `leader` node, and then actively switch to the master after the new log synchronization is completed.

An example of the command to start the `witness` node server that is allowed to become the `leader` node is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1 --ha_enable_witness_to_leader 1
```

Note: Although allowing `witness` nodes to become `leader` nodes can improve the availability of the cluster, it may affect data consistency in extreme cases. Therefore, it should generally be ensured that the number of `witness` nodes + 1 is less than half of the total number of cluster nodes.

## 5.Scale out other servers

After starting the initial backup group, if you want to scale out the backup group, add new servers to the backup group,
The `--ha_conf HOST:PORT` option should be used, where `HOST` can be the IP address of any server already in this backup group,
And `PORT` is its RPC port. E.g:

```bash
./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090
```

This command will start a TuGraph server in high availability mode and try to add it to the backup group containing the server `172.22.224.15:9090`.
Note that joining a backup group requires a server to synchronize its data with the backup group's `leader` server, and this process may take a considerable amount of time, depending on the size of the data.

## 6.Stopping the Server

When a server goes offline via 'CTRL-C', it will notify the current 'leader' server to remove the server from the backup group. If the leader server goes offline, it will pass the leader identity permission to another server before going offline.

If a server is terminated or disconnected from other servers in the backup group, the server is considered a failed node and the leader server will remove it from the backup group after a specified time limit.

If any server leaves the backup group and wishes to rejoin, it must start with the '--ha_conf {HOST:PORT}' option, where 'HOST' is the IP address of a server in the current backup group.

## 7.Restarting the Server

Restarting the entire backup group is not recommended as it disrupts service. All servers can be shut down if desired. But on reboot,
It must be ensured that at least N/2+1 servers in the backup group at shutdown can start normally, otherwise the startup will fail. and,
Regardless of whether `enable_bootstrap` is specified as true when initially starting the replication group, restarting the server only needs to pass
Specify the `--ha_conf host1:port1,host2:port2` parameter to restart all servers at once. The command example is as follows:

```bash
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```

## 8.docker deploys a highly available cluster
In real business scenarios, it is likely to encounter the need to deploy high-availability clusters on multiple operating systems or architectures.
Differentiated environments may cause some dependencies to be missing when compiling TuGraph. therefore,
Compiling software in docker and deploying high-availability clusters is very valuable. Take the centos7 version of docker as an example,
The steps to deploy a highly available cluster are as follows.

### 8.1.Install mirror
Use the following command to download TuGraph's compiled docker image environment
```shell
docker pull tugraph/tugraph-compile-centos7
```
Then pull the TuGraph source code and compile and install

### 8.2.Create container
Use the following command to create a container, use `--net=host` to make the container run in host mode, in this mode
Docker and the host machine share the network namespace, that is, they share the same IP.
```shell
docker run --net=host -itd -p -v {src_dir}:{dst_dir} --name tugraph_ha tugraph/tugraph-compile-centos7 /bin/bash
```

### 8.3.Start service
Use the following command to start the service on each server, because docker and the host share IP, so you can directly specify to start the service on the host IP
```shell
$ lgraph_server -c lgraph.json --host 172.22.224.15 --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
```

## 9.Server Status

The current status of the backup group can be obtained from the TuGraph visualization tool, REST API, and Cypher query.

In the TuGraph visualization tool, you can find the list of servers and their roles in the backup group in the DBInfo section.

With the REST API, you can use `GET /info/peers` to request information.

In Cypher, the `CALL dbms.listServers()` statement is used to query the status information of the current backup group.

## 10.Data synchronization in high availability mode

In high availability mode, different servers in the same backup group may not always be in the same state. For performance reasons, if a request has been synchronized to more than half of the servers, the leader server will consider the request to be in the committed state. Although the rest of the servers will eventually receive the new request, the inconsistent state of the servers will persist for some time. A client may also send a request to a server that has just restarted, thus having an older state and waiting to join a backup group.

To ensure that the client sees consistently continuous data, and in particular to get rid of the 'reverse time travel' problem, where the client reads a state older than it has seen before, each TuGraph server keeps a monotonically increasing data version number. The mapping of the data version number to the database state in the backup group is globally consistent, meaning that if two servers have the same data version number, they must have the same data. When responding to a request, the server includes its data version number in the response. Thus, the client can tell which version it has seen. The client can choose to send this data version number along with the request. Upon receiving a request with a data version number, the server compares the data version number to its current version and rejects the request if its own version is lower than the requested version. This mechanism ensures that the client never reads a state that is older than before.