High Availability mode
This document describes the principles, preparation, and server operations of high availability mode.
1.Theory
TuGraph provides high availability (HA) mode through multi-machine hot backup. In high availability mode, write operations to the database are synchronized to all non-witness servers, so the availability of the service is not affected even if some servers go down.
When high availability mode is started, multiple TuGraph servers form a backup group, i.e. a high-availability cluster. Each backup group consists of three or more TuGraph servers, one of which serves as the leader and the others as followers. Write requests are served by the leader, which replicates each request to the followers and can respond to the client only after the request has been synchronized to the other servers. This way, if any server fails, the remaining servers still have all the data written so far. If the leader server fails, the other servers automatically elect a new leader.
TuGraph’s high availability mode provides two types of nodes: replica nodes and witness nodes. A replica node is an ordinary node that keeps both logs and data and can serve client requests. A witness node only receives heartbeats and logs and does not store data. Depending on deployment requirements, leader and follower nodes can be flexibly deployed as replica nodes or witness nodes. Based on this, there are two deployment methods for TuGraph high availability mode: the ordinary deployment mode and the simple deployment mode with witness nodes.
In the ordinary deployment mode, the leader and all followers are replica nodes. Write requests are served by the leader, which replicates each request to the followers and cannot respond to the client until the request has been synchronized to more than half of the servers. This way, if fewer than half of the servers fail, the remaining servers still have all the data written so far. If the leader server fails, the other servers automatically elect a new leader to ensure data consistency and service availability.
However, when server resources are insufficient or a network partition occurs, a normal HA cluster cannot be established. In this case, because a witness node stores no data and consumes few resources, a witness node and a replica node can be deployed on the same machine. For example, with only two machines, you can deploy a replica node on one machine and a replica node plus a witness node on the other. This not only saves resources; since the witness node does not apply logs to the state machine or generate and install snapshots, it also responds to requests very quickly and helps elect a new leader promptly when the cluster crashes or the network is partitioned. This is the simple deployment mode of the TuGraph HA cluster. Although witness nodes bring these benefits, they hold no data, so the cluster effectively gains a node that cannot become leader, which slightly reduces availability. To improve the availability of the cluster, you can allow a witness node to become the leader temporarily by setting the ha_enable_witness_to_leader parameter to true. After the witness node synchronizes the latest logs to the other nodes, it actively transfers the leader role to the node with the most up-to-date log.
This feature is supported in version 3.6 and above.
2.Preparation
To enable high availability mode, users need to:
Deploy three or more TuGraph server instances.
Set the ‘enable_ha’ option to ‘true’ through a configuration file or the command line when starting lgraph_server.
Set the correct rpc_port through the configuration file or the command line (see the configuration sketch below).
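For reference, a minimal lgraph.json sketch that enables high availability might look like the following. This is only an illustration: the enable_ha and rpc_port fields correspond to the options above, while the directory, port, and ha_conf fields and their values are assumptions that depend on your deployment.
{
  "directory": "/var/lib/lgraph/data",
  "port": 7070,
  "rpc_port": 9090,
  "enable_ha": true,
  "ha_conf": "172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090"
}
The same options can also be given on the command line, as in the examples below.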
3.Start the initial backup group
After installing TuGraph, you can use the lgraph_server command to start a high-availability cluster on different machines. This section explains how to start a high-availability cluster; for cluster status management after startup, see the lgraph_peer tool.
3.1.Consistent initial data
When the data in all servers is identical at startup, or there is no data at all, the user can start the servers with --ha_conf host1:port1,host2:port2. In this way, all prepared TuGraph instances are added to the initial backup group at once. All servers in the backup group elect a leader according to the RAFT protocol, and the other servers join the backup group as followers.
An example command to start an initial backup group is as follows:
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
After the first server is started, it will elect itself as the ‘leader’ and organize a backup group with only itself.
3.2.Inconsistent initial data
If the first server already contains data (imported by the lgraph_import tool or transferred from a server running in non-high-availability mode) and has not previously been used in high availability mode, the user should start the cluster with the bootstrap method. Start the server that holds the data in bootstrap mode by setting the ha_bootstrap_role parameter to 1, and designate that machine as the leader through the ha_conf parameter. In bootstrap mode, the server copies its own data to each newly joined server before adding it to the backup group, so that the data in all servers is consistent.
An example command to start the server that holds the data is as follows:
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 1
The other servers, which have no data, need to set the ha_bootstrap_role parameter to 2 and designate the leader through the ha_conf parameter. The command example is as follows:
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 2
You need to pay attention to two points when using bootstrap to start an HA cluster:
You need to wait for the leader node to generate a snapshot and start successfully before starting the follower nodes; otherwise the follower nodes may fail to join. When starting a follower node, you can set the ha_node_join_group_s parameter to a slightly larger value to allow multiple waits and timeout retries when joining the HA cluster (see the example below).
An HA cluster can use bootstrap mode only the first time it is started; subsequent restarts must use the normal mode (see Section 3.1). In particular, multiple nodes of the same cluster must not be started in bootstrap mode, otherwise data inconsistency may result.
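For example, a follower without data might be started with a larger join timeout as follows; the value 60 is only an illustrative assumption and should be chosen according to how long the leader needs to generate its snapshot:
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_bootstrap_role 2 --ha_node_join_group_s 60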
4.Start witness node
4.1. Witness nodes are not allowed to become leader
The startup method of a witness node is the same as that of an ordinary node. You only need to set the ha_is_witness parameter to true. Note that the number of witness nodes should be less than half of the total number of cluster nodes (for example, a five-node cluster should contain at most two witness nodes).
An example command to start the witness node server is as follows:
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1
Note: By default, a witness node is not allowed to become the leader node. This improves the performance of the cluster, but reduces its availability when the leader node crashes.
4.2. Allow witness nodes to become leader
You can set the ha_enable_witness_to_leader parameter to true so that a witness node can temporarily become the leader node and then actively transfer leadership once the new logs have been synchronized.
An example of the command to start a witness node server that is allowed to become the leader node is as follows:
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090 --ha_is_witness 1 --ha_enable_witness_to_leader 1
Note: Although allowing witness nodes to become leader nodes can improve the availability of the cluster, it may affect data consistency in extreme cases. Therefore, you should generally ensure that the number of witness nodes plus one is less than half of the total number of cluster nodes (for example, at most one witness node in a five-node cluster).
5.Scale out other servers
After starting the initial backup group, if you want to scale out the backup group by adding new servers to it, use the --ha_conf HOST:PORT option, where HOST can be the IP address of any server already in this backup group and PORT is its RPC port. For example:
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090
This command will start a TuGraph server in high availability mode and try to add it to the backup group containing the server 172.22.224.15:9090.
Note that joining a backup group requires the new server to synchronize its data with the backup group’s leader server, and this process may take a considerable amount of time, depending on the size of the data.
6.Stopping the Server
When a server goes offline via ‘CTRL-C’, it will notify the current ‘leader’ server to remove it from the backup group. If the leader server goes offline, it transfers leadership to another server before going offline.
If a server is terminated or disconnected from other servers in the backup group, the server is considered a failed node and the leader server will remove it from the backup group after a specified time limit.
If any server leaves the backup group and wishes to rejoin, it must start with the ‘--ha_conf {HOST:PORT}’ option, where ‘HOST’ is the IP address of a server in the current backup group and ‘PORT’ is its RPC port.
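For example, assuming 172.22.224.15:9090 is still a member of the current backup group, the server can rejoin with:
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090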
7.Restarting the Server
Restarting the entire backup group is not recommended, as it disrupts service. If necessary, all servers can be shut down; on reboot, however, it must be ensured that at least N/2+1 of the servers that were in the backup group at shutdown can start normally, otherwise the startup will fail. Moreover, regardless of whether enable_bootstrap was specified as true when the replication group was initially started, restarting the servers only requires specifying the --ha_conf host1:port1,host2:port2 parameter to restart all servers at once. The command example is as follows:
$ ./lgraph_server -c lgraph.json --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
8.Deploy a highly available cluster with docker
In real business scenarios, you may need to deploy high-availability clusters across multiple operating systems or architectures. Heterogeneous environments may lack some of the dependencies needed to compile TuGraph, so compiling the software in docker and deploying the high-availability cluster from it is very useful. Taking the centos7 docker image as an example, the steps to deploy a highly available cluster are as follows.
8.1.Install image
Use the following command to download TuGraph’s compilation environment docker image:
docker pull tugraph/tugraph-compile-centos7
Then pull the TuGraph source code, compile and install it.
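For example, the source code can be pulled from the official repository and then built by following the repository’s compile documentation; the clone command below is only a sketch, and the exact build steps vary by version:
$ git clone --recursive https://github.com/TuGraph-family/tugraph-db.git
$ cd tugraph-db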
8.2.Create container
Use the following command to create a container. The --net=host option makes the container run in host mode, in which docker and the host machine share the network namespace, that is, they share the same IP.
docker run --net=host -itd -v {src_dir}:{dst_dir} --name tugraph_ha tugraph/tugraph-compile-centos7 /bin/bash
8.3.Start service
Use the following command to start the service on each server. Because docker and the host share the same IP, you can directly start the service on the host IP:
$ lgraph_server -c lgraph.json --host 172.22.224.15 --rpc_port 9090 --enable_ha true --ha_conf 172.22.224.15:9090,172.22.224.16:9090,172.22.224.17:9090
9.Server Status
The current status of the backup group can be obtained from the TuGraph visualization tool, the REST API, and Cypher queries.
In the TuGraph visualization tool, you can find the list of servers and their roles in the backup group in the DBInfo section.
With the REST API, you can use GET /info/peers to request the information.
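For example, assuming the REST port is 7070 and a login token has already been obtained (both the port and the Authorization header format depend on your configuration and TuGraph version), the request might look like:
$ curl -X GET -H "Authorization: Bearer ${TOKEN}" http://172.22.224.15:7070/info/peers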
In Cypher, the CALL dbms.listServers() statement is used to query the status information of the current backup group.
10.Data synchronization in high availability mode
In high availability mode, different servers in the same backup group may not always be in the same state. For performance reasons, if a request has been synchronized to more than half of the servers, the leader server considers the request to be in the committed state. Although the remaining servers will eventually receive the new request, the servers’ states stay inconsistent for some time. A client may also send a request to a server that has just restarted and therefore has an older state while it waits to rejoin a backup group.
To ensure that the client sees consistently continuous data, and in particular to get rid of the ‘reverse time travel’ problem, where the client reads a state older than it has seen before, each TuGraph server keeps a monotonically increasing data version number. The mapping of the data version number to the database state in the backup group is globally consistent, meaning that if two servers have the same data version number, they must have the same data. When responding to a request, the server includes its data version number in the response. Thus, the client can tell which version it has seen. The client can choose to send this data version number along with the request. Upon receiving a request with a data version number, the server compares the data version number to its current version and rejects the request if its own version is lower than the requested version. This mechanism ensures that the client never reads a state that is older than before.