This page helps with operating Restate clusters.
To understand the terminology used on this page, it might be helpful to read through the architecture reference.

Controlling clusters with restatectl

Restate includes restatectl, a command-line tool for connecting to and controlling running Restate servers. It is designed for system operators and is particularly useful in cluster environments.
Follow the installation instructions to get restatectl set up on your machine.
The restatectl tool communicates with Restate at the advertised address specified in the server configuration (TCP port 5122 by default).

Growing the cluster

You can expand an existing cluster by adding new nodes after it has been started.
1. Starting point: single node

A Restate cluster can initially be started with a single node. Follow the cluster deployment instructions and ensure that:
# Replicating data to one node: cluster cannot tolerate node failures
default-replication = 1
2. Launch new nodes

Launch a new node with the same cluster-name and specify at least one existing node’s address in metadata-client.addresses. This allows the new node to discover the metadata servers and join the cluster.
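As a rough sketch of this step, the helper below writes a minimal configuration file for a joining node. The key names cluster-name and metadata-client.addresses come from the text above; the TOML table layout, the address scheme, the file name, the cluster name, and the host name are illustrative assumptions, so check them against the server configuration reference before use.

```shell
# Hypothetical helper: write a minimal config for a node joining an existing cluster.
# Usage: write_join_config <file> <cluster-name> <existing-node-address>
write_join_config() {
  cat > "$1" <<EOF
# Must match the cluster-name of the cluster being joined
cluster-name = "$2"

[metadata-client]
# At least one existing node, used to discover the metadata servers
addresses = ["$3"]
EOF
}

# Example values are placeholders, not defaults
write_join_config ./node-2.toml my-cluster http://node-1.example.internal:5122
```

Listing more than one existing node in the addresses array would let the new node join even if the first address is unreachable.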
3. Modify cluster configuration

Update the cluster’s replication settings to take advantage of the additional nodes and improve fault tolerance. Increase the log and partition replication to the desired level. For example, to replicate to two nodes:
restatectl config set --log-replication 2 --partition-replication 2
Then list the logs:
restatectl logs list
You might need to re-run the command a few times until all logs reflect the updated replication setting. If the update takes longer than expected, check the node logs for errors or warnings.
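Re-running the command by hand can be replaced with a small loop. The sketch below is a generic helper, not part of restatectl: the function name rerun and its arguments are hypothetical, and it only repeats a command a fixed number of times so you can watch the output change. It does not parse the output, because the exact format of restatectl logs list is not assumed here.

```shell
# Hypothetical helper: run a command repeatedly with a pause between attempts.
# Usage: rerun <attempts> <delay-seconds> <command> [args...]
# The body runs in a subshell, so its variables do not leak into the caller.
rerun() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    echo "--- attempt $i/$attempts ---"
    # Keep going even if one attempt fails, but report the exit status
    "$@" || echo "command exited with status $?" >&2
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
)

# Intended use (not executed here): rerun 5 3 restatectl logs list
```

Stop once every log shows the new replication setting; if the attempts run out first, check the node logs as described above.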

Managing the Replicated Loglet

You can manage the replicated loglet via:
restatectl replicated-loglet
When you use the replicated loglet, which is required for distributed operation, the Restate control plane selects the nodes on which to replicate the log according to the specified log replication. Each log-server node in the cluster has a storage state, which determines how the control plane may use that node. The set-storage-state sub-command allows you to manually override this state as operational needs dictate.
New log servers come up in the provisioning state and automatically transition to read-write. The read-write state means that the node is considered healthy both to read from and to accept writes; that is, it may be selected as a nodeset member for new loglet segments.

View storage state of log server

You can view the current storage state of log servers in your cluster using the list-servers sub-command.
restatectl replicated-loglet list-servers
Other valid storage states include data-loss, read-only, and disabled. Nodes may transition to data-loss if they detect that some previously written data is not available. This does not necessarily imply corruption, only that such nodes may not participate in some quorum checks. Such nodes may transition back to read-write if they can be repaired.
The read-only and disabled states are of particular interest to operators. Log servers in the read-only storage state may continue to serve both reads and writes, but will no longer be selected as participants in new segments’ nodesets. The control plane will reconfigure existing logs to move away from such nodes.

Manually update the log server state

Danger of data loss: set-storage-state is a low-level command that allows you to directly set log servers’ storage-state. Changing this can lead to cluster unavailability or data loss.
Use the set-storage-state sub-command to manually update the log server state, for example to prevent log servers from being included in new nodesets. Consider the following example:
restatectl replicated-loglet set-storage-state --node 1 --storage-state read-only
Output:
Node N1 storage-state updated from read-write to read-only
The cluster controller reconfigures the log nodeset to exclude N1. Depending on the configured log replication level, you may see a warning about compromised availability. If too few log servers are available to achieve the minimum required replication, the log stops accepting writes altogether. The restatectl tool checks whether new nodesets can still be created after marking a given node or set of nodes as read-only. Examine the logs using restatectl logs describe.
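The steps in this section can be strung together as a short sequence: inspect the current storage states, mark one node read-only, then examine the logs. The sketch below uses only the commands shown on this page. The function name mark_read_only and the RESTATECTL override variable are hypothetical conveniences (the variable makes it possible to dry-run the sequence with echo), and the same data-loss warning applies.

```shell
# Hypothetical wrapper around the commands from this section.
# Usage: mark_read_only <node-id>
# Set RESTATECTL=echo to print the commands instead of running them.
mark_read_only() (
  node=$1
  ctl=${RESTATECTL:-restatectl}
  # 1. Review the current storage state of all log servers
  "$ctl" replicated-loglet list-servers || exit 1
  # 2. Exclude the node from new nodesets (low-level, potentially dangerous)
  "$ctl" replicated-loglet set-storage-state --node "$node" --storage-state read-only || exit 1
  # 3. Examine the logs after the change
  "$ctl" logs describe
)

# Dry run (prints the three commands): RESTATECTL=echo mark_read_only 1
```

Stopping at the first failing command avoids changing the storage state when the cluster cannot even be queried.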

Troubleshooting Clusters