This page describes how to operate Restate clusters.
To understand the terminology used on this page, it might be helpful to read through the architecture reference.

Controlling clusters with restatectl

Restate includes restatectl, a command-line utility for connecting to and controlling running Restate servers. It is designed for system operators managing Restate servers and is particularly useful in a cluster environment.
Follow the installation instructions to get restatectl set up on your machine.
The restatectl tool communicates with Restate at the advertised address specified in the server configuration - by default TCP port 5122.
restatectl status
Optionally, specify the addresses via --addresses http://localhost:5122/:
restatectl nodes list
Output:
Node Configuration (v21)
NODE  GEN  NAME    ADDRESS                 ROLES
N1    6    node-1  http://127.0.0.1:5122/  admin | log-server | metadata-server | worker
N2    4    node-2  http://127.0.0.1:6122/  admin | log-server | metadata-server | worker
N3    6    node-3  http://127.0.0.1:7122/  log-server | metadata-server | worker
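If a server is not reachable at the default address, the same commands accept the --addresses flag shown above. For example (a sketch; the flag placement after the subcommand is an assumption):
restatectl nodes list --addresses http://localhost:5122/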
View the current log configuration (provider, replication, and nodeset) and the effective logs per partition:
restatectl logs list
restatectl logs describe
Output:
Log chain v8
 Logs Provider: replicated
 Log replication: {node: 2}
 Nodeset size: 0
L-ID  FROM-LSN  KIND        LOGLET-ID  REPLICATION  SEQUENCER  NODESET
0     61        Replicated  0_5        {node: 2}    N1:6       [N1, N2, N3]
1     4         Replicated  1_4        {node: 2}    N1:6       [N1, N2, N3]
2     4         Replicated  2_4        {node: 2}    N1:6       [N1, N2, N3]
3     5         Replicated  3_5        {node: 2}    N1:6       [N1, N2, N3]
4     4         Replicated  4_4        {node: 2}    N1:6       [N1, N2, N3]
5     5         Replicated  5_5        {node: 2}    N1:6       [N1, N2, N3]
6     6         Replicated  6_6        {node: 2}    N1:6       [N1, N2, N3]
7     4         Replicated  7_4        {node: 2}    N1:6       [N1, N2, N3]
8     4         Replicated  8_4        {node: 2}    N1:6       [N1, N2, N3]
restatectl partitions list
Output:
Alive partition processors (nodes config v21, partition table v21)
ID   NODE  MODE      STATUS  EPOCH  APPLIED  DURABLE  ARCHIVED  LSN-LAG  UPDATED
0    N1:6  Leader    Active  N1:6   61       -        -         0        1 second and 170 ms ago
0    N2:4  Follower  Active  N1:6   61       -        -         0        1 second and 64 ms ago
1    N1:6  Leader    Active  N1:6   4        -        -         0        801 ms ago
1    N2:4  Follower  Active  N1:6   4        -        -         0        779 ms ago
2    N1:6  Leader    Active  N1:6   4        -        -         0        600 ms ago
2    N2:4  Follower  Active  N1:6   4        -        -         0        1 second and 108 ms ago
3    N1:6  Leader    Active  N1:6   5        -        -         0        1 second and 369 ms ago
3    N2:4  Follower  Active  N1:6   5        -        -         0        1 second and 306 ms ago
4    N1:6  Leader    Active  N1:6   4        -        -         0        651 ms ago
4    N2:4  Follower  Active  N1:6   4        -        -         0        1 second and 169 ms ago
5    N1:6  Leader    Active  N1:6   5        -        -         0        567 ms ago
5    N2:4  Follower  Active  N1:6   5        -        -         0        1 second and 382 ms ago
6    N1:6  Leader    Active  N1:6   6        -        -         0        804 ms ago
6    N2:4  Follower  Active  N1:6   6        -        -         0        1 second and 145 ms ago
7    N1:6  Leader    Active  N1:6   4        -        -         0        1 second and 79 ms ago
7    N2:4  Follower  Active  N1:6   4        -        -         0        974 ms ago
8    N1:6  Leader    Active  N1:6   4        -        -         0        1 second and 71 ms ago
8    N2:4  Follower  Active  N1:6   4        -        -         0        717 ms ago


☠️ Dead nodes
NODE  LAST-SEEN
N3    11 minutes, 40 seconds and 995 ms ago
restatectl config get
Output:
⚙️ Cluster Configuration
 Number of partitions: 8
 Partition replication: *
 Logs Provider: replicated
 Log replication: {node: 1}
 Nodeset size: 0
restatectl config set --help # check options
restatectl config set --log-replication 2 # increase log replication
Output:
⚙️ Cluster Configuration
 Number of partitions: 8
 Partition replication: *
 Logs Provider: replicated
- Log replication: {node: 1}
+ Log replication: {node: 2}
 Nodeset size: 0


? Apply changes? (y/n) › yes
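To verify that the change took effect, re-run:
restatectl config get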

Growing the cluster

You can expand an existing cluster by adding new nodes after it has been started.
1. Starting point: single node

A Restate cluster can initially be started with a single node. Follow the cluster deployment instructions and ensure that:
# Replicating data to one node: cluster cannot tolerate node failures
default-replication = 1
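As a sketch, a minimal single-node configuration could combine this setting with a cluster name (the cluster and node names below are illustrative assumptions, not defaults):
# Illustrative names; all nodes joining this cluster must share cluster-name
cluster-name = "my-cluster"
node-name = "node-1"

# Replicating data to one node: cluster cannot tolerate node failures
default-replication = 1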
2. Launch new nodes

Launch a new node with the same cluster-name and specify at least one existing node’s address in metadata-client.addresses. This allows the new node to discover the metadata servers and join the cluster.
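For example, a sketch of the joining node's configuration (the names and address below are illustrative assumptions):
cluster-name = "my-cluster"  # must match the existing cluster
node-name = "node-2"

[metadata-client]
# At least one existing node, used to discover the metadata servers
addresses = ["http://node-1:5122/"]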
3. Modify cluster configuration

Update the cluster's replication settings to take advantage of the additional nodes and improve fault tolerance. Increase log replication to your desired number. For example, to replicate to two nodes:
restatectl config set --log-replication 2 --partition-replication 2
⚙️ Cluster Configuration
 Number of partitions: 4
- Partition replication: {node: 1}
+ Partition replication: {node: 2}
 Logs Provider: replicated
- Log replication: {node: 1}
+ Log replication: {node: 2}
 Nodeset size: 0


? Apply changes? (y/n) › yes
Then list the logs:
restatectl logs list
Log chain v5
 Logs Provider: replicated
 Log replication: {node: 2}
 Nodeset size: 0
L-ID  FROM-LSN  KIND        LOGLET-ID  REPLICATION  SEQUENCER  NODESET
0     2         Replicated  0_2        {node: 2}    N1:2       [N1, N2, N3]
1     2         Replicated  1_2        {node: 2}    N1:2       [N1, N2, N3]
2     2         Replicated  2_2        {node: 2}    N1:2       [N1, N2, N3]
3     2         Replicated  3_2        {node: 2}    N1:2       [N1, N2, N3]
4     2         Replicated  4_2        {node: 2}    N1:2       [N1, N2, N3]
5     2         Replicated  5_2        {node: 2}    N1:2       [N1, N2, N3]
6     2         Replicated  6_2        {node: 2}    N1:2       [N1, N2, N3]
7     2         Replicated  7_2        {node: 2}    N1:2       [N1, N2, N3]
You might need to re-run the command a few times until all logs reflect the updated replication setting. If the update takes longer than expected, check the node logs for errors or warnings.
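If you prefer to poll instead of re-running the command manually, a generic shell utility does the job (watch is an assumption about your environment, not a restatectl feature):
watch -n 2 restatectl logs list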

Managing the Replicated Loglet

You can manage the replicated loglet via:
restatectl replicated-loglet
When you use the replicated loglet, which is required for distributed operation, the Restate control plane selects the nodes on which to replicate the log according to the configured log replication. Each log-server node in the cluster has a storage state that determines how the control plane may use it. The set-storage-state subcommand allows you to manually override this state as operational needs dictate. New log servers come up in the provisioning state and automatically transition to read-write. The read-write state means the node is considered healthy both to read from and to accept writes, that is, it may be selected as a nodeset member for new loglet segments.

View storage state of log server

You can view the current storage state of log servers in your cluster using the list-servers sub-command.
restatectl replicated-loglet list-servers
Node configuration v12
Log chain v3
NODE  GEN   STORAGE-STATE  HISTORICAL LOGLETS  ACTIVE LOGLETS
N1    N1:3  read-write     4                   2
N2    N2:2  read-write     4                   2
N3    N3:2  read-write     4                   2
Other valid storage states include data-loss, read-only, and disabled. Nodes may transition to data-loss if they detect that some previously written data is not available. This does not necessarily imply corruption, only that such nodes may not participate in some quorum checks. Such nodes may transition back to read-write if they can be repaired. The read-only and disabled states are of particular interest to operators. Log servers in the read-only storage state may continue to serve both reads and writes, but will no longer be selected as participants in new segments' nodesets. The control plane will reconfigure existing logs to move away from such nodes.

Manually update the log server state

Danger of data loss: set-storage-state is a low-level command that allows you to directly set log servers' storage-state. Changing this can lead to cluster unavailability or data loss.
Use the set-storage-state sub-command to manually update the log server state, for example to prevent log servers from being included in new nodesets. Consider the following example:
restatectl replicated-loglet set-storage-state --node 1 --storage-state read-only
Output:
Node N1 storage-state updated from read-write to read-only
The cluster controller reconfigures the log nodeset to exclude N1. Depending on the configured log replication level, you may see a warning about compromised availability or, if insufficient log servers are available to achieve the minimum required replication, the log will stop accepting writes altogether. restatectl checks whether it will still be possible to create new nodesets after marking a given node or set of nodes as read-only. Examine the logs using restatectl logs describe.

Troubleshooting Clusters

Reused node ids

If a misconfigured Restate node with the log-server role attempts to join a cluster where the node id is already in use, the newly started node aborts with an error:
ERROR restate_core::task_center: Shutting down: task 4 failed with: Node cannot start a log-server on N3, it has detected that it has lost its data. storage-state is `data-loss`
Restarting the existing node that previously owned this id will also cause it to stop with the same message. Follow these steps to return the original log server to service without losing its stored log segments. First, prevent the misconfigured node from starting again until its configuration has been corrected. If it was a brand new node, there should be no data stored on it, and you may delete it altogether. The reused node id has been marked as having data-loss, a precaution that tells the Restate control plane to avoid selecting this node as a member of new log nodesets. You can view the current status using the restatectl replicated-loglet tool:
restatectl replicated-loglet list-servers
Node configuration v21
Log chain v6
NODE  GEN   STORAGE-STATE  HISTORICAL LOGLETS  ACTIVE LOGLETS
N1    N1:5  read-write     8                   2
N2    N2:4  read-write     8                   2
N3    N3:6  data-loss      6                   0
You should also observe that the control plane is now avoiding using this node for log storage. This will result in reduced fault tolerance or even unavailability, depending on the configured minimum log replication:
restatectl logs list
Logs v3
 Logs Provider: replicated
 Log replication: {node: 2}
 Nodeset size: 0
L-ID  FROM-LSN  KIND        LOGLET-ID  REPLICATION  SEQUENCER  NODESET
0     2         Replicated  0_1        {node: 2}    N2:1       [N1, N2]
1     2         Replicated  1_1        {node: 2}    N2:1       [N1, N2]
To restore the original node's ability to accept writes, you can update its metadata using the set-storage-state subcommand.
Only proceed if you are confident that you understand the reason why the node is in this state, and are certain that its locally stored data is still intact. Since Restate cannot automatically validate that it is safe to put this node back into service, you must use the --force flag to override the default state transition rules.
restatectl replicated-loglet set-storage-state --node 3 --storage-state 'read-write' --force
Node N3 storage-state updated from data-loss to read-write
You can validate that the server is once again being used for log storage using the logs list and replicated-loglet list-servers subcommands.
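For example, to confirm that N3 shows read-write again and reappears in new nodesets:
restatectl replicated-loglet list-servers
restatectl logs list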

Partition processor crash loop: TrimGapEncountered

You may observe a partition processor repeatedly crash-looping with a TrimGapEncountered error, or see one of the following errors in the Restate server logs:
  • A log trim gap was encountered, but no snapshot repository is configured!
  • A log trim gap was encountered, but no snapshot is available for this partition!
  • The latest available snapshot is from an LSN before the target LSN!
This indicates that the local state available on a given worker node does not allow it to resume from the log's trim point, either because the node is brand new or because its applied partition state is behind the trim point of the partition log. This situation can arise if you have manually trimmed the log, the node is missing a snapshot repository configuration, or the snapshot repository is otherwise inaccessible. To recover, you need to make available a snapshot of the partition state from another worker that is up to date with the log. If you are attempting to migrate from a single-node Restate deployment to a cluster, you can also refer to the migration guide. See Log trimming and Snapshots for more context about how logs, partitions, and snapshots are related.

Recovery procedure

1. Identify whether a snapshot repository is configured and accessible
If a snapshot repository is set up for other nodes in the cluster and simply not configured on the node where you are seeing the partition processor startup errors, correct the configuration on the new node - refer to Configuring Snapshots. If you have not yet set up a snapshot repository, do so now. If it is impossible to use an object store to host the snapshot repository, you can export snapshots to a local filesystem and manually transfer them to other nodes - skip to step 2b. In your server configuration, you should have a snapshot destination specified as follows:
[worker.snapshots]
destination = "s3://snapshots/prefix"
Confirm that this is consistent with the other nodes in the cluster. Check the server logs for any access errors: does the node have the necessary credentials, and are those credentials authorized to access the snapshot destination?
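One way to sanity-check access from the node is to list the destination directly, assuming it is hosted on S3 and the AWS CLI is installed (both assumptions):
aws s3 ls s3://snapshots/prefix/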
2. Publish a snapshot to the repository
Snapshots are produced periodically by partition processors on certain triggers, such as a number of records being appended to the log. If you are seeing one of the errors above, check that snapshots are being written to the object store destination you have configured. Verify that this partition has an active node:
restatectl partitions list
If you have lost all nodes which previously hosted this partition, you have permanent data loss - the partition state cannot be fully recovered. Get in touch with us for assistance in restarting the partition while accepting the data loss. Request a snapshot for this partition:
restatectl snapshots create-snapshot {partition_id}
You can manually confirm that the snapshot was published to the expected destination. Within the specified snapshot bucket and prefix, you will find a partition-based tree structure. Navigate to the bucket path {prefix}/{partition_id} - you should see an entry for the new snapshot id matching the output of the create-snapshot command.
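For example, with an S3-hosted repository and the AWS CLI (both assumptions), listing a partition's snapshot entries might look like:
# Replace {partition_id} with the affected partition's id
aws s3 ls s3://snapshots/prefix/{partition_id}/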
2b. Alternative: Manually transfer snapshot from another node
If you are running a cluster but are unable to set up a snapshot repository in a shared object store destination, you can still recover node state by publishing a snapshot from a healthy node to the local filesystem and manually transferring it to the new node.
Experimenting with snapshots without an object store: Note that shared filesystems are not a supported target for cluster snapshots, and have known correctness risks. The file:// protocol does not support conditional updates, which makes it unsuitable for potentially contended operation.
Identify an up-to-date node which is running the partition by running:
restatectl partitions list
On this node, configure a local destination for the partition snapshot repository - make sure this path already exists:
[worker.snapshots]
destination = "file:///mnt/restate-data/snapshots-repository"
Restart the node. If you have multiple nodes which may assume leadership for this partition, you will need to either repeat this on all of them, or temporarily shut them down. Create snapshot(s) for the affected partition(s):
restatectl snapshots create-snapshot {partition_id}
Copy the contents of the snapshot repository to the node experiencing issues, and configure that node to point to it. If you have multiple snapshots produced by multiple peer nodes, you can merge them all into the same location - each partition's snapshots will be written to a dedicated sub-directory for that partition.
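A sketch of the manual transfer using rsync, reusing the path from the example above (the target hostname is a placeholder):
# Copy the locally written snapshot repository to the affected node
rsync -av /mnt/restate-data/snapshots-repository/ node-3:/mnt/restate-data/snapshots-repository/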
3. Confirm that the affected node starts up and bootstraps its partition store from a snapshot
Once you have confirmed that a snapshot for the partition is available at the configured location, the configured repository access credentials have the necessary permissions, and the local node configuration is correct, you should see the partition processor start up and join the partition. If you have updated the Restate server configuration in the process, you should restart the server process to ensure that the latest changes are picked up.
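For instance, on a systemd-managed installation (an assumption about your deployment; the unit name is illustrative):
sudo systemctl restart restate-server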