The following is the default configuration. It does not include all possible configuration options, since some conflict with each other. See the configuration reference below for a full list of options.
Note that configuration defaults might change across server releases. If you want stable values, use an explicit configuration file and pass its path via --config-path=<PATH> as described above.
Important changes in recent versions:
Restate now listens on both TCP and Unix sockets by default (listen-mode = "all"). Unix sockets are created under restate-data/*.sock.
Advertised addresses are automatically detected based on your network configuration. You no longer need to explicitly set advertised-address for most deployments.
The metadata-client.addresses field is now optional for single-node setups.
Address that other nodes will use to connect to this service. The full prefix that will be used to advertise this service publicly.
For example, if this is set to https://my-host then others will use this
as base URL to connect to this service. If unset, the advertised address will be inferred from the public address of this node
or it’ll use the value supplied in advertised-host if set.
Advertised address: An externally accessible URI address for admin-api-server. This can be set to unix:restate-data/admin.sock to advertise the automatically created Unix socket instead of using TCP if needed.
Examples:
“http://127.0.0.1:9070/” or “https://my-host/” or “unix:/data/restate-data/admin.sock”
Advertised Admin endpoint: Optional advertised Admin API endpoint.
[Deprecated] Use advertised-address instead.
Advertised address: An externally accessible URI address for admin-api-server. This can be set to unix:restate-data/admin.sock to advertise the automatically created Unix socket instead of using TCP if needed.
Examples:
“http://127.0.0.1:9070/” or “https://my-host/” or “unix:/data/restate-data/admin.sock”
The combination of bind-ip and bind-port that will be used to bind. This has precedence over bind-ip and bind-port.
Bind address: The local network address to bind on for admin-api-server. This service uses default port 9070 and will create a Unix socket file in the data directory under the name admin.sock.
Examples:
“[::]:9070” or “0.0.0.0:9070” or “127.0.0.1:9070”
Deployment routing headers: List of header names considered routing headers.
These will be used during deployment creation to distinguish between an already existing deployment and a new deployment.
Controller heartbeats: Controls the interval at which the cluster controller polls nodes of the cluster.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D”
Address that other nodes will use to connect to this service. The full prefix that will be used to advertise this service publicly.
For example, if this is set to https://my-host then others will use this
as base URL to connect to this service. If unset, the advertised address will be inferred from the public address of this node
or it’ll use the value supplied in advertised-host if set.
Advertised address: An externally accessible URI address for tokio-console-server. This can be set to unix:restate-data/tokio.sock to advertise the automatically created Unix socket instead of using TCP if needed.
Examples:
“http://127.0.0.1:6669/” or “https://my-host/” or “unix:/data/restate-data/tokio.sock”
Auto cluster provisioning: If true, then this node is allowed to automatically provision as a new cluster.
This node must have an admin role and a new nodes configuration will be created that includes this node.
auto-provision is allowed by default in development mode and is disabled if restate-server runs with the --production flag
to prevent cluster nodes from forming their own clusters, rather than forming a single cluster.
Use restatectl to provision the cluster/node if automatic provisioning is disabled.
This can also be explicitly disabled by setting this value to false.
Default: true
Append retry maximum interval: Maximum retry duration used by the exponential backoff mechanism for bifrost appends.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D”
Append retry minimum interval: Minimum retry duration used by the exponential backoff mechanism for bifrost appends.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D”
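To illustrate how a minimum and a maximum interval bound an exponential backoff, here is a minimal sketch. The function name, the doubling factor and the absence of jitter are assumptions made for illustration; the reference does not state which growth factor or jitter bifrost actually uses.

```python
from itertools import islice
from typing import Iterator


def backoff_delays(min_interval: float, max_interval: float,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield retry delays (seconds) that start at min_interval and grow
    geometrically until they are capped at max_interval.

    The factor of 2 is an assumption, not a documented Restate value.
    """
    if min_interval <= 0 or max_interval < min_interval:
        raise ValueError("expected 0 < min_interval <= max_interval")
    delay = min_interval
    while True:
        yield delay
        # Grow the delay, but never exceed the configured maximum.
        delay = min(delay * factor, max_interval)


# First six delays for a hypothetical 10 ms minimum and 1 s maximum.
first_delays = list(islice(backoff_delays(0.01, 1.0), 6))
```

With these hypothetical values the delay doubles on each attempt and stays at the maximum once it is reached.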
Auto recovery threshold: Time interval after which bifrost’s auto-recovery mechanism will kick in. This
is triggered in scenarios where the control plane took too long to complete loglet
reconfigurations.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D”
Disable Automatic Improvement: When enabled, automatic improvement periodically checks with the loglet provider
if the loglet configuration can be improved by performing a reconfiguration.
This allows the log to pick up replication property changes, apply better placement
of replicas, or reconfigure for other reasons.
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Log Server RPC timeout: Timeout waiting on log server response.
Non-zero human-readable duration: Non-zero duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D”
Maximum number of inflight records sequencer can accept: Once this maximum is hit, the sequencer will induce back pressure
on clients. This controls the total number of records regardless of how many batches.
Note that this will be increased to fit the biggest batch of records being enqueued.
Limits memory per remote batch read: When reading from a log-server, the server stops reading if the next record will tip over the
total number of bytes allowed in this configuration option.
Note the limit is not strict and the server will always allow at least a single record to be
read even if that record exceeds the stated budget.
Maximum number of records to prefetch from log servers: The number of records bifrost will attempt to prefetch from replicated loglet’s log-servers
for every loglet reader (e.g. partition processor). Note that this mainly impacts readers
that are not co-located with the loglet sequencer (i.e. partition processor followers).
Trigger to prefetch more records: When read-ahead is used (readahead-records), this value (a ratio expressed as a float) will determine when
readers should trigger a prefetch for another batch to fill up the buffer. For instance, if
this value is 0.3, then bifrost will trigger a prefetch when 30% or more of the read-ahead
slots become available (e.g. partition processor consumed records and freed up enough slots).
The higher the value is, the longer bifrost will wait before it triggers the next fetch, potentially
fetching more records as a result.
To illustrate, if readahead-records is set to 100 and readahead-trigger-ratio is 1.0, then
bifrost will prefetch up to 100 records from log-servers and will not trigger the next
prefetch unless the consumer consumes 100% of this buffer. This means that bifrost will
read in batches but will not do so while the consumer is still reading the previous batch.
Value must be between 0 and 1. It will be clamped at 1.0.
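The trigger rule described above can be sketched as a small predicate. This is only an illustration of the documented behavior; the function and parameter names are hypothetical, and clamping negative values to 0 is an assumption (the text only says the value is clamped at 1.0).

```python
def should_prefetch(readahead_records: int, buffered: int,
                    trigger_ratio: float) -> bool:
    """Return True when enough read-ahead slots are free to trigger
    the next prefetch.

    readahead_records: total number of read-ahead slots.
    buffered: records currently held in the buffer (not yet consumed).
    trigger_ratio: fraction of free slots required before prefetching.
    """
    # Clamp the ratio into [0, 1]; the lower bound is an assumption.
    ratio = min(max(trigger_ratio, 0.0), 1.0)
    free_slots = readahead_records - buffered
    return free_slots >= ratio * readahead_records
```

For example, with 100 slots and a ratio of 1.0, the predicate only becomes true once the buffer has been fully consumed, which matches the batch-by-batch behavior described in the text.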
Adaptive timeout for LogServer RPC: This configures the adaptive timeout range for RPC operations from this node to log servers.
The timeout range is also used to determine the appropriate retry delay between retry attempts.
Examples:
“250ms..1m” or “10ms..60s” or “0.5s..5m” or “1s..1h”
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Adaptive timeout for LogServer Store Messages: This configures the adaptive timeout range for Store operations from this node to log servers.
Since v1.7.0
Examples:
“250ms..1m” or “10ms..60s” or “0.5s..5m” or “1s..1h”
The combination of bind-ip and bind-port that will be used to bind. This has precedence over bind-ip and bind-port.
Bind address: The local network address to bind on for tokio-console-server. This service uses default port 6669 and will create a Unix socket file in the data directory under the name tokio.sock.
Examples:
“[::]:6669” or “0.0.0.0:6669” or “127.0.0.1:6669”
Partitions: Number of partitions that will be provisioned during initial cluster provisioning.
Partitions are the logical shards used to process messages.
Cannot be higher than 65535 (you should almost never need as many partitions anyway).
NOTE 1: This config entry only impacts the initial number of partitions; the
value of this entry is ignored for provisioned nodes/clusters.
NOTE 2: This will be renamed to default-num-partitions by default as of v1.3+
Default: 24
Default replication factor: Configures the global default replication factor to be used by the system.
Note that this value only impacts the initial cluster provisioning and will not be respected after
the cluster has been provisioned.
To update existing clusters, use the restatectl utility.
Default retry policy: The default retry policy to use for invocations.
The retry policy can be customized on a service/handler basis, using the respective SDK APIs.
Check https://docs.restate.dev/services/configuration#retries for more details.
On max attempts: Behavior when max attempts are reached.
Set to pause to pause invocations when max attempts are reached.
Set to kill to kill the invocation when max attempts are reached.
For more details about the invocation lifecycle, check https://docs.restate.dev/services/invocation/managing-invocations
"pause" : Pause the invocation when max attempts are reached.
"kill" : Kill the invocation when max attempts are reached.
Disable telemetry: Restate uses Scarf to collect anonymous usage data to help us understand how the software is being used.
You can set this flag to true to disable this collection. It can also be set with the environment variable DO_NOT_TRACK=1.
Gossip extras exchange frequency: In addition to basic health/liveness information, the gossip protocol is used to exchange
extra information about the roles hosted by this node. For instance, which partitions are
currently running, their configuration versions, and the durable LSN of the corresponding
partition databases. This information is sent every Nth gossip message. This setting
controls the frequency of this exchange. For instance, 10 means that every 10th gossip
message will contain the extra information.
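As a small illustration of the "every Nth gossip message" rule, the check below decides whether a given message carries the extra information. The function name and the convention that messages are numbered starting from 1 are assumptions made for this sketch.

```python
def carries_extras(message_seq: int, extras_frequency: int) -> bool:
    """Return True if the gossip message with this sequence number
    should include the extra role information.

    message_seq is assumed to count gossip messages starting at 1.
    """
    if extras_frequency <= 0:
        raise ValueError("extras_frequency must be positive")
    return message_seq % extras_frequency == 0


# With a frequency of 10, messages 10, 20 and 30 carry the extras.
with_extras = [seq for seq in range(1, 31) if carries_extras(seq, 10)]
```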
Gossip loneliness threshold: How many intervals need to pass without receiving any gossip messages before considering
this node as potentially isolated/dead. This threshold is used in the case where the node
can still send gossip messages but did not receive any. This can rarely happen in
asymmetric network partitions.
In this case, the node will advertise itself as dead in the gossip messages it sends out.
Note: this threshold does not apply to a cluster that’s configured with a single node.
Number of peers to gossip: On every gossip interval, how many peers each node attempts to gossip with. The default is
optimized for small clusters (less than 5 nodes). On larger clusters, if gossip overhead is noticeable,
consider reducing this value to 1.
Address that other nodes will use to connect to this service. The full prefix that will be used to advertise this service publicly.
For example, if this is set to https://my-host then others will use this
as base URL to connect to this service. If unset, the advertised address will be inferred from the public address of this node
or it’ll use the value supplied in advertised-host if set.
Advertised address: An externally accessible URI address for http-ingress-server. This can be set to unix:restate-data/ingress.sock to advertise the automatically created Unix socket instead of using TCP if needed.
Examples:
“http://127.0.0.1:8080/” or “https://my-host/” or “unix:/data/restate-data/ingress.sock”
[Deprecated] Use advertised-address instead.
Ingress endpoint that the Web UI should use to interact with.
Advertised address: An externally accessible URI address for http-ingress-server. This can be set to unix:restate-data/ingress.sock to advertise the automatically created Unix socket instead of using TCP if needed.
Examples:
“http://127.0.0.1:8080/” or “https://my-host/” or “unix:/data/restate-data/ingress.sock”
The combination of bind-ip and bind-port that will be used to bind. This has precedence over bind-ip and bind-port.
Bind address: The local network address to bind on for http-ingress-server. This service uses default port 8080 and will create a Unix socket file in the data directory under the name ingress.sock.
Examples:
“[::]:8080” or “0.0.0.0:8080” or “127.0.0.1:8080”
Concurrency limit: Local concurrency limit to use to limit the amount of concurrent requests. If exceeded,
the ingress will reply immediately with an appropriate status code. Default is unlimited.
HTTP/2 max concurrent streams: Caps the number of concurrent HTTP/2 streams accepted per inbound ingress connection.
If unset, Restate does not configure this limit and leaves it at hyper’s runtime default.
With the current hyper version, that default is 200 streams.
Service-mesh clients such as Linkerd honor the advertised value as a hard per-connection
concurrency limit, so high-concurrency or long-poll deployments may need to raise it.
Since v1.7.0
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Inflight Memory Budget: Maximum total size of in-flight ingestion requests in bytes.
Tune this to your workload so there are enough unpersisted
requests for efficient batching without exhausting memory.
Defaults to 1 MiB.
Request Batch Size: Maximum size of a single ingestion request batch.
Tune to keep enough requests per batch for
throughput; overly large batches can increase tail latency.
Defaults to 50 KiB.
Kafka cluster options: Configuration options to connect to a Kafka cluster.
Deprecated in 1.7: Kafka clusters should now be configured through the UI/Admin API
Request size limit: Maximum size of request that can be received over ingress. If a request size is
larger than this limit, the request will fail.
If unset, defaults to networking.message-size-limit. If set, it will be clamped at
the value of networking.message-size-limit since larger requests cannot be transmitted
over the cluster internal network.
Since v1.7.0
Non-zero human-readable bytes
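The "default to, then clamp at" rule for the request size limit can be sketched as follows. The function and parameter names are hypothetical and sizes are plain byte counts; this only illustrates the rule stated above, not Restate's implementation.

```python
from typing import Optional


def effective_request_size_limit(request_size_limit: Optional[int],
                                 message_size_limit: int) -> int:
    """Resolve the ingress request size limit in bytes.

    Unset (None) falls back to networking.message-size-limit; a set
    value is clamped so it never exceeds that network limit.
    """
    if request_size_limit is None:
        return message_size_limit
    return min(request_size_limit, message_size_limit)


MIB = 1024 * 1024
# A hypothetical 64 MiB request limit is clamped to a 32 MiB network limit.
clamped = effective_request_size_limit(64 * MIB, 32 * MIB)
```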
Node Location: Setting the location allows Restate to form a tree-like cluster topology.
The value is written in the format of “region[.zone]” to assign this node
to a specific region, or to a zone within a region.
The value of region and zone is arbitrary but whitespace and . are disallowed.
NOTE: It’s strongly recommended to not change the node’s location string after
its initial registration. Changing the location may result in data loss or data
inconsistency if log-server is enabled on this node.
When this value is not set, the node is considered to be in the default location.
The default location means that the node is not assigned to any specific region or zone.
Examples:
us-west — the node is in the us-west region.
us-west.a1 — the node is in the us-west region and in the a1 zone.
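The “region[.zone]” format can be validated with a short parser. This is a sketch under stated assumptions: the `Location` type and `parse_location` function are hypothetical names, and treating an empty string as the default (unassigned) location is an assumption based on the description of an unset value.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    region: Optional[str]
    zone: Optional[str]


def parse_location(value: str) -> Location:
    """Parse a "region[.zone]" string into its components."""
    if value == "":
        # Assumption: an empty value stands for the default location.
        return Location(None, None)
    parts = value.split(".")
    if len(parts) > 2:
        raise ValueError("expected at most one '.' separating region and zone")
    for part in parts:
        # Region and zone are arbitrary, but must be non-empty
        # and must not contain whitespace.
        if part == "" or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid location component: {part!r}")
    zone = parts[1] if len(parts) == 2 else None
    return Location(parts[0], zone)
```

Parsing `us-west` yields a region with no zone, while `us-west.a1` yields both a region and a zone.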
Disable ANSI in log output: Disable ANSI terminal codes for logs. This is useful when the log collector doesn’t support processing ANSI terminal codes.
Logging Filter: Log filter configuration. Can be overridden by the RUST_LOG environment variable.
Check the RUST_LOG documentation for more details on how to configure it.
RocksDB compaction readahead size in bytes: If non-zero, we perform bigger reads when doing compaction. If you’re
running RocksDB on spinning disks, you should set this to at least 2MB.
That way RocksDB’s compaction is doing sequential instead of random reads.
Non-zero human-readable bytes
Disable Direct IO for reads: Files will be opened in “direct I/O” mode
which means that data r/w from the disk will not be cached or
buffered. The hardware buffer of the devices may however still
be used. Memory mapped files are not impacted by these parameters.
Disable L0/L1 SST compression: When false (the default), L0 and L1 SST files are compressed with Lz4.
Higher levels (L2+) always use Zstd regardless of this setting.
Set to true to disable compression for L0/L1, which can improve write
throughput at the cost of higher disk usage since these files are
short-lived and frequently compacted.
Default: false (L0/L1 compression enabled)
Disable WAL compression: When false (the default), the Write-Ahead Log is compressed with Zstd.
Set to true to disable WAL compression. Only applies when WAL is enabled.
Default: false (WAL compression enabled)
The memory budget for rocksdb memtables in bytes: If this value is set, it overrides the ratio defined in rocksdb-memory-ratio.
Non-zero human-readable bytes
The memory budget for rocksdb memtables as ratio: This defines the total memory for rocksdb as a ratio of the node’s total budget
for rocksdb memtables (see rocksdb-total-memtables-ratio in common).
RocksDB statistics level: StatsLevel can be used to reduce statistics overhead by skipping certain
types of stats in the stats collection process.
Default: “except-detailed-timers”
"disable-all" : Disable all metrics
"except-histogram-or-timers" : Disable timer stats, and skip histogram stats
"except-timers" : Skip timer stats
"except-detailed-timers" : Collect all stats except time inside mutex lock AND time spent on
compression.
"except-time-for-mutex" : Collect all stats except the counters requiring to get time inside the
mutex lock.
"all" : Collect all stats, including measuring duration of mutex operations.
If getting time is expensive on the platform to run, it can
reduce scalability to more threads, especially for writes.
Custom Path for WAL files: If unset, the default data directory is used for WAL files. If set, the path
must point to a directory that exists and is writable. The directory will be
used to store WAL files.
The recommendation is to use low-latency NVMe SSD for the WAL directory with
sufficient IOPS capacity and bandwidth for your sustained workload. The fdatasync
latency of the storage device is a significant factor in the WAL performance if
rocksdb-disable-wal-fsync is set to false.
Since v1.7.0
Write Batch (in bytes): If unset, a reasonable value is automatically derived from the memory budget
available for memtables of the log-server.
If write-batch-commit-count is set, the write batch will be limited to whichever
of the two is reached first. The value is clamped to the size of a single memtable,
which is derived from rocksdb-memory-budget.
Non-zero human-readable bytes
Write Batch (count): The number of records to batch in a single write batch. This is a (soft) upper bound
and the actual number of records may be lower in low-throughput scenarios or if
the write batch size (in bytes) is reached first.
If unset, the write batch will only be limited by its size in bytes.
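The interaction between the byte limit and the optional count limit ("whichever of the two is reached first") can be sketched as a generic batching loop. The function and parameter names are hypothetical, and letting a single oversized record form its own batch is an assumption; this is not the log-server's actual write path.

```python
from typing import Iterable, Iterator, List, Optional


def batch_records(records: Iterable[bytes], max_bytes: int,
                  max_count: Optional[int] = None) -> Iterator[List[bytes]]:
    """Group records into batches bounded by total bytes and, if set,
    by record count, closing a batch at whichever limit is hit first."""
    batch: List[bytes] = []
    size = 0
    for record in records:
        # Close the current batch if this record would exceed the byte budget.
        if batch and size + len(record) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(record)
        size += len(record)
        # Close the batch as soon as the optional count limit is reached.
        if max_count is not None and len(batch) >= max_count:
            yield batch
            batch, size = [], 0
    if batch:
        yield batch


# Five 4-byte records with a 10-byte budget and no count limit.
sizes = [len(b) for b in batch_records([b"abcd"] * 5, max_bytes=10)]
```

When the count limit is unset, only the byte budget closes a batch, which mirrors the "limited only by its size in bytes" behavior described above.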
Maximum idempotency retention duration that can be configured.
Applied when ingesting the invocation: values higher than this limit are clamped down to it.
Unset means no limit.
Since v1.7.0
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Maximum journal retention duration that can be configured.
Applied when ingesting the invocation: values higher than this limit are clamped down to it.
Unset means no limit.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max configurable value for retry policy max attempts: Maximum max attempts configurable in an invocation retry policy.
When discovering a service deployment with configured retry policies, or when modifying the invocation retry policy using the Admin API, the given value will be clamped.
None means no limit, that is, infinite retries are enabled.
Maximum workflow completion retention duration that can be configured.
Applied when ingesting the invocation: values higher than this limit are clamped down to it.
Unset means no limit.
Since v1.7.0
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Metadata Network Message Size: Maximum size of network messages that the metadata client can receive from a metadata server.
If unset, defaults to networking.message-size-limit. If set, it will be clamped at
the value of networking.message-size-limit since larger messages cannot be transmitted
over the cluster internal network.
Non-zero human-readable bytes
Option 2: Store metadata on an external etcd cluster.
The addresses are formatted as `host:port`.
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Metadata Network Message Size: Maximum size of network messages that the metadata client can receive from a metadata server.
If unset, defaults to networking.message-size-limit. If set, it will be clamped at
the value of networking.message-size-limit since larger messages cannot be transmitted
over the cluster internal network.
Non-zero human-readable bytes
Option 3: Store metadata on an external object store.
Object store API endpoint URL override: When you use Amazon S3, this is typically inferred from the region and there is no need to
set it. With other object stores, you will have to provide an appropriate HTTP(S) endpoint.
If not using HTTPS, also set aws-allow-http to true.
AWS profile: The AWS configuration profile to use for S3 object store destinations. If you use
named profiles in your AWS configuration, you can replace all the other settings with
a single profile reference. See the AWS documentation on profiles
(https://docs.aws.amazon.com/sdkref/latest/guide/file-format.html) for more.
AWS region to use with S3 object store destinations. This may be inferred from the
environment, for example the current region when running in EC2. Because of the
request signing algorithm this must have a value. For Minio, you can generally
set this to any string, such as us-east-1.
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Object store path for metadata storage: This location will be used to persist cluster metadata. Takes the form of a URL
with s3:// as the protocol and bucket name as the authority, plus an optional
prefix specified as the path component.
Example: s3://bucket/prefix
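To make the URL anatomy concrete (protocol, bucket as authority, optional prefix as path), the following sketch splits such a path with Python's standard `urllib.parse`. The function name is hypothetical and the validation rules are assumptions for illustration only.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_object_store_path(url: str) -> Tuple[str, str]:
    """Split an s3://bucket/prefix URL into (bucket, prefix).

    The prefix is returned without its leading slash and is empty
    when the URL names only a bucket.
    """
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"expected an s3://bucket[/prefix] URL, got {url!r}")
    return parts.netloc, parts.path.lstrip("/")


bucket, prefix = parse_object_store_path("s3://bucket/prefix")
```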
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Metadata Network Message Size: Maximum size of network messages that the metadata client can receive from a metadata server.
If unset, defaults to networking.message-size-limit. If set, it will be clamped at
the value of networking.message-size-limit since larger messages cannot be transmitted
over the cluster internal network.
Non-zero human-readable bytes
Auto join the metadata cluster when being started: Defines whether this node should auto join the metadata store cluster when being started
for the first time.
The raft log trim threshold: The threshold for trimming the raft log. The log will be trimmed if the number of apply entries
exceeds this threshold. The default value is 1000.
The number of ticks before triggering an election: The value must be larger than
raft_heartbeat_tick. It’s recommended to set raft_election_tick = 10 * raft_heartbeat_tick.
Decrease this value if you want to react faster to failed leaders. Note that decreasing this
value too much can lead to cluster instabilities due to falsely detecting dead leaders.
The number of ticks before sending a heartbeat: A leader sends heartbeat messages to maintain its leadership every heartbeat ticks.
Decrease this value to send heartbeats more often.
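The relationship between the two tick settings can be expressed as a small check. This is a hypothetical helper for illustration: it enforces the stated constraint (the election tick must be larger than the heartbeat tick) and reports the recommended value of ten times the heartbeat tick.

```python
def recommended_election_tick(heartbeat_tick: int) -> int:
    """Return the recommended raft_election_tick for a heartbeat tick."""
    if heartbeat_tick <= 0:
        raise ValueError("heartbeat_tick must be positive")
    return 10 * heartbeat_tick


def validate_raft_ticks(heartbeat_tick: int, election_tick: int) -> None:
    """Raise ValueError if the election tick is not larger than the
    heartbeat tick, as the configuration requires."""
    if heartbeat_tick <= 0:
        raise ValueError("heartbeat_tick must be positive")
    if election_tick <= heartbeat_tick:
        raise ValueError("raft_election_tick must be larger than raft_heartbeat_tick")


# A hypothetical heartbeat tick of 2 gives a recommended election tick of 20.
suggested = recommended_election_tick(2)
validate_raft_ticks(2, suggested)
```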
RocksDB compaction readahead size in bytes: If non-zero, we perform bigger reads when doing compaction. If you’re
running RocksDB on spinning disks, you should set this to at least 2MB.
That way RocksDB’s compaction is doing sequential instead of random reads.
Non-zero human-readable bytes
Disable Direct IO for reads: Files will be opened in “direct I/O” mode
which means that data r/w from the disk will not be cached or
buffered. The hardware buffer of the devices may however still
be used. Memory mapped files are not impacted by these parameters.
Disable L0/L1 SST compression: When false (the default), L0 and L1 SST files are compressed with Lz4.
Higher levels (L2+) always use Zstd regardless of this setting.
Set to true to disable compression for L0/L1, which can improve write
throughput at the cost of higher disk usage since these files are
short-lived and frequently compacted.
Default: false (L0/L1 compression enabled)
Disable WAL compression: When false (the default), the Write-Ahead Log is compressed with Zstd.
Set to true to disable WAL compression. Only applies when WAL is enabled.
Default: false (WAL compression enabled)
The memory budget for rocksdb memtables in bytes: If this value is set, it overrides the ratio defined in rocksdb-memory-ratio.
Non-zero human-readable bytes
The memory budget for rocksdb memtables as ratio: This defines the total memory for rocksdb as a ratio of all memory available to memtables
(see rocksdb-total-memtables-ratio in common).
RocksDB statistics level: StatsLevel can be used to reduce statistics overhead by skipping certain
types of stats in the stats collection process.
Default: “except-detailed-timers”
"disable-all" : Disable all metrics
"except-histogram-or-timers" : Disable timer stats, and skip histogram stats
"except-timers" : Skip timer stats
"except-detailed-timers" : Collect all stats except time inside mutex lock AND time spent on
compression.
"except-time-for-mutex" : Collect all stats except the counters requiring to get time inside the
mutex lock.
"all" : Collect all stats, including measuring duration of mutex operations.
If getting time is expensive on the platform to run, it can
reduce scalability to more threads, especially for writes.
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Networking options: Common network configuration options for communicating with Restate cluster nodes. Note that
similar keys are present in other config sections, such as in Service Client options.
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Node Name: Unique name for this node in the cluster. The node name must not change unless
the node is started with an empty local store. It defaults to the node’s hostname.
RocksDB compaction readahead size in bytes: If non-zero, we perform bigger reads when doing compaction. If you’re
running RocksDB on spinning disks, you should set this to at least 2MB.
That way RocksDB’s compaction is doing sequential instead of random reads.
Non-zero human-readable bytes
Disable Direct IO for reads: Files will be opened in “direct I/O” mode
which means that data r/w from the disk will not be cached or
buffered. The hardware buffer of the devices may however still
be used. Memory mapped files are not impacted by these parameters.
Disable L0/L1 SST compression: When false (the default), L0 and L1 SST files are compressed with Lz4.
Higher levels (L2+) always use Zstd regardless of this setting.
Set to true to disable compression for L0/L1, which can improve write
throughput at the cost of higher disk usage since these files are
short-lived and frequently compacted.
Default: false (L0/L1 compression enabled)
Disable WAL compression: When false (the default), the Write-Ahead Log is compressed with Zstd.
Set to true to disable WAL compression. Only applies when WAL is enabled.
Default: false (WAL compression enabled)
Rocksdb High Priority Background Threads: The number of threads to reserve for high priority Rocksdb background tasks.
Defaults to 1/4 of the number of CPU cores.
Since v1.7.0 (renamed from rocksdb-high-priority-bg-threads)
Rocksdb Low Priority Background Threads: The number of threads to reserve for lower priority Rocksdb background tasks.
Defaults to the remaining CPU cores not used by high-priority rocksdb threads.
Since v1.7.0 (renamed from rocksdb-bg-threads)
Rocksdb performance statistics level: Defines the level of PerfContext used internally by rocksdb. Default is enable-count
which should be sufficient for most users. Note that higher levels incur a CPU cost and
might slow down the critical path.
"disable" : Disable perf stats
"enable-count" : Enables only count stats
"enable-time-except-for-mutex" : Count stats and enable time stats except for mutexes
"enable-time-and-c-p-u-time-except-for-mutex" : Other than time, also measure CPU time counters. Still don’t measure
time (neither wall time nor CPU time) for mutexes
RocksDB statistics level: StatsLevel can be used to reduce statistics overhead by skipping certain
types of stats in the stats collection process.
Default: “except-detailed-timers”
"disable-all" : Disable all metrics
"except-histogram-or-timers" : Disable timer stats, and skip histogram stats
"except-timers" : Skip timer stats
"except-detailed-timers" : Collect all stats except time inside mutex lock AND time spent on
compression.
"except-time-for-mutex" : Collect all stats except the counters requiring to get time inside the
mutex lock.
"all" : Collect all stats, including measuring duration of mutex operations.
If getting time is expensive on the platform to run, it can
reduce scalability to more threads, especially for writes.
Rocksdb total memtable size ratio: The memory size used across all memtables (ratio between 0.1 and 1.0). This
limits how much memory memtables can eat up from the value in rocksdb-total-memory-limit. The remaining memory will be dedicated to the block cache.
This value will be sanitized to 1.0 if outside the valid bounds.
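The split described above can be sketched as follows. This is a minimal illustration, not the server's implementation: the function name is hypothetical, and treating both bounds (0.1 and 1.0) as inclusive is an assumption.

```python
from typing import Tuple


def split_rocksdb_memory(total_limit_bytes: int, memtables_ratio: float) -> Tuple[int, int]:
    """Return (memtable_bytes, block_cache_bytes) for a total memory limit."""
    # Ratios outside the valid range are sanitized to 1.0, as documented.
    # Assumption: both bounds are inclusive.
    if not (0.1 <= memtables_ratio <= 1.0):
        memtables_ratio = 1.0
    memtable_bytes = int(total_limit_bytes * memtables_ratio)
    # Whatever the memtables do not take goes to the block cache.
    return memtable_bytes, total_limit_bytes - memtable_bytes


GIB = 1024 ** 3
print(split_rocksdb_memory(4 * GIB, 0.5))   # even split between memtables and block cache
print(split_rocksdb_memory(4 * GIB, 1.5))   # out of range, so the ratio is treated as 1.0
```

Note that a sanitized ratio of 1.0 leaves nothing for the block cache in this sketch, so an out-of-range value is worth catching before deployment.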
Storage high priority thread pool: This configures the restate-managed storage thread pool for performing
high-priority or latency-sensitive storage tasks when the IO operation cannot
be performed on in-memory caches.
Storage low priority thread pool: This configures the restate-managed storage thread pool for performing
low-priority or latency-insensitive storage tasks.
Tracing Endpoint: This is a shortcut to set both [Self::tracing_runtime_endpoint], and [Self::tracing_services_endpoint].
Specify the tracing endpoint to send runtime traces to.
Traces will be exported using OTLP gRPC
through opentelemetry_otlp.
To configure the sampling, please refer to the opentelemetry autoconfigure docs.
Distributed Tracing JSON Export Path: If set, an exporter will be configured to write traces to files using the Jaeger JSON format.
Each trace file will start with the trace prefix.
If unset, no traces will be written to file.
It can be used to export traces in a structured format without configuring a Jaeger agent.
To inspect the traces, open the Jaeger UI and use the Upload JSON feature to load and inspect them.
Runtime Tracing Endpoint: Overrides [Self::tracing_endpoint] for runtime traces.
Specify the tracing endpoint to send runtime traces to.
Traces will be exported using OTLP gRPC
through opentelemetry_otlp.
To configure the sampling, please refer to the opentelemetry autoconfigure docs.
Services Tracing Endpoint: Overrides [Self::tracing_endpoint] for services traces.
Specify the tracing endpoint to send services traces to.
Traces will be exported using OTLP gRPC
through opentelemetry_otlp.
To configure the sampling, please refer to the opentelemetry autoconfigure docs.
Durability mode: Every partition store is backed up by a durable log that is used to recover the state of
the partition on restart or failover. The durability mode defines the criteria used
to determine whether a partition is considered fully durable or not at a given point in the
log history. Once a partition is fully durable, its backing log is allowed to be trimmed to
the durability point.
This helps keep the log’s disk usage under control, but it forces nodes that need to restore
the state of the partition to fetch a snapshot of that partition that covers the changes up to
and including the “durability point”.
Since v1.4.2 (not compatible with earlier versions)
"none" : This disables durability tracking and trimming completely.
Trims and snapshots are still possible if performed manually or by an external
component.
"snapshot-and-replica-set" : In this mode, a partition is considered durable when its state can be restored from
any of the members of the replica-set as well as the latest snapshot.
In other words, do not trim unless all replicas and the snapshot cover this Lsn.
[requires snapshot repository]
DurabilityPoint = Min(Min(ReplicaSetDurablePoints), SnapshotDurablePoint)
"balanced" : In this mode, a partition is considered durable when its state can be restored from
the snapshot and at least a single replica.
Do not trim unless the Lsn is covered (durably) by any of the replicas and by
the snapshot. Gives weight to snapshots over the durability of the replica-set but
without ignoring the replica-set completely.
In practice, this means that after a snapshot has been created on the leader, the
system will wait for the nearest memtable flush that covers this Lsn before considering
this Lsn for trimming. If the leader crashes before the memtable flush, we are confident
that the leader will be able to replay the log without any trim-gaps. This is under the
condition that the leader didn’t move to another node. In the latter case, the system will
fetch the snapshot as usual.
[requires snapshot repository]
[default] if restate-server is in cluster mode.
DurabilityPoint = Min(Max(ReplicaSetDurablePoints), SnapshotDurablePoint)
"replica-set-only" : A partition is considered durable once all nodes in the replica-set are durable, regardless
of the state of snapshots.
Do not trim unless all replicas durably include this Lsn.
[default] if restate-server is in single-node (standalone) mode with no snapshot repository configured.
DurabilityPoint = Min(ReplicaSetDurablePoints)
"snapshot-only" : A partition is durable ONLY after a snapshot has been created.
[requires snapshot repository]
Do not trim unless the Lsn is covered by the snapshot with no regard to the
state of durability of the replica-set members.
DurabilityPoint = SnapshotDurablePoint
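The four DurabilityPoint formulas above can be compared side by side with a small sketch. The function and argument names are hypothetical. Returning None when no snapshot exists yet is an assumption made for illustration; the text only says those modes require a snapshot repository.

```python
from typing import Optional, Sequence


def durability_point(mode: str,
                     replica_points: Sequence[int],
                     snapshot_point: Optional[int]) -> Optional[int]:
    """Return the LSN up to which the log may be trimmed, or None if nothing is trimmable."""
    if mode == "none":
        return None  # durability tracking and automatic trimming are disabled
    if mode == "replica-set-only":
        return min(replica_points)
    if snapshot_point is None:
        return None  # assumption: snapshot-based modes cannot trim without a snapshot
    if mode == "snapshot-only":
        return snapshot_point
    if mode == "snapshot-and-replica-set":
        return min(min(replica_points), snapshot_point)
    if mode == "balanced":
        return min(max(replica_points), snapshot_point)
    raise ValueError(f"unknown durability mode: {mode}")


replicas = [90, 120, 150]   # durable LSN reported by each replica (made-up values)
snapshot = 100              # LSN covered by the latest snapshot (made-up value)
for mode in ("snapshot-and-replica-set", "balanced", "replica-set-only", "snapshot-only"):
    print(mode, durability_point(mode, replicas, snapshot))
```

With these sample values, "balanced" can trim further than "snapshot-and-replica-set" because it needs only one replica, rather than the slowest one, to cover the LSN.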
Invoker options: Configuration for the HTTP/2 keep-alive mechanism, using PING frames.
Please note: most gateways don’t propagate the HTTP/2 keep-alive between downstream and upstream hosts.
In those environments, you need to make sure the gateway can detect a broken connection to the upstream deployment(s).
Action throttling: Configures rate limiting for service actions at the node level.
This throttling mechanism uses a token bucket algorithm to control the rate
at which actions can be processed, helping to prevent resource exhaustion
and maintain system stability under high load.
The throttling limit is shared across all partitions running on this node,
providing a global rate limit for the entire node rather than per-partition limits.
When unset, no throttling is applied.
Throttling options per invoker.
Refill rate: The rate at which the tokens are replenished.
Syntax: <rate>/<unit> where <unit> is s|sec|second, m|min|minute, or h|hr|hour.
The unit defaults to per second if not specified.
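For readers unfamiliar with token buckets, here is a generic sketch of the algorithm. It is not Restate's implementation: the class name and the capacity parameter are assumptions, since only the refill rate is described here. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Generic token bucket: each action costs a token; tokens refill at a fixed rate."""

    def __init__(self, rate_per_sec: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_sec      # tokens added per second (the refill rate)
        self.capacity = capacity      # maximum burst size (hypothetical parameter)
        self.tokens = capacity        # start with a full bucket
        self.last = now               # timestamp of the last refill

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        # Refill according to elapsed time, never exceeding the capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                  # caller must wait: the action is throttled


bucket = TokenBucket(rate_per_sec=2.0, capacity=2.0)
print([bucket.try_acquire(now=0.0) for _ in range(3)])  # the burst drains the bucket
print(bucket.try_acquire(now=0.5))                      # half a second refills one token
```

Because the limit is shared across all partitions on a node, a single bucket like this would sit in front of every partition's actions rather than one bucket per partition.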
Additional request headers: Headers that should be applied to all outgoing requests (HTTP and Lambda).
Defaults to x-restate-cluster-name: <cluster name>.
Proxy type to implement HashMap<HeaderName, HeaderValue> ser/de.
Use it directly or with #[serde(with = "serde_with::As::<serde_with::FromInto<restate_serde_util::SerdeableHeaderMap>>")].
Eager state size limit (since v1.7.0): Maximum total size (in bytes) of state entries to send eagerly in the StartMessage.
When the total size of state entries exceeds this limit, only a partial state is sent
and the service will fetch remaining state lazily using GetEagerState commands.
Set to 0 to disable eager state entirely (equivalent to enabling lazy state).
This helps reduce memory pressure on deployments for services with large state.
If unset, defaults to message-size-limit (clamped to that value if set higher).
Human-readable bytes
Proxy URI: A URI, such as http://127.0.0.1:10001, of a server to which all invocations should be sent, with the Host header set to the deployment URI.
HTTPS proxy URIs are supported, but only HTTP endpoint traffic will be proxied currently.
Can be overridden by the HTTP_PROXY environment variable.
HTTP/2 Initial Max Send Streams: Sets the initial maximum of locally initiated (send) streams.
This value will be overwritten by the value included in the initial
SETTINGS frame received from the peer as part of a [connection preface].
Note: This value is capped by [Self::streams_per_connection_limit]
Default: None
NOTE: Setting this value to None (default) uses the default
recommended value from the HTTP/2 specs
HTTP/2 Keep-Alive Jitter: Fractional jitter added to http2-keep-alive-interval, expressed as a fraction
of the interval (e.g. 0.1 = up to +10%, 1.0 = up to +100%).
Default: 0.2 (20% of http2-keep-alive-interval)
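As a rough illustration of what a fractional jitter means, the sketch below stretches an interval by a random fraction between 0 and the jitter value. The function name, the uniform distribution, and the 40-second interval are assumptions for the example, not documented behavior.

```python
import random


def jittered_keep_alive_interval(interval_secs: float, jitter: float, rng: random.Random) -> float:
    """Return the interval stretched by a random fraction in [0, jitter]."""
    if jitter < 0:
        raise ValueError("jitter must be non-negative")
    # Assumption: the extra delay is drawn uniformly; the source only gives the upper bound.
    return interval_secs * (1.0 + rng.uniform(0.0, jitter))


rng = random.Random(7)  # seeded so repeated runs produce the same sequence
delays = [jittered_keep_alive_interval(40.0, 0.2, rng) for _ in range(5)]
print(delays)  # each value lies between 40 and 48 seconds
```

Jitter of this kind spreads PING frames from many connections over time instead of sending them all at the same instant.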
Upper bound on the per-connection max-send-streams: Caps the remote server’s advertised max_concurrent_streams.
A high number of concurrent streams per connection works
poorly with L4 load balancers because streams are not balanced across
backends.
Since v1.7.0
Default: 128
Spill invocations to disk: Defines the threshold after which queued invocations will spill to disk at
the path defined in tmp-dir. In other words, this is the number of invocations
that can be kept in memory before spilling to disk. This is a per-partition limit.
Invocation throttling: Configures throttling for service invocations at the node level.
This throttling mechanism uses a token bucket algorithm to control the rate
at which invocations can be processed, helping to prevent resource exhaustion
and maintain system stability under high load.
The throttling limit is shared across all partitions running on this node,
providing a global rate limit for the entire node rather than per-partition limits.
When unset, no throttling is applied.
Throttling options per invoker.
Refill rate: The rate at which the tokens are replenished.
Syntax: <rate>/<unit> where <unit> is s|sec|second, m|min|minute, or h|hr|hour.
The unit defaults to per second if not specified.
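The refill-rate syntax can be read as "tokens per unit of time". The sketch below normalizes such a string to tokens per second. It is a hypothetical helper written from the syntax description above, not the server's parser, and accepting fractional rates is an assumption.

```python
# Seconds per accepted unit spelling, taken from the syntax description.
UNIT_SECONDS = {
    "s": 1, "sec": 1, "second": 1,
    "m": 60, "min": 60, "minute": 60,
    "h": 3600, "hr": 3600, "hour": 3600,
}


def parse_refill_rate(text: str) -> float:
    """Parse '<rate>/<unit>' (the unit is optional) into tokens per second."""
    rate_part, _, unit_part = text.strip().partition("/")
    unit = unit_part.strip().lower() or "s"   # unit defaults to per second
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(rate_part) / UNIT_SECONDS[unit]


for example in ("100/s", "600/min", "3600/hour", "50"):
    print(example, "->", parse_refill_rate(example), "tokens per second")
```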
Message size limit: Maximum size of journal messages that can be received from a service. If a service sends a message
larger than this limit, the invocation will fail.
If unset, defaults to networking.message-size-limit. If set, it will be clamped at
the value of networking.message-size-limit since larger messages cannot be transmitted
over the cluster internal network.
Non-zero human-readable bytes
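The default-then-clamp rule above amounts to a one-line calculation. The function name below is hypothetical and the byte values are made up for the example.

```python
from typing import Optional


def effective_message_size_limit(invoker_limit: Optional[int], networking_limit: int) -> int:
    """Resolve the invoker message size limit against the networking limit."""
    if invoker_limit is None:
        return networking_limit                  # unset: inherit the networking limit
    return min(invoker_limit, networking_limit)  # set: never exceed the networking limit


MIB = 1024 * 1024
print(effective_message_size_limit(None, 32 * MIB))       # inherits the networking limit
print(effective_message_size_limit(64 * MIB, 32 * MIB))   # clamped down to the networking limit
print(effective_message_size_limit(8 * MIB, 32 * MIB))    # a smaller value is kept as configured
```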
No proxy: IP subnets, addresses, and domain names (e.g. localhost,restate.dev,127.0.0.1,::1,192.168.1.0/24) that should not be proxied by the http_proxy.
IP addresses must not have ports, and IPv6 addresses must not be wrapped in ’[]’.
Subdomains are also matched. An entry “*” matches all hostnames.
Can be overridden by the NO_PROXY environment variable, which supports comma separated values.
Per-invocation memory limit: Maximum memory (in bytes) a single invocation may use per direction (inbound and
outbound). Once an invocation’s directional budget reaches this ceiling it must
wait for in-flight data to be consumed or yield back to the scheduler.
If unset, defaults to message-size-limit. If set, it will be clamped at
the value of message-size-limit.
Since v1.7.0
Human-readable bytes
Request Compression threshold: Request minimum size to enable compression.
The request size includes the total of the journal replay and its framing using Restate service protocol, without accounting for the JSON envelope and the Base64 encoding.
Default: 4MB (The default AWS Lambda limit is 6MB; 4MB roughly accounts for +33% of Base64 and the JSON envelope).
Human-readable bytes
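The arithmetic behind the 4MB default can be checked directly: Base64 turns every 3 input bytes into 4 output characters, so a 4MB payload grows to roughly 5.33MB, leaving the rest of the 6MB limit for the JSON envelope. The sketch below reads "MB" as MiB, which is an assumption; the variable names are illustrative.

```python
import base64

MIB = 1024 * 1024


def base64_encoded_len(raw_len: int) -> int:
    """Length of standard padded Base64 output for raw_len input bytes."""
    return 4 * ((raw_len + 2) // 3)


threshold = 4 * MIB      # the documented default, read here as 4 MiB
lambda_limit = 6 * MIB   # the Lambda limit quoted above, read here as 6 MiB

encoded = base64_encoded_len(threshold)
headroom = lambda_limit - encoded
print(f"encoded size: {encoded / MIB:.2f} MiB")
print(f"headroom left for the JSON envelope: {headroom / MIB:.2f} MiB")
```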
Request identity private key PEM file: A path to a file, such as “/var/secrets/key.pem”, which contains exactly one ed25519 private
key in PEM format. Such a file can be generated with openssl genpkey -algorithm ed25519.
If provided, this key will be used to attach JWTs to requests from this client which
SDKs may optionally verify, proving that the caller is a particular Restate instance.
This file is currently only read on client creation, but this may change in the future.
Parsed public keys will be logged at INFO level in the same format that SDKs expect.
Maximum command batch size for partition processors: The maximum number of commands a partition processor will apply in a batch. The larger this
value is, the higher the throughput and latency are.
Num timers in memory limit: Bounds the number of timers loaded in memory. If this limit is set and exceeded, the timers farthest in the future will be spilled to disk.
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Inflight Memory Budget: Maximum total size of in-flight ingestion requests in bytes.
Tune this to your workload so there are enough unpersisted
requests for efficient batching without exhausting memory.
Defaults to 1 MiB.
Request Batch Size: Maximum size of a single ingestion request batch.
Tune to keep enough requests per batch for
throughput; overly large batches can increase tail latency.
Defaults to 50 KiB.
Object store API endpoint URL override: When you use Amazon S3, this is typically inferred from the region and there is no need to
set it. With other object stores, you will have to provide an appropriate HTTP(S) endpoint.
If not using HTTPS, also set aws-allow-http to true.
AWS profile: The AWS configuration profile to use for S3 object store destinations. If you use
named profiles in your AWS configuration, you can replace all the other settings with
a single profile reference. See the AWS documentation on profiles
(https://docs.aws.amazon.com/sdkref/latest/guide/file-format.html) for more.
AWS region to use with S3 object store destinations. This may be inferred from the
environment, for example the current region when running in EC2. Because of the
request signing algorithm this must have a value. For Minio, you can generally
set this to any string, such as us-east-1.
Snapshot destination URL: Base URL for cluster snapshots. Currently only supports the s3:// protocol scheme.
S3-compatible object stores must support ETag-based conditional writes.
Default: None
Export concurrency limit: Maximum number of concurrent partition snapshot exports. This controls how
many partition stores can simultaneously export snapshots to the snapshot
repository.
Default: 4
Snapshot retention count: Number of most recent snapshots to retain. Older snapshots will be
deleted automatically. Only snapshots created after this setting is enabled will
be considered for pruning.
Retaining multiple snapshots causes the partition archived LSN to be reported as that of the
oldest retained snapshot. Therefore, retaining multiple snapshots will cause increased disk
usage on log-server nodes.
Default: 1
Since v1.7.0
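To see why a higher retention count holds back log trimming, consider this sketch. The function name is hypothetical, snapshots are identified only by the LSN they cover, and "most recent" is taken to mean "highest LSN", which is an assumption.

```python
from typing import List, Optional, Tuple


def apply_retention(snapshot_lsns: List[int],
                    retain_count: int) -> Tuple[List[int], List[int], Optional[int]]:
    """Return (retained, deleted, archived_lsn) for a retention count."""
    if retain_count < 1:
        raise ValueError("retain_count must be at least 1")
    ordered = sorted(snapshot_lsns)
    retained = ordered[-retain_count:]   # the most recent snapshots are kept
    deleted = ordered[:-retain_count]    # everything older is pruned
    # The archived LSN is reported as that of the oldest retained snapshot.
    archived_lsn = retained[0] if retained else None
    return retained, deleted, archived_lsn


print(apply_retention([100, 200, 300], retain_count=1))
print(apply_retention([100, 200, 300], retain_count=2))
```

With a retention count of 2 the reported archived LSN drops from 300 to 200 in this example, so the log servers must keep the records between those two points.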
Interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Initial Interval: Initial interval for the first retry attempt. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Max interval: Maximum interval between retries. Can be configured using the jiff::fmt::friendly format or ISO8601, for example 5 hours.
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Automatic snapshot time interval: A time interval at which partition snapshots will be created. If
snapshot-interval-num-records is also set, it will be treated as an additional requirement
before a snapshot is taken. Use both time-based and record-based intervals to reduce the
number of snapshots created during times of low activity.
Snapshot intervals are calculated based on the wall clock timestamps reported by cluster
nodes, assuming a basic level of clock synchronization within the cluster.
This setting does not influence explicitly requested snapshots triggered using restatectl.
Default: None - automatic snapshots are disabled
Human-readable duration: Duration string in either jiff human friendly or ISO8601 format. Check https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing for more details.
Examples:
“10 hours” or “5 days” or “5d” or “1h 4m” or “P40D” or “0”
Automatic snapshot minimum records: Number of log records that trigger a snapshot to be created.
As snapshots are created asynchronously, the actual number of new records that will trigger
a snapshot will vary. The counter for the subsequent snapshot begins from the LSN at which
the previous snapshot export was initiated.
This setting does not influence explicitly requested snapshots triggered using restatectl.
Default: None - automatic snapshots are disabled
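How the time-based and record-based settings combine can be sketched as a predicate. This is an interpretation of the two descriptions above, not the server's scheduling code: the names are hypothetical, and the asynchronous slack in the record counter is ignored.

```python
from typing import Optional


def should_snapshot(elapsed_secs: float,
                    records_since_last: int,
                    interval_secs: Optional[float],
                    interval_num_records: Optional[int]) -> bool:
    """Decide whether an automatic snapshot is due for a partition."""
    if interval_secs is None and interval_num_records is None:
        return False  # neither setting configured: automatic snapshots are disabled
    # Each configured setting is a requirement; an unset one is trivially satisfied.
    time_ok = interval_secs is None or elapsed_secs >= interval_secs
    records_ok = interval_num_records is None or records_since_last >= interval_num_records
    return time_ok and records_ok


# Both set: an idle partition (few new records) is skipped even after an hour.
print(should_snapshot(3600, 10, interval_secs=3600, interval_num_records=1000))
print(should_snapshot(3600, 5000, interval_secs=3600, interval_num_records=1000))
```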
RocksDB compaction readahead size in bytes: If non-zero, we perform bigger reads when doing compaction. If you’re
running RocksDB on spinning disks, you should set this to at least 2MB.
That way RocksDB’s compaction is doing sequential instead of random reads.
Non-zero human-readable bytes
Disable automatic rocksdb memory reclaimer: When set to true, disables RocksDB’s memory reclaimer for partition stores.
The reclaimer automatically reclaims memory from memtables when they are no longer
needed, or when the total memtable budget is exceeded. Disabling this will cause
rocksdb to exceed the specified budget under extreme conditions and when flushes
are falling behind. The benefit of disabling the reclaimer is to reduce the chances
of rocksdb write stalls under heavy load.
[Supports configuration hot-reloading]
Disable compact-on-deletion collector: When set to true, disables RocksDB’s CompactOnDeletionCollector for partition stores.
The collector automatically triggers compaction when SST files accumulate a high density
of tombstones (deletion markers), helping reclaim disk space after bulk deletions.
This helps control space amplification when invocation journal retention expires and
the cleaner purges completed invocations.
Consider disabling this if you observe frequent unnecessary compactions triggered by
the collector causing performance issues.
Disable Direct IO for reads: Files will be opened in “direct I/O” mode
which means that data r/w from the disk will not be cached or
buffered. The hardware buffer of the devices may however still
be used. Memory mapped files are not impacted by these parameters.
Disable L0/L1 SST compression: When false (the default), L0 and L1 SST files are compressed with Lz4.
Higher levels (L2+) always use Zstd regardless of this setting.
Set to true to disable compression for L0/L1, which can improve write
throughput at the cost of higher disk usage since these files are
short-lived and frequently compacted.
Default: false (L0/L1 compression enabled)
Disable WAL compression: When false (the default), the Write-Ahead Log is compressed with Zstd.
Set to true to disable WAL compression. Only applies when WAL is enabled.
Default: false (WAL compression enabled)
The memory budget for rocksdb memtables in bytes: The total is divided evenly across partitions. The server will rebalance the memory budget
periodically depending on the number of running partitions on this node.
If this value is set, it overrides the ratio defined in rocksdb-memory-ratio.
Non-zero human-readable bytes
The memory budget for rocksdb memtables as ratio: This defines the total memory for rocksdb as a ratio of all memory available to memtables
(See rocksdb-total-memtables-ratio in common). The budget is then divided evenly across
partitions.
RocksDB statistics level: StatsLevel can be used to reduce statistics overhead by skipping certain
types of stats in the stats collection process.
Default: “except-detailed-timers”
"disable-all" : Disable all metrics
"except-histogram-or-timers" : Disable timer stats, and skip histogram stats
"except-timers" : Skip timer stats
"except-detailed-timers" : Collect all stats except time inside mutex lock AND time spent on
compression.
"except-time-for-mutex" : Collect all stats except the counters requiring to get time inside the
mutex lock.
"all" : Collect all stats, including measuring duration of mutex operations.
If getting time is expensive on the platform to run, it can
reduce scalability to more threads, especially for writes.