Overview
Restate allocates memory from several distinct pools, each with its own configuration:

| Memory Pool | Default | Configuration |
|---|---|---|
| RocksDB | 2 GiB | rocksdb-total-memory-size |
| Query Engine | 1 GiB | admin.query-engine.memory-size |
| Bifrost Record Cache | 250 MiB | bifrost.record-cache-memory-size |
The default memory values are sized for production workloads. Restate runs well with significantly less memory for development, testing, or smaller deployments. In those environments you can lower each pool in restate.toml or through the corresponding environment variables.
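To see why the defaults target production, it helps to add them up. The Python sketch below sums the three default pool sizes from the table above using binary units (1 GiB = 1024 MiB). It covers only these three pools; the runtime, network buffers, and other components need memory on top of this total.

```python
# Default pool sizes from the table above, in bytes (binary units).
MiB = 1024 ** 2
GiB = 1024 ** 3

DEFAULT_POOLS = {
    "rocksdb-total-memory-size": 2 * GiB,
    "admin.query-engine.memory-size": 1 * GiB,
    "bifrost.record-cache-memory-size": 250 * MiB,
}

def total_pool_bytes(pools: dict) -> int:
    """Sum the configured pool sizes (excludes runtime and other overhead)."""
    return sum(pools.values())

total = total_pool_bytes(DEFAULT_POOLS)
print(f"{total} bytes = {total / GiB:.2f} GiB")  # 3483369472 bytes = 3.24 GiB
```

With the defaults, the pools alone reserve about 3.24 GiB, which is why development and low-traffic setups usually lower them.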
RocksDB Memory
The RocksDB memory pool is typically the largest. Restate uses RocksDB as its storage engine for durable state, journal entries, and log data. Set its size with rocksdb-total-memory-size in restate.toml or with the RESTATE_ROCKSDB_TOTAL_MEMORY_SIZE environment variable.
This value can be specified using human-readable byte units such as "4 GiB", "8192 MiB", or as a raw byte count.
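The sketch below shows how such values map to byte counts. It is a hypothetical illustration, not Restate's own parser: it assumes binary units (KiB, MiB, GiB, TiB) separated from the number by optional whitespace, plus a bare integer byte count. Restate may accept other spellings.

```python
import re

# Binary unit multipliers (an assumed subset of what Restate accepts).
_UNITS = {
    "": 1,
    "B": 1,
    "KiB": 1024,
    "MiB": 1024 ** 2,
    "GiB": 1024 ** 3,
    "TiB": 1024 ** 4,
}

_PATTERN = re.compile(r"^\s*(\d+)\s*([A-Za-z]*)\s*$")

def parse_byte_size(value: str) -> int:
    """Convert '4 GiB', '8192 MiB', or a raw count like '4294967296' to bytes."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a byte size: {value!r}")
    number, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return int(number) * _UNITS[unit]

print(parse_byte_size("4 GiB"))     # 4294967296
print(parse_byte_size("8192 MiB"))  # 8589934592
```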
Restate creates a shared block cache and write buffer manager used by all RocksDB databases on the node. The rocksdb-total-memory-size parameter sets the total budget that Restate internally manages and distributes across the partition store, log server, metadata server, and local loglet databases.
Automatic Memory Rebalancing
In a cluster environment, the number of active partitions on each node can dynamically change as nodes come up, go down, and during failover events. To ensure efficient use of the memory budget regardless of how many partitions are currently active, Restate includes a memory controller that automatically rebalances memory budgets across partitions. Every few seconds, the controller monitors memory usage across all open partition databases, redistributes the total memory budget evenly across active partitions, and triggers memtable flushes for partitions that exceed their allocated budget. The rebalancing also responds to configuration changes, so you can adjust rocksdb-total-memory-size at runtime and the system will adapt.
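One rebalancing pass can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not Restate's controller: the `rebalance` function and `PartitionUsage` type are hypothetical, the per-partition usage numbers are made up, and the real controller runs periodically and acts on RocksDB directly.

```python
from dataclasses import dataclass

@dataclass
class PartitionUsage:
    partition_id: int
    memtable_bytes: int  # current memtable memory held by this partition

def rebalance(total_budget: int, partitions: list[PartitionUsage]) -> tuple[int, list[int]]:
    """Hypothetical single pass of the memory controller.

    Splits the budget evenly across the active partitions and returns
    (per-partition budget, IDs of partitions whose memtables should flush).
    """
    if not partitions:
        return 0, []  # nothing active, nothing to distribute
    per_partition = total_budget // len(partitions)
    to_flush = [p.partition_id for p in partitions if p.memtable_bytes > per_partition]
    return per_partition, to_flush

MiB = 1024 ** 2
# Three active partitions sharing a 300 MiB budget.
budget, flush = rebalance(300 * MiB, [
    PartitionUsage(0, 50 * MiB),
    PartitionUsage(1, 120 * MiB),
    PartitionUsage(2, 90 * MiB),
])
print(budget // MiB, flush)  # 100 [1]
```

If partition 1 later moves to another node, the next pass splits the same 300 MiB across the two remaining partitions (150 MiB each), which is the "regardless of how many partitions are active" behavior described above.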
Expected Memory Behavior
It is completely normal and expected to see the RocksDB memory budget remain fully consumed even after load on the cluster goes down. This is because the block cache operates as an LRU (Least Recently Used) cache.

The cache does not proactively release memory when load decreases. Instead, it retains cached data until new data needs to be cached, at which point older entries are evicted to make room. This behavior is intentional and beneficial: keeping the cache warm means that if similar queries or operations occur again, they can be served from memory rather than requiring disk I/O.

In other words, high memory usage by the block cache is a sign that the system is working efficiently, not a problem to be solved.
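The retention behavior follows from LRU semantics in general. The toy cache below, built on Python's `collections.OrderedDict`, illustrates those semantics only; it is not RocksDB's block cache, and it counts capacity in entries rather than bytes. Memory is released only when a new insertion needs room, never because traffic stops.

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache whose capacity is measured in entries."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: OrderedDict[str, bytes] = OrderedDict()

    def get(self, key: str):
        if key not in self._entries:
            return None  # miss: a real system would read from disk
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key: str, value: bytes) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict only when a new entry pushes the cache over capacity.
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used

    def __len__(self) -> int:
        return len(self._entries)

cache = LRUCache(capacity=3)
for block in ["a", "b", "c", "d"]:  # a burst of load
    cache.put(block, b"...")
print(len(cache))      # 3: full after the burst, "a" was evicted
# Load stops: nothing is freed, so the cache stays full and warm.
print(cache.get("d"))  # b'...' served from memory
```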
Query Engine Memory
The query engine (used for SQL introspection queries) has its own separate memory budget, set with admin.query-engine.memory-size in restate.toml or with the RESTATE_ADMIN__QUERY_ENGINE__MEMORY_SIZE environment variable.
This memory is used for query processing, including sorting, aggregation, and join operations. Complex queries on large datasets may require more memory. If queries fail with out-of-memory errors, consider increasing this value.
Bifrost Record Cache
The Bifrost log subsystem maintains an in-memory cache for recently written log records, sized with bifrost.record-cache-memory-size in restate.toml or with the RESTATE_BIFROST__RECORD_CACHE_MEMORY_SIZE environment variable.
This cache improves read performance for the replicated log by keeping recently written records in memory. Increasing this value can improve performance for workloads that frequently read from the log, such as partition replay during failover.
Recommendations
General Guidelines
- Consider allocating 20-50% of available process memory to RocksDB as a starting point. This leaves room for the query engine, Bifrost record cache, runtime, network buffers, and other components. You may need to adjust this based on your specific workload and usage patterns.
- On Linux, Restate automatically detects cgroup memory limits (both v1 and v2) and will emit warnings if the configured rocksdb-total-memory-size is too close to the process memory limit.
- Monitor memory usage using the metrics exposed by Restate to ensure your configuration is appropriate for your workload.
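The 20-50% starting point becomes concrete once the process memory limit is known. The sketch below reads that limit from the standard cgroup v2 (`memory.max`) or v1 (`memory.limit_in_bytes`) files, which assumes your container exposes them at the usual paths, and computes the suggested RocksDB range. It does not replicate Restate's own cgroup detection or its warning threshold.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup locations; your container runtime may mount them elsewhere.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

def read_memory_limit() -> Optional[int]:
    """Return the cgroup memory limit in bytes, or None if unlimited/unknown."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            raw = path.read_text().strip()
        except OSError:
            continue  # file missing (not Linux, or a different cgroup layout)
        if raw == "max":  # cgroup v2 spelling for "no limit"
            return None
        value = int(raw)
        # cgroup v1 reports a huge sentinel when unlimited; 2**60 is a heuristic cutoff.
        return None if value >= 2 ** 60 else value
    return None

def rocksdb_starting_range(limit_bytes: int) -> tuple[int, int]:
    """20% to 50% of process memory, the starting point suggested above."""
    return limit_bytes * 20 // 100, limit_bytes * 50 // 100

limit = read_memory_limit()
if limit is None:
    print("no cgroup memory limit found")
else:
    low, high = rocksdb_starting_range(limit)
    print(f"rocksdb-total-memory-size between {low} and {high} bytes")
```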
Container and Kubernetes Deployments
When running Restate in containers or Kubernetes, size the memory pools so that together they fit within your container's memory limit, leaving headroom for the runtime, network buffers, and other components.
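A quick way to check a plan against a container limit is to add up the pools and see what remains for everything else. The pool values below for a 4 GiB container are hypothetical, chosen only to illustrate the arithmetic (RocksDB at 1.5 GiB is 37.5% of the limit, inside the 20-50% starting range); they are not a Restate recommendation.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

def remaining_headroom(container_limit: int, pools: dict) -> int:
    """Bytes left for the runtime, network buffers, and other allocations."""
    return container_limit - sum(pools.values())

# Hypothetical sizing for a 4 GiB container.
pools = {
    "rocksdb-total-memory-size": 1536 * MiB,
    "admin.query-engine.memory-size": 512 * MiB,
    "bifrost.record-cache-memory-size": 128 * MiB,
}
headroom = remaining_headroom(4 * GiB, pools)
print(f"headroom: {headroom // MiB} MiB")  # headroom: 1920 MiB
if headroom <= 0:
    print("pools exceed the container limit")
```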
Using the Restate Operator
When deploying Restate on Kubernetes using the Restate Operator, configure memory through the spec.config field of the RestateCluster resource.
Monitoring Memory Usage
Restate exposes memory usage metrics via Prometheus at the /metrics endpoint on the node port (default: 5122). Key metrics to monitor include:
| Metric | Description |
|---|---|
| restate_rocksdb_memory_write_buffer_manager_capacity_bytes | Total capacity of the write buffer manager, which manages combined memory usage across all RocksDB databases |
| restate_rocksdb_memory_write_buffer_manager_usage_bytes | Current usage of the write buffer manager (combined across all RocksDB databases) |
| restate_rocksdb_memory_approx_memtable_bytes | Approximate total memory used by memtables |
| restate_rocksdb_memory_approx_memtable_unflushed_bytes | Approximate memory used by unflushed memtables |
| restate_rocksdb_block_cache_capacity_bytes | Total block cache capacity |
| restate_rocksdb_block_cache_usage_bytes | Current block cache usage |
| restate_rocksdb_cur_size_active_mem_table_bytes | Size of the active memtable |
| restate_rocksdb_cur_size_all_mem_tables_bytes | Size of all memtables (active + immutable) |
| restate_rocksdb_mem_table_flush_pending_count | Whether a memtable flush is pending |
You can also send SIGUSR1 to the Restate process, which dumps the current configuration, including memory settings, to standard error.
Troubleshooting
OOM Errors
If the Restate process is being killed by the OOM killer:

- Reduce rocksdb-total-memory-size to leave more headroom
- Try reducing admin.query-engine.memory-size if running complex queries
- Ensure container memory limits match your configuration
- Review bifrost.record-cache-memory-size and reduce if necessary
Advanced Configuration
For most deployments, the top-level memory settings described above are sufficient. This section covers advanced options for fine-tuning how RocksDB memory is distributed internally.

RocksDB Memory Architecture

RocksDB uses memory for two primary purposes:

- Block Cache: An LRU cache that stores uncompressed data blocks read from disk. This speeds up read operations by avoiding disk I/O for frequently accessed data.
- Memtables: In-memory write buffers where data is accumulated before being flushed to disk as sorted string tables (SST files).
Memtable Ratio
By default, 85% of the total RocksDB memory is allocated to memtables, with the remaining 15% for the block cache. You can adjust this ratio with rocksdb-total-memtables-ratio in restate.toml or with the RESTATE_ROCKSDB_TOTAL_MEMTABLES_RATIO environment variable.
A higher ratio allocates more memory to write buffers, which can improve write throughput but reduces the block cache size available for reads.
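For the default 2 GiB budget, the split works out as shown below. This Python snippet only reproduces the arithmetic described in this section; Restate's exact rounding is not documented here, so flooring the memtable share to whole bytes is an assumption.

```python
GiB = 1024 ** 3

def split_rocksdb_memory(total_bytes: int, memtables_ratio: float = 0.85) -> tuple[int, int]:
    """Return (memtable bytes, block cache bytes) for a total RocksDB budget."""
    memtables = int(total_bytes * memtables_ratio)  # floor to whole bytes (assumed)
    block_cache = total_bytes - memtables
    return memtables, block_cache

memtables, block_cache = split_rocksdb_memory(2 * GiB)
print(f"memtables:   {memtables / GiB:.2f} GiB")   # memtables:   1.70 GiB
print(f"block cache: {block_cache / GiB:.2f} GiB")  # block cache: 0.30 GiB
```

Raising the ratio to 0.9 with the same total would shrink the block cache to about 0.2 GiB, which is the read-side trade-off described above.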
Per-Component Memory Allocation
Within the total memtable budget, each component (partition store, log server, etc.) receives a portion based on its configured ratio. These are the defaults:

| Component | Default Ratio | Description |
|---|---|---|
| Partition Store | 49% | Stores service state and journal entries |
| Log Server | 50% | Stores replicated log segments |
| Metadata Server | 1% | Stores cluster metadata |
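Each component has its own rocksdb-memory-ratio and rocksdb-memory-budget options, which you can set in restate.toml or with the following environment variables:

| Component | Ratio Environment Variable | Budget Environment Variable |
|---|---|---|
| Partition Store | RESTATE_WORKER__STORAGE__ROCKSDB_MEMORY_RATIO | RESTATE_WORKER__STORAGE__ROCKSDB_MEMORY_BUDGET |
| Log Server | RESTATE_LOG_SERVER__ROCKSDB_MEMORY_RATIO | RESTATE_LOG_SERVER__ROCKSDB_MEMORY_BUDGET |
| Metadata Server | RESTATE_METADATA_SERVER__ROCKSDB_MEMORY_RATIO | RESTATE_METADATA_SERVER__ROCKSDB_MEMORY_BUDGET |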
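Putting the two levels together, the sketch below computes each component's memtable budget from the total RocksDB size, the memtables ratio, and the default component ratios listed earlier. It also models the rule, stated in this section, that an explicit per-component budget takes precedence over the ratio. The function and dictionary keys are illustrative, not Restate configuration keys. Floor rounding, and leaving the other components' shares unchanged when one is overridden, are both assumptions.

```python
from typing import Optional

MiB = 1024 ** 2
GiB = 1024 ** 3

# Default ratios, expressed as whole percentages to keep the arithmetic exact.
DEFAULT_PERCENTS = {
    "partition-store": 49,
    "log-server": 50,
    "metadata-server": 1,
}

def component_budgets(
    total_bytes: int,
    memtables_percent: int = 85,
    percents: dict = DEFAULT_PERCENTS,
    explicit_budgets: Optional[dict] = None,
) -> dict:
    """Memtable budget per component; an explicit budget overrides its ratio."""
    memtables = total_bytes * memtables_percent // 100
    explicit_budgets = explicit_budgets or {}
    budgets = {}
    for component, percent in percents.items():
        if component in explicit_budgets:
            budgets[component] = explicit_budgets[component]  # budget wins
        else:
            budgets[component] = memtables * percent // 100
    return budgets

for name, size in component_budgets(2 * GiB).items():
    print(f"{name:16} {size / MiB:8.1f} MiB")

# Pin the metadata server to 64 MiB; its ratio is ignored.
print(component_budgets(2 * GiB, explicit_budgets={"metadata-server": 64 * MiB}))
```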
When both rocksdb-memory-ratio and rocksdb-memory-budget are set for a component, the explicit budget takes precedence over the ratio.

Configuration Reference
For the complete list of memory-related configuration options, see the Server Configuration Reference.

| Option | Default | Environment Variable | Description |
|---|---|---|---|
| rocksdb-total-memory-size | 2 GiB | RESTATE_ROCKSDB_TOTAL_MEMORY_SIZE | Total memory for RocksDB block cache and memtables |
| rocksdb-total-memtables-ratio | 0.85 | RESTATE_ROCKSDB_TOTAL_MEMTABLES_RATIO | Ratio of total memory allocated to memtables |
| admin.query-engine.memory-size | 1 GiB | RESTATE_ADMIN__QUERY_ENGINE__MEMORY_SIZE | Memory budget for SQL query execution |
| bifrost.record-cache-memory-size | 250 MiB | RESTATE_BIFROST__RECORD_CACHE_MEMORY_SIZE | In-memory cache for log records |