Microservice orchestration is about coordinating multiple services to complete complex business workflows. Restate provides powerful primitives for building resilient, observable orchestration patterns. In this guide, you’ll learn how to:
  • Build durable, fault-tolerant service orchestrations with automatic failure recovery
  • Implement sagas for distributed transactions with resilient compensation
  • Use durable timers and external events for complex async patterns
  • Implement stateful entities with Virtual Objects

Getting Started

A Restate application is composed of two main components:
  • Restate Server: The core engine that manages durable execution and orchestrates services. It acts as a message broker or reverse proxy in front of your services.
  • Your Services: Your business logic, implemented as service handlers using the Restate SDK to perform durable operations.
Application Structure

In a basic subscription service orchestration, a service has handlers that can be called over HTTP. Each handler receives a Context object that provides durable execution primitives. Any action performed with the Context is automatically recorded and can survive failures.

You don’t need to run your services in any special way. Restate works with how you already deploy your code, whether that’s in Docker, on Kubernetes, or via AWS Lambda.

Run the example

Install Restate and launch it:
restate-server
In the UI’s Invocations tab, click the invocation ID of your request to see its execution trace.

Durable Execution

Restate uses Durable Execution to ensure your orchestration logic survives failures and restarts. Whenever a handler executes an action with the Restate Context, the action is sent to the Restate Server and persisted in a log. On a failure or a crash, the Restate Server sends a retry request that contains the log of the actions executed so far. The service then replays the log to restore its state and continues with the remaining actions. This process repeats until the handler runs to completion.

Key Benefits:
  • Context run actions make external calls or non-deterministic operations durable. On a retry, their logged results are replayed.
  • If the service crashes after payment creation, it resumes at the subscription step
  • Deterministic IDs logged with the context ensure operations are idempotent
  • Full execution traces for debugging and monitoring
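The replay mechanism described above can be illustrated with a toy model. This is a minimal sketch in plain Python and not the Restate SDK: `Journal`, `ToyContext`, `invoke`, and the two step functions are hypothetical names, and the journal lives in memory instead of on the Restate Server.

```python
from typing import Any, Callable, List


class Journal:
    """Stands in for the persisted log; it outlives individual handler attempts."""

    def __init__(self) -> None:
        self.entries: List[Any] = []


class ToyContext:
    """Hypothetical context: records each step result and replays it on retries."""

    def __init__(self, journal: Journal) -> None:
        self._journal = journal
        self._position = 0

    def run(self, action: Callable[[], Any]) -> Any:
        if self._position < len(self._journal.entries):
            # Replay: reuse the recorded result instead of executing the step again.
            result = self._journal.entries[self._position]
        else:
            # First execution of this step: run it and record the result.
            result = action()
            self._journal.entries.append(result)
        self._position += 1
        return result


calls = {"payment": 0, "subscription": 0}


def create_payment() -> str:
    calls["payment"] += 1
    return "payment-1"


def create_subscription() -> str:
    calls["subscription"] += 1
    if calls["subscription"] == 1:
        raise ConnectionError("simulated crash on the first attempt")
    return "subscription-1"


def handler(ctx: ToyContext) -> str:
    payment = ctx.run(create_payment)
    subscription = ctx.run(create_subscription)
    return f"{payment}/{subscription}"


def invoke(journal: Journal) -> str:
    # Retry with a fresh context over the same journal until the handler completes.
    while True:
        try:
            return handler(ToyContext(journal))
        except ConnectionError:
            continue


journal = Journal()
result = invoke(journal)
```

In this sketch the payment step executes once even though the handler runs twice: the second attempt reads the payment result from the journal and resumes at the subscription step.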

Error Handling

By default, Restate retries failures indefinitely with an exponential backoff strategy. For some failures, you might not want to retry at all, or only a limited number of times. For these cases, Restate distinguishes between two types of errors:
  • Transient Errors: These are temporary issues that can be retried, such as network timeouts or service unavailability. Restate automatically retries these errors.
  • Terminal Errors: These indicate a failure that will not be retried, such as invalid input or business logic violations. Restate stops execution and allows you to handle these errors gracefully.
Throw a terminal error in your handler to indicate a terminal failure.
Learn more with the Error Handling Guide.
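The difference between the two error types can be sketched as a retry loop. This is a toy model under stated assumptions, not Restate's retry implementation: the `TerminalError` class here is a local stand-in, and `run_with_retries` with its delay parameters is hypothetical.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TerminalError(Exception):
    """Local stand-in for a failure that must not be retried."""


def run_with_retries(
    action: Callable[[], T],
    initial_delay: float = 0.01,
    factor: float = 2.0,
    max_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    delay = initial_delay
    while True:
        try:
            return action()
        except TerminalError:
            # Terminal errors propagate to the caller without another attempt.
            raise
        except Exception:
            # Any other error is treated as transient: back off, then retry.
            sleep(delay)
            delay = min(delay * factor, max_delay)


delays: List[float] = []
attempts = {"flaky": 0, "invalid": 0}


def flaky() -> str:
    attempts["flaky"] += 1
    if attempts["flaky"] < 3:
        raise TimeoutError("temporary network issue")
    return "ok"


def invalid() -> str:
    attempts["invalid"] += 1
    raise TerminalError("invalid input")


# Record the delays instead of sleeping so the example finishes instantly.
outcome = run_with_retries(flaky, sleep=delays.append)

try:
    run_with_retries(invalid, sleep=delays.append)
    terminal_seen = False
except TerminalError:
    terminal_seen = True
```

The transient failure is attempted three times with growing delays, while the terminal failure is attempted exactly once.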

Sagas and Rollback

On a terminal failure, Restate stops the execution of the handler. You might, however, want to roll back the changes made by the workflow to keep your system in a consistent state. This is where Sagas come in.

Sagas are a pattern for rolling back changes made by a handler when it fails. In Restate, you can implement a saga by building a list of compensating actions for each step of the workflow. On a terminal failure, you execute them in reverse order.

Benefits with Restate:
  • The list of compensations can be recovered after a crash, and Restate knows which compensations still need to be run.
  • Sagas always run to completion (success or complete rollback)
  • Full trace of all operations and compensations
  • No complex state machines needed
Learn more with the Sagas Guide.
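The compensation list can be sketched in plain Python. This is a minimal illustration and not the Restate SDK: all function names are hypothetical, `TerminalError` is a local stand-in, and the sketch registers each compensation before running its step, which assumes compensations are safe to run even if the step never took effect.

```python
from typing import Callable, List


class TerminalError(Exception):
    """Local stand-in for a failure that must not be retried."""


log: List[str] = []


def create_payment(user: str) -> None:
    log.append(f"create payment for {user}")


def refund_payment(user: str) -> None:
    log.append(f"refund payment for {user}")


def create_subscription(user: str, name: str) -> None:
    if name == "unknown":
        raise TerminalError(f"no such subscription: {name}")
    log.append(f"subscribe {user} to {name}")


def cancel_subscription(user: str, name: str) -> None:
    log.append(f"cancel {name} for {user}")


def add_subscriptions(user: str, subscriptions: List[str]) -> None:
    compensations: List[Callable[[], None]] = []
    try:
        # Register the undo action first, then run the step it compensates.
        compensations.append(lambda: refund_payment(user))
        create_payment(user)
        for name in subscriptions:
            compensations.append(lambda name=name: cancel_subscription(user, name))
            create_subscription(user, name)
    except TerminalError:
        # Roll back in reverse order, then surface the original failure.
        for compensate in reversed(compensations):
            compensate()
        raise


try:
    add_subscriptions("alice", ["news", "unknown"])
    rolled_back = False
except TerminalError:
    rolled_back = True
```

In this sketch the compensation list is an ordinary in-memory list, so it would be lost on a crash; the durability described in the benefits below the pattern comes from Restate, not from this code.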

Virtual Objects

Until now, the services we looked at did not share any state between requests. To implement stateful entities like shopping carts, user profiles, or AI agents, Restate provides Virtual Objects. Each Virtual Object instance maintains isolated state and is identified by a unique key. For example, a Virtual Object can track the subscriptions of each user.

Virtual Objects are ideal for implementing any entity with mutable state:
  • Long-lived state: K/V state is stored permanently. It has no automatic expiry. Clear it via ctx.clear().
  • Durable state changes: State changes are logged with Durable Execution, so they survive failures and are consistent with code execution
  • State is queryable via the state tab in the UI.
  • Built-in concurrency control: Restate’s Virtual Objects have built-in queuing and consistency guarantees per object key. Handlers either have read-write access (ObjectContext) or read-only access (shared object context).
    • Only one handler with write access can run at a time per object key to prevent concurrent/lost writes or race conditions.
    • Handlers with read-only access can run concurrently with the write-access handlers.
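The per-key concurrency rule can be modelled with one lock per object key. This is a toy sketch using `asyncio`, not the Restate SDK: `ToyObjectStore`, `exclusive`, `shared`, and the handler functions are hypothetical names, and state is kept in memory.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable, Dict, List


class ToyObjectStore:
    """Keeps isolated state per key and serializes writers per key."""

    def __init__(self) -> None:
        self._state: Dict[str, Dict[str, Any]] = defaultdict(dict)
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def exclusive(self, key: str, handler: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        # Write access: at most one handler at a time for a given key.
        async with self._locks[key]:
            return await handler(self._state[key], *args)

    async def shared(self, key: str, handler: Callable[..., Awaitable[Any]], *args: Any) -> Any:
        # Read-only access: works on a snapshot and does not take the lock.
        return await handler(dict(self._state[key]), *args)


async def add_subscription(state: Dict[str, Any], name: str) -> None:
    current = list(state.get("subscriptions", []))
    await asyncio.sleep(0)  # yield control; unserialized writers would lose updates here
    current.append(name)
    state["subscriptions"] = current


async def get_subscriptions(state: Dict[str, Any]) -> List[str]:
    return list(state.get("subscriptions", []))


async def main() -> Dict[str, List[str]]:
    store = ToyObjectStore()
    writes = [store.exclusive("alice", add_subscription, f"sub-{i}") for i in range(5)]
    writes.append(store.exclusive("bob", add_subscription, "news"))
    await asyncio.gather(*writes)
    return {
        "alice": await store.shared("alice", get_subscriptions),
        "bob": await store.shared("bob", get_subscriptions),
    }


subscriptions = asyncio.run(main())
```

All five concurrent writes to the key `alice` survive because they are queued behind the same lock, while the write to `bob` proceeds independently.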

Resilient Communication

The Restate SDK includes clients to call other handlers reliably. You can call another handler in three ways:
  • Request-Response: Wait for a response
  • One-Way Messages: Fire-and-forget
  • Delayed Messages: Schedule for later
When you call another handler, the Restate Server acts as a message broker. All communication is proxied via the Restate Server, where it is durably logged and retried until completion.

Imagine a handler that processes a concert ticket purchase and calls multiple services to handle payment, ticket delivery, and reminders. Each of these calls is persisted in Restate’s log and retried upon failure. The handler can finish execution without waiting for the ticket delivery or reminder to complete.

You can use Restate’s communication primitives to implement microservices that communicate reliably and scale independently.
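The three call styles can be contrasted with a small in-memory broker. This is a toy model, not the Restate SDK clients: `ToyBroker`, its methods, the handler names, and the one-day reminder delay are all assumptions made for illustration, and retries are not modelled.

```python
import heapq
from typing import Any, Callable, Dict, List, Tuple


class ToyBroker:
    """Logs every message, then delivers it immediately or at a later logical time."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[str], Any]] = {}
        self.log: List[Tuple[str, str]] = []
        self._queue: List[Tuple[float, int, str, str]] = []  # (due time, sequence, handler, payload)
        self._sequence = 0
        self.now = 0.0  # logical clock, advanced explicitly

    def call(self, handler: str, payload: str) -> Any:
        # Request-response: log the call, run the handler, return its result.
        self.log.append((handler, payload))
        return self.handlers[handler](payload)

    def send(self, handler: str, payload: str, delay: float = 0.0) -> None:
        # One-way or delayed message: log and enqueue, do not wait for the result.
        self.log.append((handler, payload))
        heapq.heappush(self._queue, (self.now + delay, self._sequence, handler, payload))
        self._sequence += 1

    def advance(self, seconds: float) -> None:
        # Move the clock forward and deliver every message that is now due.
        self.now += seconds
        while self._queue and self._queue[0][0] <= self.now:
            _, _, handler, payload = heapq.heappop(self._queue)
            self.handlers[handler](payload)


events: List[str] = []
broker = ToyBroker()
broker.handlers["payment"] = lambda ticket: f"paid:{ticket}"
broker.handlers["delivery"] = lambda ticket: events.append(f"delivered:{ticket}")
broker.handlers["reminder"] = lambda ticket: events.append(f"reminded:{ticket}")


def buy_ticket(ticket: str) -> str:
    reference = broker.call("payment", ticket)      # wait for the response
    broker.send("delivery", ticket)                 # fire-and-forget
    broker.send("reminder", ticket, delay=86400.0)  # schedule for one day later
    return reference


reference = buy_ticket("ticket-42")
events_after_purchase = list(events)
broker.advance(0.0)
events_after_delivery = list(events)
broker.advance(86400.0)
```

The purchase handler returns as soon as the payment call responds; delivery and the reminder happen later, driven by the broker rather than by the handler.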

Request Idempotency

Restate allows adding an idempotency header to your requests. It then deduplicates requests with the same idempotency key, ensuring that they execute only once. This helps prevent duplicate calls to the concert ticketing service if the user accidentally clicks “buy” multiple times.

Add an idempotency header to your request. Sending the same request again with the same idempotency key prints the same payment reference: instead of executing the handler again, Restate returns the result of the first execution.
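The deduplication behaviour can be sketched as a result cache keyed by idempotency key. This is a toy model in plain Python, not Restate's implementation: `ToyIngress`, `buy_ticket`, and the format of the payment reference are hypothetical.

```python
from typing import Callable, Dict, Optional


class ToyIngress:
    """Executes a handler once per idempotency key and reuses the stored result."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler
        self._results: Dict[str, str] = {}

    def invoke(self, payload: str, idempotency_key: Optional[str] = None) -> str:
        if idempotency_key is None:
            # No key: every request executes the handler.
            return self._handler(payload)
        if idempotency_key not in self._results:
            self._results[idempotency_key] = self._handler(payload)
        # Same key: return the result of the first execution.
        return self._results[idempotency_key]


executions = {"count": 0}


def buy_ticket(ticket: str) -> str:
    executions["count"] += 1
    return f"payment-ref-{executions['count']}-{ticket}"


ingress = ToyIngress(buy_ticket)
first = ingress.invoke("seat-12", idempotency_key="click-abc")
second = ingress.invoke("seat-12", idempotency_key="click-abc")  # accidental double click
third = ingress.invoke("seat-12", idempotency_key="click-xyz")   # a genuinely new request
```

The double click yields the same payment reference without running the handler again, while a request with a different key executes normally.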

External Events

Until now, we showed either synchronous API calls via run or calls to other Restate services. Another common scenario is APIs that respond asynchronously via webhooks or callbacks. For this, you can use Restate’s awakeables. For example, some payment providers like Stripe require you to initiate a payment and then wait for their webhook to confirm the transaction.

Awakeables are like promises or futures that can be recovered after a crash. Restate persists the awakeable in its log and can recover it on another process when needed. There is no limit to how long you can persist an awakeable, so you can wait for external events that may take hours or even months to arrive. You can also use awakeables to implement human-in-the-loop interactions, such as waiting for user input or approvals.
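The awakeable flow (create an ID, hand it to the external system, wait until someone resolves it) can be sketched with `asyncio` futures. This is a toy model, not the Restate SDK: `ToyAwakeables`, `process_payment`, and the webhook simulation are hypothetical, and the crash recovery that Restate adds is not modelled.

```python
import asyncio
import uuid
from typing import Any, Callable, Dict, List, Tuple


class ToyAwakeables:
    """Registry of pending futures that an external party resolves by ID."""

    def __init__(self) -> None:
        self._pending: Dict[str, "asyncio.Future[Any]"] = {}

    def create(self) -> Tuple[str, "asyncio.Future[Any]"]:
        awakeable_id = uuid.uuid4().hex
        future = asyncio.get_running_loop().create_future()
        self._pending[awakeable_id] = future
        return awakeable_id, future

    def resolve(self, awakeable_id: str, value: Any) -> None:
        # Called by the webhook endpoint once the external system reports back.
        self._pending.pop(awakeable_id).set_result(value)


async def process_payment(awakeables: ToyAwakeables, start_payment: Callable[[str], None]) -> str:
    awakeable_id, confirmation = awakeables.create()
    start_payment(awakeable_id)  # hand the ID to the payment provider
    return await confirmation    # wait here until the webhook resolves it


async def main() -> Tuple[bool, str]:
    awakeables = ToyAwakeables()
    ids_sent_to_provider: List[str] = []
    task = asyncio.create_task(process_payment(awakeables, ids_sent_to_provider.append))
    while not ids_sent_to_provider:
        await asyncio.sleep(0)   # let the handler start and register its awakeable
    was_waiting = not task.done()
    # Simulated webhook: the provider confirms the payment using the ID it received.
    awakeables.resolve(ids_sent_to_provider[0], "payment confirmed")
    return was_waiting, await task


was_waiting, status = asyncio.run(main())
```

The handler stays suspended until the simulated webhook supplies the result under the ID that was handed out.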

Durable Timers

Waiting on external events might take a long time, so you might want to add timeouts to such operations. The Restate SDK offers durable timers that you can use to limit how long you wait for an action. Restate tracks these timers so they survive crashes and do not restart from the beginning. For example, the payment service can be extended to automatically cancel payments that don’t complete within a reasonable time.

You can also set timeouts for RPC calls or other asynchronous operations with the Restate SDK.
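One way to see why a durable timer does not restart from the beginning is to persist the absolute deadline rather than the duration. The sketch below is a toy model, not the Restate SDK: `ToyTimerStore`, `check_payment`, the timer name, and the 60-second timeout are assumptions, and the clock is a fake one so the example runs instantly.

```python
from typing import Callable, Dict


class ToyTimerStore:
    """Remembers the absolute deadline of each named timer."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._deadlines: Dict[str, float] = {}

    def remaining(self, name: str, duration: float) -> float:
        # The first call records the deadline; later calls (for example after a
        # restart) reuse it instead of starting the countdown again.
        deadline = self._deadlines.setdefault(name, self._clock() + duration)
        return max(0.0, deadline - self._clock())


now = {"t": 0.0}  # fake clock, in seconds
store = ToyTimerStore(clock=lambda: now["t"])


def check_payment(confirmed: bool) -> str:
    if confirmed:
        return "completed"
    if store.remaining("payment-timeout", 60.0) == 0.0:
        return "cancelled"
    return "waiting"


status_at_start = check_payment(confirmed=False)  # timer starts at t=0

now["t"] = 45.0  # the handler crashed and was retried 45 seconds later
left_after_restart = store.remaining("payment-timeout", 60.0)

now["t"] = 70.0  # the deadline has passed without a confirmation
status_after_deadline = check_payment(confirmed=False)
```

After the simulated restart only the remaining 15 seconds are left on the timer, and once the deadline passes the unconfirmed payment is reported as cancelled.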

Concurrent Tasks

When you are waiting on an awakeable or a timer, you are effectively running concurrent tasks and waiting for one of them to complete. Restate also supports more advanced concurrency patterns that run tasks in parallel and wait for their results. For example, the subscription service can be extended to process all subscriptions concurrently and handle failures gracefully. Restate retries all parallel tasks until they all complete and can deterministically replay the order of completion.
You can extend this to include the saga pattern and run all compensations in parallel as well. Have a look at the Concurrent Tasks docs for your SDK to learn more (TS / Java / Kotlin / Python / Go).
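The fan-out pattern can be sketched with `asyncio.gather`, retrying each task independently and collecting terminal failures instead of aborting the whole batch. This is a toy model, not the Restate SDK: `TerminalError` is a local stand-in, the subscription names are hypothetical, and backoff and durable replay are omitted.

```python
import asyncio
from typing import Any, Dict


class TerminalError(Exception):
    """Local stand-in for a failure that must not be retried."""


attempts: Dict[str, int] = {}


async def create_subscription(name: str) -> str:
    attempts[name] = attempts.get(name, 0) + 1
    if name == "flaky" and attempts[name] < 3:
        raise ConnectionError("temporary outage")
    if name == "invalid":
        raise TerminalError("unknown subscription")
    return f"{name}: active"


async def retry_until_done(name: str) -> str:
    while True:
        try:
            return await create_subscription(name)
        except TerminalError:
            raise  # give up on this task only
        except Exception:
            await asyncio.sleep(0)  # transient failure: try again (backoff omitted)


async def main() -> Dict[str, Any]:
    names = ["news", "flaky", "invalid"]
    # return_exceptions=True keeps one failed task from cancelling the others.
    outcomes = await asyncio.gather(
        *(retry_until_done(name) for name in names), return_exceptions=True
    )
    return dict(zip(names, outcomes))


results = asyncio.run(main())
```

Each subscription ends up with either a result or a terminal error, so the caller can decide per task whether to compensate.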

Summary

Restate simplifies microservice orchestration with:
  • Durable Execution: Automatic failure recovery without complex retry logic
  • Sagas: Distributed transactions with resilient compensation
  • Service Communication: Reliable RPC and messaging between services
  • Stateful Processing: Consistent state management without external stores
  • Advanced Patterns: Fault-tolerant timers, awakeables, and parallel execution
Build resilient distributed systems without the typical complexity.