Tour of Restate for Agents with OpenAI SDK

AI agents are long-running processes that combine LLMs with tools and external APIs to complete complex tasks. With Restate, you can build agents that are resilient to failures, stateful across conversations, and observable without managing complex retry logic or external state stores. In this guide, you’ll learn how to:

Build durable AI agents that recover automatically from crashes and API failures
Integrate Restate with the OpenAI Agent SDK for Python
Observe and debug agent executions with detailed traces
Implement resilient human-in-the-loop workflows with approvals and timeouts
Manage conversation history and state across multi-turn interactions
Orchestrate multiple agents working together on complex tasks

Getting Started

A Restate AI application has two main components:

Restate Server: The core engine that takes care of the orchestration and resiliency of your agents
Agent Services: Your agent or AI workflow logic using the Restate SDK for durability

Restate works with how you already deploy your agents, whether that’s in Docker, on Kubernetes, or via serverless platforms (Modal, AWS Lambda…). You don’t need to run your agents in any special way. Let’s run an example locally to get a better feel for how it works.

Run the agent

Install Restate and launch it:

restate-server

Get the example:

git clone git@github.com:restatedev/ai-examples.git
cd ai-examples/openai-agents/tour-of-agents

Export your OpenAI API key and run the agent:

export OPENAI_API_KEY=sk-...
uv run .

Then, tell Restate where your agent is running via the UI (http://localhost:9070) or CLI:

restate deployments register http://localhost:9080

This registers a set of agents that we will be covering in this tutorial. To test your setup, invoke the weather agent, either via the UI playground by clicking on the run handler of the WeatherAgent in the overview:

Or via curl:

curl localhost:8080/WeatherAgent/run \
  --json '{"message": "What is the weather like in San Francisco?"}'

You should see the weather information printed in the terminal. Let’s have a look at what happened under the hood to make your agents resilient.

Durable Execution

AI agents make multiple LLM calls and tool executions that can fail due to rate limits, network issues, or service outages. Restate uses Durable Execution to make your agents withstand failures without losing progress. The Restate SDK records the steps the agent executes in a log and replays them if the process crashes or is restarted.

Durable Execution is the basis of how Restate makes your agents resilient to failures. Restate offers durable execution primitives via its SDK.
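
The record-and-replay idea can be pictured with a small in-memory sketch. This is a toy model under stated assumptions, not Restate’s implementation or API: ToyJournal, fetch_weather, and summarize are hypothetical names, and the log lives in process memory, whereas Restate persists it in the Restate Server.

```python
import asyncio


class ToyJournal:
    """Toy in-memory stand-in for a durable log (hypothetical, not Restate's API)."""

    def __init__(self):
        self.entries = {}   # step name -> recorded result
        self.executed = []  # names of steps whose action actually ran

    async def run(self, name, action):
        # Replay: a step that already completed returns its recorded result.
        if name in self.entries:
            return self.entries[name]
        result = await action()
        self.executed.append(name)
        self.entries[name] = result
        return result


async def fetch_weather():
    return {"city": "San Francisco", "temp_c": 18}


async def summarize(weather):
    return f"{weather['city']}: {weather['temp_c']} C"


async def handler(journal, crash_after_first_step=False):
    weather = await journal.run("get weather", fetch_weather)
    if crash_after_first_step:
        raise RuntimeError("simulated crash")
    return await journal.run("summarize", lambda: summarize(weather))


async def demo():
    journal = ToyJournal()
    try:
        await handler(journal, crash_after_first_step=True)
    except RuntimeError:
        pass  # the "process" crashed, but the journal survived
    # Retry: the first step is replayed from the journal, not executed again.
    result = await handler(journal)
    return result, journal.executed


result, executed = asyncio.run(demo())
```

In this sketch, the weather step runs only once even though the handler runs twice, which is the property that lets a recovered agent skip LLM calls and tool steps that already completed.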

Creating a Durable Agent

To implement a durable agent, you can use the Restate SDK in combination with the OpenAI Agent SDK. Here’s the implementation of the durable weather agent you just invoked:

durable_agent.py

@function_tool(failure_error_function=raise_restate_errors)
async def get_weather(
    wrapper: RunContextWrapper[restate.Context], req: WeatherRequest
) -> WeatherResponse:
    """Get the current weather for a given city."""
    # Do durable steps using the Restate context
    restate_context = wrapper.context
    return await restate_context.run_typed("Get weather", fetch_weather, city=req.city)


weather_agent = Agent[restate.Context](
    name="WeatherAgent",
    instructions="You are a helpful agent that provides weather updates.",
    tools=[get_weather],
)


agent_service = restate.Service("WeatherAgent")


@agent_service.handler()
async def run(restate_context: restate.Context, prompt: WeatherPrompt) -> str:

    result = await Runner.run(
        weather_agent,
        input=prompt.message,
        # Pass the Restate context to tools to make tool execution steps durable
        context=restate_context,
        # Choose any model and let Restate persist your calls
        run_config=RunConfig(
            model="gpt-4o",
            model_provider=DurableModelCalls(restate_context),
            model_settings=ModelSettings(parallel_tool_calls=False),
        ),
    )

    return result.final_output

First, you implement your agent and its tools, similar to how you would do it with the OpenAI Agent SDK. To serve the agent over HTTP with Restate, you create a Restate Service and define handlers. Here, the agent logic is called from the run handler. The endpoint that serves the agents of this tour over HTTP is defined in __main__.py. The agent can now be called at http://localhost:8080/WeatherAgent/run. The main difference compared to a standard OpenAI agent is the use of the Restate Context at key points throughout the agent logic. Any action with the Context is automatically recorded by the Restate Server and survives failures. We use this for:

Persisting LLM responses: We use the DurableModelCalls(restate_context) model provider in Runner.run, so that every LLM response is saved in Restate Server and can be replayed during recovery.
Resilient tool execution: Tools can make steps durable by using Context actions. restate_context.run_typed runs an action durably (e.g. database interactions, API calls, non-deterministic actions): Restate retries it until it succeeds and persists the result for recovery.

Propagating Restate Context to tools

The Restate Context gets supplied to the run handler by the Restate SDK when the handler is invoked. This context is then propagated to tools via the agent context. The Restate Context can also be contained in an object together with additional context. Learn more from the OpenAI docs.

Try out Durable Execution

Ask for the weather in Denver:

curl localhost:8080/WeatherAgent/run \
--json '{"message": "What is the weather like in Denver?"}'

On the invocation page in the UI, click on the invocation ID of the failing invocation. You can see that your request is retrying because the weather API is down.

To fix the problem, remove the line fail_on_denver from the fetch_weather function in the app/utils/utils.py file:

utils/utils.py

async def fetch_weather(city: str) -> WeatherResponse:
    fail_on_denver(city)
    weather_data = await call_weather_api(city)
    return parse_weather_data(weather_data)

Once you restart the service, the workflow finishes successfully.

Observing your Agent

As you saw in the previous section, the Restate UI comes in handy when monitoring and debugging your agents. The Invocations tab shows all agent executions with detailed traces of every LLM call, tool execution, and state change.

OpenTelemetry Integration

Restate supports OpenTelemetry for exporting traces to external systems like Langfuse, DataDog, or Jaeger. Have a look at the tracing docs to set this up.

Now that you know how to build and debug an agent, let’s look at more advanced patterns.

Human-in-the-Loop Agent

Many AI agents need human oversight for high-risk decisions or gathering additional input. Restate makes it easy to pause agent execution and wait for human input. Benefits with Restate:

If the agent crashes while waiting for human input, Restate continues waiting and recovers the promise on another process.
If the agent runs on function-as-a-service platforms, the Restate SDK lets the function suspend while it’s waiting. Once the approval comes in, the Restate Server invokes the function again and lets it resume where it left off. This way, you don’t pay for idle waiting time.

Here’s an insurance claim agent that asks for human approval for high-value claims:

human_approval_agent.py

@function_tool(failure_error_function=raise_restate_errors)
async def human_approval(
    wrapper: RunContextWrapper[restate.Context], claim: InsuranceClaim
) -> str:
    """Ask for human approval for high-value claims."""
    restate_context = wrapper.context

    # Create an awakeable for human approval
    approval_id, approval_promise = restate_context.awakeable(type_hint=str)

    # Request human review
    await restate_context.run_typed(
        "Request review", request_human_review, claim=claim, awakeable_id=approval_id
    )

    # Wait for human approval
    return await approval_promise

To implement human approval steps, you can use Restate’s awakeables. An awakeable is a promise that can be resolved externally via an API call by providing its ID. When you create the awakeable, you get back an ID and a promise. You can send the ID to the human approver, and then wait for the promise to be resolved.

You can also use awakeables outside of tools, for example, to implement human approval steps in between agent iterations.
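
To make the ID-plus-promise idea concrete, here is a minimal in-memory sketch built on asyncio futures. It is an analogy only: ToyAwakeables is a hypothetical class, and unlike a real awakeable, nothing here survives a process restart or is reachable over HTTP.

```python
import asyncio
import uuid


class ToyAwakeables:
    """Hypothetical in-memory registry; Restate persists awakeables durably instead."""

    def __init__(self):
        self._pending = {}

    def create(self):
        # Hand back an ID (to send to the approver) and a future (to await).
        awakeable_id = f"awk_{uuid.uuid4().hex}"
        future = asyncio.get_running_loop().create_future()
        self._pending[awakeable_id] = future
        return awakeable_id, future

    def resolve(self, awakeable_id, value):
        # Called by the outside world, for example from an HTTP endpoint.
        self._pending.pop(awakeable_id).set_result(value)


async def approval_demo():
    registry = ToyAwakeables()
    approval_id, approval_promise = registry.create()

    # Stand-in for the human reviewer answering later with the ID they received.
    async def reviewer():
        await asyncio.sleep(0.01)
        registry.resolve(approval_id, "approved")

    reviewer_task = asyncio.create_task(reviewer())
    decision = await approval_promise  # the tool waits here until resolved
    await reviewer_task
    return decision


decision = asyncio.run(approval_demo())
```

The waiting code never needs to know who resolves the promise or when. It only holds the future, while the ID travels to whoever has to answer.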

Try out human approval

Start a request for a high-value claim that needs human approval. Use the playground or curl with /send to start the claim asynchronously, without waiting for the result.

curl localhost:8080/HumanClaimApprovalAgent/run/send \
--json '{"message": "Process my hospital bill of 3000USD for a broken leg."}'

You can restart the service to see how Restate continues waiting for the approval. If you wait for more than a minute, the invocation will get suspended. Simulate approving the claim by executing the curl request that was printed in the service logs, similar to:

curl localhost:8080/restate/awakeables/sign_1M28aqY6ZfuwBmRnmyP/resolve --json 'true'

See in the UI how the workflow resumes and finishes after the approval.

Timeouts and Escalation

Add timeouts to human approval steps to prevent workflows from hanging indefinitely. Restate persists the timer and the approval promise, so if the service crashes or is restarted, it will continue waiting with the correct remaining time:

human_approval_agent_with_timeout.py

# Wait for human approval for at most 3 hours to reach our SLA
match await restate.select(
    approval=approval_promise,
    timeout=restate_context.sleep(timedelta(hours=3)),
):
    case ["approval", approved]:
        return "Approved" if approved else "Rejected"
    case _:
        return "Approval timed out - Evaluate with AI"

Try it out by sending a request to the service:

curl localhost:8080/HumanClaimApprovalWithTimeoutsAgent/run/send \
--json '{"message": "Process my hospital bill of 3000USD for a broken leg."}'

Restart the service and check in the UI how the process keeps waiting for the remaining time without starting over. You can also lower the timeout to a few seconds to see how the timeout path is taken.
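
The race between an approval and a timer can be pictured with plain asyncio. This is a sketch of the pattern only, under the assumption that an in-memory timer is acceptable for illustration: restate.select and restate_context.sleep are durable across restarts, while the timer below is not. The helper wait_for_approval is a hypothetical name.

```python
import asyncio


async def wait_for_approval(approval, timeout_seconds):
    """Race an approval future against a timer and report which one won."""
    timer = asyncio.ensure_future(asyncio.sleep(timeout_seconds))
    done, _pending = await asyncio.wait(
        {approval, timer}, return_when=asyncio.FIRST_COMPLETED
    )
    if approval in done:
        timer.cancel()  # the timer is no longer needed
        return "Approved" if approval.result() else "Rejected"
    return "Approval timed out"


async def demo():
    loop = asyncio.get_running_loop()

    # Case 1: the reviewer answers well before the deadline.
    answered = loop.create_future()
    loop.call_later(0.01, answered.set_result, True)
    first = await wait_for_approval(answered, timeout_seconds=1.0)

    # Case 2: nobody answers, so the timer wins the race.
    unanswered = loop.create_future()
    second = await wait_for_approval(unanswered, timeout_seconds=0.01)
    return first, second


first, second = asyncio.run(demo())
```

A process restart would reset this in-memory timer to its full duration, which is the gap that a persisted timer closes.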

Resilient workflows as tools

You can pull out complex parts of your tool logic into separate workflows. This lets you break down complex agents into smaller, reusable components that can be developed, deployed, and scaled independently. The Restate SDK gives you clients to call other Restate services durably from your agent logic. All calls are proxied via Restate. Restate persists the call and takes care of retries and recovery. For example, let’s implement the human approval tool as a separate service:

sub_workflow_agent.py

# Sub-workflow service for human approval
human_approval_workflow = restate.Service("HumanApprovalWorkflow")


@human_approval_workflow.handler("requestApproval")
async def request_approval(
    restate_context: restate.Context, claim: InsuranceClaim
) -> str:
    """Request human approval for a claim and wait for response."""
    # Create an awakeable that can be resolved via HTTP
    approval_id, approval_promise = restate_context.awakeable(type_hint=str)

    # Request human review
    await restate_context.run_typed(
        "Request review", request_human_review, claim=claim, awakeable_id=approval_id
    )

    # Wait for human approval
    return await approval_promise

This can now be called from the main agent via a service client:

sub_workflow_agent.py

@function_tool(failure_error_function=raise_restate_errors)
async def human_approval(
    wrapper: RunContextWrapper[restate.Context], claim: InsuranceClaim
) -> str:
    """Ask for human approval for high-value claims using sub-workflow."""
    restate_context = wrapper.context

    # Call the human approval sub-workflow
    return await restate_context.service_call(request_approval, claim)

These workflows have access to all Restate SDK features, including durable execution, state management, awakeables, and observability. They can be developed, deployed, and scaled independently.

Try out sub-workflows

Start a request for a high-value claim that needs human approval. Use /send to start the claim asynchronously, without waiting for the result.

curl localhost:8080/SubWorkflowClaimApprovalAgent/run/send \
--json '{"message": "Process my hospital bill of 3000USD for a broken leg."}'

In the UI, you can see that the agent called the workflow service and is waiting for the response. You can see the trace of the sub-workflow in the timeline. Once you approve the claim, the workflow returns, and the agent continues.

Follow the Tour of Workflows to learn more about implementing resilient workflows with Restate.

Durable Sessions

The next ingredient we need to build AI agents is the ability to maintain context and memory across multiple interactions. The OpenAI SDK allows plugging in custom session providers to manage conversation history. This integrates well with Restate’s stateful entities, called Virtual Objects. Virtual Objects are Restate’s way of implementing stateful services, such as chat sessions or stateful agents, with durable state management and built-in concurrency control. Each Virtual Object instance maintains isolated state and is identified by a unique key.

Virtual Objects as OpenAI Session Providers

The Restate OpenAI middleware includes a SessionProvider that automatically persists the agent’s conversation history in the Virtual Object state. Here is an example of a stateful, durable agent represented as a Virtual Object:

chat.py

chat = VirtualObject("Chat")


@chat.handler()
async def message(restate_context: ObjectContext, chat_message: ChatMessage) -> str:

    restate_session = await RestateSession.create(
        session_id=restate_context.key(), ctx=restate_context
    )

    result = await Runner.run(
        Agent(name="Assistant", instructions="You are a helpful assistant."),
        input=chat_message.message,
        run_config=RunConfig(
            model="gpt-4o",
            model_provider=DurableModelCalls(restate_context),
            model_settings=ModelSettings(parallel_tool_calls=False),
        ),
        session=restate_session,
    )
    return result.final_output


@chat.handler(kind="shared")
async def get_history(ctx: ObjectSharedContext):
    return await ctx.get("items") or []

Virtual Objects are ideal for implementing any entity with mutable state:

Long-lived state: K/V state is stored permanently. It has no automatic expiry. Clear it via ctx.clear().
Durable state changes: State changes are logged with Durable Execution, so they survive failures and are consistent with code execution
State is queryable via the state tab in the UI.

Built-in concurrency control: Restate’s Virtual Objects have built-in queuing and consistency guarantees per object key. Handlers either have read-write access (ObjectContext) or read-only access (shared object context).
- Only one handler with write access can run at a time per object key to prevent concurrent/lost writes or race conditions (for example message()).
- Handlers with read-only access can run concurrently to the write-access handlers (for example get_history()).
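
The per-key queuing described above can be approximated with one asyncio lock per object key. This is a simplified model, not how Restate implements it: ToyObjectRuntime is a hypothetical class, the queue lives in memory, and shared (read-only) handlers are not modeled.

```python
import asyncio
from collections import defaultdict


class ToyObjectRuntime:
    """Hypothetical model: at most one exclusive handler runs per object key."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)  # one lock per object key
        self.log = []

    async def exclusive(self, key, name, work_seconds):
        # Calls for the same key queue up; different keys do not block each other.
        async with self._locks[key]:
            self.log.append((key, name, "start"))
            await asyncio.sleep(work_seconds)
            self.log.append((key, name, "end"))


async def demo():
    runtime = ToyObjectRuntime()
    await asyncio.gather(
        runtime.exclusive("session123", "first", 0.02),
        runtime.exclusive("session123", "second", 0.02),
        runtime.exclusive("session456", "other", 0.02),
    )
    return runtime.log


log = asyncio.run(demo())
session123_events = [(name, phase) for key, name, phase in log if key == "session123"]
```

In this model, the two calls for session123 run strictly one after the other, while the call for session456 starts without waiting for them.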

Try out Virtual Objects

Stateful Chat Agent: Ask the agent to do some task:

curl localhost:8080/Chat/session123/message \
--json '{"message": "Make a poem about durable execution."}'

Continue the conversation - the agent remembers previous context:

curl localhost:8080/Chat/session123/message \
--json '{"message": "Shorten it to 2 lines."}'

Get conversation history or view it in the UI:

curl localhost:8080/Chat/session123/get_history

Seeing concurrency control in action: In the chat service, the message handler is an exclusive handler, while the get_history handler is a shared handler. Let’s send some messages to several chat sessions:

curl localhost:8080/Chat/session123/message/send --json '{"message": "make a poem about durable execution"}' &
curl localhost:8080/Chat/session456/message/send --json '{"message": "what are the benefits of durable execution?"}' &
curl localhost:8080/Chat/session789/message/send --json '{"message": "how does workflow orchestration work?"}' &
curl localhost:8080/Chat/session123/message/send --json '{"message": "can you make it rhyme better?"}' &
curl localhost:8080/Chat/session456/message/send --json '{"message": "what about fault tolerance in distributed systems?"}' &
curl localhost:8080/Chat/session789/message/send --json '{"message": "give me a practical example"}' &
curl localhost:8080/Chat/session101/message/send --json '{"message": "explain event sourcing in simple terms"}' &
curl localhost:8080/Chat/session202/message/send --json '{"message": "what is the difference between async and sync processing?"}'

The UI shows how Restate queues the requests per session to ensure consistency.

Stateful Serverless Agents

You can run Virtual Objects on serverless platforms like Modal, Render, or AWS Lambda. When the request comes in, Restate attaches the correct state to the request, so your handler can access it locally. This way, you can implement stateful, serverless agents without managing any external state store and without worrying about concurrency issues.

Virtual Objects for storing context

You can store any context information in Virtual Objects, for example, user preferences or the last agent they interacted with. Use ctx.set and ctx.get in your handler to store and retrieve state. We will show an example of this in the next section when we orchestrate multiple agents.

Resilient multi-agent coordination

As your agents grow more complex, you may want to break them down into smaller, specialized agents that can delegate tasks to each other, similar to sub-workflows. All agents can run in the same process or be deployed independently.

Agents as tools/handoffs

If you want to share context between agents, run the agents in the same process and use handoffs or tools. You don’t need to do anything special to make this work with Restate. Use Virtual Object state to maintain context between runs. For example, store the last agent that was called in the object state, so the user can connect back seamlessly on the next interaction:

multi_agent.py

intake_agent = Agent[restate.ObjectContext](
    name="IntakeAgent",
    instructions="Route insurance claims to the appropriate specialist: medical, auto, or property.",
)

medical_specialist = Agent[restate.ObjectContext](
    name="MedicalSpecialist",
    handoff_description="I handle medical insurance claims from intake to final decision.",
    instructions="Review medical claims for coverage and necessity. Approve/deny up to $50,000.",
)

auto_specialist = Agent[restate.ObjectContext](
    name="AutoSpecialist",
    handoff_description="I handle auto insurance claims from intake to final decision.",
    instructions="Assess auto claims for liability and damage. Approve/deny up to $25,000.",
)

# Configure handoffs so intake agent can route to specialists
intake_agent.handoffs = [medical_specialist, auto_specialist]

agent_dict = {
    "IntakeAgent": intake_agent,
    "MedicalSpecialist": medical_specialist,
    "AutoSpecialist": auto_specialist,
}

agent_service = restate.VirtualObject("MultiAgentClaimApproval")


@agent_service.handler()
async def run(restate_context: restate.ObjectContext, claim: InsuranceClaim) -> str:

    # Store context in Restate's key-value store
    last_agent_name = (
        await restate_context.get("last_agent_name", type_hint=str) or "IntakeAgent"
    )
    last_agent = agent_dict.get(last_agent_name, intake_agent)

    restate_session = await RestateSession.create(
        session_id=restate_context.key(), ctx=restate_context
    )
    result = await Runner.run(
        last_agent,
        input=f"Claim: {claim.model_dump_json()}",
        context=restate_context,
        run_config=RunConfig(
            model="gpt-4o",
            model_provider=DurableModelCalls(restate_context),
            model_settings=ModelSettings(parallel_tool_calls=False),
        ),
        session=restate_session,
    )

    restate_context.set("last_agent_name", result.last_agent.name)

    return result.final_output

The execution trace in the Restate UI will allow you to see the full chain of calls between agents and their individual steps.

Try out multi-agent systems

Start a request for a claim that needs to be analyzed by multiple agents.

curl localhost:8080/MultiAgentClaimApproval/session123/run --json '{
    "date":"2024-10-01",
    "category":"orthopedic",
    "reason":"hospital bill for a broken leg",
    "amount":3000,
    "placeOfService":"General Hospital"
}'

In the UI, you can see that the agent called the sub-agents and is waiting for their responses. You can see the trace of the sub-agents in the timeline. Once all sub-agents return, the main agent continues and makes a decision.

The state now contains the last agent that was called, so you can continue the conversation directly with the same agent.

Remote agents as tools

If you want to run agents independently, for example, to scale them separately, run them on different platforms, or let them get developed by different teams, then you can call them as tools via service calls. Restate will proxy all calls, persist them, and will guarantee that they complete successfully. Your main agent can suspend and save resources while waiting for the remote agent to finish. Restate invokes your main agent again once the remote agent returns.

multi_agent.py

# Durable service call to the fraud agent; persisted and retried by Restate
@function_tool(failure_error_function=raise_restate_errors)
async def check_fraud(
    wrapper: RunContextWrapper[restate.Context], claim: InsuranceClaim
) -> str:
    """Analyze the probability of fraud."""
    restate_context = wrapper.context
    return await restate_context.service_call(run_fraud_agent, claim)


claim_approval_coordinator = Agent[restate.Context](
    name="ClaimApprovalCoordinator",
    instructions="You are a claim approval engine. Analyze the claim and use your tools to decide whether to approve it.",
    tools=[check_fraud, check_eligibility],
)

agent_service = restate.Service("RemoteMultiAgentClaimApproval")


@agent_service.handler()
async def run(restate_context: restate.Context, claim: InsuranceClaim) -> str:
    result = await Runner.run(
        claim_approval_coordinator,
        input=f"Claim: {claim.model_dump_json()}",
        context=restate_context,
        run_config=RunConfig(
            model="gpt-4o",
            model_provider=DurableModelCalls(restate_context),
            model_settings=ModelSettings(parallel_tool_calls=False),
        ),
    )
    return result.final_output

Note, any shared context between agents needs to be passed explicitly via the input. The execution trace in the Restate UI will allow you to see the full chain of calls between agents and their individual steps.

You cannot put both agents within the same Virtual Object, because this leads to deadlocks. The main agent would block on the call to the sub-agent, preventing the sub-agent from executing, because only one handler can run at a time per object key.
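
The deadlock can be reproduced in miniature with a single non-reentrant lock standing in for the "one handler at a time per object key" rule. This is an illustrative model with hypothetical names, not Restate code. The timeout exists only so that the demo terminates.

```python
import asyncio


async def demo():
    # Stand-in for the exclusive access Restate grants per object key.
    key_lock = asyncio.Lock()

    async def sub_agent():
        async with key_lock:  # needs the same key as the main agent
            return "fraud check done"

    async def main_agent():
        async with key_lock:
            # The main agent holds the key while waiting on the sub-agent,
            # which is queued behind it: neither can make progress.
            try:
                return await asyncio.wait_for(sub_agent(), timeout=0.05)
            except asyncio.TimeoutError:
                return "deadlock: sub-agent never got to run"

    return await main_agent()


outcome = asyncio.run(demo())
```

Giving each agent its own service or its own object key removes the shared lock from this picture, so the sub-agent can run while the main agent waits.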

Try out multi-agent systems

Start a request for a claim that needs to be analyzed by multiple agents.

curl localhost:8080/RemoteMultiAgentClaimApproval/run --json '{
    "date":"2024-10-01",
    "category":"orthopedic",
    "reason":"hospital bill for a broken leg",
    "amount":3000,
    "placeOfService":"General Hospital"
}'

Parallel Work

Now that our agents are broken down into smaller parts, let’s have a look at how to run different parts of our agent logic in parallel to speed up execution.

When using the OpenAI Agent SDK with Restate, tool calls are executed sequentially by default to ensure deterministic execution during replays. When multiple tools execute in parallel and use the Restate Context, the order of operations might differ between the original execution and the replay, leading to inconsistencies.
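
One way to picture the ordering problem is a toy journal that records step names by position and checks them on replay. This is a deliberately simplified model and an assumption for illustration, not a description of how Restate stores its journal. OrderedJournal, JournalMismatch, and the tool names are hypothetical.

```python
import asyncio


class JournalMismatch(Exception):
    pass


class OrderedJournal:
    """Toy journal: the n-th step on replay must match the n-th recorded step."""

    def __init__(self, recorded=None):
        self.recorded = list(recorded or [])  # step names from a previous run
        self.position = 0

    def step(self, name):
        if self.position < len(self.recorded):
            expected = self.recorded[self.position]
            if expected != name:
                raise JournalMismatch(f"expected {expected!r}, got {name!r}")
        else:
            self.recorded.append(name)
        self.position += 1


async def tool(journal, name, delay):
    await asyncio.sleep(delay)  # non-durable work with variable latency
    journal.step(name)


async def run_parallel(journal, delays):
    await asyncio.gather(*(tool(journal, n, d) for n, d in delays.items()))


async def demo():
    # Original execution: "eligibility" happens to reach the journal first.
    first = OrderedJournal()
    await run_parallel(first, {"fraud": 0.05, "eligibility": 0.01})

    # Replay with different latencies: "fraud" now arrives first.
    replay = OrderedJournal(first.recorded)
    try:
        await run_parallel(replay, {"fraud": 0.01, "eligibility": 0.05})
    except JournalMismatch as err:
        return first.recorded, str(err)
    return first.recorded, None


recorded, mismatch = asyncio.run(demo())
```

In this model the two runs do the same work, yet the replay fails because timing alone changed the order in which the steps reached the journal.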

Restate provides primitives that allow you to run tasks concurrently while maintaining deterministic execution during replays. Most actions on the Restate Context can be composed using restate.gather to gather their results or restate.select to wait for the first one to complete.

Parallel Tool Steps

To parallelize tool steps, implement an orchestrator tool that uses durable execution to run multiple steps in parallel. Here is an insurance claim agent tool that runs multiple analyses in parallel:

parallel_tools_agent.py

@function_tool(failure_error_function=raise_restate_errors)
async def calculate_metrics(
    wrapper: RunContextWrapper[restate.Context], claim: InsuranceClaim
) -> list[str]:
    """Calculate claim metrics."""
    restate_context = wrapper.context

    # Run tools/steps in parallel with durable execution
    results_done = await restate.gather(
        restate_context.run_typed("eligibility", check_eligibility, claim=claim),
        restate_context.run_typed("cost", compare_to_standard_rates, claim=claim),
        restate_context.run_typed("fraud", check_fraud, claim=claim),
    )
    return [await result for result in results_done]

Restate makes sure that all parallel tasks are retried and recovered until they succeed.

If you want to allow the LLM to call multiple tools in parallel, then you need to manually implement the agent tool execution loop using restate.select and durable promises.

Try out parallel tool steps

Start a request for a claim that needs to be analyzed by multiple tools in parallel.

curl localhost:8080/ParallelToolClaimAgent/run --json '{
    "date":"2024-10-01",
    "category":"orthopedic",
    "reason":"hospital bill for a broken leg",
    "amount":3000,
    "placeOfService":"General Hospital"
}'

In the UI, you can see that the agent ran the tool steps in parallel. Their traces all start at the same time. Once all tools return, the agent continues and makes a decision.

Parallel Agents

You can use the same durable execution primitives to run multiple agents in parallel, for example to race agents against each other and use the first result that returns while cancelling the others, or to let a main orchestrator agent combine the results of multiple specialized agents:

parallel_agents.py

@agent_service.handler()
async def run(restate_context: restate.Context, claim: InsuranceClaim) -> str:

    # Start multiple agents in parallel with auto retries and recovery
    eligibility = restate_context.service_call(run_eligibility_agent, claim)
    cost = restate_context.service_call(run_rate_comparison_agent, claim)
    fraud = restate_context.service_call(run_fraud_agent, claim)

    # Wait for all responses
    await restate.gather(eligibility, cost, fraud)

    # Run decision agent on outputs
    result = await Runner.run(
        Agent(
            name="ClaimApprovalAgent", instructions="You are a claim decision engine."
        ),
        input=f"Decide about claim: {claim.model_dump_json()}. "
        "Base your decision on the following analyses: "
        f"Eligibility: {await eligibility} Cost: {await cost} Fraud: {await fraud}",
        run_config=RunConfig(
            model="gpt-4o",
            model_provider=DurableModelCalls(restate_context),
            model_settings=ModelSettings(parallel_tool_calls=False),
        ),
    )
    return result.final_output

Try out parallel agents

Start a request for a claim that needs to be analyzed by multiple agents in parallel.

curl localhost:8080/ParallelAgentClaimApproval/run --json '{
    "date":"2024-10-01",
    "category":"orthopedic",
    "reason":"hospital bill for a broken leg",
    "amount":3000,
    "placeOfService":"General Hospital"
}'

In the UI, you can see that the handler called the sub-agents in parallel. Once all sub-agents return, the main agent makes a decision.

Error Handling

LLM calls are costly, so you can configure retry behavior in both Restate and your AI SDK to avoid infinite loops and high costs. Restate distinguishes between two types of errors:

Transient errors: Temporary issues like network failures or rate limits. Restate automatically retries these until they succeed or the retry policy is exhausted.
Terminal errors: Permanent failures like invalid input or business rule violations. Restate does not retry these. The invocation fails permanently. You can catch these errors and handle them gracefully.

You can throw a terminal error via:

from restate import TerminalError

raise TerminalError("This tool is not allowed to run for this input.")

You can catch and handle terminal errors in your agent logic if needed. Many AI SDKs also have their own retry behavior for LLM calls and tool executions, so let’s look at how these interact.
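
The difference between the two error types can be sketched as a retry loop that treats them differently. This is a minimal illustration under stated assumptions, not Restate’s retry implementation: the TerminalError class below is a local stand-in so the sketch runs without the SDK, and run_with_retries, max_attempts, and base_delay are hypothetical names.

```python
import time


class TerminalError(Exception):
    """Local stand-in for restate.TerminalError (assumption for this sketch)."""


def run_with_retries(action, max_attempts=3, base_delay=0.01):
    attempt = 0
    while True:
        attempt += 1
        try:
            return action()
        except TerminalError:
            raise  # permanent failure: never retried
        except Exception as err:
            # Anything else is treated as transient and retried with backoff.
            if attempt >= max_attempts:
                raise TerminalError(f"gave up after {attempt} attempts") from err
            time.sleep(base_delay * 2 ** (attempt - 1))


calls = {"flaky": 0, "invalid": 0}


def flaky_api():
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise ConnectionError("rate limited")
    return "sunny"


def invalid_input():
    calls["invalid"] += 1
    raise TerminalError("This tool is not allowed to run for this input.")


weather = run_with_retries(flaky_api)

try:
    run_with_retries(invalid_input)
    terminal_message = None
except TerminalError as err:
    terminal_message = str(err)
```

Here the flaky call succeeds on its third attempt, while the terminal error surfaces after a single attempt and can be caught by the caller.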

Retries of LLM calls

Restate’s DurableModelCalls provider lets you specify the maximum number of retries for LLM calls.

run_config=RunConfig(
    model="gpt-4o",
    model_provider=DurableModelCalls(restate_context, max_retries=3),
    model_settings=ModelSettings(parallel_tool_calls=False)
),

By default, the middleware retries three times. Once Restate’s retries are exhausted, the invocation fails with a TerminalError and won’t be retried further.

Tool execution errors

By default, the OpenAI Agent SDK will convert any error in tool execution into a message to the LLM, and the LLM will decide how to proceed. This is often desirable, as the LLM can decide to retry the tool call, use a different tool, or provide a fallback answer.

Surfacing suspensions and terminal errors

There are some Restate errors that should not be handled by the LLM, for example, if a tool execution is suspended waiting for human input. Restate lets tool executions suspend if they need to wait for a long time. This suspension is triggered by raising an artificial error. The agent should re-raise these errors instead of passing them to the LLM. Therefore, you should always set your tool’s failure_error_function to raise Restate errors like suspensions.

error_handling.py

@function_tool(failure_error_function=raise_restate_errors)
async def get_weather(
    wrapper: RunContextWrapper[restate.Context], req: WeatherRequest
) -> WeatherResponse:
    """Get the current weather for a given city."""
    restate_context = wrapper.context
    return await restate_context.run_typed("Get weather", fetch_weather, city=req.city)

This error function will also re-raise any terminal errors that happen during tool execution. You can then handle the terminal error in your agent logic if needed, or let it fail the invocation:

error_handling.py

try:
    result = await Runner.run(
        weather_agent,
        input=prompt.message,
        context=restate_context,
        run_config=RunConfig(
            model="gpt-4o",
            model_provider=DurableModelCalls(restate_context, max_retries=2),
            model_settings=ModelSettings(parallel_tool_calls=False),
        ),
    )
except restate.TerminalError as e:
    # Handle terminal errors gracefully
    return "The agent couldn't complete the request."

The OpenAI Agent SDK also allows setting failure_error_function to None, which rethrows any error in the agent execution as-is. This includes invalid LLM responses (e.g. a tool call with invalid arguments or to a tool that doesn’t exist). The error then leads to Restate retries, and Restate recovers the invocation by replaying the journal entries. This can lead to infinite retries if the error is not transient. Therefore, be careful when using this option and handle errors appropriately in your agent logic. You might also want to set a retry policy at the service or handler level to avoid infinite retries.

Retrying transient errors

If you use Restate Context actions like ctx.run in your tool execution, Restate will retry any transient errors in these actions until they succeed. So for all operations that might suffer from transient errors (like network calls, database interactions, etc.), you should use Context actions to make them resilient. Here is a small practical example:

// Without ctx.run - error goes straight to agent
async function myTool() {
  const result = await fetch('/api/data'); // Might fail due to network
  // If this fails, agent gets the error immediately
}

// With ctx.run - Restate handles retries
async function myToolWithRestate(ctx: restate.Context) {
  const result = await ctx.run('fetch-data', () =>
      fetch('/api/data')
  );
  // Network failures get retried automatically
  // Only terminal errors reach the AI
}

You can set custom retry policies for ctx.run steps in your tool executions.
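
To make the difference between transient and terminal errors concrete, here is a minimal stand-alone sketch of what a bounded retry policy does. It deliberately does not use the Restate SDK: TerminalError, run_with_retries, and flaky_fetch are hypothetical stand-ins, and the sketch assumes that exhausting the retries surfaces as a terminal error. A real ctx.run step additionally persists its result in the journal, so it is not re-executed on replay.

```python
import time


class TerminalError(Exception):
    """Hypothetical stand-in for an error that must not be retried."""


def run_with_retries(action, max_attempts=3, initial_delay=0.01, factor=2.0):
    """Retry `action` on transient errors; give up after `max_attempts`."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except TerminalError:
            # Terminal errors are never retried.
            raise
        except Exception as err:
            if attempt == max_attempts:
                # Retries exhausted: surface the failure as a terminal error.
                raise TerminalError(f"gave up after {attempt} attempts") from err
            time.sleep(delay)
            delay *= factor  # exponential backoff between attempts


calls = {"count": 0}


def flaky_fetch():
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network problem")
    return {"temperature": 18}


result = run_with_retries(flaky_fetch, max_attempts=5)
```

With a cap on attempts, a persistent failure eventually reaches the agent logic as a terminal error instead of being retried forever.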

Advanced patterns

Manual Agent Loop

If you need more control over the agent loop, you can implement it manually using Restate’s durable primitives. This allows you to:

Parallelize tool calls with restate.select and restate.gather
Implement custom stopping conditions
Implement custom logic between steps (e.g. human approval)
Interact with external systems between steps
Handle errors in a custom way

Here is an example of a manual agent loop:

advanced/manual_loop_agent.py

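import restate
from openai import OpenAI, pydantic_function_tool
from openai.types.chat import (
    ChatCompletion,
    ChatCompletionMessageFunctionToolCall,
    ChatCompletionMessageParam,
    ChatCompletionToolMessageParam,
    ChatCompletionUserMessageParam,
)
from pydantic import BaseModel
from restate import Context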

from app.utils.models import WeatherRequest
from app.utils.utils import fetch_weather, as_chat_completion_param

# Initialize OpenAI client
client = OpenAI()

# Tool definitions
TOOLS = [
    pydantic_function_tool(
        WeatherRequest,
        name="get_weather",
        description="Get the current weather in a given location",
    )
]


manual_loop_agent = restate.Service("ManualLoopAgent")


class MultiWeatherPrompt(BaseModel):
    message: str = "What is the weather like in New York and San Francisco?"


@manual_loop_agent.handler()
async def run(ctx: Context, prompt: MultiWeatherPrompt) -> str | None:
    """Main agent loop with tool calling"""
    messages: list[ChatCompletionMessageParam] = [
        ChatCompletionUserMessageParam(role="user", content=prompt.message)
    ]

    while True:
        # Call OpenAI with durable execution
        def llm_call() -> ChatCompletion:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=TOOLS,
            )

        response = await ctx.run_typed(
            "llm-call",
            llm_call,
            restate.RunOptions(
                max_attempts=3
            ),  # To avoid using too many credits on infinite retries during development
        )

        # Append the assistant message so that subsequent requests include it
        assistant_message = response.choices[0].message
        messages.append(as_chat_completion_param(assistant_message))

        if not assistant_message.tool_calls:
            return assistant_message.content

        # Execute the requested tool calls one by one
        for tool_call in assistant_message.tool_calls:
            if isinstance(tool_call, ChatCompletionMessageFunctionToolCall) and tool_call.function.name == "get_weather":
                req = WeatherRequest.model_validate_json(tool_call.function.arguments)
                tool_output = await ctx.run_typed(
                    "Get weather", fetch_weather, city=req.city
                )

                # Add tool response to messages
                messages.append(
                    ChatCompletionToolMessageParam(
                        role="tool",
                        tool_call_id=tool_call.id,
                        content=tool_output.model_dump_json(),
                    )
                )

View on GitHub

This can be extended to include any custom control flow you need: persistent state, parallel tool calls, custom stopping conditions, or custom error handling. Try it out by sending a request to the service:

curl localhost:8080/ManualLoopAgent/run \
--json '{"message": "What is the weather like in New York and San Francisco?"}'

In the UI, you can see how the agent runs multiple iterations and calls tools.
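
The loop above executes tool calls one at a time. As a rough illustration of the parallel variant, the sketch below fans out several tool calls and collects the results in order. It uses plain asyncio.gather as a stand-in; in a Restate handler you would combine durable ctx.run futures instead (for example with restate.gather, as mentioned earlier). ToolCall and fake_fetch_weather are hypothetical names introduced only for this example.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    """Hypothetical, simplified stand-in for an LLM tool call."""

    id: str
    name: str
    arguments: str  # JSON-encoded arguments, as returned by the LLM


async def fake_fetch_weather(city: str) -> dict:
    """Hypothetical tool implementation; a real one would call a weather API."""
    await asyncio.sleep(0.01)
    return {"city": city, "temperature": 20}


async def run_tool(call: ToolCall) -> dict:
    """Execute one tool call and wrap the output as a tool message."""
    if call.name != "get_weather":
        content = json.dumps({"error": f"unknown tool {call.name}"})
    else:
        args = json.loads(call.arguments)
        content = json.dumps(await fake_fetch_weather(args["city"]))
    return {"role": "tool", "tool_call_id": call.id, "content": content}


async def run_tools_in_parallel(calls: list[ToolCall]) -> list[dict]:
    # gather preserves input order, so results line up with the tool calls
    return await asyncio.gather(*(run_tool(c) for c in calls))


tool_calls = [
    ToolCall("call_1", "get_weather", '{"city": "New York"}'),
    ToolCall("call_2", "get_weather", '{"city": "San Francisco"}'),
]
tool_messages = asyncio.run(run_tools_in_parallel(tool_calls))
```

Note that every tool call receives a tool message, including calls to unknown tools, because the Chat Completions API expects a response for each tool_call_id before the next request.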

Rolling back tool executions on failure

Sometimes you need to undo previous agent actions when a later step fails. Restate makes it easy to implement compensation patterns (Sagas) for AI agents. Just track the rollback actions as you go, let the agent rethrow terminal tool errors, and execute the rollback actions in reverse order.

Here is an example of a travel booking agent that first reserves a hotel, flight and car, and then either confirms them or rolls back if any step fails with a terminal error (e.g. car type not available). We let tools add a rollback action to the agent context for each booking step they do. The run handler catches any terminal errors and runs all the rollback actions.

advanced/rollback_agent.py

class BookingContext(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    booking_id: str
    restate_context: restate.Context
    on_rollback: list[Callable] = Field(default_factory=list)


# Functions raise terminal errors instead of feeding them back to the agent
@function_tool(failure_error_function=raise_restate_errors)
async def book_hotel(
    wrapper: RunContextWrapper[BookingContext], booking: HotelBooking
) -> BookingResult:
    """Book a hotel"""
    booking_context = wrapper.context

    # Register a rollback action for each step, in case of failures further on in the workflow
    booking_context.on_rollback.append(
        lambda: booking_context.restate_context.run_typed(
            "Cancel hotel", cancel_hotel, booking_id=booking_context.booking_id
        )
    )

    # Execute the workflow step
    return await booking_context.restate_context.run_typed(
        "Book hotel", reserve_hotel, booking_id=booking_context.booking_id, booking=booking
    )


@function_tool(failure_error_function=raise_restate_errors)
async def book_flight(
    wrapper: RunContextWrapper[BookingContext], booking: FlightBooking
) -> BookingResult:
    """Book a flight"""
    booking_context = wrapper.context

    booking_context.on_rollback.append(
        lambda: booking_context.restate_context.run_typed(
            "Cancel flight", cancel_flight, booking_id=booking_context.booking_id
        )
    )
    return await booking_context.restate_context.run_typed(
        "Book flight", reserve_flight, booking_id=booking_context.booking_id, booking=booking
    )


# ... Do the same for cars ...


agent_service = restate.Service("BookingWithRollbackAgent")


@agent_service.handler()
async def book(restate_context: restate.Context, prompt: BookingPrompt) -> str:

    booking_context = BookingContext(
        booking_id=prompt.booking_id, restate_context=restate_context
    )

    booking_agent = Agent[BookingContext](
        name="BookingWithRollbackAgent",
        instructions="Book a complete travel package with the requirements in the prompt. "
        "Use tools to first book the hotel, then the flight.",
        tools=[book_hotel, book_flight],
    )

    try:
        result = await Runner.run(
            booking_agent,
            input=prompt.message,
            context=booking_context,
            run_config=RunConfig(
                model="gpt-4o", model_provider=DurableModelCalls(restate_context)
            ),
        )
    except TerminalError as e:
        # Run all the rollback actions on terminal errors
        for compensation in reversed(booking_context.on_rollback):
            await compensation()
        raise e

    return result.final_output

View on GitHub

Try it out by sending the following request:

curl localhost:8080/BookingWithRollbackAgent/book \
--json '{
    "booking_id": "booking_123",
    "message": "I need to book a business trip to San Francisco from March 15-17. Flying from JFK, need a hotel downtown for 1 guest."
}'

Have a look at the UI to see how the flight booking fails, and the bookings are rolled back. Check out the sagas guide for more details.

Long-running background agents

Restate supports implementing scheduling and timer logic in your agents. This allows you to build agents that run periodically, wait for specific times, or implement complex scheduling logic. Agents can either be long-running or reschedule themselves for later execution. Have a look at the scheduling docs to learn more.

Streaming back intermediate results

Have a look at the pub-sub example.

Interrupting agents

Have a look at the interruptible coding agent.

Summary

Durable Execution, paired with your existing SDKs, gives your agents a powerful upgrade:

Durable Execution: Automatic recovery from failures without losing progress
Persistent memory and context: Conversation history and state that survive across turns
Observability by default across your agents and workflows
Human-in-the-Loop: Seamless approval workflows with timeouts
Multi-Agent Coordination: Reliable orchestration of specialized agents
Suspensions to save costs on function-as-a-service platforms when agents need to wait
Advanced Patterns: Real-time progress updates, interruptions, and long-running workflows

Next Steps

Learn more about how to implement resilient tools with Restate in the Tour of Workflows
Check out the other Restate AI examples on GitHub
Sign up for Restate Cloud and start building agents without managing infrastructure
