# Retries & Error Handling

> Implement robust error handling and retry strategies for reliable agents.

export const GitHubLink = ({url}) => <div style={{
  marginTop: '-8px',
  marginBottom: '8px',
  textAlign: 'right'
}}>
    <a href={url} target="_blank" rel="noopener noreferrer" style={{
  fontSize: '0.75rem',
  color: '#6B7280',
  textDecoration: 'none',
  display: 'inline-flex',
  alignItems: 'center',
  gap: '3px',
  padding: '2px 6px',
  borderRadius: '3px',
  border: '1px solid #E5E7EB',
  backgroundColor: 'transparent',
  transition: 'all 0.2s ease'
}} onMouseOver={e => {
  e.target.style.color = '#6B7280';
  e.target.style.backgroundColor = '#F9FAFB';
}} onMouseOut={e => {
  e.target.style.color = '#6B7280';
  e.target.style.backgroundColor = 'transparent';
}}>
      <svg width="12" height="12" viewBox="0 0 24 24" fill="currentColor">
        <path d="M12 0c-6.626 0-12 5.373-12 12 0 5.302 3.438 9.8 8.207 11.387.599.111.793-.261.793-.577v-2.234c-3.338.726-4.033-1.416-4.033-1.416-.546-1.387-1.333-1.756-1.333-1.756-1.089-.745.083-.729.083-.729 1.205.084 1.839 1.237 1.839 1.237 1.07 1.834 2.807 1.304 3.492.997.107-.775.418-1.305.762-1.604-2.665-.305-5.467-1.334-5.467-5.931 0-1.311.469-2.381 1.236-3.221-.124-.303-.535-1.524.117-3.176 0 0 1.008-.322 3.301 1.23.957-.266 1.983-.399 3.003-.404 1.02.005 2.047.138 3.006.404 2.291-1.552 3.297-1.230 3.297-1.230.653 1.653.242 2.874.118 3.176.77.84 1.235 1.911 1.235 3.221 0 4.609-2.807 5.624-5.479 5.921.43.372.823 1.102.823 2.222v3.293c0 .319.192.694.801.576 4.765-1.589 8.199-6.086 8.199-11.386 0-6.627-5.373-12-12-12z" />
      </svg>
      View on GitHub
    </a>
  </div>;

export const GlobalTab = ({title, icon, children}) => {
  return <div>{children}</div>;
};

export const GlobalTabs = ({children, className = ''}) => {
  const [activeTab, setActiveTab] = useState(0);
  const tabs = React.Children.toArray(children).filter(child => child.type && child.type.name === 'GlobalTab');
  useEffect(() => {
    const savedLanguage = localStorage.getItem('language');
    if (savedLanguage) {
      const matchingIndex = tabs.findIndex(tab => tab.props.title === savedLanguage);
      if (matchingIndex !== -1) {
        setActiveTab(matchingIndex);
      }
    }
  }, [tabs]);
  useEffect(() => {
    const handleGlobalTabChange = event => {
      const targetTitle = event.detail.title;
      const matchingIndex = tabs.findIndex(tab => tab.props.title === targetTitle);
      if (matchingIndex !== -1 && matchingIndex !== activeTab) {
        setActiveTab(matchingIndex);
      }
    };
    window.addEventListener('globalTabChange', handleGlobalTabChange);
    return () => window.removeEventListener('globalTabChange', handleGlobalTabChange);
  }, [tabs, activeTab]);
  const handleTabClick = index => {
    setActiveTab(index);
    const title = tabs[index].props.title;
    localStorage.setItem('language', title);
    window.dispatchEvent(new CustomEvent('globalTabChange', {
      detail: {
        title
      }
    }));
  };
  return <div className={`tabs tabs tab-container ${className}`}>
            <ul className="not-prose mb-6 pb-[1px] flex-none min-w-full overflow-auto border-b border-gray-200 gap-x-6 flex dark:border-gray-200/10" data-component-part="tabs-list">
                {tabs.map((tab, index) => <li key={index} className="cursor-pointer">
                        <button className={index === activeTab ? "flex text-sm items-center gap-1.5 leading-6 font-semibold whitespace-nowrap pt-3 pb-2.5 -mb-px max-w-max border-b text-primary dark:text-primary-light border-current" : "flex text-sm items-center gap-1.5 leading-6 font-semibold whitespace-nowrap pt-3 pb-2.5 -mb-px max-w-max border-b text-gray-900 border-transparent hover:border-gray-300 dark:text-gray-200 dark:hover:border-gray-700"} data-component-part="tab-button" data-active={index === activeTab} onClick={() => handleTabClick(index)}>
                            {tab.props.icon && <img src={tab.props.icon} alt="" className="h-4 w-4 not-prose" noZoom />}
                            {tab.props.title}
                        </button>
                    </li>)}
            </ul>
            <div className="prose dark:prose-dark overflow-x-auto" data-component-part="tab-content">
                {tabs[activeTab]?.props.children}
            </div>
        </div>;
};

LLM calls are costly, so configure retry behavior to recover from transient failures without risking infinite retry loops and runaway costs.

Restate distinguishes between two types of errors:

* **Transient errors**: Temporary issues like network failures or rate limits. Restate automatically retries these until they succeed or the retry policy is exhausted.
* **Terminal errors**: Permanent failures like invalid input or business rule violations. Restate does not retry these. The invocation fails permanently. You can catch these errors and handle them gracefully.
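
The distinction can be illustrated with a minimal, self-contained sketch. This is not Restate's implementation: the `TerminalError` class below is a local stand-in, and `run_with_retries`, `rate_limited_twice`, and `invalid_input` are hypothetical names used only for illustration. It assumes a fixed attempt limit and omits backoff delays.

```python
class TerminalError(Exception):
    """Hypothetical stand-in for a permanent failure (not the Restate SDK class)."""


def run_with_retries(action, max_attempts=3):
    """Call action(attempt) until it succeeds or the attempts are used up."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return action(attempt)
        except TerminalError:
            raise  # terminal: propagate immediately, no retry
        except Exception as err:
            last_error = err  # transient: fall through to the next attempt
    # Retry policy exhausted: surface a terminal error to the caller
    raise TerminalError(f"gave up after {max_attempts} attempts: {last_error}")


def rate_limited_twice(attempt):
    """Fails with a transient error on attempts 1 and 2, then succeeds."""
    if attempt < 3:
        raise ConnectionError("rate limited")
    return f"succeeded on attempt {attempt}"


def invalid_input(attempt):
    """Always fails permanently."""
    raise TerminalError("invalid input")


print(run_with_retries(rate_limited_twice))  # → succeeded on attempt 3

try:
    run_with_retries(invalid_input)
except TerminalError as err:
    print(f"not retried: {err}")  # → not retried: invalid input
```

The transient failure is absorbed by the loop, while the terminal error reaches the caller after a single attempt.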

<GlobalTabs>
  <GlobalTab title="Vercel AI" icon={"/img/languages/typescript.svg"} />

  <GlobalTab title="OpenAI Agents" icon={"/img/languages/python.svg"} />

  <GlobalTab title="Google ADK" icon={"/img/languages/python.svg"} />

  <GlobalTab title="Pydantic AI" icon={"/img/languages/python.svg"} />

  <GlobalTab title="Restate TS" icon={"/img/languages/typescript.svg"} />

  <GlobalTab title="Restate Py" icon={"/img/languages/python.svg"} />
</GlobalTabs>

## Retrying LLM calls

LLM API calls can fail transiently (rate limits, network issues, provider outages). Configure retry limits to recover from these failures automatically while preventing runaway costs.

<GlobalTabs className={"hidden-tabs"}>
  <GlobalTab title="Vercel AI">
    In the Vercel AI SDK, set `maxRetries` on `generateText` (default: 2) to retry calls that fail due to rate limits or other transient errors.
    After retries are exhausted, the agent throws an error.
    Restate then retries the invocation with exponential backoff to handle longer outages or network issues.

    You can limit Restate's retries with the `maxRetryAttempts` option in `durableCalls` middleware:

    ```typescript errorhandling/fail-on-terminal-tool-agent.ts {"CODE_LOAD::https://raw.githubusercontent.com/restatedev/ai-examples/refs/heads/main/vercel-ai/tour-of-agents/src/errorhandling/fail-on-terminal-tool-agent.ts#max_attempts_example"}  theme={null}
    const model = wrapLanguageModel({
      model: openai("gpt-4o"),
      middleware: durableCalls(ctx, { maxRetryAttempts: 3 }),
    });
    ```

    <GitHubLink url="https://github.com/restatedev/ai-examples/tree/main/vercel-ai/tour-of-agents/src/errorhandling/fail-on-terminal-tool-agent.ts" />

    Each Restate retry triggers up to `maxRetries` SDK attempts.
    For example, with `maxRetryAttempts: 3` and `maxRetries: 2`, a call may be attempted 6 times.
    Once Restate's retries are exhausted, the invocation fails with a `TerminalError` and won't be retried further.
  </GlobalTab>

  <GlobalTab title="OpenAI Agents">
    Restate's `DurableRunner` lets you specify the retry behavior for LLM calls:

    ```python error_handling.py {"CODE_LOAD::https://raw.githubusercontent.com/restatedev/ai-examples/refs/heads/main/openai-agents/tour-of-agents/app/error_handling.py#handle"}  theme={null}
    try:
        result = await DurableRunner.run(
            agent,
            req.message,
            llm_retry_opts=LlmRetryOpts(
                max_attempts=3, initial_retry_interval=timedelta(seconds=2)
            ),
        )
    except restate.TerminalError as e:
        # Handle terminal errors gracefully
        return f"The agent couldn't complete the request: {e.message}"
    ```

    <GitHubLink url="https://github.com/restatedev/ai-examples/blob/main/openai-agents/tour-of-agents/app/error_handling.py" />

    By default, the runner retries ten times with an initial interval of one second. Once Restate's retries are exhausted, the invocation fails with a `TerminalError` and won't be retried further.
  </GlobalTab>

  <GlobalTab title="Google ADK">
    Configure the number of retries for LLM calls when activating the Restate plugin for your ADK App:

    ```python error_handling.py {"CODE_LOAD::https://raw.githubusercontent.com/restatedev/ai-examples/refs/heads/main/google-adk/tour-of-agents/app/error_handling.py#retries"}  theme={null}
    app = App(
        name=APP_NAME, root_agent=agent, plugins=[RestatePlugin(max_model_call_retries=3)]
    )
    ```

    <GitHubLink url="https://github.com/restatedev/ai-examples/blob/main/google-adk/tour-of-agents/app/error_handling.py" />

    By default, the plugin retries ten times with an initial interval of one second. Once Restate's retries are exhausted, the invocation fails with a `TerminalError` and won't be retried further.
  </GlobalTab>

  <GlobalTab title="Pydantic AI">
    Restate's `RestateAgent` lets you specify the retry behavior for LLM calls via `RunOptions`:

    ```python error_handling.py {"CODE_LOAD::https://raw.githubusercontent.com/restatedev/ai-examples/refs/heads/main/pydantic-ai/tour-of-agents/app/error_handling.py#retries"}  theme={null}
    restate_agent = RestateAgent(
        agent,
        run_options=RunOptions(max_attempts=3, initial_retry_interval=timedelta(seconds=2)),
    )
    ```

    <GitHubLink url="https://github.com/restatedev/ai-examples/blob/main/pydantic-ai/tour-of-agents/app/error_handling.py" />

    By default, the runner retries ten times with an initial interval of one second. Once Restate's retries are exhausted, the invocation fails with a `TerminalError` and won't be retried further.
  </GlobalTab>

  <GlobalTab title="Restate TS">
    Wrap LLM calls in `ctx.run()` with a retry limit to handle transient failures automatically:

    ```typescript theme={null}
    // Retries up to 3 times with exponential backoff
    const result = await ctx.run(
      "LLM call",
      async () => llmCall(messages, tools),
      { maxRetryAttempts: 3 }
    );
    ```

    Without `maxRetryAttempts`, Restate retries indefinitely with exponential backoff. For LLM calls, setting a limit prevents runaway costs from persistent failures.

    You can set [custom retry policies](/guides/error-handling#at-the-run-block-level) for `ctx.run` steps.
  </GlobalTab>

  <GlobalTab title="Restate Py">
    Wrap LLM calls in `ctx.run_typed()` with a retry limit to handle transient failures automatically:

    ```python theme={null}
    # Retries up to 3 times with exponential backoff
    result = await ctx.run_typed(
        "LLM call",
        llm_call,
        RunOptions(max_attempts=3),
        messages=messages,
        tools=tools,
    )
    ```

    Without `max_attempts`, Restate retries indefinitely with exponential backoff. For LLM calls, setting a limit prevents runaway costs from persistent failures.

    You can set [custom retry policies](/guides/error-handling#at-the-run-block-level) for `ctx.run_typed` actions.
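
    To see why a limit matters, here is a rough sketch of how an exponential backoff schedule grows. It is not Restate's retry implementation: `backoff_schedule` and its `factor` and `max_interval` parameters are hypothetical, and the values are chosen only for illustration.

    ```python
    from datetime import timedelta


    def backoff_schedule(initial, factor, max_interval, max_attempts):
        """Return the wait before each retry; the first attempt has no wait."""
        delays = []
        delay = initial
        for _ in range(max_attempts - 1):
            delays.append(min(delay, max_interval))  # cap each wait
            delay = delay * factor  # grow the wait for the next retry
        return delays


    schedule = backoff_schedule(
        initial=timedelta(seconds=2),
        factor=2,
        max_interval=timedelta(seconds=10),
        max_attempts=5,
    )
    print([d.total_seconds() for d in schedule])  # → [2.0, 4.0, 8.0, 10.0]
    ```

    With five attempts there are four waits. Without an attempt limit, the list would never end, and each entry would correspond to another paid LLM call.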
  </GlobalTab>
</GlobalTabs>

## Tool execution errors

<GlobalTabs className={"hidden-tabs"}>
  <GlobalTab title="Vercel AI">
    When agent tools use Restate Context actions like `ctx.run`, Restate automatically retries transient errors in these operations. This makes your tools resilient to network failures, database hiccups, and other temporary issues. For all operations that might suffer from transient errors, use Context actions.

    For errors that should not be retried, throw a terminal error:

    ```typescript {"CODE_LOAD::ts/src/tour/agents/terminal_error.ts#terminal_error"}  theme={null}
    throw new TerminalError("This tool is not allowed to run for this input.");
    ```

    By default, the Vercel AI SDK converts any error in a tool execution into a message to the LLM, and the agent decides how to proceed.
    This is often desirable, as the LLM can decide to use a different tool or provide a fallback answer.

    However, if you use Restate Context actions like `ctx.run` in your tool execution, Restate will retry any transient errors in these actions until they succeed.

    ```typescript {"CODE_LOAD::ts/src/tour/agents/inline-tool-errors.ts#here"}  theme={null}
    // Without ctx.run - error goes straight to agent
    async function myTool() {
      const response = await fetch("/api/data"); // Might fail due to network
      // If this fails, agent gets the error immediately
      return response.json();
    }

    // With ctx.run - Restate handles retries
    async function myToolWithRestate(ctx: restate.Context) {
      // Return the parsed body so the result can be serialized
      const result = await ctx.run("fetch-data", async () => {
        const response = await fetch("/api/data");
        return response.json();
      });
      // Network failures get retried automatically
      // Only terminal errors reach the AI
      return result;
    }
    ```

    Terminal errors thrown from Restate Context actions are not retried by Restate and are passed on to the Vercel AI SDK.
    Here too, the SDK converts the error into a message to the LLM, and the agent decides how to proceed.

    In some cases, you might want to treat terminal tool execution errors as permanent failures and stop the agent instead of letting the LLM decide how to proceed.
    The Restate middleware provides two utilities to help with this:

    <AccordionGroup>
      <Accordion title="Fail the agent on terminal tool errors">
        To fail the agent on terminal tool errors, rethrow the error in `onStepFinish` by passing `rethrowTerminalToolError`:

        ```typescript errorhandling/fail-on-terminal-tool-agent.ts {"CODE_LOAD::https://raw.githubusercontent.com/restatedev/ai-examples/refs/heads/main/vercel-ai/tour-of-agents/src/errorhandling/fail-on-terminal-tool-agent.ts#option2"}  theme={null}
        const { text } = await generateText({
          model,
          tools: {
            getWeather: tool({
              description: "Get the current weather for a given city.",
              inputSchema: z.object({ city: z.string() }),
              execute: async ({ city }) => {
                return await ctx.run("get weather", () => fetchWeather(city));
              },
            }),
          },
          stopWhen: [stepCountIs(5)],
          onStepFinish: rethrowTerminalToolError,
          system: "You are a helpful agent that provides weather updates.",
          messages: [{ role: "user", content: prompt }],
        });
        ```

        <GitHubLink url="https://github.com/restatedev/ai-examples/tree/main/vercel-ai/tour-of-agents/src/errorhandling/fail-on-terminal-tool-agent.ts" />
      </Accordion>

      <Accordion title="Stop the agent on terminal tool errors">
        To stop the agent on terminal tool errors and handle it after the agent finishes, you can use `hasTerminalToolError` in `stopWhen` and then inspect the steps for errors:

        ```typescript errorhandling/stop-on-terminal-tool-agent.ts {"CODE_LOAD::https://raw.githubusercontent.com/restatedev/ai-examples/refs/heads/main/vercel-ai/tour-of-agents/src/errorhandling/stop-on-terminal-tool-agent.ts#option3"}  theme={null}
        const { steps, text } = await generateText({
          model,
          tools: {
            getWeather: tool({
              description: "Get the current weather for a given city.",
              inputSchema: z.object({ city: z.string() }),
              execute: async ({ city }) => {
                return await ctx.run("get weather", () => fetchWeather(city));
              },
            }),
          },
          stopWhen: [stepCountIs(5), hasTerminalToolError],
          system: "You are a helpful agent that provides weather updates.",
          messages: [{ role: "user", content: prompt }],
        });

        const terminalSteps = getTerminalToolSteps(steps);
        if (terminalSteps.length > 0) {
          // Do something with the terminal tool error steps
        }
        ```

        <GitHubLink url="https://github.com/restatedev/ai-examples/tree/main/vercel-ai/tour-of-agents/src/errorhandling/stop-on-terminal-tool-agent.ts" />
      </Accordion>
    </AccordionGroup>
  </GlobalTab>

  <GlobalTab title="OpenAI Agents">
    When agent tools use Restate Context actions like `ctx.run`, Restate automatically retries transient errors in these operations. This makes your tools resilient to network failures, database hiccups, and other temporary issues. For all operations that might suffer from transient errors, use Context actions.

    For errors that should not be retried, raise a terminal error:

    ```python theme={null}
    from restate import TerminalError

    raise TerminalError("This tool is not allowed to run for this input.")
    ```

    By default, the Restate OpenAI integration raises terminal errors from tool executions so that you can handle them in your handler.

    <Warning>
      The OpenAI Agents SDK also lets you set `failure_error_function` to `None`, which rethrows any error in the agent execution as-is.
      This includes errors caused by invalid LLM responses (e.g. a tool call with invalid arguments or to a tool that doesn't exist).
      Such an error triggers Restate retries, and Restate recovers the invocation by replaying the journal entries.
      If the error is not transient, this can lead to infinite retries.
      Be careful when using this option, and handle errors appropriately in your agent logic.
      You might also want to set [a retry policy at the service or handler level](/services/configuration#how-to-configure) to avoid infinite retries.
    </Warning>
  </GlobalTab>

  <GlobalTab title="Google ADK">
    When agent tools use Restate Context actions like `ctx.run`, Restate automatically retries transient errors in these operations. This makes your tools resilient to network failures, database hiccups, and other temporary issues. For all operations that might suffer from transient errors, use Context actions.

    For errors that should not be retried, raise a terminal error:

    ```python theme={null}
    from restate import TerminalError

    raise TerminalError("This tool is not allowed to run for this input.")
    ```

    By default, Restate retries tool executions until they succeed, so raising a terminal error from within your tool implementation is how you stop the retries.

    You can catch these terminal errors in your handler and handle them accordingly.
  </GlobalTab>

  <GlobalTab title="Pydantic AI">
    When agent tools use Restate Context actions like `ctx.run`, Restate automatically retries transient errors in these operations. This makes your tools resilient to network failures, database hiccups, and other temporary issues. For all operations that might suffer from transient errors, use Context actions.

    For example, wrapping a tool call in `restate_context().run_typed()` makes it durable with automatic retries:

    ```python error_handling.py {"CODE_LOAD::https://raw.githubusercontent.com/restatedev/ai-examples/refs/heads/main/pydantic-ai/tour-of-agents/app/error_handling.py#here"}  theme={null}
    async def get_weather(city: WeatherRequest) -> WeatherResponse:
        """Get the current weather for a given city."""
        return await restate_context().run_typed(
            f"Get weather {city}", fetch_weather, req=city
        )
    ```

    <GitHubLink url="https://github.com/restatedev/ai-examples/blob/main/pydantic-ai/tour-of-agents/app/error_handling.py" />

    For errors that should not be retried, raise a terminal error:

    ```python theme={null}
    from restate import TerminalError

    raise TerminalError("This tool is not allowed to run for this input.")
    ```

    By default, Restate retries tool executions until they succeed, so raising a terminal error from within your tool implementation is how you stop the retries.

    You can catch these terminal errors in your handler and handle them accordingly:

    ```python error_handling.py {"CODE_LOAD::https://raw.githubusercontent.com/restatedev/ai-examples/refs/heads/main/pydantic-ai/tour-of-agents/app/error_handling.py#handle"}  theme={null}
    @agent_service.handler()
    async def run(_ctx: restate.Context, req: WeatherPrompt) -> str:
        try:
            result = await restate_agent.run(req.message)
        except TerminalError as e:
            # Handle terminal errors gracefully
            return f"The agent couldn't complete the request: {e.message}"
        return result.output
    ```

    <GitHubLink url="https://github.com/restatedev/ai-examples/blob/main/pydantic-ai/tour-of-agents/app/error_handling.py" />
  </GlobalTab>

  <GlobalTab title="Restate TS">
    Restate automatically retries transient errors. This makes your tools resilient to network failures, database hiccups, and other temporary issues.

    When a tool encounters an unrecoverable error (e.g., resource not found, invalid input, business rule violation), throw a `TerminalError` to stop retries immediately:

    ```typescript {"CODE_LOAD::ts/src/tour/agents/terminal_error.ts#terminal_error"}  theme={null}
    throw new TerminalError("This tool is not allowed to run for this input.");
    ```

    You can catch and handle terminal errors in your agent logic if needed.
  </GlobalTab>

  <GlobalTab title="Restate Py">
    Restate automatically retries transient errors. This makes your tools resilient to network failures, database hiccups, and other temporary issues.

    When a tool encounters an unrecoverable error (e.g., resource not found, invalid input, business rule violation), raise a `TerminalError` to stop retries immediately:

    ```python theme={null}
    from restate import TerminalError

    raise TerminalError("This tool is not allowed to run for this input.")
    ```

    You can catch and handle terminal errors in your agent logic if needed.
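
    One way to handle a terminal error in agent logic is to turn it into a tool result that the LLM can react to. The sketch below is only an illustration under assumptions: `TerminalError` is a local stand-in for `restate.TerminalError` so the example runs on its own, and `lookup_order` and `call_tool` are hypothetical names.

    ```python
    class TerminalError(Exception):
        """Hypothetical stand-in for restate.TerminalError, so the sketch runs on its own."""


    def lookup_order(order_id: str) -> str:
        """Hypothetical tool: an unknown order is an unrecoverable error."""
        orders = {"42": "shipped"}
        if order_id not in orders:
            raise TerminalError(f"order {order_id} not found")
        return orders[order_id]


    def call_tool(order_id: str) -> dict:
        """Run the tool and turn a terminal error into a message for the LLM."""
        try:
            return {"role": "tool", "content": lookup_order(order_id)}
        except TerminalError as err:
            return {"role": "tool", "content": f"Tool failed permanently: {err}"}


    print(call_tool("42"))  # → {'role': 'tool', 'content': 'shipped'}
    print(call_tool("7"))  # → {'role': 'tool', 'content': 'Tool failed permanently: order 7 not found'}
    ```

    Whether to feed the error back to the LLM or fail the whole invocation depends on the tool: a missing record may be worth a fallback answer, while a violated business rule usually is not.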
  </GlobalTab>
</GlobalTabs>

<Tip>
  To learn more about error handling with Restate, consult the [error handling guide](/guides/error-handling).
</Tip>

## Combining with rollback

For multi-step agent workflows where steps have side effects (bookings, payments, emails), combine error handling with [compensation/rollback patterns](/ai/patterns/rollback) to undo completed work when later steps fail.
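
The idea can be sketched without any SDK: record an undo action after each completed step, and run the recorded actions in reverse order when a later step fails permanently. This is a simplified illustration, not the pattern's Restate implementation. `TerminalError` is a local stand-in class, and `book_trip` and its steps are hypothetical.

```python
class TerminalError(Exception):
    """Hypothetical stand-in for a permanent failure (not the Restate SDK class)."""


def book_trip(log):
    """Run steps in order; on a terminal error, undo completed steps in reverse."""
    compensations = []
    try:
        log.append("book flight")
        compensations.append(lambda: log.append("cancel flight"))

        log.append("book hotel")
        compensations.append(lambda: log.append("cancel hotel"))

        # Hypothetical failing step: the payment is rejected permanently
        raise TerminalError("payment declined")
    except TerminalError:
        for undo in reversed(compensations):
            undo()
        raise  # still report the failure after rolling back


log = []
try:
    book_trip(log)
except TerminalError as err:
    print(f"failed: {err}")  # → failed: payment declined
print(log)  # → ['book flight', 'book hotel', 'cancel hotel', 'cancel flight']
```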
