- Querying multiple AI models (e.g., GPT-4, Claude, Gemini) and returning the fastest response
- Running different agents, prompts, or strategies in parallel and using the first successful outcome
How does Restate help?
The benefits of using Restate for competitive racing patterns are:
- First-to-succeed optimization: Restate lets you race multiple approaches and automatically return the first successful result
- Durable coordination: Restate turns Promises/Futures into durable, distributed constructs that persist across failures and process restarts
- Cancel slow tasks: Failed or slower approaches can be cancelled, preventing resource waste
- Serverless scaling: Deploy racing strategies on serverless infrastructure for automatic scaling while the main process remains suspended
- Flexible integration: Works with any LLM SDK (Vercel AI, LangChain, LiteLLM, etc.) and any programming language supported by Restate (TypeScript, Python, Go, etc.)
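The "cancel slow tasks" idea can be illustrated outside Restate with plain TypeScript: race two tasks and abort the loser once a winner settles. This is a minimal sketch with stubbed model calls (the names, delays, and `AbortController` wiring are illustrative assumptions, not the example's actual code; in the real example, Restate makes these steps durable).

```typescript
// Stub standing in for an LLM call: resolves after `delayMs`,
// or rejects early if the shared abort signal fires.
function stubModel(name: string, delayMs: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve(`${name}: answer`), delayMs);
    signal.addEventListener("abort", () => {
      clearTimeout(timer); // stop the pending work
      reject(new Error(`${name} cancelled`));
    });
  });
}

export async function raceWithCancellation(): Promise<string> {
  const controller = new AbortController();
  const winner = await Promise.race([
    stubModel("fast-model", 10, controller.signal),
    stubModel("slow-model", 1_000, controller.signal),
  ]);
  controller.abort(); // cancel the slower branch so it stops consuming resources
  return winner;
}
```

Restate adds what this sketch lacks: the race and its cancellation survive process crashes and restarts, because each step is recorded durably.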
Example
When you need a quick response and have access to multiple AI models, race them against each other to get the fastest result:
Run the example
1. Requirements
- AI SDK of your choice (e.g., OpenAI, LangChain, Pydantic AI, LiteLLM, etc.) to make LLM calls.
- API key for your model provider.
2. Download the example
3. Start the Restate Server
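If you don't already have the Restate Server installed, one common way to launch it locally is via npx (an assumption about your setup; a Homebrew, binary, or Docker install works just as well):

```shell
npx @restatedev/restate-server
```

By default the server exposes the UI on port 9070 and accepts requests on port 8080.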
4. Start the Service
Export the API key of your model provider as an environment variable and then start the agent. For example, for OpenAI:
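For a Node-based version of the example, that could look like the following (the key is a placeholder, and the start script is an assumption; check the example's README for the exact command):

```shell
# Placeholder key shown; substitute your own OpenAI API key.
export OPENAI_API_KEY="sk-..."
# Start script is an assumption; the example's README has the exact command.
npm run dev
```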
5. Register the services
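Via the CLI, registration points the Restate Server at the address where your service is listening (9080 is the SDK's default port; adjust it if your service uses a different one):

```shell
restate deployments register http://localhost:9080
```

Alternatively, register the same address through the UI at http://localhost:9070.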

6. Send a request
In the UI (http://localhost:9070), click on the run handler of the RacingAgent service to open the playground and send a prompt to race multiple models:
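You can also invoke the handler directly over HTTP through Restate's ingress (port 8080 by default); the payload shape below is an assumption, as the exact body depends on the handler's signature:

```shell
curl localhost:8080/RacingAgent/run --json '{"prompt": "Summarize the Restate racing pattern."}'
```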
7. Check the Restate UI
In the UI, you can see how multiple models are queried simultaneously, with the fastest response winning the race:
