When an LLM decides to call multiple tools, executing them in parallel instead of sequentially can significantly reduce latency.
Use Restate’s parallelization primitives
Agent SDKs natively support parallel tool calls, but this is disabled when integrating with Restate.
Parallel tool calls that use the Restate Context can execute in a different order during replays, breaking Restate’s deterministic execution guarantees.
Instead, you use Restate’s durable execution primitives (RestatePromise.all() in TypeScript, restate.gather() in Python) to parallelize work. There are two patterns for this:
- With Agent SDK: use orchestrator tool: Create a single tool that internally fans out multiple steps in parallel using Restate. The agent SDK sees one tool call, but that tool runs work concurrently.
- With only Restate: Custom agent loop: Manage the agentic loop yourself with the Restate SDK directly. You control the tool execution step and can run all tool calls in parallel.
When you manage the agentic loop yourself with the Restate SDK, you have full control over tool execution. After the LLM returns multiple tool calls, you start all of them concurrently and wait for all to complete before feeding results back to the LLM.
The Restate UI shows how multiple tool calls execute concurrently, with all operations completing in parallel:
