Combine Restate and Langfuse to get full observability into your agent executions. Langfuse traces every LLM call, tool invocation, token usage, and cost. Restate traces every durable step, so you see agentic steps alongside regular workflow steps in a single trace. You don’t need to change your agent code. You only add the Langfuse instrumentation to your entry point.

Langfuse Documentation

Learn more about Langfuse’s observability, prompt management, and evaluation features.

Instrumentation setup

Select your Agent SDK:
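As one sketch of what the SDK-specific snippet looks like, assuming the OpenAI Agents SDK and the OpenTelemetry-based Langfuse Python SDK (v3) with the `openinference-instrumentation-openai-agents` package: you initialize the Langfuse client once at your service's entry point and instrument the Agents SDK, without touching the agent code itself. Treat the exact package and environment variables as assumptions to verify against the tab for your SDK.

```python
import os

# Langfuse reads its credentials from the environment (placeholder values).
os.environ.setdefault("LANGFUSE_PUBLIC_KEY", "pk-lf-...")
os.environ.setdefault("LANGFUSE_SECRET_KEY", "sk-lf-...")
os.environ.setdefault("LANGFUSE_HOST", "https://cloud.langfuse.com")

from langfuse import get_client
from openinference.instrumentation.openai_agents import OpenAIAgentsInstrumentor

# Initialize the Langfuse client once at service startup.
langfuse = get_client()

# Attach AI-specific spans (LLM calls, tool invocations, token usage)
# to the OpenTelemetry traces that Restate already exports.
OpenAIAgentsInstrumentor().instrument()
```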

What you see in Langfuse

Once you send a request, you can inspect the trace in Langfuse. You see the agentic steps (LLM calls, tool invocations) alongside regular workflow steps (e.g. currency conversion, reimbursement), with inputs, outputs, model configuration, and token usage for each LLM call. Restate manages the execution, starts the parent span, and exports the full journal as OpenTelemetry traces. The Langfuse SDK attaches AI-specific spans and metadata under Restate’s parent span.
(Screenshot: Langfuse trace)
(Screenshot: Langfuse timeline)
Restate’s Tracer Provider flattens the Langfuse spans so they appear consistently structured alongside the Restate spans in the UI. A next iteration of the integration will respect the Langfuse span nesting and place the Restate spans at the correct depth within it.

Going further

Langfuse and Restate complement each other beyond basic tracing:

Versioning

Restate’s versioning model ensures each trace is linked to a single immutable code version. Compare quality across versions in Langfuse and spot regressions.
(Screenshot: Langfuse timeline)

Prompt management

Fetch version-controlled prompts from Langfuse as a durable step with ctx.run. Retries reuse the same prompt; new executions pick up the latest version.
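As a sketch of this pattern (assuming a hypothetical `ClaimAgent` service and a Langfuse prompt named `claim-agent-instructions`; `get_prompt` and `compile` are Langfuse's prompt-management API), the fetch is wrapped in a durable step so the journaled prompt text is reused on retries:

```python
import restate
from langfuse import get_client

langfuse = get_client()
agent_service = restate.Service("ClaimAgent")


@agent_service.handler()
async def process_claim(ctx: restate.Context, claim: dict) -> str:
    # Durable step: the resolved prompt text is recorded in the journal,
    # so retries of this handler reuse the exact same prompt version,
    # while fresh executions pick up the latest version from Langfuse.
    def fetch_prompt() -> str:
        prompt = langfuse.get_prompt("claim-agent-instructions")
        return prompt.compile(claim_type=claim["type"])

    instructions = await ctx.run_typed("Fetch prompt from Langfuse", fetch_prompt)
    # ... pass `instructions` to the agent here ...
    return instructions
```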

Async evals example

Run LLM-as-a-Judge evaluations as async Restate workflows that don’t block agent execution; Restate acts as both the queue and the orchestrator. The agent submits an evaluation via a one-way call, so it runs asynchronously alongside the main agent. The eval workflow runs an LLM judge and writes the score back to the original trace in Langfuse:
evaluation.py

import restate
from agents import Agent  # OpenAI Agents SDK
from langfuse import get_client

# EvaluationScore and EvaluationRequest (Pydantic models) and DurableRunner
# (Restate's durable Agents runner) are defined/imported elsewhere in the example.
langfuse = get_client()

judge_agent = Agent(
    name="ClaimEvaluationJudge",
    instructions=(
        "You are an expert evaluator of insurance claim processing. "
        "Rate the overall quality of the claim agent's response as a score "
        "between 0.0 and 1.0, and provide a brief reason for your rating."
    ),
    output_type=EvaluationScore,
)

evaluation_service = restate.Service("LLMJudgeEvaluation")


@evaluation_service.handler()
async def evaluate(ctx: restate.Context, req: EvaluationRequest) -> None:
    # Step 1: Run the LLM judge (durable — retried on failure)
    result = await DurableRunner.run(
        judge_agent,
        f"Evaluate this insurance claim processing:\n\n"
        f"**Claim Input:**\n{req.input}\n\n"
        f"**Agent Output:**\n{req.output}",
    )
    evaluation: EvaluationScore = result.final_output

    # Step 2: Write the score to Langfuse on the original claim trace
    async def score_trace() -> None:
        langfuse.create_score(
            trace_id=req.trace_id,
            name="quality",
            value=evaluation.score,
            data_type="NUMERIC",
            comment=evaluation.reason,
        )
        langfuse.flush()

    await ctx.run_typed("Score trace in Langfuse", score_trace)
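On the submitting side, a sketch of the one-way call from the claim agent's handler (assuming `claim_text` and `result` are in scope, and that the current Langfuse trace id is read via `get_current_trace_id`; `ctx.service_send` is Restate's fire-and-forget call, so the agent does not wait for the evaluation):

```python
# Inside the claim agent's handler, after producing the final output:
ctx.service_send(
    evaluate,
    EvaluationRequest(
        input=claim_text,
        output=str(result.final_output),
        trace_id=langfuse.get_current_trace_id(),
    ),
)
```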
If you run the OpenAI Agents example above, you will see the evaluation score in Langfuse:
(Screenshot: Langfuse timeline)
For a full walkthrough of these features, see the Restate + Langfuse blog post.