Braintrust Integration
Temporal's integration with Braintrust gives you full observability into your AI agent Workflows—tracing every LLM call, managing prompts without code deploys, and tracking costs across models.
When building AI agents with Temporal, you get durable execution: automatic retries, state persistence, and the ability to recover from failures mid-workflow. Braintrust adds the observability layer: see exactly what your agents are doing, iterate on prompts in a UI, and measure whether changes improve outcomes.
The integration connects these capabilities with minimal code changes. Every Workflow and Activity becomes a span in Braintrust, and every LLM call is traced with inputs, outputs, tokens, and latency.
All code snippets in this guide are taken from the deep research sample. Refer to the sample for the complete code and run it locally.
Prerequisites
- This guide assumes you are already familiar with Braintrust. If you aren't, refer to the Braintrust documentation for more details.
- If you are new to Temporal, we recommend reading Understanding Temporal or taking the Temporal 101 course.
- Ensure you have set up your local development environment by following the Set up your local development environment guide. When you're done, leave the Temporal Development Server running if you want to test your code locally.
Configure Workers to use Braintrust
Workers execute the code that defines your Workflows and Activities.
To trace Workflow and Activity execution in Braintrust, add the BraintrustPlugin to your Worker.
Follow the steps below to configure your Worker.
- Install the Braintrust SDK with Temporal support.

  uv pip install "braintrust[temporal]"

- Initialize the Braintrust logger before creating your Worker. The logger must be initialized first so that spans are properly connected.

  import os

  from braintrust import init_logger

  # Initialize BEFORE creating the Temporal client or worker
  init_logger(project=os.environ.get("BRAINTRUST_PROJECT", "my-project"))

- Add the BraintrustPlugin to your Worker.

  from braintrust.contrib.temporal import BraintrustPlugin
  from temporalio.worker import Worker

  worker = Worker(
      client,
      task_queue="my-task-queue",
      workflows=[MyWorkflow],
      activities=[my_activity],
      plugins=[BraintrustPlugin()],  # Add this line
  )

- Add the plugin to your Temporal Client as well. This enables span context propagation, linking client code to the Workflows it starts.

  from temporalio.client import Client
  from braintrust.contrib.temporal import BraintrustPlugin

  client = await Client.connect(
      "localhost:7233",
      plugins=[BraintrustPlugin()],
  )

- Run the Worker. Ensure the Worker process has access to your Braintrust API key via the BRAINTRUST_API_KEY environment variable.

  export BRAINTRUST_API_KEY="your-api-key"
  python worker.py

Tip: You only need to provide API credentials to the Worker process. The client application that starts Workflow Executions doesn't need the Braintrust API key.
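Putting the steps together, a Worker entry point might look like the following sketch. MyWorkflow, my_activity, the my_app module, and the task queue name are placeholders standing in for your own definitions, not code from the sample.

import asyncio
import os

from braintrust import init_logger
from braintrust.contrib.temporal import BraintrustPlugin
from temporalio.client import Client
from temporalio.worker import Worker

from my_app import MyWorkflow, my_activity  # hypothetical module with your definitions

async def main() -> None:
    # Initialize the logger first so Workflow and Activity spans
    # attach to the right Braintrust project
    init_logger(project=os.environ.get("BRAINTRUST_PROJECT", "my-project"))

    client = await Client.connect(
        "localhost:7233",
        plugins=[BraintrustPlugin()],
    )

    worker = Worker(
        client,
        task_queue="my-task-queue",
        workflows=[MyWorkflow],
        activities=[my_activity],
        plugins=[BraintrustPlugin()],
    )
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())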
Trace LLM calls with wrap_openai
The simplest way to trace LLM calls is to wrap your OpenAI client. Every call through the wrapped client automatically creates a span in Braintrust with inputs, outputs, token counts, and latency.
from braintrust import wrap_openai
from openai import AsyncOpenAI
# Wrap the client - all calls are now traced
# max_retries=0 because Temporal handles retries
client = wrap_openai(AsyncOpenAI(max_retries=0))
Use this client in your Activities:
from braintrust import wrap_openai
from openai import AsyncOpenAI
from temporalio import activity
@activity.defn
async def invoke_model(prompt: str) -> str:
client = wrap_openai(AsyncOpenAI(max_retries=0))
response = await client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt},
],
)
return response.choices[0].message.content
After running a Workflow, you'll see a trace hierarchy in Braintrust:
my-workflow-request (client span)
└── temporal.workflow.MyWorkflow
└── temporal.activity.invoke_model
└── Chat Completion (gpt-4o)
Add custom spans for application context
Add your own spans to capture business-level context like user queries, workflow inputs, and final outputs.
import uuid

from braintrust import start_span
async def run_research(query: str):
with start_span(name="research-request", type="task") as span:
span.log(input={"query": query})
result = await client.execute_workflow(
ResearchWorkflow.run,
query,
id=f"research-{uuid.uuid4()}",
task_queue="research-task-queue",
)
span.log(output={"result": result})
return result
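A sketch of the surrounding entry point, assuming the pieces above: the Temporal client is created with the BraintrustPlugin so the research-request span links to the Workflow Execution it starts. The module-level client variable is a simplification for illustration; run_research above closes over it.

import asyncio
import os

from braintrust import init_logger
from braintrust.contrib.temporal import BraintrustPlugin
from temporalio.client import Client

client: Client  # set in main(); run_research closes over it

async def main() -> None:
    global client
    # Logger first, so the custom span attaches to the right project
    init_logger(project=os.environ.get("BRAINTRUST_PROJECT", "my-project"))
    client = await Client.connect("localhost:7233", plugins=[BraintrustPlugin()])

    result = await run_research("What are the latest advances in quantum computing?")
    print(result)

if __name__ == "__main__":
    asyncio.run(main())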
Manage prompts with load_prompt
Braintrust lets you manage prompts in a UI and ship changes without redeploying code. The typical iteration loop is:
- Develop prompts in code, see results in Braintrust traces
- Create a prompt in the Braintrust UI from your best version
- Evaluate different versions using Braintrust's eval tools
- Deploy by pointing your code at the Braintrust prompt
- Iterate in the UI; changes go live without a code deploy
To load a prompt from Braintrust in your Activity:
import os

import braintrust
from braintrust import wrap_openai
from openai import AsyncOpenAI
from temporalio import activity
@activity.defn
async def invoke_model(prompt_slug: str, user_input: str) -> str:
# Load prompt from Braintrust
prompt = braintrust.load_prompt(
project=os.environ.get("BRAINTRUST_PROJECT", "my-project"),
slug=prompt_slug,
)
# Build returns the full prompt configuration
built = prompt.build()
# Extract system message
system_content = None
for msg in built.get("messages", []):
if msg.get("role") == "system":
system_content = msg["content"]
break
client = wrap_openai(AsyncOpenAI(max_retries=0))
response = await client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system_content},
{"role": "user", "content": user_input},
],
)
return response.choices[0].message.content
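If the prompt stored in Braintrust defines template variables, build() also accepts them as keyword arguments, and the built configuration can be passed straight to the completion call. A sketch, assuming a prompt with a {{user_input}} variable and a model configured in the Braintrust UI:

# Render the prompt's template variables; the built dict carries the
# model and messages configured in Braintrust
built = prompt.build(user_input=user_input)
response = await client.chat.completions.create(**built)
return response.choices[0].message.content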
Provide a fallback prompt in your code for resilience. If Braintrust is unavailable, your Workflow continues with the hardcoded prompt.
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."
try:
prompt = braintrust.load_prompt(project="my-project", slug="my-prompt")
system_content = extract_system_message(prompt.build())
except Exception as e:
activity.logger.warning(f"Failed to load prompt: {e}. Using fallback.")
system_content = DEFAULT_SYSTEM_PROMPT
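The extract_system_message helper isn't shown in this snippet; a minimal version, mirroring the extraction loop from the Activity above, might look like this:

def extract_system_message(built: dict) -> str:
    # Return the first system message in the built prompt,
    # or the hardcoded default if none is present
    for msg in built.get("messages", []):
        if msg.get("role") == "system":
            return msg["content"]
    return DEFAULT_SYSTEM_PROMPT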
Example: Deep Research Agent
The deep research sample demonstrates a complete AI agent that:
- Plans research strategies
- Generates search queries
- Executes web searches in parallel
- Synthesizes findings into comprehensive reports
The sample shows all integration patterns: wrapped OpenAI client, BraintrustPlugin on Worker and Client, custom spans, and prompt management with load_prompt().
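For a sense of how the fan-out step maps to Temporal primitives, the Workflow can start one search Activity per query and gather the results with asyncio.gather, which Temporal's Workflow event loop runs deterministically. This is a sketch, not the sample's exact code; generate_search_queries, search_web, and synthesize_report are stand-in Activity names.

import asyncio
from datetime import timedelta

from temporalio import activity, workflow

# Stand-in Activity signatures; the sample's real Activities call the LLM and search APIs
@activity.defn
async def generate_search_queries(topic: str) -> list[str]: ...

@activity.defn
async def search_web(query: str) -> str: ...

@activity.defn
async def synthesize_report(topic: str, findings: list[str]) -> str: ...

@workflow.defn
class ResearchWorkflow:
    @workflow.run
    async def run(self, topic: str) -> str:
        queries = await workflow.execute_activity(
            generate_search_queries,
            topic,
            start_to_close_timeout=timedelta(minutes=1),
        )
        # Fan out: one search Activity per query, each traced as its own span
        findings = await asyncio.gather(*(
            workflow.execute_activity(
                search_web,
                q,
                start_to_close_timeout=timedelta(minutes=2),
            )
            for q in queries
        ))
        # Synthesize the findings into the final report
        return await workflow.execute_activity(
            synthesize_report,
            args=[topic, list(findings)],
            start_to_close_timeout=timedelta(minutes=5),
        )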
To run the sample:
# Terminal 1: Start Temporal
temporal server start-dev
# Terminal 2: Start the worker
export BRAINTRUST_API_KEY="your-braintrust-api-key"
export OPENAI_API_KEY="your-openai-api-key"
export BRAINTRUST_PROJECT="deep-research"
uv run python -m worker
# Terminal 3: Run a research query
uv run python -m start_workflow "What are the latest advances in quantum computing?"