Your Business — Changelog · What we shipped, week by week

v2.4.0

Apr 28, 2026

Major

Streaming traces are live by default.

Until today, traces showed up in your dashboard about 3 seconds after the trace closed. As of this release, every span streams to the dashboard within 800ms of its start, so you can watch a long-running agent run as it happens instead of waiting for it to finish.

What's new

Live trace tail in dashboard with sub-second latency, default-on for all Pro and Enterprise tiers
New --follow flag in CLI that streams traces in real time, including streaming LLM token chunks
Live waterfall visualization that updates as nested spans complete
Backwards-compatible — no SDK upgrade required for the streaming UI to work

# new in v2.4: follow a trace as it runs
$ yourbusiness trace --follow agent-7c4e
→ streaming live trace from production…

Migration notes

Existing traces continue to work as-is. Streaming is on by default; you can opt out in your project settings if you'd rather keep the old batched UI for now. We'll remove that toggle in v3.0 (Q4 2026).

Shipped by Priya, Yusuf, and the Streaming Squad · 21 PRs · 4,820 lines

v2.3.6

Apr 21, 2026

Feature

Per-customer eval slicing

Eval results were already filterable by feature and model — now you can slice by any tag you've set, including customer_id. The most common use case: catching when a single high-value customer's responses regress while everyone else's stay stable.

What's new

New slice parameter on eval definitions, accepts any tag key
Alerts can fire per-slice — get paged only when a specific customer's evals drop
Dashboard heatmap of eval scores by tag value, sortable and exportable to CSV
Backfill mode: run a new eval definition over the last 30 days of traces in one click
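As an illustration of what slicing by a tag means, the sketch below groups eval scores by the value of one tag key and reports the slices whose mean dropped against a baseline. The `EvalResult` shape and both function names are assumptions made for this example, not the eval definition API.

```typescript
// Hypothetical record shape; `tags` stands in for whatever tags you set on traces.
interface EvalResult {
  score: number; // assumed to be in the range 0..1
  tags: Record<string, string>;
}

// Mean score per value of the chosen tag key (the "slice").
function meanScoreBySlice(results: EvalResult[], sliceKey: string): Record<string, number> {
  const sums: Record<string, { total: number; count: number }> = {};
  for (const r of results) {
    const value = r.tags[sliceKey];
    if (value === undefined) continue; // untagged results belong to no slice
    if (!sums[value]) sums[value] = { total: 0, count: 0 };
    sums[value].total += r.score;
    sums[value].count += 1;
  }
  const means: Record<string, number> = {};
  for (const value of Object.keys(sums)) {
    means[value] = sums[value].total / sums[value].count;
  }
  return means;
}

// Slices whose mean fell by more than `maxDrop` relative to the baseline.
function regressedSlices(
  baseline: Record<string, number>,
  current: Record<string, number>,
  maxDrop: number
): string[] {
  return Object.keys(current).filter((k) => k in baseline && baseline[k] - current[k] > maxDrop);
}

// Example: one customer regresses while the other stays stable.
const lastWeek: EvalResult[] = [
  { score: 1, tags: { customer_id: "acme" } },
  { score: 1, tags: { customer_id: "globex" } },
];
const thisWeek: EvalResult[] = [
  { score: 0.5, tags: { customer_id: "acme" } },
  { score: 1, tags: { customer_id: "globex" } },
];
const regressed = regressedSlices(
  meanScoreBySlice(lastWeek, "customer_id"),
  meanScoreBySlice(thisWeek, "customer_id"),
  0.1
);
```

Here only the `acme` slice is flagged, even though the project-wide average moved much less, which is the case per-slice alerts are meant to catch.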

Shipped by Yvonne, Tomás · 8 PRs

v2.3.5

Apr 14, 2026

Feature

LangGraph adapter, GA

The LangGraph adapter has been in beta since January. After 4,200 customer installs and a couple thousand bug reports, it's stable enough for GA. It traces per-node spans, edge traversals, and full state snapshots at each step.

What's new

Native auto-instrumentation — drop in withTracing(graph) and you're done
State snapshots at each node, optionally redacted by key pattern
Edge traversal events traced as their own span type
Compatible with LangGraph 0.0.40+ and LangChain 0.2+
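For a sense of what redacting a state snapshot by key pattern can look like, here is a minimal sketch. `redactState` is a hypothetical helper written for this example; it is not the adapter's implementation, and the adapter's actual configuration may differ.

```typescript
type State = { [key: string]: unknown };

// Returns a copy of `state` in which any value whose key matches a pattern is replaced.
// Patterns should not use the `g` flag, since RegExp.test is stateful with it.
function redactState(state: State, patterns: RegExp[]): State {
  const out: State = {};
  for (const key of Object.keys(state)) {
    const value = state[key];
    if (patterns.some((p) => p.test(key))) {
      out[key] = "[REDACTED]";
    } else if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      out[key] = redactState(value as State, patterns); // recurse into nested objects
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Example snapshot with one sensitive top-level key and one nested secret.
const snapshot = {
  user_email: "a@example.com",
  step: 3,
  auth: { api_key: "sk-123", scope: "read" },
};
const redacted = redactState(snapshot, [/email/i, /_key$/]);
```

The sketch copies rather than mutates, so the graph's own state is untouched and only the recorded snapshot loses the matching values.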

Shipped by Aiden, the Adapters team · 14 PRs

v2.3.4

Apr 7, 2026

Fix

Replay sandbox handles tool-call concurrency

Reported by three customers in the last month: when a trace had multiple parallel tool calls within the same span, the replay sandbox would serialize them, sometimes producing different outputs from the original trace. We rewrote the sandbox executor to honor the original concurrency, including async ordering.

What's new

Replay now executes tool calls in the original concurrency model — Promise.all groups behave correctly
New --strict flag reports any divergence from the recorded trace and bails with a useful error
Improved error messages when sandbox restoration fails: they now point at the missing tool stub by name
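The kind of check a strict replay implies can be sketched as a comparison between the recorded tool calls and the replayed ones, including which calls ran concurrently. The `ToolCall` shape, the `group` field, and `firstDivergence` are assumptions for illustration and do not describe the sandbox executor's internals.

```typescript
// Hypothetical shape for a recorded or replayed tool call.
interface ToolCall {
  tool: string;
  group: number; // calls sharing a group ran concurrently, e.g. inside one Promise.all
  output: string;
}

// Returns a description of the first divergence, or null when the replay matches.
function firstDivergence(recorded: ToolCall[], replayed: ToolCall[]): string | null {
  if (recorded.length !== replayed.length) {
    return `expected ${recorded.length} tool calls, got ${replayed.length}`;
  }
  for (let i = 0; i < recorded.length; i++) {
    const want = recorded[i];
    const got = replayed[i];
    if (want.tool !== got.tool) {
      return `call ${i}: expected tool "${want.tool}", got "${got.tool}"`;
    }
    if (want.group !== got.group) {
      return `call ${i} (${want.tool}): expected concurrency group ${want.group}, got ${got.group}`;
    }
    if (want.output !== got.output) {
      return `call ${i} (${want.tool}): output differs from the recording`;
    }
  }
  return null;
}

// Example: the recording ran both calls concurrently; the replay serialized them.
const recorded: ToolCall[] = [
  { tool: "fetch_prices", group: 0, output: "ok" },
  { tool: "fetch_stock", group: 0, output: "ok" },
];
const serializedReplay: ToolCall[] = [
  { tool: "fetch_prices", group: 0, output: "ok" },
  { tool: "fetch_stock", group: 1, output: "ok" },
];
const divergence = firstDivergence(recorded, serializedReplay);
```

In this example the outputs happen to match, but the check still reports the second call because it ran in a different concurrency group than it did in the recording.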

Shipped by Devesh · 3 PRs

v2.3.3

Mar 31, 2026

Feature

Budget alerts at the slice level

You can now set spend budgets per slice (per customer, per feature, per model) and alert when any one slice approaches its threshold. Previously, you could only alert on total project spend.

Budget targets supported per tag value
50/80/100% threshold alerts with separate channels per stage
Spend snapshot in the alert payload — see exactly which span family is driving the bill
Optional auto-throttle: cap ingest from a single slice when it exceeds 100% of its budget
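The staged thresholds can be read as follows: an alert fires the first time a slice's spend crosses 50%, 80%, or 100% of its budget, and auto-throttle applies once spend exceeds 100%. Below is a minimal sketch of that logic; `crossedStages` and `shouldThrottle` are hypothetical names, and the product's exact comparison rules are an assumption.

```typescript
// Alert stages as percentages of the slice budget.
const STAGES = [50, 80, 100] as const;

// Returns the stages newly crossed when spend moves from previousSpend to currentSpend.
function crossedStages(budget: number, previousSpend: number, currentSpend: number): number[] {
  return STAGES.filter((pct) => {
    const threshold = (budget * pct) / 100;
    return previousSpend < threshold && currentSpend >= threshold;
  });
}

// Auto-throttle applies only once the slice has exceeded 100% of its budget.
function shouldThrottle(budget: number, currentSpend: number): boolean {
  return currentSpend > budget;
}

// Example: a $200 budget where spend jumps from $90 to $170 between checks.
const newlyCrossed = crossedStages(200, 90, 170);
const throttled = shouldThrottle(200, 170);
```

With these numbers the 50% and 80% stages both fire in one check, each of which could route to its own channel, and the slice is not yet throttled.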

Shipped by Yvonne, Tomás · 6 PRs

v2.3.2

Mar 24, 2026

Fix

Anthropic prompt-caching tokens captured correctly

When Claude returned cached prompt tokens, we were attributing the cost to the input bucket instead of the (much cheaper) cached bucket. The spend shown in the dashboard was overstated by 3–4% for prompt-caching-heavy workloads. We've fixed the attribution and backfilled all traces from the last 90 days.

Cached input tokens now tracked as their own span attribute (cached_input_tokens)
Provider price list updated to reflect the discounted cached rate
Historical traces re-priced — your March bill might tick down a hair
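To show why the misattribution overstated spend, the sketch below prices the same usage two ways: once with cached tokens in their own bucket, and once with them lumped into regular input. The prices are made-up placeholders, not any provider's real price list, and apart from `cached_input_tokens` the field names are assumptions.

```typescript
// Illustrative prices only (USD per million tokens); NOT a real price list.
const PRICE_PER_MTOK = { input: 3.0, cachedInput: 0.3, output: 15.0 };

interface Usage {
  input_tokens: number; // uncached input tokens (assumed field name)
  cached_input_tokens: number; // the span attribute introduced by this fix
  output_tokens: number; // assumed field name
}

// Cost with cached tokens billed at the discounted cached rate.
function costUsd(u: Usage): number {
  return (
    (u.input_tokens / 1e6) * PRICE_PER_MTOK.input +
    (u.cached_input_tokens / 1e6) * PRICE_PER_MTOK.cachedInput +
    (u.output_tokens / 1e6) * PRICE_PER_MTOK.output
  );
}

// The old behavior: cached tokens billed as if they were regular input.
function costUsdBeforeFix(u: Usage): number {
  return (
    ((u.input_tokens + u.cached_input_tokens) / 1e6) * PRICE_PER_MTOK.input +
    (u.output_tokens / 1e6) * PRICE_PER_MTOK.output
  );
}

const usage: Usage = { input_tokens: 1000000, cached_input_tokens: 50000, output_tokens: 200000 };
const corrected = costUsd(usage);
const overstated = costUsdBeforeFix(usage);
```

The gap between the two figures grows with the share of cached tokens, which is why prompt-caching-heavy workloads saw the largest correction.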

Shipped by Aiden · 1 PR · with extreme apologies

v2.3.0

Mar 17, 2026

Major

Datasets, graduated.

Datasets came out of beta this week. Curate eval datasets from real production traces with one click, version them, and run regression suites in CI. Three of our largest customers are running >12,000 evals per release in CI as of today, all sourced from the production-trace dataset feature.

What's new

Dataset CLI: yourbusiness dataset add <trace_id>, freeze, diff, run
Frozen versions are immutable; you can roll back evals to a known-good dataset state
GitHub Actions integration: a dataset run on every PR, with eval results as a check status
Deduplication on add — refuses to add a trace that's already part of the dataset
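The dataset semantics described here (deduplication on add, immutable frozen versions, and diffs between versions) can be modeled in a few lines. This is a sketch under assumptions: the `Dataset` class and its method signatures are hypothetical and only mirror the CLI verbs, not the product's storage model.

```typescript
class Dataset {
  private traceIds: string[] = [];
  private frozenVersions: ReadonlyArray<string>[] = [];

  // Adds a trace; returns false and does nothing if it is already in the dataset.
  add(traceId: string): boolean {
    if (this.traceIds.indexOf(traceId) !== -1) return false;
    this.traceIds.push(traceId);
    return true;
  }

  // Freezes a copy of the current contents; returns the 1-based version number.
  freeze(): number {
    this.frozenVersions.push(Object.freeze(this.traceIds.slice()));
    return this.frozenVersions.length;
  }

  // Looks up a frozen version, failing loudly if it does not exist.
  version(n: number): ReadonlyArray<string> {
    const v = this.frozenVersions[n - 1];
    if (!v) throw new Error(`no frozen version ${n}`);
    return v;
  }

  // Trace IDs added or removed between two frozen versions.
  diff(from: number, to: number): { added: string[]; removed: string[] } {
    const a = this.version(from);
    const b = this.version(to);
    return {
      added: b.filter((id) => a.indexOf(id) === -1),
      removed: a.filter((id) => b.indexOf(id) === -1),
    };
  }
}

// Example: a duplicate add is refused, and two frozen versions are diffed.
const ds = new Dataset();
const firstAdd = ds.add("trace-1");
const duplicateAdd = ds.add("trace-1");
const v1 = ds.freeze();
ds.add("trace-2");
const v2 = ds.freeze();
const changes = ds.diff(v1, v2);
```

Because each frozen version is a separate frozen copy, later adds never alter an earlier version, which is what makes rolling back to a known-good state possible.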

Shipped by the Eval Platform team · 28 PRs · 6 weeks of work
