// changelog · weekly ships

What we shipped, week by week.

We deploy on Tuesdays. Big releases get a write-up. Small ones get a bullet. Every entry below is a real change with a real engineer behind it.

RSS: /changelog.rss · JSON: /changelog.json · Email: subscribe by email · X: @yourbusiness
v2.4.0
Apr 28, 2026
Major

Streaming traces are live by default.

Until today, traces showed up in your dashboard about 3 seconds after the trace closed. As of this release, every span streams to the dashboard within 800 ms of its start, so you can watch a long-running agent run as it happens instead of waiting for it to finish.

Live trace streaming

What's new

  • Live trace tail in dashboard with sub-second latency, default-on for all Pro and Enterprise tiers
  • New --follow flag in the CLI that streams traces in real time, including LLM token chunks
  • Live waterfall visualization that updates as nested spans complete
  • Backwards-compatible — no SDK upgrade required for the streaming UI to work
// new in v2.4 — follow a trace as it runs
$ yourbusiness trace --follow agent-7c4e
→ streaming live trace from production…

Migration notes

Existing traces continue to work as-is. You can opt out of streaming in your project settings if you'd rather keep the old batched UI for now. We'll remove that toggle in v3.0 (Q4 2026).

Shipped by Priya, Yusuf, and the Streaming Squad · 21 PRs · 4,820 lines
v2.3.6
Apr 21, 2026
Feature

Per-customer eval slicing

Eval results were already filterable by feature and model — now you can slice by any tag you've set, including customer_id. The most common use case: catching when a single high-value customer's responses regress while everyone else's stay stable.

What's new

  • New slice parameter on eval definitions, accepts any tag key
  • Alerts can fire per-slice — get paged only when a specific customer's evals drop
  • Dashboard heatmap of eval scores by tag value, sortable and exportable to CSV
  • Backfill mode: run a new eval definition over the last 30 days of traces in one click
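
To make the slicing idea concrete, here is a minimal sketch of grouping eval scores by one tag key and spotting a slice that regressed. The EvalResult shape, the function names, and the 0.1 drop threshold are assumptions for illustration, not the product's schema or SDK API.

```typescript
// Hypothetical shape: the real eval result schema is not shown in this changelog.
interface EvalResult {
  tags: { [key: string]: string };
  score: number; // assumed to be in the range 0..1
}

type SliceMeans = { [tagValue: string]: number };

// Group results by the value of one tag key and return the mean score per slice.
function meanScoreBySlice(results: EvalResult[], sliceKey: string): SliceMeans {
  const totals: { [tagValue: string]: { sum: number; count: number } } = {};
  for (const result of results) {
    const tagValue = result.tags[sliceKey];
    if (tagValue === undefined) continue; // results without the tag belong to no slice
    const entry = totals[tagValue] || { sum: 0, count: 0 };
    entry.sum += result.score;
    entry.count += 1;
    totals[tagValue] = entry;
  }
  const means: SliceMeans = {};
  for (const tagValue of Object.keys(totals)) {
    means[tagValue] = totals[tagValue].sum / totals[tagValue].count;
  }
  return means;
}

// Return the slices whose mean dropped by more than maxDrop against a baseline.
function regressedSlices(baseline: SliceMeans, current: SliceMeans, maxDrop: number): string[] {
  return Object.keys(baseline).filter((tagValue) => {
    const now = current[tagValue];
    return now !== undefined && baseline[tagValue] - now > maxDrop;
  });
}

const lastWeek: EvalResult[] = [
  { tags: { customer_id: "acme" }, score: 0.9 },
  { tags: { customer_id: "acme" }, score: 0.9 },
  { tags: { customer_id: "globex" }, score: 0.8 },
];
const thisWeek: EvalResult[] = [
  { tags: { customer_id: "acme" }, score: 0.5 },
  { tags: { customer_id: "acme" }, score: 0.7 },
  { tags: { customer_id: "globex" }, score: 0.8 },
];

const regressed = regressedSlices(
  meanScoreBySlice(lastWeek, "customer_id"),
  meanScoreBySlice(thisWeek, "customer_id"),
  0.1,
);
console.log(regressed); // only "acme" regressed; "globex" is unchanged
```

A project-wide average over the same data would hide most of the drop, which is why a per-slice alert catches it earlier.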
Shipped by Yvonne, Tomás · 8 PRs
v2.3.5
Apr 14, 2026
Feature

LangGraph adapter, GA

The LangGraph adapter has been in beta since January. After 4,200 customer installs and a couple thousand bug reports, it's stable enough to GA. It ships with per-node spans, edge traversal traces, and full state snapshots at each step.

What's new

  • Native auto-instrumentation — drop in withTracing(graph) and you're done
  • State snapshots at each node, optionally redacted by key pattern
  • Edge traversal events traced as their own span type
  • Compatible with LangGraph 0.0.40+ and LangChain 0.2+
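
Redaction by key pattern can be pictured as a recursive walk over the state object. The sketch below is one possible way to do it, not the adapter's implementation: redactSnapshot, the Snapshot type, and the "[REDACTED]" marker are hypothetical names, and how withTracing(graph) actually accepts patterns is not covered here.

```typescript
// Hypothetical snapshot type: any JSON-like object keyed by string.
type Snapshot = { [key: string]: unknown };

const REDACTED = "[REDACTED]";

// Return a copy of `state` with every key that matches a pattern masked.
// Nested plain objects are walked recursively; arrays and primitives pass through unchanged.
function redactSnapshot(state: Snapshot, patterns: RegExp[]): Snapshot {
  const out: Snapshot = {};
  for (const key of Object.keys(state)) {
    const value = state[key];
    if (patterns.some((pattern) => pattern.test(key))) {
      out[key] = REDACTED;
    } else if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      out[key] = redactSnapshot(value as Snapshot, patterns);
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Avoid the `g` flag on these patterns: RegExp.test is stateful for global regexes.
const redactionPatterns = [/api_key/i, /^password$/i];

const snapshot: Snapshot = {
  step: "plan",
  user: { displayName: "Ada", api_key: "sk-123" },
  messages: ["hello"],
};

const redacted = redactSnapshot(snapshot, redactionPatterns);
console.log(JSON.stringify(redacted));
```

The original snapshot is left untouched, so the running graph keeps its real state and only the recorded copy is masked.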
Shipped by Aiden, the Adapters team · 14 PRs
v2.3.4
Apr 7, 2026
Fix

Replay sandbox handles tool-call concurrency

Reported by three customers in the last month: when a trace had multiple parallel tool calls within the same span, the replay sandbox would serialize them, sometimes producing different outputs from the original trace. We rewrote the sandbox executor to honor the original concurrency, including async ordering.

What's new

  • Replay now executes tool calls in the original concurrency model — Promise.all groups behave correctly
  • New --strict flag reports any divergence from the recorded trace and exits with a descriptive error
  • Improved error messages when sandbox restoration fails: they now point at the missing tool stub by name
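
Why serializing parallel tool calls can change the output: a toy model, assuming two tool calls that share sandbox state. The incrementTool function, the Sandbox type, and the diverges check are hypothetical stand-ins, not the replay executor's code.

```typescript
interface Sandbox {
  counter: number;
}

// Hypothetical tool: reads shared state, yields once (as real I/O would), then writes.
async function incrementTool(sandbox: Sandbox): Promise<number> {
  const seen = sandbox.counter;
  await Promise.resolve(); // yield point standing in for network or disk I/O
  sandbox.counter = seen + 1;
  return sandbox.counter;
}

// Old behavior: run the calls one after another.
async function replaySerialized(): Promise<number[]> {
  const sandbox: Sandbox = { counter: 0 };
  const first = await incrementTool(sandbox);
  const second = await incrementTool(sandbox);
  return [first, second];
}

// Behavior that honors the original concurrency: start both calls before awaiting either.
async function replayConcurrent(): Promise<number[]> {
  const sandbox: Sandbox = { counter: 0 };
  return Promise.all([incrementTool(sandbox), incrementTool(sandbox)]);
}

// A strict-mode style check: any difference from the recorded outputs counts as divergence.
function diverges(recorded: number[], replayed: number[]): boolean {
  return JSON.stringify(recorded) !== JSON.stringify(replayed);
}

async function main(): Promise<void> {
  const recorded = [1, 1]; // what the original concurrent run produced
  console.log(await replaySerialized(), diverges(recorded, await replaySerialized())); // [1, 2] true
  console.log(await replayConcurrent(), diverges(recorded, await replayConcurrent())); // [1, 1] false
}

main();
```

In the concurrent run both calls read the counter before either writes, so each returns 1. The serialized run lets the second call see the first call's write, which is the kind of divergence the fix removes.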
Shipped by Devesh · 3 PRs
v2.3.3
Mar 31, 2026
Feature

Budget alerts at the slice level

You can now set spend budgets per slice (per customer, per feature, or per model) and alert when any one slice approaches its threshold. Previously, you could only alert on total project spend.

  • Budget targets supported per tag value
  • 50/80/100% threshold alerts with separate channels per stage
  • Spend snapshot in the alert payload — see exactly which span family is driving the bill
  • Optional auto-throttle: cap ingest from a single slice when it exceeds 100% of its budget
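
A small sketch of how staged thresholds could be evaluated for one slice. The function names and the "fire once when a stage is first crossed" rule are assumptions; the changelog only states the 50/80/100% stages and the over-100% throttle.

```typescript
type Stage = 50 | 80 | 100;

const STAGES: Stage[] = [50, 80, 100];

// Stages that were crossed between the previous and the current spend reading.
// Comparing against the previous reading means each stage alerts once, not on every check.
function crossedStages(previousSpend: number, currentSpend: number, budget: number): Stage[] {
  if (budget <= 0) throw new Error("budget must be positive");
  return STAGES.filter((stage) => {
    const threshold = (budget * stage) / 100;
    return previousSpend < threshold && currentSpend >= threshold;
  });
}

// Auto-throttle applies only once the slice exceeds 100% of its budget.
function shouldThrottle(currentSpend: number, budget: number): boolean {
  return currentSpend > budget;
}

console.log(crossedStages(40, 85, 100)); // [50, 80]
console.log(crossedStages(85, 90, 100)); // []
console.log(shouldThrottle(101, 100)); // true
```

Because each stage is reported separately, routing 50% to a low-priority channel and 100% to a pager is a matter of mapping stage to channel.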
Shipped by Yvonne, Tomás · 6 PRs
v2.3.2
Mar 24, 2026
Fix

Anthropic prompt-caching tokens captured correctly

When Claude returned cached prompt tokens, we were attributing the cost to the input bucket instead of the (much cheaper) cached bucket. As a result, the spend shown in the dashboard was overstated by 3–4% for prompt-caching-heavy workloads. We fixed the attribution and backfilled all traces from the last 90 days.

  • Cached input tokens now tracked as their own span attribute (cached_input_tokens)
  • Provider price list updated to reflect the discounted cached rate
  • Historical traces re-priced — your March bill might tick down a hair
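
The arithmetic behind the fix, as a sketch. The prices below are placeholder numbers chosen to keep the math exact, not a real provider price list, and the assumption that input_tokens excludes cached tokens is ours. Only the cached_input_tokens attribute name comes from the notes above.

```typescript
// Token counts for one LLM call. Assumption: input_tokens excludes cached tokens.
interface Usage {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}

// Prices in USD per million tokens (placeholder values for illustration only).
interface PriceList {
  input: number;
  cachedInput: number;
  output: number;
}

const PER_MILLION = 1000000;

// Old attribution: cached tokens were billed at the full input rate.
function costBeforeFix(usage: Usage, prices: PriceList): number {
  const inputCost = (usage.input_tokens + usage.cached_input_tokens) * prices.input;
  return (inputCost + usage.output_tokens * prices.output) / PER_MILLION;
}

// Corrected attribution: cached tokens get their own, cheaper rate.
function costAfterFix(usage: Usage, prices: PriceList): number {
  const inputCost = usage.input_tokens * prices.input;
  const cachedCost = usage.cached_input_tokens * prices.cachedInput;
  return (inputCost + cachedCost + usage.output_tokens * prices.output) / PER_MILLION;
}

const placeholderPrices: PriceList = { input: 4, cachedInput: 1, output: 10 };
const cacheHeavyCall: Usage = {
  input_tokens: 500000,
  cached_input_tokens: 1500000,
  output_tokens: 100000,
};

console.log(costBeforeFix(cacheHeavyCall, placeholderPrices)); // 9
console.log(costAfterFix(cacheHeavyCall, placeholderPrices)); // 4.5
```

The gap in this example is far larger than the 3–4% reported above because the sample call is almost entirely cached and the prices are made up. Real workloads mix cached and uncached calls.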
Shipped by Aiden · 1 PR · with extreme apologies
v2.3.0
Mar 17, 2026
Major

Datasets, graduated.

Datasets came out of beta this week. Curate eval datasets from real production traces with one click, version them, and run regression suites in CI. Three of our largest customers are running more than 12,000 evals per release in CI as of today, all sourced from the production-trace dataset feature.

What's new

  • Dataset CLI: yourbusiness dataset add <trace_id>, freeze, diff, run
  • Frozen versions are immutable; you can roll back evals to a known-good dataset state
  • GitHub Actions integration: a dataset run on every PR, with eval results as a check status
  • Deduplication on add — refuses to add a trace that's already part of the dataset
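
The add, freeze, and diff semantics can be modeled in a few lines. This is a hypothetical in-memory model for illustration; the Dataset class and its method names are assumptions, and how the real CLI stores datasets is not described here.

```typescript
// Hypothetical in-memory model of a versioned dataset of trace ids.
class Dataset {
  private traceIds: string[] = [];
  private frozen: ReadonlyArray<string>[] = [];

  // Returns false and leaves the dataset untouched when the trace is already present.
  add(traceId: string): boolean {
    if (this.traceIds.indexOf(traceId) !== -1) return false;
    this.traceIds.push(traceId);
    return true;
  }

  // Snapshot the current contents as an immutable version; returns its 1-based number.
  freeze(): number {
    this.frozen.push(Object.freeze(this.traceIds.slice()));
    return this.frozen.length;
  }

  // Read a frozen version; later add() calls never change what this returns.
  version(n: number): ReadonlyArray<string> {
    const snapshot = this.frozen[n - 1];
    if (snapshot === undefined) throw new Error(`no frozen version ${n}`);
    return snapshot;
  }

  // Trace ids present in version b but not in version a.
  addedBetween(a: number, b: number): string[] {
    const before = this.version(a);
    return this.version(b).filter((id) => before.indexOf(id) === -1);
  }
}

const dataset = new Dataset();
dataset.add("trace-a");
const duplicateAccepted = dataset.add("trace-a"); // false: already in the dataset
const v1 = dataset.freeze();
dataset.add("trace-b");
const v2 = dataset.freeze();

console.log(duplicateAccepted, dataset.version(v1), dataset.addedBetween(v1, v2));
```

Because frozen versions are copies, rolling an eval back to v1 means reading the v1 snapshot again. Nothing added afterwards can leak into it.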
Shipped by the Eval Platform team · 28 PRs · 6 weeks of work

Older releases

The full archive goes back to the v0.1 alpha in October 2023. We don't delete entries — even the embarrassing ones.

Browse the archive →

Try the new features on real traffic.

Start free, 50,000 spans a month, no credit card. Streaming traces are on by default — you'll see your first one in the dashboard a minute after install.

Start free →

$ npm i @yourbusiness/sdk