
Status bar — how to read it

Audience: operators running CAPA on the rig, and contributors triaging "why does my run look weird?"

Scope: what each pill in the bottom status bar means, what counts as healthy or unhealthy, and what to do when a pill is not healthy.

The status bar polls live metrics from the conductor and UI ring buffers at 1 Hz. Every pill is designed to mean something right now — counters that grow by-design or percentiles over stale windows have been deliberately removed. See statusbar.py for the implementation and runtime-architecture.md for the runtime concepts each pill exposes.


Pill order, left to right

state · elapsed · UI overflow · sat · loop · q · safety queue · disk · cam · op · bundle

| Pill | What it shows | Good | Warn | Fail |
|---|---|---|---|---|
| state | Run lifecycle | RUNNING (green) / IDLE (gray) / SEALED (green) | PREPARING / DRAINING / FINALIZING | FAILED |
| elapsed | Run wall time | n/a | n/a | n/a |
| UI overflow | UI ring buffer rollovers: oldest sample evicted (excludes decimation) | informational | | |
| sat | Worst current blocked_since_ms across worker→conductor bridges | sat ok (green) | blocked Ns ≥ 25% of saturation_deadline_s (yellow) | ≥ 50% of deadline (red) |
| loop | Conductor loop p99 lag (sliding 1024-sample window, ~51 s at 20 Hz) | < runtime.loop_lag_warn_ms | warn to 4×warn (yellow) | ≥ 4×warn (red) |
| q | Worst current depth / lifetime max depth across bridges | cur near 0 | cur climbing toward max | cur ≈ capacity |
| safety queue | Reserved safety monitor queue | always 0 today | | |
| disk | Free space on runs_root | > 15% free | < 15% (yellow) | < 5% (red) |
| cam | Camera health | cam n/a when no camera is active | | |
| op | Operator id | non-empty | | |
| bundle | Path of the active run's bundle | shown when sealed | | |
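
As a rough illustration of the disk row, the free-space thresholds can be checked with the standard library. This is a sketch, not the code in statusbar.py: `classify_free_fraction` and `disk_pill` are hypothetical names, and the handling of values exactly at 15% or 5% is an assumption.

```python
import shutil

WARN_FREE = 0.15  # below this fraction free the pill turns yellow (from the table)
FAIL_FREE = 0.05  # below this fraction free the pill turns red (from the table)


def classify_free_fraction(free_fraction: float) -> str:
    """Map a free-space fraction in [0, 1] to a pill colour (hypothetical helper)."""
    if free_fraction < FAIL_FREE:
        return "red"
    if free_fraction < WARN_FREE:
        return "yellow"
    return "green"  # assumption: exactly 15% free still counts as healthy


def disk_pill(runs_root: str) -> str:
    """Classify the volume that holds runs_root."""
    usage = shutil.disk_usage(runs_root)  # named tuple: total, used, free (bytes)
    return classify_free_fraction(usage.free / usage.total)


print(disk_pill("."))  # colour for the current working directory's volume
```

Keeping the classification separate from the `shutil.disk_usage` call makes the thresholds easy to exercise without touching a real disk.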

UI overflow

What it shows. Total ring rollovers summed across every channel's UI ring buffer (ringbuffer.py:total_overflow). Each rollover = a new sample arrived while the ring was already at capacity, so the oldest sample was evicted to make room.

This is not a backpressure signal. Plot snapshots are non-draining copies — the ring is never emptied by the consumer. So once a buffer accumulates its capacity-worth of samples, every subsequent push rolls over by construction, regardless of how fast or slow the UI thread is running. The pill is informational; for actual UI / conductor distress use sat, loop, and q.

Sizing. RingBufferRegistry.register defaults to enough capacity to hold DEFAULT_HISTORY_S (10 min) of samples at the channel's decimate_to_hz. So under normal config a 50 Hz balance ring holds ~30 000 samples and only starts rolling over after ten minutes of run time.
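
The sizing arithmetic above can be reproduced in a few lines. This is a sketch under stated assumptions: `default_capacity` and `seconds_until_first_rollover` are hypothetical helpers, not RingBufferRegistry methods, the balance channel is assumed to have decimate_to_hz = 50, and rounding up is a guess at how fractional capacities would be handled.

```python
import math

DEFAULT_HISTORY_S = 600  # 10 minutes, as stated in the text


def default_capacity(decimate_to_hz: float, history_s: float = DEFAULT_HISTORY_S) -> int:
    """Samples needed to hold history_s seconds at decimate_to_hz (hypothetical)."""
    return math.ceil(history_s * decimate_to_hz)


def seconds_until_first_rollover(capacity: int, accepted_rate_hz: float) -> float:
    """Time for an empty ring to fill when samples are accepted at a steady rate."""
    return capacity / accepted_rate_hz


capacity = default_capacity(50)
print(capacity)                                          # 30000
print(seconds_until_first_rollover(capacity, 50) / 60)   # 10.0 (minutes)
```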

Decimation tooltip. Each ring buffer has a decimate_to_hz (ChannelSpec default 60 Hz); samples that arrive faster than that interval are dropped by the ring on push. The tooltip surfaces this count separately so you can tell whether a fast producer is pushing past the channel's configured decimate rate.
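
To make the difference between the two counters concrete, here is a toy ring that counts decimation drops and rollovers separately. It is not the class in ringbuffer.py: `SketchRing`, its attributes, and the use of integer millisecond timestamps are all assumptions made for the illustration.

```python
from collections import deque


class SketchRing:
    """Toy ring: decimates on push, evicts the oldest sample when full."""

    def __init__(self, capacity: int, decimate_to_hz: float) -> None:
        self._buf = deque(maxlen=capacity)
        self._min_interval_ms = 1000.0 / decimate_to_hz
        self._last_accepted_ms = None
        self.overflow = 0   # rollovers: sample accepted while the ring was full
        self.decimated = 0  # samples dropped for arriving faster than decimate_to_hz

    def push(self, t_ms: int, value: float) -> None:
        last = self._last_accepted_ms
        if last is not None and t_ms - last < self._min_interval_ms:
            self.decimated += 1
            return
        if len(self._buf) == self._buf.maxlen:
            self.overflow += 1  # deque(maxlen=...) drops the oldest on append
        self._buf.append((t_ms, value))
        self._last_accepted_ms = t_ms

    def snapshot(self) -> list:
        return list(self._buf)  # non-draining copy: the ring keeps its contents


# A 20 Hz producer (one sample every 50 ms) feeding a 10 Hz, 5-slot ring.
ring = SketchRing(capacity=5, decimate_to_hz=10)
for i in range(20):
    ring.push(i * 50, float(i))
    ring.snapshot()  # reading as often as possible does not prevent rollovers

print(ring.decimated, ring.overflow, len(ring.snapshot()))  # 10 5 5
```

Half of the samples are decimated, and once the five slots are full every accepted sample rolls over, even though a snapshot is taken after every push.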

When the rollover rate matters. Once a ring is full, every accepted sample causes exactly one rollover, so overflow should grow at about min(producer_rate, decimate_to_hz) × elapsed-past-fill. If the count is much larger than that, or starts climbing well before DEFAULT_HISTORY_S has elapsed, something is registering the buffer with an unexpectedly small capacity. Check that no caller is overriding capacity downward, and that decimate_to_hz on the ChannelSpec is set high enough to keep the samples you care about.


sat (saturation)

What it shows. Worst blocked_since_ms across worker→conductor bridges. A bridge's blocked_since_ms is non-None whenever a producer (worker) is currently waiting for outbound bridge space (bridge.py:blocked_since_ms). This is the saturation-deadline signal the conductor's SaturationMonitor polls.

Healthy. sat ok (green). No bridge has a blocked producer.

Warn (yellow). blocked Ns where N ≥ 25% of saturation_deadline_s (default 10s, so ≥ 2.5 s). Drain is falling behind a producer; not yet fatal but trending.

Fail (red). N ≥ 50% of saturation_deadline_s (≥ 5 s by default). The run is in real danger of being sealed as crashed_but_sealed (runtime-architecture.md §6.3).
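
The three sat states can be summarised as a small classifier. This is a sketch, not the logic in statusbar.py: `classify_sat` is a hypothetical name, the input is assumed to be the blocked duration in milliseconds (or None when nothing is blocked), and showing a short block below 25% of the deadline as green is an assumption the text does not settle.

```python
from typing import Optional, Tuple


def classify_sat(blocked_ms: Optional[float],
                 saturation_deadline_s: float = 10.0) -> Tuple[str, str]:
    """Return (colour, label) for the sat pill (hypothetical helper)."""
    if blocked_ms is None:
        return ("green", "sat ok")
    blocked_s = blocked_ms / 1000.0
    label = f"blocked {blocked_s:.1f}s"
    fraction = blocked_s / saturation_deadline_s
    if fraction >= 0.50:
        return ("red", label)
    if fraction >= 0.25:
        return ("yellow", label)
    return ("green", label)  # assumption: brief blocks are shown but not flagged


print(classify_sat(None))    # ('green', 'sat ok')
print(classify_sat(2500.0))  # ('yellow', 'blocked 2.5s')
print(classify_sat(5000.0))  # ('red', 'blocked 5.0s')
```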

What to check, in order.

1. Loop lag pill: high conductor loop lag means the loop itself is CPU-starved, so nothing downstream gets done. The likely cause is a CustomStep procedure handler doing inline CPU work; see runtime-architecture.md §11.
2. Disk pill: writer fsync slows when the disk is full or slow; the writer inbox fills; the conductor drain blocks on await writer.submit/record_frame; bridges back up.
3. Camera config: high-resolution / high-fps video encode (libx264) on a CPU-bound writer thread is the canonical cause. Swap to h264_qsv (Intel iGPU) / h264_nvenc (NVIDIA) / mjpeg (no encode) in the camera params.
4. A BLOCK-policy DataBus subscriber doing slow work: the conductor drain awaits bus.publish, so a slow subscriber stalls it.

What it is NOT. This pill does not reflect bridge latency p99. The legacy "sink lag" metric used latency_p99_ms, a 1024-sample ring — which on low-rate bridges (Watlow at 2 Hz = ~8.5 minutes per window) meant one bad startup observation pinned the readout for ages. sat reads current state only.
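
The "memory" of a fixed-size sample window depends only on the sample rate, which is why the same 1024-sample ring behaves so differently on a 2 Hz bridge and on the 20 Hz heartbeat. A minimal calculation, where `window_seconds` is a hypothetical helper:

```python
WINDOW_SAMPLES = 1024  # window size quoted in the text


def window_seconds(rate_hz: float, samples: int = WINDOW_SAMPLES) -> float:
    """How long one observation stays inside a fixed-size sample window."""
    return samples / rate_hz


for name, rate_hz in [("Watlow bridge", 2.0), ("conductor heartbeat", 20.0)]:
    seconds = window_seconds(rate_hz)
    print(f"{name}: {seconds:.1f} s ({seconds / 60:.1f} min)")
# Watlow bridge: 512.0 s (8.5 min)
# conductor heartbeat: 51.2 s (0.9 min)
```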


loop (conductor loop lag)

What it shows. Conductor loop p99 lag in ms, over a sliding 1024-observation window at 20 Hz heartbeat (≈51 seconds of memory) (heartbeat.py). Lag is the difference between when a 50 ms heartbeat should have fired and when it actually did — i.e. how long the loop was not scheduling tasks.
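
A sliding-window p99 of heartbeat lag might look like the sketch below. It is not the code in heartbeat.py: `LagWindow` is a hypothetical name, and both the nearest-rank percentile and the clamping of early fires to zero are assumptions.

```python
from collections import deque


class LagWindow:
    """Sliding window of heartbeat lags with a nearest-rank p99 (hypothetical)."""

    def __init__(self, size: int = 1024) -> None:
        self._lags_ms = deque(maxlen=size)  # oldest observation falls out first

    def record(self, expected_fire_s: float, actual_fire_s: float) -> None:
        lag_ms = (actual_fire_s - expected_fire_s) * 1000.0
        self._lags_ms.append(max(0.0, lag_ms))  # assumption: early fires count as 0

    def p99_ms(self) -> float:
        if not self._lags_ms:
            return 0.0
        ordered = sorted(self._lags_ms)
        rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n) in integer maths
        return ordered[rank - 1]


# 100 heartbeats at 50 ms spacing: 98 fire 1 ms late, two fire 120 ms late.
window = LagWindow()
for i in range(100):
    expected = i * 0.05
    late_by = 0.120 if i in (40, 70) else 0.001
    window.record(expected, expected + late_by)

print(round(window.p99_ms()))  # 120
```

With only one late beat in 100 the nearest-rank p99 would still read about 1 ms, which is the usual trade-off of a percentile: it ignores a single outlier but reacts once stalls recur.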

Healthy. Below runtime.loop_lag_warn_ms (default 50 ms).

Warn. Between runtime.loop_lag_warn_ms and four times that value (default 50–200 ms, yellow). The conductor's drain tasks, heartbeat, and saturation monitor are not getting fair scheduling. Procedure preflight _wait_for may start missing its target cadence.

Fail. At or above four times runtime.loop_lag_warn_ms (default ≥ 200 ms, red). Something on the conductor loop is doing heavy synchronous work. The most common cause is a CustomStep handler ignoring the §11 contract — every plugin author's custom step MUST wrap CPU work in anyio.to_thread.run_sync.
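
The effect of blocking the loop can be reproduced in isolation. The sketch below uses the standard library's `asyncio.to_thread` (Python 3.9+) as a stand-in for `anyio.to_thread.run_sync` so that it runs without third-party packages; the handlers, the 300 ms `time.sleep` that stands in for heavy synchronous work, and every function name are assumptions for illustration, not CAPA code.

```python
import asyncio
import time

HEARTBEAT_S = 0.05  # 50 ms heartbeat, as in the text


def cpu_work() -> None:
    time.sleep(0.3)  # stand-in for heavy synchronous work in a handler


async def heartbeat(lags_ms: list, beats: int) -> None:
    """Record how late each 50 ms beat fires."""
    loop = asyncio.get_running_loop()
    due = loop.time() + HEARTBEAT_S
    for _ in range(beats):
        await asyncio.sleep(max(0.0, due - loop.time()))
        lags_ms.append((loop.time() - due) * 1000.0)
        due += HEARTBEAT_S


async def inline_handler() -> None:
    cpu_work()  # runs on the event loop: nothing else is scheduled meanwhile


async def offloaded_handler() -> None:
    await asyncio.to_thread(cpu_work)  # runs in a worker thread; the loop stays free


async def worst_lag_ms(handler) -> float:
    lags_ms: list = []
    beat_task = asyncio.create_task(heartbeat(lags_ms, beats=10))
    await asyncio.sleep(0.1)  # let the heartbeat settle before the handler runs
    await handler()
    await beat_task
    return max(lags_ms)


inline_lag = asyncio.run(worst_lag_ms(inline_handler))
offloaded_lag = asyncio.run(worst_lag_ms(offloaded_handler))
print(f"inline: {inline_lag:.0f} ms, offloaded: {offloaded_lag:.0f} ms")
```

On a typical machine the inline handler produces a worst-case lag of a few hundred milliseconds, well into the red band, while the offloaded one stays within a few milliseconds. Exact numbers vary with the host.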

How to triage.
  • loop high + sat ok → the conductor loop is CPU-busy but bridges aren't filling yet (the busy loop is processing emissions, just slowly). This will become saturation if sustained.
  • loop low + sat warning/fail → the conductor loop is fine; the downstream (writer or BLOCK subscriber) is the bottleneck. The drain task is parked on an await.
  • Both high → CPU starvation that has already caused downstream backup.
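
The triage rules above reduce to a two-input decision. A minimal sketch, where `triage` is a hypothetical helper and each input is assumed to be "this pill is yellow or red":

```python
def triage(loop_high: bool, sat_unhealthy: bool) -> str:
    """Map the loop and sat pill states to the most likely cause (hypothetical)."""
    if loop_high and sat_unhealthy:
        return "CPU starvation on the conductor loop has caused downstream backup"
    if loop_high:
        return "conductor loop is CPU-busy; will saturate if sustained"
    if sat_unhealthy:
        return "downstream bottleneck: writer or BLOCK subscriber"
    return "healthy"


for loop_high in (False, True):
    for sat_unhealthy in (False, True):
        print(f"loop_high={loop_high}, sat_unhealthy={sat_unhealthy}: "
              f"{triage(loop_high, sat_unhealthy)}")
```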


q (queue depth)

What it shows. cur/max — worst current depth and lifetime depth_max across worker→conductor bridges.

Healthy. cur stays low (single-digit, or near zero), regardless of max. The max may be high if there was a one-time spike; that's history.

Unhealthy. cur climbing toward max, or cur consistently near bridge capacity. Bridge capacities are max(64, ceil(8 * rate_hz)), so:

| Worker | Rate | Capacity |
|---|---|---|
| Watlow | 2 Hz | 64 |
| Alicat | 2 Hz | 64 |
| NI-DAQ | 5 Hz | 64 |
| Balance | 50 Hz | 400 |
| Webcam | 30 fps | 240 |
| FLIR IR | 30 fps | 240 |

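A balance bridge with cur at 380 of its 400-slot capacity means the drain is about to block the balance producer. Note that the max shown in the pill is the lifetime high-water mark, not the capacity, so compare cur against the capacities in the table.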
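
The capacity formula and the remaining headroom can be checked directly. In this sketch `bridge_capacity` implements the formula quoted above, while `seconds_until_full` is a hypothetical helper that assumes the drain has stalled completely.

```python
import math


def bridge_capacity(rate_hz: float) -> int:
    """Capacity rule quoted in the text: max(64, ceil(8 * rate_hz))."""
    return max(64, math.ceil(8 * rate_hz))


def seconds_until_full(cur: int, capacity: int, rate_hz: float) -> float:
    """Time left before the producer blocks, assuming the drain is fully stalled."""
    return (capacity - cur) / rate_hz


WORKER_RATES_HZ = {"Watlow": 2, "Alicat": 2, "NI-DAQ": 5,
                   "Balance": 50, "Webcam": 30, "FLIR IR": 30}

for worker, rate_hz in WORKER_RATES_HZ.items():
    print(f"{worker}: {bridge_capacity(rate_hz)}")

print(seconds_until_full(380, bridge_capacity(50), 50))  # 0.4
```

At 380 of 400 the balance bridge has under half a second of headroom if the drain stops, which is why a climbing cur deserves attention before sat changes colour.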

What to check. Same triage as sat: q climbing precedes sat going red.


What "healthy steady state" looks like for capa_real_full

After ~30 s of run time:

RUNNING  00:00:42  UI overflow 0  sat ok  loop 8 ms  q 1/47  …
  • UI overflow stays at 0 for the first ~10 min, then grows at roughly the producer rate (informational).
  • sat ok stays green.
  • loop stays well under 50 ms.
  • q cur stays under ~10; max may show a startup spike (50-ish on the balance is normal for the first second).

Anything else means something is wrong, and the pill that goes red first is the most diagnostic. loop first → CPU on conductor. sat first (with loop low) → writer / disk / camera encode. UI overflow climbing on its own → informational: the ring has reached capacity, which says nothing about the health of the UI thread.


Quick reference: which pill points where

UI overflow climbing       → informational (ring at capacity); use sat/loop/q for real distress
loop > 50 ms              → CPU on conductor loop (CustomStep, missing to_thread.run_sync)
sat blocked, loop low     → downstream stall: writer, disk, slow BLOCK subscriber
sat blocked, loop high    → CPU on conductor cascaded into downstream backup
q cur high, sat ok        → transient burst; watch for trend
disk red                  → free space; writer will stall imminently

See also: runtime-architecture.md for the runtime concepts.