Acquisition Diagnostics dock

Audience: operators triaging "why is this run weird?", contributors debugging adapters. Scope: every column of the Acquisition Diagnostics dock — per-worker rate, poll-period p50, jitter, and last-sample age — and how to read those rows alongside the status bar when something looks off.

The dock is a per-worker view of acquisition health. The status bar tells you that something is wrong; the diagnostics dock tells you which device. Both surfaces poll Conductor.runtime_diagnostics() at 1 Hz — they are looking at exactly the same numbers, just sliced differently.

See docks/diagnostics.py for the implementation and runtime/metrics.py for the underlying WorkerMetrics struct.

Layout

One row per worker (one resource_id). On the full real config that is six rows — one each for the heater, purge MFC, balance, NI-DAQ chassis, visible camera, and IR camera.

Column	What it shows
Device	The adapter name(s) hosted by this worker. A worker with one adapter shows just that name; a worker that hosts two adapters that share a serial port shows both, comma-separated.
Rate (Hz)	Measured poll rate — `1000 / poll_period_p50_ms`. This is the operator-facing acquisition rate to compare against the device's configured `rate_hz`.
p50 (ms)	Median wall-clock gap between consecutive polls. A healthy worker holds p50 near `1000 / rate_hz`; loop lag drives p50 up.
Jitter (ms)	`p99 − p50` of the poll-period ring. The long tail of poll lateness.
Age (s)	Wall-clock seconds since the most recent poll on this worker. Colour-coded: green ≤ 2 s, yellow 2–5 s, red > 5 s.

All four numeric columns hold an em-dash (—) until at least two polls have landed on that worker. Poll-period needs two timestamps to compute a gap, so showing 0.00 after a single poll would lie about the cadence.
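As a rough illustration of how the four cells relate to one another, the sketch below derives them from a list of poll timestamps. It is not the code in docks/diagnostics.py: the function names, the nearest-rank percentile method, and the number formatting are assumptions made for this example. Only the formulas (`1000 / p50`, `p99 − p50`), the 1024-observation ring described later on this page, and the two-poll rule come from the text.

```python
import math
from collections import deque

EM_DASH = "\u2014"   # placeholder shown until two polls have landed
RING_SIZE = 1024     # ring length quoted on this page


def nearest_rank(sorted_vals, q):
    """Nearest-rank percentile (assumed method), q in (0, 100]."""
    rank = max(1, math.ceil(q / 100 * len(sorted_vals)))
    return sorted_vals[rank - 1]


def row_cells(poll_times_s, now_s):
    """Return (rate, p50, jitter, age) as display strings (hypothetical helper)."""
    if len(poll_times_s) < 2:
        # One timestamp gives no gap, so no cadence can be reported yet.
        return (EM_DASH,) * 4
    gaps_ms = deque(maxlen=RING_SIZE)  # keeps only the newest gaps
    for earlier, later in zip(poll_times_s, poll_times_s[1:]):
        gaps_ms.append((later - earlier) * 1000.0)
    ordered = sorted(gaps_ms)
    p50 = nearest_rank(ordered, 50)
    p99 = nearest_rank(ordered, 99)
    rate_hz = 1000.0 / p50
    age_s = now_s - poll_times_s[-1]
    return (f"{rate_hz:.2f}", f"{p50:.2f}", f"{p99 - p50:.2f}", f"{age_s:.1f}")


print(row_cells([0.0, 0.5, 1.0, 1.5, 2.0], now_s=2.5))
print(row_cells([1.0], now_s=2.0))
```

A perfectly regular 2 Hz worker gives a rate of 2.00, a p50 of 500.00 ms and zero jitter; a worker with a single poll shows placeholders in every numeric cell.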

What each column actually measures

Rate (Hz)

The displayed rate is the inverse of the poll-period p50, not the poll count divided by elapsed time. The distinction matters: the underlying metric, WorkerMetrics.poll_rate_hz, responds within ~50 samples to a rate change, whereas a naive count / elapsed would lag for the full run length.

A second subtlety: rate is keyed on SourceRecord emissions (one per actual device poll), not on every adapter.stream() yield. Every adapter yields a burst of emissions per poll (1 SourceRecord + N ChannelSamples + the occasional DeviceSnapshot), so a count-all-yields / elapsed calculation would overstate the rate by the size of that burst: at least (1 + N) Hz for a 1 Hz device. See runtime/metrics.py's polls_emitted vs samples_emitted.
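The difference between counting polls and counting every yield can be sketched with stand-in types. The classes and the `fake_stream` generator below are hypothetical and only mimic the burst shape described above; they are not the project's SourceRecord, ChannelSample, or adapter.stream().

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:      # stand-in: one per device poll
    t_s: float


@dataclass
class ChannelSample:     # stand-in: one per channel per poll
    t_s: float
    channel: str
    value: float


def fake_stream(rate_hz, duration_s, channels):
    """Yield one SourceRecord plus one ChannelSample per channel for each poll."""
    for i in range(int(rate_hz * duration_s)):
        t = i / rate_hz
        yield SourceRecord(t)
        for name in channels:
            yield ChannelSample(t, name, 0.0)


def count_yields(stream):
    """Return (polls, total_yields) for a stream of emissions."""
    polls = total = 0
    for item in stream:
        total += 1
        if isinstance(item, SourceRecord):
            polls += 1
    return polls, total


duration_s = 10.0
polls, total = count_yields(fake_stream(1.0, duration_s, ["pv", "sp", "out"]))
print("keyed on polls:", polls / duration_s, "Hz")
print("keyed on all yields:", total / duration_s, "Hz")
```

With three channels, the all-yields count reports four times the true poll rate for this simulated 1 Hz device.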

What "healthy" looks like for the real configurations:

Worker	Configured `rate_hz`	Displayed Rate
Heater (Watlow)	2 Hz	~2.00
Purge MFC (Alicat)	2 Hz	~2.00
Balance (Sartorius)	50 Hz	~50.0
NI-DAQ chassis	5 Hz	~5.00
Visible camera	30 fps	~30.0
FLIR IR	30 fps	~30.0

A persistent shortfall of the displayed rate against the configured rate (more than a few percent low) means the worker's loop is missing its target cadence. Check the Age and Jitter columns on the same row to narrow down the cause.
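The comparison is simple arithmetic, sketched here. The helper names and the 5% default tolerance are assumptions for illustration; the text only says "a few percent".

```python
def rate_shortfall_pct(configured_hz, displayed_hz):
    """Percentage by which the displayed rate falls below the configured rate."""
    return (configured_hz - displayed_hz) / configured_hz * 100.0


def is_missing_cadence(configured_hz, displayed_hz, tolerance_pct=5.0):
    """True when the shortfall exceeds the (assumed) tolerance."""
    return rate_shortfall_pct(configured_hz, displayed_hz) > tolerance_pct


# Balance configured at 50 Hz: a reading of 49.8 is fine, 44.0 is not.
print(is_missing_cadence(50.0, 49.8))
print(is_missing_cadence(50.0, 44.0))
```

Only a low reading counts as a shortfall here; a displayed rate above the configured one yields a negative percentage and is not flagged.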

p50 (ms) and Jitter (ms)

p50 is the median gap between consecutive polls. Jitter is p99 − p50, so it captures the long tail without being dominated by the median.

For a clean 50 Hz balance: p50 ≈ 20 ms, jitter ≈ 1–3 ms.
For a 2 Hz heater: p50 ≈ 500 ms, jitter ≈ 5–10 ms.

Healthy jitter is a small fraction of p50 (roughly 1–15% in the examples above). Jitter that grows toward p50 (e.g. p50=500 ms, jitter=200 ms) means polls are landing in clumps: the worker is alternately stalling and catching up.

Both percentiles read from a 1024-observation ring inside WorkerMetrics.poll_period_ms, so the readout reflects the last ~20 s of a 50 Hz worker or the last ~8.5 min of a 2 Hz worker. Slow workers update slowly; that's the cost of a fixed-size ring.
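The window figures and the jitter-to-p50 comparison can be checked with a few lines of arithmetic. The helper names are made up for this example; the ring size of 1024 is the one quoted above.

```python
RING_SIZE = 1024  # observations held in the poll-period ring


def ring_window_s(rate_hz, ring_size=RING_SIZE):
    """Seconds of history covered by a full ring at a steady poll rate."""
    return ring_size / rate_hz


def jitter_ratio(p50_ms, jitter_ms):
    """Jitter expressed as a fraction of the median poll period."""
    return jitter_ms / p50_ms


print(f"50 Hz worker: {ring_window_s(50.0):.2f} s of history")
print(f" 2 Hz worker: {ring_window_s(2.0) / 60:.2f} min of history")
print(f"clumping example: jitter is {jitter_ratio(500.0, 200.0):.0%} of p50")
```

The 50 Hz ring spans 20.48 s and the 2 Hz ring spans 512 s (about 8.5 minutes), which is why a slow worker's percentiles take minutes to reflect a change.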

Age (s)

Wall-clock seconds since this worker last produced a SourceRecord. The colouring is harsh on purpose:

Green (≤ 2 s): normal.
Yellow (2–5 s, or any worker with loop_lag_p99_ms above the configured warn threshold): the worker is degraded but still producing.
Red (> 5 s): the worker has not polled in more than five seconds. For a 2 Hz device that's 10 missed polls; for a 50 Hz balance that's 250 missed polls. Something is wrong: the adapter has wedged, the serial transport has died, or the worker's loop is starved.

A row goes neutral grey (idle) when no run is active, or when the worker exists in the pool but has not produced a poll yet — distinguishable from red because the numeric cells stay as em-dashes rather than displaying the last-known value.
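The colouring rules above can be summarised as a small classifier. This is a sketch, not the dock's code: the function signature, the `lag_warn_ms` parameter name, and the choice to let red take precedence over the loop-lag yellow are assumptions. The 2 s and 5 s thresholds and the grey idle state come from the text.

```python
from enum import Enum


class AgeColour(Enum):
    GREY = "grey"
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


def age_colour(age_s, run_active, loop_lag_p99_ms=0.0, lag_warn_ms=float("inf")):
    """Classify a row. age_s is None when the worker has not polled yet."""
    if not run_active or age_s is None:
        return AgeColour.GREY      # idle: cells stay as placeholders
    if age_s > 5.0:
        return AgeColour.RED       # no poll for more than five seconds
    if age_s > 2.0 or loop_lag_p99_ms > lag_warn_ms:
        return AgeColour.YELLOW    # degraded but still producing
    return AgeColour.GREEN


print(age_colour(0.4, run_active=True))
print(age_colour(0.4, run_active=True, loop_lag_p99_ms=80.0, lag_warn_ms=50.0))
print(age_colour(7.2, run_active=True))
print(age_colour(None, run_active=True))
```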

Reading the dock alongside the status bar

The status bar is an aggregated view: one sat pill summarising the worst bridge, one loop pill summarising the conductor. The diagnostics dock is the per-worker drill-down. Use them together:

sat red, a single row's Age also red. A single worker is stuck. Likely a serial-port wedge or an adapter that has stopped yielding. Check that device's events in events.sqlite and the worker's section of run.log.
sat red, every row's Age climbing in lockstep. The downstream path (writer thread, disk, or a slow BLOCK-policy databus subscriber) is the bottleneck. No single worker is at fault; they are all backed up because their drain task is blocked downstream. See Saturation and deadlines.
loop red, p50 columns drifting upward across all workers. The conductor loop is CPU-starved. Most common cause is a CustomStep procedure handler doing inline CPU work; see runtime-architecture.md §11.
loop red, p50 columns not drifting. Each worker is on its own loop, so worker-side p50 is insensitive to conductor-side starvation. The mismatch tells you the loop lag is real on the conductor but the workers are still polling fine — the data is queued at the bridge, not delayed at the device.
Age red on the IR camera, everything else green. The FLIR SDK has stopped delivering frames. Confirm by checking events.sqlite for a camera.recording_stopped event without a matching recording_started. The cancellation shield (see runtime-architecture.md §5) means a wedged vendor SDK call cannot be safely interrupted; recovery requires restarting capa.

What the dock does not show

No latency / inbox / fsync metrics. Writer-thread health is observable only through the sat pill and the downstream effect on every worker's Age. The diagnostics dock is producer-side.
No per-channel rates. A worker may host multiple channels (a Watlow heater emits both PV and setpoint as channels under one poll). The dock measures the poll cadence, not the channel emission count.
No history beyond the percentile ring. Closing and reopening the run resets every row; the dock is a live view, not a recording. The bundle's manifest.json queue_health block captures the final snapshot on seal — that is the post-run archival source.