Acquisition Diagnostics dock
Audience: operators triaging "why is this run weird?", contributors debugging adapters. Scope: every column of the Acquisition Diagnostics dock (device, per-worker rate, poll-period p50, jitter, and last-sample age) and how to read those rows alongside the status bar when something looks off.
The dock is a per-worker view of acquisition health. The status bar tells you that something is wrong; the diagnostics dock tells you which device. Both surfaces poll Conductor.runtime_diagnostics() at 1 Hz — they are looking at exactly the same numbers, just sliced differently.
See docks/diagnostics.py for the implementation and runtime/metrics.py for the underlying WorkerMetrics struct.
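As an illustration of "same numbers, sliced differently", here is a minimal sketch. It assumes a hypothetical snapshot shape (a list of per-worker records carrying a resource_id and a last-sample age); the real return type of Conductor.runtime_diagnostics() and the real WorkerMetrics fields may differ, and the 1 Hz polling loop itself is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WorkerSnapshot:
    # Hypothetical stand-in for one worker's entry in a diagnostics snapshot.
    resource_id: str
    age_s: Optional[float]  # None until the worker has produced a poll


def dock_rows(snapshot: List[WorkerSnapshot]) -> List[Tuple[str, Optional[float]]]:
    """Dock-style slice: one row per worker, nothing aggregated away."""
    return [(w.resource_id, w.age_s) for w in snapshot]


def worst_age(snapshot: List[WorkerSnapshot]) -> Optional[float]:
    """Status-bar-style slice: only the single worst (oldest) age survives."""
    ages = [w.age_s for w in snapshot if w.age_s is not None]
    return max(ages) if ages else None
```

An aggregate like worst_age can say that some worker is 7 s stale; only the per-row view says which one.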
Layout
One row per worker (one resource_id). On the full real config that is six rows — one each for the heater, purge MFC, balance, NI-DAQ chassis, visible camera, and IR camera.
| Column | What it shows |
|---|---|
| Device | The adapter name(s) hosted by this worker. A worker with one adapter shows just that name; a worker that hosts two adapters that share a serial port shows both, comma-separated. |
| Rate (Hz) | Measured poll rate — 1000 / poll_period_p50_ms. This is the operator-facing acquisition rate to compare against the device's configured rate_hz. |
| p50 (ms) | Median wall-clock gap between consecutive polls. Stable rates have a tight p50; loop-lag drives p50 up. |
| Jitter (ms) | p99 − p50 of the poll-period ring. The long tail of poll lateness. |
| Age (s) | Wall-clock seconds since the most recent poll on this worker. Colour-coded: green ≤ 2 s, yellow 2–5 s, red > 5 s. |
All four numeric columns hold an em-dash (—) until at least two polls have landed on that worker. Poll-period needs two timestamps to compute a gap, so showing 0.00 after a single poll would lie about the cadence.
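A minimal sketch of the Rate (Hz) cell logic described above: 1000 / p50 once at least two polls exist, otherwise an em-dash. The function name is hypothetical, and the three-significant-figure formatting is an assumption chosen to match the "healthy" readings listed later on this page, not a copy of docs/diagnostics.py.

```python
from typing import Optional

EM_DASH = "\u2014"  # placeholder shown until a poll-to-poll gap exists


def format_rate_cell(polls_seen: int, p50_ms: Optional[float]) -> str:
    """Render the Rate (Hz) column from the poll-period median."""
    # A single poll yields a timestamp but no gap, so there is no cadence yet.
    if polls_seen < 2 or p50_ms is None or p50_ms <= 0:
        return EM_DASH
    # Rate is the inverse of the median gap: 1000 ms / p50 ms.
    # "#.3g" keeps trailing zeros, e.g. 50.0 -> "50.0", 2.0 -> "2.00".
    return format(1000.0 / p50_ms, "#.3g")
```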
What each column actually measures
Rate (Hz)
The displayed rate is the inverse of the poll-period p50, not poll count divided by elapsed time. The distinction matters: the underlying counter, WorkerMetrics.poll_rate_hz, responds within ~50 samples to a rate change, whereas a naive count / elapsed would lag for the full run length.
A second subtlety: rate is keyed on SourceRecord emissions — one per actual device poll — not on every adapter.stream() yield. Every adapter yields a burst of emissions per poll (1 SourceRecord + N ChannelSamples + the occasional DeviceSnapshot), so a count-all-yields / elapsed calculation would report tens of thousands of Hz for a 1 Hz device. See runtime/metrics.py's polls_emitted vs samples_emitted.
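To make the polls-versus-yields distinction concrete, the sketch below counts one simulated stream both ways. The record classes are empty stand-ins for SourceRecord, ChannelSample and DeviceSnapshot, and the counter names are hypothetical; they mirror the idea behind polls_emitted and samples_emitted without claiming to reproduce runtime/metrics.py.

```python
from dataclasses import dataclass
from typing import List


class SourceRecord:  # stand-in: one per actual device poll
    pass


class ChannelSample:  # stand-in: one per channel reading within a poll
    pass


class DeviceSnapshot:  # stand-in: occasional state dump
    pass


@dataclass
class EmissionCounters:
    polls: int = 0   # SourceRecords only: the basis for the Rate column
    yields: int = 0  # every item the adapter's stream() produced

    def observe(self, item: object) -> None:
        self.yields += 1
        if isinstance(item, SourceRecord):
            self.polls += 1


def one_poll(n_channels: int, with_snapshot: bool = False) -> List[object]:
    """The burst one poll emits: 1 SourceRecord + N ChannelSamples (+ a snapshot)."""
    burst: List[object] = [SourceRecord()]
    burst.extend(ChannelSample() for _ in range(n_channels))
    if with_snapshot:
        burst.append(DeviceSnapshot())
    return burst
```

Ten polls of a two-channel device over 10 s give polls / elapsed = 1.0 Hz, while yields / elapsed gives 3.0 Hz; the overcount grows with every extra channel emitted per poll.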
What "healthy" looks like for the real configurations:
| Worker | Configured rate_hz | Displayed Rate (Hz) |
|---|---|---|
| Heater (Watlow) | 2 Hz | ~2.00 |
| Purge MFC (Alicat) | 2 Hz | ~2.00 |
| Balance (Sartorius) | 50 Hz | ~50.0 |
| NI-DAQ chassis | 5 Hz | ~5.00 |
| Visible camera | 30 fps | ~30.0 |
| FLIR IR | 30 fps | ~30.0 |
A persistent disagreement between configured rate and displayed rate (more than a few percent low) means the worker's loop is missing its target cadence — see Age and Jitter for which signal points where.
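One way to turn "persistent, more than a few percent low" into a check is sketched below. The function name, the 5% tolerance, and the ten-reading window (about ten seconds at the dock's 1 Hz refresh) are assumptions for illustration, not thresholds the dock itself applies.

```python
from typing import Sequence


def persistently_low(configured_hz: float,
                     recent_displayed_hz: Sequence[float],
                     tolerance: float = 0.05,
                     min_readings: int = 10) -> bool:
    """True if each of the last `min_readings` rates is short by more than `tolerance`."""
    if len(recent_displayed_hz) < min_readings:
        return False  # not enough history to call it persistent
    floor_hz = configured_hz * (1.0 - tolerance)
    return all(r < floor_hz for r in recent_displayed_hz[-min_readings:])
```

A single dip does not trip it; a run of 45 Hz readings for a 50 Hz balance does.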
p50 (ms) and Jitter (ms)
p50 is the median gap between consecutive polls. Jitter is p99 − p50, so it captures the long tail without being dominated by the median.
- For a clean 50 Hz balance: p50 ≈ 20 ms, jitter ≈ 1–3 ms.
- For a 2 Hz heater: p50 ≈ 500 ms, jitter ≈ 5–10 ms.
Healthy jitter is a small fraction of p50 (from about 1% to 15% in the two examples above). Jitter that grows toward p50 (e.g. p50=500 ms, jitter=200 ms) means polls are landing in clumps: the worker is alternately stalling and catching up.
Both percentiles read from a 1024-observation ring inside WorkerMetrics.poll_period_ms, so the readout reflects the last ~20 s of a 50 Hz worker or the last ~8.5 min of a 2 Hz worker. Slow workers update slowly; that's the cost of a fixed-size ring.
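The sketch below models the ring and the two percentiles. It assumes nearest-rank percentiles and a plain deque as the ring; WorkerMetrics.poll_period_ms may use a different percentile method or storage, so treat the class and helper names as hypothetical. The window helper reproduces the ~20 s and ~8.5 min figures above (1024 / 50 Hz = 20.48 s; 1024 / 2 Hz = 512 s).

```python
import math
from collections import deque
from typing import Deque, Optional


class PollPeriodRing:
    """Fixed-size ring of poll-to-poll gaps (ms); the newest evicts the oldest."""

    def __init__(self, size: int = 1024) -> None:
        self._gaps: Deque[float] = deque(maxlen=size)
        self._last_ts_ms: Optional[float] = None

    def record_poll(self, ts_ms: float) -> None:
        # The first poll only seeds the timestamp; a gap needs two polls.
        if self._last_ts_ms is not None:
            self._gaps.append(ts_ms - self._last_ts_ms)
        self._last_ts_ms = ts_ms

    def percentile(self, q: float) -> Optional[float]:
        # Nearest-rank percentile; None (an em-dash in the dock) until a gap exists.
        if not self._gaps:
            return None
        ordered = sorted(self._gaps)
        rank = max(1, math.ceil(q * len(ordered) / 100.0))
        return ordered[rank - 1]

    def jitter_ms(self) -> Optional[float]:
        # Jitter as defined above: p99 minus p50.
        p50, p99 = self.percentile(50), self.percentile(99)
        return None if p50 is None or p99 is None else p99 - p50


def ring_window_s(size: int, rate_hz: float) -> float:
    """Seconds of history a full ring covers at a steady poll rate."""
    return size / rate_hz
```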
Age (s)
Wall-clock seconds since this worker last produced a SourceRecord. The colouring is harsh on purpose:
- Green (≤ 2 s): normal.
- Yellow (2–5 s, or any worker with loop_lag_p99_ms above the configured warn threshold): the worker is degraded but still producing.
- Red (> 5 s): the worker has not polled in five seconds. For a 2 Hz device that's 10 missed polls; for a 50 Hz balance that's 250 missed polls. Something is wrong: either the adapter has wedged, the serial transport has died, or the worker's loop is starved.
A row goes neutral grey (idle) when no run is active, or when the worker exists in the pool but has not produced a poll yet — distinguishable from red because the numeric cells stay as em-dashes rather than displaying the last-known value.
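The colour rules above reduce to a few comparisons. In this sketch the function name, the string colour labels, and the loop_lag_warn_ms parameter (standing in for "the configured warn threshold", whose real name and value are not given here) are hypothetical. Boundaries follow the text (≤ 2 s green, up to 5 s yellow, above 5 s red), and a red age is assumed to take precedence over the loop-lag condition.

```python
from typing import Optional


def age_colour(age_s: Optional[float],
               run_active: bool,
               loop_lag_p99_ms: float = 0.0,
               loop_lag_warn_ms: float = float("inf")) -> str:
    """Colour for the Age (s) cell; age_s is None before the first poll."""
    if not run_active or age_s is None:
        return "grey"    # idle: no run, or the worker has not polled yet
    if age_s > 5.0:
        return "red"     # no poll for more than five seconds
    if age_s > 2.0 or loop_lag_p99_ms > loop_lag_warn_ms:
        return "yellow"  # degraded but still producing
    return "green"
```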
Reading the dock alongside the status bar
The status bar is an aggregated view: one sat pill summarising the worst bridge, one loop pill summarising the conductor. The diagnostics dock is the per-worker drill-down. Use them together:
- sat red, single Age column also red. A single worker is stuck. Likely a serial-port wedge or an adapter that has stopped yielding. Check that device's events in events.sqlite and the worker's section of run.log.
- sat red, every Age column climbing in lockstep. The downstream (writer thread, disk, or a slow BLOCK-policy databus subscriber) is the bottleneck. No single worker is at fault; they are all backed up because their drain task is blocked downstream. See Saturation and deadlines.
- loop red, p50 columns drifting upward across all workers. The conductor loop is CPU-starved. The most common cause is a CustomStep procedure handler doing inline CPU work; see runtime-architecture.md §11.
- loop red, p50 columns not drifting. Each worker is on its own loop, so worker-side p50 is insensitive to conductor-side starvation. The mismatch tells you the loop lag is real on the conductor but the workers are still polling fine; the data is queued at the bridge, not delayed at the device.
- Age red on the IR camera, everything else green. The FLIR SDK has stopped delivering frames. Confirm by checking events.sqlite for a camera.recording_stopped event without a matching recording_started. The cancellation shield (see runtime-architecture §5) means a wedged vendor SDK call cannot be safely interrupted; recovery requires restarting capa.
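The five patterns above can be read as a small decision procedure, sketched below. The DockReading fields are hypothetical summaries an operator would make by eye (they are not fields exposed by the dock), "ir_camera" is an assumed resource_id, and the order of the checks is one reasonable choice rather than a documented rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DockReading:
    sat_red: bool
    loop_red: bool
    red_age_workers: List[str] = field(default_factory=list)  # rows with a red Age
    ages_climbing_together: bool = False   # every Age rising in lockstep
    p50_drifting_everywhere: bool = False  # every p50 trending upward


def triage(r: DockReading) -> str:
    """Map a status-bar + dock reading onto the patterns in this section."""
    if r.red_age_workers == ["ir_camera"]:
        return "FLIR SDK stopped delivering frames"
    if r.sat_red and r.ages_climbing_together:
        return "downstream bottleneck"
    if r.sat_red and len(r.red_age_workers) == 1:
        return "single worker stuck: " + r.red_age_workers[0]
    if r.loop_red and r.p50_drifting_everywhere:
        return "conductor loop CPU-starved"
    if r.loop_red:
        return "conductor loop lag; data queued at the bridge"
    return "no known pattern"
```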
What the dock does not show
- No latency / inbox / fsync metrics. Writer-thread health is observable only through the sat pill and the downstream effect on every worker's Age. The diagnostics dock is producer-side.
- No per-channel rates. A worker may host multiple channels (a Watlow heater emits both PV and setpoint as channels under one poll). The dock measures the poll cadence, not the channel emission count.
- No history beyond the percentile ring. Closing and reopening the run resets every row; the dock is a live view, not a recording. The queue_health block in the bundle's manifest.json captures the final snapshot on seal; that is the post-run archival source.
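For post-run analysis, the archived snapshot can be pulled out of a sealed bundle with the standard json module. This sketch assumes queue_health is a top-level key of manifest.json and assumes nothing about its contents; the function names are hypothetical, and the real manifest layout should be checked before relying on it.

```python
import json
from pathlib import Path
from typing import Any, Optional


def parse_queue_health(manifest_text: str) -> Optional[Any]:
    """Return the queue_health block, or None if the manifest has none."""
    # Assumption: queue_health sits at the top level of manifest.json.
    manifest = json.loads(manifest_text)
    return manifest.get("queue_health")


def read_queue_health(manifest_path: Path) -> Optional[Any]:
    """Convenience wrapper that reads manifest.json from disk."""
    return parse_queue_health(manifest_path.read_text(encoding="utf-8"))
```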
See also: Status bar, Runtime architecture, Channel pipeline.