Events and status (sqlite)

Audience: analysts reconstructing what happened during a run; operators triaging a crashed_but_sealed outcome; plugin authors deciding what their adapter should log. Scope: the two SQLite databases in a bundle — events.sqlite (the run's event log) and status.sqlite (periodic health snapshots) — their schemas, the event taxonomy, and recipes for reading them.

A sealed bundle ships two SQLite databases, deliberately kept separate. events.sqlite holds the transactional event log: adapter commands, procedure milestones, safety trips, operator notes, and every other discrete thing worth remembering after the run. status.sqlite holds the 1 Hz DeviceSnapshot heartbeat: connection state, comm-error counters, firmware version, alarm bits. The separation is structural rather than aesthetic. A noisy 1 Hz health stream from five adapters produces about 18,000 rows per hour (5 × 3,600), and a combined table would drown the operator-relevant event view every time you opened it. Because the two streams live in different files, SELECT * FROM events ORDER BY t_mono_ns stays useful no matter how chatty the rig is.

`events.sqlite` — schema

The DDL is defined verbatim in events_sink.py:

```sql
CREATE TABLE IF NOT EXISTS events (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    t_mono_ns    INTEGER NOT NULL,
    t_utc        TEXT    NOT NULL,
    kind         TEXT    NOT NULL,
    severity     TEXT    NOT NULL,
    source       TEXT    NOT NULL,
    message      TEXT    NOT NULL,
    metadata_json TEXT
);
CREATE INDEX IF NOT EXISTS idx_events_t_mono_ns ON events (t_mono_ns);
CREATE INDEX IF NOT EXISTS idx_events_kind ON events (kind);
```

There is exactly one table, named events. Every row — adapter command, procedure milestone, safety trip, preflight warning — lives in the same table, distinguished by the kind and source columns. (An earlier stub of this doc promised separate method_steps, manual_commands, and safety tables; they were never built. Filter by kind instead.)
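To try the reader recipes without a real bundle, a throwaway in-memory copy of the schema is handy. This is a minimal sketch: the DDL is copied from above, while the helper name make_scratch_events_db and every sample row (timestamps, messages, metadata keys) are invented for illustration. It also shows the point of the paragraph above: a kind filter stands in for the per-category tables that were never built.

```python
import json
import sqlite3

# DDL copied verbatim from the schema above.
EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    t_mono_ns    INTEGER NOT NULL,
    t_utc        TEXT    NOT NULL,
    kind         TEXT    NOT NULL,
    severity     TEXT    NOT NULL,
    source       TEXT    NOT NULL,
    message      TEXT    NOT NULL,
    metadata_json TEXT
);
CREATE INDEX IF NOT EXISTS idx_events_t_mono_ns ON events (t_mono_ns);
CREATE INDEX IF NOT EXISTS idx_events_kind ON events (kind);
"""

INSERT_SQL = (
    "INSERT INTO events (t_mono_ns, t_utc, kind, severity, source, message, metadata_json) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)


def make_scratch_events_db() -> sqlite3.Connection:
    """Hypothetical fixture: an in-memory events table holding invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(EVENTS_DDL)
    conn.executemany(
        INSERT_SQL,
        [
            (1_000_000_000, "2024-01-01T00:00:00+00:00", "free_run.started", "info",
             "procedure:free_run", "Free Run started", None),
            (2_000_000_000, "2024-01-01T00:00:01+00:00", "set_setpoint", "info",
             "watlow:heater", "Setpoint 200 C", json.dumps({"value": 200.0, "units": "C"})),
            (3_000_000_000, "2024-01-01T00:00:02+00:00", "free_run.ended", "info",
             "procedure:free_run", "Free Run ended", json.dumps({"reason": "operator"})),
        ],
    )
    conn.commit()
    return conn


# Every kind shares one table; filter by kind instead of picking a table.
conn = make_scratch_events_db()
setpoints = conn.execute(
    "SELECT source, message FROM events WHERE kind = 'set_setpoint'"
).fetchall()
print(setpoints)  # [('watlow:heater', 'Setpoint 200 C')]
```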

Column	Type	Notes
`id`	`INTEGER PK AUTOINCREMENT`	Row id. Useful only as an insertion-order tiebreaker — `t_mono_ns` is the real time key.
`t_mono_ns`	`INTEGER`	Monotonic-clock nanoseconds at emission. This is the join key for cross-referencing against `scalars.parquet` and the per-adapter device records. Indexed.
`t_utc`	`TEXT`	ISO-8601 UTC timestamp at emission. For human display only; not monotonic across NTP corrections.
`kind`	`TEXT`	Open-string event discriminator (see taxonomy below). Indexed.
`severity`	`TEXT`	One of `info`, `warning`, `error` — enforced at write time by `ALLOWED_SEVERITIES` in `events_sink.py`.
`source`	`TEXT`	Free-form producer identifier. Adapter events use `"<adapter>:<device>"` (e.g. `"watlow:heater"`); procedure events use `"procedure:<name>"`; conductor-level events use `"engine"`; operator notes use `"operator"`.
`message`	`TEXT`	Human-readable summary. Keep typed data in `metadata_json`, not here.
`metadata_json`	`TEXT`	JSON-encoded `dict[str, Any]` (or `NULL`). Free-form per-`kind` payload — readers should `json.loads(row["metadata_json"])`.

Two indexes ship by default: idx_events_t_mono_ns (for timeline scans) and idx_events_kind (for WHERE kind = ... filters). Nothing else is indexed; if you need WHERE source = ... to be fast on a large bundle, add an index in your reader copy.
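One way to get a fast source filter without touching the sealed file is to copy the database into memory and index the copy. The sketch below assumes the bundle is small enough to hold in RAM; open_indexed_copy and idx_events_source are hypothetical names. It uses the standard-library Connection.backup to copy the whole database (existing indexes included) and opens the original read-only through a file: URI.

```python
import sqlite3
from pathlib import Path


def open_indexed_copy(bundle_db: str) -> sqlite3.Connection:
    """Copy events.sqlite into memory and add a source index to the copy only."""
    # mode=ro guarantees the sealed file on disk is never modified.
    uri = Path(bundle_db).resolve().as_uri() + "?mode=ro"
    src = sqlite3.connect(uri, uri=True)
    try:
        copy = sqlite3.connect(":memory:")
        src.backup(copy)  # copies every table and index page
    finally:
        src.close()
    copy.execute("CREATE INDEX IF NOT EXISTS idx_events_source ON events (source)")
    return copy
```

For bundles too large for memory, the same CREATE INDEX works on a file copy made with shutil.copy instead.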

The two write paths

EventsSink exposes two entry points. Most callers should use one or the other, not invent a third:

EventsSink.write_device_event(event: DeviceEvent) — the typed path. Adapters emit DeviceEvent records through their normal emission stream, and the runtime fans them into this method. The wrapper sets source = f"{event.adapter}:{event.device}" automatically, so adapter authors don't have to name themselves.
EventsSink.write(*, kind, message, severity, source, t_mono_ns, t_utc, metadata) — the generic path. Used by the conductor (source="engine"), the procedure runner (source="procedure:..."), the safety layer, and operator-note hooks. Callers are responsible for picking a meaningful source.

Both paths funnel into the same INSERT INTO events ... statement, so there is no schema split between them — write_device_event is sugar.
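To make "write_device_event is sugar" concrete, here is a sketch of the relationship between the two entry points. Everything here is a hypothetical stand-in: DeviceEventSketch approximates DeviceEvent (its real fields may differ), EventsSinkSketch is not the real EventsSink, and the local EventsSinkError subclasses ValueError only for this example. Only the shape matters: the typed path derives source and forwards to the generic path, which owns the single INSERT.

```python
from __future__ import annotations

import json
import sqlite3
from dataclasses import dataclass, field
from typing import Any

ALLOWED_SEVERITIES = frozenset({"info", "warning", "error"})


class EventsSinkError(ValueError):
    """Stand-in for the real EventsSinkError."""


@dataclass(frozen=True)
class DeviceEventSketch:
    """Hypothetical approximation of DeviceEvent."""
    adapter: str
    device: str
    kind: str
    message: str
    severity: str
    t_mono_ns: int
    t_utc: str
    metadata: dict[str, Any] = field(default_factory=dict)


class EventsSinkSketch:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def write(self, *, kind: str, message: str, severity: str, source: str,
              t_mono_ns: int, t_utc: str, metadata: dict[str, Any] | None = None) -> None:
        """Generic path: the caller picks source."""
        if severity not in ALLOWED_SEVERITIES:
            raise EventsSinkError(f"severity {severity!r} not in {sorted(ALLOWED_SEVERITIES)}")
        self._conn.execute(
            "INSERT INTO events (t_mono_ns, t_utc, kind, severity, source, message, metadata_json) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (t_mono_ns, t_utc, kind, severity, source, message,
             json.dumps(metadata) if metadata is not None else None),
        )

    def write_device_event(self, event: DeviceEventSketch) -> None:
        """Typed path: derive "<adapter>:<device>" and forward to write()."""
        self.write(
            kind=event.kind, message=event.message, severity=event.severity,
            source=f"{event.adapter}:{event.device}",
            t_mono_ns=event.t_mono_ns, t_utc=event.t_utc,
            metadata=event.metadata or None,  # an empty payload is stored as NULL
        )
```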

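Durability. The connection runs with journal_mode=WAL, synchronous=NORMAL, and isolation_level=None (autocommit), so every write() commits before returning. That is deliberate: losing an event row is the worst possible outcome for a forensic log, and with autocommit an application crash cannot lose a row that write() has already returned for. Throughput is fine for CAPA's 3–60 Hz emission envelope (events are sparse: adapter commands and procedure transitions, not per-sample data). The remaining exposure is power loss or an OS crash. In WAL mode with synchronous=NORMAL the database file always stays consistent, but the most recent commits can roll back, because the WAL is not fsynced on every commit.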
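In the standard-library sqlite3 module, those three settings look roughly like this. open_event_log is a hypothetical name, and the real sink may issue further pragmas; treat this as a sketch of the configuration, not a copy of events_sink.py.

```python
import sqlite3


def open_event_log(path: str) -> sqlite3.Connection:
    """Hypothetical helper mirroring the durability settings described above."""
    # isolation_level=None puts sqlite3 in autocommit mode: each INSERT is its
    # own transaction, committed by the time execute() returns.
    conn = sqlite3.connect(path, isolation_level=None, check_same_thread=False)
    # WAL mode is persistent: it is recorded in the database file itself.
    conn.execute("PRAGMA journal_mode=WAL")
    # NORMAL syncs at checkpoints rather than on every commit (per connection).
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn
```

A second connection, such as a live Run-tab reader, sees each row as soon as the INSERT returns, with no explicit commit().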

From the producer's perspective the call is cheap, though not fire-and-forget: write_event from the runtime context is awaited, but the cost is a single SQLite insert. Serialization happens in one writer task that owns the connection, so concurrent write_event calls from the procedure and an adapter cannot interleave. The connection is opened with check_same_thread=False, but that flag only disables sqlite3's same-thread check; the single owning writer is what guarantees ordering.
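The single-owner pattern can be sketched with asyncio: producers enqueue a request plus a future, one task drains the queue and executes each insert, and the producer's await resumes once its row is written. SingleWriter and its methods are hypothetical names; the real runtime may run SQLite on a dedicated thread rather than on the event loop, which this sketch blocks briefly per insert.

```python
import asyncio
import sqlite3


class SingleWriter:
    """Hypothetical sketch: one task owns the connection and runs every insert."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._queue: asyncio.Queue = asyncio.Queue()
        self._task = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._drain())

    async def _drain(self) -> None:
        # Requests are taken one at a time, so inserts can never interleave.
        while True:
            sql, params, done = await self._queue.get()
            try:
                self._conn.execute(sql, params)
            except Exception as exc:  # hand the failure back to the awaiting caller
                if not done.done():
                    done.set_exception(exc)
            else:
                if not done.done():
                    done.set_result(None)

    async def write_event(self, sql: str, params: tuple) -> None:
        done = asyncio.get_running_loop().create_future()
        await self._queue.put((sql, params, done))
        await done  # resumes once the writer task has executed the insert

    async def close(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass


async def demo() -> list:
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, source TEXT)")
    writer = SingleWriter(conn)
    writer.start()

    async def producer(source: str, n: int) -> None:
        for _ in range(n):
            await writer.write_event("INSERT INTO events (source) VALUES (?)", (source,))

    # A procedure and an adapter writing concurrently through one owner.
    await asyncio.gather(producer("procedure:method", 3), producer("watlow:heater", 3))
    await writer.close()
    return [s for (s,) in conn.execute("SELECT source FROM events ORDER BY id")]
```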

A third "path" worth mentioning is the runtime's RunContext.writer.write_event(...) helper. That is what the conductor and procedures actually call — it forwards into EventsSink.write(...) with source and t_mono_ns / t_utc pre-filled from the run context. Adapter authors don't need to touch it; their DeviceEvent emissions are routed through write_device_event automatically by the worker drain.

Event-kind taxonomy

The kind column is an open string, not a database enum; anyone with a writer handle can introduce a new kind. The following table catalogs every kind currently produced in-tree, grouped by producer. New procedure- and plugin-level kinds should follow the dotted-namespace convention (producer.event or producer.sub.event) so the timeline stays grep-friendly; adapter-emitted kinds can stay flat because the source column already identifies the producer.

Kind	Producer	When fired	Typical metadata
`saturation_deadline`	conductor (`engine`)	Saturation monitor escalated — see Saturation and deadlines.	`reason`, `details` (`resource_id`, `blocked_s`, `deadline_s`, or `depth` + `since_last_accept_s`)
`worker_adapter_error`	worker (`engine`)	An adapter's streaming loop raised; the worker marked itself fatal and exited.	`resource_id`, `adapter`, `error_type`
`profile.preflight.warning`	procedure runner (`profile`)	A non-blocking problem from a domain-profile preflight check.	`code`, `message`, profile-specific keys
`procedure.preflight.warning`	procedure runner (`procedure`)	A non-blocking problem from the procedure's own preflight.	`code`, `message`
`free_run.started`, `free_run.ended`	`procedure:free_run`	Bracket the Free Run lifetime.	run parameters, end reason
`batch.started`, `batch.child.started`, `batch.child.ended`, `batch.ended`	`procedure:batch`	Outer + per-child brackets for the Batch procedure.	`child_index`, `child_name`, totals at `batch.ended`
`heat_flux_tune.started` / `.holding` / `.iteration` / `.target_accepted` / `.completed` / `.aborted` / `.operator_command` / `.command.issued`	`procedure:heat_flux_tune`	State transitions, loop iterations, operator interventions, and adapter commands inside the heat-flux tuning controller.	`target_kw_m2`, `measured_kw_m2`, `setpoint_c`, iteration index, etc.
`method.step.entered`, `method.step.exited`, `method.step.failed`	`procedure:method`	Per-step brackets emitted by the method executor.	`step_index`, `step_kind`, `target`, exception text on `.failed`
`method.prompt.shown`, `method.prompt.acknowledged`, `method.prompt.unanswered`	`procedure:method`	Operator-prompt lifecycle (waiting for an acknowledgement, timed out).	`prompt_id`, `text`, `timeout_s`
`method.command.issued`, `method.wait.timeout`	`procedure:method`	Method-level adapter command dispatched / a `wait` step expired.	`target`, `value`, `unit`, `timeout_s`
`set_setpoint`, `hold`, `ramp`, `safe_shutdown`	`procedure:method` (via `_command_setpoint(kind=…)`)	The four shapes a setpoint command can take from a method step.	`target`, `value`, `unit`
`set_setpoint`, `write_parameter`, `set_display_units`, `unit_mismatch`	`watlow:<device>`	Setpoint / EEPROM-parameter writes and the unit-sync diagnostic from the Watlow adapter.	`register`, `value`, `units`
`set_setpoint`, `set_gas`, `set_units`, `tare_flow`, `hold_valves`, `hold_valves_closed`, `cancel_valve_hold`, `totalizer_reset`, `lock_display`, `unlock_display`	`alicat:<device>`	The MFC command surface (see Alicat adapter). One row per accepted command.	`setpoint`, `gas`, `units`, command-specific arguments
`tare`, `zero`, `internal_adjust`, `set_filter_mode`, `set_display_unit`, `set_auto_zero`, `save_menu`, `reload_menu`	`sartorius:<device>`	Balance commands from the Sartorius adapter.	command-specific arguments
`recording_started`, `recording_stopped`, `nuc_triggered`	`flir:<device>` / `webcam:<device>`	Camera lifecycle and (IR-only) non-uniformity correction trigger.	`path`, `fps`, `format`
`pump_warning`, `open_retry`	`webcam:<device>`	Webcam frame-pump warning or device open retry.	`attempt`, `error`

The taxonomy is producer-namespaced by convention: procedure-emitted kinds use dots (method.step.entered, heat_flux_tune.iteration), adapter-emitted kinds tend to be flat (set_setpoint, tare) because the producer is already disambiguated by the source column. When you filter, combine both — WHERE kind = 'set_setpoint' AND source LIKE 'watlow:%' is unambiguous; kind = 'set_setpoint' alone matches the method executor, the Watlow adapter, and the Alicat adapter.
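Because the same flat kind can come from several producers, it helps to always pair the kind with a source prefix when querying from Python. A sketch with hypothetical helper names: it escapes LIKE wildcards so an adapter name containing "_" or "%" matches literally. Note that SQLite's LIKE is case-insensitive for ASCII characters by default.

```python
import sqlite3
from typing import Optional


def _like_literal(text: str) -> str:
    """Escape LIKE wildcards so "_" or "%" in an adapter name matches literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def events_of_kind(conn: sqlite3.Connection, kind: str,
                   adapter: Optional[str] = None) -> list:
    """Rows of one kind, optionally restricted to sources "<adapter>:<device>"."""
    sql = "SELECT t_mono_ns, source, message FROM events WHERE kind = ?"
    params = [kind]
    if adapter is not None:
        sql += " AND source LIKE ? ESCAPE '\\'"
        params.append(_like_literal(adapter) + ":%")
    sql += " ORDER BY t_mono_ns, id"
    return conn.execute(sql, params).fetchall()
```

events_of_kind(conn, "set_setpoint", "watlow") is the unambiguous form; events_of_kind(conn, "set_setpoint") returns the method executor's, the Watlow's, and the Alicat's rows together.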

`status.sqlite` — schema

The companion file is status.sqlite. Its DDL:

```sql
CREATE TABLE IF NOT EXISTS status (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    adapter     TEXT    NOT NULL,
    device      TEXT    NOT NULL,
    t_mono_ns   INTEGER NOT NULL,
    t_utc       TEXT    NOT NULL,
    health      TEXT    NOT NULL,
    fields_json TEXT
);
CREATE INDEX IF NOT EXISTS idx_status_device ON status (adapter, device, t_mono_ns);
```

One row per DeviceSnapshot. Cadence is whatever the adapter chooses; the in-tree adapters emit at ~1 Hz, driven by their snapshot timer rather than the sample rate. The (adapter, device, t_mono_ns) index makes per-device time slicing fast.
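A common first question is "what did each device look like at the end?". The sketch below returns the health of every device's most recent snapshot; latest_health is a hypothetical name. The (adapter, device, t_mono_ns) index should serve both the grouped MAX and the join back to the full row.

```python
import sqlite3

LATEST_PER_DEVICE = """
SELECT s.adapter, s.device, s.t_mono_ns, s.health
FROM status AS s
JOIN (
    SELECT adapter, device, MAX(t_mono_ns) AS t_last
    FROM status
    GROUP BY adapter, device
) AS last
  ON s.adapter = last.adapter
 AND s.device = last.device
 AND s.t_mono_ns = last.t_last
ORDER BY s.adapter, s.device
"""


def latest_health(conn: sqlite3.Connection) -> dict:
    """Map (adapter, device) to the health of that device's newest snapshot."""
    return {
        (adapter, device): health
        for adapter, device, _t, health in conn.execute(LATEST_PER_DEVICE)
    }
```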

The health column is the tri-state pill that drives the status-bar widgets:

Value	Meaning
`ok`	Adapter is producing samples on schedule, no recent retries, no comm errors.
`degraded`	Transient trouble — auto-reconnect counter > 0 within the snapshot window, or one-off late samples. The adapter is still emitting, but something needed retrying.
`down`	Connection is lost or the producer has gone silent. No samples are flowing.

fields_json is the adapter-specific health payload: firmware version, alarm bits, comm latency, frame counters, valve drive percentages. The shape is per-adapter — json.loads it, then inspect.

Same WAL + NORMAL durability story as events.sqlite. Snapshots are less critical (latest-value semantics; the producer queue drops old rows under pressure) but the connection setup is symmetric for consistency.

Status vs. events: when to use which

Use the right file for the question:

Question	File	Query shape
"Did the operator press abort?"	`events.sqlite`	`WHERE kind IN ('free_run.ended', 'heat_flux_tune.aborted')`, then read the end reason from `metadata_json`; for method runs, `WHERE kind LIKE 'method.step.%'`
"Did a saturation deadline trip?"	`events.sqlite`	`WHERE kind = 'saturation_deadline'`
"What setpoints did the Watlow receive?"	`events.sqlite`	`WHERE source LIKE 'watlow:%' AND kind = 'set_setpoint'`
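"Was the Alicat reachable at t = 120 s?"	`status.sqlite`	`WHERE adapter = 'alicat' AND device = ? AND t_mono_ns <= ? ORDER BY t_mono_ns DESC LIMIT 1`; `t_mono_ns` is a raw clock reading, so bind the second `?` to your run-start `t_mono_ns` (for example, the first event's) plus 120 × 10⁹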
"What was the Watlow comm-error rate during the run?"	`status.sqlite`	`WHERE adapter = 'watlow'` → unpack `fields_json`
"Did any device go `degraded` or `down`?"	`status.sqlite`	`WHERE health != 'ok'`
"What was the operator told (and did they ack it)?"	`events.sqlite`	`WHERE kind LIKE 'method.prompt.%'`

A rough heuristic: events answer "what happened?", status answers "what was the state?"
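The "was it reachable at time t?" question from the table is an as-of lookup, which the (adapter, device, t_mono_ns) index answers directly. A sketch: health_at is a hypothetical name, and max_age_ns is an assumed staleness threshold. With snapshots at about 1 Hz, a newest snapshot several seconds old suggests the producer had already gone silent, so the sketch returns None rather than a misleading stale value.

```python
import sqlite3
from typing import Optional, Tuple


def health_at(conn: sqlite3.Connection, adapter: str, device: str,
              t_mono_ns: int, max_age_ns: int = 5_000_000_000) -> Optional[Tuple[int, str]]:
    """Newest snapshot at or before t_mono_ns, or None if none is recent enough."""
    row = conn.execute(
        "SELECT t_mono_ns, health FROM status "
        "WHERE adapter = ? AND device = ? AND t_mono_ns <= ? "
        "ORDER BY t_mono_ns DESC LIMIT 1",
        (adapter, device, t_mono_ns),
    ).fetchone()
    if row is None or t_mono_ns - row[0] > max_age_ns:
        return None
    return row[0], row[1]
```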

Recipes

Reconstruct the Run-tab timeline

```sql
SELECT t_mono_ns, kind, severity, source, message
FROM events
ORDER BY t_mono_ns, id;  -- id breaks ties between rows stamped in the same nanosecond
```

That single query produces the same chronological story the Run tab paints during the run.
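Raw t_mono_ns values are large monotonic-clock readings, not seconds since the run began, which makes a printed timeline hard to scan. A sketch that rebases them onto the first event; relative_timeline is a hypothetical name, and using the first event as the origin is an assumption for readability (the Run tab may use a different origin).

```python
import sqlite3


def relative_timeline(conn: sqlite3.Connection) -> list:
    """Events in emission order, with time as seconds after the first event."""
    rows = conn.execute(
        "SELECT t_mono_ns, kind, severity, source, message "
        "FROM events ORDER BY t_mono_ns, id"
    ).fetchall()
    if not rows:
        return []
    t0 = rows[0][0]
    return [((t - t0) / 1e9, kind, severity, source, message)
            for t, kind, severity, source, message in rows]
```

Print it with for row in relative_timeline(conn): print(f"{row[0]:9.3f}s", *row[1:]).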

Find every command issued by the procedure

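```sql
SELECT t_mono_ns, kind, source, message, metadata_json
FROM events
WHERE (kind LIKE 'method.command.%'
       OR kind IN ('set_setpoint', 'hold', 'ramp', 'safe_shutdown'))
ORDER BY t_mono_ns;
```

Append AND source = 'procedure:method' after the parenthesized condition if you want to exclude the same kinds when they originate from an adapter (those rows are the resulting writes, not the procedure's intent). The parentheses matter: without them, AND binds tighter than OR, and the source filter would apply only to the second branch.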

Cross-reference an event to channel samples via mono time

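```python
import sqlite3

import polars as pl

RUN_DIR = "runs/<run_id>"  # substitute the bundle directory
WINDOW_NS = 5_000_000_000  # 5 s on either side of the event

conn = sqlite3.connect(f"{RUN_DIR}/events.sqlite")
row = conn.execute(
    "SELECT t_mono_ns FROM events WHERE kind = 'free_run.ended' "
    "ORDER BY t_mono_ns LIMIT 1"
).fetchone()
conn.close()
if row is None:
    raise SystemExit("no free_run.ended event in this bundle")
end_t = row[0]

# Keep every channel sample stamped within the window around the event.
df = pl.read_parquet(f"{RUN_DIR}/scalars.parquet").filter(
    pl.col("t_mono_ns").is_between(end_t - WINDOW_NS, end_t + WINDOW_NS)
)
```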

The 5-second window on either side captures whatever channel samples bracket the event. See Reading a bundle for the longer treatment, including how to join against per-adapter device_records/*.parquet files.
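When the question is "what was the channel reading when this event fired?", an as-of match (the last sample at or before the event) is often more precise than a window. polars offers this vectorised as DataFrame.join_asof; the stdlib sketch below shows the same idea for a single event with bisect. sample_index_at is a hypothetical name, and the timestamps must be sorted ascending, as a t_mono_ns column read in emission order would be.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def sample_index_at(sample_ts: Sequence[int], t_event: int) -> Optional[int]:
    """Index of the last sample stamped at or before t_event, or None if none."""
    # bisect_right returns the insertion point after any equal timestamps,
    # so the element just before it is the newest sample with t <= t_event.
    i = bisect_right(sample_ts, t_event)
    return i - 1 if i > 0 else None


ts = [1_000, 2_000, 3_000]
print(sample_index_at(ts, 2_500))  # 1
```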

Decode an event's `metadata_json` column

```python
import json
import sqlite3

conn = sqlite3.connect("runs/<run_id>/events.sqlite")
conn.row_factory = sqlite3.Row
for row in conn.execute(
    "SELECT t_mono_ns, kind, message, metadata_json FROM events "
    "WHERE kind = 'heat_flux_tune.iteration' ORDER BY t_mono_ns"
):
    meta = json.loads(row["metadata_json"]) if row["metadata_json"] else {}
    print(row["t_mono_ns"], meta.get("setpoint_c"), meta.get("measured_kw_m2"))
```

Always guard the json.loads with a None check: metadata_json is nullable, and rows with no payload come back as None in Python.
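If you decode metadata in several places, a small helper keeps that guard in one spot. A sketch; the name decode_metadata and the choice to wrap undecodable or non-object payloads under "_undecodable" / "_value" keys instead of raising are assumptions, picked so that one bad row does not abort a whole timeline scan.

```python
from __future__ import annotations

import json
from typing import Any


def decode_metadata(raw: str | None) -> dict[str, Any]:
    """Decode a metadata_json cell into a dict, tolerating NULL and bad payloads."""
    if raw is None or raw == "":
        return {}
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return {"_undecodable": raw}
    # The column is documented as a JSON object; wrap anything else.
    return value if isinstance(value, dict) else {"_value": value}
```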

Build a per-device health timeline from `status.sqlite`

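```python
import json
import sqlite3

import polars as pl

conn = sqlite3.connect("runs/<run_id>/status.sqlite")
rows = conn.execute(
    "SELECT adapter, device, t_mono_ns, health, fields_json "
    "FROM status WHERE adapter = 'watlow' ORDER BY device, t_mono_ns"
).fetchall()
conn.close()

df = pl.DataFrame(
    [
        {
            "device": device,  # keeps devices apart when one adapter drives several
            "t_mono_ns": t_mono_ns,
            "health": health,
            # fields_json is nullable; spread its adapter-specific keys into columns.
            **(json.loads(fields_json) if fields_json else {}),
        }
        for _adapter, device, t_mono_ns, health, fields_json in rows
    ],
    # Infer the schema from every row, not just the first 100, since keys can vary.
    infer_schema_length=None,
)
```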

The fields_json keys are adapter-specific — open one row's payload first to see what's available before assuming a column exists.
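Rather than opening a single row, you can survey every payload for an adapter and count how often each key appears, which also tells you whether a key is present in all snapshots or only some. A sketch; fields_json_keys is a hypothetical name.

```python
import json
import sqlite3
from collections import Counter


def fields_json_keys(conn: sqlite3.Connection, adapter: str) -> Counter:
    """Count how many snapshots of one adapter carry each fields_json key."""
    counts: Counter = Counter()
    for (raw,) in conn.execute("SELECT fields_json FROM status WHERE adapter = ?", (adapter,)):
        if raw:  # fields_json is nullable
            payload = json.loads(raw)
            if isinstance(payload, dict):
                counts.update(payload.keys())
    return counts
```

For example, print(fields_json_keys(conn, "watlow").most_common()) lists keys from most to least frequent.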

`saturation_deadline` — the must-know event

When a row with kind = 'saturation_deadline' appears in events.sqlite, the bundle is in the crashed_but_sealed outcome state. The conductor's SaturationMonitor tripped, the run short-circuited through the normal shutdown path, and the bundle was sealed despite the failure. source is engine, severity is error, message carries the trip reason, and metadata_json decodes to a details blob with resource_id, blocked_s, and deadline_s (or the writer-stall analogues depth and since_last_accept_s).

The write is best-effort because the writer itself may be the wedged component — if you see crashed_but_sealed in the manifest but no saturation_deadline row in events.sqlite, check run.log for conductor.saturation_event_write_failed. The full mechanism (signals, tuning, what happens to adapter stop()) is documented in Saturation and deadlines.
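When triaging a crashed_but_sealed bundle, a first step is to pull the trip row out of events.sqlite. A sketch; find_saturation_trip is a hypothetical name. It returns the decoded metadata without reshaping it, since the taxonomy table lists reason and details as the top-level keys, and returns None when the row is missing, which is the case that sends you to run.log.

```python
import json
import sqlite3
from typing import Any, Dict, Optional


def find_saturation_trip(conn: sqlite3.Connection) -> Optional[Dict[str, Any]]:
    """Return the first saturation_deadline row, decoded, or None if absent."""
    row = conn.execute(
        "SELECT t_mono_ns, source, severity, message, metadata_json FROM events "
        "WHERE kind = 'saturation_deadline' ORDER BY t_mono_ns, id LIMIT 1"
    ).fetchone()
    if row is None:
        # Best-effort write may have failed: look for
        # conductor.saturation_event_write_failed in run.log.
        return None
    t_mono_ns, source, severity, message, raw = row
    return {
        "t_mono_ns": t_mono_ns,
        "source": source,
        "severity": severity,
        "message": message,
        "metadata": json.loads(raw) if raw else {},
    }
```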

Implementation notes for contributors

A few details that matter when you're adding events or writing a reader:

The kind column is an open string, not an enum. Anyone calling write_event(kind=…) adds to the taxonomy. The catalog above is descriptive, not prescriptive — a third-party procedure plugin can introduce myplugin.calibration.started and nothing breaks.
Namespace by producer. Procedure-level kinds use dotted namespaces (heat_flux_tune.iteration, method.prompt.shown); adapter-level kinds use flat names because source already carries the producer identity. New kinds should follow whichever convention matches their producer so timeline filters stay predictable.
metadata_json is the typed payload. Readers json.loads(row["metadata_json"]) and dig in. Don't shove machine-readable values into message; keep message human-readable for the Run tab and the operator handbook. Payloads are tiny (a few hundred bytes), so there's no pressure to pack.
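WAL means a mid-run bundle has an events.sqlite-wal file (and an events.sqlite-shm shared-memory file) next to the database. Don't be surprised by them. EventsSink.close() runs PRAGMA wal_checkpoint(TRUNCATE) at finalize, which copies the WAL contents back into the main file and truncates the WAL; closing the last connection then removes both sidecars. Status sink does the same.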
Two SQLite files exist precisely so a 1 Hz status stream cannot drown the event view. Don't merge them, and don't add high-rate diagnostic emissions to events.sqlite. If you have a new low-rate health signal, extend DeviceSnapshot.fields and let it ride on status.sqlite.
Severity is enforced. ALLOWED_SEVERITIES = frozenset({"info", "warning", "error"}) — passing anything else to write() raises EventsSinkError. The DeviceEvent.severity field already constrains it at the typed boundary, so the runtime path is safe, but generic write() callers should pick from the set.
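t_mono_ns is the cross-file join key. t_utc is for humans only; it can jump backwards across NTP corrections. Every other bundle file (scalars.parquet, device_records/*.parquet, video/*.frames.parquet) carries the same monotonic clock, so timestamps compare exactly across files. Events and samples are rarely stamped in the same nanosecond, though, so join with a time window or an as-of match rather than equality.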
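A quick check for whether finalize completed is to look for leftover sidecars next to each database. A sketch; wal_sidecars is a hypothetical name. If a -wal file remains, do not delete it: it can hold committed rows that were never checkpointed. Copy the database and its sidecars together and let SQLite replay the WAL when the copy is opened.

```python
from pathlib import Path
from typing import List


def wal_sidecars(db_path: str) -> List[Path]:
    """Return any -wal / -shm files still sitting next to a SQLite database."""
    db = Path(db_path)
    candidates = (db.with_name(db.name + "-wal"), db.with_name(db.name + "-shm"))
    return [p for p in candidates if p.exists()]
```

Running it on both events.sqlite and status.sqlite after a run should return empty lists when EventsSink.close() and the status sink's close completed.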