File formats
Audience: tool authors. Scope: every on-disk format capa reads or writes, with a pointer to its schema documentation.
Configuration files
| Format | Extension(s) | Where it appears | Schema doc |
|---|---|---|---|
| YAML | `.yaml`, `.yml` | Experiment recipes (`configs/experiments/*.yaml`). The recommended outer format. | Experiment YAML |
| TOML | `.toml` | Experiment recipes (alternative); hardware profiles; methods; calibration sets; profile defaults; the optional `plugins.lock`. | Configuration overview |
| TOML | `*.method.toml` | Method files. | Method TOML |
| TOML | `configs/hardware/*.toml` | Hardware profiles. | Hardware TOML |
| TOML | `configs/calibrations/<family>/*.toml` | Calibration sets. | Calibrations on disk |
| TOML | `plugins.lock` | Trusted-plugin manifest. | Plugin lockfile |
YAML parsing uses ruamel.yaml; TOML uses tomllib (read) and
tomli_w (write). Canonical-form writers live in
capa.config.canonical.
Bundle artifacts
Every artifact below lives inside a sealed bundle directory. The top-level directory tour is in What's in a bundle.
| Format | Filename(s) | Contents | Schema doc |
|---|---|---|---|
| JSON | `manifest.json` | The bundle index. The single source of truth. | Manifest and schema |
| Plain text | `manifest.sha256` | sha256sum-compatible integrity hash file. Sealed-bundle signal. | Integrity and sealing |
| Parquet | `scalars.parquet` | Normalized per-channel samples (long format). | Channel samples parquet |
| Parquet | `device_records/<adapter>.parquet` | Library-native rows, one file per adapter family. | Device records parquet |
| SQLite | `events.sqlite` | Procedure / safety / device-event log. | Events SQLite |
| SQLite | `status.sqlite` | Periodic per-device health snapshots. | Events SQLite |
| JSON lines | `run.log` | Structlog output tee'd from the engine logger. One JSON object per line. | What's in a bundle |
| Video | `video/<name>.mkv` | Visible-camera container (PyAV / Matroska, libx264 by default). | Video |
| Video | `video/<name>.csq` | FLIR IR raw container. | Video |
| Parquet | `video/<name>.frames.parquet` | Frame-index sidecar (frame # ↔ monotonic ns). One per camera. | Video |
| JSON | `video/<name>.csq.meta.json` | IR-only metadata sidecar (palette, radiometric kit, calibration). | Video |
| TOML | `config.toml` | Frozen `ExperimentConfig` (canonical TOML form). | Experiment YAML |
| TOML | `method.toml` | Frozen `Method` (present only when one was loaded). | Method TOML |
| TOML | `profiles/<short_id>.toml` | Frozen domain-profile metadata. | CAPA profile fields |
| TOML | `equipment.toml` | What was actually opened (firmware, serial #s). | What's in a bundle |
| JSON | `calibration.json` | `CalibrationSet` reference snapshot. | Calibrations on disk |
| TOML | `env/uv.lock` | Verbatim copy of the lockfile at run-start. Hash recorded in `manifest.json` under `lockfile.sha256`. | What's in a bundle |
| JSON | `env/packages.json` | Installed distribution metadata gathered by `gather_provenance`. | What's in a bundle |
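Because `run.log` holds one JSON object per line, a tool can read it with nothing more than the standard `json` module. The sketch below is a minimal reader run on inline sample text. The field names `event` and `level` are assumptions based on common structlog setups, not a documented capa schema, and skipping unparseable lines is a design choice of this sketch rather than documented capa behaviour.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator

# Hypothetical log excerpt. Field names are assumed, not taken from capa.
SAMPLE_LOG = """\
{"event": "run_started", "level": "info"}
{"event": "device_opened", "level": "info", "device": "tc1"}
{"event": "setpoint_timeout", "level": "warning"}
{"event": "truncated
"""


def iter_records(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per well-formed JSON line, skipping blank or broken lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # For example, a final line cut short by a crash mid-write.
            continue
        if isinstance(record, dict):
            yield record


level_counts = Counter(
    record.get("level", "unknown")
    for record in iter_records(SAMPLE_LOG.splitlines())
)
print(dict(level_counts))
```

For a real bundle, passing an open file object (`open(path, encoding="utf-8")`) to `iter_records` works the same way, since iterating a text file yields lines.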
In-flight (transient) artifacts
These exist during a run and are rewritten by finalize:
| Format | Filename pattern | Becomes | Notes |
|---|---|---|---|
| Arrow IPC | `scalars.in-flight.arrows` | `scalars.parquet` | Append-optimised channel-sample stream. |
| Arrow IPC | `device_records/<adapter>.in-flight.arrows` | `device_records/<adapter>.parquet` | One per family. |
| Arrow IPC | `video/<name>.in-flight.arrows` | `video/<name>.frames.parquet` | Frame-index sidecar (the container is written directly to its final path). |
Arrow IPC has no compression-level knob, so the in-flight codec is
just a name (e.g. "zstd"). Final Parquet codecs may carry a level
(e.g. "zstd:6").
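The two codec spellings can be handled by a single small parser. The sketch below is one way a tool might split and validate them; `parse_codec` and `in_flight_codec` are hypothetical helper names and are not part of capa.

```python
from typing import Optional, Tuple


def parse_codec(spec: str) -> Tuple[str, Optional[int]]:
    """Split "name" or "name:level" into (name, level); level is None if absent."""
    name, sep, level = spec.partition(":")
    if not name:
        raise ValueError(f"codec name missing in {spec!r}")
    if not sep:
        return name, None
    # int() raises ValueError if the level is empty or not numeric.
    return name, int(level)


def in_flight_codec(spec: str) -> str:
    """Accept a bare codec name only, because the in-flight codec takes no level."""
    name, level = parse_codec(spec)
    if level is not None:
        raise ValueError(f"in-flight codec {spec!r} must not carry a level")
    return name


final = parse_codec("zstd:6")       # final Parquet codec with a level
streaming = in_flight_codec("zstd")  # in-flight codec, name only
print(final, streaming)
```

Rejecting a level on the in-flight side, rather than silently dropping it, is a choice made in this sketch so that a misconfigured recipe fails early.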
Calibration tune artifacts
The heat-flux tune procedure emits tune-artifact TOML files outside the
bundle directory, under configs/calibrations/flux/:
| Format | Filename pattern | Contents | Schema doc |
|---|---|---|---|
| TOML | `capa_flux_YYYY-MM-DD.toml` | One tune session's per-target flux/setpoint pairs plus fit metadata. | Tune artifacts |
| TOML | `latest.toml` | Symlink-equivalent pointer to the most recent artifact. | Tune artifacts |
Subsequent runs read these via calibration_set: name = flux (with an
optional revision: pin).
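The dated filename pattern also lets external tools find the newest tune artifact without going through `latest.toml`. The sketch below shows one way to do that from a list of filenames. The helper names are hypothetical, and capa's own resolution logic (including how a revision pin is matched) is not described here, so this is not a statement of how capa does it.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Matches capa_flux_YYYY-MM-DD.toml and captures year, month, and day.
_ARTIFACT_RE = re.compile(r"^capa_flux_(\d{4})-(\d{2})-(\d{2})\.toml$")


def artifact_date(filename: str) -> Optional[date]:
    """Return the date encoded in a tune-artifact filename, or None if it does not fit."""
    match = _ARTIFACT_RE.match(filename)
    if match is None:
        return None
    try:
        return date(*(int(part) for part in match.groups()))
    except ValueError:
        # Right shape but not a real calendar date, e.g. month 13.
        return None


def newest_artifact(filenames: Iterable[str]) -> Optional[str]:
    """Pick the filename with the latest valid date, or None if there is none."""
    dated = [(d, name) for name in filenames if (d := artifact_date(name)) is not None]
    return max(dated)[1] if dated else None


names = [
    "capa_flux_2024-01-05.toml",
    "latest.toml",
    "capa_flux_2024-03-17.toml",
    "capa_flux_2024-13-40.toml",  # invalid date, ignored
]
newest = newest_artifact(names)
print(newest)
```

In practice the filename list could come from `Path("configs/calibrations/flux").glob("capa_flux_*.toml")`, passing each path's `.name`.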
Format choice rationale
A short note on why each format ended up where it did:
- YAML for experiment recipes: nested metadata (specimen, atmosphere, operator) reads better than TOML.
- TOML for hardware, method, and calibration files: heavy use of array-of-tables (`[[devices]]`, `[[channels]]`, `[[steps]]`) reads better in TOML than YAML lists of mappings.
- JSON for `manifest.json`, `equipment.toml`'s eventual sibling blocks, and the camera sidecars: machine-targeted, no comments needed, every language has a fast parser.
- Parquet for sample / record streams: columnar, compressed, and every analyst's preferred input. Row groups are large (`parquet_final_row_group_rows = 262144` by default) for scan efficiency.
- SQLite for events and status: transactional appends, queryable in place, robust to mid-run crashes.
- Arrow IPC for in-flight: append cost matters more than read cost during acquisition.
- sha256sum-compatible for `manifest.sha256`: readable by every OS's command-line verifier without bespoke tooling.
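To illustrate the last point, a sha256sum-style file is simple enough to check in a few lines of Python when no command-line verifier is at hand. Each line holds a hex digest, a space, a mode flag (a space or `*`), and a filename. The sketch below assumes the listed names are relative to the bundle directory, which is not stated on this page; the helper names are hypothetical, and escaped filenames (lines starting with a backslash) are not handled.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List, Tuple


def parse_sha256sum(text: str) -> List[Tuple[str, str]]:
    """Parse sha256sum-style lines into (hex_digest, filename) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, _, rest = line.partition(" ")
        # The first character of `rest` is the mode flag: " " (text) or
        # "*" (binary). Drop it to get the filename.
        entries.append((digest.lower(), rest[1:]))
    return entries


def failed_entries(bundle_dir: Path, hash_file: str = "manifest.sha256") -> List[str]:
    """Return the listed names whose current SHA-256 differs from the recorded one."""
    text = (bundle_dir / hash_file).read_text(encoding="utf-8")
    failures = []
    for expected, name in parse_sha256sum(text):
        actual = hashlib.sha256((bundle_dir / name).read_bytes()).hexdigest()
        if actual != expected:
            failures.append(name)
    return failures


# Demo on a throwaway directory standing in for a bundle.
with tempfile.TemporaryDirectory() as tmp:
    bundle = Path(tmp)
    payload = b'{"example": true}\n'
    (bundle / "manifest.json").write_bytes(payload)
    digest = hashlib.sha256(payload).hexdigest()
    (bundle / "manifest.sha256").write_text(f"{digest}  manifest.json\n", encoding="utf-8")

    intact = failed_entries(bundle)    # file unchanged since hashing
    (bundle / "manifest.json").write_bytes(b"tampered\n")
    tampered = failed_entries(bundle)  # recorded digest no longer matches

print(intact, tampered)
```

On systems with GNU coreutils, running `sha256sum -c manifest.sha256` from inside the bundle directory performs the same kind of check.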
See also
- What's in a bundle — directory tour.
- Configuration overview — how config files compose.
- Bundle versioning — schema-bump policy across formats.