Troubleshooting¶
A checklist for the common failure modes when opening, polling, or streaming from Alicat hardware. The library raises typed exceptions for each layer, so most troubleshooting is "read the exception type, then check the bullet below."
Source: errors.py, config.py, devices/discovery.py.
Finding the port¶
from alicatlib import list_serial_ports
list_serial_ports()
# ["/dev/ttyUSB0", "/dev/ttyUSB1", "/dev/serial/by-id/usb-FTDI..."]
list_serial_ports() wraps
anyserial.list_serial_ports(). On Linux, prefer the stable
/dev/serial/by-id/usb-FTDI_... path over /dev/ttyUSB0 — the latter
renumbers on reboot if multiple converters are plugged in. The manager
canonicalises both forms to the same port identity (os.path.realpath
on POSIX, uppercased \\.\ prefix strip on Windows), so a port passed
either way shares its protocol client.
If list_serial_ports() misses a device that's clearly plugged in,
install the pyserial-backed fallback:
Native anyserial enumeration relies on platform-specific APIs; the
pyserial fallback is broader but heavier. Both return the same path
strings.
Permissions (Linux / macOS)¶
On Linux the default /dev/ttyUSB* device is owned root:dialout
with mode 660. If open_device raises
AlicatConnectionError: [Errno 13] Permission denied: '/dev/ttyUSB0':
On some distributions the group is uucp (Arch) or tty. Check with
ls -l /dev/ttyUSB0.
On macOS the default device is /dev/tty.usbserial-* and is
world-accessible; no permission changes are needed.
Discovering devices on a bus¶
A mixed fleet with unknown baud rates / unit ids:
from alicatlib import find_devices, DEFAULT_DISCOVERY_BAUDRATES
results = await find_devices(
ports=None, # None = all available ports
unit_ids=("A", "B", "C"),
baudrates=DEFAULT_DISCOVERY_BAUDRATES, # (19200, 115200) by default
timeout=0.2,
)
for r in results:
if r.ok:
print(r.info.model, r.info.firmware, r.port, r.baudrate, r.unit_id)
find_devices runs
probe over the cross-product
of ports × unit_ids × baudrates, bounded by a CapacityLimiter
(default 8 concurrent opens). Individual probe failures never raise
from find_devices — every combination produces a
DiscoveryResult, and
the caller filters on result.ok.
Default baudrates are (19200, 115200) — Alicat factory default plus
the most common alternative after NCB. Widen via the baudrates
kwarg when you have devices at other rates.
Timeouts¶
AlicatTimeoutError means a transport-level I/O timeout expired.
Timeouts are distinct from empty successful responses — a missing
reply is never silently represented as nothing.
Tune defaults via AlicatConfig or
per-call timeout= kwargs (available on every I/O boundary —
transport,
protocol client,
session,
factory,
discovery,
manager):
| Setting | Default | When to bump |
|---|---|---|
default_timeout_s |
0.5 s | USB-to-RS485 converter with high buffering latency (PL2303, CH340), slow devices on busy RS-485 buses. |
multiline_timeout_s |
1.0 s | ??M* / ??D* / gas list on slow devices — the table commands are paced at device speed across 5–20 lines. |
write_timeout_s |
0.5 s | Writes can block on RS-485 hardware flow control, a hung device, or a TCP transport's send buffer. |
from alicatlib import AlicatConfig, open_device
config = AlicatConfig(default_timeout_s=1.0)
async with open_device("/dev/ttyUSB0", timeout=1.0) as dev:
...
Timeouts can also be set via environment variables — see
config_from_env. Recognised keys all
prefix with ALICATLIB_: DEFAULT_TIMEOUT_S, MULTILINE_TIMEOUT_S,
WRITE_TIMEOUT_S, DEFAULT_BAUDRATE, DRAIN_BEFORE_WRITE,
SAVE_RATE_WARN_PER_MIN, EAGER_TASKS.
If p50 latency creeps near the timeout¶
Benchmark the single-line round-trip:
See benchmarks.md for the reference table. The baseline is p50 ~4 ms on PL2303 / 8v17 / 115200. If you're at 400 ms p50, the timeout needs to go up before it starts swallowing legitimate replies.
Stale input on the bus¶
If a prior process crashed mid-command or left a device streaming,
the bus has stale bytes that confuse the next command.
open_device(..., recover_from_stream=True)
(default) handles the streaming case: the factory passively sniffs
for ~100 ms, and if bytes arrive it writes the stop-stream sequence
and drains before VE runs. See streaming.md.
For non-streaming stale bytes, set
AlicatConfig.drain_before_write=True. The protocol client drains
any stale input before each command; this adds latency but
re-synchronises after a timeout or a partial reply.
Identification failures¶
AlicatConfigurationError: ... model_hint is required¶
The device is GP-family or pre-8v28 numeric firmware, so ??M*
isn't available. Supply model_hint:
See devices.md §Escape hatches.
AlicatMediumMismatchError¶
A command's declared medium doesn't intersect the device's configured
medium. Typical cause: calling .gas(...) on a CODA device configured
for liquid, or .fluid(...) on a gas-only device. The error carries
ErrorContext.device_media and ErrorContext.command_media so you
can see the exact mismatch.
Remediation: if your device actually supports both media but is
configured for one, narrow via
assume_media=Medium.GAS | Medium.LIQUID at open time. The factory
replaces (not unions) the prefix-derived media when assume_media
is passed.
AlicatMissingHardwareError¶
The device lacks a capability the command requires. Error message
names the missing Capability flag (e.g. BAROMETER,
ANALOG_OUTPUT, DISPLAY). Check DeviceInfo.capabilities to see
what the factory probed:
async with open_device(port) as dev:
print(dev.info.capabilities)
print(dev.info.probe_report) # per-flag outcomes
If the probe returned "timeout" or "rejected" on a capability you
know the device has, opt in via assume_capabilities= — see
devices.md §Escape hatches. This is
common for TAREABLE_ABSOLUTE_PRESSURE where no safe probe exists.
AlicatFirmwareError¶
The firmware is outside the command's supported range. The error
carries actual, required_min, required_max, and
required_families so the remediation is obvious — usually "this
device is too old for this command". See
commands.md for the per-command firmware gates.
Unit-ID and baud changes¶
Changing unit id (ADDR) or baud (NCB) is destructive and leaves
the session explicitly broken on success. The next call on the stale
session raises AlicatError with an explanation; re-open with the new
settings:
async with open_device("/dev/ttyUSB0", unit_id="A") as dev:
await dev.change_unit_id("B", confirm=True)
# old session is broken here — re-open on the new id
async with open_device("/dev/ttyUSB0", unit_id="B") as dev:
...
This is intentional. Pretending the session still works would let a command interleave with a half-committed reconciliation and produce silent data corruption.
Streaming-mode fast-fails¶
While a StreamingSession is active on a port, every
request/response command on any session sharing that client fails
fast with AlicatStreamingModeError. Typical cause: trying to
poll() from a second task while the first task holds a
StreamingSession.
One streamer per port is a hard invariant. Either fold the second
consumer into the streaming session's iterator, or use record()
instead of streaming (see streaming.md §Streaming vs. record()).
Sink errors¶
Sink failures are typed under AlicatSinkError:
| Error | Cause |
|---|---|
AlicatSinkDependencyError |
Optional extra not installed. Error message names the extra (alicatlib[parquet] / alicatlib[postgres]). |
AlicatSinkSchemaError |
Batch shape incompatible with the sink's locked schema. Unknown optional columns drop with a WARN log — they don't raise. |
AlicatSinkWriteError |
Backing store rejected the write. Wraps the underlying driver exception (sqlite3 / asyncpg / pyarrow) so handlers don't need to import optional deps. |
See logging.md for sink setup.
EEPROM-wear warnings¶
Every command carrying save=True (active gas, PID gains, deadband,
batch, valve offset, totalizer save, setpoint source, ...) triggers
an EEPROM write on the device. Repeated saves wear the cell; the
library logs a WARN on the alicatlib.session logger when saves
exceed AlicatConfig.save_rate_warn_per_min per minute per device
(default: 10).
If you need to loop a setpoint rapidly, pass save=False — the
setting still takes effect for the current power cycle but doesn't
hit EEPROM.
Getting raw wire bytes¶
The protocol client emits one tx and one rx DEBUG event per
write / read on the alicatlib.protocol logger, with structured
{direction, raw, len} extras. Enable at your root handler:
import logging
logging.basicConfig(level=logging.DEBUG, format="%(name)s %(message)s")
logging.getLogger("alicatlib.protocol").setLevel(logging.DEBUG)
The raw extra carries the bytes verbatim; credentials never appear
in log lines (scrubbed at the PostgresSink boundary). See
logging.md §Logger tree for the full event
schema.
Capture a fixture from live hardware¶
Most library bugs are easier to diagnose against a recorded session
than a live one. The fixture format is a plaintext > / < dialog
readable by both humans and parse_fixture;
see testing.md §Fixture format for the
grammar and how to hand-write one, or replay the session through
AlicatConfig-driven DEBUG logs (see
§Getting raw wire bytes above) and paste the
tx / rx lines into a .txt file.
An automated record_session(device, scenario) capture helper is
planned but not shipped yet — follow
issue #TODO
or attach the raw DEBUG transcript to your bug report in the
meantime.
Filing a bug¶
Collect:
DeviceInfo.model,.firmware.raw,.capabilities,.probe_report(repr(dev.info)does fine).- The exception traceback including the typed
ErrorContextrender. - If available, the fixture capture from the section above or a
--log-level=DEBUGpytest transcript with thealicatlib.protocolDEBUG lines included.