Utilities
pyngb.util.get_hash(path, max_size_mb=1000)
Generate file hash for metadata.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str | Path to the file to hash | required |
max_size_mb | int | Maximum file size in MB to hash (default: 1000 MB) | 1000 |
Returns:
Type | Description |
---|---|
Optional[str] | BLAKE2b hash as hex string, or None if hashing fails |
Raises:
Type | Description |
---|---|
OSError | If there are file system related errors |
PermissionError | If file access is denied |
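The behavior described above can be illustrated with a minimal standard-library sketch. This is not the pyngb implementation: `blake2b_file_hash` is a hypothetical stand-in, and it assumes that a file larger than `max_size_mb` yields `None` while file system errors propagate to the caller. The real `get_hash` may handle either case differently.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def blake2b_file_hash(path: str, max_size_mb: int = 1000) -> Optional[str]:
    """Hypothetical stand-in for get_hash; not the pyngb source."""
    file_path = Path(path)

    # Assumption: files above the size limit are skipped and reported as None.
    if file_path.stat().st_size > max_size_mb * 1024 * 1024:
        return None

    digest = hashlib.blake2b()
    with file_path.open("rb") as handle:
        # Read in 1 MiB chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: hash a small temporary file, then force the size limit to trigger.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"example payload")
    digest = blake2b_file_hash(str(sample))
    skipped = blake2b_file_hash(str(sample), max_size_mb=0)
```

Chunked reading keeps memory use constant regardless of file size, which is one reason a size cap and a streaming hash are often paired.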
Source code in src/pyngb/util.py
pyngb.util.set_metadata(tbl, col_meta={}, tbl_meta={})
Store table- and column-level metadata as JSON-encoded byte strings.
Provided by: https://stackoverflow.com/a/69553667/25195764
Table-level metadata is stored in the table's schema. Column-level metadata is stored in the table columns' fields.
To update the metadata, new fields are first created for all columns. Next, a schema is created from the new fields and the updated table metadata. Finally, a new table is created by replacing the old table's schema, without copying any data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tbl | Table | The table to store metadata in | required |
col_meta | dict[str, Any] | A JSON-serializable dictionary with column metadata in the form { 'column_1': {'some': 'data', 'value': 1}, 'column_2': {'more': 'stuff', 'values': [1,2,3]} } | {} |
tbl_meta | dict[str, Any] | A JSON-serializable dictionary with table-level metadata. | {} |
Returns:
Type | Description |
---|---|
Table | The table with updated metadata (a pyarrow.Table) |
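To make "JSON-encoded byte strings" concrete, the sketch below shows the encoding round trip using only the standard library. `encode_meta` and `decode_meta` are hypothetical helpers, not part of pyngb. The sketch assumes each metadata value is JSON-encoded on its own and stored under its key, as in the linked Stack Overflow answer. Arrow schema and field metadata are bytes-to-bytes mappings, so keys are shown as bytes too.

```python
import json
from typing import Any


def encode_meta(meta: dict[str, Any]) -> dict[bytes, bytes]:
    """Hypothetical helper: JSON-encode each value into a UTF-8 byte string."""
    return {
        key.encode("utf-8"): json.dumps(value).encode("utf-8")
        for key, value in meta.items()
    }


def decode_meta(raw: dict[bytes, bytes]) -> dict[str, Any]:
    """Hypothetical helper: reverse encode_meta when reading metadata back."""
    return {key.decode("utf-8"): json.loads(value) for key, value in raw.items()}


col_meta = {
    "column_1": {"some": "data", "value": 1},
    "column_2": {"more": "stuff", "values": [1, 2, 3]},
}
tbl_meta = {"instrument": "example", "runs": 3}  # illustrative values only

# Column-level metadata: one encoded mapping per column (stored on its field).
encoded_columns = {name: encode_meta(meta) for name, meta in col_meta.items()}
# Table-level metadata: one encoded mapping (stored on the schema).
encoded_table = encode_meta(tbl_meta)

# Decoding restores the original Python values.
restored_columns = {name: decode_meta(raw) for name, raw in encoded_columns.items()}
restored_table = decode_meta(encoded_table)
```

Because values come back as raw bytes when the metadata is read, a decoding step like `decode_meta` is needed to recover numbers, lists, and nested dictionaries.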