NGB Binary Format Reference
Overview
NETZSCH STA NGB files are proprietary binary files containing thermal analysis data. This document describes the reverse-engineered binary format to aid in maintaining and extending the parser.
⚠️ Disclaimer: This format documentation is based on reverse engineering and may not be complete or accurate for all NGB file versions. This is an unofficial implementation not affiliated with NETZSCH-Gerätebau GmbH.
File Structure
Container Format
NGB files are ZIP archives containing binary streams:
file.ngb-ss3
├── Streams/
│ ├── stream_1.table # File metadata and settings
│ ├── stream_2.table # Measurement data
│ └── stream_3.table # Additional data (optional)
└── [other files] # May contain additional resources
Stream Organization
- Stream 1 (Metadata): File-level metadata, calibration constants, temperature programs
- Stream 2 (Data): Time-series measurement data (time, temperature, mass, DSC, etc.)
- Stream 3+ (Auxiliary): Additional streams may contain derived data or settings
Binary Markers
Core Markers
| Marker Name | Hex Bytes | Purpose |
|---|---|---|
START_DATA |
00 00 08 80 07 |
Marks the beginning of a data array block |
END_TABLE |
FF FE FF 00 00 |
Marks the end of a complete table |
TABLE_SEPARATOR |
FF FE FF 04 |
Separates multiple tables within a stream |
TEMP_PROG_TYPE_PREFIX |
03 80 01 |
Prefix for temperature program entries |
String Encoding Markers
| Marker Name | Hex Bytes | Purpose |
|---|---|---|
STRING_DATA_TYPE |
1F |
Indicates string data follows |
FFFEFF_PATTERN |
FF FE FF |
Alternative string encoding marker |
Metadata Markers
| Marker Name | Hex Bytes | Purpose |
|---|---|---|
APP_LICENSE_CATEGORY |
00 03 |
Application license category marker |
APP_LICENSE_FIELD |
18 FC |
License field identifier |
String Encoding
NGB files use multiple string encoding schemes:
1. Standard String (4-byte length prefix)
Example:
0C 00 00 00 48 65 6C 6C 6F 20 57 6F 72 6C 64 21
│ │
│ └─ "Hello World!" (12 bytes UTF-8)
└─ Length: 12 (0x0000000C)
2. UTF-16LE String
Example:
18 00 00 00 48 00 65 00 6C 00 6C 00 6F 00 21 00
│ │
│ └─ "Hello!" (6 characters = 12 bytes UTF-16LE)
└─ Length: 24 (0x00000018) - byte count, not char count
3. FFFEFF Format (Special)
This format is used for specific metadata fields. The length byte indicates the string length, and the format ends with null terminators.
Data Types
Type IDs
| Type ID (Hex) | Format | Description | Size |
|---|---|---|---|
0B |
float64[] |
Array of double-precision floats | 8 bytes per value |
0A |
float32[] |
Array of single-precision floats | 4 bytes per value |
08 |
int32 |
32-bit signed integer | 4 bytes |
09 |
int64 |
64-bit signed integer | 8 bytes |
1F |
string |
Variable-length string | Variable |
Data Array Format
Arrays are stored with a header and data section:
Example - Float64 Array:
00 00 08 80 07 0B 03 00 00 00 [24 bytes of float64 data]
│ │ │ │
│ │ │ └─ 3 float64 values (3 * 8 = 24 bytes)
│ │ └─ Element count: 3
│ └─ Type: 0x0B (float64[])
└─ START_DATA marker
Metadata Patterns
Column Identification
Columns in the data stream are identified by hex IDs. Common IDs:
| Hex ID | Column Name | Units | Description |
|---|---|---|---|
0x0154 |
time |
raw minutes, exposed as seconds | Elapsed time |
0x02bc |
sample_temperature |
°C | Sample temperature |
0x00e1 |
furnace_temperature |
°C | Furnace temperature |
0x0167 |
mass |
mg | Sample mass |
0x02c1 |
dsc_signal |
μV | DSC signal |
Crucible Mass Patterns
Sample Crucible:
Signature: 07 00 0C 00 s a m p l e _ c r u c i b l e
│ │
│ └─ "sample_crucible" string
└─ Type and length markers
Reference Crucible:
Temperature Program Format
Temperature programs describe heating/cooling segments:
Fields:
- Type: 0x01 = heating, 0x02 = cooling, 0x03 = isothermal
- Temperatures: float64 (°C)
- Heating/cooling rate: float64 (°C/min)
- Stage duration: raw minutes, converted to seconds in pyngb metadata
File Metadata Fields
Common Metadata
| Field Name | Type | Description |
|---|---|---|
instrument |
string | Instrument model (e.g., "STA 449 F3 Jupiter") |
sample_name |
string | Sample identifier |
operator |
string | Operator name |
sample_mass |
float | Initial sample mass (mg) |
reference_mass |
float | Reference mass (mg) |
crucible_mass |
float | Crucible mass (mg) |
date |
string | Measurement date (ISO format) |
time |
string | Measurement time |
Calibration Constants
DSC calibration constants for signal correction:
| Field Name | Type | Description |
|---|---|---|
tau |
float | Time constant (τ) |
sensitivity |
float | DSC sensitivity |
E |
float | Calibration constant E |
Formula: DSC_corrected = (DSC_raw + τ * dDSC/dt) / Sensitivity + E
Column Metadata
Each data column can have metadata:
| Field | Type | Description |
|---|---|---|
units |
string | Measurement units |
processing_history |
list[string] | Processing steps applied |
baseline_subtracted |
bool | Whether baseline correction applied |
source |
string | Data source identifier |
Parsing Algorithm
High-Level Algorithm
def parse_ngb_file(file_path):
# 1. Open ZIP archive
with ZipFile(file_path) as zf:
# 2. Read stream files
streams = [zf.read(f"Streams/stream_{i}.table") for i in range(1, 4)]
# 3. Parse binary streams
for stream in streams:
tables = split_tables(stream) # Split on TABLE_SEPARATOR
for table in tables:
# 4. Extract data arrays
arrays = extract_data_arrays(table)
# 5. Parse metadata
metadata = extract_metadata(table)
# 6. Construct PyArrow table
return build_table(arrays, metadata)
Table Splitting
def split_tables(stream: bytes) -> list[bytes]:
"""Split stream on TABLE_SEPARATOR markers."""
separator = b'\xFF\xFE\xFF\x04'
tables = stream.split(separator)
return [t for t in tables if t] # Remove empty tables
Data Extraction
def extract_data_arrays(table: bytes) -> dict:
"""Extract all data arrays from a table."""
arrays = {}
pos = 0
while (start_pos := table.find(START_DATA, pos)) != -1:
type_id = table[start_pos + 5]
count = int.from_bytes(table[start_pos+6:start_pos+10], 'little')
# Extract based on type
if type_id == 0x0B: # float64[]
data_size = count * 8
data_start = start_pos + 10
values = struct.unpack(f'<{count}d',
table[data_start:data_start+data_size])
arrays[...] = values
pos = start_pos + 10 + data_size
return arrays
Discovery Methodology
This binary format was reverse-engineered using:
- Hex Dump Analysis: Manual inspection of files in hex editors
- Pattern Recognition: Identifying repeated byte sequences
- Cross-File Comparison: Comparing multiple NGB files
- Trial and Error: Testing different interpretations
- Validation: Checking parsed values against instrument software output
Key Insights
- ZIP Container: Recognized standard ZIP signatures
- Marker Patterns: Identified repeated sequences before data blocks
- Length Prefixes: Common pattern in binary formats
- Little-Endian: Standard for Windows-based instruments
- String Encodings: Multiple formats found through trial-and-error
Limitations and Unknowns
Known Limitations
- Version Compatibility: Format may vary across NETZSCH software versions
- Undocumented Fields: Many byte sequences remain unidentified
- Conditional Logic: Some patterns appear only in specific configurations
- Proprietary Extensions: Vendor-specific features may exist
Unknowns
- Complete list of all possible column hex IDs
- All temperature program segment types
- Meaning of some metadata fields
- Versioning scheme (if any)
Validation Strategy
To validate parsing correctness:
- Cross-Check: Compare with NETZSCH software export (CSV/text)
- Physical Validity: Check temperature ranges, mass values
- Conservation: Verify mass loss calculations
- Metadata Consistency: Check sample mass matches data
- Multiple Files: Test across different experiments/instruments
Contributing
If you discover new patterns or corrections:
- Document the pattern with hex dumps
- Provide test files (if possible)
- Explain the discovery methodology
- Verify against multiple files
References
- NETZSCH Proteus software documentation (limited)
- ZIP format specification (RFC 1951, RFC 1952)
- IEEE 754 floating-point standard
- UTF-8 and UTF-16LE encoding standards