NGB Binary Format Reference

Overview

NETZSCH STA NGB files are proprietary binary files containing thermal analysis data. This document describes the reverse-engineered binary format to aid in maintaining and extending the parser.

⚠️ Disclaimer: This format documentation is based on reverse engineering and may not be complete or accurate for all NGB file versions. This is an unofficial implementation not affiliated with NETZSCH-Gerätebau GmbH.

File Structure

Container Format

NGB files are ZIP archives containing binary streams:

file.ngb-ss3
├── Streams/
│   ├── stream_1.table    # File metadata and settings
│   ├── stream_2.table    # Measurement data
│   └── stream_3.table    # Additional data (optional)
└── [other files]         # May contain additional resources

Stream Organization

Stream 1 (Metadata): File-level metadata, calibration constants, temperature programs
Stream 2 (Data): Time-series measurement data (time, temperature, mass, DSC, etc.)
Stream 3+ (Auxiliary): Additional streams may contain derived data or settings

Binary Markers

Core Markers

Marker Name	Hex Bytes	Purpose
`START_DATA`	`00 00 08 80 07`	Marks the beginning of a data array block
`END_TABLE`	`FF FE FF 00 00`	Marks the end of a complete table
`TABLE_SEPARATOR`	`FF FE FF 04`	Separates multiple tables within a stream
`TEMP_PROG_TYPE_PREFIX`	`03 80 01`	Prefix for temperature program entries

String Encoding Markers

Marker Name	Hex Bytes	Purpose
`STRING_DATA_TYPE`	`1F`	Indicates string data follows
`FFFEFF_PATTERN`	`FF FE FF`	Alternative string encoding marker

Metadata Markers

Marker Name	Hex Bytes	Purpose
`APP_LICENSE_CATEGORY`	`00 03`	Application license category marker
`APP_LICENSE_FIELD`	`18 FC`	License field identifier

String Encoding

NGB files use multiple string encoding schemes:

1. Standard String (4-byte length prefix)

[4 bytes: length (little-endian int32)][N bytes: UTF-8 data]

Example:

0C 00 00 00 48 65 6C 6C 6F 20 57 6F 72 6C 64 21
│           │
│           └─ "Hello World!" (12 bytes UTF-8)
└─ Length: 12 (0x0000000C)

2. UTF-16LE String

[4 bytes: length (little-endian int32)][N*2 bytes: UTF-16LE data]

Example:

18 00 00 00 48 00 65 00 6C 00 6C 00 6F 00 21 00
│           │
│           └─ "Hello!" (6 characters = 12 bytes UTF-16LE)
└─ Length: 24 (0x00000018) - byte count, not char count

3. FFFEFF Format (Special)

[FF FE FF][length byte][string data][00 00]

This format is used for specific metadata fields. The length byte indicates the string length, and the format ends with null terminators.

Data Types

Type IDs

Type ID (Hex)	Format	Description	Size
`0B`	`float64[]`	Array of double-precision floats	8 bytes per value
`0A`	`float32[]`	Array of single-precision floats	4 bytes per value
`08`	`int32`	32-bit signed integer	4 bytes
`09`	`int64`	64-bit signed integer	8 bytes
`1F`	`string`	Variable-length string	Variable

Data Array Format

Arrays are stored with a header and data section:

[START_DATA marker][Type ID][4 bytes: element count][data elements]

Example - Float64 Array:

00 00 08 80 07  0B  03 00 00 00  [24 bytes of float64 data]
│               │   │            │
│               │   │            └─ 3 float64 values (3 * 8 = 24 bytes)
│               │   └─ Element count: 3
│               └─ Type: 0x0B (float64[])
└─ START_DATA marker

Metadata Patterns

Column Identification

Columns in the data stream are identified by hex IDs. Common IDs:

Hex ID	Column Name	Units	Description
`0x0154`	`time`	raw minutes, exposed as seconds	Elapsed time
`0x02bc`	`sample_temperature`	°C	Sample temperature
`0x00e1`	`furnace_temperature`	°C	Furnace temperature
`0x0167`	`mass`	mg	Sample mass
`0x02c1`	`dsc_signal`	μV	DSC signal

Crucible Mass Patterns

Sample Crucible:

Signature: 07 00 0C 00  s  a  m  p  l  e  _  c  r  u  c  i  b  l  e
           │           │
           │           └─ "sample_crucible" string
           └─ Type and length markers

Reference Crucible:

Signature: 13 00 12 00  r  e  f  e  r  e  n  c  e  _  c  r  u  c  i  b  l  e

Temperature Program Format

Temperature programs describe heating/cooling segments:

[TEMP_PROG_TYPE_PREFIX][Type byte][Start temp][End temp][Rate][...]

Fields: - Type: 0x01 = heating, 0x02 = cooling, 0x03 = isothermal - Temperatures: float64 (°C) - Heating/cooling rate: float64 (°C/min) - Stage duration: raw minutes, converted to seconds in pyngb metadata

File Metadata Fields

Common Metadata

Field Name	Type	Description
`instrument`	string	Instrument model (e.g., "STA 449 F3 Jupiter")
`sample_name`	string	Sample identifier
`operator`	string	Operator name
`sample_mass`	float	Initial sample mass (mg)
`reference_mass`	float	Reference mass (mg)
`crucible_mass`	float	Crucible mass (mg)
`date`	string	Measurement date (ISO format)
`time`	string	Measurement time

Calibration Constants

DSC calibration constants for signal correction:

Field Name	Type	Description
`tau`	float	Time constant (τ)
`sensitivity`	float	DSC sensitivity
`E`	float	Calibration constant E

Formula: DSC_corrected = (DSC_raw + τ * dDSC/dt) / Sensitivity + E

Column Metadata

Each data column can have metadata:

Field	Type	Description
`units`	string	Measurement units
`processing_history`	list[string]	Processing steps applied
`baseline_subtracted`	bool	Whether baseline correction applied
`source`	string	Data source identifier

Parsing Algorithm

High-Level Algorithm

def parse_ngb_file(file_path):
    # 1. Open ZIP archive
    with ZipFile(file_path) as zf:
        # 2. Read stream files
        streams = [zf.read(f"Streams/stream_{i}.table") for i in range(1, 4)]

        # 3. Parse binary streams
        for stream in streams:
            tables = split_tables(stream)  # Split on TABLE_SEPARATOR

            for table in tables:
                # 4. Extract data arrays
                arrays = extract_data_arrays(table)

                # 5. Parse metadata
                metadata = extract_metadata(table)

        # 6. Construct PyArrow table
        return build_table(arrays, metadata)

Table Splitting

def split_tables(stream: bytes) -> list[bytes]:
    """Split stream on TABLE_SEPARATOR markers."""
    separator = b'\xFF\xFE\xFF\x04'
    tables = stream.split(separator)
    return [t for t in tables if t]  # Remove empty tables

Data Extraction

def extract_data_arrays(table: bytes) -> dict:
    """Extract all data arrays from a table."""
    arrays = {}
    pos = 0

    while (start_pos := table.find(START_DATA, pos)) != -1:
        type_id = table[start_pos + 5]
        count = int.from_bytes(table[start_pos+6:start_pos+10], 'little')

        # Extract based on type
        if type_id == 0x0B:  # float64[]
            data_size = count * 8
            data_start = start_pos + 10
            values = struct.unpack(f'<{count}d',
                                  table[data_start:data_start+data_size])
            arrays[...] = values

        pos = start_pos + 10 + data_size

    return arrays

Discovery Methodology

This binary format was reverse-engineered using:

Hex Dump Analysis: Manual inspection of files in hex editors
Pattern Recognition: Identifying repeated byte sequences
Cross-File Comparison: Comparing multiple NGB files
Trial and Error: Testing different interpretations
Validation: Checking parsed values against instrument software output

Key Insights

ZIP Container: Recognized standard ZIP signatures
Marker Patterns: Identified repeated sequences before data blocks
Length Prefixes: Common pattern in binary formats
Little-Endian: Standard for Windows-based instruments
String Encodings: Multiple formats found through trial-and-error

Limitations and Unknowns

Known Limitations

Version Compatibility: Format may vary across NETZSCH software versions
Undocumented Fields: Many byte sequences remain unidentified
Conditional Logic: Some patterns appear only in specific configurations
Proprietary Extensions: Vendor-specific features may exist

Unknowns

Complete list of all possible column hex IDs
All temperature program segment types
Meaning of some metadata fields
Versioning scheme (if any)

Validation Strategy

To validate parsing correctness:

Cross-Check: Compare with NETZSCH software export (CSV/text)
Physical Validity: Check temperature ranges, mass values
Conservation: Verify mass loss calculations
Metadata Consistency: Check sample mass matches data
Multiple Files: Test across different experiments/instruments

Contributing

If you discover new patterns or corrections:

Document the pattern with hex dumps
Provide test files (if possible)
Explain the discovery methodology
Verify against multiple files

References

NETZSCH Proteus software documentation (limited)
ZIP format specification (RFC 1951, RFC 1952)
IEEE 754 floating-point standard
UTF-8 and UTF-16LE encoding standards