Specification

Detailed technical specification of the Polyglot binary format and type system

Binary Format

Polyglot uses a simple binary format consisting of repeating type-value pairs. Each entry in a Polyglot buffer follows this pattern:

Binary Format
| Type (1 byte) | Payload (0+ bytes) |

The default buffer size is 512 bytes, though buffers can grow as needed.
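The type-value pattern can be illustrated with a minimal Python sketch that writes three scalar entries back to back. The identifiers come from the Type System tables in this specification. The helper names are hypothetical, and the one-byte Boolean payload is an assumption drawn from the table's `0x01`/`0x00` description.

```python
# Type identifiers taken from the Basic Types table.
NIL = 0x00
BOOLEAN = 0x07
UINT8 = 0x08


def encode_nil() -> bytes:
    # Nil is a type byte with no payload.
    return bytes([NIL])


def encode_bool(value: bool) -> bytes:
    # Assumed payload: a single byte, 0x01 for True and 0x00 for False.
    return bytes([BOOLEAN, 0x01 if value else 0x00])


def encode_uint8(value: int) -> bytes:
    if not 0 <= value <= 0xFF:
        raise ValueError("value does not fit in 8 bits")
    return bytes([UINT8, value])


# Entries are simply concatenated: type byte, then payload, then the next type byte.
buffer = encode_nil() + encode_bool(True) + encode_uint8(42)
```

Here `buffer` holds the five bytes `00 07 01 08 2a`: one Nil entry, one Boolean entry, and one Uint8 entry.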

Type System

Basic Types

| Identifier | Type | Description |
|---|---|---|
| 0x00 | Nil | Represents absence of value |
| 0x07 | Boolean | True (0x01) or False (0x00) |
| 0x08 | Uint8 | 8-bit unsigned integer |
| 0x09 | Uint16 | 16-bit unsigned integer |
| 0x0a | Uint32 | 32-bit unsigned integer |
| 0x0b | Uint64 | 64-bit unsigned integer |
| 0x0c | Int32 | 32-bit signed integer |
| 0x0d | Int64 | 64-bit signed integer |
| 0x0e | Float32 | IEEE 754 32-bit float |
| 0x0f | Float64 | IEEE 754 64-bit float |

Collection Types

| Identifier | Type | Description |
|---|---|---|
| 0x01 | Array | Ordered collection |
| 0x02 | Map | Key-value collection |
| 0x04 | Bytes | Raw byte buffer |
| 0x05 | String | UTF-8 encoded text |

Special Types

| Identifier | Type | Description |
|---|---|---|
| 0x03 | Any | Dynamic type |
| 0x06 | Error | Error message |

Type Details

Nil Type (0x00)

  • Single byte identifier
  • No payload
  • Used to represent null/nil/None values

Array Type (0x01)

Array Type Format
| 0x01 | Element Type (1 byte) | Size (4 bytes) | Elements... |
  • Element Type indicates the type of all elements
  • Size is uint32 indicating number of elements
  • Elements follow in sequence, each with their respective payloads
  • Can use Any (0x03) as element type for mixed-type arrays
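As a hedged illustration of the array layout, the sketch below encodes a homogeneous array of Uint32 values. It makes two assumptions that this section does not state explicitly: the Size field is big-endian like the other integers, and elements are written as bare payloads because the header already declares their type. The function name is hypothetical.

```python
import struct

ARRAY = 0x01   # Array type identifier
UINT32 = 0x0A  # Uint32 type identifier, used here as the element type


def encode_uint32_array(values: list[int]) -> bytes:
    # Header: array type byte, element type byte, element count as uint32.
    header = struct.pack(">BBI", ARRAY, UINT32, len(values))
    # Assumption: each element is its 4-byte big-endian payload, with no
    # per-element type byte, since the header already names the element type.
    body = b"".join(struct.pack(">I", v) for v in values)
    return header + body


encoded = encode_uint32_array([1, 2])
```

Under these assumptions a two-element array occupies 14 bytes: 6 bytes of header followed by two 4-byte elements. A mixed-type array declared with Any (0x03) would need some way to identify each element's type, which this sketch does not cover.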

Map Type (0x02)

Map Type Format
| 0x02 | Key Type (1 byte) | Value Type (1 byte) | Size (4 bytes) | Key-Value Pairs... |
  • Key and Value types specified separately
  • Size is uint32 indicating number of pairs
  • Pairs follow in sequence: key, value, key, value...
  • Value Type can be Any (0x03) for mixed-type values
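The map layout can be sketched the same way for a map from String keys to Uint32 values. As with arrays, this assumes a big-endian Size field and assumes that keys and values are written as bare payloads because their types are declared in the header; neither detail is confirmed by this section. The function name is hypothetical.

```python
import struct

MAP = 0x02     # Map type identifier
STRING = 0x05  # key type used in this sketch
UINT32 = 0x0A  # value type used in this sketch


def encode_string_uint32_map(pairs: dict[str, int]) -> bytes:
    # Header: map type byte, key type byte, value type byte, pair count.
    out = bytearray(struct.pack(">BBBI", MAP, STRING, UINT32, len(pairs)))
    for key, value in pairs.items():
        raw_key = key.encode("utf-8")
        # Assumed key payload: uint32 byte length followed by the UTF-8 bytes.
        out += struct.pack(">I", len(raw_key)) + raw_key
        # Assumed value payload: 4-byte big-endian unsigned integer.
        out += struct.pack(">I", value)
    return bytes(out)


encoded = encode_string_uint32_map({"a": 1})
```

Pairs are written in the dictionary's iteration order, matching the key, value, key, value sequence described above.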

String Type (0x05)

String Type Format
| 0x05 | Size (4 bytes) | UTF-8 Bytes... |
  • Size is uint32 indicating number of bytes
  • Content is UTF-8 encoded
  • No null termination
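A minimal round-trip sketch for the String layout follows. The function names are hypothetical, and the big-endian Size field is an assumption based on the Integer Types section. Note that Size counts bytes, not characters, so non-ASCII text has a Size larger than its character count.

```python
import struct

STRING = 0x05  # String type identifier


def encode_string(text: str) -> bytes:
    raw = text.encode("utf-8")
    # Type byte, then the byte length as uint32, then the bytes themselves.
    return struct.pack(">BI", STRING, len(raw)) + raw


def decode_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Return the decoded text and the offset of the next entry."""
    type_id, size = struct.unpack_from(">BI", buf, offset)
    if type_id != STRING:
        raise ValueError("entry is not a String")
    start = offset + 5  # 1 type byte + 4 size bytes
    end = start + size
    if end > len(buf):
        raise ValueError("buffer ends before the declared size")
    # No terminator to strip: the Size field alone delimits the text.
    return buf[start:end].decode("utf-8"), end


text, next_offset = decode_string(encode_string("héllo"))
```

The string `"héllo"` has five characters but six UTF-8 bytes, so its entry is 11 bytes long.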

Bytes Type (0x04)

Bytes Type Format
| 0x04 | Size (4 bytes) | Raw Bytes... |
  • Size is uint32 indicating number of bytes
  • Raw byte content follows

Error Type (0x06)

Error Type Format
| 0x06 | Size (4 bytes) | UTF-8 Message... |
  • Encodes error messages as UTF-8 strings
  • Size is uint32 indicating message length in bytes
  • Language implementations convert to native error types
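The conversion to a native error type might look like the following Python sketch. `PolyglotError`, `encode_error`, and `decode_error` are hypothetical names chosen for illustration, not part of any Polyglot library, and the big-endian Size field is assumed.

```python
import struct

ERROR = 0x06  # Error type identifier


class PolyglotError(Exception):
    """Hypothetical native exception carrying a decoded error message."""


def encode_error(message: str) -> bytes:
    raw = message.encode("utf-8")
    # Same shape as a String entry, but tagged with the Error identifier.
    return struct.pack(">BI", ERROR, len(raw)) + raw


def decode_error(buf: bytes, offset: int = 0) -> PolyglotError:
    type_id, size = struct.unpack_from(">BI", buf, offset)
    if type_id != ERROR:
        raise ValueError("entry is not an Error")
    start = offset + 5  # 1 type byte + 4 size bytes
    raw = buf[start:start + size]
    if len(raw) != size:
        raise ValueError("buffer ends before the declared size")
    # Return the exception rather than raising it; the caller decides.
    return PolyglotError(raw.decode("utf-8"))


err = decode_error(encode_error("not found"))
```

Returning the exception instead of raising it leaves the decision to the caller, which suits a decoder that treats errors as ordinary values on the wire.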

Integer Types

All integers are encoded in big-endian format:

  • Most significant byte first
  • Fixed width based on type
  • No variable-length encoding
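The fixed-width, big-endian encoding maps directly onto Python's `struct` module, as in the sketch below. The table and function names are hypothetical. Two's-complement representation for the signed types is an assumption, since this section only specifies byte order and width.

```python
import struct

# Type identifier -> struct format. ">" selects big-endian with no padding.
INT_FORMATS = {
    0x08: ">B",  # Uint8
    0x09: ">H",  # Uint16
    0x0A: ">I",  # Uint32
    0x0B: ">Q",  # Uint64
    0x0C: ">i",  # Int32 (two's complement assumed)
    0x0D: ">q",  # Int64 (two's complement assumed)
}


def encode_int(type_id: int, value: int) -> bytes:
    # struct.pack raises struct.error if the value does not fit the width.
    return bytes([type_id]) + struct.pack(INT_FORMATS[type_id], value)


def decode_int(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return the decoded value and the offset of the next entry."""
    fmt = INT_FORMATS[buf[offset]]
    (value,) = struct.unpack_from(fmt, buf, offset + 1)
    return value, offset + 1 + struct.calcsize(fmt)
```

For example, `encode_int(0x09, 0x1234)` yields the bytes `09 12 34`, with the most significant byte of the value first. Because every width is fixed, a decoder knows the payload length from the type byte alone.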

Float Types

  • Follow IEEE 754 standard
  • 32-bit (Float32) or 64-bit (Float64) precision
  • Encoded in big-endian format
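Floats can be sketched with the same approach, since `struct` packs IEEE 754 values directly. The function names are hypothetical.

```python
import struct

FLOAT32 = 0x0E  # Float32 type identifier
FLOAT64 = 0x0F  # Float64 type identifier


def encode_float32(value: float) -> bytes:
    # ">f" is a big-endian IEEE 754 single-precision float (4 bytes).
    return struct.pack(">Bf", FLOAT32, value)


def encode_float64(value: float) -> bytes:
    # ">d" is a big-endian IEEE 754 double-precision float (8 bytes).
    return struct.pack(">Bd", FLOAT64, value)
```

Encoding `1.0` as Float32 gives `0e 3f 80 00 00`. Values that are not exactly representable in 32 bits are rounded when packed as Float32, so a round trip through Float32 may not return the original Python float.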

Buffer Management

  • Default buffer size: 512 bytes
  • Buffers can grow dynamically
  • No internal fragmentation
  • No padding between elements
  • Zero-copy operations where possible
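One way to picture the zero-copy and no-padding properties is to read a Bytes entry through a `memoryview`, which slices the underlying buffer without copying the payload. This is only an illustration in Python, not a description of how any Polyglot implementation manages its buffers. The function name is hypothetical and the big-endian Size field is assumed.

```python
import struct

BYTES = 0x04  # Bytes type identifier


def read_bytes_view(buf: memoryview, offset: int = 0) -> tuple[memoryview, int]:
    """Return a view of a Bytes payload and the offset of the next entry."""
    type_id, size = struct.unpack_from(">BI", buf, offset)
    if type_id != BYTES:
        raise ValueError(f"expected Bytes (0x04), got {type_id:#04x}")
    start = offset + 5  # 1 type byte + 4 size bytes, no padding
    end = start + size
    if end > len(buf):
        raise ValueError("buffer ends before the declared size")
    # Slicing a memoryview shares memory with the backing buffer.
    return buf[start:end], end


# A Bytes entry holding b"abc", immediately followed by a Nil entry.
backing = bytearray(b"\x04\x00\x00\x00\x03abc" + b"\x00")
payload, next_offset = read_bytes_view(memoryview(backing))
```

Because there is no padding, `next_offset` points directly at the type byte of the following entry. The returned view stays valid only as long as the backing buffer is not resized.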

Limitations

  • No schema versioning
  • No optional fields
  • No type inheritance
  • No variable-length integers
  • No compression
  • No cyclic references
  • No metadata
  • No padding or alignment

These limitations are intentional design choices: leaving these features out keeps the format simple, which is what enables Polyglot's high-performance characteristics.
