BJSON

This is Binary-JSON (BJSON) format specification draft ver 0.4.
The BJSON spec can be always found on bjson.org.

Definition

BJSON is binary form of JSON.

Why

Textual JSON is widely used in data storage, serialization and streaming.
Textual JSON is extremely easy to use for developer, however textual form has some disadvantages for production environments.

There is a need of using JSON data in non-textual form which:
- is more compact than JSON, may be shorter in byte representation,
- is easy to parse,
- is easy to be implemented in most current environments,
- is easy be implemented partially when needed (for specific use),
- can be easily traversed without parsing all data (e.g. skipping some entries),
- supports all common data formats nativelly - primitive and structured,
- supports all features of JSON without additional data and coding needed,
- supports path-like "addressing" of data,
- can be easily transcoded to textual JSON back and forth (only if not extended),
- can be easily embedded into common transports: files, databases, mpeg streams, etc.
- is easy to extend at context specific or even private level,

Don't reinvent the wheel

Developers of BJSON, as well as developers using BJSON, should have understanding of existing standards, protocols and data-formats to not reinvent the wheel and to choose best fitting technology.

Especially, have some understanding of:
- BSON at http://bsonspec.org - all its pros and cons,
- ASN.1 (BER, DER, etc.) and all Abstract Syntax Notation One related work,
- Protocol Buffers from Google,
- Thrift from Apache, especially its protocols (TBinaryProtocol, TCompactProtocol, etc.),
- OGDL and its binary representation,
- XML,
- YAML,
- SmileFormat - check SmileFormatSpec,

The format

Numbers are little-endian by default.
Size fields contain number of bytes.

primitive values:

There are "zero" values, one byte sized:
0 - null
1 - numeric zero, or boolean false
2 - empty string
3 - boolean true (may be also a numeric one)

positive_integer:

4, uint8
5, uint16
6, uint32
7, uint64

negative_integer:

they are in positive form, not mod2
8, uint8
9, uint16
10, uint32
11, uint64

float:

12, 32bit float
13, 64bit float (double)

utf8_string:

default coding is utf-8
the string MUST NOT have null-termination code
string cannot have any "zero" bytes to avoid null-termination finishing the string before its real length

16, size[uint8], utf8_data[size*byte] - a short string up to 255 bytes
17, size[uint16], utf8_data[size*byte] - a string of up to 64k bytes
18, size[uint32], utf8_data[size*byte] - a long string, 64K to 4GB
19, size[uint64], utf8_data[size*byte] - a very long string, which probably won't be even used for now

binary:

binary data of specified length.
This is not fully JSON transcodable, as the JSON has no native support for binary data.

20, size[uint8], binary_data[size*byte]
21, size[uint16], binary_data[size*byte]
22, size[uint32], binary_data[size*byte]
23, size[uint64], binary_data[size*byte]

array:

in JSON represented as array [item0, item1, item2, ...]

32, size[uint8], item0, item1, item2, ...
33, size[uint16], item0, item1, item2, ...
34, size[uint32], item0, item1, item2, ...
35, size[uint64], item0, item1, item2, ...

map of key -> value:

in JSON represented as object {key0:value0, key1:value1, ...}

For JSON compatibility keys shall be utf8_string.
However implementation may ignore that (use any other type as keys, even mixing types) if the JSON-compatibility is not a requirement.

Keys should be unique.

36, size[uint8], key0, value0, key1, value1, ...
37, size[uint16], key0, value0, key1, value1, ...
38, size[uint32], key0, value0, key1, value1, ...
39, size[uint64], key0, value0, key1, value1, ...

Legals, authors etc.

You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Authors

Feel free to contribute.
This document is mantained by Pietrzak 'yosh' Roman (yosh.ke.mu).