2015-11-18 Data JSON

JSON is my preferred data exchange format: it is simple, it is human readable, and it is widely adopted.

But it has one huge shortcoming: it does not support binary data.

Here I propose an extended data format, DJSON, which supports binary data and at the same time maps nicely onto plain JSON.

In order to include binary data in JSON, we need to map it into strings somehow. For this we have the data URI scheme, which allows binary data to be carried together with a media type. Data URIs without a media type can simply map to binary data or to escaped strings, depending on the encoding.

When converting JavaScript objects to DJSON, we want to encode the following:

  • [optional] Data with a media type is encoded as a data URL, in a way that depends on the media type. The media type can be used to encode/decode custom data types, and/or any media type can be mapped to a general Blob with content info etc.
  • Binary data without a media type is encoded as “data:;base64,” followed by the base64-encoded data. This is not truly “text/plain” (the default media type), but leaving out the media type for binary data means that Blobs with any media type can be encoded/decoded.
  • Strings starting with “data:” also need to be escaped, and the way to do this is to encode them as “data:,” followed by the original “data:…” string. (Theoretically the prefix could be either “data:,” or “data:;charset=utf-16,”. While “charset=utf-16” is technically more correct, JSON already encodes 16-bit characters, which makes this a strict superset of the default ASCII, so we shall just use the shorter and more readable “data:,” prefix.)
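
To make the two rules without media types concrete, here is a minimal sketch of the encoding step. The helper names “encodeLeaf” and “toPlainJson” are made up for illustration, binary data is assumed to arrive as a Uint8Array, and the base64 conversion assumes Node.js, where “Buffer” is available. Custom media types are left out.

```javascript
// Hypothetical helper: encode a single leaf value following the rules above.
// Assumes Node.js, where Buffer is available for base64 conversion.
function encodeLeaf(value) {
  if (value instanceof Uint8Array) {
    // Binary data without a media type: "data:;base64," + base64 payload.
    return "data:;base64," + Buffer.from(value).toString("base64");
  }
  if (typeof value === "string" && value.startsWith("data:")) {
    // Escape strings that would otherwise be mistaken for DJSON data URLs.
    return "data:," + value;
  }
  // Everything else is already valid plain JSON.
  return value;
}

// Hypothetical helper: recursively map a JavaScript value to plain JSON.
function toPlainJson(value) {
  if (Array.isArray(value)) return value.map(toPlainJson);
  const isObject = value !== null && typeof value === "object";
  if (isObject && !(value instanceof Uint8Array)) {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = toPlainJson(child);
    }
    return out;
  }
  return encodeLeaf(value);
}

const encoded = toPlainJson({
  name: "data: not a URL",
  bytes: new Uint8Array([1, 2, 3]),
  nested: [new Uint8Array([255])],
});
console.log(JSON.stringify(encoded));
// {"name":"data:,data: not a URL","bytes":"data:;base64,AQID","nested":["data:;base64,/w=="]}
```

The result is ordinary JSON, so it can be stored or sent with any existing JSON tooling.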

Decoding DJSON is simple: any string starting with “data:” is decoded using the rules above, and anything else is passed through unchanged.
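
A sketch of that decoding step might look as follows. Again the helper names “decodeLeaf” and “fromPlainJson” are hypothetical, Node.js “Buffer” is assumed for base64, and only the two forms without a media type are handled. Data URLs with a media type are left untouched here.

```javascript
// Hypothetical helper: decode one value following the rules described above.
// Assumes Node.js (Buffer) and handles only the two media-type-less forms.
function decodeLeaf(value) {
  if (typeof value !== "string" || !value.startsWith("data:")) return value;
  const binaryPrefix = "data:;base64,";
  const stringPrefix = "data:,";
  if (value.startsWith(binaryPrefix)) {
    // Binary data: decode the base64 payload into a Uint8Array.
    const bytes = Buffer.from(value.slice(binaryPrefix.length), "base64");
    return new Uint8Array(bytes);
  }
  if (value.startsWith(stringPrefix)) {
    // Escaped string: strip the prefix to recover the original string.
    return value.slice(stringPrefix.length);
  }
  // A data URL with a media type would need a custom decoder; not handled here.
  return value;
}

// Hypothetical helper: recursively map plain JSON back to JavaScript values.
function fromPlainJson(value) {
  if (Array.isArray(value)) return value.map(fromPlainJson);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = fromPlainJson(child);
    }
    return out;
  }
  return decodeLeaf(value);
}

const decoded = fromPlainJson(
  JSON.parse('{"name":"data:,data: not a URL","bytes":"data:;base64,AQID","n":1}')
);
console.log(decoded.name, Array.from(decoded.bytes), decoded.n);
```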

The API within JavaScript would be similar to the JSON API, but with additional methods: “jsonify” and “parseJson” to convert to/from ordinary JSON objects, and async versions that return a promise: “stringifyAsync”, …, “parseJsonAsync”. It would also take an additional parameter for custom media types, which is a map from content-type/object.constructor.name to a decoding/encoding function (with the special content-type “*/*”, which matches otherwise unmatched content types, and the special object.constructor.name “*”, which matches otherwise unmatched objects).
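
The custom media type map is not specified further, so the following is only one possible reading of it. I assume that keys containing “/” are content types mapping to decoders, and that all other keys are constructor names mapping to encoders. The helpers “encodeCustom” and “decodeCustom” and the media type “application/x-date” are invented for this example. Something like them could sit inside “jsonify” and “parseJson”.

```javascript
// Hypothetical dispatch helpers for the custom media type map.
// Assumption: keys containing "/" are content types mapping to decoders;
// all other keys are constructor names mapping to encoders.
function encodeCustom(value, types) {
  const name = value.constructor ? value.constructor.name : "*";
  // Fall back to the special "*" entry for otherwise unmatched objects.
  const encode = types[name] || types["*"];
  if (!encode) throw new TypeError("no encoder registered for " + name);
  return encode(value);
}

function decodeCustom(dataUrl, types) {
  const comma = dataUrl.indexOf(",");
  if (comma === -1) return dataUrl; // not a well-formed data URL
  const header = dataUrl.slice("data:".length, comma);
  const mediaType = header.split(";")[0];
  // Fall back to the special "*/*" entry for otherwise unmatched content types.
  const decode = types[mediaType] || types["*/*"];
  // Without any matching decoder the string is passed through unchanged.
  return decode ? decode(dataUrl.slice(comma + 1), mediaType) : dataUrl;
}

// Example map: "application/x-date" is a made-up media type for Date objects.
const types = {
  Date: (d) => "data:application/x-date," + d.toISOString(),
  "application/x-date": (payload) => new Date(payload),
  "*/*": (payload, mediaType) => ({ mediaType, payload }),
};

const encodedDate = encodeCustom(new Date(0), types);
const decodedDate = decodeCustom(encodedDate, types);
const unknown = decodeCustom("data:image/png;base64,AAAA", types);
console.log(encodedDate, decodedDate.getTime(), unknown.mediaType);
```

Keeping encoders and decoders in a single map follows the description above. Two separate maps would work just as well and would avoid having to tell the two kinds of key apart.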