Types

Types, constants, defaults and macros are defined in a nested dictionary, which is usually serialised to YAML. This file, usually named types.yml, consists of four (optional) sections: types, constants, defaults and macros. In general, the types file will look something like this:

---
constants:
  multiplier: 10
defaults:
  size: 2
types:
  s_char:
    size: 1
    function:
      name: struct
  text:
    delimiter:
      - 0x00

Types

A type consists of two subunits, each controlling one stage: the acquirement stage and the processing stage.

The acquirement stage is controlled by the size and delimiter parameters: size is given in bytes and delimiter is a list of bytes. Specifying one of these parameters is usually sufficient to acquire the data, but in some cases both can be used simultaneously, for example when reading a fixed-size block in which a string of variable size is stored. Once the data is acquired, it is passed to the processing stage.
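
The three ways of acquiring data can be sketched in Python. This is only an illustration under stated assumptions, not the parser's own implementation: the helper name acquire is hypothetical, and the handling of the combined case (read the fixed block, then cut it at the delimiter) is inferred from the description above.

```python
import io


def acquire(stream, size=None, delimiter=None):
    """Hypothetical helper: read the raw bytes of one field from a stream."""
    if size is None and not delimiter:
        raise ValueError('either size or delimiter is required')

    if size is not None:
        # Fixed-size read; an optional delimiter trims the block afterwards.
        block = stream.read(size)
        if delimiter:
            end = block.find(bytes(delimiter))
            if end != -1:
                block = block[:end]
        return block

    # Delimiter-only read: consume bytes until the delimiter has been seen.
    marker = bytes(delimiter)
    data = bytearray()
    while not data.endswith(marker):
        byte = stream.read(1)
        if not byte:  # End of stream reached without finding the delimiter.
            return bytes(data)
        data += byte
    return bytes(data[:-len(marker)])


stream = io.BytesIO(b'AB' b'hello\x00' b'hi\x00\x00\x00\x00')
fixed = acquire(stream, size=2)                     # size only
delimited = acquire(stream, delimiter=[0x00])       # delimiter only
padded = acquire(stream, size=6, delimiter=[0x00])  # both parameters
```

Here fixed holds the two bytes AB, delimited holds hello without its terminating 0x00, and padded holds hi, taken from a six-byte block.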

The processing stage is controlled by the function parameter, which denotes the function responsible for processing the acquired data. Additional parameters for this function can be supplied via the args parameter.

Basic types

In version 0.0.14 the struct type was introduced to replace basic types like int, float, etc. and simple compound data types. The formatting parameter fmt is used to control how a value is packed or unpacked. For example, a 4-byte little-endian integer uses the formatting string '<i' and a big-endian unsigned long uses the formatting string '>L'. To avoid any issues with serialisation to YAML (the > sign may cause problems), it is recommended to quote the string.
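
The two formatting strings mentioned above behave as follows in Python's struct module. The byte values are made up for illustration.

```python
import struct

# '<i': little-endian signed 4-byte integer.
(little,) = struct.unpack('<i', b'\x01\x00\x00\x00')

# '>L': big-endian unsigned long (4 bytes in standard-size mode).
(big,) = struct.unpack('>L', b'\x00\x00\x01\x00')

# Packing is the reverse operation and restores the original bytes.
packed = struct.pack('>L', big)
```

The first call yields 1, the second 256, and packing 256 with '>L' gives back the bytes 00 00 01 00.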

For a complete overview of the supported basic types, see the Python struct documentation or our extensive list of examples.

Examples

The following type is stored in two bytes and is processed by the text function:

id:
  size: 2
  function:
    name: text

This type is stored in a variable-size array delimited by 0x00 and is processed by the text function:

comment:
  delimiter:
    - 0x00
  function:
    name: text

We can pass additional parameters to the text function, in this case split on the character 0x09, like so:

comment:
  delimiter:
    - 0x00
  function:
    name: text
    args:
      split:
        - 0x09
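
As a rough sketch of what splitting means for this type, the following stand-in decodes the acquired bytes and splits them on the tab character. The function below is hypothetical: the UTF-8 decoding and the list return value are assumptions made for the example, not a description of the real text function.

```python
def text(data, split=None):
    """Hypothetical stand-in for the text function (assumes UTF-8 input)."""
    decoded = data.decode('utf-8')
    if split:
        # The split argument is a list of byte values, as in the YAML above.
        return decoded.split(bytes(split).decode('utf-8'))
    return decoded


fields = text(b'first\tsecond\tthird', split=[0x09])
plain = text(b'first\tsecond\tthird')
```

With the split argument the result is a list of three strings; without it, the string is returned as a whole.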

A 2-byte little-endian integer is defined as follows:

int:
  size: 2
  function:
    name: struct
    args:
      fmt: '<h'

And a 4-byte big-endian float is defined as follows:

float:
  size: 4
  function:
    name: struct
    args:
      fmt: '>f'
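
When writing such definitions, it can help to check that the size agrees with the formatting string. A minimal check with Python's struct module, using made-up byte values:

```python
import struct

# struct.unpack requires the buffer length to equal calcsize(fmt),
# so the size of a type should match its formatting string.
int_size = struct.calcsize('<h')    # 2, matches "size: 2"
float_size = struct.calcsize('>f')  # 4, matches "size: 4"

(int_value,) = struct.unpack('<h', b'\x34\x12')            # 0x1234
(float_value,) = struct.unpack('>f', b'\x3f\xc0\x00\x00')  # 1.5
```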

Compound types

Simple compound types can also be created using the struct function. By default this will return a list of basic types, which can optionally be mapped using an annotation list. Additionally, a simple dictionary can be created by labeling the basic types.

In the following example, we read three unsigned bytes. By providing a list of labels, the first byte is labelled r, the second g, and the last b. If the values are 0, 255 and 128 respectively, the resulting dictionary will be: {'r': 0, 'g': 255, 'b': 128}.

colour:
  size: 3
  function:
    name: struct
    args:
      fmt: 'BBB'
      labels: [r, g, b]
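
The effect of the labels list can be reproduced in plain Python. This is a sketch of the idea, not the parser's code:

```python
import struct

labels = ['r', 'g', 'b']

# Unpack three unsigned bytes and pair each value with its label.
values = struct.unpack('BBB', bytes([0, 255, 128]))
colour = dict(zip(labels, values))
```

The resulting dictionary is {'r': 0, 'g': 255, 'b': 128}.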

Values can also be mapped using an annotation list to improve readability. This procedure replaces specific values by their annotation and leaves other values unaltered. Note that mapping multiple values to the same annotation will break reversibility of the parser.

In the following example, we read one 4-byte little-endian unsigned integer and provide annotations for the maximum and minimum values. If the value is 0, the result will be unknown; if the value is 10, the result will be 10 as well.

date:
  size: 4
  function:
    name: struct
    args:
      fmt: '<I'
      annotation:
        0xffffffff: defined
        0x00000000: unknown
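
A minimal sketch of this mapping and its inverse is given below. The helper names annotate and deannotate are hypothetical. The inverse lookup also shows why two values sharing one annotation cannot be restored: the reversed dictionary can keep only one raw value per annotation.

```python
import struct

annotation = {0xffffffff: 'defined', 0x00000000: 'unknown'}


def annotate(value):
    """Replace annotated values, leave all other values unaltered."""
    return annotation.get(value, value)


def deannotate(value):
    """Inverse mapping; only well defined if every annotation is unique."""
    reverse = {label: raw for raw, label in annotation.items()}
    return reverse.get(value, value)


(raw_zero,) = struct.unpack('<I', b'\x00\x00\x00\x00')
(raw_ten,) = struct.unpack('<I', b'\x0a\x00\x00\x00')

zero_result = annotate(raw_zero)  # 'unknown'
ten_result = annotate(raw_ten)    # 10
```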

Labels and annotation lists can be combined.

Constants

A constant can be used as an alias in structure.yml. Using constants can make conditional statements and loops more readable.

Defaults

To save some space and time when writing type definitions, the following default values are used:

  • size defaults to 1.
  • function defaults to the name of the type.
  • If no name is given, the type defaults to raw and the destination is a list named __raw__.

So, for example, since a byte is of size 1, we can omit the size parameter in the type definition:

byte:
  function:
    name: struct

In the next example the function text will be used.

text:
  size: 2

And if we need an integer of size one and we name the type struct, we do not need to define anything.

If the following construction is used in the structure, the type will default to raw:

- name:
  size: 20

Overrides

The following defaults can be overridden by adding an entry in the defaults section:

  • delimiter (defaults to []).
  • name (defaults to '').
  • size (defaults to 1).
  • type (defaults to text).
  • unknown_destination (defaults to __raw__).
  • unknown_type (defaults to raw).
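
The way size and function fall back to their defaults, and how an override changes the outcome, can be sketched as follows. The helper resolve_type is hypothetical and only covers size and function; delimiter handling and the unknown_* settings are left out.

```python
def resolve_type(name, types, overrides=None):
    """Hypothetical helper: fill in the missing parts of a type definition."""
    defaults = {'size': 1}
    defaults.update(overrides or {})

    definition = types.get(name) or {}
    # The function name falls back to the name of the type itself.
    function = {'name': name}
    function.update(definition.get('function') or {})

    return {
        'size': definition.get('size', defaults['size']),
        'function': function,
    }


types = {
    'byte': {'function': {'name': 'struct'}},
    'text': {'size': 2},
}

byte = resolve_type('byte', types)        # size omitted
text = resolve_type('text', types)        # function omitted
implicit = resolve_type('struct', types)  # nothing defined at all
wide = resolve_type('byte', types, overrides={'size': 2})
```

In this sketch byte and implicit both resolve to size 1 with the struct function, text resolves to size 2 with the text function, and the override raises the size of byte to 2.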

Macros

Macros were introduced in version 0.0.15 to define complex compound types. A macro is equivalent to a substructure, which is also used in the structure definition, either as is or as the body of a loop or conditional statement.

In the following example, we have a substructure that occurs more than once in our binary file. We have two persons, of which the name, age, weight and height are stored. Using a flat file structure will result in something similar to this:

---
- name: name_1
- name: age_1
  type: u_char
- name: weight_1
  type: u_char
- name: height_1
  type: u_char
- name: name_2
- name: age_2
  type: u_char
- name: weight_2
  type: u_char
- name: height_2
  type: u_char

Note that we have to choose new variable names for every instance of a person, which makes downstream processing quite tedious. Furthermore, the duplicated definitions are harder to maintain.

The structure directive can be used to group variables in a substructure. This solves the variable naming issue, but it does not solve the maintenance issue.

---
- name: person_1
  structure:
  - name: name
  - name: age
    type: u_char
  - name: weight
    type: u_char
  - name: height
    type: u_char
- name: person_2
  structure:
  - name: name
  - name: age
    type: u_char
  - name: weight
    type: u_char
  - name: height
    type: u_char

We can define a macro in the types.yml file by adding a section named macros where we describe the structure of the group of variables.

---
types:
  u_char:
    function:
      name: struct
      args:
        fmt: 'B'
  text:
    delimiter:
      - 0x00
macros:
  person:
    - name: name
    - name: age
      type: u_char
    - name: weight
      type: u_char
    - name: height
      type: u_char

This macro can then be used in the structure.yml file in almost the same way we use a basic type.

---
- name: person_1
  macro: person
- name: person_2
  macro: person
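
The equivalence between a macro and a substructure can be illustrated by expanding macro references into structure entries. The function expand is hypothetical and works on the dictionaries as they would look after loading the YAML files. It illustrates the equivalence only and does not claim to be how the parser handles macros internally.

```python
import copy

macros = {
    'person': [
        {'name': 'name'},
        {'name': 'age', 'type': 'u_char'},
        {'name': 'weight', 'type': 'u_char'},
        {'name': 'height', 'type': 'u_char'},
    ],
}
structure = [
    {'name': 'person_1', 'macro': 'person'},
    {'name': 'person_2', 'macro': 'person'},
]


def expand(items, macros):
    """Replace every macro reference by a copy of the macro body."""
    expanded = []
    for item in items:
        item = dict(item)
        if 'macro' in item:
            item['structure'] = copy.deepcopy(macros[item.pop('macro')])
        if 'structure' in item:
            # Macro bodies may themselves contain nested structures.
            item['structure'] = expand(item['structure'], macros)
        expanded.append(item)
    return expanded


explicit = expand(structure, macros)
```

The result has the same shape as the earlier version written with the structure directive, while the person definition itself is maintained in one place.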

A common substructure in binary formats is a data field preceded by its length, e.g., a string preceded by its length as a little-endian 32-bit unsigned integer: \x0b\x00\x00\x00hello world. In the size_string example, we show how a macro can be used to handle this.
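
To make the byte layout concrete, the sample above can be decoded by hand with Python's struct module. This shows the layout only; it is not the macro from the size_string example.

```python
import struct

data = b'\x0b\x00\x00\x00hello world'

# The first four bytes hold the length as a little-endian unsigned integer.
(length,) = struct.unpack_from('<I', data, 0)

# The payload starts right after the length field.
offset = struct.calcsize('<I')
payload = data[offset:offset + length]
```

Here the length is 11 and the payload is the 11 bytes of hello world.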

Macros can also be used to define variable types, i.e., a type that depends on the value of a previously defined variable. In the var_type example, we show how this can be accomplished.