Types¶
Types, constants, defaults and macros are defined in a nested dictionary which is usually serialised to YAML. This file, usually named types.yml, consists of four (optional) sections: types, constants, defaults and macros. In general the types file will look something like this:
---
constants:
  multiplier: 10
defaults:
  size: 2
types:
  s_char:
    size: 1
    function:
      name: struct
  text:
    delimiter:
      - 0x00
Types¶
A type consists of two subunits controlling two stages: the acquisition stage and the processing stage.
The acquisition stage is controlled by the size and delimiter parameters. The size is given in bytes and the delimiter is a list of bytes. Usually one of these parameters is sufficient to acquire the data, but in some cases, for example when we have to read a fixed-size block in which a string of variable size is stored, both parameters can be used simultaneously. Once the data is acquired, it is passed to the processing stage.
The processing stage is controlled by the function parameter, which denotes the function that is responsible for processing the acquired data. Additional parameters for this function can be supplied by the args parameter.
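The three ways of acquiring data described above (size only, delimiter only, or both) can be sketched in Python. The helper name acquire and its signature are hypothetical and not part of the library; this is only a rough illustration of the idea, assuming the delimiter is consumed but not returned.

```python
import io


def acquire(stream, size=0, delimiter=b""):
    """Hypothetical helper (not the library's API): read one field.

    - size only: read exactly `size` bytes.
    - delimiter only: read until the delimiter, which is consumed.
    - both: read a fixed block of `size` bytes and keep the part
      that precedes the first delimiter.
    """
    if not size and not delimiter:
        raise ValueError("either size or delimiter is required")
    if size:
        block = stream.read(size)
        if delimiter:
            block = block.split(delimiter, 1)[0]
        return block
    data = bytearray()
    while not data.endswith(delimiter):
        byte = stream.read(1)
        if not byte:  # end of stream reached before the delimiter
            return bytes(data)
        data += byte
    return bytes(data[:-len(delimiter)])


stream = io.BytesIO(b"ab" + b"hello\x00" + b"hi\x00\x00\x00\x00")
first = acquire(stream, size=2)                      # fixed size
second = acquire(stream, delimiter=b"\x00")          # delimited
third = acquire(stream, size=6, delimiter=b"\x00")   # fixed block, variable string
```

The last call reads a six-byte block and discards the padding after the first 0x00, which is the case where both parameters are needed.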
Basic types¶
In version 0.0.14 the struct type was introduced to replace basic types like int, float, etc. and simple compound data types. The formatting parameter fmt is used to control how a value is packed or unpacked. For example, a 4-byte little-endian integer uses the formatting string '<i' and a big-endian unsigned long uses the formatting string '>L'. To avoid any issues with serialisation to YAML (the > sign may cause problems), it is recommended to quote the string.
For a complete overview of the supported basic types, see the Python struct documentation or our extensive list of examples.
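To get a feel for the two formatting strings mentioned above, they can be tried directly with Python's standard struct module. The byte values below are made up for illustration.

```python
import struct

# A 4-byte little-endian signed integer ('<i').
raw_le = bytes([0x01, 0x00, 0x00, 0x00])
(value_le,) = struct.unpack("<i", raw_le)

# A 4-byte big-endian unsigned long ('>L').
raw_be = bytes([0x00, 0x00, 0x01, 0x00])
(value_be,) = struct.unpack(">L", raw_be)

# Packing is the inverse operation and restores the original bytes.
repacked = struct.pack("<i", value_le)
```

Here value_le is 1 and value_be is 256; the same four bytes would decode to different numbers under the other byte order.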
Examples¶
The following type is stored in two bytes and is processed by the text function:
id:
  size: 2
  function:
    name: text
This type is stored in a variable size array delimited by 0x00 and is processed by the text function:
comment:
  delimiter:
    - 0x00
  function:
    name: text
We can pass additional parameters to the text function, in this case split on the character 0x09, like so:
comment:
  delimiter:
    - 0x00
  function:
    name: text
    args:
      split:
        - 0x09
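One plausible reading of the split argument is sketched below. The function is a hypothetical stand-in written for this illustration, assuming UTF-8 text and that splitting yields a list of fields; the library's own text function may behave differently.

```python
def text(data, split=None, encoding="utf-8"):
    """Hypothetical stand-in for a text-processing function.

    Decodes the acquired bytes and, when `split` is given as a list
    of byte values, splits the decoded text on that separator.
    """
    decoded = data.decode(encoding)
    if split:
        separator = bytes(split).decode(encoding)
        return decoded.split(separator)
    return decoded


plain = text(b"a comment")
fields = text(b"first\tsecond\tthird", split=[0x09])
```

With the tab character (0x09) as separator, the acquired data is returned as three separate fields instead of one string.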
A 2-byte little-endian integer is defined as follows:
int:
  size: 2
  function:
    name: struct
    args:
      fmt: '<h'
And a 4-byte big-endian float is defined as follows:
float:
  size: 4
  function:
    name: struct
    args:
      fmt: '>f'
Compound types¶
Simple compound types can also be created using the struct function. By default this will return a list of basic types, which can optionally be mapped using an annotation list. Additionally, a simple dictionary can be created by labelling the basic types.
In the following example, we read three unsigned bytes. By providing a list of labels, the first byte is labelled r, the second one g, and the last one b. If the values are 0, 255 and 128 respectively, the resulting dictionary will be: {'r': 0, 'g': 255, 'b': 128}.
colour:
  size: 3
  function:
    name: struct
    args:
      fmt: 'BBB'
      labels: [r, g, b]
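The effect of the labels can be reproduced with the standard struct module. This is only a sketch of the idea, pairing each unpacked value with its label; it does not show how the library implements it.

```python
import struct

labels = ["r", "g", "b"]

# Three unsigned bytes with the values 0, 255 and 128.
values = struct.unpack("BBB", bytes([0, 255, 128]))

# Pair every label with the value at the same position.
colour = dict(zip(labels, values))
```

Without the labels, the result would simply be the sequence of the three values.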
Values can also be mapped using an annotation list to improve readability. This procedure replaces specific values by their annotation and leaves other values unaltered. Note that mapping multiple values to the same annotation will break reversibility of the parser.
In the following example, we read one 4-byte little-endian unsigned integer and provide annotation for the maximum and minimum value. If the value is 0, the result will be unknown; if the value is 10, the result will be 10 as well.
date:
  size: 4
  function:
    name: struct
    args:
      fmt: '<I'
      annotation:
        0xffffffff: defined
        0x00000000: unknown
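The annotation step amounts to a dictionary lookup that falls back to the raw value. The sketch below uses hypothetical helper names (decode, encode) to show this, along with the reverse lookup that stops working when two values share one annotation.

```python
import struct

annotation = {0xFFFFFFFF: "defined", 0x00000000: "unknown"}

# Reverse mapping, used when writing a value back to binary. If two
# values shared an annotation, one would be lost in this dictionary.
reverse = {label: value for value, label in annotation.items()}


def decode(data):
    """Unpack a little-endian unsigned integer and annotate it."""
    (value,) = struct.unpack("<I", data)
    return annotation.get(value, value)


def encode(value):
    """Inverse of decode: resolve the annotation, then pack."""
    return struct.pack("<I", reverse.get(value, value))


minimum = decode(b"\x00\x00\x00\x00")
maximum = decode(b"\xff\xff\xff\xff")
ten = decode(b"\x0a\x00\x00\x00")
```

Annotated values become strings, while any other value, such as 10, passes through unaltered.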
Labels and annotation lists can be combined.
Constants¶
A constant can be used as an alias in structure.yml. Using constants can make conditional statements and loops more readable.
Defaults¶
To save some space and time writing types definitions, the following default values are used:
- size defaults to 1.
- function defaults to the name of the type.
- If no name is given, the type defaults to raw and the destination is a list named __raw__.
So, for example, since a byte is of size 1, we can omit the size parameter in the type definition:
byte:
  function:
    name: struct
In the next example the function text will be used.
text:
  size: 2
And if we need an integer of size one which we want to name struct, we do not need to define anything.
If the following construction is used in the structure, the type will default to raw:
- name:
  size: 20
Overrides¶
The following defaults can be overridden by adding an entry in the defaults section:
- delimiter (defaults to []).
- name (defaults to '').
- size (defaults to 1).
- type (defaults to text).
- unknown_destination (defaults to __raw__).
- unknown_type (defaults to raw).
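Conceptually, an override replaces one entry of the built-in defaults and leaves the others untouched. The sketch below models this as a dictionary merge; the names BUILTIN_DEFAULTS and effective_defaults are hypothetical, and the library may resolve defaults differently.

```python
# Built-in default values as listed above.
BUILTIN_DEFAULTS = {
    "delimiter": [],
    "name": "",
    "size": 1,
    "type": "text",
    "unknown_destination": "__raw__",
    "unknown_type": "raw",
}


def effective_defaults(types_document):
    """Merge the `defaults` section of a parsed types file over the built-ins."""
    overrides = types_document.get("defaults", {})
    return {**BUILTIN_DEFAULTS, **overrides}


# A types file that only overrides the default size.
defaults = effective_defaults({"defaults": {"size": 2}})
```

In this example only size changes to 2; every other default keeps its built-in value.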
Macros¶
Macros were introduced in version 0.0.15 to define complex compound types. A macro is equivalent to a substructure, a construct that is also used in the structure definition either as is, or as the body of a loop or conditional statement.
In the following example, we have a substructure that occurs more than once in our binary file. We have two persons, of which the name, age, weight and height are stored. Using a flat file structure will result in something similar to this:
---
- name: name_1
- name: age_1
  type: u_char
- name: weight_1
  type: u_char
- name: height_1
  type: u_char
- name: name_2
- name: age_2
  type: u_char
- name: weight_2
  type: u_char
- name: height_2
  type: u_char
Note that we have to choose new variable names for every instance of a person. This makes downstream processing quite tedious. Furthermore, the duplicated definitions make the file harder to maintain.
The structure directive can be used to group variables in a substructure. This solves the variable naming issue, but it does not solve the maintenance issue.
---
- name: person_1
  structure:
    - name: name
    - name: age
      type: u_char
    - name: weight
      type: u_char
    - name: height
      type: u_char
- name: person_2
  structure:
    - name: name
    - name: age
      type: u_char
    - name: weight
      type: u_char
    - name: height
      type: u_char
We can define a macro in the types.yml file by adding a section named macros where we describe the structure of the group of variables.
---
types:
  u_char:
    function:
      name: struct
      args:
        fmt: 'B'
  text:
    delimiter:
      - 0x00
macros:
  person:
    - name: name
    - name: age
      type: u_char
    - name: weight
      type: u_char
    - name: height
      type: u_char
This macro can then be used in the structure.yml file in almost the same way we use a basic type.
---
- name: person_1
  macro: person
- name: person_2
  macro: person
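Since a macro is equivalent to a substructure, using it is comparable to writing the structure directive out by hand. The sketch below makes that equivalence explicit on the parsed dictionaries; expand_macros is a hypothetical helper for illustration, not something the library is known to provide.

```python
# The macros section and the structure, as parsed from YAML.
macros = {
    "person": [
        {"name": "name"},
        {"name": "age", "type": "u_char"},
        {"name": "weight", "type": "u_char"},
        {"name": "height", "type": "u_char"},
    ]
}

structure = [
    {"name": "person_1", "macro": "person"},
    {"name": "person_2", "macro": "person"},
]


def expand_macros(structure, macros):
    """Replace every `macro` reference by an inline `structure` block."""
    expanded = []
    for item in structure:
        if "macro" in item:
            body = macros[item["macro"]]
            expanded.append({"name": item["name"], "structure": body})
        else:
            expanded.append(item)
    return expanded


expanded = expand_macros(structure, macros)
```

The expanded result matches the earlier version that uses the structure directive twice, while the person definition itself is written only once.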
A common substructure in binary formats is a data field preceded by its length, e.g., a string preceded by its length as a little endian 32-bit unsigned integer: \x0b\x00\x00\x00hello world. In the size_string example we show how we can use a macro to facilitate this.
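For reference, this is what reading such a length-prefixed string looks like when done by hand with the standard struct module. The function name read_sized_string is hypothetical and UTF-8 is assumed for the text; the macro-based approach describes the same two steps declaratively.

```python
import struct


def read_sized_string(data, offset=0):
    """Read a string preceded by its length as a little-endian uint32.

    Returns the decoded string and the offset of the next field.
    """
    (length,) = struct.unpack_from("<I", data, offset)
    start = offset + 4
    end = start + length
    return data[start:end].decode("utf-8"), end


value, next_offset = read_sized_string(b"\x0b\x00\x00\x00hello world")
```

The prefix 0x0b equals 11, which is exactly the length of "hello world".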
Macros can also be used to define variable types, i.e., a type that depends on the value of a previously defined variable. In the var_type example, we show how this can be accomplished.