Extended Exercise: Defining Helpful Utilities
In this section we’ll get started writing a DaeDaLus specification for the PNG format. While you shouldn’t need to read much of the PNG layout specification for this tutorial, we provide it as a reference. In this tutorial, we’ll present any relevant information within the exercises themselves. If you want to write a full PNG implementation at some point, though, the full PNG specification has everything you’ll need.
To begin, open your favorite text editor and create a new file named
png.ddl
. This is where you’ll write the format specification. The
first line of your file, after any comments you wish to add to introduce
the module, should be:
import Daedalus
This will load all of the standard library features covered in the previous section, which you’ll start using right away.
Domain-Specific Parser Names
For total clarity in code, it is often useful to give new names to
things that are otherwise quite generic (e.g. to represent lengths,
we may be using floating-point numbers, but the type name length
is more informative than float
, so we might want to alias these
names). In PNG, there are a number of examples of this being useful
since unsigned 32-bit words are used for a number of very different
components.
Exercise: Define two parsers, Length
and Crc
, that each parse a
big-endian, 32-bit, unsigned integer.
Hint
You may want to review the section aside: the daedalus standard library if you’re not sure how to deal with the endianness.
Solution
def Length = BEUInt32
def Crc = BEUInt32
Exercise: There are a couple of places in the PNG specification where we
have 1-bit flags, indicating whether some option is enabled or not. Write a
parser FLAG
that matches a byte that is 0
or 1
.
Solution
We use a character class to write this very succinctly:
def FLAG = $[0 .. 1]
For brevity, we’ll prefer this syntax in the other provided solutions.
Null-terminated Strings
A null-terminated string is an array of characters terminated with
the null character (ASCII value 0
). These are also known as C
strings since this is the representation used in C’s standard library
string manipulation functions.
PNG makes use of null-terminated strings in a few places, so we need to be able to parse them. In particular, we need to be able to parse strings within a specific range of sizes and strings of unbounded length.
Exercise: Define a parser, NullChar
, that parses the ASCII null
byte.
Solution
def NullChar = $[0]
Exercise: Define a parser, NonNullChar
, that parses one non-null
ASCII byte.
Solution
def NonNullChar = $[1 .. 255]
Exercise (Challenging): Define a parser, OMany
, that behaves like
Many
, but takes the integer arguments as maybe
values. Your parser
should satisfy the following laws:
OMany nothing nothing P = Many P
OMany nothing (just max) P = Many (..max) P
OMany (just min) nothing P = Many (min..) P
OMany (just min) (just max) P = Many (min..max) P
Hint
The above equations are the hard part! Your job is to write the ‘glue’ to bring it all together.
Solution
def OMany (omin:maybe (uint 64)) (omax:maybe (uint 64)) P =
case omin of
nothing -> case omax of
nothing -> Many P
just max -> Many (..max) P
just min -> case omax of
nothing -> Many (min..) P
just max -> Many (min..max) P
Note that the right-hand sides of each case arm is the right-hand side of one of the laws; this form of algebraic specification is very useful when writing functional code, as all that is left once we have established the laws is writing the appropriate pattern-matching code to cover each one.
Exercise (Challenging): Define a parser, NTString
, that parses a
null-terminated string between min
and max
characters in length,
if bounds are provided (i.e. the bounds should be maybe
values). The
null character should not be included in the character count.
Hint
The OMany
parser you wrote in the previous exercise should be extremely
helpful.
Solution
def NTString (omin:maybe (uint 64)) (omax:maybe (uint 64)) =
block
$$ = OMany omin omax NonNullChar
NullChar
Common Structures
Time and RGB color values are both critical parts of the PNG specification. The latter shouldn’t be surprising given that this is an image format, but the former is perhaps a little surprising: by the format’s definition, PNG chunks can carry last-modified time data.
Exercise: Define a parser RGB
that consumes three bytes and returns
them in a structure with fields red
, green
, and blue
. The parsed
bytes should be assigned to those fields in that order (i.e. red
is the
first byte, green
the second, and blue
the third).
Solution
def RGB =
block
red = UInt8
green = UInt8
blue = UInt8
While the order of fields in a structure doesn’t matter in general, when writing DaeDaLus specifications, we must be sure to write the fields in the order in which things will be parsed. If there is a clearer order to write the structure itself in, you can feel free to use the de-sugared form of sequence parsing discussed in an earlier section.
Exercise: The time format used by PNG is given by this table:
Year |
Month |
Day |
Hour |
Minute |
Second |
---|---|---|---|---|---|
2 bytes |
1 byte (1 - 12) |
1 byte (1 - 31) |
1 byte (0 - 23) |
1 byte (0 - 59) |
1 byte (0 - 60) |
Write a parser UTCTime
that produces a structure with these fields
with the given value constraints. The order of columns in the table is
the order the fields should be parsed and the year should be parsed in
big-endian order.
Hint
Remember that you can use the syntax $[n .. m]
to parse one byte with
a value between n
and m
, inclusive.
Solution
def UTCTime =
block
year = BEUInt16
month = $[1 .. 12]
day = $[1 .. 31]
hour = $[0 .. 23]
minute = $[0 .. 59]
second = $[0 .. 60]
In case you’re wondering why the second
field allows for a value of 60:
It’s to allow for leap-seconds, according to the PNG specification.