diff --git a/doc/fileformat.md b/doc/fileformat.md
new file mode 100644
index 0000000..4d5dcfc
--- /dev/null
+++ b/doc/fileformat.md
@@ -0,0 +1,173 @@
+# Package File Format
+
+## Record Structure
+
+A package file consists of a series of records, each with a possibly compressed
+payload. Each record has a header indicating the type of record, the raw and
+compressed size of the payload data, and the compression algorithm used.
+
+All multi byte integers are stored in little endian byte order.
+
+Knowledge of the payload size lets a decoder skip unknown record types inserted
+by an encoder that supports a newer version of the format, allowing for some
+degree of backwards compatibility.
+
+The diagram below illustrates what a record header looks like. The horizontally
+arranged numbers indicate a byte offset inside a 32 bit word and the vertical
+numbers count 32 bit words from the start of the header.
+
+          0       1       2       3
+      +-------+-------+-------+-------+
+    0 |             magic             |
+      +-------+-------+-------+-------+
+    1 | comp  |reserved for future use|
+      +-------+-------+-------+-------+
+    2 |                               |
+      |        compressed size        |
+    3 |                               |
+      +-------+-------+-------+-------+
+    4 |                               |
+      |       uncompressed size       |
+    5 |                               |
+      +-------+-------+-------+-------+
+    6 |                               |
+    . |            payload            |
+    . |                               |
+      |/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
+
+       /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/|
+      |                               |
+      |_______________________________|
+
+
+The field `magic` holds a 32 bit magic number indicating the record type.
+
+Currently, the following record types are supported:
+
+* `PKG_MAGIC_HEADER` with the value `0x21676B70` (ASCII "pkg!"). The overall
+  package header with information about the package.
+* `PKG_MAGIC_TOC` with the value `0x21636F74` (ASCII "toc!"). The table of
+  contents record.
+* `PKG_MAGIC_DATA` with the value `0x21746164` (ASCII "dat!"). The package data
+  record.
+
+The byte labeled `comp` holds a compression algorithm identifier. Currently, the
+following compression algorithms are supported:
+
+* `PKG_COMPRESSION_NONE` with the value 0. The payload is uncompressed. The
+  compressed size of the record payload must equal the uncompressed size.
+* `PKG_COMPRESSION_ZLIB` with the value 1. The record payload area contains a
+  raw zlib stream.
+* `PKG_COMPRESSION_LZMA` with the value 2. The record payload area contains
+  lzma compressed data.
+
+The compressor ID is padded with 3 bytes that **must be set to zero** by an
+encoder and are currently reserved for future use.
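+
+The header layout above can be decoded roughly as in the following Python
+sketch. The names `RECORD_HEADER`, `parse_record_header` and `iter_records` are
+hypothetical and only serve as illustration. The sketch assumes the 24 byte
+header shown in the diagram and that the payload occupies exactly
+`compressed size` bytes directly after the header.
+
+```python
+import struct
+
+# Header layout from the diagram: 32 bit magic, 8 bit compressor ID,
+# 3 reserved bytes, 64 bit compressed size, 64 bit uncompressed size.
+# "<" selects little endian byte order without implicit padding.
+RECORD_HEADER = struct.Struct("<IB3xQQ")
+
+PKG_MAGIC_HEADER = 0x21676B70
+PKG_MAGIC_TOC = 0x21636F74
+PKG_MAGIC_DATA = 0x21746164
+
+
+def parse_record_header(buf, offset=0):
+    """Return (magic, comp, compressed_size, raw_size) found at offset."""
+    return RECORD_HEADER.unpack_from(buf, offset)
+
+
+def iter_records(buf):
+    """Yield (magic, comp, raw_size, payload) for each record in buf."""
+    offset = 0
+    while offset < len(buf):
+        magic, comp, csize, rsize = parse_record_header(buf, offset)
+        start = offset + RECORD_HEADER.size
+        yield magic, comp, rsize, buf[start:start + csize]
+        # The compressed size is all that is needed to reach the next
+        # record, so unknown record types can be skipped the same way.
+        offset = start + csize
+```
+
+A caller would compare `magic` against the known values and simply continue
+with the next record when it does not recognize the type.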
+
+## Package Header Record
+
+The header record must be present in every package and must always be the first
+record in a package file. If a decoder encounters a file which does not start
+with the magic value of the header record, it must reject the file.
+
+Future versions of the format that break backwards compatibility can simply
+introduce a new magic value for the (possibly altered) header record. Older
+decoders are expected to reject a file with the newer format, while newer
+decoders can implement different behavior depending on what magic value they
+find at the start.
+
+The payload of the header record is currently only used to store package
+dependencies. It is byte aligned and contains a 16 bit integer, indicating the
+number of dependencies, followed by a sequence of dependent packages.
+
+Each dependent package is encoded as follows:
+
+         0         1         2         length + 1
+    +---------+---------+-------     -+---------+
+    |  type   | length  | name   ...  |         |
+    +---------+---------+-------     -+---------+
+
+
+The type field must be set to 0, indicating that a dependency is required by a
+package.
+
+The length byte indicates the length of the package name that follows, allowing
+for up to 255 bytes of package name afterwards.
+
+If all dependencies have been processed, but there is still payload data left
+in the header record, a decoder must ignore the extra data and skip to the end
+of the record.
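+
+The dependency list could be read as in the following Python sketch. The
+function name `parse_dependencies` is hypothetical, and the sketch assumes that
+the payload has already been decompressed.
+
+```python
+import struct
+
+
+def parse_dependencies(payload):
+    """Return a list of (type, name) tuples from a header record payload."""
+    (count,) = struct.unpack_from("<H", payload, 0)
+    offset = 2
+    deps = []
+    for _ in range(count):
+        # One type byte and one length byte precede each package name.
+        dep_type, length = struct.unpack_from("<BB", payload, offset)
+        offset += 2
+        name = payload[offset:offset + length]
+        if len(name) != length:
+            raise ValueError("truncated dependency entry")
+        offset += length
+        deps.append((dep_type, name))
+    # Any bytes left after the last entry are ignored on purpose.
+    return deps
+```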
+
+## Table of Contents Record
+
+For each file, directory, symlink, et cetera contained in the package, the
+table of contents contains an entry with the following common structure:
+
+          0       1       2       3
+      +-------+-------+-------+-------+
+    0 |              mode             |
+      +-------+-------+-------+-------+
+    1 |            user id            |
+      +-------+-------+-------+-------+
+    2 |           group id            |
+      +-------+-------+-------+-------+
+    3 |  path length  |
+      +---------------+
+
+The mode field contains standard UNIX permissions. The user ID and group ID
+fields contain the numeric IDs of the user and group respectively that own
+the file. The path length field indicates the length of the byte aligned,
+absolute file name that follows.
+
+The file path is expected to neither start nor end with a slash, to contain no
+sequences of more than one slash, and to contain no `.` or `..` components.
+
+On the bit level, the mode field is structured as follows:
+
+    1 1 1 1 1 1
+    5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+      type |U|G|S|r|w|x|r|w|x|r|w|x|
+        |   | | | | | | | | | | | |
+        |   | | | | | | | | | | | +--- others execute permission flag
+        |   | | | | | | | | | | +----- others write permission flag
+        |   | | | | | | | | | +------- others read permission flag
+        |   | | | | | | | | +--------- group execute permission flag
+        |   | | | | | | | +----------- group write permission flag
+        |   | | | | | | +------------- group read permission flag
+        |   | | | | | +--------------- owner execute permission flag
+        |   | | | | +----------------- owner write permission flag
+        |   | | | +------------------- owner read permission flag
+        |   | | +--------------------- sticky bit
+        |   | +----------------------- set GID bit
+        |   +------------------------- set UID bit
+        +----------------------------- file type
+
+The upper 16 bits of the mode field must be set to zero.
+
+Currently, the following file types are supported:
+
+* `S_IFCHR` with a value of 2. The entry defines a character device.
+* `S_IFDIR` with a value of 4. The entry defines a directory.
+* `S_IFBLK` with a value of 6. The entry defines a block device.
+* `S_IFREG` with a value of 8. The entry defines a regular file.
+* `S_IFLNK` with a value of 10. The entry defines a symbolic link.
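+
+Splitting the mode field into file type and permission bits can be sketched in
+Python as follows. The helper name `split_mode` is hypothetical. The constants
+hold the 4 bit type values listed above, not the full mode masks found in
+`sys/stat.h`.
+
+```python
+S_IFCHR, S_IFDIR, S_IFBLK, S_IFREG, S_IFLNK = 2, 4, 6, 8, 10
+
+
+def split_mode(mode):
+    """Split a 32 bit mode field into (file type, permission bits)."""
+    if mode >> 16:
+        raise ValueError("upper 16 bits of mode must be zero")
+    file_type = (mode >> 12) & 0xF   # bits 15..12
+    permissions = mode & 0o7777      # set UID, set GID, sticky, rwx bits
+    return file_type, permissions
+```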
+
+
+For character and block devices, the file path is followed by a byte aligned,
+64 bit device number.
+
+For regular files, the path is followed by a byte aligned, 64 bit integer
+indicating the total size of the file in bytes, followed by a byte aligned,
+32 bit integer containing a unique file ID.
+
+For symlinks, the path is followed by a byte aligned, 16 bit integer holding
+the length of the target path, followed by the byte aligned symlink target.
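+
+A single table of contents entry could be decoded as in the following Python
+sketch. The names `COMMON` and `parse_toc_entry` as well as the dictionary keys
+are hypothetical. The sketch assumes a decompressed payload and that a
+directory entry carries no data after its path.
+
+```python
+import struct
+
+S_IFCHR, S_IFDIR, S_IFBLK, S_IFREG, S_IFLNK = 2, 4, 6, 8, 10
+COMMON = struct.Struct("<IIIH")  # mode, user id, group id, path length
+
+
+def parse_toc_entry(buf, offset=0):
+    """Return (entry dict, offset of the byte after this entry)."""
+    mode, uid, gid, path_len = COMMON.unpack_from(buf, offset)
+    offset += COMMON.size
+    path = buf[offset:offset + path_len]
+    offset += path_len
+    entry = {"mode": mode, "uid": uid, "gid": gid, "path": path}
+    file_type = (mode >> 12) & 0xF
+    if file_type in (S_IFCHR, S_IFBLK):
+        # 64 bit device number
+        (entry["devno"],) = struct.unpack_from("<Q", buf, offset)
+        offset += 8
+    elif file_type == S_IFREG:
+        # 64 bit file size followed by a 32 bit file ID
+        entry["size"], entry["id"] = struct.unpack_from("<QI", buf, offset)
+        offset += 12
+    elif file_type == S_IFLNK:
+        # 16 bit target length followed by the target path
+        (target_len,) = struct.unpack_from("<H", buf, offset)
+        offset += 2
+        entry["target"] = buf[offset:offset + target_len]
+        offset += target_len
+    elif file_type != S_IFDIR:
+        raise ValueError("unsupported file type %d" % file_type)
+    return entry, offset
+```
+
+Returning the next offset lets a caller walk the payload entry by entry.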
+
+## Data Record
+
+A package may contain multiple data records holding raw file data. Each data
+record is a sequence of entries: a byte aligned, 32 bit file ID, followed by
+raw data up to the file size indicated by the table of contents.
+
+A file must not span multiple data records. A file ID must not occur
+more than once in a data record and must only occur in a single data record.
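+
+Because a data record stores no per-file length, a decoder needs the sizes from
+the table of contents to find entry boundaries. The following Python sketch
+illustrates this. The function name `split_data_record` and the `sizes` mapping
+(file ID to size, built from the table of contents) are hypothetical, and the
+payload is assumed to be decompressed already.
+
+```python
+import struct
+
+
+def split_data_record(payload, sizes):
+    """Return a dict mapping each file ID in the payload to its raw data."""
+    files = {}
+    offset = 0
+    while offset < len(payload):
+        (file_id,) = struct.unpack_from("<I", payload, offset)
+        offset += 4
+        if file_id in files:
+            raise ValueError("file ID %d occurs twice" % file_id)
+        size = sizes[file_id]
+        data = payload[offset:offset + size]
+        if len(data) != size:
+            raise ValueError("file %d is truncated" % file_id)
+        files[file_id] = data
+        offset += size
+    return files
+```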