mirror of
https://github.com/pygos/pkg-utils.git
synced 2024-11-22 04:49:46 +01:00
Add writeup on package file structure
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
This commit is contained in:
parent
f11c9df330
commit
1a84276bf6
1 changed files with 173 additions and 0 deletions
173
doc/fileformat.md
Normal file
173
doc/fileformat.md
Normal file
|
@ -0,0 +1,173 @@
|
||||||
|
# Package File Format
|
||||||
|
|
||||||
|
## Record Structure
|
||||||
|
|
||||||
|
A package file consists of a series of records with possibly compressed
|
||||||
|
payload. Each record has a header, indicating the type of record, raw and
|
||||||
|
compressed size of the payload data and the compression algorithm used.
|
||||||
|
|
||||||
|
All multi byte integers are stored in little endian byte order.
|
||||||
|
|
||||||
|
Knowledge of the payload size lets a decoder skip unknown record types inserted
|
||||||
|
by an encoder that supports a newer version of the format, allowing for some
|
||||||
|
degree of backwards compatibility.
|
||||||
|
|
||||||
|
The diagram below illustrates what a record header looks like. The horizontally
|
||||||
|
arranged numbers indicate a byte offset inside a 32 bit word and the vertical
|
||||||
|
numbers count 32 bit words from the start of the header.
|
||||||
|
|
||||||
|
0 1 2 3
|
||||||
|
+-------+-------+-------+-------+
|
||||||
|
0 | magic |
|
||||||
|
+-------+-------+-------+-------+
|
||||||
|
1 | comp |reserved for future use|
|
||||||
|
+-------+-------+-------+-------+
|
||||||
|
2 | |
|
||||||
|
| compressed size |
|
||||||
|
3 | |
|
||||||
|
+-------+-------+-------+-------+
|
||||||
|
4 | |
|
||||||
|
| uncompressed size |
|
||||||
|
5 | |
|
||||||
|
+-------+-------+-------+-------+
|
||||||
|
6 | |
|
||||||
|
. | payload |
|
||||||
|
. | |
|
||||||
|
|/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
|
||||||
|
|
||||||
|
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/|
|
||||||
|
| |
|
||||||
|
|_______________________________|
|
||||||
|
|
||||||
|
|
||||||
|
The field `magic` holds a 32 bit magic number indicating the chunk type.
|
||||||
|
|
||||||
|
Currently, the following chunk types are supported:
|
||||||
|
|
||||||
|
* `PKG_MAGIC_HEADER` with the value `0x21676B70` (ASCII "pkg!"). The overall
|
||||||
|
package header with information about the package.
|
||||||
|
* `PKG_MAGIC_TOC` with the value `0x21636F74` (ASCII "toc!"). The table of
|
||||||
|
contents record.
|
||||||
|
* `PKG_MAGIC_DATA` with the value `0x21746164` (ASCII "dat!"). The package data
|
||||||
|
record.
|
||||||
|
|
||||||
|
The byte labeled `comp` holds a compression algorithm identifier. Currently, the
|
||||||
|
following compression algorithms are supported:
|
||||||
|
|
||||||
|
* `PKG_COMPRESSION_NONE` with the value 0. The payload is uncompressed. The
|
||||||
|
compressed size of the record payload must equal the uncompressed size.
|
||||||
|
* `PKG_COMPRESSION_ZLIB` with the value 1. The record payload area contains a
|
||||||
|
raw zlib stream.
|
||||||
|
* `PKG_COMPRESSION_LZMA` with the value 2. The record payload area contains
|
||||||
|
lzma compressed data.
|
||||||
|
|
||||||
|
The compressor ID is padded with 3 bytes that **must be set to zero** by an
|
||||||
|
encoder and are currently reserved for future use.
|
||||||
|
|
||||||
|
## Package Header Record
|
||||||
|
|
||||||
|
The header record must be present in ever package and must always be the first
|
||||||
|
record in a package file. If a decoder encounters a file which does not start
|
||||||
|
with the magic value of the header record, it must reject the file.
|
||||||
|
|
||||||
|
Future versions of the format that break backwards compatibility can simply
|
||||||
|
introduce a new magic value for the (possibly altered) header record. Older
|
||||||
|
decoders are expected to reject a file with the newer format, while newer
|
||||||
|
decoders can implement different behavior depending on what magic value they
|
||||||
|
find at the start.
|
||||||
|
|
||||||
|
The payload of header record is currently only used to store package
|
||||||
|
dependencies. It is byte aligned and contains a 16 bit integer, indicating the
|
||||||
|
number of dependencies, followed by a sequence of dependent packages.
|
||||||
|
|
||||||
|
Each dependent package is encoded as follows:
|
||||||
|
|
||||||
|
0 1 2 length - 1
|
||||||
|
+---------+---------+------- -+---------+
|
||||||
|
| type | length | name ... | |
|
||||||
|
+---------+---------+------- -+---------+
|
||||||
|
|
||||||
|
|
||||||
|
The type field must be set to 0, indicating that a dependency is required by a
|
||||||
|
package.
|
||||||
|
|
||||||
|
The length byte indicates the length of the package name that follows, allowing
|
||||||
|
for up to 255 bytes of package name afterwards.
|
||||||
|
|
||||||
|
If all dependencies have been processed, but there is still payload data left
|
||||||
|
in the header record, a decoder must ignore the extra data and skip to the end
|
||||||
|
of the record.
|
||||||
|
|
||||||
|
## Table of Contents Record
|
||||||
|
|
||||||
|
For each file, directory, symlink, et cetera contained in the package, the
|
||||||
|
table of contents contains an entry with the following common structure:
|
||||||
|
|
||||||
|
0 1 2 3
|
||||||
|
+-------+-------+-------+-------+
|
||||||
|
0 | mode |
|
||||||
|
+-------+-------+-------+-------+
|
||||||
|
1 | user id |
|
||||||
|
+-------+-------+-------+-------+
|
||||||
|
2 | group id |
|
||||||
|
+-------+-------+-------+-------+
|
||||||
|
3 | path length |
|
||||||
|
+---------------+
|
||||||
|
|
||||||
|
The mode field contains standard UNIX permissions. The user ID and group ID
|
||||||
|
fields contain the numeric IDs of the user and group respectively that own
|
||||||
|
the file. The path length field indicates the length of the byte aligned,
|
||||||
|
absolute file name that follows.
|
||||||
|
|
||||||
|
The file path is expected to neither start nor end with a slash, contain no
|
||||||
|
sequences of more than one slash and not contain the components `.` or `..`.
|
||||||
|
|
||||||
|
On the bit level, the mode field is structured as follows:
|
||||||
|
|
||||||
|
1 1 1 1 1 1
|
||||||
|
5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
|
||||||
|
type |U|G|S|r|w|x|r|w|x|r|w|x|
|
||||||
|
| | | | | | | | | | | | |
|
||||||
|
| | | | | | | | | | | | +--- others execute permission flag
|
||||||
|
| | | | | | | | | | | +----- others write permission flag
|
||||||
|
| | | | | | | | | | +------- others read permission flag
|
||||||
|
| | | | | | | | | +--------- group execute permission flag
|
||||||
|
| | | | | | | | +----------- group write permission flag
|
||||||
|
| | | | | | | +------------- group read permission flag
|
||||||
|
| | | | | | +--------------- owner execute permission flag
|
||||||
|
| | | | | +----------------- owner write permission flag
|
||||||
|
| | | | +------------------- owner read permission flag
|
||||||
|
| | | +--------------------- sticky bit
|
||||||
|
| | +----------------------- set GID bit
|
||||||
|
| +------------------------- set UID bit
|
||||||
|
+----------------------------- file type
|
||||||
|
|
||||||
|
The upper 16 bit of the mode filed must be set to zero.
|
||||||
|
|
||||||
|
Currently, the following file types are supported:
|
||||||
|
|
||||||
|
* `S_IFCHR` with a value of 2. The entry defines a character device.
|
||||||
|
* `S_IFDIR` with a value of 4. The entry defines a directory.
|
||||||
|
* `S_IFBLK` with a value of 6. The entry defines a block device.
|
||||||
|
* `S_IFREG` with a value of 8. The entry defines a regular file.
|
||||||
|
* `S_IFLNK` with a value of 10. The entry defines a symbolic link.
|
||||||
|
|
||||||
|
|
||||||
|
For character and block devices, the file path is followed by a byte aligned,
|
||||||
|
64 bit device number.
|
||||||
|
|
||||||
|
For regular files, the path is followed by a byte aligned, 64 bit integer
|
||||||
|
indicating the total size of the file in bytes, followed by a byte aligned,
|
||||||
|
32 bit integer containing a unique file ID.
|
||||||
|
|
||||||
|
For symlinks, the path is followed by a byte aligned, 16 bit integer holding
|
||||||
|
the length of the target path, followed by the byte aligned symlink target.
|
||||||
|
|
||||||
|
## Data Record
|
||||||
|
|
||||||
|
A package may contain multiple data records holding raw file data. Each data
|
||||||
|
record contains a sequence of a byte aligned, 32 bit file ID, followed by raw
|
||||||
|
data until the file is filled (size indicated by table of contents is reached).
|
||||||
|
|
||||||
|
A file may not span across multiple data records. A file ID must not occur
|
||||||
|
more than once in a data record and must only occur in a single data record.
|
Loading…
Reference in a new issue