mirror of
https://github.com/pygos/pkg-utils.git
synced 2024-11-21 20:39:46 +01:00
Add writeup on package file structure
Signed-off-by: David Oberhollenzer <goliath@infraroot.at>
parent
f11c9df330
commit
1a84276bf6
1 changed files with 173 additions and 0 deletions
173
doc/fileformat.md
Normal file
@@ -0,0 +1,173 @@
# Package File Format

## Record Structure

A package file consists of a series of records whose payload may be compressed.
Each record starts with a header that indicates the record type, the compressed
and uncompressed size of the payload data, and the compression algorithm used.

All multi-byte integers are stored in little-endian byte order.

Knowing the payload size lets a decoder skip unknown record types inserted
by an encoder that supports a newer version of the format, so that older
decoders can still process newer files to some degree.

The diagram below illustrates what a record header looks like. The horizontally
arranged numbers indicate a byte offset inside a 32 bit word and the vertical
numbers count 32 bit words from the start of the header.

            0       1       2       3
        +-------+-------+-------+-------+
      0 |             magic             |
        +-------+-------+-------+-------+
      1 | comp  |reserved for future use|
        +-------+-------+-------+-------+
      2 |                               |
        |        compressed size        |
      3 |                               |
        +-------+-------+-------+-------+
      4 |                               |
        |       uncompressed size       |
      5 |                               |
        +-------+-------+-------+-------+
      6 |                               |
      . |            payload            |
      . |                               |
        |/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

         /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/|
        |                               |
        |_______________________________|

The `magic` field holds a 32 bit magic number indicating the record type.

Currently, the following record types are supported:

* `PKG_MAGIC_HEADER` with the value `0x21676B70` (ASCII "pkg!"). The overall
  package header with information about the package.
* `PKG_MAGIC_TOC` with the value `0x21636F74` (ASCII "toc!"). The table of
  contents record.
* `PKG_MAGIC_DATA` with the value `0x21746164` (ASCII "dat!"). The package data
  record.

The byte labeled `comp` holds a compression algorithm identifier. Currently, the
following compression algorithms are supported:

* `PKG_COMPRESSION_NONE` with the value 0. The payload is uncompressed. The
  compressed size of the record payload must equal the uncompressed size.
* `PKG_COMPRESSION_ZLIB` with the value 1. The record payload area contains a
  raw zlib stream.
* `PKG_COMPRESSION_LZMA` with the value 2. The record payload area contains
  lzma compressed data.

The compressor ID is followed by 3 padding bytes that are reserved for future
use and **must be set to zero** by an encoder.
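
The record layout described above could be read along the following lines. This
is only a minimal sketch, not the reference implementation: the helper names
`iter_records` and `decode_payload` are hypothetical, the stream is assumed to
be seekable, and it is an assumption that the zlib payload carries the standard
zlib wrapper and that the lzma payload uses a container that Python's
`lzma.decompress` can auto-detect.

```python
import io
import lzma
import struct
import zlib

# Header layout taken from the diagram: 32 bit magic, 1 byte compressor ID,
# 3 reserved bytes, 64 bit compressed size, 64 bit uncompressed size.
RECORD_HEADER = struct.Struct("<IB3xQQ")  # 24 bytes, little endian

KNOWN_MAGIC = {
    0x21676B70: "header",  # "pkg!"
    0x21636F74: "toc",     # "toc!"
    0x21746164: "data",    # "dat!"
}


def iter_records(stream):
    """Yield (kind, comp, raw_size, payload) for every known record.

    Hypothetical helper; assumes a seekable binary stream.
    """
    while True:
        raw = stream.read(RECORD_HEADER.size)
        if not raw:
            return  # clean end of file
        if len(raw) < RECORD_HEADER.size:
            raise ValueError("truncated record header")
        magic, comp, comp_size, raw_size = RECORD_HEADER.unpack(raw)
        if magic not in KNOWN_MAGIC:
            # Unknown record type: the size field lets us skip its payload.
            stream.seek(comp_size, io.SEEK_CUR)
            continue
        payload = stream.read(comp_size)
        if len(payload) != comp_size:
            raise ValueError("truncated record payload")
        yield KNOWN_MAGIC[magic], comp, raw_size, payload


def decode_payload(comp, raw_size, payload):
    """Return the uncompressed payload (container formats are assumptions)."""
    if comp == 0:
        data = payload
    elif comp == 1:
        data = zlib.decompress(payload)
    elif comp == 2:
        data = lzma.decompress(payload)
    else:
        raise ValueError(f"unknown compressor ID {comp}")
    if len(data) != raw_size:
        raise ValueError("payload size does not match the record header")
    return data
```

Because the magic is stored little endian, the byte sequence on disk spells the
ASCII tag directly, for example `(0x21676B70).to_bytes(4, "little")` yields
`b"pkg!"`.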

## Package Header Record

The header record must be present in every package and must always be the first
record in a package file. If a decoder encounters a file which does not start
with the magic value of the header record, it must reject the file.

Future versions of the format that break backwards compatibility can simply
introduce a new magic value for the (possibly altered) header record. Older
decoders are expected to reject a file with the newer format, while newer
decoders can implement different behavior depending on what magic value they
find at the start.

The payload of the header record is currently only used to store package
dependencies. It is byte aligned and starts with a 16 bit integer indicating the
number of dependencies, followed by that many dependency entries.

Each dependency is encoded as follows:

         0         1         2         length - 1
    +---------+---------+-------     -+---------+
    |  type   | length  |  name ...   |         |
    +---------+---------+-------     -+---------+

The type field must be set to 0, indicating that the dependency is required by
the package.

The length byte gives the length of the package name that follows, allowing
for names of up to 255 bytes.

If all dependencies have been processed, but there is still payload data left
in the header record, a decoder must ignore the extra data and skip to the end
of the record.
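
The dependency list could be decoded roughly as follows. This is a sketch under
stated assumptions: `parse_dependencies` is a hypothetical helper that expects
the already uncompressed header payload, and names are returned as raw bytes
because the text does not state a character encoding.

```python
import struct


def parse_dependencies(payload):
    """Return a list of (type, name) pairs from a header record payload."""
    if len(payload) < 2:
        raise ValueError("header payload too short")
    (count,) = struct.unpack_from("<H", payload, 0)  # 16 bit entry count
    offset = 2
    deps = []
    for _ in range(count):
        if offset + 2 > len(payload):
            raise ValueError("truncated dependency entry")
        dep_type, name_len = struct.unpack_from("<BB", payload, offset)
        offset += 2
        name = payload[offset:offset + name_len]
        if len(name) != name_len:
            raise ValueError("truncated dependency name")
        offset += name_len
        deps.append((dep_type, name))
    # Any bytes after the last entry are deliberately ignored.
    return deps
```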

## Table of Contents Record

For each file, directory, symlink, et cetera contained in the package, the
table of contents contains an entry with the following common structure:

            0       1       2       3
        +-------+-------+-------+-------+
      0 |             mode              |
        +-------+-------+-------+-------+
      1 |            user id            |
        +-------+-------+-------+-------+
      2 |           group id            |
        +-------+-------+-------+-------+
      3 |  path length  |
        +---------------+

The mode field contains standard UNIX permissions. The user ID and group ID
fields contain the numeric IDs of the user and group, respectively, that own
the file. The path length field indicates the length of the byte aligned,
absolute file path that follows (stored without a leading slash, see below).

The file path must neither start nor end with a slash, must not contain
sequences of more than one slash, and must not contain the components `.` or
`..`.
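
A decoder may want to verify these path rules before using an entry. The check
below is a sketch only: `is_canonical_path` is a hypothetical name, and
rejecting the empty path is an assumption, since the text does not say whether
an empty path is allowed.

```python
def is_canonical_path(path: bytes) -> bool:
    """Check the path rules above (rejecting an empty path is an assumption)."""
    components = path.split(b"/")
    # An empty component means a leading, trailing or doubled slash.
    return all(c not in (b"", b".", b"..") for c in components)
```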

On the bit level, the mode field is structured as follows:

    1 1 1 1 1 1
    5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
      type |U|G|S|r|w|x|r|w|x|r|w|x|
        |   | | | | | | | | | | | |
        |   | | | | | | | | | | | +--- others execute permission flag
        |   | | | | | | | | | | +----- others write permission flag
        |   | | | | | | | | | +------- others read permission flag
        |   | | | | | | | | +--------- group execute permission flag
        |   | | | | | | | +----------- group write permission flag
        |   | | | | | | +------------- group read permission flag
        |   | | | | | +--------------- owner execute permission flag
        |   | | | | +----------------- owner write permission flag
        |   | | | +------------------- owner read permission flag
        |   | | +--------------------- sticky bit
        |   | +----------------------- set GID bit
        |   +------------------------- set UID bit
        +----------------------------- file type

The upper 16 bits of the mode field must be set to zero.

Currently, the following file types are supported:

* `S_IFCHR` with a value of 2. The entry defines a character device.
* `S_IFDIR` with a value of 4. The entry defines a directory.
* `S_IFBLK` with a value of 6. The entry defines a block device.
* `S_IFREG` with a value of 8. The entry defines a regular file.
* `S_IFLNK` with a value of 10. The entry defines a symbolic link.
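
Splitting a mode value into its file type and permission bits could look like
the sketch below. The helper name `split_mode` is hypothetical, and rejecting
unlisted type values is an assumption about decoder behavior. The listed type
values correspond to the usual Linux `S_IF*` constants shifted right by 12 bits.

```python
FILE_TYPES = {
    2: "S_IFCHR",
    4: "S_IFDIR",
    6: "S_IFBLK",
    8: "S_IFREG",
    10: "S_IFLNK",
}


def split_mode(mode):
    """Return (file type name, permission bits) for a 32 bit mode field."""
    if mode >> 16:
        raise ValueError("upper 16 bits of the mode field must be zero")
    file_type = (mode >> 12) & 0xF  # bits 15..12
    if file_type not in FILE_TYPES:
        raise ValueError(f"unsupported file type {file_type}")
    # Bits 11..0: set UID, set GID, sticky, then rwx for owner/group/others.
    return FILE_TYPES[file_type], mode & 0o7777
```

For example, `split_mode(0o100644)` describes a regular file with permissions
`0o644`.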

For character and block devices, the file path is followed by a byte aligned,
64 bit device number.

For regular files, the path is followed by a byte aligned, 64 bit integer
indicating the total size of the file in bytes, followed by a byte aligned,
32 bit integer containing a unique file ID.

For symlinks, the path is followed by a byte aligned, 16 bit integer holding
the length of the target path, followed by the byte aligned symlink target.
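
Putting the common structure and the type specific fields together, a single
table of contents entry could be decoded as sketched below. `parse_toc_entry`
and the dictionary keys are hypothetical names. The text does not say how the
number of entries is determined, so the sketch only decodes one entry and
returns the offset of the next one.

```python
import struct

ENTRY_HEADER = struct.Struct("<IIIH")  # mode, user ID, group ID, path length


def parse_toc_entry(buf, offset=0):
    """Decode one entry; return (entry dict, offset of the next entry)."""
    mode, uid, gid, path_len = ENTRY_HEADER.unpack_from(buf, offset)
    offset += ENTRY_HEADER.size
    path = bytes(buf[offset:offset + path_len])
    if len(path) != path_len:
        raise ValueError("truncated path")
    offset += path_len
    entry = {"mode": mode, "uid": uid, "gid": gid, "path": path}

    file_type = (mode >> 12) & 0xF
    if file_type in (2, 6):  # character or block device
        (entry["devno"],) = struct.unpack_from("<Q", buf, offset)
        offset += 8
    elif file_type == 8:  # regular file: 64 bit size, 32 bit file ID
        entry["size"], entry["file_id"] = struct.unpack_from("<QI", buf, offset)
        offset += 12
    elif file_type == 10:  # symlink: 16 bit target length, then target
        (target_len,) = struct.unpack_from("<H", buf, offset)
        offset += 2
        target = bytes(buf[offset:offset + target_len])
        if len(target) != target_len:
            raise ValueError("truncated symlink target")
        entry["target"] = target
        offset += target_len
    elif file_type != 4:  # directories carry no extra fields
        raise ValueError(f"unsupported file type {file_type}")
    return entry, offset
```

Calling the function repeatedly with the returned offset until the end of the
payload is reached is one plausible way to walk the whole record, but that loop
condition is an assumption.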

## Data Record

A package may contain multiple data records holding raw file data. Each data
record contains a sequence of entries, each consisting of a byte aligned, 32 bit
file ID followed by the raw file data. The data runs until the file size
indicated by the table of contents is reached.

A file may not span across multiple data records. A file ID must not occur
more than once in a data record and must only occur in a single data record.
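
Since the data record itself stores no file sizes, a decoder needs the sizes
from the table of contents to split it. The sketch below assumes a hypothetical
helper `split_data_record` that receives the uncompressed payload together with
a mapping from file ID to file size built from the table of contents.

```python
import struct


def split_data_record(payload, sizes):
    """Return {file ID: data} for one uncompressed data record payload.

    `sizes` maps file IDs to file sizes taken from the table of contents.
    """
    files = {}
    offset = 0
    while offset < len(payload):
        (file_id,) = struct.unpack_from("<I", payload, offset)
        offset += 4
        if file_id in files:
            raise ValueError(f"file ID {file_id} occurs more than once")
        if file_id not in sizes:
            raise ValueError(f"file ID {file_id} not in table of contents")
        size = sizes[file_id]
        data = payload[offset:offset + size]
        if len(data) != size:
            # Files may not span multiple data records.
            raise ValueError(f"file ID {file_id} runs past the end of the record")
        files[file_id] = data
        offset += size
    return files
```

Checking that a file ID does not reappear in a later data record would require
state kept across calls, which this sketch leaves out.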