zlib and gzip format
Raw DEFLATE data (RFC 1951) is usually wrapped in a zlib (RFC 1950) or gzip (RFC 1952) container, which adds a header and a footer around the compressed stream. The wrapper provides stream identification and error detection, neither of which raw DEFLATE offers.
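As a rough illustration of the three forms, the sketch below uses Python's standard `zlib` module, where the `wbits` argument selects the container (-15 for raw DEFLATE, 15 for zlib, 31 for gzip). The helper name `deflate` and the sample data are made up for this example, and the 10-byte gzip header length assumes the minimal header that zlib writes when no optional fields are requested.

```python
import zlib

data = b"hello hello hello hello"

def deflate(payload: bytes, wbits: int) -> bytes:
    """Compress payload; wbits selects the container format."""
    c = zlib.compressobj(wbits=wbits)
    return c.compress(payload) + c.flush()

raw_stream = deflate(data, -15)   # raw DEFLATE: no header, no footer
zlib_stream = deflate(data, 15)   # zlib: 2-byte header + 4-byte Adler-32 footer
gzip_stream = deflate(data, 31)   # gzip: 10-byte header + 8-byte footer

# Stripping the wrapper leaves something a raw DEFLATE decoder accepts.
zlib_body = zlib_stream[2:-4]
gzip_body = gzip_stream[10:-8]
from_zlib = zlib.decompress(zlib_body, -15)
from_gzip = zlib.decompress(gzip_body, -15)
```

Both `from_zlib` and `from_gzip` should equal the original `data`, which shows that the two containers differ only in their framing.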
ZLIB
ZLIB header
header fields | size | description |
---|---|---|
Compression method (CM) | 4 bits | CM = 8: deflate with a window size of up to 32K; CM = 15: reserved |
Compression information (CINFO) | 4 bits | For CM = 8, window size = 1 << (CINFO + 8). CINFO = 7 gives a 32K window; values above 7 are not allowed |
Flags (FLG) | 8 bits | flags, details see below |
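Dictionary identifier (DICTID) | 32 bits | optional, present if FLG.FDICT is set. DICTID = Adler32(Dictionary) |
Dictionary | not stored in the stream | the preset dictionary itself is not transmitted; if FLG.FDICT is set, the decompressor must already hold the dictionary identified by DICTID |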
Flags (FLG)
bit fields | size | description |
---|---|---|
check bits (FCHECK) | 5 bits | check bits for CMF and FLG |
preset dictionary (FDICT) | 1 bit | preset dictionary flag |
compression level (FLEVEL) | 2 bits | compression level |
check bits (FCHECK): the value must be chosen so that CMF (CINFO in the high 4 bits, CM in the low 4 bits) and FLG, viewed as a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG), form a multiple of 31.
In other words, the header must satisfy: (CMF*256 + FLG) mod 31 == 0
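The check can be sketched in a few lines of Python. This is only an illustration: the function names `zlib_header` and `is_valid_zlib_header` are made up here, and the bit layout (CM in the low nibble of CMF, FCHECK in bits 0-4, FDICT in bit 5, FLEVEL in bits 6-7 of FLG) follows RFC 1950.

```python
def zlib_header(cinfo: int = 7, flevel: int = 2, fdict: bool = False) -> bytes:
    """Build the two-byte CMF/FLG header for CM = 8."""
    cm = 8
    cmf = (cinfo << 4) | cm                      # CINFO high nibble, CM low nibble
    flg = (flevel << 6) | (int(fdict) << 5)      # FCHECK still zero at this point
    fcheck = (31 - (cmf * 256 + flg) % 31) % 31  # pad up to a multiple of 31
    return bytes([cmf, flg | fcheck])

def is_valid_zlib_header(header: bytes) -> bool:
    """Check CM, CINFO and the mod-31 rule on the first two bytes."""
    cmf, flg = header[0], header[1]
    return (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and (cmf * 256 + flg) % 31 == 0

print(zlib_header().hex())  # 789c
```

With the default arguments (32K window, FLEVEL = 2, no dictionary) this yields the familiar `78 9c` prefix seen at the start of many zlib streams.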
preset dictionary (FDICT): if set, a dictionary identifier (DICTID) is present immediately after the FLG byte. The dictionary is a sequence of bytes that is initially fed to the compressor without producing any compressed output. DICTID is the Adler-32 checksum of this sequence of bytes.
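To see FDICT and the dictionary identifier in practice, here is a small sketch using the `zdict` parameter of Python's `zlib.compressobj` and `zlib.decompressobj`. The dictionary and payload are arbitrary sample values chosen for this example.

```python
import zlib

dictionary = b"hello world "          # sample preset dictionary (assumption)
data = b"hello world hello world"

c = zlib.compressobj(zdict=dictionary)
stream = c.compress(data) + c.flush()

fdict = (stream[1] >> 5) & 1                  # bit 5 of the FLG byte
dictid = int.from_bytes(stream[2:6], "big")   # 4 bytes right after FLG, MSB first

# The decompressor must be given the same dictionary out of band.
d = zlib.decompressobj(zdict=dictionary)
restored = d.decompress(stream)
```

Here `fdict` should be 1 and `dictid` should match `zlib.adler32(dictionary)`; the dictionary bytes themselves never appear in `stream`.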
Compression level (FLEVEL) (for CM = 8)
value | description |
---|---|
0 | compressor used fastest algorithm |
1 | compressor used fast algorithm |
2 | compressor used default algorithm |
3 | compressor used maximum compression, slowest algorithm |
Compressed Data
For compression method 8, the compressed data is stored in the deflate compressed data format. For the raw deflate format, see the deflate format (RFC 1951).
ZLIB footer
footer fields | size | description |
---|---|---|
Adler-32 Checksum | 32 bits | checksum value of the uncompressed data (excluding any dictionary data), stored most significant byte first |
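Putting header, compressed data and footer together, a zlib stream can be assembled by hand from raw DEFLATE output. The following is a minimal sketch, assuming a fixed `78 9c` header (CM = 8, 32K window, FLEVEL = 2, no dictionary); `wrap_zlib` is a hypothetical helper name, not part of any library.

```python
import struct
import zlib

def wrap_zlib(raw_deflate: bytes, uncompressed: bytes) -> bytes:
    """Add a zlib header and Adler-32 footer around raw DEFLATE data."""
    header = b"\x78\x9c"                                    # assumed CMF/FLG pair
    footer = struct.pack(">I", zlib.adler32(uncompressed))  # big-endian checksum
    return header + raw_deflate + footer

data = b"zlib footer example " * 4

c = zlib.compressobj(wbits=-15)           # negative wbits: raw DEFLATE only
raw = c.compress(data) + c.flush()

stream = wrap_zlib(raw, data)
restored = zlib.decompress(stream)        # standard zlib decoding accepts it
```

If the footer did not match the Adler-32 of the decompressed bytes, `zlib.decompress` would raise `zlib.error` instead of returning the data.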
reference
https://www.rfc-editor.org/rfc/rfc1950
https://en.wikipedia.org/wiki/Zlib
https://github.com/madler/zlib
GZIP
GZIP header
header fields | size | description |
---|---|---|
IDentification 1 (ID1) | 8 bits | fixed value ID1 = 31 (0x1f, \037) |
IDentification 2 (ID2) | 8 bits | fixed value ID2 = 139 (0x8b, \213) |
Compression method (CM) | 8 bits | CM = 0-7 (reserved), CM = 8 (deflate) |
Flags (FLG) | 8 bits | flags, details see below |
Modification TIME (MTIME) | 32 bits | most recent modification time in Unix format (seconds since 1970-01-01 00:00:00 GMT); 0 means no timestamp is available |
Extra Flags (XFL) | 8 bits | Extra flags, details see below |
Operating System (OS) | 8 bits | Operating System |
XLEN | 16 bits | optional, if FLG.FEXTRA set. Extra field byte length |
Extra field | XLEN bytes | optional, if FLG.FEXTRA set. |
Original file name | zero-terminated | optional, if FLG.FNAME set. |
File comment | zero-terminated | optional, if FLG.FCOMMENT set. |
Header CRC16 | 16 bits | optional, if FLG.FHCRC set. The two least significant bytes of the CRC-32 of all gzip header bytes preceding this field |
Flags (FLG)
fields | bit | description |
---|---|---|
FTEXT | bit 0 | If FTEXT is set, the file is probably ASCII text. |
FHCRC | bit 1 | If FHCRC is set, a CRC16 for the gzip header is present. |
FEXTRA | bit 2 | If FEXTRA is set, optional extra fields are present. |
FNAME | bit 3 | If FNAME is set, a zero-terminated original file name is present. |
FCOMMENT | bit 4 | If FCOMMENT is set, a zero-terminated file comment is present. |
reserved | bits 5-7 | Reserved FLG bits must be zero. |
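The header layout and the FLG bits can be walked through with a small parser. This is a sketch only: `parse_gzip_header` and the keys of the returned dict are names invented for this example, multi-byte fields are read little-endian as RFC 1952 specifies, and the header CRC16 is read but not verified.

```python
import struct

FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10

def parse_gzip_header(buf: bytes) -> dict:
    """Parse the fixed 10-byte gzip header plus any optional fields."""
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack_from("<BBBBIBB", buf, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    if cm != 8:
        raise ValueError("unsupported compression method")
    if flg & 0xE0:
        raise ValueError("reserved FLG bits are set")
    info = {"flg": flg, "mtime": mtime, "xfl": xfl, "os": os_code}
    pos = 10
    if flg & FEXTRA:                       # XLEN followed by XLEN bytes
        (xlen,) = struct.unpack_from("<H", buf, pos)
        info["extra"] = buf[pos + 2 : pos + 2 + xlen]
        pos += 2 + xlen
    if flg & FNAME:                        # zero-terminated file name
        end = buf.index(b"\x00", pos)
        info["name"] = buf[pos:end]
        pos = end + 1
    if flg & FCOMMENT:                     # zero-terminated comment
        end = buf.index(b"\x00", pos)
        info["comment"] = buf[pos:end]
        pos = end + 1
    if flg & FHCRC:                        # header CRC16 (not verified here)
        (info["hcrc"],) = struct.unpack_from("<H", buf, pos)
        pos += 2
    info["data_offset"] = pos              # where the DEFLATE data starts
    return info
```

The optional fields are read in the order they appear in the header table: extra field, file name, comment, then the header CRC.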
Extra Flags (XFL) (for CM = 8)
value | description |
---|---|
XFL = 2 | compressor used maximum compression, slowest algorithm. |
XFL = 4 | compressor used fastest algorithm. |
Operating System (OS)
value | description |
---|---|
0 | FAT filesystem (MS-DOS, OS/2, NT/Win32) |
1 | Amiga |
2 | VMS (or OpenVMS) |
3 | Unix |
4 | VM/CMS |
5 | Atari TOS |
6 | HPFS filesystem (OS/2, NT) |
7 | Macintosh |
8 | Z-System |
9 | CP/M |
10 | TOPS-20 |
11 | NTFS filesystem (NT) |
12 | QDOS |
13 | Acorn RISCOS |
255 | unknown |
Compressed Data
For compression method 8, the compressed data is stored in the deflate compressed data format. For the raw deflate format, see the deflate format (RFC 1951).
GZIP footer
footer fields | size | description |
---|---|---|
CRC-32 Checksum (CRC32) | 32 bits | checksum value of the uncompressed data. |
Input Size (ISIZE) | 32 bits | size of the uncompressed input data modulo 2^32. |
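The footer can be checked directly against the original input. A minimal sketch, assuming a single-member stream produced by Python's `gzip.compress`; both footer fields are stored little-endian, and the sample payload is arbitrary.

```python
import gzip
import struct
import zlib

data = b"example payload " * 10
stream = gzip.compress(data)

# The last 8 bytes are CRC32 followed by ISIZE, both little-endian.
crc32, isize = struct.unpack("<II", stream[-8:])

crc_ok = crc32 == zlib.crc32(data)
size_ok = isize == len(data) % 2**32
```

Because ISIZE is stored modulo 2^32, it cannot be relied on to give the true size of inputs of 4 GiB or more.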
reference
Post title: zlib and gzip format
Author: Mr Bluyee
Published: 2024-03-25
Last updated: 2024-03-25
Original link: https://www.mrbluyee.com/2024/03/25/zlib-and-gzip-format/
Copyright notice: The author owns the copyright; please cite the source when reproducing.