zlib and gzip format
Raw DEFLATE compressed data (RFC 1951) is usually wrapped in a zlib or gzip container, which adds a header and a footer around the compressed data. The wrapper provides stream identification and error detection, neither of which raw DEFLATE data offers.
ZLIB
ZLIB header
| header fields | size | description |
|---|---|---|
| Compression method (CM) | 4 bits | CM = 8: deflate with a window size of up to 32K. CM = 15: reserved. |
| Compression information (CINFO) | 4 bits | For CM = 8, window size = 1 << (CINFO + 8). CINFO = 7 means a 32K window; values above 7 are not allowed. |
| Flags (FLG) | 8 bits | Flags; see below for details. |
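| Dictionary identifier (DICTID) | 32 bits | Optional; present only if FLG.FDICT is set. DICTID = Adler32(Dictionary), stored MSB first. |
| Dictionary | not stored in the stream | Relevant only if FLG.FDICT is set. The preset dictionary must already be known to the decompressor, which uses DICTID to identify it. |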
Flags (FLG)
| bit fields | size | description |
|---|---|---|
| check bits (FCHECK) | 5 bits (bits 0-4) | check bits for CMF and FLG |
| preset dictionary (FDICT) | 1 bit (bit 5) | preset dictionary flag |
| compression level (FLEVEL) | 2 bits (bits 6-7) | compression level |
Check bits (FCHECK): the value must be chosen so that CMF (the first header byte, with CM in the low 4 bits and CINFO in the high 4 bits) and FLG, when viewed as a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG), form a multiple of 31.
In short, the header must satisfy: (CMF*256 + FLG) mod 31 == 0
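The rule can be sketched in a few lines of Python. The helper names `compute_fcheck` and `is_valid_zlib_header` are illustrative only and do not come from any library; the snippet assumes the caller already knows CMF and the upper three bits of FLG.

```python
def compute_fcheck(cmf: int, flg_upper: int) -> int:
    """Return the 5-bit FCHECK that makes (cmf * 256 + flg) a multiple of 31.

    flg_upper is the FLG byte with its low five bits (FCHECK) cleared.
    """
    remainder = (cmf * 256 + flg_upper) % 31
    return (31 - remainder) % 31


def is_valid_zlib_header(cmf: int, flg: int) -> bool:
    """Check the FCHECK rule for a CMF/FLG byte pair."""
    return (cmf * 256 + flg) % 31 == 0


cmf = 0x78                  # CM = 8 (low nibble), CINFO = 7 (high nibble)
flg_upper = 2 << 6          # FLEVEL = 2, FDICT = 0, FCHECK still zero
flg = flg_upper | compute_fcheck(cmf, flg_upper)

print(hex(cmf), hex(flg))   # 0x78 0x9c
print(is_valid_zlib_header(cmf, flg))
```

The result, `78 9C`, is the byte pair commonly seen at the start of zlib streams.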
Preset dictionary (FDICT): if set, a dictionary identifier (DICTID) is present immediately after the FLG byte. The dictionary is a sequence of bytes that is initially fed to the compressor without producing any compressed output. DICTID is the Adler-32 checksum of this sequence of bytes, which lets the decompressor determine which dictionary the compressor used.
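To see FDICT and DICTID in a real stream, the following sketch uses the `zdict` argument of Python's standard `zlib` module. The dictionary and payload bytes are made-up sample values.

```python
import zlib

dictionary = b"hello world, hello zlib"   # made-up preset dictionary
data = b"hello zlib, hello world"         # made-up payload

compressor = zlib.compressobj(zdict=dictionary)
stream = compressor.compress(data) + compressor.flush()

flg = stream[1]
fdict = (flg >> 5) & 1                        # FDICT is bit 5 of FLG
dictid = int.from_bytes(stream[2:6], "big")   # 4 bytes after FLG, MSB first

print(fdict)
print(dictid == zlib.adler32(dictionary))

# The decompressor must be given the same dictionary; it is not in the stream.
decompressor = zlib.decompressobj(zdict=dictionary)
restored = decompressor.decompress(stream) + decompressor.flush()
print(restored == data)
```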
Compression level (FLEVEL) (for CM = 8)
| value | description |
|---|---|
| 0 | compressor used fastest algorithm |
| 1 | compressor used fast algorithm |
| 2 | compressor used default algorithm |
| 3 | compressor used maximum compression, slowest algorithm |
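Putting the header fields together, a minimal parser for the two mandatory header bytes might look like the sketch below. `ZlibHeader` and `parse_zlib_header` are hypothetical names introduced for illustration, and the window size is only meaningful when CM = 8.

```python
import zlib
from typing import NamedTuple


class ZlibHeader(NamedTuple):
    cm: int            # compression method (low 4 bits of CMF)
    cinfo: int         # compression info (high 4 bits of CMF)
    window_size: int   # 1 << (cinfo + 8), meaningful for CM = 8
    fcheck: int        # FLG bits 0-4
    fdict: bool        # FLG bit 5
    flevel: int        # FLG bits 6-7


def parse_zlib_header(stream: bytes) -> ZlibHeader:
    if len(stream) < 2:
        raise ValueError("a zlib stream needs at least 2 header bytes")
    cmf, flg = stream[0], stream[1]
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("FCHECK mismatch: not a valid zlib header")
    cinfo = cmf >> 4
    return ZlibHeader(
        cm=cmf & 0x0F,
        cinfo=cinfo,
        window_size=1 << (cinfo + 8),
        fcheck=flg & 0x1F,
        fdict=bool(flg & 0x20),
        flevel=flg >> 6,
    )


# Inspect the headers that the local zlib build emits for a few levels.
for level in (1, 6, 9):
    print(level, parse_zlib_header(zlib.compress(b"example", level)))
```

Which FLEVEL value a library writes for a given compression level is an implementation detail, and FLEVEL is not needed for decompression.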
Compressed Data
For compression method 8, the compressed data is stored in the deflate compressed data format. For the raw deflate format, see RFC 1951.
ZLIB footer
| footer fields | size | description |
|---|---|---|
| Adler-32 Checksum | 32 bits | Adler-32 checksum of the uncompressed data (excluding any dictionary data), stored MSB first. |
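The layout described above (header, raw deflate data, footer) can be checked with Python's standard `zlib` module. This sketch assumes a stream without a preset dictionary, so the header is exactly 2 bytes long.

```python
import zlib

data = b"zlib wraps raw DEFLATE data " * 4
stream = zlib.compress(data)

# Split the stream: 2-byte header, raw deflate body, 4-byte footer.
header, body, footer = stream[:2], stream[2:-4], stream[-4:]

# The footer is the Adler-32 of the uncompressed data, MSB first.
stored_adler = int.from_bytes(footer, "big")
print(stored_adler == zlib.adler32(data))

# The body is raw deflate; a negative wbits value tells zlib to expect no wrapper.
print(zlib.decompress(body, wbits=-15) == data)
```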
reference
https://www.rfc-editor.org/rfc/rfc1950
https://en.wikipedia.org/wiki/Zlib
https://github.com/madler/zlib
GZIP
GZIP header
| header fields | size | description |
|---|---|---|
| IDentification 1 (ID1) | 8 bits | fixed value ID1 = 31 (0x1f, \037) |
| IDentification 2 (ID2) | 8 bits | fixed value ID2 = 139 (0x8b, \213) |
| Compression method (CM) | 8 bits | CM = 0-7 (reserved), CM = 8 (deflate) |
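| Flags (FLG) | 8 bits | Flags; see below for details. |
| Modification TIME (MTIME) | 32 bits | Most recent modification time of the original file in Unix format (seconds since 1970-01-01 00:00:00 UTC). MTIME = 0 means no timestamp is available. |
| Extra Flags (XFL) | 8 bits | Extra flags; see below for details. |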
| Operating System (OS) | 8 bits | Operating System |
| XLEN | 16 bits | optional, if FLG.FEXTRA set. Extra field byte length |
| Extra field | XLEN bytes | optional, if FLG.FEXTRA set. |
| Original file name | zero-terminated | optional, if FLG.FNAME set. |
| File comment | zero-terminated | optional, if FLG.FCOMMENT set. |
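| Header CRC16 | 16 bits | Optional; present only if FLG.FHCRC is set. The two least significant bytes of the CRC-32 of all gzip header bytes before this field. |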
Flags (FLG)
| fields | bit | description |
|---|---|---|
| FTEXT | bit 0 | If FTEXT is set, the file is probably ASCII text. |
| FHCRC | bit 1 | If FHCRC is set, a CRC16 for the gzip header is present. |
| FEXTRA | bit 2 | If FEXTRA is set, optional extra fields are present. |
| FNAME | bit 3 | If FNAME is set, a zero-terminated original file name is present. |
| FCOMMENT | bit 4 | If FCOMMENT is set, a zero-terminated file comment is present. |
| reserved | bits 5-7 | Reserved FLG bits must be zero. |
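As an illustration of the FHCRC flag, the sketch below assembles a gzip member by hand with a header CRC16 and lets `zlib` decode it. It assumes that the CRC16 is the low 16 bits of the CRC-32 over the preceding header bytes and that multi-byte gzip fields are stored LSB first, as specified in RFC 1952. The footer it writes is the one described in the GZIP footer section, and the payload is made-up sample data.

```python
import struct
import zlib

data = b"gzip header CRC example"   # made-up payload

# 10-byte fixed header: ID1, ID2, CM, FLG, MTIME, XFL, OS (little-endian).
FHCRC = 0x02
header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FHCRC, 0, 0, 255)

# CRC16 = low 16 bits of the CRC-32 over all header bytes before it.
crc16 = zlib.crc32(header) & 0xFFFF
header += struct.pack("<H", crc16)

# Raw deflate body (negative wbits: no zlib wrapper).
raw = zlib.compressobj(wbits=-15)
body = raw.compress(data) + raw.flush()

# Footer: CRC-32 and size of the uncompressed data, both LSB first.
footer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
member = header + body + footer

# wbits=31 (16 + 15) asks zlib to expect a gzip wrapper.
print(zlib.decompress(member, wbits=31) == data)
```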
Extra Flags (XFL) (for CM = 8)
| value | description |
|---|---|
| XFL = 2 | compressor used maximum compression, slowest algorithm. |
| XFL = 4 | compressor used fastest algorithm. |
Operating System (OS)
| value | description |
|---|---|
| 0 | FAT filesystem (MS-DOS, OS/2, NT/Win32) |
| 1 | Amiga |
| 2 | VMS (or OpenVMS) |
| 3 | Unix |
| 4 | VM/CMS |
| 5 | Atari TOS |
| 6 | HPFS filesystem (OS/2, NT) |
| 7 | Macintosh |
| 8 | Z-System |
| 9 | CP/M |
| 10 | TOPS-20 |
| 11 | NTFS filesystem (NT) |
| 12 | QDOS |
| 13 | Acorn RISCOS |
| 255 | unknown |
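With all header fields described, a minimal gzip header parser can be sketched as follows. `GzipHeader`, `parse_gzip_header`, and `_read_zstring` are hypothetical names for illustration; the sketch assumes the optional fields appear in the order listed in the header table and that multi-byte fields are stored LSB first (RFC 1952). It does not verify the header CRC16.

```python
import struct
from typing import NamedTuple, Optional, Tuple

FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10


class GzipHeader(NamedTuple):
    cm: int
    flg: int
    mtime: int
    xfl: int
    os_code: int
    extra: Optional[bytes]
    name: Optional[bytes]
    comment: Optional[bytes]
    data_offset: int   # index of the first compressed data byte


def _read_zstring(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Read a zero-terminated string; return it and the position after the NUL."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end], end + 1


def parse_gzip_header(buf: bytes) -> GzipHeader:
    if len(buf) < 10:
        raise ValueError("a gzip header needs at least 10 bytes")
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack_from("<BBBBIBB", buf, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    if flg & 0xE0:
        raise ValueError("reserved FLG bits must be zero")
    pos = 10
    extra = name = comment = None
    if flg & FEXTRA:
        (xlen,) = struct.unpack_from("<H", buf, pos)
        extra = buf[pos + 2 : pos + 2 + xlen]
        pos += 2 + xlen
    if flg & FNAME:
        name, pos = _read_zstring(buf, pos)
    if flg & FCOMMENT:
        comment, pos = _read_zstring(buf, pos)
    if flg & FHCRC:
        pos += 2   # skip the header CRC16 without checking it
    return GzipHeader(cm, flg, mtime, xfl, os_code, extra, name, comment, pos)


# Hand-built sample header: FNAME set, MTIME = 0, XFL = 2, OS = 3 (Unix).
sample = bytes([0x1F, 0x8B, 8, FNAME]) + struct.pack("<IBB", 0, 2, 3) + b"notes.txt\x00"
print(parse_gzip_header(sample))
```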
Compressed Data
For compression method 8, the compressed data is stored in the deflate compressed data format. For the raw deflate format, see RFC 1951.
GZIP footer
| footer fields | size | description |
|---|---|---|
| CRC-32 Checksum (CRC32) | 32 bits | CRC-32 checksum of the uncompressed data, stored LSB first. |
| Input Size (ISIZE) | 32 bits | Size of the uncompressed input data modulo 2^32, stored LSB first. |
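Both footer fields can be verified against a stream produced by Python's standard `gzip` module. A short sketch, assuming a single-member gzip stream and LSB-first storage of both fields:

```python
import gzip
import struct
import zlib

data = b"gzip stores CRC-32 and ISIZE in its footer " * 3
member = gzip.compress(data)

# The last 8 bytes are CRC32 followed by ISIZE, both little-endian.
crc32_stored, isize = struct.unpack("<II", member[-8:])

print(crc32_stored == zlib.crc32(data))
print(isize == len(data) % 2**32)
```

Because ISIZE is taken modulo 2^32, it cannot be relied on as the true size for inputs of 4 GiB or more.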
Title: zlib and gzip format
Author: Mr Bluyee
Published: 2024-03-25
Last updated: 2024-03-25
Original link: https://www.mrbluyee.com/2024/03/25/zlib-and-gzip-format/
Copyright notice: The author owns the copyright; please credit the source when reproducing.