zlib and gzip format
Raw DEFLATE compressed data (RFC 1951) is usually wrapped in a zlib or gzip container, which adds a header before the compressed data and a footer after it. The wrapper provides stream identification and error detection, neither of which the raw DEFLATE data provides.
ZLIB
ZLIB header
| header fields | size | description | 
|---|---|---|
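| Compression method (CM) | 4 bits | low 4 bits of the CMF byte; CM = 8 (deflate with a window size of up to 32K); CM = 15 (reserved) | 
| Compression information (CINFO) | 4 bits | high 4 bits of the CMF byte; for CM = 8, Window = 1 << (CINFO + 8), so CINFO = 7 means a 32K window; values above 7 are not allowed | 
| Flags (FLG) | 8 bits | flags, see below for details | 
| Dictionary identifier (DICTID) | 32 bits | optional, present only if FLG.FDICT is set; DICTID = Adler32(Dictionary), stored most significant byte first | 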
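| Dictionary | not stored in the stream | only relevant if FLG.FDICT is set; the decompressor must already hold the preset dictionary and uses DICTID to check that it is the right one | 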
Flags (FLG)
| bit fields | size | description | 
|---|---|---|
| check bits (FCHECK) | 5 bits | check bits for CMF and FLG | 
| preset dictionary (FDICT) | 1 bit | preset dictionary flag | 
| compression level (FLEVEL) | 2 bits | compression level | 
Check bits (FCHECK): the value must be chosen so that CMF (CINFO and CM combined) and FLG, read as a 16-bit unsigned integer stored in MSB-first order (CMF*256 + FLG), form a multiple of 31.
Put simply, the header must satisfy: (CMF*256 + FLG) mod 31 == 0
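As a rough illustration of the rule above, the sketch below fills in FCHECK when writing a header. The function name `make_zlib_header` and its parameters are made up for this example and are not part of any library.

```python
def make_zlib_header(cinfo=7, flevel=2, fdict=0):
    """Return the 2-byte zlib header (CMF, FLG) with FCHECK filled in.

    Hypothetical helper for illustration only.
    """
    cm = 8                                   # deflate
    cmf = (cinfo << 4) | cm                  # CINFO in the high nibble, CM in the low nibble
    flg = (flevel << 6) | (fdict << 5)       # FCHECK bits are still zero here
    # Pick the 5-bit value that makes CMF*256 + FLG a multiple of 31.
    fcheck = (31 - (cmf * 256 + flg) % 31) % 31
    return bytes([cmf, flg | fcheck])


header = make_zlib_header()
print(header.hex())  # 789c
```

With CINFO = 7, this gives the familiar two-byte prefixes 78 01, 78 5e, 78 9c and 78 da for FLEVEL 0 to 3.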
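Preset dictionary (FDICT): if set, a dictionary identifier (DICTID) is present immediately after the FLG byte. The dictionary is a sequence of bytes that is initially fed to the compressor without producing any compressed output. DICTID is the Adler-32 checksum of this sequence of bytes; the dictionary itself is not written to the stream, so the decompressor must obtain it by other means and can use DICTID to confirm it has the right one.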
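The effect of FDICT can be observed with Python's standard `zlib` module, which accepts a preset dictionary through the `zdict` argument. This is only a small sketch: the dictionary and sample data are arbitrary, and the 4-byte DICTID is read as a big-endian integer.

```python
import struct
import zlib

zdict = b"hello world "                 # preset dictionary, shared out of band
data = b"hello world hello world"

comp = zlib.compressobj(zdict=zdict)
stream = comp.compress(data) + comp.flush()

flg = stream[1]
fdict = (flg >> 5) & 1                  # bit 5 of FLG
(dictid,) = struct.unpack(">I", stream[2:6])   # 4 bytes right after FLG

print(fdict)                            # 1
print(dictid == zlib.adler32(zdict))    # True

# The decompressor has to be given the same dictionary.
decomp = zlib.decompressobj(zdict=zdict)
restored = decomp.decompress(stream)
```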
Compression level (FLEVEL) (for CM = 8)
| value | description | 
|---|---|
| 0 | compressor used fastest algorithm | 
| 1 | compressor used fast algorithm | 
| 2 | compressor used default algorithm | 
| 3 | compressor used maximum compression, slowest algorithm | 
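To tie the header fields together, here is a minimal sketch of a parser for the two fixed header bytes. `parse_zlib_header` and `FLEVEL_NAMES` are names invented for this example; the checks follow the tables above.

```python
FLEVEL_NAMES = ("fastest", "fast", "default", "maximum")


def parse_zlib_header(stream):
    """Decode CMF and FLG from the first two bytes of a zlib stream.

    Hypothetical helper for illustration only.
    """
    cmf, flg = stream[0], stream[1]
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("FCHECK mismatch: not a zlib header")
    cm = cmf & 0x0F                      # low 4 bits
    cinfo = cmf >> 4                     # high 4 bits
    if cm != 8:
        raise ValueError("unsupported compression method")
    if cinfo > 7:
        raise ValueError("window size larger than 32K")
    return {
        "cm": cm,
        "cinfo": cinfo,
        "window": 1 << (cinfo + 8),
        "fcheck": flg & 0x1F,            # bits 0-4
        "fdict": bool(flg & 0x20),       # bit 5
        "flevel": FLEVEL_NAMES[flg >> 6],  # bits 6-7
    }


info = parse_zlib_header(b"\x78\x9c")
print(info["window"], info["flevel"])    # 32768 default
```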
Compressed Data
For compression method 8, the compressed data is stored in the deflate compressed data format. For the raw deflate format, see the deflate format description (RFC 1951).
ZLIB footer
| footer fields | size | description | 
|---|---|---|
| Adler-32 Checksum | 32 bits | checksum value of the uncompressed data  (excluding any dictionary data)  | 
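The following sketch uses Python's `zlib` module to show how header, raw deflate data and footer fit together. It assumes a stream without a preset dictionary (so the header is exactly 2 bytes) and reads the footer as a big-endian integer.

```python
import struct
import zlib

data = b"zlib wraps raw DEFLATE data " * 4
stream = zlib.compress(data)

# Footer: Adler-32 of the uncompressed data, most significant byte first.
(stored,) = struct.unpack(">I", stream[-4:])
print(stored == zlib.adler32(data))      # True

# Everything between the 2-byte header and the 4-byte footer is raw deflate.
raw = stream[2:-4]
unwrapped = zlib.decompress(raw, wbits=-15)   # negative wbits = no wrapper

# The reverse direction: wrap raw deflate data by hand.
rebuilt = b"\x78\x9c" + raw + struct.pack(">I", zlib.adler32(data))
print(zlib.decompress(rebuilt) == data)  # True
```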
References
https://www.rfc-editor.org/rfc/rfc1950
https://en.wikipedia.org/wiki/Zlib
https://github.com/madler/zlib
GZIP
GZIP header
| header fields | size | description | 
|---|---|---|
| IDentification 1 (ID1) | 8 bits | fixed value ID1 = 31 (0x1f, \037) | 
| IDentification 2 (ID2) | 8 bits | fixed value ID2 = 139 (0x8b, \213) | 
| Compression method (CM) | 8 bits | CM = 0-7 (reserved), CM = 8 (deflate) | 
| Flags (FLG) | 8 bits | flags, see below for details | 
| Modification TIME (MTIME) | 32 bits | most recent modification time of the original file in Unix format (seconds since 1970-01-01 00:00:00 GMT); 0 means no time stamp is available | 
| Extra Flags (XFL) | 8 bits | extra flags, see below for details | 
| Operating System (OS) | 8 bits | Operating System | 
| XLEN | 16 bits | optional, if FLG.FEXTRA set. Length of the extra field in bytes | 
| Extra field | XLEN bytes | optional, if FLG.FEXTRA set. | 
| Original file name | zero-terminated | optional, if FLG.FNAME set. | 
| File comment | zero-terminated | optional, if FLG.FCOMMENT set. | 
| Header CRC16 | 16 bits | optional, if FLG.FHCRC set. The two least significant bytes of the CRC-32 of all header bytes before this field | 
Flags (FLG)
| fields | bit | description | 
|---|---|---|
| FTEXT | bit 0 | If FTEXT is set, the file is probably ASCII text. | 
| FHCRC | bit 1 | If FHCRC is set, a CRC16 for the gzip header is present. | 
| FEXTRA | bit 2 | If FEXTRA is set, optional extra fields are present. | 
| FNAME | bit 3 | If FNAME is set, a zero-terminated original file name is present. | 
| FCOMMENT | bit 4 | If FCOMMENT is set, a zero-terminated file comment is present. | 
| reserved | bits 5-7 | Reserved FLG bits must be zero. | 
Extra Flags (XFL) (for CM = 8)
| value | description | 
|---|---|
| XFL = 2 | compressor used maximum compression, slowest algorithm. | 
| XFL = 4 | compressor used fastest algorithm. | 
Operating System (OS)
| value | description | 
|---|---|
| 0 | FAT filesystem (MS-DOS, OS/2, NT/Win32) | 
| 1 | Amiga | 
| 2 | VMS (or OpenVMS) | 
| 3 | Unix | 
| 4 | VM/CMS | 
| 5 | Atari TOS | 
| 6 | HPFS filesystem (OS/2, NT) | 
| 7 | Macintosh | 
| 8 | Z-System | 
| 9 | CP/M | 
| 10 | TOPS-20 | 
| 11 | NTFS filesystem (NT) | 
| 12 | QDOS | 
| 13 | Acorn RISCOS | 
| 255 | unknown | 
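The header layout can be walked through with a small parser. This is a sketch rather than a complete implementation: `parse_gzip_header` and `read_cstring` are names invented for the example, and multi-byte fields are read least significant byte first, as gzip stores them.

```python
import gzip
import io
import struct

FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10


def read_cstring(buf, pos):
    """Return (bytes up to the terminating zero, position after it)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end], end + 1


def parse_gzip_header(buf):
    """Decode the fixed 10-byte header and any optional fields (illustrative)."""
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack_from("<BBBBIBB", buf, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    if cm != 8:
        raise ValueError("unsupported compression method")
    if flg & 0xE0:
        raise ValueError("reserved FLG bits are set")
    info = {"flg": flg, "mtime": mtime, "xfl": xfl, "os": os_code}
    pos = 10
    if flg & FEXTRA:
        (xlen,) = struct.unpack_from("<H", buf, pos)
        info["extra"] = buf[pos + 2:pos + 2 + xlen]
        pos += 2 + xlen
    if flg & FNAME:
        info["name"], pos = read_cstring(buf, pos)
    if flg & FCOMMENT:
        info["comment"], pos = read_cstring(buf, pos)
    if flg & FHCRC:
        (info["hcrc"],) = struct.unpack_from("<H", buf, pos)
        pos += 2
    info["header_size"] = pos            # compressed data starts here
    return info


# Usage: let the standard gzip module write a member with a file name.
buf = io.BytesIO()
with gzip.GzipFile(filename="hello.txt", mode="wb", fileobj=buf, mtime=1700000000) as f:
    f.write(b"hello gzip")
info = parse_gzip_header(buf.getvalue())
print(info["name"], info["mtime"])
```

The optional fields have to be read in exactly this order, because each one is only present when its flag bit is set and none of them carries a tag of its own.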
Compressed Data
For compression method 8, the compressed data is stored in the deflate compressed data format. For the raw deflate format, see the deflate format description (RFC 1951).
GZIP footer
| footer fields | size | description | 
|---|---|---|
| CRC-32 Checksum (CRC32) | 32 bits | checksum value of the uncompressed data. | 
| Input Size (ISIZE) | 32 bits | size of the uncompressed input data modulo 2^32. | 
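As a closing sketch, a complete gzip member can be assembled by hand from the pieces described above and then checked with the standard library. `build_gzip_member` is a made-up helper; it sets only the FHCRC flag, uses MTIME = 0 and OS = 255 (unknown), and writes all multi-byte fields least significant byte first.

```python
import gzip
import struct
import zlib

FHCRC = 0x02


def build_gzip_member(payload):
    """Wrap raw deflate data in a gzip header and footer (illustrative)."""
    # ID1, ID2, CM, FLG, MTIME, XFL, OS
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FHCRC, 0, 0, 255)
    # Header CRC16: low 16 bits of the CRC-32 over the header bytes so far.
    header += struct.pack("<H", zlib.crc32(header) & 0xFFFF)
    comp = zlib.compressobj(wbits=-15)   # negative wbits = raw deflate
    body = comp.compress(payload) + comp.flush()
    # Footer: CRC-32 and size (mod 2^32) of the uncompressed data.
    footer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + body + footer


data = b"gzip wraps raw DEFLATE data " * 4
member = build_gzip_member(data)

crc32, isize = struct.unpack("<II", member[-8:])
print(crc32 == zlib.crc32(data), isize == len(data))   # True True

# Both decoders accept the hand-built member.
restored = zlib.decompress(member, wbits=31)   # 16 + 15 selects gzip decoding
print(restored == data)                        # True
print(gzip.decompress(member) == data)         # True
```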
Title: zlib and gzip format
Author: Mr Bluyee
Published: 2024-03-25
Last updated: 2024-03-25
Original link: https://www.mrbluyee.com/2024/03/25/zlib-and-gzip-format/
Copyright notice: The author owns the copyright; please credit the source when reproducing.