Common file format features

txt

A plain text file contains only text, with no formatting marks or structure: what is stored in the file is the content itself. Its counterpart is rich text, which, besides the unformatted text, contains control words, control symbols, etc., and can carry images, colors and the like.
Below is a txt file viewed in WinHex:

doc/docx

A rich text format, and the file format created by Microsoft Office Word. docx builds on doc by adding XML, and is stored as a zip archive, so a docx file can be unzipped.
Below is the content of a docx document opened with an archive tool:

Opened in WinHex, the file starts with 50 4B, which is "PK" in ASCII. This is the zip file header (the full local file header signature is 50 4B 03 04).
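A minimal Python sketch of the two observations above: the first bytes are the zip signature, and the archive can be listed like any zip file. The helper names `looks_like_zip` and `list_parts` are hypothetical, and the "docx" is a stand-in built in memory with made-up XML, not a file Word would open.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # "PK" followed by 03 04

def looks_like_zip(data: bytes) -> bool:
    """Return True if the bytes start with a zip local file header."""
    return data[:4] == ZIP_LOCAL_HEADER

def list_parts(data: bytes) -> list:
    """List the member files inside a zip-based document such as docx."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()

# Hypothetical stand-in for a docx: a zip with two made-up XML parts.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")
fake_docx = buf.getvalue()

print(fake_docx[:2].hex(" "))     # 50 4b
print(looks_like_zip(fake_docx))  # True
print(list_parts(fake_docx))      # ['[Content_Types].xml', 'word/document.xml']
```

On a real docx, `list_parts` would show the package parts the archive tool displays.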

png

A losslessly compressed image format.
A PNG file consists of a file signature followed by a sequence of data chunks. The signature starts with the hex byte 89 (the full signature is 89 50 4E 47 0D 0A 1A 0A); 89 lies outside the ASCII range, which prevents the file from being parsed as a text file.
Each chunk has four parts: a 4-byte length of the data field, a 4-byte chunk type, the data field itself (its length varies), and a final 4-byte checksum (a CRC computed over the type and data fields).
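The chunk layout can be sketched in Python with only the standard library (`struct` for big-endian integers, `zlib.crc32` for the checksum). The names `make_chunk` and `iter_chunks` are hypothetical helpers, and the sample file contains only IHDR and IEND (no image data), so it illustrates the structure but is not a viewable image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 89 50 4E 47 0D 0A 1A 0A

def make_chunk(ctype: bytes, data: bytes) -> bytes:
    """Length (4 bytes, big-endian) + type (4) + data + CRC over type+data (4)."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def iter_chunks(png: bytes):
    """Yield (type, data, crc_ok) for every chunk after the signature."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        yield ctype, data, crc == (zlib.crc32(ctype + data) & 0xFFFFFFFF)
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)

# Hypothetical minimal file: signature + IHDR (13 data bytes) + IEND (empty).
ihdr = struct.pack(">IIBBBBB", 95, 107, 8, 2, 0, 0, 0)
sample = PNG_SIGNATURE + make_chunk(b"IHDR", ihdr) + make_chunk(b"IEND", b"")

for ctype, data, ok in iter_chunks(sample):
    print(ctype.decode(), len(data), ok)
# IHDR 13 True
# IEND 0 True
```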

The data field of the chunk named IHDR deserves particular attention.
In it, the bit depth (the number of bits per pixel) determines how many colors can be represented; for example, a bit depth of 2 supports 2^2 = 4 colors.

A PNG file opened in WinHex:
The part boxed in blue is the length of the data field, 00 00 00 0D, i.e. 13 bytes; the data field itself, boxed in brown, is indeed 13 bytes long.
The part boxed in green is the chunk type, i.e. this chunk is IHDR.
Inside the brown box, the first and second red underlines are the width and height, 5F and 6B, i.e. 95 and 107 in decimal; the image's properties confirm that it is 95 × 107 pixels.
The part underlined in brown is the checksum.
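Reading the same IHDR fields programmatically, as a Python sketch: the length (0D), type (IHDR), width (5F) and height (6B) match the WinHex reading above, while the remaining data bytes (bit depth 8, color type 6, zeros for the rest) are made-up values for illustration, and the CRC is computed rather than copied from the screenshot. `parse_ihdr` is a hypothetical helper name.

```python
import struct
import zlib

def parse_ihdr(chunk: bytes) -> dict:
    """Decode a complete IHDR chunk: length, type, 13-byte data, CRC."""
    (length,) = struct.unpack(">I", chunk[0:4])
    ctype = chunk[4:8]
    if ctype != b"IHDR" or length != 13:
        raise ValueError("not an IHDR chunk")
    data = chunk[8:21]
    width, height, depth, color_type, _comp, _filt, interlace = struct.unpack(">IIBBBBB", data)
    (crc,) = struct.unpack(">I", chunk[21:25])
    return {
        "width": width,
        "height": height,
        "bit_depth": depth,
        "color_type": color_type,
        "interlace": interlace,
        "crc_ok": crc == (zlib.crc32(ctype + data) & 0xFFFFFFFF),
    }

# Width 0x5F and height 0x6B as in the hex dump; the other 5 bytes are hypothetical.
data = bytes.fromhex("0000005F 0000006B 08 06 00 00 00")
crc = struct.pack(">I", zlib.crc32(b"IHDR" + data) & 0xFFFFFFFF)
chunk = bytes.fromhex("0000000D 49484452") + data + crc

info = parse_ihdr(chunk)
print(info["width"], info["height"], info["crc_ok"])  # 95 107 True
```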

jpg/jpeg

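jpg and jpeg are the same format, a lossy compressed image format. A file is made up of multiple segments; each segment starts with FF, and the following byte gives the segment type. Important ones are FFD8 (start of image), FFE0 (file information, the APP0 segment), FFC0 (start of frame), FFDA (start of scan) and FFD9 (end of file).
A jpg file opened in WinHex looks like this: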
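A Python sketch of walking these segments: after FFD8, each segment is FF, a type byte, and a 2-byte big-endian length that counts itself but not the FF xx marker. The walk stops at FFDA because compressed scan data follows, and the file should end with FFD9. The sample bytes are hypothetical (zero-filled segment bodies), and the sketch ignores cases a real parser must handle, such as standalone markers without a length and fill bytes.

```python
import struct

MARKER_NAMES = {
    0xD8: "SOI (start of image)",
    0xE0: "APP0 (file information)",
    0xC0: "SOF0 (start of frame)",
    0xDA: "SOS (start of scan)",
    0xD9: "EOI (end of image)",
}

def list_markers(jpg: bytes) -> list:
    """Walk the segments from SOI up to SOS and return the marker type bytes."""
    if jpg[:2] != b"\xff\xd8":
        raise ValueError("missing FFD8 start-of-image marker")
    markers = [0xD8]
    pos = 2
    while pos + 4 <= len(jpg):
        if jpg[pos] != 0xFF:
            raise ValueError(f"expected FF at offset {pos}")
        mtype = jpg[pos + 1]
        markers.append(mtype)
        if mtype == 0xDA:
            break  # entropy-coded image data follows SOS; stop walking here
        (length,) = struct.unpack(">H", jpg[pos + 2:pos + 4])
        pos += 2 + length  # skip the FF xx marker plus the segment body
    return markers

# Hypothetical sample: segment bodies are zero-filled placeholders.
sample = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    + b"\xff\xc0" + struct.pack(">H", 11) + bytes(9)
    + b"\xff\xda" + struct.pack(">H", 8) + bytes(6)
    + b"\x12\x34"   # stand-in for compressed scan data
    + b"\xff\xd9"
)

for m in list_markers(sample):
    print(f"FF{m:02X}", MARKER_NAMES[m])
print(sample.endswith(b"\xff\xd9"))  # True
```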

How JPEG compression works

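  • Split the image into 8×8 pixel blocks, because the pixels within an 8×8 block tend to be very similar.
  • Apply the discrete cosine transform (DCT) to each block, producing a new 8×8 block of coefficients. Where neighboring pixels differ greatly, the cosine frequency is high (the high-frequency region); where they differ little, the frequency is low (the low-frequency region). After the DCT, high-frequency coefficients lie toward the bottom-right of the new block and low-frequency coefficients toward the top-left.
  • Then quantize, since the new block cannot yet be compressed effectively. The human eye is poor at perceiving high-frequency detail, so many high-frequency coefficients can be discarded: divide the block element-wise by a specific matrix and round, which makes most high-frequency coefficients 0.
  • Finally compress the result with entropy coding (e.g. Huffman coding in baseline JPEG) and related steps.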
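The DCT and quantization steps above can be sketched in pure Python for a single block. Assumptions: pixels are level-shifted by 128 before the DCT, the quantization matrix is the example luminance table from Annex K of the JPEG standard, and the 8×8 input is a made-up smooth gradient. The direct four-nested-loop DCT is for clarity only (real encoders use fast algorithms), and zig-zag ordering and entropy coding are not shown.

```python
import math

# Example luminance quantization table (JPEG standard, Annex K).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def dct_2d(block):
    """Forward 2-D DCT-II of an 8x8 block with JPEG's scaling (1/4 * C(u) * C(v))."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for v in range(8):          # vertical frequency (row of the output)
        for u in range(8):      # horizontal frequency (column of the output)
            s = 0.0
            for y in range(8):
                for x in range(8):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coeffs, table):
    """Divide element-wise by the quantization table and round to integers."""
    return [[round(coeffs[v][u] / table[v][u]) for u in range(8)] for v in range(8)]

# Hypothetical smooth block: a gentle left-to-right gradient, level-shifted by -128.
pixels = [[100 + 2 * x for x in range(8)] for _ in range(8)]
shifted = [[p - 128 for p in row] for row in pixels]
q = quantize(dct_2d(shifted), Q_LUMA)
for row in q:
    print(row)
# Only the first row (low vertical frequency) has nonzero entries; the rest are 0.
```

For such a smooth block nearly every quantized coefficient is 0, which is what makes the following entropy coding step effective.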

Two articles for reference:
https://blog.csdn.net/ljh618625/article/details/102760728
https://www.cnblogs.com/Arvin-JIN/p/9133745.html

bmp

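Typically stores the pixels without any compression, preserving the image exactly as it is.
Consists of four parts: the file header, the info header, the palette and the pixel data.
The file header is 14 bytes: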

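Bytes    Meaning
2 bytes  Bitmap type ("BM")
4 bytes  Size of the bitmap file
4 bytes  Two reserved fields
4 bytes  Offset from the start of the file to the actual pixel data (little-endian)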

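The info header is usually 40 bytes:

Bytes    Meaning
4 bytes  Size of this info header (usually 40)
4 bytes  Width (pixels)
4 bytes  Height (pixels)
2 bytes  Number of color planes of the target device
2 bytes  Bits per pixel
4 bytes  Compression type (usually 0, uncompressed)
4 bytes  Size of the image data
4 bytes  Horizontal resolution
4 bytes  Vertical resolution
4 bytes  Number of palette indices actually used by the bitmap (0 means all indices)
4 bytes  Number of indices that matter for displaying the image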

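The sign of the height (negative values are stored in two's complement) tells whether the bitmap is stored bottom-up (positive) or top-down (negative).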
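Both header tables can be decoded with Python's `struct` module, using `<` for little-endian and no padding: `"<2sIHHI"` is the 14-byte file header and `"<IiiHHIIiiII"` the 40-byte info header, with width and height as signed integers so a negative height comes out as a negative number. The 2×2, 24-bit sample file and the helper name `parse_bmp_headers` are hypothetical, built in memory for illustration.

```python
import struct

def parse_bmp_headers(data: bytes) -> dict:
    """Decode the 14-byte file header and the 40-byte info header (little-endian)."""
    sig, file_size, _res1, _res2, offset = struct.unpack("<2sIHHI", data[:14])
    if sig != b"BM":
        raise ValueError("not a BMP file")
    (info_size, width, height, planes, bpp, compression, image_size,
     _x_res, _y_res, _colors_used, _colors_important) = struct.unpack("<IiiHHIIiiII", data[14:54])
    return {
        "file_size": file_size,
        "data_offset": offset,
        "info_size": info_size,
        "width": width,
        "height": height,
        "planes": planes,
        "bits_per_pixel": bpp,
        "compression": compression,
        "image_size": image_size,
        # Positive height: rows stored bottom-up; negative height: top-down.
        "orientation": "bottom-up" if height > 0 else "top-down",
    }

# Hypothetical 2x2, 24-bit, top-down bitmap: each row is 6 pixel bytes + 2 padding bytes.
pixels = (bytes(6) + bytes(2)) * 2
file_header = struct.pack("<2sIHHI", b"BM", 14 + 40 + len(pixels), 0, 0, 54)
info_header = struct.pack("<IiiHHIIiiII", 40, 2, -2, 1, 24, 0, len(pixels), 2835, 2835, 0, 0)
bmp = file_header + info_header + pixels

print(info_header[8:12].hex(" "))  # fe ff ff ff  (-2 in two's complement, little-endian)
print(parse_bmp_headers(bmp)["orientation"])  # top-down
```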

gif

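Lossless compression. Pixels do not store colors themselves, but the index of each pixel's color in a color table. A GIF can store multiple images plus control blocks that govern how the images behave, which is how animation is achieved. A GIF file consists of a header, the GIF data stream and a trailer (terminator).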
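A Python sketch of the header, the color table and the index idea. It reads the 6-byte signature (`GIF87a` or `GIF89a`), the Logical Screen Descriptor (width and height as little-endian 16-bit values, then a packed byte whose bit 7 flags a global color table of 2^(N+1) entries, N being the low 3 bits), and checks that the file ends with the trailer byte 0x3B. The sample is hypothetical and contains no image data, so it is not a displayable GIF; `parse_gif` is a hypothetical helper name.

```python
import struct

def parse_gif(data: bytes) -> dict:
    """Read the GIF header and Logical Screen Descriptor, and check the trailer."""
    sig = data[:6]
    if sig not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    width, height, packed, bg_index, _aspect = struct.unpack("<HHBBB", data[6:13])
    has_gct = bool(packed & 0x80)  # global color table flag (bit 7)
    entries = 2 ** ((packed & 0x07) + 1) if has_gct else 0
    raw = data[13:13 + 3 * entries]
    palette = [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]  # (R, G, B)
    return {
        "version": sig.decode("ascii"),
        "width": width,
        "height": height,
        "palette": palette,
        "background_index": bg_index,
        "has_trailer": data.endswith(b"\x3b"),  # 0x3B ';' terminates the stream
    }

# Hypothetical file: 3x5 screen, 4-entry global color table, no image blocks.
colors = bytes([0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 255])
sample_gif = b"GIF89a" + struct.pack("<HHBBB", 3, 5, 0x81, 2, 0) + colors + b"\x3b"

info = parse_gif(sample_gif)
# Pixels store indices into the color table, not the colors themselves.
indices = [0, 2, 2, 1]
print([info["palette"][i] for i in indices])
# [(0, 0, 0), (255, 0, 0), (255, 0, 0), (255, 255, 255)]
```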

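In Kali, the file command can be used to identify a file's type.
Summary of common file headers and trailers. This image was captured from a document whose original source I could not find.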
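The `file` command relies on the much larger libmagic database; the Python sketch below is only a simplified, hypothetical illustration of the same idea, checking the headers and trailers discussed in this article. The names `guess_type` and `has_expected_trailer` are made up. The 2-byte `BM` check is weak and can misfire on text that happens to start with "BM".

```python
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip (also docx/xlsx/pptx)"),
    (b"BM", "bmp"),  # short signature: checked last, easy to match by accident
]

TRAILERS = {
    "png": bytes.fromhex("49454e44ae426082"),  # "IEND" + its CRC
    "jpeg": b"\xff\xd9",                        # EOI marker
    "gif": b"\x3b",                             # GIF trailer
}

def guess_type(data: bytes) -> str:
    """Return a type name based on the leading magic bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"

def has_expected_trailer(data: bytes, kind: str):
    """True/False if a trailer is known for this type, None otherwise."""
    trailer = TRAILERS.get(kind)
    return None if trailer is None else data.endswith(trailer)

print(guess_type(b"\xff\xd8\xff\xe0" + bytes(20)))  # jpeg
print(guess_type(b"plain text"))                     # unknown
```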


Origin www.cnblogs.com/Qi-Lin/p/12332169.html