Computers | Data Compression

I suspect many friends who have watched JOJO will find a certain line very familiar. (Killed)

 

 

Ah, today let's talk about compression.

 

0

Files are stored in bytes

 

Before discussing compression mechanisms, let's first look at how files are stored as bytes. The reason file sizes are expressed in KB and MB is precisely that files are stored in bytes.
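As a quick illustration (not from the original post), we can write a short ASCII string to a temporary file and confirm that its on-disk size equals its character count, one byte per character:

```python
import os
import tempfile

text = "AAAAAABBCDDEEEEEF"  # the 17-character example string used later in this post

# Write it to a throwaway temp file and check the size on disk.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(text)
    path = f.name

print(os.path.getsize(path), "bytes")  # → 17 bytes: one byte per ASCII character
os.remove(path)
```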

 

 

1

The RLE algorithm

 

The RLE (run-length encoding) algorithm compresses data as "character × repeat count" pairs. For example, the string AAAAAABBCDDEEEEEF is stored compressed as A6B2C1D2E5F1. The original string is 17 bytes long, while the compressed version is 12 bytes: 12 ÷ 17 ≈ 70%, so the compression is a success.

 

RLE is arguably the simplest compression algorithm, and it matches most people's first instinct when thinking about compression: merge the duplicates. However, it has little practical use, for a simple reason: long runs of repeated characters are few and far between. Worse, a 7-character string like ABCDEFG would be saved as A1B1C1D1E1F1G1, a total of 14 bytes, pushing the "compression" ratio to 14 ÷ 7 = 200%.

 

The reason RLE is worth mentioning here is that its logic is simple and easy to come up with, which gives it some reference value, and it is genuinely easy to implement.
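To show just how easy it is to implement, here is a minimal sketch of the "character + run length" scheme described above (my own illustration, not code from the original post):

```python
def rle_compress(s: str) -> str:
    """Encode a string as 'character + run length' pairs."""
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:  # extend the current run
            j += 1
        out.append(f"{s[i]}{j - i}")
        i = j
    return "".join(out)

def rle_decompress(s: str) -> str:
    """Reverse the encoding: each character is followed by its (possibly multi-digit) count."""
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        j = i + 1
        while j < len(s) and s[j].isdigit():
            j += 1
        out.append(ch * int(s[i + 1:j]))
        i = j
    return "".join(out)

print(rle_compress("AAAAAABBCDDEEEEEF"))  # → A6B2C1D2E5F1 (17 bytes down to 12)
print(rle_compress("ABCDEFG"))            # → A1B1C1D1E1F1G1 (7 bytes blown up to 14!)
```

The second call demonstrates exactly the failure mode mentioned above: text without runs gets larger, not smaller.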

 

2

Morse code

 

Before the next algorithm, let's talk about the Morse telegraph, which transmits messages using long and short signals. In war movies of all kinds, the Morse telegraph shows up very often.

 

If we apply the Morse telegraph's idea to encoding, we get Morse code. Computers store data in binary, so why not simply use 0 for a short signal (dot) and 1 for a long one (dash)? If we did, the question becomes: how do we tell where one character ends and the next begins?

 

Indeed, a common way to encode Morse on a computer is to use 1 for a dot and 11 for a dash, with a 0 separating the dots and dashes within a character, and 00 separating adjacent characters.

 

 

In this way, every character has a corresponding Morse code. Of course, the encoding can be varied: different characters can be given codes of different lengths depending on the situation. As long as the receiver has the code table and the corresponding decoding method, the original content can be reproduced.
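The bit scheme just described (1 = dot, 11 = dash, 0 between signals inside a character, 00 between characters) can be sketched like this. The tiny Morse table here covers only a few letters for illustration and is my own addition:

```python
# Partial Morse table, for illustration only.
MORSE = {"A": ".-", "B": "-...", "E": ".", "O": "---", "S": "..."}

def to_bits(text: str) -> str:
    """Encode text using: 1 = dot, 11 = dash, 0 inside a character, 00 between characters."""
    chars = []
    for ch in text:
        signals = ["1" if s == "." else "11" for s in MORSE[ch]]
        chars.append("0".join(signals))   # 0 separates dots/dashes within a character
    return "00".join(chars)               # 00 separates characters

print(to_bits("SOS"))  # → 1010100110110110010101
```

Because 00 never appears inside a character's code, a receiver with the same table can split the stream back into characters unambiguously.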

 

3

The Huffman algorithm

 

Next comes the classic and widely used Huffman algorithm.


For example, suppose a file contains 100 A's and 4 E's. To compress it, assign A a short Morse-style code such as 1, and give E a longer code so that the shorter codes can go to the other, more frequent characters; say E corresponds to 110101101, 9 bits. Then, counting the character separators, the total is 1×100 + 9×4 + 2×104 = 344 bits (if that is not a multiple of 8, append a terminator and pad up to one). The original file is 8×104 = 832 bits, and 344 ÷ 832 ≈ 41%, which is clearly a very good compression ratio.

 

To sum up: the Huffman algorithm constructs an optimal coding scheme tailored to each file being compressed, and then performs the compression on the basis of that scheme.
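As a sketch of that idea, here is a minimal Huffman coder built on Python's `heapq`. Note this is standard Huffman coding (a prefix code needing no separators), not the separator-based Morse scheme in the example above, so the exact bit counts differ; the frequent-gets-short principle is the same:

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a prefix code in which frequent characters get shorter codes."""
    freq = Counter(text)
    # Heap entries are (frequency, tiebreaker, tree); a tree is a char or a (left, right) pair.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # edge case: only one distinct symbol
        return {heap[0][2]: "0"}
    count = len(heap)
    while len(heap) > 1:                    # repeatedly merge the two rarest subtrees
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):                 # read codes off the tree: left = 0, right = 1
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

text = "A" * 100 + "E" * 4
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)                                    # with two symbols, the codes are '0' and '1'
print(len(encoded), "bits vs", 8 * len(text))   # → 104 bits vs 832
```

With only two distinct symbols the tree is trivial, but the same function handles any alphabet: run it on richer text and the rarest characters end up with the longest codes.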

 

4

Why do memes keep getting blurrier

 

Finally, let's look at how images are compressed. The most complete image format is BMP, which applies no compression at all, while formats such as JPEG, TIFF, and GIF use various techniques to compress the image. Compression comes in two kinds: lossless (reversible) and lossy (irreversible). Lossless compression can restore the exact pre-compression state; lossy compression cannot.
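The defining property of lossless compression is an exact round trip. Using `zlib` here purely as a stand-in for a lossless codec (it is not what BMP/JPEG/GIF themselves use), the idea can be demonstrated in a few lines:

```python
import zlib

raw = b"A" * 1000                 # highly repetitive data compresses very well
packed = zlib.compress(raw)

# Lossless: decompression restores the input byte-for-byte.
assert zlib.decompress(packed) == raw
print(len(raw), "->", len(packed), "bytes")
```

A lossy codec such as JPEG, by contrast, deliberately discards detail to shrink the file, so no such exact round trip is possible.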

 

JPG uses lossy compression, so parts of the restored image come out blurry. After being passed around the internet repeatedly, many meme images endure round after round of compression and decompression until they finally become those "well-seasoned," hopelessly blurry pictures. In that sense, every blurry meme image is a badge of honor in the meme world.

 

GIF's compression itself is lossless, but some color information is lost in the conversion (GIF is limited to a 256-color palette), which can still degrade the image.

 


 

 



Origin blog.csdn.net/chengduxiu/article/details/104400670