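Compressing and Decompressing Files with a Huffman Tree

Huffman Tree

Huffman tree: in this article, a binary tree stored as a static three-pointer array, where each node records the indices of its parent, left child, and right child.
A Huffman tree is the binary tree with the minimum weighted path length for a given set of leaf weights; it is also called an optimal binary tree.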

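Constructing a Huffman Tree

① Treat every node as a tree of its own, so the starting point is a forest of single-node trees.
② Pick the two trees whose roots have the smallest weights and join them under a new root whose weight is the sum of the two; repeat until only one tree remains.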
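The two construction steps can be sketched with a priority queue. This is only an illustration: `HuffmanWplSketch` and `weightedPathLength` are made-up names, the weights are arbitrary sample values, and `java.util.PriorityQueue` stands in for the linear scan that the implementation later in this article uses. The sketch relies on the fact that the weighted path length equals the sum of the weights of all merged (internal) nodes.

```java
import java.util.PriorityQueue;

class HuffmanWplSketch {
    /** Repeatedly merges the two smallest weights and returns the weighted path length. */
    static long weightedPathLength(int[] weights) {
        // Step 1: every weight starts out as its own single-node tree
        PriorityQueue<Long> forest = new PriorityQueue<>();
        for (int w : weights) {
            forest.add((long) w);
        }
        long wpl = 0;
        // Step 2: merge the two smallest roots until one tree is left
        while (forest.size() > 1) {
            long merged = forest.poll() + forest.poll();
            wpl += merged;      // each merge adds the weight of one internal node
            forest.add(merged);
        }
        return wpl;
    }

    public static void main(String[] args) {
        int[] weights = {5, 9, 12, 13, 16, 45};
        System.out.println(weightedPathLength(weights)); // prints 224
    }
}
```

For the sample weights the merges produce internal nodes 14, 25, 30, 55 and 100, which add up to 224.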

Storing the Huffman Tree (static three-pointer array; n is the number of leaf nodes)

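A tree with n leaves needs n-1 merges, so the static array holds 2n-1 nodes in total: the leaves occupy indices 0 to n-1 and the root ends up at index 2n-2. The following sketch fills such a layout for four sample weights. The names `StaticTreeSketch`, `build` and `smallestRoot` are invented for this illustration, and it assumes at least one leaf; as in the implementation below, the first minimum found becomes the right child.

```java
import java.util.Arrays;

class StaticTreeSketch {
    /** Returns {weight, parent, left, right}, each of length 2n-1; -1 means "none". */
    static int[][] build(int[] leafWeights) {
        int n = leafWeights.length;          // assumed to be at least 1
        int total = 2 * n - 1;
        int[] weight = new int[total];
        int[] parent = new int[total];
        int[] left = new int[total];
        int[] right = new int[total];
        Arrays.fill(parent, -1);
        Arrays.fill(left, -1);
        Arrays.fill(right, -1);
        System.arraycopy(leafWeights, 0, weight, 0, n);
        for (int next = n; next < total; next++) {
            int r = smallestRoot(weight, parent, next, -1); // smallest root
            int l = smallestRoot(weight, parent, next, r);  // second smallest root
            weight[next] = weight[r] + weight[l];
            left[next] = l;
            right[next] = r;
            parent[r] = next;
            parent[l] = next;
        }
        return new int[][] {weight, parent, left, right};
    }

    /** Index of the lightest parentless node below limit, ignoring index skip. */
    static int smallestRoot(int[] weight, int[] parent, int limit, int skip) {
        int best = -1;
        for (int i = 0; i < limit; i++) {
            if (i != skip && parent[i] == -1 && (best == -1 || weight[i] < weight[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] t = build(new int[] {7, 5, 2, 4});
        System.out.println("weight " + Arrays.toString(t[0]));
        System.out.println("parent " + Arrays.toString(t[1]));
        System.out.println("left   " + Arrays.toString(t[2]));
        System.out.println("right  " + Arrays.toString(t[3]));
    }
}
```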

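Huffman Coding

Prefix code: each character is assigned a string of 0s and 1s as its code, with the requirement that no character's code is a prefix of any other character's code.
Huffman code: build a Huffman tree from the weights, then read the binary prefix code for the character set off the tree. Huffman coding is a variable-length binary code in which characters that occur more often receive shorter codes.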
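To see why the prefix property matters, here is a small sketch with a hand-picked code table (a=0, b=10, c=110, d=111). The table and all names (`PrefixCodeSketch`, `sampleCodes`, and so on) are assumptions made for illustration, not taken from a real Huffman tree in this article. Because no code is a prefix of another, the decoder can emit a character as soon as the bits read so far match a code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PrefixCodeSketch {
    /** Hypothetical code table, chosen by hand for illustration. */
    static Map<Character, String> sampleCodes() {
        Map<Character, String> codes = new LinkedHashMap<>();
        codes.put('a', "0");
        codes.put('b', "10");
        codes.put('c', "110");
        codes.put('d', "111");
        return codes;
    }

    /** True if no character's code is a prefix of another character's code. */
    static boolean isPrefixFree(Map<Character, String> codes) {
        for (Map.Entry<Character, String> x : codes.entrySet()) {
            for (Map.Entry<Character, String> y : codes.entrySet()) {
                if (!x.getKey().equals(y.getKey()) && y.getValue().startsWith(x.getValue())) {
                    return false;
                }
            }
        }
        return true;
    }

    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            bits.append(codes.get(ch));
        }
        return bits.toString();
    }

    /** Reads bits one by one and emits a character whenever a complete code has been seen. */
    static String decode(String bits, Map<Character, String> codes) {
        Map<String, Character> byCode = new HashMap<>();
        for (Map.Entry<Character, String> e : codes.entrySet()) {
            byCode.put(e.getValue(), e.getKey());
        }
        StringBuilder text = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (char bit : bits.toCharArray()) {
            current.append(bit);
            Character ch = byCode.get(current.toString());
            if (ch != null) {
                text.append(ch.charValue());
                current.setLength(0);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> codes = sampleCodes();
        String bits = encode("abacd", codes);
        System.out.println(bits);                // prints 0100110111
        System.out.println(decode(bits, codes)); // prints abacd
    }
}
```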

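A Huffman Tree Implementation

① Define the storage node

class HuffmanNode {
    // Weight (occurrence count)
    public int weight;
    // Index of the parent node, -1 if none
    public int pNode;
    // Index of the left child, -1 if none
    public int lChild;
    // Index of the right child, -1 if none
    public int rChild;

    public HuffmanNode() {
        this.weight = 0;
        this.pNode = this.lChild = this.rChild = -1;
    }
}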

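② Fields and initialization

// The methods of this class use the java.io stream classes (import java.io.*;)
// Extended ASCII has 256 codes, one for each possible byte value
private final int MAX_NUMBER = 256;
// The Huffman tree, stored as a static array of nodes
private HuffmanNode[] huffMan;
// Weight (occurrence count) of each code
private int[] weight;
// Huffman code of each code
private String[] hfCode;

// Initialization
public HuffmanTree() {
    this.huffMan = new HuffmanNode[2 * MAX_NUMBER - 1];
    this.weight = new int[MAX_NUMBER];
    this.hfCode = new String[MAX_NUMBER];
}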

③ Read the file and compute the weights

The weights are indexed by character code (0 to 255), so the static node array has 256*2-1 entries.

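/**
 * Reads the file at the given path and counts how often each byte value occurs.
 * @param infile path of the source file
 */
public void file2Weight(String infile) {
    // Read raw bytes: read() then returns 0..255, which always fits the weight table.
    // A Reader returns chars instead, which can exceed 255 for non-ASCII text.
    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(infile))) {
        int c;
        while ((c = in.read()) != -1) {
            weight[c]++;
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}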
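The counting step can be tried out in memory, without a file. In this sketch `ByteFrequencySketch` and `countBytes` are illustrative names and the input string is arbitrary; the mask `b & 0xFF` maps Java's signed `byte` to the index range 0 to 255.

```java
import java.nio.charset.StandardCharsets;

class ByteFrequencySketch {
    /** Counts how often each byte value (0..255) occurs in the input. */
    static int[] countBytes(byte[] input) {
        int[] weight = new int[256];
        for (byte b : input) {
            weight[b & 0xFF]++; // mask so that negative bytes map to 128..255
        }
        return weight;
    }

    public static void main(String[] args) {
        byte[] text = "abracadabra".getBytes(StandardCharsets.US_ASCII);
        int[] weight = countBytes(text);
        System.out.println("a=" + weight['a'] + " b=" + weight['b'] + " r=" + weight['r']);
        // prints a=5 b=2 r=2
    }
}
```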

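④ Build the Huffman tree from the weights

The key to this step is finding, among the trees built so far, the two roots with the smallest weights.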

/**
 * Builds the Huffman tree.
 */
public void createHuffmanTree() {
    int index = 0;
    // Store the weights in the leaf nodes
    for (; index < this.MAX_NUMBER; index++) {
        huffMan[index] = new HuffmanNode();
        huffMan[index].weight = weight[index];
    }
    // Create the internal nodes, one per merge
    for (; index < 2 * this.MAX_NUMBER - 1; index++) {
        huffMan[index] = new HuffmanNode();
        // The smallest remaining root becomes the right child, the next smallest the left child
        int rchild = getMinWeight(index);
        int lchild = getMinWeight(index);
        // Point both children at their new parent
        huffMan[rchild].pNode = index;
        huffMan[lchild].pNode = index;
        // Fill in the new parent node
        huffMan[index].weight = huffMan[rchild].weight + huffMan[lchild].weight;
        huffMan[index].lChild = lchild;
        huffMan[index].rChild = rchild;
    }
}

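Taking the smallest node out of the table:

/**
 * Finds the root with the smallest weight among the Huffman nodes created so far.
 * @param stop number of nodes created so far (the scan covers indices 0..stop-1)
 * @return index of the root with the smallest weight
 */
private int getMinWeight(int stop) {
    int move = 0;
    // Start from the largest int; a fixed guess such as 99999 fails once every root weighs more
    int minWeight = Integer.MAX_VALUE;
    int min = 0;                    // index of the smallest root found so far
    while (move < stop) {
        // Only nodes without a parent are still roots
        if (huffMan[move].pNode == -1) {
            if (huffMan[move].weight < minWeight) {
                minWeight = huffMan[move].weight;
                min = move;
            }
        }
        move++;
    }
    // Mark the node as taken so that the next call skips it
    huffMan[min].pNode = -2;
    return min;
}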

⑤ Derive the Huffman codes from the tree

Using a stack, walk back from each leaf to the root and record the path; popping the stack then yields the code.

/**
 * Extracts the Huffman codes from the Huffman tree.
 * Starts at each leaf, walks back to the root, and records the path on a stack.
 */
public void createHuffmanCode() {
    int stack[] = new int[this.MAX_NUMBER];
    int top = 0;
    int move, parent;
    for (int i = 0; i < this.MAX_NUMBER; i++) {
        move = i;
        // The default parent index is -1, so only the root stops this loop
        while ((parent = huffMan[move].pNode) != -1) {
            // Left child is 0, right child is 1
            if (huffMan[parent].lChild == move) {
                stack[top++] = 0;
            } else {
                stack[top++] = 1;
            }
            move = parent;
        }
        // Pop the stack and concatenate the bits into the code
        StringBuilder sBuilder = new StringBuilder();
        for (int j = top - 1; j >= 0; j--) {
            sBuilder.append(stack[j]);
        }
        top = 0;
        this.hfCode[i] = sBuilder.toString();
    }
}
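The leaf-to-root walk is easier to follow on a tiny tree. The sketch below hard-codes the parent and left-child arrays of a four-leaf tree built from the sample weights 7, 5, 2 and 4 (an assumption for illustration); `CodeFromParentSketch` and `codes` are made-up names, and `StringBuilder.reverse()` stands in for the explicit stack.

```java
import java.util.Arrays;

class CodeFromParentSketch {
    /** Walks from each leaf up to the root and returns the codes, root-to-leaf order. */
    static String[] codes(int[] parent, int[] left, int leafCount) {
        String[] result = new String[leafCount];
        for (int leaf = 0; leaf < leafCount; leaf++) {
            StringBuilder path = new StringBuilder();
            int node = leaf;
            while (parent[node] != -1) {
                int p = parent[node];
                path.append(left[p] == node ? '0' : '1'); // left child is 0, right child is 1
                node = p;
            }
            // The bits were collected leaf-to-root, so reverse them
            result[leaf] = path.reverse().toString();
        }
        return result;
    }

    public static void main(String[] args) {
        // Leaves 0..3 carry weights 7, 5, 2, 4; nodes 4..6 are internal, node 6 is the root
        int[] parent = {6, 5, 4, 4, 5, 6, -1};
        int[] left = {-1, -1, -1, -1, 3, 4, 5};
        System.out.println(Arrays.toString(codes(parent, left, 4)));
        // prints [1, 01, 001, 000]
    }
}
```

The heaviest leaf (weight 7) gets the one-bit code and the two lightest leaves get three bits each.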

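⑥ Traverse the file again and convert each character to its Huffman code

There is a catch here: the Huffman codes are stored in a String array, so every bit is a character that takes up a full byte (8 bits) of space. Each input character turns into a whole string of such code characters, so writing the codes out as text makes the "compressed" file larger, not smaller.
The bits therefore have to be packed, eight to a byte, before they are stored.
Java has no I/O stream that writes individual bits, so every group of 8 bits is converted to one byte and the resulting byte array is written through a byte stream. Note that the last group may hold fewer than 8 bits; it is padded with 1s in the low-order positions (so that a tail such as 100 is not turned into 0, which would lose data).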

/**
 * Encodes the file with the Huffman codes and writes the result to a new file.
 * @param infile source file
 * @return path of the compressed file
 */
public String condenseFile(String infile) {
    String newFilename = null;
    try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(infile))) {
        int c;
        StringBuilder sBuilder = new StringBuilder();
        // Read raw bytes (0..255) so that every value has an entry in hfCode
        while ((c = in.read()) != -1) {
            sBuilder.append(hfCode[c]);
        }
        String bits = sBuilder.toString();
        byte[] data = new byte[bits.length() / 8 + 1];
        int exceed = str2ByteArray(bits, data);
        // Insert "press" plus the padding length in front of the file extension
        int dot = infile.lastIndexOf('.');
        if (dot < 0) {
            dot = infile.length();  // no extension: append at the end
        }
        newFilename = infile.substring(0, dot) + "press" + exceed + infile.substring(dot);
        try (BufferedOutputStream boStream = new BufferedOutputStream(
                new FileOutputStream(newFilename))) {
            boStream.write(data);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return newFilename;
}
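As a point of comparison, the packing idea (eight '0'/'1' characters per stored byte) can also be written with shifts and masks. Note that this sketch is not the scheme this article uses: it treats each byte as a plain 8-bit pattern (two's complement), pads the last byte with 0s, and expects the caller to remember the number of valid bits. `BitPackingSketch`, `pack` and `unpack` are illustrative names.

```java
class BitPackingSketch {
    /** Packs a '0'/'1' string into bytes, most significant bit first, zero-padding the last byte. */
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= 1 << (7 - i % 8);
            }
        }
        return out;
    }

    /** Expands the first bitCount bits of data back into a '0'/'1' string. */
    static String unpack(byte[] data, int bitCount) {
        StringBuilder bits = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            bits.append((data[i / 8] >> (7 - i % 8)) & 1);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String bits = "0100110111";            // 10 bits, so the second byte has 6 padding bits
        byte[] packed = pack(bits);
        System.out.println(packed.length);     // prints 2
        System.out.println(packed[0]);         // prints 77
        System.out.println(unpack(packed, bits.length())); // prints 0100110111
    }
}
```

Ten characters of code text shrink to two bytes here, which is the whole point of the packing step.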

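Data conversion: note that byte is a signed type whose highest bit is the sign bit. The case where the bit string ends exactly on an 8-bit boundary must also be handled explicitly, otherwise an array index goes out of bounds.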

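/**
 * Converts the bit string to a byte array and returns the padding length.
 * Each byte is stored as sign and magnitude: the first bit is the sign, the other seven the value.
 * @param str  string of '0' and '1' characters
 * @param data byte array to fill (str.length()/8 + 1 entries)
 * @return number of padding bits in the last byte (8 if the last byte is pure padding)
 */
private int str2ByteArray(String str, byte data[]) {
    int result = 0;
    int cursor = 0; // current position in str
    char[] ch;
    // Every full group of eight bits becomes one signed byte
    for (int i = 0; i < data.length - 1; i++) {
        ch = str.substring(cursor, cursor + 8).toCharArray();
        for (int j = 1; j < ch.length; j++) {
            result += (ch[j] - '0') << (7 - j);
        }
        // Apply the sign bit. There is no negative zero, so 10000000 is stored as -128
        // (otherwise it would collide with 00000000); byte2Int turns -128 back into 10000000.
        if (ch[0] == '1') {
            result = (result == 0) ? -128 : -result;
        }
        data[i] = (byte) result;
        // Advance the cursor and reset the accumulator
        cursor = cursor + 8;
        result = 0;
    }
    if (cursor == str.length()) {
        // The bits filled the previous bytes exactly: the last byte is pure padding
        data[data.length - 1] = 0;
        return 8;
    } else {
        // Fewer than eight bits remain: pad the low-order positions with 1s (so 10 does not become 0)
        ch = str.substring(cursor).toCharArray();
        for (int j = 1; j < ch.length; j++) {
            result += (ch[j] - '0') << (7 - j);
        }
        for (int k = ch.length; k < 8; k++) {
            result += 1 << (7 - k);
        }
        if (ch[0] == '1') {
            result = -result;
        }
        data[data.length - 1] = (byte) result;
        return 8 - ch.length;
    }
}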
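The remark about byte being signed can be checked directly. A minimal sketch (`SignedByteSketch` and `toUnsigned` are illustrative names): casting an int above 127 to byte wraps around to a negative value, and masking with `0xFF` recovers the 0 to 255 value that `InputStream.read()` would return.

```java
class SignedByteSketch {
    /** Maps a signed byte (-128..127) to its unsigned value (0..255). */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 200;               // 200 does not fit into a signed byte
        System.out.println(b);             // prints -56
        System.out.println(toUnsigned(b)); // prints 200
        System.out.println((byte) 128);    // prints -128
    }
}
```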

PS: I took a shortcut here and stored the number of padding bits in the file name.
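The file-name trick boils down to two small string operations. This sketch assumes the source name has an extension and that the padding length is a single digit (1 to 8); `PaddingInNameSketch`, `compressedName` and `paddingFromName` are names invented for the illustration.

```java
class PaddingInNameSketch {
    /** Inserts "press" and the padding length in front of the extension. */
    static String compressedName(String infile, int padding) {
        int dot = infile.lastIndexOf('.'); // assumes the name has an extension
        return infile.substring(0, dot) + "press" + padding + infile.substring(dot);
    }

    /** Reads back the single digit that follows the last "press" in the name. */
    static int paddingFromName(String pressFile) {
        int mark = pressFile.lastIndexOf("press") + "press".length();
        return Character.digit(pressFile.charAt(mark), 10);
    }

    public static void main(String[] args) {
        String name = compressedName("Test/Hufin.txt", 3);
        System.out.println(name);                  // prints Test/Hufinpress3.txt
        System.out.println(paddingFromName(name)); // prints 3
    }
}
```

The obvious weakness is that renaming the compressed file loses the padding length.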

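⑦ Traverse the compressed file and decode it with the Huffman tree

Decoding: reading the file yields one byte at a time, which is expanded into 8 bits. The bits are taken one by one to walk the Huffman tree: 0 goes to the left child, 1 goes to the right child, and on reaching a leaf the corresponding character is output.
The padding length is extracted from the file name at this point.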

/**
 * Decompresses a compressed file.
 * @param pressFile   path of the compressed file
 * @param unPressFile path to write the decompressed file to
 */
public void recoverFile(String pressFile, String unPressFile) {
    try (BufferedOutputStream boStream = new BufferedOutputStream(new FileOutputStream(unPressFile));
         BufferedInputStream biStream = new BufferedInputStream(new FileInputStream(pressFile))) {
        // The single digit after "press" in the file name is the padding length
        int mark = pressFile.lastIndexOf("press") + 5;
        int size = Integer.parseInt(pressFile.substring(mark, mark + 1));
        int c, judge;
        int move = 2 * this.MAX_NUMBER - 2;  // start at the root of the Huffman tree
        c = biStream.read();
        int length = 8;   // number of useful bits in the current byte
        while (c != -1) {
            // Read one byte ahead to find out whether the current byte is the last one
            judge = biStream.read();
            if (judge == -1) {
                length = length - size;
            }
            c = (byte) c;
            int[] data = byte2Int(c);
            // Step through the tree first and test for a leaf afterwards; otherwise the last bit is missed
            for (int i = 0; i < length; i++) {
                // 0 goes left, 1 goes right
                if (data[i] == 0) {
                    move = huffMan[move].lChild;
                } else {
                    move = huffMan[move].rChild;
                }
                if (huffMan[move].lChild == -1 && huffMan[move].rChild == -1) {
                    // The index of a leaf is the original character code
                    boStream.write(move);
                    move = 2 * this.MAX_NUMBER - 2; // back to the root
                }
            }
            c = judge;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
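The tree walk itself can be tested on a small hand-built tree. The arrays below describe a four-leaf tree for the sample weights 7, 5, 2 and 4 whose codes are 1, 01, 001 and 000; these values and the names `TreeWalkDecodeSketch` and `decode` are assumptions for illustration, and the root is assumed to sit in the last array slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TreeWalkDecodeSketch {
    /** Follows the bits from the root: 0 goes left, 1 goes right, a leaf emits its index. */
    static int[] decode(String bits, int[] left, int[] right) {
        int root = left.length - 1;            // the root is stored in the last slot
        List<Integer> out = new ArrayList<>();
        int node = root;
        for (int i = 0; i < bits.length(); i++) {
            node = bits.charAt(i) == '0' ? left[node] : right[node];
            if (left[node] == -1 && right[node] == -1) {
                out.add(node);                 // reached a leaf
                node = root;                   // start over for the next symbol
            }
        }
        int[] result = new int[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] left = {-1, -1, -1, -1, 3, 4, 5};
        int[] right = {-1, -1, -1, -1, 2, 1, 0};
        // Codes 1, 01, 001, 000 concatenated
        System.out.println(Arrays.toString(decode("101001000", left, right)));
        // prints [0, 1, 2, 3]
    }
}
```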

Data conversion during decompression: turning a byte value back into its 0/1 bits

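/**
 * Converts one byte value to its eight bits (sign bit first, then the seven magnitude bits).
 * @param num byte value (-128..127)
 * @return int array with eight 0/1 entries
 */
public int[] byte2Int(int num) {
    // Result, most significant bit first
    int[] result = {0,0,0,0,0,0,0,0};
    // Stack for the binary digits
    int[] stack = {-1,-1,-1,-1,-1,-1,-1,-1};
    int top = 0;
    // Negative value: set the sign bit and continue with the magnitude
    if (num < 0) {
        result[0] = 1;
        num = 0 - num;
    }
    // Push the binary digits, least significant first
    while (num != 0) {
        stack[top++] = num % 2;
        num = num / 2;
    }
    // Copy the stack into the result array, starting from the low-order end
    int move = 7;   // cursor into the result array
    for (int i = 0; i < top; i++) {
        result[move--] = stack[i];
    }
    return result;
}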

⑧ Testing from the main function

HuffmanTree hfTree = new HuffmanTree();
String inFile = "Test/Hufin.txt";
String pressFile;
String unPressFile = "Test/Huffmanunpress.txt";

// ① Build the weight table
hfTree.file2Weight(inFile);
// ② Build the Huffman tree
hfTree.createHuffmanTree();
// ③ Derive the Huffman codes
hfTree.createHuffmanCode();
// ④ Compress the file with the codes
pressFile = hfTree.condenseFile(inFile);
System.out.println("Compressed file path: " + pressFile);
// ⑤ Decompress the file with the codes
hfTree.recoverFile(pressFile, unPressFile);

Test results:
Console output: [figure]
File content comparison: [figure]

File size comparison: [figure]
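The two comparisons can also be automated instead of being checked by eye. The sketch below is one possible way to do it with `java.nio.file.Files`; `RoundTripCheckSketch`, `sameContent` and `sizeReport` are made-up names, and `main` runs on temporary files with arbitrary content rather than on the real test files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class RoundTripCheckSketch {
    /** True if both files hold exactly the same bytes. */
    static boolean sameContent(Path a, Path b) {
        try {
            return Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Describes the size change from the original file to the compressed file. */
    static String sizeReport(Path original, Path compressed) {
        try {
            return Files.size(original) + " bytes -> " + Files.size(compressed) + " bytes";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for the source file, the decompressed file, and the compressed file
        Path source = Files.createTempFile("source", ".txt");
        Path restored = Files.createTempFile("restored", ".txt");
        Path compressed = Files.createTempFile("compressed", ".txt");
        Files.write(source, new byte[] {104, 101, 108, 108, 111});
        Files.write(restored, new byte[] {104, 101, 108, 108, 111});
        Files.write(compressed, new byte[] {77, -64});
        System.out.println(sameContent(source, restored)); // prints true
        System.out.println(sizeReport(source, compressed)); // prints 5 bytes -> 2 bytes
    }
}
```

With the article's test files, the paths would be the source file, the file written by recoverFile, and the path returned by condenseFile.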