Base64: Applications and How It Works

The origin of Base64:

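Base64 was first used to solve a problem with e-mail transport. For historical reasons, early e-mail systems only allowed ASCII characters. A message containing non-ASCII characters could be damaged when it passed through a legacy gateway: such a gateway would likely alter the bits of each non-ASCII byte by setting the most significant bit of the 8-bit value to 0. The recipient would then see nothing but garbled text. Base64 was created to work around this.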
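The effect described above can be reproduced with a small sketch. The stripHighBit method is a hypothetical stand-in for such a legacy gateway, not code from any real mail system, and the class and method names are invented for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class SevenBitGatewayDemo {

    // Hypothetical stand-in for a legacy gateway: clears the top bit of every byte.
    static byte[] stripHighBit(byte[] in) {
        byte[] out = new byte[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = (byte) (in[i] & 0x7f);
        }
        return out;
    }

    // Sends the text as raw UTF-8 bytes through the simulated gateway.
    static String sendRaw(String text) {
        byte[] raw = text.getBytes(StandardCharsets.UTF_8);
        return new String(stripHighBit(raw), StandardCharsets.UTF_8);
    }

    // Sends the text Base64-encoded, then decodes it on the receiving side.
    static String sendBase64(String text) {
        byte[] encoded = Base64.getEncoder().encode(text.getBytes(StandardCharsets.UTF_8));
        byte[] received = stripHighBit(encoded); // pure ASCII, so nothing changes
        return new String(Base64.getDecoder().decode(received), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // the last character is non-ASCII
        System.out.println(text.equals(sendRaw(text)));    // false
        System.out.println(text.equals(sendBase64(text))); // true
    }
}
```

Because every Base64 character is a 7-bit ASCII character, clearing the top bit leaves the encoded form untouched, while the raw UTF-8 bytes of the non-ASCII character are corrupted.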

Definition of Base64:

As specified in RFC2045 and RFC4648, the Base64 alphabet consists of 64 ASCII characters. Three formats are in common use today: RFC2045, RFC4648, and RFC4648 URLSAFE.

Differences between RFC2045, RFC4648, and RFC4648 URLSAFE:

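· RFC2045 and RFC4648 use the characters in Table 1. RFC2045 additionally limits each encoded line to at most 76 characters, with lines separated by a carriage return and line feed (\r\n).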

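· RFC4648 URLSAFE uses the characters in Table 2. It differs from RFC2045 and RFC4648 in two characters: + and / are replaced by - and _. Because + and / have special meanings in URLs and can be altered or lost in transit, the safer - and _ are used instead.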

Value	Encoding	Value	Encoding	Value	Encoding	Value	Encoding
0	A	17	R	34	i	51	z
1	B	18	S	35	j	52	0
2	C	19	T	36	k	53	1
3	D	20	U	37	l	54	2
4	E	21	V	38	m	55	3
5	F	22	W	39	n	56	4
6	G	23	X	40	o	57	5
7	H	24	Y	41	p	58	6
8	I	25	Z	42	q	59	7
9	J	26	a	43	r	60	8
10	K	27	b	44	s	61	9
11	L	28	c	45	t	62	+
12	M	29	d	46	u	63	/
13	N	30	e	47	v	padding	=
14	O	31	f	48	w
15	P	32	g	49	x
16	Q	33	h	50	y

Table 1: Base64 alphabet used by RFC2045 and RFC4648

Value	Encoding	Value	Encoding	Value	Encoding	Value	Encoding
0	A	17	R	34	i	51	z
1	B	18	S	35	j	52	0
2	C	19	T	36	k	53	1
3	D	20	U	37	l	54	2
4	E	21	V	38	m	55	3
5	F	22	W	39	n	56	4
6	G	23	X	40	o	57	5
7	H	24	Y	41	p	58	6
8	I	25	Z	42	q	59	7
9	J	26	a	43	r	60	8
10	K	27	b	44	s	61	9
11	L	28	c	45	t	62	-
12	M	29	d	46	u	63	_
13	N	30	e	47	v	padding	=
14	O	31	f	48	w
15	P	32	g	49	x
16	Q	33	h	50	y

Table 2: Base64 alphabet used by RFC4648 URLSAFE
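As a quick check of the two tables, the following sketch encodes two bytes chosen so that the 6-bit values 62 and 63 both occur. The class and method names are invented for illustration; the encoder calls are the public java.util.Base64 API:

```java
import java.util.Base64;

class AlphabetDiffDemo {

    // Encodes with the standard alphabet (Table 1).
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Encodes with the URL-safe alphabet (Table 2).
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // 0xfb 0xff splits into the 6-bit values 62, 63 and 60.
        byte[] data = {(byte) 0xfb, (byte) 0xff};
        System.out.println(standard(data)); // +/8=
        System.out.println(urlSafe(data));  // -_8=
    }
}
```

Only the characters for the values 62 and 63 change; the character for 60 and the padding character are the same in both outputs.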

Demo:

Since JDK 1.8, the JDK ships a Base64 utility class, java.util.Base64, which supports encoding and decoding in the RFC2045, RFC4648, and RFC4648 URLSAFE formats. For example:

package com.securitit.serialize.bs64;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Tester {

	public static void main(String[] args) {
		String plainStr = null;
		String bs64Str = null;
		byte[] plainBts = null;

		// Original text (the full-width exclamation marks are non-ASCII).
		plainStr = "Hello Base64！Now this is testing RFC2045 Base64！Please see the result！";
		// RFC2045 test.
		bs64Str = Base64.getMimeEncoder().encodeToString(plainStr.getBytes(StandardCharsets.UTF_8));
		System.out.println("RFC2045 Base64 encoded:");
		System.out.println(bs64Str);
		plainBts = Base64.getMimeDecoder().decode(bs64Str);
		plainStr = new String(plainBts, StandardCharsets.UTF_8);
		System.out.println("RFC2045 Base64 decoded:");
		System.out.println(plainStr);
		
		System.out.println("===============================================================");
		// RFC4648 test.
		bs64Str = Base64.getEncoder().encodeToString(plainStr.getBytes(StandardCharsets.UTF_8));
		System.out.println("RFC4648 Base64 encoded:");
		System.out.println(bs64Str);
		plainBts = Base64.getDecoder().decode(bs64Str);
		plainStr = new String(plainBts, StandardCharsets.UTF_8);
		System.out.println("RFC4648 Base64 decoded:");
		System.out.println(plainStr);
		
		System.out.println("===============================================================");
		// RFC4648 URLSAFE test.
		bs64Str = Base64.getUrlEncoder().encodeToString(plainStr.getBytes(StandardCharsets.UTF_8));
		System.out.println("RFC4648 URLSAFE Base64 encoded:");
		System.out.println(bs64Str);
		plainBts = Base64.getUrlDecoder().decode(bs64Str);
		plainStr = new String(plainBts, StandardCharsets.UTF_8);
		System.out.println("RFC4648 URLSAFE Base64 decoded:");
		System.out.println(plainStr);
	}

}

Output:

RFC2045 Base64 encoded:
SGVsbG8gQmFzZTY077yBTm93IHRoaXMgaXMgdGVzdGluZyBSRkMyMDQ1IEJhc2U2NO+8gVBsZWFz
ZSBzZWUgdGhlIHJlc3VsdO+8gQ==
RFC2045 Base64 decoded:
Hello Base64！Now this is testing RFC2045 Base64！Please see the result！
===============================================================
RFC4648 Base64 encoded:
SGVsbG8gQmFzZTY077yBTm93IHRoaXMgaXMgdGVzdGluZyBSRkMyMDQ1IEJhc2U2NO+8gVBsZWFzZSBzZWUgdGhlIHJlc3VsdO+8gQ==
RFC4648 Base64 decoded:
Hello Base64！Now this is testing RFC2045 Base64！Please see the result！
===============================================================
RFC4648 URLSAFE Base64 encoded:
SGVsbG8gQmFzZTY077yBTm93IHRoaXMgaXMgdGVzdGluZyBSRkMyMDQ1IEJhc2U2NO-8gVBsZWFzZSBzZWUgdGhlIHJlc3VsdO-8gQ==
RFC4648 URLSAFE Base64 decoded:
Hello Base64！Now this is testing RFC2045 Base64！Please see the result！

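The output makes the differences between the formats obvious: the RFC2045 result is broken into lines of 76 characters, and the URLSAFE result contains - where the other two contain +. Each format suits a particular scenario: RFC2045 is better suited to e-mail transport, RFC4648 to encoding files and larger data, and RFC4648 URLSAFE to data that has to travel inside a URL.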

Source code analysis:

Base64 encoding turns every 3 bytes into 4 bytes, each represented as an ASCII character. The process is roughly as shown in the figure below:

[Figure: the Base64 encoding process, 3 bytes regrouped into four 6-bit values]

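Written in binary, 3 bytes are 24 bits. Split evenly into four groups, these 24 bits become 4 bytes in which only the low 6 bits are significant. A 6-bit value ranges from 0 to 63, which is exactly the index range of the alphabet tables at the beginning of this article.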
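The regrouping can be worked through by hand for the three ASCII bytes of "Man". The sketch below is a minimal illustration, not the JDK implementation; ALPHABET and encodeGroup are names made up for this example:

```java
class ThreeToFourDemo {

    // The standard alphabet from Table 1, indexed by 6-bit value.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Regroups 3 bytes (24 bits) into four 6-bit values and looks each one up.
    static String encodeGroup(byte b0, byte b1, byte b2) {
        int bits = (b0 & 0xff) << 16 | (b1 & 0xff) << 8 | (b2 & 0xff);
        char[] out = new char[4];
        out[0] = ALPHABET.charAt((bits >>> 18) & 0x3f);
        out[1] = ALPHABET.charAt((bits >>> 12) & 0x3f);
        out[2] = ALPHABET.charAt((bits >>> 6) & 0x3f);
        out[3] = ALPHABET.charAt(bits & 0x3f);
        return new String(out);
    }

    public static void main(String[] args) {
        // 'M' 'a' 'n' = 01001101 01100001 01101110
        // regrouped   = 010011 010110 000101 101110 = 19, 22, 5, 46
        System.out.println(encodeGroup((byte) 'M', (byte) 'a', (byte) 'n')); // TWFu
    }
}
```

The values 19, 22, 5 and 46 map to T, W, F and u in Table 1, which matches what java.util.Base64 produces for the same input.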

Next, we take java.util.Base64 from JDK 1.8 as an example and briefly walk through its source code.

Implementation basis:

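The implementation is built on the two alphabet tables shown at the beginning of this article. Following the process in the figure above, each 6-bit value is looked up in the table and the resulting characters are concatenated into a string, which is the Base64-encoded value.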
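A sketch of how such a pair of tables might be built is shown here: one array maps a 6-bit value to its character, and a reverse array maps a byte back to its value. The names TO_BASE64 and FROM_BASE64 are invented for this example. The sentinel values -1 (not in the alphabet) and -2 (the padding character) are chosen to match the checks that appear in the decode0 listing later in this article:

```java
import java.util.Arrays;

class LookupTablesDemo {

    // Encoding table: index is the 6-bit value, element is the output character.
    static final char[] TO_BASE64 =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    // Decoding table: index is the input byte, element is the 6-bit value.
    static final int[] FROM_BASE64 = new int[256];

    static {
        Arrays.fill(FROM_BASE64, -1);          // -1: byte is not in the alphabet
        for (int i = 0; i < TO_BASE64.length; i++) {
            FROM_BASE64[TO_BASE64[i]] = i;
        }
        FROM_BASE64['='] = -2;                 // -2: padding character
    }

    public static void main(String[] args) {
        System.out.println(TO_BASE64[62]);     // +
        System.out.println(FROM_BASE64['/']);  // 63
        System.out.println(FROM_BASE64['=']);  // -2
        System.out.println(FROM_BASE64['\n']); // -1
    }
}
```

With a reverse table like this, decoding needs a single array lookup per input byte, and the sign of the result distinguishes valid characters from padding and from everything else.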

Core methods:

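The source of java.util.Base64 is fairly simple. We only analyze its two most important methods, encode0 and decode0, which do the main work of encoding and decoding. The other methods are thin wrappers around these two and are not covered here.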

encode0:

The encode0 method performs the encoding.

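private int encode0(byte[] src, int off, int end, byte[] dst) {
    // Select the encoding alphabet: URL-safe or standard.
    char[] base64 = isURL ? toBase64URL : toBase64;
    // Current read position in the source.
    int sp = off;
    // Number of source bytes that form complete 3-byte groups.
    int slen = (end - off) / 3 * 3;
    // End position (exclusive) of the last complete group.
    int sl = off + slen;
    // If a maximum line length is set and the data does not fit in one line,
    if (linemax > 0 && slen  > linemax / 4 * 3)
        // cap slen at the number of source bytes that fit in one output line.
        slen = linemax / 4 * 3;
    // Current write position in the destination.
    int dp = 0;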
    // Process the complete groups until none are left.
    while (sp < sl) {
        // sl0 is the end position for this pass (at most one output line).
        int sl0 = Math.min(sp + slen, sl);
        for (int sp0 = sp, dp0 = dp ; sp0 < sl0; ) {
            // Combine the low 8 bits of 3 bytes into one 24-bit int.
            int bits = (src[sp0++] & 0xff) << 16 |
                (src[sp0++] & 0xff) <<  8 |
                (src[sp0++] & 0xff);
            // 0x3f is 0011 1111 in binary.
            // The four statements below take each 6-bit slice of the 24 bits
            // (masking with 0x3f) and map it to one output byte.
            dst[dp0++] = (byte)base64[(bits >>> 18) & 0x3f];
            dst[dp0++] = (byte)base64[(bits >>> 12) & 0x3f];
            dst[dp0++] = (byte)base64[(bits >>> 6)  & 0x3f];
            dst[dp0++] = (byte)base64[bits & 0x3f];
        }
        // Advance the write and read positions past what this pass produced.
        int dlen = (sl0 - sp) / 3 * 4;
        dp += dlen;
        sp = sl0;
        // RFC2045 handling: after a full line, emit the line separator
        // unless the input is exhausted.
        if (dlen == linemax && sp < end) {
            for (byte b : newline){
                dst[dp++] = b;
            }
        }
    }
    // Handle the remaining bytes that do not fill a 3-byte group.
    if (sp < end) {               // 1 or 2 leftover bytes
        int b0 = src[sp++] & 0xff;
        dst[dp++] = (byte)base64[b0 >> 2];
        if (sp == end) {
            // One leftover byte: its low 2 bits start the second character.
            dst[dp++] = (byte)base64[(b0 << 4) & 0x3f];
            // Pad with '=' if padding is enabled.
            if (doPadding) {
                dst[dp++] = '=';
                dst[dp++] = '=';
            }
        } else {
            // Two leftover bytes: take the low 8 bits of the second one.
            int b1 = src[sp++] & 0xff;
            dst[dp++] = (byte)base64[(b0 << 4) & 0x3f | (b1 >> 4)];
            dst[dp++] = (byte)base64[(b1 << 2) & 0x3f];
            // Pad with '=' if padding is enabled.
            if (doPadding) {
                dst[dp++] = '=';
            }
        }
    }
    return dp;
}
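The leftover-byte and line-break branches of encode0 can be observed from the outside through the public API. The sketch below is only a demonstration under that assumption; the class and helper names are invented:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class PaddingAndLineBreakDemo {

    // Standard encoding with padding.
    static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.US_ASCII));
    }

    // Standard encoding with padding switched off.
    static String encodeNoPadding(String s) {
        return Base64.getEncoder().withoutPadding()
                     .encodeToString(s.getBytes(StandardCharsets.US_ASCII));
    }

    // MIME (RFC2045) encoding of byteCount zero bytes.
    static String mimeEncode(int byteCount) {
        return Base64.getMimeEncoder().encodeToString(new byte[byteCount]);
    }

    public static void main(String[] args) {
        System.out.println(encode("M"));           // TQ==  (1 leftover byte)
        System.out.println(encode("Ma"));          // TWE=  (2 leftover bytes)
        System.out.println(encode("Man"));         // TWFu  (complete group)
        System.out.println(encodeNoPadding("M"));  // TQ

        // 57 bytes fill exactly one 76-character line: no separator is added.
        System.out.println(mimeEncode(57).contains("\r\n")); // false
        // 58 bytes need a second line: the separator follows character 76.
        System.out.println(mimeEncode(58).indexOf("\r\n"));  // 76
    }
}
```

The 57-byte case corresponds to the sp < end test in the line-break branch: a separator is written only when more input follows a full line.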

decode0:

The decode0 method performs the decoding.

private int decode0(byte[] src, int sp, int sl, byte[] dst) {
    // Select the decoding table: URL-safe or standard.
    int[] base64 = isURL ? fromBase64URL : fromBase64;
    int dp = 0;
    int bits = 0;
    // Shift amount for the first of the 4 characters in a unit.
    int shiftto = 18;
    // Walk the source until it is exhausted.
    while (sp < sl) {
        // Read the next byte as an unsigned value.
        int b = src[sp++] & 0xff;
        // A negative table entry means the byte is not in the alphabet.
        if ((b = base64[b]) < 0) {
            // -2 marks the padding character '='.
            if (b == -2) {
                // =     shiftto==18 unnecessary padding
                // x=    shiftto==12 a dangling single x
                // x     to be handled together with non-padding case
                // xx=   shiftto==6&&sp==sl missing last =
                // xx=y  shiftto==6 last is not =
                // The last two cases and the first case mean the input
                // has a malformed 4-byte ending unit.
                if (shiftto == 6 && (sp == sl || src[sp++] != '=') ||
                    shiftto == 18) {
                    throw new IllegalArgumentException(
                        "Input byte array has wrong 4-byte ending unit");
                }
                break;
            }
            // In MIME (RFC2045) mode, skip characters outside the alphabet.
            if (isMIME)    // skip if for rfc2045
                continue;
            // Otherwise an illegal character is an error.
            else
                throw new IllegalArgumentException(
                "Illegal base64 character " +
                Integer.toString(src[sp - 1], 16));
        }
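        // Accumulate the 4 characters of a unit into bits:
        // the first value is shifted left by 18,
        // the second by 12,
        // the third by 6,
        // and the fourth is not shifted.
        bits |= (b << shiftto);
        // Reduce the shift amount by 6 for the next character.
        shiftto -= 6;
        // A negative shift means 4 characters have been collected.
        if (shiftto < 0) {
            // Split the 24 accumulated bits into 3 bytes.
            dst[dp++] = (byte)(bits >> 16);
            dst[dp++] = (byte)(bits >>  8);
            dst[dp++] = (byte)(bits);
            // Reset the shift amount to 18.
            shiftto = 18;
            // Reset the accumulator for the next unit.
            bits = 0;
        }
    }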
    // A final shift of 6 means 2 characters were read: emit one byte.
    if (shiftto == 6) {
        dst[dp++] = (byte)(bits >> 16);
    } else 
    // A final shift of 0 means 3 characters were read: emit two bytes.
    if (shiftto == 0) {
        dst[dp++] = (byte)(bits >> 16);
        dst[dp++] = (byte)(bits >>  8);
    } else 
    // A final shift of 12 means only 1 character was read, which is invalid.
    if (shiftto == 12) {
        // dangling single "x", incorrectly encoded.
        throw new IllegalArgumentException(
            "Last unit does not have enough valid bits");
    }
    // Input remains after the padding.
    while (sp < sl) {
        // In MIME mode, trailing bytes that are not in the alphabet are skipped.
        if (isMIME && base64[src[sp++]] < 0)
            continue;
        // Anything else after the padding is an error.
        throw new IllegalArgumentException(
            "Input byte array has incorrect ending byte at " + sp);
    }
    return dp;
}
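Some of the branches in decode0 can be triggered through the public API, as the following sketch tries to show. The class and helper names are invented; the inputs are variations of "SGVsbG8=", the encoding of "Hello":

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DecoderBehaviourDemo {

    static String decodeBasic(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }

    static String decodeMime(String s) {
        return new String(Base64.getMimeDecoder().decode(s), StandardCharsets.UTF_8);
    }

    // Returns true if the basic (RFC4648) decoder rejects the input.
    static boolean rejectedByBasic(String s) {
        try {
            Base64.getDecoder().decode(s);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String wrapped = "SGVs\r\nbG8=";
        // The MIME decoder skips the line separator (the isMIME branch).
        System.out.println(decodeMime(wrapped));        // Hello
        // The basic decoder treats it as an illegal character.
        System.out.println(rejectedByBasic(wrapped));   // true
        // Missing padding is accepted: the loop ends with shiftto == 0.
        System.out.println(decodeBasic("SGVsbG8"));     // Hello
        // A dangling single character ends with shiftto == 12 and is rejected.
        System.out.println(rejectedByBasic("SGVsb"));   // true
    }
}
```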

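Besides the methods above, the class also provides EncOutputStream and DecInputStream for encoding and decoding stream data. Their interfaces and implementations differ, but the overall idea is much the same. Readers who want the details can study the source code.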
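The stream classes are not used directly; they are reached through the public wrap methods of the encoder and decoder. The following is a minimal sketch of that usage, with class and method names invented for illustration and checked I/O exceptions rethrown unchecked to keep it short:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class StreamDemo {

    // Encodes data by writing it through a Base64-encoding output stream.
    static String encodeViaStream(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = Base64.getEncoder().wrap(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the wrapper writes the final group and any padding.
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    // Decodes text by reading it through a Base64-decoding input stream.
    static byte[] decodeViaStream(String encoded) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] source = encoded.getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = Base64.getDecoder().wrap(new ByteArrayInputStream(source))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) {
        String encoded = encodeViaStream("Hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded); // SGVsbG8=
        System.out.println(new String(decodeViaStream(encoded), StandardCharsets.UTF_8)); // Hello
    }
}
```

The encoding stream has to be closed before the result is read, because the last incomplete group is only written out on close.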

Note: all source code in this article comes from JDK 1.8; other versions may differ.

If anything is unclear, feel free to leave a comment!
