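Encoding principle
Base64 encoding represents every 3 bytes (3 × 8 bits) of binary data as 4 printable ASCII characters. To encode, the 24 bits are regrouped into four 6-bit values; if the last group has fewer than 6 bits, it is padded with zeros on the right. Each 6-bit value is then prefixed with two zero bits to form a full byte, and the decimal value of that byte is looked up in the encoding table to get the output character. The encoded data is therefore about 4/3 the length of the source data; with padding, n input bytes produce exactly 4 × ⌈n/3⌉ characters. Standard Base64 requires the output length to be a multiple of 4 characters; when it falls short, the output is filled with the padding character, the equals sign "=". The encoding table is as follows: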
0 A 1 B 2 C 3 D 4 E 5 F 6 G 7 H
8 I 9 J 10 K 11 L 12 M 13 N 14 O 15 P
16 Q 17 R 18 S 19 T 20 U 21 V 22 W 23 X
24 Y 25 Z 26 a 27 b 28 c 29 d 30 e 31 f
32 g 33 h 34 i 35 j 36 k 37 l 38 m 39 n
40 o 41 p 42 q 43 r 44 s 45 t 46 u 47 v
48 w 49 x 50 y 51 z 52 0 53 1 54 2 55 3
56 4 57 5 58 6 59 7 60 8 61 9 62 + 63 /
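For example, the ASCII character A is Base64-encoded as follows:
Character: A
ASCII code: 65
Binary: 0100 0001
Regrouped: 010000 01
Right-padded with zeros: 010000 010000
Left-padded with zeros: 00010000 00010000
Decimal: 16 16
Characters: Q Q
With padding: Q Q = =
Final result: QQ==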
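The steps of the worked example can be traced in a few lines of Java. This is only a minimal sketch: the class name Base64Steps and its two helper methods are hypothetical names chosen for illustration, and the helper handles a single input byte only.

```java
class Base64Steps {
    // The 64-character encoding table from the section above, indexed 0..63.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes exactly one byte, following the worked example step by step.
    static String encodeSingleByte(int b) {
        int first = (b >>> 2) & 0x3f;  // top 6 bits of the byte
        int second = (b << 4) & 0x3f;  // remaining 2 bits, right-padded with zeros
        return "" + ALPHABET.charAt(first) + ALPHABET.charAt(second) + "==";
    }

    // Padded output length for n input bytes: 4 characters per started 3-byte group.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        System.out.println(encodeSingleByte('A')); // QQ==
        System.out.println(encodedLength(1));      // 4
        System.out.println(encodedLength(100));    // 136
    }
}
```

The length helper shows why "4/3 of the source length" is only approximate: one input byte already produces four output characters once padding is included.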
Code implementation
Using Bouncy Castle
The code below uses the open-source library Bouncy Castle (version 1.56) for Base64 encoding and decoding.
import java.io.UnsupportedEncodingException;
import org.bouncycastle.util.encoders.Base64;

public class Base64TestBC {
    public static void main(String[] args)
            throws UnsupportedEncodingException {
        // Encode (the charset is named explicitly so the result does not
        // depend on the platform default)
        byte[] data = "A".getBytes("UTF-8");
        byte[] encodeData = Base64.encode(data);
        String encodeStr = Base64.toBase64String(data);
        System.out.println(new String(encodeData, "UTF-8"));
        System.out.println(encodeStr);
        // Decode
        byte[] decodeData = Base64.decode(encodeData);
        byte[] decodeData2 = Base64.decode(encodeStr);
        System.out.println(new String(decodeData, "UTF-8"));
        System.out.println(new String(decodeData2, "UTF-8"));
    }
}
The program prints:
QQ==
QQ==
A
A
Using Apache Commons Codec
The code below uses the open-source library Apache Commons Codec (version 1.10) for Base64 encoding and decoding.
import java.io.UnsupportedEncodingException;
import org.apache.commons.codec.binary.Base64;

public class Base64TestCC {
    public static void main(String[] args)
            throws UnsupportedEncodingException {
        // Encode (the charset is named explicitly so the result does not
        // depend on the platform default)
        byte[] data = "A".getBytes("UTF-8");
        byte[] encodeData = Base64.encodeBase64(data);
        String encodeStr = Base64.encodeBase64String(data);
        System.out.println(new String(encodeData, "UTF-8"));
        System.out.println(encodeStr);
        // Decode
        byte[] decodeData = Base64.decodeBase64(encodeData);
        byte[] decodeData2 = Base64.decodeBase64(encodeStr);
        System.out.println(new String(decodeData, "UTF-8"));
        System.out.println(new String(decodeData2, "UTF-8"));
    }
}
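Source code analysis
Analysis of the Bouncy Castle implementation
Bouncy Castle implements Base64 encoding and decoding in much the same way as its Hex codec. The source is the class org.bouncycastle.util.encoders.Base64Encoder, which begins by defining an encoding table and the padding character: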
protected final byte[] encodingTable =
    {
        (byte)'A', (byte)'B', (byte)'C', (byte)'D',
        (byte)'E', (byte)'F', (byte)'G', (byte)'H',
        (byte)'I', (byte)'J', (byte)'K', (byte)'L',
        (byte)'M', (byte)'N', (byte)'O', (byte)'P',
        (byte)'Q', (byte)'R', (byte)'S', (byte)'T',
        (byte)'U', (byte)'V', (byte)'W', (byte)'X',
        (byte)'Y', (byte)'Z', (byte)'a', (byte)'b',
        (byte)'c', (byte)'d', (byte)'e', (byte)'f',
        (byte)'g', (byte)'h', (byte)'i', (byte)'j',
        (byte)'k', (byte)'l', (byte)'m', (byte)'n',
        (byte)'o', (byte)'p', (byte)'q', (byte)'r',
        (byte)'s', (byte)'t', (byte)'u', (byte)'v',
        (byte)'w', (byte)'x', (byte)'y', (byte)'z',
        (byte)'0', (byte)'1', (byte)'2', (byte)'3',
        (byte)'4', (byte)'5', (byte)'6', (byte)'7',
        (byte)'8', (byte)'9', (byte)'+', (byte)'/'
    };
protected byte padding = (byte)'=';
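The encoding code follows. It first processes the input in consecutive 3-byte groups, because 3 bytes convert cleanly into 4 output bytes. It then handles the tail, which has three cases. If the input length is an exact multiple of 3, nothing is left over and nothing more is done. If 1 byte remains, two padding characters are appended. If 2 bytes remain, one padding character is appended.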
public int encode(
    byte[] data,
    int off,
    int length,
    OutputStream out)
    throws IOException
{
    int modulus = length % 3;
    int dataLength = (length - modulus);
    int a1, a2, a3;
    for (int i = off; i < off + dataLength; i += 3)
    {
        a1 = data[i] & 0xff;
        a2 = data[i + 1] & 0xff;
        a3 = data[i + 2] & 0xff;
        out.write(encodingTable[(a1 >>> 2) & 0x3f]);
        out.write(encodingTable[((a1 << 4) | (a2 >>> 4)) & 0x3f]);
        out.write(encodingTable[((a2 << 2) | (a3 >>> 6)) & 0x3f]);
        out.write(encodingTable[a3 & 0x3f]);
    }
    /*
     * process the tail end.
     */
    int b1, b2, b3;
    int d1, d2;
    switch (modulus)
    {
    case 0: /* nothing left to do */
        break;
    case 1:
        d1 = data[off + dataLength] & 0xff;
        b1 = (d1 >>> 2) & 0x3f;
        b2 = (d1 << 4) & 0x3f;
        out.write(encodingTable[b1]);
        out.write(encodingTable[b2]);
        out.write(padding);
        out.write(padding);
        break;
    case 2:
        d1 = data[off + dataLength] & 0xff;
        d2 = data[off + dataLength + 1] & 0xff;
        b1 = (d1 >>> 2) & 0x3f;
        b2 = ((d1 << 4) | (d2 >>> 4)) & 0x3f;
        b3 = (d2 << 2) & 0x3f;
        out.write(encodingTable[b1]);
        out.write(encodingTable[b2]);
        out.write(encodingTable[b3]);
        out.write(padding);
        break;
    }
    return (dataLength / 3) * 4 + ((modulus == 0) ? 0 : 4);
}
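To see the three tail cases side by side, the same logic can be restated as a standalone program. This is a simplified sketch, not the Bouncy Castle code: the class name TailSketch is hypothetical, and it writes to a StringBuilder instead of an OutputStream so it runs without the library.

```java
class TailSketch {
    static final char[] TABLE =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
            .toCharArray();

    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder();
        int modulus = data.length % 3;      // 0, 1 or 2 leftover bytes
        int full = data.length - modulus;   // length covered by whole 3-byte groups
        for (int i = 0; i < full; i += 3) {
            int a1 = data[i] & 0xff, a2 = data[i + 1] & 0xff, a3 = data[i + 2] & 0xff;
            out.append(TABLE[a1 >>> 2]);
            out.append(TABLE[((a1 << 4) | (a2 >>> 4)) & 0x3f]);
            out.append(TABLE[((a2 << 2) | (a3 >>> 6)) & 0x3f]);
            out.append(TABLE[a3 & 0x3f]);
        }
        if (modulus == 1) {
            // One leftover byte: two characters plus two padding characters.
            int d1 = data[full] & 0xff;
            out.append(TABLE[d1 >>> 2]).append(TABLE[(d1 << 4) & 0x3f]).append("==");
        } else if (modulus == 2) {
            // Two leftover bytes: three characters plus one padding character.
            int d1 = data[full] & 0xff, d2 = data[full + 1] & 0xff;
            out.append(TABLE[d1 >>> 2]);
            out.append(TABLE[((d1 << 4) | (d2 >>> 4)) & 0x3f]);
            out.append(TABLE[(d2 << 2) & 0x3f]);
            out.append('=');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(new byte[] {65}));         // "A"   -> QQ==
        System.out.println(encode(new byte[] {65, 66}));     // "AB"  -> QUI=
        System.out.println(encode(new byte[] {65, 66, 67})); // "ABC" -> QUJD
    }
}
```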
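Decoding likewise starts by building a decoding table: a 128-element byte array in which each index is an ASCII code and the value at that index is the position of that character in the encoding table. For Base64, every printable character in the encoding table has its table index stored at the position given by its ASCII code. For example, index 0 of the encoding table is the character A, and the ASCII code of A is 65, so the decoding table holds 0 at index 65. Every position that does not correspond to a Base64 character holds -1. The source that builds the table: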
protected final byte[] decodingTable = new byte[128];

protected void initialiseDecodingTable()
{
    for (int i = 0; i < decodingTable.length; i++)
    {
        decodingTable[i] = (byte)0xff;
    }
    for (int i = 0; i < encodingTable.length; i++)
    {
        decodingTable[encodingTable[i]] = (byte)i;
    }
}
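The resulting decoding table looks like this (each entry is a character followed by its table value; non-printable characters are shown as blanks):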
-1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1
-1 ! -1 " -1 # -1 $ -1 % -1 & -1 ' -1
( -1 ) -1 * -1 + 62 , -1 - -1 . -1 / 63
0 52 1 53 2 54 3 55 4 56 5 57 6 58 7 59
8 60 9 61 : -1 ; -1 < -1 = -1 > -1 ? -1
@ -1 A 0 B 1 C 2 D 3 E 4 F 5 G 6
H 7 I 8 J 9 K 10 L 11 M 12 N 13 O 14
P 15 Q 16 R 17 S 18 T 19 U 20 V 21 W 22
X 23 Y 24 Z 25 [ -1 \ -1 ] -1 ^ -1 _ -1
` -1 a 26 b 27 c 28 d 29 e 30 f 31 g 32
h 33 i 34 j 35 k 36 l 37 m 38 n 39 o 40
p 41 q 42 r 43 s 44 t 45 u 46 v 47 w 48
x 49 y 50 z 51 { -1 | -1 } -1 ~ -1 -1
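A few lookups make the table concrete. The sketch below rebuilds the same 128-entry table in a standalone class; DecodingTableSketch and buildDecodingTable are hypothetical names, and the fill-then-overwrite approach mirrors the source above rather than copying it.

```java
import java.util.Arrays;

class DecodingTableSketch {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Index: ASCII code of a character. Value: its position in the
    // encoding table, or -1 if the character is not a Base64 character.
    static byte[] buildDecodingTable() {
        byte[] table = new byte[128];
        Arrays.fill(table, (byte) -1);
        for (int i = 0; i < ALPHABET.length(); i++) {
            table[ALPHABET.charAt(i)] = (byte) i;
        }
        return table;
    }

    public static void main(String[] args) {
        byte[] table = buildDecodingTable();
        System.out.println(table['A']); // 0
        System.out.println(table['/']); // 63
        System.out.println(table['=']); // -1 (padding is not in the table)
    }
}
```

Because the padding character maps to -1, a decoder built on this table has to check for "=" separately before looking values up.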
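Decoding takes 4 consecutive characters, looks up each one's value in the decoding table, drops the two high bits of each value to leave 24 bits, and regroups those 24 bits into 3 output bytes. The final 4 characters are checked for padding characters and handled accordingly. The source: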
public int decode(
    byte[] data,
    int off,
    int length,
    OutputStream out)
    throws IOException
{
    byte b1, b2, b3, b4;
    int outLen = 0;
    int end = off + length;
    while (end > off)
    {
        if (!ignore((char)data[end - 1]))
        {
            break;
        }
        end--;
    }
    int i = off;
    int finish = end - 4;
    i = nextI(data, i, finish);
    while (i < finish)
    {
        b1 = decodingTable[data[i++]];
        i = nextI(data, i, finish);
        b2 = decodingTable[data[i++]];
        i = nextI(data, i, finish);
        b3 = decodingTable[data[i++]];
        i = nextI(data, i, finish);
        b4 = decodingTable[data[i++]];
        if ((b1 | b2 | b3 | b4) < 0)
        {
            throw new IOException("invalid "
                + "characters encountered in base64 data");
        }
        out.write((b1 << 2) | (b2 >> 4));
        out.write((b2 << 4) | (b3 >> 2));
        out.write((b3 << 6) | b4);
        outLen += 3;
        i = nextI(data, i, finish);
    }
    outLen += decodeLastBlock(out, (char)data[end - 4],
        (char)data[end - 3], (char)data[end - 2],
        (char)data[end - 1]);
    return outLen;
}

private boolean ignore(char c)
{
    return (c == '\n' || c =='\r' || c == '\t' || c == ' ');
}

private int nextI(byte[] data, int i, int finish)
{
    while ((i < finish) && ignore((char)data[i]))
    {
        i++;
    }
    return i;
}

private int decodeLastBlock(OutputStream out, char c1,
    char c2, char c3, char c4) throws IOException
{
    byte b1, b2, b3, b4;
    if (c3 == padding)
    {
        b1 = decodingTable[c1];
        b2 = decodingTable[c2];
        if ((b1 | b2) < 0)
        {
            throw new IOException("invalid characters "
                + "encountered at end of base64 data");
        }
        out.write((b1 << 2) | (b2 >> 4));
        return 1;
    }
    else if (c4 == padding)
    {
        b1 = decodingTable[c1];
        b2 = decodingTable[c2];
        b3 = decodingTable[c3];
        if ((b1 | b2 | b3) < 0)
        {
            throw new IOException("invalid characters"
                + " encountered at end of base64 data");
        }
        out.write((b1 << 2) | (b2 >> 4));
        out.write((b2 << 4) | (b3 >> 2));
        return 2;
    }
    else
    {
        b1 = decodingTable[c1];
        b2 = decodingTable[c2];
        b3 = decodingTable[c3];
        b4 = decodingTable[c4];
        if ((b1 | b2 | b3 | b4) < 0)
        {
            throw new IOException("invalid characters"
                + " encountered at end of base64 data");
        }
        out.write((b1 << 2) | (b2 >> 4));
        out.write((b2 << 4) | (b3 >> 2));
        out.write((b3 << 6) | b4);
        return 3;
    }
}
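As the code shows, the decoder skips whitespace (space, tab, carriage return, line feed) at the start, at the end, and between characters of the input. Note that in the code shown, the final four characters are read directly from positions end - 4 through end - 1, so whitespace inside that last block is not skipped.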
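The 4-characters-to-3-bytes regrouping, including the two padding cases, can be sketched for a single block as follows. This is an illustrative simplification rather than the Bouncy Castle logic: DecodeBlockSketch and decodeBlock are hypothetical names, the sketch packs the four 6-bit values into one int instead of shifting pairwise, and it does no whitespace handling.

```java
import java.io.ByteArrayOutputStream;

class DecodeBlockSketch {
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Decodes one 4-character block into 1, 2 or 3 bytes.
    static byte[] decodeBlock(String block) {
        if (block.length() != 4) {
            throw new IllegalArgumentException("expected exactly 4 characters");
        }
        int[] v = new int[4];
        int padCount = 0;
        for (int i = 0; i < 4; i++) {
            char c = block.charAt(i);
            if (c == '=' && i >= 2) {   // padding may only occupy the last two slots
                padCount++;
                continue;               // v[i] stays 0
            }
            if (padCount > 0) {
                throw new IllegalArgumentException("data after padding");
            }
            v[i] = ALPHABET.indexOf(c);
            if (v[i] < 0) {
                throw new IllegalArgumentException("invalid character: " + c);
            }
        }
        // Four 6-bit values form one 24-bit group.
        int bits = (v[0] << 18) | (v[1] << 12) | (v[2] << 6) | v[3];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bits >>> 16);                 // write(int) keeps only the low 8 bits
        if (padCount < 2) out.write(bits >>> 8);
        if (padCount < 1) out.write(bits);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(new String(decodeBlock("QUJD"))); // ABC
        System.out.println(new String(decodeBlock("QUI="))); // AB
        System.out.println(new String(decodeBlock("QQ=="))); // A
    }
}
```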
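The Apache Commons Codec implementation
The Apache Commons Codec implementation is more involved. It factors out an abstract class, BaseNCodec, so that Base32 and Base64 share one framework; the Base64 implementation itself is org.apache.commons.codec.binary.Base64. Encoding is again driven by an encoding table, and because the class also supports URL-safe Base64, it defines two of them. This article does not analyze that code.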
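For orientation only, the difference between the standard and URL-safe tables is that the URL-safe one uses '-' and '_' for values 62 and 63 in place of '+' and '/'. The sketch below shows this with the JDK's own java.util.Base64 (Java 8 and later), which is an assumption made for self-containment and is not the Commons Codec code discussed here; UrlSafeSketch is a hypothetical class name.

```java
import java.util.Base64;

class UrlSafeSketch {
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // These two bytes produce the 6-bit values 62, 63 and 60.
        byte[] data = {(byte) 0xfb, (byte) 0xff};
        System.out.println(standard(data)); // +/8=
        System.out.println(urlSafe(data));  // -_8=
    }
}
```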
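Chunking Base64 output
The MIME variant of Base64 (RFC 2045) limits encoded lines to 76 characters, with lines separated by a carriage return and line feed (\r\n). In Apache Commons Codec's chunked output, every line ends with \r\n, including a last line shorter than 76 characters. The Bouncy Castle encoder shown above does not implement chunking, while Apache Commons Codec does. Note that the encodeBase64 call used earlier does not chunk; chunking is available through methods such as Base64.encodeBase64Chunked.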
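The 76-character line limit can be observed with the JDK's MIME encoder. This sketch uses java.util.Base64 (Java 8 and later) instead of either library covered in this article, and MimeChunkSketch is a hypothetical class name. Unlike the behavior described for Commons Codec, the JDK encoder puts \r\n between lines only and adds none after the last line.

```java
import java.util.Base64;

class MimeChunkSketch {
    // MIME-style Base64: lines of at most 76 characters separated by \r\n.
    static String mimeEncode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // 100 input bytes encode to 136 Base64 characters: one full line of 76
        // and a second line of 60.
        String encoded = mimeEncode(new byte[100]);
        String[] lines = encoded.split("\r\n");
        System.out.println(lines.length);      // 2
        System.out.println(lines[0].length()); // 76
        System.out.println(lines[1].length()); // 60
    }
}
```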