java.util.Base64 in Java 8 throws "java.lang.IllegalArgumentException: Illegal base64 character d"

Reference original: https://blog.csdn.net/java_4_ever/article/details/80978089

Thanks again to the original author. I had found a fix for the problem myself but did not understand the cause until I read the article above.

The problem was discovered after the release went to production:

java.lang.IllegalArgumentException: Illegal base64 character d
        at java.util.Base64$Decoder.decode0(Base64.java:714) ~[na:1.8.0_45]
        at java.util.Base64$Decoder.decode(Base64.java:526) ~[na:1.8.0_45]
        at java.util.Base64$Decoder.decode(Base64.java:549) ~[na:1.8.0_45]

The production code originally used sun.misc.BASE64Decoder/BASE64Encoder. Because these two classes are not part of the official API, Sonar scans, Maven compilation, and some code-convention plug-ins all emit warnings about them, and Java 8 provides the official java.util.Base64. Being fussy about clean code, I impulsively replaced them right away. Shortly afterwards the change passed testing, went online, and caused a bug.

Why did testing not catch this? Since only the base64 call was replaced, it felt safe and not worth much thought. I had also written a test case that encoded with java.util.Base64 and then decoded that output with java.util.Base64; it passed, so I considered the code fine. But that is exactly where the problem lay: production does not behave like my test case. In production we decode base64 data received from a partner organization, and the partner does not encode with the Base64 of Java 8, so an exception was thrown.

My solution was to change the original call:

Base64.getDecoder().decode() was changed to Base64.getMimeDecoder().decode()

Overview
Base64 is an encoding that represents binary data using 64 characters: A-Z, a-z, 0-9, "+" and "/" (plus the pad character "="). A byte holds 8 bits, while one base64 character carries only 6 bits of information, so every 3 bytes of the original data become 4 characters in the encoded result. The main purpose of Base64 is to meet the transmission requirements of MIME.
Since Java 8, Base64 is part of the standard Java class library, which ships a built-in Base64 encoder and decoder.
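
The 3-bytes-to-4-characters arithmetic can be checked with a small sketch. The class name Base64LengthDemo and the helper encodedLength are made up for illustration; only java.util.Base64 comes from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class, not part of the original article.
public class Base64LengthDemo {
    // Padded base64 length: every started group of 3 bytes becomes 4 characters.
    static int encodedLength(int byteCount) {
        return 4 * ((byteCount + 2) / 3);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "ab", "abc", "abcd"}) {
            byte[] raw = s.getBytes(StandardCharsets.UTF_8);
            String encoded = Base64.getEncoder().encodeToString(raw);
            // Print the actual length next to the predicted one.
            System.out.println(raw.length + " byte(s) -> " + encoded
                    + " (" + encoded.length() + " chars, predicted "
                    + encodedLength(raw.length) + ")");
        }
    }
}
```

The pad character "=" fills the last group when the input length is not a multiple of 3, which is why the predicted length is always a multiple of 4.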

Problem
I accidentally found that parsing with the built-in Base64 decoder of jdk8 throws java.lang.IllegalArgumentException: Illegal base64 character a.
This is strange, because the text was encoded with the encoder in jdk7, so in theory such an incompatibility should not occur.

Test program
Let's write a program to test where the problem is.

The test program uses a relatively long original text, because the problem only occurs with longer input. If the original text is short (no more than 57 bytes, which encode to at most 76 characters, that is, one line), the problem does not occur.
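
The 57-byte threshold can be illustrated without sun.misc. The sketch below uses Base64.getMimeEncoder() from Java 8 as a stand-in, on the assumption that it wraps at the same 76-character line length; sun.misc.BASE64Encoder may differ in details such as the line separator it writes. The class and method names are hypothetical.

```java
import java.util.Base64;

// Hypothetical demo class; the MIME encoder stands in for the jdk7 encoder.
public class LineWrapThresholdDemo {
    // Encodes n zero bytes with the MIME encoder and reports whether
    // the result contains a line break.
    static boolean wraps(int n) {
        String encoded = Base64.getMimeEncoder().encodeToString(new byte[n]);
        return encoded.contains("\n");
    }

    public static void main(String[] args) {
        // 57 bytes -> 76 characters, exactly one full line.
        System.out.println("57 bytes wraps: " + wraps(57));
        // 58 bytes -> 80 characters, which no longer fit on one line.
        System.out.println("58 bytes wraps: " + wraps(58));
    }
}
```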

1 Use jdk7 to encode:

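import java.nio.charset.StandardCharsets;
import sun.misc.BASE64Encoder;
public class TestBase64JDK7 {
    // The commas are full-width (U+FF0C): the encoded output of this test matches this text in UTF-8.
    private static final String TEST_STRING = "0123456789，0123456789，0123456789，0123456789，0123456789，0123456789，0123456789";
    public static void main(String[] args) {
        BASE64Encoder base64Encoder = new BASE64Encoder();
        // Fix the charset so the result does not depend on the platform default.
        String base64Result = base64Encoder.encode(TEST_STRING.getBytes(StandardCharsets.UTF_8));
        System.out.println(base64Result);
    }
}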

2 jdk7 encoding result:

MDEyMzQ1Njc4Oe+8jDAxMjM0NTY3ODnvvIwwMTIzNDU2Nzg577yMMDEyMzQ1Njc4Oe+8jDAxMjM0
NTY3ODnvvIwwMTIzNDU2Nzg577yMMDEyMzQ1Njc4OQ==

3 Use jdk8 to decode the encoding result above:

import java.util.Base64;
public class TestBase64JDK8 {
    public static void main(String[] args) {
        String base64Result = "MDEyMzQ1Njc4Oe+8jDAxMjM0NTY3ODnvvIwwMTIzNDU2Nzg577yMMDEyMzQ1Njc4Oe+8jDAxMjM0\n" +
                "NTY3ODnvvIwwMTIzNDU2Nzg577yMMDEyMzQ1Njc4OQ==";
        Base64.getDecoder().decode(base64Result);
    }
}

4 The result is as described at the beginning; an exception is thrown:

Exception in thread "main" java.lang.IllegalArgumentException: Illegal base64 character a
    at java.util.Base64$Decoder.decode0(Base64.java:714)
    at java.util.Base64$Decoder.decode(Base64.java:526)
    at java.util.Base64$Decoder.decode(Base64.java:549)
    at com.francis.TestBase64JDK8.main(TestBase64JDK8.java:14)

Do jdk7 and jdk8 really handle base64 differently?

5 Next, look at how jdk8 encodes the original text:

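import java.nio.charset.StandardCharsets;
import java.util.Base64;
public class TestBase64JDK8 {
    // The commas are full-width (U+FF0C): the encoded output of this test matches this text in UTF-8.
    private static final String TEST_STRING = "0123456789，0123456789，0123456789，0123456789，0123456789，0123456789，0123456789";
    public static void main(String[] args) {
        // Fix the charset so the result does not depend on the platform default.
        String base64Result = Base64.getEncoder().encodeToString(TEST_STRING.getBytes(StandardCharsets.UTF_8));
        System.out.println(base64Result);
    }
}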

6 jdk8 encoding result:

MDEyMzQ1Njc4Oe+8jDAxMjM0NTY3ODnvvIwwMTIzNDU2Nzg577yMMDEyMzQ1Njc4Oe+8jDAxMjM0NTY3ODnvvIwwMTIzNDU2Nzg577yMMDEyMzQ1Njc4OQ==

Comparing the results above leads to the following conclusions:

1. The encoding result of jdk7 contains line breaks;
2. The encoding result of jdk8 does not contain line breaks;
3. jdk8 cannot decode encoding results that include line breaks;
4. jdk7 decodes the encoding result of jdk8 without any problem, so this is not demonstrated further.

The cause of the problem is now basically clear: the encoding result of jdk7 contains line breaks, which makes the jdk8 decoder throw an exception.
But why is there such a difference? Do the two follow different base64 standards?
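
The trailing character in the exception message is the offending byte printed in hexadecimal, so "a" is a line feed (0x0a) and "d" is a carriage return (0x0d). A minimal sketch with a hypothetical class name and short sample strings:

```java
import java.util.Base64;

// Hypothetical demo class, not part of the original article.
public class IllegalCharacterDemo {
    // Returns the exception message produced by the basic decoder,
    // or null if decoding succeeds.
    static String decodeError(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeError("MDEy\nMzQ1"));   // line feed inside the data
        System.out.println(decodeError("MDEy\r\nMzQ1")); // carriage return + line feed
        System.out.println(decodeError("MDEyMzQ1"));     // clean data, prints null
    }
}
```

This matches the two stack traces in this post: data wrapped with "\n" fails on character a, while data wrapped with "\r\n" fails on character d first.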

Troubleshooting
Continuing the investigation, start with the class comments to see whether something was misunderstood.

1 First, the class comment of Base64 in jdk8. Only the key parts are shown:

/**
 * This class consists exclusively of static methods for obtaining
 * encoders and decoders for the Base64 encoding scheme. The
 * implementation of this class supports the following types of Base64
 * as specified in
 * <a href="http://www.ietf.org/rfc/rfc4648.txt">RFC 4648</a> and
 * <a href="http://www.ietf.org/rfc/rfc2045.txt">RFC 2045</a>.
 *
 * <ul>
 * <li><a name="basic"><b>Basic</b></a>
 * <p> Uses "The Base64 Alphabet" as specified in Table 1 of
 *     RFC 4648 and RFC 2045 for encoding and decoding operation.
 *     The encoder does not add any line feed (line separator)
 *     character. The decoder rejects data that contains characters
 *     outside the base64 alphabet.</p></li>
 ...
 * @author Xueming Shen
 * @since 1.8
 */

In short:

This class provides encoders and decoders for the Base64 encoding scheme, implemented in accordance with RFC 4648 and RFC 2045.
Encoding and decoding are based on "The Base64 Alphabet" specified in Table 1 of the two RFCs. The encoder does not add any line breaks, and the decoder rejects data that contains characters outside the base64 alphabet.

This explains why the encoding result of jdk8 does not contain line breaks.

You can also basically guess why jdk8 cannot decode the encoding result of jdk7: the newline character is presumably not in the base64 alphabet.

2 Let’s take a look at the base64 alphabet in the two standards (the table in the two standards is the same):

                         Table 1: The Base 64 Alphabet
        Value Encoding  Value Encoding  Value Encoding  Value Encoding
            0 A            17 R            34 i            51 z
            1 B            18 S            35 j            52 0
            2 C            19 T            36 k            53 1
            3 D            20 U            37 l            54 2
            4 E            21 V            38 m            55 3
            5 F            22 W            39 n            56 4
            6 G            23 X            40 o            57 5
            7 H            24 Y            41 p            58 6
            8 I            25 Z            42 q            59 7
            9 J            26 a            43 r            60 8
           10 K            27 b            44 s            61 9
           11 L            28 c            45 t            62 +
           12 M            29 d            46 u            63 /
           13 N            30 e            47 v
           14 O            31 f            48 w         (pad) =
           15 P            32 g            49 x
           16 Q            33 h            50 y

The table does not contain a newline character, which explains why jdk8 cannot decode an encoding result that contains line breaks.

3 Now the class comment of sun.misc.BASE64Encoder in jdk7:
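
To see which character of a given input falls outside Table 1, a small membership check is enough. This is only an illustrative sketch with made-up names; it does not validate the position of the pad character and is not how java.util.Base64 is implemented.

```java
// Hypothetical helper class, not part of the original article.
public class Base64AlphabetCheck {
    // The 64 characters of Table 1, in value order.
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Returns the index of the first character that is neither in Table 1
    // nor the pad character '=', or -1 if there is none.
    static int firstIllegalIndex(String encoded) {
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c != '=' && ALPHABET.indexOf(c) < 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String withLineBreak = "MDEy\nMzQ1";
        int index = firstIllegalIndex(withLineBreak);
        System.out.println("first illegal index: " + index
                + ", code: 0x" + Integer.toHexString(withLineBreak.charAt(index)));
    }
}
```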

   This class implements a BASE64 Character encoder as specified in RFC1521. 
   This RFC is part of the MIME specification as published by the Internet Engineering Task Force (IETF). 
   Unlike some other encoding schemes there is nothing in this encoding that indicates where a buffer starts or ends. 
   This means that the encoded text will simply start with the first line of encoded text and end with the last line of encoded text.

This implementation is based on RFC1521, and the class comment states no constraints on encoding or decoding.

4 Then continue to look at the key parts of rfc1521 (link: https://tools.ietf.org/html/rfc1521).

In the section 5.2. Base64 Content-Transfer-Encoding, there are the following contents:

   The output stream (encoded bytes) must be represented in lines of no
      more than 76 characters each.  All line breaks or other characters
      not found in Table 1 must be ignored by decoding software.  In base64
      data, characters other than those in Table 1, line breaks, and other
      white space probably indicate a transmission error, about which a
      warning message or even a message rejection might be appropriate
      under some circumstances.

This explicitly stipulates:

1. Each line of the encoding result must not exceed 76 characters;
2. Decoders must ignore line breaks and any other characters not in Table 1 (the base64 alphabet shown earlier).

This is why the encoding result of jdk7 contains line breaks.
Together, the class comments and the RFC text explain the conclusions drawn from the test code above and why this problem occurs.
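
Java 8 also ships an encoder/decoder pair that follows these MIME rules, Base64.getMimeEncoder() and Base64.getMimeDecoder(). The sketch below (hypothetical class and method names, arbitrary 200-byte payload) checks that no encoded line is longer than 76 characters and that the MIME decoder accepts the wrapped output.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of the original article.
public class MimeRoundTripDemo {
    static String encodeMime(byte[] raw) {
        return Base64.getMimeEncoder().encodeToString(raw);
    }

    // Length of the longest line; the default MIME encoder separates lines with CRLF.
    static int longestLine(String mimeEncoded) {
        int max = 0;
        for (String line : mimeEncoded.split("\r\n")) {
            max = Math.max(max, line.length());
        }
        return max;
    }

    // True if the bytes survive MIME encoding followed by MIME decoding.
    static boolean roundTrips(byte[] raw) {
        return Arrays.equals(raw, Base64.getMimeDecoder().decode(encodeMime(raw)));
    }

    public static void main(String[] args) {
        byte[] raw = new byte[200]; // arbitrary sample payload
        String encoded = encodeMime(raw);
        System.out.println(encoded);
        System.out.println("longest line: " + longestLine(encoded));
        System.out.println("round trip ok: " + roundTrips(raw));
    }
}
```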

Packages beginning with 'sun' are not part of the Java specification but Sun's own implementation, so the base64 encoder in jdk7 is not covered by the Java specification.

Solution
So, how can this problem be solved?
1. Use the org.apache.commons.codec.binary.Base64 class from Apache Commons for both encoding and decoding;
2. Remove line breaks after encoding or before decoding;
3. Use the same jdk version for encoding and decoding.
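
With only the JDK at hand, option 2 and the getMimeDecoder() change described at the top of this post might look like the following sketch. The class name, method names, and the sample string are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class, not part of the original article.
public class LenientDecodeDemo {
    // Option A: remove all whitespace, then use the strict basic decoder.
    static byte[] decodeAfterStripping(String encoded) {
        return Base64.getDecoder().decode(encoded.replaceAll("\\s", ""));
    }

    // Option B: let the MIME decoder skip the line breaks.
    static byte[] decodeMime(String encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // Sample data wrapped with CRLF, as a MIME-style encoder might produce it.
        String fromPartner = "MDEyMzQ1\r\nNjc4OQ==";
        System.out.println(new String(decodeAfterStripping(fromPartner), StandardCharsets.UTF_8));
        System.out.println(new String(decodeMime(fromPartner), StandardCharsets.UTF_8));
    }
}
```

One trade-off to keep in mind: the MIME decoder ignores every character outside the base64 alphabet, not just line breaks, so corrupted input is less likely to be noticed. Stripping only whitespace keeps the strict validation of the basic decoder for everything else.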

Other Base64 libraries
Here is how other libraries handle base64.

1. Apache Commons

The org.apache.commons.codec.binary.Base64 class in Apache Commons is implemented based on rfc2045. According to its class comment, this implementation ignores all characters outside the base64 alphabet when decoding, so it can handle base64 encoding results that contain newline characters.
Its encoding methods also take a parameter that specifies whether line breaks are added when the encoding result exceeds 76 characters. By default, line breaks are not added.

2. Spring Core

Spring Core provides the Base64Utils class, which is only a utility class and does not implement any protocol itself:

java.util.Base64 from Java 8 is preferred for encoding and decoding;
if java.util.Base64 does not exist, org.apache.commons.codec.binary.Base64 is used;
if neither is present, an error is raised.

Protocol overview
From the troubleshooting steps above, the base64 parts of rfc1521, rfc2045 and rfc4648 appear to differ. Next, a brief look at how these three RFCs regulate line breaks in base64 encoding.

rfc1521 (link: https://tools.ietf.org/html/rfc1521)
This RFC is about MIME, and Base64 is one of the encodings MIME supports. The key section, 5.2. Base64 Content-Transfer-Encoding, was briefly explained above; it mainly stipulates the line length of the encoding result and the range of characters handled when decoding.
This RFC has been obsoleted.
The base64 implementation in jdk7 (sun.misc.BASE64Encoder) is based on it, so its encoding result contains line breaks.

MIME: Multipurpose Internet Mail Extensions. It is an Internet standard that was first used in e-mail systems and later adopted by browsers. A server tells the browser what kind of multimedia data it is sending by indicating the MIME type of that data.

rfc2045 (link: https://tools.ietf.org/html/rfc2045)

This RFC is also about MIME and is the updated version of rfc1521. The key section is 6.8. Base64 Content-Transfer-Encoding, whose rules on the line length of the encoding result and the range of decoded characters do not differ from rfc1521.

rfc4648

This RFC covers the base16, base32 and base64 encodings. The line length of the encoding result is described in section 3.1. Line Feeds in Encoded Data:

   MIME is often used as a reference for base 64 encoding.  However,
      MIME does not define "base 64" per se, but rather a "base 64 Content-
      Transfer-Encoding" for use within MIME.  As such, MIME enforces a
      limit on line length of base 64-encoded data to 76 characters.  MIME
      inherits the encoding from Privacy Enhanced Mail (PEM) [3], stating
      that it is "virtually identical"; however, PEM uses a line length of
      64 characters.  The MIME and PEM limits are both due to limits within
      SMTP.

   Implementations MUST NOT add line feeds to base-encoded data unless
      the specification referring to this document explicitly directs base
      encoders to add line feeds after a specific number of characters.

In short:

   MIME is often used as the reference for base64. However, MIME does not define 'base64' itself but rather a 'base64 Content-Transfer-Encoding' for use within MIME. As such, MIME limits the line length of base64-encoded data to 76 characters.
   ...
   The MIME and PEM line length limits both stem from limits within SMTP.
   Implementations of this RFC must not add line feeds to the encoding result unless the specification referring to this document explicitly directs encoders to add a line feed after a specific number of characters.

The Base64 class of jdk8 is implemented based on rfc2045 and rfc4648. From the RFC content quoted above, the encoding result of its Basic encoder does not contain line breaks, and the class comment states explicitly that no line break is added.
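
The difference between an unbroken rfc4648-style result and a wrapped one can be sketched with the two encoder factories in java.util.Base64. The 64-character line length below mirrors the PEM limit quoted above; the class name, method names, and the 100-byte payload are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class, not part of the original article.
public class LineLengthDemo {
    // Basic encoder: one unbroken line, no line feeds added.
    static String encodeBasic(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // MIME-style encoder with a caller-chosen line length and "\n" as separator.
    static String encodeWrapped(byte[] raw, int lineLength) {
        byte[] separator = "\n".getBytes(StandardCharsets.US_ASCII);
        return Base64.getMimeEncoder(lineLength, separator).encodeToString(raw);
    }

    public static void main(String[] args) {
        byte[] raw = new byte[100]; // arbitrary sample payload
        System.out.println("basic contains line feed: " + encodeBasic(raw).contains("\n"));
        for (String line : encodeWrapped(raw, 64).split("\n")) {
            System.out.println(line.length() + ": " + line);
        }
    }
}
```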
--------------------- 
Author: java_4_ever 
Source: CSDN 
Original: https://blog.csdn.net/java_4_ever/article/details/80978089
Disclaimer: This is an original article by the blogger; please attach a link to the blog post if you reprint it!

Origin blog.csdn.net/kevin_mails/article/details/87878601