Java, regex to split a string on a delimiter with constraints

Daniel :

I have a malformed Base64 string in Java.

It is not completely malformed, but the string sometimes contains several Base64-encoded values concatenated together.

I want to split the string, and I think regex is the best way to achieve this.

There are two cases:

  • if there is only one Base64 value in the string, it either
    • does not contain any padding char =
    • contains padding char(s) (one or two) only at the end
  • if there are several Base64 values in the string, it will
    • contain padding char(s) (one or two) not at the end (or not only at the end)

Now I want to get a String[] which holds the individual Base64 strings.

So the regex must not split if there is no padding char, or if the padding is only at the end. But it has to split if there is padding in the middle (and there can be one or two padding chars).

Test snippet:

import java.util.Base64;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

/*
TEST CASES:
output array shall contain one item only
TG9y
TG9yZW0=
TG9yZQ==

output array shall contain two items
TG9yZW0=TG9y
TG9yZW0=TG9yZW0=
TG9yZW0=TG9yZQ==

TG9yZQ==TG9y
TG9yZQ==TG9yZW0=
TG9yZQ==TG9yZQ==

output array shall contain three items
TG9yZW0=TG9yZW0=TG9y
TG9yZQ==TG9yZW0=TG9y
...
*/

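public class MyClass {
  public static void main(String[] args) {

    // Put one of the test cases above here.
    String buffer = "";

    try {
      byte[] decodedString = Base64.getDecoder().decode(buffer.getBytes("UTF-8"));
      System.out.println(new String(decodedString, "UTF-8"));
    } catch (IllegalArgumentException e) {
      // Thrown when the buffer is not valid Base64, e.g. padding in the middle.
      e.printStackTrace();
      System.err.println("Buffer: " + buffer);
    } catch (UnsupportedEncodingException e) {
      // Cannot happen: every Java platform supports UTF-8.
      throw new AssertionError(e);
    }
  }
}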

I'm not sure if regex is fully capable of this, or if it is the best method to do this.

Andreas :

As mentioned in a comment, you can split the string at every position that follows an = sign and is not itself followed by another = sign, by combining a (?<=X) zero-width positive lookbehind with a (?!X) zero-width negative lookahead:

String[] arr = input.split("(?<==)(?!=)");
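
To see which positions the two lookarounds select, the following sketch lists every index at which the zero-width pattern matches. The class name SplitPositions and the method boundaries are made up for this illustration; only the pattern itself comes from the line above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitPositions {
    // Same pattern as above: a position preceded by '=' and not followed by '='.
    private static final Pattern BOUNDARY = Pattern.compile("(?<==)(?!=)");

    // Returns every index at which the zero-width pattern matches (hypothetical helper).
    public static List<Integer> boundaries(String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = BOUNDARY.matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Index 7 sits between the two '=' and is rejected by the lookahead;
        // index 8 (after "==") and index 16 (end of input) both match.
        System.out.println(boundaries("TG9yZQ==TG9yZW0=")); // prints [8, 16]
    }
}
```

The match at the very end of the input is harmless for split(): it only produces a trailing empty string, which String.split(String) discards.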

Test

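import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.Base64.Decoder;

public class SplitTest {
    public static void main(String[] args) {
        String[] inputs = {
                "TG9y",
                "TG9yZW0=",
                "TG9yZQ==",
                "TG9yZW0=TG9y",
                "TG9yZW0=TG9yZW0=",
                "TG9yZW0=TG9yZQ==",
                "TG9yZQ==TG9y",
                "TG9yZQ==TG9yZW0=",
                "TG9yZQ==TG9yZQ==",
                "TG9yZW0=TG9yZW0=TG9y",
                "TG9yZQ==TG9yZW0=TG9y",
                "TG9yTG9yZQ==TG9yZW0=" };
        Decoder decoder = Base64.getDecoder();
        for (String input : inputs) {
            // Split after each run of padding, then decode every piece on its own.
            String[] arr = input.split("(?<==)(?!=)");
            for (int i = 0; i < arr.length; i++)
                arr[i] = new String(decoder.decode(arr[i]), StandardCharsets.US_ASCII);
            System.out.println(Arrays.toString(arr));
        }
    }
}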

Output

[Lor]
[Lorem]
[Lore]
[Lorem, Lor]
[Lorem, Lorem]
[Lorem, Lore]
[Lore, Lor]
[Lore, Lorem]
[Lore, Lore]
[Lorem, Lorem, Lor]
[Lore, Lorem, Lor]
[LorLore, Lorem]
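
Since the question also asks whether regex is the best method, here is a rough non-regex sketch of the same rule for comparison: cut after every run of = characters. It is not part of the original answer, and the names PaddingSplitter and splitOnPadding are invented for the example. Unlike split(), it returns an empty list for an empty input.

```java
import java.util.ArrayList;
import java.util.List;

public class PaddingSplitter {
    // Cuts the input after each run of '=' characters (hypothetical helper).
    public static List<String> splitOnPadding(String input) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            boolean isPad = input.charAt(i) == '=';
            boolean nextIsPad = i + 1 < input.length() && input.charAt(i + 1) == '=';
            if (isPad && !nextIsPad) {
                // Last '=' of a padding run: close the current chunk here.
                parts.add(input.substring(start, i + 1));
                start = i + 1;
            }
        }
        if (start < input.length()) {
            // Trailing chunk that carries no padding.
            parts.add(input.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnPadding("TG9yZQ==TG9yZW0=TG9y"));
        // prints [TG9yZQ==, TG9yZW0=, TG9y]
    }
}
```

Both variants share the limitation visible in the last test case: a chunk without padding that is directly followed by another chunk cannot be separated, because there is no = to mark the boundary.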
