Splitting a string in Java: lookbehind with a specified length

huangjs :

I want to split a string after the letter "K" or "R", except when either is followed by the letter "P". In addition, I do not want to split at a location if doing so would produce a substring shorter than 4 characters. For example:

- Input:
AYLAKPHKKDIV

- Expected Output
AYLAKPHK
KDIV

So far, I have managed to split the string after the letter "K" or "R", except when either is followed by the letter "P". My regular expression is (?<=[K|R])(?!P).

My result:
AYLAKPHK
K
DIV

However, I don't know how to skip the split locations that would produce a substring shorter than 4 characters.

Pshemo :

I hope not to split if the substring length less than 4

In other words, you want to have

  1. at least 4 characters between the previous match (split) and the current one, so ABCKABKKABCD would split into ABCK|ABKK|ABCD but not into ABCK|ABK|...

  2. at least 4 characters after the current split, since splitting ABCKAB into ABCK|AB would leave AB at the end, which is shorter than 4 characters.

To achieve the first condition you can use \G, which represents the end of the previous match (or the start of the string if there were no matches yet). So the first condition can look like (?<=\G.{4,}).

WARNING: look-behind usually expects an obvious maximum length for the subregex it handles, but for some reason .{4,} works here, which may be a bug or a feature added in Java 10, which I am using now. If your version complains about it, you can use some very big number, larger than the maximum number of characters you expect between two splits, like .{4,10000000}.

The second condition is simpler, since it is just (?=.{4}).
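To see what the look-ahead alone contributes, here is a minimal sketch that splits ABCKAB with and without (?=.{4}). The class and method names are made up for illustration, and the \G condition is left out on purpose so that only the "at least 4 characters remaining" check is visible.

```java
import java.util.Arrays;

public class LookaheadTailDemo {
    // Split after K or R with no length checks at all.
    static String[] splitPlain(String text) {
        return text.split("(?<=[KR])");
    }

    // Same split point, but only where at least 4 characters remain.
    static String[] splitWithTailCheck(String text) {
        return text.split("(?<=[KR])(?=.{4})");
    }

    public static void main(String[] args) {
        // Only "AB" follows the K, so the tail check rejects the split.
        System.out.println(Arrays.toString(splitPlain("ABCKAB")));         // [ABCK, AB]
        System.out.println(Arrays.toString(splitWithTailCheck("ABCKAB"))); // [ABCKAB]
    }
}
```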

BTW, you don't want | in [K|R]: inside a character class it is a literal, not the OR operator, since every character in a character class is already an alternative choice. So [K|R] matches K OR | OR R. Use [KR] instead.
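A quick way to check the character class behaviour yourself is the small sketch below. The class name and helper are hypothetical, and it uses only String.matches.

```java
public class CharClassDemo {
    // Returns true when the whole input matches the given regex.
    static boolean matchesClass(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // Inside [...] the pipe is an ordinary character.
        System.out.println(matchesClass("[K|R]", "|")); // true
        System.out.println(matchesClass("[KR]", "|"));  // false
        System.out.println(matchesClass("[KR]", "R"));  // true
    }
}
```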

DEMO:

String text = "AYLAKPHKKKKKKDIVK123KAB";
String regex = "(?<=[KR])(?!P)(?<=\\G.{4,})(?=.{4})";
for (String s : text.split(regex)){
    System.out.println("'"+s+"'");
}

Output:

'AYLAKPHK'
'KKKK'
'KDIVK'
'123KAB'
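If your Java version rejects .{4,} inside the look-behind, the bounded fallback mentioned in the warning above could be sketched as follows. MAX_GAP is an assumed upper bound that is not part of the original regex, and it has to be larger than the longest distance you expect between two splits. The class and method names are hypothetical. The input is the string from the question.

```java
import java.util.Arrays;

public class BoundedLookbehindSplit {
    // Assumed upper bound on the distance between two splits (illustrative value).
    static final int MAX_GAP = 10_000;

    // Same regex as in the demo, with a bounded quantifier in the look-behind.
    static final String REGEX =
            "(?<=[KR])(?!P)(?<=\\G.{4," + MAX_GAP + "})(?=.{4})";

    static String[] split(String text) {
        return text.split(REGEX);
    }

    public static void main(String[] args) {
        // Input taken from the question.
        System.out.println(Arrays.toString(split("AYLAKPHKKDIV"))); // [AYLAKPHK, KDIV]
    }
}
```

A smaller bound keeps the look-behind cheap, but a bound shorter than the real gap between splits would silently suppress valid split points, so it is safer to err on the large side.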
