I want to split a string after the letter "K" or "R" except when either is followed by the letter "P". I also don't want to split at a position if doing so would produce a substring shorter than 4 characters. For example:
- Input:
AYLAKPHKKDIV
- Expected output:
AYLAKPHK
KDIV
I have managed to split the string after "K" or "R" except when either is followed by "P". My regular expression is (?<=[K|R])(?!P).
My result:
AYLAKPHK
K
DIV
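A minimal sketch that reproduces this result, assuming the regex is applied with String.split. It writes the class as [KR] instead of [K|R]; on this input both behave the same because the text contains no | character. The class name CurrentSplit is hypothetical.

```java
import java.util.Arrays;

class CurrentSplit {
    // Split after K or R unless the next character is P; no length check yet.
    static String[] split(String text) {
        return text.split("(?<=[KR])(?!P)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("AYLAKPHKKDIV"))); // [AYLAKPHK, K, DIV]
    }
}
```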
However, I don't know how to skip split positions that would leave a substring shorter than 4 characters.
You don't want to split where a resulting substring would be shorter than 4 characters. In other words, you want to have
- at least 4 characters between the previous match (split) and the current match, so ABCKABKKABCD would split into ABCK|ABKK|ABCD but not into ABCK|ABK|...
- at least 4 characters after the current split, since splitting ABCKAB into ABCK|AB would leave AB at the end, whose length is less than 4.
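The two conditions can also be written as a plain loop, which may make the rule easier to check by hand. This is a sketch, not part of the original answer: MinLengthSplitter and splitWithMinLength are hypothetical names, and the cut rule is assumed to be "after K or R, not before P".

```java
import java.util.ArrayList;
import java.util.List;

class MinLengthSplitter {
    // Hypothetical helper: same rule as the regex, expressed as an explicit scan.
    static List<String> splitWithMinLength(String text, int minLen) {
        List<String> parts = new ArrayList<>();
        int prev = 0; // position of the previous split (start of the current piece)
        for (int p = 1; p < text.length(); p++) {
            char before = text.charAt(p - 1);
            boolean cuttable = (before == 'K' || before == 'R') && text.charAt(p) != 'P';
            boolean enoughBefore = p - prev >= minLen;        // condition 1
            boolean enoughAfter = text.length() - p >= minLen; // condition 2
            if (cuttable && enoughBefore && enoughAfter) {
                parts.add(text.substring(prev, p));
                prev = p;
            }
        }
        parts.add(text.substring(prev)); // the piece after the last split
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitWithMinLength("ABCKABKKABCD", 4)); // [ABCK, ABKK, ABCD]
        System.out.println(splitWithMinLength("ABCKAB", 4));       // [ABCKAB]
    }
}
```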
To achieve the first condition you can use \G, which represents the position where the previous match ended (or the start of the string if there were no matches yet). So the first condition can look like (?<=\G.{4,})
(WARNING: a look-behind usually requires the subexpression it contains to have an obvious maximum length, but for some reason .{4,} works here; this may be a bug or a feature of Java 10, which I am using now. If your version complains about it, use a very large number that is bigger than the maximum number of characters you expect between two splits, like .{4,10000000})
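A sketch of the first condition on its own. It uses the bounded .{4,10000000} form from the warning, so it does not rely on .{4,} being accepted inside a look-behind; that bound is an assumption about the maximum piece length. The class name is hypothetical. Without the second condition, ABCKAB still splits into ABCK|AB.

```java
import java.util.Arrays;

class FirstConditionOnly {
    // \G marks where the previous match ended; the piece before a split must be
    // at least 4 characters long. Assumes no piece exceeds 10,000,000 characters.
    static final String REGEX = "(?<=[KR])(?!P)(?<=\\G.{4,10000000})";

    static String[] split(String text) {
        return text.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("ABCKABKKABCD"))); // [ABCK, ABKK, ABCD]
        // Nothing checks what follows the split, so a short tail remains.
        System.out.println(Arrays.toString(split("ABCKAB")));       // [ABCK, AB]
    }
}
```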
The second condition is simpler, since it is just the look-ahead (?=.{4}), which requires at least 4 more characters after the split position.
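To see what (?=.{4}) does in isolation, this sketch lists every position in ABCKAB where the look-ahead succeeds. Position 4 (right after the K) is not among them, so a split there is rejected. FourAheadDemo and positionsWithFourAhead are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FourAheadDemo {
    // Positions that still have at least 4 characters after them.
    static List<Integer> positionsWithFourAhead(String text) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("(?=.{4})").matcher(text);
        while (m.find()) { // after a zero-width match, find() resumes one position later
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(positionsWithFourAhead("ABCKAB")); // [0, 1, 2]
    }
}
```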
BTW, you don't want | in [K|R]: inside a character class it is a literal, not the OR operator, since every character in a character set is already an alternative choice. So [K|R] represents K OR | OR R. Use [KR] instead.
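A small check of the character-class point, using Pattern.matches and String.split from the standard library; the class and helper names are illustrative only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CharClassDemo {
    // Hypothetical helpers: split after every character from the given class.
    static String[] splitAfterPipeClass(String text) {
        return text.split("(?<=[K|R])"); // | is a literal inside [...]
    }

    static String[] splitAfterKR(String text) {
        return text.split("(?<=[KR])");
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("[K|R]", "|")); // true
        System.out.println(Pattern.matches("[KR]", "|"));  // false
        System.out.println(Arrays.toString(splitAfterPipeClass("AB|CD"))); // [AB|, CD]
        System.out.println(Arrays.toString(splitAfterKR("AB|CD")));        // [AB|CD]
    }
}
```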
DEMO:
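public class SplitDemo {
    public static void main(String[] args) {
        String text = "AYLAKPHKKKKKKDIVK123KAB";
        // after K or R, not before P, at least 4 chars since the last split and 4 chars ahead
        String regex = "(?<=[KR])(?!P)(?<=\\G.{4,})(?=.{4})";
        for (String s : text.split(regex)) {
            System.out.println("'" + s + "'");
        }
    }
}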
Output:
'AYLAKPHK'
'KKKK'
'KDIVK'
'123KAB'
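Applying the same pattern to the input from the question should give the expected AYLAKPHK and KDIV. This sketch swaps in the bounded look-behind from the warning rather than .{4,}, assuming no piece is longer than 10,000,000 characters; FinalPatternCheck and cleave are hypothetical names.

```java
import java.util.Arrays;

class FinalPatternCheck {
    // Demo pattern with the bounded look-behind (assumes pieces of at most 10,000,000 chars).
    static final String REGEX = "(?<=[KR])(?!P)(?<=\\G.{4,10000000})(?=.{4})";

    static String[] cleave(String text) {
        return text.split(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(cleave("AYLAKPHKKDIV")));            // [AYLAKPHK, KDIV]
        System.out.println(Arrays.toString(cleave("AYLAKPHKKKKKKDIVK123KAB"))); // [AYLAKPHK, KKKK, KDIVK, 123KAB]
    }
}
```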