I have the following regex: (([a-zA-Z0-9?]{4,8})(-[a-zA-Z0-9?]{4,8})+-([a-zA-Z0-9?]{4,8}))
How can I avoid matching sequences that do not contain at least one digit AND one letter (a-zA-Z)?
For example:
This text: Hello World 123 abc 1AB2C-D3FGH-456I7-JK8LM-NOP9Q Hello World 123 abc
should return 1AB2C-D3FGH-456I7-JK8LM-NOP9Q
and this: Hello World 123 abc 11111-1111-1111 Hello World 123 abc
or
Hello World 123 abc aaaa-aaaa-aaaa-aaa Hello World 123 abc
should return nothing.
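I'm developing in Java and collect the matches like this:
public List<String> getKeys() {
    // KEY_REGEX (a compiled Pattern) and text are fields of the enclosing class
    List<String> keys = new ArrayList<>();
    Matcher matcher = KEY_REGEX.matcher(text);
    while (matcher.find()) {
        keys.add(matcher.group()); // group() returns the whole match (group 0)
    }
    return keys;
}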
One way is to use positive lookaheads (?=...) to check for at least one occurrence of a char A-Z and at least one digit 0-9.
So that the lookaheads can find both across the - separators, add the - to their character class.
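A small sketch of why the - belongs in the lookahead's character class. The class name LookaheadDemo and the test string are made up for illustration, and the ^ anchor is only there so both lookaheads are checked from the start of the string.

```java
// Compares a digit lookahead whose character class excludes '-' with one
// that includes it. Only the second one can "see" past a hyphen.
public class LookaheadDemo {
    // Without '-': the scan stops at the first hyphen.
    static final java.util.regex.Pattern NO_HYPHEN =
            java.util.regex.Pattern.compile("^(?=[A-Z0-9]*[0-9])");
    // With '-': the scan continues through all hyphen-separated parts.
    static final java.util.regex.Pattern WITH_HYPHEN =
            java.util.regex.Pattern.compile("^(?=[A-Z0-9-]*[0-9])");

    static boolean hasDigitAhead(java.util.regex.Pattern p, String s) {
        // find() succeeds on the zero-length lookahead match if the assertion holds
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String key = "ABCD-EFGH-1234";
        System.out.println(hasDigitAhead(NO_HYPHEN, key));   // false
        System.out.println(hasDigitAhead(WITH_HYPHEN, key)); // true
    }
}
```

The digit only appears in the last part, so the lookahead without - gives up at the first hyphen.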
When matching, start by matching chars A-Z0-9 and then repeat a group that prepends the -, so that there are no consecutive occurrences of - and no - at the start or at the end.
\b(?=[A-Z0-9-]*[A-Z])(?=[A-Z0-9-]*[0-9])[A-Z0-9]+(?:-[A-Z0-9]+)+\b
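- \b : Word boundary
- (?=[A-Z0-9-]*[A-Z]) : Positive lookahead, assert that a char A-Z occurs somewhere in the upcoming run of A-Z, 0-9 and -
- (?=[A-Z0-9-]*[0-9]) : Positive lookahead, assert that a digit 0-9 occurs in that same run
- [A-Z0-9]+ : Match 1+ occurrences of A-Z0-9
- (?:-[A-Z0-9]+)+ : Repeat 1+ times: a - followed by 1+ occurrences of A-Z0-9
- \b : Word boundary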
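A minimal sketch that runs the pattern above against the example strings from the question, using the same find() loop as the question's getKeys(). The class name KeyRegexDemo is hypothetical, and the method takes the text as a parameter and is static so it runs standalone. Note that the character classes only list uppercase A-Z, which is enough for these examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeyRegexDemo {
    // The answer's pattern, with backslashes escaped for a Java string literal.
    static final Pattern KEY_REGEX = Pattern.compile(
            "\\b(?=[A-Z0-9-]*[A-Z])(?=[A-Z0-9-]*[0-9])[A-Z0-9]+(?:-[A-Z0-9]+)+\\b");

    static List<String> findKeys(String text) {
        List<String> keys = new ArrayList<>();
        Matcher matcher = KEY_REGEX.matcher(text);
        while (matcher.find()) {
            keys.add(matcher.group()); // whole match
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(findKeys("Hello World 123 abc 1AB2C-D3FGH-456I7-JK8LM-NOP9Q Hello World 123 abc")); // [1AB2C-D3FGH-456I7-JK8LM-NOP9Q]
        System.out.println(findKeys("Hello World 123 abc 11111-1111-1111 Hello World 123 abc"));            // []
        System.out.println(findKeys("Hello World 123 abc aaaa-aaaa-aaaa-aaa Hello World 123 abc"));         // []
    }
}
```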
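Note that [A-z] matches more than [A-Za-z]: the range A-z also includes the ASCII characters [ \ ] ^ _ and the backtick, which sit between Z and a.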
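A quick check of that difference using String.matches, which requires the whole string to match the pattern. The class and method names are made up for illustration.

```java
// Shows that the range A-z also covers the ASCII characters between 'Z' and 'a'.
public class RangeDemo {
    static boolean matchesAz(String s) {      // [A-z]
        return s.matches("[A-z]");
    }

    static boolean matchesLetters(String s) { // [A-Za-z]
        return s.matches("[A-Za-z]");
    }

    public static void main(String[] args) {
        // "\\" is a single backslash character in a Java string literal
        for (String s : new String[] {"[", "\\", "]", "^", "_", "`"}) {
            // each line ends with: true false
            System.out.println(s + " " + matchesAz(s) + " " + matchesLetters(s));
        }
    }
}
```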
To limit each hyphen-separated part to 4-8 characters, as in your original pattern:
\b(?=[A-Z0-9-]*[A-Z])(?=[A-Z0-9-]*[0-9])[A-Z0-9]{4,8}(?:-[A-Z0-9]{4,8})+\b
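The patterns above only list uppercase A-Z, while the question asks for a-zA-Z. A sketch under the assumption that lowercase keys should also match: compile the limited pattern with Pattern.CASE_INSENSITIVE (replacing every [A-Z] and [A-Z0-9-] with [A-Za-z] and [A-Za-z0-9-] would work as well). The class name LimitedKeyRegexDemo and the test strings are hypothetical; getKeys mirrors the question's method but takes the text as a parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LimitedKeyRegexDemo {
    // Each hyphen-separated part must be 4 to 8 chars long.
    // CASE_INSENSITIVE (an assumption about the intended input) lets [A-Z] also match a-z.
    static final Pattern KEY_REGEX = Pattern.compile(
            "\\b(?=[A-Z0-9-]*[A-Z])(?=[A-Z0-9-]*[0-9])[A-Z0-9]{4,8}(?:-[A-Z0-9]{4,8})+\\b",
            Pattern.CASE_INSENSITIVE);

    static List<String> getKeys(String text) {
        List<String> keys = new ArrayList<>();
        Matcher matcher = KEY_REGEX.matcher(text);
        while (matcher.find()) {
            keys.add(matcher.group());
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(getKeys("abc 1ab2c-d3fgh-456i7 abc")); // [1ab2c-d3fgh-456i7]
        System.out.println(getKeys("abc 1AB2C-D3FGHIJKL abc"));   // [] (second part has 9 chars)
        System.out.println(getKeys("abc AB1-CD2E abc"));          // [] (first part has 3 chars)
    }
}
```

With the flag set, "aaaa-aaaa-aaaa-aaa" from the question is still rejected, because it contains no digit.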