Ziqi :
I am trying to come up with a regex to detect tokens in a sentence. Each token must contain both letters and digits (mandatory), and may also contain characters such as , or .
Given the sentence:
M5 x 35mm Full Thread Hexagon Bolts (DIN 933) - PEEK DescriptionThe M5 x 0.035mm, and 6NB7 plus a Go9IuN.
It should find six tokens:
M5, 35mm, M5, 0.035mm, 6NB7, Go9IuN
I have tried the following, which does not work:
Pattern alphanum=Pattern.compile("\\b(([A-Za-z].*[0-9])|([0-9].*[A-Za-z]))\\b");
Any suggestions please?
Thanks
The fourth bird :
You could use a positive lookahead to assert that the token contains at least 1 digit, and then match at least 1 char a-zA-Z.
The .* part in your pattern will over match, as it matches any char except a newline 0+ times:
\b(?=[a-zA-Z0-9.,]*[0-9])[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]*\b
Explanation
\b : Word boundary
(?=[a-zA-Z0-9.,]*[0-9]) : Positive lookahead, assert at least 1 digit
[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]* : Match at least 1 char a-zA-Z
\b : Word boundary
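To see the over matching of .* concretely, here is a small sketch that runs the pattern from the question against the example sentence. The class name OverMatchDemo and the helper firstMatch are made up for this illustration. Because .* is greedy, the first match runs from the first M5 all the way to the last digit that is followed by a word boundary (the 7 in 6NB7), instead of stopping at the end of one token.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverMatchDemo {
    // The pattern from the question, unchanged.
    static final Pattern ORIGINAL =
            Pattern.compile("\\b(([A-Za-z].*[0-9])|([0-9].*[A-Za-z]))\\b");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = ORIGINAL.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sentence = "M5 x 35mm Full Thread Hexagon Bolts (DIN 933) - PEEK "
                + "DescriptionThe M5 x 0.035mm, and 6NB7 plus a Go9IuN.";
        // Prints one long span of the sentence rather than a single token.
        System.out.println(firstMatch(sentence));
    }
}
```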
In Java
final String regex = "\\b(?=[a-zA-Z0-9.,]*[0-9])[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]*\\b";
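A minimal sketch of how that regex could be used to collect the tokens with Matcher.find(). The class name TokenFinder and the method findTokens are assumptions made for this example, not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenFinder {
    // Same regex as above: the lookahead requires a digit, the body requires a letter.
    static final Pattern TOKEN = Pattern.compile(
            "\\b(?=[a-zA-Z0-9.,]*[0-9])[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]*\\b");

    // Hypothetical helper: returns every match in order of appearance.
    static List<String> findTokens(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "M5 x 35mm Full Thread Hexagon Bolts (DIN 933) - PEEK "
                + "DescriptionThe M5 x 0.035mm, and 6NB7 plus a Go9IuN.";
        // Prints [M5, 35mm, M5, 0.035mm, 6NB7, Go9IuN]
        System.out.println(findTokens(sentence));
    }
}
```

The trailing \b keeps a final , or . out of the match, so 0.035mm, yields 0.035mm. Note that because , and . are allowed inside a token, two tokens joined without a space (for example M5,M6) would be returned as a single match.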