Java regex: find sequence of letter-digit combinations, allowing certain symbols

Ziqi :

I am trying to write a regex that detects tokens in a sentence. Each token must contain both letters and digits, and may optionally contain characters such as , or .

Given the sentence:

M5 x 35mm Full Thread Hexagon Bolts (DIN 933) - PEEK DescriptionThe M5 x 0.035mm, and 6NB7 plus a Go9IuN.

It should find six tokens:

M5, 35mm, M5, 0.035mm, 6NB7, Go9IuN

I have tried the following, which does not work:

Pattern alphanum=Pattern.compile("\\b(([A-Za-z].*[0-9])|([0-9].*[A-Za-z]))\\b");

Any suggestions please?

Thanks

The fourth bird :

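You could use a positive lookahead to assert that the token contains at least 1 digit, and then match a run of allowed characters that contains at least 1 char a-zA-Z.

The .* part in your pattern over-matches, because it matches any char except a newline 0+ times, including spaces and parentheses.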
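To make the over-matching concrete, here is a small sketch that runs the pattern from the question unchanged. The class name OverMatchDemo and the helper matches are made up for this illustration and are not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverMatchDemo {
    // The pattern from the question, copied as-is.
    static final Pattern ORIGINAL =
            Pattern.compile("\\b(([A-Za-z].*[0-9])|([0-9].*[A-Za-z]))\\b");

    // Hypothetical helper: collects every match of ORIGINAL in the text.
    static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sentence = "M5 x 35mm Full Thread Hexagon Bolts (DIN 933) - PEEK "
                + "DescriptionThe M5 x 0.035mm, and 6NB7 plus a Go9IuN.";
        // Brackets make it easy to see where each match starts and ends.
        for (String s : matches(sentence)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Because .* is greedy and also matches spaces, the first match starts at the first M5 and runs up to the last digit that is followed by a word boundary, so several tokens end up inside one long match. Restricting the repeated part to a character class, as in the pattern that follows, avoids this.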

\b(?=[a-zA-Z0-9.,]*[0-9])[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]*\b

Explanation

  • \b Word boundary
  • (?=[a-zA-Z0-9.,]*[0-9]) Positive lookahead, assert at least 1 digit within the run of allowed chars
  • [a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]* Match a run of allowed chars that contains at least 1 char a-zA-Z
  • \b Word boundary


In Java

final String regex = "\\b(?=[a-zA-Z0-9.,]*[0-9])[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]*\\b";
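A minimal sketch of how that string might be used to pull the tokens out of the example sentence. The class name TokenFinder and the method findTokens are assumptions made for this illustration. Only Pattern, Matcher and the regex itself come from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenFinder {
    // Same regex as above: the lookahead requires a digit, the body requires a letter.
    static final Pattern TOKEN = Pattern.compile(
            "\\b(?=[a-zA-Z0-9.,]*[0-9])[a-zA-Z0-9.,]*[a-zA-Z][a-zA-Z0-9.,]*\\b");

    // Hypothetical helper: returns all tokens in order of appearance.
    static List<String> findTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "M5 x 35mm Full Thread Hexagon Bolts (DIN 933) - PEEK "
                + "DescriptionThe M5 x 0.035mm, and 6NB7 plus a Go9IuN.";
        // prints [M5, 35mm, M5, 0.035mm, 6NB7, Go9IuN]
        System.out.println(findTokens(sentence));
    }
}
```

The trailing \b makes the match end on a letter or digit, so the comma after 0.035mm and the final full stop are not part of the tokens. Since , is in the character class, a token such as 3,5mm would also be matched as a whole.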

