Remove numbering that uses Roman numerals

Slach :

I'm trying to remove valid Roman numerals (used as numbering) from a text that contains headlines, paragraphs, etc.

I'm using this regex :

Pattern ROMAN = Pattern.compile("^[(\\[]?x{0,3}(i[xv]|v?i{0,3})[)\\.\\]/]{1,2}", Pattern.CASE_INSENSITIVE);

However, it also matches empty parentheses.

What I want to do is to remove the following:

Input :
iv. foo foo foo.
Output:
foo foo foo.
Input :
v) foo foo foo.
Output:
foo foo foo.

But it should do nothing when they are not used for numbering:

Input :
foo foo foo i) foo v) .
Output:
foo foo foo i) foo v) .

Another example of what the regex should match : iv) X) ix/ V/ x. IV.

Nikolas :

How about something like the following Regex:

^((?=[mdclxvi])m*(c[md]|d?c{0,3})(x[cl]|l?x{0,3})(i[xv]|v?i{0,3})(?:\)|\.))

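This matches a Roman numeral that is followed by either a ) or a . character. The lookahead (?=[mdclxvi]) requires at least one numeral letter at the start of the input, so empty parentheses no longer match. Compile it with Pattern.CASE_INSENSITIVE to accept uppercase numerals as well. There is a nice treatment of matching Roman numerals in Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan (O'Reilly).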
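For illustration, here is a minimal sketch of how this regex might be used in Java to strip the numbering. The class name RomanNumbering and the method strip are hypothetical. Three details are assumptions that go beyond the regex above: / is added as a terminator (the question lists ix/ and V/), trailing spaces and tabs after the terminator are removed so the output matches the question's examples, and Pattern.MULTILINE is set so that ^ applies to every line of the text.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class RomanNumbering {

    // The answer's regex, with two assumed extensions:
    //   [)./]   also accepts '/' as a terminator
    //   [ \t]*  also consumes the blanks that follow the terminator
    private static final Pattern ROMAN_PREFIX = Pattern.compile(
            "^(?=[mdclxvi])m*(c[md]|d?c{0,3})(x[cl]|l?x{0,3})(i[xv]|v?i{0,3})[)./][ \\t]*",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Removes a leading Roman-numeral label from every line of the text.
    static String strip(String text) {
        return ROMAN_PREFIX.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("iv. foo foo foo."));        // foo foo foo.
        System.out.println(strip("v) foo foo foo."));         // foo foo foo.
        System.out.println(strip("foo foo foo i) foo v) .")); // unchanged
    }
}
```

Note that any word that happens to be a valid Roman numeral and is followed by a terminator, such as "mix." at the start of a line, would be removed as well. A regex alone cannot tell such words apart from numbering.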
