Using Notepad++ regular expressions to collapse repeated characters and strings

Original string: abbbbbijkijkijkijkijkijkijkijkc

 

----------------------------------------------------------------------------

 

Goal 1:

Collapse any single character that is repeated 3 or more times in a row into one occurrence, that is, bbbbb becomes b

 

The regular expression to find is:

(.)\1{2,}

Explanation:

. matches any single character.

The \1 in (.)\1 is a backreference: it matches the same text that the parentheses captured.

\1{2,} means that the text referenced by \1 appears at least twice in a row.

Therefore, (.)\1{2,} matches the character captured by (.) followed by at least two more copies of it, that is, any character that appears at least three times in a row.

 

The replacement expression is:

\1

Explanation:

Each match is replaced with the text referenced by \1, that is, the character captured by (.), which is b here.

 

The result after replacement:

abijkijkijkijkijkijkijkijkc
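
The same step can be checked outside the editor. The following is a minimal sketch in Python, assuming that Python's re module treats this pattern the same way as the Notepad++ regex engine for this input; the variable names are only illustrative.

```python
import re

original = "abbbbbijkijkijkijkijkijkijkijkc"

# (.) captures one character; \1{2,} requires at least two more copies of it,
# so only runs of three or more identical characters are matched.
step1 = re.sub(r"(.)\1{2,}", r"\1", original)

print(step1)  # abijkijkijkijkijkijkijkijkc
```

Only bbbbb is collapsed; i, j and k never occur three times in a row, so the repeated ijk blocks are left untouched for Goal 2.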

 

----------------------------------------------------------------------------

 

Goal 2:

Collapse any string that is repeated two or more times in a row in abijkijkijkijkijkijkijkijkc (the result of the previous step) into one occurrence, that is, change ijkijkijkijkijkijkijkijk into ijk

 

The regular expression to find is:

(..+?)\1{1,}

Explanation:

. matches any single character, so ..+? matches any string of 2 or more characters. The +? makes it a non-greedy match, so it takes the shortest string that works: here ..+? matches ijk. If it is changed to ..+, the match becomes greedy and the group captures ijkijkijkijk instead. With the greedy form, the find-and-replace operation has to be repeated about log2(N) times (N is the number of repetitions) to compress a string repeated N times into one. With the non-greedy form, a single find-and-replace is enough.

The \1 in (..+?)\1 is a backreference: it matches the same text that the parentheses captured.

\1{1,} means that the text referenced by \1 appears at least once in a row.

Therefore, (..+?)\1{1,} matches the string captured by (..+?) followed by at least one more copy of it, that is, any string of length 2 or more that appears at least twice in a row.
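
The difference between the non-greedy and greedy group can be seen by inspecting what each one captures. This is a small sketch in Python, on the assumption that Python's re module behaves like the Notepad++ engine for these two patterns; the names lazy and greedy are only illustrative.

```python
import re

text = "abijkijkijkijkijkijkijkijkc"

lazy = re.search(r"(..+?)\1{1,}", text)    # non-greedy group
greedy = re.search(r"(..+)\1{1,}", text)   # greedy group

print(lazy.group(1))    # ijk
print(greedy.group(1))  # ijkijkijkijk

# Both patterns match the same overall run; only the captured unit differs.
print(lazy.group(0) == greedy.group(0))  # True
```

Because the replacement is \1, the captured unit is what remains after replacing, which is why the non-greedy group gives the shortest result.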

 

The replacement expression is:

\1

Explanation:

Each match is replaced with the text referenced by \1, that is, the string captured by (..+?), which is ijk here.

 

The result after replacement:

abijkc
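
The claim about the number of passes can be checked with a short sketch. It is written in Python on the assumption that re.sub behaves like "Replace All" in Notepad++ for this input; passes_until_stable is a hypothetical helper introduced only for this illustration.

```python
import re

def passes_until_stable(pattern, text):
    """Apply the replacement repeatedly; return the final text and
    the number of passes that actually changed it."""
    passes = 0
    while True:
        new_text = re.sub(pattern, r"\1", text)
        if new_text == text:
            return text, passes
        text = new_text
        passes += 1

step1 = "abijkijkijkijkijkijkijkijkc"  # ijk repeated 8 times

print(passes_until_stable(r"(..+?)\1{1,}", step1))  # ('abijkc', 1)
print(passes_until_stable(r"(..+)\1{1,}", step1))   # ('abijkc', 3)
```

With 8 repetitions, the greedy form halves the run on each pass (8, 4, 2, 1), which gives the 3 passes, that is, log2(8).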

----------------------------------------------------------------------------

Extended usage:

If, in the searches for Goals 1 and 2 above, the repeated string must not consist purely of digits, the regular expressions only need to be changed as follows:

Goal 1:

([^0-9])\1{2,}

Goal 2:

Find and replace in two steps, replacing with \1 each time as before.

First, find:

([0-9]+[^0-9].*?)\1{1,}

Here ([0-9]+[^0-9].*?) is the repeated unit. The pattern reads as: one or more digits (greedy match), followed by one non-digit character, followed by zero or more arbitrary characters (non-greedy match).

Then find:

([^0-9].+?)\1{1,}

Here ([^0-9].+?) is the repeated unit. The pattern reads as: one non-digit character followed by one or more arbitrary characters (non-greedy match).
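
The extended patterns can be tried on a small example. The sketch below is in Python and assumes that re.sub behaves like "Replace All" in Notepad++ here; the sample strings are made up for this illustration and do not come from the original text.

```python
import re

# Goal 1 variant: a run of digits is kept, a run of letters is collapsed.
goal1 = re.sub(r"([^0-9])\1{2,}", r"\1", "1111aaaa")
print(goal1)  # 1111a

# Goal 2 variant, in two steps, on a hypothetical sample string.
sample = "4545 12a12a12a xyxyxy"

# Step 1: repeated units that start with digits but contain a non-digit.
after_step1 = re.sub(r"([0-9]+[^0-9].*?)\1{1,}", r"\1", sample)
print(after_step1)  # 4545 12a xyxyxy

# Step 2: repeated units that start with a non-digit.
after_step2 = re.sub(r"([^0-9].+?)\1{1,}", r"\1", after_step1)
print(after_step2)  # 4545 12a xy
```

The purely numeric repetition 4545 survives both steps, whereas the unrestricted pattern (..+?)\1{1,} from Goal 2 would have reduced it to 45.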

----------------------------------------------------------------------------

Attachment: Advanced usage of NOTEPAD++ regular expression https://blog.csdn.net/yocencyy/article/details/104117433


Origin blog.csdn.net/yocencyy/article/details/104120457