REGEX : Splitting a String into Fixed-Length Chunks without Breaking Words AND Including Special Characters

Etep :

I have the following REGEX...

\S.{1,40}\b\W?

This breaks a string into smaller strings no longer than 40 characters without breaking up words (and it includes trailing punctuation). However, it doesn't handle special characters such as # on their own IF they end up at the end of the string (when the remaining text is less than 40 characters long). I assume this is because the regex doesn't see them as a word?
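That assumption is likely right: `\b` only matches at a position between a word character (`[a-zA-Z0-9_]`) and a non-word character, and `#`, `(`, `)`, `/` and spaces are all non-word characters. The minimal sketch below lists every position where `\b` matches in the first example; the class name `WordBoundaryDemo` and the helper `boundaryPositions` are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Zero-width match between a word character and a non-word character
    // (or the start/end of the input next to a word character).
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\b");

    // Returns every index in s at which \b matches.
    static List<Integer> boundaryPositions(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = WORD_BOUNDARY.matcher(s);
        while (m.find()) {
            // find() advances past empty matches on its own, so this loop terminates.
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(boundaryPositions("abcd (efghij # / klmno (# #)"));
    }
}
```

For that input this prints `[0, 4, 6, 12, 17, 22]`. The last boundary is index 22, directly after `klmno`; nothing in `(# #)` is a word character, so `\b` has no position to match there.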

Take the following example...

abcd (efghij # / klmno (# #)

The result will be...

abcd (efghij # / klmno 

The result should instead be the same as the input (including the (# #) at the end).
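A minimal Java sketch that reproduces this with `Pattern` and `Matcher`, using the regex from the question unchanged; the class name `OriginalRegexDemo` and the helper `chunks` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalRegexDemo {
    // The regex from the question: a non-space character, 1 to 40 more characters,
    // a word boundary, then an optional non-word character.
    private static final Pattern CHUNK = Pattern.compile("\\S.{1,40}\\b\\W?");

    // Collects every successive match of CHUNK in text.
    static List<String> chunks(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Brackets make the trailing space of each chunk visible.
        for (String chunk : chunks("abcd (efghij # / klmno (# #)")) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

This prints a single chunk, `[abcd (efghij # / klmno ]`. The greedy `.{1,40}` first runs to the end of the string, but `\b` cannot match anywhere inside or after `(# #)`, so the engine backtracks to the boundary after `klmno` and `\W?` takes the following space. Every later attempt starts inside `(# #)`, which contains no word character, so no further match is found.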

Take this example as well...

abcd (efghij # / klmno (# #)
blah blah etc etc words and more words and yet more words. What about these words?
And some more text for this string so that we can test things out. 

In this case the results should be...

abcd (efghij # / klmno (# #)
blah blah etc etc words and more words 
and yet more words. What about these 
words?
And some more text for this string so 
that we can test things out.

However, again, with my current regex the results are...

abcd (efghij # / klmno 
blah blah etc etc words and more words 
and yet more words. What about these 
words?
And some more text for this string so 
that we can test things out.

Notice that the (# #) is missing. I need this (# #) to be included in the first result.

Please note that I'm using this regex in Java with the Pattern and Matcher classes.

Any suggestions?

Emma :

My guess is that you might want to pre- or post-process lines like your first one; otherwise the expression would become rather complicated. That said, the following expression might come somewhat close:

.{0,39}\S(?=$|\s)

The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it, and in this link you can watch how it matches against some sample inputs, if you like.
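Applying this expression with Java's `Pattern` and `Matcher` might look like the minimal sketch below. The class name `FixedWidthSplitter` and method `split` are hypothetical, and the `trim()` call is an assumed post-processing step: every chunk after the first on a line starts with the space that separated it from the previous chunk, because the leading `.{0,39}` is allowed to consume that space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedWidthSplitter {
    // The expression from the answer: up to 39 characters followed by a non-space
    // character, provided the next position is whitespace or the end of the input.
    // '.' does not match line terminators, so each input line is chunked separately.
    private static final Pattern CHUNK = Pattern.compile(".{0,39}\\S(?=$|\\s)");

    static List<String> split(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = CHUNK.matcher(text);
        while (m.find()) {
            // Assumed post-processing: drop the separating space a chunk may start with.
            lines.add(m.group().trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "abcd (efghij # / klmno (# #)\n"
                + "blah blah etc etc words and more words and yet more words. What about these words?\n"
                + "And some more text for this string so that we can test things out.";
        split(text).forEach(System.out::println);
    }
}
```

With the multi-line example from the question, this prints the six expected lines, `(# #)` included. One caveat of the expression itself: a single word longer than 40 characters is never split; only its last 40 characters are matched and the start of the word is dropped.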

