I need to detect URLs in a string using a regex. I am using Pattern and Matcher to find whether the string contains a URL:
val pattern = Pattern.compile(HyperlinkParser.validRegex.toString())
val matcher = pattern.matcher(htmlParsedMessage) //"abcd www.google.com def"
while (matcher.find()) {
    val url = matcher.group() // expected "www.google.com", but this returns "www."
    val indicesPair = Pair(matcher.start(), matcher.end())
    hyperlinkStartEndIndicesList.add(indicesPair)
}
matcher.reset()
where HyperlinkParser.validRegex is defined as:
private const val regularExpression = "(?:(?:https?|ftp|file):|www.|ftp.)(?:([-A-Z0-9+&@#/%=~_|\$?!:,.]*)|[-A-Z0-9+&@#/%=~_|\$?!:,.])*(?:([-A-Z0-9+&@#/%=~_|\$?!:,.]*)|[A-Z0-9+&@#/%=~_|\$])"
val validRegex = Regex(regularExpression,RegexOption.IGNORE_CASE)
I expect the match to be "www.google.com", but it returns only "www.".
Any idea what the issue could be? Any help would be greatly appreciated.
The documentation of the toString() method of Regex says:
Returns the string representation of this regular expression, namely the pattern of this regular expression.
Note that another regular expression constructed from the same pattern string may have different options and may match strings differently.
This means that toString() returns the same string as regularExpression, without the IGNORE_CASE option.
So when you do val pattern = Pattern.compile(HyperlinkParser.validRegex.toString()), you lose the case-insensitive option. The character classes in your regex only list the upper-case range A-Z, so without that option the lower-case google.com isn't matched and the match stops right after "www.".
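To make the effect visible, here is a minimal sketch. It uses a simplified stand-in regex (sampleRegex, an assumption for illustration, not the pattern from the question) that, like the original, only lists upper-case letters and relies on IGNORE_CASE. The helper names recompiledFromString and firstMatch are hypothetical as well.

```kotlin
import java.util.regex.Pattern

// Simplified stand-in for the question's regex (assumption): only A-Z in the class,
// so lower-case input matches only when IGNORE_CASE is in effect.
val sampleRegex = Regex("www\\.[A-Z.]*", RegexOption.IGNORE_CASE)

// Recompiling from toString() keeps the pattern text but drops the options.
fun recompiledFromString(regex: Regex): Pattern = Pattern.compile(regex.toString())

// Returns the first match of the pattern in the input, or null if there is none.
fun firstMatch(pattern: Pattern, input: String): String? {
    val matcher = pattern.matcher(input)
    return if (matcher.find()) matcher.group() else null
}

fun main() {
    val pattern = recompiledFromString(sampleRegex)
    // The CASE_INSENSITIVE bit is not set on the recompiled pattern.
    println((pattern.flags() and Pattern.CASE_INSENSITIVE) != 0) // prints "false"
    println(firstMatch(pattern, "abcd www.google.com def"))      // prints "www."
}
```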
Change that line to:
val pattern = HyperlinkParser.validRegex.toPattern()
That will work, because the documentation of toPattern says:
Returns an instance of Pattern with the same pattern string and options as this instance of Regex has.
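
As a rough end-to-end check, the sketch below runs the same kind of find loop as in the question on a Pattern obtained via toPattern(). The regex demoRegex is a simplified stand-in (an assumption, not the original pattern), and findUrlRanges is a hypothetical helper that mirrors the loop collecting start/end indices.

```kotlin
import java.util.regex.Pattern

// Simplified stand-in for the question's regex (assumption, not the original pattern).
val demoRegex = Regex("www\\.[A-Z.]*", RegexOption.IGNORE_CASE)

// Collects the (start, end) indices of every match, mirroring the loop in the question.
fun findUrlRanges(pattern: Pattern, input: String): List<Pair<Int, Int>> {
    val ranges = mutableListOf<Pair<Int, Int>>()
    val matcher = pattern.matcher(input)
    while (matcher.find()) {
        ranges.add(Pair(matcher.start(), matcher.end()))
    }
    return ranges
}

fun main() {
    val input = "abcd www.google.com def"
    // toPattern() carries the IGNORE_CASE option over to the java.util.regex.Pattern.
    val pattern = demoRegex.toPattern()
    for ((start, end) in findUrlRanges(pattern, input)) {
        println(input.substring(start, end)) // prints "www.google.com"
    }
}
```

If you don't need a Matcher specifically, Kotlin's own Regex.findAll should give you the same matches together with their ranges, without going through Pattern at all.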