Running this code with JDK 1.8:
try {
    System.out.println(new URI(null, null, "5-12-145-35_s-81", 443, null, null, null));
} catch (URISyntaxException e) {
    e.printStackTrace();
}
results in this error: java.net.URISyntaxException: Illegal character in hostname at index 13: //5-12-145-35_s-81:443
Where does this error come from, considering all the hostname characters seem legit, according to Types of URI characters?
If I use these URLs: //5-12-145-35_s-81:443 or /5-12-145-35_s-81:443 the error is gone.
From the comments, I understand that, according to RFC-2396, the hostname cannot contain any underscore characters.
The question that still holds is why a hostname starting with slash or double slash is allowed to contain underscores?
Host name must match the following syntax:
hostname    = domainlabel [ "." ] | 1*( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
As you can see, apart from letters and digits, only . and - are allowed; _ is not.
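To make the grammar concrete, here is a minimal sketch that transcribes the three rules above into a regular expression. The class name HostnameGrammar and the method isValidHostname are made up for illustration, and this is only a reading of the quoted grammar, not the parser that java.net.URI actually uses.

```java
import java.util.regex.Pattern;

class HostnameGrammar {
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String DOMAIN_LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    private static final String TOP_LABEL = "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // hostname = domainlabel [ "." ] | 1*( domainlabel "." ) toplabel [ "." ]
    private static final Pattern HOSTNAME = Pattern.compile(
            "(?:" + DOMAIN_LABEL + "\\.?)"
            + "|(?:(?:" + DOMAIN_LABEL + "\\.)+" + TOP_LABEL + "\\.?)");

    // Hypothetical helper: true if the whole string fits the quoted grammar.
    static boolean isValidHostname(String candidate) {
        return HOSTNAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidHostname("5-12-145-35_s-81")); // false: contains '_'
        System.out.println(isValidHostname("5-12-145-35-s-81")); // true: a single domainlabel
        System.out.println(isValidHostname("example.com"));      // true: domainlabel "." toplabel
    }
}
```

Replacing the underscore with a hyphen is enough to make the name fit the grammar, which is consistent with the error pointing at index 13.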
You then say that //5-12-145-35_s-81:443 is allowed, and it is, but not as a host name.
To see how that pans out:
URI uriBadHost = URI.create("//5-12-145-35_s-81:443");
System.out.println("uri = " + uriBadHost);
System.out.println(" authority = " + uriBadHost.getAuthority());
System.out.println(" host = " + uriBadHost.getHost());
System.out.println(" port = " + uriBadHost.getPort());
URI uriGoodHost = URI.create("//example.com:443");
System.out.println("uri = " + uriGoodHost);
System.out.println(" authority = " + uriGoodHost.getAuthority());
System.out.println(" host = " + uriGoodHost.getHost());
System.out.println(" port = " + uriGoodHost.getPort());
Output
uri = //5-12-145-35_s-81:443
authority = 5-12-145-35_s-81:443
host = null
port = -1
uri = //example.com:443
authority = example.com:443
host = example.com
port = 443
As you can see, when the authority has a valid host name, the host and port are parsed, but when it is not valid, the authority is treated as freeform text, and not parsed any further.
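If you want the lenient parse to fail loudly instead of silently leaving host as null, URI.parseServerAuthority() asks for the authority to be re-parsed as a server-based one and throws URISyntaxException when it cannot be. A small sketch follows; the class name ServerAuthorityCheck and the helper authorityParsesAsServer are invented for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ServerAuthorityCheck {
    // Hypothetical helper: true if the authority can be parsed as [user-info@]host[:port].
    // Note that a URI with no authority at all also returns true here, because
    // parseServerAuthority() has nothing to reject in that case.
    static boolean authorityParsesAsServer(String text) {
        try {
            URI.create(text).parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(authorityParsesAsServer("//example.com:443"));
        System.out.println(authorityParsesAsServer("//5-12-145-35_s-81:443"));
    }
}
```

This is the same strictness the multi-argument constructor with separate host and port parameters applies, which is why it throws where URI.create does not.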
UPDATE
From comment:
System.out.println( new URI(null, null, "/5-12-145-35_s-81", 443, null, null, null))
outputs: ///5-12-145-35_s-81:443. I'm giving it as hostname
The URI constructor you're calling is a convenience constructor: it simply builds a full URI string and then parses that.
Passing "5-12-145-35_s-81", 443 becomes //5-12-145-35_s-81:443.
Passing "/5-12-145-35_s-81", 443 becomes ///5-12-145-35_s-81:443.
In the first, the text after // must be a host and port, and it fails to parse.
In the second, the authority part is empty, and /5-12-145-35_s-81:443 is a path.
URI uri1 = new URI(null, null, "/5-12-145-35_s-81", 443, null, null, null);
System.out.println("uri = " + uri1);
System.out.println(" authority = " + uri1.getAuthority());
System.out.println(" host = " + uri1.getHost());
System.out.println(" port = " + uri1.getPort());
System.out.println(" path = " + uri1.getPath());
Output
uri = ///5-12-145-35_s-81:443
authority = null
host = null
port = -1
path = /5-12-145-35_s-81:443
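The two cases can be put side by side in one small program. This is only a sketch: assemble is a hypothetical, simplified imitation of how the constructor joins host and port when every other argument is null, and constructorAccepts is likewise a made-up helper. As far as the Javadoc describes it, the seven-argument constructor parses the assembled string and then requires a server-based authority, whereas URI.create does not.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ConstructorAssembly {
    // Hypothetical helper: simplified imitation of the string the 7-argument
    // constructor assembles when only host and port are given.
    static String assemble(String host, int port) {
        return "//" + host + ":" + port;
    }

    // Hypothetical helper: true if the 7-argument constructor accepts the pair.
    static boolean constructorAccepts(String host, int port) {
        try {
            new URI(null, null, host, port, null, null, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String host : new String[] {"5-12-145-35_s-81", "/5-12-145-35_s-81"}) {
            String text = assemble(host, 443);
            URI lenient = URI.create(text); // lenient parse of the same string
            System.out.println("text = " + text);
            System.out.println(" host = " + lenient.getHost());
            System.out.println(" path = " + lenient.getPath());
            System.out.println(" constructor accepts = " + constructorAccepts(host, 443));
        }
    }
}
```

With the leading slash, the string starts with ///, so the authority is empty and there is no host left to validate; the underscore ends up in the path, where it is legal.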