Why is the hostname declared invalid when creating a URI?
The bug is not in Java but in the naming of the host, since an underscore is not a valid character in a hostname. Although underscores are widely (and incorrectly) used in hostnames, Java refuses to handle such hostnames.
Underscores are not supported in URI hostnames.
While a hostname may not contain other characters, such as the underscore character (_), other DNS names may contain the underscore.[5][6] This restriction was lifted by RFC 2181, Section 11. Systems such as DomainKeys and service records use the underscore as a means to assure that their special character is not confused with hostnames. For example, _http._sctp.www.example.com specifies a service pointer for an SCTP-capable webserver host (www) in the domain example.com. Notwithstanding the standard, Chrome, Firefox, Internet Explorer, Edge and Safari allow underscores in hostnames, although cookies in IE do not work correctly if any part of the hostname contains an underscore character.
Wikipedia
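To illustrate the quoted point in Java, here is a minimal sketch (the class name UnderscoreDemo is made up for illustration) showing that java.net.URI only objects to underscores in the host part. In the path and query an underscore is an ordinary character, while a DNS service name like the one in the quote, parsed here purely as a string, is not accepted as a host, so getHost() returns null and only getAuthority() keeps the text.

```java
import java.net.URI;

public class UnderscoreDemo {
    public static void main(String[] args) {
        // Underscores in the path and query are ordinary unreserved characters
        URI ok = URI.create("https://example.com/_path?key=_value");
        System.out.println(ok.getHost()); // example.com
        System.out.println(ok.getPath()); // /_path

        // A label starting with "_" is not a valid hostname, so java.net.URI
        // falls back to a registry-based (freeform) authority
        URI srv = URI.create("http://_http._sctp.www.example.com/");
        System.out.println(srv.getHost());      // null
        System.out.println(srv.getAuthority()); // _http._sctp.www.example.com
    }
}
```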
From the Javadocs of URI(String str):
public URI(String str) throws URISyntaxException
Throws: URISyntaxException - If the given string violates RFC 2396, as augmented by the above deviations
(Hacky) solution:
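import java.lang.reflect.Field;
import java.net.URI;

// On Java 16+ run with --add-opens java.base/java.net=ALL-UNNAMED,
// otherwise setAccessible() throws InaccessibleObjectException.
URI url = URI.create("https://5-12-145-35_s-81:8080");
System.out.println(url.getHost()); // null
if (url.getHost() == null) {
    // Overwrite the private "host" field (throws ReflectiveOperationException)
    final Field hostField = URI.class.getDeclaredField("host");
    hostField.setAccessible(true);
    hostField.set(url, "5-12-145-35_s-81");
}
System.out.println(url.getHost()); // 5-12-145-35_s-81
// Only the host is patched: url.getPort() still returns -1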
This was reported as a JDK bug.
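If reflection is not an option, a less invasive alternative is to read the raw authority, which java.net.URI still keeps for such addresses, and split host and port yourself. The following is a minimal sketch, assuming the authority has the form [user@]host[:port]; hostOf and portOf are hypothetical helper names, not part of java.net.URI.

```java
import java.net.URI;

public class AuthorityFallback {
    // Returns "host[:port]" from the raw authority, dropping any "user@" prefix
    private static String hostPort(URI uri) {
        String authority = uri.getRawAuthority();
        return authority == null ? null : authority.substring(authority.lastIndexOf('@') + 1);
    }

    // Host from the URI, or parsed by hand when java.net.URI rejected it as a host name
    static String hostOf(URI uri) {
        if (uri.getHost() != null) return uri.getHost();
        String hp = hostPort(uri);
        if (hp == null) return null;
        int colon = hp.lastIndexOf(':');
        return colon >= 0 ? hp.substring(0, colon) : hp;
    }

    // Port from the URI, or parsed by hand; -1 when absent or not a number
    static int portOf(URI uri) {
        if (uri.getHost() != null) return uri.getPort();
        String hp = hostPort(uri);
        int colon = hp == null ? -1 : hp.lastIndexOf(':');
        if (colon < 0) return -1;
        try {
            return Integer.parseInt(hp.substring(colon + 1));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("https://5-12-145-35_s-81:8080");
        System.out.println(hostOf(uri)); // 5-12-145-35_s-81
        System.out.println(portOf(uri)); // 8080
    }
}
```

Unlike the reflection hack, this leaves the URI object untouched and keeps working under strong encapsulation, at the cost of parsing the authority by hand.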
A host name must match the following syntax (from RFC 2396):
hostname    = domainlabel [ "." ] | 1*( domainlabel "." ) toplabel [ "." ]
domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
As you can see, apart from letters and digits only . and - are allowed; _ is not.
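The grammar above translates fairly directly into a regular expression. This is a minimal sketch of that translation (HostnameCheck and isValidHostname are hypothetical names); it checks only the hostname rule quoted here, not IPv4 or IPv6 literals, which java.net.URI handles separately.

```java
import java.util.regex.Pattern;

public class HostnameCheck {
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String DOMAIN_LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    private static final String TOP_LABEL = "[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?";
    // hostname = domainlabel [ "." ] | 1*( domainlabel "." ) toplabel [ "." ]
    private static final Pattern HOSTNAME = Pattern.compile(
            DOMAIN_LABEL + "\\.?|(?:" + DOMAIN_LABEL + "\\.)+" + TOP_LABEL + "\\.?");

    static boolean isValidHostname(String s) {
        return HOSTNAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidHostname("example.com"));      // true
        System.out.println(isValidHostname("5-12-145-35_s-81")); // false: "_" is not allowed
        System.out.println(isValidHostname("5-12-145-35-s-81")); // true
    }
}
```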
You then say that //5-12-145-35_s-81:443 is allowed, and it is, but only as a registry-based authority, not as a host name.
To see how that pans out:
URI uriBadHost = URI.create("//5-12-145-35_s-81:443");
System.out.println("uri = " + uriBadHost);
System.out.println(" authority = " + uriBadHost.getAuthority());
System.out.println(" host = " + uriBadHost.getHost());
System.out.println(" port = " + uriBadHost.getPort());
URI uriGoodHost = URI.create("//example.com:443");
System.out.println("uri = " + uriGoodHost);
System.out.println(" authority = " + uriGoodHost.getAuthority());
System.out.println(" host = " + uriGoodHost.getHost());
System.out.println(" port = " + uriGoodHost.getPort());
Output
uri = //5-12-145-35_s-81:443
authority = 5-12-145-35_s-81:443
host = null
port = -1
uri = //example.com:443
authority = example.com:443
host = example.com
port = 443
As you can see, when the authority has a valid host name, the host and port are parsed; when it does not, the authority is treated as freeform text and is not parsed any further.
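To turn that silent fallback into an explicit failure, java.net.URI offers parseServerAuthority(), which forces the authority to be parsed as host[:port] and throws URISyntaxException when it cannot be. A minimal sketch (the class and method names are made up for illustration):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ServerAuthorityCheck {
    // True if the URI's authority (if any) parses as a server-based host[:port]
    static boolean hasServerAuthority(String s) {
        try {
            URI.create(s).parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            // Registry-based (freeform) authority, e.g. a host containing "_"
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasServerAuthority("//example.com:443"));      // true
        System.out.println(hasServerAuthority("//5-12-145-35_s-81:443")); // false
    }
}
```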
UPDATE
From a comment:
System.out.println(new URI(null, null, "/5-12-145-35_s-81", 443, null, null, null));
outputs ///5-12-145-35_s-81:443. I'm giving it as the hostname.
The URI constructor you're calling is a convenience method: it simply builds a full URI string and then parses that.
Passing "5-12-145-35_s-81", 443 becomes //5-12-145-35_s-81:443.
Passing "/5-12-145-35_s-81", 443 becomes ///5-12-145-35_s-81:443.
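In the first, it's a host and port, and it fails to parse: this constructor requires a server-based authority, so the invalid host makes it throw a URISyntaxException.
In the second, the authority part is empty, and /5-12-145-35_s-81:443 is a path.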
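The difference between the two calls can be checked directly. This minimal sketch (fromHostAndPort is a hypothetical helper) passes the host without the leading slash to the same seven-argument constructor: a valid host yields a proper host and port, while the underscore host is rejected with a URISyntaxException instead of being silently turned into a path.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SevenArgConstructorDemo {
    // Builds "//host:port" via the seven-argument constructor; null if it is rejected
    static URI fromHostAndPort(String host, int port) {
        try {
            return new URI(null, null, host, port, null, null, null);
        } catch (URISyntaxException e) {
            // The built string is parsed with a server-based authority required,
            // so a host containing "_" ends up here
            return null;
        }
    }

    public static void main(String[] args) {
        URI good = fromHostAndPort("example.com", 443);
        System.out.println(good.getHost() + " " + good.getPort()); // example.com 443
        System.out.println(fromHostAndPort("5-12-145-35_s-81", 443)); // null
    }
}
```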
URI uri1 = new URI(null, null, "/5-12-145-35_s-81", 443, null, null, null);
System.out.println("uri = " + uri1);
System.out.println(" authority = " + uri1.getAuthority());
System.out.println(" host = " + uri1.getHost());
System.out.println(" port = " + uri1.getPort());
System.out.println(" path = " + uri1.getPath());
Output
uri = ///5-12-145-35_s-81:443
authority = null
host = null
port = -1
path = /5-12-145-35_s-81:443