URI - getHost returns null. Why?

I don't think it's a bug in Java. I think Java is parsing hostnames correctly according to the spec. There are good explanations of the spec here: http://en.wikipedia.org/wiki/Hostname#Restrictions_on_valid_host_names and here: http://www.netregister.biz/faqit.htm#1

Specifically, hostnames MUST NOT contain underscores.
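
To see that restriction in action, here is a minimal sketch. The host names under example.com are hypothetical and used only for illustration. java.net.URI does not throw for a host containing an underscore; as far as I can tell it treats the authority as registry-based, so getHost() is null while getAuthority() still holds the raw text.

```java
import java.net.URI;

public class UnderscoreHostDemo {
    // Host as java.net.URI reports it; null when the authority is not a valid server-based hostname.
    public static String hostOf(String s) {
        return URI.create(s).getHost();
    }

    // Raw authority component, which is kept even when the host is rejected.
    public static String authorityOf(String s) {
        return URI.create(s).getAuthority();
    }

    public static void main(String[] args) {
        // Hypothetical hosts: one with an underscore, one without.
        System.out.println("host      = " + hostOf("http://broken_arrow.example.com"));
        System.out.println("authority = " + authorityOf("http://broken_arrow.example.com"));
        System.out.println("host      = " + hostOf("http://brokenarrow.example.com"));
    }
}
```

If you cannot rename the host, reading getAuthority() is one possible fallback, but note that it may also contain user info and a port.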


It's because of the underscore in the base URI. Remove the underscore and getHost() returns the host as expected, as in the example below:

    public static void main(String[] args) throws Exception {
        // Host name without an underscore: getHost() returns it.
        java.net.URI uri = new java.net.URI("http://brokenarrow.huntingtonhelps.com");
        String host = uri.getHost();
        System.out.println("Host = [" + host + "].");

        // A second underscore-free host for comparison.
        uri = new java.net.URI("http://mail.yahoo.com");
        host = uri.getHost();
        System.out.println("Host = [" + host + "].");
    }


As @hsz mentioned in the comments, this is a known bug.

But let's debug and look inside the source of the URI class. The problem is inside this method:

    private int parseHostname(int start, int n)

Parsing the first URI fails at these lines:

    if ((p < n) && !at(p, n, ':'))
        fail("Illegal character in hostname", p);

This is because the _ character is not accounted for in the scan block, which allows only letters, digits and the - character (L_ALPHANUM, H_ALPHANUM, L_DASH and H_DASH).

And yes, this is not fixed yet in Java 7.
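
You can surface that parser failure without a debugger. The sketch below assumes hypothetical example.com host names and a helper name of my own (serverAuthorityError). It calls URI.parseServerAuthority(), which insists on a server-based authority and throws a URISyntaxException instead of silently leaving the host null.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class ParseServerAuthorityDemo {
    // Returns the parser's reason for rejecting the authority,
    // or null if it parses as a server-based authority.
    public static String serverAuthorityError(String s) {
        try {
            URI.create(s).parseServerAuthority();
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        // Hypothetical hosts: the first contains an underscore, the second does not.
        System.out.println(serverAuthorityError("http://broken_arrow.example.com"));
        System.out.println(serverAuthorityError("http://brokenarrow.example.com"));
    }
}
```

For the underscore host the reason should correspond to the fail(...) call quoted above, though the exact wording may differ between JDK versions.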
