Java native code string ending

All current answers to the question seem to be outdated (Edward Thomson's answer last update dates back to 2015), or referring to Android JNI documentation which can be authoritative only in the Android world. The matter has been clarified in recent (2017) official Oracle JNI documentation clean-up and updates, more specifically in this issue.

Now the JNI specification clearly states:

String Operations

This specification makes no assumptions on how a JVM represent Java strings internally. Strings returned from these operations:

GetStringChars()

GetStringUTFChars()

GetStringRegion()

GetStringUTFRegion()

GetStringCritical()

are therefore not required to be NULL terminated. Programmers are expected to determine buffer capacity requirements via GetStringLength() or GetStringUTFLength().

In the general case this means one should never assume JNI returned strings are null terminated, not even UTF-8 strings. In a pragmatic world one can test a specific behavior in a list of supported JVM(s). In my experience, rereferring to JVMs I actually tested:

Oracle JVMs do null terminate both UTF-16 (with \u0000) and UTF-8 strings (with '\0');
Android JVMs do terminate UTF-8 strings but not UTF-16 ones.

Yes, GetStringUTFChars returns a null-terminated string. However, I don't think you should take my word for it, instead you should find an authoritative online source that answers this question.

Let's start with the actual Java Native Interface Specification itself, where it says:

Returns a pointer to an array of bytes representing the string in modified UTF-8 encoding. This array is valid until it is released by ReleaseStringUTFChars().

Oh, surprisingly it doesn't say whether it's null-terminated or not. Boy, that seems like a huge oversight, and fortunately somebody was kind enough to log this bug on Sun's Java bug database back in 2008. The notes on the bug point you to a similar but different documentation bug (which was closed without action), which suggests that the readers buy a book, "The Java Native Interface: Programmer's Guide and Specification" as there's a suggestion that this become the new specification for JNI.

But we're looking for an authoritative online source, and this is neither authoritative (it's not yet the specification) nor online.

Fortunately, the reviews for said book on a certain popular online book retailer suggest that the book is freely available online from Sun, and that would at least satisfy the online portion. Sun's JNI web page has a link that looks tantalizingly close, but that link sadly doesn't go where it says it goes.

So I'm afraid I cannot point you to an authoritative online source for this, and you'll have to buy the book (it's actually a good book), where it will explain to you that:

UTF-8 strings are always terminated with the '\0' character, whereas Unicode strings are not. To find out how many bytes are needed to represent a jstring in the UTF-8 format, JNI programmers can either call the ANSI C function strlen on the result of GetStringUTFChars, or call the JNI function GetStringUTFLength on the jstring reference directly.

(Note that in the above sentence, "Unicode" means "UTF-16", or more accurately "the internal two-byte string representation used by Java, though finding proof of that is left as an exercise for the reader.)

Java native code string ending

Tags:

Java Native Interface

Related

Recent Posts